Architecture: cost optimisation, blue-green deployment, reduce to 3 projects

- Reduce from 4 to 3 GCP projects (drop sandbox, use staging namespaces)
- Add blue-green deployment strategy via Argo Rollouts
- Add cost optimisation section with monthly estimate (~$175-245)
- Add blue-green flow diagram and cost pie chart to HLD

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -10,7 +10,7 @@

 This document outlines a robust, scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, React SPA frontend, and MongoDB database. The design leverages **Google Cloud Platform (GCP)** with **GKE (Google Kubernetes Engine)** as the primary compute platform.

-**Key Design Principles:** Security-by-default, scalability from day one, cost optimization for early stage, and GitOps-based operations.
+**Key Design Principles:** Cost awareness from day one, security-by-default, scalability when needed, and GitOps-based operations.

 ---

@@ -20,20 +20,26 @@ This document outlines a robust, scalable, secure, and cost-effective infrastruc

 **Rationale:** GCP offers strong managed Kubernetes (GKE) with an Autopilot mode, excellent MongoDB Atlas integration (or GCP-native document databases such as Firestore as an alternative), competitive pricing for startups, and simplified networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.

-### 2.2 Multi-Project Structure
+### 2.2 Project Structure (Cost-Optimised)
+
+For a startup, fewer projects mean lower overhead and simpler billing. Start with **3 projects** and add more only when traffic or compliance demands it.

 | Project | Purpose | Isolation |
 |---------|---------|-----------|
 | **company-inc-prod** | Production workloads | High; sensitive data |
-| **company-inc-staging** | Staging / pre-production | Medium |
-| **company-inc-shared** | CI/CD, shared tooling, DNS | Low; no PII |
-| **company-inc-sandbox** | Dev experimentation | Lowest |
+| **company-inc-staging** | Staging, QA, and dev experimentation | Medium |
+| **company-inc-shared** | CI/CD, Artifact Registry, DNS | Low; no PII |
+
+**Why not 4+ projects?**
+- A dedicated sandbox project adds billing, IAM, and networking overhead with little benefit at startup scale.
+- Developers can use Kubernetes namespaces within the staging cluster for experimentation.
+- A fourth project can be introduced later when team size or compliance (SOC2, HIPAA) requires it.

 **Benefits:**
-- Billing separation per environment
+- Billing separation (prod costs are clearly visible)
 - Blast-radius containment (prod issues do not affect staging)
-- IAM and network isolation
-- Aligns with GCP best practices for multi-tenant or multi-env setups
+- IAM isolation between environments
+- Minimal fixed cost: only 3 projects to manage

 ---

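
The staging-namespace approach from section 2.2 can be sketched as follows. This is a minimal illustration, not part of the commit: it builds a Kubernetes `Namespace` and a `ResourceQuota` manifest per developer so that experiments in the staging cluster stay bounded. The `dev-<name>` naming scheme, the label, and the quota values are assumptions chosen for the example.

```python
import json


def dev_namespace_manifests(developer: str, cpu: str = "2", memory: str = "4Gi") -> list:
    """Build a Namespace and a ResourceQuota manifest for one developer sandbox."""
    name = f"dev-{developer}"  # hypothetical naming scheme, not from the design doc
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {"purpose": "dev-experimentation"}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{name}-quota", "namespace": name},
        # Caps on total requested resources in the namespace; values are illustrative.
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory, "pods": "10"}},
    }
    return [namespace, quota]


if __name__ == "__main__":
    print(json.dumps(dev_namespace_manifests("alice"), indent=2))
```

A quota per namespace keeps one developer's experiment from crowding out QA workloads that share the staging cluster, which is the main risk of folding the sandbox project into staging.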
@@ -96,14 +102,36 @@ flowchart TD
 - **Frontend (React):** Static assets served via CDN or container; 1–2 replicas
 - **Ingress:** GKE Ingress for HTTP(S) routing; consider GKE Gateway API for advanced use

-### 4.4 Containerisation and CI/CD
+### 4.4 Blue-Green Deployment
+
+Zero-downtime releases without duplicating infrastructure. Both versions run inside the **same GKE cluster**; the load balancer switches traffic atomically.
+
+```mermaid
+flowchart LR
+LB[Load Balancer]
+LB -->|100% traffic| Green[Green: v1.2.0<br/>current stable]
+LB -.->|0% traffic| Blue[Blue: v1.3.0<br/>new release]
+Blue -.->|smoke tests pass| LB
+```
+
+| Phase | Action |
+|-------|--------|
+| **Deploy** | New version deployed to the idle slot (blue) |
+| **Test** | Run smoke tests / synthetic checks against blue |
+| **Switch** | Update Service selector or Ingress to point to blue |
+| **Rollback** | Instant: revert selector back to green (old version still running) |
+| **Cleanup** | Scale down old slot after confirmation period |
+
+**Cost impact:** Near-zero. Both slots run in the same cluster and share its capacity, so no second environment is needed; the old slot is scaled down after the confirmation period, so duplicate pods are billed only around a release. Argo Rollouts, running alongside ArgoCD, automates the full lifecycle.
+
+### 4.5 Containerisation and CI/CD

 | Aspect | Approach |
 |-------|----------|
 | **Image build** | Dockerfile per service; multi-stage builds; non-root user |
-| **Registry** | Artifact Registry (GCR) in `company-inc-shared` |
-| **CI** | GitHub Actions (or GitLab CI): build, test, security scan |
-| **CD** | ArgoCD or Flux: GitOps; app of apps pattern |
+| **Registry** | Artifact Registry in `company-inc-shared` |
+| **CI** | GitHub/Gitea Actions: build, test, security scan |
+| **CD** | ArgoCD + Argo Rollouts: GitOps with blue-green strategy |
 | **Secrets** | External Secrets Operator + GCP Secret Manager |

 ---
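
The blue-green phases in section 4.4 map onto an Argo Rollouts `Rollout` resource roughly as sketched below. This is an illustration under stated assumptions, not the manifest used by this repository: the Service names (`<name>-active`, `<name>-preview`), the image reference, and the 600-second scale-down delay are hypothetical. The manifest is built as a plain Python dict so it can be serialised for a GitOps repository.

```python
import json


def blue_green_rollout(name: str, image: str, replicas: int = 2) -> dict:
    """Build an Argo Rollouts manifest that uses the blueGreen strategy."""
    labels = {"app": name}
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Rollout",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
            "strategy": {
                "blueGreen": {
                    # Service that receives live traffic (the "Switch" phase repoints it).
                    "activeService": f"{name}-active",  # hypothetical Service name
                    # Service used to reach the new version for smoke tests ("Test").
                    "previewService": f"{name}-preview",  # hypothetical Service name
                    # Wait for an explicit promotion instead of switching automatically.
                    "autoPromotionEnabled": False,
                    # Keep the old ReplicaSet around for rollback before "Cleanup".
                    "scaleDownDelaySeconds": 600,  # illustrative value
                }
            },
        },
    }


if __name__ == "__main__":
    # The image reference is a placeholder, not a real registry path.
    print(json.dumps(blue_green_rollout("api", "example/api:v1.3.0"), indent=2))
```

Setting `autoPromotionEnabled` to `False` keeps a human or a CI gate in the loop between the Test and Switch phases; a team that trusts its smoke tests could enable automatic promotion instead.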
@@ -138,7 +166,29 @@ flowchart TD

 ---

-## 6. High-Level Architecture Diagram
+## 6. Cost Optimisation Strategy
+
+| Lever | Approach | Estimated Savings |
+|-------|----------|-------------------|
+| **3 projects, not 4** | Drop sandbox; use staging namespaces | ~25% fewer fixed project costs |
+| **GKE Autopilot** | Pay per pod, not per node; no idle nodes | 30–60% vs standard GKE |
+| **Blue-green in-cluster** | No duplicate environments for releases | Near-zero deployment cost |
+| **Spot/preemptible pods** | Use for staging and non-critical workloads | Up to 60–80% off compute |
+| **Committed use discounts** | 1-year CUDs once baseline is established | 20–30% off sustained use |
+| **CDN for frontend** | Offload SPA traffic from GKE | Fewer pod replicas needed |
+| **MongoDB Atlas auto-scale** | Start M10; scale up only when needed | Avoid over-provisioning |
+| **Cloud NAT shared** | Single NAT in shared project | Avoid per-project NAT cost |
+
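
For the Spot lever in the table above, GKE selects Spot capacity through the `cloud.google.com/gke-spot` node label. The helper below is a small sketch, assuming pod specs are handled as plain dicts before being rendered into manifests; the function name `with_spot` and the sample spec are illustrative, and the selector should be checked against the current GKE documentation before use.

```python
import copy

# Node selector that requests Spot capacity on GKE.
SPOT_SELECTOR = {"cloud.google.com/gke-spot": "true"}


def with_spot(pod_spec: dict) -> dict:
    """Return a copy of pod_spec that asks to be scheduled on Spot capacity."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    spec.setdefault("nodeSelector", {}).update(SPOT_SELECTOR)
    return spec


if __name__ == "__main__":
    base = {"containers": [{"name": "api", "image": "example/api:v1.3.0"}]}
    print(with_spot(base))
```

Spot pods can be evicted at short notice, which is why the table limits them to staging and non-critical workloads.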
+**Monthly cost estimate (early stage):**
+- GKE Autopilot (2–3 API pods + 1 SPA): ~$80–150
+- MongoDB Atlas M10: ~$60
+- Load Balancer + Cloud NAT: ~$30
+- Artifact Registry + Secret Manager: ~$5
+- **Total: ~$175–245/month**
+
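
The total follows directly from the line items. The short check below sums the low and high ends of each range; the figures are the approximate estimates listed above, and the dictionary layout is only an illustration.

```python
# (low, high) monthly estimates in USD, taken from the list above.
LINE_ITEMS = {
    "GKE Autopilot (2-3 API pods + 1 SPA)": (80, 150),
    "MongoDB Atlas M10": (60, 60),
    "Load Balancer + Cloud NAT": (30, 30),
    "Artifact Registry + Secret Manager": (5, 5),
}


def monthly_total(items: dict) -> tuple:
    """Sum the low and high ends of every line item."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


if __name__ == "__main__":
    low, high = monthly_total(LINE_ITEMS)
    print(f"~${low}-{high}/month")  # prints "~$175-245/month"
```

Only the Autopilot line varies, so the spread in the total comes entirely from how many API pods run and how large their resource requests are.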
+---
+
+## 7. High-Level Architecture Diagram

 ```mermaid
 flowchart TB
@@ -169,16 +219,17 @@ flowchart TB

 ---

-## 7. Summary of Recommendations
+## 8. Summary of Recommendations

 | Area | Recommendation |
 |------|----------------|
-| **Cloud** | GCP with 4 projects (prod, staging, shared, sandbox) |
+| **Cloud** | GCP with 3 projects (prod, staging, shared) |
 | **Compute** | GKE Autopilot, private nodes, HPA |
+| **Deployments** | Blue-green via Argo Rollouts: zero downtime, instant rollback |
 | **Database** | MongoDB Atlas on GCP with multi-AZ, automated backups |
-| **CI/CD** | GitHub Actions + ArgoCD/Flux |
+| **CI/CD** | GitHub/Gitea Actions + ArgoCD |
 | **Security** | Private VPC, TLS everywhere, Secret Manager, least privilege |
-| **Cost** | Start small; use committed use discounts as usage grows |
+| **Cost** | ~$175–245/month early stage; spot pods, CUDs as traffic grows |

 ---