Pragmatic analysis of components that add cost/complexity without
value at startup scale, with guidance on when to introduce each.
Co-authored-by: Cursor <cursoragent@cursor.com>
This document outlines a robust, scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, React SPA frontend, and MongoDB database. The design leverages **Google Cloud Platform (GCP)** with **GKE (Google Kubernetes Engine)** as the primary compute platform.
This document outlines a robust, scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, React SPA frontend, and MongoDB database. The design leverages **Google Cloud Platform (GCP)** with **GKE (Google Kubernetes Engine)** as the primary compute platform.
**Key Design Principles:** Security-by-default, scalability from day one, cost optimization for early stage, and GitOps-based operations.
**Key Design Principles:** Cost awareness from day one, security-by-default, scalability when needed, and GitOps-based operations.
---
---
@@ -20,20 +20,26 @@ This document outlines a robust, scalable, secure, and cost-effective infrastruc
**Rationale:** GCP offers strong managed Kubernetes (GKE) with Autopilot options, excellent MongoDB Atlas integration (or GCP-native document database alternatives such as Firestore), competitive pricing for startups, and simplified networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.
**Rationale:** GCP offers strong managed Kubernetes (GKE) with Autopilot options, excellent MongoDB Atlas integration (or GCP-native document database alternatives such as Firestore), competitive pricing for startups, and simplified networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.
### 2.2 Multi-Project Structure
### 2.2 Project Structure (Cost-Optimised)
For a startup, fewer projects mean lower overhead and simpler billing. Start with **3 projects** and add more only when traffic or compliance demands it.
| Project | Purpose | Isolation |
| Project | Purpose | Isolation |
|---------|---------|-----------|
|---------|---------|-----------|
| **company-inc-prod** | Production workloads | High; sensitive data |
| **company-inc-prod** | Production workloads | High; sensitive data |
| **company-inc-staging** | Staging / pre-production | Medium |
| **company-inc-staging** | Staging, QA, and dev experimentation | Medium |
| **company-inc-shared** | CI/CD, shared tooling, DNS | Low; no PII |
| **company-inc-shared** | CI/CD, Artifact Registry, DNS | Low; no PII |
| **company-inc-sandbox** | Dev experimentation | Lowest |
**Why not 4+ projects?**
- A dedicated sandbox project adds billing, IAM, and networking overhead with little benefit at startup scale.
- Developers can use Kubernetes namespaces within the staging cluster for experimentation.
- A fourth project can be introduced later when team size or compliance (SOC2, HIPAA) requires it.
**Benefits:**
**Benefits:**
- Billing separation per environment
- Billing separation (prod costs are clearly visible)
- Blast-radius containment (prod issues do not affect staging)
- Blast-radius containment (prod issues do not affect staging)
- IAM and network isolation
- IAM isolation between environments
- Aligns with GCP best practices for multi-tenant or multi-env setups
- Minimal fixed cost: only 3 projects to manage
---
---
@@ -96,14 +102,36 @@ flowchart TD
- **Frontend (React):** Static assets served via CDN or container; 1–2 replicas
- **Frontend (React):** Static assets served via CDN or container; 1–2 replicas
- **Ingress:** GKE Ingress for HTTP(S) routing; consider GKE Gateway API for advanced use
- **Ingress:** GKE Ingress for HTTP(S) routing; consider GKE Gateway API for advanced use
### 4.4 Containerisation and CI/CD
### 4.4 Blue-Green Deployment
Zero-downtime releases without duplicating infrastructure. Both versions run inside the **same GKE cluster**; the load balancer switches traffic atomically.
| Phase | Action |
|-------|--------|
| **Deploy** | New version deployed to the idle slot (blue) |
| **Test** | Run smoke tests / synthetic checks against blue |
| **Switch** | Update Service selector or Ingress to point to blue |
| **Rollback** | Instant: revert selector back to green (old version still running) |
| **Cleanup** | Scale down old slot after confirmation period |
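**Cost impact:** Near-zero outside release windows. Both slots share the same cluster, and the idle slot can stay at minimal replicas until a release; because GKE Autopilot bills per pod resource request, the second slot only adds cost while it is running. Argo Rollouts can automate the full lifecycle alongside ArgoCD.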
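The Switch and Rollback steps come down to changing one label in the Service selector. The following is a minimal sketch of that idea in plain Python, operating on a manifest held as a dictionary. The `slot` label, the `api` Service name, and the port numbers are assumptions for illustration and do not come from the design itself; a real setup would apply the change through `kubectl`, ArgoCD, or Argo Rollouts rather than a script like this.

```python
import copy

VALID_SLOTS = ("blue", "green")


def switch_slot(service: dict, target_slot: str) -> dict:
    """Return a copy of a Service manifest whose selector targets target_slot.

    The input manifest is left untouched, so the previous state remains
    available for an instant rollback.
    """
    if target_slot not in VALID_SLOTS:
        raise ValueError(f"unknown slot: {target_slot!r}")
    updated = copy.deepcopy(service)
    updated["spec"]["selector"]["slot"] = target_slot
    return updated


# Hypothetical Service currently routing traffic to the green slot.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "api"},
    "spec": {
        "selector": {"app": "api", "slot": "green"},
        "ports": [{"port": 80, "targetPort": 8000}],
    },
}

switched = switch_slot(service, "blue")       # Switch: traffic goes to blue
rolled_back = switch_slot(switched, "green")  # Rollback: traffic returns to green

print(switched["spec"]["selector"])
print(rolled_back["spec"]["selector"])
```

Because only the selector changes, the pods in the old slot keep running during the confirmation period, which is what makes the rollback a single-field revert.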
### 4.5 Containerisation and CI/CD
| Aspect | Approach |
| Aspect | Approach |
|-------|----------|
|-------|----------|
| **Image build** | Dockerfile per service; multi-stage builds; non-root user |
| **Image build** | Dockerfile per service; multi-stage builds; non-root user |
| **Registry** | Artifact Registry (GCR) in `company-inc-shared` |
| **Registry** | Artifact Registry in `company-inc-shared` |
| Optimisation | Approach | Estimated saving |
|--------------|----------|------------------|
| **3 projects, not 4** | Drop sandbox; use staging namespaces | ~25% fewer fixed project costs |
| **GKE Autopilot** | Pay per pod, not per node; no idle nodes | 30–60% vs standard GKE |
| **Blue-green in-cluster** | No duplicate environments for releases | Near-zero deployment cost |
| **Spot/preemptible pods** | Use for staging and non-critical workloads | Up to 60–80% off compute |
| **Committed use discounts** | 1-year CUDs once baseline is established | 20–30% off sustained use |
| **CDN for frontend** | Offload SPA traffic from GKE | Fewer pod replicas needed |
| **MongoDB Atlas auto-scale** | Start M10; scale up only when needed | Avoid over-provisioning |
| **Cloud NAT shared** | Single NAT in shared project | Avoid per-project NAT cost |
**Monthly cost estimate (early stage):**
- GKE Autopilot (2–3 API pods + 1 SPA): ~$80–150
- MongoDB Atlas M10: ~$60
- Load Balancer + Cloud NAT: ~$30
- Artifact Registry + Secret Manager: ~$5
- **Total: ~$175–245/month**
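As a quick check on the arithmetic, the total can be reproduced by summing the low and high ends of each line item. This is only a sketch using the figures listed above; the dictionary layout and variable names are illustrative.

```python
# Monthly line items as (low, high) USD estimates, taken from the list above.
line_items = {
    "GKE Autopilot (2-3 API pods + 1 SPA)": (80, 150),
    "MongoDB Atlas M10": (60, 60),
    "Load Balancer + Cloud NAT": (30, 30),
    "Artifact Registry + Secret Manager": (5, 5),
}

low = sum(lo for lo, _ in line_items.values())
high = sum(hi for _, hi in line_items.values())

print(f"Total: ~${low}-{high}/month")  # prints "Total: ~$175-245/month"
```

Only the GKE Autopilot line carries a range, so the spread in the total comes entirely from how many pods run and how large their resource requests are.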
### 6.1 What Would Be Overkill at This Stage
Not everything in a "best practices" architecture is worth implementing on day one. The following are valuable at scale but add cost and complexity that a startup with a few hundred users/day does not need yet.
| Component | Why it's overkill now | When to introduce |
|-----------|-----------------------|-------------------|
| **Multi-region GKE** | Single region handles millions of req/day; multi-region doubles cost | When SLA requires 99.99% or users span continents |
| **Service mesh (Istio/Linkerd)** | Adds sidecar overhead, complexity, and debugging difficulty | When you have 10+ microservices with mTLS requirements |
| **Cross-region MongoDB replica** | Atlas M10 with multi-AZ is sufficient; cross-region adds ~2x DB cost | When RPO < 1 hour is a compliance requirement |
| **Dedicated observability stack** | GKE built-in monitoring and Cloud Logging cover the basics within their free allotments; Prometheus/Grafana adds ops burden | When team has > 2 SREs and needs custom dashboards |
| **4+ GCP projects** | 3 projects cover prod/staging/shared; more adds IAM and billing complexity | When compliance (SOC2, HIPAA) requires strict separation |
| **API Gateway (Apigee, Kong)** | GKE Ingress handles routing; a gateway adds cost and latency | When you need rate limiting, API keys, or monetisation |
| **Vault for secrets** | GCP Secret Manager is cheaper, simpler, and natively integrated | When you need dynamic secrets or multi-cloud secret federation |
**Rule of thumb:** if a component doesn't solve a problem you have *today*, defer it. Every added piece increases the monthly bill and the on-call surface area.
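One way to make the "when to introduce" column actionable is to encode a few of its triggers as an explicit check that can be revisited each quarter. The sketch below is illustrative only: the `Situation` fields and the `components_to_introduce` function are hypothetical names, and the thresholds are copied from the table rather than derived from measurement.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical snapshot of the team's current needs."""
    microservices: int = 2
    needs_mtls: bool = False
    sre_count: int = 0
    needs_custom_dashboards: bool = False
    sla_target: float = 99.9
    users_span_continents: bool = False
    needs_rate_limiting_or_api_keys: bool = False


def components_to_introduce(s: Situation) -> list[str]:
    """Return the deferred components whose trigger condition is now met."""
    due = []
    if s.sla_target >= 99.99 or s.users_span_continents:
        due.append("Multi-region GKE")
    if s.microservices >= 10 and s.needs_mtls:
        due.append("Service mesh")
    if s.sre_count > 2 and s.needs_custom_dashboards:
        due.append("Dedicated observability stack")
    if s.needs_rate_limiting_or_api_keys:
        due.append("API Gateway")
    return due


# An early-stage startup triggers nothing; a larger platform triggers the mesh.
print(components_to_introduce(Situation()))
print(components_to_introduce(Situation(microservices=12, needs_mtls=True)))
```

An empty result is the expected outcome at startup scale, which matches the rule of thumb: nothing is added until a trigger describes a problem the team actually has.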