# Architectural Design Document: Company Inc.

**Cloud Infrastructure for Web Application Deployment**
**Version:** 1.0
**Date:** February 2026

---

## 1. Executive Summary

This document outlines a robust, scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, a React SPA frontend, and a MongoDB database. The design leverages **Google Cloud Platform (GCP)** with **GKE (Google Kubernetes Engine)** as the primary compute platform.

**Key Design Principles:** Cost awareness from day one, security by default, scalability when needed, and GitOps-based operations.

---

## 2. Cloud Provider and Environment Structure

### 2.1 Provider Choice: GCP

**Rationale:** GCP offers a strong managed Kubernetes service (GKE) with an Autopilot option, excellent MongoDB Atlas integration (with Firestore with MongoDB compatibility as a GCP-native alternative), competitive pricing for startups, and simplified networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.

### 2.2 Project Structure (Cost-Optimised)

For a startup, fewer projects mean lower overhead and simpler billing. Start with **3 projects** and add more only when traffic or compliance demands it.

| Project | Purpose | Isolation |
|---------|---------|-----------|
| **company-inc-prod** | Production workloads | High; sensitive data |
| **company-inc-staging** | Staging, QA, and dev experimentation | Medium |
| **company-inc-shared** | CI/CD, Artifact Registry, DNS | Low; no PII |

**Why not 4+ projects?**

- A dedicated sandbox project adds billing, IAM, and networking overhead with little benefit at startup scale.
- Developers can use Kubernetes namespaces within the staging cluster for experimentation.
- A fourth project can be introduced later when team size or compliance (SOC 2, HIPAA) requires it.
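The three-project layout above can be captured declaratively. A minimal Terraform sketch, assuming the Google provider is configured and that `var.org_id` and `var.billing_account` are defined elsewhere; the project IDs mirror the table:

```hcl
locals {
  projects = ["company-inc-prod", "company-inc-staging", "company-inc-shared"]
}

resource "google_project" "env" {
  for_each        = toset(local.projects)
  name            = each.key
  project_id      = each.key # project IDs must be globally unique in practice
  org_id          = var.org_id
  billing_account = var.billing_account
}
```

Managing all three from one Terraform workspace keeps the environment layout reviewable in Git, consistent with the GitOps principle stated in §1.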
**Benefits:**

- Billing separation (prod costs are clearly visible)
- Blast-radius containment (prod issues do not affect staging)
- IAM isolation between environments
- Minimal fixed cost — only 3 projects to manage

---

## 3. Network Design

### 3.1 VPC Architecture

- **One VPC per project** (or a Shared VPC from `company-inc-shared` for centralised control)
- **Regional subnets** spanning at least 2 zones for HA
- **Private subnets** for workloads (no public IPs on nodes)
- **Public subnets** only for load balancers and NAT gateways

### 3.2 Security Layers

| Layer | Controls |
|-------|----------|
| **VPC firewall** | Default deny; allow only required CIDRs and ports |
| **GKE node pools** | Private nodes; no public IPs |
| **In-cluster** | Kubernetes Network Policies + GKE-native security |
| **Ingress** | HTTPS only; TLS termination at the load balancer |
| **Egress** | Cloud NAT for outbound; restrict to necessary destinations |

### 3.3 Network Topology (High-Level)

```mermaid
flowchart TD
    Internet((Internet))
    Internet --> LB["Cloud Load Balancer<br/>HTTPS termination"]
    LB --> Ingress[GKE Ingress Controller]
    subgraph VPC["VPC — Private Subnets"]
        Ingress --> API["API Pods<br/>Python / Flask"]
        Ingress --> SPA["Frontend Pods<br/>React SPA"]
        API --> DB[("MongoDB<br/>Private Endpoint")]
    end
```

---

## 4. Compute Platform: GKE

### 4.1 Cluster Strategy

- **GKE Autopilot** for production and staging to minimise node management
- **Single regional cluster** per environment initially; consider multi-region as scale demands
- **Private cluster** with no public endpoint; access via IAP or a bastion if needed

### 4.2 Node Configuration

| Setting | Initial | Growth Phase |
|---------|---------|--------------|
| **Node type** | Autopilot (no manual sizing) | Same |
| **Min nodes** | 0 (scale to zero when idle) | 2 |
| **Max nodes** | 5 | 50+ |
| **Scaling** | Pod-based (HPA, cluster autoscaler) | Same |

### 4.3 Workload Layout

- **Backend (Python/Flask):** Deployment with HPA (CPU/memory); target 2–3 replicas initially
- **Frontend (React):** Static assets served via CDN or container; 1–2 replicas
- **Ingress:** GKE Ingress for HTTP(S) routing; consider the GKE Gateway API for advanced use

### 4.4 Blue-Green Deployment

Zero-downtime releases without duplicating infrastructure. Both versions run inside the **same GKE cluster**; the load balancer switches traffic atomically.

```mermaid
flowchart LR
    LB[Load Balancer]
    LB -->|100% traffic| Green["Green — v1.2.0<br/>current stable"]
    LB -.->|0% traffic| Blue["Blue — v1.3.0<br/>new release"]
    Blue -.->|smoke tests pass| LB
```

| Phase | Action |
|-------|--------|
| **Deploy** | New version deployed to the idle slot (blue) |
| **Test** | Run smoke tests / synthetic checks against blue |
| **Switch** | Update the Service selector or Ingress to point to blue |
| **Rollback** | Instant — revert the selector back to green (the old version is still running) |
| **Cleanup** | Scale down the old slot after a confirmation period |

**Cost impact:** Near-zero — both slots share the same node pool, and the idle slot consumes minimal resources until traffic is switched. Argo Rollouts automates the full lifecycle within ArgoCD.

### 4.5 Containerisation and CI/CD

| Aspect | Approach |
|--------|----------|
| **Image build** | Dockerfile per service; multi-stage builds; non-root user |
| **Registry** | Artifact Registry in `company-inc-shared` |
| **CI** | GitHub/Gitea Actions — build, test, security scan |
| **CD** | ArgoCD + Argo Rollouts — GitOps with a blue-green strategy |
| **Secrets** | External Secrets Operator + GCP Secret Manager |

---

## 5. Database: MongoDB

### 5.1 Service Choice

**MongoDB Atlas** (or **Firestore with MongoDB compatibility** if strictly GCP-only) is recommended for:

- Fully managed operation with automated backups
- Multi-region replication
- Strong security (encryption at rest, VPC peering)
- Easy scaling

**Atlas on GCP** provides native VPC peering and private connectivity.

### 5.2 High Availability and DR

| Topic | Strategy |
|-------|----------|
| **Replicas** | 3-node replica set; multi-AZ |
| **Backups** | Continuous backup; point-in-time recovery |
| **Disaster recovery** | Cross-region replica (e.g. `us-central1` + `europe-west1`) |
| **Restore testing** | Quarterly DR drills |

### 5.3 Security

- Private endpoint (no public IP)
- TLS for all connections
- IAM-based access; principle of least privilege
- Encryption at rest (default in Atlas)

---
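The security requirements above (TLS everywhere, credentials from Secret Manager) reduce to a handful of connection-string options on the client side. A minimal sketch of assembling an Atlas URI from environment variables populated in-cluster by the External Secrets Operator; the variable names, cluster host, and database name are illustrative, not prescribed by this design:

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(host: str, db: str) -> str:
    """Assemble a mongodb+srv URI that enforces TLS and majority writes.

    Credentials are read from the environment, where the External Secrets
    Operator has synced them from GCP Secret Manager.
    """
    user = quote_plus(os.environ["MONGO_USER"])
    password = quote_plus(os.environ["MONGO_PASSWORD"])
    # tls=true is the Atlas default, stated explicitly for clarity;
    # retryWrites + w=majority ride out replica-set failovers.
    return (
        f"mongodb+srv://{user}:{password}@{host}/{db}"
        "?tls=true&retryWrites=true&w=majority"
    )


# Stand-in values for local demonstration only.
os.environ.setdefault("MONGO_USER", "api-service")
os.environ.setdefault("MONGO_PASSWORD", "s3cret/+")
uri = build_mongo_uri("cluster0.example.mongodb.net", "appdb")
print(uri)
```

Percent-encoding the credentials with `quote_plus` matters: Atlas-generated passwords may contain characters (`/`, `+`, `@`) that would otherwise corrupt the URI.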
## 6. Cost Optimisation Strategy

| Lever | Approach | Estimated Savings |
|-------|----------|-------------------|
| **3 projects, not 4** | Drop sandbox; use staging namespaces | ~25% fewer fixed project costs |
| **GKE Autopilot** | Pay per pod, not per node; no idle nodes | 30–60% vs standard GKE |
| **Blue-green in-cluster** | No duplicate environments for releases | Near-zero deployment cost |
| **Spot/preemptible pods** | Use for staging and non-critical workloads | Up to 60–80% off compute |
| **Committed use discounts** | 1-year CUDs once a baseline is established | 20–30% off sustained use |
| **CDN for frontend** | Offload SPA traffic from GKE | Fewer pod replicas needed |
| **MongoDB Atlas auto-scale** | Start at M10; scale up only when needed | Avoid over-provisioning |
| **Shared Cloud NAT** | Single NAT in the shared project | Avoid per-project NAT cost |

**Monthly cost estimate (early stage):**

- GKE Autopilot (2–3 API pods + 1 SPA): ~$80–150
- MongoDB Atlas M10: ~$60
- Load Balancer + Cloud NAT: ~$30
- Artifact Registry + Secret Manager: ~$5
- **Total: ~$175–245/month**

---

## 7. High-Level Architecture Diagram

```mermaid
flowchart TB
    Users((Users))
    Users --> CDN["Cloud CDN<br/>Static Assets"]
    Users --> LB["Cloud Load Balancer<br/>HTTPS"]
    subgraph GKE["GKE Cluster — Private"]
        LB --> Ingress[Ingress Controller]
        Ingress --> API["Backend — Flask<br/>HPA 2–3 replicas"]
        Ingress --> SPA["Frontend — React SPA<br/>Nginx"]
        CDN --> SPA
        API --> Redis["Redis<br/>Memorystore"]
        API --> Obs["Observability<br/>Prometheus / Grafana"]
    end
    subgraph Data["Managed Services"]
        Mongo[("MongoDB Atlas<br/>Replica Set · Private Endpoint")]
        Secrets["Secret Manager<br/>App & DB credentials"]
        Registry["Artifact Registry<br/>Container images"]
    end
    API --> Mongo
    API --> Secrets
    GKE --> Registry
```

---

## 8. Summary of Recommendations

| Area | Recommendation |
|------|----------------|
| **Cloud** | GCP with 3 projects (prod, staging, shared) |
| **Compute** | GKE Autopilot, private nodes, HPA |
| **Deployments** | Blue-green via Argo Rollouts — zero downtime, instant rollback |
| **Database** | MongoDB Atlas on GCP with multi-AZ, automated backups |
| **CI/CD** | GitHub/Gitea Actions + ArgoCD |
| **Security** | Private VPC, TLS everywhere, Secret Manager, least privilege |
| **Cost** | ~$175–245/month early stage; spot pods, CUDs as traffic grows |

---

*See [architecture-hld.md](architecture-hld.md) for the standalone HLD diagram.*
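The HPA-driven scaling recommended for the backend (§4.3) follows the standard Kubernetes algorithm, `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the configured replica bounds. A minimal sketch; the clamp values are illustrative, matching the 2–3 replica baseline above:

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Kubernetes HPA scaling rule: grow replicas proportionally to metric
    pressure, then clamp to the configured bounds."""
    raw = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# CPU at 90% against a 50% target with 2 replicas -> scale out to 4
print(desired_replicas(2, current_metric=0.9, target_metric=0.5))  # -> 4
```

In practice the controller also applies a tolerance band and stabilisation window to avoid flapping; this sketch shows only the core proportional rule, which is what drives the 2–3 replica sizing and the 50+ node ceiling in the growth phase.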