
# Architectural Design Document: Company Inc.

**Cloud Infrastructure for Web Application Deployment**
Version: 1.0
Date: February 2026


## 1. Executive Summary

This document outlines a robust, scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, React SPA frontend, and MongoDB database. The design leverages Google Cloud Platform (GCP) with GKE (Google Kubernetes Engine) as the primary compute platform.

Key Design Principles: Cost awareness from day one, security-by-default, scalability when needed, and GitOps-based operations.


## 2. Cloud Provider and Environment Structure

### 2.1 Provider Choice: GCP

Rationale: GCP offers strong managed Kubernetes (GKE) with Autopilot options, first-class MongoDB Atlas integration (or Firestore as a GCP-native document-database alternative), competitive pricing for startups, and simplified networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.

### 2.2 Project Structure (Cost-Optimised)

For a startup, fewer projects mean lower overhead and simpler billing. Start with 3 projects and add more only when traffic or compliance demands it.

| Project | Purpose | Isolation |
|---|---|---|
| `company-inc-prod` | Production workloads | High; sensitive data |
| `company-inc-staging` | Staging, QA, and dev experimentation | Medium |
| `company-inc-shared` | CI/CD, Artifact Registry, DNS | Low; no PII |

**Why not 4+ projects?**

- A dedicated sandbox project adds billing, IAM, and networking overhead with little benefit at startup scale.
- Developers can use Kubernetes namespaces within the staging cluster for experimentation.
- A fourth project can be introduced later when team size or compliance (SOC 2, HIPAA) requires it.

**Benefits:**

- Billing separation (prod costs are clearly visible)
- Blast-radius containment (prod issues do not affect staging)
- IAM isolation between environments
- Minimal fixed cost: only 3 projects to manage

## 3. Network Design

### 3.1 VPC Architecture

- One VPC per project (or Shared VPC from `company-inc-shared` for centralised control)
- Regional subnets spanning at least 2 zones for HA
- Private subnets for workloads (no public IPs on nodes)
- Public subnets only for load balancers and NAT gateways

### 3.2 Security Layers

| Layer | Controls |
|---|---|
| VPC firewall | Default deny; allow only required CIDRs and ports |
| GKE node pools | Private nodes; no public IPs |
| Pod-to-pod traffic | Kubernetes Network Policies + GKE-native security |
| Ingress | HTTPS only; TLS termination at load balancer |
| Egress | Cloud NAT for outbound; restrict to necessary destinations |
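The default-deny stance can be expressed with Kubernetes Network Policies. A minimal sketch, assuming an `api` namespace and `flask-api` pod label (both illustrative, not part of this design):

```yaml
# Hypothetical policy pair: deny all ingress to the API namespace by default,
# then explicitly allow traffic from the ingress controller's namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: api            # illustrative namespace
spec:
  podSelector: {}           # selects every pod in the namespace
  policyTypes:
    - Ingress               # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: api
spec:
  podSelector:
    matchLabels:
      app: flask-api        # illustrative pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress   # assumed controller namespace
      ports:
        - protocol: TCP
          port: 5000        # default Flask port
```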

### 3.3 Network Topology (High-Level)

```mermaid
flowchart TD
    Internet((Internet))
    Internet --> LB[Cloud Load Balancer<br/>HTTPS termination]
    LB --> Ingress[GKE Ingress Controller]

    subgraph VPC["VPC — Private Subnets"]
        Ingress --> API[API Pods<br/>Python / Flask]
        Ingress --> SPA[Frontend Pods<br/>React SPA]
        API --> DB[(MongoDB<br/>Private Endpoint)]
    end
```

## 4. Compute Platform: GKE

### 4.1 Cluster Strategy

- GKE Autopilot for production and staging to minimise node management
- Single regional cluster per environment initially; consider multi-region as scale demands
- Private cluster with no public endpoint; access via IAP or a bastion host if needed

### 4.2 Node Configuration

| Setting | Initial | Growth phase |
|---|---|---|
| Node type | Autopilot (no manual sizing) | Same |
| Min nodes | 0 (scale to zero when idle) | 2 |
| Max nodes | 5 | 50+ |
| Scaling | Pod-based (HPA, cluster autoscaler) | Same |
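Because Autopilot has no nodes to size, billing follows the resources each pod requests, so container requests become the main tuning knob. A sketch of what an API container might request (values are illustrative, not measured):

```yaml
# Fragment of a Deployment's container spec. On Autopilot, limits are
# set equal to requests, so only the requests need tuning.
resources:
  requests:
    cpu: "250m"       # a quarter vCPU per API pod
    memory: "512Mi"
```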

### 4.3 Workload Layout

- Backend (Python/Flask): Deployment with HPA (CPU/memory); target 2-3 replicas initially
- Frontend (React): static assets served via CDN or container; 1-2 replicas
- Ingress: GKE Ingress for HTTP(S) routing; consider the GKE Gateway API for advanced use
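The backend's autoscaling could be sketched as follows (names and thresholds are illustrative assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-api            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-api
  minReplicas: 2             # baseline replica count
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```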

### 4.4 Blue-Green Deployment

Zero-downtime releases without duplicating infrastructure. Both versions run inside the same GKE cluster; the load balancer switches traffic atomically.

```mermaid
flowchart LR
    LB[Load Balancer]
    LB -->|100% traffic| Green[Green — v1.2.0<br/>current stable]
    LB -.->|0% traffic| Blue[Blue — v1.3.0<br/>new release]
    Blue -.->|smoke tests pass| LB
```

| Phase | Action |
|---|---|
| Deploy | New version deployed to the idle slot (blue) |
| Test | Run smoke tests / synthetic checks against blue |
| Switch | Update the Service selector or Ingress to point to blue |
| Rollback | Instant: revert the selector back to green (old version still running) |
| Cleanup | Scale down the old slot after a confirmation period |

Cost impact: near zero. Both slots share the same node pool, and the idle slot consumes minimal resources until traffic is switched. Argo Rollouts automates the full lifecycle within ArgoCD.
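With Argo Rollouts, the blue-green phases above collapse into a single manifest. A minimal sketch, assuming illustrative service names and an image path that is not part of this design:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: flask-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flask-api
  template:
    metadata:
      labels:
        app: flask-api
    spec:
      containers:
        - name: api
          image: europe-docker.pkg.dev/company-inc-shared/apps/flask-api:v1.3.0  # illustrative
  strategy:
    blueGreen:
      activeService: flask-api-active    # Service receiving live traffic (green)
      previewService: flask-api-preview  # Service for the idle slot (blue)
      autoPromotionEnabled: false        # switch traffic only after smoke tests
      scaleDownDelaySeconds: 300         # keep the old slot warm for instant rollback
```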

### 4.5 Containerisation and CI/CD

| Aspect | Approach |
|---|---|
| Image build | Dockerfile per service; multi-stage builds; non-root user |
| Registry | Artifact Registry in `company-inc-shared` |
| CI | GitHub/Gitea Actions: build, test, security scan |
| CD | ArgoCD + Argo Rollouts: GitOps with blue-green strategy |
| Secrets | External Secrets Operator + GCP Secret Manager |
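The secrets approach can be sketched as an ExternalSecret that syncs a GCP Secret Manager entry into the cluster (store name and secret key are illustrative assumptions):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: mongodb-credentials
spec:
  refreshInterval: 1h            # re-sync from Secret Manager hourly
  secretStoreRef:
    kind: ClusterSecretStore
    name: gcp-secret-manager     # assumed store backed by GCP Secret Manager
  target:
    name: mongodb-credentials    # resulting Kubernetes Secret
  data:
    - secretKey: MONGODB_URI
      remoteRef:
        key: mongodb-uri         # assumed secret name in GCP Secret Manager
```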

## 5. Database: MongoDB

### 5.1 Service Choice

MongoDB Atlas (or Firestore if a strictly GCP-native service is required) is recommended for:

- Fully managed, with automated backups
- Multi-region replication
- Strong security (encryption at rest, VPC peering)
- Easy scaling

Atlas on GCP provides native VPC peering and private connectivity.

### 5.2 High Availability and DR

| Topic | Strategy |
|---|---|
| Replicas | 3-node replica set; multi-AZ |
| Backups | Continuous backup; point-in-time recovery |
| Disaster recovery | Cross-region replica (e.g. us-central1 + europe-west1) |
| Restore testing | Quarterly DR drills |

### 5.3 Security

- Private endpoint (no public IP)
- TLS for all connections
- IAM-based access; principle of least privilege
- Encryption at rest (default in Atlas)

## 6. Cost Optimisation Strategy

| Lever | Approach | Estimated savings |
|---|---|---|
| 3 projects, not 4 | Drop sandbox; use staging namespaces | ~25% fewer fixed project costs |
| GKE Autopilot | Pay per pod, not per node; no idle nodes | 30-60% vs standard GKE |
| Blue-green in-cluster | No duplicate environments for releases | Near-zero deployment cost |
| Spot/preemptible pods | Use for staging and non-critical workloads | Up to 60-80% off compute |
| Committed use discounts | 1-year CUDs once a baseline is established | 20-30% off sustained use |
| CDN for frontend | Offload SPA traffic from GKE | Fewer pod replicas needed |
| MongoDB Atlas auto-scaling | Start at M10; scale up only when needed | Avoids over-provisioning |
| Shared Cloud NAT | Single NAT in the shared project | Avoids per-project NAT cost |
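Requesting Spot capacity on Autopilot is a one-line node selector on the workload. An illustrative fragment for a staging Deployment's pod template:

```yaml
# Autopilot schedules these pods onto Spot capacity, which can be
# reclaimed at short notice, so it suits stateless staging workloads.
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      terminationGracePeriodSeconds: 25   # Spot reclamation gives ~25s notice
```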

**Monthly cost estimate (early stage):**

- GKE Autopilot (2-3 API pods + 1 SPA pod): ~$80-150
- MongoDB Atlas M10: ~$60
- Load balancer + Cloud NAT: ~$30
- Artifact Registry + Secret Manager: ~$5
- Total: ~$175-245/month

## 7. High-Level Architecture Diagram

```mermaid
flowchart TB
    Users((Users))

    Users --> CDN[Cloud CDN<br/>Static Assets]
    Users --> LB[Cloud Load Balancer<br/>HTTPS]

    subgraph GKE["GKE Cluster — Private"]
        LB --> Ingress[Ingress Controller]
        Ingress --> API[Backend — Flask<br/>HPA 2-3 replicas]
        Ingress --> SPA[Frontend — React SPA<br/>Nginx]
        CDN --> SPA
        API --> Redis[Redis<br/>Memorystore]
        API --> Obs[Observability<br/>Prometheus / Grafana]
    end

    subgraph Data["Managed Services"]
        Mongo[(MongoDB Atlas<br/>Replica Set · Private Endpoint)]
        Secrets[Secret Manager<br/>App & DB credentials]
        Registry[Artifact Registry<br/>Container images]
    end

    API --> Mongo
    API --> Secrets
    GKE --> Registry
```

## 8. Summary of Recommendations

| Area | Recommendation |
|---|---|
| Cloud | GCP with 3 projects (prod, staging, shared) |
| Compute | GKE Autopilot, private nodes, HPA |
| Deployments | Blue-green via Argo Rollouts: zero downtime, instant rollback |
| Database | MongoDB Atlas on GCP with multi-AZ, automated backups |
| CI/CD | GitHub/Gitea Actions + ArgoCD |
| Security | Private VPC, TLS everywhere, Secret Manager, least privilege |
| Cost | ~$175-245/month early stage; spot pods and CUDs as traffic grows |

See `architecture-hld.md` for the standalone HLD diagram.