
# Architectural Design Document: Company Inc.

**Cloud Infrastructure for Web Application Deployment**
Version: 1.0
Date: February 2026


## 1. Executive Summary

This document outlines a robust, scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, React SPA frontend, and MongoDB database. The design leverages Google Cloud Platform (GCP) with GKE (Google Kubernetes Engine) as the primary compute platform.

Key Design Principles: Cost awareness from day one, security-by-default, scalability when needed, and GitOps-based operations.


## 2. Cloud Provider and Environment Structure

### 2.1 Provider Choice: GCP

Rationale: GCP offers strong managed Kubernetes (GKE) with Autopilot options, first-class MongoDB Atlas integration (or Firestore as a GCP-native document-database alternative), competitive pricing for startups, and simplified networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.

### 2.2 Project Structure (Cost-Optimised)

For a startup, fewer projects mean lower overhead and simpler billing. Start with 3 projects and add more only when traffic or compliance demands it.

| Project | Purpose | Isolation |
|---|---|---|
| `company-inc-prod` | Production workloads | High; sensitive data |
| `company-inc-staging` | Staging, QA, and dev experimentation | Medium |
| `company-inc-shared` | CI/CD, Artifact Registry, DNS | Low; no PII |

**Why not 4+ projects?**

- A dedicated sandbox project adds billing, IAM, and networking overhead with little benefit at startup scale.
- Developers can use Kubernetes namespaces within the staging cluster for experimentation.
- A fourth project can be introduced later when team size or compliance (SOC 2, HIPAA) requires it.

**Benefits:**

- Billing separation (prod costs are clearly visible)
- Blast-radius containment (prod issues do not affect staging)
- IAM isolation between environments
- Minimal fixed cost: only 3 projects to manage

## 3. Network Design

### 3.1 VPC Architecture

- One VPC per project (or Shared VPC from `company-inc-shared` for centralised control)
- Regional subnets spanning at least 2 zones for HA
- Private subnets for workloads (no public IPs on nodes)
- Public subnets only for load balancers and NAT gateways

### 3.2 Security Layers

| Layer | Controls |
|---|---|
| VPC firewall | Default deny; allow only required CIDRs and ports |
| GKE node pools | Private nodes; no public IPs |
| Pod-to-pod traffic | Kubernetes Network Policies + GKE-native security |
| Ingress | HTTPS only; TLS termination at load balancer |
| Egress | Cloud NAT for outbound; restrict to necessary destinations |
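The default-deny stance can be expressed with Kubernetes Network Policies. A minimal sketch, assuming an `api` namespace and `flask-api` pod label (both illustrative, not part of this design):

```yaml
# Hypothetical policy pair: deny all ingress to the API namespace by default,
# then explicitly allow traffic from the ingress controller's namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: api            # illustrative namespace
spec:
  podSelector: {}           # selects every pod in the namespace
  policyTypes:
    - Ingress               # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: api
spec:
  podSelector:
    matchLabels:
      app: flask-api        # illustrative pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress   # assumed controller namespace
      ports:
        - protocol: TCP
          port: 5000        # default Flask port
```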

### 3.3 Network Topology (High-Level)

```mermaid
flowchart TD
    Internet((Internet))
    Internet --> LB[Cloud Load Balancer<br/>HTTPS termination]
    LB --> Ingress[GKE Ingress Controller]

    subgraph VPC["VPC — Private Subnets"]
        Ingress --> API[API Pods<br/>Python / Flask]
        Ingress --> SPA[Frontend Pods<br/>React SPA]
        API --> DB[(MongoDB<br/>Private Endpoint)]
    end
```

## 4. Compute Platform: GKE

### 4.1 Cluster Strategy

- GKE Autopilot for production and staging to minimise node management
- Single regional cluster per environment initially; consider multi-region as scale demands
- Private cluster with no public endpoint; access via IAP or a bastion host if needed

### 4.2 Node Configuration

| Setting | Initial | Growth phase |
|---|---|---|
| Node type | Autopilot (no manual sizing) | Same |
| Min nodes | 0 (scale to zero when idle) | 2 |
| Max nodes | 5 | 50+ |
| Scaling | Pod-based (HPA, cluster autoscaler) | Same |
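Because Autopilot has no nodes to size, billing follows the resources each pod requests, so container requests become the main tuning knob. A sketch of what an API container might request (values are illustrative, not measured):

```yaml
# Fragment of a Deployment's container spec. On Autopilot, limits are
# set equal to requests, so only the requests need tuning.
resources:
  requests:
    cpu: "250m"       # a quarter vCPU per API pod
    memory: "512Mi"
```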

### 4.3 Workload Layout

- Backend (Python/Flask): Deployment with HPA (CPU/memory); target 2-3 replicas initially
- Frontend (React): static assets served via CDN or container; 1-2 replicas
- Ingress: GKE Ingress for HTTP(S) routing; consider the GKE Gateway API for advanced use
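The backend's autoscaling could be sketched as follows (names and thresholds are illustrative assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-api            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-api
  minReplicas: 2             # baseline replica count
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```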

### 4.4 Blue-Green Deployment

Zero-downtime releases without duplicating infrastructure. Both versions run inside the same GKE cluster; the load balancer switches traffic atomically.

```mermaid
flowchart LR
    LB[Load Balancer]
    LB -->|100% traffic| Green[Green — v1.2.0<br/>current stable]
    LB -.->|0% traffic| Blue[Blue — v1.3.0<br/>new release]
    Blue -.->|smoke tests pass| LB
```

| Phase | Action |
|---|---|
| Deploy | New version deployed to the idle slot (blue) |
| Test | Run smoke tests / synthetic checks against blue |
| Switch | Update the Service selector or Ingress to point to blue |
| Rollback | Instant: revert the selector back to green (old version still running) |
| Cleanup | Scale down the old slot after a confirmation period |

Cost impact: near zero. Both slots share the same node pool, and the idle slot consumes minimal resources until traffic is switched. Argo Rollouts automates the full lifecycle within ArgoCD.
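With Argo Rollouts, the blue-green phases above collapse into a single manifest. A minimal sketch, assuming illustrative service names and an image path that is not part of this design:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: flask-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flask-api
  template:
    metadata:
      labels:
        app: flask-api
    spec:
      containers:
        - name: api
          image: europe-docker.pkg.dev/company-inc-shared/apps/flask-api:v1.3.0  # illustrative
  strategy:
    blueGreen:
      activeService: flask-api-active    # Service receiving live traffic (green)
      previewService: flask-api-preview  # Service for the idle slot (blue)
      autoPromotionEnabled: false        # switch traffic only after smoke tests
      scaleDownDelaySeconds: 300         # keep the old slot warm for instant rollback
```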

### 4.5 Containerisation and CI/CD

| Aspect | Approach |
|---|---|
| Image build | Dockerfile per service; multi-stage builds; non-root user |
| Registry | Artifact Registry in `company-inc-shared` |
| CI | GitHub/Gitea Actions: build, test, security scan |
| CD | ArgoCD + Argo Rollouts: GitOps with blue-green strategy |
| Secrets | External Secrets Operator + GCP Secret Manager |
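The secrets approach can be sketched as an ExternalSecret that syncs a GCP Secret Manager entry into the cluster (store name and secret key are illustrative assumptions):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: mongodb-credentials
spec:
  refreshInterval: 1h            # re-sync from Secret Manager hourly
  secretStoreRef:
    kind: ClusterSecretStore
    name: gcp-secret-manager     # assumed store backed by GCP Secret Manager
  target:
    name: mongodb-credentials    # resulting Kubernetes Secret
  data:
    - secretKey: MONGODB_URI
      remoteRef:
        key: mongodb-uri         # assumed secret name in GCP Secret Manager
```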

## 5. Database: MongoDB

### 5.1 Service Choice

MongoDB Atlas (or Firestore if a strictly GCP-native service is required) is recommended for:

- Fully managed, with automated backups
- Multi-region replication
- Strong security (encryption at rest, VPC peering)
- Easy scaling

Atlas on GCP provides native VPC peering and private connectivity.

### 5.2 High Availability and DR

| Topic | Strategy |
|---|---|
| Replicas | 3-node replica set; multi-AZ |
| Backups | Continuous backup; point-in-time recovery |
| Disaster recovery | Cross-region replica (e.g. us-central1 + europe-west1) |
| Restore testing | Quarterly DR drills |

### 5.3 Security

- Private endpoint (no public IP)
- TLS for all connections
- IAM-based access; principle of least privilege
- Encryption at rest (default in Atlas)

## 6. Cost Optimisation Strategy

| Lever | Approach | Estimated savings |
|---|---|---|
| 3 projects, not 4 | Drop sandbox; use staging namespaces | ~25% fewer fixed project costs |
| GKE Autopilot | Pay per pod, not per node; no idle nodes | 30-60% vs standard GKE |
| Blue-green in-cluster | No duplicate environments for releases | Near-zero deployment cost |
| Spot/preemptible pods | Use for staging and non-critical workloads | Up to 60-80% off compute |
| Committed use discounts | 1-year CUDs once a baseline is established | 20-30% off sustained use |
| CDN for frontend | Offload SPA traffic from GKE | Fewer pod replicas needed |
| MongoDB Atlas auto-scaling | Start at M10; scale up only when needed | Avoids over-provisioning |
| Shared Cloud NAT | Single NAT in the shared project | Avoids per-project NAT cost |
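Requesting Spot capacity on Autopilot is a one-line node selector on the workload. An illustrative fragment for a staging Deployment's pod template:

```yaml
# Autopilot schedules these pods onto Spot capacity, which can be
# reclaimed at short notice, so it suits stateless staging workloads.
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      terminationGracePeriodSeconds: 25   # Spot reclamation gives ~25s notice
```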

**Monthly cost estimate (early stage):**

- GKE Autopilot (2-3 API pods + 1 SPA pod): ~$80-150
- MongoDB Atlas M10: ~$60
- Load balancer + Cloud NAT: ~$30
- Artifact Registry + Secret Manager: ~$5
- Total: ~$175-245/month

## 7. High-Level Architecture Diagram

```mermaid
flowchart TB
    Users((Users))

    Users --> CDN[Cloud CDN<br/>Static Assets]
    Users --> LB[Cloud Load Balancer<br/>HTTPS]

    subgraph GKE["GKE Cluster — Private"]
        LB --> Ingress[Ingress Controller]
        Ingress --> API[Backend — Flask<br/>HPA 2-3 replicas]
        Ingress --> SPA[Frontend — React SPA<br/>Nginx]
        CDN --> SPA
        API --> Redis[Redis<br/>Memorystore]
        API --> Obs[Observability<br/>Prometheus / Grafana]
    end

    subgraph Data["Managed Services"]
        Mongo[(MongoDB Atlas<br/>Replica Set · Private Endpoint)]
        Secrets[Secret Manager<br/>App & DB credentials]
        Registry[Artifact Registry<br/>Container images]
    end

    API --> Mongo
    API --> Secrets
    GKE --> Registry
```

## 8. Summary of Recommendations

| Area | Recommendation |
|---|---|
| Cloud | GCP with 3 projects (prod, staging, shared) |
| Compute | GKE Autopilot, private nodes, HPA |
| Deployments | Blue-green via Argo Rollouts: zero downtime, instant rollback |
| Database | MongoDB Atlas on GCP with multi-AZ, automated backups |
| CI/CD | GitHub/Gitea Actions + ArgoCD |
| Security | Private VPC, TLS everywhere, Secret Manager, least privilege |
| Cost | ~$175-245/month early stage; spot pods and CUDs as traffic grows |

See `architecture-hld.md` for the standalone HLD diagram.