# Architectural Design Document: Company Inc.

**Cloud Infrastructure for Web Application Deployment**
**Version:** 1.0
**Date:** February 2026

---

## 1. Executive Summary

This document describes a scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, a React SPA frontend, and a MongoDB database. The design uses **Google Cloud Platform (GCP)** with **GKE (Google Kubernetes Engine)** as the primary compute platform.

**Key Design Principles:** Security-by-default, scalability from day one, cost optimization for early stage, and GitOps-based operations.

---

## 2. Cloud Provider and Environment Structure

### 2.1 Provider Choice: GCP

**Rationale:** GCP offers managed Kubernetes (GKE) with an Autopilot mode, MongoDB Atlas availability in GCP regions with private connectivity (or Firestore with MongoDB compatibility as a GCP-native alternative), competitive pricing for startups, and a global VPC model that simplifies networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.

### 2.2 Multi-Project Structure

| Project | Purpose | Isolation |
|---------|---------|-----------|
| **company-inc-prod** | Production workloads | High; sensitive data |
| **company-inc-staging** | Staging / pre-production | Medium |
| **company-inc-shared** | CI/CD, shared tooling, DNS | Low; no PII |
| **company-inc-sandbox** | Dev experimentation | Lowest |

**Benefits:**
- Billing separation per environment
- Blast-radius containment (a misconfiguration or incident in staging or sandbox cannot reach prod, and vice versa)
- IAM and network isolation
- Aligns with GCP best practices for multi-tenant or multi-env setups
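
As a minimal sketch, tooling such as deploy scripts could resolve an environment name to its project through a single lookup, so that no script hard-codes a project ID. The project IDs and purposes come from the table above; the `Project` class and the `project_for` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    project_id: str
    purpose: str
    isolation: str


# Values copied from the multi-project table; the dictionary keys are an assumption.
PROJECTS = {
    "prod": Project("company-inc-prod", "Production workloads", "High"),
    "staging": Project("company-inc-staging", "Staging / pre-production", "Medium"),
    "shared": Project("company-inc-shared", "CI/CD, shared tooling, DNS", "Low"),
    "sandbox": Project("company-inc-sandbox", "Dev experimentation", "Lowest"),
}


def project_for(environment: str) -> Project:
    """Return the project for an environment, failing loudly on typos."""
    try:
        return PROJECTS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None


if __name__ == "__main__":
    print(project_for("staging").project_id)
```

Failing on an unknown environment, rather than falling back to a default, keeps a mistyped name from silently targeting the wrong project.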

---

## 3. Network Design

### 3.1 VPC Architecture

- **One VPC per project** (or Shared VPC from `company-inc-shared` for centralised control)
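- **Regional subnets**: a GCP subnet spans every zone in its region, so workloads can spread across at least 2 zones for HA
- **Private nodes** for workloads (internal IPs only; no public IPs on nodes)
- **Public exposure** only through the external load balancer; outbound traffic leaves through Cloud NAT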
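
VPC-native GKE clusters take a primary subnet range for nodes plus two secondary ranges for pods and Services, so it helps to plan non-overlapping ranges up front. The sketch below carves a VPC block into those three ranges with Python's `ipaddress` module. The `10.10.0.0/16` block, the range sizes, and the `plan_ranges` helper are assumptions chosen for illustration, not values from this design.

```python
import ipaddress


def plan_ranges(vpc_cidr: str) -> dict:
    """Split a VPC block (assumed to be a /16 or larger) into GKE ranges."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # Four equal quarters of the block; for a /16 each quarter is a /18.
    quarters = list(vpc.subnets(prefixlen_diff=2))
    return {
        # Primary range: node IPs. First /24 of the first quarter.
        "nodes": next(quarters[0].subnets(new_prefix=24)),
        # Secondary range: Service (ClusterIP) addresses.
        "services": next(quarters[1].subnets(new_prefix=22)),
        # Secondary range: pod IPs, which need the most space.
        "pods": quarters[2],
    }


if __name__ == "__main__":
    for name, network in plan_ranges("10.10.0.0/16").items():
        print(f"{name:9s} {network}  ({network.num_addresses} addresses)")
```

Repeating the exercise per project with distinct VPC blocks keeps ranges from colliding if the VPCs are ever peered with each other or with the database provider.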

### 3.2 Security Layers

| Layer | Controls |
|-------|----------|
| **VPC Firewall** | Default deny; allow only required CIDRs and ports |
| **GKE node pools** | Private nodes; no public IPs |
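| **Pod-level segmentation** | Kubernetes Network Policies (GCP has no security groups; firewall rules and Network Policies fill that role) plus GKE-native security controls |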
| **Ingress** | HTTPS only; TLS termination at load balancer |
| **Egress** | Cloud NAT for outbound; restrict to necessary destinations |
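
To illustrate the default-deny posture at the pod level, the sketch below builds two Kubernetes `NetworkPolicy` manifests as plain dictionaries: one that denies all traffic in a namespace, and one that re-opens ingress to the API pods on a single port. The namespace, the `app: api` label, and port `8080` are assumptions for illustration; `kubectl apply -f` accepts the JSON output as well as YAML.

```python
import json


def default_deny(namespace: str) -> dict:
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # an empty selector matches every pod
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_api_ingress(namespace: str, port: int = 8080) -> dict:
    """Allow inbound TCP traffic to pods labelled app=api on one port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-api-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": "api"}},
            "policyTypes": ["Ingress"],
            # No "from" clause: any source may connect, but only on this port.
            "ingress": [{"ports": [{"protocol": "TCP", "port": port}]}],
        },
    }


if __name__ == "__main__":
    for policy in (default_deny("app"), allow_api_ingress("app")):
        print(json.dumps(policy, indent=2))
```

Note that denying all egress also blocks DNS lookups, so a real policy set would need a further rule allowing egress to the cluster DNS service and to the database endpoint.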

### 3.3 Network Topology (High-Level)

```mermaid
flowchart TD
    Internet((Internet))
    Internet --> LB[Cloud Load Balancer<br/>HTTPS termination]
    LB --> Ingress[GKE Ingress Controller]

    subgraph VPC["VPC — Private Subnets"]
        Ingress --> API[API Pods<br/>Python / Flask]
        Ingress --> SPA[Frontend Pods<br/>React SPA]
        API --> DB[(MongoDB<br/>Private Endpoint)]
    end
```

---

## 4. Compute Platform: GKE

### 4.1 Cluster Strategy

- **GKE Autopilot** for production and staging to minimise node management
- **Single regional cluster** per environment initially; consider multi-region as scale demands
- **Private cluster** with no public endpoint; administrative access via IAP or a bastion host when needed

### 4.2 Node Configuration

| Setting | Initial | Growth Phase |
|---------|---------|--------------|
| **Node type** | Autopilot (no manual sizing) | Same |
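| **Min nodes** | 0 (planning figure; Autopilot provisions nodes from pod requests) | 2 |
| **Max nodes** | 5 (planning figure; bounded in practice by quota and HPA `maxReplicas`) | 50+ |
| **Scaling** | Pod-based (HPA); Autopilot adds and removes nodes automatically | Same |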

### 4.3 Workload Layout

- **Backend (Python/Flask):** Deployment with HPA (CPU/memory); target 2–3 replicas initially
- **Frontend (React):** Static assets served via CDN or container; 1–2 replicas
- **Ingress:** GKE Ingress for HTTP(S) routing; consider GKE Gateway API for advanced use
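
To make the HPA behaviour concrete, the sketch below applies the scaling rule from the Kubernetes documentation, `desired = ceil(current * currentMetric / targetMetric)`, clamped to the configured bounds. The 60% CPU target and the maximum of 10 replicas are assumptions for illustration; the minimum of 2 follows the initial replica target above. The real controller also applies a tolerance band and stabilisation windows, which this sketch omits.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilisation: float,
    target_utilisation: float,
    min_replicas: int = 2,
    max_replicas: int = 10,
) -> int:
    """Core HPA rule: scale in proportion to metric pressure, then clamp."""
    raw = math.ceil(current_replicas * current_utilisation / target_utilisation)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 2 replicas at 90% CPU against a 60% target: scale out to 3.
    print(desired_replicas(2, 90, 60))
    # 3 replicas at 30% CPU: the raw answer is 2, which is also the floor.
    print(desired_replicas(3, 30, 60))
```

Because the result depends on utilisation relative to each pod's CPU request, accurate resource requests on the Flask Deployment matter as much as the target percentage.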

### 4.4 Containerisation and CI/CD

| Aspect | Approach |
|-------|----------|
| **Image build** | Dockerfile per service; multi-stage builds; non-root user |
| **Registry** | Artifact Registry (the successor to Container Registry / GCR) in `company-inc-shared` |
| **CI** | GitHub Actions (or GitLab CI) — build, test, security scan |
| **CD** | ArgoCD or Flux — GitOps; app of apps pattern |
| **Secrets** | External Secrets Operator + GCP Secret Manager |
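
On the application side, one common arrangement is for External Secrets Operator to sync Secret Manager entries into a Kubernetes Secret, which the Deployment then exposes to the container as environment variables. The sketch below shows how the Flask service might load and validate those values at startup. The variable names `MONGODB_URI` and `FLASK_SECRET_KEY` and the `Settings` class are hypothetical.

```python
import os
from dataclasses import dataclass

# Hypothetical names; they must match the keys of the synced Kubernetes Secret.
REQUIRED = ("MONGODB_URI", "FLASK_SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    flask_secret_key: str


def load_settings(env=None) -> Settings:
    """Read secrets from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Report names only; never log secret values.
        raise RuntimeError("missing required secrets: " + ", ".join(missing))
    return Settings(
        mongodb_uri=env["MONGODB_URI"],
        flask_secret_key=env["FLASK_SECRET_KEY"],
    )
```

Failing at startup lets a missing secret surface as a failed rollout rather than as a runtime error on the first database call.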

---

## 5. Database: MongoDB

### 5.1 Service Choice

**MongoDB Atlas** (or **Firestore with MongoDB compatibility** if the stack must stay strictly GCP-only) is recommended for:
- Fully managed, automated backups
- Multi-region replication
- Strong security (encryption at rest, VPC peering)
- Easy scaling

**Atlas on GCP** supports VPC peering and Private Service Connect endpoints, so database traffic stays off the public internet.

### 5.2 High Availability and DR

| Topic | Strategy |
|-------|----------|
| **Replicas** | 3-node replica set; members spread across zones (multi-zone) |
| **Backups** | Continuous backup; point-in-time recovery |
| **Disaster recovery** | Cross-region replica (e.g. `us-central1` + `europe-west1`) |
| **Restore testing** | Quarterly DR drills |
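
The choice of three members follows from replica set election arithmetic: a primary can only be elected by a strict majority of voting members. A short worked sketch (the function names are illustrative):

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} members: majority={majority(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Three members across three zones tolerate the loss of one zone. A fourth member adds cost without adding tolerance, which is why odd member counts are the usual choice.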

### 5.3 Security

- Private endpoint (no public IP)
- TLS for all connections
- Least-privilege access: scoped database users and roles for the API, with credentials held in Secret Manager
- Encryption at rest (default in Atlas)
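
As a sketch of how the API might assemble its connection string under these constraints, the helper below percent-encodes the credentials and sets TLS and majority write concern explicitly. The host name, user, and database are placeholders, and `build_mongodb_uri` is a hypothetical helper; `tls`, `retryWrites`, and `w` are standard MongoDB connection string options.

```python
from urllib.parse import quote_plus, urlencode


def build_mongodb_uri(host: str, username: str, password: str, database: str) -> str:
    """Build a mongodb+srv URI with escaped credentials and TLS enforced."""
    options = urlencode({
        "tls": "true",          # already the default for mongodb+srv; kept explicit
        "retryWrites": "true",
        "w": "majority",        # acknowledge writes on a majority of members
    })
    # Credentials may contain ':', '/', or '@', so they must be percent-encoded.
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"mongodb+srv://{user}:{secret}@{host}/{database}?{options}"


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only.
    print(build_mongodb_uri("cluster0.example.mongodb.net", "api", "p@ss/word", "app"))
```

In practice the finished URI would be stored in Secret Manager rather than assembled from literals in code.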

---

## 6. High-Level Architecture Diagram

```mermaid
flowchart TB
    Users((Users))

    Users --> CDN[Cloud CDN<br/>Static Assets]
    Users --> LB[Cloud Load Balancer<br/>HTTPS]

    subgraph GKE["GKE Cluster (Private)"]
        Ingress[Ingress Controller]
        API["Backend (Flask)<br/>HPA 2–3 replicas"]
        SPA["Frontend (React SPA)<br/>Nginx"]
        Obs[Observability<br/>Prometheus / Grafana]
    end

    subgraph Data["Managed Services"]
        Mongo[(MongoDB Atlas<br/>Replica Set · Private Endpoint)]
        Redis[Redis<br/>Memorystore]
        Secrets[Secret Manager<br/>App & DB credentials]
        Registry[Artifact Registry<br/>Container images]
    end

    LB --> Ingress
    Ingress --> API
    Ingress --> SPA
    CDN --> SPA
    API --> Redis
    API --> Obs
    API --> Mongo
    API --> Secrets
    GKE --> Registry
```

---

## 7. Summary of Recommendations

| Area | Recommendation |
|------|----------------|
| **Cloud** | GCP with 4 projects (prod, staging, shared, sandbox) |
| **Compute** | GKE Autopilot, private nodes, HPA |
| **Database** | MongoDB Atlas on GCP with a multi-zone replica set and automated backups |
| **CI/CD** | GitHub Actions + ArgoCD/Flux |
| **Security** | Private VPC, TLS everywhere, Secret Manager, least privilege |
| **Cost** | Start small; use committed use discounts as usage grows |

---

*See [architecture-hld.md](architecture-hld.md) for the standalone HLD diagram.*