# Architectural Design Document: Company Inc.
**Cloud Infrastructure for Web Application Deployment**
**Version:** 1.0
**Date:** February 2026
---
## 1. Executive Summary
This document outlines a robust, scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, React SPA frontend, and MongoDB database. The design leverages **Google Cloud Platform (GCP)** with **GKE (Google Kubernetes Engine)** as the primary compute platform.
**Key Design Principles:** Security-by-default, scalability from day one, cost optimization for early stage, and GitOps-based operations.
---
## 2. Cloud Provider and Environment Structure
### 2.1 Provider Choice: GCP
**Rationale:** GCP offers a mature managed Kubernetes service (GKE) with an Autopilot mode, MongoDB Atlas availability on GCP (or Firestore with MongoDB compatibility as a GCP-native alternative), competitive pricing for startups, and simplified networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.
### 2.2 Multi-Project Structure
| Project | Purpose | Isolation |
|---------|---------|-----------|
| **company-inc-prod** | Production workloads | High; sensitive data |
| **company-inc-staging** | Staging / pre-production | Medium |
| **company-inc-shared** | CI/CD, shared tooling, DNS | Low; no PII |
| **company-inc-sandbox** | Dev experimentation | Lowest |
**Benefits:**
- Billing separation per environment
- Blast-radius containment (prod issues do not affect staging)
- IAM and network isolation
- Aligns with GCP best practices for multi-tenant or multi-env setups
---
## 3. Network Design
### 3.1 VPC Architecture
- **One VPC per project** (or Shared VPC from `company-inc-shared` for centralised control)
- **Regional subnets** (a GCP subnet spans all zones in its region); run workloads in at least 2 zones for HA
- **Private subnets** for workloads (no public IPs on nodes)
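- **Public exposure** only via the external load balancer; outbound traffic via Cloud NAT (both are managed GCP services, so no separate public subnets are needed)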
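
As a rough illustration of the subnet planning above, the following sketch validates an address plan before it is applied. VPC-native GKE clusters use a primary subnet range for nodes and secondary ranges for pods and services; the concrete CIDR blocks and the names `PLAN` and `validate_plan` are assumptions made for this example, not values taken from this design.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan; every CIDR below is an illustrative assumption.
PLAN = {
    "nodes": "10.10.0.0/24",     # primary subnet range (private nodes)
    "pods": "10.10.16.0/20",     # secondary range for pod IPs
    "services": "10.10.32.0/24", # secondary range for ClusterIP services
}


def validate_plan(vpc_block, plan):
    """Return parsed networks, or raise ValueError if the plan is inconsistent."""
    block = ipaddress.ip_network(vpc_block)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    for name, net in nets.items():
        # Every range must sit inside the block reserved for this VPC.
        if not net.subnet_of(block):
            raise ValueError(f"{name} ({net}) is outside {block}")
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        # Ranges must not overlap each other.
        if net_a.overlaps(net_b):
            raise ValueError(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return nets


if __name__ == "__main__":
    for name, net in validate_plan("10.10.0.0/16", PLAN).items():
        print(f"{name:9} {net} ({net.num_addresses} addresses)")
```

A check like this could run in CI before any Terraform or `gcloud` change, so overlapping ranges are caught before they reach a shared network.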
### 3.2 Security Layers
| Layer | Controls |
|-------|----------|
| **VPC Firewall** | Default deny; allow only required CIDRs and ports |
| **GKE node pools** | Private nodes; no public IPs |
| **Pod-level isolation** (security-group equivalent) | Kubernetes Network Policies plus GKE-native security features |
| **Ingress** | HTTPS only; TLS termination at load balancer |
| **Egress** | Cloud NAT for outbound; restrict to necessary destinations |
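
The "default deny, then allow what is required" approach from the table can be sketched at the pod level with Kubernetes NetworkPolicy objects. The example below builds two manifests as plain Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts. The namespace names (`prod`, `ingress`), the `app: api` label, and port 8080 are hypothetical, and the allow rule assumes an in-cluster ingress controller; with a managed load balancer the allowed sources would be IP ranges instead.

```python
import json


def default_deny(namespace):
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        # An empty podSelector selects every pod in the namespace.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_from_namespace(namespace, app, source_namespace, port):
    """Allow TCP traffic to pods labelled app=<app> from one source namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{source_namespace}-to-{app}",
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{
                    # Label that Kubernetes sets automatically on each namespace.
                    "namespaceSelector": {
                        "matchLabels": {"kubernetes.io/metadata.name": source_namespace}
                    }
                }],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    policies = [
        default_deny("prod"),
        allow_from_namespace("prod", "api", "ingress", 8080),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

Note that a default-deny egress policy also blocks DNS lookups, so a real rollout would need explicit egress rules for DNS and for the database endpoint.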
### 3.3 Network Topology (High-Level)
```
Internet
|
v
[Cloud Load Balancer] (HTTPS)
|
v
[GKE Ingress Controller]
|
v
[VPC Private Subnets]
|
+-- [GKE Cluster - API Pods]
+-- [GKE Cluster - Frontend Pods]
|
v
[Private connectivity to MongoDB]
```
---
## 4. Compute Platform: GKE
### 4.1 Cluster Strategy
- **GKE Autopilot** for production and staging to minimise node management
- **Single regional cluster** per environment initially; consider multi-region as scale demands
- **Private cluster** with no public control-plane endpoint; administrative access via IAP or a bastion host if needed
### 4.2 Node Configuration
| Setting | Initial | Growth Phase |
|---------|---------|--------------|
| **Node type** | Autopilot (no manual sizing) | Same |
| **Min nodes** | 0 (scale to zero when idle) | 2 |
| **Max nodes** | 5 | 50+ |
| **Scaling** | Pod-based (HPA, cluster autoscaler) | Same |
### 4.3 Workload Layout
- **Backend (Python/Flask):** Deployment with HPA (CPU/memory); target 2-3 replicas initially
- **Frontend (React):** Static assets served via CDN or container; 1-2 replicas
- **Ingress:** GKE Ingress for HTTP(S) routing; consider GKE Gateway API for advanced use
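
To make the HPA behaviour concrete, the sketch below reproduces the core scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of observed to target metric, rounded up and clamped to the configured bounds. The bounds and utilisation figures are illustrative assumptions, and the real controller additionally applies a tolerance band and stabilisation windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=10):
    """Core HPA rule: ceil(current * observed / target), clamped to the bounds.

    min_replicas and max_replicas are illustrative defaults, not values
    prescribed by this design.
    """
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 2 replicas at 90% CPU against a 60% target scale out to 3.
    print(desired_replicas(2, 90, 60))
    # Low load never drops below the configured minimum.
    print(desired_replicas(3, 20, 60))
```

Because the rule is proportional, a lower utilisation target leaves more headroom per pod at the cost of running more replicas.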
### 4.4 Containerisation and CI/CD
| Aspect | Approach |
|-------|----------|
| **Image build** | Dockerfile per service; multi-stage builds; non-root user |
| **Registry** | Artifact Registry (successor to Container Registry, GCR) in `company-inc-shared` |
| **CI** | GitHub Actions (or GitLab CI) — build, test, security scan |
| **CD** | Argo CD or Flux for GitOps; app-of-apps pattern if Argo CD is chosen |
| **Secrets** | External Secrets Operator + GCP Secret Manager |
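
If Argo CD is the chosen CD tool, the app-of-apps pattern mentioned in the table can be sketched as one root `Application` that points at a directory of child `Application` manifests. The example below generates such manifests as Python dictionaries. The repository URL, the paths, and the application names are hypothetical; the field layout follows the `argoproj.io/v1alpha1` Application resource.

```python
import json

# Hypothetical Git repository holding the deployment manifests.
REPO_URL = "https://example.com/company-inc/deploy.git"


def application(name, path, namespace, repo_url=REPO_URL, revision="main"):
    """Build an Argo CD Application manifest that syncs `path` into `namespace`."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            # In-cluster API server address used by Argo CD for the local cluster.
            "destination": {"server": "https://kubernetes.default.svc", "namespace": namespace},
            # Automated sync: remove deleted resources and revert manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# The root application watches the directory that holds the child manifests.
root = application("root", "apps", "argocd")

# Child applications, one per service; these would be committed under apps/.
children = [
    application("api", "services/api", "prod"),
    application("frontend", "services/frontend", "prod"),
]

if __name__ == "__main__":
    print(json.dumps(root, indent=2))
```

Only the root application is applied by hand; every later change, including adding a new service, then goes through a Git commit.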
---
## 5. Database: MongoDB
### 5.1 Service Choice
**MongoDB Atlas** (or **Firestore with MongoDB compatibility** if a strictly GCP-only stack is required) is recommended for:
- Fully managed, automated backups
- Multi-region replication
- Strong security (encryption at rest, VPC peering)
- Easy scaling
**Atlas on GCP** provides native VPC peering and private connectivity.
### 5.2 High Availability and DR
| Topic | Strategy |
|-------|----------|
| **Replicas** | 3-node replica set; members spread across zones |
| **Backups** | Continuous backup; point-in-time recovery |
| **Disaster recovery** | Cross-region replica (e.g. `us-central1` + `europe-west1`) |
| **Restore testing** | Quarterly DR drills |
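
The choice of a 3-node replica set follows from how MongoDB elects a primary: a majority of voting members must be reachable. The short sketch below works through that arithmetic; the function names are chosen for this example only.

```python
def majority(voting_members):
    """Votes needed to elect a primary in a replica set."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} members: majority {majority(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Three members tolerate the loss of one, and a fourth member adds cost without adding tolerance, which is why odd member counts are the usual choice.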
### 5.3 Security
- Private endpoint (no public IP)
- TLS for all connections
- IAM-based access; principle of least privilege
- Encryption at rest (default in Atlas)
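
On the application side, these controls mostly come down to how the Flask service builds its connection string. The sketch below assembles a `mongodb+srv` URI with TLS enabled from values that would be injected as environment variables (for example via External Secrets Operator). The variable names, the host name, and the default database name are hypothetical, and only the standard library is used, so the resulting string would still need to be passed to a MongoDB driver.

```python
import os
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(user, password, host, database):
    """Build a TLS-enabled MongoDB URI with percent-encoded credentials."""
    options = urlencode({"tls": "true", "retryWrites": "true", "w": "majority"})
    # Credentials must be percent-encoded so characters like '@' or '/' are safe.
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?{options}"
    )


def uri_from_env(env=os.environ):
    """Read connection settings from (hypothetical) environment variable names."""
    return build_mongo_uri(
        env["MONGODB_USER"],
        env["MONGODB_PASSWORD"],
        env["MONGODB_HOST"],
        env.get("MONGODB_DATABASE", "app"),
    )


if __name__ == "__main__":
    demo_env = {
        "MONGODB_USER": "api",
        "MONGODB_PASSWORD": "p@ss/word",
        "MONGODB_HOST": "cluster0.example.mongodb.net",
    }
    print(uri_from_env(demo_env))
```

Keeping the credentials out of the image and the repository means rotating a password only requires updating the secret and restarting the pods.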
---
## 6. High-Level Architecture Diagram
The following diagram illustrates the main components (implement in draw.io or Lucidchart):
```
+------------------------------------------------------------------+
| COMPANY INC. INFRASTRUCTURE |
+------------------------------------------------------------------+
[Users]
|
v
+-------------------+ +-------------------+
| Cloud CDN | | Cloud LB (HTTPS) |
| (Static Assets) | | (API + SPA) |
+-------------------+ +-------------------+
| |
v v
+------------------------------------------------------------------+
| GKE CLUSTER (Private) |
| +------------------+ +------------------+ +-----------------+ |
| | Ingress | | Backend (Flask) | | Frontend (SPA) | |
| | Controller | | - HPA | | - Nginx/React | |
| +------------------+ +------------------+ +-----------------+ |
| | | | |
| +-----------------------+-----------------------+ |
| | |
| +------------------+ +----------------------+                    |
| | Redis (cache)    | | Observability        |                    |
| | (Memorystore)    | | (Prometheus/Grafana) |                    |
| +------------------+ +----------------------+                    |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| MongoDB Atlas (GCP) | Secret Manager | Artifact Registry |
| - Replica Set | - App secrets | - Container images |
| - Private endpoint | - DB credentials| |
+------------------------------------------------------------------+
```
---
## 7. Summary of Recommendations
| Area | Recommendation |
|------|----------------|
| **Cloud** | GCP with 4 projects (prod, staging, shared, sandbox) |
| **Compute** | GKE Autopilot, private nodes, HPA |
| **Database** | MongoDB Atlas on GCP with a multi-zone replica set, automated backups |
| **CI/CD** | GitHub Actions + ArgoCD/Flux |
| **Security** | Private VPC, TLS everywhere, Secret Manager, least privilege |
| **Cost** | Start small; use committed use discounts as usage grows |
---
*This document should be accompanied by an HLD diagram (draw.io or Lucidchart) reflecting the architecture above.*