diff --git a/docs/architecture-design-company-inc.md b/docs/architecture-design-company-inc.md
index 17596d6..1eeeea4 100644
--- a/docs/architecture-design-company-inc.md
+++ b/docs/architecture-design-company-inc.md
@@ -10,7 +10,7 @@
This document outlines a robust, scalable, secure, and cost-effective infrastructure design for Company Inc., a startup deploying a web application with a Python/Flask REST API backend, React SPA frontend, and MongoDB database. The design leverages **Google Cloud Platform (GCP)** with **GKE (Google Kubernetes Engine)** as the primary compute platform.
-**Key Design Principles:** Security-by-default, scalability from day one, cost optimization for early stage, and GitOps-based operations.
+**Key Design Principles:** Cost awareness from day one, security-by-default, scalability when needed, and GitOps-based operations.
---
@@ -20,20 +20,26 @@ This document outlines a robust, scalable, secure, and cost-effective infrastruc
**Rationale:** GCP offers a strong managed Kubernetes service (GKE) with an Autopilot mode, good MongoDB Atlas integration (Atlas clusters can be deployed in GCP regions), competitive pricing for startups, and simplified networking. GKE Autopilot reduces operational overhead for a small team with limited Kubernetes expertise.
-### 2.2 Multi-Project Structure
+### 2.2 Project Structure (Cost-Optimised)
+
+For a startup, fewer projects mean lower overhead and simpler billing. Start with **3 projects** and add more only when traffic or compliance demands it.
| Project | Purpose | Isolation |
|---------|---------|-----------|
| **company-inc-prod** | Production workloads | High; sensitive data |
-| **company-inc-staging** | Staging / pre-production | Medium |
-| **company-inc-shared** | CI/CD, shared tooling, DNS | Low; no PII |
-| **company-inc-sandbox** | Dev experimentation | Lowest |
+| **company-inc-staging** | Staging, QA, and dev experimentation | Medium |
+| **company-inc-shared** | CI/CD, Artifact Registry, DNS | Low; no PII |
+
+**Why not 4+ projects?**
+- A dedicated sandbox project adds billing, IAM, and networking overhead with little benefit at startup scale.
+- Developers can use Kubernetes namespaces within the staging cluster for experimentation.
+- A fourth project can be introduced later when team size or compliance (SOC2, HIPAA) requires it.
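
As a minimal sketch, per-developer namespaces in the staging cluster could be generated as shown below. It assumes a hypothetical `dev-<name>` naming convention and a per-namespace `ResourceQuota`, so that experiments cannot starve the QA workloads that share the cluster. The label, quota name, and limits are illustrative values, not part of this design. The output is a Kubernetes `v1` `List` serialised as JSON, which `kubectl apply -f` accepts.

```python
import json
import re

# Kubernetes namespace names must be DNS-1123 labels: lowercase alphanumerics
# and '-', starting and ending with an alphanumeric, at most 63 characters.
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def dev_namespace_manifests(developer: str, cpu: str = "2", memory: str = "4Gi",
                            max_pods: int = 10) -> dict:
    """Build a Namespace plus ResourceQuota for one developer (hypothetical convention)."""
    name = f"dev-{developer.strip().lower()}"
    if len(name) > 63 or not _DNS1123_LABEL.fullmatch(name):
        raise ValueError(f"invalid namespace name: {name!r}")

    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {"purpose": "dev-experiment"}},
    }
    # Caps the total requests in the namespace so one experiment cannot
    # consume the capacity QA relies on.
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "dev-quota", "namespace": name},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(max_pods),
            }
        },
    }
    # A v1 List lets both objects be applied with a single `kubectl apply -f`.
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, quota]}


if __name__ == "__main__":
    print(json.dumps(dev_namespace_manifests("alice"), indent=2))
```

Deleting the namespace removes everything inside it, which keeps clean-up of abandoned experiments to one command.
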
**Benefits:**
-- Billing separation per environment
+- Billing separation (prod costs are clearly visible)
- Blast-radius containment (a broken staging deploy cannot take down prod, and vice versa)
-- IAM and network isolation
-- Aligns with GCP best practices for multi-tenant or multi-env setups
+- IAM isolation between environments
+- Lower baseline cost and admin overhead: only 3 sets of IAM policies, billing views, and shared resources to manage
---
@@ -96,14 +102,36 @@ flowchart TD
- **Frontend (React):** Static assets served via CDN or container; 1–2 replicas
- **Ingress:** GKE Ingress for HTTP(S) routing; consider GKE Gateway API for advanced use
-### 4.4 Containerisation and CI/CD
+### 4.4 Blue-Green Deployment
+
+Zero-downtime releases without a second cluster or environment. Both versions run side by side in the **same GKE cluster**, and switching the active Service's selector moves all traffic to the new version in one step.
+
+```mermaid
+flowchart LR
+ LB[Load Balancer]
+ LB -->|100% traffic| Green[Green — v1.2.0<br/>current stable]
+ LB -.->|0% traffic| Blue[Blue — v1.3.0<br/>new release]
+ Blue -.->|smoke tests pass| LB
+```
+
+| Phase | Action |
+|-------|--------|
+| **Deploy** | New version deployed to the idle slot (blue) |
+| **Test** | Run smoke tests / synthetic checks against blue |
+| **Switch** | Update Service selector or Ingress to point to blue |
+| **Rollback** | Instant — revert selector back to green (old version still running) |
+| **Cleanup** | Scale down old slot after confirmation period |
+
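+**Cost impact:** Low, but not zero. Both slots share one cluster, so no second environment is needed. However, GKE Autopilot bills per pod resource request, so while both versions run, the API's pod cost roughly doubles unless `previewReplicaCount` is set lower. Keep that overlap short with `scaleDownDelaySeconds`, which controls how long the old slot stays up after promotion. Argo Rollouts is a separate controller: ArgoCD syncs its `Rollout` resources from Git like any other manifest, and the Rollouts controller drives the deploy, test, switch, and cleanup phases.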
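
The switch and rollback phases come down to changing one label in a Service selector. The sketch below models that with plain dictionaries. It assumes pods carry a hypothetical `slot: blue|green` label. Argo Rollouts reaches the same effect differently: it injects a `rollouts-pod-template-hash` value into the active Service's selector. So this only illustrates the mechanism and is not how the controller is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SLOTS = ("blue", "green")


@dataclass
class BlueGreenService:
    """Toy model of a Service whose selector points at one slot (hypothetical `slot` label)."""

    app: str
    active: str = "green"
    history: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.active not in SLOTS:
            raise ValueError(f"unknown slot: {self.active!r}")

    def selector(self) -> Dict[str, str]:
        # The Service sends traffic to every pod whose labels match this selector.
        return {"app": self.app, "slot": self.active}

    def idle(self) -> str:
        return "blue" if self.active == "green" else "green"

    def switch(self) -> None:
        """Promote: point the selector at the idle slot and remember the old one."""
        self.history.append(self.active)
        self.active = self.idle()

    def rollback(self) -> None:
        """Revert to the previous slot; instant only while its pods are still running."""
        if not self.history:
            raise RuntimeError("nothing to roll back to")
        self.active = self.history.pop()


def routes_to(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    # Label-selector semantics: every key/value in the selector must be on the pod.
    return all(pod_labels.get(key) == value for key, value in selector.items())
```

Because the old pods are never touched by `switch()`, rollback is just another selector change. That is why the Cleanup phase should wait until the new version has proven itself.
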
+
+### 4.5 Containerisation and CI/CD
| Aspect | Approach |
|-------|----------|
| **Image build** | Dockerfile per service; multi-stage builds; non-root user |
-| **Registry** | Artifact Registry (GCR) in `company-inc-shared` |
-| **CI** | GitHub Actions (or GitLab CI) — build, test, security scan |
-| **CD** | ArgoCD or Flux — GitOps; app of apps pattern |
+| **Registry** | Artifact Registry in `company-inc-shared` |
+| **CI** | GitHub/Gitea Actions — build, test, security scan |
+| **CD** | ArgoCD + Argo Rollouts — GitOps with blue-green strategy |
| **Secrets** | External Secrets Operator + GCP Secret Manager |
---
@@ -138,7 +166,29 @@ flowchart TD
---
-## 6. High-Level Architecture Diagram
+## 6. Cost Optimisation Strategy
+
+| Lever | Approach | Estimated Savings |
+|-------|----------|-------------------|
+| **3 projects, not 4** | Drop sandbox; use staging namespaces | One fewer environment's baseline resources (25% fewer projects) |
+| **GKE Autopilot** | Pay per pod, not per node; no idle nodes | 30–60% vs standard GKE |
+| **Blue-green in-cluster** | No duplicate environments for releases | Near-zero deployment cost |
+| **Spot Pods** | Use for staging and non-critical, interruption-tolerant workloads | Up to 60–80% off compute |
+| **Committed use discounts** | 1-year CUDs once baseline is established | 20–30% off sustained use |
+| **CDN for frontend** | Offload SPA traffic from GKE | Fewer pod replicas needed |
+| **MongoDB Atlas auto-scale** | Start M10; scale up only when needed | Avoid over-provisioning |
+| **Shared Cloud NAT** | One Cloud NAT gateway on a Shared VPC hosted in the shared project (a NAT gateway only serves its own VPC network) | Avoid per-project NAT cost |
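
On GKE Autopilot, a workload requests Spot Pods by selecting the `cloud.google.com/gke-spot: "true"` node label. A minimal sketch follows that adds that selector only for staging. It assumes manifests are handled as Python dictionaries before being written to the GitOps repo; the function name and environment names are illustrative. Spot capacity can be reclaimed at any time, so this suits stateless replicas only.

```python
import copy

# Node label that Autopilot uses to place pods on Spot capacity.
SPOT_NODE_SELECTOR = {"cloud.google.com/gke-spot": "true"}


def with_spot_for_env(deployment: dict, environment: str) -> dict:
    """Return a copy of a Deployment that requests Spot Pods when environment is 'staging'."""
    result = copy.deepcopy(deployment)  # never mutate the caller's manifest
    if environment != "staging":
        return result  # prod keeps standard (non-preemptible) capacity
    pod_spec = (
        result.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec.setdefault("nodeSelector", {}).update(SPOT_NODE_SELECTOR)
    return result
```

Keeping the decision in one function means prod manifests can never pick up the Spot selector by accident.
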
+
+**Monthly cost estimate (early stage):**
+- GKE Autopilot (2–3 API pods + 1 SPA): ~$80–150
+- MongoDB Atlas M10: ~$60
+- Load Balancer + Cloud NAT: ~$30
+- Artifact Registry + Secret Manager: ~$5
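+- **Total: ~$175–245/month**
+
+These figures cover one environment (production) and exclude staging and network egress. A second GKE cluster (staging) also incurs the cluster management fee of $0.10/hour (about $73/month), because the GKE free-tier credit covers one Autopilot or zonal cluster per billing account.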
+
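The total is the sum of the low ends and the sum of the high ends of the line items. The sketch below reproduces it and shows the effect of a committed use discount on the GKE line alone. The 20% rate is an assumption taken from the low end of the 20–30% range in the levers table, applied only to GKE here as an illustration.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]

# Monthly USD line items as (low, high), taken from the estimate above.
LINE_ITEMS: Dict[str, Range] = {
    "GKE Autopilot (2-3 API pods + 1 SPA)": (80, 150),
    "MongoDB Atlas M10": (60, 60),
    "Load Balancer + Cloud NAT": (30, 30),
    "Artifact Registry + Secret Manager": (5, 5),
}


def monthly_range(items: Dict[str, Range]) -> Range:
    """Sum the low and high ends of every line item separately."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


def apply_discount(items: Dict[str, Range], name: str, rate: float) -> Dict[str, Range]:
    """Return a copy with one line item reduced by `rate` (e.g. 0.20 for a 20% CUD)."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    discounted = dict(items)
    lo, hi = discounted[name]
    discounted[name] = (lo * (1 - rate), hi * (1 - rate))
    return discounted


low, high = monthly_range(LINE_ITEMS)
print(f"baseline: ${low:.0f}-{high:.0f}/month")  # baseline: $175-245/month

cud = apply_discount(LINE_ITEMS, "GKE Autopilot (2-3 API pods + 1 SPA)", 0.20)
low_cud, high_cud = monthly_range(cud)
print(f"with 20% CUD on GKE: ${low_cud:.0f}-{high_cud:.0f}/month")  # $159-215/month
```

Commitments only pay off once the baseline is stable, which is why the table defers CUDs until usage is established.
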
+---
+
+## 7. High-Level Architecture Diagram
```mermaid
flowchart TB
@@ -169,16 +219,17 @@ flowchart TB
---
-## 7. Summary of Recommendations
+## 8. Summary of Recommendations
| Area | Recommendation |
|------|----------------|
-| **Cloud** | GCP with 4 projects (prod, staging, shared, sandbox) |
+| **Cloud** | GCP with 3 projects (prod, staging, shared) |
| **Compute** | GKE Autopilot, private nodes, HPA |
+| **Deployments** | Blue-green via Argo Rollouts — zero downtime, instant rollback |
| **Database** | MongoDB Atlas on GCP with multi-AZ, automated backups |
-| **CI/CD** | GitHub Actions + ArgoCD/Flux |
+| **CI/CD** | GitHub/Gitea Actions + ArgoCD |
| **Security** | Private VPC, TLS everywhere, Secret Manager, least privilege |
-| **Cost** | Start small; use committed use discounts as usage grows |
+| **Cost** | ~$175–245/month early stage; spot pods, CUDs as traffic grows |
---
diff --git a/docs/architecture-hld.md b/docs/architecture-hld.md
index c255209..9734b79 100644
--- a/docs/architecture-hld.md
+++ b/docs/architecture-hld.md
@@ -9,22 +9,25 @@ flowchart TB
end
subgraph GCP["Google Cloud Platform"]
- subgraph Projects["Project Structure"]
+ subgraph Projects["Project Structure (3 projects)"]
Prod[company-inc-prod]
- Staging[company-inc-staging]
+ Staging[company-inc-staging<br/>QA + dev namespaces]
Shared[company-inc-shared]
- Sandbox[company-inc-sandbox]
end
subgraph Edge["Edge / Networking"]
LB[Cloud Load Balancer<br/>HTTPS · TLS termination]
CDN[Cloud CDN<br/>Static Assets]
- NAT[Cloud NAT<br/>Egress]
+ NAT[Cloud NAT<br/>Egress · shared]
end
subgraph VPC["VPC — Private Subnets"]
subgraph GKE["GKE Autopilot Cluster"]
Ingress[Ingress Controller]
+ subgraph BlueGreen["Blue-Green Deployment"]
+ Green[Green — stable<br/>receives traffic]
+ Blue[Blue — new release<br/>smoke tests]
+ end
subgraph Workloads
API[Backend — Python / Flask<br/>HPA · 2–3 replicas]
SPA[Frontend — React SPA<br/>Nginx]
@@ -44,14 +47,17 @@ flowchart TB
subgraph CICD["CI / CD"]
Git[Git Repository]
Actions[Gitea / GitHub Actions<br/>Build · Test · Scan]
- Argo[ArgoCD / Flux<br/>GitOps Deploy]
+ Argo[ArgoCD + Argo Rollouts<br/>GitOps · Blue-Green]
end
Users --> LB
Users --> CDN
LB --> Ingress
CDN --> SPA
- Ingress --> API
+ Ingress -->|traffic| Green
+ Ingress -.->|after switch| Blue
+ Green --> API
+ Blue --> API
Ingress --> SPA
API --> Redis
API --> Mongo
@@ -64,6 +70,24 @@ flowchart TB
Argo --> GKE
```
+## Blue-Green Deployment Flow
+
+```mermaid
+flowchart LR
+ subgraph Cluster["GKE Cluster"]
+ LB[Load Balancer<br/>Service Selector]
+ Green[Green — v1.2.0<br/>current stable]
+ Blue[Blue — v1.3.0<br/>new release]
+ end
+
+ Deploy[ArgoCD<br/>Argo Rollouts] -->|deploy new version| Blue
+ Blue -->|smoke tests| Check{Tests pass?}
+ Check -->|yes| LB
+ LB -->|switch 100%| Blue
+ Check -->|no| Rollback[Rollback<br/>keep Green]
+ LB -.->|instant rollback| Green
+```
+
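The `Tests pass?` decision can be a plain HTTP probe against the new version before promotion. Below is a minimal standard-library sketch. It assumes the new version is reachable through a preview Service URL and exposes health paths such as `/healthz`; both are hypothetical. Argo Rollouts can run this kind of check itself through `prePromotionAnalysis` on the blue-green strategy, so the sketch only illustrates the decision logic.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET the URL and return its status code, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses raise HTTPError
        return err.code
    except (urllib.error.URLError, OSError):  # DNS failure, refused, timeout
        return 0


def gate(preview_url: str, paths: Iterable[str],
         probe: Callable[[str], int] = http_status) -> str:
    """Return 'promote' if every path answers 2xx, otherwise 'rollback' (keep green)."""
    base = preview_url.rstrip("/")
    for path in paths:
        status = probe(base + path)
        if not 200 <= status < 300:
            return "rollback"
    return "promote"
```

For example, `gate("http://api-preview", ["/healthz"])` would probe a hypothetical in-cluster preview Service. The `probe` parameter lets the gate be exercised without a network.
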
## CI / CD Pipeline
```mermaid
@@ -72,8 +96,8 @@ flowchart LR
Repo -->|webhook| CI[CI Pipeline<br/>lint · test · build]
CI -->|push image| Registry[Artifact Registry]
CI -->|update manifests| GitOps[GitOps Repo]
- GitOps -->|sync| Argo[ArgoCD / Flux]
- Argo -->|deploy| GKE[GKE Cluster]
+ GitOps -->|sync| Argo[ArgoCD]
+ Argo -->|blue-green deploy| GKE[GKE Cluster]
```
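
The `update manifests` step usually amounts to rewriting an image tag in the GitOps repo and committing it; ArgoCD then syncs the change. Here is a minimal standard-library sketch, assuming a plain YAML manifest where the image appears on an `image:` line. The registry path used below is hypothetical, and tools such as `kustomize edit set image` do the same job more robustly.

```python
import re

# OCI/Docker tag grammar: starts with [A-Za-z0-9_], then up to 127 of [A-Za-z0-9_.-].
_TAG = r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}"


def bump_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Replace the tag of `image` on every `image:` line; other images are untouched."""
    if not re.fullmatch(_TAG, new_tag):
        raise ValueError(f"invalid tag: {new_tag!r}")
    pattern = re.compile(
        r"^(?P<prefix>\s*(?:-\s*)?image:\s*[\"']?)"  # indent, optional list dash, quote
        + re.escape(image)
        + ":" + _TAG,  # the image must be followed directly by ':<tag>'
        re.MULTILINE,
    )
    updated, count = pattern.subn(lambda m: f"{m.group('prefix')}{image}:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"image {image!r} not found in manifest")
    return updated
```

Failing loudly when the image is missing stops the pipeline from committing a no-op change that ArgoCD would silently ignore. Digest-pinned references (`@sha256:...`) are not handled by this sketch.
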
## Network Security Layers
@@ -86,3 +110,13 @@ flowchart TD
NP --> Pods[Application Pods<br/>Private IPs only]
Pods --> PE[Private Endpoint<br/>MongoDB Atlas]
```
+
+## Cost Profile (Early Stage)
+
+```mermaid
+pie title Monthly Cost Breakdown (~$215)
+ "GKE Autopilot" : 120
+ "MongoDB Atlas M10" : 60
+ "LB + NAT" : 30
+ "Registry + Secrets" : 5
+```