# 🚀 docker-ecs-deployment

Spin up a demo app on **AWS ECS Fargate**, without an ALB, that has **zero compute cost at idle**.
Traffic goes to a **public task IP**, the service **auto-sleeps to 0**, and a small **“wake”** Lambda behind **API Gateway** starts it on demand. Domain: **https://ecs-demo.online**.

---

## 📦 What you get

- **Node.js demo app** (Express) with a slick UI (dark/light theme), live logs (SSE), and simple actions.
- An **ECR** repository to store your images.
- A **VPC** with two public subnets, a **security group**, an **ECS cluster**, and a **Fargate service**.
- **Wake API**: API Gateway → Lambda (Python) that scales the service to **1** and redirects to the task's public IP.
- **Auto-sleep**: EventBridge rule → Lambda (Python) that scales the service to **0** after inactivity.
- **GitHub Actions** (3 workflows):
  - **CI**: Build & push to ECR.
  - **CD**: Terraform apply / destroy and roll the service to a new image.
  - **OPS**: Wake or sleep the service on demand.

> ✅ **Minimal setup**: All Terraform lives in `infra/main.tf` (no split files).

---

## 🧭 Repository structure

```text
.
├── app/
│   ├── Dockerfile
│   └── src/
│       └── server.js
├── autosleep/
│   └── auto_sleep.py        # Lambda: auto-stop service after N minutes
├── wake/
│   └── lambda_function.py   # Lambda: scale-to-1 + redirect to task IP
├── infra/
│   └── main.tf              # All Terraform in a single file
├── .github/workflows/
│   ├── ci.yml               # CI: Build & Push to ECR
│   ├── cd.yml               # CD: Terraform Apply + Deploy/Destroy (ECS)
│   └── ops.yml              # OPS: Wake/Sleep ECS Service helpers
└── make_zips.sh             # Creates Lambda bundles: infra/wake.zip & infra/sleep.zip
```

> Keeping only **`infra/main.tf`** is fine: the repo is designed to work with a single Terraform file.

---

## 🏗️ Architecture (high-level)

```mermaid
flowchart LR
  subgraph GH[GitHub]
    CI[CI • Build & Push to ECR<br/>ci.yml]
    CD[CD • Terraform Apply & Deploy<br/>cd.yml]
    OPS[OPS • Wake / Sleep helpers<br/>ops.yml]
  end

  CI --> ECR[(ECR repo)]
  CD --> TF[(Terraform)]
  TF --> VPC[(VPC + Subnets + SG)]
  TF --> ECS[ECS Cluster + Fargate Service]
  TF --> CWL[CloudWatch Logs]
  TF --> LWA[Lambda • Wake]
  TF --> LAS[Lambda • Auto-sleep]
  TF --> APIGW[API Gateway HTTP API]
  TF --> EVB[EventBridge Rule]

  APIGW --> LWA
  EVB --> LAS
  LWA -->|desiredCount=1| ECS
  LAS -->|desiredCount=0| ECS

  subgraph Runtime
    ECS -->|public IP| Internet
  end
```

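The wake path in the diagram (API Gateway → Wake Lambda → `desiredCount=1`) can be sketched as follows. This is a minimal illustration, not the contents of `wake/lambda_function.py`. It assumes a boto3-style ECS client is passed in (e.g. `boto3.client("ecs")`), and the function name `wake_service` is hypothetical. The sketch scales the service up only when it is asleep, then reports whether any task has reached `RUNNING`.

```python
from typing import Any, Optional


def wake_service(ecs: Any, cluster: str, service: str) -> Optional[str]:
    """Hypothetical wake step: ensure the service wants 1 task; return a RUNNING task ARN or None.

    `ecs` is assumed to be a boto3 ECS client (or any object with the same methods).
    """
    svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
    if svc["desiredCount"] == 0:
        # Scale up only when asleep, so repeated page loads don't issue redundant updates.
        ecs.update_service(cluster=cluster, service=service, desiredCount=1)

    # desiredStatus="RUNNING" also returns tasks that are still PENDING/PROVISIONING.
    arns = ecs.list_tasks(cluster=cluster, serviceName=service, desiredStatus="RUNNING")["taskArns"]
    if not arns:
        return None  # the scheduler has not placed a task yet

    for task in ecs.describe_tasks(cluster=cluster, tasks=arns)["tasks"]:
        if task.get("lastStatus") == "RUNNING":
            return task["taskArn"]
    return None  # a task exists but is still starting
```

A handler would call this on every request. Once it returns an ARN, the handler looks up that task's public IP and redirects; until then it serves the warming-up page.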
---

## 🌐 DNS (optional)

- Purchased domain: **`ecs-demo.online`** (example).
- Apex A record (Route 53 alias) → **API Gateway custom domain** (if you attach one); otherwise use the native **API endpoint**.
- The **wake URL** returns a “warming up” page while the task starts, then **redirects** to the current task's public IP.

> For this demo, the public check URL you can share is **https://ecs-demo.online** (it fronts the wake API).

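The “warming up, then redirect” behaviour can be illustrated with a small response builder for an API Gateway Lambda proxy integration. This is a sketch under several assumptions:

- The function name `wake_response`, the 5-second refresh interval, and the page text are hypothetical.
- The app is served over plain HTTP on `app_port` at the task's public IP.

While no IP is known, the builder returns an HTML page that reloads itself, which re-invokes the wake Lambda. Once an IP is known, it returns a `302`.

```python
from typing import Optional


def wake_response(public_ip: Optional[str], port: int = 80, refresh_seconds: int = 5) -> dict:
    """Build a Lambda proxy response: redirect if the task is up, else a self-refreshing page."""
    if public_ip:
        # Omit the port from the URL when it is the HTTP default.
        host = public_ip if port == 80 else f"{public_ip}:{port}"
        return {"statusCode": 302, "headers": {"Location": f"http://{host}/"}, "body": ""}

    page = (
        "<!doctype html><html><head>"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        "<title>Warming up…</title></head>"
        "<body><p>Starting the demo task; this page reloads automatically.</p></body></html>"
    )
    return {
        "statusCode": 200,
        # no-store so browsers and proxies don't cache the interim page
        "headers": {"Content-Type": "text/html; charset=utf-8", "Cache-Control": "no-store"},
        "body": page,
    }
```

Using a meta refresh instead of holding the request open keeps each Lambda invocation short. API Gateway would otherwise cut off a long-waiting request at its integration timeout.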
---

## ⚙️ Prerequisites

- An **AWS account** and an IAM role for GitHub OIDC (see `cd.yml`).
- An **S3** bucket and a **DynamoDB** table for the Terraform backend (already referenced in `main.tf`):
  - Bucket: `docker-ecs-deployment`
  - Table: `docker-ecs-deployment` (partition key `LockID`, type String)
- **ECR** repository name (default): `ecs-demo-app`
- **Terraform** 1.6+ (locally or via GitHub Actions)
- **Docker** (to build/push images locally if needed)
- **Route 53 / Namecheap** (optional, for the domain)

---

## 🔧 First-time setup (local)

1) Create the Lambda zips:
```bash
./make_zips.sh
# → creates: infra/wake.zip and infra/sleep.zip
```

2) Initialize the Terraform backend & providers:
```bash
cd infra
terraform init -input=false
```

3) Apply the infrastructure (creates VPC, ECS, ECR, Lambdas, API GW):
```bash
terraform apply -auto-approve -input=false
```

4) Build and push the image (optional local flow; CI does the same):
```bash
# steps 2-3 left you in infra/, so return to the repo root first
cd ..

# log in to ECR
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com

# build & push
docker build -t ecs-demo-app:latest ./app
docker tag ecs-demo-app:latest <ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/ecs-demo-app:latest
docker push <ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/ecs-demo-app:latest
```

5) Open the wake URL in a browser. Once the task is running, you'll be redirected to it:
```
https://ecs-demo.online
```

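`make_zips.sh` itself isn't shown in this README. The sketch below is a rough Python equivalent of what step 1 produces: it zips each handler file at the archive root, where Lambda resolves the handler module. The source-to-zip mapping comes from the repository tree. The function name `make_zips` is hypothetical, and the sketch assumes each handler is a single file with no third-party dependencies (boto3 ships with the Lambda Python runtime).

```python
import zipfile
from pathlib import Path
from typing import Dict, List

# Bundle → source file, as listed in the repository tree.
BUNDLES: Dict[str, str] = {
    "infra/wake.zip": "wake/lambda_function.py",
    "infra/sleep.zip": "autosleep/auto_sleep.py",
}


def make_zips(repo_root: Path) -> List[Path]:
    """Write each Lambda handler into its own zip, with the .py file at the archive root."""
    created = []
    for zip_rel, src_rel in BUNDLES.items():
        src = repo_root / src_rel
        dest = repo_root / zip_rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            # arcname drops the directory, so Lambda finds the module at the zip root
            zf.write(src, arcname=src.name)
        created.append(dest)
    return created
```

Run it from the repo root with `make_zips(Path("."))`. Both zips must exist before `terraform apply`, since `main.tf` references `infra/wake.zip` and `infra/sleep.zip`.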
---

## 🤖 GitHub Actions

### CI: Build & Push to ECR (`.github/workflows/ci.yml`)
- Builds `./app` into an image and pushes it to ECR.
- Outputs the full image URI: `ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/ecs-demo-app:<tag>`.

### CD: Terraform Apply + Deploy/Destroy (ECS) (`.github/workflows/cd.yml`)
- **Apply**: runs `terraform apply` and rolls the service to the given image tag (or `latest`).
- **Destroy**: scales the service to 0, then runs `terraform destroy`.
- Prints the final **wake URL** and the **domain** (`https://ecs-demo.online`).

### OPS: Wake/Sleep helpers (`.github/workflows/ops.yml`)
- `wake`: calls the Wake URL (API GW); useful for checks or previews.
- `sleep`: sets `desiredCount=0` immediately.

> All jobs use GitHub OIDC to assume **`github-actions-ecs-role`** in your AWS account.

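One way to "roll the service to a new image tag" with the AWS SDK has four steps:

1. Copy the current task definition.
2. Swap the container image.
3. Register the copy as a new revision.
4. Point the service at that revision.

The sketch below covers only the pure data step. It is an assumption, not necessarily what `cd.yml` does; the workflow may instead pass the image to Terraform. `ecr_image_uri`, `with_new_image`, and the container name `app` used in the test are hypothetical. The stripped keys are read-only fields that `DescribeTaskDefinition` returns but `RegisterTaskDefinition` does not accept.

```python
import copy
from typing import Any, Dict

# Fields returned by DescribeTaskDefinition that RegisterTaskDefinition rejects.
READ_ONLY_KEYS = {
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy", "deregisteredAt",
}


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    """Image URI in the format CI outputs: ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/REPO:TAG."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def with_new_image(task_def: Dict[str, Any], container_name: str, image: str) -> Dict[str, Any]:
    """Return register_task_definition kwargs with one container's image replaced (input not mutated)."""
    new_def = {k: copy.deepcopy(v) for k, v in task_def.items() if k not in READ_ONLY_KEYS}
    for container in new_def["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = image
            return new_def
    raise KeyError(f"no container named {container_name!r}")
```

With boto3, the result feeds a three-call sequence:

1. `td = ecs.describe_task_definition(taskDefinition=...)["taskDefinition"]`
2. `arn = ecs.register_task_definition(**with_new_image(td, "app", uri))["taskDefinition"]["taskDefinitionArn"]`
3. `ecs.update_service(cluster=..., service=..., taskDefinition=arn)`

If the task definition already points at `:latest`, `update_service(..., forceNewDeployment=True)` is a simpler alternative. Changes made outside Terraform may show up as drift on the next `terraform apply`.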
---

## 🔍 Variables (Terraform)

| Name                  | Type   | Default                            | Description                                 |
|-----------------------|--------|------------------------------------|---------------------------------------------|
| `project_name`        | string | `ecs-demo`                         | Prefix for AWS resource names               |
| `region`              | string | `us-east-1`                        | AWS region                                  |
| `vpc_cidr`            | string | `10.20.0.0/16`                     | VPC CIDR                                    |
| `public_subnets`      | list   | `["10.20.1.0/24", "10.20.2.0/24"]` | Two public subnets                          |
| `desired_count`       | number | `0`                                | 0 = idle, 1 = running                       |
| `task_cpu`            | string | `256`                              | Task CPU units (256 = 0.25 vCPU)            |
| `task_memory`         | string | `512`                              | Task memory (MiB)                           |
| `app_port`            | number | `80`                               | Container port                              |
| `ecr_repo_name`       | string | `ecs-demo-app`                     | ECR repo name                               |
| `enable_wake_api`     | bool   | `true`                             | Create Wake Lambda + API GW                 |
| `enable_auto_sleep`   | bool   | `true`                             | Create Auto-sleep Lambda + EventBridge rule |
| `sleep_after_minutes` | number | `5`                                | Minutes of inactivity before scaling to 0   |

> The Lambda environment variable `WAIT_MS` (set in `main.tf`) controls the **warm-up budget**, in milliseconds, shown on the waiting page.

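This README does not describe how `auto_sleep.py` detects inactivity. The sketch below shows the decision the scheduled Lambda has to make on each EventBridge tick. It assumes a "last activity" timestamp is available; how that timestamp is obtained is an assumption, for example a time the app records on each request. The function name `should_sleep` is hypothetical, and its threshold corresponds to `sleep_after_minutes`.

```python
from datetime import datetime, timedelta, timezone


def should_sleep(desired_count: int, last_activity: datetime, now: datetime,
                 sleep_after_minutes: int = 5) -> bool:
    """True when the service is awake and has been idle for at least sleep_after_minutes."""
    if desired_count == 0:
        return False  # already asleep; nothing to do
    return now - last_activity >= timedelta(minutes=sleep_after_minutes)
```

When this returns `True`, the Lambda would call `update_service(cluster=..., service=..., desiredCount=0)`. A freshly woken task may have no recorded activity yet. Passing the task's start time as `last_activity` in that case keeps it from being put straight back to sleep.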
---

## 💰 Cost notes

- **Idle**: $0 for ECS/Fargate (desiredCount=0). You still pay small amounts for:
  - Lambda invocations (wake/auto-sleep)
  - API Gateway requests (minimal traffic)
  - CloudWatch Logs
  - S3 + DynamoDB for the Terraform backend
  - Route 53 hosted zone (if used)
- **Active**: the Fargate task (0.25 vCPU / 0.5 GB) is billed per second while running, with a 1-minute minimum for Linux tasks.

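The active cost follows directly from task size and runtime. A worked sketch, assuming per-second billing (rounded up) with a 60-second minimum per task. The per-vCPU-hour and per-GB-hour rates are parameters because they depend on region and CPU architecture; take them from the AWS Fargate pricing page rather than from this example. The function name `fargate_task_cost` is hypothetical.

```python
import math


def fargate_task_cost(seconds: float, vcpu: float, memory_gb: float,
                      vcpu_hour_rate: float, gb_hour_rate: float) -> float:
    """Cost of one task run: billed hours x (vCPU x vCPU rate + GB x GB rate)."""
    billed_seconds = max(math.ceil(seconds), 60)  # round up to whole seconds, 1-minute minimum
    hours = billed_seconds / 3600.0
    return hours * (vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate)
```

For the default task (`task_cpu = 256`, `task_memory = 512`), pass `vcpu=0.25, memory_gb=0.5`. For example, 30 minutes of demo traffic a day costs `fargate_task_cost(1800, 0.25, 0.5, <vCPU rate>, <GB rate>)` per day.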
---

## 🆘 Troubleshooting

- **Waiting page loops forever**: increase `WAIT_MS` in the Lambda environment (via Terraform) to 120000–180000 (120–180 seconds).
- **Private IP in redirect**: ensure **`assign_public_ip = true`** for the ECS service (already set).
- **Destroy fails on the API GW stage**: if you attached a custom domain (Route 53), remove its **API (base path) mappings** first, or destroy with `-target`.

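The "Private IP in redirect" case comes down to where the public address lives:

- For `awsvpc` tasks, `DescribeTasks` lists the task's ENI under `attachments[].details` (name `networkInterfaceId`), but it does not include a public IP.
- Only `DescribeNetworkInterfaces` returns the public address, under `Association.PublicIp`.
- If `Association` is absent, the task has no public IP.

A minimal sketch over those response shapes follows; the function names are hypothetical.

```python
from typing import Any, Dict, Optional


def eni_id_from_task(task: Dict[str, Any]) -> Optional[str]:
    """Find the ENI id in one task dict from ecs.describe_tasks(...)["tasks"]."""
    for attachment in task.get("attachments", []):
        if attachment.get("type") != "ElasticNetworkInterface":
            continue
        for detail in attachment.get("details", []):
            if detail.get("name") == "networkInterfaceId":
                return detail.get("value")
    return None


def public_ip_from_eni(response: Dict[str, Any]) -> Optional[str]:
    """Read the public IP from ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])."""
    interfaces = response.get("NetworkInterfaces", [])
    if not interfaces:
        return None
    association = interfaces[0].get("Association")  # absent when no public IP is assigned
    return association.get("PublicIp") if association else None
```

If `public_ip_from_eni` returns `None` for a running task, check `assign_public_ip` and that the task runs in one of the public subnets.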
---

## 🧹 Cleanup

```bash
# scale down first (optional)
aws ecs update-service --cluster ecs-demo-cluster --service ecs-demo-svc --desired-count 0 --region us-east-1

# destroy infra
cd infra
terraform destroy -auto-approve -input=false
```

---

## 📝 License

MIT