Commit 35cc931

docs: add detailed README with architecture diagram and workflows
1 parent 653da20 commit 35cc931

1 file changed

Lines changed: 174 additions & 81 deletions

README.md

@@ -1,127 +1,220 @@
1-
# 🚀 Docker ECS Deployment Demo
1+
# 🚀 docker-ecs-deployment
22

3-
This repository demonstrates how to run a **production-like app on AWS ECS Fargate without ALB** and keep costs minimal by using a **Wake/Sleep pattern** with Lambda + API Gateway.
3+
Spin up a **zero-cost-at-idle** demo app on **AWS ECS Fargate** without an ALB.
4+
Traffic goes to a **public task IP**, the service **auto-sleeps to 0**, and a small **“wake”** Lambda behind **API Gateway** starts it on demand. Domain: **https://ecs-demo.online**.
45

56
---
67

7-
## 📂 Repository Structure
8+
## 📦 What you get
89

9-
```
10+
- **Node.js demo app** (Express) with a slick UI (dark/light), live logs (SSE), and simple actions.
11+
- **ECR** repository to store your images.
12+
- **VPC** with two public subnets, **security group**, **ECS cluster**, **Fargate service**.
13+
- **Wake API**: API Gateway → Lambda (Python) that scales the service to **1** and redirects to the task IP.
14+
- **Auto-sleep**: EventBridge rule → Lambda (Python) that scales the service to **0** after inactivity.
15+
- **GitHub Actions** (3 workflows):
16+
- **CI**: Build & push to ECR.
17+
- **CD**: Terraform apply / destroy and roll service to a new image.
18+
- **OPS**: Wake or Sleep the service on demand.
19+
20+
> **Minimal state**: All Terraform is in `infra/main.tf` (no split files).
21+
22+
---
23+
24+
## 🧭 Repository structure
25+
26+
```text
1027
.
11-
├── app/ # Node.js demo application (Express server with UI, metrics, logs)
28+
├── app/
1229
│ ├── Dockerfile
13-
│ └── src/server.js
14-
15-
├── infra/ # Terraform IaC for ECS, ECR, VPC, Lambda Wake/Sleep, API Gateway
16-
│ ├── main.tf
17-
│ ├── variables.tf
18-
│ ├── outputs.tf
19-
│ └── ...
20-
21-
├── wake/ # Lambda function to "wake up" ECS service
22-
│ └── lambda_function.py
23-
24-
├── autosleep/ # Lambda function to automatically stop idle ECS service
25-
│ └── auto_sleep.py
26-
27-
├── .github/workflows/ # GitHub Actions CI/CD pipelines
28-
│ ├── ci.yml # Build & Push to ECR
29-
│ ├── cd.yml # Terraform Apply + Deploy/Destroy
30-
│ └── ops.yml # Wake/Sleep ECS Service
31-
32-
└── README.md # Documentation
30+
│ └── src/
31+
│ └── server.js
32+
├── autosleep/
33+
│ └── auto_sleep.py # Lambda: auto-stop service after N minutes
34+
├── wake/
35+
│ └── lambda_function.py # Lambda: scale-to-1 + redirect to task IP
36+
├── infra/
37+
│ └── main.tf # All Terraform in a single file
38+
├── .github/workflows/
39+
│ ├── ci.yml # CI — Build & Push to ECR
40+
│ ├── cd.yml # CD — Terraform Apply + Deploy/Destroy (ECS)
41+
│ └── ops.yml # OPS — Wake/Sleep ECS Service helpers
42+
└── make_zips.sh # Creates Lambda bundles: infra/wake.zip & infra/sleep.zip
3343
```
3444

45+
> If you only keep **`infra/main.tf`**, that’s fine — this repo is designed to work with just one TF file.
46+
3547
---
3648

37-
## ⚙️ Workflows (CI/CD)
49+
## 🏗️ Architecture (high-level)
3850

3951
```mermaid
40-
graph TD
41-
A1[CI — Build & Push to ECR (ci.yml)] --> A2[CD — Terraform Apply + Deploy/Destroy (cd.yml)]
42-
A2 --> A3[OPS — Wake/Sleep ECS Service (ops.yml)]
43-
```
52+
flowchart LR
53+
subgraph GH[GitHub]
54+
CI[CI • Build & Push to ECR<br/>ci.yml]
55+
CD[CD • Terraform Apply & Deploy<br/>cd.yml]
56+
OPS[OPS • Wake / Sleep helpers<br/>ops.yml]
57+
end
4458
45-
- **CI**: builds and pushes Docker image to ECR on each push.
46-
- **CD**: provisions/updates ECS + infra with Terraform.
47-
- **OPS**: provides manual wake/sleep operations via GitHub Actions.
59+
CI --> ECR[(ECR repo)]
60+
CD --> TF[(Terraform)]
61+
TF --> VPC[(VPC + Subnets + SG)]
62+
TF --> ECS[ECS Cluster + Fargate Service]
63+
TF --> CWL[CloudWatch Logs]
64+
TF --> LWA[Lambda • Wake]
65+
TF --> LAS[Lambda • Auto-sleep]
66+
TF --> APIGW[API Gateway HTTP API]
67+
TF --> EVB[EventBridge Rule]
68+
69+
APIGW --> LWA
70+
EVB --> LAS
71+
LWA -->|desiredCount=1| ECS
72+
LAS -->|desiredCount=0| ECS
73+
74+
subgraph Runtime
75+
ECS -->|public IP| Internet
76+
end
77+
```
4878

4979
---
5080

51-
## 🌐 Application Features
81+
## 🌐 DNS (optional)
5282

53-
- Node.js + Express demo app with:
54-
- Health endpoint (`/health`)
55-
- Metrics endpoint (`/api/metrics`)
56-
- Logs (JSON + SSE streaming)
57-
- Simple UI (dark/light theme, live logs, action buttons)
83+
- Purchased domain: **`ecs-demo.online`** (example).
84+
- A-record (apex) → **API Gateway custom domain** (if you attach one), *or* use the native **API endpoint**.
85+
- The **wake URL** returns a “warming up” page and then **redirects** to the current task public IP.
5886

59-
- ECS Fargate service with **desiredCount = 0** by default (sleeping).
60-
- Lambda + API Gateway **Wake URL** to scale service from 0 → 1 automatically.
61-
- Auto-Sleep Lambda scales back to 0 after inactivity.
87+
> For this demo, the public check URL you can share is: **https://ecs-demo.online** (fronts the wake API).
6288
6389
---
6490

65-
## 🏗️ Infrastructure Overview
91+
## ⚙️ Prerequisites
6692

67-
```mermaid
68-
graph TD
69-
subgraph VPC[Custom VPC]
70-
ECS[ECS Fargate Service]
71-
ECR[ECR Repository]
72-
CW[CloudWatch Logs]
73-
end
93+
- **AWS account**, IAM role for GitHub OIDC (see `cd.yml`).
94+
- **S3** bucket + **DynamoDB** table for Terraform backend (already referenced in `main.tf`):
95+
- Bucket: `docker-ecs-deployment`
96+
- Table: `docker-ecs-deployment` (partition key: `LockID`, type String)
97+
- **ECR** repository name (default): `ecs-demo-app`
98+
- **Terraform** 1.6+ (locally or via GitHub Actions)
99+
- **Docker** (to build/push images locally if needed)
100+
- **Route 53 / Namecheap** (optional, for domain)
101+
102+
---
103+
104+
## 🔧 First-time setup (local)
105+
106+
1) Create Lambda zips:
107+
```bash
108+
./make_zips.sh
109+
# → creates: infra/wake.zip and infra/sleep.zip
110+
```
111+
112+
2) Initialize Terraform backend & providers:
113+
```bash
114+
cd infra
115+
terraform init -input=false
116+
```
117+
118+
3) Apply infrastructure (creates VPC, ECS, ECR, Lambdas, API GW):
119+
```bash
120+
terraform apply -auto-approve -input=false
121+
```
74122

75-
API[API Gateway HTTPS Endpoint] --> L1[Lambda Wake Function]
76-
L1 --> ECS
77-
ECS --> CW
123+
4) Build and push the image (local flow, optional — or use CI):
124+
```bash
125+
# login to ECR
126+
aws ecr get-login-password --region us-east-1 \
127+
| docker login --username AWS --password-stdin <ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com
128+
129+
# build & push
130+
docker build -t ecs-demo-app:latest ./app
131+
docker tag ecs-demo-app:latest <ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/ecs-demo-app:latest
132+
docker push <ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/ecs-demo-app:latest
133+
```
78134

79-
EB[EventBridge Rule] --> L2[Lambda Auto-Sleep]
80-
L2 --> ECS
135+
5) Open the wake URL in a browser; once the task is up, you are redirected to it:
81136
```
137+
https://ecs-demo.online
138+
```
139+
140+
---
141+
142+
## 🤖 GitHub Actions
143+
144+
### CI — Build & Push to ECR (`.github/workflows/ci.yml`)
145+
- Builds `./app` into an image and pushes to ECR.
146+
- Outputs the full image URL `ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/ecs-demo-app:<tag>`.
147+
148+
### CD — Terraform Apply + Deploy/Destroy (ECS) (`.github/workflows/cd.yml`)
149+
- **Apply**: `terraform apply` + roll service to the image tag (or `latest`).
150+
- **Destroy**: scales service to 0, then `terraform destroy`.
151+
- Prints the final **wake URL** and the **domain**: `https://ecs-demo.online`.
152+
153+
### OPS — Wake/Sleep helpers (`.github/workflows/ops.yml`)
154+
- `wake`: calls the Wake URL (API GW) — useful for checks or previews.
155+
- `sleep`: sets `desiredCount=0` immediately.
82156

83-
- **ECS Fargate** runs containerized app (ARM64, Node.js).
84-
- **ECR** stores Docker images.
85-
- **CloudWatch Logs** stores app + infra logs.
86-
- **API Gateway + Lambda** handles wake-up.
87-
- **EventBridge + Lambda** enforces auto-sleep after N minutes.
157+
> All jobs use GitHub OIDC to assume **`github-actions-ecs-role`** in your AWS account.
88158
89159
---
90160

91-
## DNS & Public Access
161+
## 🔍 Variables (Terraform)
162+
163+
| Name | Type | Default | Description |
164+
|----------------------|--------|----------------|-----------------------------------------------|
165+
| `project_name` | string | `ecs-demo` | Prefix for AWS resource names |
166+
| `region` | string | `us-east-1` | AWS region |
167+
| `vpc_cidr` | string | `10.20.0.0/16` | VPC CIDR |
168+
| `public_subnets` | list | `["10.20.1.0/24", "10.20.2.0/24"]` | Two public subnets |
169+
| `desired_count` | number | `0` | 0 = idle, 1 = running |
170+
| `task_cpu` | string | `256` | Task CPU |
171+
| `task_memory` | string | `512` | Task memory |
172+
| `app_port` | number | `80` | Container port |
173+
| `ecr_repo_name` | string | `ecs-demo-app` | ECR repo name |
174+
| `enable_wake_api` | bool | `true` | Create Wake Lambda + API GW |
175+
| `enable_auto_sleep` | bool | `true` | Create Auto-sleep Lambda + EventBridge rule |
176+
| `sleep_after_minutes`| number | `5` | When to scale to 0 |
177+
178+
> Lambda env `WAIT_MS` in `main.tf` controls the **warm-up budget** shown on the waiting page.
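
The `sleep_after_minutes` variable in the table above can be read as a simple idle-window check. The following is a minimal sketch of that decision, not the code in `autosleep/auto_sleep.py`: the function names and the `last_activity` timestamp are hypothetical, and how the real Lambda measures inactivity is not described here.

```python
from datetime import datetime, timedelta, timezone


def should_sleep(last_activity: datetime, now: datetime,
                 sleep_after_minutes: int = 5) -> bool:
    """Return True once the idle time reaches the configured window.

    `last_activity` is a hypothetical input (an assumption); the source of
    that timestamp in the real auto-sleep Lambda is not specified.
    """
    return now - last_activity >= timedelta(minutes=sleep_after_minutes)


def target_desired_count(current: int, last_activity: datetime, now: datetime,
                         sleep_after_minutes: int = 5) -> int:
    """Map the idle check onto the ECS desired count: 0 = idle, 1 = running."""
    if current > 0 and should_sleep(last_activity, now, sleep_after_minutes):
        return 0
    return current


if __name__ == "__main__":
    started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    later = started + timedelta(minutes=10)
    print(target_desired_count(1, started, later))
```

Keeping the decision in a pure function makes it easy to test without AWS credentials; the scheduled Lambda would only need to pass the result on as the service's desired count.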
92179
93-
The project is exposed via a custom domain:
180+
---
181+
182+
## 💰 Cost notes
94183

95-
🔗 **https://ecs-demo.online**
184+
- **Idle**: $0 for ECS/Fargate (desiredCount=0). You pay pennies for:
185+
- Lambda invocations (wake/auto-sleep)
186+
- API Gateway minimal traffic
187+
- CloudWatch Logs
188+
- S3+DynamoDB for Terraform backend
189+
- Route 53 hosted zone (if used)
190+
- **Active**: Fargate task (0.25 vCPU / 0.5GB) while running.
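
The idle/active split above comes down to simple arithmetic, since Fargate charges for the vCPU and memory a task reserves while it runs. Here is a rough sketch for estimating the active cost. The function name is hypothetical, and the rates are deliberately left as required parameters: take them from the current AWS pricing page for your region and CPU architecture.

```python
def fargate_task_cost(hours: float, usd_per_vcpu_hour: float,
                      usd_per_gb_hour: float, vcpu: float = 0.25,
                      memory_gb: float = 0.5) -> float:
    """Estimate the compute cost of one running task.

    Defaults match the task size in this README (0.25 vCPU / 0.5 GB).
    The two rates are inputs you must supply; none are assumed here.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    hourly = vcpu * usd_per_vcpu_hour + memory_gb * usd_per_gb_hour
    return hours * hourly


if __name__ == "__main__":
    # Placeholder rates for illustration only, not quoted AWS prices.
    print(round(fargate_task_cost(10, usd_per_vcpu_hour=0.04,
                                  usd_per_gb_hour=0.004), 4))
```

With `desired_count = 0` the `hours` term is zero, which is why only the smaller line items in the idle list remain.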
96191

97-
- The domain is managed via **Namecheap** and delegated to **Route 53** hosted zone.
98-
- The root (`ecs-demo.online`) is mapped to the **API Gateway (Wake URL)** via Route 53 alias record.
99-
- First visit → API Gateway triggers Lambda wake-up → ECS Fargate task starts.
100-
- After ~30–60s cold start the container becomes reachable on the public IP, and user is redirected to the running service.
192+
---
101193

102-
> ⚠️ If the service is **asleep** (scaled to 0), you may need to reload once and wait for the wake-up screen to complete.
194+
## 🆘 Troubleshooting
195+
196+
- **Waiting page loops forever**
197+
Increase `WAIT_MS` in the Lambda environment (via Terraform). The name suggests milliseconds, so 120–180 seconds would be `120000`–`180000`.
198+
- **Private IP in redirect**
199+
Ensure **`assign_public_ip = true`** for the ECS service (already set).
200+
- **Destroy fails on API GW stage**
201+
If you attached a custom domain (Route 53), remove its **base path mappings** (called API mappings on an HTTP API) first, or run a targeted destroy with `terraform destroy -target=<resource address>`.
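
The first two items above (the waiting page and the private-IP redirect) can be illustrated with a small sketch of how a wake handler might choose its response. This is an assumption-laden example, not the code in `wake/lambda_function.py`: `build_wake_response`, `is_public_ip`, the `elapsed_ms` argument, and the 3-second refresh interval are all hypothetical. The returned dict follows the usual Lambda proxy response shape (`statusCode`, `headers`, `body`).

```python
import ipaddress
from typing import Optional


def is_public_ip(ip: str) -> bool:
    """True only for globally routable addresses (rejects 10.x and similar)."""
    try:
        return ipaddress.ip_address(ip).is_global
    except ValueError:
        return False


def build_wake_response(task_ip: Optional[str], elapsed_ms: int,
                        wait_ms: int, app_port: int = 80) -> dict:
    """Redirect when a public task IP is known, else serve a waiting page."""
    if task_ip and is_public_ip(task_ip):
        return {
            "statusCode": 302,
            "headers": {"Location": f"http://{task_ip}:{app_port}/"},
            "body": "",
        }
    # Remaining warm-up budget, derived from WAIT_MS (assumed milliseconds).
    remaining_s = max(0, (wait_ms - elapsed_ms) // 1000)
    html = (
        "<html><head><meta http-equiv='refresh' content='3'></head>"
        f"<body>Warming up, about {remaining_s}s left.</body></html>"
    )
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html"},
        "body": html,
    }
```

Refusing to redirect to a non-public address keeps the browser on the waiting page instead of sending it to an unreachable VPC address, which is the symptom described under "Private IP in redirect".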
103202

104203
---
105204

106-
## 🧑‍💻 Usage
205+
## 🧹 Cleanup
107206

108207
```bash
109-
# Build and push image
110-
docker build -t ecs-demo-app .
111-
docker tag ecs-demo-app:latest <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/ecs-demo-app:latest
112-
docker push <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/ecs-demo-app:latest
208+
# scale down first (optional)
209+
aws ecs update-service --cluster ecs-demo-cluster --service ecs-demo-svc --desired-count 0 --region us-east-1
113210

114-
# Deploy infra (Terraform)
211+
# destroy infra
115212
cd infra
116-
terraform init
117-
terraform apply -auto-approve
118-
119-
# Get wake URL
120-
terraform output wake_url
213+
terraform destroy -auto-approve -input=false
121214
```
122215

123216
---
124217

125-
## 📜 License
218+
## 📝 License
126219

127-
MIT — use freely for demo/learning purposes.
220+
MIT
