Cloudflare Worker + Cloudflare Workflows runtime used by Manhali to execute background jobs reliably.
This package is the workflow runtime for the monorepo. It receives typed events from @abshahin/workflows-sdk, starts the correct workflow class, calls the backend internal execution endpoints, and recovers permanently failed jobs through Cloudflare Queues.
Related SDK: https://github.com/aashahin/workflows-sdk
- Accepts authenticated event batches at POST /dispatch
- Routes events to workflow classes by event domain
- Supports delayed execution with step.sleep(...)
- Retries transient backend failures inside the workflow runtime
- Pushes exhausted workflow failures into a retry queue
- Re-dispatches failed events from a queue consumer with progressive backoff
- Exposes minimal health and operational endpoints
The worker currently handles four workflow domains:
- Email workflows
- Notification workflows
- Payment workflows
- WhatsApp workflows
Bindings are configured in wrangler.jsonc:
- EMAIL_WORKFLOW
- NOTIFICATION_WORKFLOW
- PAYMENT_WORKFLOW
- WHATSAPP_WORKFLOW
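As a rough illustration of how an event name could be mapped to one of these bindings, the sketch below routes on the domain prefix before the slash. The function name resolveBindingName and the lookup table are assumptions made for this example; the actual routing lives in src/index.ts and may be structured differently.

```typescript
// Hypothetical sketch: map an event name such as "email/reset-password"
// to the name of the workflow binding that should handle it.
type BindingName =
  | "EMAIL_WORKFLOW"
  | "NOTIFICATION_WORKFLOW"
  | "PAYMENT_WORKFLOW"
  | "WHATSAPP_WORKFLOW";

// Domain prefix -> binding name (binding names as listed in wrangler.jsonc).
const DOMAIN_BINDINGS = new Map<string, BindingName>([
  ["email", "EMAIL_WORKFLOW"],
  ["notification", "NOTIFICATION_WORKFLOW"],
  ["payment", "PAYMENT_WORKFLOW"],
  ["whatsapp", "WHATSAPP_WORKFLOW"],
]);

// Returns undefined for names without a domain prefix or with an unknown domain.
function resolveBindingName(eventName: string): BindingName | undefined {
  const slash = eventName.indexOf("/");
  if (slash <= 0) return undefined;
  return DOMAIN_BINDINGS.get(eventName.slice(0, slash));
}
```

An unknown domain returns undefined here, which a caller could turn into a per-item error such as "Unknown event: something/unsupported".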
The full system is split across three layers.
- The backend creates workflow jobs through @abshahin/workflows-sdk (https://github.com/aashahin/workflows-sdk)
- The SDK sends event batches over HTTP to this worker's /dispatch endpoint
- Event contracts are shared and typed across producer and runtime
- Validates incoming payloads
- Authenticates requests with a bearer token
- Applies lightweight per-isolate rate limiting
- Resolves each event to a workflow binding
- Starts a Cloudflare Workflow instance per event
- Each workflow calls the backend internal endpoint at POST /workflows/execute/:path
- The backend restores tenant context when relevant
- Existing domain services execute the actual side effects
- The backend uses an execution log to avoid duplicate side effects when retries happen
- Backend code creates a job through the SDK.
- The SDK sends an authenticated POST /dispatch request.
- This worker validates the batch and starts the matching workflow.
- The workflow optionally delays execution.
- The workflow calls the backend callback endpoint.
- Cloudflare Workflows retries transient step failures.
- If the workflow still fails, the event is persisted to Cloudflare Queues for delayed recovery.
- The queue consumer retries the backend call directly until success or dead-lettering.
flowchart LR
A[Backend producer] -->|SDK dispatch| B[Workflows SDK]
B -->|POST /dispatch| C[Cloudflare Worker]
C --> D[EmailWorkflow]
C --> E[NotificationWorkflow]
C --> F[PaymentWorkflow]
C --> G[WhatsappWorkflow]
D -->|POST /workflows/execute/*| H[Backend execution endpoint]
E -->|POST /workflows/execute/*| H
F -->|POST /workflows/execute/*| H
G -->|POST /workflows/execute/*| H
D -. exhausted retries .-> I[Failed events queue]
E -. exhausted retries .-> I
F -. exhausted retries .-> I
G -. exhausted retries .-> I
I --> J[Queue consumer]
J -->|direct backend retry| H
I -->|max retries exceeded| K[Dead-letter queue]
src/
  env.ts                         Typed Cloudflare bindings
  index.ts                       HTTP entrypoint + queue consumer
  lib/
    backend.ts                   Backend callback helpers
    failed-events.ts             Queue persistence + retry processing
  workflows/
    email.workflow.ts            Email workflow implementation
    notification.workflow.ts     Notification workflow implementation
    payment.workflow.ts          Payment workflow implementation
    whatsapp.workflow.ts         WhatsApp workflow implementation
POST /dispatch starts workflow instances for a batch of events.
Authentication:
Authorization: Bearer <AUTH_TOKEN>
Behavior:
- Rejects payloads larger than 1 MB
- Requires a JSON body with an events array
- Validates the basic event structure before workflow creation
- Returns created workflow IDs plus per-item errors for rejected events
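The structural checks listed above could look roughly like the sketch below. It is a minimal illustration, not the worker's actual validator: the function names and error messages are hypothetical, and which fields are required is an assumption based on the example request.

```typescript
// Hypothetical sketch of a basic shape check for one dispatched event.
// Which fields are mandatory is assumed from the example request payload.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns null when the event looks structurally valid, otherwise a short reason.
function validateEvent(input: unknown): string | null {
  if (!isRecord(input)) return "Event must be an object";
  if (typeof input.id !== "string" || input.id.length === 0) return "Missing id";

  const delayMs = input.delayMs;
  if (delayMs !== undefined) {
    if (typeof delayMs !== "number" || !Number.isFinite(delayMs) || delayMs < 0) {
      return "Invalid delayMs";
    }
  }

  const event = input.event;
  if (!isRecord(event)) return "Missing event";
  if (typeof event.name !== "string" || event.name.indexOf("/") <= 0) {
    return "Invalid event name";
  }
  if (!isRecord(event.data)) return "Missing event data";
  return null;
}
```

A dispatcher could run this over each item of the events array, start a workflow for items that return null, and collect the remaining messages as per-item errors.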
Example request:
{
  "events": [
    {
      "id": "evt_01",
      "idempotencyKey": "email:reset-password:user-42",
      "traceId": "trace_01",
      "delayMs": 0,
      "event": {
        "name": "email/reset-password",
        "data": {
          "tenantId": "tenant_123",
          "email": "user@example.com",
          "otpCode": "123456"
        }
      }
    }
  ]
}
Example response:
{
  "ids": ["evt_01"]
}
Example curl command:
curl -X POST "$WORKER_URL/dispatch" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {
        "id": "evt_demo_01",
        "idempotencyKey": "demo:email:reset-password:user-42",
        "traceId": "trace_demo_01",
        "delayMs": 0,
        "event": {
          "name": "email/reset-password",
          "data": {
            "tenantId": "tenant_123",
            "email": "user@example.com",
            "userName": "John Doe",
            "otpCode": "123456"
          }
        }
      }
    ]
  }'
Partial-failure response shape:
{
  "ids": ["evt_01"],
  "errors": [
    {
      "id": "evt_02",
      "error": "Unknown event: something/unsupported"
    }
  ]
}
GET /health is a simple liveness endpoint.
Example response:
{
  "status": "ok"
}
Example curl command:
curl "$WORKER_URL/health"
GET /failed-events is an authenticated operational endpoint that exposes the queue names used for retry and dead-letter processing.
Authentication:
Authorization: Bearer <AUTH_TOKEN>
Example curl command:
curl "$WORKER_URL/failed-events" \
  -H "Authorization: Bearer $AUTH_TOKEN"
Each workflow follows the same high-level pattern:
- Accept event payload from /dispatch
- Optionally sleep for delayMs
- Call the backend execution path through step.do(...)
- Retry transient failures with Cloudflare Workflows retry policy
- Persist exhausted failures to a queue unless the error is non-retryable
The workflow step retry policy is:
- Retry limit: 3
- Initial delay: 1 second
- Backoff: exponential
This retry layer is separate from the queue-based failed-event recovery.
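For orientation, the policy above corresponds to a step configuration of the shape that Cloudflare Workflows accepts as the second argument to step.do(...). The sketch below writes that object out and estimates the waits between attempts. The helper name retryDelaysMs is hypothetical, and the doubling schedule is an assumption about how exponential backoff plays out; the delays Cloudflare actually applies may differ.

```typescript
// Step config mirroring the documented policy: 3 retries, 1 second initial
// delay, exponential backoff. Passed as step.do(name, config, callback).
const backendStepConfig = {
  retries: { limit: 3, delay: "1 second", backoff: "exponential" },
} as const;

// Hypothetical helper: approximate wait before each retry, assuming the
// delay doubles every time (an assumption, not a documented guarantee).
function retryDelaysMs(limit: number, initialDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < limit; retry++) {
    delays.push(initialDelayMs * 2 ** retry);
  }
  return delays;
}

// Under the doubling assumption, the three retries wait 1s, 2s, then 4s.
const estimatedDelays = retryDelaysMs(backendStepConfig.retries.limit, 1000);
```

Under this assumption a step exhausts its retries within a few seconds, which is why the much slower queue-based recovery exists as a separate layer.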
All workflow classes call the backend through shared helpers in src/lib/backend.ts.
The worker forwards these headers when available:
- Authorization: Bearer <AUTH_TOKEN>
- X-Trace-Id
- X-Workflow-Event-Id
- x-tenant-id when tenantId exists in the payload
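A small sketch of how these headers might be assembled is shown below. The CallbackContext type and buildBackendHeaders function are hypothetical names for illustration; the real helpers in src/lib/backend.ts may have a different shape.

```typescript
// Hypothetical input for building callback headers.
interface CallbackContext {
  authToken: string;
  traceId?: string;
  eventId?: string;
  tenantId?: string;
}

// Builds the forwarded headers, skipping any value that is not available.
function buildBackendHeaders(ctx: CallbackContext): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${ctx.authToken}`,
  };
  if (ctx.traceId) headers["X-Trace-Id"] = ctx.traceId;
  if (ctx.eventId) headers["X-Workflow-Event-Id"] = ctx.eventId;
  // Only sent when the event payload carries a tenantId.
  if (ctx.tenantId) headers["x-tenant-id"] = ctx.tenantId;
  return headers;
}
```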
Non-retryable backend statuses are currently:
- 404
- 409
- 422
Other failures remain retryable, including:
- 400
- 401
- 403
- 429
- all 5xx responses
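The classification above reduces to a small predicate. The sketch below is one way to express it; the names are hypothetical, and treating 2xx as "nothing to retry" is an assumption added for completeness.

```typescript
// Statuses the README lists as non-retryable.
const NON_RETRYABLE_STATUSES = new Set<number>([404, 409, 422]);

// Hypothetical predicate: should a backend response with this status be retried?
function isRetryableStatus(status: number): boolean {
  // Successful responses need no retry (assumption: any 2xx counts as success).
  if (status >= 200 && status < 300) return false;
  // Everything else is retryable unless explicitly listed as permanent.
  return !NON_RETRYABLE_STATUSES.has(status);
}
```

Keeping the permanent statuses in one set makes it easier to adjust the list if the backend contract changes.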
When a workflow exhausts its internal retries, the worker stores the event in FAILED_EVENTS_QUEUE.
Cloudflare Queues give the worker a better recovery path than ad hoc storage-based retry loops:
- Native delayed retries
- Automatic message delivery to the consumer
- Built-in dead-letter queue support
- No polling cron required
- No custom visibility timeout bookkeeping
- The workflow stores the failed event in manhali-failed-events.
- The first queue delivery is delayed by 60 seconds.
- The queue consumer calls the backend directly.
- On success, the message is acknowledged.
- On retryable failure, the message is requeued with a progressive delay.
- After max_retries, Cloudflare moves the message to manhali-failed-events-dlq.
Current delay schedule implemented in src/lib/failed-events.ts:
- 1m
- 5m
- 15m
- 30m
- 45m
- 60m
- 90m
- 120m
The consumer is configured with max_retries = 10, so later attempts use the final capped delay value.
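To make the capping behavior concrete, the sketch below looks up a delay for a given delivery attempt and clamps to the last entry once the schedule runs out. The function name and the mapping of attempt numbers to schedule positions are assumptions; src/lib/failed-events.ts may index the schedule differently.

```typescript
// Delay schedule from the README, in seconds: 1m, 5m, 15m, 30m, 45m, 60m, 90m, 120m.
const RETRY_DELAYS_SECONDS = [60, 300, 900, 1800, 2700, 3600, 5400, 7200];

// Hypothetical lookup: attempts is assumed to be 1-based, as with the
// attempts counter on a Cloudflare Queues message. Attempts beyond the end
// of the schedule reuse the final (capped) delay.
function delaySecondsForAttempt(attempts: number): number {
  const clamped = Math.min(Math.max(attempts, 1), RETRY_DELAYS_SECONDS.length);
  return RETRY_DELAYS_SECONDS[clamped - 1];
}
```

In a queue consumer, the result could be passed along the lines of message.retry({ delaySeconds: delaySecondsForAttempt(message.attempts) }), so attempts 9 and 10 both wait 120 minutes.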
From wrangler.jsonc:
- max_retries = 10
- dead_letter_queue = "manhali-failed-events-dlq"
- max_batch_size = 10
- max_batch_timeout = 30
- email/reset-password
- email/new-account-credentials
- email/change-email-verification
- email/verification
- email/cart-recovery
- email/invitation
- email/enrollment-confirmation
- email/trial-reminder
- notification/create
- notification/bulk-create
- payment/process-payout
- whatsapp/send-template
The payment workflow currently orchestrates payout processing in three backend steps:
- Validate payout
- Process payout
- Notify payout status
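The three-step sequence can be pictured with the sketch below, which uses a minimal stand-in for the workflow step object so it runs outside Cloudflare. The StepRunner interface, the runPayoutSteps function, and the backend paths are all hypothetical; they illustrate the ordering only and are not the worker's actual step names or routes.

```typescript
// Minimal stand-in for the part of the workflow step API used here.
interface StepRunner {
  do<T>(name: string, callback: () => Promise<T>): Promise<T>;
}

// Hypothetical backend caller: posts a payload to an execution path.
type BackendCall = (path: string, payload: unknown) => Promise<void>;

// Runs the payout steps in order. A failure in one step stops the sequence,
// so a payout is never processed without passing validation first.
async function runPayoutSteps(
  step: StepRunner,
  callBackend: BackendCall,
  payload: unknown,
): Promise<void> {
  // Paths below are placeholders, not the real backend routes.
  await step.do("validate-payout", () => callBackend("payment/validate-payout", payload));
  await step.do("process-payout", () => callBackend("payment/process-payout", payload));
  await step.do("notify-payout-status", () => callBackend("payment/notify-payout-status", payload));
}
```

Splitting the work into separate steps lets each backend call be retried on its own instead of repeating the whole payout.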
The worker intentionally keeps its trust model simple:
- Shared bearer token between the backend and the worker
- Constant-time token comparison to reduce timing attack leakage
- Input shape validation at the dispatch boundary
- Lightweight in-memory rate limiting on /dispatch
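As an illustration of the constant-time comparison mentioned above, the sketch below compares two tokens without returning early on the first mismatching byte. The function name is hypothetical and this is not necessarily how the worker implements the check.

```typescript
// Hypothetical sketch of a constant-time string comparison.
// The loop always runs over the longer input, and the length difference is
// folded into the result instead of causing an early return.
function constantTimeEqual(a: string, b: string): boolean {
  const encoder = new TextEncoder();
  const left = encoder.encode(a);
  const right = encoder.encode(b);

  let diff = left.length ^ right.length;
  const length = Math.max(left.length, right.length);
  for (let i = 0; i < length; i++) {
    // Out-of-range reads are treated as 0 so both inputs are fully scanned.
    diff |= (left[i] ?? 0) ^ (right[i] ?? 0);
  }
  return diff === 0;
}
```

The point of the design is that the time taken depends on the input lengths rather than on where the first differing character sits.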
Important limitation:
- The rate limiter is per isolate, not global. It helps contain accidental or abusive loops but is not a substitute for a distributed rate-limiting layer.
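To show why the limiter is per isolate, here is a minimal fixed-window counter held in module memory. The class name, the fixed-window strategy, and the parameters are assumptions for illustration; the worker's actual limiter may use a different algorithm. Because the counter lives in memory, every isolate gets its own copy and the effective global limit scales with the number of isolates.

```typescript
// Hypothetical in-memory fixed-window rate limiter.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed in the current window.
  allow(now: number = Date.now()): boolean {
    if (now - this.windowStart >= this.windowMs) {
      // Start a new window and reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}
```

A handler could call allow() at the top of /dispatch and respond with 429 when it returns false.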
The worker currently provides lightweight observability primitives:
- Structured console logging for dispatch, retry, and failure paths
- Trace propagation using X-Trace-Id
- Queue / DLQ visibility through Cloudflare dashboard metrics
- Operational endpoint at /failed-events
For production use, add external alerting around DLQ growth and repeated backend callback failures.
- Bun
- Cloudflare account + Wrangler access
- The monorepo dependencies installed
- A reachable backend instance for workflow callback execution
From the repository root:
bun install
Create apps/workers/workflows-worker/.dev.vars with:
AUTH_TOKEN=replace-with-a-shared-secret
BACKEND_URL=http://localhost:<backend-port>
The ENVIRONMENT variable is already defined in wrangler.jsonc for local development.
From this package directory:
bun run dev
Default local port:
8787
bun run dev
bun run deploy
bun run tail
Before deploying:
- Provision the Cloudflare Workflows and Queue resources referenced by wrangler.jsonc.
- Set the required worker secrets.
- Make sure the backend exposes the internal callback routes.
- Verify the worker and backend share the same AUTH_TOKEN.
- Confirm the backend URL is reachable from the worker environment.
Set secrets with Wrangler:
wrangler secret put AUTH_TOKEN
wrangler secret put BACKEND_URL
Deploy:
bun run deploy
- AUTH_TOKEN
- BACKEND_URL
- ENVIRONMENT
- WORKFLOWS_WORKER_URL
- WORKFLOWS_AUTH_TOKEN
- apps/backend produces and executes the business operations
- packages/workflows-sdk defines the client and event contracts
- This package provides the Cloudflare runtime and recovery layer
The backend currently executes WhatsApp business sends asynchronously through this worker, while manual WhatsApp test sends and manual delivery retries remain direct backend operations.
This separation keeps event production, orchestration, and business execution decoupled.
- Deterministic idempotency keys are not used as Cloudflare Workflow instance IDs because completed instance IDs remain reserved.
- Queue retries call the backend directly instead of creating a new workflow instance so Cloudflare Queue attempt counters drive the backoff correctly.
- The backend is responsible for final side-effect idempotency through execution logging.
If you want to extend the worker, keep changes aligned with these conventions:
- Add new event contracts in @abshahin/workflows-sdk first
- Update worker routing in src/index.ts
- Implement the workflow in src/workflows/
- Keep backend execution endpoints explicit and idempotent
- Treat 404, 409, and 422 as permanently broken inputs unless the backend contract changes
If you add a new workflow domain, update this README's event catalog and operational notes in the same change.
MIT. Add the LICENSE file in the published cloudflare-workflows-worker repository.