Commit 2e925e7

Add Production Safety section to README
Document operational risks and recommendations:
- Metadata keyspace replication requirements for production
- Rollback limitations and data loss warnings
- IF EXISTS/IF NOT EXISTS idempotency best practices
- Repeatable migration behavior and caveats

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1bc32a1 commit 2e925e7

1 file changed

Lines changed: 72 additions & 0 deletions

README.md

@@ -293,6 +293,78 @@ Before applying new migrations, scylla-migrate verifies that previously applied
6. **Use `--dry-run`** to preview changes before applying.
7. **Set `NetworkTopologyStrategy`** for production metadata keyspace replication.

## Production Safety

This section covers important operational concerns for running scylla-migrate in production environments.

### Metadata Keyspace Replication

By default, scylla-migrate creates its metadata keyspace (`scylla_migrate`) with `SimpleStrategy` and `replication_factor: 1`. **This is intended for development only.**

In production, you **must** configure `NetworkTopologyStrategy` with an appropriate replication factor for your cluster topology:

```yaml
metadata_replication:
  class: "NetworkTopologyStrategy"
  datacenters:
    dc1: 3
    dc2: 3
```

**Why this matters:**
- With RF=1, losing a single node means losing the migration metadata, so the tool no longer knows which migrations have been applied.
- LWT-based distributed locking requires a quorum of replicas. With RF=1 the quorum is the single replica, so the lock cannot be acquired while that node is down.
- If the metadata becomes unavailable, all future migrations are blocked until the node is restored.

> **Recommendation:** Set the replication factor to at least 3 per datacenter (or match your application keyspace's replication strategy).
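
The quorum arithmetic behind these points can be made concrete with a short Python sketch. It assumes the usual majority rule for a quorum, `floor(RF / 2) + 1`, applied to a single replica set; the function names are illustrative and are not part of scylla-migrate.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond to form a majority quorum."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


# Print the quorum size and failure tolerance for a few common settings.
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

Under this rule, RF=1 and RF=2 tolerate no replica loss, and RF=3 is the smallest value that survives one node going down, which is consistent with the recommendation above.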

### Rollback Limitations

Rollbacks in CQL/ScyllaDB differ fundamentally from rollbacks in SQL databases:

- **DDL statements are not transactional.** If a rollback fails midway (e.g., a network error after `DROP INDEX` but before `DROP TABLE`), the schema is left in a partially rolled-back state that you must finish or repair manually.
- **Data loss is irreversible.** `DROP TABLE` permanently deletes all data in that table. scylla-migrate takes no automatic backup before a rollback, so plan your own backup strategy.
- **Undo scripts must be written manually.** scylla-migrate generates empty `U<version>__<description>.cql` files with `--with-undo`. It is your responsibility to write correct undo CQL.
- **Undo migrations are not tracked in metadata.** When a rollback executes undo statements, it removes the corresponding versioned migration record but does not create a new entry. As a result, rollbacks do not appear in the `scylla-migrate status` history.

> **Recommendation:** Always test rollback scripts in staging first. For critical production tables, take a snapshot (`nodetool snapshot`) before running rollbacks.

### `IF EXISTS` / `IF NOT EXISTS` in Migrations

Always use guard clauses in your CQL statements:

```sql
-- Good: safe for retries
CREATE TABLE IF NOT EXISTS users (id UUID PRIMARY KEY, name TEXT);
CREATE INDEX IF NOT EXISTS users_name_idx ON users (name);

-- Dangerous: fails on retry if table already exists
CREATE TABLE users (id UUID PRIMARY KEY, name TEXT);
```

**Why this matters:**
- If a migration partially succeeds (e.g., the first statement runs, then a network timeout occurs before metadata is recorded), re-running `scylla-migrate migrate` will re-execute all statements in that migration.
- Without `IF NOT EXISTS`, the retry fails with an `AlreadyExists` error, leaving you stuck.
- Similarly, undo scripts should use `DROP TABLE IF EXISTS` and `DROP INDEX IF EXISTS`.

> **Recommendation:** Treat every migration as potentially needing to be idempotent. Use `IF NOT EXISTS` for CREATE, `IF EXISTS` for DROP and ALTER operations.
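
The retry scenario described above can be simulated without a cluster. The following Python sketch is a toy model, not scylla-migrate's implementation: `FakeSchema`, `AlreadyExists`, and `run_migration` are hypothetical names, and an in-memory set stands in for the database schema. It models a migration whose first statement succeeded before the failure and which is then re-run in full.

```python
class AlreadyExists(Exception):
    """Stand-in for the server error raised when an object already exists."""


class FakeSchema:
    """Toy in-memory schema: just a set of table names (an assumption)."""

    def __init__(self) -> None:
        self.tables = set()

    def create_table(self, name: str, if_not_exists: bool = False) -> None:
        if name in self.tables:
            if if_not_exists:
                return  # the guard clause turns the statement into a no-op
            raise AlreadyExists(name)
        self.tables.add(name)


def run_migration(schema: FakeSchema, tables: list, guarded: bool) -> None:
    """Execute every statement of a migration, as a full re-run would."""
    for name in tables:
        schema.create_table(name, if_not_exists=guarded)


# First attempt: "users" was created, then the run was interrupted
# before "orders" was created or any metadata was recorded.
schema = FakeSchema()
schema.create_table("users")

# Retry with guard clauses: completes and creates the missing table.
run_migration(schema, ["users", "orders"], guarded=True)
print(sorted(schema.tables))

# Retry without guard clauses: fails on the very first statement.
unguarded = FakeSchema()
unguarded.create_table("users")
try:
    run_migration(unguarded, ["users", "orders"], guarded=False)
except AlreadyExists as exc:
    print(f"retry failed: {exc} already exists")
```

The unguarded retry never reaches the second statement, which is the "stuck" state the list above warns about.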

### Repeatable Migrations

Repeatable migrations (`R__<description>.cql`) are re-applied whenever their content (checksum) changes. Be aware of the following:

- **Repeatable migrations must be fully idempotent.** They will run multiple times over the lifecycle of your project. Every statement inside must be safe to execute repeatedly.
- **They run after all versioned migrations.** On every `migrate` invocation, pending versioned migrations are applied first, then any repeatable migrations with changed checksums.
- **There is no ordering guarantee between repeatable migrations.** If you have multiple `R__*.cql` files, don't assume they run in any particular order. Each should be self-contained.
- **Checksums are validated only for versioned migrations.** Changes to repeatable migration files are expected, since that is their purpose. The tool re-applies them rather than flagging them as tampered.

Common use cases for repeatable migrations:
- Refreshing materialized views
- Recreating custom functions or aggregates
- Updating role permissions

> **Recommendation:** Keep repeatable migrations small and focused. If the content of a repeatable migration grows complex, consider splitting it into independent files.
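
The re-apply rule for repeatable migrations can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: SHA-256 is chosen arbitrarily because the hash scylla-migrate uses is not stated here, and `checksum`, `pending_repeatables`, and the `applied` mapping are hypothetical names for the stored metadata.

```python
import hashlib


def checksum(content: str) -> str:
    """Hex digest of a migration's content (SHA-256 is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def pending_repeatables(files: dict, applied: dict) -> set:
    """Return names of repeatable migrations whose checksum is not recorded.

    files:   file name -> current file content
    applied: file name -> checksum recorded at the last successful apply
    Never-applied files are treated as pending too (also an assumption).
    A set is returned on purpose: no execution order is implied.
    """
    return {
        name
        for name, content in files.items()
        if applied.get(name) != checksum(content)
    }


grants_v1 = "GRANT SELECT ON KEYSPACE app TO reader;"
grants_v2 = grants_v1 + "\nGRANT MODIFY ON KEYSPACE app TO writer;"

# Metadata as it would look after grants_v1 was applied once.
applied = {"R__grants.cql": checksum(grants_v1)}

# Unchanged content: nothing to re-apply.
print(pending_repeatables({"R__grants.cql": grants_v1}, applied))
# Edited content: the checksum differs, so the file is selected again.
print(pending_repeatables({"R__grants.cql": grants_v2}, applied))
```

Because any edit to the file selects it again, every statement in it has to be safe to run on a schema where the previous version was already applied.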

## Development

```bash
