Document operational risks and recommendations:
- Metadata keyspace replication requirements for production
- Rollback limitations and data loss warnings
- IF EXISTS/IF NOT EXISTS idempotency best practices
- Repeatable migration behavior and caveats
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md: 72 additions & 0 deletions
@@ -293,6 +293,78 @@ Before applying new migrations, scylla-migrate verifies that previously applied
6. **Use `--dry-run`** to preview changes before applying.
7. **Set `NetworkTopologyStrategy`** for production metadata keyspace replication.

## Production Safety

This section covers important operational concerns for running scylla-migrate in production environments.

### Metadata Keyspace Replication

By default, scylla-migrate creates its metadata keyspace (`scylla_migrate`) with `SimpleStrategy` and `replication_factor: 1`. **This is intended for development only.**

In production, you **must** configure `NetworkTopologyStrategy` with an appropriate replication factor for your cluster topology:

```yaml
metadata_replication:
  class: "NetworkTopologyStrategy"
  datacenters:
    dc1: 3
    dc2: 3
```

**Why this matters:**
- With RF=1, each piece of migration metadata lives on exactly one node. Losing that node means losing the record of which migrations have been applied.
- LWT-based distributed locking requires a quorum of replicas, which is impossible with RF=1 if the single replica node goes down.
- If the metadata becomes unavailable, all future migrations are blocked until the node is restored.

> **Recommendation:** Set the replication factor to at least 3 per datacenter (or match your application keyspace's replication strategy).

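
For reference, the YAML example above corresponds roughly to the CQL below. This is a sketch, not necessarily the statement scylla-migrate issues: `dc1` and `dc2` are the placeholder datacenter names from the example, and the `ALTER KEYSPACE` step is an assumption about how you might convert a metadata keyspace that was already created with the development defaults.

```sql
-- Keyspace definition equivalent to the YAML example (datacenter names are placeholders).
CREATE KEYSPACE IF NOT EXISTS scylla_migrate
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3
  };

-- If the keyspace already exists with SimpleStrategy / RF=1, change it in place.
ALTER KEYSPACE scylla_migrate
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3
  };
```

After raising the replication factor on an existing keyspace, run a repair (for example `nodetool repair scylla_migrate`) so the new replicas receive the existing metadata. `DESCRIBE KEYSPACE scylla_migrate;` in `cqlsh` shows the replication settings currently in effect.
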
### Rollback Limitations

Rollbacks in CQL/ScyllaDB are fundamentally different from rollbacks in SQL databases:

- **DDL statements are not transactional.** If a rollback fails midway (e.g., a network error after `DROP INDEX` but before `DROP TABLE`), the schema is left in a partially rolled-back state that you must finish or repair manually.
- **Data loss is irreversible.** `DROP TABLE` permanently deletes all data in that table. scylla-migrate takes no automatic backup before a rollback, so plan your own backup strategy.
- **Undo scripts must be written manually.** scylla-migrate generates empty `U<version>__<description>.cql` files with `--with-undo`. It's your responsibility to write correct undo CQL.
- **Undo migrations are not tracked in metadata.** When rollback executes undo statements, it removes the corresponding versioned migration record but does not create a new entry. This means rollbacks won't appear in `scylla-migrate status` history.

> **Recommendation:** Always test rollback scripts in staging first. For critical production tables, take a snapshot (`nodetool snapshot`) before running rollbacks.

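
As an illustration, an undo script for a migration that created a `users` table and a `users_name_idx` index might look like the sketch below. The file name `U1__create_users.cql` is hypothetical and only follows the `U<version>__<description>.cql` pattern; the table and index names are examples.

```sql
-- U1__create_users.cql (hypothetical file name)
-- Drop dependent objects first, then the table itself.
-- IF EXISTS keeps each statement safe to re-run after a partially completed rollback.
DROP INDEX IF EXISTS users_name_idx;
DROP TABLE IF EXISTS users;
```

Taking a snapshot beforehand, for example `nodetool snapshot -t pre_rollback <keyspace>`, gives you something to restore from if the undo script removes more than intended. The tag name is arbitrary.
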
### `IF EXISTS` / `IF NOT EXISTS` in Migrations

Always use guard clauses in your CQL statements:

```sql
-- Good: safe for retries
CREATE TABLE IF NOT EXISTS users (id UUID PRIMARY KEY, name TEXT);
CREATE INDEX IF NOT EXISTS users_name_idx ON users (name);

-- Dangerous: fails on retry if table already exists
CREATE TABLE users (id UUID PRIMARY KEY, name TEXT);
```

**Why this matters:**
- If a migration partially succeeds (e.g., the first statement runs, then a network timeout occurs before metadata is recorded), re-running `scylla-migrate migrate` will re-execute all statements in that migration.
- Without `IF NOT EXISTS`, the retry fails with an `AlreadyExists` error, leaving you stuck.
- Similarly, undo scripts should use `DROP TABLE IF EXISTS` and `DROP INDEX IF EXISTS`.

> **Recommendation:** Treat every migration as potentially needing to be idempotent. Use `IF NOT EXISTS` for `CREATE` and `IF EXISTS` for `DROP`, and for `ALTER` where your ScyllaDB version supports it.

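
Not every statement can be guarded in every ScyllaDB version. Adding a column is a common case: re-running it fails because the column already exists. One way to limit the damage, sketched here under the assumption that versioned files are named `V<version>__<description>.cql`, is to keep such a statement alone in its own migration.

```sql
-- V3__add_users_email.cql (hypothetical file name)
-- A single unguarded statement: if it succeeds but recording the metadata fails,
-- a retry errors on this statement only, and no other statement is left half-applied.
ALTER TABLE users ADD email TEXT;
```

With this layout a failed retry points at exactly one schema change, which you can verify by hand (for example with `DESCRIBE TABLE users;`) before deciding how to proceed.
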
### Repeatable Migrations

Repeatable migrations (`R__<description>.cql`) are re-applied whenever their content (checksum) changes. Be aware of the following:

- **Repeatable migrations must be fully idempotent.** They will run multiple times over the lifecycle of your project. Every statement inside must be safe to execute repeatedly.
- **They run after all versioned migrations.** On every `migrate` invocation, pending versioned migrations are applied first, then any repeatable migrations with changed checksums.
- **There is no ordering guarantee between repeatable migrations.** If you have multiple `R__*.cql` files, don't assume they run in any particular order. Each should be self-contained.
- **Checksums are validated only for versioned migrations.** Changes to repeatable migration files are expected, since that is their purpose. The tool re-applies them rather than flagging them as tampered.

Common use cases for repeatable migrations:
- Refreshing materialized views
- Recreating custom functions or aggregates
- Updating role permissions

> **Recommendation:** Keep repeatable migrations small and focused. If the content of a repeatable migration grows complex, consider splitting it into independent files.
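
A small repeatable migration for role permissions could look like the sketch below. The file name, the `app_reader` role, and the `app` keyspace are all hypothetical, and the example assumes authentication and authorization are enabled on the cluster, since `GRANT` is rejected otherwise.

```sql
-- R__app_reader_permissions.cql (hypothetical file name)
-- Each statement is intended to leave the same end state however often it runs.
CREATE ROLE IF NOT EXISTS app_reader WITH LOGIN = false;

-- Re-granting a permission the role already holds is expected to be a no-op;
-- confirm this on your ScyllaDB version before relying on it.
GRANT SELECT ON KEYSPACE app TO app_reader;
```

Because the file is self-contained and touches a single role, it does not depend on the order in which other `R__*.cql` files run.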