Skip to content

fix(dynamodb): serialise concurrent mutations and transactions#572

Merged
hectorvent merged 6 commits intofloci-io:mainfrom
stashev:fix/dynamodb-concurrency
Apr 25, 2026
Merged

fix(dynamodb): serialise concurrent mutations and transactions#572
hectorvent merged 6 commits intofloci-io:mainfrom
stashev:fix/dynamodb-concurrency

Conversation

@stashev
Copy link
Copy Markdown
Contributor

@stashev stashev commented Apr 23, 2026

Context

#571 reports that concurrent UpdateItem calls using the standard counter idiom SET x = if_not_exists(x, :start) + :inc return duplicate values.

The root cause is structural and wider than the reported symptom: every mutating DynamoDB path in DynamoDbService ran its read → check-condition → mutate → put sequence against unsynchronised state. Individual ConcurrentHashMap / ConcurrentSkipListMap operations are atomic, but the compound sequences are not. The problem affects putItem (conditional writes), updateItem (all mutating expressions), deleteItem (conditional deletes), batchWriteItem, and transactWriteItems — the last one was worst because its two-phase "check all, then write all" relies on the gap between phases being private, which it never was.

What changed

  • Added a per-item ReentrantLock registry ConcurrentHashMap<storageKey, ConcurrentHashMap<itemKey, ReentrantLock>> on DynamoDbService. Locks are created lazily on first access and never removed; re-entrancy lets transactions call back into single-item helpers without deadlocking.
  • Wrapped putItem, updateItem, and deleteItem so each runs its full critical section — including condition check, the mutation itself, persistItems, and the stream/Kinesis fan-out — under the item's lock. batchWriteItem keeps its existing per-item dispatch loop and inherits the new locking for free.
  • Rewrote transactWriteItems to acquire every participant's lock up front in a deterministic (storageKey, itemKey) order, evaluate all conditions against that locked snapshot, and only then apply writes. Locks release in reverse order. Total-ordering prevents deadlock between concurrent transactions; reentrancy makes the inner calls free.
  • Item-level locking means unrelated items still run fully in parallel — the common case is unaffected. Only callers that target the same key serialise, which is what real DynamoDB does too.

Before / After

The bug case from #571 — 50 threads running SET cnt = if_not_exists(cnt, :start) + :inc on the same item — now:

Before After
Final counter value non-deterministic (often 2–20) exactly 50
Distinct returned values fewer than 50 exactly 50
Returned values duplicates common permutation of [1, 50]

Covered by DynamoDbConcurrencyIntegrationTest.concurrent_updateItem_arithmetic_is_atomic (unit-level, @RepeatedTest(5) with a CountDownLatch starting gate). 30 of 51 repetitions failed on main before the fix; all pass after.

Tests

  • Unit-levelDynamoDbConcurrencyIntegrationTest in src/test/java/…/services/dynamodb/. Exercises DynamoDbService directly with a 32-thread pool and @RepeatedTest(5) to make intermittent races observable in a single CI run. Scenarios cover arithmetic counters, ADD counters, conditional PutItem/DeleteItem, mixed PutItem+UpdateItem, overlapping and disjoint TransactWriteItems, BatchWriteItem, and stream event ordering, plus single-threaded baselines.
  • CompatibilityDynamoDbConcurrencyTest in compatibility-tests/sdk-test-java/ covers the four mandatory end-to-end scenarios (arithmetic counter, attribute_not_exists, overlapping TransactWriteItems, mixed update+put) at N=20 through the AWS SDK.

Test coverage gaps

  • Secondary-index behaviour under contention is not exercised; there is no existing Floci coverage for concurrent writes to GSIs, and GSIs are projected synchronously from the base table so they inherit the new ordering.
  • ConditionCheck-only transaction items are locked and evaluated but no scenario asserts their isolation specifically. The underlying mechanism is the same as for Put/Update/Delete participants.

Known follow-up items

  • CHANGELOG entry explicitly flags the behaviour change: callers whose previously-racing requests silently succeeded may now see the ConditionalCheckFailedException or TransactionCanceledException that real DynamoDB would raise.

Closes #571.

Copilot AI review requested due to automatic review settings April 23, 2026 13:50
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes DynamoDB concurrency correctness by serializing per-item mutations and making TransactWriteItems lock all participants in a deterministic order to prevent lost updates and transaction race windows (issue #571).

Changes:

  • Added per-item ReentrantLock registry and wrapped putItem, updateItem, and deleteItem critical sections under the item lock.
  • Updated transactWriteItems to acquire all participant locks up front (sorted) before condition evaluation + write application.
  • Added new concurrency-focused tests at both service level and SDK compatibility-test level; documented the behavior change in the changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
src/main/java/io/github/hectorvent/floci/services/dynamodb/DynamoDbService.java Introduces per-item locking and total-ordered transaction lock acquisition to prevent concurrent lost updates and transaction races.
src/test/java/io/github/hectorvent/floci/services/dynamodb/DynamoDbConcurrencyIntegrationTest.java Adds in-process concurrency regression coverage for item-level atomicity and transaction behavior.
compatibility-tests/sdk-test-java/src/test/java/com/floci/test/DynamoDbConcurrencyTest.java Adds end-to-end AWS SDK concurrency scenarios against a running Floci instance.
CHANGELOG.md Notes the DynamoDB concurrency behavior change and its user-visible implications.

Comment thread src/main/java/io/github/hectorvent/floci/services/dynamodb/DynamoDbService.java Outdated
Comment thread src/main/java/io/github/hectorvent/floci/services/dynamodb/DynamoDbService.java Outdated
stashev added a commit to stashev/floci that referenced this pull request Apr 23, 2026
…erences

Addresses review feedback on floci-io#572:

* Replace the delimited-string ordering key for transactWriteItems with a
  Comparator over TransactParticipant tuples. User-supplied PK/SK bytes can
  contain any character, so "storageKey + \0 + itemKey" is not an injective
  ordering — tuple comparison avoids the collision risk entirely.
* Release a table's item locks alongside itemsByTable / itemStore in
  deleteTable so the registry tracks the table's lifetime instead of growing
  unboundedly across create/drop cycles.
* Remove Javadoc and inline comment references to PLAN-dynamodb-concurrency.md,
  which was a local working doc and was never committed. Issue floci-io#571 stays as
  the public reference.
@hectorvent
Copy link
Copy Markdown
Collaborator

@stashev thanks for the PR. I will do another round of test and I will get it merged.

So far it looks good.

@hectorvent hectorvent added bug Something isn't working dynamodb labels Apr 24, 2026
stashev added 6 commits April 24, 2026 14:28
Exercises DynamoDbService directly with a starting-gate CountDownLatch and
@RepeatedTest(5) to catch read-check-mutate-put races that single-threaded
tests cannot see.

Scenarios cover: arithmetic if_not_exists counters, ADD updates, conditional
PutItem/DeleteItem with attribute_exists/attribute_not_exists, mixed put+update
on the same key, transactWriteItems (overlap and disjoint), batchWriteItem,
and stream event ordering. Each scenario has a baseline single-threaded sibling.

30 of 51 executions fail on main, matching the races described in the plan.
Adds a per-(storageKey, itemKey) ReentrantLock registry and wraps the full
read → check-condition → mutate → put → persist → stream-capture sequence
inside putItem, updateItem, and deleteItem. Locks are created lazily on first
access and never removed; ReentrantLock is used so the upcoming transaction
path can re-enter these methods after acquiring each participant's lock.

Unrelated items still run in parallel — only threads contending for the same
item serialise, matching real DynamoDB's per-item linearisability.

Fixes the updateItem, putItem, deleteItem, and stream-ordering scenarios in
DynamoDbConcurrencyIntegrationTest. The transactWriteItems scenario still
races and is addressed in the next commit.

Refs floci-io#571.
…quisition

Resolves every participant's (storageKey, itemKey) up front, acquires each
ReentrantLock in deterministic ascending order, then evaluates all conditions
against a snapshot nobody else can mutate. Writes apply under the same locks;
inner putItem/updateItem/deleteItem calls re-enter their lock for free.

Fixes the last remaining DynamoDbConcurrencyIntegrationTest scenarios — the
all-or-nothing transaction test now observes committed == versionA == versionB
across repeated runs, and the disjoint-commute scenario proves the sort order
prevents deadlock.

Closes floci-io#571.
Adds the four mandatory compatibility tests that exercise the same locking
semantics through the AWS SDK over HTTP: concurrent UpdateItem arithmetic,
concurrent PutItem(attribute_not_exists), concurrent overlapping
TransactWriteItems, and mixed concurrent UpdateItem + PutItem.

N=20 per scenario to keep CI fast; each scenario uses a CountDownLatch starting
gate so client threads dispatch simultaneously and the server actually sees
overlapping work.
Flags the behaviour change for callers whose previously-racing
UpdateItem/PutItem/TransactWriteItems calls will now see the
ConditionalCheckFailedException or TransactionCanceledException that real
DynamoDB would raise.
…erences

Addresses review feedback on floci-io#572:

* Replace the delimited-string ordering key for transactWriteItems with a
  Comparator over TransactParticipant tuples. User-supplied PK/SK bytes can
  contain any character, so "storageKey + \0 + itemKey" is not an injective
  ordering — tuple comparison avoids the collision risk entirely.
* Release a table's item locks alongside itemsByTable / itemStore in
  deleteTable so the registry tracks the table's lifetime instead of growing
  unboundedly across create/drop cycles.
* Remove Javadoc and inline comment references to PLAN-dynamodb-concurrency.md,
  which was a local working doc and was never committed. Issue floci-io#571 stays as
  the public reference.
@stashev stashev force-pushed the fix/dynamodb-concurrency branch from 23707d3 to f364777 Compare April 24, 2026 12:32
@hectorvent hectorvent merged commit e0dc45e into floci-io:main Apr 25, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working dynamodb

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] DynamoDB UpdateItem atomic counter returns duplicate values under concurrency

3 participants