implicit shm_key makes Client.close() fail to clean up SHM/shared-connection state #1058

@mpszn

Python client 19.2.1: implicit shm_key makes Client.close() fail to clean up SHM/shared-connection state

Summary

When SHM is enabled and use_shared_connection=True, omitting shm_key causes the Python wrapper to register shared state under a key built before the implicit shm_key is assigned. Client.close() later looks up that state using a key that includes the actual shm_key, so the lookup misses and the SHM/shared entry is left behind.

This leaves stale SHM behind even when the application calls close(), and later processes can unexpectedly attach to that stale segment.

Environment

  • Aerospike Python client 19.2.1
  • Reproduced on Debian Bookworm with Python 3.11.2
  • Reproduced on Debian Trixie with Python 3.13.5
  • SHM enabled with shm={}
  • use_shared_connection=True
  • No explicit shm_key

Minimal Reproduction

from aerospike import Client

conf = {
    "hosts": [("seed1.example", 3000), ("seed2.example", 3000)],
    "shm": {},
    "use_shared_connection": True,
}

client = Client(conf)
client.close()

Expected Behavior

client.close() should detach and clean up the SHM/shared-connection state created for that client, or at least use the same lookup key at connect time and close time.

Actual Behavior

With implicit shm_key, the close-time lookup key does not match the connect-time registration key, so the shared entry is not found and cleanup does not happen.

In tests, a fresh process reused the same implicit key and the stale SHM segment remained visible and reusable after close().
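One way to check from a fresh process whether a segment is still present is sketched below. It rests on several assumptions: a Linux host, a client SHM segment that is a System V segment, and a /proc/sysvipc/shm listing whose first column is the key printed as a signed decimal. The helper names are hypothetical; `ipcs -m` shows the same listing from a shell.

```python
def sysv_shm_keys(proc_text):
    """Return the System V SHM keys found in proc_text as unsigned 32-bit ints."""
    keys = set()
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if fields:
            # Assumption: the key column is a signed decimal, so mask it
            # back to the unsigned form used in configs (e.g. 0xA8000000).
            keys.add(int(fields[0]) & 0xFFFFFFFF)
    return keys


def segment_exists(shm_key, path="/proc/sysvipc/shm"):
    """Report whether a segment with shm_key is currently listed (Linux only)."""
    with open(path) as handle:
        return shm_key in sysv_shm_keys(handle.read())
```

If `segment_exists(key)` still returns True after close(), for the key the client actually used, the segment has survived.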

Impact

  • SHM state persists across processes even when the application calls close().
  • Later processes can attach to obsolete cluster state.
  • This creates the precondition for follow-on bugs in the SHM follower path.

Technical Analysis

The issue appears to be in the Python wrapper's shared-connection alias handling:

  • The shared-connection alias or search string is created before the implicit shm_key is assigned.
  • close() later rebuilds the lookup string including the real shm_key.
  • When shm_key was implicit, those two strings differ.
  • The lookup misses, so the entry is not cleaned up.
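The mismatch described above can be modeled in a few lines of plain Python. This is only a sketch of the ordering problem, not the wrapper's actual code: `build_alias`, `SharedRegistry`, and the alias format are hypothetical stand-ins, and the explicit key 0xA9000000 is a placeholder. Only the implicit key value 0xA8000000 is taken from the 19.2.1 source notes in this report.

```python
# Toy model of the connect/close key mismatch. All names are hypothetical;
# the real logic lives in the client's C extension.

IMPLICIT_KEY = 0xA8000000  # implicit key value reported for 19.2.1


def build_alias(hosts, shm_key):
    """Build a lookup string; append shm_key only when one is known."""
    alias = ";".join(f"{host}:{port}" for host, port in hosts)
    if shm_key is not None:
        alias += f";shm_key={shm_key:#x}"
    return alias


class SharedRegistry:
    def __init__(self):
        self.entries = {}

    def connect(self, hosts, shm_key=None):
        # Bug pattern: the alias is computed before the implicit key exists.
        alias = build_alias(hosts, shm_key)
        if shm_key is None:
            shm_key = IMPLICIT_KEY
        self.entries[alias] = shm_key
        return shm_key

    def close(self, hosts, shm_key):
        # close() rebuilds the alias with the real key included.
        alias = build_alias(hosts, shm_key)
        return self.entries.pop(alias, None) is not None


hosts = [("seed1.example", 3000)]

implicit = SharedRegistry()
key = implicit.connect(hosts)
print(implicit.close(hosts, key), len(implicit.entries))  # False 1

explicit = SharedRegistry()
key = explicit.connect(hosts, shm_key=0xA9000000)
print(explicit.close(hosts, key), len(explicit.entries))  # True 0
```

In the implicit case the entry stays registered after close(), which is the stale state described in this report. In the explicit case both strings include the key and cleanup succeeds.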

From source inspection in 19.2.1:

  • The implicit SHM key is assigned during connect.
  • The shared alias is created too early.
  • Close-time lookup uses the actual SHM key, so it does not match the original registration.

Relevant Source Locations

Verified against the extracted 19.2.1 source tree.

  • src/main/client/connect.c:53-54 builds alias_to_search before SHM key generation.
  • src/main/client/connect.c:93-119 assigns the implicit shm_key only after the alias has already been computed.
  • src/main/client/connect.c:127-129 registers the shared/global entry under the pre-key alias.
  • src/main/client/close.c:64-66 rebuilds alias_to_search during close().
  • src/main/client/close.c:138-141 appends shm_key inside return_search_string() when SHM is enabled.
  • src/main/client/type.c:873-876 sets user_shm_key = true only when the caller provided an explicit shm_key.
  • src/main/aerospike.c:47-49 defines the global implicit-key state: counter = 0xA8000000 and user_shm_key = false.

Workaround

Always set an explicit shm_key.
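
A minimal sketch of the workaround, assuming the `shm_key` field belongs inside the `shm` dictionary as described in the client's configuration documentation. The helper name `with_explicit_shm_key` and the key value 0xA9000000 are placeholders for illustration; choose a key that no other application on the host uses.

```python
import copy


def with_explicit_shm_key(conf, shm_key):
    """Return a copy of conf whose shm section carries an explicit shm_key."""
    patched = copy.deepcopy(conf)
    patched.setdefault("shm", {})["shm_key"] = shm_key
    return patched


conf = {
    "hosts": [("seed1.example", 3000), ("seed2.example", 3000)],
    "shm": {},
    "use_shared_connection": True,
}

safe_conf = with_explicit_shm_key(conf, 0xA9000000)  # placeholder key

# Usage with the client, as in the minimal reproduction:
#   from aerospike import Client
#   client = Client(safe_conf)
#   client.close()
```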

Likely Fix Scope

  • Primary fix surface is the Python wrapper, not the C client.
  • The most likely code changes are in src/main/client/connect.c, where alias_to_search is built before the implicit SHM key is assigned.
  • A minimal fix would assign the final shm_key before computing the alias, or recompute the alias after shm_key is finalized and before registering the shared/global entry.
  • src/main/client/close.c should be reviewed at the same time to confirm the lookup and cleanup path uses the same alias contract.
  • Risk looks low to medium because the behavior is scoped to shared-connection bookkeeping, but it affects lifecycle semantics across processes.
  • The most important regression tests would cover use_shared_connection=True with both implicit and explicit shm_key, followed by close() and verification that the shared entry and SHM cleanup behavior are consistent.
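
The ordering fix and the regression matrix above can be sketched against a stand-in registry. This is a toy model in plain Python, not the wrapper's C code: `FixedRegistry`, `build_alias`, `check_close_cleans_up`, and the alias format are hypothetical. A real regression test would exercise `Client` against a live cluster and inspect the actual SHM segment.

```python
IMPLICIT_KEY = 0xA8000000  # implicit key value reported for 19.2.1


def build_alias(hosts, shm_key):
    """Build the lookup string from hosts plus the finalized shm_key."""
    hosts_part = ";".join(f"{host}:{port}" for host, port in hosts)
    return f"{hosts_part};shm_key={shm_key:#x}"


class FixedRegistry:
    def __init__(self):
        self.entries = {}

    def connect(self, hosts, shm_key=None):
        # Fix pattern: finalize the key first, then compute the alias.
        if shm_key is None:
            shm_key = IMPLICIT_KEY
        self.entries[build_alias(hosts, shm_key)] = shm_key
        return shm_key

    def close(self, hosts, shm_key):
        # Same alias contract as connect(), so the lookup hits.
        return self.entries.pop(build_alias(hosts, shm_key), None) is not None


def check_close_cleans_up(shm_key):
    registry = FixedRegistry()
    hosts = [("seed1.example", 3000)]
    key = registry.connect(hosts, shm_key)
    assert registry.close(hosts, key), "close() must find the entry"
    assert not registry.entries, "no shared entry may survive close()"


for case in (None, 0xA9000000):  # implicit and explicit shm_key
    check_close_cleans_up(case)
```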

Notes

This is separate from the follower-startup latency bug. This issue is about stale SHM/shared state surviving close(). The startup latency bug is a downstream effect that becomes visible once stale SHM exists; I will file a separate issue for it (#1059).
