Lock-free, arena-backed skip list for LSM-tree memtables. Designed for high-throughput write workloads in storage engines such as RocksDB and LevelDB, and in databases like CockroachDB.
| Feature | fastskip | dashmap | std::collections::BTreeMap |
|---|---|---|---|
| Lock-free writes | Yes | Yes | No (global lock) |
| Lock-free reads | Yes | No (read lock) | No |
| Snapshot isolation | Yes | No | No |
| Range cursors | Yes | No | Yes |
| Seal/freeze lifecycle | Yes | No | No |
| Arena memory | Yes (per-thread shards) | No (heap) | No |
| Bulk deallocation | Yes | No | No |
| Auto-seal (size/count) | Yes | No | No |
| Backpressure detection | Yes | No | No |
| Batch operations | Yes | Limited | No |
- O(1) insert — Bump allocation from per-thread arena shard
- O(log n) get — Skip list traversal with small constant
- Lock-free reads — No locks required for point lookups
- Snapshot isolation — Point-in-time iteration under concurrent writes
- Memory efficiency — Per-thread arenas avoid allocation contention
- Bulk deallocation — Dropping memtable reclaims all memory at once
- Auto-sealing — Automatic sealing based on memory usage or entry count
- Backpressure — Detect when nearing capacity limits
- Batch operations — Efficient bulk insert and multi-get operations
```toml
[dependencies]
fastskip = "0.1"use fastskip::ConcurrentSkipList;
let sl = ConcurrentSkipList::new();
sl.insert(b"user:1001", b"alice");
sl.insert(b"user:1002", b"bob");
let (val, tombstone) = sl.get(b"user:1001").unwrap();
assert_eq!(val, b"alice");
assert!(!tombstone);
```

```rust
use fastskip::ConcurrentSkipList;
// Create with auto-seal at 1MB memory or 100k entries
let sl = ConcurrentSkipList::with_capacity_and_shards(
64 * 1024, // 64KB initial arena per shard
4, // 4 shards (should match writer thread count)
1 * 1024 * 1024, // Auto-seal at 1MB total memory
100_000 // Auto-seal at 100k entries
);
// Check if nearing limits
if !sl.is_under_backpressure() {
// Safe to insert batch
let batch = vec![(b"key1", b"val1"), (b"key2", b"val2")];
sl.insert_batch(&batch).unwrap();
}
// Get memory statistics
println!("Used: {} bytes", sl.memory_usage());
println!("Utilization: {:.1}%", sl.memory_utilization() * 100.0);
println!("Avg key size: {} bytes", sl.avg_key_size() as usize);The skip list uses per-thread arena shards for allocation:
```text
Thread 0 → Arena 0 (bump pointer, no sync)
Thread 1 → Arena 1 (bump pointer, no sync)
Thread 2 → Arena 2 (bump pointer, no sync)
        ↓
SkipList (atomic tower pointers)
```
Each writer thread has its own arena shard. Writes don't contend on allocation — they're pure bump pointer operations. The skip list itself uses atomic operations for the lock-free CAS at level 0.
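Conceptually, the level-0 link is a compare-and-swap on an atomic forward pointer. A minimal sketch of the technique follows; it is illustrative only, not fastskip's actual node or insert code:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Illustrative node with a single level-0 forward pointer; fastskip's real
// nodes carry a full tower and live inside an arena shard.
struct Node {
    key: Vec<u8>,
    next: AtomicPtr<Node>,
}

/// Try to link `new_node` after `pred` without taking a lock. On CAS failure
/// another writer linked first; the caller re-searches and retries.
unsafe fn link_level0(pred: &Node, new_node: *mut Node) -> bool {
    let succ = pred.next.load(Ordering::Acquire);
    (*new_node).next.store(succ, Ordering::Relaxed);
    pred.next
        .compare_exchange(succ, new_node, Ordering::Release, Ordering::Acquire)
        .is_ok()
}
```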
Nodes are allocated as a single contiguous block:
```text
[ header (32 bytes) ]
[ tower[0..height-1] ]   (8 bytes each)
[ key bytes ]
[ value bytes ]
```
Key and value are stored inline after the tower, enabling zero-copy reads.
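For illustration, the byte offsets implied by this layout can be computed as below. The constants come from the diagram above; the helper functions are not part of fastskip's API:

```rust
// Offsets implied by the node layout above. Names are illustrative only.
const HEADER_SIZE: usize = 32;     // node header
const TOWER_SLOT_SIZE: usize = 8;  // one forward pointer per level

/// Byte offset of the inline key within a node of the given tower height.
fn key_offset(height: usize) -> usize {
    HEADER_SIZE + height * TOWER_SLOT_SIZE
}

/// Byte offset of the inline value, which follows the key directly.
fn value_offset(height: usize, key_len: usize) -> usize {
    key_offset(height) + key_len
}

/// Total allocation size for one node: header + tower + key + value.
fn node_size(height: usize, key_len: usize, value_len: usize) -> usize {
    value_offset(height, key_len) + value_len
}
```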
```rust
use fastskip::ConcurrentSkipList;
let sl = ConcurrentSkipList::new();
// Insert key-value pairs
sl.insert(b"user:1001", b"alice");
sl.insert(b"user:1002", b"bob");
// Point lookup
let (val, is_tombstone) = sl.get(b"user:1001").unwrap();
assert_eq!(val, b"alice");
assert!(!is_tombstone);
// Check if key exists (non-deleted)
assert!(sl.contains_key(b"user:1001"));
// Get or insert atomically
let (val, is_new) = sl.get_or_insert(b"user:1003", b"charlie");
assert!(is_new);
```

Delete marks a key with a tombstone. Deleted keys remain in the skip list until flushed to an SSTable.
```rust
use fastskip::ConcurrentSkipList;
let sl = ConcurrentSkipList::new();
sl.insert(b"key", b"value");
// Delete returns true if key was found
assert!(sl.delete(b"key"));
// get still returns the key, but with tombstone = true
let (_, tombstone) = sl.get(b"key").unwrap();
assert!(tombstone);
// get_live filters out tombstones
assert_eq!(sl.get_live(b"key"), None);Iterate in sorted key order:
```rust
use fastskip::ConcurrentSkipList;
let sl = ConcurrentSkipList::new();
sl.insert(b"apple", b"1");
sl.insert(b"banana", b"2");
sl.insert(b"cherry", b"3");
for entry in sl.iter() {
println!("{:?} -> {:?}", entry.key, entry.value);
}
```

The intended lifecycle for an LSM-tree storage engine:
```rust
use fastskip::ConcurrentSkipList;
// 1. Create active memtable with auto-seal thresholds
let memtable = ConcurrentSkipList::with_capacity_and_shards(
10 * 1024 * 1024, // 10 MB per shard
4, // 4 shards
50 * 1024 * 1024, // Auto-seal at 50 MB total
1_000_000 // Auto-seal at 1M entries
);
// 2. Concurrent writers insert/delete
memtable.insert(b"key1", b"val1");
memtable.delete(b"key2");
// 3. Check for backpressure before inserting batch
if !memtable.is_under_backpressure() {
let batch = vec![(b"key3", b"val3"), (b"key4", b"val4")];
memtable.insert_batch(&batch).unwrap();
}
// 4. When full (auto-sealed or manual), seal it — returns frozen (for flushing) + fresh (for writes)
let (frozen, fresh) = memtable.seal().unwrap();
// 5. Flush frozen to SSTable (iterate snapshot)
for entry in frozen.iter() {
if entry.is_tombstone {
// write tombstone marker to SSTable
} else {
// write key-value to SSTable
}
}
// 6. Drop frozen (reclaims arena memory)
std::mem::drop(frozen);
// 7. Fresh memtable is ready for new writes
fresh.insert(b"key5", b"val5");
// Monitor memory usage
println!("Memory usage: {} bytes", fresh.memory_usage());
println!("Utilization: {:.1}%", fresh.memory_utilization() * 100.0);Readers are fully lock-free. For consistent point-in-time reads, use a snapshot:
```rust
use fastskip::ConcurrentSkipList;
let sl = ConcurrentSkipList::new();
sl.insert(b"a", b"1");
sl.insert(b"b", b"2");
// Snapshot captures sequence number — skips post-snapshot inserts
let snap = sl.snapshot();
// Insert more after snapshot
sl.insert(b"c", b"3");
// Snapshot sees only "a" and "b"
assert_eq!(snap.iter().count(), 2);
// Live iterator sees all three
assert_eq!(sl.iter().count(), 3);
```

Use `cursor_at` to seek to a key range:
```rust
use fastskip::ConcurrentSkipList;
let sl = ConcurrentSkipList::new();
sl.insert(b"apple", b"1");
sl.insert(b"banana", b"2");
sl.insert(b"cherry", b"3");
sl.insert(b"date", b"4");
// Seek to first key >= "banana"
let mut cursor = sl.cursor_at(b"banana").unwrap();
while let Some(entry) = cursor.next_entry() {
println!("{:?}", entry.key);
}
// Output: banana, cherry, date
```

`ConcurrentSkipList` is `Send + Sync` and can be shared across threads:
```rust
use fastskip::ConcurrentSkipList;
use std::sync::Arc;
use std::thread;
let sl = Arc::new(ConcurrentSkipList::new());
let sl1 = sl.clone();
let t1 = thread::spawn(move || {
for i in 0..1000 {
sl1.insert(format!("key:{}", i).as_bytes(), b"value");
}
});
let sl2 = sl.clone();
let t2 = thread::spawn(move || {
for i in 1000..2000 {
sl2.insert(format!("key:{}", i).as_bytes(), b"value");
}
});
t1.join().unwrap();
t2.join().unwrap();
assert_eq!(sl.len(), 2000);
```

Memory statistics can be queried at any time:

```rust
use fastskip::ConcurrentSkipList;
let sl = ConcurrentSkipList::new();
sl.insert(b"key", b"value");
println!("allocated: {} bytes", sl.memory_usage());
println!("reserved: {} bytes", sl.memory_reserved());
println!("utilization: {:.1}%", sl.memory_utilization() * 100.0);
println!("idle: {} bytes", sl.memory_idle());The main lock-free skip list. Send + Sync.
| Method | Description |
|---|---|
| `new()` | Create with default shard count (CPU count) |
| `with_shards(n)` | Create with `n` arena shards |
| `with_capacity(bytes)` | Create with custom arena capacity |
| `with_capacity_and_shards(bytes, shards, max_memory, max_entries)` | Custom capacity, shard count, and auto-seal thresholds |
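A brief sketch of the constructor variants from the table (the argument values are illustrative):

```rust
use fastskip::ConcurrentSkipList;

// Default shard count (one per CPU).
let a = ConcurrentSkipList::new();

// Shard count matched to a fixed pool of writer threads.
let b = ConcurrentSkipList::with_shards(8);

// Custom initial arena capacity.
let c = ConcurrentSkipList::with_capacity(256 * 1024);
```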
| Method | Description |
|---|---|
| `insert(key, value)` | Insert, returns `false` on OOM/duplicate |
| `try_insert(key, value)` | Returns `Result<(), InsertError>` |
| `delete(key)` | Delete (tombstone), returns `false` if not found |
| `get_or_insert(key, value)` | Get or insert atomically |
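Where the boolean return of `insert` is not enough, `try_insert` reports the failure reason. A sketch, assuming `InsertError` is exported at the crate root as documented in the error types section below:

```rust
use fastskip::{ConcurrentSkipList, InsertError};

let sl = ConcurrentSkipList::new();
match sl.try_insert(b"user:1001", b"alice") {
    Ok(()) => { /* inserted */ }
    Err(InsertError::DuplicateKey) => { /* key already present or tombstoned */ }
    Err(InsertError::OutOfMemory) => { /* arena shard exhausted: seal and rotate */ }
}
```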
| Method | Description |
|---|---|
| `get(key)` | Returns `Option<(value, is_tombstone)>` |
| `get_live(key)` | Returns `None` if deleted or missing |
| `contains_key(key)` | Returns `true` if key exists and is not deleted |
| Method | Description |
|---|---|
| `iter()` | Live iterator |
| `snapshot()` | Point-in-time snapshot |
| `cursor()` | Cursor at first entry |
| `cursor_at(target)` | Cursor at first key >= target |
| Method | Description |
|---|---|
| `len()` | Number of live entries |
| `is_empty()` | Returns `true` if no live entries |
| `is_sealed()` | Returns `true` if sealed |
| `memory_usage()` | Arena bytes allocated |
| `memory_reserved()` | Arena bytes reserved |
| `memory_utilization()` | Utilization (0.0 to 1.0) |
| `memory_idle()` | Reserved but unused bytes |
| `avg_key_size()` | Approximate average key size in bytes |
| `avg_value_size()` | Approximate average value size in bytes |
| `total_inserts()` | Total insert attempts (including duplicates/failures) |
| `max_memory_bytes()` | Maximum memory bytes configured (0 = unlimited) |
| `max_entries()` | Maximum entries configured (0 = unlimited) |
| `is_under_backpressure()` | Returns `true` if nearing capacity limits (90% threshold) |
```rust
use fastskip::ConcurrentSkipList;

let sl = ConcurrentSkipList::new();
sl.insert(b"key", b"value");
println!("allocated: {} bytes", sl.memory_usage());
println!("reserved: {} bytes", sl.memory_reserved());
println!("utilization: {:.1}%", sl.memory_utilization() * 100.0);
println!("idle: {} bytes", sl.memory_idle());
println!("avg key size: {} bytes", sl.avg_key_size() as usize);
println!("avg value size: {} bytes", sl.avg_value_size() as usize);
println!("total insert attempts: {}", sl.total_inserts());
println!("under backpressure: {}", sl.is_under_backpressure());A positioned iterator for range scans.
| Method | Description |
|---|---|
| `next_entry()` | Advance; returns the next entry, or `None` at the end |
| `entry()` | Current entry |
| `seek(target)` | Seek to first key >= target |
Implements `Iterator<Entry>`.
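A sketch of repositioning a cursor with `seek`, reusing the `cursor_at`/`next_entry` pattern from the range-scan example above:

```rust
use fastskip::ConcurrentSkipList;

let sl = ConcurrentSkipList::new();
sl.insert(b"apple", b"1");
sl.insert(b"banana", b"2");
sl.insert(b"date", b"4");

// Position at the start of the range, then reposition with seek().
let mut cursor = sl.cursor_at(b"apple").unwrap();
cursor.seek(b"banana");
while let Some(entry) = cursor.next_entry() {
    println!("{:?}", entry.key); // banana, date
}
```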
The snapshot returned by `snapshot()` is a point-in-time view for consistent iteration.
```rust
let snap = sl.snapshot();
for entry in snap.iter() {
// Sees only entries at snapshot time
}
```

```rust
pub enum InsertError {
DuplicateKey, // Key already exists
OutOfMemory, // Arena shard out of memory
}
pub enum SealError {
AlreadySealed, // Memtable was already sealed
}
```

Inserting a key that was tombstoned in the same memtable returns `false` (duplicate). Seal the memtable and write to a fresh one to re-insert.
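For example, a sketch of the behaviour described above, using `insert`'s boolean return and the `seal()` pattern from the memtable lifecycle section:

```rust
use fastskip::ConcurrentSkipList;

let sl = ConcurrentSkipList::new();
sl.insert(b"key", b"v1");
sl.delete(b"key");

// Re-inserting the tombstoned key into the same memtable is rejected as a duplicate.
assert!(!sl.insert(b"key", b"v2"));

// Seal, flush the frozen memtable, and write the key into the fresh one instead.
let (_frozen, fresh) = sl.seal().unwrap();
assert!(fresh.insert(b"key", b"v2"));
```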
User-inserted empty keys (`b""`) work correctly. The sentinel head node uses an empty key identified by position.
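For example, using only the point-lookup API shown above:

```rust
use fastskip::ConcurrentSkipList;

let sl = ConcurrentSkipList::new();
sl.insert(b"", b"value for the empty key");
assert!(sl.contains_key(b""));
```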
Arena memory is bulk-allocated in blocks and bulk-reclaimed when the memtable is dropped. No per-node allocation/deallocation overhead.
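Conceptually, each arena block is a bump allocator: allocation is one atomic add, and the whole block is released when its owner is dropped. A minimal sketch of the idea, not fastskip's internal code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative bump-allocated block; fastskip's shards are more involved.
struct ArenaBlock {
    buf: Box<[u8]>,     // one contiguous block, freed all at once on Drop
    next: AtomicUsize,  // bump offset into `buf`
}

impl ArenaBlock {
    /// Reserve `len` bytes with a single atomic add. There is no per-node free;
    /// memory comes back only when the whole block is dropped.
    fn alloc(&self, len: usize) -> Option<usize> {
        let start = self.next.fetch_add(len, Ordering::Relaxed);
        if start + len <= self.buf.len() {
            Some(start) // byte offset of the reserved region
        } else {
            None // block exhausted; a new block would be chained
        }
    }
}
```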
- Performance: Up to 27% faster concurrent inserts, 20% faster sequential inserts
- O(1) `memory_usage()` via atomic running total (was O(N shards))
- Adaptive backoff on CAS failure to reduce cache-line bouncing
- Optimized `insert_batch()`: single sealed check, single arena lookup
- Bulk u64 header writes in `init_node()` (3 stores vs 8)
- Conditional allocation tracking — zero overhead when memory limits aren't configured
- Release profile: `lto = "fat"`, `strip = true`
- Initial release
- Lock-free skip list with per-thread arena shards
- Point lookups, iteration, snapshots
- Tombstone delete support
- Seal/freeze lifecycle for LSM memtables
- Range cursors for prefix scans
- Memory utilization metrics:
  - `memory_reserved()` — total arena capacity
  - `memory_utilization()` — fraction used (0.0-1.0)
  - `memory_idle()` — unused bytes
- Production readiness features:
  - Size-based auto-seal (`max_memory_bytes` threshold)
  - Count-based auto-seal (`max_entries` threshold)
  - Backpressure detection (`is_under_backpressure()`)
  - Batch insert (`insert_batch()`)
  - Multi-get (`get_many()`)
  - Enhanced memory stats (`avg_key_size`, `avg_value_size`, `total_inserts`)
  - Configuration preserved on seal/fresh memtable creation
MIT License — see LICENSE for details.