SSL-ACTX/chronos-db



A crash-safe, strictly serializable, and horizontally scalable research database.


Warning

Proof of Concept & Research Only

This project is strictly for educational and research purposes. It is designed to demonstrate the implementation of distributed consensus, vector indexing, and bi-temporal data management in Rust.

It is not intended for production use, and active development is not guaranteed. There are no warranties regarding data integrity or security.


📖 Overview

ChronosDB is an experimental high-performance distributed database built in Rust. It uniquely combines Vector Similarity Search (for AI applications) with Bi-Temporal Data Management (Time Travel) and Distributed Consensus (Raft).

It addresses the problem of "lossy" AI memory by preserving every vector embedding ever written in a strictly append-only history that stays reachable via time-travel queries.

🚀 Key Features

  • Distributed Consensus (Raft): Built on openraft, ensuring linearizable writes and automatic leader election. Supports dynamic membership changes (adding/removing nodes) without downtime.
  • Vector Search Engine: Custom implementation of HNSW (Hierarchical Navigable Small World) graph for approximate nearest neighbor search. Supports SIMD-optimized Euclidean and Cosine distance metrics.
  • Time Travel (History): Every record maintains a valid_time and tx_time. Data is never overwritten; it is appended. You can query the full history of any object using the HISTORY command.
  • Disk-Based Architecture: Uses memory-mapped files (mmap) for managing storage segments. It supports "Strict Durability" (fsync) and "High Throughput" (async) modes, selected by hardware detection.
  • Zero-Copy Serialization: Uses rkyv to guarantee data layout alignment and to enable zero-copy deserialization from disk.
  • Binary Snapshots: Uses rkyv for compact, high-speed binary snapshots, significantly reducing storage size and recovery time compared to JSON.
  • Probabilistic Filtering: Implements Bloom Filters and SeaHash to minimize disk reads for non-existent keys.
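
As an illustration of the probabilistic filtering idea, here is a minimal Bloom filter sketch. It is not ChronosDB's implementation: the type and method names are hypothetical, and it swaps in the standard library's DefaultHasher instead of SeaHash so that it compiles without extra crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Bloom filter; names and sizing are assumptions for illustration.
pub struct BloomFilter {
    bits: Vec<bool>,
    hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, hashes: u64) -> Self {
        Self { bits: vec![false; num_bits.max(1)], hashes: hashes.max(1) }
    }

    /// Derives `hashes` bit positions from two base hashes (double hashing).
    fn positions(&self, key: &[u8]) -> Vec<usize> {
        let hash_with = |seed: u64| {
            let mut h = DefaultHasher::new(); // stand-in for SeaHash
            seed.hash(&mut h);
            key.hash(&mut h);
            h.finish()
        };
        let (h1, h2) = (hash_with(0), hash_with(1));
        let len = self.bits.len() as u64;
        (0..self.hashes)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % len) as usize)
            .collect()
    }

    pub fn insert(&mut self, key: &[u8]) {
        for p in self.positions(key) {
            self.bits[p] = true;
        }
    }

    /// false: the key is definitely absent, so the disk read can be skipped.
    /// true: the key may be present (false positives are possible).
    pub fn may_contain(&self, key: &[u8]) -> bool {
        self.positions(key).into_iter().all(|p| self.bits[p])
    }
}
```

A lookup path would consult may_contain first and touch the segment files only when it returns true.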

🏗 Architecture

1. The Storage Layer

ChronosDB uses an immutable, append-only log structure to ensure crash safety and historical retention.

  • Segments: Data is written to 64MB memory-mapped file segments.
  • Records: Each entry contains a 128-dimensional vector, a binary payload, and temporal metadata.
  • Safety: The system detects concurrency levels and adjusts fsync behavior automatically via system profiling.
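
To make the append-only, bi-temporal model concrete, the following is a small in-memory sketch. The Record fields, the AppendLog type, and the use of plain u64 timestamps are assumptions for illustration; the real on-disk layout (mmap segments, rkyv) is not modeled here.

```rust
/// Hypothetical record layout; field names are assumptions, not ChronosDB's types.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub id: u64,
    pub vector: [f32; 128],
    pub payload: Vec<u8>,
    pub valid_time: u64, // when the fact holds in the modeled world
    pub tx_time: u64,    // when the record was written to the log
}

/// In-memory stand-in for an append-only segment.
#[derive(Default)]
pub struct AppendLog {
    records: Vec<Record>,
}

impl AppendLog {
    /// Appends a new version; existing entries are never modified.
    pub fn append(&mut self, record: Record) {
        self.records.push(record);
    }

    /// All versions of an id in write order (roughly what HISTORY returns).
    pub fn history(&self, id: u64) -> Vec<&Record> {
        self.records.iter().filter(|r| r.id == id).collect()
    }

    /// Latest version of an id written at or before `tx_time` (an AS OF lookup).
    pub fn as_of(&self, id: u64, tx_time: u64) -> Option<&Record> {
        self.records
            .iter()
            .filter(|r| r.id == id && r.tx_time <= tx_time)
            .max_by_key(|r| r.tx_time)
    }
}
```

Because an update is just another append, older versions remain visible to history and to as_of with an earlier timestamp.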

2. The Index Layer

Vector indexing is handled by a persistent HNSW graph that updates in real-time.

  • Nodes: Graph nodes are stored in a separate optimized index file.
  • Search: Uses a priority queue-based search (beam search) to traverse graph layers.
  • Recall: Tuned with M=16 and ef_construction=100 for high recall on random data distributions.
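
The priority-queue search over a single graph layer can be sketched as follows. This is a simplified illustration, not the project's code: it uses scalar squared Euclidean distance (no SIMD), holds the graph in plain vectors, and the names search_layer, Scored, and ef are hypothetical. It assumes ef is at least 1.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};

/// Squared Euclidean distance (gives the same ordering as Euclidean distance).
pub fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// (distance, node id) pair with a total order so it can live in a BinaryHeap.
#[derive(Clone, Copy, PartialEq)]
struct Scored(f32, usize);

impl Eq for Scored {}

impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0).then(self.1.cmp(&other.1))
    }
}

impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Best-first search over one layer, keeping at most `ef` results.
/// `neighbors[i]` lists the node ids adjacent to node `i`.
pub fn search_layer(
    vectors: &[Vec<f32>],
    neighbors: &[Vec<usize>],
    query: &[f32],
    entry: usize,
    ef: usize,
) -> Vec<usize> {
    let start = Scored(dist2(&vectors[entry], query), entry);
    let mut visited = HashSet::from([entry]);
    let mut candidates = BinaryHeap::from([Reverse(start)]); // min-heap: closest first
    let mut results = BinaryHeap::from([start]); // max-heap: worst kept result on top

    while let Some(Reverse(current)) = candidates.pop() {
        let worst = results.peek().map_or(f32::INFINITY, |s| s.0);
        if results.len() >= ef && current.0 > worst {
            break; // no remaining candidate can improve the result set
        }
        for &n in &neighbors[current.1] {
            if !visited.insert(n) {
                continue; // already expanded or queued
            }
            let scored = Scored(dist2(&vectors[n], query), n);
            let worst = results.peek().map_or(f32::INFINITY, |s| s.0);
            if results.len() < ef || scored.0 < worst {
                candidates.push(Reverse(scored));
                results.push(scored);
                if results.len() > ef {
                    results.pop(); // drop the current worst result
                }
            }
        }
    }
    // Ascending by distance: nearest neighbor first.
    results.into_sorted_vec().into_iter().map(|s| s.1).collect()
}
```

A full HNSW query would run this from the top layer down, using the best node found in each layer as the entry point for the next.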

3. The Cluster Layer

  • Replication: Logs are replicated to a quorum of nodes via the Raft protocol.
  • Snapshotting: Supports auto-snapshotting based on log length (default: every 20 logs) to compact the Write Ahead Log (WAL). Snapshots are serialized in an optimized binary format.
  • Network protocol: Raft control plane uses direct Tokio TCP with a binary frame:
    • [4-byte BE len] [1-byte route] [rkyv payload bytes]
    • routes: 1 for vote, 2 for append_entries, 3 for install_snapshot
    • payloads are rkyv-aligned.
  • Client data API (main query transport): Writes include an operation flag byte for RETURNING in the payload (e.g. 0x01 for RETURNING id). Responses are always framed as [1-byte status][4-byte LE payload_len][payload_bytes].
  • Control API: Exposes an HTTP admin API for bootstrapping and membership operations (default port 20002).
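
The two wire formats above can be sketched with plain byte slices. The function names are hypothetical, the payload is treated as opaque bytes (no rkyv here), and one detail is an assumption: the 4-byte length is taken to count only the payload, not the route byte.

```rust
/// Route bytes as listed above.
pub const ROUTE_VOTE: u8 = 1;
pub const ROUTE_APPEND_ENTRIES: u8 = 2;
pub const ROUTE_INSTALL_SNAPSHOT: u8 = 3;

/// Encodes [4-byte BE len][1-byte route][payload].
/// Assumption: `len` counts the payload only, not the route byte.
pub fn encode_raft_frame(route: u8, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(5 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.push(route);
    frame.extend_from_slice(payload);
    frame
}

/// Decodes one Raft frame into (route, payload); None if the buffer is truncated.
pub fn decode_raft_frame(buf: &[u8]) -> Option<(u8, &[u8])> {
    let b = buf.get(..4)?;
    let len = u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as usize;
    let route = *buf.get(4)?;
    let payload = buf.get(5..5 + len)?;
    Some((route, payload))
}

/// Decodes a client response: [1-byte status][4-byte LE payload_len][payload_bytes].
pub fn decode_response(buf: &[u8]) -> Option<(u8, &[u8])> {
    let status = *buf.first()?;
    let b = buf.get(1..5)?;
    let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
    Some((status, buf.get(5..5 + len)?))
}
```

Note the differing byte order: the Raft frame length is big-endian, while the client response length is little-endian.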

🛠️ Installation & Build

Prerequisites:

  • Rust
  • Cargo

Build:

# Clone the repository
git clone https://github.com/SSL-ACTX/chronos-db.git
cd chronos-db

cargo build --release

Run a Single Node:

# Runs on TCP 9000, Raft API 20001
./target/release/chronos --node-id 1

💻 SQL-Like Query Interface

ChronosDB includes a custom SQL-like parser and dedicated CLI client. The full usage guide is in docs/usage.md.

Quick CLI start

cargo run --bin chronos-cli

Available high-level operations

  • INSERT / UPDATE / DELETE (with optional RETURNING id), all routed through Raft write path.
  • SELECT / vector search.
  • GET by ID.
  • HISTORY and AS OF time-travel.

For detailed examples and commands, see docs/usage.md.


🌐 Clustering & Distribution

ChronosDB uses a custom binary protocol for data transmission and a dedicated HTTP control path for cluster bootstrap/membership.

Bootstrapping a Cluster

A helper script test_cluster.py is provided to bootstrap a 3-node cluster locally.

  1. Start Nodes: Nodes must be started with unique IDs and ports.
    • Node 1: Client TCP 9000, Raft TCP 20001, Control HTTP 20002
    • Node 2: Client TCP 9001, Raft TCP 20002, Control HTTP 20003
    • Node 3: Client TCP 9002, Raft TCP 20003, Control HTTP 20004
  2. Initialize Leader: Bootstrapping can be automatic on node startup:

./target/release/chronos --node-id 1 --addr 127.0.0.1:9000 --raft-port 20001 --control-port 20002 --bootstrap

Or with an explicit control API call:

curl -X POST http://127.0.0.1:20002/init

  3. Add Learners & Promote: Add Node 2 and Node 3 to the cluster via the control API:

curl -X POST -H "Content-Type: application/json" -d '{"id":2,"addr":"127.0.0.1:20002","auto_promote":true}' http://127.0.0.1:20002/add-learner
curl -X POST -H "Content-Type: application/json" -d '{"id":3,"addr":"127.0.0.1:20003","auto_promote":true}' http://127.0.0.1:20002/add-learner
curl -X POST -H "Content-Type: application/json" -d '[1,2,3]' http://127.0.0.1:20002/change-membership

Snapshotting

The cluster automatically creates snapshots when the log grows too large. This allows new nodes to catch up by downloading a compressed state rather than replaying the entire history.

  • Trigger: Default policy creates a snapshot every 20 logs.
  • Recovery: Nodes automatically restore HNSW graphs and Bloom filters from snapshots upon restart.
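
The log-length trigger can be pictured with a small counter. This is only a sketch of the policy described above: SnapshotTracker and on_apply are hypothetical names, and the real trigger lives inside the Raft layer.

```rust
/// Default from the text above: snapshot every 20 log entries.
pub const DEFAULT_SNAPSHOT_THRESHOLD: u64 = 20;

/// Hypothetical tracker for the log index covered by the last snapshot.
#[derive(Default)]
pub struct SnapshotTracker {
    last_snapshot: u64,
}

impl SnapshotTracker {
    /// Call after applying the log entry at `index`.
    /// Returns true when a snapshot is due at that index.
    pub fn on_apply(&mut self, index: u64, threshold: u64) -> bool {
        if index.saturating_sub(self.last_snapshot) >= threshold {
            self.last_snapshot = index; // entries up to here can be compacted
            true
        } else {
            false
        }
    }
}
```

With the default threshold, applying entries 1 through 45 would make snapshots due at indexes 20 and 40.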

🧪 Testing

The project includes Python integration tests for verifying cluster consistency and snapshot logic.

  • Cluster Test: python3 test_cluster.py
    • Verifies Write -> Kill Leader -> Election -> Write -> Read Consistency.
  • Snapshot Test: python3 test_snapshot.py
    • Inserts data past the log limit, triggers a snapshot, adds a new node, and verifies the new node hydrated correctly from the binary snapshot.

Buggy

  • CLI Integration: python3 test_cli.py
    • Verifies the full SQL grammar, vector padding, and CRUD lifecycle.

⚙️ Configuration (Internal)

ChronosDB automatically detects environment resources via a System Profile:

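| Hardware  | Mode        | Behavior                                                 |
|-----------|-------------|----------------------------------------------------------|
| 1 Core    | Potato Mode | No fsync (Async durability), High Raft timeout (1s).     |
| < 4 Cores | Standard    | Strict durability, 500ms heartbeat.                      |
| Server    | Server Mode | Strict durability, 250ms heartbeat, max worker threads.  |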
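
The profile selection above might look roughly like the sketch below. The type and function names are hypothetical, "Server" is assumed to mean four or more cores, and the single raft_interval field stands in for both the timeout and heartbeat values from the table.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Potato,
    Standard,
    Server,
}

/// Hypothetical profile; fields mirror the columns of the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SystemProfile {
    pub mode: Mode,
    pub fsync: bool,             // true = strict durability
    pub raft_interval: Duration, // timeout/heartbeat value from the table
}

/// Maps a core count to a profile. Assumption: "Server" means 4+ cores.
pub fn profile_for(cores: usize) -> SystemProfile {
    match cores {
        0 | 1 => SystemProfile {
            mode: Mode::Potato,
            fsync: false,
            raft_interval: Duration::from_secs(1),
        },
        2 | 3 => SystemProfile {
            mode: Mode::Standard,
            fsync: true,
            raft_interval: Duration::from_millis(500),
        },
        _ => SystemProfile {
            mode: Mode::Server,
            fsync: true,
            raft_interval: Duration::from_millis(250),
        },
    }
}

/// Detects the core count of the current machine and picks a profile.
pub fn detect_profile() -> SystemProfile {
    let cores = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1); // fall back to the most conservative profile
    profile_for(cores)
}
```

Keeping profile_for separate from detection makes the mapping easy to test without depending on the host hardware.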

Author: Seuriin (SSL-ACTX)
