"Every domain has its jargon. Understand the terminology, understand the system."
The fundamental unit of data in ThemisDB. Combines relational, document, graph, vector, and time-series aspects in a single unified structure.
Example:
{
"_id": "users/alice",
"_key": "alice",
"_rev": 5,
"name": "Alice",
"email": "[email protected]",
"created_at": "2025-01-01T10:00:00Z",
"embedding": [0.1, -0.2, 0.3, ...],
"graph_edges": ["users/bob", "users/charlie"]
}
A group of related Base Entities (similar to a SQL table, but more flexible).
Types:
- Relational: users, orders, products
- Document: articles, posts, documents
- Graph: relationships, friendships
- Vector: embeddings, semantic data
An atomic unit of work: either all operations succeed (COMMIT) or all are rolled back. Provides ACID guarantees.
BEGIN
INSERT {...} INTO users
UPDATE {...} IN orders
COMMIT
- Atomicity: All or nothing
- Consistency: Data integrity maintained
- Isolation: No dirty reads/writes
- Durability: Persisted to disk
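The all-or-nothing behaviour above can be sketched in a few lines of Python. This is a toy `MiniTransaction` class invented for illustration, not ThemisDB's transaction engine: writes are staged in a buffer and become visible only at commit.

```python
# Toy sketch of atomicity (not ThemisDB's implementation):
# operations are buffered and applied only on commit; rollback discards them all.
class MiniTransaction:
    def __init__(self, store):
        self.store = store    # the shared dict acting as the database
        self.staged = {}      # buffered writes, invisible until commit

    def insert(self, key, value):
        self.staged[key] = value

    def commit(self):
        self.store.update(self.staged)   # apply all staged writes at once
        self.staged = {}

    def rollback(self):
        self.staged = {}                 # discard everything: "all or nothing"

db = {}
tx = MiniTransaction(db)
tx.insert("users/alice", {"name": "Alice"})
tx.insert("orders/1", {"total": 42})
tx.rollback()                # nothing was applied
assert db == {}
tx.insert("users/alice", {"name": "Alice"})
tx.commit()
assert "users/alice" in db
```

Real engines add WAL logging, locking or MVCC, and crash recovery on top of this basic staging idea.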
Unified query language for all data models in ThemisDB. Combines SQL (relational), graph traversal, document processing, and vector search.
Core Constructs:
- FOR ... IN: Iteration
- FILTER: Conditional selection
- LET: Variable binding
- COLLECT: Grouping/aggregation
- SORT: Ordering
- LIMIT: Result limiting
- RETURN: Output specification
Named placeholders bound to values at execution time (prevents injection attacks).
FOR user IN users
FILTER user.email == @email
RETURN user
Run with: db.query(query, bind_vars={'email': '[email protected]'})
Execution strategy optimized by the query optimizer.
Components:
- Collection Scan: Read all documents (slow)
- Index Seek: Use index to find matching rows (fast)
- Filter: Apply WHERE conditions
- Sort: Order results
- Aggregation: GROUP BY operations
Data structure enabling fast lookups. Trades space for speed.
Types:
| Type | Ordered | Point Lookup | Range Query | Size |
|---|---|---|---|---|
| BTree | ✓ | Fast | Fast | Large |
| Hash | ✗ | Very Fast | Slow | Medium |
| Skiplist | ✓ | Fast | Fast | Medium |
| Inverted | ✗ | Fast | N/A | Large |
Estimated number of rows matching query conditions.
Low cardinality (< 1% of docs):
→ Use index for fast filtering
High cardinality (> 50% of docs):
→ Collection scan might be faster (no index overhead)
How well a filter narrows the result set.
Selectivity = Matching rows / Total rows
High selectivity (< 5%):
→ Use index
Low selectivity (> 50%):
→ Collection scan faster
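The selectivity rule of thumb above can be expressed as a small heuristic. The function names and thresholds here are illustrative defaults, not ThemisDB's actual cost model:

```python
def selectivity(matching_rows, total_rows):
    # Selectivity = matching rows / total rows (as defined above)
    return matching_rows / total_rows

def choose_access_path(matching_rows, total_rows,
                       index_threshold=0.05, scan_threshold=0.50):
    # Illustrative heuristic: index when few rows match, scan when most do.
    s = selectivity(matching_rows, total_rows)
    if s < index_threshold:
        return "index seek"
    if s > scan_threshold:
        return "collection scan"
    return "optimizer decides"   # middle ground: depends on the cost model

assert choose_access_path(50, 10_000) == "index seek"          # 0.5% match
assert choose_access_path(6_000, 10_000) == "collection scan"  # 60% match
```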
The single database node that accepts writes. It logs changes and streams them to replicas.
Read-only copy of the primary. Applies changes asynchronously from the WAL.
Delay between a write on the primary and its visibility on a replica.
Write at Primary: T0
Arrive at Follower: T0 + lag_ms
Visible to read: T0 + lag_ms + apply_time
Acceptable lag:
- Strongly consistent: 0 ms (no lag)
- Eventually consistent: < 5 seconds
- Archive: minutes to hours
Durability mechanism: all writes are logged before they are applied.
Client write:
1. Write to WAL on disk
2. Apply to in-memory database
3. Ack to client
On crash:
1. Replay WAL on startup
2. Resume from last committed state
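The write path and crash recovery described above can be sketched as follows. This is a minimal stand-in using a JSON line log, assuming a simple key-value record format; it is not ThemisDB's WAL format:

```python
import json, os, tempfile

# Write path: append to the log first (durably), then apply in memory.
def wal_write(wal_path, db, key, value):
    with open(wal_path, "a") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")
        f.flush()
        os.fsync(f.fileno())   # on disk before we touch memory
    db[key] = value

# Crash recovery: rebuild in-memory state by replaying the log.
def wal_replay(wal_path):
    db = {}
    with open(wal_path) as f:
        for line in f:
            rec = json.loads(line)
            db[rec["key"]] = rec["value"]
    return db

wal = os.path.join(tempfile.mkdtemp(), "wal.log")
db = {}
wal_write(wal, db, "users/alice", {"name": "Alice"})
wal_write(wal, db, "users/bob", {"name": "Bob"})
assert wal_replay(wal) == db   # state is fully recoverable from the log
```

Production WALs add checksums, sequence numbers, and truncation at the last committed record; the ordering invariant (log before apply) is the essential part.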
A write is acknowledged only after it has reached a majority of nodes.
3-node cluster:
Quorum = floor(3/2) + 1 = 2 nodes
Write must be on Primary + 1 Follower before ack
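The majority calculation generalises to any cluster size:

```python
def quorum(n):
    # Strict majority of n nodes: floor(n/2) + 1
    return n // 2 + 1

assert quorum(3) == 2   # primary + 1 follower
assert quorum(5) == 3
assert quorum(4) == 3   # even-sized clusters still need a strict majority
```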
Dense vector representation of unstructured data.
"Alice is an engineer"
→ [0.1, -0.2, 0.3, 0.15, -0.08, ...]  # 384 dimensions
Measure of how close two vectors are.
| Metric | Range | Best For |
|---|---|---|
| Cosine | [-1, 1] | Normalized embeddings (angles) |
| Euclidean | [0, ∞) | Raw embeddings (distances) |
| Dot Product | (-∞, ∞) | Unnormalized embeddings |
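The three metrics in the table are straightforward to compute. A plain-Python sketch (a vector database would use SIMD-optimised kernels instead):

```python
import math

def dot(a, b):
    # Dot product: unnormalized similarity, range (-inf, inf)
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine: angle between vectors, range [-1, 1]
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Euclidean: straight-line distance, range [0, inf)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 0.0], [0.0, 1.0]
assert cosine_similarity(a, b) == 0.0            # orthogonal vectors
assert abs(euclidean_distance(a, b) - math.sqrt(2)) < 1e-12
assert cosine_similarity([1.0, 2.0], [2.0, 4.0]) > 0.999  # same direction
```

Note that for vectors normalized to unit length, cosine similarity and dot product give identical rankings.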
Find K closest vectors to a query vector.
FOR v IN vectors
FILTER DISTANCE(v.embedding, @query) < @threshold
SORT DISTANCE(v.embedding, @query)
LIMIT @top_k
RETURN v
Approximate nearest neighbor index algorithm.
Parameters:
- M: Connections per node (16-64, default 16)
- ef_construct: Quality during index building
- ef_search: Quality during searches
Entity in a graph. Represents an object or concept.
Connection between two nodes. Directed or undirected.
Following edges to discover related nodes.
Alice ← (follows) ← Bob ← (follows) ← Charlie
(Alice and Charlie are linked through their mutual connection, Bob)
Sequence of nodes and edges.
Shortest path from Alice to Charlie:
Alice --(follows)-- Bob --(follows)-- Charlie
Length: 2 hops
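Shortest paths over an unweighted graph are found with breadth-first search. A minimal sketch using an adjacency dict (an invented representation for illustration):

```python
from collections import deque

def shortest_path(graph, start, goal):
    # BFS: explores hop-by-hop, so the first path reaching goal is shortest.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None   # goal unreachable

follows = {"Alice": ["Bob"], "Bob": ["Charlie"], "Charlie": []}
path = shortest_path(follows, "Alice", "Charlie")
assert path == ["Alice", "Bob", "Charlie"]
assert len(path) - 1 == 2   # 2 hops
```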
Measure of node importance.
Types:
- Degree: Number of connections
- Betweenness: How often on shortest paths
- Closeness: Average distance to others
- PageRank: Voting-based importance
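Degree centrality, the simplest of the four, can be computed directly. This sketch normalises by the maximum possible degree (n − 1), one common convention among several:

```python
def degree_centrality(graph):
    # Undirected graph as adjacency dict; centrality = degree / (n - 1).
    n = len(graph)
    return {node: len(neighbors) / (n - 1)
            for node, neighbors in graph.items()}

g = {
    "Alice":   ["Bob", "Charlie", "Dave"],
    "Bob":     ["Alice"],
    "Charlie": ["Alice"],
    "Dave":    ["Alice"],
}
c = degree_centrality(g)
assert c["Alice"] == 1.0   # connected to every other node (a "hub")
assert c["Bob"] == 1 / 3
```

Betweenness, closeness, and PageRank require full path or iterative computations and are typically left to a graph engine.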
Ordered sequence of measurements over time.
Metric: CPU Usage
Timestamp 1: 45%
Timestamp 2: 52%
Timestamp 3: 48%
...
Grouping data into time periods.
FOR metric IN metrics
COLLECT hour = DATE_FORMAT(metric.timestamp, '%yyyy-%mm-%dd %hh:00')
AGGREGATE avg = AVG(metric.value)
RETURN { hour, avg }
Reducing time-series resolution (e.g., minute → hour).
High resolution: Every second (86,400 points/day)
Low resolution: Every hour (24 points/day)
Compression: 99.97% reduction
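The minute-to-hour reduction above boils down to bucketing timestamps and averaging each bucket. A sketch assuming (unix_seconds, value) pairs:

```python
from collections import defaultdict

def downsample_hourly(points):
    # points: list of (unix_seconds, value); returns per-hour averages.
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 3600].append(value)   # floor to the hour boundary
    return {hour: sum(vals) / len(vals)
            for hour, vals in sorted(buckets.items())}

points = [(0, 45.0), (30, 55.0), (3600, 48.0)]  # two samples in hour 0, one in hour 1
hourly = downsample_hourly(points)
assert hourly == {0: 50.0, 3600: 48.0}
```

Other aggregates (min, max, last, percentiles) drop into the same bucketing skeleton.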
Contractual commitment on availability and performance.
SLA: "99.9% uptime with p99 latency < 200ms"
→ Downtime allowed: 43.2 minutes/month
→ 1 in 1000 queries can exceed 200ms
Measurable metric of actual performance.
SLI: Actual uptime = 99.92%
SLI: Actual p99 latency = 187ms
The amount of downtime/errors that can be tolerated while still meeting the SLA.
SLA: 99.95% uptime
Error budget = 0.05% = 21.6 minutes/month
Used: 15 minutes
Remaining: 6.6 minutes (use for deployments/maintenance)
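The budget arithmetic above follows directly from the SLA percentage (assuming a 30-day month, as in the examples):

```python
def downtime_budget_minutes(sla_percent, days=30):
    # Total minutes in the window times the allowed failure fraction.
    total_minutes = days * 24 * 60          # 43,200 for 30 days
    return total_minutes * (1 - sla_percent / 100)

assert abs(downtime_budget_minutes(99.9) - 43.2) < 1e-9    # 43.2 min/month
assert abs(downtime_budget_minutes(99.95) - 21.6) < 1e-9   # 21.6 min/month
used = 15.0
assert abs(downtime_budget_minutes(99.95) - used - 6.6) < 1e-9  # remaining
```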
Mean Time To Recovery: average time to restore service after a failure.
Incident starts: 10:00
Recovery complete: 10:15
MTTR = 15 minutes
Mean Time To Failure: average uptime before the next failure.
Recovery complete: 10:15 (MTTR = 15 minutes)
Uptime: 30 days until the next failure
MTTF = 30 days
Access control based on user roles.
Role: 'data_analyst'
Permissions:
- READ from analytics_*
- EXECUTE report queries
- NO WRITE/DELETE
Stateless authentication token.
Header: { alg: "HS256", typ: "JWT" }
Payload: { sub: "alice", exp: 1726000000 }
Signature: HMAC-SHA256(base64url(header) + "." + base64url(payload), secret)
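The HS256 sign/verify cycle fits in a few lines using only the standard library. This is an illustrative sketch of the mechanism, not a replacement for a vetted JWT library (it skips expiry checks and the full RFC 7519 validation rules):

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (b64url(json.dumps(header).encode()) + "." +
                     b64url(json.dumps(payload).encode()))
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

def verify_jwt(token: str, secret: bytes) -> bool:
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    # Constant-time comparison to avoid timing attacks
    return hmac.compare_digest(b64url(expected), sig)

token = sign_jwt({"sub": "alice", "exp": 1726000000}, b"secret")
assert verify_jwt(token, b"secret")
assert not verify_jwt(token, b"wrong-secret")
```

Because the signature covers header and payload, any tampering with the claims invalidates the token without the server storing any session state.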
Data encrypted on disk.
File on disk: [encrypted bytes]
In memory: Clear text (needed for processing)
Key location: HSM (Hardware Security Module)
Data encrypted while traveling network.
TLS 1.3 + Perfect Forward Secrecy
→ Even if long-term key compromised, old traffic safe
Authentication that requires multiple independent factors.
1. Password (something you know)
2. TOTP (something you have) - app code
3. WebAuthn (something you are) - fingerprint
All components in single process. Easy to develop, hard to scale.
Each service runs independently. Scales well, but adds operational complexity.
- Monolithic Core: Single database engine
- Distributed: Horizontal sharding for scale-out
- Multi-Model: All data models in one engine (no microservices)
Add more nodes (scale out). Divide data via sharding.
1 node (10TB limit)
↓
2 nodes (5TB each)
↓
4 nodes (2.5TB each)
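Dividing data across nodes requires a stable routing function so every node maps a given key to the same shard. A hash-modulo sketch (real systems typically prefer consistent hashing or range partitioning to avoid full reshuffles when the node count changes):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Stable hash: the same key always routes to the same shard.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

keys = [f"users/user{i}" for i in range(10_000)]
counts = [0, 0, 0, 0]
for k in keys:
    counts[shard_for(k, 4)] += 1
assert sum(counts) == 10_000
assert all(c > 2_000 for c in counts)   # roughly even spread across 4 shards
```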
Upgrade hardware (scale up). More RAM/CPU per node.
32GB RAM
↓
64GB RAM
Quantitative measurements (numbers).
CPU usage: 45%
Memory: 8.2 GB
Queries per second: 1,200
Latency p99: 185ms
Textual records of events.
2025-01-01T10:00:00Z [INFO] Query executed: SELECT * FROM users
2025-01-01T10:00:01Z [WARN] Query latency: 245ms > threshold 200ms
2025-01-01T10:00:02Z [ERROR] Connection timeout from client 1.2.3.4
Request flow through system (distributed tracing).
User Request
├─ API Gateway (5ms)
├─ Authentication (12ms)
├─ Database Query (185ms)
│ ├─ Query Plan (2ms)
│ ├─ Index Seek (80ms)
│ ├─ Filter (50ms)
│ └─ Return (53ms)
└─ Response (3ms)
Total: 205ms
Number of unique values.
High cardinality: User IDs (millions)
Low cardinality: Status (10 values)
API: Application Programming Interface - interface for programmatic access
ACID: Atomicity, Consistency, Isolation, Durability guarantees
AQL: Adaptive Query Language - ThemisDB's query language
Backup: Copy of data for recovery
Batch: Group of operations processed together
Cardinality: Number of distinct values
Cache: Fast memory storage layer
Changefeed: Stream of database changes
Collection: Group of related entities
Consistency: Data integrity maintained across system
Continuous Batching: Dynamic request batching technique for LLM inference that allows new requests to join active batches, improving throughput by up to 176% and reducing latency by up to 57% compared to static batching. See Chapter 20.9A.3.
Cursor: Pointer to result set for iteration
Denormalization: Storing redundant data for performance
Document: Semi-structured data (JSON-like)
DSGVO: Datenschutz-Grundverordnung — the German name for the GDPR
Edge: Connection in graph
Embedding: Vector representation of data
Failover: Automatic switch to backup system
Flash Attention: IO-aware attention mechanism for LLMs that uses SRAM tiling instead of HBM storage, reducing GPU memory usage by 37% and increasing throughput by 69%. Requires NVIDIA Ampere+ GPUs. See Chapter 20.9A.1.
Flush: Write data from memory to disk
Garbage Collection: Freeing unused memory
GBNF (GGML BNF): Grammar notation used by llama.cpp for constrained generation. Extends EBNF with specific syntax for controlling LLM output format. See Grammar-Constrained Generation.
Grammar-Constrained Generation: LLM technique that uses EBNF/GBNF grammar rules to guarantee syntactically valid outputs (JSON, XML, CSV). Achieves 95-99% success rate vs 60-70% without constraints, eliminating need for output validation and retries. See Chapter 17.12.6.
Graph: Network of connected nodes
Heap: Memory area for dynamic allocation
HNSW: Hierarchical Navigable Small World - vector index
Hot Data: Frequently accessed data
Hot Spare: Fully configured standby node in a database cluster that can automatically take over (failover) when an active shard fails, typically within <5 seconds. Provides high availability with zero data loss when combined with WAL replication. See Chapter 16.10.1.
Index: Data structure for fast lookups
Ingestion: Loading data into database
Inverted Index: Index mapping values to documents
Isolation: Transactions don't interfere
JIT: Just-In-Time compilation
JSON: JavaScript Object Notation
JWT: JSON Web Token - stateless auth
Keyspace: Logical division of data
Latency: Time delay for operation
LoRA (Low-Rank Adaptation): Efficient fine-tuning technique for large language models that adds trainable low-rank matrices to pretrained models, reducing memory requirements by 99% and training time by 3-10x compared to full fine-tuning. Multiple LoRA adapters can run on a single base model. See Chapter 17.12.5.
LSM: Log-Structured Merge tree
MVCC: Multi-Version Concurrency Control
Normalization: Organizing data to reduce redundancy
Node: Entity in graph or cluster
OLAP: Online Analytical Processing
OLTP: Online Transaction Processing
Optimization: Making something run faster
Paged Attention: Memory management technique for LLM attention mechanisms that organizes KV-cache into fixed-size pages instead of continuous memory allocation, reducing GPU memory waste by 80% and increasing concurrent request capacity by 5x. See Chapter 17.12.4.
Pagination: Dividing results into pages
Partition: Division of data across nodes
Percentile: Value below which percentage falls
Persistence: Data survives shutdown
Piper: Fast, local neural Text-to-Speech (TTS) engine used in ThemisDB Voice Assistant. Provides natural-sounding voice synthesis in multiple languages with <50ms latency and minimal CPU usage. See Chapter 10.7.
Prefix Caching: LLM optimization that caches the attention states of frequently used prompt prefixes (such as system prompts), enabling 75% cost savings and 95% latency reduction for repeated queries with common prompt beginnings. See Chapter 17.12.1.
Projection: Selecting subset of columns
Query Plan: Execution strategy for query
Quorum: Majority consensus
Range: Ordered selection of values
Replica: Copy of data
Replication: Process of copying data to replicas
Response Caching: Intelligent caching system for LLM responses that uses embedding-based semantic similarity to identify and reuse answers to similar questions, providing 60-80% cost savings for repetitive queries. Supports configurable TTL and similarity thresholds. See Chapter 17.12.2.
RocksDB: Embedded key-value store (ThemisDB storage engine)
Rollback: Undo transaction
RoPE (Rotary Position Embedding): Position encoding technique for transformer models that enables context window extension beyond training length through scaling. ThemisDB supports Linear, NTK-aware, and YaRN scaling methods to extend context from 4K to 32K+ tokens. See Chapter 17.12.7.
RoPE Scaling: Technique to extend LLM context windows beyond their original training length by adjusting the rotary position embedding frequency. YaRN method achieves 8x context extension (4K→32K tokens) with <10% quality loss. See Chapter 17.12.7.
RPO: Recovery Point Objective (max data loss)
RTO: Recovery Time Objective (max downtime)
Sampling: Statistical subset of data
Schema: Data structure definition
Selectivity: Fraction of rows matching filter
Serialization: Converting to byte format
Sharding: Horizontal data partitioning
Snapshot: Point-in-time data view
Speculative Decoding: LLM acceleration technique that uses a small, fast "draft model" to speculatively generate multiple tokens in parallel, which are then validated by the larger "target model". Achieves 2-3x speedup with 82-88% token acceptance rates. See Chapter 20.9A.2.
SQL: Structured Query Language
SSTable: Sorted String Table
StreamingIngestManager: High-throughput key-value ingest component using an in-memory ring buffer drained by a background flush thread into a single RocksDB WriteBatch. Supports ≥ 1 M events/s at ≤ 50 ms end-to-end latency. OverflowPolicy BLOCK or DROP. See Chapter 11.8.
TsStreamCursor: Lazy, paginated streaming cursor over TSStore query results. Fetches results in configurable pages (default 4 096 DataPoints) to avoid materialising large result sets in memory. See Chapter 9.11.2.
TSStore::putBatch: Zero-copy batch write API for TSStore. Accepts std::span<const TSRow> and commits all rows in a single RocksDB WriteBatch for maximum write throughput. See Chapter 9.11.1.
TemporalCompressor: Component responsible for compressing and decompressing temporal (time-versioned) JSON data in ThemisDB. Supports ZSTD and LZ4 algorithms. LZ4 is optimised for high-throughput, low-latency hot paths. See Chapter 9.11.3.
TTL: Time-To-Live (auto-delete after interval)
Transaction: Atomic unit of work
Throughput: Operations per unit time
LockFreeHistogram: Header-only, lock-free latency histogram using per-bucket atomic counters. record() costs ≤ 20 ns. Supports Exponential and Linear bucket modes. See Chapter 21.1.1.
RequestCoalescer: Cache singleflight implementation that coalesces concurrent requests with the same key so that the backend function fn() is executed exactly once per in-flight key, eliminating thundering-herd cache-miss storms. See Chapter 21.1.2.
IoUringBatchedSender: Linux io_uring-backed batched network sender that submits multiple WireProtocolBatcher flush operations as a single io_uring_enter() syscall, reducing syscall count from O(N) to O(1) per round. Falls back to writev(2) when io_uring is unavailable. See Chapter 21.1.3.
ColumnarCache: LRU in-memory cache for columnar ColumnSegment objects. Cached data is in the same layout as ColumnBatch, enabling zero-copy analytics access. Supports PinGuard RAII for eviction protection. See Chapter 15.13.1.
IStreamingJoin: Interface for streaming join operators over ColumnBatch streams. Concrete implementations: HashJoin (equi-join, Inner/LeftOuter) and IntervalJoin (time-based event correlation). See Chapter 15.13.2.
AiHardwareDispatcher: Universal AI-hardware dispatch layer that selects the best available backend at runtime following the priority chain: NPU → ONNX Runtime → GPU → CPU. Supports INT4/W4A8/W8A8 precision modes. See Chapter 16.11.
ArgumentStore: Ethics AI Plugin component that persists EthicalArgument objects as ThemisDB BaseEntity entries in RocksDB (or an in-memory fallback for tests). See Chapter 24.9.
EthicalDiscourseEngine: Ethics AI Plugin orchestrator that coordinates multi-philosophy debates via initializeDebate() and synthesises an EthicalDecision via makeDecision(). See Chapter 24.9.
EthicsEvaluator: Ethics AI Plugin component that scores an EthicalDecision across five dimensions: Decision Quality, Consistency, Fairness, Alignment, and Transparency. See Chapter 24.9.
IAudioBackend: Plugin interface implemented by WhisperPlugin. Provides transcribe(), transcribeFile(), and detectLanguage() methods. See Chapter 10.7.x.
IImageGenerationBackend: Plugin interface implemented by SDPlugin. Provides generate(), generateBatch(), and generateImg2Img() methods. See Chapter 12.10.
PhilosophyLoader: Ethics AI Plugin component that loads and caches philosophy school profiles from YAML files. See Chapter 24.9.
RAGContextEngine: Ethics AI Plugin component providing 7 optimised AQL query patterns for context retrieval from the ethics knowledge base. See Chapter 24.9.
SDPlugin: Stable Diffusion image-generation plugin implementing IImageGenerationBackend. Provides text-to-image, batch, and img2img generation with prompt sanitisation and provenance stamps. See Chapter 12.10.
SDPromptSanitizer: Stable Diffusion content-policy component that blocks prompts containing forbidden keywords (case-insensitive, file-loadable blocklist). Covers negative prompts (security gap SD-NP-01). See Chapter 12.10.
WhisperPlugin: Speech-to-text plugin implementing IAudioBackend. Wraps IWhisperTranscriber strategy and IAudioChunkReader for file I/O. Thread-safe; adds provenance stamps to every result. See Chapter 10.7.x.
WavAudioChunkReader: WAV file reader without external library dependency. Supports 16-bit PCM and IEEE float32 RIFF/WAV. Used by WhisperPlugin as default audio input. See Chapter 10.7.x.
LIRS (Low Inter-Reference Recency Set): Advanced cache eviction algorithm distinguishing LIR (low inter-reference recency, "hot") from HIR (high inter-reference recency, "warm/cold") entries. ThemisDB's LIRS implementation uses std::shared_mutex for thread-safe access.
RCU (Read-Copy-Update): Lock-free synchronisation technique for shared data: readers proceed without locks while writers create a new version. g_rcu_reader_count tracks active readers; writers wait until the count reaches zero.
UUID v7: UUID version 7 as defined by RFC 9562, embedding a 48-bit millisecond Unix timestamp for time-sortable identifiers. ThemisDB generates UUID v7 via generate_uuid_v7() using a thread-local monotonic sequence counter and MT19937-64 randomness.
Vector: Ordered list of numbers
View: Virtual table derived from query
Voice Assistant: Enterprise feature providing natural language voice interaction using Whisper (STT), Piper (TTS), and llama.cpp (LLM). Enables call center automation, meeting protocol generation, and voice-controlled database queries with DSGVO-compliant storage. See Chapter 10.7.
WAL (Write-Ahead Log): Transaction log that records all database changes before they are applied, ensuring durability and enabling replication. In ThemisDB v1.5.0-dev, WAL replication provides zero-data-loss failover with support for synchronous, asynchronous, and hybrid replication modes. See Chapter 16.10.2.
WAL Replication: Replication mechanism based on Write-Ahead Log streaming that continuously transfers transaction log entries from primary to replica nodes. Supports sync (zero data loss, higher latency), async (minimal latency, potential data loss), and hybrid modes. See Chapter 16.10.2.
Warm Data: Occasionally accessed data
Whisper: OpenAI's high-accuracy Speech-to-Text (STT) model integrated into ThemisDB Voice Assistant via whisper.cpp. Supports 100+ languages with auto-detection, speaker diarization, and 5 model sizes (tiny to large) trading accuracy for speed. See Chapter 10.7.
Workload: Pattern of database usage
BpmnSerializer: Process module component that imports and exports BPMN 2.0 XML using a state-machine tokenizer (no external XML library). Handles Camunda/Flowable/Signavio/VCC-VPB files. 10 MiB input guard. See Chapter 29.14.
EPK (Ereignisgesteuerte Prozesskette): Event-driven Process Chain — a German process notation standard. ThemisDB supports EPK via EpkSerializer for both text and JSON formats. See Chapter 29.14.
EpkSerializer: Process module component for EPK text/JSON import and export. importText() accepts line-based EPK notation; exportJson() produces a machine-readable JSON graph. See Chapter 29.14.
LlmProcessDescriptor: Process module component that generates structured JSON descriptors and system prompts from process models, optimised for GPT-4, Claude, and local LLMs. See Chapter 29.14.
ProcessAttachment: Descriptor of a data object attached to a process instance, stored under proc:attach:<instance_id>:<object_id> in RocksDB. See Chapter 29.14.
ProcessDomain: Classification for process models: ADMINISTRATION, BUSINESS, IT_SERVICE, HEALTHCARE, FINANCE, CUSTOMER_SERVICE, CUSTOM. See Chapter 29.14.
ProcessGraphRag: Graph-RAG engine that bridges the process execution graph with LLMs. Produces ProcessRagContext with subgraph, attachments, missing documents, similar cases, and a ready-to-send LLM prompt. See Chapter 29.14.
ProcessLinkType: Typed relationship between a process instance and a data object or another instance: HAS_DOCUMENT, HAS_METADATA, REQUIRES_DOCUMENT, IS_INSTANCE_OF, SUB_PROCESS, CROSS_REFERENCE, TRIGGERS, EVIDENCE_FOR. See Chapter 29.14.
ProcessLinker: Process module component managing attachments (proc:attach:), typed process-to-process links (proc:link:), and required-document registrations (proc:req_doc:) in RocksDB. See Chapter 29.14.
ProcessModelManager: Process module CRUD manager storing versioned process models (proc:def:<id>) in RocksDB. Supports BPMN 2.0, EPK, and VCC-VPB import/export as well as deployment to ProcessGraphManager. See Chapter 29.14.
ProcessModelRecord: Metadata record stored alongside each process model: id, name, notation, domain, state, normalised graph, compliance tags, version, and embedding. See Chapter 29.14.
ProcessNotation: Format enum for process models: BPMN_2_0, EPK, VCC_VPB, CMMN_1_1, DMN_1_5. See Chapter 29.14.
ProcessRagContext: Full Graph-RAG result produced by ProcessGraphRag::retrieve(). Contains the LLM prompt, subgraph, attachments, similar cases, compliance check, and missing documents list. See Chapter 29.14.
VccVpbImporter: Process module component that imports VCC-VPB YAML process definitions into the ThemisDB internal graph format. See Chapter 29.14.
Verwaltungsvorgang: German administrative case/procedure. ThemisDB's Process Module and ProcessGraphRag are specifically optimised for German Verwaltungsprozesse (e.g., Bauantrag, Führerscheinantrag). See Chapter 29.14.
IntegrationTestSuite: LLM module testing class with 14 scenarios covering component integration (LazyLoader + GPU Memory, Scheduler + Paged Attention, Kernel Fusion + Inference, full E2E pipeline), multi-model serving/switching/LoRA management, failure scenarios (OOM, load failure, cancellation, preemption), and performance (high concurrency, burst traffic, long requests). See Chapter 17.24.
LlamaWrapper: Central llama.cpp adapter in ThemisDB's LLM module. Implements ILLMPlugin, wrapping llama.cpp inference with full production features: Multi-LoRA, KV-Cache / Prefix Cache, RoPE Scaling, grammar-constrained generation, streaming, and multi-modal vision support. See Chapter 17.24.
MultiLoRAManager: vLLM-inspired LoRA adapter manager supporting up to N simultaneous adapters, dynamic load/unload without model reload, INT8/INT4 quantization (quantizeLoRA()), and multi-GPU placement (ROUND_ROBIN, DATA_PARALLEL, MODEL_PARALLEL). See Chapter 17.24.
ProductionValidator: End-to-end validation framework for the LLM module. Covers 72-hour stress tests, load tests (100 concurrent, 50 RPS), quality validation (≥80% pass rate), and performance regression detection (≤1% tolerance). See Chapter 17.24.
RoPE Scaling: Rotary Position Embedding scaling for extending the context window beyond a model's training length. ThemisDB supports LINEAR, NTK, YARN, and DYNAMIC methods. YARN provides the best quality for 8×+ extension (4K→32K tokens). See Chapter 17.24.
VisionEncoder: CLIP-based image encoder (include/llm/vision_encoder.h) used by LlamaWrapper::generateVision(). Loads CLIP GGUF models, encodes image files to float embedding vectors, and supports GPU acceleration. Configured via enable_vision + clip_model_path in LlamaWrapper::Config. See Chapter 17.24.
VisionRequest / VisionResponse: Structs for multi-modal LLM inference. VisionRequest contains text_prompt, image_path/image_paths, and generation parameters. VisionResponse contains text, tokens_generated, inference_time_ms, and image_encoding_time_ms. See Chapter 17.24.
BatchEvaluator: Parallel batch RAG evaluation using configurable worker threads and async futures/promises. Aggregates individual EvaluationResults into statistics (pass_rate, avg_faithfulness, avg_overall). See Chapter 17.3.5.
CalibrationManager: Aligns RAGJudge scores with human annotations via temperature scaling, Platt scaling, and isotonic regression. Reports ECE (Expected Calibration Error), Brier score, and Pearson/Spearman correlation. See Chapter 17.3.5.
DocumentSplitter: Configurable text chunking for RAG ingestion pipelines. Strategies: FIXED (token count), SENTENCE (boundary-aware), SEMANTIC (embedding-similarity), RECURSIVE (hierarchical). Configurable chunk_size and chunk_overlap. See Chapter 17.3.5.
EvaluationCache: Thread-safe LRU cache for EvaluationResult objects with TTL expiry and invalidation triggers. Tracks hit/miss/eviction statistics. Prevents redundant LLM judge calls for identical inputs. See Chapter 17.3.5.
EvaluationMode (RAG): Evaluation speed/depth trade-off for RAGJudge. FAST (~100 ms, single-dimension), BALANCED (~500 ms, multi-dimension, default), THOROUGH (~2 s, CoT + NLI verification). See Chapter 17.3.5.
HallucinationDashboard: Rolling-window hallucination rate tracker for RAGJudge evaluations. Reports current rate (0.0–1.0) and trend (IMPROVING/STABLE/DEGRADING) over a configurable window. See Chapter 17.3.5.
HybridRetriever: Fuses BM25 (sparse/keyword) and vector (dense/semantic) candidate lists using Reciprocal Rank Fusion (RRF, k=60) or linear combination. Configurable per-source weights (default 0.5/0.5). See Chapter 17.3.5.
RAGJudge: Central RAG evaluation orchestrator in themis::rag::judge. Evaluates generated answers across 5 dimensions: Faithfulness, Relevance, Completeness, Coherence, Ethical Compliance. Supports pairwise comparison, batch evaluation, and pluggable NLI/G-Eval scorers. See Chapter 17.3.5.
RRF (Reciprocal Rank Fusion): Rank-based fusion formula combining multiple ranked lists: score(d) = Σ 1/(k + rank(d)). The constant k=60 (default) controls rank-sensitivity. Used by HybridRetriever to combine BM25 and vector results. See Chapter 17.3.5.
Understanding these terms is essential for:
- Development: Writing efficient queries
- Operations: Configuring and monitoring
- Architecture: Designing systems
- Communication: Discussing with team
Keep this glossary handy when learning or explaining ThemisDB concepts.