title	Chapter 12 — Limitations, Open Problems, Future Work
chapter	12
lang	en
license	CC-BY-NC-4.0
last_modified_at	2026-04-22T03:40:36Z

Chapter 12 — Limitations, Open Problems, Future Work

The honest chapter. What we haven't solved, what we might reverse, what's coming.

12.1 Engineering Limitations

12.1.1 pgvector Scale Ceiling

At roughly 100k embeddings per tenant, HNSW P95 latency stays under 120 ms. Past about 5M vectors, performance degrades. If any tenant reaches 5M, we will evaluate migrating to Qdrant or Milvus, or sharding.
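A capacity check against those two figures can be automated. The sketch below is hypothetical. The thresholds come from the numbers above. The 80% warning level, `classify_tenant`, and `capacity_report` are illustrative choices, not part of our codebase. Per-tenant counts would come from a `GROUP BY tenant_id` query over the embeddings table, whose schema is assumed here.

```python
from dataclasses import dataclass

# Figures from the text above; tune for your own hardware.
COMFORTABLE_VECTORS = 100_000   # HNSW P95 < 120 ms measured at roughly this volume
CEILING_VECTORS = 5_000_000     # past this, pgvector performance degrades
WARN_FRACTION = 0.8             # hypothetical: start planning at 80% of the ceiling


@dataclass
class TenantStatus:
    tenant_id: str
    vector_count: int
    action: str


def classify_tenant(tenant_id: str, vector_count: int) -> TenantStatus:
    """Map a tenant's embedding count to a capacity-planning action."""
    if vector_count >= CEILING_VECTORS:
        action = "evaluate Qdrant/Milvus migration or sharding"
    elif vector_count >= WARN_FRACTION * CEILING_VECTORS:
        action = "plan migration"
    elif vector_count > COMFORTABLE_VECTORS:
        action = "monitor P95 latency"
    else:
        action = "ok"
    return TenantStatus(tenant_id, vector_count, action)


def capacity_report(counts: dict[str, int]) -> list[TenantStatus]:
    """Return every tenant that needs attention, largest first."""
    statuses = [classify_tenant(t, n) for t, n in counts.items()]
    return sorted(
        (s for s in statuses if s.action != "ok"),
        key=lambda s: s.vector_count,
        reverse=True,
    )
```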

12.1.2 Chinese Tokenization Long Tail

zh_parser (SCWS-based) misses new words, brand names, and product names. We patch these gaps with synonym dictionaries, which is an ongoing maintenance burden. An experimental alternative is tokenizing with an LLM. It is more accurate, but it adds about 100 ms and costs more.
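Dictionary patching can be pictured as a post-processing pass over the segmenter's output. This is a minimal sketch, not zh_parser's own dictionary mechanism. `patch_tokens`, the greedy longest-match merge, and the example dictionary entries are all hypothetical. The pass re-joins adjacent tokens that form a known term, then maps aliases to a canonical form.

```python
def patch_tokens(tokens: list[str], terms: set[str], synonyms: dict[str, str],
                 max_merge: int = 4) -> list[str]:
    """Repair segmenter output with a hand-maintained dictionary.

    tokens:    segmenter output, e.g. ["雅诗", "兰黛"]
    terms:     known words (brand/product names) that the segmenter splits apart
    synonyms:  alias -> canonical form, applied after merging
    max_merge: widest run of adjacent tokens to try joining
    """
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Greedy longest match: try the widest window first, fall back to one token.
        for width in range(min(max_merge, len(tokens) - i), 0, -1):
            candidate = "".join(tokens[i:i + width])
            if width == 1 or candidate in terms:
                out.append(synonyms.get(candidate, candidate))
                i += width
                break
    return out


if __name__ == "__main__":
    # Hypothetical dictionary entries, for illustration only.
    print(patch_tokens(["EL", "的", "小", "棕瓶", "好用"], {"小棕瓶"}, {"EL": "雅诗兰黛"}))
```

Each new brand or product still needs a dictionary entry, so this is exactly the maintenance burden described above.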

12.1.3 No Multimodal Yet

The system handles text only, but real-world knowledge is mixed:

Product photo + text description
Construction workflow diagram
Cosmetic SDS PDF with tables and chemical structures

CLIP-style multimodal embedding is experimental, targeted for 2026 Q3.

12.1.4 Single Region

We deploy only in AWS Tokyo. EU compliance requires an EU region. The current Docker Compose architecture does not support multi-region deployment, so that needs a Kubernetes (K8s) refactor.

12.2 Algorithmic Limitations

12.2.1 Wiki Compile LLM Bias

An LLM-authored Wiki has systematic biases: Western-centric examples, inconsistent transliteration, and outdated knowledge frozen at the model's training cutoff. Our mitigations are a strict "chunks-only" instruction and a cross-chunk consistency lint. They help partially but do not fix the root cause.
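Part of the chunks-only policy can be enforced after compilation. This is a minimal sketch of one such check, not our production lint. `ungrounded_numbers` and its regex are hypothetical. Any number on the compiled page that appears in none of the source chunks was probably recalled from pre-training, so it is flagged. The check catches only one class of bias and does nothing about example choice or transliteration.

```python
import re

# Matches integers, decimals, thousands-separated numbers, and percentages:
# "36", "5.5", "2,000", "0.5%".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*%?")


def ungrounded_numbers(wiki_text: str, chunks: list[str]) -> list[str]:
    """Return numbers in a compiled Wiki page that occur in none of its source chunks.

    Under a chunks-only policy every figure should be copied from a chunk; one that
    is not was likely recalled from pre-training (and may be stale), so it is
    flagged for review. Order of first appearance is preserved.
    """
    source_numbers = set(NUMBER_RE.findall("\n".join(chunks)))
    flagged: list[str] = []
    for number in NUMBER_RE.findall(wiki_text):
        if number not in source_numbers and number not in flagged:
            flagged.append(number)
    return flagged
```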

12.2.2 RRF k=60 Is Empirical

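The original RRF paper (Cormack, Clarke & Büttcher, SIGIR 2009) uses k = 60. That value was fixed in a pilot experiment, not derived from theory. We have not run enough A/B tests to validate it for Chinese-language retrieval, so it is worth revisiting.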
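RRF scores each document as score(d) = Σ_r 1 / (k + rank_r(d)). Here rank_r(d) is d's 1-based position in retriever r's list, and a list that does not contain d contributes nothing. The constant k is the only free parameter. Below is a minimal sketch of the formula that shows what k controls. `rrf_fuse` and the toy rankings are hypothetical, not our production fusion code.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists with Reciprocal Rank Fusion.

    rankings: one list of document IDs per retriever, best first.
    A document absent from a list simply gets no contribution from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vector_hits = ["p", "a1", "a2", "q"]    # toy dense-retrieval ranking
    keyword_hits = ["b1", "b2", "b3", "q"]  # toy keyword ranking
    for k in (1, 60):
        top3 = [doc for doc, _ in rrf_fuse([vector_hits, keyword_hits], k=k)[:3]]
        print(k, top3)
    # 1 ['b1', 'p', 'q']
    # 60 ['q', 'b1', 'p']
```

With k = 1, documents ranked first by a single retriever dominate. With k = 60, the document that both retrievers placed fourth wins. Choosing k is therefore a choice about how much to reward cross-retriever agreement. That is what an A/B test on Chinese queries would need to measure.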

12.2.3 Intent Classifier Drift

GPT-4o-mini misclassifies vague openings such as "I was wondering...". When a knowledge question is misrouted as smalltalk, the customer gets a polite non-answer. Planned fixes: an expanded training set, a confidence threshold, and running both paths when confidence is low.
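The threshold-plus-dual-path idea can be sketched as a small router. This assumes the classifier returns a label and a confidence in [0, 1], and how that score is obtained is left open. The threshold value, `IntentResult`, `route`, and `pick_answer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; it would be tuned on a labeled set of vague openings.
CONFIDENCE_THRESHOLD = 0.75


@dataclass
class IntentResult:
    label: str         # e.g. "knowledge" or "smalltalk"
    confidence: float  # classifier's confidence in that label, in [0, 1]


def route(result: IntentResult, threshold: float = CONFIDENCE_THRESHOLD) -> list[str]:
    """Choose which answer paths to run for one message."""
    if result.confidence >= threshold:
        # Confident: trust the label and run a single path.
        return [result.label]
    # Low confidence: run both paths so a misrouted knowledge question
    # still reaches retrieval instead of ending in a polite non-answer.
    return ["knowledge", "smalltalk"]


def pick_answer(answers: dict[str, Optional[str]]) -> Optional[str]:
    """After a dual-path run, prefer the knowledge answer when it found anything.

    Assumes the knowledge path returns None when retrieval finds nothing grounded.
    """
    return answers.get("knowledge") or answers.get("smalltalk")
```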

12.2.4 Chinese NLI Availability

English NLI (DeBERTa-v3-NLI) is excellent. Chinese NLI quality varies. We use mDeBERTa-multi plus human audit, at ~85% accuracy. Production-grade Chinese NLI remains an open problem.
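One way to pair a model of this accuracy with human audit is to act automatically only on confident verdicts and queue the rest for reviewers. This sketch assumes the model returns three logits per (premise, hypothesis) pair in the label order shown. That order is an assumption and differs between checkpoints. The thresholds and `nli_verdict` are hypothetical, not our production values.

```python
import math

# Assumed label order; check the model's id2label mapping before use.
LABELS = ("entailment", "neutral", "contradiction")
ACCEPT_THRESHOLD = 0.9  # hypothetical
REJECT_THRESHOLD = 0.9  # hypothetical


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nli_verdict(logits: list[float]) -> str:
    """Map NLI logits for one (premise, hypothesis) pair to an action.

    Only confident verdicts are acted on automatically; everything in between
    is queued for human audit.
    """
    probs = dict(zip(LABELS, softmax(logits)))
    if probs["entailment"] >= ACCEPT_THRESHOLD:
        return "supported"
    if probs["contradiction"] >= REJECT_THRESHOLD:
        return "contradicted"
    return "human_audit"
```

Raising the thresholds sends more pairs to human audit. That trades reviewer time for fewer automatic mistakes.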

12.3 Commercial Limitations

12.3.1 Pricing-Cost Misalignment

We currently price by message count, but the actual cost per message varies by up to 50×:

Simple customer-service question: USD 0.001
PIF regulatory citation: USD 0.02
NLI + rerank, high precision: USD 0.05

High-precision tenants underpay, and low-precision tenants overpay. Planned for 2026 Q3: precision-tier pricing.
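The cross-subsidy is easy to make explicit. This sketch uses the per-message costs listed above. The flat price of USD 0.01 and the 3× cost-plus markup are hypothetical, chosen only for illustration, and `margins` and `tier_price` are illustrative names.

```python
# Per-message cost figures from the list above (USD).
COST_PER_MESSAGE = {
    "simple_cs": 0.001,
    "pif_regulatory_citation": 0.02,
    "nli_rerank_high_precision": 0.05,
}


def margins(flat_price: float) -> dict[str, float]:
    """Margin per message (price minus cost) for each workload under one flat price."""
    return {tier: round(flat_price - cost, 4) for tier, cost in COST_PER_MESSAGE.items()}


def tier_price(tier: str, markup: float = 3.0) -> float:
    """Hypothetical precision-tier price: cost-plus with a uniform markup."""
    return round(COST_PER_MESSAGE[tier] * markup, 4)


if __name__ == "__main__":
    # Hypothetical flat price of USD 0.01 per message, for illustration only.
    for tier, margin in margins(0.01).items():
        print(f"{tier:28s} margin {margin:+.3f}  tiered price {tier_price(tier):.3f}")
```

At a flat USD 0.01, a simple question earns USD 0.009 while a high-precision one loses USD 0.04 per message. A uniform markup on per-tier cost removes the imbalance. The markup level itself is a business decision.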

12.3.2 Cross-Product Cost Attribution

Shared infrastructure is a big win, but it blurs cost attribution. Tokens spent on RAG repairs triggered by GEO, for example, are hard to assign to either product. Currently, GEO API calls count against the RAG tenant's quota, which is financially imprecise.
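A straightforward direction is to tag each usage event with the product that triggered it, at the point where tokens are spent, and to aggregate per (tenant, product). This is a hypothetical sketch (`UsageEvent`, `triggered_by`, `attribute`, `quota_usage`), not our current metering. In practice, the hard part is propagating the trigger tag through shared services, and this sketch takes that as given.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    tenant_id: str
    triggered_by: str  # product whose request caused the work, e.g. "rag" or "geo"
    tokens: int


def attribute(events: list[UsageEvent]) -> dict[tuple[str, str], int]:
    """Total tokens per (tenant, triggering product) rather than per tenant only."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for event in events:
        totals[(event.tenant_id, event.triggered_by)] += event.tokens
    return dict(totals)


def quota_usage(totals: dict[tuple[str, str], int], tenant_id: str, product: str) -> int:
    """Tokens to bill against one product's quota for one tenant."""
    return totals.get((tenant_id, product), 0)
```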

12.3.3 Breaking Changes Cost

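Upgrading the embedding model (e.g., text-embedding-3-small → text-embedding-3-large) requires re-embedding the full corpus. For a large tenant that costs USD 2,000+. The default vector dimension also changes from 1536 to 3072, so the vector column and its index must be rebuilt too. We have deferred such upgrades, so technical debt accumulates.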
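A back-of-the-envelope estimate helps decide when an upgrade is worth it. In this sketch every input is a parameter: corpus size, average chunk length, and per-token price are assumptions to fill in from the provider's current pricing, not figures from this book. The sketch covers only the embedding API bill and raw vector storage. The storage helper uses pgvector's documented size of 4 × dimensions + 8 bytes per vector, so going from 1536 to 3072 dimensions roughly doubles raw vector storage.

```python
def reembed_api_cost_usd(num_chunks: int, avg_tokens_per_chunk: int,
                         usd_per_million_tokens: float) -> float:
    """Embedding-API cost of re-embedding every chunk once with the new model."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_tokens


def pgvector_storage_bytes(num_vectors: int, dims: int) -> int:
    """Raw storage for pgvector `vector` values: 4 bytes per dimension plus 8 bytes each.

    Excludes table row overhead and index size.
    """
    return num_vectors * (4 * dims + 8)
```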

12.4 Open Problems

12.4.1 Wiki Fresh vs Stale Balance

How often should the Wiki be recompiled?

Daily: wasteful, since most pages are unchanged
Monthly: too stale for regulated domains
Event-driven: defining a "change" is itself hard

Current approach: fingerprints + weekly lint + manual trigger. There is no clean theory behind it.
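The fingerprint part of the current approach can be sketched as follows. This assumes a page's fingerprint is a hash over the source chunks it was compiled from. `page_fingerprint` and `pages_to_recompile` are hypothetical names. The check avoids recompiling unchanged pages, but it only notices changes to text we already hold. It cannot tell that the outside world changed, which is why the weekly lint and manual trigger remain.

```python
import hashlib


def page_fingerprint(chunk_texts: list[str]) -> str:
    """Stable fingerprint of the chunks a Wiki page is compiled from.

    Chunks are sorted so that retrieval order does not change the fingerprint.
    Each chunk's byte length is hashed before its text so that chunk boundaries count.
    """
    h = hashlib.sha256()
    for text in sorted(chunk_texts):
        data = text.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def pages_to_recompile(sources: dict[str, list[str]], stored: dict[str, str]) -> list[str]:
    """Return page IDs whose source chunks changed (or that are new) since the last compile."""
    return sorted(
        page_id
        for page_id, chunks in sources.items()
        if stored.get(page_id) != page_fingerprint(chunks)
    )
```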

12.4.2 User Authority vs RAG Authority

A customer says, "Your website says the CEO is Bob." The RAG Wiki says "Alice." Who wins?

Trust RAG: the customer may have been spoofed
Trust the customer: our system may be stale

This is a trust chain problem with no engineering answer yet.

12.4.3 Does Long-Context LLM End RAG?

With 200k-token (Claude) and 2M-token (Gemini) context windows, it is tempting to "stuff everything into the prompt." Our position: RAG does not die; it mutates.

Cost: 200k input tokens per query costs USD 0.5+, which is unsustainable
Lost in the middle: LLMs lose focus in ultra-long contexts (Liu et al.)
Permission control: in a multi-tenant system, not everything can go into the LLM's context

The L1 Wiki's role becomes aligning the LLM's attention precisely, rather than substituting for vector retrieval.
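The cost bullet is simple arithmetic, but the per-query gap is worth making explicit. This sketch assumes a list price of USD 3 per million input tokens and a 4k-token RAG prompt. Both figures are hypothetical and do not come from this book, and real prices vary by model. The ratio depends only on prompt sizes, so it holds at any price.

```python
def input_cost_usd(prompt_tokens: int, usd_per_million_input_tokens: float) -> float:
    """Input-token cost of one query; output tokens and prompt caching are ignored."""
    return prompt_tokens * usd_per_million_input_tokens / 1_000_000


if __name__ == "__main__":
    PRICE = 3.0  # hypothetical USD per million input tokens
    stuffed = input_cost_usd(200_000, PRICE)  # whole knowledge base in the prompt
    retrieved = input_cost_usd(4_000, PRICE)  # hypothetical RAG prompt size
    print(f"stuffed: {stuffed:.3f} USD/query, RAG: {retrieved:.3f} USD/query, "
          f"ratio {stuffed / retrieved:.0f}x")
    # stuffed: 0.600 USD/query, RAG: 0.012 USD/query, ratio 50x
```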

12.4.4 Multimodal L1 Wiki

A text Wiki is natural. What is a Wiki for images, video, or audio?

Formula photo → OCR + visual description?
Construction video → timeline of events?
Meeting recording → summary + speaker separation?

No unified answer.

12.5 Next 12-Month Roadmap

Tentative (subject to market feedback):

Quarter	Item	Priority
2026 Q2	Multimodal embedding (CLIP-style)	High
2026 Q2	Rerank default-on evaluation	Medium
2026 Q2	GEO ↔ RAG Wiki patch API launch	High
2026 Q3	Precision-tier pricing	High
2026 Q3	EU region deployment (K8s)	Medium
2026 Q3	Japanese NLI self-training	Medium
2026 Q4	Long-context + Wiki hybrid strategy	Medium
2026 Q4	Self-hosted edition	Low

12.5.1 Living Document

This book is a living document, with a minor version each quarter and a major version each year. Reader feedback goes through GitHub Issues, and changes are recorded in CHANGELOG.md.

Key Takeaways

pgvector needs a migration plan past 5M vectors per tenant
Chinese tokenization, multimodal support, and multi-region deployment are the main engineering gaps
LLM bias in Wiki compilation, the empirical RRF k=60, and Chinese NLI quality are the algorithmic gaps
Pricing ↔ cost, cross-product attribution, and breaking-change costs are the commercial gaps
Long-context LLMs reshape RAG rather than end it
The 2026 roadmap centers on multimodal embedding, the Wiki patch API, and precision-tier pricing

References

Liu et al., "Lost in the Middle: How Language Models Use Long Contexts"
Qdrant vs pgvector

