Skip to content

perf(mining): batch per-chunk upserts + optional GPU acceleration#1185

Merged
igorls merged 6 commits intodevelopfrom
perf/batched-upsert-gpu
Apr 24, 2026
Merged

perf(mining): batch per-chunk upserts + optional GPU acceleration#1185
igorls merged 6 commits intodevelopfrom
perf/batched-upsert-gpu

Conversation

@igorls
Copy link
Copy Markdown
Member

@igorls igorls commented Apr 24, 2026

Summary

mempalace mine was slow because every drawer was upserted in its own collection.upsert(...) call (one tokenizer + ONNX forward pass per chunk), and because no EmbeddingFunction was ever wired through the backend (CPU-only, by omission).

This PR fixes both, independently:

  1. Batched upsert in mempalace/miner.py and mempalace/convo_miner.py — one collection.upsert(...) per file instead of per chunk.
  2. Hardware-accelerated embedding function — new mempalace/embedding.py wraps ChromaDB's ONNXMiniLM_L6_V2 with configurable preferred_providers. Selectable via MEMPALACE_EMBEDDING_DEVICE (auto / cpu / cuda / coreml / dml) or embedding_device in ~/.mempalace/config.json. Wired through mempalace/backends/chroma.py so miner writes and searcher reads always agree.

Same all-MiniLM-L6-v2 model, same 384-dim vectors — existing palaces need no rebuild when turning GPU on. The factory subclasses ONNXMiniLM_L6_V2 and renames it "default" so ChromaDB's persisted-EF check matches palaces created with the bare library default.

Benchmark

Hardware: Intel Core i9-12900KF (24 threads) · NVIDIA RTX 3090 · Linux 6.8 · Python 3.13.12
Software: chromadb 1.5.7 · onnxruntime 1.24.4 (CPU) / onnxruntime-gpu 1.25.0 (CUDA)
Method: seeded synthetic corpora, fresh palace per run, model warm-loaded before the first scenario.

Scenario Files Drawers Per-chunk + CPU
(old baseline)
Batched + CPU
(this PR, no GPU)
Batched + CUDA
(this PR, mempalace[gpu])
End-to-end
Small (~50 paragraphs) 10 70 3.36 s · 21 drw/s 2.08 s · 34 drw/s 0.49 s · 143 drw/s 6.9×
Medium (~200 paragraphs) 20 568 19.77 s · 29 drw/s 8.07 s · 70 drw/s 2.15 s · 264 drw/s 9.2×
Large (~500 paragraphs) 10 709 21.13 s · 34 drw/s 8.95 s · 79 drw/s 2.13 s · 334 drw/s 9.9×

Isolating the two wins (medium scenario, same numbers sliced):

CPU CUDA GPU-over-CPU
Per-chunk upsert (old path) 29 drw/s 47 drw/s 1.6×
Batched upsert (new path) 70 drw/s 264 drw/s 3.8×
Batching-over-per-chunk 2.4× 5.6×

Key observation: GPU alone is not enough. Per-chunk + CUDA (47 drw/s) is only modestly better than per-chunk + CPU (29 drw/s) — per-call launch overhead dominates. Batching is what unlocks the accelerator.

Reproduce via benchmarks/mine_bench.py:

# CPU:
uv run python benchmarks/mine_bench.py

# GPU:
uv pip install -e '.[gpu]'
export MEMPALACE_EMBEDDING_DEVICE=cuda
python benchmarks/mine_bench.py --device cuda

Install paths

pip install mempalace[gpu]      # NVIDIA CUDA (onnxruntime-gpu)
pip install mempalace[dml]      # DirectML (Windows AMD/Intel/NVIDIA)
pip install mempalace[coreml]   # macOS Neural Engine

mempalace mine header now prints Device: cpu|cuda|... so users can confirm the accelerator actually engaged.

Known ergonomics gap (follow-up)

pip install mempalace[gpu] alone is not sufficient on Linux today — current onnxruntime-gpu wheels do not bundle cuDNN/cuBLAS/cudart, and the nvidia-*-cu12 runtime wheels require LD_LIBRARY_PATH to be set before Python starts. The failure mode is a silent fallback to CPU. A follow-up PR should: (a) extend [gpu] to also pull nvidia-cudnn-cu12 nvidia-cuda-runtime-cu12 nvidia-cublas-cu12, (b) ctypes.CDLL-preload their .so files from site-packages/nvidia/*/lib at import time (same trick PyTorch uses), and (c) turn the silent fallback into a loud, actionable warning. Not scoped into this PR — this one is about the core speedup.

Principles respected

  • Local-first, zero API — all acceleration is local ONNX execution, no cloud dependency.
  • Verbatim always — batching only changes upsert granularity, not chunking or content.
  • Privacy by architecture — no telemetry, no new network surface.

Test plan

  • uv run pytest tests/ --ignore=tests/benchmarks — 1234 passed locally
  • uvx --from 'ruff>=0.4.0,<0.5' ruff check + format --check clean
  • pre-commit hooks (ruff + ruff-format) pass
  • CPU benchmark reproduces the published numbers from the final script location
  • CUDA benchmark reproduces the published numbers from the final script location
  • MEMPALACE_EMBEDDING_DEVICE=cuda on a host without CUDA logs a clear fallback warning instead of crashing
  • Existing CPU-created palaces continue to work with the new EF (name-spoofing check)

The miner upserted one drawer per ChromaDB call, paying tokenizer +
ONNX session setup per chunk. The embedding device was CPU-only because
no EmbeddingFunction was ever wired through the backend.

Two changes, each a speedup in its own right; stacked they give ~10x
end-to-end on a medium corpus (20 files, 568 drawers):

1. Batched upsert. `process_file` and `_file_chunks_locked` now collect
   all chunks of a file into a single `collection.upsert(...)` so the
   embedding model runs one forward pass per file instead of N.

2. Hardware-accelerated embedding function. New `mempalace/embedding.py`
   wraps `ONNXMiniLM_L6_V2` with configurable `preferred_providers`.
   `MEMPALACE_EMBEDDING_DEVICE` (or `embedding_device` in config.json)
   selects auto / cpu / cuda / coreml / dml. Unavailable accelerators
   log a warning and fall back to CPU.

   The factory subclasses `ONNXMiniLM_L6_V2` and spoofs its `name()` to
   `"default"` so the persisted EF identity matches existing palaces
   created with ChromaDB's bare `DefaultEmbeddingFunction` -- same
   model, same 384-dim vectors, no rebuild needed when turning GPU on.

   `ChromaBackend.get_collection` / `create_collection` now pass the
   resolved EF on every call so miner writes and searcher reads agree.

Benchmarks (i9-12900KF + RTX 3090, medium scenario, 568 drawers):

  per-chunk + CPU   19.77s ·  29 drw/s   (baseline)
  batched   + CPU    8.07s ·  70 drw/s   (2.4x)
  batched   + CUDA   2.15s · 264 drw/s   (9.2x)

Reproducible via `benchmarks/mine_bench.py`.

Install paths:
  pip install mempalace[gpu]       # NVIDIA CUDA
  pip install mempalace[dml]       # DirectML (Windows)
  pip install mempalace[coreml]    # macOS Neural Engine

Mine header now prints `Device: cpu|cuda|...` so users can confirm the
accelerator engaged.
Copilot AI review requested due to automatic review settings April 24, 2026 22:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves mining throughput and enables optional hardware-accelerated embeddings by batching ChromaDB upserts per file and wiring a configurable ONNX Runtime execution provider through the Chroma backend.

Changes:

  • Batch per-file drawer upserts in the project miner and conversation miner to reduce embedding overhead.
  • Add a new embedding-function factory with configurable device/provider selection (auto/cpu/cuda/coreml/dml) and expose it via config/env.
  • Pass the resolved embedding function into Chroma collection creation/open calls; add optional dependency extras for accelerator runtimes.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
uv.lock Adds locked optional-dependency entries for gpu/dml/coreml extras (onnxruntime variants).
pyproject.toml Defines new optional-dependency extras and documents how to select devices.
mempalace/miner.py Reworks mining to batch per-file upserts and prints the resolved embedding device.
mempalace/convo_miner.py Batches per-file upserts for conversation mining.
mempalace/embedding.py New embedding-function factory with provider resolution, caching, and device description.
mempalace/config.py Adds embedding_device config/env property.
mempalace/backends/chroma.py Wires embedding function into get_collection / create_collection.
benchmarks/mine_bench.py Adds a benchmark script comparing legacy per-chunk upserts vs batched upserts across devices.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/miner.py Outdated
Comment on lines +751 to +762
# Batch all chunks of this file into a single upsert so the embedding
# model runs one forward pass over the whole file instead of N passes
# of one chunk each. On CPU this is typically a 10-30x speedup; on
# GPU the speedup is larger because per-call overhead dominates.
try:
source_mtime = os.path.getmtime(source_file)
except OSError:
source_mtime = None

batch_docs: list = []
batch_ids: list = []
batch_metas: list = []
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process_file() now materializes batch_docs/batch_ids/batch_metas for the entire file at once. Given MAX_FILE_SIZE is 500MB and CHUNK_SIZE is 800 chars, a single large text file can produce an enormous number of chunks; duplicating those into three additional lists can blow memory and can also exceed Chroma/SQLite practical batch limits. Consider chunking the upsert into bounded sub-batches (e.g., 256–5000 items) to keep memory and DB variable counts under control while still getting most of the batching speedup.

Copilot uses AI. Check for mistakes.
Comment thread mempalace/convo_miner.py Outdated
Comment on lines +335 to +342
# Batch the whole file into one upsert so the embedding model runs
# a single forward pass for all chunks — dramatically faster than
# one call per chunk, especially on GPU where per-call overhead
# dominates over the actual matmul.
batch_docs: list = []
batch_ids: list = []
batch_metas: list = []
filed_at = datetime.now().isoformat()
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function batches all chunks for a file into one collection.upsert(...). For large conversation transcripts this can create very large batch_* lists and a huge single upsert, risking high peak memory and potential Chroma/SQLite batch limits. Consider splitting the lists into bounded sub-batches (keeping filed_at constant) to preserve most performance gains without risking OOMs.

Copilot uses AI. Check for mistakes.
Comment thread mempalace/embedding.py
Comment on lines +79 to +84
logger.warning(
"embedding_device=%r requested but %s is not installed — "
"falling back to CPU. Install mempalace[gpu] for CUDA.",
device,
preferred,
)
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback warning always says "Install mempalace[gpu] for CUDA" even when the requested device is coreml or dml. This will mislead users on macOS/Windows. Suggest tailoring the message based on device (e.g., coreml→mempalace[coreml], dml→mempalace[dml], cuda→mempalace[gpu]) or wording it generically ("install the matching extra for your device").

Copilot uses AI. Check for mistakes.
Comment thread mempalace/embedding.py
Comment on lines +111 to +147
def get_embedding_function(device: Optional[str] = None):
"""Return a cached embedding function bound to the requested device.

``device=None`` reads from :class:`MempalaceConfig.embedding_device`.
The returned function is shared across calls with the same resolved
provider list so we only pay model-load cost once per process.
"""
if device is None:
from .config import MempalaceConfig

device = MempalaceConfig().embedding_device

providers, effective = _resolve_providers(device)
cache_key = tuple(providers)
cached = _EF_CACHE.get(cache_key)
if cached is not None:
return cached

ef_cls = _build_ef_class()
ef = ef_cls(preferred_providers=providers)
_EF_CACHE[cache_key] = ef
logger.info("Embedding function initialized (device=%s providers=%s)", effective, providers)
return ef


def describe_device(device: Optional[str] = None) -> str:
"""Return a short human-readable label for the resolved device.

Used by the miner CLI header so users can see at a glance whether GPU
acceleration actually engaged.
"""
if device is None:
from .config import MempalaceConfig

device = MempalaceConfig().embedding_device
_, effective = _resolve_providers(device)
return effective
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior in get_embedding_function() / describe_device() isn’t covered by tests (provider resolution, unknown device fallback warning, caching semantics). Adding unit tests with monkeypatch (e.g., stubbing onnxruntime.get_available_providers) would help prevent regressions across platforms where provider availability differs.

Copilot uses AI. Check for mistakes.
Comment thread mempalace/config.py
Comment on lines +239 to +254
@property
def embedding_device(self):
"""Hardware device for the ONNX embedding model.

Values: ``"auto"`` (default), ``"cpu"``, ``"cuda"``, ``"coreml"``,
``"dml"``. Read from env ``MEMPALACE_EMBEDDING_DEVICE`` first, then
``embedding_device`` in ``config.json``, then ``"auto"``.

``auto`` resolves to the first available accelerator at runtime via
:mod:`mempalace.embedding`; requesting an unavailable accelerator
logs a warning and falls back to CPU.
"""
env_val = os.environ.get("MEMPALACE_EMBEDDING_DEVICE")
if env_val:
return env_val.strip().lower()
return str(self._file_config.get("embedding_device", "auto")).strip().lower()
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

embedding_device is a new config surface area but there are no tests validating precedence (env var vs config.json vs default) or normalization (trim/lower). Consider adding coverage in tests/test_config.py similar to the existing palace_path env override tests so future refactors don’t break device selection.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

@bensig bensig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed locally on `perf/batched-upsert-gpu`. Quality is high — clean design, honest fallbacks, well-documented EF workaround, real benchmarks. Approving with one ask + a few minor notes worth a follow-up commit if you want.

Full pytest matches v3.3.3: 1234 passed in 35s on this branch.

Strengths

  • Two wins kept independent. Batching gives ~2.4× CPU; GPU adds another 3.8× on top. Either alone is useful, the combination is multiplicative.
  • EF identity rename to `"default"` is the high-value detail. The persisted-EF check in ChromaDB 1.x would otherwise force a palace rebuild on the cutover. Lets the new GPU-capable EF read palaces created by the bare library default and vice versa. The inline comment in `_build_ef_class` documents the why clearly enough that the next reader won't accidentally undo it.
  • Fallback chain is solid. Unknown device → CPU+warn, missing accelerator → CPU+warn, EF construction error → ChromaDB's own default. The "still works on a laptop" goal is honored.
  • One-shot warning via `_WARNED` — won't spam logs across a 10k-file mine. Process-global EF cache keyed by provider tuple — one model load per device per process.
  • `Device:` line in the mine banner — small UX win, users see whether GPU actually engaged.
  • Optional deps split into 3 extras (`gpu` / `dml` / `coreml`) — no over-pulling for users who only have one acceleration path available.

Ask: unit tests for `mempalace/embedding.py`

The module has zero tests today. `_resolve_providers` has 5+ branches all driven by a string and all mockable without a GPU:

```python
@patch("onnxruntime.get_available_providers")
def test_auto_picks_cuda(mock):
mock.return_value = ["CUDAExecutionProvider", "CPUExecutionProvider"]
assert _resolve_providers("auto") == (["CUDAExecutionProvider", "CPUExecutionProvider"], "cuda")

def test_auto_falls_to_cpu(mock): ...
def test_cuda_missing_warns(...)
def test_unknown_device_warns(...)
def test_onnxruntime_import_error(...)
```

This is exactly the kind of code that breaks silently in three months when ChromaDB or onnxruntime renames a provider string and we never notice because nothing exercises the resolver. Cheap to add, high coverage win.

Minor notes (not blockers)

  1. Atomicity trade-off is unmentioned in the diff. Per-chunk → per-file upsert means one bad chunk takes the whole file's chunks down. Almost certainly the right call, but worth a one-line comment in `process_file` so the next reader doesn't accidentally "fix" it back to per-chunk thinking they're improving error isolation.
  2. `filed_at` is now per-file in `convo_miner` (was per-chunk). Practically fine, arguably more correct semantically — flagging only because behavior changed and isn't called out.
  3. `tomli` dep + `add_to_known_entities` helper look unrelated to the PR title — likely develop-merge overlap. Worth a quick check with #1183 to make sure we're not duplicating work.
  4. Subclassing `chromadb.utils.embedding_functions.ONNXMiniLM_L6_V2` is an internal-API dependency on ChromaDB. Calculated risk for the persistence-compat win, but worth a comment so a future reader knows it's deliberate, not accidental.
  5. CUDA-installed-but-no-driver edge case: `onnxruntime-gpu` may report `CUDAExecutionProvider` as available on a machine without a CUDA driver — session creation then fails at runtime. The `[CUDAExecutionProvider, CPUExecutionProvider]` cascade may save us, but a brief test or note would confirm.

None of the minor notes block — they're "the next person reading this code in six months will thank you" material. The unit tests are the one I'd push for before merge.

Approving on the strength of the design, test passes, and clean fallback story.

@igorls
Copy link
Copy Markdown
Member Author

igorls commented Apr 24, 2026

@copilot apply changes based on the comments in this thread

also read the comment from ben

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 24, 2026

Reviewed locally on `perf/batched-upsert-gpu`. Quality is high — clean design, honest fallbacks, well-documented EF workaround, real benchmarks. Approving with one ask + a few minor notes ...

Addressed the embedding unit-test ask and related follow-up items in 0315124. Added provider-resolution/cache tests, config precedence/normalization tests, bounded upsert batches for large files/transcripts, and device-specific fallback warning text.

@copilot apply changes based on the comments in this thread

also read the comment from ben

Applied the requested review-thread changes and Ben’s unit-test ask in 0315124. The branch now has bounded Chroma upsert batches, tailored accelerator fallback warnings, embedding/config coverage, and targeted validation passing.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • chroma-onnx-models.s3.amazonaws.com
    • Triggering command: /usr/bin/python python -m pytest tests/test_config.py tests/test_miner.py tests/test_convo_miner_unit.py -q (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@igorls igorls merged commit ed2ba72 into develop Apr 24, 2026
7 checks passed
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
…HNSW fixes

Bring in 29 commits from upstream/develop since the last merge (2026-04-23):

Major absorbed changes:
- MemPalace#976 (Felipe Truman): HNSW graph corruption fix, mine_global_lock for
  fan-out, MAX_PRECOMPACT_BLOCK_ATTEMPTS=2 for /compact deadlocks. Closes
  MemPalace#974/MemPalace#965/MemPalace#955; likely resolves MemPalace#1172 too.
- MemPalace#1179 (Igor): CLI mempalace search routes through _hybrid_rank,
  legacy-metric warning + _warn_if_legacy_metric, invariant tests on
  hnsw:space=cosine across all 5 collection-creation paths.
- MemPalace#1180/MemPalace#1183/MemPalace#1184: cross-wing topic tunnels, init mine UX, --auto-mine.
- MemPalace#1185 (perf/batched-upsert-gpu): batched ChromaDB upserts, GPU device
  detection via mempalace/embedding.py.
- MemPalace#1182: graceful Ctrl-C during mempalace mine.
- MemPalace#1168: tunnel permissions security fix.

Conflict resolutions (8 files):
- searcher.py: kept fork's CLI delegation through search_memories (cleaner
  single-source-of-truth path); upstream's inline-retrieval branch dropped.
  Merged upstream's print formatting (cosine= + bm25=) with fork's
  matched_via reporting from sqlite BM25 fallback.
- backends/chroma.py: kept fork's _BLOB_FIX_MARKER + palace_path arg to
  ChromaCollection (MemPalace#1171 write lock); merged upstream's **ef_kwargs
  (embedding_function support from MemPalace#1185). Removed duplicate
  _pin_hnsw_threads (fork already cherry-picked Felipe's earlier).
- mcp_server.py: kept fork's palace_path arg + upstream's clearer comment
  on hnsw:num_threads=1 rationale.
- miner.py: took upstream's serial mine() flow (mine_global_lock + Ctrl-C),
  RESTORED fork's strict detect_room — substring matching from upstream
  breaks fork-only test_detect_room_priority1_no_substring_match. Added
  `import re` for word-boundary regex matching. Fork-ahead concurrent
  mining (workers=, ThreadPoolExecutor from 5cd14bd) is dropped — daemon
  migration deprioritizes local concurrent mining; can re-port if needed.
- CHANGELOG.md: combined fork's segfault-trio narrative with upstream's
  v3.3.4 release notes.
- HOOKS_TUTORIAL.md: took MEMPALACE_PYTHON env var name (fork README was
  stale; hooks already use this name per fork-ahead MemPalace#19).
- test_convo_miner_unit.py: took both contextlib + pytest imports.
- test_searcher.py: imported upstream's 3 new TestSearchCLI tests but
  skipped them with TODOs — they assume upstream's inline-retrieval CLI
  with simpler mocks. Rewrite needed for fork's delegated search_memories
  path (sqlite BM25 fallback + scope counting).

Test result: 1334 passed, 3 skipped (the upstream tests above), 3 warnings.

Implications for open fork PRs:
- MemPalace#1171 (cross-process write lock at adapter) becomes structurally
  redundant given MemPalace#976's mine_global_lock + daemon-strict serialization.
  Slated for close with thank-you to Felipe.
- MemPalace#1173/MemPalace#1177 still relevant (quarantine threshold + blob_seq marker).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
After today's work:

- Merged upstream/develop (29 commits), absorbing MemPalace#976 (HNSW + mine_global_lock + PreCompact attempt-cap), MemPalace#1179 (CLI BM25), MemPalace#1180/MemPalace#1182/MemPalace#1183/MemPalace#1185 features.
- Filed PR MemPalace#1198 (`_tokenize` None guard) upstream.
- Closed PR MemPalace#1171 (adapter-level flock) — superseded by MemPalace#976 + daemon-strict.
- palace-daemon migration commits (c09582c + 0e97b19) make the daemon the
  single canonical writer; in-tree adapter flock no longer carries its weight.

README:
- Lead paragraph: 2026-04-21 → 2026-04-25 sync date, 165,632 → 151,420
  drawer count, daemon-on-disks topology, listed absorbed upstream PRs.
- Fork change queue: dropped MemPalace#1171 row, added MemPalace#1198 row.
- "Multi-client coordination" section: rewrote — palace-daemon is now the
  fork's primary answer (was "deferred"), MemPalace#1171 retired, narrative shifted
  to defense-in-depth around MemPalace#976 + daemon.
- Two-layer memory model: storage path is now palace-daemon HTTP, not
  ~/.mempalace/palace.
- Ecosystem palace-daemon row: marks integration as complete (commits
  cited).
- Open upstream PRs table: refreshed to MemPalace org URLs, added MemPalace#1198
  row, removed MemPalace#1171, added 2026-04-25 develop sync to merged list, added
  MemPalace#1171 to closed list with rationale.
- Test count 1096 → 1334.

CLAUDE.md:
- Row 20 (MemPalace#1198): "Upstream PR pending" → "Filed as MemPalace#1198 on 2026-04-24".
- Row MemPalace#1171 in PR table: open → closed (with MemPalace#976 supersession).
- Added MemPalace#1198 row to PR table.
- Top-of-section count: "14 merged, 7 open, 9 closed" → "14 merged, 7 open
  (including MemPalace#1198), 10 closed (added MemPalace#1171)" + sync date 2026-04-25.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants