Add all new controls#1805

Draft
ahuang11 wants to merge 14 commits into main from abstract_ingest_controls

@ahuang11 ahuang11 commented Apr 10, 2026

Description

TLDR: Abstracts the source controls to support different data ingestion patterns.

Motivation

BaseSourceControls today conflates two concerns: lifecycle management (progress bars, error messages, load triggering, source registration) and file-processing mechanics (file cards, format detection, CSV/Parquet/JSON parsing). The previous design also covers only one data ingestion pattern, in which the user provides raw files (upload or URL download). At least three other patterns are worth supporting:

  • Catalogs — browse a registry of datasets (CELLxGENE, Hugging Face Datasets, data.gov), select one, fetch it on demand
  • Parametric APIs — configure parameters (station, date range, geographic extent), build a query, fetch data. Examples: Iowa Mesonet weather, censusdis, weather.gov
  • Connections (future) — provide a database URL, discover tables. Examples: SQLAlchemy, BigQuery, MotherDuck

Each pattern has a different UI, a different user interaction flow, and a different metadata shape that matters for agent-driven discovery.

What this PR does

Refactors the source controls into a proper class hierarchy:

BaseSourceControls              ← lifecycle only: progress, messages, _run_load, SourceResult
├── FileSourceControls          ← file-card machinery extracted here
│   ├── UploadSourceControls    ← (was UploadControls)
│   └── DownloadSourceControls  ← (was DownloadControls)
├── CatalogSourceControls       ← NEW: browse/search/select pattern
│   └── (CellXGeneSourceControls in lumen-anndata)
└── ParametricSourceControls    ← NEW: configure params → fetch
    ├── URLSourceControls       ← NEW: params interpolated into URL template
    └── CodeSourceControls      ← NEW: params passed to a Python callable
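
As a rough illustration of the split in the tree above, the following is a minimal standard-library sketch, not the code in this PR. The `_load` hook name, the fields on `SourceResult`, and the toy file parsing are assumptions made for the example; only the class names and `_run_load` come from the description.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SourceResult:
    """Hypothetical stand-in: tables produced by a load plus a status message."""
    tables: dict = field(default_factory=dict)
    message: str = ""


class BaseSourceControls(ABC):
    """Lifecycle only: progress, messages, and the load wrapper. No file knowledge."""

    def __init__(self):
        self.progress = 0
        self.messages = []

    def _run_load(self) -> SourceResult:
        # Shared lifecycle: reset progress, call the subclass hook, record errors.
        self.progress = 0
        try:
            result = self._load()
        except Exception as exc:
            self.messages.append(f"error: {exc}")
            return SourceResult(message=str(exc))
        self.progress = 100
        self.messages.append(result.message)
        return result

    @abstractmethod
    def _load(self) -> SourceResult:
        """Hypothetical hook name; each ingestion pattern supplies its own."""


class FileSourceControls(BaseSourceControls):
    """File handling lives here instead of in the base class."""

    def __init__(self, files):
        super().__init__()
        self.files = files  # assumed shape: filename -> raw text

    def _load(self) -> SourceResult:
        # Toy parsing: one table per file, keyed by the name without extension.
        tables = {
            name.rsplit(".", 1)[0]: text.splitlines()
            for name, text in self.files.items()
        }
        return SourceResult(tables=tables, message=f"loaded {len(tables)} file(s)")


class UploadSourceControls(FileSourceControls):
    """Formerly UploadControls; inherits the file handling unchanged."""


controls = UploadSourceControls({"penguins.csv": "species,mass\nadelie,3700"})
result = controls._run_load()
print(result.message, sorted(result.tables))
```

The point of the shape is that a catalog or parametric subclass would only override the hook, while progress and error reporting stay in one place.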

The key changes:

  1. BaseSourceControls becomes a clean lifecycle base — just progress, messages, _run_load, and source registration. No file knowledge.

  2. CatalogSourceControls is agent-ready by design. It accepts a vector_store param. When the catalog loads, entries are embedded in the background (same asyncio.create_task pattern as SourceCatalog._sync_metadata_to_vector_store). The search_columns param controls what text gets embedded. This means a future CatalogLookupTool can semantically search catalogs without any additional plumbing.

  3. ParametricSourceControls auto-generates widgets from param.Parameter definitions. Subclasses just declare params and implement _fetch_data(). The _get_parameter_schema() method exposes parameter structure for future agent-driven parameter filling.

  4. File-specific code moved to FileSourceControls: _process_files, _generate_file_cards, _add_table, _read_json_file, _read_geo_file, and all the UploadedFileRow management. The Upload and Download controls now inherit from FileSourceControls instead of the base.
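
To make point 3 concrete, here is a minimal sketch of the "declare params, expose a schema, fetch" flow for a URL-template source. It deliberately uses dataclass fields as a stand-in for `param.Parameter` definitions so it runs without Lumen or param installed. `WeatherParams`, `URLSourceSketch`, `_build_url`, the schema layout, and the `example.com` template are all hypothetical; only `_get_parameter_schema()` and `_fetch_data()` are names taken from the description.

```python
from dataclasses import dataclass, fields


@dataclass
class WeatherParams:
    """Hypothetical parameters; the real classes would declare param.Parameter objects."""
    station: str = "SEA"
    days: int = 7


class URLSourceSketch:
    """Stand-in for a URLSourceControls subclass (not the Lumen API)."""

    # Hypothetical template; example.com is a placeholder host.
    url_template = "https://example.com/obs?station={station}&days={days}"

    def __init__(self, params):
        self.params = params

    def _get_parameter_schema(self):
        # Expose name, type, and default so a UI (or later an agent) can fill values.
        return {
            f.name: {"type": type(f.default).__name__, "default": f.default}
            for f in fields(self.params)
        }

    def _build_url(self):
        # Plain interpolation; values are not URL-encoded in this sketch.
        return self.url_template.format(**vars(self.params))

    def _fetch_data(self):
        # A real implementation would issue the request; the sketch returns the URL only.
        return {"url": self._build_url()}


controls = URLSourceSketch(WeatherParams(station="SEA", days=3))
schema = controls._get_parameter_schema()
url = controls._fetch_data()["url"]
print(schema)
print(url)
```

The schema reports declared defaults rather than current values, which is the information a widget generator or a parameter-filling agent would need before anything has been set.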

Toward agent-driven data discovery

This refactoring is explicitly designed as the foundation for a conversational discovery agent. The control hierarchy exposes clean hooks:

  • Catalogs expose searchable metadata via _entry_to_text() → vector store embedding → agent can semantically search ("find me mouse brain spatial transcriptomics data")
  • Parametric controls expose _get_parameter_schema() → agent can fill parameters from natural language ("get me weather data for Seattle last week")
  • Both expose _fetch_entry() / _fetch_data() → agent can trigger loading programmatically
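
The catalog hook in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `ToyVectorStore` is a hypothetical in-memory stand-in that matches by word overlap instead of embeddings, `CatalogSketch` and its `load` method are invented for the example, and the two catalog entries are made up. The parts taken from the description are the `vector_store` and `search_columns` parameters, `_entry_to_text()`, and the use of `asyncio.create_task` to sync in the background.

```python
import asyncio


class ToyVectorStore:
    """Hypothetical stand-in: stores text and matches by word overlap, not embeddings."""

    def __init__(self):
        self.items = []

    async def add(self, texts):
        await asyncio.sleep(0)  # yield control, as a real embedding call would
        self.items.extend(texts)

    def query(self, text):
        words = set(text.lower().split())
        scored = [(len(words & set(item.lower().split())), item) for item in self.items]
        best = max(scored, default=(0, None))
        return best[1] if best[0] else None


class CatalogSketch:
    """Stand-in for a CatalogSourceControls subclass (not the Lumen API)."""

    def __init__(self, entries, vector_store, search_columns):
        self.entries = entries
        self.vector_store = vector_store
        self.search_columns = search_columns
        self._sync_task = None

    def _entry_to_text(self, entry):
        # Only the configured columns contribute to the searchable text.
        return " ".join(str(entry[col]) for col in self.search_columns)

    def load(self):
        # Schedule the sync without blocking the caller; keep a reference so the
        # task is not garbage-collected before it finishes.
        texts = [self._entry_to_text(e) for e in self.entries]
        self._sync_task = asyncio.create_task(self.vector_store.add(texts))


async def main():
    entries = [  # made-up entries for illustration
        {"id": 1, "title": "Mouse brain spatial transcriptomics", "tissue": "brain"},
        {"id": 2, "title": "Human lung single cell atlas", "tissue": "lung"},
    ]
    store = ToyVectorStore()
    catalog = CatalogSketch(entries, store, search_columns=["title", "tissue"])
    catalog.load()
    pending = len(store.items)  # the task has been scheduled but has not run yet
    await catalog._sync_task
    return pending, store.query("find me mouse brain data")


pending, match = asyncio.run(main())
print(pending, match)
```

Because `load` returns before the store is populated, a lookup tool built on top would need to await the sync task (or otherwise tolerate an empty store) before searching.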

The agent layer itself is not in this PR; it will be added separately, building on top of these controls.

How Has This Been Tested?

Will test with real use-case scenarios, starting with AnnData, then others. See holoviz-topics/lumen-anndata#45.

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested all AI-generated content in my PR.
    • I take responsibility for all AI-generated content in my PR.

Tools and Models: Opus for back-and-forth planning, Sonnet for implementation.

Checklist

  • Tests added and are passing
  • Added documentation

@ahuang11 ahuang11 mentioned this pull request Apr 10, 2026
codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 0.30488% with 654 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.47%. Comparing base (d87d25b) to head (79341b5).
⚠️ Report is 1 commits behind head on main.

Files with missing lines                   Patch %   Lines
lumen/ai/controls/ingest/utils.py          0.00%     118 Missing ⚠️
lumen/ai/controls/ingest/download.py       0.00%     95 Missing ⚠️
lumen/ai/controls/ingest/catalog.py        0.00%     92 Missing ⚠️
lumen/ai/controls/ingest/base.py           0.00%     89 Missing ⚠️
lumen/ai/controls/ingest/url_source.py     0.00%     64 Missing ⚠️
lumen/ai/controls/ingest/file_row.py       0.00%     63 Missing ⚠️
lumen/ai/controls/ingest/parametric.py     0.00%     54 Missing ⚠️
lumen/ai/controls/ingest/code_source.py    0.00%     24 Missing ⚠️
lumen/ai/controls/ingest/result.py         0.00%     22 Missing ⚠️
lumen/ai/controls/ingest/constants.py      0.00%     13 Missing ⚠️
... and 4 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1805       +/-   ##
===========================================
- Coverage   70.26%   32.47%   -37.80%     
===========================================
  Files         175      185       +10     
  Lines       30192    30805      +613     
===========================================
- Hits        21215    10003    -11212     
- Misses       8977    20802    +11825     


@ahuang11 ahuang11 mentioned this pull request Apr 14, 2026