This document defines the draft machine-facing protocol for operating GENtle through a shared core engine.
Goal:
- GUI, CLI, JavaScript, Lua, and Python wrappers call the same core routines.
- AI tools can run deterministic cloning workflows with reproducible logs.
- Protocol-first: versioned JSON request/response shapes
- Capability negotiation: clients discover supported operations and formats
- Deterministic operation log: each operation emits a stable op id and result
- Structured errors: machine-parseable error code + message
gentle_cli capabilities returns:
- protocol_version
- supported_operations
- supported_export_formats
- deterministic_operation_log
Canonical, adapter-independent examples are defined in:
- docs/examples/workflows/*.json
- schema: gentle.workflow_example.v1
Each example includes:
- metadata (id, title, summary)
- test policy (test_mode: always|online|skip)
- required local files (required_files)
- canonical workflow payload
Adapter snippets (CLI/shared shell/JavaScript/Lua) are generated on demand from those canonical files:
cargo run --bin gentle_examples_docs -- generate
Validation only:
cargo run --bin gentle_examples_docs -- --check
Tutorial manifest + generated outputs:
- discovery catalog: docs/tutorial/catalog.json
- discovery schema: gentle.tutorial_catalog.v1
- shared tutorial source units:
  - docs/tutorial/sources/catalog_meta.json
  - docs/tutorial/sources/*.json
- source-unit schemas:
  - gentle.tutorial_catalog_meta.v1
  - gentle.tutorial_source.v2
- generated runtime manifest: docs/tutorial/manifest.json
- runtime manifest schema: gentle.tutorial_manifest.v1
- committed generated outputs: docs/tutorial/generated/
Catalog/manifest split:
- docs/tutorial/catalog.json is the canonical discovery layer for all tutorials, including hand-written walkthroughs and agent/reference guides.
- docs/tutorial/sources/ is the authoring layer for both the discovery catalog and the executable tutorial runtime manifest.
- docs/tutorial/manifest.json is a generated runtime contract used for chapter output and tutorial runtime checks.
- GUI help/tutorial discovery may consume the catalog directly for curated ordering and metadata, while executable tutorial project materialization still resolves through the manifest/workflow example path.
Generate/check tutorial outputs:
cargo run --bin gentle_examples_docs -- tutorial-generate
cargo run --bin gentle_examples_docs -- tutorial-check
cargo run --bin gentle_examples_docs -- tutorial-catalog-generate
cargo run --bin gentle_examples_docs -- tutorial-catalog-check
cargo run --bin gentle_examples_docs -- tutorial-manifest-generate
cargo run --bin gentle_examples_docs -- tutorial-manifest-check
GENtle now exposes a portable grouped TFBS summary contract for comparing one focus window against a wider context window on the same sequence.
Current shared-shell route:
gentle_cli shell 'features tfbs-summary SEQ_ID --focus START..END [--context START..END] [--min-focus-count N] [--min-context-count N] [--limit N]'
First-class operation route:
{"SummarizeTfbsRegion":{"seq_id":"SEQ_ID","focus_start_0based":2900,"focus_end_0based_exclusive":3100,"context_start_0based":0,"context_end_0based_exclusive":6001,"min_focus_occurrences":1,"min_context_occurrences":0,"limit":25}}
Portable schema:
gentle.tfbs_region_summary.v1
Request fields:
- seq_id
- focus_start_0based
- focus_end_0based_exclusive
- optional context_start_0based
- optional context_end_0based_exclusive
- min_focus_occurrences
- min_context_occurrences
- optional limit
Result fields:
- sequence/focus/context bounds and widths
- total TFBS hit counts in the focus and context spans
- grouped rows keyed by TF name with:
  - motif_ids
  - focus_occurrences
  - context_occurrences
  - outside_focus_occurrences
  - focus/context/outside densities per kb
  - focus-vs-context and focus-vs-outside density ratios
Grouping policy:
- prefer bound_moiety
- otherwise standard_name
- otherwise gene
- otherwise name
- otherwise tf_id
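The grouping fallback and the per-kb density/ratio fields above can be sketched as follows. This is a minimal illustration, not the engine implementation: it assumes the focus window lies inside the context window, assigns each hit by its start position, and uses hypothetical output key names for the density and ratio fields (only the three occurrence fields are named in the contract).

```python
from collections import defaultdict

# Documented grouping preference order for TFBS features.
GROUP_KEY_ORDER = ("bound_moiety", "standard_name", "gene", "name", "tf_id")


def group_key(qualifiers):
    """Return the first non-empty qualifier in the documented preference order."""
    for key in GROUP_KEY_ORDER:
        value = qualifiers.get(key)
        if value:
            return value
    return "unknown"  # hypothetical placeholder; the source does not define one


def density_per_kb(count, width_bp):
    return 1000.0 * count / width_bp if width_bp > 0 else 0.0


def ratio(numerator, denominator):
    # None signals an undefined ratio (assumption; the contract may differ).
    return numerator / denominator if denominator > 0 else None


def summarize_tfbs(hits, focus, context):
    """hits: iterable of (start_0based, qualifiers); focus/context: half-open (start, end)."""
    focus_width = focus[1] - focus[0]
    context_width = context[1] - context[0]
    outside_width = context_width - focus_width
    counts = defaultdict(lambda: [0, 0])  # name -> [focus count, context count]
    for start, qualifiers in hits:
        if not (context[0] <= start < context[1]):
            continue
        row = counts[group_key(qualifiers)]
        row[1] += 1
        if focus[0] <= start < focus[1]:
            row[0] += 1
    rows = {}
    for name, (focus_n, context_n) in counts.items():
        focus_d = density_per_kb(focus_n, focus_width)
        context_d = density_per_kb(context_n, context_width)
        outside_d = density_per_kb(context_n - focus_n, outside_width)
        rows[name] = {
            "focus_occurrences": focus_n,
            "context_occurrences": context_n,
            "outside_focus_occurrences": context_n - focus_n,
            "focus_density_per_kb": focus_d,  # hypothetical key name
            "focus_vs_context_ratio": ratio(focus_d, context_d),  # hypothetical
            "focus_vs_outside_ratio": ratio(focus_d, outside_d),  # hypothetical
        }
    return rows
```

Returning None for an undefined ratio keeps "no outside hits" distinguishable from a ratio of zero.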
Purpose:
- describe one Gibson cloning project in a destination-first way,
- separate user-specified plan inputs from derived design consequences,
- provide one canonical JSON artifact that future routines, primer design, and protocol-cartoon rendering can all read from.
Status:
- destination-first single-insert and ordered multi-insert plans are now consumed by the shared Gibson preview/apply path (gibson preview ..., gibson apply, and the Patterns -> Gibson... specialist window)
- current limit:
  - multi-insert execution currently assumes a defined destination opening
  - existing_termini remains the single-fragment handoff path for now
Canonical examples:
- docs/examples/plans/gibson_destination_first_single_insert.json
- docs/examples/plans/gibson_destination_first_multi_insert.json
Top-level structure:
- schema, id, title, summary
- destination
  - destination molecule (seq_id, prior topology)
  - explicit opening definition (mode, label, resulting left/right ends)
- product
  - intended output topology and output-id hint
- fragments[]
  - participating inserts or non-destination fragments
  - orientation plus per-end adaptation strategy
  - optional source-coordinate hints for deterministic ordering
- assembly_order[]
  - explicit left-to-right order of destination ends and inserts
  - supports future multi-fragment Gibson plans without changing the model
- junctions[]
  - one record per adjacent join
  - required overlap length
  - explicit overlap partition across the left/right adjacent members
  - whether overlap is derived from destination context or user-specified
  - explicit distinct_from constraints for terminal junctions
- validation_policy
  - hard requirements:
    - unique/unambiguous destination opening
    - distinct terminal junctions
    - adjacency-consistent overlaps
  - advisory checks:
    - overlap-length design range
    - fragment-count-aware overlap targets
    - overlap Tm
    - destination/fragment/reference uniqueness heuristics
  - optional design request:
    - desired_unique_restriction_site_enzyme_name asks the shared preview/apply path to try introducing one new unique REBASE cutter site on one terminal overlap if the assembled product can still remain uniquely cut there
- derived_design
  - derived overlap sequences
  - primer design suggestions
  - advisory notes and validation outcomes
Current draft value vocabulary:
- destination.topology_before_opening
  - linear
  - circular
- destination.opening.mode
  - existing_termini
    - use the current termini of an already-linear destination sequence
  - defined_site
    - a user-selected opening site/window on an existing destination molecule
  - reserved future values:
    - restriction_digest
    - pcr_linearization
    - inverse_pcr
- destination.opening.uniqueness_requirement
  - must_be_unambiguous
    - opening ambiguity is a hard validation error
  - advisory_only
    - ambiguity is surfaced, but not automatically fatal
- product.topology
  - linear
  - circular
- fragments[].role
  - insert
  - reserved future values:
    - backbone_fragment
    - bridge_fragment
- fragments[].orientation
  - forward
  - reverse
- fragments[].source_span_1based
  - optional source-coordinate hint for plans that preserve source order
  - shape:
    - source_seq_id
    - start
    - end
- fragments[].left_end_strategy.mode / right_end_strategy.mode
  - native_overlap
    - fragment terminus is already expected to satisfy the required overlap
  - primer_added_overlap
    - overlap is expected to be introduced via a primer tail
    - this is the current draft bucket for overlap-extension / primer-stitching style adaptation
  - reserved future values:
    - synthetic_terminal_sequence
    - library_defined_overlap
- assembly_order[].kind
  - destination_end
  - fragment
- junctions[].overlap_source
  - derive_from_destination_left_flank
  - derive_from_destination_right_flank
  - derive_from_adjacent_fragment_ends
  - designed_bridge_sequence
    - internal junction overlap chosen as a synthetic bridge/adaptor sequence
  - reserved future value:
    - user_specified_sequence
- junctions[].overlap_partition
  - explicit contribution of the overlap region from the adjacent members
  - shape:
    - left_member_bp
    - right_member_bp
  - invariant:
    - left_member_bp + right_member_bp == required_overlap_bp
  - examples:
    - left-member only overlap: 30 + 0
    - right-member only overlap: 0 + 30
    - split overlap: 20 + 20
- validation_policy.adjacency_overlap_mismatch
  - error
  - warn
- validation_policy.uniqueness_checks.*
  - off
  - warn
  - error
- validation_policy.reference_contexts[].severity
  - warn
  - error
Input-vs-derived boundary in the draft model:
- Intended user/planner inputs:
  - destination
  - product
  - fragments
  - assembly_order
  - junctions[].required_overlap_bp
  - junctions[].overlap_partition
  - junctions[].distinct_from
  - validation_policy
- Intended normalized/derived outputs:
  - derived_design.junction_overlaps
  - derived_design.primer_design_suggestions
  - derived_design.notes
- Transition fields that may begin as user hints and later become resolved values:
  - junctions[].overlap_source
  - fragment end strategies (native_overlap vs primer_added_overlap)
Interpretation:
- Gibson plans are modeled as explicit assembly junctions around an opened destination, not merely as an unordered bag of fragments.
- The destination opening defines two terminal junctions, and therefore two required overlap regions.
- Inserts may already satisfy those terminal overlaps or may require primer-tail adaptation.
- The overlap at one junction should be treated as a selection around the
in-silico junction, not merely as one scalar length:
- it may come entirely from the left member
- entirely from the right member
- or be split across both members
- For plans that preserve an existing source order, fragment ordering should follow ascending bp coordinates (low bp positions first) unless an explicit alternative order is requested.
- For multi-fragment plans, internal fragment-fragment junctions may be created through primer-added bridge overlaps rather than relying on pre-existing native overlap.
- Uniqueness is best treated in layers:
- destination opening uniqueness: hard validation
- left/right terminal overlap distinctness: hard validation
- destination/fragment/genome uniqueness heuristics: advisory checks
Design intent:
- make the same JSON artifact useful for:
- preflight Gibson validation
- primer design derivation
- workflow/macro instantiation
- factual protocol-cartoon generation
- reproducible AI-facing project context
Practical overlap heuristics (draft defaults):
- single-insert / two-fragment style assemblies often fit comfortably in the 20-40 bp range
- multi-fragment assemblies should usually move toward longer overlaps
- a practical starting rule for the draft model is:
- 1-2 assembled fragments: 20-40 bp overlaps
- 3-5 assembled fragments: 40 bp overlaps
- 6+ assembled fragments: 50-100 bp overlaps
- internal multi-fragment junctions introduced by primer-added bridge overlaps should follow the same fragment-count-aware guidance rather than being treated as exempt from overlap design heuristics
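The fragment-count-aware starting rule above can be expressed as a small advisory helper. This is a sketch of the draft defaults only; the function names are hypothetical, and the 3-5 fragment tier is represented as the single documented value (40 bp).

```python
def suggested_overlap_range_bp(fragment_count):
    """Return the (min_bp, max_bp) draft overlap range for a fragment count."""
    if fragment_count < 1:
        raise ValueError("fragment_count must be at least 1")
    if fragment_count <= 2:
        return (20, 40)
    if fragment_count <= 5:
        return (40, 40)  # the draft rule names a single value for this tier
    return (50, 100)


def overlap_advisory(fragment_count, overlap_bp):
    """Return a warning string when overlap_bp falls outside the draft range, else None."""
    low, high = suggested_overlap_range_bp(fragment_count)
    if low <= overlap_bp <= high:
        return None
    return (
        f"overlap of {overlap_bp} bp is outside the draft {low}-{high} bp "
        f"range for {fragment_count} assembled fragment(s)"
    )
```

Because these are advisory checks, the helper returns a message instead of raising, so callers can surface it as a warning.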
Primer design conventions (draft):
- Gibson primer suggestions should be modeled as two-part primers:
  - overlap_5prime
    - non-priming 5' overlap segment used for assembly of adjacent fragments
  - priming_3prime
    - gene-specific 3' priming segment used for PCR amplification from the source template
- Primer design should start from an in silico assembled product/junction view, then work backward to fragment-specific PCR primers.
- Overlap choice is best treated as Tm-aware rather than length-only:
- simple PCR-fragment-into-vector assemblies may use shorter overlaps when overlap Tm is already adequate
- more complex or multi-fragment assemblies often justify longer overlaps
- The overlap region may lie entirely within one adjacent member or be split across the two members around a junction.
- Two primers that implement the same overlap sequence can still belong to different PCR reactions, because each primes a different template fragment.
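The two-part primer convention can be illustrated by working backward from the assembled top strand (left overlap + insert + right overlap). This is a simplified sketch: it uses a fixed priming length, whereas the draft describes Tm-aware windows, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class TwoPartPrimer:
    overlap_5prime: str  # non-priming 5' tail that carries the Gibson overlap
    priming_3prime: str  # 3' segment that anneals to the source template

    @property
    def full_sequence(self):
        return self.overlap_5prime + self.priming_3prime


def forward_primer(left_overlap, template, priming_len):
    """Forward primer: left overlap tail plus the first bases of the template."""
    return TwoPartPrimer(left_overlap.upper(), template[:priming_len].upper())


def reverse_primer(right_overlap, template, priming_len):
    """Reverse primer: both parts are reverse complements of the top strand."""
    return TwoPartPrimer(
        reverse_complement(right_overlap),
        reverse_complement(template[-priming_len:]),
    )
```

Keeping the two segments separate means priming-side Tm can be evaluated on priming_3prime alone, while the overlap tail is judged against the junction.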
Assembly setup heuristics (draft advisory layer):
- linearized destination can be prepared by PCR amplification or by restriction digestion
- PCR cleanup is not always required, but carryover should stay modest relative to the final assembly reaction volume
- column purification is especially worth recommending for:
- assemblies of three or more PCR fragments
- assemblies involving fragments longer than ~5 kb
- direct vector + insert assemblies often benefit from insert concentration above vector concentration
- multi-fragment vector assemblies should generally move toward equimolar fragment usage
- some constructs may validate in silico but still perform poorly because of biological burden or instability in the propagation host (for example repeats or toxic products)
Planning implication:
- these factors should usually surface as derived_design advisories or future execution/setup guidance, not as hard failures in the core Gibson junction model
Purpose:
- provide one deterministic, non-mutating preview response for the current Gibson specialist flow,
- keep GUI, shared shell, and direct CLI on the same overlap/primer/cartoon derivation path.
Current shared entry points:
- gibson preview PLAN_JSON_OR_@FILE [--output OUTPUT.json]
- gibson apply PLAN_JSON_OR_@FILE
- GUI specialist window: Patterns -> Gibson...
Top-level structure:
- schema, plan_id, title, summary
- can_execute
- destination
  - resolved opening mode/span or cutpoint and actual topology
- fragments[]
  - resolved ordered insert rows (fragment id, template seq id, orientation, length)
- insert
  - compatibility mirror of the first insert row for older single-insert consumers
- resolved_junctions[]
  - overlap bp, left/right member contributions, overlap Tm, resolved overlap sequence, source note
- primer_suggestions[]
  - full primer sequence plus explicit overlap_5prime and priming_3prime segments
- warnings[], errors[], notes[]
  - includes the shared Tₘ-model note used by GUI/CLI so the assumptions stay visible to the user
  - notes also carry explicit design-review guidance that separates:
    - overlap-side success/failure
    - PCR 3' priming-side success/failure
    so adapters can explain when the current blocker is priming rather than Gibson overlap derivation
- suggested_design_adjustments[]
  - optional structured next-step relaxations when overlap derivation already succeeds and the remaining blocker is only the 3' priming window
  - current v1 targets:
    - increasing priming_segment_max_length_bp
    - lowering priming_segment_tm_min_celsius
  - intended for adapters to offer deterministic “apply and rerun preview” actions without parsing prose notes
- unique_restriction_site
  - optional structured outcome for a requested validation_policy.desired_unique_restriction_site_enzyme_name
  - reports whether the requested site was:
    - already unique in the assembled product
    - newly engineered on one terminal overlap
  - carries the enzyme name, terminal side/junction, engineered overlap sequence, motif offset, mutation count, and user-facing message so adapters do not have to infer this from notes/error prose
- cartoon
  - built-in protocol id plus template bindings for single-insert flows
  - multi-insert previews may instead carry one fully resolved ProtocolCartoonSpec directly
  - intended to stay mechanism-first:
    - show resolved fragment flow and achieved homology/overlap relationships
    - preserve strand-specific 5' chew-back / exposed-tail geometry rather than flattening the mechanism to duplex-only blocks
    - avoid drawing full primer objects or low-level PCR parameterization inside the cartoon itself
    - keep primer sequences, priming segments, Tm assumptions, and related PCR details in adjacent textual/review payloads instead
- routine_handoff
  - best-effort Routine Assistant handoff metadata for existing execution paths
Current v1 scope and limits:
- one or more insert fragments in an explicit ordered chain
- destination-first order: destination_left -> insert_1 -> ... -> insert_n -> destination_right
- the shared preview derives n + 1 explicit Gibson junctions for n inserts
- terminal overlaps are derived from destination context; internal junctions are normalized from the adjacent fragment ends / partition rules
- user influence over PCR design stays high-level and Gibson-specific: overlap bp range, minimum overlap Tm, priming-segment Tm window, and priming-segment length window
- current execution limitation:
  - multi-insert apply currently requires destination.opening.mode=defined_site
  - existing_termini remains the single-fragment path used by the current Routine Assistant handoff
- current unique-site engineering limitation:
  - only the single-insert defined_site path is supported
  - only palindromic cutter recognition sequences are currently handled
  - overlap windows must be non-wrapping in the displayed destination sequence
- current Tₘ fields use the shared GENtle nearest-neighbor estimate with fixed assumptions:
  - exact complement
  - 50 mM monovalent salt
  - 250 nM total oligo concentration
  - no mismatch/dangling-end/Mg correction
  - fallback to the simple 2/4 estimate for ambiguous or very short sequences
- generic PCR/qPCR request editing is intentionally out of scope for this specialist flow
- mutating execution now exists as engine operation ApplyGibsonAssemblyPlan:
  - consumes the same plan JSON
  - creates deterministic sequence outputs for:
    - left insert primer
    - right insert primer
    - assembled product
  - creates one shared serial arrangement for downstream gel review:
    - original destination vector
    - ordered insert lane(s)
    - assembled product
    - recommended DNA ladders carried with the arrangement for flanking export
  - transfers destination and insert features onto the assembled product deterministically through the shared engine path
  - destination features intersecting the consumed opening are now projected when a truthful rewrite is available:
    - one-sided overlaps are trimmed to the surviving product span
    - simple spanning features can survive as multipart remnants
    - MCS-like annotations are projected to the edited locus and revalidated against actual restriction-enzyme sites on the assembled product
  - the MCS cross-check is product-aware:
    - mcs_expected_sites is rewritten to the currently unique cutter set for that annotated region on the assembled product
    - mcs_expected_sites_original, mcs_region_sites, mcs_nonunique_sites, mcs_gained_unique_sites, and mcs_lost_or_nonunique_sites preserve the cross-check result
    - insert-derived sequence may introduce new sites, and those new sites are considered during the same validation pass
  - records one operation-log row so GUI lineage/CLI state replay can reopen the specialist from the saved plan without silently re-running it
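The "simple 2/4 estimate" named as the Tₘ fallback above is commonly written as 2 °C per A/T base plus 4 °C per G/C base. A minimal sketch follows; the treatment of ambiguous bases (they contribute nothing) and the short-sequence cutoff are illustrative assumptions, not values taken from GENtle.

```python
def two_four_tm_celsius(sequence):
    """Simple 2/4 estimate: 2 degrees C per A/T plus 4 degrees C per G/C."""
    seq = sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    # Assumption: ambiguous bases (N, R, Y, ...) are not counted at all.
    return 2.0 * at_count + 4.0 * gc_count


def needs_fallback(sequence, short_cutoff_bp=8):
    """Decide whether the fallback applies; short_cutoff_bp is a hypothetical threshold."""
    seq = sequence.upper()
    has_ambiguity = any(base not in "ACGT" for base in seq)
    return has_ambiguity or len(seq) < short_cutoff_bp
```

For example, two_four_tm_celsius("GGCCAT") counts four G/C and two A/T bases, giving 4 * 4 + 2 * 2 = 20.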
Normalization/derivation phases:
- Resolve the destination opening into explicit dest_left/dest_right terminal context.
  - for cutter-derived openings, the resolved coordinates represent the actual cleavage window between the recessed termini rather than the whole recognition span
  - equal start/end is therefore valid and means a blunt cutpoint
- Normalize assembly_order[] into one adjacency chain.
  - when fragments carry compatible source_span_1based hints for one source context, default normalization should preserve ascending bp order
- Materialize one junction per adjacent pair in that chain.
- Derive required overlap sequences from destination flanks and/or adjacent fragment termini.
  - respect the junction-specific overlap partition when choosing the final overlap sequence around that adjacency
  - internal multi-fragment junctions may instead use designed bridge sequences introduced by primer-added overlaps
- Detect whether each fragment end already satisfies its required overlap or requires adaptation (for example primer-added tails).
- Run hard validation and advisory design checks.
- Expose derived overlaps, primer design suggestions, and cartoon-ready event semantics through derived_design.
- Attach reaction/setup advisories (cleanup, stoichiometry, host-risk notes) without conflating them with the hard overlap/junction logic.
Current invariants for the draft model:
- assembly_order[] defines the intended adjacency order explicitly.
- junctions[] should cover every adjacent pair in assembly_order[].
- terminal junctions are the ones adjacent to the opened destination ends.
- junctions[].overlap_partition.left_member_bp + junctions[].overlap_partition.right_member_bp should equal junctions[].required_overlap_bp.
- terminal junction distinctness is a hard validation rule for opened destination-vector Gibson plans.
- when source-order hints are present and no contrary manual order is given, low bp positions should precede high bp positions in normalization.
- destination-opening uniqueness is a hard validation rule.
- broader destination/fragment/genome uniqueness checks are advisory unless a stricter policy is requested.
- derived_design may contain unresolved/null sequences at pure planning time; this allows the same schema to exist before sequence extraction or primer design has been run.
- junctions[].distinct_from is currently intended primarily for terminal destination-defined junctions, not as a requirement that every internal fragment-fragment junction be globally unique.
- native_overlap is an expectation about the fragment terminus; it still requires sequence confirmation in validation/derivation.
- designed_bridge_sequence should be treated as a designed internal overlap, suitable for primer-stitching style workflows; GENtle should validate it for distinctness/design heuristics rather than treating it as biologically privileged just because it was user-supplied.
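Two of the invariants above (junction coverage of every adjacent pair, and the overlap-partition sum) lend themselves to a mechanical check. The sketch below assumes hypothetical member-reference fields (id on assembly_order entries, left_member_id and right_member_id on junctions); only required_overlap_bp and overlap_partition.left_member_bp/right_member_bp are taken from the draft vocabulary.

```python
def check_plan_invariants(plan):
    """Return a list of human-readable invariant violations (empty when clean)."""
    errors = []
    # Adjacent pairs implied by the explicit left-to-right order.
    order = [member["id"] for member in plan["assembly_order"]]
    expected_pairs = list(zip(order, order[1:]))
    covered = {
        (j["left_member_id"], j["right_member_id"]) for j in plan["junctions"]
    }
    for left, right in expected_pairs:
        if (left, right) not in covered:
            errors.append(f"missing junction for {left} -> {right}")
    # The partition must account for the whole required overlap.
    for j in plan["junctions"]:
        partition = j["overlap_partition"]
        total = partition["left_member_bp"] + partition["right_member_bp"]
        if total != j["required_overlap_bp"]:
            errors.append(
                f"junction {j['left_member_id']} -> {j['right_member_id']}: "
                f"partition sums to {total}, expected {j['required_overlap_bp']}"
            )
    return errors
```

For n inserts between two destination ends, expected_pairs has n + 1 entries, matching the n + 1 junctions derived by the shared preview.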
Purpose:
- provide one deterministic, non-mutating inspection payload for prepared reference/helper cache roots,
- let GUI, shared shell, and direct CLI report exactly what GENtle created locally before any deletion happens.
Current shared entry points:
- cache inspect [--references|--helpers|--both] [--cache-dir PATH ...]
- GUI specialist window:
  - Genome -> Clear Caches...
  - Prepared References... -> Clear Caches...
Top-level structure:
- schema, cache_roots[]
- entries[]
- entry_count, total_size_bytes, total_file_count
Entry structure:
- entry_id
- classification
  - prepared_install
  - orphaned_remnant
- cache_root, path
- artifact_stats[]
  - group
    - cached_sources
    - derived_indexes
    - blast_db
  - total_size_bytes
  - file_count
- total_size_bytes, file_count
Inspection rules:
- inspection stays rooted in the selected cache roots only
- default roots are adapter-facing conventions:
  - references: data/genomes
  - helpers: data/helper_genomes
- orphaned remnants are inspectable even when they are not backed by a manifest
- catalog JSON, project state files, MCP/runtime files, backdrop/runtime caches, and developer build artifacts are out of scope
Purpose:
- provide one deterministic cleanup result payload for conservative prepared cache deletion workflows,
- keep partial rebuild/reindex cleanup and full prepared-install deletion on the same shared contract across GUI/CLI/shell.
Current shared entry points:
- cache clear blast-db-only|derived-indexes-only|selected-prepared|all-prepared-in-cache ...
- GUI specialist window: Genome -> Clear Caches...
Top-level structure:
- schema, mode, cache_roots[]
- selected_prepared_ids[]
- selected_prepared_paths[]
- include_orphaned_remnants
- results[]
- entry_count, removed_item_count, removed_bytes, removed_file_count
Per-entry result structure:
- entry_id
- classification
  - prepared_install
  - orphaned_remnant
- cache_root, path
- removed
- removed_artifact_groups[]
- removed_bytes, removed_file_count
- skipped_reason?
Cleanup modes:
- blast_db_only
  - remove only BLAST DB sidecars for selected manifest-backed installs
- derived_indexes_only
  - remove BLAST DB sidecars plus sequence.fa.fai and genes.json
  - cached sources and manifests remain so reindex-from-cached-files still works
- selected_prepared_installs
  - remove only explicitly selected prepared installs
  - optional include_orphaned_remnants also removes orphaned remnants under the same selected roots
- all_prepared_in_cache
  - remove all prepared installs under the selected roots
  - optional include_orphaned_remnants extends that deletion to orphaned remnants
Cleanup rules:
- blast_db_only and derived_indexes_only apply only to manifest-backed prepared installs
- selective cleanup modes accept either selected_prepared_ids[] or selected_prepared_paths[]
- selected_prepared_paths[] are the precise selector when duplicate prepared ids exist across multiple selected cache roots
- orphaned remnants can only be deleted through the full-delete modes
- cleanup never scans the whole workspace; it only touches the selected roots
- cleanup does not treat catalog JSON, .gentle_state.json, MCP/runtime files, backdrop/runtime caches, or target/ as cache
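The cleanup modes and rules above amount to a small decision table per cache entry. The sketch below is one possible reading, not the engine logic: it assumes entry selection has already happened, models a full delete as removing all three artifact groups, and uses hypothetical skipped_reason strings.

```python
ALL_GROUPS = ["cached_sources", "derived_indexes", "blast_db"]

# Partial modes remove only derived artifacts and keep cached sources.
PARTIAL_MODES = {
    "blast_db_only": ["blast_db"],
    "derived_indexes_only": ["blast_db", "derived_indexes"],
}
FULL_DELETE_MODES = {"selected_prepared_installs", "all_prepared_in_cache"}


def plan_entry_cleanup(mode, classification, include_orphaned_remnants=False):
    """Return (removed_artifact_groups, skipped_reason) for one cache entry."""
    if mode in PARTIAL_MODES:
        if classification != "prepared_install":
            # Partial cleanup applies only to manifest-backed prepared installs.
            return [], "partial cleanup requires a manifest-backed install"
        return list(PARTIAL_MODES[mode]), None
    if mode in FULL_DELETE_MODES:
        if classification == "orphaned_remnant" and not include_orphaned_remnants:
            return [], "orphaned remnants require include_orphaned_remnants"
        return list(ALL_GROUPS), None
    raise ValueError(f"unknown cleanup mode: {mode}")
```

Returning a skip reason instead of raising mirrors the optional skipped_reason field in the per-entry result structure.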
{
"sequences": {"seq_id": "DNAsequence object"},
"metadata": {"any": "json"},
"display": {"ui_visibility_tfbs_and_linear_viewport_state": "..."},
"lineage": {"nodes": {}, "edges": []},
"parameters": {"max_fragments_per_container": 80000},
"container_state": {
"containers": {},
"arrangements": {},
"racks": {},
"seq_to_latest_container": {}
}
}
Semantic interpretation:
- In GUI terms, a project window represents a wet-lab container context.
- A container may map to multiple candidate sequences/fragments.
- Explicit container objects are first-class state (container_state) and are indexed from sequence ids via seq_to_latest_container.
- Containers now also record declared_contents_exclusive:
  - true (default): the declared members are intended to be the full known contents of a clean vial/tube
  - false: the declared members are measured/known constituents of a more complex sample, and additional unlisted molecules may also be present
- Arrangements stay the semantic experiment-order layer.
- Racks are the linked physical placement layer and may host one or more arrangements without changing arrangement identity.
- RackProfileKind
  - built-in physical carriers:
    - small_tube_4x6
    - plate_96
    - plate_384
  - persisted custom snapshots use:
    - custom
- RackProfileSnapshot
  - persisted row/column/fill-direction/blocked-slot snapshot used by one saved rack
  - fill_direction
    - row_major
    - column_major
  - blocked_coordinates[]
    - normalized A1-style coordinate list
- Rack
  - one saved physical rack/plate draft
- RackPlacementEntry
  - one occupied A1-style coordinate on that rack
  - points back to:
    - arrangement_id
    - arrangement-local order_index
    - one occupant
- RackOccupant
  - container
  - ladder_reference
Rack-placement invariants:
- rack placement consumes arrangement order instead of duplicating experiment meaning in a second free-form list
- default placement is deterministic:
- choose the smallest fitting built-in profile
- fill row-major
- use A1-style coordinates
- saved rack snapshots may then refine physical layout with:
  - fill_direction = row_major|column_major
  - blocked_coordinates[]
- A1-style row labels continue beyond Z as AA, AB, ...
- moving one sample or arrangement block is shift-neighbor by default; it preserves occupied order instead of creating arbitrary holes
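The default-placement invariants above (smallest fitting built-in profile, row-major fill, A1-style coordinates with rows continuing past Z) can be sketched as follows. The profile dimensions are assumptions: 4 rows x 6 columns for small_tube_4x6, and the conventional 8 x 12 and 16 x 24 layouts for plate_96 and plate_384. Function names are hypothetical.

```python
# (profile name, rows, columns); dimensions are assumptions, see lead-in.
BUILTIN_PROFILES = [
    ("small_tube_4x6", 4, 6),
    ("plate_96", 8, 12),
    ("plate_384", 16, 24),
]


def row_label(index):
    """Spreadsheet-style row label: 0 -> A, 25 -> Z, 26 -> AA, 27 -> AB."""
    label = ""
    n = index + 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def smallest_fitting_profile(occupant_count):
    for name, rows, columns in BUILTIN_PROFILES:
        if occupant_count <= rows * columns:
            return name, rows, columns
    raise ValueError("no built-in profile fits; a custom profile is needed")


def default_placement(occupant_count, fill_direction="row_major"):
    """Return (profile_name, coordinates) for occupants in arrangement order."""
    name, rows, columns = smallest_fitting_profile(occupant_count)
    if fill_direction == "row_major":
        cells = [(r, c) for r in range(rows) for c in range(columns)]
    elif fill_direction == "column_major":
        cells = [(r, c) for c in range(columns) for r in range(rows)]
    else:
        raise ValueError(f"unknown fill_direction: {fill_direction}")
    coordinates = [f"{row_label(r)}{c + 1}" for r, c in cells[:occupant_count]]
    return name, coordinates
```

Because coordinates are generated from arrangement order alone, the same input always yields the same placement, which is what makes the default deterministic.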
Purpose:
- provide one deterministic inspection payload for saved rack state
- keep GUI rack view and CLI/shell inspection on one shared state contract
Current shared entry point:
- racks show RACK_ID
Top-level structure:
- schema
- rack
- placements[]
Placement payload:
- coordinate
- arrangement_id
- order_index
- role_label
- occupant
  - kind = container
    - container_id
    - container_name?
    - seq_id?
  - kind = ladder_reference
    - ladder_name
  - kind = empty
Current draft operations:
- LoadFile { path, as_id? }
- SaveFile { seq_id, path, format }
- RenderSequenceSvg { seq_id, mode, path }
  - linear exports honor the current stored linear viewport in display (linear_view_start_bp / linear_view_span_bp) when that viewport is a proper subsequence crop
  - single-base variation features render as baseline markers in linear SVG output rather than as generic detached feature blocks
  - linear exports now also mark transcription starts/directions for strand-bearing gene/mRNA/CDS/promoter features and suppress unlabeled fallback coordinate text that would otherwise clutter figure-oriented exports
  - linear exports also prefer gene-style labels over accession-only transcript ids when possible and compact nearby repeated non-gene labels
  - direction-bearing mRNA/promoter bars render with arrowed ends, and the linear TSS cue uses a short hooked arrow so direction survives figure-oriented contexts
  - circular exports now use a transparent canvas and render single-base variation features as explicit radial markers on the DNA ring
  - circular exports also mark transcription starts for strand-bearing gene/mRNA/CDS/promoter features with a short arrow shaft plus direction arrowhead
  - circular exports also use a slightly larger ring and larger label fonts so figure-oriented construct maps stay readable when embedded in docs
- RenderDotplotSvg { seq_id, dotplot_id, path, flex_track_id?, display_density_threshold?, display_intensity_gain? }
- RenderFeatureExpertSvg { seq_id, target, path }
  - shared renderer contract across GUI/CLI/JS/Lua for TFBS/restriction/splicing/isoform expert exports
  - splicing SVG includes explicit junction-support counts, frequency-encoded transcript-vs-exon matrix coloring, predicted exon->exon transition matrix support coloring, exon len%3 (genomic-length modulo 3) cues, and CDS flank phase edge coloring (0/1/2) when transcript cds_ranges_1based are available
- RenderIsoformArchitectureSvg { seq_id, panel_id, path }
- RenderRnaStructureSvg { seq_id, path }
- RenderLineageSvg { path }
- RenderPoolGelSvg { inputs, path, ladders?, container_ids?, arrangement_id?, conditions? }
- CreateArrangementSerial { container_ids, arrangement_id?, name?, ladders? }
- SetArrangementLadders { arrangement_id, ladders? }
- CreateRackFromArrangement { arrangement_id, rack_id?, name?, profile? }
- PlaceArrangementOnRack { arrangement_id, rack_id }
- MoveRackPlacement { rack_id, from_coordinate, to_coordinate, move_block? }
- MoveRackSamples { rack_id, from_coordinates[], to_coordinate }
- MoveRackArrangementBlocks { rack_id, arrangement_ids[], to_coordinate }
- SetRackProfile { rack_id, profile }
- ApplyRackTemplate { rack_id, template }
- SetRackFillDirection { rack_id, fill_direction }
- SetRackProfileCustom { rack_id, rows, columns }
- SetRackBlockedCoordinates { rack_id, blocked_coordinates }
- ExportRackLabelsSvg { rack_id, path, arrangement_id?, preset }
- ExportRackFabricationSvg { rack_id, path, template }
- ExportRackIsometricSvg { rack_id, path, template }
- ExportRackOpenScad { rack_id, path, template }
- ExportRackCarrierLabelsSvg { rack_id, path, arrangement_id?, template, preset }
- ExportRackSimulationJson { rack_id, path, template }
- RenderProtocolCartoonSvg { protocol, path }
- RenderProtocolCartoonTemplateSvg { template_path, path }
- ValidateProtocolCartoonTemplate { template_path }
- RenderProtocolCartoonTemplateWithBindingsSvg { template_path, bindings_path, path }
- ExportProtocolCartoonTemplateJson { protocol, path }
- ExportDnaLadders { path, name_filter? }
- ExportRnaLadders { path, name_filter? }
- ExportPool { inputs, path, pool_id?, human_id? }
- ExportProcessRunBundle { path, run_id? }
- Digest { input, enzymes, output_prefix? }
- Ligation { inputs, circularize_if_possible, protocol, output_id?, output_prefix?, unique? }
- MergeContainers { inputs, output_prefix? }
- Pcr { template, forward_primer, reverse_primer, output_id?, unique? }
- PcrAdvanced { template, forward_primer, reverse_primer, output_id?, unique? }
- PcrMutagenesis { template, forward_primer, reverse_primer, mutations, output_id?, unique?, require_all_mutations? }
- DesignPrimerPairs { ... } (implemented baseline)
- PcrOverlapExtensionMutagenesis { ... } (implemented baseline; insertion/deletion/replacement overlap-extension flow)
- DesignQpcrAssays { ... } (implemented baseline; forward/reverse/probe)
- ComputeDotplot { seq_id, reference_seq_id?, span_start_0based?, span_end_0based?, reference_span_start_0based?, reference_span_end_0based?, mode, word_size, step_bp, max_mismatches?, tile_bp?, store_as? } (implemented baseline, self + pairwise)
- ComputeFlexibilityTrack { seq_id, span_start_0based?, span_end_0based?, model, bin_bp, smoothing_bp?, store_as? } (implemented baseline)
- DeriveSplicingReferences { seq_id, span_start_0based, span_end_0based, seed_feature_id?, scope?, output_prefix? } (implemented baseline; emits derived DNA window + mRNA isoforms + exon-reference sequence)
- AlignSequences { query_seq_id, target_seq_id, query_span_start_0based?, query_span_end_0based?, target_span_start_0based?, target_span_end_0based?, mode?, match_score?, mismatch_score?, gap_open?, gap_extend? } (implemented baseline; returns structured pairwise local/global report in OpResult.sequence_alignment)
- ImportSequencingTrace { path, trace_id?, seq_id? } (implemented baseline; imports one ABI/AB1 or SCF evidence file into the shared sequencing-trace store without mutating construct sequences)
- ListSequencingTraces { seq_id? }
- ShowSequencingTrace { trace_id }
- ConfirmConstructReads { expected_seq_id, baseline_seq_id?, read_seq_ids?, trace_ids?, targets?, alignment_mode?, match_score?, mismatch_score?, gap_open?, gap_extend?, min_identity_fraction?, min_target_coverage_fraction?, allow_reverse_complement?, report_id? } (implemented baseline; accepts already-loaded read sequences and/or imported sequencing traces as evidence inputs into one shared confirmation report, with optional baseline context for intended-edit vs reversion classification)
- InterpretRnaReads { seq_id, seed_feature_id, profile, input_path, input_format, scope, origin_mode?, target_gene_ids?, roi_seed_capture_enabled?, seed_filter, align_config, report_id?, report_mode?, checkpoint_path?, checkpoint_every_reads?, resume_from_checkpoint? } (Nanopore cDNA phase-1 seed-filter pass; multi_gene_sparse expands local transcript-template indexing, while ROI capture remains planned)
- AlignRnaReadReport { report_id, selection, align_config_override?, selected_record_indices? } (Nanopore cDNA phase-2 retained-hit alignment pass; updates mapping/MSA/abundance report fields and re-ranks retained hits by alignment-aware retention rank)
- ListRnaReadReports { seq_id? }
- ShowRnaReadReport { report_id }
- ExportRnaReadReport { report_id, path }
- ExportRnaReadHitsFasta { report_id, path, selection, selected_record_indices?, subset_spec? }
- ExportRnaReadSampleSheet { path, seq_id?, report_ids?, gene_ids?, complete_rule?, append? }
- ExportRnaReadExonPathsTsv { report_id, path, selection, selected_record_indices?, subset_spec? }
- ExportRnaReadExonAbundanceTsv { report_id, path, selection, selected_record_indices?, subset_spec? }
- ExportRnaReadScoreDensitySvg { report_id, path, scale, variant }
- ExportRnaReadAlignmentsTsv { report_id, path, selection, limit?, selected_record_indices?, subset_spec? }
- ExportRnaReadAlignmentDotplotSvg { report_id, path, selection, max_points }
- ExtractRegion { input, from, to, output_id? }
- PrepareGenome { genome_id, catalog_path?, cache_dir?, timeout_seconds? }
- ExtractGenomeRegion { genome_id, chromosome, start_1based, end_1based, output_id?, annotation_scope?, max_annotation_features?, include_genomic_annotation?, catalog_path?, cache_dir? }
  - annotation_scope accepts none|core|full and defaults to core when omitted.
  - max_annotation_features is an optional safety cap (0 or omitted = unlimited for explicit requests).
  - legacy
include_genomic_annotationis still accepted (true->core,false->none) for compatibility. - operation results include
genome_annotation_projectiontelemetry (requested/effective scope, feature counts, fallback metadata). - for helper genome IDs containing
pUC18/pUC19, the engine applies a deterministic fallback MCSmisc_featureannotation when source annotation does not already include an MCS feature and exactly one canonical MCS motif is found. - source-derived and fallback MCS features expose
mcs_expected_siteswith REBASE-normalized enzyme names when recognizable.
- `ExtractGenomeGene { genome_id, gene_query, occurrence?, output_id?, extract_mode?, promoter_upstream_bp?, annotation_scope?, max_annotation_features?, include_genomic_annotation?, catalog_path?, cache_dir? }`
  - `annotation_scope` accepts `none|core|full` and defaults to `core` when omitted.
  - `max_annotation_features` is an optional safety cap (`0` or omitted = unlimited for explicit requests).
  - legacy `include_genomic_annotation` is still accepted (`true` -> `core`, `false` -> `none`) for compatibility.
  - operation results include `genome_annotation_projection` telemetry (requested/effective scope, feature counts, fallback metadata).
  - for helper genome IDs containing `pUC18`/`pUC19`, the same deterministic MCS fallback annotation behavior applies when an MCS feature is missing; non-unique motif matches are warned and skipped.
- `ExtendGenomeAnchor { seq_id, side, length_bp, output_id?, catalog_path?, cache_dir?, prepared_genome_id? }`
- `VerifyGenomeAnchor { seq_id, catalog_path?, cache_dir?, prepared_genome_id? }`
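The annotation-scope rules shared by `ExtractGenomeRegion` and `ExtractGenomeGene` can be sketched as a small resolver. This is an illustrative Python sketch, not engine code: both function names are hypothetical, and the rule that an explicit `annotation_scope` wins over the legacy flag when both are supplied is an assumption that the notes above do not state.

```python
from typing import Optional

VALID_SCOPES = ("none", "core", "full")


def resolve_annotation_scope(
    annotation_scope: Optional[str] = None,
    include_genomic_annotation: Optional[bool] = None,
) -> str:
    """Return the effective scope for one extraction request."""
    if annotation_scope is not None:
        if annotation_scope not in VALID_SCOPES:
            raise ValueError(f"unknown annotation_scope: {annotation_scope!r}")
        # Assumption: the explicit scope takes precedence over the legacy flag.
        return annotation_scope
    if include_genomic_annotation is not None:
        # Legacy mapping from the notes: true -> core, false -> none.
        return "core" if include_genomic_annotation else "none"
    return "core"  # documented default when nothing is supplied


def effective_feature_cap(max_annotation_features: Optional[int] = None) -> Optional[int]:
    """Return None for "unlimited" (0 or omitted), else the requested cap."""
    if not max_annotation_features:
        return None
    return max_annotation_features
```

A caller that still sends `include_genomic_annotation=false` therefore resolves to `none`, while a request with neither field resolves to `core`.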
Catalog-backed reference/helper discovery notes:
- shared shell/CLI discovery commands `genomes list` and `helpers list` accept optional `--catalog PATH` and `--filter TEXT`
- `PATH` may point to either one JSON catalog file or a directory of top-level `*.json` fragments; directory fragments are merged deterministically by sorted filename and duplicate entry ids fail fast
- when `catalog_path` is omitted, the engine now resolves a deterministic discovery chain in this order:
  - built-in catalog file plus optional built-in fragment directory
  - system overlay file/directory under `/etc/gentle/catalogs/`
  - user overlay file/directory under `$XDG_CONFIG_HOME/gentle/catalogs/` or `~/.config/gentle/catalogs/`
  - project overlay file/directory under `PROJECT_ROOT/.gentle/catalogs/`
- the root locations for built-in/system/project discovery may be overridden in controlled environments via `GENTLE_ASSET_ROOT`, `GENTLE_SYSTEM_CONFIG_ROOT`, and `GENTLE_PROJECT_ROOT`
- adapters that need to preserve "use default discovery" intent through persisted operation/provenance records may emit `gentle://catalog/reference/default` or `gentle://catalog/helper/default`
- list results now include both the stable entry id array and richer `entries` metadata rows so frontends, agents, and ClawBio integrations can search and display the same catalog facts without re-encoding them
- helper/reference catalog entries may now carry typed discovery metadata such as `summary`, `aliases`, `tags`, `search_terms`, `species`, `helper_kind`, `host_system`, `procurement`, and optional structured `semantics`
- helper-list/status routes may now also expose an engine-owned normalized `interpretation` record derived from those helper fields:
  - `helper_id`
  - `description`, `summary`, `aliases`
  - `helper_kinds`, `host_systems`
  - `offered_functions`, `constraints`
  - `procurement_channels`, `local_variant_unpublished`
  - deterministic `components[]` and `relationships[]`
- that metadata is intended to stay compatible with the emerging reasoning/constraint engine and with later ontology-backed helper/vector descriptions rather than forcing a future rewrite of catalog records
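The directory-fragment rule in the notes above (merge by sorted filename, fail fast on duplicate entry ids) could be sketched as follows. This is a minimal illustration under a stated assumption: each catalog file is taken to be a JSON object keyed by entry id, which the notes do not confirm, and `load_catalog` is a hypothetical name rather than an engine function.

```python
import json
from pathlib import Path


def load_catalog(path):
    """Load one catalog file, or merge the top-level *.json fragments of a directory."""
    path = Path(path)
    if path.is_file():
        files = [path]
    else:
        # glob("*.json") is non-recursive, so only top-level fragments are read.
        files = sorted(path.glob("*.json"), key=lambda p: p.name)
    merged = {}
    for file in files:
        # Assumption: a fragment is a JSON object mapping entry id -> entry.
        entries = json.loads(file.read_text(encoding="utf-8"))
        for entry_id, entry in entries.items():
            if entry_id in merged:
                raise ValueError(
                    f"duplicate catalog entry id {entry_id!r} in {file.name}"
                )
            merged[entry_id] = entry
    return merged
```

Sorting by filename keeps the merged entry order independent of filesystem listing order, which is what makes the result deterministic.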
Sequencing-trace evidence notes:
- raw traces are stored separately from `SequencingConfirmationReport` payloads; importing a trace does not run confirmation and does not mutate any sequence entry
- `ImportSequencingTrace` currently auto-detects:
  - ABIF/AB1 via `ABIF` magic bytes
  - SCF via `.scf` magic bytes
- stored `SequencingTraceRecord` payloads preserve:
  - file-supplied called bases
  - called-base confidence arrays when available
  - peak locations when available
  - raw per-channel intensity arrays when available
  - compact per-channel trace-length summaries
  - optional clip window metadata when present in the source file
  - optional sample/run/machine metadata when present in the source file
- trace-aware confirmation now reuses the same `SequencingConfirmationReport` model:
  - `ConfirmConstructReads` accepts `trace_ids` in addition to `read_seq_ids`
  - `ConfirmConstructReads` accepts optional `baseline_seq_id` so the expected construct remains primary truth while baseline context can distinguish intended edits from reference reversions
  - per-evidence rows expose evidence kind plus optional `trace_id`
  - target support/contradiction ids may now refer to imported trace ids when traces provide the relevant evidence
- report payloads now include:
  - `baseline_seq_id?`
  - per-target `expected_bases?` / `baseline_bases?` for expected-edit loci
  - `variants[]` rows with observed allele, evidence id, confidence summary, peak center, and classification: `expected_match|intended_edit_confirmed|reference_reversion|unexpected_difference|low_confidence_or_ambiguous|insufficient_evidence`
- persisted confirmation reports now project as lineage analysis artifacts in both the GUI lineage workspace and shared `RenderLineageSvg` export: nodes are keyed by `report_id`, attach to the expected construct plus optional baseline/reference sequence, and reopen the sequencing-confirmation specialist on that stored report in GUI adapters
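The magic-byte auto-detection mentioned in the notes above can be illustrated with a short sketch. It assumes the magic bytes sit at offset 0 of the file, and `detect_trace_format` is a hypothetical helper, not the engine's importer.

```python
def detect_trace_format(header: bytes) -> str:
    """Classify a sequencing trace from its leading bytes."""
    if header.startswith(b"ABIF"):
        return "abif"  # ABIF/AB1 container
    if header.startswith(b".scf"):
        return "scf"  # Standard Chromatogram Format
    raise ValueError("unsupported sequencing trace format")
```

Reading only the first four bytes of the file is enough to feed this check, so detection does not depend on the file extension.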
- `SuggestSequencingPrimers { expected_seq_id, primer_seq_ids[], confirmation_report_id?, min_3prime_anneal_bp, predicted_read_length_bp }`
  - non-mutating helper for sequencing-confirmation review and primer coverage planning
  - `primer_seq_ids[]` may be empty when `confirmation_report_id` is present: that mode proposes fresh sequencing primers for unresolved loci using the expected construct plus the saved report context
  - returns `SequencingPrimerOverlayReport` with per-hit orientation, anneal span, predicted read span, optional coverage annotations against a persisted sequencing-confirmation report, per-problem guidance rows naming the best existing primer hit for unresolved targets or variant loci, and `proposals[]` rows for fresh primer candidates when no good existing hit is available
- `ImportBlastHitsTrack { seq_id, hits[], track_name?, clear_existing?, blast_provenance? }`
  - optional `blast_provenance` payload preserves invocation context (`genome_id`, `query_label`, `query_length`, `max_hits`, `task`, `blastn_executable`, `blast_db_prefix`, raw `command[]`, `command_line`, `catalog_path?`, `cache_dir?`, `options_override_json?`, `effective_options_json?`) for sequence-history/audit views.
- `SelectCandidate { input, criterion, output_id? }`
- `ImportIsoformPanel { seq_id, panel_path, panel_id?, strict }`
- `ImportUniprotSwissProt { path, entry_id? }`
- `FetchUniprotSwissProt { query, entry_id? }`
- `ImportUniprotEntrySequence { entry_id, output_id? }`
  - imports one first-class protein sequence plus projected UniProt feature annotations into regular project sequence state.
- `FetchGenBankAccession { accession, as_id? }`
- `FetchDbSnpRegion { rs_id, genome_id, flank_bp?, output_id?, annotation_scope?, max_annotation_features?, catalog_path?, cache_dir? }`
- `DeriveProteinSequences { seq_id, feature_ids[], scope?, output_prefix? }`
- `ReverseTranslateProteinSequence { seq_id, output_id?, speed_profile?, speed_mark?, translation_table?, target_anneal_tm_c?, anneal_window_bp? }`
- `ProjectUniprotToGenome { seq_id, entry_id, projection_id?, transcript_id? }`
- `GenerateCandidateSet { set_name, seq_id, length_bp, step_bp, feature_kinds[], feature_label_regex?, max_distance_bp?, feature_geometry_mode?, feature_boundary_mode?, feature_strand_relation?, limit? }`
- `GenerateCandidateSetBetweenAnchors { set_name, seq_id, anchor_a, anchor_b, length_bp, step_bp, limit? }`
- `DeleteCandidateSet { set_name }`
- `UpsertGuideSet { guide_set_id, guides[] }`
- `DeleteGuideSet { guide_set_id }`
- `FilterGuidesPractical { guide_set_id, config?, output_guide_set_id? }`
- `GenerateGuideOligos { guide_set_id, template_id, apply_5prime_g_extension?, output_oligo_set_id?, passed_only? }`
- `ExportGuideOligos { guide_set_id, oligo_set_id?, format: csv_table|plate_csv|fasta, path, plate_format? }`
- `ExportGuideProtocolText { guide_set_id, oligo_set_id?, path, include_qc_checklist? }`
- `ScoreCandidateSetExpression { set_name, metric, expression }`
- `ScoreCandidateSetDistance { set_name, metric, feature_kinds[], feature_label_regex?, feature_geometry_mode?, feature_boundary_mode?, feature_strand_relation? }`
- `FilterCandidateSet { input_set, output_set, metric, min?, max?, min_quantile?, max_quantile? }`
- `CandidateSetOp { op: union|intersect|subtract, left_set, right_set, output_set }`
- `ScoreCandidateSetWeightedObjective { set_name, metric, objectives[], normalize_metrics? }`
- `TopKCandidateSet { input_set, output_set, metric, k, direction?, tie_break? }`
- `ParetoFrontierCandidateSet { input_set, output_set, objectives[], max_candidates?, tie_break? }`
- `UpsertWorkflowMacroTemplate { name, description?, details_url?, parameters[], script }`
- `DeleteWorkflowMacroTemplate { name }`
- `UpsertCandidateMacroTemplate { name, description?, details_url?, parameters[], script }`
- `DeleteCandidateMacroTemplate { name }`
- `FilterByMolecularWeight { inputs, min_bp, max_bp, error, unique, output_prefix? }`
- `FilterByDesignConstraints { inputs, gc_min?, gc_max?, max_homopolymer_run?, reject_ambiguous_bases?, avoid_u6_terminator_tttt?, forbidden_motifs?, unique, output_prefix? }`
- `Reverse { input, output_id? }`
- `Complement { input, output_id? }`
- `ReverseComplement { input, output_id? }`
- `Branch { input, output_id? }`
- `SetDisplayVisibility { target, visible }`
- `SetLinearViewport { start_bp, span_bp }`
- `SetTopology { seq_id, circular }`
- `RecomputeFeatures { seq_id }`
- `SetParameter { name, value }` (purely in-silico project parameter change)
Isoform-panel operation semantics (current):
- `ImportIsoformPanel` loads curated panel resources with schema `gentle.isoform_panel_resource.v1` and binds them to one sequence context.
- `strict=true` enforces hard failure when panel transcript mapping fails; `strict=false` records warnings and keeps partial mappings.
- `RenderIsoformArchitectureSvg` emits a deterministic two-section architecture SVG (transcript/exon lanes + protein/domain lanes) derived from the same expert payload used by GUI/shell inspection.
  - when CDS ranges are available for mapped transcripts, SVG rendering uses dual coding in the top panel (faint full exons + solid CDS blocks) and adds a genome boundary rail with semi-transparent flank ribbons mapping boundary intervals to amino-acid spans on the shared protein reference axis (`1 aa ... max aa`), so mapping is readable across all protein lanes. Identical ribbons are merged and rendered once with support-weighted opacity.
- `gentle.isoform_panel_resource.v1` supports optional protein reference-span hints per isoform:
  - `reference_start_aa` (1-based inclusive)
  - `reference_end_aa` (1-based inclusive)
  - when present, protein lanes render and clip domains within this span while keeping one shared amino-acid axis across isoforms (useful for TP53 N-terminus/C-terminus class overlays).
- `gentle.isoform_panel_resource.v1` also supports panel-level transcript geometry mode:
  - `transcript_geometry_mode: exon|cds` (default `exon`)
  - `cds` renders top-panel lanes from transcript CDS segments when available, falling back to exon geometry per transcript if CDS metadata is missing.
`LoadFile` import detection semantics (current):
- deterministic probe order: `GenBank -> EMBL -> FASTA -> XML`
- XML scope: `GBSet/GBSeq` is supported
- unsupported XML dialects (for example `INSDSet/INSDSeq`) return explicit schema/dialect diagnostics
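A fixed probe order like the one above can be sketched as an ordered list of format checks. The prefix heuristics below (`LOCUS`, `ID `, `>`, `<`) are assumptions based on how those file formats conventionally begin; the engine's real probes are not described here, and `detect_import_format` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Probes run in the documented order: GenBank -> EMBL -> FASTA -> XML.
PROBES = (
    ("genbank", lambda head: head.startswith("LOCUS")),
    ("embl", lambda head: head.startswith("ID ")),
    ("fasta", lambda head: head.startswith(">")),
    ("xml", lambda head: head.startswith("<")),
)
SUPPORTED_XML_ROOTS = {"GBSet", "GBSeq"}


def detect_import_format(text: str) -> str:
    """Return the first format whose probe matches, in fixed order."""
    head = text.lstrip()
    for name, matches in PROBES:
        if not matches(head):
            continue
        if name == "xml":
            root = ET.fromstring(head).tag
            if root not in SUPPORTED_XML_ROOTS:
                # Explicit dialect diagnostic instead of a generic parse failure.
                raise ValueError(f"unsupported XML dialect: root element {root!r}")
        return name
    raise ValueError("unrecognized sequence file format")
```

Because the order is fixed, a file that could satisfy more than one probe always resolves to the same format.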
`ExtendGenomeAnchor` side semantics:
- `side` accepts `five_prime` or `three_prime`.
- Direction is contextual to anchor strand.
- On anchor strand `-`, `five_prime` increases physical genomic position.
- If the anchor genome id is not prepared exactly, the engine can auto-resolve to one compatible prepared assembly-family entry (for example `GRCh38.p14` -> `Human GRCh38 Ensembl 116`).
- If multiple compatible prepared entries exist, extension fails with a deterministic options list so caller/GUI can choose explicitly.
- `prepared_genome_id` can be passed explicitly to force a specific prepared cache and bypass compatibility auto-selection.
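The strand-contextual direction rule can be made concrete with a small coordinate sketch. It assumes 1-based inclusive genomic coordinates and clamps at position 1; both choices, and the function name `extend_anchor`, are illustrative assumptions rather than documented engine behavior.

```python
def extend_anchor(start_1based, end_1based, strand, side, length_bp):
    """Return the extended (start, end) interval in physical genomic coordinates."""
    if side not in ("five_prime", "three_prime"):
        raise ValueError(f"unknown side: {side!r}")
    if strand not in ("+", "-"):
        raise ValueError(f"unknown strand: {strand!r}")
    # three_prime on '+' and five_prime on '-' both grow toward higher positions.
    grows_right = (side == "three_prime") == (strand == "+")
    if grows_right:
        return start_1based, end_1based + length_bp
    # Assumption: extension toward lower positions is clamped at position 1.
    return max(1, start_1based - length_bp), end_1based
```

For an anchor at 1000..2000 on strand `-`, a 500 bp `five_prime` extension yields 1000..2500, while the same request on strand `+` yields 500..2000.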
`VerifyGenomeAnchor` semantics:
- Re-checks one anchored sequence against the selected prepared genome cache at recorded coordinates/strand.
- Writes one new provenance entry with `operation = VerifyGenomeAnchor` and `anchor_verified = true|false`.
- Returns an in-place state change (`changed_seq_ids`) for the same sequence id so GUI/CLI can refresh verification badges/status lines deterministically.

Local `SequenceAnchor` semantics (distinct from genome provenance anchoring):
- `SequenceAnchor` currently supports:
  - `Position { zero_based }`
  - `FeatureBoundary { feature_kind?, feature_label?, boundary, occurrence? }`
- `boundary` accepts `Start`, `End`, or `Middle`.
- This anchor model resolves in-sequence positions and is used for in-silico extraction/scoring workflows (`ExtractAnchoredRegion`, `GenerateCandidateSetBetweenAnchors`).
Adapter utility contracts (current, non-engine operations):

For narrative/operator guidance on when to use CLI, MCP, Agent Assistant, or an external coding agent runtime, see:
- `docs/agent_interfaces_tutorial.md`

- `help [COMMAND ...] [--format text|json|markdown] [--interface ...]`
  - backed by structured glossary source `docs/glossary.json`
  - `--format text` renders human-readable help
  - `--format json` renders machine-readable help catalog/topic payload
  - `--format markdown` renders documentation-ready markdown
  - `--interface` accepts: `all|cli-direct|cli-shell|gui-shell|js|lua|mcp` (`mcp` currently aliases to shared shell command docs)
- shared-shell isoform panel routes:
  - `panels import-isoform SEQ_ID PANEL_PATH [--panel-id ID] [--strict]`
  - `panels inspect-isoform SEQ_ID PANEL_ID`
  - `panels render-isoform-svg SEQ_ID PANEL_ID OUTPUT.svg`
  - `panels validate-isoform PANEL_PATH [--panel-id ID]`
- shared-shell UniProt routes:
  - `uniprot fetch QUERY [--entry-id ID]`
  - `uniprot import-swissprot PATH [--entry-id ID]`
  - `uniprot list`
  - `uniprot show ENTRY_ID`
  - `uniprot map ENTRY_ID SEQ_ID [--projection-id ID] [--transcript ID]`
  - `uniprot projection-list [--seq SEQ_ID]`
  - `uniprot projection-show PROJECTION_ID`
  - `uniprot feature-coding-dna PROJECTION_ID FEATURE_QUERY [--transcript ID] [--mode genomic_as_encoded|translation_speed_optimized|both] [--speed-profile human|mouse|yeast|ecoli]`
- shared feature-expert route now also accepts persisted UniProt projections as a target:
  - `inspect-feature-expert SEQ_ID uniprot-projection PROJECTION_ID`
  - `render-feature-expert-svg SEQ_ID uniprot-projection PROJECTION_ID OUTPUT.svg`
  - semantics:
    - resolve one persisted `gentle.uniprot_genome_projections.v1` record
    - build a shared `IsoformArchitectureView`
    - transcript lanes come from the stored transcript/CDS-to-genome projection
    - the protein lane uses the UniProt reference-protein coordinate system and projected UniProt interval features
- UniProt feature-coding DNA query semantics:
  - resolves one persisted `gentle.uniprot_genome_projection.v1` record
  - matches `FEATURE_QUERY` case-insensitively against mapped UniProt feature key/note text
  - returns one `gentle.uniprot_feature_coding_dna_query.v1` report
  - each transcript match includes:
    - `genomic_coding_dna`: spliced coding-strand DNA exactly as encoded in the current genome sequence
    - `translation_speed_optimized_dna`: optional preferred-codon alternative using the selected or inferred `TranslationSpeedProfile`
    - `exon_spans[]` and `exon_pairs[]` so GUI/CLI can report the exon or exon pair carrying the feature
  - exon ordinals follow transcript order, not raw genomic left-to-right position; reverse-strand transcript exon 1 is the transcript 5' exon
- shared-shell GenBank route:
  - `genbank fetch ACCESSION [--as-id ID]`
- shared-shell dbSNP route:
  - `dbsnp fetch RS_ID GENOME_ID [--flank-bp N] [--output-id ID] [--annotation-scope none|core|full] [--max-annotation-features N] [--catalog PATH] [--cache-dir PATH]`
- shared-shell protocol-cartoon routes:
  - `protocol-cartoon list`
  - `protocol-cartoon render-svg PROTOCOL_ID OUTPUT.svg`
  - `protocol-cartoon render-template-svg TEMPLATE.json OUTPUT.svg`
  - `protocol-cartoon template-validate TEMPLATE.json`
  - `protocol-cartoon render-with-bindings TEMPLATE.json BINDINGS.json OUTPUT.svg`
  - `protocol-cartoon template-export PROTOCOL_ID OUTPUT.json`
  - command surface is intentionally canonical: protocol-cartoon routes do not expose extra alias names
- Python adapter wrapper (`integrations/python/gentle_py`):
  - thin subprocess-based wrapper over `gentle_cli`
  - deterministic methods:
    - `capabilities()`
    - `state_summary()`
    - `op(operation)`
    - `workflow(workflow|workflow_path)`
    - `shell(line, expect_json=False)`
    - `render_dotplot_svg(seq_id, dotplot_id, output_svg, ...)`
  - raises structured `GentleCliError` with:
    - `code` (best-effort extracted stable code token)
    - `command`, `exit_code`, `stdout`, `stderr`
  - executable resolution order:
    - constructor `cli_cmd`
    - `GENTLE_CLI_CMD`
    - `gentle_cli` on `PATH`
    - repository fallback `cargo run --quiet --bin gentle_cli --`
- `gentle_mcp` (stdio MCP adapter, expanded UI-intent parity baseline)
  - MCP role:
    - request/response transport for tool execution (`tools/call`)
    - standardized capability discovery/negotiation (`tools/list`, `capabilities`, `help`)
  - current tools:
    - `capabilities`
    - `state_summary`
    - `op` (apply one `Operation`; requires explicit `confirm=true`)
    - `workflow` (apply one `Workflow`; requires explicit `confirm=true`)
    - `help`
    - `reference_catalog_entries` (shared `genomes list` catalog contract)
    - `helper_catalog_entries` (shared `helpers list` catalog contract)
    - `host_profile_catalog_entries` (shared `hosts list` catalog contract)
    - `ensembl_installable_genomes` (shared Ensembl discovery contract for currently installable candidates)
    - `helper_interpretation` (direct helper-construct interpretation lookup)
    - `ui_intents` (shared `ui intents` catalog)
    - `ui_intent` (shared `ui open|focus ...` resolution path)
    - `ui_prepared_genomes` (shared `ui prepared-genomes ...` query path)
    - `ui_latest_prepared` (shared `ui latest-prepared ...` query path)
  - successful mutating calls (`op`, `workflow`) persist state to the resolved `state_path`
  - UI-intent tools route through the shared shell parser/executor (`parse_shell_tokens` + `execute_shell_command_with_options`) and are required to remain non-mutating (`state_changed = false`)
  - tool handlers are adapter wrappers over existing deterministic engine/shell contracts (no MCP-only biology logic branch)
  - stdio framing/validation hardening:
    - `Content-Length` is required, duplicate headers are rejected
    - maximum accepted frame size is `8 MiB`
    - parsed JSON nesting depth is capped at `96`
    - `tools/call` params are strict (`name`, optional `arguments` only)
    - `tools/call.arguments` must be a JSON object
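The stdio framing and validation rules listed for `gentle_mcp` can be sketched as a small frame parser. This is an illustrative Python sketch, not the adapter's implementation: the function names are hypothetical, the size limit is applied to the declared body length, and nesting depth counts only objects and arrays. Both of those interpretations are assumptions.

```python
import json

MAX_FRAME_BYTES = 8 * 1024 * 1024  # 8 MiB
MAX_JSON_DEPTH = 96


def json_depth(value) -> int:
    """Deepest nesting level of objects/arrays (scalars do not add a level)."""
    deepest, stack = 0, [(value, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in children)
    return deepest


def parse_frame(raw: bytes):
    """Parse one Content-Length framed JSON message."""
    header, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("missing header/body separator")
    lengths = [
        line.split(b":", 1)[1].strip()
        for line in header.split(b"\r\n")
        if line.lower().startswith(b"content-length:")
    ]
    if len(lengths) != 1:  # missing or duplicated header
        raise ValueError("exactly one Content-Length header is required")
    length = int(lengths[0])
    if not 0 <= length <= MAX_FRAME_BYTES:
        raise ValueError("frame size out of bounds")
    if len(body) < length:
        raise ValueError("incomplete frame body")
    message = json.loads(body[:length])
    if json_depth(message) > MAX_JSON_DEPTH:
        raise ValueError("JSON nesting too deep")
    return message


def validate_tools_call(params: dict) -> None:
    """Enforce strict tools/call params: name, optional arguments only."""
    if "name" not in params or set(params) - {"name", "arguments"}:
        raise ValueError("tools/call params must be name plus optional arguments")
    if not isinstance(params.get("arguments", {}), dict):
        raise ValueError("tools/call.arguments must be a JSON object")
```

Rejecting oversized or deeply nested frames before dispatch keeps malformed input away from the tool handlers.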
MCP query/introspection tool contracts (current):

- `reference_catalog_entries`
  - arguments:
    - optional: `catalog_path`, `filter`
  - behavior:
    - returns the same structured payload shape as shared shell `genomes list [--catalog ...] [--filter ...]`
  - result:
    - `catalog_path`, `filter`, `genome_count`, `genomes[]`, `entries[]`
- `helper_catalog_entries`
  - arguments:
    - optional: `catalog_path`, `filter`
  - behavior:
    - returns the same structured payload shape as shared shell `helpers list [--catalog ...] [--filter ...]`
  - result:
    - `catalog_path`, `filter`, `genome_count`, `genomes[]`, `entries[]`
    - helper `entries[]` may carry normalized `interpretation` records
- `host_profile_catalog_entries`
  - arguments:
    - optional: `catalog_path`, `filter`
  - behavior:
    - returns the same structured payload shape as shared shell `hosts list [--catalog ...] [--filter ...]`
  - result:
    - `catalog_path`, `filter`, `profile_count`, `profile_ids[]`, `entries[]`
- `ensembl_installable_genomes`
  - arguments:
    - optional: `collection`, `filter`
  - behavior:
    - returns the same Ensembl discovery report used by GUI/CLI/JS/Lua for answering which genomes currently look installable because both FASTA and GTF species-directory listings are present
  - result:
    - `collection_filter`, `availability_basis`, `collection_latest_releases{}`, `candidates[]`, `warnings[]`
Shell/engine quick-install contracts:

- `genomes install-ensembl SPECIES_DIR ...`
- `helpers install-ensembl SPECIES_DIR ...`
  - behavior:
    - resolve current Ensembl FASTA/GTF files for one species directory
    - derive or accept an explicit target `genome_id`
    - write a real catalog entry before preparation starts
    - choose between two safe write modes:
      - `full_catalog`: update one writable JSON catalog file in place or as a standalone copy
      - `overlay_entry`: write only the new entry into an overlay fragment/file so default discovery does not duplicate built-in ids
    - run the existing prepare pipeline after the catalog write succeeds
  - result:
    - `preview.collection`, `preview.species_dir`, `preview.display_name`
    - `preview.file_stem`, `preview.release`
    - `preview.genome_id`
    - `preview.output_catalog_path`
    - `preview.catalog_write_mode`
    - `preview.catalog_entry_action`
    - `preview.sequence_remote`, `preview.annotations_remote`
    - `prepare_report`
- `helper_interpretation`
  - arguments:
    - required: `helper_id` (id or alias)
    - optional: `catalog_path`
  - behavior:
    - resolves one helper entry through the shared catalog/alias lookup logic and returns the normalized helper-construct interpretation if semantics are available
  - result:
    - `query`
    - `catalog_path`
    - `interpretation` (`null` when the entry exists but carries no structured helper semantics)
- `ui_intents`
  - arguments:
    - `state_path?` (optional; accepted for interface symmetry)
  - behavior:
    - executes shared shell command: `ui intents`
  - result:
    - structured payload schema: `gentle.ui_intents.v1`
    - includes stable `targets`, `commands`, and deterministic notes
- `ui_intent`
  - arguments:
    - required: `action` (`open|focus`), `target`
    - optional: `state_path`, `genome_id`, `helpers`, `catalog_path`, `cache_dir`, `filter`, `species`, `latest`
  - current stable targets:
    - `prepared-references`
    - `prepare-reference-genome`
    - `retrieve-genome-sequence`
    - `blast-genome-sequence`
    - `import-genome-track`
    - `pcr-design`
    - `sequencing-confirmation`
    - `agent-assistant`
    - `prepare-helper-genome`
    - `retrieve-helper-sequence`
    - `blast-helper-sequence`
  - behavior:
    - executes shared shell command: `ui open TARGET ...` or `ui focus TARGET ...`
    - for `target = prepared-references`, optional query flags can resolve `selected_genome_id` deterministically through the same helper path used by shared shell/CLI
    - parser guardrails are preserved:
      - query flags (`--helpers`, `--catalog`, `--cache-dir`, `--filter`, `--species`, `--latest`) are rejected for non-`prepared-references` targets
  - result:
    - structured payload schema: `gentle.ui_intent.v1`
    - fields include `ui_intent`, `selected_genome_id`, optional `prepared_query`, `applied=false`, and deterministic `message`
- `ui_prepared_genomes`
  - arguments:
    - optional: `state_path`, `helpers`, `catalog_path`, `cache_dir`, `filter`, `species`, `latest`
  - behavior:
    - executes shared shell command: `ui prepared-genomes ...`
  - result:
    - structured payload schema: `gentle.ui_prepared_genomes.v1`
    - includes `prepared_count`, sorted `genomes[]`, and `selected_genome_id`
- `ui_latest_prepared`
  - arguments:
    - required: `species`
    - optional: `state_path`, `helpers`, `catalog_path`, `cache_dir`
  - behavior:
    - executes shared shell command: `ui latest-prepared SPECIES ...`
  - result:
    - structured payload schema: `gentle.ui_latest_prepared.v1`
    - includes `selected_genome_id` and nested `prepared_query` payload
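A client can mirror the `ui_intent` guardrail before sending a request, so that rejected combinations fail locally. The sketch below is a hypothetical client-side helper: `build_ui_intent_call` is not part of any GENtle adapter, and the authoritative check remains the shared shell parser.

```python
# Argument names corresponding to the query flags restricted to prepared-references.
QUERY_ARGS = {"helpers", "catalog_path", "cache_dir", "filter", "species", "latest"}


def build_ui_intent_call(request_id, action, target, **optional):
    """Build a JSON-RPC tools/call request for the ui_intent tool."""
    if action not in ("open", "focus"):
        raise ValueError(f"unknown action: {action!r}")
    rejected = QUERY_ARGS.intersection(optional)
    if target != "prepared-references" and rejected:
        raise ValueError(
            f"query arguments {sorted(rejected)} require target prepared-references"
        )
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "ui_intent",
            "arguments": {"action": action, "target": target, **optional},
        },
    }
```

The returned dictionary only needs `json.dumps` and the stdio framing header to become a wire message.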
MCP UI-intent JSON-RPC example (abbreviated):

    {
      "jsonrpc": "2.0",
      "id": 7,
      "method": "tools/call",
      "params": {
        "name": "ui_intent",
        "arguments": {
          "action": "open",
          "target": "prepared-references",
          "catalog_path": "assets/genomes.json",
          "species": "human",
          "latest": true
        }
      }
    }

Result envelope shape:

    {
      "jsonrpc": "2.0",
      "id": 7,
      "result": {
        "isError": false,
        "structuredContent": {
          "schema": "gentle.ui_intent.v1",
          "selected_genome_id": "Human GRCh38 Ensembl 116",
          "applied": false
        }
      }
    }

Adapter-equivalence guarantee for UI-intent tools:
- deterministic parity tests compare MCP UI-intent tool outputs with direct shared shell `ui ...` command outputs for:
  - intent catalog (`ui_intents`)
  - prepared query (`ui_prepared_genomes`)
  - latest helper (`ui_latest_prepared`)
  - open/focus intent resolution (`ui_intent`)
- `macros run/instance-list/instance-show/template-list/template-show/template-put/template-delete/template-import/template-run`
  - shared-shell macro adapter family for full operation/workflow scripting
  - template persistence is backed by engine operations `UpsertWorkflowMacroTemplate`/`DeleteWorkflowMacroTemplate`
  - `template-put` supports optional typed port contracts:
    - `--input-port PORT_ID:KIND[:one|many][:required|optional][:description]`
    - `--output-port PORT_ID:KIND[:one|many][:required|optional][:description]`
  - `template-import PATH` accepts:
    - one pack JSON file (`gentle.cloning_patterns.v1`)
    - one single-template JSON file (`gentle.cloning_pattern_template.v1`)
    - one directory tree (recursive `*.json` import; files must use one of the schemas above)
  - imports are transactional; if one template fails validation, no imported template changes are kept
  - expanded scripts can execute `op ...` and `workflow ...` statements and optionally roll back via `--transactional`
  - `template-run` supports non-mutating preflight mode via `--validate-only`
  - template-run responses now include a preflight payload (`gentle.macro_template_preflight.v1`) with warnings/errors and typed input/output port validation rows (`contract_source` indicates whether checks came from template metadata or routine catalog)
  - preflight includes cross-port semantic checks (alias/collision checks, input sequence/container consistency, and sequence-anchor semantics when sequence context is unambiguous)
  - routine-family semantic checks are now supported:
    - Gibson routines validate adjacent fragment overlap compatibility against configured overlap length before execution
    - Restriction routines validate enzyme-name resolution, duplicate-enzyme misuse, enzyme-site presence across bound input sequences, and common digest parameter sanity (`left_fragment`/`right_fragment`, `extract_from`/`extract_to`)
  - mutating `macros run`/`macros template-run` executions always persist one lineage macro-instance record (`ok`/`failed`/`cancelled`)
  - successful runs return `macro_instance_id`; failed runs include `macro_instance_id=...` in error messages
  - `macros instance-list` and `macros instance-show` expose persisted lineage macro-instance records as first-class introspection contracts
- `routines list [--catalog PATH] [--family NAME] [--status NAME] [--tag TAG] [--query TEXT]`
  - shared-shell/CLI routine catalog discovery surface
  - default catalog path: `assets/cloning_routines.json`
  - typed catalog schema: `gentle.cloning_routines.v1`
  - response schema: `gentle.cloning_routines_list.v1`
  - filters are case-insensitive; query performs substring match across routine id/title/family/status/template/tags/summary plus explainability metadata fields
- `routines explain ROUTINE_ID [--catalog PATH]`
  - shared-shell/CLI routine explainability surface
  - response schema: `gentle.cloning_routine_explain.v1`
  - returns one routine definition plus normalized explanation payload (purpose/mechanism/requires/contraindications/disambiguation/failure modes) and resolved confusing alternatives
- `routines compare ROUTINE_A ROUTINE_B [--catalog PATH]`
  - shared-shell/CLI deterministic routine comparison surface
  - response schema: `gentle.cloning_routine_compare.v1`
  - returns both routine definitions plus comparison payload: shared/unique tags, cross-reference status, aligned difference-matrix rows, and merged disambiguation questions
  - includes planning-aware estimate rows in comparison payload:
    - `estimated_time_hours`
    - `estimated_cost`
    - `local_fit_score`
    - `composite_meta_score`
- Planning meta-layer contracts (shared shell/CLI, engine-owned):
  - profile schema: `gentle.planning_profile.v1`
  - objective schema: `gentle.planning_objective.v1`
  - estimate schema: `gentle.planning_estimate.v1`
  - suggestion schema: `gentle.planning_suggestion.v1`
  - sync-status schema: `gentle.planning_sync_status.v1`
  - merge precedence for effective profile:
    - `global_profile -> confirmed_agent_overlay -> project_override`
  - purchasing latency heuristic in v1:
    - each missing required material class adds default `procurement_business_days_default` (default `10`) to estimate (Monday-Friday business-day model; no holiday calendar yet)
    - business-day delays are converted to `estimated_time_hours` with a deterministic weekend-aware factor (`24h * 7/5` per business day)
  - schema compatibility rule:
    - profile/objective payloads with mismatched schema ids are rejected (`InvalidInput`) instead of silently coerced
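The purchasing latency heuristic and the profile merge precedence can be worked through in a few lines. This is a minimal sketch, not engine code: the function names are hypothetical, the merge is shown as a shallow key-wise override, and reading the precedence arrows as "later layers override earlier ones" is an assumption.

```python
HOURS_PER_BUSINESS_DAY = 24 * 7 / 5  # weekend-aware factor from the heuristic


def procurement_delay_hours(missing_material_classes: int,
                            procurement_business_days_default: float = 10) -> float:
    """Hours added to estimated_time_hours for missing required material classes."""
    business_days = missing_material_classes * procurement_business_days_default
    return business_days * HOURS_PER_BUSINESS_DAY


def effective_profile(global_profile: dict,
                      confirmed_agent_overlay: dict,
                      project_override: dict) -> dict:
    """Merge scopes in the documented order (assumed: later layers win)."""
    merged = dict(global_profile)
    merged.update(confirmed_agent_overlay)
    merged.update(project_override)
    return merged
```

With the default of 10 business days, one missing material class adds 10 * 24 * 7/5 = 336 hours, which is exactly two calendar weeks.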
- `planning profile show [--scope global|project_override|confirmed_agent_overlay|effective]`
  - inspect one planning profile scope or merged effective profile
- `planning profile set JSON_OR_@FILE [--scope global|project_override|confirmed_agent_overlay]`
  - set/replace selected planning profile scope
- `planning profile clear [--scope global|project_override|confirmed_agent_overlay]`
  - clear selected planning profile scope
- `planning objective show`
  - inspect current planning objective
  - objective may now also carry:
    - optional `helper_profile_id`
    - optional `preferred_routine_families[]`
  - when present, routine-ranking routes synthesize helper-aware `routine_preference_context` and apply a transparent family-alignment bonus
  - the same synthesized context is now also reused by routine-decision traces and engine-owned macro-template suggestions so planner-facing adapters do not need their own helper-specific heuristics
- `planning objective set JSON_OR_@FILE`
  - set/replace planning objective
- `planning objective clear`
  - clear planning objective (engine defaults apply)
- `planning suggestions list [--status pending|accepted|rejected]`
  - list pending/resolved planning sync suggestions
- `planning suggestions accept SUGGESTION_ID`
  - accept suggestion and apply patch into confirmed overlay/objective
- `planning suggestions reject SUGGESTION_ID [--reason TEXT]`
  - reject suggestion with optional reason
- `planning sync status`
  - inspect planning sync lifecycle metadata
- `planning sync pull JSON_OR_@FILE [--source ID] [--confidence N] [--snapshot-id ID]`
  - register inbound advisory suggestion as pending
- `planning sync push JSON_OR_@FILE [--source ID] [--confidence N] [--snapshot-id ID]`
  - register outbound advisory suggestion as pending
  - payload for `planning sync pull|push`:
    - optional `profile_patch` (`gentle.planning_profile.v1`)
    - optional `objective_patch` (`gentle.planning_objective.v1`)
    - optional `message`
  - activation policy remains explicit user action (`accept`/`reject`); no auto-apply in v1
- screenshot-window OUTPUT.png
  - currently disabled by security policy
  - returns deterministic disabled message from shared shell/CLI/GUI command paths
  - kept as reserved adapter contract for future re-enable after explicit endpoint-security approval
- agents list [--catalog PATH]
  - Lists configured agent systems from catalog JSON.
  - Default catalog: assets/agent_systems.json.
- agents ask SYSTEM_ID --prompt TEXT [--catalog PATH] [--base-url URL] [--model MODEL] [--timeout-secs N] [--connect-timeout-secs N] [--read-timeout-secs N] [--max-retries N] [--max-response-bytes N] [--allow-auto-exec] [--execute-all] [--execute-index N ...] [--no-state-summary]
  - Invokes one configured agent system via catalog transport.
  - --base-url applies a per-request runtime base URL override for native transports (native_openai, native_openai_compat).
  - --model applies a per-request runtime model override for native transports (native_openai, native_openai_compat).
  - --timeout-secs applies a per-request timeout override for stdio/native transports (maps to GENTLE_AGENT_TIMEOUT_SECS).
  - --connect-timeout-secs applies a per-request HTTP connect timeout override for native transports (maps to GENTLE_AGENT_CONNECT_TIMEOUT_SECS).
  - --read-timeout-secs applies a per-request read timeout override for stdio/native transports (maps to GENTLE_AGENT_READ_TIMEOUT_SECS).
  - --max-retries applies a per-request transient retry budget override (maps to GENTLE_AGENT_MAX_RETRIES; 0 disables retries).
  - --max-response-bytes applies a per-request response body/output cap override (maps to GENTLE_AGENT_MAX_RESPONSE_BYTES).
  - --no-state-summary suppresses project context injection.
  - Suggested-command execution is per-suggestion only (no global always-execute).
Agent bridge catalog schema (gentle.agent_systems.v1):
{
"schema": "gentle.agent_systems.v1",
"systems": [
{
"id": "openai_gpt5_stdio",
"label": "OpenAI GPT-5 (stdio bridge)",
"description": "Optional human-readable description",
"transport": "external_json_stdio",
"command": ["openai-agent-bridge", "--model", "gpt-5"],
"env": {},
"working_dir": null
},
{
"id": "openai_gpt5_native",
"label": "OpenAI GPT-5 (native HTTP)",
"transport": "native_openai",
"model": "gpt-5",
"base_url": "https://api.openai.com/v1",
"env": {}
}
]
}
Transport notes:
- builtin_echo: offline/demo transport.
- external_json_stdio: requires local bridge executable from command[0].
- native_openai: built-in OpenAI HTTP adapter; requires OPENAI_API_KEY (environment or system-level env override in catalog entry).
- native_openai_compat: built-in OpenAI-compatible local HTTP adapter (/chat/completions), intended for local services such as Jan/Msty/Ollama when they expose an OpenAI-compatible endpoint. API key is optional.
- GENTLE_AGENT_BASE_URL (or CLI --base-url) overrides catalog base_url per request for native_openai and native_openai_compat.
- GENTLE_AGENT_MODEL (or CLI --model) overrides catalog model per request for native_openai and native_openai_compat.
- GENTLE_AGENT_TIMEOUT_SECS (or CLI --timeout-secs) overrides request timeout per attempt for agent transports.
- GENTLE_AGENT_CONNECT_TIMEOUT_SECS (or CLI --connect-timeout-secs) overrides HTTP connect timeout for native transports.
- GENTLE_AGENT_READ_TIMEOUT_SECS (or CLI --read-timeout-secs) overrides read timeout for stdio/native transports.
- GENTLE_AGENT_MAX_RETRIES (or CLI --max-retries) overrides transient retry count (0 disables retries).
- GENTLE_AGENT_MAX_RESPONSE_BYTES (or CLI --max-response-bytes) overrides response-size cap per attempt (stdout/stderr or HTTP body).
- native_openai_compat requires a concrete model name; value unspecified is treated as missing and the request is rejected until a model is provided.
- native_openai_compat does not silently switch host/port; it uses catalog base_url or explicit GENTLE_AGENT_BASE_URL.
Agent request payload schema (gentle.agent_request.v1):
{
"schema": "gentle.agent_request.v1",
"system_id": "openai_gpt5_stdio",
"prompt": "User request text",
"sent_at_unix_ms": 1768860000000,
"state_summary": {}
}
Agent response payload schema (gentle.agent_response.v1):
{
"schema": "gentle.agent_response.v1",
"assistant_message": "Text response",
"questions": ["Optional follow-up question"],
"suggested_commands": [
{
"title": "Optional short label",
"rationale": "Optional reason",
"command": "state-summary",
"execution": "ask"
}
]
}
Agent execution intent semantics:
- chat: explain/ask only, never executed as shell command.
- ask: executable suggestion requiring explicit user confirmation.
- auto: executable suggestion eligible for automatic execution only when caller enables --allow-auto-exec.
Agent schema/compatibility policy:
- schema is mandatory for catalog/request/response JSON objects.
- Supported major versions (current): gentle.agent_systems.v1, gentle.agent_request.v1, gentle.agent_response.v1.
- Future incompatible major versions (for example .v2) are rejected with a deterministic schema-unsupported error.
- Response validation is strict for canonical fields:
  - top-level allowed: schema, assistant_message, questions, suggested_commands plus extension keys prefixed with x_ or x-
  - suggested_commands[] allowed: title, rationale, command, execution plus extension keys prefixed with x_ or x-
  - unsupported canonical fields (for example commands, mode) are rejected
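The strict response validation described above can be sketched as a key allow-list check. This is a minimal illustration under stated assumptions: the function names and message wording are hypothetical, and the real engine's error codes and ordering are not reproduced here.

```python
# Hypothetical sketch of strict agent-response validation.
# Allowed key sets and the x_ / x- extension prefixes come from the policy
# above; function names and message texts are illustrative only.
SUPPORTED_RESPONSE_SCHEMA = "gentle.agent_response.v1"
TOP_LEVEL_ALLOWED = {"schema", "assistant_message", "questions", "suggested_commands"}
SUGGESTION_ALLOWED = {"title", "rationale", "command", "execution"}
EXTENSION_PREFIXES = ("x_", "x-")


def unsupported_keys(obj, allowed):
    """Return keys that are neither canonical nor extension-prefixed."""
    return sorted(
        key for key in obj
        if key not in allowed and not key.startswith(EXTENSION_PREFIXES)
    )


def validate_agent_response(response):
    """Return a list of problem descriptions; an empty list means accepted."""
    problems = []
    schema = response.get("schema")
    if schema is None:
        problems.append("missing mandatory field: schema")
    elif schema != SUPPORTED_RESPONSE_SCHEMA:
        problems.append(f"unsupported schema: {schema}")
    for key in unsupported_keys(response, TOP_LEVEL_ALLOWED):
        problems.append(f"unsupported top-level field: {key}")
    for index, suggestion in enumerate(response.get("suggested_commands", [])):
        for key in unsupported_keys(suggestion, SUGGESTION_ALLOWED):
            problems.append(f"unsupported field in suggested_commands[{index}]: {key}")
    return problems
```

With this sketch, a response carrying `x_trace_id` passes, while one carrying a top-level `commands` or `mode` key is reported as unsupported.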
Execution safety rules:
- There is no global always-execute mode.
- Execution is per suggestion:
  - explicit run (--execute-index, --execute-all, GUI row Run)
  - optional auto-run only for execution = auto + --allow-auto-exec
- Recursive agents ask execution from suggested commands is blocked.
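The execution safety rules above amount to a per-suggestion decision. The following sketch shows one way to express it; the function name, the index convention, and the way a recursive `agents ask` command is detected are assumptions for illustration, not the shipped logic.

```python
# Hypothetical per-suggestion execution gate following the safety rules above.
# Assumptions: indices are whatever convention the caller uses for
# --execute-index, and recursion is detected by a simple command-prefix check.
def should_execute(suggestion, index, *, execute_all=False,
                   execute_indices=(), allow_auto_exec=False):
    intent = suggestion.get("execution", "chat")
    command = suggestion.get("command", "")

    # chat suggestions are explanatory only and never run as shell commands.
    if intent == "chat":
        return False
    # Recursive `agents ask` execution from suggested commands is blocked.
    if command.split()[:2] == ["agents", "ask"]:
        return False
    # Explicit run requested by the user for this suggestion.
    if execute_all or index in execute_indices:
        return True
    # Auto-run requires both the auto intent and the caller's opt-in flag.
    return intent == "auto" and allow_auto_exec
```

Note that `allow_auto_exec` on its own never runs an `ask` suggestion; it only unlocks suggestions that the agent itself marked as `auto`.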
Failure-handling policy for external adapters:
- Adapter invocations use bounded retry with exponential backoff for transient failures.
- OpenAI 429 with insufficient_quota is treated as non-transient (no retry) and returned with the original API error body plus billing/usage guidance.
- Missing/unreachable adapter binaries fail gracefully with deterministic adapter-unavailable errors.
- CLI/shell errors are stable and prefixed for scripting, e.g.:
  - AGENT_INVALID_INPUT
  - AGENT_SCHEMA_VALIDATION
  - AGENT_SCHEMA_UNSUPPORTED
  - AGENT_ADAPTER_UNAVAILABLE
  - AGENT_ADAPTER_TRANSIENT
  - AGENT_ADAPTER_FAILED
  - AGENT_RESPONSE_PARSE
  - AGENT_RESPONSE_VALIDATION
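The bounded-retry policy above can be illustrated with a short loop. This is a sketch under assumptions: the exception classes, default retry budget, and base delay are hypothetical, since the document does not state the engine's actual backoff constants.

```python
# Hypothetical bounded retry with exponential backoff for adapter calls.
# Exception names, max_retries default, and base_delay_secs are assumptions.
import time


class TransientAdapterError(Exception):
    """Failure worth retrying (for example a dropped connection)."""


class NonTransientAdapterError(Exception):
    """Failure that must not be retried (for example 429 insufficient_quota)."""


def call_with_retry(invoke, max_retries=2, base_delay_secs=0.5, sleep=time.sleep):
    """Call invoke(); retry transient failures up to max_retries times.

    max_retries=0 disables retries. Non-transient errors propagate at once
    because they are not caught here.
    """
    attempt = 0
    while True:
        try:
            return invoke()
        except TransientAdapterError:
            if attempt >= max_retries:
                raise
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay_secs * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter keeps the loop deterministic in tests, which fits the document's emphasis on reproducible behavior.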
ClawBio/OpenClaw integration scaffold schemas:
- integration path: integrations/clawbio/skills/gentle-cloning/
- included helper launchers:
  - gentle_local_checkout_cli.sh for local editable GENtle checkouts
  - gentle_apptainer_cli.sh for Apptainer/Singularity-backed :cli images
- wrapper request schema: gentle.clawbio_skill_request.v1
  - mode: capabilities|state-summary|shell|op|workflow|raw
  - optional: state_path, timeout_secs
  - optional: expected_artifacts[]
    - wrapper-declared output files to copy into the ClawBio output bundle after command execution
    - relative paths resolve from the actual execution working directory
  - mode-specific:
    - shell: shell_line
    - op: operation (JSON object/string)
    - workflow: workflow or workflow_path
      - relative workflow_path resolves via current working directory, then GENTLE_REPO_ROOT, then the local GENtle repo containing the scaffold when discoverable
    - raw: raw_args[]
- wrapper result schema: gentle.clawbio_skill_result.v1
  - status: ok|command_failed|timeout|failed|degraded_demo
  - includes resolver details, executed command, exit code, stdout/stderr, and generated artifact paths
  - artifacts.collected[] may enumerate declared output files copied into the wrapper bundle with declared_path, source_path, and copied_path
- reproducibility outputs:
  - report.md
  - result.json
  - reproducibility/commands.sh
  - reproducibility/environment.yml
  - reproducibility/checksums.sha256
- included bootstrap example requests:
  - request_genomes_list_human.json
  - request_genomes_status_grch38.json
  - request_genomes_prepare_grch38.json
  - request_helpers_status_puc19.json
  - request_helpers_prepare_puc19.json
- included follow-on example requests:
  - request_genomes_extract_gene_tp53.json
  - request_helpers_blast_puc19_short.json
  - request_workflow_vkorc1_planning.json
  - request_protocol_cartoon_gibson_svg.json
    - declares expected_artifacts[] so the generated SVG is copied into the output bundle under generated/...
Planned operation refinements:
- MergeContainers { inputs, output_prefix? }
  - Explicitly models wet-lab mixing of multiple tubes/pools.
- Protocol-based ligation: Ligation { input_container, protocol, output_container?, ... }
  - protocol determines allowed end joins.
  - Initial protocol values:
    - sticky
    - blunt
  - Future protocol values may include established ligation workflows represented as named presets.
Current parameter support:
- max_fragments_per_container (default 80000)
  - limits digest fragment output per operation
  - also serves as ligation product-count limit guard
- require_verified_genome_anchor_for_extension (default false)
  - when true, ExtendGenomeAnchor requires anchor provenance with anchor_verified=true
  - anchors with anchor_verified=false or missing verification status are rejected in strict mode
  - alias parameters accepted: strict_genome_anchor_verification, strict_anchor_verification
- genome_anchor_prepared_fallback_policy (default single_compatible)
  - controls how ExtendGenomeAnchor/VerifyGenomeAnchor resolve anchor genome ids when exact prepared cache id is not present.
  - accepted values:
    - off (no compatibility fallback; must match exact prepared id)
    - single_compatible (auto-fallback only when one compatible prepared cache exists)
    - always_explicit (never auto-fallback; require explicit selection even when only one compatible prepared cache exists)
  - alias parameters accepted: genome_anchor_fallback_mode, genome_anchor_prepared_mode
- primer-design backend controls:
  - primer_design_backend (default auto)
    - accepted values: auto, internal, primer3
    - auto tries Primer3 and falls back deterministically to internal scoring with explicit warning + fallback reason in report metadata
  - primer3_executable (default "primer3_core")
    - executable path/name used when backend is primer3 or auto
    - alias parameters accepted: primer3_backend_executable, primer3_path
- feature_details_font_size (default 9.0, range 8.0..24.0)
  - controls GUI font size for the feature tree entries and feature range details
- regulatory_feature_max_view_span_bp (default 50000, range >= 0)
  - hides regulatory feature overlays in linear view when current view span exceeds this threshold (0 disables regulatory overlays)
- gc_content_bin_size_bp (default 100, range >= 1)
  - controls GC-content aggregation bin size for linear/circular rendering and SVG export
- Linear DNA-letter routing parameters:
  - linear_sequence_letter_layout_mode (default AutoAdaptive)
    - supported canonical modes:
      - auto|adaptive|auto_adaptive
      - standard|standard_linear
      - helical|continuous_helical
      - condensed_10_row|condensed
    - auto mode uses deterministic viewport-density tiers:
      - <= 1.5x: standard
      - <= 2x: helical (if compressed letters enabled)
      - <= 10x: condensed-10 (if compressed letters enabled)
      - > 10x: OFF
  - linear_sequence_helical_letters_enabled (default true)
    - applies to auto mode only (allows/disallows compressed auto tiers)
  - linear_sequence_helical_phase_offset_bp (range 0..9)
    - seam offset used by helical/condensed row mapping
  - reverse/helical strand geometry controls:
    - linear_show_double_strand_bases / linear_show_reverse_strand_bases (bool alias pair; controls reverse-strand letter visibility)
    - linear_helical_parallel_strands (default true)
      - true: forward/reverse helical slant stays parallel
      - false: forward/reverse helical slant is mirrored (cross-over look)
    - reverse_strand_visual_opacity (range 0.2..1.0, default 0.55)
      - shared reverse-strand emphasis in linear map and sequence panel
  - Legacy linear-letter threshold knobs are compatibility-only and return deterministic deprecated no-op messages (no routing effect):
    - linear_sequence_base_text_max_view_span_bp
    - linear_sequence_helical_max_view_span_bp
    - linear_sequence_condensed_max_view_span_bp
- VCF display filter parameters (shared GUI/SVG state):
  - vcf_display_show_snp
  - vcf_display_show_ins
  - vcf_display_show_del
  - vcf_display_show_sv
  - vcf_display_show_other
  - vcf_display_pass_only
  - vcf_display_use_min_qual
  - vcf_display_min_qual
  - vcf_display_use_max_qual
  - vcf_display_max_qual
  - vcf_display_required_info_keys (CSV string or string array)
- TFBS display filter parameters (shared GUI/SVG state):
  - show_tfbs
  - tfbs_display_use_llr_bits
  - tfbs_display_min_llr_bits
  - tfbs_display_use_llr_quantile
  - tfbs_display_min_llr_quantile
  - tfbs_display_use_true_log_odds_bits
  - tfbs_display_min_true_log_odds_bits
  - tfbs_display_use_true_log_odds_quantile
  - tfbs_display_min_true_log_odds_quantile
- Restriction-enzyme display parameters (shared GUI/SVG state):
  - show_restriction_enzymes
  - show_restriction_enzyme_sites (bool alias)
  - restriction_enzyme_display_mode
  - restriction_display_mode (string alias)
    - supported values:
      - preferred_only
      - preferred_and_unique
      - unique_only
      - all_in_view
  - preferred_restriction_enzymes
  - preferred_restriction_enzymes_csv (CSV alias)
  - restriction_preferred_enzymes (CSV/string-array alias)
    - accepts either a CSV string or a string array
- BLAST options-layer parameters:
  - blast_options_override (JSON object or null)
    - project-level BLAST option layer merged before per-command request JSON
    - supports the same keys as request JSON (task, max_hits, thresholds)
  - blast_options_defaults_path (string path or null)
    - optional defaults-file path used ahead of project/request layers
    - if unset, engine falls back to assets/blast_defaults.json
Current ligation protocol behavior:
- protocol is mandatory.
- If protocol = Blunt, ligation enumerates ordered input pairs with blunt-end compatibility checks.
- If protocol = Sticky, ligation enumerates ordered input pairs with sticky-end overhang compatibility checks.
- unique = true requires exactly one product.
FilterByMolecularWeight semantics:
- Applies a bp-range filter across provided input sequence ids.
- Effective accepted range is expanded by error:
  - effective_min = floor(min_bp * (1 - error))
  - effective_max = ceil(max_bp * (1 + error))
- unique = true requires exactly one match, otherwise the operation fails.
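The range expansion above is easy to check by hand. The sketch below applies the two formulas and the unique rule to a mapping of sequence ids to bp lengths; the function names and the input shape are assumptions for illustration.

```python
# Hypothetical sketch of FilterByMolecularWeight range handling.
# The floor/ceil formulas come from the text; names and the dict input
# (sequence id -> length in bp) are illustrative.
import math


def effective_bp_range(min_bp, max_bp, error):
    """Expand [min_bp, max_bp] by a fractional error tolerance."""
    effective_min = math.floor(min_bp * (1 - error))
    effective_max = math.ceil(max_bp * (1 + error))
    return effective_min, effective_max


def filter_by_bp(lengths_by_id, min_bp, max_bp, error=0.0, unique=False):
    """Return ids whose length falls inside the effective range."""
    low, high = effective_bp_range(min_bp, max_bp, error)
    matches = [seq_id for seq_id, bp in lengths_by_id.items() if low <= bp <= high]
    if unique and len(matches) != 1:
        raise ValueError(f"unique=true requires exactly one match, found {len(matches)}")
    return matches
```

For example, with min_bp = 1000, max_bp = 2000 and error = 0.25, the effective range becomes 750..2500 bp, so a 760 bp fragment is accepted even though it is below the nominal minimum.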
FilterByDesignConstraints semantics:
- Applies practical design-constraint filters across provided input sequence ids.
- Optional GC bounds: gc_min and/or gc_max (fractional range 0.0..1.0)
  - when both are provided, gc_min <= gc_max is required
- Optional homopolymer cap: max_homopolymer_run >= 1
  - rejects candidates with a longer A/C/G/T run
- reject_ambiguous_bases (default true):
  - rejects sequences containing non-ACGT letters
- avoid_u6_terminator_tttt (default true):
  - rejects sequences containing TTTT
- Optional forbidden_motifs:
  - IUPAC motifs; reject when motif appears on either strand
- unique = true requires exactly one match, otherwise the operation fails.
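Most of the design-constraint checks above reduce to simple string tests. The following is a minimal sketch of a per-sequence predicate; the function names are hypothetical, and forbidden_motifs (IUPAC matching on both strands) is deliberately left out to keep the example short.

```python
# Hypothetical sketch of the per-sequence FilterByDesignConstraints checks.
# Parameter names mirror the text; function names are illustrative and
# forbidden_motifs handling is omitted.
from itertools import groupby


def gc_fraction(seq):
    """Fraction of G/C letters in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def longest_homopolymer_run(seq):
    """Length of the longest run of one repeated letter."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


def passes_design_constraints(seq, *, gc_min=None, gc_max=None,
                              max_homopolymer_run=None,
                              reject_ambiguous_bases=True,
                              avoid_u6_terminator_tttt=True):
    if gc_min is not None and gc_max is not None and gc_min > gc_max:
        raise ValueError("gc_min <= gc_max is required")
    if max_homopolymer_run is not None and max_homopolymer_run < 1:
        raise ValueError("max_homopolymer_run >= 1 is required")
    seq = seq.upper()
    if reject_ambiguous_bases and set(seq) - set("ACGT"):
        return False
    if avoid_u6_terminator_tttt and "TTTT" in seq:
        return False
    gc = gc_fraction(seq)
    if gc_min is not None and gc < gc_min:
        return False
    if gc_max is not None and gc > gc_max:
        return False
    if max_homopolymer_run is not None and longest_homopolymer_run(seq) > max_homopolymer_run:
        return False
    return True
```

Both defaults are on in this sketch, as in the text, so a sequence containing TTTT or an N is rejected unless the caller switches the corresponding flag off.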
Guide-design semantics:
- Guide sets persist in ProjectState.metadata["guide_design"] (schema = gentle.guide_design.v1) and include:
  - guide sets
  - practical-filter reports
  - oligo sets
  - audit log entries for guide operations/exports
- UpsertGuideSet:
  - normalizes guide fields and validates required properties
  - sorts by rank (then guide id) and rejects duplicate guide_id within one set
- FilterGuidesPractical:
  - applies deterministic practical filters over one guide set
  - supports GC bounds, global/per-base homopolymer limits, ambiguous-base rejection, U6 TTTT avoidance, dinucleotide repeat cap, forbidden motifs, and required 5' base checks
  - can emit a passed-only output guide set (output_guide_set_id)
  - always persists a structured per-guide report with reasons/warnings/metrics
- GenerateGuideOligos:
  - generates forward/reverse oligos using a named template
  - supports optional 5' G extension and passed-only mode
  - persists generated oligo records in named oligo sets
- ExportGuideOligos:
  - exports an oligo set as csv_table, plate_csv (96/384), or fasta
  - records export actions in the guide-design audit log
- ExportGuideProtocolText:
  - exports a deterministic human-readable protocol text artifact
  - optional QC checklist can be included/excluded
Candidate-set semantics:
- GenerateCandidateSet creates a persisted candidate window set over one source sequence and computes baseline metrics for each candidate.
- GenerateCandidateSetBetweenAnchors creates a persisted candidate window set constrained to the in-sequence interval between two local anchors.
- ScoreCandidateSetExpression computes a derived metric from an arithmetic expression over existing metrics.
- ScoreCandidateSetDistance computes feature-distance metrics against filtered feature targets.
- FilterCandidateSet keeps/drops candidates by absolute bounds and/or quantile bounds for a named metric.
- CandidateSetOp supports set algebra (union, intersect, subtract) over candidate identity (seq_id, start_0based, end_0based).
- ScoreCandidateSetWeightedObjective computes one metric from weighted objective terms (maximize/minimize per term, optional normalization).
- TopKCandidateSet selects an explicit top-k subset for one metric with a deterministic tie-break policy.
- ParetoFrontierCandidateSet keeps non-dominated candidates for multiple objectives (maximize/minimize per objective), with optional tie-break truncation.
- Workflow macro templates are persisted in project metadata:
  - UpsertWorkflowMacroTemplate stores/replaces named templates
  - DeleteWorkflowMacroTemplate removes templates
  - each template now carries template_schema (gentle.cloning_macro_template.v1) so cloning-operation macro intent is explicit at engine level
  - optional details_url can link to external protocol/reference material
  - optional typed input_ports/output_ports can be persisted directly in template metadata (same port shape as routine catalog ports)
  - template expansion/binding is exposed through adapter command surfaces (macros template-*, including macros template-import PATH)
  - expanded scripts run through shared shell execution (macros run) and can orchestrate full cloning operations via op ... or workflow ... payloads
  - shipped starter assets:
    - legacy pack: assets/cloning_patterns.json (gentle.cloning_patterns.v1)
    - hierarchical catalog: assets/cloning_patterns_catalog/**/*.json (gentle.cloning_pattern_template.v1, one template per file)
    - Gibson baseline template: assets/cloning_patterns_catalog/gibson/overlap_assembly/gibson_two_fragment_overlap_preview.json
    - Restriction baseline template: assets/cloning_patterns_catalog/restriction/digest_ligation/digest_ligate_extract_sticky.json
- Typed cloning-routine catalog baseline:
  - manifest: assets/cloning_routines.json
  - schema: gentle.cloning_routines.v1
  - typed routine metadata fields include routine family/status/tags, linked template name/path, and typed input/output port declarations
  - includes Gibson + restriction family baselines:
    - gibson.two_fragment_overlap_preview
    - restriction.digest_ligate_extract_sticky
  - adapter discovery surface: routines list [--catalog PATH] [--family NAME] [--status NAME] [--tag TAG] [--query TEXT]
  - explainability and comparison surfaces:
    - routines explain ROUTINE_ID [--catalog PATH]
    - routines compare ROUTINE_A ROUTINE_B [--catalog PATH]
- Macro-instance lineage baseline:
  - mutating macros run/macros template-run append one LineageMacroInstance record in project lineage state for success and failure pathways
  - records include deterministic macro_instance_id, optional routine_id/template_name, typed bound inputs/outputs, emitted op_ids, status, and optional status_message
  - lineage graph + lineage SVG consume these records as macro box nodes with explicit input/output edges where sequence/container references resolve
- Candidate macro templates are persisted in project metadata:
  - UpsertCandidateMacroTemplate stores/replaces named templates
  - DeleteCandidateMacroTemplate removes templates
  - optional details_url can link to external protocol/reference material
  - template expansion/binding is exposed through adapter command surfaces (candidates template-*)
- Between-anchor generation augments baseline metrics with anchor-aware fields (distance_to_anchor_a_bp, distance_to_anchor_b_bp, distance_to_nearest_anchor_bp, interval span metadata).
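The CandidateSetOp set algebra listed above keys candidates by the identity tuple (seq_id, start_0based, end_0based). A minimal sketch follows; representing candidates as dicts, keeping the left operand's records on overlap, and preserving input order are all assumptions of this illustration rather than documented engine behavior.

```python
# Hypothetical sketch of CandidateSetOp over candidate identity.
# Assumptions: candidates are dicts, the left operand wins on duplicates,
# and output order follows input order.
def candidate_key(candidate):
    return (candidate["seq_id"], candidate["start_0based"], candidate["end_0based"])


def candidate_set_op(op, left, right):
    left_keys = {candidate_key(c) for c in left}
    right_keys = {candidate_key(c) for c in right}
    if op == "union":
        return list(left) + [c for c in right if candidate_key(c) not in left_keys]
    if op == "intersect":
        return [c for c in left if candidate_key(c) in right_keys]
    if op == "subtract":
        return [c for c in left if candidate_key(c) not in right_keys]
    raise ValueError(f"unsupported set operation: {op}")
```

Because identity ignores metric values, two candidates covering the same window on the same sequence count as the same member even if their scores differ.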
Feature-distance geometry controls (candidate generation and distance scoring):
- feature_geometry_mode (optional, default feature_span):
  - feature_span: one interval per feature using whole-feature bounds
  - feature_parts: one interval per explicit location part (ignores intronic gaps for multipart features)
  - feature_boundaries: boundary points of explicit location parts
- feature_boundary_mode (optional, default any):
  - any, five_prime, three_prime, start, end
  - only meaningful when feature_geometry_mode = feature_boundaries
- feature_strand_relation (optional, default any):
  - any, same, opposite
  - current engine interpretation is sequence-forward relative (same = '+', opposite = '-' feature strand)
- Directed-boundary interpretation:
  - on + strand: five_prime = start, three_prime = end
  - on - strand: five_prime = end, three_prime = start
  - for unknown strand, five_prime/three_prime conservatively include both boundaries.
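The directed-boundary interpretation above maps directly onto a small lookup. The sketch below resolves which boundary points a feature part contributes for a given feature_boundary_mode; the function name and the use of '+' / '-' / None for strand are assumptions of this example.

```python
# Hypothetical sketch of directed-boundary selection for one location part.
# Mode names come from the text; the function name and strand encoding
# ('+', '-', or None for unknown) are illustrative.
def boundary_points(start, end, strand, mode="any"):
    """Return the boundary coordinates selected by feature_boundary_mode."""
    if mode == "any":
        return [start, end]
    if mode == "start":
        return [start]
    if mode == "end":
        return [end]
    if mode in ("five_prime", "three_prime"):
        if strand == "+":
            return [start] if mode == "five_prime" else [end]
        if strand == "-":
            return [end] if mode == "five_prime" else [start]
        # Unknown strand: conservatively include both boundaries.
        return [start, end]
    raise ValueError(f"unsupported feature_boundary_mode: {mode}")
```

For a part spanning 100..250 on the minus strand, five_prime selects 250 and three_prime selects 100, which is the reverse of the plus-strand case.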
RenderPoolGelSvg semantics:
- Accepts explicit inputs (sequence ids) and an output path.
- Optional container_ids renders one lane per referenced stored container.
- Optional arrangement_id renders one lane per stored serial-arrangement lane.
- Optional conditions carries one shared gel-run profile for the whole render:
  - agarose_percent
  - buffer_model (tae/tbe)
  - topology_aware
- Computes pool migration from sequence bp length plus one deterministic heuristic condition model:
  - agarose/buffer reshape the shared migration curve
  - topology-aware mode distinguishes:
    - generic circular
    - supercoiled
    - relaxed circular
    - nicked/open circular
    - linear
  - when sequence names/definitions/comments carry explicit circular-form hints, those richer forms are used; otherwise circular sequences fall back to the generic circular model
- Computes sample-band brightness from estimated DNA mass proxy, not only multiplicity.
- Applies one deterministic co-migration grouping threshold so nearby fragments that would plausibly collapse into one readout are exported as one merged band annotation instead of several visually indistinguishable bars.
- Chooses one or two ladders to span pool range:
  - from explicit ladders list when provided
  - otherwise from saved arrangement ladders when arrangement_id is present and the arrangement stores a ladder choice
  - otherwise from built-in ladder catalog (auto mode)
- Renders ladder lanes plus pooled band lanes as SVG artifact.
- The SVG also includes a compact fragment table for non-ladder lanes:
- observed apparent size
- actual bp
- topology-form hint
- estimated mass proxy
- merged-band annotation when several fragments co-migrate
- source fragment labels
- When serial lanes carry interpretable roles such as vector, insert_*, and product, the right-hand detail panel also adds Comparison hints for:
  - insert vs fine ladder sizing
  - vector vs product size-shift reading
  - simple product-minus-vector vs summed-insert consistency checks
  - role badges under the lane labels so generic container names still read as VECTOR, INSERT, or PRODUCT
- When merged bands are present, the right-hand detail panel also adds a short Merged-band notes block that explains observed band position vs the underlying actual source-size span.
CreateArrangementSerial semantics:
- Persists an ordered serial-lane setup over stored containers.
- Optional ladders can store one symmetric ladder or one left/right ladder pair for later gel preview/export reuse.
- Also materializes one default physical rack draft:
  - choose the smallest built-in rack/plate profile that fits the arrangement payload plus ladder-reference positions
  - place the arrangement block row-major in that rack
  - link the arrangement back to the resulting default_rack_id
SetArrangementLadders semantics:
- Mutates an existing serial arrangement in place.
- ladders = null clears back to shared engine auto ladder selection.
- one ladder name means the same ladder is used on both sides during arrangement-based gel preview/export.
- two ladder names mean explicit left/right ladder selection.
CreateRackFromArrangement semantics:
- Creates one new physical rack/plate draft from one stored arrangement.
- Optional profile overrides the default smallest-fitting profile choice.
- If profile is omitted, the engine chooses in this order:
  - small_tube_4x6
  - plate_96
  - plate_384
- Placement is row-major and preserves arrangement order.
- Ladder-bearing arrangements reserve left/right ladder-reference positions in the same contiguous block.
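The default profile choice above can be sketched as a first-fit scan over the built-in profiles in their stated order. The capacities used here are assumptions: 4 x 6 is read off the profile name, while 8 x 12 and 16 x 24 are the conventional 96-well and 384-well layouts and are not confirmed by this document.

```python
# Hypothetical sketch of smallest-fitting rack profile selection.
# Profile order comes from the text; the (rows, columns) geometries for
# plate_96 and plate_384 are assumed standard microplate layouts.
BUILTIN_PROFILES = [
    ("small_tube_4x6", 4, 6),
    ("plate_96", 8, 12),
    ("plate_384", 16, 24),
]


def choose_rack_profile(sample_count, ladder_positions=0, profile=None):
    """Return the explicit profile, else the first built-in one that fits."""
    required = sample_count + ladder_positions
    if profile is not None:
        return profile
    for name, rows, columns in BUILTIN_PROFILES:
        if rows * columns >= required:
            return name
    raise ValueError(f"no built-in profile fits {required} positions")
```

Under these assumed capacities, an arrangement with 23 samples and two ladder-reference positions no longer fits the 24-position tube rack and moves up to plate_96.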
PlaceArrangementOnRack semantics:
- Places one arrangement onto an existing rack as one contiguous block at the next free region in fill order.
- Existing rack occupants stay in order; the appended arrangement does not reorder earlier blocks.
- Shared racks are therefore possible without losing arrangement identity.
MoveRackPlacement semantics:
- Moves one occupied rack coordinate within one saved rack.
- move_block=false means move one sample within its arrangement block and shift neighboring occupied positions to preserve order.
- move_block=true means move the whole arrangement block and shift later occupied blocks in fill order.
- The operation is order-preserving by design; it does not treat arbitrary holes as the primary editing model.
MoveRackSamples semantics:
- Moves two or more selected samples together within one saved rack.
- from_coordinates[] must all resolve to occupied positions from the same arrangement block.
- The shared engine normalizes the selected samples against the rack's current occupied order; it preserves that rack order even if the request lists the source coordinates differently.
- The selected samples move as one contiguous combined group within that same arrangement block.
- Neighboring occupied positions shift in fill order to keep the block contiguous.
- This is the shared engine contract behind rack-editor sample multi-select moves.
MoveRackArrangementBlocks semantics:
- Moves two or more selected arrangement blocks together within one saved rack.
- arrangement_ids are normalized against the rack's current occupied order; the shared engine preserves the rack-ordering of selected blocks even if the request lists them in another order.
- The selected blocks move as one contiguous combined group.
- Later occupied blocks shift in fill order to keep the rack contiguous.
- This is the shared engine contract behind rack-editor multi-select moves.
SetRackProfile semantics:
- Reprojects one saved rack onto another built-in profile.
- Existing arrangement order is preserved while coordinates are reflowed under the target profile geometry.
- Existing fill direction is preserved.
- Existing blocked coordinates are preserved when still in-bounds for the new geometry; out-of-bounds blocked coordinates are dropped deterministically.
ApplyRackTemplate semantics:
- Applies one engine-owned quick-authoring template on top of an existing rack snapshot.
- Built-in templates:
  - bench_rows
    - fill_direction = row_major
    - blocked_coordinates = []
  - plate_columns
    - fill_direction = column_major
    - blocked_coordinates = []
  - plate_edge_avoidance
    - fill_direction = column_major
    - blocked_coordinates = outer perimeter of the current profile
- Existing arrangement order is preserved while occupied coordinates are reflowed onto the resulting available slots.
- plate_edge_avoidance requires at least a 3 x 3 profile so an interior region remains after blocking the perimeter.
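The plate_edge_avoidance rule above blocks the outer perimeter and needs a non-empty interior. The sketch below computes that perimeter for a rows x columns grid using zero-based (row, column) pairs; the function name and coordinate representation are assumptions for illustration.

```python
# Hypothetical sketch of perimeter blocking for plate_edge_avoidance.
# Uses zero-based (row, column) tuples; the engine's own coordinate
# representation is not specified here.
def perimeter_coordinates(rows, columns):
    """Return the set of outer-edge coordinates of a rows x columns grid."""
    if rows < 3 or columns < 3:
        raise ValueError("plate_edge_avoidance requires at least a 3 x 3 profile")
    return {
        (row, column)
        for row in range(rows)
        for column in range(columns)
        if row in (0, rows - 1) or column in (0, columns - 1)
    }


def available_capacity(rows, columns):
    """Number of interior slots left after blocking the perimeter."""
    return rows * columns - len(perimeter_coordinates(rows, columns))
```

The 3 x 3 minimum follows from the arithmetic: the interior holds (rows - 2) * (columns - 2) slots, which is zero as soon as either dimension drops to 2.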
SetRackFillDirection semantics:
- Reprojects one saved rack onto the same geometry with a different fill order.
- Supported values:
  - row_major
  - column_major
- Existing arrangement order is preserved while occupied coordinates are reassigned under the new fill order.
SetRackProfileCustom semantics:
- Reprojects one saved rack onto one custom A1-style geometry.
- rows and columns are persisted directly in the rack profile snapshot.
- Existing fill direction is preserved.
- Existing blocked coordinates are preserved when still in-bounds for the new geometry; out-of-bounds blocked coordinates are dropped deterministically.
- Existing arrangement order is preserved while coordinates are reflowed under the custom geometry.
- A1-style row labels continue beyond Z as AA, AB, ...
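The A1-style labeling above can be sketched with spreadsheet-style letters. The document only states that rows continue as AA, AB, ... after Z; continuing to BA after AZ is an assumption of this example, as are the function names.

```python
# Hypothetical sketch of A1-style coordinate labels for custom geometries.
# Assumption: rows beyond Z follow spreadsheet-style naming (AA, AB, ..., AZ, BA).
def row_label(row_index):
    """Convert a zero-based row index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    if row_index < 0:
        raise ValueError("row_index must be >= 0")
    label = ""
    number = row_index + 1
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def coordinate_label(row_index, column_index):
    """Combine row letters with a one-based column number, for example A1."""
    return f"{row_label(row_index)}{column_index + 1}"
```

With this scheme a custom 30-row rack labels its last four rows AA through AD.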
SetRackBlockedCoordinates semantics:
- Persists one normalized blocked/reserved coordinate set on the rack profile.
- Blocked coordinates are excluded from placement capacity and fill-order reflow.
- Existing arrangement order is preserved while occupied coordinates are reassigned onto the remaining available positions.
- Duplicate blocked coordinates are removed deterministically.
ExportRackLabelsSvg semantics:
- Writes one deterministic SVG label sheet for a saved rack.
- Optional arrangement_id restricts output to labels belonging to one arrangement block on that rack.
- preset is engine-owned and defaults to compact_cards.
- Built-in presets:
  - compact_cards
  - print_a4
  - wide_cards
- Label rows currently include:
- rack id
- position
- role
- container/ladder display name
- sequence id when sequence-backed
- bp length/topology when sequence-backed
ExportRackFabricationSvg semantics:
- Writes one deterministic top-view fabrication/planning SVG for a saved rack.
- Uses one engine-owned physical carrier template layered on the current saved rack snapshot rather than inventing a second placement model.
- Current built-in physical templates:
  - storage_pcr_tube_rack
  - pipetting_pcr_tube_rack
- The export consumes:
  - rack geometry (rows, columns, blocked coordinates)
  - saved rack occupancy and arrangement ids for visual planning markers
- Intended downstream uses:
- fabrication sketching
- bench planning
- future simulation adapters
ExportRackIsometricSvg semantics:
- Writes one deterministic pseudo-3D/isometric SVG for a saved rack.
- Uses the same engine-owned physical carrier template family as ExportRackFabricationSvg and ExportRackOpenScad.
- Consumes the same linked rack snapshot:
- rows / columns / blocked coordinates
- occupied placements
- arrangement ids and colors
- Intended downstream uses:
- README / documentation hero figures
- bench-facing communication assets
- presentation-ready rack review without leaving the shared engine path
ExportRackOpenScad semantics:
- Writes one deterministic parameterized OpenSCAD source file for a saved rack.
- Uses the same engine-owned physical carrier template family as ExportRackFabricationSvg.
- Current OpenSCAD export intentionally favors geometry over embedded text:
- tube openings
- outer carrier body
- front label-strip recess
- Printable labels/front strips remain separate shared exports rather than being baked permanently into the 3D geometry in this first baseline.
ExportRackCarrierLabelsSvg semantics:
- Writes one deterministic carrier-matched SVG sheet for a saved rack.
- Uses the same engine-owned physical carrier template family as ExportRackFabricationSvg and ExportRackOpenScad.
- Supports optional arrangement scoping:
  - whole-rack export when arrangement_id is omitted
  - one arrangement/module export when arrangement_id is provided
- Built-in presets:
  - front_strip_and_cards
  - front_strip_only
  - module_cards_only
- Current baseline can emit:
- one front-strip label sized from the selected physical template
- one module card per arrangement in scope
ExportRackSimulationJson semantics:
- Writes one deterministic machine-readable JSON export for downstream simulation adapters.
- Uses the same engine-owned physical carrier template family as the physical SVG/OpenSCAD exports.
- Current baseline includes:
- rack/profile metadata
- selected physical template geometry
- arrangement block summaries
- one slot record per physical coordinate with:
- row/column/coordinate
- fill ordinal
- blocked status
- physical center in mm
- occupant/arrangement metadata when occupied
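The per-coordinate slot records described above can be pictured with a small sketch. Everything concrete here is an assumption for illustration: the field names, the `A1`-style coordinates, the row-major fill order, and the 9 mm pitch are hypothetical and are not taken from the actual export schema.

```python
from string import ascii_uppercase


def slot_records(rows, columns, blocked=(), pitch_mm=9.0, origin_mm=(4.5, 4.5)):
    """Build one hypothetical slot record per physical coordinate (row-major)."""
    blocked = set(blocked)
    records = []
    ordinal = 0  # assumed: fill ordinals skip blocked slots
    for r in range(rows):
        for c in range(columns):
            coord = f"{ascii_uppercase[r]}{c + 1}"
            is_blocked = coord in blocked
            records.append({
                "row": r,
                "column": c,
                "coordinate": coord,
                "fill_ordinal": None if is_blocked else ordinal,
                "blocked": is_blocked,
                # physical center in mm, derived from an assumed uniform pitch
                "center_mm": [origin_mm[0] + c * pitch_mm, origin_mm[1] + r * pitch_mm],
            })
            if not is_blocked:
                ordinal += 1
    return records
```

Blocked coordinates still get a record so that a downstream simulation adapter sees every physical position, which matches the "one slot record per physical coordinate" wording above.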
RenderDotplotSvg semantics:
- Inputs:
  - `seq_id` (owner sequence id for the stored dotplot payload)
  - `dotplot_id` (stored payload id from `ComputeDotplot` or `ComputeDotplotOverlay`)
  - `path` (output SVG)
  - optional `flex_track_id` (adds flexibility panel in same SVG)
  - optional `display_density_threshold` and `display_intensity_gain` (display tuning)
- Ownership checks:
  - dotplot payload must belong to `seq_id`
  - optional flexibility track must also belong to `seq_id`
- Output:
  - deterministic SVG dotplot artifact; operation is non-mutating
  - overlay payloads render all stored `query_series` with legend + merged reference-exon side track; flexibility panel is suppressed there because the x-axis is normalized per series.
DeriveTranscriptSequences semantics:
- Inputs:
  - `seq_id`
  - optional `feature_ids[]`
  - optional splicing `scope`
  - optional `output_prefix`
- Behavior:
  - derives one spliced transcript/cDNA sequence per admitted `mRNA`/`transcript` feature (or per transcript admitted by the selected splicing scope).
  - preserves transcript provenance on the derived sequence through synthetic local `mRNA` and `exon` features.
  - when coding context is available, also derives a synthetic local `CDS` feature and attached protein-translation qualifiers on the derived transcript sequence.
  - CDS/protein derivation now resolves from, in order:
    - explicit transcript `cds_ranges_1based`
    - matching source `CDS` features that fit within the transcript exons
    - optional `/codon_start` or `phase`
    - optional `/transl_table`
    - source/transcript/CDS `organism` and `organelle` context
  - translation-table resolution is deterministic:
    - explicit `/transl_table` on CDS, transcript, or source wins
    - plastid/chloroplast-like organelles default to NCBI table `11`
    - mitochondrial context without explicit `/transl_table` currently falls back to table `1` and emits an explicit warning because lineage-specific mitochondrial table inference is not implemented yet
  - the derived transcript still keeps translation qualifiers locally, and `DeriveProteinSequences` can then materialize the corresponding peptides as first-class protein sequence entries.
- Derived feature qualifiers:
  - derived `mRNA` may now include:
    - `cds_ranges_1based`
    - `protein_length_aa`
    - `translation_table`
    - `translation_table_label`
    - `translation_table_source`
    - `translation_context_organism`
    - `translation_context_organelle`
    - `translation_speed_profile_hint`
    - `derived_protein_translation`
  - derived synthetic `CDS` may now include:
    - `translation`
    - `codon_start`
    - `transl_table`
    - `translation_table_label`
    - `translation_table_source`
    - `protein_length_aa`
    - `terminal_stop_trimmed`
    - zero or more `translation_warning` qualifiers
- Translation-speed preparation:
  - transcript derivation now records a normalized `translation_speed_profile_hint` when the source organism resolves to one of the initial target species:
    - `human`
    - `mouse`
    - `yeast`
    - `ecoli`
- Output:
  - additive sequence creation through regular `OpResult.created_seq_ids`
  - deterministic messages/warnings about CDS absence, translation-table fallback, partial codons, ambiguous codons, or internal stops.
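The translation-table resolution rules above can be sketched as a small function. This is a minimal illustration rather than the engine's code: the function name, the returned source labels, the precedence among CDS/transcript/source values, and the table `1` default when no context exists are assumptions.

```python
def resolve_translation_table(cds_table=None, transcript_table=None,
                              source_table=None, organelle=None):
    """Return (table, source_label, warnings) following the documented rules."""
    warnings = []
    # Rule 1: an explicit /transl_table wins (CDS > transcript > source is assumed).
    for label, value in (("cds", cds_table),
                         ("transcript", transcript_table),
                         ("source", source_table)):
        if value is not None:
            return int(value), f"explicit_{label}", warnings
    context = (organelle or "").lower()
    # Rule 2: plastid/chloroplast-like organelles default to NCBI table 11.
    if "plastid" in context or "chloroplast" in context:
        return 11, "organelle_default", warnings
    # Rule 3: mitochondrial context falls back to table 1 with a warning.
    if "mitochondri" in context:
        warnings.append(
            "mitochondrial context without explicit /transl_table; "
            "falling back to table 1"
        )
        return 1, "mitochondrial_fallback", warnings
    # Assumed default when no context is available: the standard code.
    return 1, "standard_default", warnings
```

Returning the warning list alongside the table keeps the fallback visible to callers, in the spirit of the deterministic messages the operation reports.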
DeriveProteinSequences semantics:
- Inputs:
  - `seq_id`
  - optional `feature_ids[]`
  - optional splicing `scope`
  - optional `output_prefix`
- Behavior:
  - derives one first-class protein sequence per selected/admitted transcript
  - uses annotated CDS translation when available
  - if CDS annotation is absent, falls back deterministically to:
    - an inferred ATG-start ORF on the derived transcript
    - otherwise the longest stop-free reading-frame segment
  - emits one full-span local `Protein` feature on the derived peptide with:
    - transcript/source provenance
    - derivation mode (`annotated_cds`, `inferred_orf`, `heuristic_longest_frame`)
    - translation-table context
    - organism/organelle context when available
    - optional `translation_speed_profile_hint`
- Output:
  - additive sequence creation through regular `OpResult.created_seq_ids`
  - deterministic warnings when CDS annotation is missing or heuristic inference had to be used
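The fallback order for transcripts without CDS annotation can be illustrated as follows. This is a sketch under stated assumptions: standard stop codons only, an ORF defined as ATG through the first in-frame stop, and ties broken by longest span and then earliest start. The engine's exact tie-breaking is not documented here.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard-code stops only (assumption)


def infer_coding_segment(transcript):
    """Return (mode, start, end) using 0-based half-open coordinates, or None."""
    seq = transcript.upper().replace("U", "T")
    orfs, open_segments = [], []
    for frame in range(3):
        atg_start = None
        seg_start = frame
        last_end = frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            last_end = i + 3
            if codon in STOPS:
                if atg_start is not None:
                    orfs.append((atg_start, i + 3))  # ORF includes the stop codon
                    atg_start = None
                if i > seg_start:
                    open_segments.append((seg_start, i))
                seg_start = i + 3
            elif codon == "ATG" and atg_start is None:
                atg_start = i
        if last_end > seg_start:
            open_segments.append((seg_start, last_end))

    def pick(spans):
        # deterministic choice: longest span first, then earliest start
        return min(spans, key=lambda s: (-(s[1] - s[0]), s[0]))

    if orfs:
        return ("inferred_orf",) + pick(orfs)
    if open_segments:
        return ("heuristic_longest_frame",) + pick(open_segments)
    return None
```

The returned mode strings reuse the derivation-mode names listed above, so a caller could attach them directly as a qualifier.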
ReverseTranslateProteinSequence semantics:
- Inputs:
  - `seq_id` (must resolve to a protein sequence)
  - optional `output_id`
  - optional `speed_profile`:
    - `human`
    - `mouse`
    - `yeast`
    - `ecoli`
  - optional `speed_mark`:
    - `fast`
    - `slow`
  - optional `translation_table`
  - optional `target_anneal_tm_c`
  - optional `anneal_window_bp`
- Behavior:
  - generates one synthetic coding DNA sequence for the selected protein
  - codon choice is deterministic and translation-table-aware
  - `speed_mark=fast` biases toward preferred codons for the selected/bundled species profile
  - `speed_mark=slow` biases away from the preferred codon when synonymous choices exist
  - optional `target_anneal_tm_c` applies a lightweight local suffix Tm heuristic over `anneal_window_bp` windows to mildly steer codon choice; it is advisory rather than a full sequence optimizer
- Output:
  - additive sequence creation through regular `OpResult.created_seq_ids`
  - one full-span synthetic local `CDS` feature with:
    - protein provenance
    - translation table/label
    - optional speed profile/mark qualifiers
    - optional annealing-target qualifiers
    - zero or more `reverse_translation_warning` qualifiers
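A toy version of the `speed_mark` behavior is sketched below. The codon ranking table is a hypothetical placeholder, not one of the bundled species profiles, and picking the least-preferred synonym for `slow` is only one possible reading of "biases away from the preferred codon".

```python
# Hypothetical preference ranking (most preferred first); illustration only.
CODON_RANKING = {
    "M": ["ATG"],
    "K": ["AAG", "AAA"],
    "F": ["TTC", "TTT"],
    "*": ["TGA", "TAA", "TAG"],
}


def reverse_translate(protein, speed_mark="fast"):
    """Deterministically pick one codon per residue from the ranking table."""
    if speed_mark not in ("fast", "slow"):
        raise ValueError(f"unsupported speed_mark: {speed_mark!r}")
    codons = []
    for residue in protein.upper():
        ranked = CODON_RANKING[residue]  # KeyError for residues outside the toy table
        if speed_mark == "slow" and len(ranked) > 1:
            codons.append(ranked[-1])  # steer away from the preferred codon
        else:
            codons.append(ranked[0])
    return "".join(codons)
```

Because the table lookup is the only source of variation, the same protein and `speed_mark` always yield the same DNA, which is the determinism property the operation promises.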
RenderProtocolCartoonSvg semantics:
- Inputs:
  - `protocol` (built-in ids now include `gibson.two_fragment`, `gibson.single_insert_dual_junction`, `pcr.assay.pair`, `pcr.assay.pair.no_product`, `pcr.assay.pair.with_tail`, and `pcr.assay.qpcr`)
  - `path` (output SVG)
- Behavior:
  - renders a deterministic protocol-cartoon strip through one engine route, independent of GUI/CLI entry point.
  - emits canonical conceptual step order for the requested protocol as an ordered event-sequence model.
  - template representation baseline is now available in engine internals:
    - schema id: `gentle.protocol_cartoon_template.v1`
    - sparse template rows (event/molecule/feature) are resolved with deterministic defaults into render-ready specs.
    - built-in protocol families should now be composed from shared internal figure building blocks (feature spans, strand-specific tails, linear molecule rows, event rows) rather than ad-hoc per-protocol struct literals; this keeps future PCR/Gibson growth on one composition model
  - internal model used by renderer:
    - event -> molecules -> feature fragments
    - molecule topology supports `linear|circular`
    - linear molecules may carry end styles (`NotShown`, `Continuation`, `Blunt`, or `Sticky { polarity: FivePrime|ThreePrime, nt }`)
    - feature fragments can optionally render different top-strand and bottom-strand colors and lengths, plus strand-specific nicks after a segment boundary; this is useful for annealed overlaps, exonuclease chew-back cartoons with single-stranded tails, and polymerase-filled intermediates that still require ligase
  - malformed protocol cartoon specs fail validation and render deterministic invalid-spec SVG diagnostics instead of panicking.
- Output:
  - deterministic SVG artifact; operation is non-mutating.
RenderProtocolCartoonTemplateSvg semantics:
- Inputs:
  - `template_path` (JSON file path, schema `gentle.protocol_cartoon_template.v1`)
  - `path` (output SVG)
- Behavior:
  - reads template JSON from disk and parses it deterministically.
  - resolves sparse event/molecule/feature rows using deterministic defaults (action/caption/topology/end styles/feature length/palette).
  - validates resolved cartoon semantics before rendering.
- Output:
  - deterministic SVG artifact; operation is non-mutating.
ValidateProtocolCartoonTemplate semantics:
- Inputs:
  - `template_path` (JSON file path, schema `gentle.protocol_cartoon_template.v1`)
- Behavior:
  - reads and parses template JSON deterministically.
  - resolves sparse defaults and validates resolved cartoon semantics.
  - emits validation diagnostics through operation result messages; no SVG is written.
- Output:
  - non-mutating validation result suitable for pre-render checks in CLI/GUI flows.
RenderProtocolCartoonTemplateWithBindingsSvg semantics:
- Inputs:
  - `template_path` (JSON file path, schema `gentle.protocol_cartoon_template.v1`)
  - `bindings_path` (JSON file path, schema `gentle.protocol_cartoon_template_bindings.v1`)
  - `path` (output SVG)
- Behavior:
  - loads template and binding payloads.
  - applies deterministic ID-targeted overrides (defaults, event, molecule, feature) and then resolves the bound template.
  - validates resolved semantics before SVG rendering.
- Output:
  - deterministic SVG artifact; operation is non-mutating.
ExportProtocolCartoonTemplateJson semantics:
- Inputs:
  - `protocol` (built-in protocol cartoon id, for example `gibson.two_fragment` or `pcr.assay.pair` / `pcr.assay.pair.with_tail` / `pcr.assay.qpcr`)
  - `path` (output JSON file)
- Behavior:
  - materializes the canonical built-in template (`gentle.protocol_cartoon_template.v1`) for the requested protocol.
  - writes deterministic pretty JSON suitable for user editing/tweaking.
- Output:
  - deterministic JSON artifact; operation is non-mutating.
ExportProcessRunBundle semantics:
- Exports a deterministic JSON run bundle artifact (`gentle.process_run_bundle.v1`) for reproducibility/audit.
- Inputs:
  - `path` (required): output JSON file
  - `run_id` (optional): when set, only operation-log rows for that `run_id` are exported; when omitted, all operation-log rows are exported.
- Payload sections:
  - `inputs`:
    - per-operation extracted input references (`sequence_ids`, `container_ids`, `arrangement_ids`, candidate/guide sets, genome ids, file inputs)
    - aggregated referenced ids and inferred `root_sequence_ids`
  - `parameter_overrides`:
    - chronological `SetParameter` overrides with `op_id`, `record_index`, parameter `name`, and exact JSON `value`
  - `decision_traces`:
    - optional routine-assistant trace rows (`gentle.routine_decision_trace.v1`) captured in project metadata and exported for routine-selection reproducibility (`trace_id`, selected routine/alternatives, disambiguation questions/answers, binding snapshot, helper-aware `routine_preference_context`, candidate planning-score snapshots, suggested macro templates, ordered `preflight_history`, canonical `preflight_snapshot`, execution outcome, export events)
  - `operation_log`:
    - selected immutable operation records (`run_id`, operation payload, result)
  - `outputs`:
    - created/changed sequence ids
    - final sequence summaries for affected ids
    - container/arrangement ids created by selected operations
    - file artifact paths produced by selected operations
  - `parameter_snapshot`:
    - full current engine parameter snapshot at export time.
  - `construct_reasoning`:
    - portable construct-reasoning payload for ClawBio/OpenClaw and other agent-facing consumers
    - `seq_ids_considered`: deterministic union of referenced plus created/changed sequence ids from the exported run slice
    - `summaries`: compact per-sequence reasoning summaries (objective, fact/decision coverage, host/helper/medium context, interpreted growth signals, supported selection rules, and warning lines)
    - `graphs`: the selected stored `gentle.construct_reasoning_graph.v1` payloads themselves for full offline inspection
- Failure modes:
  - empty `path` => `InvalidInput`
  - unknown filtered `run_id` (no selected rows) => `NotFound`
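The `run_id` filtering and the two failure modes can be summarized in a short sketch. The exception classes and the row shape are stand-ins chosen for illustration; only the `InvalidInput` and `NotFound` names and the filtering rule come from the text above.

```python
class InvalidInput(ValueError):
    """Stand-in for the engine's InvalidInput error code."""


class NotFound(LookupError):
    """Stand-in for the engine's NotFound error code."""


def select_operation_rows(operation_log, path, run_id=None):
    """Pick the operation-log rows that a run bundle export would include."""
    if not path:
        raise InvalidInput("path must not be empty")
    if run_id is None:
        return list(operation_log)  # no filter: export every row
    rows = [row for row in operation_log if row.get("run_id") == run_id]
    if not rows:
        raise NotFound(f"no operation-log rows for run_id {run_id!r}")
    return rows
```

Validating `path` before filtering means an empty path is reported as `InvalidInput` even when the `run_id` is also unknown; that ordering is an assumption of this sketch.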
Construct reasoning graph foundation (implemented first slice):
- Shared portable records now exist for:
  - `gentle.construct_objective.v1`
  - `gentle.design_evidence.v1`
  - `gentle.design_fact.v1`
  - `gentle.design_decision_node.v1`
  - `gentle.construct_candidate.v1`
  - `gentle.construct_reasoning_graph.v1`
  - `gentle.construct_reasoning_store.v1`
  - `gentle.host_profile_catalog.v1`
- Current engine-backed scope:
  - project metadata key: `construct_reasoning`
  - objective upsert/store round-trip
  - construct-objective schema now reserves additive host/helper context fields:
    - `propagation_host_profile_id`
    - `expression_host_profile_id`
    - `host_route[]`
    - `medium_conditions[]`
    - `helper_profile_id`
    - `required_host_traits[]`
    - `forbidden_host_traits[]`
  - design-evidence schema now also reserves additive non-sequence context fields:
    - `scope`
    - `host_profile_id`
    - `host_route_step_id`
    - `helper_profile_id`
    - `medium_condition_id`
  - deterministic read-only graph build from:
    - construct-objective context such as selected propagation/expression host profiles, helper profile, host-route steps, medium conditions, and required/forbidden host traits
    - existing sequence facts: restriction sites plus sequence-feature spans such as exon/CDS/gene/transcript/UTR/promoter/TFBS/variant when present
  - deterministic hard-rule fact/decision population for:
    - propagation-host context
    - expression-host context
    - host-transition status
    - host-route restriction/methylation review from:
      - explicit route-step trait text such as `hsdR- M+`, `dam+`, `dcm+`, `hsdR+`, or `MDRS+`
      - deterministic sequence motif tallies for Dam, Dcm, and EcoK-like target sites
    - growth/condition context from structured medium-condition interpretation (for example nutrient omission, antibiotic selection agent, heat shock, and temperature signals)
    - helper/MCS context
    - variant-effect context derived from overlap of mapped variant markers against promoter/enhancer/TFBS/CDS/exon/UTR/splice evidence already in the graph
    - variant-assay context that maps the same deterministic overlap rules onto first assay-family suggestions such as:
      - promoter/regulatory reporter follow-up
      - allele-paired coding-expression comparison
      - minigene splice follow-up
      - UTR reporter / translation comparison
    - selection/complementation context built from engine-owned selection/complementation rules, currently seeded with:
      - the proline-rescue baseline (`proA`/`proB`-style annotated construct features plus proline-style medium conditions)
      - helper-backed selectable-marker context from normalized helper-profile interpretation (for example helper semantics such as `AmpR` / `ampicillin_selection`)
  - active-sequence graph refresh helper that reuses the existing graph/objective identity when rebuilding deterministic evidence after sequence changes
  - JSON export helper for one stored graph
  - host-profile catalog loading/list projection from the shared starter JSON catalog (`assets/host_profiles.json`) with filter matching across ids, aliases, genotype/phenotype tags, notes, and source notes
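The "deterministic sequence motif tallies" item in the engine-backed scope above can be pictured with the commonly cited recognition sequences: Dam `GATC`, Dcm `CCWGG`, and EcoKI `AAC(N6)GTGC`. Treating those as the engine's exact patterns is an assumption, as is the function name; the doc only says "EcoK-like".

```python
import re

# Commonly cited recognition motifs; assumed here, not confirmed by the engine.
MOTIFS = {
    "dam": "GATC",      # palindromic
    "dcm": "CC[AT]GG",  # CCWGG, palindromic as a pattern
    # EcoKI is asymmetric, so both orientations are scanned.
    "ecok": "AAC[ACGT]{6}GTGC|GCAC[ACGT]{6}GTT",
}


def tally_methylation_motifs(sequence):
    """Count motif occurrences, including overlapping ones, on a DNA string."""
    seq = sequence.upper()
    # A capturing group inside a lookahead lets findall report overlapping hits.
    return {
        name: len(re.findall(f"(?=({pattern}))", seq))
        for name, pattern in MOTIFS.items()
    }
```

Counts like these are only sequence facts; interpreting them against host traits such as `dam+` or `hsdR+` is the separate review step described in the list.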
- Current GUI-backed scope:
  - sequence-window `Reasoning` display toggle
  - read-only linear DNA-map overlay that auto-refreshes from the engine-owned graph and paints evidence spans directly on-sequence
  - GUI-side hover/selection inspection for one evidence span at a time
  - side-panel construct-reasoning inspector section for non-sequence facts and decision steps (host/helper/host-route restriction-methylation/medium/growth/selection context) without pretending they are coordinate spans
  - Planning-window `Host Profile Browser` backed by the same shared catalog so host/strain traits can be inspected without reparsing raw JSON
  - GUI-only role/class visibility filters layered on top of the same shared engine-owned graph payload (no adapter-local graph recomputation)
  - ClawBio/OpenClaw-facing run-bundle export integration:
    - deterministic per-sequence summary rows for concise agent consumption including additive variant effect tags and suggested assay-family ids
    - embedded stored reasoning graphs for full offline inspection/replay
- Current evidence-class rules:
  - restriction sites => `hard_fact`
  - dbSNP / VCF-generated variant markers => `hard_fact`
  - exon/splice annotations with explicit cDNA-style qualifier hints => `hard_fact`
  - imported/derived sequence annotations => `reliable_annotation`
  - TFBS-style annotations => `soft_hypothesis`
- Current evidence-scope behavior:
  - graph builds now emit both:
    - `sequence_span` evidence for mapped restriction/annotation features
    - non-sequence construct-objective context evidence (`host_profile`, `host_transition`, `medium_condition`, `helper_profile`, `whole_construct`) when the objective carries those fields
  - GUI DNA overlays intentionally keep rendering only `sequence_span` evidence; non-sequence evidence stays in the portable graph payload rather than being faked as coordinate spans
- Not in this slice yet:
- construct-candidate ranking
- curated host/helper profile catalog loading and biological compatibility scoring against those catalogs
- editable reasoning/decision GUI surfaces beyond the current read-only span overlay plus inspector summary
Protocol-cartoon family growth direction (planned):
- Gibson specialist work now validates the abstraction-first protocol-cartoon strategy:
  - one protocol family should be expressed as a canonical `gentle.protocol_cartoon_template.v1` template plus deterministic bindings
  - the renderer should grow a shared collection of reusable figure building blocks that protocol families compose, instead of embedding protocol-specific drawing code in each built-in cartoon
  - protocol growth in count/shape (for example multi-fragment Gibson) should prefer repeated events, repeated molecules, and binding-level overrides rather than renderer-specific special cases
- Implemented baseline:
- built-in Gibson cartoons now compose from shared internal building blocks for duplex spans, strand-specific tails, linear molecule rows, and event rows
- this is intentionally still mechanism-first: Gibson cartoons describe fragment flow and achieved homology relationships, not full primer objects or low-level PCR parameter details
- PCR-assay cartoons should follow the same rule. The shared renderer should remain chemistry-agnostic and continue to render only ordered events, molecules, and features; PCR-specific meaning belongs in template structure and bindings, not in new PCR-only drawing primitives.
- PCR cartoon purpose:
- explain assay intent and artifact flow, not every thermocycler sub-step
- stay aligned with engine-owned operations, reports, and lineage-visible artifacts
- keep lower-level primer sequences/thermocycler details in adjacent textual reports or inspectors rather than the cartoon itself
- Canonical PCR assay scene vocabulary should stay stable across modalities:
- source template/context event
- target/ROI event (selected span, feature-derived span, or queued region)
- assay setup event (forward/reverse pair and optional probe or staged inner/outer sets may be named, but do not need literal primer glyphs)
- amplification event
- product/artifact event (amplicon, extracted copy, report, export, or explicit no-accepted-pairs outcome)
- Implemented PCR baseline in the `pcr.assay.*` family:
  - `pcr.assay.pair`: base strip with one selected ROI, one assay-setup lane, one amplification step, one amplicon/report outcome, and explicit forward/reverse primer glyphs with 5'/3' orientation
  - `pcr.assay.pair.no_product`: same family with an explicit report-only terminal state when no accepted primer pair yields a product
  - `pcr.assay.pair.with_tail`: insertion-first strip with requested extension sequences + insertion anchors, anchor-adjacent primer windows, and carried-through inserted terminal tails in the final amplicon
  - `pcr.oe.substitution`: six-step overlap-extension substitution strip with primer set `a..f`, first-step product haplotypes (AB/CD/EF), strand-specific anneal-gap geometry, and polymerase fill
- Implemented qPCR baseline in the same `pcr.assay.*` family:
  - `pcr.assay.qpcr`: same base strip enriched with one internal probe window plus explicit forward/reverse primer glyphs, a retained probe marker, and one quantitative readout terminal state
- Planned PCR modality adaptation should continue through the same `pcr.assay.*` protocol-cartoon family:
  - nested PCR: same family with two amplification stages (outer -> inner) instead of one, reusing the same event vocabulary
  - inverse PCR: same family with circular-template bindings and outward-facing primer semantics
  - batch/multiplex/tiling: repeated assay groups or repeated output lanes in bindings, not new renderer semantics per assay count
  - empty/failure outcomes: report/artifact nodes can render without product nodes when no accepted assay is produced
- Recommended rollout order:
- extend the shipped PCR/qPCR baseline to queued batch PCR without changing renderer semantics
- add nested, inverse, long-range, and multiplex variants as further template/binding expansions
- Naming/design rule:
- do not introduce one built-in protocol id per assay count or minor UI view
- prefer one stable protocol family with bindings that carry assay modality, stage count, molecule presence, and repeated-lane structure
- keep generated explanatory strips exportable through the existing `protocol-cartoon ...` routes
RNA secondary-structure semantics:
- Inspection API:
  - `GentleEngine::inspect_rna_structure(seq_id)`
  - Runs `rnapkin -v -p <sequence>` and returns structured text report (`stdout`/`stderr` + command metadata).
- Export operation:
  - `RenderRnaStructureSvg { seq_id, path }`
  - Runs `rnapkin <sequence> <path>` and expects SVG output at `path`.
- Input constraints:
  - accepted only for single-stranded RNA (`molecule_type` `RNA` or `ssRNA`)
  - empty sequence is rejected
- Runtime dependency:
  - external `rnapkin` executable is required
  - executable path resolution order:
    - env var `GENTLE_RNAPKIN_BIN`
    - fallback executable name `rnapkin` in `PATH`
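The executable resolution order can be mirrored in a few lines. This is a sketch of the documented order only; the helper name is hypothetical, and whether the engine verifies that the `GENTLE_RNAPKIN_BIN` path exists, or ignores a blank value, is an assumption made here.

```python
import os
import shutil


def resolve_rnapkin(env=None):
    """Return the rnapkin executable path, or None when it cannot be found."""
    env = os.environ if env is None else env
    # 1. Explicit override through the environment variable.
    override = env.get("GENTLE_RNAPKIN_BIN", "").strip()
    if override:
        return override
    # 2. Fallback: look up the plain executable name on PATH.
    return shutil.which("rnapkin", path=env.get("PATH"))
```

Passing `env` explicitly keeps the lookup testable without mutating the process environment.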
DNA ladder catalog semantics:
- Inspection API:
  - `GentleEngine::inspect_dna_ladders(name_filter?)`
  - Returns structured ladder metadata:
    - `schema` (`gentle.dna_ladders.v1`)
    - `ladder_count`
    - `ladders[]` (`name`, `loading_hint`, `min_bp`, `max_bp`, `band_count`, `bands`)
- Export operation:
  - `ExportDnaLadders { path, name_filter? }`
  - Writes the same structured payload to JSON at `path`.
  - Optional `name_filter` applies case-insensitive name matching before export.
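A minimal sketch of the optional `name_filter` step follows. Substring matching is an assumption (the text only says "case-insensitive name matching"), and the ladder names in the usage below are made up.

```python
def filter_ladders(ladders, name_filter=None):
    """Apply a case-insensitive name filter and wrap the result in the payload shape."""
    needle = (name_filter or "").strip().lower()
    # An empty needle matches every ladder, which models an omitted filter.
    selected = [ladder for ladder in ladders if needle in ladder["name"].lower()]
    return {
        "schema": "gentle.dna_ladders.v1",
        "ladder_count": len(selected),
        "ladders": selected,
    }
```

For example, `filter_ladders([{"name": "Example 1 kb Ladder"}, {"name": "Example 100 bp Ladder"}], "1 KB")` keeps only the first entry.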
RNA ladder catalog semantics:
- Inspection API:
  - `GentleEngine::inspect_rna_ladders(name_filter?)`
  - Returns structured ladder metadata:
    - `schema` (`gentle.rna_ladders.v1`)
    - `ladder_count`
    - `ladders[]` (`name`, `loading_hint`, `min_nt`, `max_nt`, `band_count`, `bands`)
- Export operation:
  - `ExportRnaLadders { path, name_filter? }`
  - Writes the same structured payload to JSON at `path`.
  - Optional `name_filter` applies case-insensitive name matching before export.
Historical screenshot artifact contract (currently disabled):
- Guardrail:
  - command is currently rejected by security policy even when `--allow-screenshots` is provided.
- Command surface:
  - direct CLI: `gentle_cli screenshot-window OUTPUT.png`
  - shared shell (CLI and GUI shell panel): `screenshot-window OUTPUT.png`
- Scope and safety:
  - captures only the active/topmost GENtle window
  - window lookup is native AppKit in-process (no AppleScript automation path)
  - command is primarily intended for GUI shell contexts with an active window
  - rejects full-desktop capture and non-GENtle targets
  - rejects request if no eligible active GENtle window is available
  - current backend support is macOS (`screencapture`); non-macOS returns unsupported
- Output:
  - writes an image file at caller-provided `OUTPUT` path (custom filename supported)
  - recommended default image format is inferred from extension (e.g. `.png`)
- Result payload shape:
{
"schema": "gentle.screenshot.v1",
"path": "docs/images/gui-main.png",
"window_title": "GENtle - pGEX-3X",
"captured_at_unix_ms": 1768860000000,
"pixel_width": 1680,
"pixel_height": 1020,
"backend": "macos.screencapture"
}
Operation progress/cancellation semantics:
- `apply_with_progress` and workflow progress callbacks receive `OperationProgress` updates.
- Callback return value:
  - `true`: continue
  - `false`: request cancellation
- Current event families:
  - `Tfbs`
  - `GenomePrepare`
  - `GenomeTrackImport`
  - `DbSnpFetch`
  - `RnaReadInterpret`
- Current cancellation support:
  - genome preparation supports cooperative cancellation plus optional `timeout_seconds` timeboxing and reports deterministic cancellation/timeout outcomes.
  - genome-track imports support cooperative cancellation and return partial import warnings.
  - dbSNP fetch currently emits staged progress events (`validate_input`, `inspect_prepared_genome`, `contact_server`, `wait_response`, `parse_response`, `resolve_placement`, `extract_region`, `attach_variant_marker`) but does not yet expose cooperative cancellation.
  - RNA-read interpretation uses cooperative callback checks while emitting periodic progress snapshots (including seed-confirmation histogram bins).
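The callback contract (return `true` to continue, `false` to request cancellation) can be sketched with a generic step loop. The progress dictionary and the result shape are hypothetical; they only model the cooperative pattern and are not the `OperationProgress` type.

```python
def run_with_progress(steps, on_progress, family="GenomeTrackImport"):
    """Run steps in order, asking the callback before each one whether to go on."""
    completed = []
    for index, step in enumerate(steps):
        keep_going = on_progress({
            "family": family,
            "step": step,
            "done": index,
            "total": len(steps),
        })
        if not keep_going:
            # Cooperative cancellation: stop cleanly and report partial work.
            return {"cancelled": True, "completed": completed}
        completed.append(step)
    return {"cancelled": False, "completed": completed}
```

Because cancellation is only checked between steps, work already completed is preserved, which corresponds to the partial-import warnings mentioned for genome-track imports.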
Pcr semantics (current):
- Exact primer matching on linear templates.
- Enumerates all valid amplicons formed by forward-primer matches and downstream reverse-primer binding matches.
- `unique = true` requires exactly one amplicon; otherwise fails.
- `output_id` may only be used when exactly one amplicon is produced.
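Exact-match amplicon enumeration on a linear template can be sketched as below. The function names are hypothetical, only plain `ACGT` is handled, and requiring the reverse-primer site to start at or after the end of the forward-primer match is an assumption about what "downstream" means.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement a plain ACGT sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Return every start index of needle in haystack, overlaps included."""
    hits, i = [], haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def enumerate_amplicons(template, forward, reverse, unique=False):
    """List amplicons from exact forward matches and downstream reverse sites."""
    template, forward = template.upper(), forward.upper()
    reverse_site = revcomp(reverse)  # what the reverse primer looks like on the top strand
    amplicons = [
        template[f:r + len(reverse_site)]
        for f in find_all(template, forward)
        for r in find_all(template, reverse_site)
        if r >= f + len(forward)
    ]
    if unique and len(amplicons) != 1:
        raise ValueError(f"expected exactly one amplicon, found {len(amplicons)}")
    return amplicons
```

The `unique` check runs after enumeration, so the error can state how many amplicons were actually found.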
PcrAdvanced semantics:
- Primer spec fields:
  - `sequence` (full primer, 5'->3')
  - `anneal_len` (3' suffix length used for template binding)
  - `max_mismatches` (allowed mismatches within anneal part)
  - `require_3prime_exact_bases` (hard exact-match requirement at primer 3' end)
  - `library_mode` (`Enumerate` or `Sample`) for degenerate/IUPAC primers
  - `max_variants` cap for primer-library expansion
  - `sample_seed` deterministic seed when `library_mode = Sample`
- Supports 5' tails and mismatch-mediated mutagenesis.
- Supports degenerate/randomized synthetic primers via IUPAC codes.
- Product is constructed from:
- full forward primer sequence
- template interior between forward and reverse anneal windows
- reverse-complement of full reverse primer sequence
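The product construction rule above (full forward primer, template interior, reverse complement of the full reverse primer) is sketched here for the simplest case. Mismatches, IUPAC codes, and multiple binding sites are deliberately left out, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement a plain ACGT sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_tailed_product(template, fwd_primer, fwd_anneal_len, rev_primer, rev_anneal_len):
    """Assemble a product from primers whose 3' suffixes anneal exactly."""
    template = template.upper()
    fwd_primer, rev_primer = fwd_primer.upper(), rev_primer.upper()
    fwd_anneal = fwd_primer[-fwd_anneal_len:]        # 3' suffix that binds the template
    rev_site = revcomp(rev_primer[-rev_anneal_len:])  # reverse anneal window on the top strand
    f = template.find(fwd_anneal)
    if f == -1:
        return None
    r = template.find(rev_site, f + len(fwd_anneal))
    if r == -1:
        return None
    interior = template[f + len(fwd_anneal):r]  # between the two anneal windows
    return fwd_primer + interior + revcomp(rev_primer)
```

Using the full primer sequences on both ends is what carries 5' tails into the product; with `GAATTC` and `CTCGAG` tails, the product gains both sites at its termini.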
PcrMutagenesis semantics:
- Builds on `PcrAdvanced` primer behavior.
- Accepts explicit SNP intents:
  - `zero_based_position`
  - `reference`
  - `alternate`
- Validates reference bases against the template.
- Filters amplicons to those that introduce requested SNPs.
- `require_all_mutations` (default `true`) controls whether all or at least one mutation must be introduced.
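The reference validation and the `require_all_mutations` switch can be sketched as two small helpers. Representing an amplicon as a mapping from template position to product base is a simplification chosen for this example, not the engine's data model.

```python
def validate_snp_references(template, snps):
    """Raise ValueError when a declared reference base disagrees with the template."""
    for snp in snps:
        pos = snp["zero_based_position"]
        if template[pos].upper() != snp["reference"].upper():
            raise ValueError(f"reference mismatch at position {pos}")


def amplicon_introduces(product_base_by_template_pos, snps, require_all_mutations=True):
    """Check whether a product carries all (or at least one) requested alternate base."""
    hits = [
        product_base_by_template_pos.get(snp["zero_based_position"], "").upper()
        == snp["alternate"].upper()
        for snp in snps
    ]
    return all(hits) if require_all_mutations else any(hits)
```

An amplicon carrying only one of two requested SNPs is rejected by default and accepted when `require_all_mutations` is `false`.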
DesignPrimerPairs contract (implemented baseline):
- Purpose:
- propose ranked forward/reverse primer pairs for one linear template under explicit constraints
- provide deterministic, machine-readable reports that can be consumed by GUI/CLI/scripting/agents
- Operation payload:
{
"DesignPrimerPairs": {
"template": "seq_id",
"roi_start_0based": 1000,
"roi_end_0based": 1600,
"forward": {
"min_length": 20,
"max_length": 30,
"location_0based": null,
"start_0based": null,
"end_0based": null,
"min_tm_c": 55.0,
"max_tm_c": 68.0,
"min_gc_fraction": 0.35,
"max_gc_fraction": 0.70,
"max_anneal_hits": 1,
"non_annealing_5prime_tail": null,
"fixed_5prime": null,
"fixed_3prime": null,
"required_motifs": [],
"forbidden_motifs": [],
"locked_positions": []
},
"reverse": {
"min_length": 20,
"max_length": 30,
"location_0based": null,
"start_0based": null,
"end_0based": null,
"min_tm_c": 55.0,
"max_tm_c": 68.0,
"min_gc_fraction": 0.35,
"max_gc_fraction": 0.70,
"max_anneal_hits": 1,
"non_annealing_5prime_tail": null,
"fixed_5prime": null,
"fixed_3prime": null,
"required_motifs": [],
"forbidden_motifs": [],
"locked_positions": []
},
"pair_constraints": {
"require_roi_flanking": false,
"required_amplicon_motifs": [],
"forbidden_amplicon_motifs": [],
"fixed_amplicon_start_0based": null,
"fixed_amplicon_end_0based_exclusive": null
},
"min_amplicon_bp": 120,
"max_amplicon_bp": 1200,
"max_tm_delta_c": 2.0,
"max_pairs": 200,
"report_id": "tp73_roi_primers_v1"
}
}
- `max_tm_delta_c`, `max_pairs`, `report_id`, and `pair_constraints` are optional in current implementation:
  - `max_tm_delta_c` default: `2.0`
  - `max_pairs` default: `200`
  - `report_id` default: auto-generated deterministic-safe id stem
  - `pair_constraints` default: `{"require_roi_flanking":false,"required_amplicon_motifs":[],"forbidden_amplicon_motifs":[],"fixed_amplicon_start_0based":null,"fixed_amplicon_end_0based_exclusive":null}`
- Side constraints (`forward`, `reverse`, and qPCR `probe`) accept optional sequence-level filters:
  - `non_annealing_5prime_tail` (added to the final oligo but excluded from anneal Tm/GC/hit calculations)
  - `fixed_5prime`, `fixed_3prime`
  - `required_motifs[]`, `forbidden_motifs[]`
  - `locked_positions[]` entries (`offset_0based`, single IUPAC `base`)
- Built-in primer-ranking heuristics (internal and Primer3 pair-ranking stage):
  - preferred primer length window: `20..30 bp` (outside window is penalized)
  - 3' GC clamp preference (`G/C` at terminal 3' base)
  - secondary-structure risk penalty (homopolymer and self-complementary runs)
  - primer-dimer risk penalty (global and 3'-anchored complementary runs)
- Report schema:
  - `gentle.primer_design_report.v1`
  - deterministic ordering by score then tie-break fields
  - backend metadata block:
    - `backend.requested` (`auto|internal|primer3`)
    - `backend.used` (`internal|primer3`)
    - optional `backend.fallback_reason`
    - optional `backend.primer3_executable`
    - optional `backend.primer3_version`
  - each pair includes:
    - forward/reverse primer sequence and genomic binding window
    - per-primer diagnostics:
      - `length_bp`
      - `anneal_length_bp`
      - `non_annealing_5prime_tail_bp`
      - `three_prime_base`
      - `three_prime_gc_clamp`
      - `longest_homopolymer_run_bp`
      - `self_complementary_run_bp`
    - estimated `tm_c` and `gc_fraction` for annealing segment only
    - anneal-hit counts per side
    - amplicon start/end/length
    - pair dimer diagnostics:
      - `primer_pair_complementary_run_bp`
      - `primer_pair_3prime_complementary_run_bp`
    - rule-pass flags and aggregate score
  - optional rejection summary buckets (for explainability):
    - out-of-window
    - GC/Tm out of bounds
    - non-unique anneal
    - primer sequence-constraint failure
    - pair constraint failure
    - amplicon-size or ROI-coverage failure
  - mutating artifact materialization per accepted pair:
    - one forward-primer sequence (`..._fwd`)
    - one reverse-primer sequence (`..._rev`)
    - one predicted amplicon sequence (`..._amplicon`) built from: full forward primer + template interior + reverse-primer reverse-complement (including non-annealing 5' tails)
    - one per-pair pool container containing all three artifacts
  - optional insertion-anchored context:
    - `insertion_context` (present when `DesignInsertionPrimerPairs` is used)
    - records requested forward/reverse anchor positions, extension sequences, window constraints, shift budget, and per-pair compensation rows (`forward_anchor_shift_bp`, `reverse_anchor_shift_bp`, compensation segments, and compensated primer/tail strings)
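Several of the per-primer diagnostics listed in the report schema are simple string statistics. The sketch below reuses the documented field names but the computations are assumptions: it ignores Tm and self-complementarity, and it measures homopolymer runs over the full oligo.

```python
from itertools import groupby


def primer_diagnostics(primer, tail_len=0):
    """Compute a subset of per-primer diagnostics for one oligo (5'->3')."""
    primer = primer.upper()
    anneal = primer[tail_len:]  # the non-annealing 5' tail is excluded from GC statistics
    if not anneal:
        raise ValueError("annealing segment must not be empty")
    gc_count = sum(base in "GC" for base in anneal)
    return {
        "length_bp": len(primer),
        "anneal_length_bp": len(anneal),
        "non_annealing_5prime_tail_bp": tail_len,
        "three_prime_base": primer[-1],
        "three_prime_gc_clamp": primer[-1] in "GC",
        "longest_homopolymer_run_bp": max(len(list(run)) for _, run in groupby(primer)),
        "gc_fraction": gc_count / len(anneal),
    }
```

Keeping the tail out of `gc_fraction` follows the rule that `non_annealing_5prime_tail` is excluded from anneal Tm/GC/hit calculations.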
DesignInsertionPrimerPairs contract (implemented MVP):
- Purpose:
- insertion-first wrapper around deterministic pair-primer design when the user already knows insert extensions and requested insertion anchors.
- Operation payload shape:
{
"DesignInsertionPrimerPairs": {
"template": "seq_id",
"insertion": {
"requested_forward_3prime_end_0based_exclusive": 620,
"requested_reverse_3prime_start_0based": 700,
"forward_extension_5prime": "GAATTC",
"reverse_extension_5prime": "CTCGAG",
"forward_window_start_0based": 560,
"forward_window_end_0based_exclusive": 650,
"reverse_window_start_0based": 660,
"reverse_window_end_0based_exclusive": 760,
"max_anchor_shift_bp": 12
},
"forward": {
"min_length": 20,
"max_length": 30
},
"reverse": {
"min_length": 20,
"max_length": 30
},
"pair_constraints": {
"require_roi_flanking": false
},
"min_amplicon_bp": 120,
"max_amplicon_bp": 1200,
"max_tm_delta_c": 2.0,
"max_pairs": 200,
"report_id": "tp73_insert_v1"
}
}- MVP behavior:
- the insertion block is normalized first (IUPAC extension validation + anchor/window bounds checks)
- forward/reverse primer windows are enforced from insertion windows
- forward/reverse non-annealing tails are set from insertion extensions
- primer design backend selection remains identical to DesignPrimerPairs (auto|internal|primer3)
- resulting report is the same primer-report schema with populated insertion_context rows for shift/compensation inspection
- no dedicated GUI form yet; operation is available through op/workflow payloads.
PcrOverlapExtensionMutagenesis contract (implemented baseline):
- Purpose:
- deterministic overlap-extension insertion/deletion/replacement mutagenesis planning + staged product materialization in the main operation graph.
- Operation payload shape:
{
"PcrOverlapExtensionMutagenesis": {
"template": "seq_id",
"edit_start_0based": 620,
"edit_end_0based_exclusive": 640,
"insert_sequence": "GGTACC",
"constraints": {
"overlap_bp": 24,
"outer_forward": {
"min_length": 20,
"max_length": 30
},
"outer_reverse": {
"min_length": 20,
"max_length": 30
},
"inner_forward": {
"min_length": 18,
"max_length": 28
},
"inner_reverse": {
"min_length": 18,
"max_length": 28
}
},
"output_prefix": "tp73_oe_mut"
}
}
- Baseline behavior:
- edit_start_0based..edit_end_0based_exclusive defines the replaced region on the original template.
  - insertion: edit_start == edit_end and insert_sequence non-empty
  - deletion: insert_sequence empty and edit_end > edit_start
  - replacement: edit_end > edit_start and insert_sequence non-empty
- inner primers are chosen upstream/downstream of the edit and receive dynamic 5' overlap tails derived from the mutant sequence so stage-1 products share one explicit overlap segment (minimum overlap_bp).
- outer primers amplify both stage-1 fragments and the stage-2 final mutant amplicon.
- operation materializes graph-visible artifacts:
  - primers: ..._outer_fwd, ..._outer_rev, ..._inner_fwd, ..._inner_rev
  - stage-1 products: ..._stage1_left, ..._stage1_right
  - final stage-2 mutant: ..._mutant
  - three per-stage pool containers (left, right, final)
- operation warnings include deterministic candidate-search limit notices when the combinatorial search budget is exhausted.
- insertion/replacement runs now also emit OpResult.protocol_cartoon_preview for built-in protocol pcr.oe.substitution, including deterministic flank_bp/overlap_bp/insert_bp geometry and bound template overrides (gentle.protocol_cartoon_template_bindings.v1) for adapter rendering.
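The insertion/deletion/replacement rules above reduce to a small classifier over the edit window and `insert_sequence`. The sketch below is illustrative only: `classify_edit` and `apply_edit` are hypothetical names, and rejecting an empty edit with an empty insert is an assumption, since the contract does not say how a no-op request is handled.

```python
def classify_edit(edit_start: int, edit_end_exclusive: int, insert_sequence: str) -> str:
    """Classify an edit window the way the baseline-behavior list describes it."""
    if edit_start < 0 or edit_end_exclusive < edit_start:
        raise ValueError("edit window must satisfy 0 <= start <= end")
    deleted_bp = edit_end_exclusive - edit_start
    if deleted_bp == 0 and insert_sequence:
        return "insertion"
    if deleted_bp > 0 and not insert_sequence:
        return "deletion"
    if deleted_bp > 0 and insert_sequence:
        return "replacement"
    # Assumption: an empty window with an empty insert is treated as an error.
    raise ValueError("edit changes nothing")


def apply_edit(template: str, edit_start: int, edit_end_exclusive: int, insert_sequence: str) -> str:
    """Build the mutant sequence: left flank + insert + right flank."""
    return template[:edit_start] + insert_sequence + template[edit_end_exclusive:]


if __name__ == "__main__":
    # Values from the example payload: a 20 bp window replaced by GGTACC.
    print(classify_edit(620, 640, "GGTACC"))
```

With the example payload (`edit_start_0based=620`, `edit_end_0based_exclusive=640`, `insert_sequence="GGTACC"`) the request is a replacement.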
DesignQpcrAssays contract (implemented baseline):
- Purpose:
- propose ranked qPCR assays with three oligos (forward primer, reverse primer, internal probe) for one linear template.
- Operation payload shape:
- same core fields as DesignPrimerPairs plus:
  - probe (PrimerDesignSideConstraint)
  - max_probe_tm_delta_c (probe Tm distance to mean primer Tm)
  - max_assays (result cap)
- pair_constraints is supported identically to DesignPrimerPairs and applies to forward/reverse pair proposal before probe selection.
- Current baseline behavior:
- forward/reverse pair generation follows the same backend routing as DesignPrimerPairs (auto|internal|primer3 for pair proposal).
- probe selection is deterministic, constrained to amplicon interior, and reuses the same side sequence-constraint fields (fixed_5prime, fixed_3prime, motifs, locked positions).
- probe Tm gating is enforced via max_probe_tm_delta_c.
- Report schema:
- gentle.qpcr_design_report.v1
- includes ranked assays[] with forward/reverse/probe oligos, amplicon window, and rule flags.
- includes qPCR rejection summary with pair-level and probe-level counters.
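The probe Tm gate can be pictured as a single comparison against the mean primer Tm. This is a hedged sketch: the function name is hypothetical, and treating the "distance" as an absolute difference with an inclusive bound is an assumption, since the contract only says "probe Tm distance to mean primer Tm".

```python
def probe_passes_tm_gate(
    forward_tm_c: float,
    reverse_tm_c: float,
    probe_tm_c: float,
    max_probe_tm_delta_c: float,
) -> bool:
    """Return True when the probe Tm lies within the allowed distance of the mean primer Tm."""
    mean_primer_tm_c = (forward_tm_c + reverse_tm_c) / 2.0
    # Assumption: absolute distance, inclusive upper bound.
    return abs(probe_tm_c - mean_primer_tm_c) <= max_probe_tm_delta_c


if __name__ == "__main__":
    # Mean primer Tm is 61.0 C, probe is 7.0 C away.
    print(probe_passes_tm_gate(60.0, 62.0, 68.0, max_probe_tm_delta_c=8.0))
```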
Primer-design shell command family (implemented):
- Shared-shell family:
- primers design REQUEST_JSON_OR_@FILE [--backend auto|internal|primer3] [--primer3-exec PATH]
- primers design-qpcr REQUEST_JSON_OR_@FILE [--backend auto|internal|primer3] [--primer3-exec PATH]
- primers seed-from-feature SEQ_ID FEATURE_ID
- primers seed-from-splicing SEQ_ID FEATURE_ID
- primers list-reports
- primers show-report REPORT_ID
- primers export-report REPORT_ID OUTPUT.json
- primers list-qpcr-reports
- primers show-qpcr-report REPORT_ID
- primers export-qpcr-report REPORT_ID OUTPUT.json
- primers design expects an operation payload whose root variant is {"DesignPrimerPairs": {...}}.
- primers design-qpcr expects an operation payload whose root variant is {"DesignQpcrAssays": {...}}.
- primers seed-from-feature and primers seed-from-splicing are non-mutating helper commands that resolve an ROI and emit seeded operation payloads for both pair-PCR and qPCR design.
- Response schemas:
  - gentle.primer_seed_request.v1
  - gentle.primer_design_report.v1
  - gentle.primer_design_report_list.v1
  - gentle.qpcr_design_report.v1
  - gentle.qpcr_design_report_list.v1
- gentle.primer_seed_request.v1 payload fields:
  - template
  - source (kind=feature|splicing, feature_id, and splicing metadata when available)
  - roi_start_0based
  - roi_end_0based_exclusive
  - operations.design_primer_pairs ({"DesignPrimerPairs": ...})
  - operations.design_qpcr_assays ({"DesignQpcrAssays": ...})
Feature-query shell contract (implemented):
- Shared-shell command:
features query SEQ_ID [--kind KIND] [--kind-not KIND] [--range START..END|--start N --end N] [--overlap|--within|--contains] [--strand any|forward|reverse] [--label TEXT] [--label-regex REGEX] [--qual KEY] [--qual-contains KEY=VALUE] [--qual-regex KEY=REGEX] [--min-len N] [--max-len N] [--limit N] [--offset N] [--sort feature_id|start|end|kind|length] [--desc] [--include-source] [--include-qualifiers]
- Execution semantics:
- non-mutating engine inspection over one sequence’s feature table
- deterministic filter pipeline: kind include/exclude, optional range relation (overlap|within|contains), strand filter, label contains/regex, qualifier filters, and length bounds
- deterministic ordering by requested sort key with stable tie-breaks + pagination (offset/limit)
- Response schema:
- gentle.sequence_feature_query_result.v1
- fields include:
  - seq_id, sequence_length_bp, total_feature_count
  - matched_count, returned_count, offset, limit
  - normalized query
  - rows[] with feature_id, kind, start_0based, end_0based_exclusive, length_bp, strand, label, labels[], and optional qualifier maps when requested (--include-qualifiers)
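The ordering and pagination step of the feature query can be sketched as below. This is an illustration under stated assumptions: `page_features` is a hypothetical name, the tie-break on ascending `feature_id` is an assumption (the contract only promises "stable tie-breaks"), and filtering is assumed to have happened already.

```python
from typing import Optional

# Maps --sort values from the shell contract to row field names in the response schema.
SORT_FIELDS = {
    "feature_id": "feature_id",
    "start": "start_0based",
    "end": "end_0based_exclusive",
    "kind": "kind",
    "length": "length_bp",
}


def page_features(rows: list, sort_key: str = "start", desc: bool = False,
                  offset: int = 0, limit: Optional[int] = None) -> dict:
    """Order already-filtered rows deterministically, then apply offset/limit."""
    field = SORT_FIELDS[sort_key]
    # Assumed tie-break: ascending feature_id. Python's sort is stable, also with
    # reverse=True, so rows with equal primary keys keep the feature_id order.
    ordered = sorted(rows, key=lambda r: r["feature_id"])
    ordered = sorted(ordered, key=lambda r: r[field], reverse=desc)
    end = None if limit is None else offset + limit
    page = ordered[offset:end]
    return {
        "matched_count": len(rows),
        "returned_count": len(page),
        "offset": offset,
        "limit": limit,
        "rows": page,
    }


if __name__ == "__main__":
    demo = [
        {"feature_id": 2, "kind": "exon", "start_0based": 10, "end_0based_exclusive": 40, "length_bp": 30},
        {"feature_id": 1, "kind": "gene", "start_0based": 10, "end_0based_exclusive": 90, "length_bp": 80},
        {"feature_id": 3, "kind": "CDS", "start_0based": 5, "end_0based_exclusive": 20, "length_bp": 15},
    ]
    print([r["feature_id"] for r in page_features(demo)["rows"]])
```

Sorting twice (tie-break key first, primary key second) keeps the result independent of the input row order, which is what makes paged results reproducible across runs.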
Feature BED export contract (implemented):
- Shared-shell command:
features export-bed SEQ_ID OUTPUT.bed [--coordinate-mode auto|local|genomic] [--include-restriction-sites] [--restriction-enzyme NAME] [--kind KIND] [--kind-not KIND] [--range START..END|--start N --end N] [--overlap|--within|--contains] [--strand any|forward|reverse] [--label TEXT] [--label-regex REGEX] [--qual KEY] [--qual-contains KEY=VALUE] [--qual-regex KEY=REGEX] [--min-len N] [--max-len N] [--limit N] [--offset N] [--sort feature_id|start|end|kind|length] [--desc] [--include-source] [--include-qualifiers]
- Raw/shared operation:
{"ExportFeaturesBed":{"query":{"seq_id":"tp53_region","kind_in":["gene","mRNA","exon","CDS"]},"path":"artifacts/tp53_region.features.bed","coordinate_mode":"auto","include_restriction_sites":false,"restriction_enzymes":[]}}
- Execution semantics:
- non-mutating export built on the same feature-query filter contract used by features query
- when --limit/query.limit is omitted, the exporter writes all matching rows instead of defaulting to the paged query window
- coordinate_mode=auto prefers genomic BED coordinates whenever a feature carries chromosome, genomic_start_1based, and genomic_end_1based; otherwise the row falls back to local SEQ_ID coordinates
- include_restriction_sites=true appends deterministic REBASE-derived restriction_site rows, filtered by the same range/strand/label/qualifier options; restriction_enzymes[] narrows those rows to selected enzymes
- TFBS/JASPAR annotations remain ordinary feature rows, so kind_in=["TFBS"] exports the current binding-site table after AnnotateTfbs
- File format:
- BED6 core columns: chrom, chromStart, chromEnd, name, score, strand
- deterministic extra columns: kind, row_id, coordinate_source, qualifiers_json
- Response/report schema:
- gentle.sequence_feature_bed_export.v1
- fields include:
  - seq_id, path, coordinate_mode
  - matched_sequence_feature_count, matched_restriction_site_count, matched_row_count
  - exportable_row_count, exported_row_count
  - local_coordinate_row_count, genomic_coordinate_row_count
  - skipped_missing_genomic_coordinates
  - bed_columns[]
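The `coordinate_mode` fallback described under the execution semantics can be sketched as follows. Several points are assumptions rather than documented behavior: the function name is hypothetical, `genomic_end_1based` is taken to be inclusive (so it maps directly onto the exclusive BED `chromEnd`), and `coordinate_mode=genomic` is assumed to skip rows without genomic coordinates, which is suggested but not stated by the `skipped_missing_genomic_coordinates` counter.

```python
from typing import Optional, Tuple

GENOMIC_KEYS = ("chromosome", "genomic_start_1based", "genomic_end_1based")


def bed_coordinates(feature: dict, seq_id: str,
                    coordinate_mode: str = "auto") -> Optional[Tuple[str, int, int, str]]:
    """Return (chrom, chromStart, chromEnd, coordinate_source), or None when the row is skipped."""
    has_genomic = all(feature.get(key) is not None for key in GENOMIC_KEYS)
    if coordinate_mode == "genomic" and not has_genomic:
        return None  # assumption: counted as skipped_missing_genomic_coordinates
    if coordinate_mode in ("auto", "genomic") and has_genomic:
        # BED is 0-based, half-open: subtract 1 from the 1-based start only.
        return (
            feature["chromosome"],
            feature["genomic_start_1based"] - 1,
            feature["genomic_end_1based"],
            "genomic",
        )
    # Local fallback uses the sequence id as the BED chrom column.
    return (seq_id, feature["start_0based"], feature["end_0based_exclusive"], "local")


if __name__ == "__main__":
    exon = {"start_0based": 100, "end_0based_exclusive": 200,
            "chromosome": "chr17", "genomic_start_1based": 7668402, "genomic_end_1based": 7668501}
    print(bed_coordinates(exon, "tp53_region"))
```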
Dotplot + flexibility operation contract (implemented baseline):
- Dotplot operation:
- ComputeDotplot { seq_id, reference_seq_id?, span_start_0based?, span_end_0based?, reference_span_start_0based?, reference_span_end_0based?, mode, word_size, step_bp, max_mismatches?, tile_bp?, store_as? }
- ComputeDotplotOverlay { owner_seq_id, reference_seq_id, reference_span_start_0based?, reference_span_end_0based?, queries[], word_size, step_bp, max_mismatches?, tile_bp?, store_as? }
- mode: self_forward | self_reverse_complement | pair_forward | pair_reverse_complement
- pair modes require reference_seq_id and use the optional reference_span_start_0based/reference_span_end_0based for the y/reference axis.
- ComputeDotplotOverlay is reference-centered and requires at least one query spec; each query uses pair_forward or pair_reverse_complement against the same reference span
- stores payload schema gentle.dotplot_view.v3
- payload includes:
  - owner_seq_id
  - shared reference span + seed parameters
  - sparse match points (points[])
  - per-query-bin reference-distribution boxplot summary (boxplot_bin_count, boxplot_bins[] with min/q1/median/q3/max + hit_count)
  - query_series[] for multi-query overlays
  - optional reference_annotation with merged reference-side exon intervals
- guardrails:
  - word_size >= 1
  - step_bp >= 1
  - query/reference spans must satisfy 0 <= start < end <= sequence_len
  - pair-evaluation safety limit is enforced for latency protection
  - point count is capped with deterministic truncation warning
- Flexibility operation:
- ComputeFlexibilityTrack { seq_id, span_start_0based?, span_end_0based?, model, bin_bp, smoothing_bp?, store_as? }
- model: at_richness | at_skew
- stores payload schema gentle.flexibility_track.v1
- guardrails:
  - bin_bp >= 1
  - same span validation contract as dotplot
- optional smoothing uses deterministic moving-average bins
- Metadata persistence:
- metadata key: dotplot_analysis
- store schema: gentle.dotplot_analysis_store.v1
- both dotplots and flexibility tracks are persisted under this key
- Shared-shell command family:
- dotplot compute SEQ_ID [--reference-seq REF_SEQ_ID] [--start N] [--end N] [--ref-start N] [--ref-end N] [--mode self_forward|self_reverse_complement|pair_forward|pair_reverse_complement] [--word-size N] [--step N] [--max-mismatches N] [--tile-bp N] [--id DOTPLOT_ID]
- dotplot list [SEQ_ID]
- dotplot show DOTPLOT_ID
- dotplot render-svg SEQ_ID DOTPLOT_ID OUTPUT.svg [--flex-track ID] [--display-threshold N] [--intensity-gain N]
- render-dotplot-svg SEQ_ID DOTPLOT_ID OUTPUT.svg [--flex-track ID] [--display-threshold N] [--intensity-gain N] (alias)
- flex compute SEQ_ID [--start N] [--end N] [--model at_richness|at_skew] [--bin-bp N] [--smoothing-bp N] [--id TRACK_ID]
- flex list [SEQ_ID]
- flex show TRACK_ID
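The dotplot and flexibility guardrails share one span contract (`0 <= start < end <= sequence_len`). A client could pre-validate requests along these lines before sending them; the function names are hypothetical, and defaulting an omitted start/end to the full sequence is an assumption about how optional span fields resolve.

```python
from typing import Optional, Tuple


def resolve_span(sequence_len: int, start: Optional[int] = None,
                 end: Optional[int] = None) -> Tuple[int, int]:
    """Resolve optional span bounds and enforce 0 <= start < end <= sequence_len."""
    # Assumption: omitted bounds default to the whole sequence.
    resolved_start = 0 if start is None else start
    resolved_end = sequence_len if end is None else end
    if not (0 <= resolved_start < resolved_end <= sequence_len):
        raise ValueError(
            f"invalid span {resolved_start}..{resolved_end} for sequence length {sequence_len}"
        )
    return resolved_start, resolved_end


def validate_dotplot_request(sequence_len: int, word_size: int, step_bp: int,
                             span_start: Optional[int] = None,
                             span_end: Optional[int] = None) -> Tuple[int, int]:
    """Check the documented dotplot guardrails and return the resolved query span."""
    if word_size < 1:
        raise ValueError("word_size must be >= 1")
    if step_bp < 1:
        raise ValueError("step_bp must be >= 1")
    return resolve_span(sequence_len, span_start, span_end)


if __name__ == "__main__":
    print(validate_dotplot_request(5000, word_size=10, step_bp=1, span_start=100, span_end=900))
```

The same `resolve_span` helper would cover `ComputeFlexibilityTrack`, with `bin_bp >= 1` checked in place of the word-size and step checks.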
Splicing-reference derivation + pairwise alignment operation contract (implemented baseline):
- Splicing-reference derivation operation:
- DeriveSplicingReferences { seq_id, span_start_0based, span_end_0based, seed_feature_id?, scope?, output_prefix? }
- emits multiple derived sequence outputs from one genomic window:
  - DNA window (..._dna)
  - one mRNA sequence per transcript lane (..._mrna_*, transcript orientation, T->U)
  - exon-consecutive artificial reference sequence (..._exon_reference)
- if seed_feature_id is omitted, engine selects one overlapping mRNA feature deterministically from the requested span
- default scope: target_group_target_strand
- Pairwise alignment operation:
- AlignSequences { query_seq_id, target_seq_id, query_span_start_0based?, query_span_end_0based?, target_span_start_0based?, target_span_end_0based?, mode?, match_score?, mismatch_score?, gap_open?, gap_extend? }
- mode: global | local (default global)
- scoring defaults: match=2, mismatch=-3, gap_open=-5, gap_extend=-1
- returns structured payload sequence_alignment with spans, score, coverage, identity, and CIGAR-like compact operations string
- non-mutating operation (no sequence/container state mutation)
- Shared-shell command family:
- splicing-refs derive SEQ_ID START_0BASED END_0BASED [--seed-feature-id N] [--scope all_overlapping_both_strands|target_group_any_strand|all_overlapping_target_strand|target_group_target_strand] [--output-prefix PREFIX]
- align compute QUERY_SEQ_ID TARGET_SEQ_ID [--query-start N] [--query-end N] [--target-start N] [--target-end N] [--mode global|local] [--match N] [--mismatch N] [--gap-open N] [--gap-extend N]
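The per-transcript mRNA output of `DeriveSplicingReferences` (exons joined in transcript orientation, then `T->U`) can be illustrated with a few lines. This is a sketch under assumptions: the function names, the `(start_0based, end_0based_exclusive)` exon tuples, and the `"forward"`/`"reverse"` strand labels are illustrative, not the engine's data model.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def derive_mrna(genomic: str, exons: List[Tuple[int, int]], strand: str) -> str:
    """Join exon slices in genomic order, orient to the transcript, and convert T to U."""
    spliced = "".join(genomic[start:end] for start, end in sorted(exons))
    if strand == "reverse":
        # Transcript orientation for reverse-strand features.
        spliced = reverse_complement(spliced)
    return spliced.upper().replace("T", "U")


if __name__ == "__main__":
    window = "ATGCCCGTTAAA"
    print(derive_mrna(window, [(0, 3), (6, 9)], "forward"))
    print(derive_mrna(window, [(0, 3), (6, 9)], "reverse"))
```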
RNA-read interpretation contract (Nanopore cDNA phase-1 baseline):
- Operations:
- InterpretRnaReads { seq_id, seed_feature_id, profile, input_path, input_format, scope, origin_mode?, target_gene_ids?, roi_seed_capture_enabled?, seed_filter, align_config, report_id?, report_mode?, checkpoint_path?, checkpoint_every_reads?, resume_from_checkpoint? }
- AlignRnaReadReport { report_id, selection, align_config_override?, selected_record_indices? }
- SummarizeRnaReadGeneSupport { report_id, gene_ids, selected_record_indices?, complete_rule?, path? }
- InspectRnaReadGeneSupport { report_id, gene_ids, selected_record_indices?, complete_rule?, cohort_filter?, path? }
- seed_feature_id may reference an mRNA, transcript, ncRNA, misc_RNA, exon, gene, or CDS feature; transcript-template admission then follows the selected splicing-scope rules around that seed.
- implemented profile: nanopore_cdna_v1
- implemented input format: fasta (.fa/.fasta, optional .fa.gz/.fasta.gz; .sra must be converted externally in phase-1)
- default seed/filter constants:
  - kmer_len=10
  - seed_stride_bp=1
  - min_seed_hit_fraction=0.30 (bootstrap default; future SNR calibration track can override policy)
  - min_weighted_seed_hit_fraction=0.05
  - min_unique_matched_kmers=12
  - min_chain_consistency_fraction=0.40
  - max_median_transcript_gap=4.0
  - min_confirmed_exon_transitions=1
  - min_transition_support_fraction=0.05
- weighted-hit definition:
  - weighted_hit_fraction = sum(1 / occurrence_count(seed_bits)) / tested_kmers
  - occurrence_count is measured inside the active scoped seed index
- seed pass gate:
  - raw_hit_fraction >= min_seed_hit_fraction
  - AND weighted_hit_fraction >= min_weighted_seed_hit_fraction
  - AND unique_matched_kmers >= min(min_unique_matched_kmers, tested_kmers)
  - AND chain_consistency_fraction >= min_chain_consistency_fraction
  - AND median_transcript_gap <= max_median_transcript_gap
  - AND confirmed_transitions >= min_confirmed_exon_transitions
  - AND confirmed_transition_fraction >= min_transition_support_fraction
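The weighted-hit definition and the composite seed pass gate above translate almost directly into code. The sketch below uses the documented default constants; the `SeedFilter` dataclass, the function names, and the metrics dictionary layout are hypothetical, and the sum is assumed to run over matched k-mers only.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class SeedFilter:
    """Documented phase-1 defaults (field names follow the contract)."""
    min_seed_hit_fraction: float = 0.30
    min_weighted_seed_hit_fraction: float = 0.05
    min_unique_matched_kmers: int = 12
    min_chain_consistency_fraction: float = 0.40
    max_median_transcript_gap: float = 4.0
    min_confirmed_exon_transitions: int = 1
    min_transition_support_fraction: float = 0.05


def weighted_hit_fraction(occurrence_counts: Iterable[int], tested_kmers: int) -> float:
    """sum(1 / occurrence_count) over matched k-mers, divided by tested_kmers."""
    if tested_kmers <= 0:
        return 0.0
    return sum(1.0 / count for count in occurrence_counts) / tested_kmers


def passes_seed_gate(metrics: Mapping[str, float], seed_filter: SeedFilter = SeedFilter()) -> bool:
    """Evaluate the composite gate; every condition must hold."""
    f = seed_filter
    return (
        metrics["raw_hit_fraction"] >= f.min_seed_hit_fraction
        and metrics["weighted_hit_fraction"] >= f.min_weighted_seed_hit_fraction
        # Short reads cannot be asked for more unique k-mers than were tested.
        and metrics["unique_matched_kmers"] >= min(f.min_unique_matched_kmers, metrics["tested_kmers"])
        and metrics["chain_consistency_fraction"] >= f.min_chain_consistency_fraction
        and metrics["median_transcript_gap"] <= f.max_median_transcript_gap
        and metrics["confirmed_transitions"] >= f.min_confirmed_exon_transitions
        and metrics["confirmed_transition_fraction"] >= f.min_transition_support_fraction
    )


if __name__ == "__main__":
    print(weighted_hit_fraction([1, 2, 4], tested_kmers=10))
```

Down-weighting by `occurrence_count` means a k-mer found four times in the scoped seed index contributes a quarter as much as a unique one, so repetitive matches alone are less likely to carry a read through the gate.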
- phase-1 seed-span behavior:
  - full-read hashing is always used for every read
  - seed-start density is controlled by seed_stride_bp
  - default density is one start per base (seed_stride_bp=1)
- sparse-origin behavior:
  - origin_mode accepts single_gene|multi_gene_sparse (default single_gene)
  - target_gene_ids[] and roi_seed_capture_enabled are persisted in the report payload for deterministic follow-up runs
  - multi_gene_sparse expands local transcript-template indexing with transcripts matched from target_gene_ids[]
  - roi_seed_capture_enabled=true is currently a deterministic no-op with explicit warning in report warnings[] until the ROI capture layer is implemented
- report compaction and resume behavior:
  - report_mode=full keeps retained top hits exactly as ranked
  - report_mode=seed_passed_only keeps a smaller retained subset for later inspection/alignment:
    - retained hits that passed the composite seed gate
    - retained hits at or above raw min hit
    - counters still remain based on the full stream
  - checkpoint_path + checkpoint_every_reads writes deterministic JSON snapshots (gentle.rna_read_interpret_checkpoint.v1) during streaming
  - resume_from_checkpoint=true resumes from the checkpoint snapshot and fast-forwards already-processed records deterministically
- phase-2 alignment behavior:
  - AlignRnaReadReport loads a persisted report and reprocesses a selected retained subset (all|seed_passed|aligned)
  - phase-2 progress events now emit once per selected retained row (update_every_reads=1) so adapters can show visible row-by-row advance
  - GUI/shared-shell default selection is seed_passed
  - optional selected_record_indices[] (0-based stored record_index) overrides the selection preset and aligns only the explicit subset
  - selection=all remains available when you deliberately want the broader rescued-retained working set to receive round-2 similarity/coverage scores
  - selection=aligned means rerun phase 2 only on retained rows that already have a stored mapping from an earlier phase-2 pass
  - if selection=seed_passed matches no retained hits and no explicit record indices were supplied, the engine falls back to retained rows at or above raw min hit, and if that is still empty, to the highest phase-1 score retained row
  - aligner configuration uses align_config_override when supplied, otherwise the report-stored align_config
  - mapping backend uses bio::alignment::pairwise::banded with align_band_bp as band width (w) and transcript-seed kmer_len as seed length (k), plus deterministic dense fallback when the banded solver yields no mapping
  - phase-2 pairwise alignment evaluates both query orientations for every retained row (stored query plus reverse complement) and keeps the best-scoring deterministic candidate, preferring semiglobal over local and non-reversed over reversed only as later tie-breakers
  - selected retained rows are pairwise aligned regardless of whether their recomputed composite seed-pass flag remains true; the seed-pass result is still recomputed and stored independently for later inspection
  - updated report fields include:
    - per-hit mapping fields (best_mapping, secondary_mappings)
    - per-hit msa_eligible and msa_eligibility_reason
    - aggregate read_count_aligned and retained_count_msa_eligible
    - refreshed seed/path diagnostics (transition_support_rows, isoform_support_rows)
    - refreshed mapped support rows (exon_support_frequencies, junction_support_frequencies, mapped_isoform_support_rows)
    - mapped exon/junction support is derived from aligned transcript-template offsets first and falls back to coarse genomic-span overlap only for legacy mappings that do not carry template offsets
    - deterministic retained-hit re-ranking by alignment-aware retention rank
- alignment inspection behavior:
  - rna-reads inspect-alignments accepts coarse selection plus a structured subset contract:
    - effect_filter = all_aligned|confirmed_only|disagreement_only|reassigned_only|no_phase1_only|selected_only
    - sort_key = rank|identity|coverage|score
    - search = free-text match over read ids, transcript ids/labels, effect labels, and #record_index labels
    - selected_record_indices[] provides the explicit subset for selected_only
    - score_density_variant = all_scored|composite_seed_gate|retained_replay_current_controls
    - optional score_density_seed_filter_override carries the current seed-gate controls when an adapter requests retained-only replay under current controls
    - score_bin_index + score_bin_count provide a formal score-density-bin subset for reproducible histogram-driven inspection within that chosen histogram population
  - inspection payload now includes:
    - aligned_count: aligned rows admitted by coarse selection
    - subset_match_count: aligned rows matching the structured subset before limit
    - row_count: returned rows after limit
    - subset_spec: normalized structured subset object echoed back in the response for deterministic replay
  - row rank remains the original alignment-aware retention rank even when subset sorting reorders the returned rows
- on-demand pairwise-alignment detail behavior:
  - the engine can reconstruct the exact phase-2 read-vs-transcript-template alignment for one retained row from the saved report plus admitted transcript-template set
  - detail payload schema:
    - gentle.rna_read_alignment_detail.v1
  - payload includes:
    - selected retained row id (record_index, header_id)
    - transcript/template target identity
    - phase-2 alignment_mode
    - alignment backend (banded or dense_fallback)
    - aligned query/template spans, full template length, score, identity, query coverage, transcript-template coverage, and CIGAR
    - aligned query / relation / target text rows for manual inspection of low-complexity or partial confirmations
- exact-subset export behavior:
  - ExportRnaReadHitsFasta, ExportRnaReadExonPathsTsv, ExportRnaReadExonAbundanceTsv, and ExportRnaReadAlignmentsTsv accept optional selected_record_indices[]
  - when present, the explicit 0-based stored record_index subset overrides the coarse selection preset
  - these exports also accept optional subset_spec, a human-readable formal description such as filter=... | sort=... | search=...; when provided, the exported artifact records both the explicit record_index subset and the subset definition that produced it
  - intended for exporting the exact contributor reads surfaced by mapped Audit actions in the GUI
- target-gene cohort summary behavior:
  - SummarizeRnaReadGeneSupport is non-mutating and consumes one persisted aligned RNA-read report
  - required gene_ids[] are normalized/deduplicated and matched case-insensitively against the same splicing group-label logic already used for transcript grouping
  - output schema:
    - gentle.rna_read_gene_support_summary.v1
  - base cohort:
    - retained rows with best_mapping present
    - optionally intersected with explicit selected_record_indices[]
  - accepted target cohort:
    - base-cohort rows whose best_mapping.transcript_feature_id resolves to one of the requested matched genes/groups
  - complete/fragment split:
    - complete_rule = near|strict|exact controls which accepted rows land in the complete cohort
    - fragment is the remaining accepted-target cohort
    - summary still reports nested complete_strict_count and complete_exact_count regardless of the chosen complete_rule
  - support attribution is derived from phase-2 mapped support, not phase-1 exon_path
  - per-cohort output blocks:
    - all_target
    - fragments
    - complete
  - each block includes:
    - read_count
    - exon_support[]
    - exon_pair_support[]
    - direct_transition_support[]
  - row semantics:
    - exon_support[]: each exon counted at most once per read
    - exon_pair_support[]: every ordered exon_i -> exon_j pair observed in the mapped exon order once per read, including skipped pairs like 1->3
    - direct_transition_support[]: neighboring exon steps only, so 1->2 is counted but skipped pairs like 1->3 are not
    - all support fractions are normalized by the enclosing cohort size
  - exon and pair rows carry deterministic gene-level exon ordinals plus genomic coordinates for auditability
  - when path / shell --output is provided, the exact same JSON payload returned to the caller is also written to disk
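The three row types in the summary blocks (exon, exon-pair, and direct-transition support) differ only in which pairs of a read's mapped exon order they count. The sketch below shows one reading of those row semantics. It is not the engine's code: `cohort_support` is a hypothetical name, and "neighboring exon steps" is interpreted as consecutive steps in the mapped order whose gene-level ordinals are adjacent (`j == i + 1`), which is an assumption chosen to match the `1->2` versus `1->3` examples.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def cohort_support(mapped_exon_orders: List[List[int]]) -> Dict[str, dict]:
    """Compute per-cohort support fractions from each read's mapped exon ordinals."""
    read_count = len(mapped_exon_orders)
    exon_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    direct_counts: Counter = Counter()
    for order in mapped_exon_orders:
        # Each exon counts at most once per read.
        exon_counts.update(set(order))
        # Every ordered pair (earlier exon, later exon), once per read, skips included.
        pair_counts.update(set(combinations(order, 2)))
        # Assumed meaning of "neighboring": consecutive in the read and ordinal-adjacent.
        direct_counts.update({(a, b) for a, b in zip(order, order[1:]) if b == a + 1})

    def fractions(counts: Counter) -> dict:
        # All fractions are normalized by the enclosing cohort size.
        return {key: count / read_count for key, count in counts.items()} if read_count else {}

    return {
        "read_count": read_count,
        "exon_support": fractions(exon_counts),
        "exon_pair_support": fractions(pair_counts),
        "direct_transition_support": fractions(direct_counts),
    }


if __name__ == "__main__":
    # One read covering exons 1-2-3 and one read skipping exon 2.
    print(cohort_support([[1, 2, 3], [1, 3]]))
```

For the two demo reads, the pair `1->3` is supported by both reads while the direct transition `1->2` is supported by only one, which is the distinction the row semantics draw.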
- target-gene cohort audit behavior:
  - InspectRnaReadGeneSupport is non-mutating and shares the same requested-gene matching, selected-record restriction, accepted-target logic, and complete_rule classification used by SummarizeRnaReadGeneSupport
  - output schema:
    - gentle.rna_read_gene_support_audit.v1
  - evaluation universe:
    - all selected saved-report rows after selected_record_indices[] filtering, including unaligned retained rows
  - grouped top-level subset handles:
    - accepted_target_record_indices[]
    - fragment_record_indices[]
    - complete_record_indices[]
    - complete_strict_record_indices[]
    - complete_exact_record_indices[]
  - row status values:
    - unaligned
    - aligned_other_gene
    - accepted_fragment
    - accepted_complete
  - row payload includes:
    - record_index, header_id
    - resolved gene_id when available
    - aligned transcript identity (transcript_feature_id, transcript_id, transcript_label)
    - machine-readable status_reason
    - full_length_exact, full_length_near, full_length_strict, and derived full_length_class
    - mapped_exon_ordinals[]
    - ordered exon_pairs[]
    - ordered direct_transition_pairs[]
    - phase-2 score, identity_fraction, query_coverage_fraction
    - passed_seed_filter as provenance only
  - cohort_filter = all|accepted|fragment|complete|rejected limits the returned rows[] set without changing the grouped top-level subset arrays
  - when path / shell --output is provided, the exact same JSON payload returned to the caller is also written to disk
- Report persistence:
- report schema: gentle.rna_read_report.v1
- metadata store schema: gentle.rna_read_reports.v1
- metadata key: rna_read_reports
- rna-reads list-reports summary rows include sparse-origin request provenance:
  - origin_mode
  - target_gene_count
  - roi_seed_capture_enabled
- report payload now includes per-report:
  - exon_support_frequencies[]
  - junction_support_frequencies[]
  - score_density_bins[] (all_scored phase-1 histogram)
  - seed_pass_score_density_bins[] (composite_seed_gate histogram)
  - exact read-length histograms (length_bp -> count) for deterministic subset auditing:
    - read_length_counts_all
    - read_length_counts_seed_passed
    - read_length_counts_aligned
    - read_length_counts_full_length_exact
    - read_length_counts_full_length_near
    - read_length_counts_full_length_strict
    - checkpoint snapshots mirror these vectors so resume/restart keeps histogram accumulation deterministic
  - storage/streaming controls:
    - report_mode (full or seed_passed_only)
    - checkpoint_path / checkpoint_every_reads
    - resumed_from_checkpoint
  - request provenance fields:
    - origin_mode
    - target_gene_ids[]
    - roi_seed_capture_enabled
  - origin_class_counts (running/final deterministic class tallies)
- per-hit payload now includes:
  - origin_class
  - origin_reason
  - origin_confidence
  - strand_confidence
  - origin_candidates[] (selected/plus/minus/seed-chain candidate hints)
  - best_mapping.alignment_mode (semiglobal preferred, with deterministic local fallback when quality is better)
  - best_mapping.query_reverse_complemented (whether phase-2 had to reverse-complement the stored read to fit the chosen transcript-template mapping)
- alignment inspection payload schema:
  - gentle.rna_read_alignment_inspection.v1
  - produced by non-mutating shared-shell inspection command rna-reads inspect-alignments
  - each row now carries:
    - phase-1 transcript-assignment fields (phase1_primary_transcript_id, seed_chain_transcript_id, exon_path_transcript_id, exon_path, exon_transitions_confirmed/total, selected_strand, reverse_complement_applied)
    - phase-2 best-mapping fields (transcript_id, transcript_label, strand, alignment_mode, target_start_1based, target_end_1based, target_length_bp, identity_fraction, query_coverage_fraction, target_coverage_fraction, score, secondary_mapping_count)
    - full-length classification flags (derived deterministically from transcript-template coverage and current alignment threshold):
      - full_length_exact (100% template coverage)
      - full_length_near (>=95% template coverage)
      - full_length_strict (near + both template ends within 15 bp + identity above active alignment threshold)
    - deterministic comparison field alignment_effect (confirmed_assignment, reassigned_transcript, aligned_without_phase1_assignment)
    - mapped-support attribution arrays for the best mapping (mapped_exon_support[], mapped_junction_support[])
  - top-level payload now also carries:
    - aligned_count
    - subset_match_count
    - row_count
    - limit
    - normalized subset_spec (effect_filter, sort_key, search, selected_record_indices[], score_density_variant, score_bin_index, score_bin_count)
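The full-length classification flags carried by each inspection row (exact = 100% template coverage, near = at least 95%, strict = near plus both template ends within 15 bp plus identity above the active threshold) can be sketched as a small function. The function name and the 0-based half-open template span are illustrative assumptions (the payload itself reports `target_start_1based`/`target_end_1based`), and "identity above threshold" is read here as `>=`.

```python
from typing import Dict


def classify_full_length(
    template_start_0based: int,
    template_end_0based_exclusive: int,
    template_length_bp: int,
    identity_fraction: float,
    min_identity_fraction: float,
    end_tolerance_bp: int = 15,
    near_coverage_fraction: float = 0.95,
) -> Dict[str, bool]:
    """Derive the three full-length flags from template coverage and identity."""
    aligned_bp = template_end_0based_exclusive - template_start_0based
    coverage = aligned_bp / template_length_bp
    exact = aligned_bp == template_length_bp
    near = coverage >= near_coverage_fraction
    ends_ok = (
        template_start_0based <= end_tolerance_bp
        and template_length_bp - template_end_0based_exclusive <= end_tolerance_bp
    )
    # Assumption: the identity comparison is inclusive.
    strict = near and ends_ok and identity_fraction >= min_identity_fraction
    return {
        "full_length_exact": exact,
        "full_length_near": near,
        "full_length_strict": strict,
    }


if __name__ == "__main__":
    # 98% coverage but 20 bp missing at the 5' end: near, not strict.
    print(classify_full_length(20, 1000, 1000, identity_fraction=0.9, min_identity_fraction=0.6))
```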
- Sample-sheet export:
- operation: ExportRnaReadSampleSheet { path, seq_id?, report_ids?, gene_ids?, complete_rule?, append? }
- export schema: gentle.rna_read_sample_sheet_export.v1
- output: TSV with run/read metrics, sparse-origin request provenance (report_mode, origin_mode, target_gene_count, target_gene_ids_json, roi_seed_capture_enabled), JSON-serialized exon/junction frequency columns, and origin_class_counts_json for cohort-level downstream analysis.
- additional per-report columns include mean_read_length_bp; when one or more gene_ids[] are requested the same row also carries accepted-target counts/fractions, fragment vs complete counts, gene_support_mean_assigned_read_length_bp, and JSON-serialized exon / exon-pair / direct-transition support tables for the requested gene cohort.
- Shared-shell command family:
- rna-reads interpret SEQ_ID FEATURE_ID INPUT.fa[.gz] [--report-id ID] [--report-mode full|seed_passed_only] [--checkpoint-path PATH] [--checkpoint-every-reads N] [--resume-from-checkpoint|--no-resume-from-checkpoint] [--profile nanopore_cdna_v1] [--format fasta] [--scope all_overlapping_both_strands|target_group_any_strand|all_overlapping_target_strand|target_group_target_strand] [--origin-mode single_gene|multi_gene_sparse] [--target-gene GENE_ID]... [--roi-seed-capture|--no-roi-seed-capture] [--kmer-len N] [--seed-stride-bp N] [--min-seed-hit-fraction F] [--min-weighted-seed-hit-fraction F] [--min-unique-matched-kmers N] [--min-chain-consistency-fraction F] [--max-median-transcript-gap F] [--min-confirmed-transitions N] [--min-transition-support-fraction F] [--cdna-poly-t-flip|--no-cdna-poly-t-flip] [--poly-t-prefix-min-bp N] [--align-band-bp N] [--align-min-identity F] [--max-secondary-mappings N]
- rna-reads align-report REPORT_ID [--selection all|seed_passed|aligned] [--record-indices i,j,k] [--align-band-bp N] [--align-min-identity F] [--max-secondary-mappings N]
- rna-reads list-reports [SEQ_ID]
- rna-reads show-report REPORT_ID
- rna-reads summarize-gene-support REPORT_ID --gene GENE_ID [--gene GENE_ID ...] [--record-indices i,j,k] [--complete-rule near|strict|exact] [--output PATH]
- rna-reads inspect-gene-support REPORT_ID --gene GENE_ID [--gene GENE_ID ...] [--record-indices i,j,k] [--complete-rule near|strict|exact] [--cohort all|accepted|fragment|complete|rejected] [--output PATH]
- rna-reads inspect-alignments REPORT_ID [--selection all|seed_passed|aligned] [--limit N] [--effect-filter all_aligned|confirmed_only|disagreement_only|reassigned_only|no_phase1_only|selected_only] [--sort rank|identity|coverage|score] [--search TEXT] [--record-indices i,j,k] [--score-bin-variant all_scored|composite_seed_gate] [--score-bin-index N] [--score-bin-count M]
- rna-reads export-report REPORT_ID OUTPUT.json
- rna-reads export-hits-fasta REPORT_ID OUTPUT.fa [--selection all|seed_passed|aligned] [--record-indices i,j,k] [--subset-spec TEXT]
- rna-reads export-sample-sheet OUTPUT.tsv [--seq-id ID] [--report-id ID]... [--gene GENE_ID]... [--complete-rule near|strict|exact] [--append]
- rna-reads export-paths-tsv REPORT_ID OUTPUT.tsv [--selection all|seed_passed|aligned] [--record-indices i,j,k] [--subset-spec TEXT]
- rna-reads export-abundance-tsv REPORT_ID OUTPUT.tsv [--selection all|seed_passed|aligned] [--record-indices i,j,k] [--subset-spec TEXT]
- rna-reads export-score-density-svg REPORT_ID OUTPUT.svg [--scale linear|log] [--variant all_scored|composite_seed_gate]
- rna-reads export-alignments-tsv REPORT_ID OUTPUT.tsv [--selection all|seed_passed|aligned] [--limit N] [--record-indices i,j,k] [--subset-spec TEXT]
- rna-reads export-alignment-dotplot-svg REPORT_ID OUTPUT.svg [--selection all|seed_passed|aligned] [--max-points N]
- shell output convenience fields:
  - rna-reads list-reports includes summary_rows[] with concise human-readable provenance lines (mode, origin, target count, ROI-capture flag, read counters)
  - rna-reads show-report includes summary with the same provenance framing for one report
  - rna-reads summarize-gene-support returns the full gentle.rna_read_gene_support_summary.v1 payload directly, including requested_gene_ids, matched_gene_ids, missing_gene_ids, selected-record echo fields, and per-cohort support tables
  - rna-reads inspect-gene-support returns the full gentle.rna_read_gene_support_audit.v1 payload directly, including grouped cohort record-index arrays plus row-level status, status_reason, full-length fields, and mapped exon/junction audit data
  - rna-reads inspect-alignments returns aligned rows ranked by alignment-aware retention score (mapping + seed metrics), plus a structured subset_spec payload (effect_filter, sort_key, search, selected_record_indices, score_density_variant, score_bin_index, score_bin_count) and subset_match_count
- Alignment-TSV export:
  - operation: `ExportRnaReadAlignmentsTsv { report_id, path, selection, limit?, selected_record_indices?, subset_spec? }`
  - export schema: `gentle.rna_read_alignment_tsv_export.v1`
  - output: ranked alignment rows as TSV with:
    - leading `#` metadata lines for report provenance (`selection`, `limit`, `selected_record_indices`, `subset_spec`, `profile`, `scope`, `origin_mode`)
    - seed-screen sampling/gating context (`k`, `seed_stride_bp`, overlap/order-density wording, seed thresholds)
    - alignment config summary (`min_identity_fraction`, `max_secondary_mappings`)
    - phase-1 transcript/path diagnostics
    - phase-2 mapping metrics
    - `alignment_effect`
    - compact mapped exon/junction attribution columns
    - optional top-`N` truncation via `limit`
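A consumer of the alignment TSV has to separate the leading `#` metadata lines from the tabular rows before parsing. The following is a minimal sketch, assuming the first non-`#` line is a column header row. The sample text, the `key=value` metadata wording, and the row values are hypothetical and only illustrate the split.

```python
import csv
import io


def split_alignment_tsv(text):
    """Separate leading '#' metadata lines from the tab-separated table."""
    metadata, body = [], []
    in_header = True
    for line in text.splitlines():
        if in_header and line.startswith("#"):
            # Keep the metadata text without the leading '#'.
            metadata.append(line[1:].strip())
        else:
            in_header = False
            if line.strip():
                body.append(line)
    # Assumption: the first remaining line names the columns.
    rows = list(csv.DictReader(io.StringIO("\n".join(body)), delimiter="\t"))
    return metadata, rows


# Hypothetical sample; real metadata wording and columns may differ.
sample = (
    "# selection=aligned\n"
    "# limit=2\n"
    "record_index\talignment_effect\n"
    "7\tconfirmed\n"
    "12\treassigned\n"
)
metadata, rows = split_alignment_tsv(sample)
print(metadata)
print([row["record_index"] for row in rows])
```

Only `#` lines at the top of the file are treated as metadata, so a `#` inside a later data row is left untouched.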
- Score-density SVG export:
  - `rna-reads export-score-density-svg` writes the same report summary used by the GUI plus seed-screen provenance in the SVG header:
    - `variant = all_scored|composite_seed_gate|retained_replay_current_controls`
    - `profile`, `report_mode`, `scope`, `origin_mode`
    - seed-filter summary with `k`, `seed_stride_bp`, thresholds, and overlap/order-density wording
    - optional `replay_seed_filter` summary when the export uses retained-only replay under current controls
    - whether bins were stored in the report or derived from retained hits
- Alignment-dotplot export:
  - operation: `ExportRnaReadAlignmentDotplotSvg { report_id, path, selection, max_points }`
  - export schema: `gentle.rna_read_alignment_dotplot_svg_export.v1`
  - output: SVG scatter of query coverage vs identity for aligned hits with score-colored points and report-config threshold guide.
- Read-sequence materialization:
  - operation: `MaterializeRnaReadHitSequences { report_id, selection, selected_record_indices?, output_prefix? }`
  - output:
    - creates one ordinary project sequence per selected retained RNA-read hit
    - exact `selected_record_indices` takes precedence over coarse `selection`
    - intended for downstream dotplots/manual inspection of saved-report outliers without re-reading the FASTA input
- `rna-reads export-hits-fasta` header extensions:
  - optional `selected_record_indices[]` overrides the coarse selection preset for exact saved-report subset export
  - optional `subset_spec` records the formal subset definition that produced that explicit `record_index` subset
  - `exon_path_tx=<transcript_id|none>`
  - `exon_path=<ordinal_path|none>` using `:` for hash-confirmed adjacent exon transitions and `-` for unconfirmed adjacency
  - `exon_transitions=<confirmed>/<total>`
  - `rc_applied=<true|false>` (automatic cDNA poly-T reverse-complement normalization marker)
  - `origin_class=<...>` plus `origin_conf=<...>` and `strand_conf=<...>`
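The `exon_path` separator convention above can be decoded with a few lines of code. This is a minimal sketch, assuming exon ordinals are plain integers and that the confirmed/total counts correspond to the number of `:` separators over all separators. Neither assumption is stated by the header contract itself, and `parse_exon_path` is a hypothetical helper name.

```python
import re


def parse_exon_path(value):
    """Return (ordinals, confirmed, total) for an exon_path header value."""
    if value == "none":
        return [], 0, 0
    # Splitting with a capture group keeps the separators in the token list.
    tokens = re.split(r"([:-])", value)
    ordinals = [int(tok) for tok in tokens[0::2]]  # assumed integer ordinals
    separators = tokens[1::2]
    confirmed = sum(1 for sep in separators if sep == ":")
    return ordinals, confirmed, len(separators)


ordinals, confirmed, total = parse_exon_path("1:2-3:4")
print(ordinals, f"{confirmed}/{total}")
```

For the hypothetical path `1:2-3:4`, the transitions 1 to 2 and 3 to 4 count as confirmed and 2 to 3 as unconfirmed.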
- `rna-reads export-exon-paths-tsv` and `rna-reads export-exon-abundance-tsv` now begin with the same `#` report/seed-screen provenance block used by the alignment TSV export, minus alignment-only fields; optional `subset_spec` records the formal subset definition alongside `selected_record_indices`
- cDNA/direct-RNA normalization controls in `seed_filter`:
  - `cdna_poly_t_flip_enabled` (default `true`)
  - `poly_t_prefix_min_bp` (default `18`): minimum T support used by the tolerant 5' poly-T-head detector (minor interruptions in the head are accepted)
- Scope/strand semantics for `InterpretRnaReads`:
  - `all_overlapping_both_strands`: all overlapping transcripts on both strands
  - `target_group_any_strand`: target-group transcripts only, both strands
  - `all_overlapping_target_strand`: all overlapping transcripts on target strand only
  - `target_group_target_strand`: target-group transcripts on target strand only
  - scoring note: both-strand modes score against the union of admitted strand-specific templates; target-strand modes exclude opposite-strand templates.
  - seed-index note: indexed seeds include annotated exon-body and exon-exon transition k-mers for admitted transcripts.
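The four scope values combine two independent choices: which transcripts are considered and which strands are admitted. The sketch below encodes that as a lookup table. `ScopeRule`, `SCOPE_RULES`, and `admits` are hypothetical names, and the candidate transcript is assumed to already overlap the region of interest.

```python
from typing import NamedTuple


class ScopeRule(NamedTuple):
    transcripts: str    # "all_overlapping" or "target_group"
    both_strands: bool  # False means target strand only


SCOPE_RULES = {
    "all_overlapping_both_strands": ScopeRule("all_overlapping", True),
    "target_group_any_strand": ScopeRule("target_group", True),
    "all_overlapping_target_strand": ScopeRule("all_overlapping", False),
    "target_group_target_strand": ScopeRule("target_group", False),
}


def admits(scope, in_target_group, on_target_strand):
    """Decide whether an overlapping transcript is admitted under a scope."""
    rule = SCOPE_RULES[scope]
    if rule.transcripts == "target_group" and not in_target_group:
        return False
    return rule.both_strands or on_target_strand


print(admits("target_group_any_strand", True, False))
print(admits("all_overlapping_target_strand", False, False))
```

An unknown scope string raises `KeyError`, which keeps typos from silently widening the admitted set.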
Async BLAST shell contract (agent/MCP-ready baseline):
- Shared-shell families (both `genomes` and `helpers` scopes):
  - `blast-start GENOME_ID QUERY_SEQUENCE ...`
  - `blast-status JOB_ID [--with-report]`
  - `blast-cancel JOB_ID`
  - `blast-list`
- Deterministic job payload schemas:
  - `gentle.blast_async_start.v1`
  - `gentle.blast_async_status.v1`
  - `gentle.blast_async_cancel.v1`
  - `gentle.blast_async_list.v1`
- External-binary preflight payload:
  - `blast-start` responses now include `binary_preflight` with schema `gentle.blast_external_binary_preflight.v1`.
  - payload includes deterministic `blastn` and `makeblastdb` probe rows with: `found`, `version`, `executable`, and resolved `path` diagnostics.
  - equivalent preflight payload is also emitted by synchronous shared-shell routes `prepare`, `blast`, and `blast-track`.
- Job status contract:
  - `job_id` stable per process
  - non-terminal states: `queued | running`
  - terminal states: `completed | failed | cancelled`
  - scheduler metadata:
    - `max_concurrent_jobs`
    - `running_jobs`
    - `queued_jobs`
    - `queue_position` (present while state is `queued`)
  - optional final `report` on `blast-status --with-report`
- Durability/restart semantics:
  - BLAST async status snapshots are persisted in project metadata as `blast_async_jobs` (`gentle.blast_async_job_store.v1`).
  - On restart/reload, recovered jobs that were previously non-terminal but no longer have an active worker context are normalized deterministically:
    - `cancel_requested=true` -> `cancelled`
    - otherwise -> `failed` with explicit restart/reload interruption reason.
  - `blast-start`, `blast-status`, `blast-cancel`, and `blast-list` may mark shell state as changed when they persist updated async job snapshots.
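The restart rule above is a small pure function over a persisted job snapshot. The following sketch illustrates it under stated assumptions: the snapshot is a dict, the `state` and `reason` keys are hypothetical field names, and only `cancel_requested` and the state values are taken from the contract.

```python
NON_TERMINAL = {"queued", "running"}
TERMINAL = {"completed", "failed", "cancelled"}


def normalize_recovered_job(job, has_active_worker):
    """Apply the restart/reload rule to one persisted job snapshot."""
    if job["state"] in TERMINAL or has_active_worker:
        return job  # nothing to normalize
    recovered = dict(job)  # copy so the stored snapshot is not mutated
    if job.get("cancel_requested"):
        recovered["state"] = "cancelled"
    else:
        recovered["state"] = "failed"
        # Hypothetical wording; the contract only requires an explicit reason.
        recovered["reason"] = "interrupted by restart/reload"
    return recovered


print(normalize_recovered_job({"state": "running"}, has_active_worker=False))
print(normalize_recovered_job({"state": "queued", "cancel_requested": True}, False))
```

Because the outcome depends only on the snapshot and the worker flag, repeated reloads produce the same terminal state.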
- Scheduler policy:
  - async BLAST jobs are executed by a bounded FIFO scheduler (queue + worker slots)
  - default concurrency uses host CPU parallelism
  - optional override via environment variable `GENTLE_BLAST_ASYNC_MAX_CONCURRENT` (clamped to `1..256`)
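The concurrency policy can be mirrored by clients that want to predict the number of worker slots. This is a sketch only: `resolve_max_concurrent` is a hypothetical helper, and falling back to the default on an unparseable value is an assumption, since the policy above only specifies the clamp range.

```python
import os


def resolve_max_concurrent(environ=None):
    """Resolve the worker-slot count from the environment override."""
    environ = os.environ if environ is None else environ
    default = os.cpu_count() or 1  # host CPU parallelism
    raw = environ.get("GENTLE_BLAST_ASYNC_MAX_CONCURRENT")
    if raw is None:
        return default
    try:
        value = int(raw.strip())
    except ValueError:
        return default  # assumption: invalid overrides are ignored
    return max(1, min(256, value))  # clamp to 1..256


print(resolve_max_concurrent({"GENTLE_BLAST_ASYNC_MAX_CONCURRENT": "1000"}))
print(resolve_max_concurrent({"GENTLE_BLAST_ASYNC_MAX_CONCURRENT": "0"}))
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests.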
- `gentle_mcp` exposes equivalent tool routes:
  - `blast_async_start`
  - `blast_async_status`
  - `blast_async_cancel`
  - `blast_async_list`
{
"run_id": "string",
"ops": ["Operation", "Operation", "..."]
}

Notes:
- Splicing Expert `Nanopore cDNA interpretation` uses this same workflow shape when you click `Copy Workflow JSON`. `Prepare Workflow Op` in the same panel writes `run_id`/`ops` into the GUI workflow runner so the exact `InterpretRnaReads` payload can be rerun through the generic workflow path.
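A client can assemble the workflow shape programmatically before handing it to any adapter. The sketch below checks only the two documented fields. `build_workflow` is a hypothetical helper, and the operation body is a labeled placeholder because the concrete operation encoding is not shown here.

```python
import json


def build_workflow(run_id, ops):
    """Build a workflow payload with the documented run_id/ops shape."""
    if not isinstance(run_id, str):
        raise ValueError("run_id must be a string")
    if not isinstance(ops, list):
        raise ValueError("ops must be a list of operations")
    return {"run_id": run_id, "ops": list(ops)}


# Placeholder operation; the real operation encoding is not specified here.
workflow = build_workflow("demo-run", [{"PlaceholderOp": {}}])
print(json.dumps(workflow, indent=2))
```

Serializing with `json.dumps` yields text that can be saved to a file or pasted wherever a workflow JSON payload is accepted.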
{
"op_id": "op-1",
"created_seq_ids": ["..."],
"changed_seq_ids": ["..."],
"warnings": ["..."],
"messages": ["..."],
"genome_annotation_projection": null,
"sequence_alignment": null
}

{
"code": "InvalidInput|NotFound|Unsupported|Io|Internal",
"message": "human-readable explanation"
}

`gentle_cli` persists engine state in JSON (`.gentle_state.json` by default).
This supports:
- resumable multi-step workflows
- external inspection
- reproducibility and audit trails
- Query `capabilities`
- Import or initialize state
- Apply one operation at a time, checking warnings/errors
- Save/export artifacts
- Optionally export final state for handoff
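The "apply one operation at a time, checking warnings/errors" step can be sketched as a small loop over decoded responses. Assumptions: a response is a dict already parsed from JSON, an error is recognized by the presence of `code` and `message`, and `apply_op` is a caller-supplied callable standing in for whichever adapter is used. How a given adapter actually signals errors is not specified here.

```python
ERROR_CODES = {"InvalidInput", "NotFound", "Unsupported", "Io", "Internal"}


class OperationFailed(Exception):
    """Raised when a response carries the structured error shape."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def check_response(response):
    """Return (op_id, warnings) for a result, or raise OperationFailed."""
    if "code" in response and "message" in response:
        raise OperationFailed(response["code"], response["message"])
    return response["op_id"], list(response.get("warnings", []))


def run_steps(apply_op, ops):
    """Apply operations in order and stop at the first structured error."""
    log = []
    for op in ops:
        log.append(check_response(apply_op(op)))
    return log


# Fake adapter for illustration only.
def fake_apply(op):
    return {"op_id": f"op-{op}", "warnings": []}


print(run_steps(fake_apply, [1, 2]))
```

Stopping at the first error leaves the returned log as an exact record of what was applied, which matches the reproducibility goal of the operation log.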
- richer sequence-editing and annotation operation set
- ligation protocol presets with sticky/blunt compatibility derivation
- render/view model endpoint for frontend-independent graphical representation
- schema publication for strict client-side validation
- CRISPR guide-design next phase:
  - off-target search/ranking contracts
  - on-target efficacy model integration hooks
  - guide-design macro/template expansion into deterministic `Workflow` JSON
  - see draft: `docs/rna_guides_spec.md`