Engine D: Gate IRA rollover audit to G/H tax codes with updated notebooks, visuals, and docs #92

Merged
manuel-reyes-ml merged 6 commits into main from fix/engine-d-rollover-taxcode-filter
Jan 16, 2026

@manuel-reyes-ml
Owner

✅ PR Summary

🎯 Objective

What problem does this PR solve?
Engine D was analyzing IRA check distributions regardless of tax code, which incorrectly sent non‑G/H rows into match_needs_review. This change restricts Engine D to rows that carry a rollover tax code (G or H) in tax_code_1 or tax_code_2.

Expected output / deliverable
Engine D only evaluates G/H rollover rows; notebooks, visualizations, and docs align with the filtered dataset.


📌 Scope

In scope

  • Filter Engine D to only G/H tax-code rows
  • Align Engine D visualization outputs to filtered dataset
  • Update Engine D analysis + visualization notebooks
  • Update README + matching logic docs

Out of scope

  • Engine B age‑taxcode logic
  • New data sources or schema changes

🧩 Implementation Plan (What changed)

Files changed / added

  • src/engines/ira_rollover_analysis.py
  • src/visualization/ira_rollover_visualization.py
  • notebooks/09_ira_rollover_analysis.ipynb
  • notebooks/10_ira_rollover_visualization.ipynb
  • docs/matching_logic.md
  • README.md
  • .gitignore
  • reports/figures/ira_rollover/.gitkeep
  • reports/outputs/ira_rollover/.gitkeep

High-level approach

  1. Normalize tax_code_1 / tax_code_2 and filter Engine D to rows with G/H in either field.
  2. Update KPI summaries/plots and notebooks to reflect the filtered dataset.
  3. Document the G/H gate in README + matching logic docs.
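
Step 1 above can be sketched roughly as follows. This is a minimal illustration and not the code in src/engines/ira_rollover_analysis.py: the helper names `normalize_tax_code` and `filter_rollover_rows` are hypothetical, the input is assumed to be a pandas DataFrame, and whitespace stripping is an extra assumption (the PR only states uppercase normalization).

```python
import pandas as pd

ROLLOVER_CODES = {"G", "H"}  # rollover tax codes named in this PR


def normalize_tax_code(series: pd.Series) -> pd.Series:
    """Uppercase and strip codes; missing values become empty strings.

    Hypothetical helper. Stripping whitespace is an assumption.
    """
    return series.fillna("").astype(str).str.strip().str.upper()


def filter_rollover_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where tax_code_1 or tax_code_2 is G or H (hypothetical helper)."""
    code_1 = normalize_tax_code(df["tax_code_1"])
    code_2 = normalize_tax_code(df["tax_code_2"])
    eligible = code_1.isin(ROLLOVER_CODES) | code_2.isin(ROLLOVER_CODES)
    return df.loc[eligible].copy()


# Made-up sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "transaction_id": ["t1", "t2", "t3", "t4"],
        "tax_code_1": ["g", "7", None, "B"],
        "tax_code_2": [None, "", " H ", "G"],
    }
)
filtered = filter_rollover_rows(sample)
print(filtered["transaction_id"].tolist())  # ['t1', 't3', 't4']
```

Applying the gate before any matching logic runs means non‑G/H rows never receive a match_status from Engine D, rather than being dropped afterwards.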

🧠 Data + Logic Notes

Business rules implemented / updated

  • Rule(s): Engine D only evaluates IRA check distributions where tax_code_1 or tax_code_2 ∈ {G, H} after normalization.
  • Threshold(s): N/A
  • Exclusions / locks: Non‑G/H rows are excluded from Engine D analysis.

Canonical schema impact

  • New columns added: None
  • Columns modified: None
  • No schema change: [x]

Data quality considerations

  • Join keys: N/A
  • Null-handling: Missing tax codes are treated as non‑eligible and excluded.
  • Type enforcement: Tax codes normalized to uppercase before filtering.
  • Idempotence: Filtering is deterministic; reruns are stable.
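
The null-handling and idempotence notes above can be illustrated in plain Python. This is a sketch under assumptions: `normalize_code` and `keep_rollover_rows` are hypothetical names, rows are modeled as dicts rather than the engine's actual data structure, and treating NaN like a blank is an assumption about how missing values arrive.

```python
import math

ROLLOVER_CODES = frozenset({"G", "H"})


def normalize_code(value):
    """Return an uppercase code, or "" for None, NaN, or blank input."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value).strip().upper()


def keep_rollover_rows(rows):
    """Keep rows whose tax_code_1 or tax_code_2 normalizes to G or H."""
    return [
        row
        for row in rows
        if normalize_code(row.get("tax_code_1")) in ROLLOVER_CODES
        or normalize_code(row.get("tax_code_2")) in ROLLOVER_CODES
    ]


# Made-up rows: both codes missing, NaN plus lowercase h, blank plus non-rollover.
rows = [
    {"transaction_id": "a", "tax_code_1": None, "tax_code_2": None},
    {"transaction_id": "b", "tax_code_1": float("nan"), "tax_code_2": "h"},
    {"transaction_id": "c", "tax_code_1": "", "tax_code_2": "7"},
]

once = [row["transaction_id"] for row in keep_rollover_rows(rows)]
twice = [
    row["transaction_id"]
    for row in keep_rollover_rows(keep_rollover_rows(rows))
]
assert once == twice  # rerunning the filter on its own output changes nothing
print(once)  # ['b']
```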

🧪 Validation (Local)

Smoke checks

  • python -c "from src.engines.ira_rollover_analysis import run_ira_rollover_analysis; print('ok')"
  • Notebook cell(s) run without error

Data quality checks

  • Non‑G/H rows excluded from Engine D output
  • G/H rows still produce expected match_status outcomes

Validation (executed)

python -c "from src.engines.ira_rollover_analysis import run_ira_rollover_analysis; print('ok')"

Results: ok

Rule verification (recommended)

  • Added/updated notebook examples for key scenarios:
    • Rollover normalization cases (B+G, G+blank, blank+G)
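
The three named cases could be exercised with a small scenario table like the one below. This is an illustrative sketch, not the notebook code: `is_rollover_eligible` is a hypothetical predicate, each named case is assumed to pass the gate because it carries a G, and the `B+blank` row is an added negative control that does not come from the PR.

```python
ROLLOVER_CODES = {"G", "H"}


def is_rollover_eligible(tax_code_1, tax_code_2):
    """True when either code, after normalization, is G or H (hypothetical)."""
    codes = {(code or "").strip().upper() for code in (tax_code_1, tax_code_2)}
    return bool(codes & ROLLOVER_CODES)


scenarios = {
    "B+G": ("B", "G"),
    "G+blank": ("G", ""),
    "blank+G": ("", "G"),
    "B+blank": ("B", ""),  # negative control added for illustration
}

results = {name: is_rollover_eligible(*codes) for name, codes in scenarios.items()}
for name, eligible in results.items():
    print(f"{name}: {'eligible' if eligible else 'excluded'}")
```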

✅ Acceptance Criteria

  • AC1: Engine D only includes rows where tax_code_1 or tax_code_2 is G or H.
  • AC2: Non‑G/H rows no longer appear as match_needs_review in Engine D results.
  • AC3: Engine D visualization/notebooks reflect the filtered dataset and updated KPIs.
  • AC4: README/docs accurately describe the G/H filter.

🧯 Risks / Edge Cases

  • Potential risk: Tax codes may be missing or in mixed case.
  • Edge cases covered: G/H in tax_code_2 with blank tax_code_1.
  • Mitigation: Normalize tax codes and treat blanks as non‑eligible.

📎 Reviewer Notes

What to focus on

  • Correctness of the G/H filter placement and normalization
  • match_status logic remains unchanged for eligible rows
  • Documentation alignment (README + matching logic)

Screenshots / sample outputs (optional)

pytest run output (local)

🔗 Linking

@manuel-reyes-ml self-assigned this on Jan 15, 2026
@manuel-reyes-ml added the following labels on Jan 15, 2026:

  • type: bug (Defect or incorrect behavior)
  • type: docs (Documentation updates)
  • priority: P2 (Normal priority)
  • area: notebooks (Notebooks / walkthroughs)
  • area: data visualization (Plot key metrics / KPIs)
  • engine: D-ira-rollover (IRA rollover engine)

@manuel-reyes-ml
Owner Author

@codex, review!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f55e784b4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/engines/ira_rollover_analysis.py
…ine D filter keeps the expected rows and the review-reason assertions can find their transaction_ids
@manuel-reyes-ml merged commit c4a4664 into main on Jan 16, 2026
4 checks passed
@manuel-reyes-ml deleted the fix/engine-d-rollover-taxcode-filter branch on January 16, 2026 at 00:44
Development

Successfully merging this pull request may close these issues.

Engine D: Restrict IRA rollover audit to G/H tax-code rows and align visualizations/docs
