Add Engine D IRA rollover tax-form audit with Matrix normalization, KPIs, tests, and docs #90

Merged
manuel-reyes-ml merged 11 commits into main from feature/engine-d-ira-rollover-taxform on Jan 14, 2026

@manuel-reyes-ml (Owner) commented Jan 14, 2026

✅ PR Summary

🎯 Objective

What problem does this PR solve?
Add Engine D, which detects IRA rollover check distributions recorded with Tax Form = 1099‑R that should instead be No Tax, and produces correction-ready outputs.

Expected output / deliverable
Engine D analysis module, visualization helpers, notebooks, tests, and updated docs/README for the IRA rollover workflow.


📌 Scope

In scope

  • Add IRA rollover engine config + Matrix schema updates
  • Normalize Matrix fields used by Engine D
  • Implement IRA rollover analysis + correction output
  • Add visualization helpers and notebooks
  • Add tests for analysis and visualization
  • Update docs/README to describe Engine D workflow

Out of scope

  • Changes to Engines A/B/C
  • Non‑Matrix data sources or external integrations

🧩 Implementation Plan (What changed)

Files changed / added

  • src/config.py
  • src/cleaning/clean_matrix.py
  • src/engines/ira_rollover_analysis.py
  • src/visualization/ira_rollover_visualization.py
  • tests/ira_rollover/test_ira_rollover_analysis.py
  • tests/validators/test_clean_matrix_validations.py
  • tests/visualization/test_ira_rollover_visualization.py
  • notebooks/09_ira_rollover_analysis.ipynb
  • notebooks/10_ira_rollover_visualization.ipynb
  • README.md
  • docs/business_context.md
  • docs/data_dictionary.md
  • docs/matching_logic.md

High-level approach

  1. Add IRA plan detection config and retain Matrix federal taxing method + tax form fields.
  2. Implement Engine D filter/classification rules for IRA rollover check distributions.
  3. Add KPI/visualization helpers, notebooks, tests, and docs updates.

🧠 Data + Logic Notes

Business rules implemented / updated

  • Rule(s): IRA plan + Check Distribution + Federal Taxing Method = Rollover:
    • Tax Form = No Tax → match_no_action
    • Tax Form = 1099‑R → match_needs_correction with new_tax_code = "0"
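The rule above, combined with the needs_review fallback for missing values, can be sketched as a small classifier. This is a minimal illustration and not the code in src/engines/ira_rollover_analysis.py: the function name `classify_rollover_row`, the dict-based row shape, the ASCII spelling of the field values, and the `None` return for rows outside the rule are all assumptions. It also assumes the row has already passed the IRA plan and Check Distribution filters.

```python
from typing import Optional


def classify_rollover_row(row: dict) -> Optional[dict]:
    """Classify one pre-filtered IRA check-distribution row (hypothetical shape)."""
    method = (row.get("federal_taxing_method") or "").strip().lower()
    tax_form = (row.get("tax_form") or "").strip().lower()

    # Missing inputs cannot be classified safely, so flag them for a human.
    if not method or not tax_form:
        return {"status": "needs_review", "new_tax_code": None}

    # Assumption: rows with another taxing method are simply outside this rule.
    if method != "rollover":
        return None

    if tax_form == "no tax":
        return {"status": "match_no_action", "new_tax_code": None}
    if tax_form == "1099-r":
        return {"status": "match_needs_correction", "new_tax_code": "0"}

    # Unrecognized tax form values also go to review.
    return {"status": "needs_review", "new_tax_code": None}
```

Keeping `new_tax_code` as the string `"0"` rather than an integer follows the rule as written in this PR.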

Canonical schema impact

  • New columns added: federal_taxing_method, tax_form (Matrix canonical fields)
  • Columns modified: normalization of participant_name, transaction_id
  • No schema change: [ ]

Data quality considerations

  • Join keys: existing Matrix keys (plan_id, ssn, gross_amt, txn_date)
  • Null-handling: missing tax form / federal taxing method → needs_review
  • Type enforcement: dates/numerics normalized in Matrix cleaner
  • Idempotence: corrections are deterministic on normalized inputs
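The null-handling and idempotence points above depend on a normalizer that maps blanks to a single "missing" value and returns the same result when applied twice. The sketch below shows one way to get that property. The actual logic in src/cleaning/clean_matrix.py is not shown in this PR, so the helper name `normalize_token` and the specific steps (hyphen unification, whitespace collapse, upper-casing) are assumptions.

```python
import re
from typing import Optional


def normalize_token(value: object) -> Optional[str]:
    """Normalize a categorical field; None or blank input becomes None."""
    if value is None:
        return None
    # Unify non-breaking hyphens and en dashes so "1099‑R" equals "1099-R".
    text = str(value).replace("\u2011", "-").replace("\u2013", "-")
    # Collapse internal whitespace runs, trim the ends, and upper-case.
    text = re.sub(r"\s+", " ", text).strip().upper()
    return text or None


# Idempotence: normalizing an already normalized value changes nothing.
once = normalize_token("  1099\u2011r ")
twice = normalize_token(once)
```

A downstream rule can then treat `None` as the single trigger for needs_review instead of checking for empty strings, whitespace, and nulls separately.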

🧪 Validation (Local)

Smoke checks

  • python -c "from src.engines.ira_rollover_analysis import run_ira_rollover_analysis" passes
  • Key module import(s) run without error
  • Notebook cell(s) run without error

Data quality checks

  • No duplicate keys where uniqueness is required
  • Expected columns exist in canonical schema
  • Dtypes verified (dates/Int64/Float64)
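The first two checks above can be approximated with the standard library. This is a hedged sketch that uses plain dicts in place of the project's DataFrame. The helper names `find_duplicate_keys` and `missing_columns` are hypothetical, and the expected-column set contains only the columns named in this PR, not the full canonical schema.

```python
from collections import Counter

# Join keys and new canonical fields as listed in this PR description.
KEY_COLUMNS = ("plan_id", "ssn", "gross_amt", "txn_date")
EXPECTED_COLUMNS = set(KEY_COLUMNS) | {"federal_taxing_method", "tax_form"}


def find_duplicate_keys(rows):
    """Return each key tuple that appears on more than one row."""
    counts = Counter(tuple(row[col] for col in KEY_COLUMNS) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def missing_columns(row):
    """Return the expected columns absent from a row, sorted by name."""
    return sorted(EXPECTED_COLUMNS - set(row))


sample = [
    {"plan_id": "P1", "ssn": "111", "gross_amt": 100.0, "txn_date": "2026-01-02",
     "federal_taxing_method": "Rollover", "tax_form": "1099-R"},
    {"plan_id": "P1", "ssn": "111", "gross_amt": 100.0, "txn_date": "2026-01-02",
     "federal_taxing_method": "Rollover", "tax_form": "No Tax"},
]
duplicates = find_duplicate_keys(sample)
```

In this sample both rows share the same key tuple, so `duplicates` contains that one key.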

Validation (executed)

python -c "from src.engines.ira_rollover_analysis import run_ira_rollover_analysis"
python -m pytest tests/ira_rollover/ tests/visualization/ tests/validators/

Results: all tests passed

Rule verification (recommended)

  • Added/updated notebook examples for key scenarios:
    • Rollover + No Tax (no action)
    • Rollover + 1099‑R (needs correction)
    • Missing/unknown federal taxing method or tax form (needs review)

Export checks (if applicable)

  • Output opens in Excel and columns populate correctly
  • Template headers found (no misalignment)

✅ Acceptance Criteria

  • AC1: Engine D filters IRA plans and Check Distribution + Rollover rows correctly.
  • AC2: 1099‑R rows produce new_tax_code = "0"; No Tax rows are no‑action.
  • AC3: Correction output integrates with existing schema and builder.
  • AC4: README + docs updated for Engine D workflow.
  • AC5: Tests cover analysis + visualization paths.

🧯 Risks / Edge Cases

  • Potential risk: IRA substring match can over‑include non‑IRA plans.
  • Edge cases covered: missing or inconsistent tax form / federal taxing method.
  • Mitigation: normalize fields and emit needs_review with reason tokens.
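The over-inclusion risk is easy to reproduce: a plain substring test for "ira" also matches unrelated words. The comparison below contrasts it with a whole-token match. It is illustrative only and is not the mitigation adopted in this PR; the plan names are invented, and which Matrix field Engine D inspects is not stated here.

```python
import re

# Whole-token match: "IRA" must not be embedded in a longer word.
IRA_TOKEN = re.compile(r"\bIRA\b", re.IGNORECASE)


def is_ira_substring(plan_name: str) -> bool:
    """Naive check: true whenever the letters 'ira' appear anywhere."""
    return "ira" in plan_name.lower()


def is_ira_token(plan_name: str) -> bool:
    """Stricter check: true only when 'IRA' stands alone as a token."""
    return bool(IRA_TOKEN.search(plan_name))


# Hypothetical plan names used only to show the difference.
names = ["Rollover IRA", "SEP-IRA Smith", "Mirage Holdings 401(k)"]
substring_hits = [n for n in names if is_ira_substring(n)]
token_hits = [n for n in names if is_ira_token(n)]
```

Here the substring check flags all three names, while the token check leaves out "Mirage Holdings 401(k)". A token match can still miss unconventional spellings, so routing ambiguous rows to needs_review remains useful.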

📎 Reviewer Notes

What to focus on

  • correctness of IRA plan detection and rollover classification
  • correction output fields (action, new_tax_code)
  • normalization in Matrix cleaning (names/transaction IDs)
  • docs and notebooks alignment with Engine D behavior

Screenshots / sample outputs (optional)

pytest run output (local): [screenshot taken 2026-01-13 at 10:00:05 PM]

🔗 Linking

@manuel-reyes-ml self-assigned this on Jan 14, 2026
@manuel-reyes-ml added labels on Jan 14, 2026:

  • type: feature (New functionality)
  • type: docs (Documentation updates)
  • type: test (Tests / pytest coverage)
  • priority: P2 (Normal priority)
  • area: config (Config/schema mappings)
  • area: cleaning (Cleaning/normalization modules)
  • area: export (Correction template export)
  • area: notebooks (Notebooks / walkthroughs)
  • area: data visualization (Plot key metrics / KPIs)

@manuel-reyes-ml (Owner, Author) commented

@codex, review!

@chatgpt-codex-connector commented

Codex Review: Didn't find any major issues. 👍

@manuel-reyes-ml merged commit 542ede7 into main on Jan 14, 2026 (4 checks passed)
@manuel-reyes-ml deleted the feature/engine-d-ira-rollover-taxform branch on January 14, 2026 at 03:09

Development

Successfully merging this pull request may close these issues.

Engine D: IRA Rollover Tax-Form Audit (Matrix-only) with correction output + docs/tests/notebooks/visuals refresh.