37 changes: 25 additions & 12 deletions README.md
@@ -6,13 +6,13 @@
![Status](https://img.shields.io/badge/Status-Portfolio-brightgreen)
![CI](https://github.com/manuel-reyes-ml/1099_reconciliation_pipeline/actions/workflows/ci.yml/badge.svg?branch=main)

-Automated data pipeline for **reconciling retirement plan distributions** between Relius (distribution exports) and Matrix (disbursement/1099 exports). Standardizes inputs, runs three matching/correction engines (A/B/C), and produces Matrix-ready correction recommendations (tax codes, taxable amount, Roth basis year).
+Automated data pipeline for **reconciling retirement plan distributions** between Relius (distribution exports) and Matrix (disbursement/1099 exports). Standardizes inputs, runs four matching/correction engines (A/B/C/D), and produces Matrix-ready correction recommendations (tax codes, taxable amount, Roth basis year).

> 🛡️ **Privacy First:** All data in this repository is synthetic. Original project built for production with real participant data (SSN, tax codes) - cannot be shared for compliance reasons.

### Recruiter Pitch

-I built this project to showcase **both Data Engineering and Data Analytics in a real finance workflow**: ingesting messy Excel exports from Relius and Matrix (plus Relius demographics and Roth basis extracts), normalizing them into a canonical schema, applying auditable business rules across **Engine A (inherited plan matching)**, **Engine B (age-based non-Roth tax codes)**, and **Engine C (Roth taxable + Roth tax-code logic)**, and generating **Matrix-ready 1099-R correction files** for review and execution.
+I built this project to showcase **both Data Engineering and Data Analytics in a real finance workflow**: ingesting messy Excel exports from Relius and Matrix (plus Relius demographics and Roth basis extracts), normalizing them into a canonical schema, applying auditable business rules across **Engine A (inherited plan matching)**, **Engine B (age-based non-Roth tax codes)**, **Engine C (Roth taxable + Roth tax-code logic)**, and **Engine D (IRA rollover tax-form audit)**, and generating **Matrix-ready 1099-R correction files** for review and execution.

---

@@ -100,10 +100,11 @@ Total Transactions Processed: 10,247
- Date lag tolerance enforced from `MATCHING_CONFIG` (Matrix txn_date occurs after Relius export)
- Explicit match_status labels: match_no_action, match_needs_correction, date_out_of_range, unmatched_relius, unmatched_matrix
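
A minimal sketch of how the date-lag check and the status labels above could fit together, in plain Python. The `max_date_lag_days` key and its value, the row-dictionary layout, and the Relius-side `tax_code_1` field are assumptions for illustration; the real `MATCHING_CONFIG` contents are not shown in this diff.

```python
from datetime import date

# Hypothetical stand-in for MATCHING_CONFIG; the real keys and values may differ.
MATCHING_CONFIG = {"max_date_lag_days": 10}


def match_key(row):
    """Matching key: plan_id + ssn + gross_amt (rounded to cents)."""
    return (row["plan_id"], row["ssn"], round(row["gross_amt"], 2))


def classify_pair(relius, matrix, config=MATCHING_CONFIG):
    """Label one Relius/Matrix pair that already shares the matching key."""
    lag_days = (matrix["txn_date"] - relius["exported_date"]).days
    # Matrix txn_date is expected on or after the Relius export, within the window.
    if lag_days < 0 or lag_days > config["max_date_lag_days"]:
        return "date_out_of_range"
    if relius["tax_code_1"] != matrix["tax_code_1"]:
        return "match_needs_correction"
    return "match_no_action"


def reconcile(relius_rows, matrix_rows, config=MATCHING_CONFIG):
    """Return (ssn, match_status) tuples for every input row."""
    matrix_by_key = {}
    for m in matrix_rows:
        matrix_by_key.setdefault(match_key(m), []).append(m)

    results = []
    for r in relius_rows:
        candidates = matrix_by_key.get(match_key(r), [])
        if candidates:
            m = candidates.pop(0)  # consume one Matrix row per Relius row
            results.append((r["ssn"], classify_pair(r, m, config)))
        else:
            results.append((r["ssn"], "unmatched_relius"))

    # Anything left on the Matrix side never found a Relius partner.
    for leftovers in matrix_by_key.values():
        results.extend((m["ssn"], "unmatched_matrix") for m in leftovers)
    return results
```

Keeping the lag window in one config dictionary means a reviewer can change the tolerance without touching the classification logic.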

-### 🧩 **Engines A/B/C**
+### 🧩 **Engines A/B/C/D**
- **Engine A (Inherited matching):** Reconciles Relius vs Matrix distributions and applies inherited-plan tax-code rules (4/G).
- **Engine B (Age-based, non-Roth):** Uses Relius demo data (DOB/term date) to suggest non-Roth tax codes (1/2/7); excludes rollovers and inherited plans.
- **Engine C (Roth taxable):** Uses Matrix + Relius demo + Roth basis to suggest taxable amount, Roth initial year, and Roth tax codes; excludes inherited plans but does not exclude rollovers.
+- **Engine D (IRA rollover tax-form audit):** Filters IRA check distributions with federal taxing method = rollover, then flags tax-form mismatches for correction.
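
As an illustration of the Engine B idea (DOB and term date in, a suggested code of 1, 2, or 7 out), here is a hedged sketch. The cutoffs used here (age 59½ for a normal distribution, separation from service in or after the year the participant turns 55 for the exception code) follow the usual meaning of these 1099-R codes but are assumptions; the project's actual thresholds live in `AGE_TAXCODE_CONFIG`, which this diff does not show. The function and constant names are hypothetical.

```python
from datetime import date

# Assumed thresholds for illustration only; the real values belong to AGE_TAXCODE_CONFIG.
NORMAL_AGE_MONTHS = 59 * 12 + 6   # age 59 1/2, expressed in whole months
SEPARATION_AGE_YEARS = 55         # separation-from-service exception


def months_between(start, end):
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month is not complete yet
    return months


def suggest_non_roth_tax_code(dob, txn_date, term_date=None):
    """Suggest 7 (normal), 2 (early, exception applies), or 1 (early, no known exception)."""
    if months_between(dob, txn_date) >= NORMAL_AGE_MONTHS:
        return "7"
    if term_date is not None and term_date.year - dob.year >= SEPARATION_AGE_YEARS:
        return "2"
    return "1"
```

Rollovers and inherited plans would be filtered out before this function is called, since Engine B excludes them.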

### 📈 **Business Intelligence**
- Review-ready outputs: match_status, correction_reason, and action fields for QA
@@ -125,7 +126,7 @@ Total Transactions Processed: 10,247
**Key Decisions:**
- Define match criteria (plan_id + ssn + gross_amt with date lag tolerance)
- Centralize thresholds in `config.py` (MATCHING_CONFIG, AGE_TAXCODE_CONFIG, ROTH_TAXABLE_CONFIG)
-- Separate workflows into Engine A/B/C to keep rules auditable
+- Separate workflows into Engine A/B/C/D to keep rules auditable

---

@@ -218,6 +219,7 @@ WHERE r.plan_id = m.plan_id
1. Inherited-plan corrections (Engine A: code 4/G rules)
2. Age-based non-Roth corrections (Engine B: 1/2/7 rules)
3. Roth taxable + Roth tax-code corrections (Engine C: taxable amount, start year, B* rules)
+4. IRA rollover tax-form audit (Engine D: Matrix-only rollover vs tax-form checks)

---

@@ -301,12 +303,14 @@ New First Year contrib | Reason | Action
│ │ ├── __init__.py
│ │ ├── match_planid.py # Engine A (inherited matching)
│ │ ├── age_taxcode_analysis.py # Engine B (age-based non-Roth)
-│ │ └── roth_taxable_analysis.py # Engine C (Roth taxable)
+│ │ ├── roth_taxable_analysis.py # Engine C (Roth taxable)
+│ │ └── ira_rollover_analysis.py # Engine D (IRA rollover tax-form audit)
│ ├── visualization/
│ │ ├── __init__.py
│ │ ├── match_planid_visualization.py
│ │ ├── age_taxcode_visualization.py
-│ │ └── roth_taxable_visualization.py
+│ │ ├── roth_taxable_visualization.py
+│ │ └── ira_rollover_visualization.py
│ └── outputs/
│ ├── __init__.py
│ ├── export_utils.py # Export helpers
@@ -321,32 +325,39 @@ New First Year contrib | Reason | Action
│ ├── 05_match_roth_basis_analysis.ipynb
│ ├── 06_age_taxcode_visualization.ipynb
│ ├── 07_match_planid_visualization.ipynb
-│ └── 08_roth_taxable_visualization.ipynb
+│ ├── 08_roth_taxable_visualization.ipynb
+│ ├── 09_ira_rollover_analysis.ipynb
+│ └── 10_ira_rollover_visualization.ipynb
├── reports/
│ ├── figures/ # Generated charts (png)
│ │ ├── match_planid/
│ │ ├── age_taxcode/
-│ │ └── roth_taxable/
+│ │ ├── roth_taxable/
+│ │ └── ira_rollover/
│ ├── outputs/ # Timestamped correction files (production default)
│ │ ├── match_planid/
│ │ ├── age_taxcode/
-│ │ └── roth_taxable/
+│ │ ├── roth_taxable/
+│ │ └── ira_rollover/
│ └── samples/ # Sample-mode outputs
│ ├── figures/ # Sample-mode charts
│ │ ├── match_planid/
│ │ ├── age_taxcode/
-│ │ └── roth_taxable/
+│ │ ├── roth_taxable/
+│ │ └── ira_rollover/
│ ├── match_planid/
│ ├── age_taxcode/
-│ └── roth_taxable/
+│ ├── roth_taxable/
+│ └── ira_rollover/
├── templates/
│ └── 1099r_correct_form.xlsx # Matrix correction template
├── tests/ # Unit tests (optional)
│ ├── conftest.py
│ ├── pipelines/
+│ ├── ira_rollover/
│ ├── roth_taxable/
│ ├── validators/
│ └── visualization/
@@ -457,7 +468,9 @@ jupyter notebook
# 7. notebooks/06_age_taxcode_visualization.ipynb
# 8. notebooks/07_match_planid_visualization.ipynb
# 9. notebooks/08_roth_taxable_visualization.ipynb
-# (Engine B/C workflows are covered in 04-06 and 07-08 or can be run from scripts)
+# 10. notebooks/09_ira_rollover_analysis.ipynb
+# 11. notebooks/10_ira_rollover_visualization.ipynb
+# (Engine B/C/D workflows are covered in 04-06 and 07-10 or can be run from scripts)
```

#### Option 3: Use as Module
15 changes: 13 additions & 2 deletions docs/business_context.md
@@ -69,6 +69,7 @@ For readers unfamiliar with retirement plan administration:
2. **Which mismatches have direct 1099-R impact (amounts, codes, Roth taxable)?**
3. **Can we separate inherited, age-based, and Roth-specific rules cleanly?**
4. **How can we generate a correction file for the operations team?**
+5. **Are IRA rollover tax forms aligned with rollover treatment in Matrix?**

---

@@ -105,10 +106,11 @@ Build an automated **1099 reconciliation pipeline** that:

1. **Ingests** Excel exports from Relius distributions, Relius demographics, Relius Roth basis, and Matrix.
2. **Cleans and normalizes** the data into canonical fields (SSNs, dates, amounts, tax codes).
-3. **Runs three engines**:
+3. **Runs four engines**:
- Engine A: inherited-plan matching (Relius vs Matrix)
- Engine B: age-based non-Roth tax codes
- Engine C: Roth taxable + Roth tax-code logic
+ - Engine D: IRA rollover tax-form audit (Matrix-only)
4. **Classifies** results into match statuses and correction/review actions.
5. **Generates** an **Excel correction file** with recommended updates, stored under `reports/samples/<engine>/` for sample runs and `reports/outputs/<engine>/` for production runs by default.
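
To make the sample versus production output convention concrete, here is a small sketch of a path builder. Only the `reports/samples/<engine>/` and `reports/outputs/<engine>/` layout comes from the docs; the function name, its parameters, and the timestamped file-name pattern are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def correction_output_path(engine, sample_mode, now=None, root=Path("reports")):
    """Build the destination path for one engine's correction file.

    The file-name pattern is an assumption; only the directory layout is documented.
    """
    now = now or datetime.now()
    base = root / ("samples" if sample_mode else "outputs")
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return base / engine / f"{engine}_corrections_{stamp}.xlsx"


# Example: a sample-mode run for Engine D.
print(correction_output_path("ira_rollover", sample_mode=True).as_posix())
```

Passing `now` explicitly keeps the function deterministic in tests, while real runs fall back to the current time.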

@@ -160,6 +162,7 @@ Build an automated **1099 reconciliation pipeline** that:
- exported_date/txn_date with a config-driven lag window
- Engine B age-based tax-code logic using Relius demographics
- Engine C Roth taxable logic using Relius Roth basis and Roth plan identifiers
+ - Engine D IRA rollover tax-form checks using Matrix `federal_taxing_method`, `tax_form`, and `txn_method`
- Classifying results and generating:
- A **1099 correction Excel file**
- Using **synthetic data** in this public repository.
@@ -303,6 +306,9 @@ To keep changes reliable and auditable, this repository includes automated testi
└─────────────┘ └───────────┘
```

+Engine D (IRA rollover audit) runs on the cleaned Matrix export to validate
+rollover tax-form selections and produces a correction file when needed.
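
The Engine D checks described in this document can be sketched in a few lines of plain Python. The field names `federal_taxing_method`, `tax_form`, `txn_method`, and `new_tax_code` come from the docs; the `is_ira_plan` helper and its prefix rule, the `"Check"` method value, and the `correction_reason` wording are hypothetical placeholders rather than the project's implementation.

```python
def is_ira_plan(plan_id):
    """Hypothetical IRA test for illustration; the real rule is not shown in this diff."""
    return plan_id.upper().startswith("IRA")


def audit_ira_rollovers(matrix_rows):
    """Flag IRA check distributions taxed as rollovers that still carry a 1099-R form."""
    corrections = []
    for row in matrix_rows:
        if not is_ira_plan(row["plan_id"]):
            continue
        if row["txn_method"].strip().lower() != "check":
            continue
        if row["federal_taxing_method"].strip().lower() != "rollover":
            continue
        if row["tax_form"].strip().upper() == "1099-R":
            corrections.append({
                "transaction_id": row["transaction_id"],
                "new_tax_code": "0",  # suggested correction per the process steps
                "correction_reason": "Rollover taxing method with 1099-R tax form",
            })
    return corrections
```

Because the audit reads only Matrix fields, it needs no Relius join and can run directly on the cleaned Matrix export.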

### Detailed Process Steps

1. **Data Ingestion**
@@ -328,7 +334,12 @@ To keep changes reliable and auditable, this repository includes automated testi
- Identify Roth plans via configured prefixes/suffixes
- Suggest taxable amount, Roth start year, and Roth tax codes (B*, H*)

-6. **Output Generation**
+6. **Engine D (IRA Rollover Tax-Form Audit)**
+ - Filter IRA plans and check distributions in Matrix
+ - Validate `federal_taxing_method = Rollover` against tax form selection
+ - Suggest correction (`new_tax_code = "0"`) when 1099-R is set
+
+7. **Output Generation**
- Build correction file with `match_status`, suggested fields, and actions
- Provide correction reasons for audit and review workflows

8 changes: 5 additions & 3 deletions docs/data_dictionary.md
@@ -134,7 +134,8 @@ When classifying discrepancies, the pipeline prioritizes based on engine output:
1. **Engine A corrections** (inherited-plan tax code updates)
2. **Engine B corrections** (age-based non-Roth tax codes)
3. **Engine C corrections** (Roth taxable amount and Roth start year updates)
-4. **Review-only items** (INVESTIGATE) and date out-of-range flags
+4. **Engine D corrections** (IRA rollover tax-form audit)
+5. **Review-only items** (INVESTIGATE) and date out-of-range flags
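
The priority order above could be applied with a simple rank lookup, as in this sketch. The `source` field and the label strings are hypothetical; only the ordering (Engine A, B, C, D, then review-only items) is taken from the list.

```python
# Hypothetical labels; the rank values mirror the documented priority order.
PRIORITY = {
    "engine_a": 1,
    "engine_b": 2,
    "engine_c": 3,
    "engine_d": 4,
    "review_only": 5,
}


def sort_by_priority(items):
    """Order discrepancy records by engine priority; ties keep their input order."""
    return sorted(items, key=lambda item: PRIORITY[item["source"]])
```

Python's `sorted` is stable, so records from the same engine stay in the order they were produced.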

### Match Tolerance Guidelines

@@ -183,10 +184,11 @@ When classifying discrepancies, the pipeline prioritizes based on engine output:
| `gross_amt` | float | `15000.00` | Gross disbursement amount | 🔴 Matching key |
| `fed_taxable_amt` | float | `15000.00` | Taxable amount reported in Matrix | Engine C input |
| `txn_date` | date | `2024-01-17` | Matrix transaction date | Date lag |
-| `txn_method` | string | `ACH` | Transaction method/type | Optional |
+| `txn_method` | string | `ACH` | Transaction method/type | Engine D |
| `tax_code_1` | string | `7` | Primary 1099-R tax code | 🔴 Correction logic |
| `tax_code_2` | string | `G` | Secondary 1099-R tax code | Engine A/C |
-| `tax_form` | string | `1099-R` | Tax form identifier | Optional |
+| `tax_form` | string | `1099-R` | Tax form identifier | Engine D |
+| `federal_taxing_method` | string | `Rollover` | Federal taxing method | Engine D |
| `dist_type` | string | `Rollover` | Distribution type | Optional |
| `roth_initial_contribution_year` | int | `2016` | Roth start year (Matrix) | Engine C |
| `transaction_id` | string | `44324568` | Matrix transaction ID | Output key |