feat(wave3): telemetry and content-loader hardening for ai-ml-fundamentals probe by OpenSauce · Pull Request #158 · OpenSauce/dsa-flash

OpenSauce · 2026-04-05T20:22:46Z

Summary

Backend + frontend infrastructure for the Wave 3 AI/ML Fundamentals probe. Ships the telemetry surface and content-loader hardening needed to make the "ship, measure for 2 weeks, decide" gate actually readable. Pairs with content submodule PR OpenSauce/dsa-flash-cards#36, which must merge first so this PR can bump the submodule ref.

No new migrations. No schema changes. Every change is additive.

What's in this PR (5 commits)

1. `feat: add ai-ml-fundamentals display name and category metadata`

backend/app/models.py — DISPLAY_NAMES["ai-ml-fundamentals"] = "AI/ML Fundamentals" so slug_to_display_name doesn't render the ugly "Ai Ml Fundamentals"
frontend/utils/categoryMeta.ts — adds the category entry under section: 'Other' (no new top-level AI section for the probe) with honest description copy: "AI/ML interview vocabulary — hands-on problems coming later."
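A minimal sketch of the lookup-with-fallback this entry relies on. It assumes slug_to_display_name checks DISPLAY_NAMES first and otherwise title-cases each hyphen-separated segment. That fallback is inferred from the "Ai Ml Fundamentals" rendering, not copied from backend/app/models.py.

```python
# Hypothetical reconstruction: only the "ai-ml-fundamentals" entry comes from this PR.
DISPLAY_NAMES: dict[str, str] = {
    "ai-ml-fundamentals": "AI/ML Fundamentals",
}


def slug_to_display_name(slug: str) -> str:
    """Return the curated display name if one exists, else title-case the slug."""
    if slug in DISPLAY_NAMES:
        return DISPLAY_NAMES[slug]
    # Assumed fallback: capitalize each hyphen-separated segment.
    return " ".join(part.capitalize() for part in slug.split("-"))


print(slug_to_display_name("ai-ml-fundamentals"))
print(slug_to_display_name("system-design"))
```

Without the override, the assumed fallback turns "ai-ml-fundamentals" into "Ai Ml Fundamentals", because str.capitalize() cannot know that "ai" and "ml" are acronyms.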

2. `fix: harden flashcard loader against missing required fields`

backend/app/loader.py load_yaml_flashcards() previously did unguarded raw["title"] / raw["Front"] / raw["Back"] lookups per card. A typo in one card raised KeyError mid-loop, aborted the entire file, and left yaml_keys incomplete, so orphan removal then ran with a partial set. This PR checks each card for the required keys. When a key is missing, it logs the file, card index, and missing key, skips that card, and continues. Tests in tests/test_loader.py cover the happy-path pass-through, cards missing Front, Back, or title (each skipped individually), and the loop continuing past bad cards.

3. `feat: add per-category breakdowns to analytics summary endpoint`

/api/analytics/summary already returned category_lesson_completions. This commit adds three parallel dicts:

quiz_completions_by_category — UserQuizAttempt counts joined to Quiz.category
anon_lesson_views_by_category — Event rows with event_type='lesson_view' and user_id IS NULL, grouped by payload->>'category'
flashcard_reviews_by_category — UserFlashcard review counts joined to Flashcard.category

These unlock the probe's 2-week signal: without them the decision was unreadable (only 1 of 6 needed signals was measurable). Tests seed two categories with completions/reviews and assert the dicts populate correctly.
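The shape of the three new dicts can be illustrated in plain Python over in-memory rows. This is not the endpoint's code, which runs SQL against UserQuizAttempt, Event, and UserFlashcard. The row tuples, the function name, and the one-row-per-review assumption are all hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical row shapes standing in for the SQL joins the endpoint runs.
QuizAttemptRow = tuple[int, str]            # (user_id, Quiz.category)
EventRow = tuple[Optional[int], str, dict]  # (user_id, event_type, payload)
ReviewRow = tuple[int, str]                 # (user_id, Flashcard.category); assumed one row per review


def per_category_breakdowns(
    quiz_attempts: list[QuizAttemptRow],
    events: list[EventRow],
    reviews: list[ReviewRow],
) -> dict[str, dict[str, int]]:
    """Count quiz completions, anonymous lesson views, and flashcard reviews per category."""
    quiz_completions = Counter(category for _, category in quiz_attempts)
    # Anonymous only: lesson_view events with no user_id and a category in the payload.
    anon_views = Counter(
        payload["category"]
        for user_id, event_type, payload in events
        if event_type == "lesson_view"
        and user_id is None
        and payload.get("category") is not None
    )
    flashcard_reviews = Counter(category for _, category in reviews)
    # most_common() orders by descending count, like ORDER BY cnt DESC.
    return {
        "quiz_completions_by_category": dict(quiz_completions.most_common()),
        "anon_lesson_views_by_category": dict(anon_views.most_common()),
        "flashcard_reviews_by_category": dict(flashcard_reviews.most_common()),
    }
```

Authenticated lesson views are excluded by the user_id check. An event with no category in its payload is dropped rather than counted under a null key.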

4. `feat: track signup referrer_category for probe attribution`

The highest-value probe signal is "did viewing the AI lesson drive signups?" Before this PR there was no way to answer that.

Backend: SignupIn accepts optional referrer_category; signup endpoint emits a signup Event with payload.referrer_category
Frontend: frontend/pages/lesson/[slug].vue writes the viewed lesson's category to sessionStorage on mount; frontend/composables/useAuth.ts reads it during signup and passes it in the request body
Test asserts the Event row is written with the referrer payload

5. `test: add fresh-category quiz-to-flashcard seeding integration test`

End-to-end test that seeds a fresh category with 1 lesson + 1 quiz + 10 flashcards all linked via lesson_slug='ai-foundations', submits the quiz at passing grade, and asserts 10 UserFlashcard rows exist. Catches the lesson: / lesson_slug: YAML-key asymmetry foot-gun end-to-end.

Verification

ruff check backend/ — clean
pytest tests/test_loader.py tests/api/test_analytics.py tests/api/test_quizzes.py tests/api/test_users.py -v — 64/64 passing
Manual: slug_to_display_name('ai-ml-fundamentals') returns "AI/ML Fundamentals"

Merge order (IMPORTANT)

Merge content PR dsa-flash-cards#36 ("content: add ai-foundations chunk for Wave 3 AI/ML probe") first
Then bump the submodule ref on this branch to point at submodule main
Then merge this PR
Post-deploy: run baseline SQL against prod for system-design first-two-weeks metrics so the 2-week gate has a real comparison number

Context

Full review: ~/.gstack/projects/OpenSauce-dsa-flash/main-autoplan-review-20260405.md — covers both CEO and Eng phases, dual-voice consensus tables, and the decision audit trail that led to the probe-first scope.

Copilot

Pull request overview

Adds Wave 3 probe infrastructure to make the AI/ML Fundamentals category measurable: new category metadata/display name, loader hardening + tests, expanded analytics summary breakdowns, and signup attribution via referrer_category.

Changes:

Add AI/ML Fundamentals category metadata/display name across backend + frontend.
Harden YAML flashcard loader against malformed cards and add integration tests.
Expand /api/analytics/summary with per-category quiz, anon lesson view, and flashcard review breakdowns; add signup referrer attribution plumbing.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

File	Description
frontend/utils/categoryMeta.ts	Adds AI/ML Fundamentals category metadata (emoji/section/description/display name).
frontend/pages/lesson/[slug].vue	Stores lesson category in sessionStorage for later signup attribution.
frontend/composables/useAuth.ts	Sends optional `referrer_category` during signup.
backend/app/models.py	Adds display name mapping and extends signup input model with `referrer_category`.
backend/app/loader.py	Skips malformed flashcards instead of aborting the whole file.
backend/app/api/users.py	Emits a `signup` Event with `payload.referrer_category`.
backend/app/api/analytics.py	Adds new per-category breakdown metrics to analytics summary response.
backend/tests/test_loader.py	Adds tests for loader hardening and a fresh-category loader roundtrip.
backend/tests/api/test_users.py	Adds tests asserting signup event payload includes `referrer_category` (or null).
backend/tests/api/test_quizzes.py	Adds integration test ensuring quiz submission seeds UserFlashcard rows for linked flashcards.
backend/tests/api/test_analytics.py	Adds tests validating new per-category breakdown fields.

Copilot · 2026-04-05T20:26:03Z

backend/app/loader.py

            data = yaml.safe_load(file.read_text()) or []
            if not isinstance(data, list):
                logger.warning("Skipping %s: root is %s, not list", file, type(data).__name__)
                continue
-            for raw in data:
+            for idx, raw in enumerate(data):
+                missing_key = None
+                for required in ("title", "Front", "Back"):
+                    if required not in raw:
+                        missing_key = required
+                        break
+                if missing_key is not None:
+                    logger.warning(
+                        "Skipping card %d in %s: missing required field '%s'",
+                        idx,
+                        file,
+                        missing_key,
+                    )
+                    continue


load_yaml_flashcards() assumes each list item is a mapping; if a YAML file contains a non-dict element (string/number/null), the required not in raw checks will raise a TypeError and abort loading. Add an isinstance(raw, dict) guard (log + continue) before checking required keys so malformed elements don’t break the whole file.

Applied in 519c81d. Added an isinstance(raw, dict) guard with a warning-and-continue before the required-key loop. Non-dict list items now log "expected dict, got <type>" and are skipped cleanly.
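Taken together, the per-card key check and the isinstance guard amount to roughly the loop below. This is a sketch. The function name valid_cards, its (file, data) parameters, and returning the kept cards instead of writing rows are assumptions. Only the required keys and the warning wording come from the diff and reply above.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)

REQUIRED_KEYS = ("title", "Front", "Back")


def valid_cards(file: str, data: Any) -> list[dict]:
    """Return the well-formed cards from one parsed YAML file, skipping bad ones."""
    if not isinstance(data, list):
        logger.warning("Skipping %s: root is %s, not list", file, type(data).__name__)
        return []
    kept: list[dict] = []
    for idx, raw in enumerate(data):
        # Without this guard, `key in raw` raises TypeError for null/int items
        # and silently does a substring test for string items.
        if not isinstance(raw, dict):
            logger.warning(
                "Skipping card %d in %s: expected dict, got %s",
                idx, file, type(raw).__name__,
            )
            continue
        missing = next((key for key in REQUIRED_KEYS if key not in raw), None)
        if missing is not None:
            logger.warning(
                "Skipping card %d in %s: missing required field '%s'",
                idx, file, missing,
            )
            continue
        kept.append(raw)
    return kept
```

A bad card costs only itself, so every well-formed card in the file still reaches the key set used for orphan removal.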

Copilot · 2026-04-05T20:26:03Z

backend/app/api/analytics.py

+    # Per-category anonymous lesson views (from event table)
+    category_anon_lesson_views = session.exec(
+        text("""
+            SELECT payload->>'category' AS cat, COUNT(*) AS cnt
+            FROM event
+            WHERE event_type = 'lesson_view'
+              AND payload->>'category' IS NOT NULL
+            GROUP BY cat
+            ORDER BY cnt DESC
+        """)


anon_lesson_views_by_category is documented/used as anonymous lesson views, but the SQL query doesn’t filter user_id IS NULL, so authenticated lesson views will be included and the metric will be inflated (and inconsistent with anonymous_engagement.lesson_views). Add AND user_id IS NULL to the WHERE clause (or rename the field if including all users is intended).

Applied in 519c81d. Good catch — added AND user_id IS NULL to the category_anon_lesson_views query. The field name was truthful in intent but the query wasn't enforcing it, which would have inflated the probe signal. Now consistent with anonymous_engagement.lesson_views.

Copilot · 2026-04-05T20:26:04Z

backend/app/api/users.py

+    session.flush()
+
+    session_id = request.cookies.get("session_id", "")
+    event = Event(
+        session_id=session_id,
+        user_id=user.id,
+        event_type="signup",
+        payload={"referrer_category": data.referrer_category},
+    )
+    session.add(event)


The signup endpoint writes an Event using session_id = request.cookies.get('session_id', ''). If the cookie isn’t set, all such signup events will share an empty session_id, breaking session-based analytics (distinct session counts, attribution, funnels). Consider reusing the same session-id issuance logic as /events (generate UUID and set the session_id cookie on the response when missing) so every signup event has a real session id.

Applied in 519c81d as a minimal fix: session_id = request.cookies.get('session_id') or str(uuid.uuid4()). For the common probe path (user views lesson → signs up), the session cookie is already set by the lesson-view /events call, so this fallback only fires for edge cases. Not refactoring to shared session-id issuance logic here because the signup endpoint does not issue cookies and doing so would expand scope beyond the probe.
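A framework-free sketch of that fallback. It assumes the cookies behave like a plain dict and represents the event as a dict rather than the Event model. build_signup_event is a hypothetical name.

```python
import uuid
from typing import Optional


def build_signup_event(
    cookies: dict[str, str], user_id: int, referrer_category: Optional[str]
) -> dict:
    """Build the signup event, generating a session id when the cookie is absent or empty."""
    # `or` treats a missing cookie and an empty-string cookie the same way.
    session_id = cookies.get("session_id") or str(uuid.uuid4())
    return {
        "session_id": session_id,
        "user_id": user_id,
        "event_type": "signup",
        "payload": {"referrer_category": referrer_category},
    }
```

Because the signup endpoint does not set the cookie, a generated ID belongs to that one event. It stops cookieless signups from collapsing onto a shared empty session_id, but it does not tie the signup to later events from the same browser.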

Copilot · 2026-04-05T20:26:04Z

backend/app/models.py

+    referrer_category: Optional[str] = None




UserCreate.referrer_category is accepted from the client and persisted into the event.payload without any validation/length bound. To avoid unbounded/invalid values polluting analytics (or excessively large JSON), add basic constraints (e.g., max_length and a slug-like pattern) or normalize unknown values to null.

Suggested change

-    referrer_category: Optional[str] = None
+    referrer_category: Optional[str] = PydanticField(default=None, max_length=64)
+    @field_validator("referrer_category")
+    @classmethod
+    def normalize_referrer_category(cls, value: Optional[str]) -> Optional[str]:
+        if value is None:
+            return None
+        normalized = value.strip().lower()
+        if not normalized:
+            return None
+        if len(normalized) > 64:
+            return None
+        parts = normalized.split("-")
+        if any(not part or not part.isalnum() for part in parts):
+            return None
+        return normalized

Applied your suggestion in 519c81d. referrer_category now has max_length=64 plus a field_validator that strips, lowercases, and normalizes unknown values (non-alnum parts, empty segments) to None. Unbounded or malformed values no longer pollute analytics.
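To see which inputs survive, the validator body can be run as a plain function outside Pydantic. The function name is reused for illustration; the logic is copied from the suggestion above.

```python
from typing import Optional


def normalize_referrer_category(value: Optional[str]) -> Optional[str]:
    """Strip and lowercase; return None unless the result is a hyphenated slug."""
    if value is None:
        return None
    normalized = value.strip().lower()
    if not normalized:
        return None
    if len(normalized) > 64:
        return None
    # Every hyphen-separated segment must be non-empty and alphanumeric.
    parts = normalized.split("-")
    if any(not part or not part.isalnum() for part in parts):
        return None
    return normalized


for raw in ["  AI-ML-Fundamentals ", "system_design", "a--b", "", None]:
    print(repr(raw), "->", repr(normalize_referrer_category(raw)))
```

Two behaviors are worth knowing, assuming Pydantic v2 defaults. First, a field_validator runs in "after" mode, so the max_length=64 field constraint is checked before this function. An over-long value would therefore fail validation and reject the signup, instead of being normalized to None. Second, str.isalnum() is Unicode-aware, so a segment such as "café" passes.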

- loader: guard isinstance(raw, dict) before required-key check (handles null/string/int YAML list items)
- analytics: add user_id IS NULL filter to anon_lesson_views_by_category (was inflated by authed users)
- users: generate UUID fallback for missing session_id cookie on signup events
- models: validate referrer_category with max_length=64 + slug normalization

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Copilot · 2026-04-05T21:18:07Z

backend/app/loader.py

            data = yaml.safe_load(file.read_text()) or []
            if not isinstance(data, list):
                logger.warning("Skipping %s: root is %s, not list", file, type(data).__name__)
                continue
-            for raw in data:
+            for idx, raw in enumerate(data):


load_yaml_flashcards() currently scans every *.yml/*.yaml under ROOT. Now that quiz definitions live under {category}/quizzes/*.yaml (dict-root), this will reliably trigger "root is dict, not list" warnings on every run and adds unnecessary IO. Consider excluding quizzes/ (and any other non-flashcard YAML dirs) from yaml_paths, or narrowing the glob to known flashcard YAML locations, so logs stay actionable.
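One way to act on this is to filter the glob results by directory component. This is a sketch, assuming quizzes always live in a directory named quizzes/ somewhere under the content root. The helper flashcard_yaml_paths and the EXCLUDED_DIRS set are hypothetical.

```python
from pathlib import Path

EXCLUDED_DIRS = {"quizzes"}  # hypothetical: directories holding non-flashcard YAML


def flashcard_yaml_paths(root: Path) -> list[Path]:
    """Collect *.yml/*.yaml under root, skipping files inside any excluded directory."""
    paths = [*root.rglob("*.yml"), *root.rglob("*.yaml")]
    # parts[:-1] holds only directory names, so a file named quizzes.yaml is still kept.
    return sorted(
        p for p in paths
        if not EXCLUDED_DIRS.intersection(p.relative_to(root).parts[:-1])
    )
```

Matching on path components rather than substrings avoids accidentally excluding a category whose slug merely contains "quizzes".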

Copilot · 2026-04-05T21:18:08Z

frontend/pages/lesson/[slug].vue

 onMounted(async () => {
  track('lesson_view', { category: lesson.value?.category, slug })
+  if (lesson.value?.category) {
+    sessionStorage.setItem('referrer_category', lesson.value.category)


sessionStorage.setItem(...) can throw (e.g., storage disabled, privacy mode, quota exceeded) and would currently break the whole onMounted flow (including related fetches). Wrap the call in a try/catch (similar to other storage usage in the frontend) so lesson pages remain functional even if storage is unavailable.

Suggested change

-    sessionStorage.setItem('referrer_category', lesson.value.category)
+    try {
+      sessionStorage.setItem('referrer_category', lesson.value.category)
+    } catch {
+      // non-fatal
+    }

Copilot · 2026-04-05T21:18:08Z

frontend/composables/useAuth.ts

+    const referrer_category = import.meta.client
+      ? (sessionStorage.getItem('referrer_category') ?? undefined)
+      : undefined


sessionStorage.getItem(...) can throw in some browser/storage configurations (disabled storage, privacy mode). Since this runs during signup, a thrown exception would prevent account creation. Consider wrapping the read in try/catch and falling back to undefined if access fails.

Suggested change

-    const referrer_category = import.meta.client
-      ? (sessionStorage.getItem('referrer_category') ?? undefined)
-      : undefined
+    let referrer_category: string | undefined = undefined
+    if (import.meta.client) {
+      try {
+        referrer_category = sessionStorage.getItem('referrer_category') ?? undefined
+      } catch {
+        referrer_category = undefined
+      }
+    }

OpenSauce added 5 commits April 5, 2026 21:21

feat: add ai-ml-fundamentals display name and category metadata

4f70177

fix: harden flashcard loader against missing required fields

3d166fa

feat: add per-category breakdowns to analytics summary endpoint

321c609

feat: track signup referrer_category for probe attribution

19d7c43

test: add fresh-category quiz-to-flashcard seeding integration test

2f9aba7

Copilot AI review requested due to automatic review settings April 5, 2026 20:22

Copilot started reviewing on behalf of OpenSauce April 5, 2026 20:23

Copilot AI reviewed Apr 5, 2026

OpenSauce added 2 commits April 5, 2026 22:10

chore: bump dsa-flash-cards submodule for ai-foundations content

269da15

Copilot AI review requested due to automatic review settings April 5, 2026 21:14

Copilot started reviewing on behalf of OpenSauce April 5, 2026 21:14

OpenSauce merged commit 93012d8 into main Apr 5, 2026
6 checks passed

OpenSauce deleted the feat/wave3-telemetry-and-hardening branch April 5, 2026 21:16

Copilot AI reviewed Apr 5, 2026

-    referrer_category: Optional[str] = None
+    referrer_category: Optional[str] = PydanticField(default=None, max_length=64)
+    @field_validator("referrer_category")
+    @classmethod
+    def normalize_referrer_category(cls, value: Optional[str]) -> Optional[str]:
+        if value is None:
+            return None
+        normalized = value.strip().lower()
+        if not normalized:
+            return None
+        if len(normalized) > 64:
+            return None
+        parts = normalized.split("-")
+        if any(not part or not part.isalnum() for part in parts):
+            return None
+        return normalized

-    const referrer_category = import.meta.client
-      ? (sessionStorage.getItem('referrer_category') ?? undefined)
-      : undefined
+    let referrer_category: string | undefined = undefined
+    if (import.meta.client) {
+      try {
+        referrer_category = sessionStorage.getItem('referrer_category') ?? undefined
+      } catch {
+        referrer_category = undefined
+      }
+    }
