Conversation
There was a problem hiding this comment.
Pull request overview
Adds Wave 3 probe infrastructure to make the AI/ML Fundamentals category measurable: new category metadata/display name, loader hardening + tests, expanded analytics summary breakdowns, and signup attribution via referrer_category.
Changes:
- Add AI/ML Fundamentals category metadata/display name across backend + frontend.
- Harden YAML flashcard loader against malformed cards and add integration tests.
- Expand
/api/analytics/summarywith per-category quiz, anon lesson view, and flashcard review breakdowns; add signup referrer attribution plumbing.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| frontend/utils/categoryMeta.ts | Adds AI/ML Fundamentals category metadata (emoji/section/description/display name). |
| frontend/pages/lesson/[slug].vue | Stores lesson category in sessionStorage for later signup attribution. |
| frontend/composables/useAuth.ts | Sends optional referrer_category during signup. |
| backend/app/models.py | Adds display name mapping and extends signup input model with referrer_category. |
| backend/app/loader.py | Skips malformed flashcards instead of aborting the whole file. |
| backend/app/api/users.py | Emits a signup Event with payload.referrer_category. |
| backend/app/api/analytics.py | Adds new per-category breakdown metrics to analytics summary response. |
| backend/tests/test_loader.py | Adds tests for loader hardening and a fresh-category loader roundtrip. |
| backend/tests/api/test_users.py | Adds tests asserting signup event payload includes referrer_category (or null). |
| backend/tests/api/test_quizzes.py | Adds integration test ensuring quiz submission seeds UserFlashcard rows for linked flashcards. |
| backend/tests/api/test_analytics.py | Adds tests validating new per-category breakdown fields. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| data = yaml.safe_load(file.read_text()) or [] | ||
| if not isinstance(data, list): | ||
| logger.warning("Skipping %s: root is %s, not list", file, type(data).__name__) | ||
| continue | ||
| for raw in data: | ||
| for idx, raw in enumerate(data): | ||
| missing_key = None | ||
| for required in ("title", "Front", "Back"): | ||
| if required not in raw: | ||
| missing_key = required | ||
| break | ||
| if missing_key is not None: | ||
| logger.warning( | ||
| "Skipping card %d in %s: missing required field '%s'", | ||
| idx, | ||
| file, | ||
| missing_key, | ||
| ) | ||
| continue |
There was a problem hiding this comment.
load_yaml_flashcards() assumes each list item is a mapping; if a YAML file contains a non-dict element (string/number/null), the required not in raw checks will raise a TypeError and abort loading. Add an isinstance(raw, dict) guard (log + continue) before checking required keys so malformed elements don’t break the whole file.
There was a problem hiding this comment.
Applied in 519c81d. Added isinstance(raw, dict) guard with a warning-and-continue before the required-key loop. Non-dict list items now log expected dict, got <type> and skip cleanly.
| # Per-category anonymous lesson views (from event table) | ||
| category_anon_lesson_views = session.exec( | ||
| text(""" | ||
| SELECT payload->>'category' AS cat, COUNT(*) AS cnt | ||
| FROM event | ||
| WHERE event_type = 'lesson_view' | ||
| AND payload->>'category' IS NOT NULL | ||
| GROUP BY cat | ||
| ORDER BY cnt DESC | ||
| """) |
There was a problem hiding this comment.
anon_lesson_views_by_category is documented/used as anonymous lesson views, but the SQL query doesn’t filter user_id IS NULL, so authenticated lesson views will be included and the metric will be inflated (and inconsistent with anonymous_engagement.lesson_views). Add AND user_id IS NULL to the WHERE clause (or rename the field if including all users is intended).
There was a problem hiding this comment.
Applied in 519c81d. Good catch — added AND user_id IS NULL to the category_anon_lesson_views query. The field name was truthful in intent but the query wasn't enforcing it, which would have inflated the probe signal. Now consistent with anonymous_engagement.lesson_views.
| session.flush() | ||
|
|
||
| session_id = request.cookies.get("session_id", "") | ||
| event = Event( | ||
| session_id=session_id, | ||
| user_id=user.id, | ||
| event_type="signup", | ||
| payload={"referrer_category": data.referrer_category}, | ||
| ) | ||
| session.add(event) |
There was a problem hiding this comment.
The signup endpoint writes an Event using session_id = request.cookies.get('session_id', ''). If the cookie isn’t set, all such signup events will share an empty session_id, breaking session-based analytics (distinct session counts, attribution, funnels). Consider reusing the same session-id issuance logic as /events (generate UUID and set the session_id cookie on the response when missing) so every signup event has a real session id.
There was a problem hiding this comment.
Applied in 519c81d as a minimal fix: session_id = request.cookies.get('session_id') or str(uuid.uuid4()). For the common probe path (user views lesson → signs up), the session cookie is already set by the lesson-view /events call, so this fallback only fires for edge cases. Not refactoring to shared session-id issuance logic here because the signup endpoint does not issue cookies and doing so would expand scope beyond the probe.
backend/app/models.py
Outdated
| referrer_category: Optional[str] = None | ||
|
|
||
|
|
There was a problem hiding this comment.
UserCreate.referrer_category is accepted from the client and persisted into the event.payload without any validation/length bound. To avoid unbounded/invalid values polluting analytics (or excessively large JSON), add basic constraints (e.g., max_length and a slug-like pattern) or normalize unknown values to null.
| referrer_category: Optional[str] = None | |
| referrer_category: Optional[str] = PydanticField(default=None, max_length=64) | |
| @field_validator("referrer_category") | |
| @classmethod | |
| def normalize_referrer_category(cls, value: Optional[str]) -> Optional[str]: | |
| if value is None: | |
| return None | |
| normalized = value.strip().lower() | |
| if not normalized: | |
| return None | |
| if len(normalized) > 64: | |
| return None | |
| parts = normalized.split("-") | |
| if any(not part or not part.isalnum() for part in parts): | |
| return None | |
| return normalized |
There was a problem hiding this comment.
Applied your suggestion in 519c81d. referrer_category now has max_length=64 plus a field_validator that strips, lowercases, and normalizes unknown values (non-alnum parts, empty segments) to None. Unbounded or malformed values no longer pollute analytics.
- loader: guard isinstance(raw, dict) before required-key check (handles null/string/int YAML list items) - analytics: add user_id IS NULL filter to anon_lesson_views_by_category (was inflated by authed users) - users: generate UUID fallback for missing session_id cookie on signup events - models: validate referrer_category with max_length=64 + slug normalization
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| data = yaml.safe_load(file.read_text()) or [] | ||
| if not isinstance(data, list): | ||
| logger.warning("Skipping %s: root is %s, not list", file, type(data).__name__) | ||
| continue | ||
| for raw in data: | ||
| for idx, raw in enumerate(data): |
There was a problem hiding this comment.
load_yaml_flashcards() currently scans every *.yml/*.yaml under ROOT. Now that quiz definitions live under {category}/quizzes/*.yaml (dict-root), this will reliably trigger root is dict, not list warnings on every run and adds unnecessary IO. Consider excluding quizzes/ (and any other non-flashcard YAML dirs) from yaml_paths or narrowing the glob to known flashcard YAML locations so logs stay actionable.
| onMounted(async () => { | ||
| track('lesson_view', { category: lesson.value?.category, slug }) | ||
| if (lesson.value?.category) { | ||
| sessionStorage.setItem('referrer_category', lesson.value.category) |
There was a problem hiding this comment.
sessionStorage.setItem(...) can throw (e.g., storage disabled, privacy mode, quota exceeded) and would currently break the whole onMounted flow (including related fetches). Wrap the call in a try/catch (similar to other storage usage in the frontend) so lesson pages remain functional even if storage is unavailable.
| sessionStorage.setItem('referrer_category', lesson.value.category) | |
| try { | |
| sessionStorage.setItem('referrer_category', lesson.value.category) | |
| } catch { | |
| // non-fatal | |
| } |
| const referrer_category = import.meta.client | ||
| ? (sessionStorage.getItem('referrer_category') ?? undefined) | ||
| : undefined |
There was a problem hiding this comment.
sessionStorage.getItem(...) can throw in some browser/storage configurations (disabled storage, privacy mode). Since this runs during signup, a thrown exception would prevent account creation. Consider wrapping the read in try/catch and falling back to undefined if access fails.
| const referrer_category = import.meta.client | |
| ? (sessionStorage.getItem('referrer_category') ?? undefined) | |
| : undefined | |
| let referrer_category: string | undefined = undefined | |
| if (import.meta.client) { | |
| try { | |
| referrer_category = sessionStorage.getItem('referrer_category') ?? undefined | |
| } catch { | |
| referrer_category = undefined | |
| } | |
| } |
Summary
Backend + frontend infrastructure for the Wave 3 AI/ML Fundamentals probe. Ships the telemetry surface and content-loader hardening needed to make the "ship, measure for 2 weeks, decide" gate actually readable. Pairs with content submodule PR OpenSauce/dsa-flash-cards#36, which must merge first so this PR can bump the submodule ref.
No new migrations. No schema changes. Every change is additive.
What's in this PR (5 commits)
1.
feat: add ai-ml-fundamentals display name and category metadatabackend/app/models.py—DISPLAY_NAMES["ai-ml-fundamentals"] = "AI/ML Fundamentals"soslug_to_display_namedoesn't render the ugly "Ai Ml Fundamentals"frontend/utils/categoryMeta.ts— adds the category entry undersection: 'Other'(no new top-level AI section for the probe) with honest description copy: "AI/ML interview vocabulary — hands-on problems coming later."2.
fix: harden flashcard loader against missing required fieldsbackend/app/loader.pyload_yaml_flashcards()previously did unguardedraw["title"]/raw["Front"]/raw["Back"]per card. A typo in one card raisedKeyErrormid-loop, aborted the entire file, and leftyaml_keysincomplete — orphan removal then ran with a partial set. This PR wraps each card in a try/except that logs the file, card index, and missing key, then continues. Test intests/test_loader.pycovers the happy-path pass-through, missing Front/Back/title all skipped individually, and the loop continuing past bad cards.3.
feat: add per-category breakdowns to analytics summary endpoint/api/analytics/summaryalready returnedcategory_lesson_completions. This commit adds three parallel dicts:quiz_completions_by_category— UserQuizAttempt counts joined to Quiz.categoryanon_lesson_views_by_category— Event rows withevent_type='lesson_view'anduser_id IS NULL, grouped bypayload->>'category'flashcard_reviews_by_category— UserFlashcard review counts joined to Flashcard.categoryThese unlock the probe's 2-week signal: without them the decision was unreadable (only 1 of 6 needed signals was measurable). Tests seed two categories with completions/reviews and assert the dicts populate correctly.
4.
feat: track signup referrer_category for probe attributionThe highest-value probe signal is "did viewing the AI lesson drive signups?" Before this PR there was no way to answer that.
SignupInaccepts optionalreferrer_category; signup endpoint emits asignupEvent withpayload.referrer_categoryfrontend/pages/lesson/[slug].vuewrites the viewed lesson's category tosessionStorageon mount;frontend/composables/useAuth.tsreads it during signup and passes it in the request body5.
test: add fresh-category quiz-to-flashcard seeding integration testEnd-to-end test that seeds a fresh category with 1 lesson + 1 quiz + 10 flashcards all linked via
lesson_slug='ai-foundations', submits the quiz at passing grade, and asserts 10 UserFlashcard rows exist. Catches thelesson:/lesson_slug:YAML-key asymmetry foot-gun end-to-end.Verification
ruff check backend/— cleanpytest tests/test_loader.py tests/api/test_analytics.py tests/api/test_quizzes.py tests/api/test_users.py -v— 64/64 passingslug_to_display_name('ai-ml-fundamentals')returns"AI/ML Fundamentals"Merge order (IMPORTANT)
mainsystem-designfirst-two-weeks metrics so the 2-week gate has a real comparison numberContext
Full review:
~/.gstack/projects/OpenSauce-dsa-flash/main-autoplan-review-20260405.md— covers both CEO and Eng phases, dual-voice consensus tables, and the decision audit trail that led to the probe-first scope.