Skip to content

feat(wave3): telemetry and content-loader hardening for ai-ml-fundamentals probe#158

Merged
OpenSauce merged 7 commits intomainfrom
feat/wave3-telemetry-and-hardening
Apr 5, 2026
Merged

feat(wave3): telemetry and content-loader hardening for ai-ml-fundamentals probe#158
OpenSauce merged 7 commits intomainfrom
feat/wave3-telemetry-and-hardening

Conversation

@OpenSauce
Copy link
Copy Markdown
Owner

Summary

Backend + frontend infrastructure for the Wave 3 AI/ML Fundamentals probe. Ships the telemetry surface and content-loader hardening needed to make the "ship, measure for 2 weeks, decide" gate actually readable. Pairs with content submodule PR OpenSauce/dsa-flash-cards#36, which must merge first so this PR can bump the submodule ref.

No new migrations. No schema changes. Every change is additive.

What's in this PR (5 commits)

1. feat: add ai-ml-fundamentals display name and category metadata

  • backend/app/models.pyDISPLAY_NAMES["ai-ml-fundamentals"] = "AI/ML Fundamentals" so slug_to_display_name doesn't render the ugly "Ai Ml Fundamentals"
  • frontend/utils/categoryMeta.ts — adds the category entry under section: 'Other' (no new top-level AI section for the probe) with honest description copy: "AI/ML interview vocabulary — hands-on problems coming later."

2. fix: harden flashcard loader against missing required fields

backend/app/loader.py load_yaml_flashcards() previously did unguarded raw["title"] / raw["Front"] / raw["Back"] per card. A typo in one card raised KeyError mid-loop, aborted the entire file, and left yaml_keys incomplete — orphan removal then ran with a partial set. This PR wraps each card in a try/except that logs the file, card index, and missing key, then continues. Test in tests/test_loader.py covers the happy-path pass-through, missing Front/Back/title all skipped individually, and the loop continuing past bad cards.

3. feat: add per-category breakdowns to analytics summary endpoint

/api/analytics/summary already returned category_lesson_completions. This commit adds three parallel dicts:

  • quiz_completions_by_category — UserQuizAttempt counts joined to Quiz.category
  • anon_lesson_views_by_category — Event rows with event_type='lesson_view' and user_id IS NULL, grouped by payload->>'category'
  • flashcard_reviews_by_category — UserFlashcard review counts joined to Flashcard.category

These unlock the probe's 2-week signal: without them the decision was unreadable (only 1 of 6 needed signals was measurable). Tests seed two categories with completions/reviews and assert the dicts populate correctly.

4. feat: track signup referrer_category for probe attribution

The highest-value probe signal is "did viewing the AI lesson drive signups?" Before this PR there was no way to answer that.

  • Backend: SignupIn accepts optional referrer_category; signup endpoint emits a signup Event with payload.referrer_category
  • Frontend: frontend/pages/lesson/[slug].vue writes the viewed lesson's category to sessionStorage on mount; frontend/composables/useAuth.ts reads it during signup and passes it in the request body
  • Test asserts the Event row is written with the referrer payload

5. test: add fresh-category quiz-to-flashcard seeding integration test

End-to-end test that seeds a fresh category with 1 lesson + 1 quiz + 10 flashcards all linked via lesson_slug='ai-foundations', submits the quiz at passing grade, and asserts 10 UserFlashcard rows exist. Catches the lesson: / lesson_slug: YAML-key asymmetry foot-gun end-to-end.

Verification

  • ruff check backend/ — clean
  • pytest tests/test_loader.py tests/api/test_analytics.py tests/api/test_quizzes.py tests/api/test_users.py -v — 64/64 passing
  • Manual: slug_to_display_name('ai-ml-fundamentals') returns "AI/ML Fundamentals"

Merge order (IMPORTANT)

  1. Merge content PR content: add ai-foundations chunk for Wave 3 AI/ML probe dsa-flash-cards#36 first
  2. Then bump the submodule ref on this branch to point at submodule main
  3. Then merge this PR
  4. Post-deploy: run baseline SQL against prod for system-design first-two-weeks metrics so the 2-week gate has a real comparison number

Context

Full review: ~/.gstack/projects/OpenSauce-dsa-flash/main-autoplan-review-20260405.md — covers both CEO and Eng phases, dual-voice consensus tables, and the decision audit trail that led to the probe-first scope.

Copilot AI review requested due to automatic review settings April 5, 2026 20:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Wave 3 probe infrastructure to make the AI/ML Fundamentals category measurable: new category metadata/display name, loader hardening + tests, expanded analytics summary breakdowns, and signup attribution via referrer_category.

Changes:

  • Add AI/ML Fundamentals category metadata/display name across backend + frontend.
  • Harden YAML flashcard loader against malformed cards and add integration tests.
  • Expand /api/analytics/summary with per-category quiz, anon lesson view, and flashcard review breakdowns; add signup referrer attribution plumbing.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
frontend/utils/categoryMeta.ts Adds AI/ML Fundamentals category metadata (emoji/section/description/display name).
frontend/pages/lesson/[slug].vue Stores lesson category in sessionStorage for later signup attribution.
frontend/composables/useAuth.ts Sends optional referrer_category during signup.
backend/app/models.py Adds display name mapping and extends signup input model with referrer_category.
backend/app/loader.py Skips malformed flashcards instead of aborting the whole file.
backend/app/api/users.py Emits a signup Event with payload.referrer_category.
backend/app/api/analytics.py Adds new per-category breakdown metrics to analytics summary response.
backend/tests/test_loader.py Adds tests for loader hardening and a fresh-category loader roundtrip.
backend/tests/api/test_users.py Adds tests asserting signup event payload includes referrer_category (or null).
backend/tests/api/test_quizzes.py Adds integration test ensuring quiz submission seeds UserFlashcard rows for linked flashcards.
backend/tests/api/test_analytics.py Adds tests validating new per-category breakdown fields.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 72 to +89
data = yaml.safe_load(file.read_text()) or []
if not isinstance(data, list):
logger.warning("Skipping %s: root is %s, not list", file, type(data).__name__)
continue
for raw in data:
for idx, raw in enumerate(data):
missing_key = None
for required in ("title", "Front", "Back"):
if required not in raw:
missing_key = required
break
if missing_key is not None:
logger.warning(
"Skipping card %d in %s: missing required field '%s'",
idx,
file,
missing_key,
)
continue
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_yaml_flashcards() assumes each list item is a mapping; if a YAML file contains a non-dict element (string/number/null), the required not in raw checks will raise a TypeError and abort loading. Add an isinstance(raw, dict) guard (log + continue) before checking required keys so malformed elements don’t break the whole file.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in 519c81d. Added isinstance(raw, dict) guard with a warning-and-continue before the required-key loop. Non-dict list items now log expected dict, got <type> and skip cleanly.

Comment on lines +240 to +249
# Per-category anonymous lesson views (from event table)
category_anon_lesson_views = session.exec(
text("""
SELECT payload->>'category' AS cat, COUNT(*) AS cnt
FROM event
WHERE event_type = 'lesson_view'
AND payload->>'category' IS NOT NULL
GROUP BY cat
ORDER BY cnt DESC
""")
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anon_lesson_views_by_category is documented/used as anonymous lesson views, but the SQL query doesn’t filter user_id IS NULL, so authenticated lesson views will be included and the metric will be inflated (and inconsistent with anonymous_engagement.lesson_views). Add AND user_id IS NULL to the WHERE clause (or rename the field if including all users is intended).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in 519c81d. Good catch — added AND user_id IS NULL to the category_anon_lesson_views query. The field name was truthful in intent but the query wasn't enforcing it, which would have inflated the probe signal. Now consistent with anonymous_engagement.lesson_views.

Comment on lines +140 to +149
session.flush()

session_id = request.cookies.get("session_id", "")
event = Event(
session_id=session_id,
user_id=user.id,
event_type="signup",
payload={"referrer_category": data.referrer_category},
)
session.add(event)
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signup endpoint writes an Event using session_id = request.cookies.get('session_id', ''). If the cookie isn’t set, all such signup events will share an empty session_id, breaking session-based analytics (distinct session counts, attribution, funnels). Consider reusing the same session-id issuance logic as /events (generate UUID and set the session_id cookie on the response when missing) so every signup event has a real session id.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in 519c81d as a minimal fix: session_id = request.cookies.get('session_id') or str(uuid.uuid4()). For the common probe path (user views lesson → signs up), the session cookie is already set by the lesson-view /events call, so this fallback only fires for edge cases. Not refactoring to shared session-id issuance logic here because the signup endpoint does not issue cookies and doing so would expand scope beyond the probe.

Comment on lines 60 to 62
referrer_category: Optional[str] = None


Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UserCreate.referrer_category is accepted from the client and persisted into the event.payload without any validation/length bound. To avoid unbounded/invalid values polluting analytics (or excessively large JSON), add basic constraints (e.g., max_length and a slug-like pattern) or normalize unknown values to null.

Suggested change
referrer_category: Optional[str] = None
referrer_category: Optional[str] = PydanticField(default=None, max_length=64)
@field_validator("referrer_category")
@classmethod
def normalize_referrer_category(cls, value: Optional[str]) -> Optional[str]:
if value is None:
return None
normalized = value.strip().lower()
if not normalized:
return None
if len(normalized) > 64:
return None
parts = normalized.split("-")
if any(not part or not part.isalnum() for part in parts):
return None
return normalized

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied your suggestion in 519c81d. referrer_category now has max_length=64 plus a field_validator that strips, lowercases, and normalizes unknown values (non-alnum parts, empty segments) to None. Unbounded or malformed values no longer pollute analytics.

- loader: guard isinstance(raw, dict) before required-key check (handles null/string/int YAML list items)
- analytics: add user_id IS NULL filter to anon_lesson_views_by_category (was inflated by authed users)
- users: generate UUID fallback for missing session_id cookie on signup events
- models: validate referrer_category with max_length=64 + slug normalization
Copilot AI review requested due to automatic review settings April 5, 2026 21:14
@OpenSauce OpenSauce merged commit 93012d8 into main Apr 5, 2026
6 checks passed
@OpenSauce OpenSauce deleted the feat/wave3-telemetry-and-hardening branch April 5, 2026 21:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 72 to +76
data = yaml.safe_load(file.read_text()) or []
if not isinstance(data, list):
logger.warning("Skipping %s: root is %s, not list", file, type(data).__name__)
continue
for raw in data:
for idx, raw in enumerate(data):
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_yaml_flashcards() currently scans every *.yml/*.yaml under ROOT. Now that quiz definitions live under {category}/quizzes/*.yaml (dict-root), this will reliably trigger root is dict, not list warnings on every run and adds unnecessary IO. Consider excluding quizzes/ (and any other non-flashcard YAML dirs) from yaml_paths or narrowing the glob to known flashcard YAML locations so logs stay actionable.

Copilot uses AI. Check for mistakes.
onMounted(async () => {
track('lesson_view', { category: lesson.value?.category, slug })
if (lesson.value?.category) {
sessionStorage.setItem('referrer_category', lesson.value.category)
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sessionStorage.setItem(...) can throw (e.g., storage disabled, privacy mode, quota exceeded) and would currently break the whole onMounted flow (including related fetches). Wrap the call in a try/catch (similar to other storage usage in the frontend) so lesson pages remain functional even if storage is unavailable.

Suggested change
sessionStorage.setItem('referrer_category', lesson.value.category)
try {
sessionStorage.setItem('referrer_category', lesson.value.category)
} catch {
// non-fatal
}

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +48
const referrer_category = import.meta.client
? (sessionStorage.getItem('referrer_category') ?? undefined)
: undefined
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sessionStorage.getItem(...) can throw in some browser/storage configurations (disabled storage, privacy mode). Since this runs during signup, a thrown exception would prevent account creation. Consider wrapping the read in try/catch and falling back to undefined if access fails.

Suggested change
const referrer_category = import.meta.client
? (sessionStorage.getItem('referrer_category') ?? undefined)
: undefined
let referrer_category: string | undefined = undefined
if (import.meta.client) {
try {
referrer_category = sessionStorage.getItem('referrer_category') ?? undefined
} catch {
referrer_category = undefined
}
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants