feat: integration test suite for Judge0 harness pipeline by OpenSauce · Pull Request #160 · OpenSauce/dsa-flash

OpenSauce · 2026-04-06T09:19:56Z

Summary

Opt-in integration test suite (make test-integration) that exercises the full submit → harness → Judge0 pipeline with real code execution
32 tests: 6 synthetic fixtures × 4 languages (harness matrix) + 2 catalog problems × 4 languages (smoke tests)
Covers all param-type shapes: primitives, arrays, strings, ListNode, TreeNode, GraphNode
Per-language Judge0 resource limits (judge0_limits in LANGUAGE_CONFIG) so JS/Go/Java submissions no longer OOM
Bumped MAX_MAX_PROCESSES_AND_OR_THREADS from 256 → 512 to reduce intermittent Go pthread failures
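For reference, the 32 comes from multiplying out the two parametrization grids. This is a minimal sketch that assumes the four languages are Python, JavaScript, Go and Java, and that test IDs are built as `<case>-<language>`. The language keys, the ID format and `matrix_ids` are hypothetical. The fixture names and catalog titles come from this PR's commits and test plan.

```python
from itertools import product

# Hypothetical language keys; assumed to mirror the LANGUAGE_CONFIG entries.
LANGUAGES = ("python", "javascript", "go", "java")

# Synthetic fixture names as they appear in this PR's commit titles.
SYNTHETIC_FIXTURES = (
    "add-two-ints",
    "reverse-array",
    "is-palindrome",
    "reverse-linked-list",
    "max-tree-depth",
    "graph-node-count",
)

# Catalog problems exercised by the smoke tests.
CATALOG_TITLES = ("Two Sum", "Maximum Depth of Binary Tree")


def matrix_ids() -> list[str]:
    """Build one ID per (case, language) pair, harness matrix first."""
    harness = [f"{fx}-{lang}" for fx, lang in product(SYNTHETIC_FIXTURES, LANGUAGES)]
    smoke = [f"{title}-{lang}" for title, lang in product(CATALOG_TITLES, LANGUAGES)]
    return harness + smoke


ids = matrix_ids()
print(len(ids))  # 6 * 4 + 2 * 4 = 32
```

In the suite itself, the same cross product is expressed through pytest parametrization over fixtures (or catalog titles) and languages.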

Test plan

make test-integration passes 32/32 (requires make dev running)
Default pytest tests/ still passes (292 tests, integration excluded via marker)
Regression guard verified: breaking Java memory limit causes all Java tests to fail
Catalog smoke tests validate real YAML content (Two Sum, Maximum Depth of Binary Tree)

Known issue

Go tests have intermittent pthread_create failures (~1 in 3 full runs) due to Judge0 container thread contention. Tracked in Phase 2 follow-ups (TODOS.md).

Uniform resource budget sized for CPython was starving JVM, Go runtime, and V8 at startup. Move limits into LANGUAGE_CONFIG per language and splat into the submission payload. Add the first integration test (add-two-ints × 4 languages) that would have caught this regression.

Additional fixes required on this cgroup-v2 host (isolate 1.8.1 requires cgroup v1 --cg mode for per-cgroup memory limits; without it, per-process VIRT limits via isolate -m apply instead):

- Judge0 MAX_MEMORY_LIMIT raised to 4096000 KB so the Go 1.13 compiler can reserve enough virtual address space during compilation (go tool needs ~4 GB VIRT for cgo thread setup and parallel package builds)
- MAX_MAX_PROCESSES_AND_OR_THREADS raised to 256 for Go/JVM thread counts
- Java judge0_limits gains compiler_options: -J-Xshare:off -J-Xmx256m. -J-Xshare:off disables OpenJDK 13 Class Data Sharing (which exhausts the VIRT budget during javac initialisation); -J-Xmx256m caps javac's own heap
- Go harness: math.MaxInt (Go 1.17+) replaced with math.MaxInt64, which exists in the Go 1.13.5 shipped with Judge0 1.13.1
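A minimal sketch of what "splat into the submission payload" means. It assumes a LANGUAGE_CONFIG shaped like the `java` entry in this PR's diff (abbreviated here). `build_submission` is a hypothetical helper, not the actual `submit_code`. The field names are Judge0 submission parameters. Limits are splatted last, so a limit key would override a base field of the same name (none collide here).

```python
from typing import Any, Mapping

# Abbreviated LANGUAGE_CONFIG: the "java" values are copied from this PR's
# diff (monaco_mode omitted); other languages are left out of the sketch.
LANGUAGE_CONFIG: dict[str, dict[str, Any]] = {
    "java": {
        "judge0_id": 62,
        "judge0_limits": {
            "cpu_time_limit": 10,
            "memory_limit": 4096000,  # Judge0 expects kilobytes
            "max_processes_and_or_threads": 256,
            "compiler_options": "-J-Xshare:off -J-Xmx256m",
            "enable_per_process_and_thread_time_limit": True,
            "enable_per_process_and_thread_memory_limit": True,
        },
    },
}


def build_submission(
    language: str,
    source_code: str,
    stdin: str = "",
    config: Mapping[str, Mapping[str, Any]] = LANGUAGE_CONFIG,
) -> dict[str, Any]:
    """Hypothetical helper: base Judge0 fields plus the language's limits."""
    cfg = config[language]
    payload = {
        "source_code": source_code,
        "language_id": cfg["judge0_id"],
        "stdin": stdin,
    }
    # Splat the per-language limits last. A language with no "judge0_limits"
    # sends none, so Judge0's server-side defaults apply to it.
    return {**payload, **cfg.get("judge0_limits", {})}


payload = build_submission("java", "public class Main {}")
print(payload["memory_limit"], payload["compiler_options"])  # 4096000 -J-Xshare:off -J-Xmx256m
```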

tests/conftest.py imports app.api.users which transitively imports app.api.problems (via app/api/__init__.py) before our integration conftest loads. This means JUDGE0_URL is already bound to the Docker DNS name "judge0-server" by the time we run. Env-var setdefault won't help since the module constant is already evaluated. Fix: directly patch problems_mod.JUDGE0_URL in the app fixture and restore it on teardown.
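The failure mode and the fix can be reproduced with a stand-in module. This is a minimal sketch that assumes a hypothetical `problems_stand_in` module and a hypothetical `judge0_submissions_url()` helper in place of app.api.problems. The patch only works because callers read `problems_mod.JUDGE0_URL` at call time. Code that did `from app.api.problems import JUDGE0_URL` would keep its own stale copy. Inside pytest, `monkeypatch.setattr(problems_mod, "JUDGE0_URL", ...)` gives the same restore-on-teardown behaviour.

```python
import os
import types
from contextlib import contextmanager
from typing import Any, Iterator

# Hypothetical stand-in for app.api.problems. Like a real module-level
# constant, JUDGE0_URL is evaluated once, when the module is first imported.
problems_mod = types.ModuleType("problems_stand_in")
problems_mod.JUDGE0_URL = os.environ.get("JUDGE0_URL", "http://judge0-server:2358")


def judge0_submissions_url() -> str:
    # Looks the attribute up on the module at call time, so a patch is visible.
    return f"{problems_mod.JUDGE0_URL}/submissions"


@contextmanager
def patched_attr(module: types.ModuleType, name: str, value: Any) -> Iterator[None]:
    """Temporarily replace module.<name>, restoring the original on exit."""
    missing = object()
    original = getattr(module, name, missing)
    setattr(module, name, value)
    try:
        yield
    finally:
        if original is missing:
            delattr(module, name)
        else:
            setattr(module, name, original)


import_time_value = problems_mod.JUDGE0_URL

# Too late: the constant is already bound, so this changes nothing.
os.environ.setdefault("JUDGE0_URL", "http://localhost:2358")
assert problems_mod.JUDGE0_URL == import_time_value

with patched_attr(problems_mod, "JUDGE0_URL", "http://localhost:2358"):
    assert judge0_submissions_url() == "http://localhost:2358/submissions"
assert problems_mod.JUDGE0_URL == import_time_value  # restored on exit
```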


Go's runtime spawns many threads during compilation inside Judge0's isolate sandbox. 256 was sometimes insufficient, causing intermittent pthread_create failures. Bumped both the server-side MAX_MAX_PROCESSES_AND_OR_THREADS and Go's per-submission limit to 512.
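The per-submission value and the server-side cap have to move together. As we understand Judge0's validation, a submission that asks for more than the server maximum is rejected rather than clamped (worth verifying against the Judge0 version in use). A cheap unit-level guard could compare the two. This is a minimal sketch: `limits_over_caps`, `SERVER_CAPS` and the `go` key are hypothetical. The numbers are the ones from this PR.

```python
from typing import Any, Mapping

# Server-side caps from this PR (docker-compose env); mapping is hypothetical.
SERVER_CAPS: dict[str, int] = {
    "memory_limit": 4096000,              # MAX_MEMORY_LIMIT, KB
    "max_processes_and_or_threads": 512,  # MAX_MAX_PROCESSES_AND_OR_THREADS
}


def limits_over_caps(
    language_config: Mapping[str, Mapping[str, Any]],
    caps: Mapping[str, int],
) -> list[str]:
    """List every per-language limit that asks for more than the server allows."""
    problems: list[str] = []
    for lang, cfg in sorted(language_config.items()):
        limits = cfg.get("judge0_limits", {})
        for field, cap in caps.items():
            value = limits.get(field)
            if value is not None and value > cap:
                problems.append(f"{lang}.{field}={value} exceeds cap {cap}")
    return problems


# Go's per-submission limits as set in this PR ("go" key assumed).
go_only = {"go": {"judge0_limits": {"memory_limit": 4096000,
                                    "max_processes_and_or_threads": 512}}}
print(limits_over_caps(go_only, SERVER_CAPS))  # []
# Against the old 256-thread cap, the Go bump would be flagged:
print(limits_over_caps(go_only, {**SERVER_CAPS, "max_processes_and_or_threads": 256}))
```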

Copilot

Pull request overview

This PR adds an opt-in integration test suite that executes real user submissions end-to-end through the /problems/{id}/submit endpoint (harness generation → Judge0 execution → result parsing), and adjusts Judge0/resource configuration so JS/Go/Java harnesses reliably run under Judge0.

Changes:

Add backend/tests/integration/ with fixture-driven harness-matrix tests (6 fixtures × 4 languages) plus catalog YAML smoke tests (2 problems × 4 languages).
Introduce per-language Judge0 submission limits via LANGUAGE_CONFIG["judge0_limits"] and apply them in the submit endpoint.
Update dev tooling to support integration runs (a make test-integration target, pytest marker exclusion, and Judge0 published on host port 2358), and adjust Judge0 service caps.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.


File	Description
`backend/app/api/problems.py`	Adds per-language Judge0 limits and splats them into the Judge0 submission payload in `submit_code`.
`docker-compose.yml`	Publishes `judge0-server` on host `2358` and raises Judge0 max memory / process caps via env.
`backend/tests/integration/conftest.py`	Integration test app/client setup, TestContainers DB wiring, and Judge0 reachability skip.
`backend/tests/integration/fixtures.py`	Declarative `ProblemFixture` registry covering primitives/arrays/strings/ListNode/TreeNode/GraphNode.
`backend/tests/integration/test_harness_matrix.py`	Parametrized end-to-end harness matrix tests across 4 languages.
`backend/tests/integration/test_catalog_smoke.py`	Loads real catalog YAML by title and smoke-tests execution across 4 languages.
`backend/tests/integration/catalog_solutions.py`	Canonical known-good solutions for the catalog smoke tests.
`backend/pytest.ini`	Registers `integration` marker and excludes integration tests by default via `addopts`.
`Makefile`	Adds `test-integration` target that runs only the integration suite.
`backend/app/harnesses/golang.py`	Fixes unused-import silencer constant to `math.MaxInt64` for broader Go compatibility.
`docs/superpowers/specs/...`	Design spec for the Judge0 config + integration test approach.
`docs/superpowers/plans/...`	Implementation plan documenting the step-by-step rollout.
`TODOS.md`	Tracks Phase 2 follow-ups for integration test expansion and CI wiring.


Copilot · 2026-04-06T09:26:06Z

backend/tests/integration/test_harness_matrix.py

+
+from .fixtures import FIXTURES, ProblemFixture
+
+pytestmark = pytest.mark.integration
+


Imports appear after the module-level pytestmark assignment, which triggers Ruff E402 (module level import not at top of file) and can fail CI. Move from .fixtures import ... up with the other imports and keep pytestmark after all imports.

Suggested change

from .fixtures import FIXTURES, ProblemFixture

pytestmark = pytest.mark.integration

from .fixtures import FIXTURES, ProblemFixture

pytestmark = pytest.mark.integration
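For context, E402 fires when a module-level import follows any other module-level statement. Below is a simplified sketch of the rule using the standard-library `ast` module. It is not Ruff's implementation: real linters exempt more constructs than the leading docstring handled here, and `e402_lines` is a hypothetical helper.

```python
import ast


def _is_docstring(node: ast.stmt) -> bool:
    return (
        isinstance(node, ast.Expr)
        and isinstance(node.value, ast.Constant)
        and isinstance(node.value.value, str)
    )


def e402_lines(source: str) -> list[int]:
    """Return line numbers of module-level imports that follow other code.

    Simplified: only a leading docstring is exempt. Any other module-level
    statement (such as `pytestmark = ...`) ends the import block.
    """
    body = ast.parse(source).body
    if body and _is_docstring(body[0]):
        body = body[1:]
    seen_code = False
    flagged: list[int] = []
    for node in body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if seen_code:
                flagged.append(node.lineno)
        else:
            seen_code = True
    return flagged


before = (
    "import pytest\n"
    "pytestmark = pytest.mark.integration\n"
    "from .fixtures import FIXTURES, ProblemFixture\n"
)
after = (
    "import pytest\n"
    "from .fixtures import FIXTURES, ProblemFixture\n"
    "pytestmark = pytest.mark.integration\n"
)
print(e402_lines(before))  # [3]
print(e402_lines(after))   # []
```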

Copilot · 2026-04-06T09:26:06Z

backend/tests/integration/test_catalog_smoke.py

+
+from .catalog_solutions import CATALOG_SOLUTIONS
+
+pytestmark = pytest.mark.integration
+


Imports appear after the module-level pytestmark assignment, which triggers Ruff E402 (module level import not at top of file). Move from .catalog_solutions import ... up with the other imports and keep pytestmark after all imports.

Suggested change

from .catalog_solutions import CATALOG_SOLUTIONS

pytestmark = pytest.mark.integration

from .catalog_solutions import CATALOG_SOLUTIONS

pytestmark = pytest.mark.integration

Copilot · 2026-04-06T09:26:06Z

backend/tests/integration/test_catalog_smoke.py

+def _find_yaml_by_title(title: str) -> Path:
+    """Locate the YAML file in dsa-flash-cards/ whose title: matches."""
+    for path in CARDS_ROOT.rglob("problems/*.yaml"):
+        try:
+            data = yaml.safe_load(path.read_text()) or {}
+        except Exception:
+            continue
+        if isinstance(data, dict) and data.get("title") == title:
+            return path
+    raise FileNotFoundError(
+        f"Catalog smoke title {title!r} not found under {CARDS_ROOT}. "
+        f"Either the catalog drifted or the smoke title needs updating."
+    )
+
+
+def _load_catalog_problem(session, title: str) -> CodingProblem:
+    """Read the real YAML and insert a CodingProblem row into the test session."""
+    path = _find_yaml_by_title(title)
+    data = yaml.safe_load(path.read_text())
+


_find_yaml_by_title() scans the filesystem and parses YAML on every test invocation (8× per run currently), and then _load_catalog_problem() reads/parses the matched file again. This adds avoidable overhead to an already-slow integration suite. Consider caching a {title: path} index in a module/session-scoped fixture (or functools.lru_cache) and reusing the already-parsed YAML for the matched file.
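A minimal sketch of the suggested cache, assuming the index is keyed by title and keeps the parsed document, so `_load_catalog_problem()` would not need to re-read the matched file. `title_index` and `find_problem` are hypothetical names. In the real suite, `parse` would be `yaml.safe_load` and `root` would be `CARDS_ROOT`. The sketch takes the parser as a parameter so it runs on the standard library alone. Simple JSON documents are also valid YAML, so `json.loads` stands in for the demo.

```python
import functools
import json
from pathlib import Path
from types import MappingProxyType
from typing import Any, Callable, Mapping

Parser = Callable[[str], Any]


@functools.lru_cache(maxsize=None)
def title_index(root: Path, parse: Parser) -> Mapping[str, tuple[Path, dict]]:
    """Scan root once and map each problem title to (path, parsed data).

    Cached per (root, parse) pair, so repeated lookups in one session reuse
    the index instead of re-reading and re-parsing every file.
    """
    index: dict[str, tuple[Path, dict]] = {}
    for path in sorted(root.rglob("problems/*.yaml")):
        try:
            data = parse(path.read_text()) or {}
        except Exception:
            continue  # as in the original: skip files that fail to parse
        title = data.get("title") if isinstance(data, dict) else None
        if isinstance(title, str):
            # First match wins, as with the original early return;
            # sorted() makes "first" deterministic.
            index.setdefault(title, (path, data))
    return MappingProxyType(index)  # shallow read-only view of the cached dict


def find_problem(root: Path, title: str, parse: Parser = json.loads) -> tuple[Path, dict]:
    try:
        return title_index(root, parse)[title]
    except KeyError:
        raise FileNotFoundError(
            f"Catalog smoke title {title!r} not found under {root}."
        ) from None
```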

Copilot · 2026-04-06T09:26:07Z

backend/tests/integration/catalog_solutions.py

+Adding a smoke case: pick a catalog YAML, add its title as a new outer key
+with {lang: solution} inner dict, then add the title to
+test_catalog_smoke.CATALOG_SLUGS.


The module docstring references test_catalog_smoke.CATALOG_SLUGS, but test_catalog_smoke.py parametrizes directly over CATALOG_SOLUTIONS.keys() and does not define CATALOG_SLUGS. Update the docstring to match the current test pattern (or add the referenced constant) to avoid misleading future edits.

Suggested change

Adding a smoke case: pick a catalog YAML, add its title as a new outer key
with {lang: solution} inner dict, then add the title to
test_catalog_smoke.CATALOG_SLUGS.

Adding a smoke case: pick a catalog YAML and add its title as a new outer key
with a {lang: solution} inner dict. The smoke tests parametrize directly over
`CATALOG_SOLUTIONS.keys()`, so no separate slug list needs to be updated.

Copilot · 2026-04-06T09:26:07Z

docker-compose.yml

    profiles: ["prod", "dev"]
    image: judge0/judge0:1.13.1
+    ports:
+      - "2358:2358"


ports: ["2358:2358"] publishes Judge0 on all host interfaces. If this is intended only for local development/integration tests, consider binding to loopback (127.0.0.1:2358:2358) to avoid exposing an unauthenticated code-execution endpoint to the LAN.

Suggested change

- "2358:2358"

- "127.0.0.1:2358:2358"

Copilot · 2026-04-06T09:26:07Z

backend/app/api/problems.py

+        "judge0_limits": {
+            "cpu_time_limit": 10,
+            # Go compiled binaries need modest runtime memory; 4096 MB is set
+            # here so the run step is not more restrictive than the compile step
+            # (which uses MAX_MEMORY_LIMIT = 4096000 to accommodate go tool).
+            "memory_limit": 4096000,
+            "max_processes_and_or_threads": 512,
+            "enable_per_process_and_thread_time_limit": True,
+            "enable_per_process_and_thread_memory_limit": True,
+        },
+    },
+    "java": {
+        "judge0_id": 62,
+        "monaco_mode": "java",
+        "judge0_limits": {
+            "cpu_time_limit": 10,
+            "memory_limit": 4096000,
+            "max_processes_and_or_threads": 256,
+            # OpenJDK 13 Class Data Sharing (CDS) tries to mmap a large shared
+            # archive during javac compilation. Under per-process VIRT limits
+            # on cgroup-v2 hosts, this CDS mmap exhausts the 4 GB budget.
+            # -J-Xshare:off disables CDS for the javac JVM; -J-Xmx256m caps
+            # javac's own Java heap so it leaves room for JVM native memory.
+            "compiler_options": "-J-Xshare:off -J-Xmx256m",
+            "enable_per_process_and_thread_time_limit": True,
+            "enable_per_process_and_thread_memory_limit": True,
+        },


memory_limit for Go and Java is set to 4,096,000 KB (4 GB). This is a very large per-submission budget on the production submit path and materially increases resource-exhaustion risk (especially on cgroup-v2 hosts where you note RLIMIT_AS/VIRT semantics). If the 4 GB requirement is only for compilation, consider lowering runtime memory_limit to the minimum that makes the integration suite pass (and rely on MAX_MEMORY_LIMIT as the server-side cap), or document clearly why runtime needs the full 4 GB.

Copilot AI review requested due to automatic review settings April 6, 2026 09:19

Copilot started reviewing on behalf of OpenSauce April 6, 2026 09:20

OpenSauce added 10 commits April 6, 2026 10:22

chore(judge0): publish port 2358 to host for integration tests

3921047

test: register integration marker and exclude from default run

58be0c4

test: add test-integration Makefile target

47291d3

test: add integration conftest with Judge0 reachability check

592cffe

test: add ProblemFixture dataclass and add-two-ints fixture

de800cc

test(integration): add reverse-array, is-palindrome, reverse-linked-list, max-tree-depth, graph-node-count fixtures

6c7901c

test(integration): add catalog smoke tests (two-sum, max-depth)

8f1df54

OpenSauce force-pushed the feat/judge0-integration-tests branch from 6784795 to f0e9172 April 6, 2026 09:22

fix: move pytestmark after imports to satisfy ruff E402

c272344

Copilot AI reviewed Apr 6, 2026

fix: bind Judge0 port to loopback, fix stale docstring

f9ec590

OpenSauce enabled auto-merge (squash) April 6, 2026 09:30

OpenSauce merged commit 59577b0 into main Apr 6, 2026
4 checks passed


OpenSauce merged 12 commits into main from feat/judge0-integration-tests


2 participants

