feat: integration test suite for Judge0 harness pipeline #160
Uniform resource budget sized for CPython was starving JVM, Go runtime, and V8 at startup. Move limits into LANGUAGE_CONFIG per language and splat into the submission payload. Add the first integration test (add-two-ints x 4 languages) that would have caught this regression.

Additional fixes required on this cgroup-v2 host (isolate 1.8.1 requires cgroup v1 --cg mode for per-cgroup memory limits; without it, per-process VIRT limits via isolate -m apply instead):
- Judge0 MAX_MEMORY_LIMIT raised to 4096000 KB so Go 1.13 compiler can reserve enough virtual address space during compilation (go tool needs ~4 GB VIRT for cgo thread setup and parallel package builds)
- MAX_MAX_PROCESSES_AND_OR_THREADS raised to 256 for Go/JVM thread counts
- Java judge0_limits gains compiler_options: -J-Xshare:off -J-Xmx256m to disable OpenJDK 13 Class Data Sharing (which exhausts the VIRT budget during javac initialisation)
- Go harness: math.MaxInt (Go 1.17+) replaced with math.MaxInt64 which exists in Go 1.13.5 shipped with Judge0 1.13.1
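As a rough illustration of "splat into the submission payload", the sketch below merges a per-language judge0_limits dict into the request body. The Java values are copied from the review snippet in this PR; the function name build_submission_payload, the "no-limits-lang" entry, and its id are hypothetical and only there to show the fallback when a language defines no limits.

```python
# Minimal sketch, not the actual submit_code implementation.
LANGUAGE_CONFIG = {
    "java": {
        "judge0_id": 62,
        "judge0_limits": {
            "cpu_time_limit": 10,
            "memory_limit": 4096000,
            "max_processes_and_or_threads": 256,
            "compiler_options": "-J-Xshare:off -J-Xmx256m",
        },
    },
    # Hypothetical entry with no overrides: Judge0 defaults would apply.
    "no-limits-lang": {"judge0_id": 0},
}


def build_submission_payload(language, source_code, stdin=""):
    """Build a Judge0 submission body with per-language limits merged in."""
    config = LANGUAGE_CONFIG[language]
    return {
        "language_id": config["judge0_id"],
        "source_code": source_code,
        "stdin": stdin,
        # Splat the per-language limits; an absent key contributes nothing.
        **config.get("judge0_limits", {}),
    }
```

Keeping the limits next to the language id means a new language cannot be added without someone at least deciding whether the defaults are adequate for its runtime.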
tests/conftest.py imports app.api.users which transitively imports app.api.problems (via app/api/__init__.py) before our integration conftest loads. This means JUDGE0_URL is already bound to the Docker DNS name "judge0-server" by the time we run. Env-var setdefault won't help since the module constant is already evaluated. Fix: directly patch problems_mod.JUDGE0_URL in the app fixture and restore it on teardown.
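The patch-and-restore approach described above could look roughly like the following. A synthetic module stands in for app.api.problems so the sketch runs on its own; the URL strings, submit_url, and patched_judge0_url are assumptions for illustration, and in the real suite the same logic would sit inside the app fixture.

```python
import types
from contextlib import contextmanager

# Stand-in for app.api.problems: the constant is evaluated once, at import.
problems_mod = types.ModuleType("problems_mod")
problems_mod.JUDGE0_URL = "http://judge0-server:2358"  # assumed URL shape


def submit_url():
    """Reads the module attribute at call time, so a patch is visible here."""
    return f"{problems_mod.JUDGE0_URL}/submissions"


@contextmanager
def patched_judge0_url(module, url):
    """Temporarily rebind module.JUDGE0_URL and restore it on exit."""
    original = module.JUDGE0_URL
    module.JUDGE0_URL = url
    try:
        yield
    finally:
        # Restore even if the test body raised.
        module.JUDGE0_URL = original


with patched_judge0_url(problems_mod, "http://localhost:2358"):
    url_during_test = submit_url()
url_after_test = submit_url()
```

This only works when the code under test looks up JUDGE0_URL through the module at call time. A caller that did `from problems import JUDGE0_URL` would hold its own copy and would not see the patch.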
…ist, max-tree-depth, graph-node-count fixtures
Go's runtime spawns many threads during compilation inside Judge0's isolate sandbox. 256 was sometimes insufficient, causing intermittent pthread_create failures. Bumped both the server-side MAX_MAX_PROCESSES_AND_OR_THREADS and Go's per-submission limit to 512.
6784795 to f0e9172
Pull request overview
This PR adds an opt-in integration test suite that executes real user submissions end-to-end through the /problems/{id}/submit endpoint (harness generation → Judge0 execution → result parsing), and adjusts Judge0/resource configuration so JS/Go/Java harnesses reliably run under Judge0.
Changes:
- Add backend/tests/integration/ with fixture-driven harness-matrix tests (6 fixtures × 4 languages) plus catalog YAML smoke tests (2 problems × 4 languages).
- Introduce per-language Judge0 submission limits via LANGUAGE_CONFIG["judge0_limits"] and apply them in the submit endpoint.
- Update dev tooling to support integration runs: make test-integration, pytest marker exclusion, and publish Judge0 on host port 2358; adjust Judge0 service caps.
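The case counts in the first bullet can be sanity-checked with a short sketch. The fixture labels follow the categories named in this PR, while the catalog titles and the case_ids helper are placeholders rather than the suite's real identifiers.

```python
from itertools import product

LANGUAGES = ["python", "javascript", "go", "java"]
# Labels mirror the fixture categories listed for fixtures.py.
FIXTURE_KINDS = [
    "primitives", "arrays", "strings", "list-node", "tree-node", "graph-node",
]
# Placeholder titles; the real catalog titles are not shown in this PR page.
CATALOG_TITLES = ["catalog-problem-1", "catalog-problem-2"]


def case_ids(names, languages):
    """One test id per (name, language) pair, in row-major order."""
    return [f"{name}-{lang}" for name, lang in product(names, languages)]


matrix_ids = case_ids(FIXTURE_KINDS, LANGUAGES)   # 6 x 4 cases
smoke_ids = case_ids(CATALOG_TITLES, LANGUAGES)   # 2 x 4 cases
total_cases = len(matrix_ids) + len(smoke_ids)
```

The total of 32 matches the "32/32" figure quoted in the test plan.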
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| backend/app/api/problems.py | Adds per-language Judge0 limits and splats them into the Judge0 submission payload in submit_code. |
| docker-compose.yml | Publishes judge0-server on host 2358 and raises Judge0 max memory / process caps via env. |
| backend/tests/integration/conftest.py | Integration test app/client setup, TestContainers DB wiring, and Judge0 reachability skip. |
| backend/tests/integration/fixtures.py | Declarative ProblemFixture registry covering primitives/arrays/strings/ListNode/TreeNode/GraphNode. |
| backend/tests/integration/test_harness_matrix.py | Parametrized end-to-end harness matrix tests across 4 languages. |
| backend/tests/integration/test_catalog_smoke.py | Loads real catalog YAML by title and smoke-tests execution across 4 languages. |
| backend/tests/integration/catalog_solutions.py | Canonical known-good solutions for the catalog smoke tests. |
| backend/pytest.ini | Registers integration marker and excludes integration tests by default via addopts. |
| Makefile | Adds test-integration target that runs only the integration suite. |
| backend/app/harnesses/golang.py | Fixes unused-import silencer constant to math.MaxInt64 for broader Go compatibility. |
| docs/superpowers/specs/... | Design spec for the Judge0 config + integration test approach. |
| docs/superpowers/plans/... | Implementation plan documenting the step-by-step rollout. |
| TODOS.md | Tracks Phase 2 follow-ups for integration test expansion and CI wiring. |
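The "Judge0 reachability skip" listed for conftest.py could be built on a plain TCP probe along these lines. This is a sketch under assumptions: the function name, default host, and timeout are invented here, and the actual conftest may probe Judge0 differently (for example over HTTP).

```python
import socket


def judge0_reachable(host="127.0.0.1", port=2358, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A session-scoped fixture could then call pytest.skip("Judge0 not reachable") when the probe returns False, so the suite skips cleanly instead of failing when the dev stack is not running.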
from .fixtures import FIXTURES, ProblemFixture

pytestmark = pytest.mark.integration
Imports appear after the module-level pytestmark assignment, which triggers Ruff E402 (module level import not at top of file) and can fail CI. Move from .fixtures import ... up with the other imports and keep pytestmark after all imports.
Suggested change:
- from .fixtures import FIXTURES, ProblemFixture
- pytestmark = pytest.mark.integration
+ from .fixtures import FIXTURES, ProblemFixture
+ pytestmark = pytest.mark.integration
from .catalog_solutions import CATALOG_SOLUTIONS

pytestmark = pytest.mark.integration
Imports appear after the module-level pytestmark assignment, which triggers Ruff E402 (module level import not at top of file). Move from .catalog_solutions import ... up with the other imports and keep pytestmark after all imports.
Suggested change:
- from .catalog_solutions import CATALOG_SOLUTIONS
- pytestmark = pytest.mark.integration
+ from .catalog_solutions import CATALOG_SOLUTIONS
+ pytestmark = pytest.mark.integration
def _find_yaml_by_title(title: str) -> Path:
    """Locate the YAML file in dsa-flash-cards/ whose title: matches."""
    for path in CARDS_ROOT.rglob("problems/*.yaml"):
        try:
            data = yaml.safe_load(path.read_text()) or {}
        except Exception:
            continue
        if isinstance(data, dict) and data.get("title") == title:
            return path
    raise FileNotFoundError(
        f"Catalog smoke title {title!r} not found under {CARDS_ROOT}. "
        f"Either the catalog drifted or the smoke title needs updating."
    )


def _load_catalog_problem(session, title: str) -> CodingProblem:
    """Read the real YAML and insert a CodingProblem row into the test session."""
    path = _find_yaml_by_title(title)
    data = yaml.safe_load(path.read_text())
_find_yaml_by_title() scans the filesystem and parses YAML on every test invocation (8× per run currently), and then _load_catalog_problem() reads/parses the matched file again. This adds avoidable overhead to an already-slow integration suite. Consider caching a {title: path} index in a module/session-scoped fixture (or functools.lru_cache) and reusing the already-parsed YAML for the matched file.
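One way to act on this suggestion is to build the title index once and memoize it with functools.lru_cache. The sketch below is an assumption-laden illustration: title_index and _read_title are hypothetical names, and _read_title is a deliberately simple stand-in for yaml.safe_load(...).get("title") so the example needs only the standard library.

```python
from functools import lru_cache
from pathlib import Path
from typing import Dict, Optional


def _read_title(path: Path) -> Optional[str]:
    """Stand-in for parsing the YAML: return the top-level title, if any."""
    for line in path.read_text().splitlines():
        if line.startswith("title:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    return None


@lru_cache(maxsize=None)
def title_index(root: Path) -> Dict[str, Path]:
    """Scan root once and map each problem title to its YAML path."""
    index: Dict[str, Path] = {}
    for path in sorted(root.rglob("problems/*.yaml")):
        title = _read_title(path)
        if title is not None:
            # Keep the first path seen for a title; later duplicates are ignored.
            index.setdefault(title, path)
    return index
```

A lookup then becomes title_index(CARDS_ROOT)[title], with the filesystem scan paid once per process. The cached dict is shared between callers, so it should be treated as read-only.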
Adding a smoke case: pick a catalog YAML, add its title as a new outer key
with {lang: solution} inner dict, then add the title to
test_catalog_smoke.CATALOG_SLUGS.
The module docstring references test_catalog_smoke.CATALOG_SLUGS, but test_catalog_smoke.py parametrizes directly over CATALOG_SOLUTIONS.keys() and does not define CATALOG_SLUGS. Update the docstring to match the current test pattern (or add the referenced constant) to avoid misleading future edits.
Suggested change:
- Adding a smoke case: pick a catalog YAML, add its title as a new outer key
- with {lang: solution} inner dict, then add the title to
- test_catalog_smoke.CATALOG_SLUGS.
+ Adding a smoke case: pick a catalog YAML and add its title as a new outer key
+ with a {lang: solution} inner dict. The smoke tests parametrize directly over
+ `CATALOG_SOLUTIONS.keys()`, so no separate slug list needs to be updated.
docker-compose.yml
profiles: ["prod", "dev"]
image: judge0/judge0:1.13.1
ports:
  - "2358:2358"
ports: ["2358:2358"] publishes Judge0 on all host interfaces. If this is intended only for local development/integration tests, consider binding to loopback (127.0.0.1:2358:2358) to avoid exposing an unauthenticated code-execution endpoint to the LAN.
Suggested change:
-   - "2358:2358"
+   - "127.0.0.1:2358:2358"
        "judge0_limits": {
            "cpu_time_limit": 10,
            # Go compiled binaries need modest runtime memory; 4096 MB is set
            # here so the run step is not more restrictive than the compile step
            # (which uses MAX_MEMORY_LIMIT = 4096000 to accommodate go tool).
            "memory_limit": 4096000,
            "max_processes_and_or_threads": 512,
            "enable_per_process_and_thread_time_limit": True,
            "enable_per_process_and_thread_memory_limit": True,
        },
    },
    "java": {
        "judge0_id": 62,
        "monaco_mode": "java",
        "judge0_limits": {
            "cpu_time_limit": 10,
            "memory_limit": 4096000,
            "max_processes_and_or_threads": 256,
            # OpenJDK 13 Class Data Sharing (CDS) tries to mmap a large shared
            # archive during javac compilation. Under per-process VIRT limits
            # on cgroup-v2 hosts, this CDS mmap exhausts the 4 GB budget.
            # -J-Xshare:off disables CDS for the javac JVM; -J-Xmx256m caps
            # javac's own Java heap so it leaves room for JVM native memory.
            "compiler_options": "-J-Xshare:off -J-Xmx256m",
            "enable_per_process_and_thread_time_limit": True,
            "enable_per_process_and_thread_memory_limit": True,
        },
memory_limit for Go and Java is set to 4,096,000 KB (4 GB). This is a very large per-submission budget on the production submit path and materially increases resource-exhaustion risk (especially on cgroup-v2 hosts where you note RLIMIT_AS/VIRT semantics). If the 4 GB requirement is only for compilation, consider lowering runtime memory_limit to the minimum that makes the integration suite pass (and rely on MAX_MEMORY_LIMIT as the server-side cap), or document clearly why runtime needs the full 4 GB.
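If the per-language limits are lowered later, a small guard can keep them consistent with the server-side caps. The sketch below uses the Go and Java values from this PR; limits_over_cap and the two dict names are hypothetical, and the guard only compares numbers without claiming anything about how Judge0 itself enforces the caps.

```python
# Server-side caps as set in this PR (memory in KB).
SERVER_CAPS = {
    "memory_limit": 4096000,
    "max_processes_and_or_threads": 512,
}

# Per-language values copied from the judge0_limits snippet under review.
LANGUAGE_LIMITS = {
    "go": {"memory_limit": 4096000, "max_processes_and_or_threads": 512},
    "java": {"memory_limit": 4096000, "max_processes_and_or_threads": 256},
}


def limits_over_cap(limits_by_language, caps):
    """Return (language, key, value, cap) for every limit above its cap."""
    violations = []
    for lang in sorted(limits_by_language):
        limits = limits_by_language[lang]
        for key, cap in caps.items():
            value = limits.get(key)
            if value is not None and value > cap:
                violations.append((lang, key, value, cap))
    return violations
```

Running this in a unit test would flag a per-language limit that drifts above a tightened server cap before it reaches the submit path.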
Summary
- Integration test suite (make test-integration) that exercises the full submit → harness → Judge0 pipeline with real code execution
- Per-language resource limits (judge0_limits in LANGUAGE_CONFIG) so JS/Go/Java submissions no longer OOM
- Raise MAX_MAX_PROCESSES_AND_OR_THREADS from 256 → 512 to reduce intermittent Go pthread failures

Test plan
- make test-integration passes 32/32 (requires make dev running)
- pytest tests/ still passes (292 tests, integration excluded via marker)

Known issue
Go tests have intermittent pthread_create failures (~1 in 3 full runs) due to Judge0 container thread contention. Tracked in Phase 2 follow-ups (TODOS.md).