feat: integration test suite for Judge0 harness pipeline #160

Merged
OpenSauce merged 12 commits into main from feat/judge0-integration-tests
Apr 6, 2026

@OpenSauce
Owner

Summary

  • Opt-in integration test suite (make test-integration) that exercises the full submit → harness → Judge0 pipeline with real code execution
  • 32 tests: 6 synthetic fixtures × 4 languages (harness matrix) + 2 catalog problems × 4 languages (smoke tests)
  • Covers all param-type shapes: primitives, arrays, strings, ListNode, TreeNode, GraphNode
  • Per-language Judge0 resource limits (judge0_limits in LANGUAGE_CONFIG) so JS/Go/Java submissions no longer OOM
  • Bumped MAX_MAX_PROCESSES_AND_OR_THREADS from 256 → 512 to reduce intermittent Go pthread failures
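A minimal sketch of the per-language limits pattern, assuming a LANGUAGE_CONFIG shaped like the Java entry quoted in the review of this PR. The "python" entry, its Judge0 id 71, and the build_submission_payload helper are hypothetical; the limit keys themselves (cpu_time_limit, memory_limit in KB, max_processes_and_or_threads, compiler_options) are Judge0 submission parameters.

```python
from typing import Any, Dict

# Hypothetical excerpt: the "java" values mirror the LANGUAGE_CONFIG snippet
# quoted in the review; the "python" entry (and its id 71) is an assumption.
LANGUAGE_CONFIG: Dict[str, Dict[str, Any]] = {
    "java": {
        "judge0_id": 62,
        "judge0_limits": {
            "cpu_time_limit": 10,  # seconds
            "memory_limit": 4096000,  # KB
            "max_processes_and_or_threads": 256,
            "compiler_options": "-J-Xshare:off -J-Xmx256m",
        },
    },
    "python": {"judge0_id": 71},  # no overrides: Judge0 server defaults apply
}


def build_submission_payload(language: str, source_code: str, stdin: str = "") -> Dict[str, Any]:
    """Hypothetical helper: build a Judge0 submission body for one language."""
    cfg = LANGUAGE_CONFIG[language]
    return {
        "source_code": source_code,
        "language_id": cfg["judge0_id"],
        "stdin": stdin,
        # Splat the per-language limits so each runtime carries its own budget
        # instead of one uniform, CPython-sized one.
        **cfg.get("judge0_limits", {}),
    }
```

Keeping the limits as data means a new language needs only a config entry, and the submit path never hard-codes a single budget for every runtime.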

Test plan

  • make test-integration passes 32/32 (requires make dev running)
  • Default pytest tests/ still passes (292 tests, integration excluded via marker)
  • Regression guard verified: breaking Java memory limit causes all Java tests to fail
  • Catalog smoke tests validate real YAML content (Two Sum, Maximum Depth of Binary Tree)

Known issue

Go tests have intermittent pthread_create failures (~1 in 3 full runs) due to Judge0 container thread contention. Tracked in Phase 2 follow-ups (TODOS.md).

Copilot AI review requested due to automatic review settings April 6, 2026 09:19
OpenSauce added 10 commits April 6, 2026 10:22
Uniform resource budget sized for CPython was starving JVM, Go runtime,
and V8 at startup. Move limits into LANGUAGE_CONFIG per language and
splat into the submission payload.

Add the first integration test (add-two-ints x 4 languages) that would
have caught this regression.

Additional fixes required on this cgroup-v2 host (isolate 1.8.1 requires
cgroup v1 --cg mode for per-cgroup memory limits; without it, per-process
VIRT limits via isolate -m apply instead):

- Judge0 MAX_MEMORY_LIMIT raised to 4096000 KB so the Go 1.13 compiler can
  reserve enough virtual address space during compilation (the go tool needs
  ~4 GB VIRT for cgo thread setup and parallel package builds)
- MAX_MAX_PROCESSES_AND_OR_THREADS raised to 256 for Go/JVM thread counts
- Java judge0_limits gains compiler_options: -J-Xshare:off -J-Xmx256m.
  -J-Xshare:off disables OpenJDK 13 Class Data Sharing (which exhausts the
  VIRT budget during javac initialisation), and -J-Xmx256m caps javac's heap
- Go harness: math.MaxInt (Go 1.17+) replaced with math.MaxInt64, which
  exists in the Go 1.13.5 toolchain shipped with Judge0 1.13.1
tests/conftest.py imports app.api.users, which transitively imports
app.api.problems (via app/api/__init__.py) before our integration
conftest loads. This means JUDGE0_URL is already bound to the Docker
DNS name "judge0-server" by the time we run. Setting the env var with
setdefault won't help, since the module constant has already been evaluated.

Fix: directly patch problems_mod.JUDGE0_URL in the app fixture and
restore it on teardown.
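The import-order trap and the patch-and-restore fix can be sketched with a stand-in module. problems_stub, the URLs, and judge0_url_override are hypothetical; inside an actual pytest fixture, monkeypatch.setattr(problems_mod, "JUDGE0_URL", url) gives the same behavior and restores the value automatically at teardown.

```python
import contextlib
import os
import types
from typing import Iterator

# Hypothetical stand-in for app.api.problems. The real module reads JUDGE0_URL
# once, at import time; here that captured value is simply assigned.
problems_mod = types.ModuleType("problems_stub")
problems_mod.JUDGE0_URL = "http://judge0-server:2358"  # assumed Docker-network URL

# Too late: this only touches os.environ, not the constant bound above.
os.environ.setdefault("JUDGE0_URL", "http://localhost:2358")
assert problems_mod.JUDGE0_URL == "http://judge0-server:2358"


@contextlib.contextmanager
def judge0_url_override(module: types.ModuleType, url: str) -> Iterator[types.ModuleType]:
    """Rebind module.JUDGE0_URL for the duration of the block, then restore it."""
    original = module.JUDGE0_URL
    module.JUDGE0_URL = url
    try:
        yield module
    finally:
        # Runs on normal exit and on exceptions, like fixture teardown.
        module.JUDGE0_URL = original
```

Patching the attribute on the module object works because submit_code looks up the module global at call time, not at import time.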
…ist, max-tree-depth, graph-node-count fixtures
Go's runtime spawns many threads during compilation inside Judge0's
isolate sandbox. 256 was sometimes insufficient, causing intermittent
pthread_create failures. Bumped both the server-side
MAX_MAX_PROCESSES_AND_OR_THREADS and Go's per-submission limit to 512.
OpenSauce force-pushed the feat/judge0-integration-tests branch from 6784795 to f0e9172 April 6, 2026 09:22
Contributor

Copilot AI left a comment


Pull request overview

This PR adds an opt-in integration test suite that executes real user submissions end-to-end through the /problems/{id}/submit endpoint (harness generation → Judge0 execution → result parsing), and adjusts Judge0/resource configuration so JS/Go/Java harnesses reliably run under Judge0.

Changes:

  • Add backend/tests/integration/ with fixture-driven harness-matrix tests (6 fixtures × 4 languages) plus catalog YAML smoke tests (2 problems × 4 languages).
  • Introduce per-language Judge0 submission limits via LANGUAGE_CONFIG["judge0_limits"] and apply them in the submit endpoint.
  • Update dev tooling to support integration runs: make test-integration, pytest marker exclusion, and publish Judge0 on host port 2358; adjust Judge0 service caps.
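One way a declarative fixture registry can expand into the harness matrix, sketched under assumptions: the ProblemFixture fields and the language keys are hypothetical, and only three of the fixture slugs named in this PR's commits are used, so the sketch yields 3 × 4 = 12 cases rather than the suite's 6 × 4 = 24.

```python
import itertools
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProblemFixture:
    # Hypothetical fields; the real fixtures.py schema is not shown in this PR.
    slug: str
    param_shape: str  # e.g. "primitives", "TreeNode", "GraphNode"


# Assumed language keys; the PR only confirms "java" in LANGUAGE_CONFIG.
LANGUAGES: List[str] = ["python", "javascript", "go", "java"]

FIXTURES: List[ProblemFixture] = [
    ProblemFixture("add-two-ints", "primitives"),
    ProblemFixture("max-tree-depth", "TreeNode"),
    ProblemFixture("graph-node-count", "GraphNode"),
]


def matrix_cases(
    fixtures: List[ProblemFixture], languages: List[str]
) -> List[Tuple[ProblemFixture, str]]:
    """Cross every fixture with every language (fixture-major order)."""
    return list(itertools.product(fixtures, languages))


def case_ids(cases: List[Tuple[ProblemFixture, str]]) -> List[str]:
    """Readable test ids such as "add-two-ints-go"."""
    return [f"{fixture.slug}-{lang}" for fixture, lang in cases]
```

In a test module these would feed @pytest.mark.parametrize("problem,lang", cases, ids=case_ids(cases)), so registering one more fixture adds one case per language with no other edits.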

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| backend/app/api/problems.py | Adds per-language Judge0 limits and splats them into the Judge0 submission payload in submit_code. |
| docker-compose.yml | Publishes judge0-server on host 2358 and raises Judge0 max memory / process caps via env. |
| backend/tests/integration/conftest.py | Integration test app/client setup, TestContainers DB wiring, and Judge0 reachability skip. |
| backend/tests/integration/fixtures.py | Declarative ProblemFixture registry covering primitives/arrays/strings/ListNode/TreeNode/GraphNode. |
| backend/tests/integration/test_harness_matrix.py | Parametrized end-to-end harness matrix tests across 4 languages. |
| backend/tests/integration/test_catalog_smoke.py | Loads real catalog YAML by title and smoke-tests execution across 4 languages. |
| backend/tests/integration/catalog_solutions.py | Canonical known-good solutions for the catalog smoke tests. |
| backend/pytest.ini | Registers integration marker and excludes integration tests by default via addopts. |
| Makefile | Adds test-integration target that runs only the integration suite. |
| backend/app/harnesses/golang.py | Fixes unused-import silencer constant to math.MaxInt64 for broader Go compatibility. |
| docs/superpowers/specs/... | Design spec for the Judge0 config + integration test approach. |
| docs/superpowers/plans/... | Implementation plan documenting the step-by-step rollout. |
| TODOS.md | Tracks Phase 2 follow-ups for integration test expansion and CI wiring. |
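The conftest's Judge0 reachability skip is not shown here; a minimal stdlib sketch, assuming a plain TCP probe of the published port is enough (judge0_reachable is a hypothetical name, and the real conftest may probe differently):

```python
import socket


def judge0_reachable(host: str = "127.0.0.1", port: int = 2358, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

A fixture could then call pytest.skip("Judge0 not reachable on 127.0.0.1:2358") when this returns False, so make test-integration reports skips instead of failures when make dev is not running.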


Comment on lines +14 to +18

from .fixtures import FIXTURES, ProblemFixture

pytestmark = pytest.mark.integration


Copilot AI Apr 6, 2026


Imports appear after the module-level pytestmark assignment, which triggers Ruff E402 (module level import not at top of file) and can fail CI. Move from .fixtures import ... up with the other imports and keep pytestmark after all imports.

Suggested change
from .fixtures import FIXTURES, ProblemFixture
pytestmark = pytest.mark.integration
from .fixtures import FIXTURES, ProblemFixture
pytestmark = pytest.mark.integration

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +18

from .catalog_solutions import CATALOG_SOLUTIONS

pytestmark = pytest.mark.integration


Copilot AI Apr 6, 2026


Imports appear after the module-level pytestmark assignment, which triggers Ruff E402 (module level import not at top of file). Move from .catalog_solutions import ... up with the other imports and keep pytestmark after all imports.

Suggested change
from .catalog_solutions import CATALOG_SOLUTIONS
pytestmark = pytest.mark.integration
from .catalog_solutions import CATALOG_SOLUTIONS
pytestmark = pytest.mark.integration

Comment on lines +27 to +46
def _find_yaml_by_title(title: str) -> Path:
"""Locate the YAML file in dsa-flash-cards/ whose title: matches."""
for path in CARDS_ROOT.rglob("problems/*.yaml"):
try:
data = yaml.safe_load(path.read_text()) or {}
except Exception:
continue
if isinstance(data, dict) and data.get("title") == title:
return path
raise FileNotFoundError(
f"Catalog smoke title {title!r} not found under {CARDS_ROOT}. "
f"Either the catalog drifted or the smoke title needs updating."
)


def _load_catalog_problem(session, title: str) -> CodingProblem:
"""Read the real YAML and insert a CodingProblem row into the test session."""
path = _find_yaml_by_title(title)
data = yaml.safe_load(path.read_text())


Copilot AI Apr 6, 2026


_find_yaml_by_title() scans the filesystem and parses YAML on every test invocation (8× per run currently), and then _load_catalog_problem() reads/parses the matched file again. This adds avoidable overhead to an already-slow integration suite. Consider caching a {title: path} index in a module/session-scoped fixture (or functools.lru_cache) and reusing the already-parsed YAML for the matched file.

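A sketch of the caching suggested above, assuming an index built once per process with functools.lru_cache. build_title_index and load_catalog_yaml are hypothetical names, and the parser is injected (pass yaml.safe_load in the real suite) so the sketch itself has no PyYAML dependency.

```python
import functools
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

TitleIndex = Dict[str, Tuple[Path, Dict[str, Any]]]


@functools.lru_cache(maxsize=None)
def build_title_index(root: Path, parse: Callable[[str], Any]) -> TitleIndex:
    """Parse every problems/*.yaml under root once and index it by title."""
    index: TitleIndex = {}
    for path in sorted(root.rglob("problems/*.yaml")):
        try:
            data = parse(path.read_text()) or {}
        except Exception:
            continue  # skip unparseable files, as _find_yaml_by_title does
        if isinstance(data, dict) and "title" in data:
            index.setdefault(data["title"], (path, data))  # first match wins
    return index


def load_catalog_yaml(root: Path, parse: Callable[[str], Any], title: str) -> Dict[str, Any]:
    """Return the already-parsed YAML for `title` without re-reading the file."""
    index = build_title_index(root, parse)
    if title not in index:
        raise FileNotFoundError(f"Catalog smoke title {title!r} not found under {root}.")
    return index[title][1]
```

Because the cached dict is shared across tests, callers should treat the returned data as read-only (or copy it) before building a CodingProblem row from it.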
Comment on lines +4 to +6
Adding a smoke case: pick a catalog YAML, add its title as a new outer key
with {lang: solution} inner dict, then add the title to
test_catalog_smoke.CATALOG_SLUGS.

Copilot AI Apr 6, 2026


The module docstring references test_catalog_smoke.CATALOG_SLUGS, but test_catalog_smoke.py parametrizes directly over CATALOG_SOLUTIONS.keys() and does not define CATALOG_SLUGS. Update the docstring to match the current test pattern (or add the referenced constant) to avoid misleading future edits.

Suggested change
Adding a smoke case: pick a catalog YAML, add its title as a new outer key
with {lang: solution} inner dict, then add the title to
test_catalog_smoke.CATALOG_SLUGS.
Adding a smoke case: pick a catalog YAML and add its title as a new outer key
with a {lang: solution} inner dict. The smoke tests parametrize directly over
`CATALOG_SOLUTIONS.keys()`, so no separate slug list needs to be updated.

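To illustrate parametrizing directly over CATALOG_SOLUTIONS, here is a hypothetical sketch of how its keys alone can drive the smoke cases, so adding a title adds its cases with no second list to keep in sync. The language keys and placeholder sources are assumptions, not the contents of catalog_solutions.py.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: catalog YAML title -> {language: known-good solution}.
# Solution sources are elided as "..." placeholders.
CATALOG_SOLUTIONS: Dict[str, Dict[str, str]] = {
    "Two Sum": {"python": "...", "javascript": "...", "go": "...", "java": "..."},
    "Maximum Depth of Binary Tree": {
        "python": "...", "javascript": "...", "go": "...", "java": "...",
    },
}


def smoke_cases(solutions: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """One (title, language) case per inner entry, derived only from the dict itself."""
    return [(title, lang) for title, by_lang in solutions.items() for lang in by_lang]
```

With two titles and four languages this gives the 8 cases per run mentioned in the caching comment.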
profiles: ["prod", "dev"]
image: judge0/judge0:1.13.1
ports:
- "2358:2358"

Copilot AI Apr 6, 2026


ports: ["2358:2358"] publishes Judge0 on all host interfaces. If this is intended only for local development/integration tests, consider binding to loopback (127.0.0.1:2358:2358) to avoid exposing an unauthenticated code-execution endpoint to the LAN.

Suggested change
- "2358:2358"
- "127.0.0.1:2358:2358"

Comment on lines +71 to +97
"judge0_limits": {
"cpu_time_limit": 10,
# Go compiled binaries need modest runtime memory; 4096 MB is set
# here so the run step is not more restrictive than the compile step
# (which uses MAX_MEMORY_LIMIT = 4096000 to accommodate go tool).
"memory_limit": 4096000,
"max_processes_and_or_threads": 512,
"enable_per_process_and_thread_time_limit": True,
"enable_per_process_and_thread_memory_limit": True,
},
},
"java": {
"judge0_id": 62,
"monaco_mode": "java",
"judge0_limits": {
"cpu_time_limit": 10,
"memory_limit": 4096000,
"max_processes_and_or_threads": 256,
# OpenJDK 13 Class Data Sharing (CDS) tries to mmap a large shared
# archive during javac compilation. Under per-process VIRT limits
# on cgroup-v2 hosts, this CDS mmap exhausts the 4 GB budget.
# -J-Xshare:off disables CDS for the javac JVM; -J-Xmx256m caps
# javac's own Java heap so it leaves room for JVM native memory.
"compiler_options": "-J-Xshare:off -J-Xmx256m",
"enable_per_process_and_thread_time_limit": True,
"enable_per_process_and_thread_memory_limit": True,
},

Copilot AI Apr 6, 2026


memory_limit for Go and Java is set to 4,096,000 KB (4 GB). This is a very large per-submission budget on the production submit path and materially increases resource-exhaustion risk (especially on cgroup-v2 hosts where you note RLIMIT_AS/VIRT semantics). If the 4 GB requirement is only for compilation, consider lowering runtime memory_limit to the minimum that makes the integration suite pass (and rely on MAX_MEMORY_LIMIT as the server-side cap), or document clearly why runtime needs the full 4 GB.

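Related to the memory_limit concern above, a cheap unit check (not part of this PR; limit_violations is a hypothetical helper) can keep per-language limits within the server-side caps this PR configures: MAX_MEMORY_LIMIT = 4096000 KB and MAX_MAX_PROCESSES_AND_OR_THREADS = 512. Judge0 validates submitted limits against these maxima, so a value above a cap should fail the submission rather than be silently clamped.

```python
from typing import Any, Dict, List, Mapping

# Server-side maxima configured for judge0-server in this PR's docker-compose.yml.
SERVER_CAPS: Dict[str, int] = {
    "memory_limit": 4096000,  # KB (MAX_MEMORY_LIMIT)
    "max_processes_and_or_threads": 512,  # MAX_MAX_PROCESSES_AND_OR_THREADS
}


def limit_violations(
    language_config: Mapping[str, Mapping[str, Any]],
    caps: Mapping[str, int] = SERVER_CAPS,
) -> List[str]:
    """List every per-language judge0_limits value that exceeds its cap."""
    problems: List[str] = []
    for lang, cfg in language_config.items():
        limits = cfg.get("judge0_limits", {})
        for key, cap in caps.items():
            value = limits.get(key)
            if value is not None and value > cap:
                problems.append(f"{lang}.{key}={value} exceeds cap {cap}")
    return problems
```

Passing a lower memory cap (for example, a runtime budget the team settles on) turns the same check into a guard against the oversized runtime budget flagged above.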
OpenSauce enabled auto-merge (squash) April 6, 2026 09:30
OpenSauce merged commit 59577b0 into main Apr 6, 2026
4 checks passed

Labels

None yet

Projects

None yet

2 participants