
Commit 9642be6

yaooqinn and Copilot committed

Add sql-jobs command to show jobs for a SQL execution

Fetches all job IDs (succeeded/failed/running) from a SQL execution and displays their details. Uses a bulk list_jobs call plus a client-side filter for efficiency, and gracefully handles missing job IDs.

Co-authored-by: Copilot <[email protected]>
1 parent 6306059 commit 9642be6

4 files changed: 86 additions, 1 deletion

CHANGELOG.md (5 additions, 1 deletion)

````diff
@@ -10,7 +10,11 @@
 - `--dot` outputs the plan DAG as a Graphviz DOT file for visualization.
 - `-o <file>` writes output to a file instead of stdout.
 - `--json` returns structured JSON with `isAdaptive`, `sectionCount`, and parsed `sections`.
-- `sql-plan` REPL command with the same options.
+- **`sql-jobs` command** — Show jobs associated with a SQL execution.
+  - Fetches all job IDs (succeeded, failed, running) from the SQL execution.
+  - Displays job details in a table with status, stages, and task counts.
+  - Gracefully handles cases where referenced job IDs are not found.
+- `sql-plan` and `sql-jobs` REPL commands with the same options.
 
 ### Changed
 - **E2E CI switched to Docker-based SHS** — Uses `apache/spark:4.0.0` Docker image with `actions/cache` for faster CI runs (~5s cached load vs ~2min download).
````

README.md (5 additions, 0 deletions)

````diff
@@ -79,13 +79,17 @@
 spark-history-cli --app-id <id> sql-plan <exec-id> --view final       # post-AQE plan
 spark-history-cli --app-id <id> sql-plan <exec-id> --dot              # Graphviz DOT
 spark-history-cli --app-id <id> sql-plan <exec-id> --dot -o plan.dot  # save to file
 
+# Jobs for a SQL execution
+spark-history-cli --app-id <id> sql-jobs <exec-id>
+
 # Download event logs
 spark-history-cli --app-id <id> logs output.zip
 
 # JSON output for scripting/agents
 spark-history-cli --json apps
 spark-history-cli --json --app-id <id> jobs
 spark-history-cli --json --app-id <id> sql-plan <exec-id>
+spark-history-cli --json --app-id <id> sql-jobs <exec-id>
 ```
 
 ### REPL Commands
@@ -101,6 +105,7 @@ stage <id> [attempt]   Show stage details
 executors [--all]      List executors
 sql [id]               List or show SQL executions
 sql-plan <id> [opts]   Show SQL plan (--view, --dot, -o)
+sql-jobs <id>          Show jobs for a SQL execution
 rdds                   List cached RDDs
 env                    Show environment/config
 logs [path]            Download event logs
````

spark_history_cli/cli.py (74 additions, 0 deletions)

````diff
@@ -61,6 +61,25 @@ def output_status_block(skin, info: dict[str, str], title: str = ""):
     skin.status_block(info, title=title)
 
 
+def _collect_sql_job_ids(sql_exec: dict) -> list[int]:
+    """Collect all job IDs from a SQL execution (success + failed + running)."""
+    ids = []
+    ids.extend(sql_exec.get("successJobIds", []))
+    ids.extend(sql_exec.get("failedJobIds", []))
+    ids.extend(sql_exec.get("runningJobIds", []))
+    return sorted(set(ids))
+
+
+def _fetch_sql_jobs(client, app_id: str, sql_exec: dict) -> list[dict]:
+    """Fetch job details for a SQL execution using bulk list + filter."""
+    job_ids = _collect_sql_job_ids(sql_exec)
+    if not job_ids:
+        return []
+    target = set(job_ids)
+    all_jobs = client.list_jobs(app_id)
+    return [j for j in all_jobs if j.get("jobId") in target]
+
+
 # ── Main CLI group ────────────────────────────────────────────────────
 
 @click.group(invoke_without_command=True)
````
````diff
@@ -321,6 +340,25 @@ def repl(state: CliState):
                 else:
                     click.echo(parsed["fullPlan"])
 
+            elif cmd == "sql-jobs":
+                app_id = state.resolve_app_id(None)
+                if not args or not args[0].isdigit():
+                    skin.error("Usage: sql-jobs <execution-id>")
+                else:
+                    exec_id = int(args[0])
+                    ex = client.get_sql(app_id, exec_id)
+                    job_ids = _collect_sql_job_ids(ex)
+                    if not job_ids:
+                        skin.warning(f"No jobs found for SQL execution {exec_id}")
+                    else:
+                        jobs = _fetch_sql_jobs(client, app_id, ex)
+                        if not jobs:
+                            skin.warning(f"SQL execution {exec_id} references jobs {job_ids} but none were found")
+                        else:
+                            skin.section(f"Jobs for SQL Execution {exec_id} ({len(jobs)}/{len(job_ids)} jobs)")
+                            headers, rows = fmt.format_job_list(jobs)
+                            output_table(skin, headers, rows)
+
             elif cmd == "rdds":
                 app_id = state.resolve_app_id(None)
                 rdds = client.list_rdds(app_id)
````
````diff
@@ -598,6 +636,42 @@ def cmd_sql_plan(state: CliState, execution_id: int, view_mode: str, dot_mode: b
         click.echo(text)
 
 
+@cli.command("sql-jobs")
+@click.argument("execution_id", type=int)
+@pass_state
+def cmd_sql_jobs(state: CliState, execution_id: int):
+    """Show jobs associated with a SQL execution.
+
+    Fetches the SQL execution, collects all job IDs (succeeded, failed,
+    running), and displays each job's details.
+
+    Examples:
+
+        spark-history-cli -a <app> sql-jobs 4
+
+        spark-history-cli -a <app> --json sql-jobs 4
+    """
+    client = state.ensure_client()
+    app_id = state.resolve_app_id(None)
+    ex = client.get_sql(app_id, execution_id)
+    job_ids = _collect_sql_job_ids(ex)
+    if not job_ids:
+        click.echo(f"No jobs found for SQL execution {execution_id}.")
+        return
+    jobs = _fetch_sql_jobs(client, app_id, ex)
+    if not jobs:
+        click.echo(f"SQL execution {execution_id} references jobs {job_ids} but none were found.")
+        return
+    if state.json_mode:
+        output_json(jobs)
+    else:
+        from spark_history_cli.utils.repl_skin import ReplSkin
+        skin = ReplSkin("spark_history", version=__version__)
+        skin.section(f"Jobs for SQL Execution {execution_id} ({len(jobs)}/{len(job_ids)} jobs)")
+        headers, rows = fmt.format_job_list(jobs)
+        output_table(skin, headers, rows)
+
+
 @cli.command("rdds")
 @pass_state
 def cmd_rdds(state: CliState):
````

spark_history_cli/skills/SKILL.md (2 additions, 0 deletions)

````diff
@@ -37,6 +37,7 @@ spark-history-cli --json --server http://localhost:18080 --app-id <app-id> execu
 spark-history-cli --json --server http://localhost:18080 --app-id <app-id> sql
 spark-history-cli --json --server http://localhost:18080 --app-id <app-id> sql-plan <exec-id> --view final
 spark-history-cli --server http://localhost:18080 --app-id <app-id> sql-plan <exec-id> --dot -o plan.dot
+spark-history-cli --json --server http://localhost:18080 --app-id <app-id> sql-jobs <exec-id>
 spark-history-cli --json --server http://localhost:18080 --app-id <app-id> env
 spark-history-cli --server http://localhost:18080 --app-id <app-id> logs output.zip
 ```
@@ -62,6 +63,7 @@ python -m spark_history_cli --json apps
 - `--dot`: Graphviz DOT output for visualizing the plan DAG
 - `--json` + `--view`: structured JSON with `isAdaptive`, `sectionCount`, `plan`, and `sections`
 - `-o <file>`: write output to file instead of stdout
+- `sql-jobs <id>` for jobs associated with a SQL execution (fetches all linked jobs by ID)
 - `env` for Spark config/runtime context
 - `logs` only when the user explicitly wants the event log archive saved locally
````
