Analysis Date: 2026-05-12
Repository: github/gh-aw
Scope: 219 total workflows, 96 using Copilot engine (44%)
Run: §25714049123
📊 Executive Summary
Research Topic: Copilot CLI Optimization Opportunities
Key Findings:
max-continuations is critically underused (only 4/96 workflows) despite being a Copilot-exclusive autopilot feature
Custom engine.agent files are available (11 exist in .github/agents/) but only 7 workflows reference them
engine.harness, engine.args, engine.api-target, and BYOK remain completely unused across all 96 Copilot workflows
Model selection has grown significantly (27 workflows now specify models, up from 13 in the last run), driven by multi-agent patterns using model: small
mcp-scripts usage grew from 1 to 5 workflows — a positive trend worth accelerating
This is the 5th consecutive analysis (runs: 25416993511 → 25537169013 → 25620196538 → 25651194663 → 25714049123). Progress is visible in model selection and mcp-scripts adoption. The persistent gaps are max-continuations (up only from 2 to 4 workflows), zero engine.api-target usage, and zero custom harness scripts.
Critical Findings
🔴 High Priority Issues
1. max-continuations severely underused (4/96 workflows, 4.2%)
Copilot's exclusive autopilot capability — where the agent runs multiple turns unattended — is used by almost no workflows. Complex analysis workflows like agent-performance-analyzer, architecture-guardian, daily-compiler-quality, and deep-report would benefit enormously from this feature but don't configure it.
2. Model selection gap: 72% of Copilot workflows (69 of 96) use the default model
Only 27 of 96 Copilot workflows specify a model. Many lightweight tasks (triage, labeling, simple comments) run on the full default model when a cheaper/faster model like gpt-5.4-mini or copilot/gpt-4.1-nano would suffice, wasting tokens.
🟡 Medium Priority Opportunities
3. 11 agent files exist, but only 7 workflows use engine.agent
Agent files like developer.instructions.md, grumpy-reviewer.agent.md, interactive-agent-designer.agent.md, and w3c-specification-writer.agent.md are defined but no workflow uses them. These could dramatically improve behavior specialization.
4. engine.harness — zero usage despite being documented
No workflow overrides the default harness script, despite the feature being implemented and documented. Custom harnesses could enable specialized retry logic, logging, or pre/post-execution hooks.
1️⃣ Current State Analysis
Copilot CLI Capabilities Inventory
Core Engine Features:
max-continuations — autopilot with --autopilot --max-autopilot-continues N (Copilot-exclusive)
engine.agent — custom agent file via --agent <id> flag (Copilot-exclusive)
engine.harness — replace the default copilot_harness.cjs (Copilot-exclusive)
engine.bare — --no-custom-instructions to disable context loading
engine.model — passed via COPILOT_MODEL env var
engine.version — pin CLI version (e.g. "0.0.422")
engine.args — custom CLI argument injection
engine.env — custom environment variables
engine.api-target — custom API hostname for GHEC/GHES
Tool Permissions:
tools.bash (with granular allow-list)
tools.edit — file write permission
tools.github — GitHub MCP server (with per-tool allowlist)
tools.web-fetch — built-in web fetch
tools.playwright — browser automation
tools.mcp-scripts — custom MCP scripts
Custom MCP servers via tools.<name>
Infrastructure Features:
sandbox (AWF/SRT) — firewall/sandboxing
network.allowed — domain allowlist
cache-memory — persistent cross-run state
BYOK mode (COPILOT_PROVIDER_* env vars)
strict: true — injection protection
Usage Statistics (Run 25714049123 vs Previous Run 25651194663)
| Feature | Current | Previous | Trend |
|---|---|---|---|
| Total workflows | 219 | 218 | +1 |
| Copilot workflows | 96 | ~115* | — |
| max-continuations | 4 | 2 | ✅ +2 |
| engine.agent | 7 | 18* | — |
| engine.model override | 27 | 13 | ✅ +14 |
| engine.bare | 9 | 9 | = |
| cache-memory | 10 | 89* | — |
| sandbox | 20 | 19 | +1 |
| mcp-scripts | 5 | 1 | ✅ +4 |
| web-fetch | 20 | 8 | ✅ +12 |
| network.allowed | 114 | 115 | = |
| engine.harness | 0 | 0 | ❌ |
| engine.api-target | 0 | 0 | ❌ |
| BYOK | 0 | 0 | ❌ |
*Note: Counting methodology differs across runs; previous run used broader pattern matching.
Most common timeout-minutes: 30 (56 workflows), 10 (36), 20 (35), 15 (33), 45 (19)
2️⃣ Feature Usage Matrix
| Feature Category | Available Features | Used | Not Used | Usage Rate |
|---|---|---|---|---|
| Autopilot | max-continuations | 4 | 92 | 4% |
| Agent Customization | engine.agent (11 files) | 7 | 89 | 7% |
| Model Selection | engine.model | 27 | 69 | 28% |
| Bare Mode | engine.bare | 9 | 87 | 9% |
| CLI Extensions | engine.harness | 0 | 96 | 0% |
| CLI Args | engine.args | 0 | 96 | 0% |
| Enterprise | engine.api-target | 0 | 96 | 0% |
| BYOK | Provider env vars | 0 | 96 | 0% |
| State Persistence | cache-memory | 10 | 86 | 10% |
| Browser | tools.playwright | ~8 | ~88 | 8% |
| MCP Scripts | mcp-scripts | 5 | 91 | 5% |
| Sandbox | AWF/SRT | 20 | 76 | 21% |
3️⃣ Missed Opportunities
🔴 High Priority
Opportunity 1: Enable max-continuations for complex analysis workflows
What: Copilot-exclusive autopilot mode that lets the agent run multiple consecutive turns without human approval
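Why It Matters: Complex workflows like architecture-guardian, agent-performance-analyzer, daily-compiler-quality, and deep-report often time out or produce incomplete results because they can't continue past a single turn
Where: architecture-guardian.md, agent-performance-analyzer.md, daily-compiler-quality.md, deep-report.md, scout.md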
Opportunity 2: Use cheaper models for lightweight tasks
What: Many simple workflows (issue triage, single-comment responses, labeling) run on the full default model when gpt-5.4-mini or a nano model would suffice
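Why It Matters: Significant token cost savings; faster responses
Where: auto-triage-issues.md, bot-detection.md, sub-issue-closer.md, daily-assign-issue-to-user.md, poem-bot.md (actually uses model: gpt-5)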
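A minimal frontmatter sketch of this change follows. The `engine.id` and `engine.model` keys are taken from the capabilities inventory in this report; nesting `model` directly under `engine` is inferred from the `engine.model` name, and pairing `gpt-5.4-mini` with a triage workflow is only an illustration, so check model availability before adopting it.

```yaml
# Sketch for a lightweight workflow such as auto-triage-issues.md.
# Nesting is assumed from the engine.model name used in this report.
engine:
  id: copilot
  model: gpt-5.4-mini   # smaller model named in this report; confirm it is available
```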
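🟡 Medium Priority
Opportunity 3: Deploy unused agent files
Unused agent files: developer.instructions.md, grumpy-reviewer.agent.md, interactive-agent-designer.agent.md, w3c-specification-writer.agent.md
Opportunity 4: Add cache-memory to more periodic workflows
Where: daily-assign-issue-to-user.md, daily-architecture-diagram.md, ci-doctor.md, architecture-guardian.md
Opportunity 5: Expand mcp-scripts adoption
What: Only 5 workflows use the mcp-scripts feature despite it enabling powerful custom tool integration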
Why It Matters: Custom scripts can interface with internal systems, run pre-flight checks, or access specialized APIs not available through standard MCP servers
Where: Any workflow needing custom data processing or integration
🟢 Low Priority
Opportunity 6: Custom harness script for specialized retry logic
What: engine.harness is completely unused
Why It Matters: Could add specialized retry logic, enhanced logging, or pre-session initialization
When to use: Only if default harness behavior is insufficient for specific workflows
Opportunity 7: Version pinning for stability-critical workflows
What: No production workflows pin engine.version
Why It Matters: New CLI versions can change behavior unexpectedly
Where: contribution-check.md, technical-doc-writer.md, archie.md — workflows used in critical automation
How to Implement:
engine:
  id: copilot
  version: "0.0.422"
Opportunity 8: engine.env for workflow-specific configuration
What: Environment variables can be injected without rebuilding workflows
Why It Matters: Enables runtime configuration without recompilation
Where: Workflows needing A/B testing or feature flags
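As a rough sketch of what this could look like, the fragment below injects a single variable through `engine.env`. The report only states that `engine.env` carries custom environment variables; the key/value map shape is an assumption, and the variable name `TRIAGE_PROMPT_VARIANT` is hypothetical.

```yaml
# Sketch: a feature flag passed to the agent through engine.env.
# The map shape under env is assumed, not confirmed by this report.
engine:
  id: copilot
  env:
    TRIAGE_PROMPT_VARIANT: "b"   # hypothetical variable, for illustration only
```

Switching the flag then means editing one value rather than rewriting the workflow prompt.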
4️⃣ Specific Workflow Recommendations
architecture-guardian.md
Current State: Single-run analysis without continuations
Recommended: Add max-continuations: 3, increase timeout-minutes to 60, add cache-memory: true for trend tracking
Expected Benefits: More thorough analysis, cross-run architectural trend detection
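The recommendation above might translate to frontmatter along these lines. The values (3 continuations, 60 minutes, cache-memory enabled) come from this report; where each key sits in the file is an assumption, since the report lists `max-continuations` and `cache-memory` without a parent key.

```yaml
# Sketch for architecture-guardian.md; key placement is assumed.
timeout-minutes: 60
engine:
  id: copilot
  max-continuations: 3   # assumed to nest under engine
tools:
  cache-memory: true     # assumed to nest under tools
```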
contribution-check.md
Current State: Uses max-continuations: 20, agent: contribution-checker — well-configured!
Note: Good reference implementation for other complex analysis workflows
test-quality-sentinel.md
Current State: Uses max-continuations: 15 with multi-agent architecture
Note: Good reference implementation. Consider model: small for sub-agent analyzers
auto-triage-issues.md
Current State: No model specified, complex triage logic
Recommended: Add model: gpt-5.4-mini for cost savings since triage is rule-based
daily-architecture-diagram.md
Current State: No cache-memory
Recommended: Add cache-memory: true to track diagram changes over time
technical-doc-writer.md and glossary-maintainer.md
Current State: Both use agent: technical-doc-writer (good!)
Recommended: Add version pinning for stability in documentation workflows
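A possible shape for the pinned configuration is sketched below. The `agent` value is the one these workflows already use, and `"0.0.422"` is simply the example version string quoted in the capabilities inventory, not a recommended release.

```yaml
# Sketch for technical-doc-writer.md or glossary-maintainer.md.
engine:
  id: copilot
  agent: technical-doc-writer
  version: "0.0.422"   # example value from this report; pin to a version you have tested
```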
5️⃣ Trends & Insights
Historical Trends (5 Runs)

| Metric | May 6 | May 8 | May 10 | May 11 | May 12 | Trend |
|---|---|---|---|---|---|---|
| Total workflows | 214 | 217 | 218 | 218 | 219 | ↑ steady |
| max-continuations | 0 | 2 | 2 | 2 | 4 | ↑ growing |
| model overrides | n/a | 3 | n/a | 13 | 27 | ↑ accelerating |
| mcp-scripts | n/a | 1 | n/a | 1 | 5 | ↑ accelerating |
| web-fetch | n/a | 8 | n/a | 8 | 20 | ↑ significant |
| engine.api-target | 0 | 0 | 0 | 0 | 0 | → stagnant |
| engine.harness | 0 | 0 | 0 | 0 | 0 | → stagnant |
| BYOK | 0 | 0 | 0 | 0 | 0 | → stagnant |
Positive Acceleration: Model selection and mcp-scripts are growing rapidly. Multi-agent workflows (using model: small for sub-agents) are driving model adoption.
Persistent Gaps: Enterprise features (api-target, BYOK) and customization features (harness) remain at zero, suggesting either documentation gaps or that these are intentionally only for specific enterprise/advanced use cases.
6️⃣ Best Practice Guidelines
Based on this research, here are recommended best practices:
Match model to task complexity: Use model: gpt-5.4-mini or model: small for simple/fast tasks; reserve default models for complex reasoning
Enable autopilot for long-running analysis: Add max-continuations: 3-10 for any workflow expected to need multiple reasoning steps; scale timeout-minutes accordingly (add ~15 min per continuation)
Leverage agent files for specialization: Before writing a complex prompt, check .github/agents/ — reuse specialized agent files with engine.agent: <id>
Add cache-memory to periodic workflows: Daily/weekly workflows benefit from cross-run state; even simple state like "last processed issue number" prevents redundant work
Grow mcp-scripts for custom integrations: The growth from 1 to 5 workflows in this cycle shows momentum; any workflow needing data not available via MCP tools should consider mcp-scripts
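To make the timeout guideline concrete, here is a worked sketch under stated assumptions: a 30-minute base (the most common timeout-minutes value in this analysis) and 5 continuations, giving 30 + 5 × 15 = 105 minutes. The base is an assumption chosen for illustration, and nesting `max-continuations` under `engine` is likewise assumed.

```yaml
# Worked example of the "~15 min per continuation" rule of thumb.
# Assumed base: 30 minutes; 30 + 5 * 15 = 105.
timeout-minutes: 105
engine:
  id: copilot
  max-continuations: 5   # assumed to nest under engine
```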
7️⃣ Action Items
Immediate Actions (this week):
Add max-continuations: 3-5 to architecture-guardian.md, agent-performance-analyzer.md, and deep-report.md
Add model: gpt-5.4-mini to simple triage/labeling workflows like auto-triage-issues.md and bot-detection.md
Create workflow(s) that use the 4 unused agent files (grumpy-reviewer, interactive-agent-designer, w3c-specification-writer, developer.instructions)
Short-term (this month):
Add cache-memory: true to all daily/weekly workflows that don't already have it
Evaluate if engine.harness documentation needs improvement (zero usage may indicate discovery issues)
Expand mcp-scripts to 10+ workflows
Long-term (this quarter):
Document and promote BYOK mode for users with custom LLM deployments
Create a workflow template library showing each Copilot-exclusive feature
Consider version-pinning strategy for production-critical workflows
Supporting Evidence & Methodology
📚 References
pkg/workflow/copilot_engine*.go
docs/src/content/docs/reference/engines.md
brave.md, contribution-check.md, test-quality-sentinel.md, archie.md, architecture-guardian.md
/tmp/gh-aw/repo-memory/default/copilot-research-notes.md
Research Methodology
copilot_engine.go, copilot_engine_execution.go, copilot_engine_tools.go, and docs/reference/engines.md
grep -rl and grep -rn across all 219 .github/workflows/*.md files
Generated by Copilot CLI Deep Research Agent (Run: §25714049123)