Skip to content

ingest-claude-code & skills#329

Merged
neongreen merged 6 commits intomainfrom
emily/jj-qmpypprwrrtl
Nov 30, 2025
Merged

ingest-claude-code & skills#329
neongreen merged 6 commits intomainfrom
emily/jj-qmpypprwrrtl

Conversation

@neongreen
Copy link
Copy Markdown
Owner

No description provided.

@neongreen neongreen requested a review from Copilot November 30, 2025 08:36
@neongreen neongreen marked this pull request as ready for review November 30, 2025 08:36
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@mergify
Copy link
Copy Markdown

mergify bot commented Nov 30, 2025

🧪 CI Insights

Here's what we observed from your CI run for 8dae92e.

🟢 All jobs passed!

But CI Insights is watching 👀

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds the ingest-claude-code tool and comprehensive skills documentation for creating new Go projects in the monorepo. The ingest-claude-code tool extracts sessions and messages from Claude Code conversation traces and outputs them as JSONL for further analysis.

Key changes:

  • New CLI tool ingest-claude-code with sessions and messages commands for extracting Claude Code trace data
  • Comprehensive skill documentation (add-go-project) with detailed guidelines for scaffolding new Go projects
  • Development script for running the tool without building

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
ingest-claude-code/main.go CLI entry point with Cobra commands for sessions and messages
ingest-claude-code/sessions.go Extracts session metadata from Claude Code trace files
ingest-claude-code/messages.go Extracts individual messages from trace files
ingest-claude-code/README.md User documentation with usage examples and output format specifications
dev/ingest-claude-code Development wrapper script for running the tool with go run
dev/README.md Documentation explaining the purpose and pattern of dev scripts
.claude/skills/add-go-project/SKILL.md Comprehensive guide for creating new Go projects in the monorepo
.claude/skills/add-go-project/DAGGER.md Dagger-specific integration instructions for new projects

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ingest-claude-code/sessions.go Outdated
Comment on lines +102 to +103
// Convert back to path format
session.ProjectPath = strings.ReplaceAll(dirName, "-", "/")
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project path conversion logic is incorrect. It naively replaces all hyphens with slashes, which will incorrectly transform legitimate hyphens in directory names (e.g., 'my-project' becomes 'my/project'). The correct approach is to only replace the leading dash and convert hyphen-separated path segments, not all hyphens. Consider parsing the encoded format more carefully or using a more robust decoding strategy.

Suggested change
// Convert back to path format
session.ProjectPath = strings.ReplaceAll(dirName, "-", "/")
// Robustly decode project path: remove leading hyphen, split on hyphens, join with slashes
decoded := dirName
if strings.HasPrefix(decoded, "-") {
decoded = decoded[1:]
}
if decoded != "" {
segments := strings.Split(decoded, "-")
session.ProjectPath = strings.Join(segments, string(os.PathSeparator))
}

Copilot uses AI. Check for mistakes.
Comment thread .claude/skills/add-go-project/SKILL.md Outdated
Comment on lines +265 to +267
module = "{full-module-path}" # e.g., "github.com/neongreen/mono/my-tool"
```

Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The examples show inconsistent module path formats. Looking at go-projects.toml, some projects use short names (e.g., 'aihook', 'conf') while others use full paths (e.g., 'github.com/neongreen/mono/tk'). The documentation should clarify when to use which format, as this inconsistency could confuse users setting up new projects.

Suggested change
module = "{full-module-path}" # e.g., "github.com/neongreen/mono/my-tool"
```
module = "{full-module-path}" # REQUIRED: Always use the full module path, e.g., "github.com/neongreen/mono/my-tool"

Important: Always use the full module path for the module field (e.g., github.com/neongreen/mono/my-tool). Do not use short names (e.g., aihook, conf). This ensures proper integration with monorepo tooling.

Copilot uses AI. Check for mistakes.

### Add to go-projects.toml (CRITICAL)

**This step is REQUIRED for the project to be properly integrated into the monorepo.**
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states that adding to go-projects.toml is REQUIRED (line 258), but then marks the Dagger integration as optional with 'See @DAGGER.md' (line 277), and release-mirror.toml as 'Optional' (line 286). However, the PR adds ingest-claude-code but doesn't include updates to go-projects.toml or dagger.json. This creates confusion about which steps are truly required. The documentation should clearly differentiate between must-have integration steps and optional ones, or the PR should include these required files.

Copilot uses AI. Check for mistakes.
Suggest these dependencies based on project needs:

- **CLI framework**: `github.com/spf13/cobra`
- **Version command**: `github.com/neongreen/mono/lib/version`
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation references lib/version as a common dependency, but this library doesn't appear in the go-projects.toml file provided in the context. Either the documentation should be updated to only reference libraries that exist in the repository, or the missing library should be added.

Copilot uses AI. Check for mistakes.
Comment thread ingest-claude-code/sessions.go Outdated

scanner := bufio.NewScanner(file)
// Increase buffer size for large lines (trace files can have very long lines)
const maxCapacity = 10 * 1024 * 1024 // 10MB
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buffer size constant is duplicated across sessions.go (line 110) and messages.go (line 85). This should be extracted to a shared constant or configuration to maintain consistency and ease future updates.

Copilot uses AI. Check for mistakes.
@neongreen
Copy link
Copy Markdown
Owner Author

@copilot fix everything you reported

Copy link
Copy Markdown
Contributor

Copilot AI commented Nov 30, 2025

@neongreen I've opened a new pull request, #330, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 3 commits November 30, 2025 08:50
Co-authored-by: neongreen <1523306+neongreen@users.noreply.github.com>
Co-authored-by: neongreen <1523306+neongreen@users.noreply.github.com>
Fix ingest-claude-code path decoding and deduplicate constants
@neongreen neongreen requested a review from Copilot November 30, 2025 09:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +115 to +198
func extractMessage(raw map[string]interface{}) *Message {
msg := &Message{}

// Extract session ID
if sid, ok := raw["sessionId"].(string); ok {
msg.SessionID = sid
}

// Extract message UUID
if uuid, ok := raw["uuid"].(string); ok {
msg.MessageID = uuid
}

// Extract parent UUID
if parentUUID, ok := raw["parentUuid"].(string); ok {
msg.ParentID = parentUUID
}

// Extract timestamp
if ts, ok := raw["timestamp"].(string); ok {
msg.Timestamp = ts
}

// Extract type
if typ, ok := raw["type"].(string); ok {
msg.Type = typ
}

// Extract version
if version, ok := raw["version"].(string); ok {
msg.Version = version
}

// Extract git branch
if branch, ok := raw["gitBranch"].(string); ok {
msg.GitBranch = branch
}

// Extract CWD
if cwd, ok := raw["cwd"].(string); ok {
msg.CWD = cwd
}

// Extract request ID
if reqID, ok := raw["requestId"].(string); ok {
msg.RequestID = reqID
}

// Extract message content based on type
if msgData, ok := raw["message"].(map[string]interface{}); ok {
// Extract model for assistant messages
if model, ok := msgData["model"].(string); ok {
msg.Model = model
}

// Extract content
if content, ok := msgData["content"]; ok {
msg.Content = content

// Check if this is a tool result
if contentArr, ok := content.([]interface{}); ok && len(contentArr) > 0 {
if firstBlock, ok := contentArr[0].(map[string]interface{}); ok {
if toolUseID, ok := firstBlock["tool_use_id"].(string); ok {
msg.ToolUseID = toolUseID
msg.Type = "tool_result"
if contentStr, ok := firstBlock["content"].(string); ok {
msg.Content = contentStr
}
if isError, ok := firstBlock["is_error"].(bool); ok {
msg.IsError = isError
}
}
}
}
}
}

// Skip summary and other non-message types unless they're special
if msg.Type != "user" && msg.Type != "assistant" && msg.Type != "tool_result" {
return nil
}

return msg
}
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This project lacks test coverage. Similar projects in the repository (e.g., aihook, ingest) include comprehensive tests. Consider adding tests for message extraction logic, especially for the complex content parsing (lines 163-190) that handles different message types and tool results.

Copilot uses AI. Check for mistakes.
Comment thread go-projects.toml
Comment on lines 21 to 23
[[project]]
dir = "lib/cli"
module = "github.com/neongreen/mono/lib/cli"
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the skill documentation added in this PR (.claude/skills/add-go-project/SKILL.md and DAGGER.md), new Go projects should include Dagger integration. However, the ingest-claude-code project is missing:

  1. An entry in dagger.json include list (should add "ingest-claude-code/**")
  2. A Dagger project file at .dagger/project_ingestclaudecode.go

Consider adding these files to complete the project integration.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +5
package main

// maxScannerCapacity is the buffer size for scanning large lines in trace files.
// Claude trace files can have very long lines, so we use a 10MB buffer.
const maxScannerCapacity = 10 * 1024 * 1024 // 10MB
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project is missing an AGENTS.md file. According to the skill documentation added in this PR (.claude/skills/add-go-project/SKILL.md) and consistent with other CLI tools in the repo (e.g., aihook, ingest, jj-run, want), CLI tools should include an AGENTS.md file with development guidelines and project structure information.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +38
package main

import (
"fmt"
"os"

"github.com/spf13/cobra"
)

var rootCmd = &cobra.Command{
Use: "ingest-claude-code",
Short: "Ingest Claude Code traces to JSONL",
Long: `ingest-claude-code extracts sessions and messages from Claude Code conversation traces and outputs them as JSONL.`,
}

var sessionsCmd = &cobra.Command{
Use: "sessions",
Short: "Output sessions as JSONL",
Long: `Extract session metadata from all Claude Code trace files and output one JSONL line per session.`,
Run: runSessions,
}

var messagesCmd = &cobra.Command{
Use: "messages",
Short: "Output messages as JSONL",
Long: `Extract all messages from Claude Code trace files and output one JSONL line per message.`,
Run: runMessages,
}

func main() {
rootCmd.AddCommand(sessionsCmd)
rootCmd.AddCommand(messagesCmd)

if err := rootCmd.Execute(); err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(1)
}
}
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project is missing a .gitignore file. According to the skill documentation added in this PR (.claude/skills/add-go-project/SKILL.md), CLI tools should include a .gitignore file to exclude binaries and test artifacts. Other CLI tools in the repo (e.g., aihook, ingest, jj-run, want, printpdf) follow this pattern.

Copilot uses AI. Check for mistakes.
Comment thread dev/README.md

This ensures:
1. The script can be run from anywhere
2. `go run` finds the tool's `go.mod`
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statement "go run finds the tool's go.mod" is misleading. According to the skill documentation in this PR (.claude/skills/add-go-project/SKILL.md line 47), this is a single-module monorepo where all projects share the root go.mod. Individual projects don't have their own go.mod files. The -C flag is used to change to the tool's directory for resolving imports, but it uses the root go.mod.

Suggested change
2. `go run` finds the tool's `go.mod`
2. `go run` uses the root `go.mod` for module resolution

Copilot uses AI. Check for mistakes.
Comment on lines +127 to +134
session.MessageCount++

// Parse the line to extract metadata
var raw map[string]interface{}
if err := json.Unmarshal([]byte(line), &raw); err != nil {
continue
}

Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Message count is incremented before JSON parsing, which means lines that fail JSON unmarshalling (line 131) are still counted. This creates an inaccurate message count. Consider incrementing the counter only after successful JSON parsing, or at least after validating the line is valid JSON.

Suggested change
session.MessageCount++
// Parse the line to extract metadata
var raw map[string]interface{}
if err := json.Unmarshal([]byte(line), &raw); err != nil {
continue
}
// Parse the line to extract metadata
var raw map[string]interface{}
if err := json.Unmarshal([]byte(line), &raw); err != nil {
continue
}
session.MessageCount++

Copilot uses AI. Check for mistakes.
Comment on lines +24 to +77
func runSessions(cmd *cobra.Command, args []string) {
homeDir, err := os.UserHomeDir()
if err != nil {
fmt.Fprintf(os.Stderr, "Error getting home directory: %v\n", err)
os.Exit(1)
}

// Hardcoded trace locations
paths := []string{
filepath.Join(homeDir, ".claude", "projects"),
filepath.Join(homeDir, ".claude", "debug"),
}

for _, basePath := range paths {
if _, err := os.Stat(basePath); os.IsNotExist(err) {
continue
}

err := filepath.Walk(basePath, func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}

if info.IsDir() {
return nil
}

// Only process .jsonl files
if !strings.HasSuffix(path, ".jsonl") {
return nil
}

session, err := extractSession(path, info)
if err != nil {
fmt.Fprintf(os.Stderr, "Warning: failed to extract session from %s: %v\n", path, err)
return nil
}

// Output as JSONL
data, err := json.Marshal(session)
if err != nil {
fmt.Fprintf(os.Stderr, "Warning: failed to marshal session: %v\n", err)
return nil
}
fmt.Println(string(data))

return nil
})

if err != nil {
fmt.Fprintf(os.Stderr, "Error walking path %s: %v\n", basePath, err)
os.Exit(1)
}
}
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The hardcoded trace location paths and the directory walking logic are duplicated between runSessions and runMessages. Consider extracting this into a shared helper function that accepts a callback, which would improve maintainability and reduce code duplication.

Copilot uses AI. Check for mistakes.
Comment on lines +80 to +157
func extractSession(path string, info os.FileInfo) (*Session, error) {
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()

session := &Session{
FilePath: path,
MessageCount: 0,
ModTime: info.ModTime(),
}

// Parse project path from file path
// e.g., ~/.claude/projects/-Users-artyom-code-neongreen-mono/...
// The format encodes the project path with a leading hyphen and hyphens replacing path separators
if strings.Contains(path, ".claude/projects/") {
parts := strings.Split(path, ".claude/projects/")
if len(parts) >= 2 {
subPath := parts[1]
// Extract directory name which is the project path
dirName := filepath.Dir(subPath)
if dirName != "." {
// Robustly decode project path: remove leading hyphen, split on hyphens, join with slashes
decoded := dirName
if strings.HasPrefix(decoded, "-") {
decoded = decoded[1:]
}
if decoded != "" {
segments := strings.Split(decoded, "-")
session.ProjectPath = filepath.Join(segments...)
}
}
}
}

scanner := bufio.NewScanner(file)
// Increase buffer size for large lines (trace files can have very long lines)
buf := make([]byte, maxScannerCapacity)
scanner.Buffer(buf, maxScannerCapacity)

for scanner.Scan() {
line := scanner.Text()
if strings.TrimSpace(line) == "" {
continue
}

session.MessageCount++

// Parse the line to extract metadata
var raw map[string]interface{}
if err := json.Unmarshal([]byte(line), &raw); err != nil {
continue
}

// Extract session ID if we don't have it yet
if session.SessionID == "" {
if sid, ok := raw["sessionId"].(string); ok {
session.SessionID = sid
}
}

// Extract summary if we don't have it yet and this is a summary type
if session.Summary == "" {
if typ, ok := raw["type"].(string); ok && typ == "summary" {
if summary, ok := raw["summary"].(string); ok {
session.Summary = summary
}
}
}
}

if err := scanner.Err(); err != nil {
return nil, err
}

return session, nil
}
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This project lacks test coverage. Similar projects in the repository (e.g., aihook with hook_response_test.go and integration_test.go, ingest with main_test.go) include comprehensive tests. Consider adding tests for the session extraction logic, especially for edge cases like malformed JSON, missing fields, and the project path decoding logic (lines 96-114).

Copilot uses AI. Check for mistakes.
@neongreen neongreen merged commit 63a3eeb into main Nov 30, 2025
2 of 6 checks passed
@neongreen neongreen deleted the emily/jj-qmpypprwrrtl branch November 30, 2025 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants