Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
🧪 CI InsightsHere's what we observed from your CI run for 8dae92e. 🟢 All jobs passed!But CI Insights is watching 👀 |
There was a problem hiding this comment.
Pull request overview
This PR adds the ingest-claude-code tool and comprehensive skills documentation for creating new Go projects in the monorepo. The ingest-claude-code tool extracts sessions and messages from Claude Code conversation traces and outputs them as JSONL for further analysis.
Key changes:
- New CLI tool
ingest-claude-codewithsessionsandmessagescommands for extracting Claude Code trace data - Comprehensive skill documentation (
add-go-project) with detailed guidelines for scaffolding new Go projects - Development script for running the tool without building
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| ingest-claude-code/main.go | CLI entry point with Cobra commands for sessions and messages |
| ingest-claude-code/sessions.go | Extracts session metadata from Claude Code trace files |
| ingest-claude-code/messages.go | Extracts individual messages from trace files |
| ingest-claude-code/README.md | User documentation with usage examples and output format specifications |
| dev/ingest-claude-code | Development wrapper script for running the tool with go run |
| dev/README.md | Documentation explaining the purpose and pattern of dev scripts |
| .claude/skills/add-go-project/SKILL.md | Comprehensive guide for creating new Go projects in the monorepo |
| .claude/skills/add-go-project/DAGGER.md | Dagger-specific integration instructions for new projects |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Convert back to path format | ||
| session.ProjectPath = strings.ReplaceAll(dirName, "-", "/") |
There was a problem hiding this comment.
The project path conversion logic is incorrect. It naively replaces all hyphens with slashes, which will incorrectly transform legitimate hyphens in directory names (e.g., 'my-project' becomes 'my/project'). The correct approach is to only replace the leading dash and convert hyphen-separated path segments, not all hyphens. Consider parsing the encoded format more carefully or using a more robust decoding strategy.
| // Convert back to path format | |
| session.ProjectPath = strings.ReplaceAll(dirName, "-", "/") | |
| // Robustly decode project path: remove leading hyphen, split on hyphens, join with slashes | |
| decoded := dirName | |
| if strings.HasPrefix(decoded, "-") { | |
| decoded = decoded[1:] | |
| } | |
| if decoded != "" { | |
| segments := strings.Split(decoded, "-") | |
| session.ProjectPath = strings.Join(segments, string(os.PathSeparator)) | |
| } |
| module = "{full-module-path}" # e.g., "github.com/neongreen/mono/my-tool" | ||
| ``` | ||
|
|
There was a problem hiding this comment.
The examples show inconsistent module path formats. Looking at go-projects.toml, some projects use short names (e.g., 'aihook', 'conf') while others use full paths (e.g., 'github.com/neongreen/mono/tk'). The documentation should clarify when to use which format, as this inconsistency could confuse users setting up new projects.
| module = "{full-module-path}" # e.g., "github.com/neongreen/mono/my-tool" | |
| ``` | |
| module = "{full-module-path}" # REQUIRED: Always use the full module path, e.g., "github.com/neongreen/mono/my-tool" |
Important: Always use the full module path for the module field (e.g., github.com/neongreen/mono/my-tool). Do not use short names (e.g., aihook, conf). This ensures proper integration with monorepo tooling.
|
|
||
| ### Add to go-projects.toml (CRITICAL) | ||
|
|
||
| **This step is REQUIRED for the project to be properly integrated into the monorepo.** |
There was a problem hiding this comment.
The documentation states that adding to go-projects.toml is REQUIRED (line 258), but then marks the Dagger integration as optional with 'See @DAGGER.md' (line 277), and release-mirror.toml as 'Optional' (line 286). However, the PR adds ingest-claude-code but doesn't include updates to go-projects.toml or dagger.json. This creates confusion about which steps are truly required. The documentation should clearly differentiate between must-have integration steps and optional ones, or the PR should include these required files.
| Suggest these dependencies based on project needs: | ||
|
|
||
| - **CLI framework**: `github.com/spf13/cobra` | ||
| - **Version command**: `github.com/neongreen/mono/lib/version` |
There was a problem hiding this comment.
The documentation references lib/version as a common dependency, but this library doesn't appear in the go-projects.toml file provided in the context. Either the documentation should be updated to only reference libraries that exist in the repository, or the missing library should be added.
|
|
||
| scanner := bufio.NewScanner(file) | ||
| // Increase buffer size for large lines (trace files can have very long lines) | ||
| const maxCapacity = 10 * 1024 * 1024 // 10MB |
There was a problem hiding this comment.
The buffer size constant is duplicated across sessions.go (line 110) and messages.go (line 85). This should be extracted to a shared constant or configuration to maintain consistency and ease future updates.
|
@copilot fix everything you reported |
|
@neongreen I've opened a new pull request, #330, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: neongreen <1523306+neongreen@users.noreply.github.com>
Co-authored-by: neongreen <1523306+neongreen@users.noreply.github.com>
Fix ingest-claude-code path decoding and deduplicate constants
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func extractMessage(raw map[string]interface{}) *Message { | ||
| msg := &Message{} | ||
|
|
||
| // Extract session ID | ||
| if sid, ok := raw["sessionId"].(string); ok { | ||
| msg.SessionID = sid | ||
| } | ||
|
|
||
| // Extract message UUID | ||
| if uuid, ok := raw["uuid"].(string); ok { | ||
| msg.MessageID = uuid | ||
| } | ||
|
|
||
| // Extract parent UUID | ||
| if parentUUID, ok := raw["parentUuid"].(string); ok { | ||
| msg.ParentID = parentUUID | ||
| } | ||
|
|
||
| // Extract timestamp | ||
| if ts, ok := raw["timestamp"].(string); ok { | ||
| msg.Timestamp = ts | ||
| } | ||
|
|
||
| // Extract type | ||
| if typ, ok := raw["type"].(string); ok { | ||
| msg.Type = typ | ||
| } | ||
|
|
||
| // Extract version | ||
| if version, ok := raw["version"].(string); ok { | ||
| msg.Version = version | ||
| } | ||
|
|
||
| // Extract git branch | ||
| if branch, ok := raw["gitBranch"].(string); ok { | ||
| msg.GitBranch = branch | ||
| } | ||
|
|
||
| // Extract CWD | ||
| if cwd, ok := raw["cwd"].(string); ok { | ||
| msg.CWD = cwd | ||
| } | ||
|
|
||
| // Extract request ID | ||
| if reqID, ok := raw["requestId"].(string); ok { | ||
| msg.RequestID = reqID | ||
| } | ||
|
|
||
| // Extract message content based on type | ||
| if msgData, ok := raw["message"].(map[string]interface{}); ok { | ||
| // Extract model for assistant messages | ||
| if model, ok := msgData["model"].(string); ok { | ||
| msg.Model = model | ||
| } | ||
|
|
||
| // Extract content | ||
| if content, ok := msgData["content"]; ok { | ||
| msg.Content = content | ||
|
|
||
| // Check if this is a tool result | ||
| if contentArr, ok := content.([]interface{}); ok && len(contentArr) > 0 { | ||
| if firstBlock, ok := contentArr[0].(map[string]interface{}); ok { | ||
| if toolUseID, ok := firstBlock["tool_use_id"].(string); ok { | ||
| msg.ToolUseID = toolUseID | ||
| msg.Type = "tool_result" | ||
| if contentStr, ok := firstBlock["content"].(string); ok { | ||
| msg.Content = contentStr | ||
| } | ||
| if isError, ok := firstBlock["is_error"].(bool); ok { | ||
| msg.IsError = isError | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Skip summary and other non-message types unless they're special | ||
| if msg.Type != "user" && msg.Type != "assistant" && msg.Type != "tool_result" { | ||
| return nil | ||
| } | ||
|
|
||
| return msg | ||
| } |
There was a problem hiding this comment.
This project lacks test coverage. Similar projects in the repository (e.g., aihook, ingest) include comprehensive tests. Consider adding tests for message extraction logic, especially for the complex content parsing (lines 163-190) that handles different message types and tool results.
| [[project]] | ||
| dir = "lib/cli" | ||
| module = "github.com/neongreen/mono/lib/cli" |
There was a problem hiding this comment.
According to the skill documentation added in this PR (.claude/skills/add-go-project/SKILL.md and DAGGER.md), new Go projects should include Dagger integration. However, the ingest-claude-code project is missing:
- An entry in
dagger.jsoninclude list (should add"ingest-claude-code/**") - A Dagger project file at
.dagger/project_ingestclaudecode.go
Consider adding these files to complete the project integration.
| package main | ||
|
|
||
| // maxScannerCapacity is the buffer size for scanning large lines in trace files. | ||
| // Claude trace files can have very long lines, so we use a 10MB buffer. | ||
| const maxScannerCapacity = 10 * 1024 * 1024 // 10MB |
There was a problem hiding this comment.
The project is missing an AGENTS.md file. According to the skill documentation added in this PR (.claude/skills/add-go-project/SKILL.md) and consistent with other CLI tools in the repo (e.g., aihook, ingest, jj-run, want), CLI tools should include an AGENTS.md file with development guidelines and project structure information.
| package main | ||
|
|
||
| import ( | ||
| "fmt" | ||
| "os" | ||
|
|
||
| "github.com/spf13/cobra" | ||
| ) | ||
|
|
||
| var rootCmd = &cobra.Command{ | ||
| Use: "ingest-claude-code", | ||
| Short: "Ingest Claude Code traces to JSONL", | ||
| Long: `ingest-claude-code extracts sessions and messages from Claude Code conversation traces and outputs them as JSONL.`, | ||
| } | ||
|
|
||
| var sessionsCmd = &cobra.Command{ | ||
| Use: "sessions", | ||
| Short: "Output sessions as JSONL", | ||
| Long: `Extract session metadata from all Claude Code trace files and output one JSONL line per session.`, | ||
| Run: runSessions, | ||
| } | ||
|
|
||
| var messagesCmd = &cobra.Command{ | ||
| Use: "messages", | ||
| Short: "Output messages as JSONL", | ||
| Long: `Extract all messages from Claude Code trace files and output one JSONL line per message.`, | ||
| Run: runMessages, | ||
| } | ||
|
|
||
| func main() { | ||
| rootCmd.AddCommand(sessionsCmd) | ||
| rootCmd.AddCommand(messagesCmd) | ||
|
|
||
| if err := rootCmd.Execute(); err != nil { | ||
| fmt.Fprintln(os.Stderr, err) | ||
| os.Exit(1) | ||
| } | ||
| } |
There was a problem hiding this comment.
The project is missing a .gitignore file. According to the skill documentation added in this PR (.claude/skills/add-go-project/SKILL.md), CLI tools should include a .gitignore file to exclude binaries and test artifacts. Other CLI tools in the repo (e.g., aihook, ingest, jj-run, want, printpdf) follow this pattern.
|
|
||
| This ensures: | ||
| 1. The script can be run from anywhere | ||
| 2. `go run` finds the tool's `go.mod` |
There was a problem hiding this comment.
The statement "go run finds the tool's go.mod" is misleading. According to the skill documentation in this PR (.claude/skills/add-go-project/SKILL.md line 47), this is a single-module monorepo where all projects share the root go.mod. Individual projects don't have their own go.mod files. The -C flag is used to change to the tool's directory for resolving imports, but it uses the root go.mod.
| 2. `go run` finds the tool's `go.mod` | |
| 2. `go run` uses the root `go.mod` for module resolution |
| session.MessageCount++ | ||
|
|
||
| // Parse the line to extract metadata | ||
| var raw map[string]interface{} | ||
| if err := json.Unmarshal([]byte(line), &raw); err != nil { | ||
| continue | ||
| } | ||
|
|
There was a problem hiding this comment.
Message count is incremented before JSON parsing, which means lines that fail JSON unmarshalling (line 131) are still counted. This creates an inaccurate message count. Consider incrementing the counter only after successful JSON parsing, or at least after validating the line is valid JSON.
| session.MessageCount++ | |
| // Parse the line to extract metadata | |
| var raw map[string]interface{} | |
| if err := json.Unmarshal([]byte(line), &raw); err != nil { | |
| continue | |
| } | |
| // Parse the line to extract metadata | |
| var raw map[string]interface{} | |
| if err := json.Unmarshal([]byte(line), &raw); err != nil { | |
| continue | |
| } | |
| session.MessageCount++ |
| func runSessions(cmd *cobra.Command, args []string) { | ||
| homeDir, err := os.UserHomeDir() | ||
| if err != nil { | ||
| fmt.Fprintf(os.Stderr, "Error getting home directory: %v\n", err) | ||
| os.Exit(1) | ||
| } | ||
|
|
||
| // Hardcoded trace locations | ||
| paths := []string{ | ||
| filepath.Join(homeDir, ".claude", "projects"), | ||
| filepath.Join(homeDir, ".claude", "debug"), | ||
| } | ||
|
|
||
| for _, basePath := range paths { | ||
| if _, err := os.Stat(basePath); os.IsNotExist(err) { | ||
| continue | ||
| } | ||
|
|
||
| err := filepath.Walk(basePath, func(path string, info os.FileInfo, err error) error { | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| if info.IsDir() { | ||
| return nil | ||
| } | ||
|
|
||
| // Only process .jsonl files | ||
| if !strings.HasSuffix(path, ".jsonl") { | ||
| return nil | ||
| } | ||
|
|
||
| session, err := extractSession(path, info) | ||
| if err != nil { | ||
| fmt.Fprintf(os.Stderr, "Warning: failed to extract session from %s: %v\n", path, err) | ||
| return nil | ||
| } | ||
|
|
||
| // Output as JSONL | ||
| data, err := json.Marshal(session) | ||
| if err != nil { | ||
| fmt.Fprintf(os.Stderr, "Warning: failed to marshal session: %v\n", err) | ||
| return nil | ||
| } | ||
| fmt.Println(string(data)) | ||
|
|
||
| return nil | ||
| }) | ||
|
|
||
| if err != nil { | ||
| fmt.Fprintf(os.Stderr, "Error walking path %s: %v\n", basePath, err) | ||
| os.Exit(1) | ||
| } | ||
| } |
There was a problem hiding this comment.
[nitpick] The hardcoded trace location paths and the directory walking logic are duplicated between runSessions and runMessages. Consider extracting this into a shared helper function that accepts a callback, which would improve maintainability and reduce code duplication.
| func extractSession(path string, info os.FileInfo) (*Session, error) { | ||
| file, err := os.Open(path) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer file.Close() | ||
|
|
||
| session := &Session{ | ||
| FilePath: path, | ||
| MessageCount: 0, | ||
| ModTime: info.ModTime(), | ||
| } | ||
|
|
||
| // Parse project path from file path | ||
| // e.g., ~/.claude/projects/-Users-artyom-code-neongreen-mono/... | ||
| // The format encodes the project path with a leading hyphen and hyphens replacing path separators | ||
| if strings.Contains(path, ".claude/projects/") { | ||
| parts := strings.Split(path, ".claude/projects/") | ||
| if len(parts) >= 2 { | ||
| subPath := parts[1] | ||
| // Extract directory name which is the project path | ||
| dirName := filepath.Dir(subPath) | ||
| if dirName != "." { | ||
| // Robustly decode project path: remove leading hyphen, split on hyphens, join with slashes | ||
| decoded := dirName | ||
| if strings.HasPrefix(decoded, "-") { | ||
| decoded = decoded[1:] | ||
| } | ||
| if decoded != "" { | ||
| segments := strings.Split(decoded, "-") | ||
| session.ProjectPath = filepath.Join(segments...) | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| scanner := bufio.NewScanner(file) | ||
| // Increase buffer size for large lines (trace files can have very long lines) | ||
| buf := make([]byte, maxScannerCapacity) | ||
| scanner.Buffer(buf, maxScannerCapacity) | ||
|
|
||
| for scanner.Scan() { | ||
| line := scanner.Text() | ||
| if strings.TrimSpace(line) == "" { | ||
| continue | ||
| } | ||
|
|
||
| session.MessageCount++ | ||
|
|
||
| // Parse the line to extract metadata | ||
| var raw map[string]interface{} | ||
| if err := json.Unmarshal([]byte(line), &raw); err != nil { | ||
| continue | ||
| } | ||
|
|
||
| // Extract session ID if we don't have it yet | ||
| if session.SessionID == "" { | ||
| if sid, ok := raw["sessionId"].(string); ok { | ||
| session.SessionID = sid | ||
| } | ||
| } | ||
|
|
||
| // Extract summary if we don't have it yet and this is a summary type | ||
| if session.Summary == "" { | ||
| if typ, ok := raw["type"].(string); ok && typ == "summary" { | ||
| if summary, ok := raw["summary"].(string); ok { | ||
| session.Summary = summary | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if err := scanner.Err(); err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| return session, nil | ||
| } |
There was a problem hiding this comment.
This project lacks test coverage. Similar projects in the repository (e.g., aihook with hook_response_test.go and integration_test.go, ingest with main_test.go) include comprehensive tests. Consider adding tests for the session extraction logic, especially for edge cases like malformed JSON, missing fields, and the project path decoding logic (lines 96-114).
No description provided.