feat(ts-parser): add pure-Go TypeScript/TSX parser using tree-sitter WASM #334

Draft
Copilot wants to merge 4 commits into main from copilot/implement-go-ts-parser

Conversation

Contributor

Copilot AI commented Dec 4, 2025

  • Create lib/ts-parser directory structure
  • Build and embed WASM files for typescript and tsx grammars
  • Implement tree-sitter WASM loading via wazero
  • Create Tree and Node types with byte range access
  • Implement ParseTS and ParseTSX functions
  • Implement JSDoc extraction with attached/container declaration detection
  • Create constraints_test.go to enforce forbidden items
  • Add functional tests for TS and TSX parsing with JSDoc extraction
  • Create README.md with usage examples
  • Create WASM_BUILD.md with build instructions
  • Update main README.md with library entry
  • Run code review and address feedback
  • Run CodeQL security check - no issues found
  • Combine TypeScript and TSX into single WASM file (2.9MB vs 3MB)
Original prompt

you are implementing a go library that parses typescript and tsx using tree-sitter grammars compiled to wasm and executed via wazero

goal
• provide pure-go parsing of ts and tsx source into a tree with byte ranges
• provide jsdoc extraction on declarations in the same file, returning which declaration the jsdoc attaches to and which container declaration the jsdoc is inside

hard constraints that must hold, no exceptions
• the final library must build and test with CGO_ENABLED=0
• forbidden anywhere in the repo
  • import "C"
  • any #cgo directive
  • any .c, .h, .cc, .cpp, .m, .mm files
  • any .a, .so, .dylib, .dll artifacts
  • any use of os/exec
  • any runtime invocation of external tools, including node, tree-sitter cli, compilers, make, npm, cargo, deno
  • any dependency that wraps the tree-sitter c runtime, including github.com/smacker/go-tree-sitter and github.com/tree-sitter/go-tree-sitter
• required at runtime
  • use github.com/tetratelabs/wazero to execute wasm
  • no network access assumptions
  • no requirement for end users to run go generate or install tree-sitter cli, node, or a c compiler
  • the repo must include prebuilt wasm blobs for ts and tsx grammars and embed them via go:embed

deliverables
• go module implementing
  • ParseTS(src []byte) (Tree, error)
  • ParseTSX(src []byte) (Tree, error)
  • Tree and Node types that expose
    • root node access
    • child iteration
    • node type name
    • byte ranges StartByte() and EndByte()
• helper functionality
  • ExtractJSDoc(src []byte, tree Tree) ([]JSDoc, error)
  • type JSDoc struct { CommentStart int; CommentEnd int; AttachedDecl NodeRef; ContainerDecl NodeRef }
  • NodeRef is a stable reference you define, at minimum containing file-relative byte range plus a display name string where available
• embedded wasm assets
  • internal/wasm/typescript.wasm
  • internal/wasm/tsx.wasm
  • both embedded with go:embed and loaded into wazero at runtime
• docs
  • README.md with a short example showing parsing and jsdoc extraction
  • WASM_BUILD.md documenting exactly how the two wasm files were produced, including pinned versions and commands
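
For orientation, the Tree and Node surface listed above might look roughly like the sketch below. Only StartByte() and EndByte() are named in the prompt; Type, ChildCount, Child, RootNode, and the in-memory memNode stand-in are hypothetical names chosen so the sketch runs without a real parser. The node type names in sampleTree are illustrative and not taken from the PR.

```go
package main

import "fmt"

// Node mirrors the accessors the prompt asks for. Only StartByte and
// EndByte are named there; the other method names are assumptions.
type Node interface {
	Type() string
	StartByte() int
	EndByte() int
	ChildCount() int
	Child(i int) Node
}

// Tree exposes root node access; RootNode is a hypothetical name.
type Tree interface {
	RootNode() Node
}

// memNode is an in-memory stand-in for a parsed node.
type memNode struct {
	typ        string
	start, end int
	children   []*memNode
}

func (n *memNode) Type() string     { return n.typ }
func (n *memNode) StartByte() int   { return n.start }
func (n *memNode) EndByte() int     { return n.end }
func (n *memNode) ChildCount() int  { return len(n.children) }
func (n *memNode) Child(i int) Node { return n.children[i] }

type memTree struct{ root *memNode }

func (t memTree) RootNode() Node { return t.root }

// Walk visits every node in pre-order and reports its depth.
func Walk(n Node, depth int, visit func(Node, int)) {
	visit(n, depth)
	for i := 0; i < n.ChildCount(); i++ {
		Walk(n.Child(i), depth+1, visit)
	}
}

// sampleTree hand-builds a plausible shape for "function f() {}".
func sampleTree() Tree {
	fn := &memNode{typ: "function_declaration", start: 0, end: 15, children: []*memNode{
		{typ: "identifier", start: 9, end: 10},
		{typ: "formal_parameters", start: 10, end: 12},
		{typ: "statement_block", start: 13, end: 15},
	}}
	return memTree{root: &memNode{typ: "program", start: 0, end: 15, children: []*memNode{fn}}}
}

func main() {
	src := []byte("function f() {}")
	Walk(sampleTree().RootNode(), 0, func(n Node, depth int) {
		// Byte ranges are half-open, so they slice the source directly.
		fmt.Printf("%*s%s [%d,%d) %q\n", depth*2, "", n.Type(),
			n.StartByte(), n.EndByte(), src[n.StartByte():n.EndByte()])
	})
}
```

Keeping the interface this small means a caller only needs byte offsets and type names, which is also all the jsdoc helper requires.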

definition of jsdoc extraction semantics
• jsdoc means block comments of the form /** ... */
• only declarations within the same file are in scope
• attached declaration
  • a jsdoc attaches to the nearest following declaration node that it precedes, according to the grammar parse tree, with these rules
    • ignore whitespace and non-jsdoc comments between the jsdoc and the declaration
    • allow modifiers and decorators between the jsdoc and the declaration
    • if ambiguity exists, choose the declaration that starts at the smallest byte offset greater than the jsdoc end within the same statement region
• container declaration
  • the container is the nearest enclosing declaration that has a body or block scope that contains the jsdoc comment byte range
  • containers include function-like declarations, methods, class bodies, module blocks, and the source file as the outermost container
• output must include byte ranges for both attached and container declarations, even if a name is unavailable
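
The attachment and container rules above can be sketched over plain byte ranges, without a parser. This is a simplified illustration, not the PR's implementation: the NodeRef field names are assumptions, the whole declaration range stands in for its body, "same statement region" is approximated as "inside the container", and the check that only whitespace, comments, modifiers, or decorators lie between the comment and the declaration is omitted. It also accepts a declaration that starts exactly at the comment end.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeRef carries a file-relative byte range and an optional display name.
// The field names are assumptions; the prompt only fixes the contents.
type NodeRef struct {
	Start, End int
	Name       string
}

// JSDoc matches the struct given in the prompt.
type JSDoc struct {
	CommentStart  int
	CommentEnd    int
	AttachedDecl  NodeRef
	ContainerDecl NodeRef
}

// resolve picks the smallest range enclosing the comment as the container,
// then the earliest declaration that starts at or after the comment end and
// still lies inside that container as the attached declaration.
func resolve(commentStart, commentEnd int, decls []NodeRef) (JSDoc, bool) {
	doc := JSDoc{CommentStart: commentStart, CommentEnd: commentEnd}
	foundContainer := false
	for _, d := range decls {
		if d.Start > commentStart || commentEnd > d.End {
			continue // does not enclose the comment
		}
		if !foundContainer || d.End-d.Start < doc.ContainerDecl.End-doc.ContainerDecl.Start {
			doc.ContainerDecl, foundContainer = d, true
		}
	}
	if !foundContainer {
		return doc, false
	}
	foundAttached := false
	for _, d := range decls {
		if d.Start < commentEnd || d.End > doc.ContainerDecl.End {
			continue // starts before the comment ends, or escapes the container
		}
		if !foundAttached || d.Start < doc.AttachedDecl.Start {
			doc.AttachedDecl, foundAttached = d, true
		}
	}
	return doc, foundAttached
}

// example builds hand-computed ranges for a class with one documented method.
func example() (JSDoc, bool) {
	src := "class A {\n  /** doc */\n  m() {}\n}\n"
	cs := strings.Index(src, "/**")
	ce := strings.Index(src, "*/") + len("*/")
	ms := strings.Index(src, "m()")
	decls := []NodeRef{
		{Start: 0, End: len(src)}, // source file, unnamed outermost container
		{Start: 0, End: strings.LastIndex(src, "}") + 1, Name: "A"},
		{Start: ms, End: ms + len("m() {}"), Name: "m"},
	}
	return resolve(cs, ce, decls)
}

func main() {
	doc, ok := example()
	fmt.Println(ok, doc.AttachedDecl.Name, doc.ContainerDecl.Name) // prints "true m A"
}
```

Because the source file is always in the candidate list, a top-level jsdoc resolves to the file as its container, which matches the "outermost container" rule.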

tests and enforcement inside go test ./...
• add constraints_test.go that fails the test run if any forbidden items are present
  • scan all tracked go files plus module files and fail on matches for
    • import "C"
    • #cgo
    • os/exec
    • smacker/go-tree-sitter
    • tree-sitter/go-tree-sitter
    • build tags that try to conditionalize the real implementation away, including patterns like //go:build cgo or //go:build !purego
  • also fail if forbidden file extensions exist in the repo
• add functional tests
  • parse a small ts snippet containing at least one jsdoc on a function or class and assert
    • exactly one jsdoc is found
    • its attached declaration range matches the expected declaration
    • its container declaration range matches the expected container
  • parse a small tsx snippet containing at least one jsdoc on a component function and assert similarly
  • tests must run with CGO_ENABLED=0 and must not shell out to anything
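
The core check of such a constraints test could be sketched as a pure function over a path and its contents. The function name violations and the two lookup tables are hypothetical; a real constraints_test.go would walk the repo (for example with filepath.WalkDir and os.ReadFile) and call something like this per file. Plain substring matching is a simplification: it would miss a grouped import of "C", for instance.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// forbiddenText lists the substrings the prompt bans in Go and module files.
var forbiddenText = []string{
	`import "C"`,
	"#cgo",
	"os/exec",
	"smacker/go-tree-sitter",
	"tree-sitter/go-tree-sitter",
	"//go:build cgo",
	"//go:build !purego",
}

// forbiddenExt lists the file extensions the prompt bans anywhere in the repo.
var forbiddenExt = map[string]bool{
	".c": true, ".h": true, ".cc": true, ".cpp": true, ".m": true, ".mm": true,
	".a": true, ".so": true, ".dylib": true, ".dll": true,
}

// violations reports every rule that the given file breaks.
func violations(path, content string) []string {
	var out []string
	if forbiddenExt[strings.ToLower(filepath.Ext(path))] {
		out = append(out, "forbidden extension: "+path)
	}
	base := filepath.Base(path)
	if strings.HasSuffix(path, ".go") || base == "go.mod" || base == "go.sum" {
		for _, pat := range forbiddenText {
			if strings.Contains(content, pat) {
				out = append(out, fmt.Sprintf("%s contains %q", path, pat))
			}
		}
	}
	return out
}

func main() {
	fmt.Println(violations("main.go", "package main\n\nimport \"os/exec\"\n"))
	fmt.Println(violations("lib/shim.c", ""))
	fmt.Println(len(violations("ok.go", "package ok\n")))
}
```

One practical wrinkle: a test file that holds these patterns as literals would flag itself, so it would need to skip its own path or assemble the patterns from fragments.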

implementation requirements
• implement wasm loading and execution via wazero in-process
• do not depend on any preinstalled tree-sitter runtime
• do not call external binaries in library code or tests
• keep the public api small and stable

WASM_BUILD.md requirements
• include exact source repositories and pinned commits for
  • tree-sitter cli
  • tree-sitter-typescript grammar repository
• document prerequisites for maintainers only, not end users
  • tree-sitter cli installed and pinned
  • wasi-sdk available and referenced by TREE_SITTER_WASI_SDK_PATH
• include exact commands used to build the wasm artifacts using tree-sitter build --wasm
  • build the typescript grammar wasm from the typescript grammar directory
  • build the tsx grammar wasm from the tsx grammar directory
• include where the produced wasm files are copied in this repo and their expected filenames
• include a short verification step sa...



Copilot AI self-assigned this Dec 4, 2025
Copilot AI and others added 2 commits December 4, 2025 23:50
…-sitter WASM

Co-authored-by: neongreen <1523306+neongreen@users.noreply.github.com>
…memory leak

Co-authored-by: neongreen <1523306+neongreen@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement Go library to parse TypeScript and TSX with tree-sitter" to "feat(ts-parser): add pure-Go TypeScript/TSX parser using tree-sitter WASM" Dec 4, 2025
Copilot AI requested a review from neongreen December 5, 2025 00:00
@neongreen
Owner

@copilot Can we do a single wasm file with both languages? Only one copy of the runtime is needed then

Co-authored-by: neongreen <1523306+neongreen@users.noreply.github.com>
Contributor Author

Copilot AI commented Dec 5, 2025

@copilot Can we do a single wasm file with both languages? Only one copy of the runtime is needed then

Done in 89d1779. Combined both grammars into a single parser.wasm file (2.9MB vs 3MB for two separate files). The tree-sitter runtime is now only included once.

@neongreen
Owner

@copilot Something seems wrong? Is the runtime only 100KB? Otherwise I would've expected a more significant reduction

Contributor Author

Copilot AI commented Dec 5, 2025

@copilot Something seems wrong? Is the runtime only 100KB? Otherwise I would've expected a more significant reduction

The runtime is indeed only ~100KB compiled. The tree-sitter runtime source is ~430KB which compiles to roughly 100KB with -Oz optimization. The bulk of each WASM file is the grammar itself - the TypeScript and TSX parser.c files are each 8.4MB of generated parsing tables. So the 100KB reduction (3.0MB → 2.9MB) is the expected savings from eliminating the duplicate runtime.
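
The size reasoning can be checked with a little arithmetic. The sketch below assumes the rounded totals quoted in this thread (3.0MB for the two separate files together, 2.9MB combined) and that each separate file carried one full copy of the runtime; the function name is hypothetical.

```go
package main

import "fmt"

// runtimeSizeKB infers the size of the shared runtime from the two totals:
//
//	separate = 2*runtime + grammars
//	combined =   runtime + grammars
//
// so separate - combined = runtime.
func runtimeSizeKB(separateKB, combinedKB int) int {
	return separateKB - combinedKB
}

func main() {
	// Rounded figures from the thread, expressed in KB.
	runtime := runtimeSizeKB(3000, 2900)
	fmt.Println(runtime, "KB") // prints "100 KB"
	// Everything else in the combined file is grammar data.
	fmt.Println(2900-runtime, "KB") // prints "2800 KB"
}
```

On these figures the grammar tables account for almost all of the file, so deduplicating the runtime can only save a few percent.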

@mergify

mergify bot commented Dec 5, 2025

🧪 CI Insights

Here's what we observed from your CI run for 89d1779.

🟢 All jobs passed!



2 participants