
Add wasm-bindgen support #23493

Open
walkingeyerobot wants to merge 90 commits into emscripten-core:main from walkingeyerobot:wbg-walkingeyerobot

@walkingeyerobot
Collaborator

@walkingeyerobot walkingeyerobot commented Jan 24, 2025

This started as an early draft PR for gathering feedback and is now ready for review. There are also pending changes to wasm-bindgen.

How this works:

  1. Cargo builds Rust code targeting wasm32-unknown-emscripten into a .a file (staticlib). This .a file includes some annotations needed by wasm-bindgen later.
  2. Emscripten is invoked with any C++ sources and the Rust .a file.
  3. Emscripten builds C++ sources and then calls out to wasm-ld to link the C++ and Rust together into a single .wasm file.
  4. wasm-bindgen is run on that .wasm file, removing the annotations needed by wasm-bindgen and producing a new .wasm file, a library.js file, and a pre.js file.
  5. Emscripten constructs its own .js, integrating the wasm-bindgen .js files.
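
The five steps above can be sketched as the command lines a build script might assemble. This is a minimal sketch, not the PR's implementation: the file names (`main.cpp`, `libdemo.a`, `demo.js`) and the `build_commands` helper are hypothetical, and only the `wasm32-unknown-emscripten` target and `-sWASM_BINDGEN` come from the description. Steps 3 to 5 happen inside the single `emcc` invocation, so no separate wasm-bindgen command appears.

```python
def build_commands(cpp_sources, rust_staticlib, output_js):
    """Return the two command lines a build script would run (nothing is executed)."""
    # Step 1: Cargo compiles the Rust crate (built as a staticlib) into a .a file.
    cargo_cmd = ["cargo", "build", "--release", "--target", "wasm32-unknown-emscripten"]
    # Steps 2-5: emcc compiles the C++ sources, links them with the Rust archive,
    # and, with -sWASM_BINDGEN (the setting this PR proposes), runs wasm-bindgen
    # on the linked .wasm before emitting the final JS.
    emcc_cmd = ["emcc", *cpp_sources, rust_staticlib, "-sWASM_BINDGEN", "-o", output_js]
    return cargo_cmd, emcc_cmd


# Hypothetical file names, for illustration only.
cargo_cmd, emcc_cmd = build_commands(["main.cpp"], "libdemo.a", "demo.js")
print(" ".join(cargo_cmd))
print(" ".join(emcc_cmd))
```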

You can see a demo more easily at https://github.com/walkingeyerobot/cxx-rust-demo.

Some TODOs:

  1. Figure out how to pass the exported symbols from the Rust compiler to Emscripten. These symbols need to be passed to wasm-ld so they're not removed from the final .wasm, but they may not be present after wasm-bindgen processes the .wasm. wasm-bindgen at compile time puts the information it needs to generate JS inside the .wasm file itself in the form of _describe functions. These functions are then removed after JS generation.
  2. Merge the .js files produced by wasm-bindgen. This shouldn't be that hard; I just haven't gotten around to it yet. This would simplify the code for both Emscripten and wasm-bindgen.
  3. Get wasm-bindgen tests to pass. Early efforts here have revealed some very odd compiler differences between -unknown and -emscripten that I'll have to fix.
  4. Have this work end-to-end via wasm-pack. I had planned a draft PR for this, but my work there didn't pan out. There is a new PR for it: "feat: support wasm32-unknown-emscripten target" (wasm-bindgen/wasm-pack#1583).

I'm mostly looking for feedback on the first point about exported symbols and about the general addition of -sWASM_BINDGEN to Emscripten. Again, this is very early, but it's a pretty big feature, so I thought it best to start discussions now.

cc @daxpedda @guybedford @RReverser, who I've been working with on the wasm-bindgen side.

(updated May 18 2026 to be more accurate as to the current state of things)

@kripken
Member

kripken commented Jan 24, 2025

> wasm-bindgen at compile time puts the information it needs to generate JS inside the .wasm file itself in the form of _describe functions.

Does rustc then read the wasm to find those function names, and pass those names to wasm-ld? (if not, how does it find those names?)

In general if we need to read metadata-type info from the wasm, then we have a minimal parser in tools/webassembly.py. If we need something more complex, a binaryen pass is an option.
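
To give a sense of what reading names out of a .wasm involves, here is a minimal sketch that lists the names in a module's export section. It is not the code in tools/webassembly.py; the helper names (`read_leb128`, `export_names`) and the hand-built demo module are assumptions for illustration, and the parser does no validation.

```python
def read_leb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def export_names(wasm_bytes):
    """Return the names listed in the export section (section id 7)."""
    if wasm_bytes[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    names = []
    while pos < len(wasm_bytes):
        section_id = wasm_bytes[pos]
        size, pos = read_leb128(wasm_bytes, pos + 1)
        end = pos + size
        if section_id == 7:
            count, pos = read_leb128(wasm_bytes, pos)
            for _ in range(count):
                length, pos = read_leb128(wasm_bytes, pos)
                names.append(wasm_bytes[pos:pos + length].decode("utf-8"))
                pos += length + 1  # skip the name and the one-byte export kind
                _, pos = read_leb128(wasm_bytes, pos)  # skip the export index
        pos = end  # jump to the next section regardless of its type
    return names


# Hand-built bytes: header plus an export section with a single entry, "foo".
demo = b"\0asm\x01\0\0\0" + bytes([7, 7, 1, 3]) + b"foo" + bytes([0, 0])
print(export_names(demo))
```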

@walkingeyerobot
Collaborator Author

wasm-bindgen itself is two pieces: a library that lets you annotate your Rust code to mark things for export, and a tool that consumes a .wasm file and reads those annotations to produce a companion JS file. rustc knows about those function names because wasm-bindgen, as a library, provided the annotations. If rustc invokes the linker itself, it can pass that information along. However, because we also need to build C++, we're using rustc only to compile and not to drive the whole process, so we need it to output that information elsewhere.

One (very naive) possibility is to have rustc invoke a fake linker that just writes the -sEXPORTED_FUNCTIONS to a file for emscripten to read later.
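
A rough sketch of that fake-linker idea follows. It assumes the export list arrives on the linker command line as either `-s EXPORTED_FUNCTIONS=[...]` or `-sEXPORTED_FUNCTIONS=[...]` with a JSON array as the value; the helper names and the output file name are hypothetical.

```python
import json


def extract_exported_functions(argv):
    """Return the EXPORTED_FUNCTIONS list found in a linker command line, or []."""
    prefix = "EXPORTED_FUNCTIONS="
    for i, arg in enumerate(argv):
        if arg == "-s" and i + 1 < len(argv):
            arg = argv[i + 1]  # split form: -s EXPORTED_FUNCTIONS=[...]
        elif arg.startswith("-s"):
            arg = arg[2:]      # joined form: -sEXPORTED_FUNCTIONS=[...]
        else:
            continue
        if arg.startswith(prefix):
            return json.loads(arg[len(prefix):])
    return []


def write_exports(argv, out_path="exported_functions.json"):
    """Record the symbols instead of linking, so a later emcc step could read them."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(extract_exported_functions(argv), f)


# Example command line such a fake linker might receive (hypothetical values).
demo_argv = ["-o", "out.js", "-s", 'EXPORTED_FUNCTIONS=["_main","_rs_add"]']
print(extract_exported_functions(demo_argv))
```

The script would be installed as the linker rustc invokes, calling `write_exports(sys.argv[1:])` instead of linking anything.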

@walkingeyerobot walkingeyerobot marked this pull request as ready for review April 27, 2026 19:40
@walkingeyerobot
Collaborator Author

I believe this is ready for review! :D

@walkingeyerobot walkingeyerobot changed the title [DRAFT] add wasm-bindgen support add wasm-bindgen support Apr 29, 2026
@sbc100 sbc100 changed the title add wasm-bindgen support Add wasm-bindgen support May 15, 2026
@sbc100
Collaborator

sbc100 commented May 15, 2026

Sorry for the delay reviewing this. Just getting to it now.

Is the PR description up-to-date with current state of things?

Comment thread tools/config.py Outdated
'CACHE',
'PORTS',
'COMPILER_WRAPPER',
'WASM_BINDGEN',
Collaborator

Could we avoid the new config setting completely and just rely on wasm-bindgen being in the PATH when a user adds -sWASM_BINDGEN?

This is what we do for tsc, I believe. I'm loath to add more config settings if we can possibly avoid it.
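
A minimal sketch of the PATH-based lookup, using Python's standard `shutil.which`. The `find_tool` helper and its error message are hypothetical and not taken from Emscripten's code.

```python
import shutil


def find_tool(name):
    """Return the absolute path of `name` on PATH, or raise a readable error."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} was not found in PATH")
    return path


try:
    print(find_tool("wasm-bindgen"))
except FileNotFoundError as err:
    print(err)
```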

Collaborator Author

Done.

Comment thread tools/emscripten.py Outdated
for file_path in bindgen_tsd:
  with open(file_path, encoding='utf-8') as file:
    for line in file:
      out += f'{line}'
Collaborator

Can these 3 lines be replaced with just out += read_file(file_path)?
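
To illustrate why the suggestion is behavior-preserving, here is a small sketch comparing the line-by-line loop with a whole-file read. The `read_file` defined here is a local stand-in for the helper named above, and the demo file name and its contents are hypothetical.

```python
import os
import tempfile


def read_file(path):
    """Local stand-in for the helper named above: return a text file's contents."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def concat_by_line(paths):
    """Append each file to the output one line at a time."""
    out = ""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                out += line
    return out


def concat_whole(paths):
    """Append each file to the output in a single read."""
    out = ""
    for path in paths:
        out += read_file(path)
    return out


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.d.ts")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("export function rs_add(a: number, b: number): number;\n")
    same = concat_by_line([demo_path]) == concat_whole([demo_path])
    print(same)
```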

Collaborator Author

Done.

Comment thread tools/utils.py
"""

value: str
is_file: int
Collaborator

Perhaps putting this in cmdline.py would make more sense?

I'm still not clear why we need to move this, but trying to understand now.

Collaborator Author

Previously it was defined only in emcc.py. Moving it here lets us also use it in link.py, which lets us ensure that all entries in linker_args are LinkFlags rather than a mix of LinkFlags and strings.

Happy to move it if you'd like.
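
As a rough illustration of the benefit, the sketch below keeps every linker argument as a `LinkFlag` so downstream code never has to check for plain strings. The two fields match the snippet under review; treating the class as a dataclass, the `input_files` helper, and the sample arguments are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinkFlag:
    """A linker argument plus a marker saying whether it names an input file."""
    value: str
    is_file: int


def input_files(linker_args):
    """Return the values of the entries that are input files."""
    return [flag.value for flag in linker_args if flag.is_file]


# Hypothetical arguments: two input files and one plain flag.
linker_args = [
    LinkFlag("main.o", 1),
    LinkFlag("--no-entry", 0),
    LinkFlag("libdemo.a", 1),
]
print(input_files(linker_args))
```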

@walkingeyerobot
Collaborator Author

> Sorry for the delay reviewing this. Just getting to it now.

No worries at all!

> Is the PR description up-to-date with current state of things?

I just updated it now.

I will address your other comments later this week. Thanks again for taking a look!

gergelyvagujhelyi added a commit to nobodywho-ooo/nobodywho that referenced this pull request May 22, 2026
Now that the unpushed work in the two forks has been pushed to public
branches on nobodywho-ooo (and walkingeyerobot's emscripten has been
forked into nobodywho-ooo/emscripten for transparency), the build no
longer needs to point at local checkout paths.

Workspace Cargo.toml:
- `wasm-bindgen = { path = "/Users/user/git/wasm-bindgen" }`
  → `wasm-bindgen = { git = "https://github.com/nobodywho-ooo/wasm-bindgen",
                       branch = "emscripten-descriptor-fixes" }`
- Drop the `[patch."https://github.com/nobodywho-ooo/llama-cpp-rs"]`
  block entirely — `core/Cargo.toml` already declares
  `llama-cpp-2 / llama-cpp-sys-2 = { git = ..., branch = "wasm-emscripten" }`,
  and the branch now contains the three previously-unpushed commits
  (CMAKE_SYSTEM_PROCESSOR=wasm32, MA_NO_* + -fexceptions for mtmd).

README "Outstanding" section: replaced the two prose-only fork notes
with three explicit list items linking to the public branch URLs:
- nobodywho-ooo/llama-cpp-rs branch wasm-emscripten
- nobodywho-ooo/wasm-bindgen branch emscripten-descriptor-fixes
- nobodywho-ooo/emscripten branch wbg-walkingeyerobot
  (fork of walkingeyerobot/emscripten, which itself carries the
  -sWASM_BINDGEN flag — emscripten-core/emscripten#23493)

Cargo.lock updated to reflect the new wasm-bindgen git source pin
(commit f4fc33dc7).

Verified:
- `cargo check --workspace --exclude nobodywho-js` (after `cargo clean
  -p nobodywho -p nobodywho-python` to flush the stale incremental
  cache from the patch change): clean
- `cargo check -p nobodywho-js --target wasm32-unknown-emscripten`: clean
Comment thread test/common.py
 }
-''' % locals(),
-'a: loaded\na: b (prev: (null))\na: c (prev: b)\n', cflags=extra_args)
+''', 'a: loaded\na: b (prev: (null))\na: c (prev: b)\n', cflags=extra_args)
Collaborator

This looks like an unrelated cleanup?

Comment thread test/test_other.py
create_file('post.js', '''
Module.onRuntimeInitialized = () => {
  out(Module.rs_add(17, 25));
};
Collaborator

Use a single line function here? Module.onRuntimeInitialized = () => out(Module.rs_add(17, 25));

Comment thread test/test_other.py
'--post-js',
'post.js',
'-lexports.js',
]
Collaborator

Does this all fit on a single line? Note: you can make --post-js=post.js into a single arg.

Comment thread tools/building.py
'--print-file-name', '--quiet']
nm_args += input_files

result = run_process(nm_args, stdout=subprocess.PIPE)
Collaborator

Maybe use check_call here? Then it will show up when you run with emcc -v.

Comment thread tools/emscripten.py


-def create_tsd(metadata, embind_tsd):
+def create_tsd(metadata, embind_tsd, bindgen_tsd=None):
Collaborator

Does this need a default? (i.e. do all callsites pass this arg?)

Comment thread tools/building.py


-def link_lld(args, target, external_symbols=None):
+def link_lld(args, target, external_symbols=None, linker_inputs=None):
Collaborator

Does this need a default? (i.e. do all callsites pass this arg?)

Comment thread src/runtime_common.js
-#if SUPPORT_BIG_ENDIAN
+#if SUPPORT_BIG_ENDIAN || WASM_BINDGEN
 /** @type {!DataView} */
 var HEAP_DATA_VIEW;
Collaborator

How/why does WASM_BINDGEN use HEAP_DATA_VIEW?


5 participants