Add blake3 hashing by makew0rld · Pull Request #538 · sigstore/model-transparency

makew0rld · 2025-10-09T20:44:20Z

Closes #530

Summary

BLAKE3 is an excellent cryptographic hash algorithm, both in terms of features and performance. Adding it as an option for model hashing will greatly reduce hashing time for large files.

Our existing internal tooling already tracks files and blobs using BLAKE3, and supporting it for model manifests would make them interoperable with our tooling without requiring expensive rehashing.

See the commit message for some explanation of the design decisions.

Checklist

All commits are signed-off, using DCO
All new code has docstrings and type annotations
All new code is covered by tests. Aim for at least 90% coverage. CI is configured to highlight lines not covered by tests.
Public facing changes are paired with documentation changes
Release note has been added to CHANGELOG.md if needed

mihaimaruseac

Overall this looks good, but I'm mostly concerned about taking on a dependency which was not updated in 7 years.

mihaimaruseac · 2025-10-09T20:51:56Z

+        For BLAKE3 this is equivalent to not sharding. Sharding is bypassed
+        because BLAKE3 already operates in parallel. This means the chunk_size
+        and shard_size args are ignored.
+


I'm a little concerned about this, given that sharding was also introduced to allow verifying only a portion of a file rather than the integrity of the entire file. But that's an optimization, so it might not matter much.

That's interesting, I hadn't thought of that. BLAKE3 actually supports this as well (look up "blake3 bao"), but I think adding support for that is out of scope for this PR.

Yeah, let's merge as it is and if we actually need this support we can add it.
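
To make the partial-verification point above concrete, here is a minimal sketch of per-shard hashing using only the standard library. It is not the project's implementation: the helper names `shard_digests` and `verify_shard` are hypothetical, and SHA-256 merely stands in for whichever algorithm the sharded path uses.

```python
import hashlib
from pathlib import Path


def shard_digests(path: Path, shard_size: int) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size shard of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            shard = f.read(shard_size)
            if not shard:
                break
            digests.append(hashlib.sha256(shard).hexdigest())
    return digests


def verify_shard(path: Path, shard_size: int, index: int, expected: str) -> bool:
    """Check a single shard by reading only that byte range of the file."""
    with open(path, "rb") as f:
        f.seek(index * shard_size)
        shard = f.read(shard_size)
    return hashlib.sha256(shard).hexdigest() == expected
```

With per-shard digests recorded, `verify_shard` touches only `shard_size` bytes, which is the optimization that hashing the whole file as one unit gives up.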

Because BLAKE3 natively supports parallelism without changing the final hash, sharding is bypassed. This is much more useful than getting different file hashes depending on which hashing method you used.

The BLAKE3 hashing is done by memory mapping the file, and defaults to the max number of workers which is the number of logical CPU cores. This is a good default and the most performant setup. It is also what the standard BLAKE3 CLI tool (b3sum) does. It is implemented in Rust and so will be true parallelism rather than the thread concurrency implemented for other hashing algorithms, so the speed up should be quite large. But it will likely be slower on HDDs than having no parallelism. I think this is the right default, but the HDD concern is documented.

Resolves: sigstore#530
Signed-off-by: makeworld <makeworld@protonmail.com>
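
The approach described in the commit message can be sketched with the `blake3` package from PyPI. This is a hedged illustration, not the code merged in this PR: the helper name `blake3_file_hex` is hypothetical, and the sketch assumes a `blake3` release recent enough to provide `update_mmap` and the `max_threads` argument with the `AUTO` constant.

```python
import os
from pathlib import Path

try:
    from blake3 import blake3  # third-party package: pip install blake3
except ImportError:  # keep the sketch importable without the dependency
    blake3 = None


def blake3_file_hex(path: Path) -> str:
    """Hash a whole file with BLAKE3 using memory mapping and all cores."""
    if blake3 is None:
        raise RuntimeError("the blake3 package is not installed")
    # AUTO lets the library pick the worker count (the logical core count).
    hasher = blake3(max_threads=blake3.AUTO)
    # Memory-map the file instead of reading it in chunks from Python.
    hasher.update_mmap(os.fspath(path))
    return hasher.hexdigest()
```

Because BLAKE3's parallelism is internal to the algorithm, the digest is the same as a single-threaded hash of the same bytes, so no shard or chunk parameters are needed here.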

makew0rld · 2025-10-09T21:05:12Z

@mihaimaruseac thanks for the quick review!

taking on a dependency which was not updated in 7 years

I'm not sure what you mean, maybe you're looking at a different dependency? The blake3 package (PyPI, repo) had its latest release last week.

mihaimaruseac · 2025-10-09T21:11:23Z

Oh, I was accidentally looking at blake-256, my bad.

mihaimaruseac · 2025-10-09T21:12:01Z

Yeah, let's merge as it is and if we actually need this support we can add it.

makew0rld · 2025-10-09T21:23:14Z

Thanks for the quick merge! If you're able to cut a new release for this soon that would be awesome.

mihaimaruseac · 2025-10-09T21:24:39Z

Working on a release as we speak!

makew0rld requested review from a team as code owners October 9, 2025 20:44

makew0rld force-pushed the blake3 branch 2 times, most recently from a78d172 to 70ace84 October 9, 2025 20:49

mihaimaruseac reviewed Oct 9, 2025

cameronfyfe reviewed Oct 9, 2025

Comment thread benchmarks/serialize.py Outdated

makew0rld force-pushed the blake3 branch from 677d708 to 47163a9 October 9, 2025 21:03

mihaimaruseac approved these changes Oct 9, 2025

mihaimaruseac enabled auto-merge (squash) October 9, 2025 21:13

mihaimaruseac merged commit 1f9a11d into sigstore:main Oct 9, 2025
51 checks passed

mihaimaruseac merged 1 commit into sigstore:main from eqtylab:blake3
