
Increase embedded string threshold from 64 to 128 bytes#3397

Merged
zuiderkwast merged 3 commits into valkey-io:unstable from Nikhil-Manglore:embedded_strings_update on Apr 7, 2026

Conversation

@Nikhil-Manglore (Member) commented Mar 23, 2026

This PR increases the embedded string threshold from 64 to 128 bytes (two cache lines) to reduce memory overhead. It also fixes the tests that pertain to embedded values.

Closes #3025

@Nikhil-Manglore Nikhil-Manglore force-pushed the embedded_strings_update branch from e38f9c1 to 9d808ac Compare March 23, 2026 20:00

codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.52%. Comparing base (6822a67) to head (41aa5d5).
⚠️ Report is 17 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3397      +/-   ##
============================================
+ Coverage     76.38%   76.52%   +0.13%     
============================================
  Files           159      159              
  Lines         79681    79680       -1     
============================================
+ Hits          60865    60972     +107     
+ Misses        18816    18708     -108     
Files with missing lines | Coverage | Δ
src/object.c | 92.73% <100.00%> | +5.11% ⬆️
src/unit/test_object.cpp | 98.98% <100.00%> | (ø)

... and 19 files with indirect coverage changes


@github-actions

Benchmark ran on this commit: e38f9c1

RPS Benchmark Comparison: unstable vs 16d2c42

2 significant change(s)

  • ✅ +31±2% GET rps pipe=10 threads=9 (data_size=96)
  • ✅ +5.4±0.7% GET rps pipe=10 threads=1 (data_size=96)

14 with no significant change

Click to expand full comparison tables

data_size = 16

% Change Test unstable 16d2c42 unstable stats 16d2c42 stats
+1.4±0.8% GET rps P1 T1 227K 230K n=3, σ=1.51K, CV=0.7%, CI99%=±3.8%, PI99%=±7.6% n=3, σ=1.09K, CV=0.5%, CI99%=±2.7%, PI99%=±5.4%
-0±2% GET rps P1 T9 1.53M 1.53M n=3, σ=28.2K, CV=1.8%, CI99%=±10.6%, PI99%=±21.2% n=3, σ=19.4K, CV=1.3%, CI99%=±7.3%, PI99%=±14.5%
+1.6±0.4% GET rps P10 T1 1.233M 1.253M n=3, σ=3.96K, CV=0.3%, CI99%=±1.8%, PI99%=±3.7% n=3, σ=2.49K, CV=0.2%, CI99%=±1.1%, PI99%=±2.3%
+1±1% GET rps P10 T9 2.86M 2.896M n=3, σ=29.7K, CV=1.0%, CI99%=±5.9%, PI99%=±11.9% n=3, σ=7.14K, CV=0.2%, CI99%=±1.4%, PI99%=±2.8%
+0.3±0.9% SET rps P1 T1 221K 221.5K n=3, σ=1.98K, CV=0.9%, CI99%=±5.1%, PI99%=±10.3% n=3, σ=358, CV=0.2%, CI99%=±0.9%, PI99%=±1.9%
+0.6±0.9% SET rps P1 T9 1.483M 1.491M n=3, σ=9.15K, CV=0.6%, CI99%=±3.5%, PI99%=±7.1% n=3, σ=9.86K, CV=0.7%, CI99%=±3.8%, PI99%=±7.6%
+1.8±0.9% SET rps P10 T1 1.038M 1.057M n=3, σ=8.76K, CV=0.8%, CI99%=±4.8%, PI99%=±9.7% n=3, σ=3.02K, CV=0.3%, CI99%=±1.6%, PI99%=±3.3%
+3.7±0.8% SET rps P10 T9 1.93M 2.006M n=3, σ=14.8K, CV=0.8%, CI99%=±4.4%, PI99%=±8.8% n=3, σ=5.21K, CV=0.3%, CI99%=±1.5%, PI99%=±3.0%

data_size = 96

% Change Test unstable 16d2c42 unstable stats 16d2c42 stats
+3±1% GET rps P1 T1 221K 227K n=3, σ=2.01K, CV=0.9%, CI99%=±5.2%, PI99%=±10.5% n=3, σ=2.16K, CV=1.0%, CI99%=±5.5%, PI99%=±10.9%
+3±1% GET rps P1 T9 1.470M 1.51M n=3, σ=8.00K, CV=0.5%, CI99%=±3.1%, PI99%=±6.2% n=3, σ=15.1K, CV=1.0%, CI99%=±5.7%, PI99%=±11.4%
+5.4±0.7% GET rps P10 T1 1.165M 1.228M n=3, σ=3.47K, CV=0.3%, CI99%=±1.7%, PI99%=±3.4% n=3, σ=6.87K, CV=0.6%, CI99%=±3.2%, PI99%=±6.4%
+31±2% GET rps P10 T9 2.26M 2.95M n=3, σ=21.6K, CV=1.0%, CI99%=±5.5%, PI99%=±11.0% n=3, σ=26.7K, CV=0.9%, CI99%=±5.2%, PI99%=±10.4%
+4±1% SET rps P1 T1 213K 220.4K n=3, σ=1.96K, CV=0.9%, CI99%=±5.3%, PI99%=±10.5% n=3, σ=885, CV=0.4%, CI99%=±2.3%, PI99%=±4.6%
+2.1±0.7% SET rps P1 T9 1.458M 1.49M n=3, σ=1.30K, CV=0.1%, CI99%=±0.5%, PI99%=±1.0% n=3, σ=10.3K, CV=0.7%, CI99%=±4.0%, PI99%=±7.9%
+1.8±0.8% SET rps P10 T1 1.025M 1.044M n=3, σ=7.80K, CV=0.8%, CI99%=±4.4%, PI99%=±8.7% n=3, σ=2.85K, CV=0.3%, CI99%=±1.6%, PI99%=±3.1%
+8±2% SET rps P10 T9 1.86M 2.01M n=3, σ=34.8K, CV=1.9%, CI99%=±10.7%, PI99%=±21.5% n=3, σ=23.6K, CV=1.2%, CI99%=±6.7%, PI99%=±13.5%

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30

Legend:

  • Test column: Command, metric, P=pipeline depth, T=io-threads
  • Significance: ✅ significant improvement, ❌ significant regression, ➖ not significant, ❔ insufficient data

Statistical Notes:

  • CV: Coefficient of Variation - relative variability (σ/μ × 100%)
  • CI99%: 99% Confidence Interval - range where the true population mean is likely to fall
  • PI99%: 99% Prediction Interval - range where a single future observation is likely to fall

@rainsupreme (Contributor) commented Mar 24, 2026

+31% performance is wild, to the point of being suspicious. What's going on there? I want to see if it's repeatable.

@github-actions

Benchmark ran on this commit: 9d808ac

RPS Benchmark Comparison: unstable vs a75d344

3 significant change(s)

  • ✅ +30±2% GET rps pipe=10 threads=9 (data_size=96)
  • ✅ +6.0±0.8% GET rps pipe=10 threads=1 (data_size=96)
  • ✅ +4.0±0.4% SET rps pipe=1 threads=1 (data_size=96)

13 with no significant change

Click to expand full comparison tables

data_size = 16

% Change Test unstable a75d344 unstable stats a75d344 stats
+0.4±0.6% GET rps P1 T1 230K 230.6K n=3, σ=1.06K, CV=0.5%, CI99%=±2.7%, PI99%=±5.3% n=3, σ=904, CV=0.4%, CI99%=±2.2%, PI99%=±4.5%
-0±2% GET rps P1 T9 1.53M 1.52M n=3, σ=14.2K, CV=0.9%, CI99%=±5.3%, PI99%=±10.7% n=3, σ=31.1K, CV=2.0%, CI99%=±11.7%, PI99%=±23.4%
+2±2% GET rps P10 T1 1.224M 1.25M n=3, σ=2.53K, CV=0.2%, CI99%=±1.2%, PI99%=±2.4% n=3, σ=18.4K, CV=1.5%, CI99%=±8.4%, PI99%=±16.9%
+3±1% GET rps P10 T9 2.845M 2.92M n=3, σ=3.63K, CV=0.1%, CI99%=±0.7%, PI99%=±1.5% n=3, σ=31.3K, CV=1.1%, CI99%=±6.2%, PI99%=±12.3%
-1±1% SET rps P1 T1 222.2K 220K n=3, σ=684, CV=0.3%, CI99%=±1.8%, PI99%=±3.5% n=3, σ=3.06K, CV=1.4%, CI99%=±7.9%, PI99%=±15.9%
+1±1% SET rps P1 T9 1.49M 1.503M n=3, σ=19.0K, CV=1.3%, CI99%=±7.3%, PI99%=±14.6% n=3, σ=4.24K, CV=0.3%, CI99%=±1.6%, PI99%=±3.2%
+1.1±0.6% SET rps P10 T1 1.043M 1.055M n=3, σ=1.39K, CV=0.1%, CI99%=±0.8%, PI99%=±1.5% n=3, σ=6.09K, CV=0.6%, CI99%=±3.3%, PI99%=±6.6%
+2±1% SET rps P10 T9 1.962M 2.00M n=3, σ=2.67K, CV=0.1%, CI99%=±0.8%, PI99%=±1.6% n=3, σ=26.6K, CV=1.3%, CI99%=±7.6%, PI99%=±15.3%

data_size = 96

% Change Test unstable a75d344 unstable stats a75d344 stats
+5±2% GET rps P1 T1 217K 228K n=3, σ=3.45K, CV=1.6%, CI99%=±9.1%, PI99%=±18.3% n=3, σ=2.43K, CV=1.1%, CI99%=±6.1%, PI99%=±12.2%
+2±1% GET rps P1 T9 1.477M 1.51M n=3, σ=6.01K, CV=0.4%, CI99%=±2.3%, PI99%=±4.7% n=3, σ=17.5K, CV=1.2%, CI99%=±6.6%, PI99%=±13.3%
+6.0±0.8% GET rps P10 T1 1.162M 1.231M n=3, σ=3.81K, CV=0.3%, CI99%=±1.9%, PI99%=±3.8% n=3, σ=8.30K, CV=0.7%, CI99%=±3.9%, PI99%=±7.7%
+30±2% GET rps P10 T9 2.26M 2.95M n=3, σ=12.1K, CV=0.5%, CI99%=±3.1%, PI99%=±6.1% n=3, σ=39.6K, CV=1.3%, CI99%=±7.7%, PI99%=±15.4%
+4.0±0.4% SET rps P1 T1 212.5K 221.1K n=3, σ=610, CV=0.3%, CI99%=±1.6%, PI99%=±3.3% n=3, σ=641, CV=0.3%, CI99%=±1.7%, PI99%=±3.3%
+1±2% SET rps P1 T9 1.46M 1.48M n=3, σ=19.5K, CV=1.3%, CI99%=±7.6%, PI99%=±15.3% n=3, σ=11.4K, CV=0.8%, CI99%=±4.4%, PI99%=±8.9%
+1±1% SET rps P10 T1 1.03M 1.046M n=3, σ=10.5K, CV=1.0%, CI99%=±5.8%, PI99%=±11.7% n=3, σ=3.09K, CV=0.3%, CI99%=±1.7%, PI99%=±3.4%
+8±3% SET rps P10 T9 1.85M 2.00M n=3, σ=39.9K, CV=2.2%, CI99%=±12.4%, PI99%=±24.7% n=3, σ=21.7K, CV=1.1%, CI99%=±6.2%, PI99%=±12.5%

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30

Legend:

  • Test column: Command, metric, P=pipeline depth, T=io-threads
  • Significance: ✅ significant improvement, ❌ significant regression, ➖ not significant, ❔ insufficient data

Statistical Notes:

  • CV: Coefficient of Variation - relative variability (σ/μ × 100%)
  • CI99%: 99% Confidence Interval - range where the true population mean is likely to fall
  • PI99%: 99% Prediction Interval - range where a single future observation is likely to fall

@rainsupreme (Contributor)

I'm very curious why this is producing a 31% performance gain, but I'm not complaining either 😆

@sarthakaggarwal97 (Contributor)

That's quite a gain. Curious if this is because a 96-byte value is now treated as EMBSTR instead of RAW? I guess it reduces the number of lookups since the value is already embedded with the key? But that probably still doesn't explain this much. It would be great to profile this.

Explanation from Codex about the total size of the EMBSTR

base robj-ptr slot   8
embedded key        18
96B value SDS      100
total              126

@Nikhil-Manglore Nikhil-Manglore marked this pull request as ready for review March 26, 2026 21:57
@Nikhil-Manglore (Member, Author) commented Mar 26, 2026

The tests embedded_string_with_key and embedded_string_with_key_and_expire pass now. The failing test is an unrelated flaky test; we can rerun it.

@zuiderkwast (Contributor) left a comment

The jemalloc size classes in this range come in 16-byte intervals (64, 80, 96, 112, 128), so in the worst case we get 15 bytes of internal fragmentation in the allocation, but we save the 8 bytes for the pointer.

Previously, for unembedded strings in the 64-128 byte range, the internal fragmentation was up to 15 bytes (for 96B strings, the 100B sds representation lands in a 112B bucket, wasting 12B), plus the fixed 8 bytes for obj->ptr.

Performance also looks optimal in most cases, it seems. 💯

@sarthakaggarwal97 (Contributor) left a comment

LGTM! We should also change our performance benchmarks to include more string sizes after this PR gets merged.

@rainsupreme (Contributor) left a comment

Huzzah 🥳
(lgtm!)

@Nikhil-Manglore (Member, Author) commented Mar 30, 2026

> That's quite a gain. Curious if this is because a 96-byte value is now treated as EMBSTR instead of RAW? I guess it reduces the number of lookups since the value is already embedded with the key? But that probably still doesn't explain this much. It would be great to profile this.
>
> Explanation from Codex about the total size of the EMBSTR
>
> base robj-ptr slot   8
> embedded key        18
> 96B value SDS      100
> total              126

Yeah, 96-byte values are now treated as EMBSTR. Now that the value is embedded, the number of cache misses is drastically reduced and there's no pointer dereference for 96-byte values. In high-throughput configurations the gain is larger because the per-call savings are multiplied across all io-threads.

@zuiderkwast zuiderkwast merged commit 034d16a into valkey-io:unstable Apr 7, 2026
57 of 58 checks passed
@github-project-automation github-project-automation bot moved this to To be backported in Valkey 9.1 Apr 7, 2026
@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Apr 7, 2026
@Nikhil-Manglore Nikhil-Manglore deleted the embedded_strings_update branch April 7, 2026 17:59


Development

Successfully merging this pull request may close these issues.

[PERFORMANCE] Potential regression due to Removal of internal server object pointer overhead

4 participants