Try updating to orc by bhartnett · Pull Request #4016 · status-im/nimbus-eth1

bhartnett · 2026-02-25T14:11:17Z

So far the following works with orc on linux:

Building the standalone execution client: make nimbus_execution_client
Apart from the test_txpool test case, all tests pass when running: make test
The era file block import appears to work fine.
All eest blockchain tests pass when running: make eest_blockchain_test
All eest engine tests pass when running: make eest_engine_test

bhartnett · 2026-02-25T14:12:17Z

    test_rpc,
    test_snap,
    test_transaction_json,
-    test_txpool,


This test is crashing with a segfault on Linux.

It's a stack-based segfault due to ulimit -s running as part of make test.

Notably, running ./env.sh nim c -r tests/all_tests without ulimit does not reproduce it.

bhartnett · 2026-02-25T14:20:02Z

  # and look for .su files in "./build/", "./nimcache/" or $TMPDIR that list the
  # stack size of each function.
-  switch("passC", "-fstack-usage -Werror=stack-usage=1048576")
-  switch("passL", "-fstack-usage -Werror=stack-usage=1048576")


It appears that orc increases stack usage.

/home/user/development/status-im/nimbus-eth1/vendor/nimbus-build-system/vendor/Nim/lib/system.nim: In function ‘_ZN7eip759434validateBlobTransactionWrapper7594EN10pooled_txs17PooledTransactionE’:
/home/user/development/status-im/nimbus-eth1/execution_chain/core/eip7594.nim:19:15: error: stack usage might be 1205200 bytes [-Werror=stack-usage=]
   19 | proc validateBlobTransactionWrapper7594*(tx: PooledTransaction):
      |               ^
lto1: some warnings being treated as errors
make[1]: *** [/tmp/ccZV498U.mk:227: /tmp/ccLQlohJ.ltrans75.ltrans.o] Error 1
make[1]: *** Waiting for unfinished jobs....
lto-wrapper: fatal error: make returned 2 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
Error: execution of an external program failed: 'g++ @all_tests_linkerArgs.txt'
stack trace: (most recent call last)

Yes, this seems to be behind the segfault
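
For scale, the reported estimate of 1205200 bytes exceeds the removed 1048576-byte (1 MiB) limit by 156624 bytes. As an alternative to removing the switches entirely, the check could be downgraded from an error to a warning so that the .su files are still produced. The fragment below is a hypothetical config.nims sketch, not what this PR does; it assumes GCC, where -Wstack-usage= is the warning form of the same check.

```nim
# config.nims (hypothetical fragment; the PR itself removes both switches)
# Still emit per-function .su files, but only warn when a frame might
# exceed 1 MiB instead of failing the build.
switch("passC", "-fstack-usage -Wstack-usage=1048576")
switch("passL", "-fstack-usage -Wstack-usage=1048576")
```

This would only make the larger frames visible during the build; it does not reduce the stack usage itself.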

bhartnett · 2026-02-25T15:14:59Z

There is a compile failure in nimbus-eth2:

Building: build/nimbus
/home/user/development/status-im/nimbus-eth1/vendor/nimbus-eth2/beacon_chain/networking/peer_protocol.nim(191, 1) template/generic instantiation of `p2pProtocol` from here
/home/user/development/status-im/nimbus-eth1/vendor/nimbus-eth2/beacon_chain/networking/eth2_protocol_dsl.nim(349, 17) template/generic instantiation of `createPeerState` from here
/home/user/development/status-im/nimbus-eth1/vendor/nimbus-eth2/beacon_chain/networking/eth2_protocol_dsl.nim(187, 10) Error: expression cannot be cast to 'RootRef'
make: *** [Makefile:227: nimbus] Error 1

Casting to RootRef is no longer allowed with orc: nim-lang/Nim#20016

@agnxsh Perhaps you could have a look at fixing this compile error in the nimbus-eth2 side? I'm not too familiar with this code.

bhartnett · 2026-02-25T16:11:21Z

The portal tests fail with a segfault:

user@pop-os:~/development/status-im/nimbus-eth1/portal/tests/beacon_network_tests$ nim compile -d:danger --verbosity:0 --hints:off --run "/home/user/development/status-im/nimbus-eth1/portal/tests/beacon_network_tests/test_beacon_content.nim"
Beacon Content Keys and Values ..Segmentation fault (core dumped)
Error: execution of an external program failed: '/home/user/development/status-im/nimbus-eth1/portal/tests/beacon_network_tests/test_beacon_content'

Appears to be caused by a stack overflow:

valgrind --leak-check=full ./test_beacon_content
Beacon Content Keys and Values ..==214801== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==214801== Can't extend stack to 0x1ffe800d08 during signal delivery for thread 1:
==214801==   no stack segment
==214801== 
==214801== Process terminating with default action of signal 11 (SIGSEGV)
==214801==  Access not within mapped region at address 0x1FFE800D08
==214801== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==214801==    at 0x237BD0: beacon_init_loader::loadNetworkData(string) [clone .constprop.0] (beacon_init_loader.nim:25)
==214801==  If you believe this happened as a result of a stack
==214801==  overflow in your program's main thread (unlikely but
==214801==  possible), you can try to increase the size of the
==214801==  main thread stack using the --main-stacksize= flag.
==214801==  The main thread stack size used in this run was 8388608.
==214801== 
==214801== HEAP SUMMARY:
==214801==     in use at exit: 1,024 bytes in 1 blocks
==214801==   total heap usage: 3 allocs, 2 frees, 2,520 bytes allocated
==214801== 
==214801== LEAK SUMMARY:
==214801==    definitely lost: 0 bytes in 0 blocks
==214801==    indirectly lost: 0 bytes in 0 blocks
==214801==      possibly lost: 0 bytes in 0 blocks
==214801==    still reachable: 1,024 bytes in 1 blocks
==214801==         suppressed: 0 bytes in 0 blocks
==214801== Reachable blocks (those to which a pointer was found) are not shown.
==214801== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==214801== 
==214801== For lists of detected and suppressed errors, rerun with: -s
==214801== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

After increasing the stack size locally (on Linux) the segfault disappears.

tersec · 2026-02-25T16:12:50Z

So far everything points to ORC using more stack, the question is why.

tersec · 2026-02-25T17:18:47Z

Nim v2.2.x ORC adds extra nimZeroMem/nimCopyMem/stack usage over refc nim-lang/Nim#25552

bhartnett · 2026-02-26T00:34:50Z

> Nim v2.2.x ORC adds extra nimZeroMem/nimCopyMem/stack usage over refc nim-lang/Nim#25552

Is this a blocker for moving to orc or do you think we could work around by just increasing stack size?

tersec · 2026-02-26T01:25:53Z

> > Nim v2.2.x ORC adds extra nimZeroMem/nimCopyMem/stack usage over refc nim-lang/Nim#25552
>
> Is this a blocker for moving to orc or do you think we could work around by just increasing stack size?

That it also occurs with a pure-ref/heap version of this setup escalates it for me into something closer to a blocker, because it's more difficult to avoid. It's something truly invisible in the Nim source; there's no obvious stack usage in:

let h = new array[8192, int]
let s = new seq[array[8192, int]]
add(s[], h[])

We've run into this issue before, in other circumstances where we had to work around Nim materializing large objects on the stack. Working around it is possible, but it's best to avoid getting into this situation because the stack usage is unbounded. We have significantly larger objects than a 128KiB blob, and it's important that these never be materialized on a stack.
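
One pattern sometimes used to avoid passing a large value by value is to grow the seq first and then copy into the new slot through pointers. The sketch below is an illustration under assumptions, not code from nimbus-eth1: `Blob` and `appendInPlace` are hypothetical names, and whether this actually avoids the extra stack copy under ORC would have to be confirmed by inspecting the generated C or the .su files.

```nim
type Blob = array[8192, int]  # hypothetical stand-in for a large value type

proc appendInPlace(s: var seq[Blob], src: ref Blob) =
  ## Appends src[] to s without passing a Blob by value.
  s.setLen(s.len + 1)                            # new slot lives in the seq's heap buffer
  copyMem(addr s[^1], addr src[], sizeof(Blob))  # raw copy; valid because Blob has no GC'd fields

let h = new Blob
h[][0] = 42
let s = new seq[Blob]
appendInPlace(s[], h)
doAssert s[].len == 1 and s[][0][0] == 42
```

The raw `copyMem` is only appropriate for plain-data types; a type containing refs, seqs, or strings would need a proper assignment instead.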

status-im-auto · 2026-02-26T17:03:05Z

Jenkins Builds

❔	Commit	#️⃣	Finished (UTC)	Duration	Platform	Result
⁉️	`77ec4b0`	#2	2026-02-26 17:03:04	~16 min	`unknown`	📄`log`
⁉️	a9ccccc3	#3	2026-02-27 06:39:59	~23 min	`unknown`	📄`log`
⁉️	9025639e	#4	2026-02-28 09:09:46	~23 min	`unknown`	📄`log`
⁉️	65a6705e	#5	2026-03-03 08:57:46	~11 min	`unknown`	📄`log`
⁉️	f09bc5ef	#6	2026-03-06 08:57:35	~10 min	`unknown`	📄`log`
⁉️	2b120901	#7	2026-03-07 08:57:10	~10 min	`unknown`	📄`log`
⁉️	e7aa4b2f	#8	2026-03-09 08:56:30	~9 min	`unknown`	📄`log`
⁉️	7b91b6eb	#9	2026-03-10 08:57:28	~10 min	`unknown`	📄`log`
⁉️	96f34958	#10	2026-03-12 08:57:32	~10 min	`unknown`	📄`log`
⁉️	4e1f5214	#11	2026-03-17 08:57:45	~11 min	`unknown`	📄`log`
⁉️	9d418362	#12	2026-03-18 08:57:00	~10 min	`unknown`	📄`log`
⁉️	9f33731f	#13	2026-03-19 08:57:15	~10 min	`unknown`	📄`log`
⁉️	040440a8	#14	2026-03-20 08:57:08	~10 min	`unknown`	📄`log`
⁉️	26fd182e	#15	2026-03-21 08:57:34	~11 min	`unknown`	📄`log`
⁉️	b8e59575	#16	2026-03-23 08:57:43	~11 min	`unknown`	📄`log`
⁉️	5c1103fc	#17	2026-03-25 08:56:35	~10 min	`unknown`	📄`log`
⁉️	f75d1552	#18	2026-03-26 08:56:20	~9 min	`unknown`	📄`log`
⁉️	2f800551	#19	2026-03-27 08:57:56	~11 min	`unknown`	📄`log`
⁉️	3252c3e0	#20	2026-03-28 08:57:32	~11 min	`unknown`	📄`log`
⁉️	677abc61	#21	2026-03-30 08:57:47	~11 min	`unknown`	📄`log`
⁉️	9972ea16	#22	2026-03-31 08:58:21	~11 min	`unknown`	📄`log`
⁉️	d444f833	#23	2026-04-01 08:57:55	~11 min	`unknown`	📄`log`
⁉️	13e982f7	#24	2026-04-02 08:57:04	~10 min	`unknown`	📄`log`
⁉️	96ae40cf	#25	2026-04-03 08:57:52	~11 min	`unknown`	📄`log`
⁉️	9e29caa8	#26	2026-04-04 08:58:17	~11 min	`unknown`	📄`log`
⁉️	aacc6129	#27	2026-04-08 08:57:51	~11 min	`unknown`	📄`log`
⁉️	3d69990d	#28	2026-04-10 08:57:35	~10 min	`unknown`	📄`log`
⁉️	c0d9ec3d	#29	2026-04-11 08:58:04	~11 min	`unknown`	📄`log`
⁉️	073bd10b	#30	2026-04-12 08:57:32	~10 min	`unknown`	📄`log`

status-im-auto · 2026-04-03T08:57:54Z

✔️ nimbus-eth1/prs/linux/x86_64/hive/PR-4016#25 🔹 ~11 min 🔹 96ae40cf 🔹 📦 null package

status-im-auto · 2026-04-04T08:58:18Z

✔️ nimbus-eth1/prs/linux/x86_64/hive/PR-4016#26 🔹 ~11 min 🔹 9e29caa8 🔹 📦 null package

bhartnett · 2026-04-17T12:09:11Z

@tersec So it turns out that moving to orc will be problematic for the multithreaded use case because it doesn't support atomic reference counting. Any ref types used in multiple threads concurrently will cause crashes due to ref counts being corrupted. The current parallel stateroot computation doesn't work in orc for this reason because it reads from the database from multiple threads in parallel in order to read the hashes and vertexes. The database, txFrame, and vertex types all through the code are ref types. I've found that it does work when using --mm:atomicArc so that confirms the issue is related to the reference counts.

Getting this to work in orc is possible: I could pass around ptr to object types, but that would require updating much of the codebase, and in my opinion it leads to messy, unmaintainable code. The reason for passing these ref types into each thread is that I'm going for the shared-memory model, where multiple threads read and write shared state, which is generally faster than copying data between threads. When we implement full parallel execution and batch IO, we need to be able to read state from the in-memory layers and then the database in parallel. To do this, each thread needs to access the shared database and txFrame ref types.

tersec · 2026-04-19T19:01:42Z

Is it harder than refc or just neutral? https://nim-lang.org/docs/mm.html#other-mm-modes doesn't list either refc or ORC as atomic.

There is atomicArc, but I've never tried it. ARC in general isn't designed to collect cycles, which might be too big a constraint for Nimbus.

bhartnett · 2026-04-20T03:46:22Z

> Is it harder than refc or just neutral? https://nim-lang.org/docs/mm.html#other-mm-modes doesn't list either refc or ORC as atomic.
>
> There is atomicArc, but I've never tried it. ARC in general isn't designed to collect cycles, which might be too big a constraint for Nimbus.

refc is better than orc for my use case because I can use/share ref types between threads. I just need to make sure the ref type in the main thread outlives the tasks where it is used by the worker threads.

When I compile with arc or atomicArc there are some warnings about cycles so I guess we can't use arc based memory management.

That link says 'The reference counting operations (= "RC ops") do not use atomic instructions' under the arc/orc section. The fact that atomicArc exists suggests that arc is not atomic and this matches my conclusions based on my testing where I'm seeing crashes when using orc/arc.

Actually, I'm not 100% sure whether refc does in fact use atomic ref counts. It might be working because of the thread-local heaps: the ref counts are stored separately on each heap, so the worker threads are unable to touch the ref counts of any other thread. Either way, refc works for me, as does atomicArc (but atomicArc will likely leak memory).
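
The difference between atomic and non-atomic count updates can be shown with a small sketch in which plain integers stand in for reference counts. This does not exercise Nim's actual RC machinery; `plainCount`, `atomicCount`, and `worker` are hypothetical names, and it assumes Nim 2.x with threads enabled (the default there).

```nim
import std/atomics
import std/typedthreads  # Nim 2.x; on older versions Thread comes from system

const
  Workers = 4
  Iterations = 100_000

var
  plainCount: int           # stands in for a non-atomic reference count
  atomicCount: Atomic[int]  # stands in for an atomic reference count

proc worker() {.thread.} =
  for _ in 0 ..< Iterations:
    inc plainCount                   # unsynchronised read-modify-write: a data race
    discard atomicCount.fetchAdd(1)  # atomic read-modify-write: no update is lost

var threads: array[Workers, Thread[void]]
for t in threads.mitems:
  createThread(t, worker)
joinThreads(threads)

echo "plain:  ", plainCount, " (may be lower than expected)"
echo "atomic: ", atomicCount.load
doAssert atomicCount.load == Workers * Iterations
```

The atomic counter always reaches Workers * Iterations, while the plain counter is racy and may come up short. With a real reference count, a lost update of this kind could lead to a premature free or a leak, which would be consistent with the crashes described above.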

Commits

Use orc instead of refc. 40aa08d
Fix portal test stack usage. a81b467

bhartnett added 2 commits February 26, 2026 15:35

Merge branch 'master' into try-update-to-orc edc7e7f
Undo stack usage increase. 77ec4b0
