Non-atomic fs::write in rgb_utils causes node crash and data corruption #120

@free-free-6

Summary

write_rgb_channel_info in rgb_utils/mod.rs uses fs::write() to update RGB channel info files. fs::write is not atomic; internally it performs at least two separate syscalls:

  1. open(O_TRUNC) — truncates the file to 0 bytes
  2. write(data) — writes new content
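
To make the window between the two steps concrete, here is a minimal sketch that spells out roughly what fs::write does and records the file length seen in between. The helper name write_in_two_steps is hypothetical and exists only for illustration; it is not part of rgb_utils.

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Hypothetical helper: roughly the two steps of `fs::write(path, data)`,
/// split apart so the intermediate state can be observed.
/// Returns the file length seen between the truncating open and the write.
fn write_in_two_steps(path: &Path, data: &[u8]) -> std::io::Result<u64> {
    // Step 1: File::create opens with create + truncate, so any existing
    // content is already gone at this point.
    let mut file = File::create(path)?;

    // This is what a concurrent reader (or a restart after a kill) would see.
    let len_in_window = fs::metadata(path)?.len();

    // Step 2: the new content only arrives now.
    file.write_all(data)?;
    Ok(len_in_window)
}
```

Calling this on a file that already holds valid JSON should report a length of 0 for the window, which matches the empty-file state described in the two problems below.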

This causes two problems:

Problem 1 — Runtime crash (race condition): If another thread reads the file between syscall 1 and 2, it reads an empty file, serde_json fails with EOF, and the thread panics. The panic poisons the shared Mutex, cascading to all other threads and crashing the node.

Problem 2 — Persistent data corruption: If the process is killed (SIGKILL / OOM / docker stop / power loss) between syscall 1 and 2, the file is permanently left as 0 bytes. On restart, any code path that reads this file panics, making the node unable to start.

Affected code

rust-lightning/lightning/src/rgb_utils/mod.rs:

  • write_rgb_channel_info
  • write_rgb_payment_info_file
  • fs::write calls in color_commitment

All of these write to files in the .ldk/ directory without any synchronization or atomic write pattern.
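
For reference, a common atomic write pattern is to write a temporary file in the same directory, flush it, and rename it over the target. The sketch below is one possible shape, not the project's implementation: write_atomic and the ".tmp" suffix are hypothetical, it assumes writers to the same path are already serialized (two concurrent writers would share the temp name), and it does not fsync the parent directory.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not part of rgb_utils): replace `path` so that
/// readers see either the old content or the new content, never an empty file.
fn write_atomic(path: &Path, data: &[u8]) -> io::Result<()> {
    // Keep the temp file in the same directory so the rename stays on one
    // filesystem.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = path.with_file_name(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(data)?;
    // Flush data to disk before the rename, so a crash does not leave a
    // renamed file whose content never reached the disk.
    tmp.sync_all()?;
    drop(tmp);

    // rename replaces an existing destination; on POSIX filesystems the
    // replacement is atomic.
    fs::rename(&tmp_path, path)
}
```

With a helper like this, a reader that opens the file at any moment gets a complete old or new version, and a kill between the steps leaves at worst a stray temp file next to an intact target.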

How to reproduce

Attached test file: channel_info_file_race.txt (rename to .rs, place in src/test/)

The test opens an RGB channel, then fires rapid concurrent payments while 5 background tasks continuously call /listchannels. The payments trigger write_rgb_channel_info (via PaymentSent events), while /listchannels calls parse_rgb_channel_info on the same file. Within ~20-30 rounds the race condition is hit:

parse_rgb_channel_info thread_id: ThreadId(3)   ← reading
write_rgb_channel_info thread_id: ThreadId(5)   ← writing (487µs)

panicked at rgb_utils/mod.rs:586:
valid rgb info file: Error("EOF while parsing a value", line: 1, column: 0)

panicked at channelmanager.rs:13758: PoisonError { .. }
panicked at channelmanager.rs:4260:  PoisonError { .. }
...cascade...

Register the test in src/test/mod.rs:

mod channel_info_file_race;

Run:

cargo test channel_info_file_race -- --test-threads=1 --nocapture

Note: the test uses worker_threads = 4 internally to enable true thread concurrency.

Labels: bug (Something isn't working)
