
ci: add valgrind stateless test for check memory leak #19800

Draft
cathay4t wants to merge 1 commit into databendlabs:main from cathay4t:valgrind

@cathay4t

@cathay4t cathay4t commented Apr 30, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

By using https://github.com/jfrimmel/cargo-valgrind, all unit tests will
be run under valgrind.

Use this command to run the memory check on the unit test cases:

cargo valgrind \
    nextest run -E 'not group(skip-memcheck)' --profile memcheck

These are the valgrind arguments used with cargo valgrind:

--show-leak-kinds=definite
--errors-for-leak-kinds=definite
--max-threads=1024
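One way to feed these flags to cargo-valgrind is sketched below. It assumes cargo-valgrind appends whitespace-separated flags from the VALGRINDFLAGS environment variable to the valgrind command line it builds (check the cargo-valgrind README for the version in use). MEMCHECK_FLAGS and run_memcheck are illustrative names, not part of this PR.

```shell
#!/usr/bin/env bash
# Valgrind flags quoted in the PR description.
MEMCHECK_FLAGS=(
  --show-leak-kinds=definite        # only report "definitely lost" blocks
  --errors-for-leak-kinds=definite  # only definite leaks count as errors
  --max-threads=1024                # raise valgrind's limit on thread count
)

# Assumption: cargo-valgrind reads extra valgrind flags from VALGRINDFLAGS.
export VALGRINDFLAGS="${MEMCHECK_FLAGS[*]}"

# Illustrative wrapper around the command from the PR description;
# run it from the workspace root.
run_memcheck() {
  cargo valgrind nextest run -E 'not group(skip-memcheck)' --profile memcheck
}
```

Keeping the flags in an array makes each one easy to comment; they are joined into a single space-separated string only when exported.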

Some test cases are known to fail this valgrind test. Instead of fixing
them in this patch, those test cases are marked as skip-memcheck in
.config/nextest.toml, pending further investigation.

The memory leak check is only enabled on Linux x86_64 for now.
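A CI step could enforce the Linux x86_64 restriction with a guard like the one below. memcheck_supported is a hypothetical helper, not code from this PR; it takes the OS and architecture as optional arguments so its logic can be checked on any host.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only on Linux x86_64, the one platform
# the memcheck is enabled on. Defaults to the current host via uname.
memcheck_supported() {
  local os=${1:-$(uname -s)}
  local arch=${2:-$(uname -m)}
  [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]
}

if memcheck_supported; then
  echo "memcheck enabled on $(uname -s)/$(uname -m)"
else
  echo "memcheck skipped on $(uname -s)/$(uname -m)"
fi
```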

Tests

  • No Test - Existing unit tests are extended to run under valgrind.

Type of change

  • Other: Add memory leak check

@cathay4t cathay4t changed the title from "WIP: CI: Add valgrind stateless test for check memory leak" to "ci: add valgrind stateless test for check memory leak" Apr 30, 2026
@github-actions github-actions Bot added the pr-build label (this PR changes build/testing/ci steps) Apr 30, 2026
@cathay4t

This comment was marked as resolved.

@cathay4t
Author

cathay4t commented May 1, 2026

Never mind, the error above is for "possibly lost" and "still reachable" leaks; let me opt them out.
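For context, valgrind's leak summary separates "definitely lost", "indirectly lost", "possibly lost" and "still reachable" blocks, and the flags above count only definite leaks as errors. The hypothetical helper below applies the same policy to valgrind's default text output. It is only an illustration: cargo-valgrind runs valgrind itself and interprets the results, so nothing in this PR relies on these functions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total "definitely lost" bytes in a valgrind text log.
# "possibly lost" and "still reachable" lines are deliberately ignored,
# mirroring --errors-for-leak-kinds=definite.
definitely_lost_bytes() {
  awk '/definitely lost:/ {
         for (i = 1; i < NF; i++)
           if ($i == "lost:") { n = $(i + 1); gsub(/,/, "", n); sum += n }
       }
       END { print sum + 0 }' "$1"
}

# Returns 1 (and reports on stderr) when the log contains a definite leak.
check_definite_leaks() {
  local bytes
  bytes=$(definitely_lost_bytes "$1")
  if [ "$bytes" -gt 0 ]; then
    echo "definite leak: $bytes bytes" >&2
    return 1
  fi
}
```

Thousands separators (as in "72,704 bytes") are stripped before summing, and counts from several LEAK SUMMARY sections in one log are added together.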

@cathay4t
Author

cathay4t commented May 1, 2026

I have marked several test cases to be skipped by the memory leak check, and the command cargo valgrind nextest run -E 'not group(skip-memcheck)' now passes in my VM.

This patch is ready for review.

Should I create GitHub issues for these skipped test cases so they can be investigated further, or just let them rot?

@cathay4t cathay4t marked this pull request as ready for review May 1, 2026 01:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e9cab55a0


Comment thread scripts/ci/ci-run-unit-tests.sh Outdated
Comment thread scripts/setup/dev_setup.sh
@cathay4t cathay4t force-pushed the valgrind branch 5 times, most recently from b7e0f60 to d72e49b Compare May 1, 2026 02:34
@cathay4t
Author

cathay4t commented May 1, 2026

Maybe investigating these memory leaks could be my good second issue. Let me dig around.

@cathay4t cathay4t force-pushed the valgrind branch 3 times, most recently from 1c02fac to f0faf78 Compare May 7, 2026 01:12
@zhang2014
Member

zhang2014 commented May 7, 2026

@cathay4t Great work. Maybe it's better to add a new CI action (unit-test-valgrind)?

@cathay4t
Author

cathay4t commented May 7, 2026

Sure.

@cathay4t cathay4t force-pushed the valgrind branch 2 times, most recently from 4072d01 to f0527e3 Compare May 7, 2026 12:03
@github-actions
Contributor

github-actions Bot commented May 7, 2026

🤖 CI Job Analysis

Workflow: 25495821606

📊 Summary

  • Total Jobs: 88
  • Failed Jobs: 1
  • Retryable: 0
  • Code Issues: 1

NO RETRY NEEDED

All failures appear to be code/test issues requiring manual fixes.

🔍 Job Details

  • linux / test_unit_valgrind: Not retryable (Code/Test)

🤖 About

Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).

@cathay4t cathay4t force-pushed the valgrind branch 2 times, most recently from d7a3ea7 to 5ce4151 Compare May 7, 2026 12:19
By using https://github.com/jfrimmel/cargo-valgrind, all unit tests will
be run under valgrind.

Using this command to do memory check on unit test cases:

```bash
cargo valgrind \
    nextest run -E 'not group(skip-memcheck)' --profile memcheck
```

These are the valgrind arguments used with `cargo valgrind`:

```
--show-leak-kinds=definite
--errors-for-leak-kinds=definite
--max-threads=1024
```

Some test cases are known to fail this valgrind test. Instead of fixing
them in this patch, those test cases are marked as `skip-memcheck` in
`.config/nextest.toml`, pending further investigation.

The memory leak check is only enabled on Linux x86_64 for now.

Added and enabled the `test_unit_valgrind` GitHub CI test action.

Resolves: databendlabs#5039

Signed-off-by: Gris Ge <cnfourt@gmail.com>
@cathay4t cathay4t marked this pull request as draft May 8, 2026 02:28
@cathay4t
Author

cathay4t commented May 8, 2026

@zhang2014 The container in CI is not built from the unmerged PR's Dockerfile, so the valgrind test fails. Do you want me to create a dedicated PR to add valgrind to docker/build-tool/debian/Dockerfile, or could you push a container image to docker.io directly so that this PR's CI passes?
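Since this failure comes from the CI image lacking valgrind, a CI step could run a pre-flight check like the hypothetical one below so the job fails fast with a readable message. missing_memcheck_tools is an illustrative name; the tool list assumes the job needs valgrind plus the cargo-valgrind and cargo-nextest subcommands, whose executables are named cargo-valgrind and cargo-nextest.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the memcheck CI job: print the tools
# missing from PATH (space-separated) and return 1 if any are missing.
missing_memcheck_tools() {
  local missing=()
  local tool
  for tool in valgrind cargo-valgrind cargo-nextest; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "${missing[*]}"
    return 1
  fi
}
```

A CI step might call it as `missing_memcheck_tools || exit 1` before invoking cargo valgrind.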

@zhang2014
Member

Sorry, I missed the notification for this message. @everpcpc could you please take a look?

@ariesdevil
Contributor

I previously added a sanitizer build to check for several kinds of problems, including memory leaks, but it seemed to slow down the CI workflow, so we removed it a few months later.

Is valgrind really better than the sanitizer, or should we consider bringing the sanitizer back?
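For comparison, here is a minimal sketch of running the test suite under a Rust sanitizer, using the nightly-only -Zsanitizer flag described in the Rust Unstable Book. On Linux x86_64, AddressSanitizer also detects leaks, while LeakSanitizer checks only for leaks. The sanitizer_test function and DRY_RUN variable are hypothetical. Whether this works unchanged with databend's build and nextest setup is an assumption that this thread has not verified.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run `cargo test` under ASan ("address") or LSan ("leak").
# Assumes a nightly toolchain with the rust-src component, which -Zbuild-std
# needs to rebuild an instrumented standard library. The explicit --target
# keeps RUSTFLAGS away from build scripts and proc macros.
sanitizer_test() {
  local sanitizer=${1:-}
  case "$sanitizer" in
    address|leak) ;;
    *) echo "unsupported sanitizer: $sanitizer" >&2; return 2 ;;
  esac
  local cmd=(cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only print what would run.
    echo "RUSTFLAGS=-Zsanitizer=$sanitizer ${cmd[*]}"
  else
    RUSTFLAGS="-Zsanitizer=$sanitizer" "${cmd[@]}"
  fi
}
```

With DRY_RUN=1 the function only prints the command, so you can inspect it before spending CI time on a full instrumented build.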

@cathay4t
Author

cathay4t commented May 11, 2026

I just picked up the oldest good-first-issue ("introduce valgrind") with zero understanding of the project's vision.

I have only used valgrind on a C binding written with Rust FFI, where it correctly identified some memory bugs. I have never played with the Rust address sanitizer, but its documentation looks promising.

If a tool slows things down, it is also likely to catch hard-to-find race conditions that developers would never reproduce on their fast setups. So if a certain test slows CI down, isolating it to run only after all other test groups is better than removing it for the sake of quicker CI results.

Just my two cents, from someone with zero experience on databend.

@cathay4t
Author

I can close this PR if you think we should not continue down this path.


Labels

pr-build this PR changes build/testing/ci steps

Development

Successfully merging this pull request may close these issues.

Add valgrind stateless test for check memory leak

3 participants