Skip to content

[HMA] Add explicit connection close for signal ops and clear temp refs#1968

Merged
Dcallies merged 1 commit intofacebook:mainfrom
juanmrad:hma/add-explicit-memory-clear-for-db-and-index-refs
Apr 15, 2026
Merged

[HMA] Add explicit connection close for signal ops and clear temp refs#1968
Dcallies merged 1 commit intofacebook:mainfrom
juanmrad:hma/add-explicit-memory-clear-for-db-and-index-refs

Conversation

@juanmrad
Copy link
Copy Markdown
Collaborator

Summary

Attempts to fix memory leaks in the index lifecycle that cause worker RSS to climb steadily with each index refresh (see #1813).

Close raw_conn in load_signal_index() and commit_signal_index() previously these database connections were never returned to the pool, leaking on every refresh cycle.

Clear built_index reference and signal_list before calling trim_process_memory() so gc.collect() can actually reclaim them without waiting for python to clear reference.

@meta-cla meta-cla Bot added the CLA Signed label Apr 14, 2026
@github-actions github-actions Bot added the hma Items related to the hasher-matcher-actioner system label Apr 14, 2026
@Dcallies
Copy link
Copy Markdown
Contributor

Have we figured out a way to collect evidence on what the source of memory is for longer running processes? At least internally we have some tools that allow us to inspect the memory usage.

There are other things that we could do to flail at the problem such as rewriting index build to try and make it incremental, and only rebuild once the delta starts to get too high. In practice I think this reduces the need to rebuild the index at all to zero for most workloads.

built_index = index_cls.build(signal_list)

# explicitly free the signal list before returning
signal_list.clear()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very suspicious this does anything, because it requires that the local variable is not properly garbage collected when it leaves scope, which stretches my understanding about how refcounting works.

Do we have any evidence that this is needed?

Random googling:

  1. https://stackoverflow.com/questions/28324084/does-deleting-reassigning-a-python-list-allow-for-garbage-collection
  2. https://blog.codingconfessions.com/p/cpython-garbage-collection-internals

raw_conn.commit()
finally:
# explicitly close the raw connection to free memory
raw_conn.close()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might work too: https://docs.python.org/3/library/contextlib.html#contextlib.closing which lets you use a with statement.

Reading https://docs.sqlalchemy.org/en/21/core/connections.html#working-with-the-dbapi-cursor-directly I think this is very likely missing the close

@Dcallies Dcallies merged commit ff4e106 into facebook:main Apr 15, 2026
5 checks passed
@juanmrad juanmrad deleted the hma/add-explicit-memory-clear-for-db-and-index-refs branch April 18, 2026 03:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed hma Items related to the hasher-matcher-actioner system

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants