Steps to reproduce
As a continuation of #866 I have this snippet to load parquet files (compressed or otherwise) in a separate thread.
```python
import sys
import glob
import time
import threading

import atoti as tt

# jars folder contains zstd jar
session = tt.Session(extra_jars=["atoti/jars"], java_options=["-Xmx32G", "-Xms32G"])


# Create or update the table with a parquet file
def load_parquet(pqfile, table_name, keys=None):
    table = session.tables.get(table_name, None)
    if table is None:
        table = session.read_parquet(pqfile, table_name=table_name, keys=keys)
    else:
        table.load_parquet(pqfile)
    return table


pq_files = sorted(glob.glob("/path/to/compressed/files_*.parquet"))

# Load the first file & create a cube
tbl = load_parquet(pq_files[0], "mytable", keys=["id"])
cube = session.create_cube(tbl)


# Load positions in a separate thread
def loader():
    for idx, pq in enumerate(pq_files):
        sys.stdout.write(f"\rLoading file #{idx}: {pq}")
        load_parquet(pq, "mytable")
        time.sleep(1)  # wait a sec


threading.Thread(target=loader).start()
```
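To watch the growth from inside the process instead of via htop, a small helper can print the resident set size alongside the loader output. This is a Linux-only sketch (it reads /proc/self/status), and `rss_kib` is a helper name I made up:

```python
# Linux-only: read this process's resident set size from /proc/self/status.
def rss_kib() -> int:
    """Return the current process's RSS in KiB (VmRSS is reported in kB)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")

print(f"RSS: {rss_kib() / 1024:.1f} MiB")
```

Calling this once per iteration inside `loader()` would give per-file RSS deltas.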
Actual Result
Process memory continues to grow when my loader thread runs!
The process starts with 40 GB VIRT & 2.6 GB RSS (observed via htop).
The initial server log reports different values (1 GB heap + 3 GB direct):
```
2024-04-02T10:55:32.129-04:00 INFO 3615 --- [activepivot-health-event-dispatcher] c.a.h.m.ILoggingHealthEventHandler : [jvm, memory] INFO 2024-04-02T14:55:32.127Z uptime=34570ms com.activeviam.health.monitor.impl.JvmHealthCheck.createEvent:61 thread=activeviam-health-check-worker thread_id=52 event_type=JvmMemoryReport JVM Memory Usage report: G1 Young Generation[count=10 (+0), time=0s (+0)] G1 Old Generation[count=0 (+0), time=0s (+0)] Heap[used=1 GiB 465 MiB (1561350664) (+(0)), committed=32 GiB (34359738368) (+(0)), max=32 GiB (34359738368) (+(0))] Direct[used=3 GiB 46 MiB (3269516297) (+(0)), count=11569 (+0), max=32 GiB (34359738368) (+(0))] Threads[count=103 (+0), peak=104 (+0)]
```
Now, as the loader thread runs and loads files, I see that the total memory continues to rise, and it only stops rising when the loader stops.
At the end it had reached ~85 GB VIRT & 74 GB RSS (seen via htop).
But the last line of the server log says that the heap in use is 12 GB and direct memory is 16 GB:
```
2024-04-02T12:56:32.199-04:00 INFO 3615 --- [activepivot-health-event-dispatcher] c.a.h.m.ILoggingHealthEventHandler : [jvm, memory] INFO 2024-04-02T16:56:32.199Z uptime=7294642ms com.activeviam.health.monitor.impl.JvmHealthCheck.createEvent:61 thread=activeviam-health-check-worker thread_id=52 event_type=JvmMemoryReport JVM Memory Usage report: G1 Young Generation[count=108 (+0), time=2s (+0)] G1 Old Generation[count=3 (+0), time=1s (+0)] Heap[used=12 GiB 980 MiB (13912595624) (+(0)), committed=32 GiB (34359738368) (+(0)), max=32 GiB (34359738368) (+(0))] Direct[used=16 GiB 380 MiB (17578900307) (+(0)), count=286824 (+0), max=32 GiB (34359738368) (+(0))] Threads[count=40 (+0), peak=113 (+0)]
```
So the numbers don't add up. The heap and direct figures in every such memory log line sum to less than the 32 GB upper bound I set initially, but the process is actually using far more RAM.
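To make the mismatch concrete: summing the exact heap and direct byte counts from that final report accounts for under 30 GiB, less than half of the 74 GB RSS that htop shows:

```python
# Byte counts copied from the last JvmMemoryReport line above.
heap_used = 13_912_595_624    # Heap[used=12 GiB 980 MiB]
direct_used = 17_578_900_307  # Direct[used=16 GiB 380 MiB]

jvm_total_gib = (heap_used + direct_used) / 2**30
print(f"heap + direct = {jvm_total_gib:.1f} GiB")  # ~29.3 GiB

rss_gib = 74  # from htop
print(f"unaccounted   = {rss_gib - jvm_total_gib:.1f} GiB")  # ~44.7 GiB
```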
Expected Result
I expected the `-Xmx` option to set a maximum RAM size for the process.
But I think that only caps the JVM heap? I read somewhere that Atoti allocates data off-heap as well. Is that what is happening?
Is there a way to restrict the TOTAL memory usage of the process?
Could it perhaps be a "leak" in the parquet loader?
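For what it's worth, the standard JVM flag `-XX:MaxDirectMemorySize` caps `java.nio` direct-buffer allocations separately from the heap, so I could sketch the session setup with both limits as below. Whether Atoti's off-heap storage actually goes through NIO direct buffers is an assumption on my part, and native allocations made via mmap or JNI would still fall outside both limits:

```python
import atoti as tt

# Sketch: cap the heap and NIO direct memory separately.
# -XX:MaxDirectMemorySize only bounds java.nio direct buffers; other native
# allocations (mmap, JNI, malloc from native code) escape both limits.
session = tt.Session(
    extra_jars=["atoti/jars"],
    java_options=[
        "-Xmx32G",
        "-Xms32G",
        "-XX:MaxDirectMemorySize=32G",
    ],
)
```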
Environment
atoti: 0.8.10
Python: 3.12.2
Operating system: Linux
Machine being tested on has 32 cores & 256 GB RAM
Logs
I have detailed logs as well, please let me know what additional info you require and I'll be happy to help!