@@ -20,9 +20,9 @@ into NumPy arrays and PyTorch tensors with minimal overhead.
 By default, the reader uses the regular file API via
 `parquet::ParquetFileReader`. In most cases, this is the recommended choice.
 
-An alternative reader backend based on **io_uring** is also available. It can
-provide better performance, especially for very large datasets and when used
-together with **O_DIRECT**.
+An alternative reader backend based on **io_uring** is also available. It may
+provide better performance for some workloads, particularly when used together
+with **O_DIRECT**.
 
 To enable the alternative backend, set the `JJ_READER_BACKEND` environment
 variable to one of the following values:
@@ -50,21 +50,21 @@ your workload is I/O-bound or memory-/CPU-bound.
 
 For datasets larger than the available page cache, performance is typically
 I/O-bound. Enabling either `pre_buffer=True` or `prefetch_page_cache=True`
-brings throughput close to the raw I/O ceiling.
+brings throughput close to the raw I/O ceiling, but `prefetch_page_cache`
+avoids the increased LLC miss rate caused by `pre_buffer`
+(see [Page cache prefetching](#page-cache-prefetching-with-prefetch_page_cache) below).
 
 Recommended configuration:
 
 - `use_threads=True`, `prefetch_page_cache=True`, `pre_buffer=False`,
   with the default reader backend.
 
-Both options reach near-identical throughput. `prefetch_page_cache` avoids the
-temporary buffer copies that `pre_buffer` uses (see section below) and the
-increased LLC miss rate.
-
 ### Small datasets (fit in filesystem cache)
 
 For datasets that comfortably fit in RAM, performance is typically CPU- or
-memory-bound.
+memory-bound. Using `pre_buffer` is not recommended because it leads to an
+increased LLC miss rate and suboptimal performance
+(see [Page cache prefetching](#page-cache-prefetching-with-prefetch_page_cache) below).
 
 Recommended configuration:
 
@@ -73,6 +73,9 @@ Recommended configuration:
 
 ### Pre-buffering and `cache_options`
 
+If you use `pre_buffer=True` instead of `prefetch_page_cache`, the following
+tuning applies.
+
 When `pre_buffer=True`, Arrow merges nearby column ranges and reads them into
 temporary buffers. The default maximum merged range is 32 MB
 (`range_size_limit`).
@@ -110,23 +113,24 @@ To debug allocator issues with mimalloc, run with `MIMALLOC_SHOW_STATS=1` and
 ### Pre-buffering and `ARROW_IO_THREADS`
 
 When `pre_buffer=True`, Arrow dispatches reads to its IO thread pool,
-configured via the `ARROW_IO_THREADS` environment variable (default: 8).
+configured via the `ARROW_IO_THREADS` environment variable (default: 8).
 Tuning this value may improve performance.
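As a minimal sketch (the value 16 is an arbitrary example, not a recommendation from this document), the variable must be in the environment before Arrow creates its IO thread pool, i.e. before the first `pyarrow` import:

```python
import os

# ARROW_IO_THREADS is read when Arrow initializes its IO thread pool, so it
# must be set before the first pyarrow import. The value 16 is an arbitrary
# example; benchmark against your storage before settling on a number.
os.environ["ARROW_IO_THREADS"] = "16"

# pyarrow also exposes the same knob at runtime:
#   import pyarrow; pyarrow.set_io_thread_count(16)
```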
 
 ### Page cache prefetching with `prefetch_page_cache`
 
-With `pre_buffer=True`, Arrow's IO thread pool allocates temporary buffers
-and fills them on the IO thread's core. When worker threads on different
-cores later consume those buffers, the data is cold in their caches,
-causing LLC misses.
+The `prefetch_page_cache` option calls `posix_fadvise(POSIX_FADV_WILLNEED)` to tell
+the kernel to start loading the relevant byte ranges into the page cache.
+Each worker thread then reads directly via `pread` into its own
+locally-allocated buffer, keeping data hot in its local CPU caches.
+
+This avoids the LLC (Last Level Cache) miss problem with `pre_buffer=True`,
+where Arrow's IO thread pool fills temporary buffers on one core and worker
+threads on different cores later consume cold data.
 
-`prefetch_page_cache` provides an alternative: it calls
-`posix_fadvise(POSIX_FADV_WILLNEED)` to tell the kernel to start loading
-the relevant byte ranges into the page cache. Each worker thread then
-reads directly via `pread` into its own locally-allocated buffer, keeping
-data hot in its local CPU caches.
+This is only useful for local or network-mounted file systems that have a
+page cache. Remote file systems such as S3 will not benefit from this.
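The fadvise-then-pread pattern described above can be sketched with the Python stdlib alone (a minimal illustration of the kernel mechanism, not jj's actual implementation):

```python
import os
import tempfile

# Create a throwaway file as a stand-in for a Parquet file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1 << 20))
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Hint to the kernel that this byte range will be needed soon, so it
    # can start pulling it into the page cache asynchronously.
    os.posix_fadvise(fd, 0, 1 << 20, os.POSIX_FADV_WILLNEED)

    # Later, each worker reads its own range with pread into a buffer it
    # owns: no shared file offset, and no buffer filled on another core.
    chunk = os.pread(fd, 4096, 0)
finally:
    os.close(fd)
os.unlink(path)
```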
 
-Two ways to use it:
+There are two ways to enable page cache prefetching:
 
 **As a parameter on `read_into_numpy`:**
 
@@ -141,11 +145,7 @@ jj.read_into_numpy(
 )
 ```
 
-This is only useful for local or network-mounted file systems that have a
-page cache. Remote file systems such as S3 will not benefit from this.
-
-**As a standalone call** (when you want to prefetch ahead of time, e.g.
-from a different thread):
+**As a standalone call:**
 
 ```python
 jj.prefetch_page_cache(
@@ -154,15 +154,27 @@ jj.prefetch_page_cache(
     row_group_indices=range(pr.metadata.num_row_groups),
     column_indices=range(pr.metadata.num_columns),
 )
+```
 
-jj.read_into_numpy(
-    source=path,
-    metadata=pr.metadata,
-    np_array=np_array,
-    row_group_indices=range(pr.metadata.num_row_groups),
-    column_indices=range(pr.metadata.num_columns),
-    pre_buffer=False,
-)
+Useful for sliding-window prefetching, where you prefetch the next files
+while processing the current one:
+
+```python
+# Prime the pump
+for path in file_paths[:PREFETCH_DEPTH]:
+    jj.prefetch_page_cache(source=path, ...)
+
+# Main loop
+for i, path in enumerate(file_paths):
+    # Slide the window
+    ahead_index = i + PREFETCH_DEPTH
+    if ahead_index < len(file_paths):
+        jj.prefetch_page_cache(source=file_paths[ahead_index], ...)
+
+    # Page cache should already be warm
+    jj.read_into_numpy(source=path, np_array=np_array, ...)
+
+    process(np_array)
 ```
 
 ## Requirements
@@ -373,7 +385,7 @@ jj.read_into_torch(
     tensor=tensor,
     row_group_indices=range(pr.metadata.num_row_groups),
     column_indices=range(pr.metadata.num_columns),
-    pre_buffer=True,
+    prefetch_page_cache=True,
     use_threads=True,
 )
 