README.md: 33 additions & 7 deletions

…into NumPy arrays and PyTorch tensors with minimal overhead.

## Known limitations

- Data must not contain null values
- Destination NumPy arrays and PyTorch tensors must be column-major (Fortran-style); see the sketch below

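For example, compatible destinations can be allocated as follows (a minimal sketch; the shape is a placeholder, and the transpose trick is one common way to get column-major strides in PyTorch, which has no Fortran-order flag):

```python
import numpy as np
import torch

n_rows, n_cols = 1_000, 64  # placeholder shape

# NumPy supports Fortran (column-major) order directly.
np_array = np.zeros((n_rows, n_cols), dtype="f", order="F")

# Transposing a (n_cols, n_rows) tensor yields an (n_rows, n_cols)
# view whose strides are column-major.
torch_tensor = torch.zeros(n_cols, n_rows, dtype=torch.float32).t()
```
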
## Selecting a reader backend

…your workload is I/O-bound or memory-/CPU-bound.

- Reusing NumPy arrays or PyTorch tensors avoids repeated memory allocation (see the sketch after this list).
- While allocation itself is fast, it can trigger kernel contention and degrade performance.

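A minimal sketch of reusing one destination buffer across files, assuming `jollyjack` as the module name and the `source=`, `metadata=`, and `np_array=` parameter names (only the index and prefetch parameters appear verbatim in this README):

```python
import jollyjack as jj  # assumed module name for this library
import numpy as np
import pyarrow.parquet as pq

paths = ["part-0.parquet", "part-1.parquet"]  # placeholder inputs
n_rows, n_columns = 1_000_000, 64             # placeholder shape

# Allocate one Fortran-order destination and reuse it for every file.
np_array = np.zeros((n_rows, n_columns), dtype="f", order="F")

for path in paths:
    pr = pq.ParquetReader()
    pr.open(path)
    jj.read_into_numpy(
        source=path,          # assumed parameter name
        metadata=pr.metadata,  # assumed parameter name
        np_array=np_array,    # assumed parameter name
        row_group_indices=range(pr.metadata.num_row_groups),
        column_indices=range(pr.metadata.num_columns),
    )
    # ... consume np_array before the next file overwrites it ...
```
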
### Large datasets (exceeding filesystem cache)

For datasets larger than the available page cache, performance is typically
I/O-bound. Enabling either `pre_buffer=True` or `prefetch_page_cache=True`…

Recommended configuration:
- `use_threads = True`, `prefetch_page_cache = True`, `pre_buffer = False`,
  with the default reader backend (see the sketch below).

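Applied to `read_into_numpy`, the recommended configuration looks roughly like this (the `source=`, `metadata=`, and `np_array=` parameter names are assumptions for illustration):

```python
jj.read_into_numpy(
    source=path,          # assumed parameter name
    metadata=pr.metadata,  # assumed parameter name
    np_array=np_array,    # assumed parameter name
    row_group_indices=range(pr.metadata.num_row_groups),
    column_indices=range(pr.metadata.num_columns),
    use_threads=True,
    prefetch_page_cache=True,
    pre_buffer=False,
)
```
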
### Small datasets (fitting in filesystem cache)

For datasets that comfortably fit in RAM, performance is typically CPU- or
memory-bound. Using `pre_buffer` is not recommended because it leads to an…

…threads on different cores later consume cold data.
This is only useful for local or network-mounted file systems that have a
page cache. Remote file systems such as S3 will not benefit from this.

The Linux kernel's [`force_page_cache_ra`](https://github.com/torvalds/linux/blob/57b8e2d666a31fa201432d58f5fe3469a0dd83ba/mm/readahead.c#L353-L358)
caps the number of pages read per `posix_fadvise` call to the block device's
[readahead window](https://www.kernel.org/doc/html/v5.3/block/queue-sysfs.html#read-ahead-kb-rw).
Any bytes beyond this cap are silently ignored. The readahead window is
typically 128 KB or higher. Check the value for your device:

```bash
cat /sys/block/<device>/queue/read_ahead_kb
```

If `range_size_limit` exceeds this value, most of each coalesced range will
not be prefetched. Set `range_size_limit` to match or stay below the device's
`read_ahead_kb`:

```python
import pyarrow as pa

cache_options = pa.CacheOptions(
    hole_size_limit=8192,
    range_size_limit=128*1024,  # must not exceed read_ahead_kb
    lazy=False,
)
```

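Since `read_ahead_kb` varies by device, the limit can also be derived at runtime. A minimal sketch, assuming the data lives on a device named `sda`; `read_ahead_bytes` is a hypothetical helper:

```python
from pathlib import Path

import pyarrow as pa

def read_ahead_bytes(device: str = "sda") -> int:
    """Return the device's readahead window in bytes, read from sysfs."""
    kb = int(Path(f"/sys/block/{device}/queue/read_ahead_kb").read_text())
    return kb * 1024

cache_options = pa.CacheOptions(
    hole_size_limit=8192,
    range_size_limit=read_ahead_bytes("sda"),  # stays at the kernel cap
    lazy=False,
)
```
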
There are two ways to enable page cache prefetching:

**As a parameter on `read_into_numpy`:**
```python
jj.read_into_numpy(
    # …
    row_group_indices=range(pr.metadata.num_row_groups),
    column_indices=range(pr.metadata.num_columns),
    prefetch_page_cache=True,
    cache_options=cache_options,
)
```

**As a standalone call:**

```python
jj.prefetch_page_cache(
    # …
    metadata=pr.metadata,
    row_group_indices=range(pr.metadata.num_row_groups),
    column_indices=range(pr.metadata.num_columns),
    cache_options=cache_options,
)
```

The standalone call is useful for sliding-window prefetching, where you
prefetch the next files while processing the current one:

```python
# Prime the pump
# …
```
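
In outline, the pattern looks like this. A sketch under assumptions: `jollyjack` as the module name, a `source=` parameter, placeholder `paths`, and a hypothetical `process()` step:

```python
import jollyjack as jj  # assumed module name for this library
import pyarrow.parquet as pq

def prefetch(path):
    pr = pq.ParquetReader()
    pr.open(path)
    jj.prefetch_page_cache(
        source=path,  # assumed parameter name
        metadata=pr.metadata,
        row_group_indices=range(pr.metadata.num_row_groups),
        column_indices=range(pr.metadata.num_columns),
        cache_options=cache_options,
    )

prefetch(paths[0])  # prime the pump
for i, path in enumerate(paths):
    if i + 1 < len(paths):
        prefetch(paths[i + 1])  # warm the cache for the next file
    process(path)               # hypothetical processing step
```
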
```python
np_array = np.zeros((n_rows, n_columns), dtype="f", order="F")

pr = pq.ParquetReader()
pr.open(path)

# cache_options controls which byte ranges are prefetched into the page cache.
# range_size_limit should not exceed the device's read_ahead_kb,
# because the kernel silently ignores readahead beyond that cap.
cache_options = pa.CacheOptions(
    hole_size_limit=8192,
    range_size_limit=128*1024,  # must not exceed read_ahead_kb
    lazy=False,
)

# …
```