[Windows] JVM SDK fails for models > 2 GB — stat() in nativeCreateEngine / nativeCreateBenchmark hits MSVCRT 32-bit overflow #2000

@oliveskin

Description


Summary

com.google.ai.edge.litertlm:litertlm-jvm:0.10.2 cannot load any .litertlm bundle larger than 2 GB on Windows x86_64. Engine.initialize() (and Benchmark.create()) throws:

com.google.ai.edge.litertlm.LiteRtLmJniException: Model file not found: <absolute-path>

even though the file exists, is readable from Java, and has a valid LITERTLM magic. The root cause is a 32-bit stat() pre-check in the JNI layer; the rest of the loader (ModelAssets::Create) is fine.

Repro

  1. On Windows x86_64, build a Kotlin/JVM project that depends on com.google.ai.edge.litertlm:litertlm-jvm:0.10.2.
  2. Point EngineConfig.modelPath at a .litertlm file that is > 2,147,483,647 bytes (e.g. Gemma 3 4B at int8 / a 2.55 GB bundle).
  3. Call Engine(config).initialize().
  4. JNI throws LiteRtLmJniException: Model file not found: <path>.

With an identical call against a small (< 2 GB) .litertlm, the load succeeds.

Evidence that the file is fine

From the JVM side, immediately before handing the path to the native layer:

[diag] absolutePath   : C:\...\model.litertlm
[diag] canonicalPath  : C:\...\model.litertlm
[diag] exists         : true
[diag] isFile         : true
[diag] canRead        : true
[diag] length (bytes) : 2556231680
[diag] first-8 bytes  : 4C 49 54 45 52 54 4C 4D     ← "LITERTLM" magic

FileInputStream reads it cleanly. The failure happens entirely inside the JNI stat() call: swapping the same path for a 1 MB dummy file passes the pre-check and then fails later at the magic-number check, confirming that the Model file not found message comes from the pre-check specifically.

Root cause

kotlin/java/com/google/ai/edge/litertlm/jni/litertlm.cc, both in nativeCreateEngine (around L401) and nativeCreateBenchmark (around L576):

struct stat buffer;
if (stat(model_path_str.c_str(), &buffer) != 0) {
  ThrowLiteRtLmJniException(env, "Model file not found: " + model_path_str);
  ...
}

On Windows with the UCRT / legacy MSVCRT, struct stat carries a 32-bit st_size (stat resolves to a variant like _stat64i32, whose st_size is a 32-bit long). For files larger than LONG_MAX bytes (≈ 2.147 GB, since long is 32 bits on Windows), stat() returns -1 with errno = EOVERFLOW, even when the file is fully accessible via every other API. The Linux glibc / macOS equivalents of stat() resolve to a struct with a 64-bit st_size, so this only fails on Windows.

Downstream of the pre-check, ModelAssets::Create uses absl::Status with 64-bit-safe file APIs, so models above 2 GB load correctly once the pre-check is out of the way.

Proposed fix

Replace stat with _stat64 (and struct stat with struct _stat64) behind a Windows guard. Minimal diff:

#if defined(_WIN32)
  struct _stat64 buffer;
  if (_stat64(model_path_str.c_str(), &buffer) != 0) {
#else
  struct stat buffer;
  if (stat(model_path_str.c_str(), &buffer) != 0) {
#endif
    ThrowLiteRtLmJniException(env, "Model file not found: " + model_path_str);
    return 0;
  }

(Applied at both sites in litertlm.cc.)

Alternatively: skip the pre-check entirely on Windows and let ModelAssets::Create return its own error. The pre-check is redundant — it just produces a friendlier message.

Alternatively, project-wide: build the Windows objects with -D_FILE_OFFSET_BITS=64 and use the *64 family of syscall wrappers consistently. The cheapest change is the two-site _stat64 swap.

Environment

  • Windows 11 Home 10.0.26200
  • JDK 21 (Android Studio JBR)
  • com.google.ai.edge.litertlm:litertlm-jvm:0.10.2 (via Maven/Gradle)
  • Kotlin 2.3.0, Gradle 9.3.1
  • Model: 2,556,231,680-byte .litertlm bundle (Gemma-class, exported with the upstream convert_to_litertlm tool)

Workaround in the wild

We published a small binary patcher that rewrites test eax, eax → xor eax, eax at both pre-check sites (same byte count; the downstream je is then always taken), which is enough to unblock nativeCreateEngine / nativeCreateBenchmark for large models on Windows while this is pending.

It only modifies a user-supplied local copy of the DLL and refuses to patch anything it doesn't recognize; it should become obsolete the moment a fixed litertlm_jni.dll ships.

Happy to turn this into a PR if you're open to it.
