Parallelize tokenizer and metadata loading to improve engine initialization latency.#1568

Closed
copybara-service[bot] wants to merge 1165 commits into main from litert_lm_pr_879423683
Conversation

@copybara-service
Contributor

Parallelize tokenizer and metadata loading to improve engine initialization latency.
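The diff itself is not shown here, so as a rough illustration of the idea only (overlapping two independent loads so startup cost is roughly the maximum of the two rather than their sum), a Python sketch with stand-in loader functions:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_tokenizer():
    time.sleep(0.2)  # stand-in for tokenizer file I/O and parsing
    return "tokenizer"

def load_metadata():
    time.sleep(0.2)  # stand-in for model metadata parsing
    return "metadata"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit both loads so they run concurrently instead of back-to-back.
    tokenizer_future = pool.submit(load_tokenizer)
    metadata_future = pool.submit(load_metadata)
    tokenizer = tokenizer_future.result()
    metadata = metadata_future.result()
elapsed = time.perf_counter() - start

# Overlapped, the total wait is roughly max(0.2, 0.2), not 0.2 + 0.2.
```

The function names and timings are hypothetical; the actual change lives in the C++ engine initialization path.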

ai-edge-bot and others added 30 commits January 16, 2026 16:00
LiteRT-LM-PiperOrigin-RevId: 857335054
LiteRT-LM-PiperOrigin-RevId: 857350704
LiteRT-LM-PiperOrigin-RevId: 857357241
LiteRT-LM-PiperOrigin-RevId: 857423226
Follow the set_* pattern to make it scalable, instead of adding all values to a single C function.

LiteRT-LM-PiperOrigin-RevId: 857689534
LiteRT-LM-PiperOrigin-RevId: 858607999
LiteRT-LM-PiperOrigin-RevId: 858656280
LiteRT-LM-PiperOrigin-RevId: 858718908
LiteRT-LM-PiperOrigin-RevId: 858742888
LiteRT-LM-PiperOrigin-RevId: 858748258
LiteRT-LM-PiperOrigin-RevId: 858802033
LiteRT-LM-PiperOrigin-RevId: 858833664
LiteRT-LM-PiperOrigin-RevId: 859090887
Jetson doesn't have std::logf().
Use std::log(static_cast<float>()) instead.

LiteRT-LM-PiperOrigin-RevId: 859148505
- Transpose mask in input handling for fixed gpu kernel size
- Fix exported symbols from libLiteRt.so on linux_arm64

LiteRT-LM-PiperOrigin-RevId: 859164020
LiteRT-LM-PiperOrigin-RevId: 859247355
Currently, tools can be implemented as Kotlin functions only. This is easy to use but does not give developers full control over the tool spec.

For users who need fine-grained control over the tool description and execution, the new `OpenApiTool` is now available.

For examples of how to implement it, see `README.md`.

LiteRT-LM-PiperOrigin-RevId: 859271963
- The latest sampler shared libraries (shlibs) have some regressions

LiteRT-LM-PiperOrigin-RevId: 859350407
LiteRT-LM-PiperOrigin-RevId: 859642028
When the context length is large, it increases init time noticeably, e.g. by 8s when context length = 32k for gemma3-1b.

LiteRT-LM-PiperOrigin-RevId: 859666933
LiteRT-LM-PiperOrigin-RevId: 859859182
LiteRT-LM-PiperOrigin-RevId: 859875851
LiteRT-LM-PiperOrigin-RevId: 859999259
LiteRT-LM-PiperOrigin-RevId: 860120569
The given option is true by default and allows the mmap'ed memory for shared
weights to be swapped out to reduce memory footprint.
When that memory is swapped out, all temporary changes made by magic numbers
are reverted. So, when magic numbers are used, the flag must be disabled.

LiteRT-LM-PiperOrigin-RevId: 860150634
LiteRT-LM-PiperOrigin-RevId: 860278850
We will remove the `--hk_token` flag; the environment will then be the only way to set the token.

LiteRT-LM-PiperOrigin-RevId: 860319701
hheydary and others added 25 commits March 3, 2026 10:28
LiteRT-LM-PiperOrigin-RevId: 878014141
LiteRT-LM-PiperOrigin-RevId: 878052506
LiteRT-LM-PiperOrigin-RevId: 878095579
LiteRT-LM-PiperOrigin-RevId: 878143704
Support end of vision tflite model in executors

LiteRT-LM-PiperOrigin-RevId: 878252087
Integrate model downloading into macOS, Windows, and Linux CI workflows using a Gemma 3 1B IT model from Hugging Face. Introduce a cross-platform Pytest framework to execute litert_lm_main with a single prompt, serving as an E2E smoke test to verify basic inference functionality and expected output.

LiteRT-LM-PiperOrigin-RevId: 878581563
LiteRT-LM-PiperOrigin-RevId: 878609747
LiteRT-LM-PiperOrigin-RevId: 878717839
LiteRT-LM-PiperOrigin-RevId: 878885676
LiteRT-LM-PiperOrigin-RevId: 879067681
LiteRT-LM-PiperOrigin-RevId: 879202010
LiteRT-LM-PiperOrigin-RevId: 879221034
LiteRT-LM-PiperOrigin-RevId: 879268560
LiteRT-LM-PiperOrigin-RevId: 879452695
…y issue on specific GPUs

LiteRT-LM-PiperOrigin-RevId: 879665447
This change refactors the `Backend` API in `Config.kt` from an enum
to a sealed class to support backend-specific configurations directly
within the backend definition.

**Key changes include:**
*   **Backend Sealed Class:** `Backend` is now a sealed class with three
    variants: `CPU(val numThreads: Int)`, `NPU()`,
    and `GPU()`.
*   **JNI Configuration:** Updated `LiteRtLmJni.nativeCreateEngine` and
    `litertlm.cc` to accept and process the `num_threads` parameter, which
    is mapped to `number_of_threads` in the C++ `CpuConfig`.
*   **Tests & Examples Migration:** Updated all test cases (e.g.,
    `SessionTest`, `DeviceTest`, `BaseDeviceTest`) and example scripts
    (`Main.kt`, `ToolMain.kt`, `BenchmarkMain.kt`) to instantiate the new
    `Backend.CPU()` and `Backend.NPU(...)` data classes.

LiteRT-LM-PiperOrigin-RevId: 879674985
LiteRT-LM-PiperOrigin-RevId: 879706615
LiteRT-LM-PiperOrigin-RevId: 879824143
This change refactors the LiteRT LM Python API to provide a more idiomatic and
simplified interface for users. Key changes include:

 - Abstract Base Classes (ABC): Introduced AbstractEngine and AbstractConversation
   in a new interfaces.py module. The C++ implementation classes are registered as
   virtual subclasses, allowing for proper inheritance checks (e.g.,
   isinstance(engine, AbstractEngine)) across the C++/Python boundary.
 - Simplified Engine & Conversation Lifecycle:
     - Users can now directly instantiate litert_lm.Engine with configuration
       parameters, which internally handles ModelAssets and EngineSettings.
     - Added engine.create_conversation() to eliminate the need for manual
       ConversationConfig and Conversation.create calls.
 - send_message and send_message_async now support both str and dict inputs. String inputs are automatically wrapped into a user-role message.
 - Moved the Backend enum to Python (interfaces.py).
 - Internal Refactoring: Updated C++ bindings to support the new factory methods and
   simplified logging initialization using a mapping-based approach.
 - Enhanced Testing: Updated existing tests to the new API and added new test cases
   for ABC inheritance, simplified conversation creation, and string input support.

LiteRT-LM-PiperOrigin-RevId: 879841369
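The virtual-subclass registration described above can be sketched as follows; `AbstractEngine`'s real interface lives in `interfaces.py`, so the single method shown here is an assumed stand-in:

```python
from abc import ABC, abstractmethod

class AbstractEngine(ABC):
    """Stand-in for the AbstractEngine ABC in interfaces.py (method name assumed)."""

    @abstractmethod
    def create_conversation(self):
        ...

class NativeEngine:
    """Stand-in for the C++ binding class; note it does NOT inherit AbstractEngine."""

    def create_conversation(self):
        return "conversation"

# Register the binding class as a virtual subclass so isinstance() checks
# succeed across the C++/Python boundary without real inheritance.
AbstractEngine.register(NativeEngine)

engine = NativeEngine()
assert isinstance(engine, AbstractEngine)
```

This is why a pybind11-backed class can satisfy `isinstance(engine, AbstractEngine)` even though it never subclasses the ABC in Python.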
This change moves the NPU native library directory configuration from the global `ExperimentalFlags` to the `Backend.NPU` class. This enables each NPU backend (main, vision, and audio) to specify its own native library directory, which is
essential for multi-modal models that may utilize different NPU backends or separate library instances.

**Key changes**:
- Updated `Backend.NPU` in `Config.kt` to a data class with a `nativeLibraryDir` property.
- Deprecated `ExperimentalFlags.npuLibrariesDir` while maintaining it as a fallback for backward compatibility.
- Updated the JNI layer (`nativeCreateEngine` in `litertlm.cc` and `LiteRtLmJni.kt`) to support separate native library directories for main, vision, and audio executors.
- Adjusted `Engine.kt` initialization logic to resolve the native library directory per backend, prioritizing the value in `Backend.NPU` over the global experimental flag.
- Updated `BaseDeviceTest.kt` to utilize the new configuration pattern.

LiteRT-LM-PiperOrigin-RevId: 879851457
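The per-backend resolution with a deprecated global fallback described above (actually implemented in Kotlin in `Engine.kt`) can be sketched in Python with hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NpuBackend:
    native_library_dir: Optional[str] = None  # per-backend setting (new)

@dataclass
class ExperimentalFlags:
    npu_libraries_dir: Optional[str] = None   # global setting (deprecated)

def resolve_library_dir(backend: NpuBackend,
                        flags: ExperimentalFlags) -> Optional[str]:
    # The backend-level value wins; the global flag is only kept as a
    # backward-compatibility fallback.
    return backend.native_library_dir or flags.npu_libraries_dir
```

Each executor (main, vision, audio) would resolve its directory independently this way, which is what enables mixed NPU backends in multi-modal models.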
LiteRT-LM-PiperOrigin-RevId: 879871562
@copybara-service[bot] force-pushed the litert_lm_pr_879423683 branch from 4dacb83 to 4dd4c65 on March 7, 2026 01:50
…zation latency.

LiteRT-LM-PiperOrigin-RevId: 879423683
@copybara-service[bot] force-pushed the litert_lm_pr_879423683 branch from 4dd4c65 to 9f0fc73 on March 7, 2026 02:37
@TomOtero1984 force-pushed the litert_lm_pr_879423683 branch from 9f0fc73 to 95c303f on March 9, 2026 21:12