Parallelize tokenizer and metadata loading to improve engine initialization latency.#1568
Closed
copybara-service[bot] wants to merge 1165 commits into main from
Conversation
LiteRT-LM-PiperOrigin-RevId: 857335054
LiteRT-LM-PiperOrigin-RevId: 857350704
LiteRT-LM-PiperOrigin-RevId: 857357241
LiteRT-LM-PiperOrigin-RevId: 857423226
Follow the set_* pattern to make it scalable, instead of adding all values to a single C function. LiteRT-LM-PiperOrigin-RevId: 857689534
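The set_* pattern referenced above (one small setter per option, rather than one C function that takes every value) can be sketched as follows. This is a hedged illustration only: the class and option names below are hypothetical, not the real LiteRT-LM C API.

```python
class EngineSettings:
    """Illustrative sketch of the set_* pattern (hypothetical names).

    Each option gets its own setter, so new options can be added
    without changing any existing function signature.
    """

    def __init__(self):
        self.num_threads = None
        self.cache_dir = None

    def set_num_threads(self, n: int) -> "EngineSettings":
        self.num_threads = n
        return self  # return self to allow chaining

    def set_cache_dir(self, path: str) -> "EngineSettings":
        self.cache_dir = path
        return self


# Adding a third option later means adding one more setter,
# not widening a single monolithic create function.
settings = EngineSettings().set_num_threads(4).set_cache_dir("/tmp/litert")
```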
LiteRT-LM-PiperOrigin-RevId: 858607999
LiteRT-LM-PiperOrigin-RevId: 858656280
LiteRT-LM-PiperOrigin-RevId: 858718908
LiteRT-LM-PiperOrigin-RevId: 858742888
LiteRT-LM-PiperOrigin-RevId: 858748258
LiteRT-LM-PiperOrigin-RevId: 858802033
LiteRT-LM-PiperOrigin-RevId: 858833664
LiteRT-LM-PiperOrigin-RevId: 859090887
Jetson doesn't have std::logf(). Use std::log(static_cast<float>()) instead. LiteRT-LM-PiperOrigin-RevId: 859148505
- Transpose mask in input handling for fixed gpu kernel size - Fix exported symbols from libLiteRt.so on linux_arm64 LiteRT-LM-PiperOrigin-RevId: 859164020
LiteRT-LM-PiperOrigin-RevId: 859247355
Currently, tools can be implemented with Kotlin functions only. This is easy to use but does not give developers full control over the tool spec. Users who need fine-grained control over the tool description and execution can now use the new `OpenApiTool`. For examples of how to implement it, see the `README.md`. LiteRT-LM-PiperOrigin-RevId: 859271963
- Last sampler shlibs have some regressions LiteRT-LM-PiperOrigin-RevId: 859350407
LiteRT-LM-PiperOrigin-RevId: 859642028
When the context length is big, it increases init time noticeably, e.g. 8s when context length = 32k for gemma3-1b. LiteRT-LM-PiperOrigin-RevId: 859666933
LiteRT-LM-PiperOrigin-RevId: 859770885
LiteRT-LM-PiperOrigin-RevId: 859859182
LiteRT-LM-PiperOrigin-RevId: 859875851
LiteRT-LM-PiperOrigin-RevId: 859999259
LiteRT-LM-PiperOrigin-RevId: 860120569
The given option is true by default and is used so that the mmap'ed memory for shared weights is swapped out to reduce memory footprint. When memory is swapped out, all the temporary changes made by magic numbers are reverted. So, when magic numbers are used, the given flag must be disabled. LiteRT-LM-PiperOrigin-RevId: 860150634
LiteRT-LM-PiperOrigin-RevId: 860170788
LiteRT-LM-PiperOrigin-RevId: 860216677
LiteRT-LM-PiperOrigin-RevId: 860278850
We will remove the `--hk_token` flag. The environment variable is the only way to set the token. LiteRT-LM-PiperOrigin-RevId: 860319701
LiteRT-LM-PiperOrigin-RevId: 878014141
LiteRT-LM-PiperOrigin-RevId: 878052506
LiteRT-LM-PiperOrigin-RevId: 878095579
LiteRT-LM-PiperOrigin-RevId: 878143704
LiteRT-LM-PiperOrigin-RevId: 878182837
LiteRT-LM-PiperOrigin-RevId: 878195871
Support end of vision tflite model in executors LiteRT-LM-PiperOrigin-RevId: 878252087
Integrate model downloading into macOS, Windows, and Linux CI workflows using a Gemma 3 1B IT model from Hugging Face. Introduce a cross-platform Pytest framework to execute litert_lm_main with a single prompt, serving as an E2E smoke test to verify basic inference functionality and expected output. LiteRT-LM-PiperOrigin-RevId: 878581563
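The E2E smoke test described above can be sketched roughly as below. The binary name `litert_lm_main` comes from the commit message, but the flag names and helper functions here are assumptions for illustration; the real invocation lives in the repo's CI config.

```python
import subprocess


def build_cmd(binary: str, model_path: str, prompt: str) -> list:
    # Flag names are assumed for illustration; check the actual
    # litert_lm_main flags before relying on these.
    return [binary, f"--model_path={model_path}", f"--input_prompt={prompt}"]


def looks_like_inference_output(stdout: str) -> bool:
    # A smoke test only verifies that some non-empty response came back,
    # not that the model answered correctly.
    return len(stdout.strip()) > 0


def run_smoke_test(binary: str, model_path: str, prompt: str = "What is 2+2?"):
    result = subprocess.run(
        build_cmd(binary, model_path, prompt),
        capture_output=True,
        text=True,
        timeout=600,
    )
    assert result.returncode == 0, result.stderr
    assert looks_like_inference_output(result.stdout)
```

Keeping `build_cmd` and the output check as pure helpers makes the same test file runnable unchanged on macOS, Windows, and Linux runners, with only the binary path varying per platform.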
LiteRT-LM-PiperOrigin-RevId: 878609747
LiteRT-LM-PiperOrigin-RevId: 878717839
LiteRT-LM-PiperOrigin-RevId: 878816159
LiteRT-LM-PiperOrigin-RevId: 878885676
LiteRT-LM-PiperOrigin-RevId: 879067681
LiteRT-LM-PiperOrigin-RevId: 879146688
LiteRT-LM-PiperOrigin-RevId: 879202010
LiteRT-LM-PiperOrigin-RevId: 879221034
LiteRT-LM-PiperOrigin-RevId: 879268560
LiteRT-LM-PiperOrigin-RevId: 879452695
…y issue on specific GPUs LiteRT-LM-PiperOrigin-RevId: 879665447
This change refactors the `Backend` API in `Config.kt` from an enum
to a sealed class to support backend-specific configurations directly
within the backend definition.
**Key changes include:**
* **Backend Sealed Class:** `Backend` is now a sealed class with three
variants: `CPU(val numThreads: Int)`, `NPU()`,
and `GPU()`.
* **JNI Configuration:** Updated `LiteRtLmJni.nativeCreateEngine` and
`litertlm.cc` to accept and process the `num_threads` parameter, which
is mapped to `number_of_threads` in the C++ `CpuConfig`.
* **Tests & Examples Migration:** Updated all test cases (e.g.,
`SessionTest`, `DeviceTest`, `BaseDeviceTest`) and example scripts
(`Main.kt`, `ToolMain.kt`, `BenchmarkMain.kt`) to instantiate the new
`Backend.CPU()` and `Backend.NPU(...)` data classes.
LiteRT-LM-PiperOrigin-RevId: 879674985
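The point of the enum-to-sealed-class move is that each backend variant can carry its own configuration. A rough Python analogue of that design, using frozen dataclasses in place of Kotlin's sealed class (the variant names follow the commit; the dispatch function is illustrative):

```python
from dataclasses import dataclass
from typing import Union


# Each variant can carry backend-specific fields, unlike a plain enum.
@dataclass(frozen=True)
class CPU:
    num_threads: int = 4  # default is illustrative


@dataclass(frozen=True)
class NPU:
    pass


@dataclass(frozen=True)
class GPU:
    pass


Backend = Union[CPU, NPU, GPU]


def to_native_config(backend: Backend) -> dict:
    # Dispatch on the variant, mirroring Kotlin's `when (backend)`;
    # CPU's num_threads maps to the C++ CpuConfig's number_of_threads.
    if isinstance(backend, CPU):
        return {"backend": "cpu", "number_of_threads": backend.num_threads}
    if isinstance(backend, NPU):
        return {"backend": "npu"}
    return {"backend": "gpu"}
```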
LiteRT-LM-PiperOrigin-RevId: 879706615
LiteRT-LM-PiperOrigin-RevId: 879824143
This change refactors the LiteRT LM Python API to provide a more idiomatic and
simplified interface for users. Key changes include:
- Abstract Base Classes (ABC): Introduced AbstractEngine and AbstractConversation
in a new interfaces.py module. The C++ implementation classes are registered as
virtual subclasses, allowing for proper inheritance checks (e.g.,
isinstance(engine, AbstractEngine)) across the C++/Python boundary.
- Simplified Engine & Conversation Lifecycle:
- Users can now directly instantiate litert_lm.Engine with configuration
parameters, which internally handles ModelAssets and EngineSettings.
- Added engine.create_conversation() to eliminate the need for manual
ConversationConfig and Conversation.create calls.
- send_message and send_message_async now support both str and dict inputs. String inputs are automatically wrapped into a user-role message.
- Moved the Backend enum to Python (interfaces.py).
- Internal Refactoring: Updated C++ bindings to support the new factory methods and
simplified logging initialization using a mapping-based approach.
- Enhanced Testing: Updated existing tests to the new API and added new test cases
for ABC inheritance, simplified conversation creation, and string input support.
LiteRT-LM-PiperOrigin-RevId: 879841369
This change moves the NPU native library directory configuration from the global `ExperimentalFlags` to the `Backend.NPU` class. This enables each NPU backend (main, vision, and audio) to specify its own native library directory, which is essential for multi-modal models that may utilize different NPU backends or separate library instances.
**Key changes:**
- Updated `Backend.NPU` in `Config.kt` to a data class with a `nativeLibraryDir` property.
- Deprecated `ExperimentalFlags.npuLibrariesDir` while maintaining it as a fallback for backward compatibility.
- Updated the JNI layer (`nativeCreateEngine` in `litertlm.cc` and `LiteRtLmJni.kt`) to support separate native library directories for the main, vision, and audio executors.
- Adjusted `Engine.kt` initialization logic to resolve the native library directory per backend, prioritizing the value in `Backend.NPU` over the global experimental flag.
- Updated `BaseDeviceTest.kt` to utilize the new configuration pattern.
LiteRT-LM-PiperOrigin-RevId: 879851457
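The resolution order above (per-backend value first, deprecated global flag as fallback) reduces to a small pure function. The sketch below uses names adapted from the Kotlin API in the commit; the function name itself is hypothetical.

```python
from typing import Optional


def resolve_native_library_dir(
    backend_dir: Optional[str],
    deprecated_global_dir: Optional[str],
) -> Optional[str]:
    # Prefer the directory set on Backend.NPU itself; fall back to the
    # deprecated ExperimentalFlags value for backward compatibility.
    if backend_dir is not None:
        return backend_dir
    return deprecated_global_dir
```

Because each executor (main, vision, audio) calls this with its own `backend_dir`, multi-modal models can load separate NPU library instances while old callers that only set the global flag keep working.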
LiteRT-LM-PiperOrigin-RevId: 879871562
…zation latency. LiteRT-LM-PiperOrigin-RevId: 879423683