A feature engineering and ML serving library for machine learning applications, especially for personalization and recommendation systems.
Hotvect allows you to:
- Develop feature engineering code that can be shared across offline and online environments.
- Integrate machine learning libraries like CatBoost and TensorFlow into ML applications.
- Define ML-enabled models and policies, packaging them into reusable, modular forms that can easily be shared, combined, and deployed into production.
- Perform offline testing and hyperparameter optimization of models and policies, with built-in bookkeeping of test results.
- Integrate with Amazon SageMaker for running offline tests and hyperparameter optimization at scale.
The same data transformation code is used for both training and prediction, eliminating training/serving discrepancies.
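As a language-neutral sketch of that idea (not Hotvect's actual API — in Hotvect the shared code is JVM-based), a single feature function used by both paths might look like:

```python
# Generic illustration of a shared feature transformation (not Hotvect's API):
# the same function produces features at training time and at serving time,
# so the two paths cannot drift apart.

def extract_features(event: dict) -> dict:
    """Single source of truth for feature computation."""
    return {
        "hour_of_day": event["timestamp_s"] // 3600 % 24,
        "is_mobile": 1 if event.get("device") == "mobile" else 0,
    }

# Offline: build a training row from a logged event.
logged_event = {"timestamp_s": 1_700_000_000, "device": "mobile", "clicked": 1}
train_row = {**extract_features(logged_event), "label": logged_event["clicked"]}

# Online: score a live request with the very same function.
live_request = {"timestamp_s": 1_700_003_600, "device": "desktop"}
serving_features = extract_features(live_request)

print(train_row)
print(serving_features)
```

Because both paths call `extract_features`, any change to the feature logic applies offline and online simultaneously.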
Hotvect has characteristics that work well with typical machine learning use cases:
- Out-of-core: Processing happens without reading all data into memory, allowing processing of large datasets.
- Multi-threaded: Processing is multi-threaded, reducing processing time.
- Efficient: The library is written with performance in mind, and running on the JVM makes it straightforward to write fast feature transformations.
Feature interaction is natively supported, although exploring interaction features requires a separate step.
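To illustrate what a feature interaction (cross) is, independent of Hotvect's actual API, here is a minimal sketch:

```python
from itertools import product

def cross(features_a: list[str], features_b: list[str]) -> list[str]:
    """Cartesian-product interaction of two categorical feature groups."""
    return [f"{a}&{b}" for a, b in product(features_a, features_b)]

user = ["country=JP", "tier=gold"]
item = ["category=books"]
print(cross(user, item))  # ['country=JP&category=books', 'tier=gold&category=books']
```

Crossed features let a linear model learn conjunctions (e.g., "gold-tier users who browse books") that neither source feature captures alone.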
Prerequisites:
- Java 21
- Maven
- Python 3.11+
- uv
To set up the development environment:

```shell
cd python
make init
source .venv/bin/activate
```

For faster iteration while developing locally (skips Java tests):
```shell
cd python
make quick
```

Hotvect uses ~/.hotvect/config.json to find default base directories.
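The config file at ~/.hotvect/config.json is plain JSON and can be inspected with standard tooling. A minimal sketch (the key name shown is hypothetical — the real contents are installation-specific):

```python
import json
import tempfile
from pathlib import Path

# Hedged sketch: the real file lives at ~/.hotvect/config.json and its keys
# are installation-specific; here we parse a stand-in file to show the idea.
def load_config(path: Path) -> dict:
    """Return the parsed config, or {} if the file does not exist."""
    return json.loads(path.read_text()) if path.exists() else {}

# Demonstrate with a throwaway file ("base_dir" is a hypothetical key name).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps({"base_dir": "/data/hotvect"}))
    print(load_config(path))
    print(load_config(path.parent / "missing.json"))  # {}
```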
To create it, run:

```shell
hv-ext config init
```

For CLI usage:

```shell
hv --help
hv-ext --help
```

Two local HTTP debugging surfaces are available:
```shell
# Full algorithm HTTP server (Java runtime, including feature extraction)
hv serve --algorithm-jar /path/to/algo.jar --algorithm-name my-algo --parameter-path /path/to/params.zip --port 8080

# Same server, with browser UI enabled
hv serve --ui --algorithm-jar /path/to/algo.jar --algorithm-name my-algo --parameter-path /path/to/params.zip --source-path /path/to/examples --port 8080

# Worker-only HTTP server (LitServe-backed worker runtime, expects worker-ready feature rows)
# Native LitServe `/predict` and compatibility `/v2/...` endpoints are both exposed.
# Use --algorithm-override when the algorithm definition does not already declare a backend
# and a scoped litserve block.
hv worker serve --algorithm-jar /path/to/algo.jar --algorithm-name my-model --parameter-path /path/to/params.zip --port 8081 --algorithm-override worker-http-override.json
```

Example: download a specific algorithm version's SageMaker backtest results (regex filters):
```shell
hv-ext results download \
  "s3://example-bucket/temp/<user>/sagemaker_output/" \
  --dest-base-dir "./results" \
  --from-date "2026-02-15" --to-date "2026-02-15" \
  --algorithm-name-regex "example-algorithm" \
  --algorithm-version-regex "74\\.4\\..*" \
  --job-name-regex "ml-exp-.*" \
  --include-metadata
```

Hotvect ships a read-only MCP server for documentation search and retrieval over stdio (NDJSON JSON-RPC).
If you installed Hotvect, run:
```shell
hv-mcp --help
```

From a source checkout, run it using the repo venv Python:

```shell
"$(pwd)/python/.venv/bin/python" "$(pwd)/python/bin/hv-mcp" --help
```

See python/hotvect/mcp/bundled_docs/docs/guides/docs-mcp/index.md for details and Codex integration.
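To show what NDJSON JSON-RPC framing looks like in general (one JSON object per line; the `initialize` method and protocol version follow the generic MCP convention and are not specific to hv-mcp):

```python
import json

# NDJSON framing: each JSON-RPC message is serialized as exactly one line.
# `initialize` is the standard MCP handshake method; the exact capability
# payload hv-mcp accepts is not shown here.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"protocolVersion": "2024-11-05", "capabilities": {}},
}

line = json.dumps(request, separators=(",", ":")) + "\n"
print(line, end="")

# A client would write `line` to the server's stdin and read one
# JSON object per line back from its stdout.
parsed = json.loads(line)
print(parsed["method"])  # initialize
```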
Hotvect does not include:
- Machine learning algorithms themselves - It is intended to be combined with existing ML libraries.
- Orchestration of ML pipelines - Requires other frameworks like Airflow.
- Life-cycle management of models and policies - Supported by an external Experiment Management Service.
- Creation, management, and execution of online experiments - Provided by the Experiment Management Service.
- Monitoring of ML applications and evaluation of online experiment results - Requires separate solutions.
Hotvect is designed to be library-agnostic:
- Feature engineering runs in the JVM (Java/Kotlin/Scala), so offline and online feature computation can share the same code.
- Model inference can run in the JVM (e.g. JNI, or pure-Java implementations like H2O.ai's xgboost-predictor), or in a separate Python process via direct workers (v10+).
Feature engineering should be implemented in a JVM language (e.g., Java, Kotlin, Scala), while APIs for triggering tasks like offline testing are provided as a Python library.