Getting Started with Marin

In this tutorial, you will install Marin on your local machine.

Prerequisites

Before you begin, ensure you have the following installed:

- Python 3.11 or higher
- uv (Python package manager)
- Git
- Rust toolchain via rustup (only needed for source builds of Rust crates; see the Rust Crates section below)
    - Recommended: `rustup toolchain install 1.91.0 && rustup default 1.91.0` (matches the Docker pin)
    - If you hit an edition2024 error from Cargo (e.g., when building Arrow), use nightly: `rustup default nightly`
- On macOS, install additional build tools for SentencePiece: `brew install cmake pkg-config coreutils`
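
As a quick sanity check of the list above, the following sketch reports which tools are on your `PATH` and whether the Python version meets the 3.11 minimum. It is not part of Marin: the helper names `version_at_least` and `check_tools` are made up for this example, and it assumes the interpreter is available as `python3`.

```bash
#!/usr/bin/env bash
# Sketch: report which prerequisite tools are on PATH (helper names are hypothetical).

# version_at_least HAVE WANT: succeed when HAVE (e.g. 3.12.1) >= WANT (e.g. 3.11),
# comparing only the major and minor components.
version_at_least() {
    local have_major have_minor want_major want_minor rest
    IFS=. read -r have_major have_minor rest <<< "$1"
    IFS=. read -r want_major want_minor rest <<< "$2"
    if (( have_major != want_major )); then
        (( have_major > want_major ))
        return
    fi
    (( ${have_minor:-0} >= ${want_minor:-0} ))
}

# check_tools TOOL...: print one status line per tool; fail if any is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools python3 uv git || echo "Install the missing tools before continuing."

# Assumption: the interpreter is named python3 and prints "Python X.Y.Z".
if command -v python3 >/dev/null 2>&1; then
    py_version="$(python3 --version 2>&1)"
    py_version="${py_version#Python }"
    if version_at_least "$py_version" 3.11; then
        echo "ok       Python $py_version"
    else
        echo "TOO OLD  Python $py_version (need 3.11 or higher)"
    fi
fi
```

The Rust toolchain is left out of the check on purpose, since it is only needed for source builds.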

In addition, you might find it useful to have the following accounts:

- GitHub for submitting pull requests or speedruns
- Weights & Biases for experiment tracking
- Hugging Face for accessing gated models/tokenizers (such as Meta's Llama 3.1 8B model)

This document focuses on basic setup and usage of Marin. If you're using a GPU, see Local GPU Setup for a GPU-specific walkthrough. If you want to set up a TPU cluster, see TPU Setup.

Installation

Clone the repository (~10s):

git clone https://github.com/marin-community/marin.git
cd marin

Create and activate a virtual environment (~0s):

uv venv --python 3.11
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install the package and dependencies (5-10m, mostly building packages from source):

Use uv sync to install dependencies and the local Marin package (editable) in one step:
```
# Resolve and install dependencies + local package (editable)
uv sync --all-packages
```

Set up Weights & Biases (WandB) so you can monitor your runs:

export WANDB_API_KEY=...  # Get this from https://wandb.ai/authorize

You can also set WANDB_ENTITY and WANDB_PROJECT.

Set up your Hugging Face token so you can use gated models/tokenizers (such as Meta's Llama 3.1 8B model):
```
export HF_TOKEN=...  # Get this from https://huggingface.co/settings/tokens
```
Define the path where all artifacts generated during execution are stored (e.g., local_store):
```
export MARIN_PREFIX=...
```

For example, training checkpoints are usually written to ${MARIN_PREFIX}/checkpoints/. You can set this to any fsspec-recognizable path (e.g., a GCS bucket) or to a directory on your machine. See Understanding MARIN_PREFIX and --prefix for details.

You might find it convenient to store WANDB_API_KEY, HF_TOKEN, and MARIN_PREFIX in a .env file, which you can load in one go with source .env.
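
Here is a minimal sketch of that workflow. The values are placeholders, `load_env` is a helper invented for this example, and a temporary file stands in for `.env` so that running the snippet does not overwrite a real one. Note that `source .env` only exports variables to child processes if each line starts with `export`. Wrapping the load in `set -a` / `set +a` exports plain `KEY=value` lines as well.

```bash
#!/usr/bin/env bash
# Sketch: write an example env file and load it (placeholder values only).

env_file="$(mktemp)"   # stands in for .env in this example
cat > "$env_file" <<'EOF'
WANDB_API_KEY=replace-me
HF_TOKEN=replace-me
MARIN_PREFIX=local_store
EOF

# load_env FILE: source FILE with auto-export enabled, so every variable it
# assigns is exported to child processes (hypothetical helper).
load_env() {
    set -a
    . "$1"
    set +a
}

load_env "$env_file"
echo "Checkpoints would be written under: ${MARIN_PREFIX}/checkpoints/"
```

With a real `.env` in the repository root, the equivalent call would be `load_env .env`. Keep that file out of version control, since it holds credentials.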

Hardware-specific Setup

Marin runs on multiple types of hardware (CPU, GPU, TPU).

!!! info "Install Marin for different accelerators"

    Marin requires different JAX installations depending on your hardware accelerator. These installation options are defined in our `pyproject.toml` file and will install the appropriate JAX version for your hardware.

=== "CPU"
    ```bash
    # Install CPU-specific dependencies (local package included)
    uv sync --all-packages --extra=cpu
    ```

=== "GPU"
     If you are working on GPUs, first set up your system by installing the appropriate CUDA version. Marin defaults to 12.9.0:
     ```bash
     wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda_12.9.0_575.51.03_linux.run
     sudo sh cuda_12.9.0_575.51.03_linux.run
     ```
     Next, install cuDNN by following the instructions from the [NVIDIA docs](https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_local):
     ```bash
     wget https://developer.download.nvidia.com/compute/cudnn/9.10.0/local_installers/cudnn-local-repo-ubuntu2404-9.10.0_1.0-1_amd64.deb
     sudo dpkg -i cudnn-local-repo-ubuntu2404-9.10.0_1.0-1_amd64.deb
     sudo cp /var/cudnn-local-repo-ubuntu2404-9.10.0/cudnn-*-keyring.gpg /usr/share/keyrings/
     sudo apt-get update
     sudo apt-get -y install cudnn
     sudo apt-get -y install cudnn-cuda-12
     ```
     Once the system is set up, you can verify the CUDA installation with:
     ```bash
     nvcc --version
     ```
     Finally, install the Python dependencies for the GPU setup:

     ```bash
     # Install GPU-specific dependencies (local package included)
     uv sync --all-packages --extra=gpu
     ```

=== "TPU"

    ```bash
    # Install TPU-specific dependencies
    uv sync --all-packages --extra=tpu
    ```
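
If you script your setup, the choice between the three extras above can be sketched as follows. This is an illustration, not a Marin tool: `pick_extra` and the `ACCELERATOR` override variable are hypothetical, and the presence of `nvidia-smi` is only a rough heuristic for an NVIDIA GPU. TPUs are not auto-detected here, so set `ACCELERATOR=tpu` explicitly.

```bash
#!/usr/bin/env bash
# Sketch: choose which uv extra to install (names here are hypothetical).

# pick_extra: print cpu, gpu, or tpu.
# Honors the ACCELERATOR override; otherwise guesses gpu when nvidia-smi is
# on PATH and falls back to cpu.
pick_extra() {
    case "${ACCELERATOR:-}" in
        cpu|gpu|tpu) echo "$ACCELERATOR"; return 0 ;;
        "") ;;
        *) echo "unknown accelerator: $ACCELERATOR" >&2; return 1 ;;
    esac
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo gpu
    else
        echo cpu
    fi
}

extra="$(pick_extra)"
echo "Would run: uv sync --all-packages --extra=${extra}"

# After syncing, with the virtual environment activated, you can ask JAX
# which devices it sees:
#   python -c "import jax; print(jax.devices())"
```

The sketch only prints the command instead of running it, so you can confirm the guess before a long install.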

Rust Crates (dupekit)

Marin includes Rust crates (e.g., dupekit) that are installed as pre-built wheels by default, so no Rust toolchain is needed. uv sync fetches the wheels from GitHub Releases automatically.

To switch to source builds (requires Cargo), use the Makefile targets:

# Check current mode and Cargo availability
make rust-status

# Switch to dev mode: modifies pyproject.toml to build from source (requires Cargo)
make rust-dev

# Switch back to user mode: reverts pyproject.toml to pre-built wheels (no Cargo needed)
make rust-user

!!! warning

    `make rust-dev` modifies `pyproject.toml` to add a local path source for dupekit. Do not commit `pyproject.toml` while in dev mode, because CI will reject it. Run `make rust-user` before committing.
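
One way to catch this before committing is to check whether `pyproject.toml` differs from the last commit. The sketch below is an assumption-laden illustration rather than a Marin tool: `check_before_commit` is a hypothetical helper, and it flags any change to the file, not dev mode specifically. Use `make rust-status` to confirm the actual mode.

```bash
#!/usr/bin/env bash
# Sketch: warn when pyproject.toml has staged or unstaged changes relative to
# HEAD (hypothetical helper; it cannot tell dev mode from other edits).

check_before_commit() {
    if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "not inside a git work tree" >&2
        return 2
    fi
    # --quiet makes git diff exit non-zero when the file differs from HEAD.
    if git diff --quiet HEAD -- pyproject.toml; then
        echo "pyproject.toml matches HEAD"
    else
        echo "pyproject.toml differs from HEAD; if you ran 'make rust-dev', run 'make rust-user' before committing"
        return 1
    fi
}

check_before_commit || true
```

Run it from the repository root, since the path is resolved relative to the current directory.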

Trying it Out

To check that your installation worked, you can go to the First Experiment tutorial, where you train a tiny language model on TinyStories on your CPU. For a sneak preview, simply run:

wandb offline  # Run WandB in offline mode (no syncing to the cloud)
uv run experiments/tutorials/train_tiny_model_cpu.py

This will:

- Download and tokenize the TinyStories dataset to ${MARIN_PREFIX}/
- Train a tiny language model
- Save the model checkpoint to ${MARIN_PREFIX}/
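
Because every output of the run lands under `${MARIN_PREFIX}/`, it can help to confirm the variable is set before launching and to look at what was written afterwards. The sketch below assumes a local directory prefix. `require_prefix` is a helper made up for this example, and the exact subdirectory layout is not shown because it depends on the experiment.

```bash
#!/usr/bin/env bash
# Sketch: guard against an unset MARIN_PREFIX and inspect a local prefix.

# require_prefix: fail with a message when MARIN_PREFIX is unset or empty
# (hypothetical helper).
require_prefix() {
    if [ -z "${MARIN_PREFIX:-}" ]; then
        echo "MARIN_PREFIX is not set; export it first" >&2
        return 1
    fi
    echo "Artifacts will be written under: ${MARIN_PREFIX}/"
}

if require_prefix; then
    # Only meaningful for a local directory; a remote prefix such as a GCS
    # bucket needs its own listing tool.
    if [ -d "$MARIN_PREFIX" ]; then
        ls "$MARIN_PREFIX"
    fi
fi
```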

Next Steps

Now that you have Marin set up and running, you can either continue with the next hands-on tutorial or read more about how Marin is designed for building language models.

- Follow our First Experiment tutorial to run a training experiment.
- Read our Language Modeling Pipeline to understand Marin's approach to language models.
- Submit a speedrun to the Marin speedrun leaderboard.
