Many Marin and Levanter workflows expect a durable object store for checkpoints, dataset shards, logs, and executor outputs.
This tutorial walks through setting up a Google Cloud Storage (GCS) bucket that you can reference via `MARIN_PREFIX` or `trainer.checkpointer.base_path`.
- Running local GPU or TPU experiments that write checkpoints to `gs://...` paths.
- Launching TPU jobs with `scripts/ray/cluster.py` or Ray clusters, where every worker streams artifacts to a shared prefix.
- Hosting tokenized datasets or compilation caches that multiple jobs should reuse.
If you only run experiments locally with `local_store/`, you can skip this, but migrating to GCS early prevents churn later.
Pick a region that matches your compute (e.g., `us-central2` for v4/v5e TPUs or `us-west4` for west-coast GPUs). Using the same region keeps egress costs low and improves throughput. Bucket names are global, so choose something descriptive like `gs://marin-<team>-us-central2`.
For the storage class, decide between:
- Standard: Lowest latency and predictable performance; slightly higher cost but ideal if training jobs read/write checkpoints frequently.
- Autoclass: Google automatically moves objects to colder tiers if they sit idle, which can cut storage costs but occasionally delays reads when objects are thawed. Use this if you mostly archive checkpoints and don't mind rare rehydration pauses.
Marin will attempt to prevent cross-region egress by raising an error in training jobs that write to a different region than the compute, but it's best to avoid that situation entirely.
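The guard amounts to a location comparison before the job starts writing. The sketch below illustrates the idea only; it is not Marin's actual implementation, and the function name `same_region` is made up for this example:

```python
# Illustrative cross-region guard, NOT Marin's actual code.
def same_region(bucket_location: str, compute_region: str) -> bool:
    # GCS reports bucket locations in upper case (e.g., "US-CENTRAL2"),
    # so normalize case before comparing.
    return bucket_location.lower() == compute_region.lower()

print(same_region("US-CENTRAL2", "us-central2"))  # True
print(same_region("US-WEST4", "us-central2"))     # False
```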
!!! warning
    Avoid multi-region buckets (e.g., `us` or `eu`) because they incur higher costs and have more complex performance characteristics. Single-region buckets are cheaper and more predictable for Marin workloads.
```bash
PROJECT_ID=your-gcp-project
BUCKET=gs://marin-yourteam-us-central2
REGION=us-central2

# Create the bucket with uniform access and no public exposure.
gcloud storage buckets create "$BUCKET" \
  --project "$PROJECT_ID" \
  --location "$REGION" \
  --uniform-bucket-level-access \
  --default-storage-class=STANDARD  # add --enable-autoclass for automated tiering when you can tolerate slower cold reads

# Grant yourself (or a service account) Storage Object Admin if needed.
gcloud storage buckets add-iam-policy-binding "$BUCKET" \
  --member="user:you@example.com" \
  --role="roles/storage.objectAdmin"
```

Uniform bucket-level access ensures IAM policies apply consistently; keep the bucket private unless you intentionally publish checkpoints.
!!! warning
    Disabling soft delete is critical to avoid runaway storage costs. Marin creates many large, short-lived files that should be deleted immediately. Disabling soft delete also means you cannot recover deleted files, so consider lifecycle rules or replication for backups if needed.

GCS enables soft delete by default on new buckets. That feature retains deleted objects for at least seven days, which quickly inflates storage usage for Marin/Levanter workloads because training jobs constantly create and remove multi-gigabyte checkpoints and compilation caches. Disable soft delete immediately after creating the bucket:
```bash
# Permanently disable soft delete for this bucket.
gcloud storage buckets update "$BUCKET" --clear-soft-delete

# Optional: verify that the policy is cleared.
gcloud storage buckets describe "$BUCKET" \
  --format="value(soft_delete_policy)"
```

Clearing the policy ensures that once a training job deletes temporary files, they disappear immediately, preventing runaway storage bills. You can still enable backups via lifecycle rules or replication if you need recovery.
For intermediate checkpoints and other short-lived data, Marin provides dedicated scratch buckets named `marin-tmp-{region}` (one per region). These buckets have lifecycle rules that automatically delete objects based on a `ttl=Nd/` path prefix; for example, objects stored under `gs://marin-tmp-us-central2/ttl=3d/my-job/` are deleted after 3 days.
Supported TTLs: 1, 2, 3, 4, 5, 6, 7, 14, and 30 days.
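In job code it is easy to build these paths with a helper that rejects unsupported TTLs up front. This is a sketch: `scratch_path` is a hypothetical name for this tutorial, not a Marin API.

```python
# Hypothetical helper for building scratch-bucket paths; not part of Marin.
SUPPORTED_TTL_DAYS = {1, 2, 3, 4, 5, 6, 7, 14, 30}

def scratch_path(region: str, ttl_days: int, job: str) -> str:
    """Build a marin-tmp path under a supported ttl=Nd/ prefix."""
    if ttl_days not in SUPPORTED_TTL_DAYS:
        raise ValueError(f"unsupported TTL: {ttl_days}d")
    return f"gs://marin-tmp-{region}/ttl={ttl_days}d/{job}"

print(scratch_path("us-central2", 3, "my-job"))
# gs://marin-tmp-us-central2/ttl=3d/my-job
```

Failing fast on an unsupported TTL (say, `9`) is preferable to silently writing under a prefix no lifecycle rule ever cleans up.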
To provision or update all scratch buckets (create if missing, disable soft delete, apply lifecycle rules):

```bash
uv run infra/configure_temp_buckets.py

# Preview without applying changes:
uv run infra/configure_temp_buckets.py --dry-run

# Target a single bucket:
uv run infra/configure_temp_buckets.py --bucket marin-tmp-us-central2
```

For non-scratch buckets, you can still set up lifecycle rules manually. For example, delete files under a prefix after seven days:
```json
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 7, "matchesPrefix": ["tmp/"]}
    }
  ]
}
```

Save this as `lifecycle.json` and apply it:

```bash
gcloud storage buckets update "$BUCKET" --lifecycle-file=lifecycle.json
```

Adjust prefixes to match how your experiments organize outputs.
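If you maintain several buckets, generating the policy from code keeps the rules consistent. A minimal sketch that writes the same `lifecycle.json` (the prefix and age are just the example values from above):

```python
import json

# Same policy as the hand-written example: delete objects under tmp/ after 7 days.
lifecycle = {
    "rule": [
        {
            "action": {"type": "Delete"},
            "condition": {"age": 7, "matchesPrefix": ["tmp/"]},
        }
    ]
}

with open("lifecycle.json", "w") as f:
    json.dump(lifecycle, f, indent=2)
```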
Set the bucket as your default prefix whenever you run tutorials:

```bash
export MARIN_PREFIX=$BUCKET
export WANDB_PROJECT=marin
export WANDB_ENTITY=your-entity
```

For Levanter configs, point the checkpointer to the same bucket:

```yaml
trainer:
  checkpointer:
    base_path: "$BUCKET/your-run"
```

Commit these defaults in `.levanter.yaml` or `.envrc` so every launch script uses the same location.
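Python scripts can resolve the same prefix from the environment. A sketch, assuming the `MARIN_PREFIX` convention described above; the `local_store` fallback and the run path layout are illustrative, not a Marin API:

```python
import os

# Fall back to a local directory when MARIN_PREFIX is unset
# (mirrors the local_store/ convention; fallback name is illustrative).
prefix = os.environ.get("MARIN_PREFIX", "local_store")
run_path = f"{prefix}/your-run/checkpoints"
print(run_path)
```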
- Re-run `gcloud storage buckets describe` monthly to confirm soft delete stays disabled.
- Use `gcloud storage ls --buckets --soft-deleted` to ensure no surprise buckets exist in soft-delete state.
- Monitor storage costs in Cloud Monitoring or set up alerts when the bucket exceeds an expected size.
With this setup you have a clean, low-overhead bucket tailor-made for Marin and Levanter experiments without the surprise bills that soft delete can cause.