Add juicefs mode for storage backend by deep1401 · Pull Request #1964 · transformerlab/transformerlab-app

deep1401 · 2026-05-04T14:40:42Z

JuiceFS Setup: Required Env Vars & Pre-startup Mount

To run with JuiceFS storage, set the following env vars before starting the API server:

Required:

Variable	Description
`TFL_STORAGE_PROVIDER=juicefs`	Enables JuiceFS mode
`TFL_JUICEFS_METADATA_URL`	Metadata engine URL (e.g. `redis://localhost:6379/1`) (NOT REQUIRED IF USING ENTERPRISE HOSTED BY JUICEFS)
`TFL_JUICEFS_VOLUME_NAME`	Name of the JuiceFS volume to mount
`TFL_JUICEFS_STORAGE_BACKEND`	Backing object store: `aws`, `gcp`, or `azure`

Optional:

Variable	Default	Description
`TFL_JUICEFS_MOUNT_POINT`	`/mnt/juicefs`	Where JuiceFS is mounted on the API host
`TFL_JUICEFS_TOKEN`	—	Auth token for hosted JuiceFS (juicefs.com) (REQUIRED IF USING HOSTED)
`TFL_JUICEFS_CONSOLE_URL`	—	Console URL for self-hosted JuiceFS
`TFL_JUICEFS_REMOTE_MOUNT_POINT`	`/tmp/tlab-juicefs`	Mount point inside remote pods (not the API host)

Plus standard credentials for the backing store (AWS_PROFILE, TFL_GCP_SERVICE_ACCOUNT_JSON_PATH, AZURE_STORAGE_CONNECTION_STRING, etc.) depending on which backend you set.

Important: The JuiceFS volume must be mounted at TFL_JUICEFS_MOUNT_POINT before starting the API server. On startup, the server validates that the mount point is active and will exit with an error if it isn't. Run juicefs auth and juicefs mount as part of your host provisioning/boot sequence before
invoking api/run.sh.

netlify · 2026-05-04T14:40:47Z

✅ Deploy Preview for transformerlab canceled.

Name	Link
🔨 Latest commit	`44eef73`
🔍 Latest deploy log	https://app.netlify.com/projects/transformerlab/deploys/69fa538bbe0fec0008d589f8

sentry · 2026-05-04T14:44:17Z

Codecov Report

❌ Patch coverage is 45.80153% with 71 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...merlab/services/compute_provider/launch_juicefs.py	41.66%	35 Missing ⚠️
...ormerlab/services/compute_provider/launch_sweep.py	7.14%	13 Missing ⚠️
api/transformerlab/shared/remote_workspace.py	73.33%	10 Missing and 2 partials ⚠️
...erlab/services/compute_provider/launch_template.py	8.33%	11 Missing ⚠️

📢 Thoughts on this report? Let us know!

…lab/transformerlab-app into add/juicefs-mounting

aliasaria

Review Summary

This PR adds a new juicefs storage provider that mounts a shared filesystem on the API host and on remote pods, scoped per-team via --subdir orgs/<team_id>. The architecture is reasonable — a dedicated launch_juicefs helper module, validated env-var configuration, and a mount-point pre-existence check at startup. The main concerns are a leftover dev-only branch-specific snippet that must come out before merge, an unverified binary download, and some defensive hygiene around path scoping and retry logging.

Must Fix (blocks merge)

[Maintainability] Remove leftover dev/branch hack in `launch_template.py`

Lines 504-507 contain a commented-out git clone … git checkout add/juicefs-mounting block. This was clearly a debugging artifact for installing the SDK from this feature branch. It pins the snippet to the branch name and will be confusing/risky if uncommented later by mistake.

# setup_commands.insert(
#     0,
#     "export CURRENT_DIR=$(pwd); git clone https://github.com/transformerlab/transformerlab-app; cd transformerlab-app/lab-sdk; git checkout add/juicefs-mounting; uv pip install -e .; cd $CURRENT_DIR",
# )

Fix: Delete those four lines. Once the SDK bump to 0.1.32 is published, pip install -q transformerlab (already on the line above) does the right thing.

[Security] JuiceFS binary download has no integrity verification or version pinning

build_juicefs_install_command() does:

"curl -fsSL https://juicefs.com/static/juicefs -o /tmp/juicefs; "
"chmod +x /tmp/juicefs; "
"mv /tmp/juicefs /usr/local/bin/juicefs ..."

No SHA256 verification and no version pin. Every pod fetches whatever lives at that URL today, making your remote workload trust upstream's HTTPS endpoint as an integrity boundary. If juicefs.com is compromised or that path mutates, you push it to every JuiceFS-mounted pod.

Fix: Pin a specific version (e.g. https://github.com/juicedata/juicefs/releases/download/v1.2.3/juicefs-1.2.3-linux-amd64.tar.gz), record a known-good SHA256, and verify with sha256sum -c before chmod +x. At minimum, pin the version in the URL.

Should Fix (important but not blocking)

[Security] Validate `team_id` against path traversal in `_create_juicefs_directory`

org_path = os.path.join(mount_point, "orgs", team_id) followed by os.makedirs(org_path, exist_ok=True). If team_id ever contained .. segments or an absolute path, the directory could land outside the mount. Today team IDs come from the DB so this is theoretical, but it's the exact spot where defense-in-depth costs nothing.

if not re.fullmatch(r"[A-Za-z0-9_-]+", team_id):
    print(f"❌ Invalid team_id for JuiceFS: {team_id}")
    return False

Same applies to the f-string f'orgs/{team_id}' passed into build_juicefs_pod_config's --subdir. shlex.quote blocks shell injection but not path traversal — --subdir will happily resolve ../.

[Correctness] Validate `TFL_JUICEFS_STORAGE_BACKEND` against the known set

_validate_juicefs_config only checks the env var is non-empty. But in build_juicefs_backend_credentials_setup, anything other than "aws" | "gcp" | "azure" silently produces empty setup_commands and env_vars, leading to a confusing juicefs auth failure on the pod. A typo (s3, AWS, awss) reaches launch.

backend = os.getenv("TFL_JUICEFS_STORAGE_BACKEND", "")
if backend not in ("aws", "gcp", "azure"):
    print(f"❌ Invalid TFL_JUICEFS_STORAGE_BACKEND: {backend!r}", file=sys.stderr)
    sys.exit(1)

[Maintainability] The retry loop in `experiment.create_job` masks the underlying issue without a signal

Retrying a UUID4 collision 5 times silently is fine as a JuiceFS-metadata-race workaround, but if it's ever firing for a different reason (e.g. a buggy Job.create raising FileExistsError from the wrong file), nobody will know. Add a logger.warning on retry so the symptom shows up in logs.

except FileExistsError:
    logger.warning("Job.create raised FileExistsError; retrying with new UUID", exc_info=True)
    continue

[Maintainability] Fragile `.replace()` for console-url injection in mount_cmd

In launch_juicefs.py:

if juicefs_console_url:
    auth_cmd = auth_cmd.replace(
        '--token "$TFL_JUICEFS_TOKEN"',
        '--token "$TFL_JUICEFS_TOKEN" --console-url "$TFL_JUICEFS_CONSOLE_URL"',
    )

The string match here happens to occur twice (once per branch of the if/else), so the replace double-injects the flag — that's actually what you want, but it's coincidental and easy to break by editing the auth string elsewhere. Build the snippet directly instead:

console_flag = ' --console-url "$TFL_JUICEFS_CONSOLE_URL"' if juicefs_console_url else ""
auth_cmd = (
    'if [ -n "$ACCESS_KEY" ] && [ -n "$SECRET_KEY" ]; then '
    f'juicefs auth {shlex.quote(volume_name)} --token "$TFL_JUICEFS_TOKEN"{console_flag} '
    '--access-key "$ACCESS_KEY" --secret-key "$SECRET_KEY"; '
    'else '
    f'juicefs auth {shlex.quote(volume_name)} --token "$TFL_JUICEFS_TOKEN"{console_flag}; '
    'fi'
)

Consider Improving (optional / nice-to-have)

The JuiceFS credential-injection block is duplicated near-verbatim in launch_template.py (lines 267-269) and launch_sweep.py (lines 290-296). Pull it into a tiny helper alongside build_juicefs_backend_credentials_setup.
_JUICEFS_MOUNT_POINT in lab/storage.py is captured at import time, while siblings like TFL_REMOTE_STORAGE_ENABLED are read on every call. Reading it dynamically inside set_organization_id would be more consistent and avoid stale-value surprises in tests/long-lived processes.
Consider logging (rather than print) for the _create_juicefs_directory and _validate_juicefs_config failure paths so they integrate with the rest of the API server's structured logs.

What's Working Well

Strong test coverage across both modes (API host and pod) — _validate_juicefs_config is exercised for each missing env var, the mount-command builder covers the non-token, hosted-token, and console-URL variants, and dirs.set_organization_id has tests for both API-host scoping and pod subdir-mount paths.
Correct use of shlex.quote on every interpolated path/volume in the generated shell snippets — this is the right defense for the command-construction layer.
Clear architectural separation: JuiceFS-specific install/mount/auth logic lives in a dedicated launch_juicefs.py rather than being smeared across launch_template.py and launch_sweep.py. The reuse of existing generate_aws_credentials_setup / generate_gcp_credentials_setup / generate_azure_credentials_setup helpers is exactly right.
The "JuiceFS bypasses TFL_REMOTE_STORAGE_ENABLED" decision is well-documented in inline comments at every site that branches on it (api.py, launch_template.py, launch_sweep.py, remote_workspace.py). That's the kind of subtle invariant that would otherwise rot fast.

deep1401 · 2026-05-05T20:04:49Z

@aliasaria

I've left that for whoever tests this PR. I will remove it once we approve the PR.
The binary link is what the juicefs team officially recommends so I preferred this instead of getting it from elsewhere

I'll fix the Correctness and Maintainability points

…lab/transformerlab-app into add/juicefs-mounting

deep1401 · 2026-05-05T20:31:15Z

Pushed the fixes

juicefs mode for storage backend

26cffc0

deep1401 and others added 9 commits May 5, 2026 08:43

Merge branch 'main' into add/juicefs-mounting

e4ce80a

fix ruff

91a1900

sdk version

d614fa3

fix job creation bug

b8c3c95

fix launching on juicefs - auth first and auth order

5e42927

storage fix

f486005

refactor

2f84cfa

ruff

7af4564

Merge branch 'main' into add/juicefs-mounting

9d8a3ed

deep1401 marked this pull request as ready for review May 5, 2026 16:57

deep1401 added 3 commits May 5, 2026 11:04

fix tests

b5e1209

Merge branch 'add/juicefs-mounting' of https://github.com/transformer…

5e4c9cb

…lab/transformerlab-app into add/juicefs-mounting

fix server start create buckets for juicefs

13aa170

aliasaria reviewed May 5, 2026

View reviewed changes

deep1401 and others added 3 commits May 5, 2026 14:05

Merge branch 'main' into add/juicefs-mounting

4c22dcc

fix comments

63ea25b

Merge branch 'add/juicefs-mounting' of https://github.com/transformer…

44eef73

…lab/transformerlab-app into add/juicefs-mounting

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add juicefs mode for storage backend#1964

Add juicefs mode for storage backend#1964
deep1401 wants to merge 16 commits into
mainfrom
add/juicefs-mounting

deep1401 commented May 4, 2026 •

edited

Loading

Uh oh!

netlify Bot commented May 4, 2026 •

edited

Loading

Uh oh!

sentry Bot commented May 4, 2026 •

edited

Loading

Uh oh!

aliasaria left a comment

Uh oh!

deep1401 commented May 5, 2026

Uh oh!

deep1401 commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

deep1401 commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

JuiceFS Setup: Required Env Vars & Pre-startup Mount

Uh oh!

netlify Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for transformerlab canceled.

Uh oh!

sentry Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

aliasaria left a comment

Choose a reason for hiding this comment

Review Summary

Must Fix (blocks merge)

[Maintainability] Remove leftover dev/branch hack in launch_template.py

[Security] JuiceFS binary download has no integrity verification or version pinning

Should Fix (important but not blocking)

[Security] Validate team_id against path traversal in _create_juicefs_directory

[Correctness] Validate TFL_JUICEFS_STORAGE_BACKEND against the known set

[Maintainability] The retry loop in experiment.create_job masks the underlying issue without a signal

[Maintainability] Fragile .replace() for console-url injection in mount_cmd

Consider Improving (optional / nice-to-have)

What's Working Well

Uh oh!

deep1401 commented May 5, 2026

Uh oh!

deep1401 commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

deep1401 commented May 4, 2026 •

edited

Loading

netlify Bot commented May 4, 2026 •

edited

Loading

sentry Bot commented May 4, 2026 •

edited

Loading

[Maintainability] Remove leftover dev/branch hack in `launch_template.py`

[Security] Validate `team_id` against path traversal in `_create_juicefs_directory`

[Correctness] Validate `TFL_JUICEFS_STORAGE_BACKEND` against the known set

[Maintainability] The retry loop in `experiment.create_job` masks the underlying issue without a signal

[Maintainability] Fragile `.replace()` for console-url injection in mount_cmd