Cloud Storage
Where Does Transformer Lab Store Files?
Transformer Lab runs a central "coordinator" node that dispatches workloads to "worker" nodes. All of these nodes (workers and the coordinator) need a common view of a shared storage directory. This directory can live in the cloud (usually recommended), or on shared storage that is mounted on every node at a common path (e.g. using NFS).
If you use our s3 or gcs storage option, Transformer Lab mounts the bucket automatically; you don't have to mount any drives yourself. If you use our localfs storage engine instead, you point Transformer Lab at a directory that appears to be a local path but is mounted at the operating-system level on shared NFS or another storage backend.
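As a rough sketch, the choice between these modes comes down to a few `.env` settings (the variables are documented in the sections below; the path shown here is a placeholder):

```shell
# Cloud-backed workspace (S3/GCS/Azure): Transformer Lab mounts the bucket itself.
TFL_REMOTE_STORAGE_ENABLED=true

# Alternatively, a shared local/NFS workspace — the same path must exist on every node:
# TFL_STORAGE_PROVIDER=localfs
# TFL_STORAGE_URI=/mnt/transformerlab-workspace
```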
AWS S3 Storage
To use AWS S3 as remote storage:
- Set `TFL_REMOTE_STORAGE_ENABLED=true` in your `.env` file.
- Configure AWS credentials for the `transformerlab-s3` profile.

Using AWS CLI (Recommended)
If you have the AWS CLI installed, run:
```shell
aws configure --profile transformerlab-s3
```

Enter your AWS Access Key ID, Secret Access Key, default region, and output format when prompted.
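Once the profile exists, a quick way to sanity-check it is with standard AWS CLI commands (these are generic AWS commands, not Transformer Lab-specific):

```shell
# Confirm the profile resolves to valid credentials.
aws sts get-caller-identity --profile transformerlab-s3

# List buckets visible to the profile (requires s3:ListAllMyBuckets).
aws s3 ls --profile transformerlab-s3
```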
Manual Configuration
Create or edit the AWS credentials file at `~/.aws/credentials` and add:

```ini
[transformerlab-s3]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```

Ensure the profile has the necessary permissions to create and manage S3 buckets.
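What "necessary permissions" means depends on your environment; as a hedged sketch, an IAM policy along these lines covers bucket creation and object management (the bucket name is a placeholder, and your security team may prefer something tighter):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:CreateBucket", "s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::your-transformerlab-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::your-transformerlab-bucket/*"
    }
  ]
}
```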
Google Cloud Storage (GCS)
To use Google Cloud Storage instead of AWS S3:
- Set `TFL_REMOTE_STORAGE_ENABLED=true` in your `.env` file.
- Set `REMOTE_WORKSPACE_HOST=gcp` in the same `.env` file.
- Optionally, set `GCP_PROJECT` to specify the Google Cloud project. If not set, it defaults to `transformerlab-workspace`.
- Configure Google Cloud credentials:
Using gcloud CLI (Recommended)
If you have the Google Cloud CLI installed, authenticate and set the project:
```shell
gcloud auth application-default login
gcloud config set project transformerlab-workspace  # or your project name
```

Manual Configuration
Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key JSON file:

```shell
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"
```

You can obtain a service account key from the Google Cloud Console under IAM & Admin > Service Accounts.
Ensure the service account has the necessary permissions for Cloud Storage operations (Storage Admin or equivalent).
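Whichever method you used, you can sanity-check access with standard Google Cloud tooling before starting Transformer Lab (generic gcloud commands, not Transformer Lab-specific; the project name is a placeholder):

```shell
# Confirm Application Default Credentials resolve to a token.
gcloud auth application-default print-access-token > /dev/null && echo "ADC OK"

# List buckets in the project to confirm Cloud Storage access.
gcloud storage ls --project transformerlab-workspace
```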
Azure Blob Storage
To use Azure Blob Storage instead of AWS S3 or GCS:
- Set `TFL_REMOTE_STORAGE_ENABLED=true` in your `.env` file.
- Set `TFL_STORAGE_PROVIDER=azure` in the same `.env` file.
- Configure Azure credentials using one of the following approaches:
Option A: Connection String (Simplest)
Set the `AZURE_STORAGE_CONNECTION_STRING` environment variable in your `.env` file:

```shell
AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=your_account;AccountKey=your_key;EndpointSuffix=core.windows.net"
```

You can find your connection string in the Azure Portal under Storage account → Access keys.
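If you have the Azure CLI installed, you can verify the connection string before starting Transformer Lab (a generic `az` command, not Transformer Lab-specific):

```shell
# List containers in the account; this succeeds only if the connection string is valid.
az storage container list --connection-string "$AZURE_STORAGE_CONNECTION_STRING" --output table
```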
Option B: Account Name + Key
Set the storage account name and access key separately:

```shell
AZURE_STORAGE_ACCOUNT="your_account_name"
AZURE_STORAGE_KEY="your_account_key"
```

Option C: Account Name + SAS Token
If you prefer to use a Shared Access Signature (SAS) token instead of the full account key:

```shell
AZURE_STORAGE_ACCOUNT="your_account_name"
AZURE_STORAGE_SAS_TOKEN="your_sas_token"
```

Ensure the SAS token has sufficient permissions for read, write, list, and delete operations on containers and blobs.
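If you need to mint a SAS token with those permissions, one hedged sketch using the Azure CLI (the account name and expiry date are placeholders; check the generated token against your organization's policies):

```shell
# Account-level SAS for the blob service: read, write, delete, list,
# scoped to service, container, and object resource types.
az storage account generate-sas \
  --account-name your_account_name \
  --services b \
  --resource-types sco \
  --permissions rwdl \
  --expiry 2025-12-31T23:59:00Z
```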
Local Storage
Instead of using a cloud provider like AWS or GCS, you can configure Transformer Lab to store all artifacts and job data locally. How you set this up depends on your architecture:
Single-Node Setup

If your controller and workers run on the same machine, configuration is straightforward: define a local file path, and both components will read and write to that same location.
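The single-node case can be pictured with a toy Python sketch (the function names are illustrative, not Transformer Lab APIs): both sides simply resolve the same workspace path, so anything the controller writes is immediately visible to the worker.

```python
import tempfile
from pathlib import Path

def save_artifact(storage_uri: str, name: str, data: bytes) -> Path:
    """What a 'controller' side might do: write an artifact under the workspace."""
    path = Path(storage_uri) / "artifacts" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path

def load_artifact(storage_uri: str, name: str) -> bytes:
    """What a 'worker' side might do: read it back from the same path."""
    return (Path(storage_uri) / "artifacts" / name).read_bytes()

# On a single node, both sides share one directory.
workspace = tempfile.mkdtemp()  # stand-in for TFL_STORAGE_URI
save_artifact(workspace, "model.bin", b"weights")
assert load_artifact(workspace, "model.bin") == b"weights"
```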
Multi-Node Setup (Shared Filesystem)
If your controller and workers run on separate machines, you must use a shared network filesystem (such as NFS) and mount it at the exact same path on every machine. The system expects `TFL_STORAGE_URI` to be identical across the controller and all workers so they can seamlessly share files.
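As a sketch, assuming an NFS server exporting `/export/tlab` (the server name and paths are placeholders), each machine would mount the export at the same local path:

```shell
# Run on the controller AND on every worker, with an identical mount point.
sudo mkdir -p /mnt/transformerlab-workspace
sudo mount -t nfs nfs-server.internal:/export/tlab /mnt/transformerlab-workspace

# To persist across reboots, add a line like this to /etc/fstab:
# nfs-server.internal:/export/tlab  /mnt/transformerlab-workspace  nfs  defaults  0  0
```

`TFL_STORAGE_URI` would then be `/mnt/transformerlab-workspace` on every node.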
Configuration Steps
To enable a local or shared filesystem, update your .env file with the following changes:
- Set the storage provider: add `TFL_STORAGE_PROVIDER=localfs`.
- Define the storage path: add `TFL_STORAGE_URI=/path/to/your/shared/folder`.
- Disable remote storage: delete the line `TFL_REMOTE_STORAGE_ENABLED=true` (if it is present).
- Configure SkyPilot (if applicable): if you are running tasks with SkyPilot, you must configure `hostPath` volume mounts so your `TFL_STORAGE_URI` is accessible inside the task pods. See SkyPilot Volume Mounts for localfs for detailed instructions.
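Putting the steps together, a localfs `.env` might look like the following (the path is a placeholder; in a multi-node setup it must exist, and be identical, on every machine):

```shell
# .env — local/shared-filesystem storage
TFL_STORAGE_PROVIDER=localfs
TFL_STORAGE_URI=/mnt/transformerlab-workspace
# Note: TFL_REMOTE_STORAGE_ENABLED=true must NOT be present in this mode.
```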