Configuration
Configuration is always done from the project's `wisp-config.yml`. Whenever a job is run from the configuration, its parameters are reflected in the web console. If the configuration is located inside a .git repository, Wisp also keeps track of which commit the job is running on.
You can edit all parameters in the file at any time. Running `wisp run` will automatically detect changes since the last run and update resources accordingly. You can see logs for your jobs with `wisp logs`, or in the web console.
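For reference, and assuming only the commands already mentioned in this page (`wisp init`, `wisp run` and `wisp logs`), a typical workflow looks roughly like this:

# Initialize the project; this creates .wisp.lock and assigns a project_id.
wisp init
# Edit wisp-config.yml, then start (or re-run) the job. Changes since the last run
# are picked up automatically and resources are updated accordingly.
wisp run
# Follow the job output from the terminal (also available in the web console).
wisp logs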
All available fields for the YAML file are listed below:
# General project specification.
project:
  # Project type defines what code should be transferred to the instance, and how.
  # There are three supported project types:
  #   * local: uploads the root directory, excluding files listed in .gitignore.
  #   * git: downloads a specified git repository to the runner (you may need to set
  #     up access).
  #   * docker: downloads a specified Docker image.
  type: local
  # Project ID is automatically assigned when running `wisp init`. Do not edit or
  # remove this line.
  project_id: random-uid
  # Project name is the display name that can be used to identify the project. The
  # actual identifier for the project is created in the .wisp.lock file when
  # initializing Wisp, so this is just for display purposes.
  name: My Project
  # You can set up environment variables accessible to the pipeline, with values
  # stored in the console as secrets. This is useful if you are using HuggingFace,
  # Weights & Biases etc.
  env:
    HF_TOKEN: "{{secrets.HF_TOKEN}}"
# The setup script runs every time a new cluster is started. Set up your
# environment, download large persistent files and more here.
setup:
  # The script step will execute in a bash shell as the root user. You can see which
  # tools are automatically installed on Wisp clusters in the documentation.
  script: |
    pip install -r requirements.txt
# Run script will execute every time you run `wisp run`. Outputs from the script
# will be saved as a log.
run:
  script: |
    python train.py --params params.yml
# Teardown is run when the cluster is deleted.
teardown:
  # Again, you can run any script you'd like.
  script: |
    echo "Teardown"
# Resources specifies the minimum requirements for your workload. Wisp has many
# different constraint variables, and none of them are required. Wisp will select
# the cheapest viable options if no requirements are specified.
resources:
  # Define any of the clouds that Wisp supports. Can be a list, a single string or
  # None.
  clouds: [aws, azure, gcp, lambda]
  # Allowed regions for your workload. Can be a list, a single string or None.
  regions: [eu, us]
  # Accelerators - GPUs, TPUs etc.
  accelerators:
    # Exact name of the GPU - if you know precisely what you need.
    # Can be a list, a single string or None.
    name: [H100, A100]
    # Compute capability is the supported CUDA version and specifies a subset
    # of GPUs with a given instruction set. https://en.wikipedia.org/wiki/CUDA
    # Some models specify this as a requirement.
    compute_capability: 7.0+
    # Set the VRAM requirement for the GPU.
    vram: 16
    # Number of accelerators. Wisp will try to find a single node with multiple GPUs
    # attached; if none is found, Wisp will spin up a multi-node cluster.
    n_accelerators: 1
  # Minimum memory for the cluster. If None, Wisp will find a sensible number based
  # on vCPUs, accelerators etc.
  memory: 16
  # Number of vCPUs.
  vcpus: 8
  # Internal storage of the machine in GB. If you need a lot of storage, it may make
  # sense to use Wisp's data bucket integration instead.
  storage: 1000
  # By default, Wisp uses the internal storage of the machine. You can enable a
  # persistent disk using the option below.
  persistent_disk: 1000
# The io section handles anything related to data going in and out of the instance.
# There are many strategies, and Wisp supports the most common methods for
# transferring and retaining data.
io:
  # Inputs will be set up as part of the instance setup - i.e. only once.
  inputs:
    # Buckets can be mounted directly to the instance. Note that you need to grant
    # Wisp access to the bucket through the web console.
    bucket:
      # Wisp supports AWS S3, Google Cloud Storage (GCS) and Microsoft Blob
      # Storage. The left-hand side of the colon (:) is the source, and the
      # right-hand side is the mount location (relative to the user directory).
      # Wisp will test this connection before launching the cluster.
      name: "s3://my-bucket:training_data"
      # There are two strategies for mounting buckets - Stream or Copy. Streaming
      # copies the data ad hoc, which is good for large datasets.
      strategy:
  # Outputs take the same parameters as inputs. Data will be copied to the
  # destination after the run script has finished, and before the teardown step.
  outputs:
    # Folder will copy the data from the instance to a folder you define on the
    # local client. The Wisp CLI needs write permission to the folder, and you need
    # to have enough space on your machine for the output data. If this step fails,
    # e.g. because you lost the connection, the job execution will be paused and you
    # will be notified via email.
    folder:
      "~/checkpoints:~/checkpoints"