Why This Model?

Wan 2.2 T2V (text-to-video) is the best starting point for seeing what LoRA training can do because results are immediately visible — you type a prompt, you get a video. Unlike I2V (image-to-video), where the reference image provides a structural crutch, T2V forces both the high-noise and low-noise experts to earn their keep, so quality differences between checkpoints are obvious. It's also the most demanding pipeline: if your dataset and captions work here, they'll work everywhere.

Prerequisites

You need three things installed on your PC before starting:

1. Python 3.10+ — Download from python.org if you don't have it. Open PowerShell and run python --version to check.

2. Modal — Install and set up your account:

pip install modal
python -m modal setup

This opens a browser to authenticate. Free tier includes some GPU credits.

3. HuggingFace token — Create a free account at huggingface.co, go to Settings > Access Tokens, create a token with read access. Then store it as a Modal secret:

python -m modal secret create my-huggingface-secret HF_TOKEN=hf_your_token_here

Step 1: Prepare Your Dataset

Create a folder structure like this next to your training script:

datasets/
  └── YourCharacter/
      ├── Images/
      │   ├── image001.png
      │   ├── image001.txt
      │   ├── image002.png
      │   └── image002.txt
      └── Videos/
          ├── video001.mp4
          ├── video001.txt
          ├── video002.mp4
          └── video002.txt

Every image and video MUST have a matching .txt caption file with the same base name. Captions should start with your character's trigger word and describe movements/poses/expressions only — not appearance or clothing.
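A quick local check catches caption mistakes before you upload anything. This is a sketch, not part of the training script; the folder path and trigger word at the bottom are placeholders to replace with your own:

```python
from pathlib import Path

MEDIA_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".mp4"}

def find_caption_problems(dataset_dir, trigger_word):
    """List caption problems in a dataset folder.

    Flags media files with no matching .txt caption, and captions
    that do not start with the character's trigger word.
    """
    problems = []
    for media in sorted(Path(dataset_dir).rglob("*")):
        if media.suffix.lower() not in MEDIA_EXTS:
            continue
        caption = media.with_suffix(".txt")
        if not caption.exists():
            problems.append(f"missing caption: {caption.name}")
        elif not caption.read_text(encoding="utf-8").lstrip().startswith(trigger_word):
            problems.append(f"caption does not start with trigger word: {caption.name}")
    return problems

# Placeholder path and trigger word -- substitute your own.
for problem in find_caption_problems("datasets/YourCharacter", "yourtrigger"):
    print(problem)
```

Run it from the folder containing datasets/ and fix anything it prints before moving on.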

Step 2: Update the Dataset Config

Open wan21-dataset-config.toml and update the dataset paths to point at your character folder. The paths must match exactly what's inside your datasets/ folder, including capitalization.
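The key names below are illustrative only — keep whatever keys your wan21-dataset-config.toml already uses and edit just the path values. Assuming a typical dataset-config layout, the parts you are changing look something like:

```toml
# Illustrative sketch -- your config's key names may differ.
# Edit only the path values so they match your datasets/ folder exactly.
[[directory]]
path = "datasets/YourCharacter/Images"

[[directory]]
path = "datasets/YourCharacter/Videos"
```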

Step 3: Upload Your Dataset

In PowerShell, navigate to the folder containing your training script and dataset:

cd C:\path\to\your\training\folder
python -m modal run train_wan22_t2v.py::add_to_volume_datasets

This uploads your dataset to Modal's cloud storage. You only need to re-run this when your dataset changes.