What Is This?

Two Python scripts that auto-caption your images and videos for LoRA training. Point them at a folder, set your anchor word, and they generate .txt caption files that musubi-tuner reads automatically.

You pick one:

| | Gemini (caption_gemini.py) | Replicate (caption_replicate.py) |
| --- | --- | --- |
| Cost | Free (Google API key) | Paid (Replicate credits) |
| Speed | ~10s/file + rate limit waits | ~2-5s/file, minimal waits |
| Rate limits | Aggressive on free tier (frequent 429s) | Generous |
| Best for | Small datasets (under 50 files) | Large datasets, time-sensitive work |
| Dependencies | google-generativeai, Pillow | requests |

Recommendation: Use Replicate for bulk captioning. Use Gemini for small batches or when you don't want to spend credits.
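
Whichever script you choose, the result is a set of .txt caption files. The sketch below illustrates one plausible convention: each caption sits next to its media file and shares its base name. This is an assumption rather than the scripts' confirmed behavior. The helper names `caption_path` and `write_caption`, the anchor word `ohwx`, and the choice to put the anchor word at the start of the caption are all hypothetical.

```python
from pathlib import Path


def caption_path(media_file: Path) -> Path:
    """Return the sidecar caption path: same folder and stem, .txt extension."""
    return media_file.with_suffix(".txt")


def write_caption(media_file: Path, caption: str, anchor_word: str) -> Path:
    """Write a caption file next to the media file and return its path.

    Assumption: the anchor word is placed at the start of the caption text.
    """
    target = caption_path(media_file)
    target.write_text(f"{anchor_word}, {caption}\n", encoding="utf-8")
    return target
```

Under this convention, `write_caption(Path("images/photo_001.png"), "a woman standing in a park", "ohwx")` would create `images/photo_001.txt`. Note that two files differing only by extension (for example photo_001.png and photo_001.jpg) would map to the same caption file, so unique base names are safer.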

Prerequisites

Python 3.10+ — Open PowerShell and run python --version to check.
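
If you would rather check the version from inside Python (for example at the top of your own wrapper script), a minimal sketch follows. The function name `python_is_supported` is hypothetical and is not part of either captioning script.

```python
import sys

# Minimum version stated in the prerequisites.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


# Report the result for the interpreter running this snippet.
print("Python version OK" if python_is_supported() else "Python 3.10+ required")
```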

Gemini path:

pip install google-generativeai Pillow

Get a free API key at aistudio.google.com/apikey

Replicate path:

pip install requests

Get your token at replicate.com/account/api-tokens

Step 1: Organize Your Dataset

Both scripts look for your files in two folders: images/ and videos/. You can use either one or both.

datasets/
  YourCharacter/
    images/
      photo_001.png
      photo_002.jpg
      photo_003.webp
    videos/
      clip_001.mp4
      clip_002.mp4
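
To sanity-check a layout like the one above before captioning, you could list what sits in each folder. This is a rough sketch, not the scripts' own discovery logic. The function name `find_media` is hypothetical, and the extension sets are taken only from the example tree, so the scripts may accept more formats than these.

```python
from pathlib import Path

# Assumption: extensions copied from the example tree; adjust to the real supported list.
IMAGE_EXTS = {".png", ".jpg", ".webp"}
VIDEO_EXTS = {".mp4"}


def find_media(dataset_dir: Path) -> dict[str, list[Path]]:
    """List media files under images/ and videos/.

    A missing folder yields an empty list, since either folder is optional.
    """
    found: dict[str, list[Path]] = {}
    for sub, exts in (("images", IMAGE_EXTS), ("videos", VIDEO_EXTS)):
        folder = dataset_dir / sub
        if folder.is_dir():
            found[sub] = sorted(
                p for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in exts
            )
        else:
            found[sub] = []
    return found
```

For example, `find_media(Path("datasets/YourCharacter"))` returns a dictionary with an "images" list and a "videos" list, which makes it easy to spot an empty or misnamed folder.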

Supported formats:

::: callout {icon="⚠️" color="yellow_bg"}