Two Python scripts that auto-caption your images and videos for LoRA training. Point them at a folder, set your anchor word, and they generate .txt caption files that musubi-tuner reads automatically.
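For example, after a run, each media file gets a sibling `.txt` with the same basename (the anchor word `ohwx` and the caption text here are just placeholders to show the shape):

```
images/
  photo_001.png
  photo_001.txt   ← "ohwx, a woman standing in a park at sunset"
```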
You pick one:

| | Gemini (`caption_gemini.py`) | Replicate (`caption_replicate.py`) |
|---|---|---|
| Cost | Free (Google API key) | Paid (Replicate credits) |
| Speed | ~10s/file + rate-limit waits | ~2-5s/file, minimal waits |
| Rate limits | Aggressive on free tier (frequent 429s) | Generous |
| Best for | Small datasets (under 50 files) | Large datasets, time-sensitive work |
| Dependencies | `google-generativeai`, `Pillow` | `requests` |
Recommendation: Use Replicate for bulk captioning. Use Gemini for small batches or when you don't want to spend credits.
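The free-tier 429s mentioned above are normally handled by waiting and retrying with exponential backoff. A minimal sketch of that pattern (function and exception names here are illustrative, not taken from either script):

```python
import time

class RateLimitError(Exception):
    """Stand-in for whatever 429 exception the API client raises."""

def backoff_delays(base=2.0, retries=5, cap=60.0):
    """Yield exponentially growing wait times: 2, 4, 8, ... capped at `cap` seconds."""
    for attempt in range(retries):
        yield min(base * (2 ** attempt), cap)

def call_with_backoff(request_fn):
    """Retry `request_fn` after each rate-limit error, sleeping between attempts."""
    for delay in backoff_delays():
        try:
            return request_fn()
        except RateLimitError:
            time.sleep(delay)
    return request_fn()  # final attempt; let any error propagate
```

The cap keeps a long run from stalling for minutes on a single file when the free tier throttles hard.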
Python 3.10+ — open PowerShell and run `python --version` to check.
Gemini path:

```
pip install google-generativeai Pillow
```

Get a free API key at aistudio.google.com/apikey
Replicate path:

```
pip install requests
```

Get your token at replicate.com/account/api-tokens
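For a feel of what a Replicate call involves, here is a rough stdlib-only sketch of assembling a prediction request against Replicate's HTTP API. The model version ID and the `input` field names are placeholders — they vary by model, and `caption_replicate.py` may structure its calls differently:

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(token, model_version, image_url, prompt):
    """Assemble an authenticated POST for one captioning prediction.

    `model_version` and the keys under "input" are hypothetical — check the
    model's page on replicate.com for its actual input schema.
    """
    body = json.dumps({
        "version": model_version,
        "input": {"image": image_url, "prompt": prompt},
    }).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

Sending it with `urllib.request.urlopen(...)` returns a prediction object you then poll until the caption is ready; the actual script handles that loop for you.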
Both scripts expect your files in two folders — one for images, one for videos. You can use just one or both.
```
datasets/
  YourCharacter/
    images/
      photo_001.png
      photo_002.jpg
      photo_003.webp
    videos/
      clip_001.mp4
      clip_002.mp4
```
Supported formats:

- Images: `.png`, `.jpg`, `.jpeg`, `.webp`, `.bmp`
- Videos: `.mp4`, `.webm`, `.mov`, `.avi`, `.mkv`

::: callout {icon="⚠️" color="yellow_bg"}