MULTI-TASK SELF-SUPERVISED PRE-TRAINING FOR MUSIC CLASSIFICATION

Summary

Evaluate PASE, a multi-task self-supervised learning framework originally proposed for speech, on music.
Pre-train the model on the AudioSet music subset, then evaluate on OpenMIC (multi-label instrument recognition), Ballroom (dance genre), and FMA (music genre).
Compare different training recipes: loss-weighting mechanisms, the impact of including different workers/tasks, and data efficiency.
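
A minimal sketch of what a loss-weighting mechanism over workers could look like. These notes do not say which mechanisms were compared, so both schemes below (equal weights and inverse-magnitude weights) are illustrative assumptions, and the worker names and loss values are hypothetical.

```python
def combine_losses(worker_losses, weights=None):
    """Return the weighted sum of per-worker losses.

    With weights=None every worker gets weight 1.0 (plain sum).
    """
    if weights is None:
        weights = {name: 1.0 for name in worker_losses}
    return sum(weights[name] * loss for name, loss in worker_losses.items())


def inverse_magnitude_weights(worker_losses, eps=1e-8):
    """Assumed scheme: weight each worker by 1 / its current loss,
    so that every worker contributes roughly 1.0 to the total."""
    return {name: 1.0 / (loss + eps) for name, loss in worker_losses.items()}


# Hypothetical per-worker losses for one training step.
losses = {"waveform": 0.5, "mfcc": 2.0, "tempogram": 4.0}

equal_total = combine_losses(losses)
balanced_total = combine_losses(losses, inverse_magnitude_weights(losses))
```

With equal weights the worker with the largest loss dominates the total; the inverse-magnitude variant keeps any one worker from dominating, at the cost of ignoring how informative each task is.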

It would be interesting to see how PASE-style representation learners compare with contrastive methods for general-purpose audio representation, as shown in ‣.