This is a collection of notes for the conference, biased toward my research interests and focused on self-supervised learning, disentanglement, and discrete representations for music and speech.

Since I have not read the papers in much detail, there may be (many) errors. Please feel free to correct me!

📥 yin-jyun.luo@qmul.ac.uk

Comparative study

SIMILARITY ANALYSIS OF SELF-SUPERVISED SPEECH REPRESENTATIONS

A Comparison Of Discrete Latent Variable Models For Speech Representation Learning

General-purpose representation

MULTI-TASK SELF-SUPERVISED PRE-TRAINING FOR MUSIC CLASSIFICATION

CONTRASTIVE LEARNING OF GENERAL-PURPOSE AUDIO REPRESENTATIONS

Disentanglement

Contrastive Predictive Coding Supported Factorized Variational Autoencoder For Unsupervised Learning Of Disentangled Speech Representations

Self-Supervised VQ-VAE For One-Shot Music Style Transfer

PITCH-TIMBRE DISENTANGLEMENT OF MUSICAL INSTRUMENT SOUNDS BASED ON VAE-BASED METRIC LEARNING

Algorithm

AN ITERATIVE FRAMEWORK FOR SELF-SUPERVISED DEEP SPEAKER REPRESENTATION LEARNING

CONTRASTIVE SEPARATIVE CODING FOR SELF-SUPERVISED REPRESENTATION LEARNING

Voice conversion

ANY-TO-ONE SEQUENCE-TO-SEQUENCE VOICE CONVERSION USING SELF-SUPERVISED DISCRETE SPEECH REPRESENTATIONS

Crank: An Open-Source Software For Nonparallel Voice Conversion Based On Vector-Quantized Variational Autoencoder