Crank: An Open-Source Software For Nonparallel Voice Conversion Based On Vector-Quantized Variational Autoencoder

Summary

A library for fast-prototyping VQVAE-based voice conversion systems

The problems

There does not seem to be a standardised baseline or evaluation metric for non-parallel voice conversion.
The design choices span a wide array of configurations, which makes a comprehensive benchmark difficult.

The solution

A public repository that implements the backbone framework of VQVAE-based voice conversion methods.
Includes a variety of training strategies, e.g., cycle-consistency and adversarial losses, for ablation studies.
Provides standard evaluation metrics and vocoders.
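
For readers unfamiliar with the backbone, the quantisation step at the heart of a VQVAE can be sketched roughly as below. This is a generic, dependency-free illustration rather than crank's actual code: the function names `squared_distance` and `quantize`, the toy codebook, and the toy frames are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Replace each encoder output frame with its nearest codebook vector.

    Returns the chosen code indices and the quantised frames.
    (Hypothetical helper for illustration; not part of any library.)
    """
    indices = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    quantized = [list(codebook[i]) for i in indices]
    return indices, quantized


# Toy data: three 2-dimensional codes and three encoder output frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.9, 1.2], [0.1, -0.2], [0.8, 0.7]]

indices, quantized = quantize(frames, codebook)
print(indices)  # [1, 0, 1]
```

In a real system the decoder would receive the quantised frames together with a target speaker representation, and training would need a gradient estimator (typically straight-through) because the nearest-neighbour lookup is not differentiable. Neither is shown here.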

Thoughts

Low codebook utilisation in VQVAEs is a known issue, and several training techniques have been proposed to address it, such as normalisation and re-initialisation, which do not seem to be mentioned in the work. It would be interesting to see whether the implemented training strategies alleviate the problem.
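
One way to check this would be to track how many codes are actually selected, along with the perplexity of the code distribution. The sketch below is only an illustration under stated assumptions: `codebook_usage` and `reinitialize_dead_codes` are hypothetical helpers, and the re-initialisation shown (overwriting unused codes with randomly chosen encoder outputs) is one common variant, not something taken from the paper or the repository.

```python
import math
import random
from collections import Counter


def codebook_usage(indices, codebook_size):
    """Return (fraction of codes used, perplexity of the code distribution)."""
    counts = Counter(indices)
    total = len(indices)
    used_fraction = len(counts) / codebook_size
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return used_fraction, math.exp(entropy)


def reinitialize_dead_codes(codebook, indices, frames, rng):
    """Overwrite codes that were never selected with random encoder outputs.

    Assumption: this is one simple re-initialisation variant among several.
    """
    used = set(indices)
    new_codebook = [list(code) for code in codebook]
    for k in range(len(codebook)):
        if k not in used:
            new_codebook[k] = list(rng.choice(frames))
    return new_codebook


# Toy example: four codes, but only codes 0 and 1 are ever selected.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [5.0, 5.0]]
frames = [[0.9, 1.2], [0.1, -0.2], [0.8, 0.7], [1.1, 0.9]]
indices = [1, 0, 1, 1]

used_fraction, perplexity = codebook_usage(indices, len(codebook))
new_codebook = reinitialize_dead_codes(codebook, indices, frames, random.Random(0))
print(used_fraction)  # 0.5
```

A perplexity close to the codebook size would indicate that codes are used roughly uniformly, whereas a value near 1 would indicate collapse onto a few codes. Comparing these numbers across the training strategies in the repository would give a direct answer to the question above.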