
Summary
- A library for rapid prototyping of VQVAE-based voice conversion systems
The problems
- There does not seem to be a standardised baseline or evaluation metric for non-parallel voice conversion.
- The design space covers a wide array of configurations, which makes comprehensive benchmarking difficult.
The solution
- A public repository that implements the backbone framework of VQVAE-based voice conversion methods.
- The repository includes a variety of training strategies, e.g., cycle-consistency and adversarial losses, for ablation studies.
- It also provides standard evaluation metrics and vocoders.
Thoughts
Low codebook utilisation is a known issue for VQVAEs. Several training techniques have been proposed to address it, such as normalisation and re-initialisation, but these do not seem to be mentioned in the work. It would be interesting to see whether the implemented training strategies alleviate the problem.
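
To make the utilisation question concrete, here is a minimal sketch of how it could be checked. The helper names (codebook_usage, reinit_dead_codes) are hypothetical and are not taken from the repository. It assumes code indices are collected over a batch, and it uses one common form of re-initialisation: replacing codes that received no assignments with randomly sampled encoder outputs. Normalisation is not covered.

```python
import math
import random
from collections import Counter


def codebook_usage(indices, codebook_size):
    """Return (fraction of codes used, perplexity) for a list of code indices."""
    counts = Counter(indices)
    total = len(indices)
    # Entropy of the empirical code distribution, in nats.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Perplexity is the effective number of codes in use.
    return len(counts) / codebook_size, math.exp(entropy)


def reinit_dead_codes(codebook, encoder_outputs, indices, rng):
    """Replace codes with no assignments by randomly chosen encoder outputs.

    Returns a new codebook and the list of indices that were re-initialised.
    """
    used = set(indices)
    dead = [k for k in range(len(codebook)) if k not in used]
    new_codebook = [list(vec) for vec in codebook]
    for k in dead:
        new_codebook[k] = list(rng.choice(encoder_outputs))
    return new_codebook, dead


if __name__ == "__main__":
    rng = random.Random(0)
    codebook = [[0.0, 0.0] for _ in range(4)]          # 4 codes, dimension 2
    encoder_outputs = [[1.0, 1.0], [2.0, 2.0]]         # toy encoder outputs
    indices = [0, 0, 1, 1]                             # codes 2 and 3 are unused
    print(codebook_usage(indices, len(codebook)))
    print(reinit_dead_codes(codebook, encoder_outputs, indices, rng)[1])
```

Logging the used fraction and perplexity for each training strategy (for example, with and without the cycle-consistency or adversarial loss) would show whether any of them improves utilisation without explicit re-initialisation.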