Further reading:
High quality voice conversion using prosodic and high-resolution spectral features https://arxiv.org/pdf/1512.01809.pdf
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation https://arxiv.org/abs/2211.06687
Contrastive Pre-training of Visual-Language Models https://towardsdatascience.com/contrastive-pre-training-of-visual-language-models-848dd94c881b