Donate to the Palestine's children, safe the people of Gaza.  >>>Donate Link...... Your contribution will help to save the life of Gaza people, who trapped in war conflict & urgently needed food, water, health care and more.

Advancing Text-to-Music Generation with Question Answering and Captioning

Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. Our model utilizes audio representations from a pretrained MERT model to extract music features. However, obtaining a suitable dataset for training the MU-LLaMA model remains challenging, as existing publicly accessible audio question answering datasets lack the necessary depth for open-ended music question answering. To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions. experiments demonstrate that the proposed MU-LLaMA model, trained on our designed MusicQA dataset, achieves outstanding performance in both music question answering and music caption generation across various metrics, outperforming current state-of-the-art (SOTA) models in both fields and offering a promising advancement in the T2M-Gen research field.

To Get Daily Health Newsletter

We don’t spam! Read our privacy policy for more info.

Download Mobile Apps
Follow us on Social Media
© 2012 - 2025; All rights reserved by authors. Powered by Mediarx International LTD, a subsidiary company of Rx Foundation.
RxHarun
Logo