IMAC Dataset

Image-Music Affective Correspondence (IMAC) Dataset

To facilitate the study of crossmodal emotion analysis, we constructed a large-scale database, which we call the Image-Music Affective Correspondence (IMAC) database. It consists of more than 85,000 images and 3,812 songs (approximately 270 hours of audio). Each data sample is labeled with one of three emotions: positive, neutral, or negative. The IMAC database is constructed by combining an existing image emotion database (You et al., 2016) with a new music emotion database curated by us (Verma et al., 2019).

Note: The images and audio samples are available as URLs. Some URLs will inevitably break or become inaccessible over time.

Music Emotion Dataset

We leveraged the Million Song Dataset to curate our Music Emotion Dataset. A community-contributed complementary dataset, called the Last.fm Dataset, contains song-level tags. It provides a list of the unique tags in the dataset (along with their frequency of occurrence), here.

Finding relevant tags: From the global list of unique tags that Last.fm provides, we looked for tags that are related to emotion. We shortlisted the tags that contain any of the substrings 'happy', 'joyous', 'energetic', 'soothing', 'relax', 'calm', 'sad', or 'pain'. We then cleaned these tags manually to remove irrelevant ones. For example, a query for the substring 'pain' can return 'spain', and typographical errors also produce false matches ('exclaim' misspelled as 'excalm' is retrieved for the substring 'calm'). The lists of clean tags are provided here: happy, joyous, energetic, soothing, relax, calm, sad, pain.
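The substring search described above, and the false matches that make manual cleaning necessary, can be illustrated with a minimal sketch. The function name `shortlist_tags` and the sample tag list are hypothetical and are not taken from the released files. The case-insensitive comparison is also an assumption.

```python
# Substrings named in the text above.
EMOTION_SUBSTRINGS = ["happy", "joyous", "energetic", "soothing",
                      "relax", "calm", "sad", "pain"]

def shortlist_tags(tags, substrings=EMOTION_SUBSTRINGS):
    """Map each substring to the tags that contain it.

    Matching is case-insensitive (an assumption made for this sketch).
    """
    matches = {s: [] for s in substrings}
    for tag in tags:
        lowered = tag.lower()
        for s in substrings:
            if s in lowered:
                matches[s].append(tag)
    return matches

# Hypothetical tags, chosen to reproduce the false matches mentioned above.
sample_tags = ["happy songs", "Spain", "painful", "excalm", "calm", "rock"]
candidates = shortlist_tags(sample_tags)

print(candidates["pain"])  # includes the irrelevant tag 'Spain'
print(candidates["calm"])  # includes the misspelled tag 'excalm'
```

The output of such a search is only a candidate list. Entries like 'Spain' and 'excalm' still have to be removed by hand, as described above.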

Assigning Emotion Score to Songs: Once we had a list of emotion-related tags, we identified songs that had these tags associated with them. Each song was then scored by counting its tags of each polarity. For example, a song with 2 positive tags, 1 neutral tag, and no negative tags was assigned posScore = 2, neuScore = 1, and negScore = 0. The song names, artist names, and emotion scores are given in this file: Song Emotion Score File (CSV).
This file covers over 4,100 songs, one per line, in the format: song_id, song_name, artist_name, posScore_neuScore_negScore
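The scoring rule above can be sketched as a simple tag count. Note the assumptions: the grouping of the eight tag families into positive, neutral, and negative shown here is a guess for illustration and is not stated on this page, and the names `POLARITY_OF_FAMILY`, `emotion_scores`, and `tag_to_family` are hypothetical.

```python
from collections import Counter

# ASSUMPTION: this grouping of tag families into polarities is illustrative;
# the page does not state which families count as positive/neutral/negative.
POLARITY_OF_FAMILY = {
    "happy": "pos", "joyous": "pos", "energetic": "pos",
    "soothing": "neu", "relax": "neu", "calm": "neu",
    "sad": "neg", "pain": "neg",
}

def emotion_scores(song_tags, tag_to_family):
    """Return (posScore, neuScore, negScore) for one song.

    tag_to_family maps each cleaned tag to the substring family it was
    shortlisted under; tags not in the mapping are ignored.
    """
    counts = Counter()
    for tag in song_tags:
        family = tag_to_family.get(tag)
        if family is not None:
            counts[POLARITY_OF_FAMILY[family]] += 1
    return counts["pos"], counts["neu"], counts["neg"]

# Hypothetical cleaned tags and a hypothetical song.
tag_to_family = {"happy songs": "happy", "energetic": "energetic",
                 "calm": "calm", "sad": "sad"}
song_tags = ["happy songs", "energetic", "calm", "rock"]

scores = emotion_scores(song_tags, tag_to_family)
print(scores)  # two positive tags, one neutral tag, no negative tags
```

The example song reproduces the case given in the text: posScore = 2, neuScore = 1, negScore = 0.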

Retrieving URLs and Downloading Songs: The YouTube API facilitates querying the YouTube website. For each song in the above file, we constructed a query using the song_name and artist_name (song_name + ' ' + artist_name) and obtained the YouTube URLs of the first 10 results retrieved for that search query. This file is available here: Top 10 YouTube URLs File (CSV). This file contains 10 YouTube URLs for each of the ~4,100 songs: song_id, URL1, URL2, ..., URL10. If you append any of these URL entries to 'https://www.youtube.com/watch?v=', you can visit the corresponding entry on YouTube. For example, https://www.youtube.com/watch?v=-UHHc7POovg.
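As a minimal sketch of the two string operations described above, the following builds the search query and expands one row of the URL file into full YouTube addresses. It does not call the YouTube API. The helper names, the song identifier 'song_0001', and the second entry in the sample row are hypothetical. Only the entry '-UHHc7POovg' comes from the text above.

```python
import csv
import io

WATCH_PREFIX = "https://www.youtube.com/watch?v="

def build_query(song_name, artist_name):
    """Build the search query as described: song_name + ' ' + artist_name."""
    return song_name + " " + artist_name

def expand_row(row):
    """Turn one CSV row (song_id, URL1, ..., URL10) into full watch URLs."""
    song_id = row[0].strip()
    urls = [WATCH_PREFIX + field.strip() for field in row[1:] if field.strip()]
    return song_id, urls

# Hypothetical one-row file standing in for the Top 10 YouTube URLs File.
sample_csv = io.StringIO("song_0001, -UHHc7POovg, xxxxxxxxxxx\n")
for row in csv.reader(sample_csv):
    song_id, urls = expand_row(row)
    print(song_id, urls[0])
```

Stripping whitespace from each field is a precaution, since the documented format shows a space after each comma.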
We then downloaded the audio from the YouTube URL for every song in the dataset. For each song, we used only the first URL (from the list of top 10 URLs) whose audio was less than 10 minutes long, and discarded all others. A sample script to download the audio, given a YouTube URL, is available here.
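The selection rule above (take the first of the top 10 results whose audio is shorter than 10 minutes) can be sketched as follows. This is not the authors' download script. The function `pick_first_short_video` is hypothetical, and the duration lookup is passed in as a callable so the sketch runs without network access. The durations shown are made up.

```python
MAX_DURATION_SECONDS = 10 * 60  # the 10-minute limit stated above

def pick_first_short_video(video_ids, get_duration_seconds,
                           max_seconds=MAX_DURATION_SECONDS):
    """Return the first entry whose duration is under max_seconds.

    get_duration_seconds is any callable returning a duration in seconds,
    or None when the video is unavailable. Returns None if nothing qualifies.
    """
    for video_id in video_ids:
        duration = get_duration_seconds(video_id)
        if duration is not None and duration < max_seconds:
            return video_id
    return None

# Hypothetical identifiers and durations standing in for a real lookup.
fake_durations = {"id_a": 900, "id_b": 215, "id_c": 180}
chosen = pick_first_short_video(["id_a", "id_missing", "id_b", "id_c"],
                                fake_durations.get)
print(chosen)  # id_a is too long and id_missing is unavailable
```

Treating an unavailable video as a non-match is a design choice for this sketch. It lets the search fall through to the next result when a URL has broken.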

Samples: Click on any of the songs below to listen to them on YouTube. Our dataset assigns negative, positive and neutral labels to these songs, respectively.
• Still in Love (Nocturama), 2003 (labeled as negative)
• Keep the Customer Satisfied (Bridge Over Troubled Water), 1970 (labeled as positive)
• Fast Car by Tracy Chapman, 1988 (labeled as neutral)

Quick Downloads (Song Emotion Dataset)

• Cleaned emotion-related tags (.txt files): happy, joyous, energetic, soothing, relax, calm, sad, pain
• Song (and artist) names along with assigned emotion scores are available in this CSV file: Song Emotion Score File
• Top 10 YouTube URLs that are retrieved for the query 'song_name + artist_name' are in this CSV file: Top 10 YouTube URLs File
• All these files are available in a single ZIP file here: Download ZIP
• A sample script to download audio from YouTube is available here: View Script

Contact Us

For comments or questions, you may write to the authors: gaverma@adobe.com; eeshan.gunesh.dhekane@umontreal.ca; tanaya.guha@warwick.ac.uk

Paper and Citations

The IMAC dataset was proposed in the following paper:

Learning Affective Correspondence between Music and Image
Gaurav Verma, Eeshan Gunesh Dhekane, Tanaya Guha | [pdf]
In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

• If you use the Music Emotion Dataset, please cite the aforementioned paper: [BibTeX - IMAC]
• If you use the Image Emotion Dataset, please cite the paper that originally proposed that dataset: [IED - Webpage]
• If you use the IMAC dataset (which comprises both the Music and Image Emotion Datasets), please cite the aforementioned paper along with the paper that proposed the Image Emotion Dataset.

Copyright

License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

The paragraph below is a human-readable summary of (and not a substitute for) the actual license.

You are free to share and adapt the material provided here, under the following terms:
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
Non-commercial: You may not use the material for commercial purposes.
No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.