A Last.fm blogger recently took it upon himself to download and analyze song lyrics, providing an informative and even humorous window into the language of popular music.
Andrew Clegg, of the LAST.HQ blog, downloaded the lyrics to 240,000 or so songs and ran them through IBM’s Word-Cloud Generator program, with results broken down by genre. The results for the country genre are shown below. (It’s apparent what is most on the mind of country performers.)
The first class of results was a set of word-cloud images showing the most frequently used words. The software pulls out each word and counts the number of times it is used, then presents the results in a graphical layout. The more often a word is used, the larger it appears in the image.
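The counting-and-scaling idea can be sketched in a few lines of Python. This is only an illustration, not the actual algorithm of the IBM tool: the function names `count_words` and `font_sizes`, the tokenizing pattern, and the linear scaling between a minimum and maximum point size are all assumptions made for the example.

```python
import re
from collections import Counter

def count_words(lyrics):
    """Count how often each lowercase word appears in a lyrics string."""
    # Assumed tokenization: runs of letters and apostrophes.
    return Counter(re.findall(r"[a-z']+", lyrics.lower()))

def font_sizes(counts, min_pt=10, max_pt=72):
    """Map each word's count to a font size: more uses, bigger word."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    return {word: min_pt + (n - lo) * (max_pt - min_pt) / span
            for word, n in counts.items()}

counts = count_words("Love me, love my truck. Love!")
sizes = font_sizes(counts)
```

A real word-cloud program also has to lay the words out without overlap, which this sketch leaves aside.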
Clegg’s methodology included removing “stopwords,” common words that he felt did not carry much meaning on their own (and, for, I, you, the, etc.), though commenters on the blog pointed out the importance of “I,” “you,” or “me” in the context of a song. The filter did not catch everything, however: a fairly popular word in hip-hop was “ich,” the German pronoun for “I.” Even more popular was “la,” the feminine singular definite article in Spanish and other languages, as in “la señorita” (the young lady).
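A stopword filter of this kind might look like the following Python sketch. The stopword set here is a short, hypothetical English-only list, and the sample line is invented; the original post does not publish Clegg's list, so the English-only assumption is just one plausible explanation of how words such as “ich” and “la” could survive the filter.

```python
# Hypothetical English-only stopword list, for illustration only.
ENGLISH_STOPWORDS = {"and", "for", "i", "you", "the", "me", "a", "to"}

def filter_stopwords(words, stopwords=ENGLISH_STOPWORDS):
    """Drop any word found in the stopword set (case-insensitive)."""
    return [w for w in words if w.lower() not in stopwords]

# Invented sample line mixing English, German, and Spanish words.
words = "I love you and ich liebe la musica".split()
kept = filter_stopwords(words)
print(kept)
```

In this example the English “I” and “you” are removed, while “ich” and “la” pass through untouched because they are not on the list.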
The research also involved creating genre maps, graphical representations of the similarities amongst the lyrics of each genre. Soul and blues were close to each other, as were rock and country. Interestingly, rap and hip-hop were about as close to each other as metal was to folk.
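The post does not spell out how closeness between genres was measured. One common way to compare word usage is cosine similarity between word-count vectors, sketched below in Python. The choice of measure, the function name `cosine_similarity`, and the toy counts are assumptions for illustration and are not data or methods from the study.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two word-count mappings (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty genre has nothing in common with anything
    return dot / (norm_a * norm_b)

# Made-up toy counts, not figures from the study.
soul = Counter({"love": 5, "baby": 3, "heart": 2})
blues = Counter({"love": 4, "baby": 2, "trouble": 3})
metal = Counter({"death": 6, "fire": 4, "heart": 1})

print(cosine_similarity(soul, blues))
print(cosine_similarity(soul, metal))
```

With these toy numbers, soul scores much closer to blues than to metal, since they share their most frequent words. Turning such pairwise scores into a two-dimensional map requires a further layout step that is not shown here.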
Clegg’s third variation centered on distinctive words, or words that appeared almost exclusively in one particular genre. These results exposed the lyrics that set each genre apart from the rest.
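One simple way to find such words is to ask what share of a word's total uses falls within a single genre. The Python sketch below takes that approach; the function name `distinctive_words`, the 90 percent threshold, the minimum-use cutoff, and the sample counts are all hypothetical, and Clegg's actual scoring may well have differed.

```python
from collections import Counter

def distinctive_words(genre_counts, genre, threshold=0.9, min_uses=5):
    """Return words whose uses fall almost exclusively in `genre`.

    genre_counts maps a genre name to a Counter of word counts.
    """
    totals = Counter()
    for counts in genre_counts.values():
        totals.update(counts)  # add up each word's uses across genres
    result = []
    for word, n in genre_counts[genre].items():
        # Skip rare words, then keep those concentrated in this genre.
        if totals[word] >= min_uses and n / totals[word] >= threshold:
            result.append(word)
    return sorted(result)

# Invented counts for illustration only.
genre_counts = {
    "country": Counter({"truck": 19, "love": 10}),
    "metal": Counter({"truck": 1, "love": 10, "doom": 8}),
}
print(distinctive_words(genre_counts, "country"))
print(distinctive_words(genre_counts, "metal"))
```

Here “love” is common in both genres and so is distinctive of neither, while “truck” and “doom” each belong almost entirely to one genre.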
The results are available at Clegg’s blog, though be forewarned that some of the word-cloud images contain potentially offensive language.