Olivier Lartillot's insight:
Cynthia is a PhD student. Her research is in the field of music information retrieval: she is working on technologies to analyze, organize and present the considerable amounts of digital music information now available. She is particularly interested in making sense of music data by obtaining richer perspectives on it, taking into account information from multiple data sources. These can be recordings of multiple interpretations of the same music piece, but also related information in non-audio modalities, such as videos of performing musicians and textual information from collaborative web resources describing songs and their usage contexts.
We’ve always felt ambivalent about the word “genre” at The Echo Nest. On one hand, it’s the most universal shorthand for classifying music, because everyone has a basic understanding of the big, old...
Olivier Lartillot's insight:
The Echo Nest “top terms,” which are the words most commonly used to describe a piece of music, are “far more granular than the big, static genres of the past. We’ve been maintaining an internal list of dynamic genre categories for about 800 different kinds of music. We also know what role each artist or song plays in its genre (whether they are a key artist for that genre, one of the most commonly played, or an up-and-comer).”
The Echo Nest just announced “a bunch of new genre-oriented features, including:
- A list of nearly 800 genres from the real world of music
- Names and editorial descriptions of every genre
- Essential artists from any genre
- Similar genres to any genre
- Verified explainer links to third-party resources when available
- Genre search by keyword
- Ranked genres associated with artists
- Three radio “presets” for each genre: Core (the songs most representative of the genre); In Rotation (the songs being played most frequently in any genre today); and Emerging (up-and-coming songs within the genre).
Where did these genres come from?
“The Echo Nest’s music intelligence platform continuously learns about music. Most other static genre solutions classify music into rigid, hierarchical relationships, but our system reads everything written about music on the web, and listens to millions of new songs all the time, to identify their acoustic attributes.
“This enables our genres to react to changes in music as they happen. To create dynamic genres, The Echo Nest identifies salient terms used to describe music (e.g., “math rock,” “IDM”, etc.), just as they start to appear. We then model genres as dynamic music clusters — groupings of artists and songs that share common descriptors, and similar acoustic and cultural attributes. When a new genre forms, we know about it, and music fans who listen to our customers’ apps and services will be able to discover it right away, too.
“Our approach to genres is trend-aware. That means it knows not only what artists and songs fall into a given genre, but also how those songs and artists are trending among actual music fans, within those genres.
“About 260 of these nearly 800 genres are hyper-regional, meaning that they are tied to specific places. Our genre system sees these forms of music as they actually exist; it can help the curious music fan hear the differences, for instance, between Luk Thung, Benga, and Zim music.”
The new computing approach is based on the biological nervous system, specifically on how neurons react to stimuli and connect with other neurons to interpret information.
Olivier Lartillot's insight:
Some excerpts from the New York Times article:
“The new computing approach, already in use by some large technology companies, is based on the biological nervous system, specifically on how neurons react to stimuli and connect with other neurons to interpret information. It allows computers to absorb new information while carrying out a task, and adjust what they do based on the changing signals.
In coming years, the approach will make possible a new generation of artificial intelligence systems that will perform some functions that humans do with ease: see, speak, listen, navigate, manipulate and control. That can hold enormous consequences for tasks like facial and speech recognition, navigation and planning, which are still in elementary stages and rely heavily on human programming.
“We’re moving from engineering computing systems to something that has many of the characteristics of biological computing.”
Conventional computers are limited by what they have been programmed to do. Computer vision systems, for example, only “recognize” objects that can be identified by the statistics-oriented algorithms programmed into them. An algorithm is like a recipe, a set of step-by-step instructions to perform a calculation.
Until now, the design of computers was dictated by ideas originated by the mathematician John von Neumann about 65 years ago. Microprocessors perform operations at lightning speed, following instructions programmed using long strings of 1s and 0s. They generally store that information separately in what is known, colloquially, as memory, either in the processor itself, in adjacent storage chips or in higher capacity magnetic disk drives.
The data are shuttled in and out of the processor’s short-term memory while the computer carries out the programmed action. The result is then moved to its main memory.
The new processors consist of electronic components that can be connected by wires that mimic biological synapses. Because they are based on large groups of neuron-like elements, they are known as neuromorphic processors.
They are not “programmed.” Rather the connections between the circuits are “weighted” according to correlations in data that the processor has already “learned.” Those weights are then altered as data flows in to the chip, causing them to change their values and to “spike.” That generates a signal that travels to other components and, in reaction, changes the neural network, in essence programming the next actions much the same way that information alters human thoughts and actions.
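The mechanism described above (weighted connections, spiking, and connections that change as data flows in) can be caricatured in a few lines of code. This is a deliberately crude leaky integrate-and-fire sketch, not a model of any actual neuromorphic chip; every parameter value here is an arbitrary illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                     # number of neuron-like elements
w = rng.normal(0.0, 0.5, (n, n))          # "weighted" connections (synapses)
v = np.zeros(n)                           # membrane potentials
threshold, leak = 1.0, 0.9
spike_counts = np.zeros(n, dtype=int)

for t in range(100):
    stimulus = rng.random(n) * 0.3        # external input at this time step
    v = leak * v + stimulus               # leaky integration of incoming data
    spikes = v >= threshold               # elements whose value makes them "spike"
    spike_counts += spikes
    v[spikes] = 0.0                       # reset after spiking
    v += w @ spikes.astype(float)         # spikes travel to connected elements
    w += 0.01 * np.outer(spikes, spikes)  # co-active connections strengthen
```

The last line is a toy Hebbian-style update: it reweights connections according to correlations in the data already seen, which is the sense in which such a system is "learned" rather than programmed.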
“Instead of bringing data to computation as we do today, we can now bring computation to data. Sensors become the computer, and it opens up a new way to use computer chips that can be everywhere.”
The new computers, which are still based on silicon chips, will not replace today’s computers, but will augment them, at least for now.
Many computer designers see them as coprocessors, meaning they can work in tandem with other circuits that can be embedded in smartphones and in the giant centralized computers that make up the cloud. Modern computers already consist of a variety of coprocessors that perform specialized tasks, like producing graphics on your cellphone and converting visual, audio and other data for your laptop.”
Olivier Lartillot's insight:
I am glad to see such popularization of research related to “melodic” pattern identification that generalizes beyond the music context and beyond the human species, and also this interesting link to music identification technologies (like Shazam). Before discussing this further, here is how this Scientific American podcast explains, in simple terms, the computational attempt to mimic dolphins' melodic pattern identification abilities:
“Shazam-Like Dolphin System ID's Their Whistles: A program uses an algorithm to identify dolphin whistles similar to that of the Shazam app, which identifies music from databases by changes in pitch over time.
Used to be, if you happened on a great tune on the radio, you might miss hearing what it was. Of course, now you can just Shazam it—let your smartphone listen, and a few seconds later, the song and performer pop up. Now scientists have developed a similar tool—for identifying dolphins.
Every dolphin has a unique whistle. They use their signature whistles like names: to introduce themselves, or keep track of each other. Mothers, for example, call a stray offspring by whistling the calf's ID.
To tease apart who's saying what, researchers devised an algorithm based on the Parsons code, software that fishes songs out of music databases by tracking changes in pitch over time.
They tested the program on 400 whistles from 20 dolphins. Once a database of dolphin sounds was created, the program identified subsequent dolphins by their sounds nearly as well as humans who eyeballed the whistles' spectrograms.
Seems that in noisy waters, just small bits of key frequency change information may be enough to help Flipper find a friend.”
More precisely, the computer program generates a compact description of each dolphin whistle, indicating how the pitch curve progressively ascends and descends. This makes it possible to obtain a description characteristic of each dolphin, to compare these whistle curves, and to see which curve belongs to which dolphin.
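To illustrate this kind of contour description, here is a small sketch of the Parsons code idea: each whistle's pitch track is reduced to a string of Up/Down/Repeat symbols, and strings are compared by edit distance. The pitch values and the use of plain Levenshtein distance are my own illustrative assumptions, not details of the published dolphin study.

```python
def parsons_code(pitches):
    """Reduce a pitch track to its contour: U(p), D(own), R(epeat)."""
    return "".join(
        "U" if b > a else "D" if b < a else "R"
        for a, b in zip(pitches, pitches[1:])
    )

def edit_distance(s, t):
    """Levenshtein distance between two contour strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]

# Hypothetical whistle pitch tracks (Hz): two noisy renditions of the same
# signature whistle, plus a different whistle for contrast.
whistle_a = [900, 1200, 1500, 1400, 1400, 1100]
whistle_a2 = [880, 1190, 1480, 1390, 1390, 1050]
whistle_b = [1400, 1200, 1000, 1100, 1300, 1350]

code_a = parsons_code(whistle_a)       # "UUDRD"
print(code_a, parsons_code(whistle_a2), parsons_code(whistle_b))
print(edit_distance(code_a, parsons_code(whistle_b)))
```

The two renditions of whistle A produce the same contour string despite differing absolute frequencies, which is precisely why a contour code is robust for identifying individuals.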
But to be more precise, Shazam does not use this kind of approach to identify music. It does not try to detect melodic lines in the music recorded by the user, but takes a series of several-second snapshots of each song, such that each snapshot contains all the complex sound at that particular moment (with the polyphony of instruments). A compact description (a “fingerprint”) of each snapshot is produced, indicating the most prominent spectral peaks (say, the most prominent pitches of the polyphony). This fingerprint is then compared with those of each song in the music database. Finally, the identified song is the one in the database whose series of fingerprints best matches the series of fingerprints of the user's music query. Here is a simple explanation of how Shazam works: http://laplacian.wordpress.com/2009/01/10/how-shazam-works/
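For contrast, here is a toy sketch of spectral-peak fingerprinting in the spirit of the description above. It is not Shazam's actual implementation (which, among other refinements, hashes pairs of peaks with time offsets); the signals, frame size, and matching scheme are illustrative assumptions.

```python
import numpy as np

def fingerprints(signal, frame=1024, n_peaks=2):
    """One compact fingerprint per snapshot: the strongest spectral peak bins."""
    prints = []
    for start in range(0, len(signal) - frame, frame):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame]))
        prints.append(tuple(sorted(np.argsort(spectrum)[-n_peaks:])))
    return prints

def match_score(query_prints, reference_prints):
    """Count query fingerprints that also occur in a database track."""
    ref = set(reference_prints)
    return sum(fp in ref for fp in query_prints)

sr = 8000
t = np.arange(sr) / sr
song = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 750 * t)
other = np.sin(2 * np.pi * 300 * t)

# A noisy half-second excerpt plays the role of the user's recording.
rng = np.random.default_rng(1)
query = song[:sr // 2] + 0.05 * rng.normal(size=sr // 2)

database = {"song": fingerprints(song), "other": fingerprints(other)}
q = fingerprints(query)
best = max(database, key=lambda name: match_score(q, database[name]))
print(best)  # "song"
```

Note that nothing here looks for a melody: each fingerprint summarizes the whole spectrum of a moment, so the match survives polyphony and moderate noise.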
Shazam does not model *how* humans identify music. The dolphin whistle comparison program does not model *how* dolphins identify each other. And Shazam and the dolphin whistle ID program do not use similar approaches. But on the other hand, might we assume that dolphins' and humans' abilities to identify auditory patterns (in whistles, and in music for humans) rely on the same core cognitive processes?
There's a theory that human intelligence stems from a single algorithm. The idea arises from experiments suggesting that the portion of your brain dedicated to processing sound from your ears could also handle sight for your eyes.
Olivier Lartillot's insight:
There’s a theory that human intelligence stems from a single algorithm. The idea arises from experiments suggesting that the portion of your brain dedicated to processing sound from your ears could also handle sight for your eyes. This is possible only while your brain is in the earliest stages of development, but it implies that the brain is — at its core — a general-purpose machine that can be tuned to specific tasks.
In the early days of artificial intelligence, the prevailing opinion was that human intelligence derived from thousands of simple agents working in concert, what MIT’s Marvin Minsky called “The Society of Mind.” To achieve AI, engineers believed, they would have to build and combine thousands of individual computing modules. One agent, or algorithm, would mimic language. Another would handle speech. And so on. It seemed an insurmountable feat.
A new field of computer science research known as Deep Learning seeks to build machines that can process data in much the same way the brain does, and this movement has extended well beyond academia, into big-name corporations like Google and Apple. Google is building one of the most ambitious artificial-intelligence systems to date, the so-called Google Brain.
This movement seeks to meld computer science with neuroscience — something that never quite happened in the world of artificial intelligence. “I’ve seen a surprisingly large gulf between the engineers and the scientists.” Engineers wanted to build AI systems that just worked, but scientists were still struggling to understand the intricacies of the brain. For a long time, neuroscience just didn’t have the information needed to help improve the intelligent machines engineers wanted to build.
What’s more, scientists often felt they “owned” the brain, so there was little collaboration with researchers in other fields. The end result is that engineers started building AI systems that didn’t necessarily mimic the way the brain operated. They focused on building pseudo-smart systems that turned out to be more like a Roomba vacuum cleaner than Rosie the robot maid from the Jetsons.
Deep Learning is a first step in this new direction. Basically, it involves building neural networks — networks that mimic the behavior of the human brain. Much like the brain, these multi-layered computer networks can gather information and react to it. They can build up an understanding of what objects look or sound like.
In an effort to recreate human vision, for example, you might build a basic layer of artificial neurons that can detect simple things like the edges of a particular shape. The next layer could then piece together these edges to identify the larger shape, and then the shapes could be strung together to understand an object. The key here is that the software does all this on its own — a big advantage over older AI models, which required engineers to massage the visual or auditory data so that it could be digested by the machine-learning algorithm.
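A minimal runnable caricature of these multi-layered networks: a two-layer network that learns XOR, a task no single-layer network can solve, with the hidden layer discovering intermediate features on its own. The architecture, hyperparameters, and training loop are arbitrary illustrative choices, far removed from real deep learning systems.

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])            # XOR: not linearly separable

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # layer 1: feature detectors
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # layer 2: combines features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
initial_loss = ((out - y) ** 2).mean()

for _ in range(8000):                       # plain batch gradient descent
    h = sigmoid(X @ W1 + b1)                # hidden layer: intermediate features
    out = sigmoid(h @ W2 + b2)              # output layer: pieces them together
    d_out = (out - y) * out * (1 - out)     # backpropagated error signals
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(0)

final_loss = ((out - y) ** 2).mean()
print(initial_loss, final_loss)             # the error drops during training
```

The point of the toy: nobody tells the hidden layer which features to compute; the training signal alone shapes them, which is the "software does all this on its own" advantage the article describes.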
With Deep Learning, you just give the system a lot of data “so it can discover by itself what some of the concepts in the world are.” Last year, one algorithm taught itself to recognize cats after scanning millions of images on the internet. The algorithm didn’t know the word “cat,” but over time it learned to identify the furry creatures we know as cats, all on its own.
This approach is inspired by how scientists believe that humans learn. As babies, we watch our environments and start to understand the structure of objects we encounter, but until a parent tells us what it is, we can’t put a name to it.
No, deep learning algorithms aren’t yet as accurate — or as versatile — as the human brain. But, researchers say, this will come.
In 2011, the Deep Learning project was launched at Google, and in recent months the search giant has significantly expanded this effort, acquiring the artificial intelligence outfit founded by University of Toronto professor Geoffrey Hinton, widely known as the godfather of neural networks.
Chinese search giant Baidu has opened its own research lab dedicated to deep learning, vowing to invest heavy resources in this area. And big tech companies like Microsoft and Qualcomm are looking to hire more computer scientists with expertise in neuroscience-inspired algorithms.
Meanwhile, engineers in Japan are building artificial neural nets to control robots. And together with scientists from the European Union and Israel, neuroscientist Henry Markram is hoping to recreate a human brain inside a supercomputer, using data from thousands of real experiments.
The rub is that we still don’t completely understand how the brain works, but scientists are pushing forward in this as well. The Chinese are working on what they call the Brainnetdome, described as a new atlas of the brain, and in the U.S., the Era of Big Neuroscience is unfolding with ambitious, multidisciplinary projects like President Obama’s newly announced (and much criticized) Brain Research Through Advancing Innovative Neurotechnologies Initiative — BRAIN for short.
If we map out how thousands of neurons are interconnected and “how information is stored and processed in neural networks,” engineers will have a better idea of what their artificial brains should look like. The data could ultimately feed and improve the Deep Learning algorithms underlying technologies like computer vision, language analysis, and the voice recognition tools offered on smartphones from the likes of Apple and Google.
“That’s where we’re going to start to learn about the tricks that biology uses. I think the key is that biology is hiding secrets well. We just don’t have the right tools to grasp the complexity of what’s going on.”
Right now, engineers design around these issues, so they skimp on speed, size, or energy efficiency to make their systems work. But AI may provide a better answer. “Instead of dodging the problem, what I think biology could tell us is just how to deal with it….The switches that biology is using are also inherently noisy, but biology has found a good way to adapt and live with that noise and exploit it. If we could figure out how biology naturally deals with noisy computing elements, it would lead to a completely different model of computation.”
But scientists aren’t just aiming for smaller. They’re trying to build machines that do things computers have never done before. No matter how sophisticated algorithms are, today’s machines can’t fetch your groceries or pick out a purse or a dress you might like. That requires a more advanced breed of image intelligence and an ability to store and recall pertinent information in a way that’s reminiscent of human attention and memory. If you can do that, the possibilities are almost endless.
“Everybody recognizes that if you could solve these problems, it’s going to open up a vast, vast potential of commercial value."
Big Data is pushing into the humanities, as evidenced by new, illuminating computer analyses of literary history.
Olivier Lartillot's insight:
Big Data technology is steadily pushing beyond the Internet industry and scientific research into seemingly foreign fields like the social sciences and the humanities. The new tools of discovery provide a fresh look at culture, much as the microscope gave us a closer look at the subtleties of life and the telescope opened the way to faraway galaxies.
“Traditionally, literary history was done by studying a relative handful of texts. What this technology does is let you see the big picture — the context in which a writer worked — on a scale we’ve never seen before.”
Some of those tools are commonly described in terms familiar to an Internet software engineer — algorithms that use machine learning and network analysis techniques. For instance, mathematical models are tailored to identify word patterns and thematic elements in written text. The number and strength of links among novels determine influence, much the way Google ranks Web sites.
It is this ability to collect, measure and analyze data for meaningful insights that is the promise of Big Data technology. In the humanities and social sciences, the flood of new data comes from many sources including books scanned into digital form, Web sites, blog posts and social network communications.
Data-centric specialties are growing fast, giving rise to a new vocabulary. In political science, this quantitative analysis is called political methodology. In history, there is cliometrics, which applies econometrics to history. In literature, stylometry is the study of an author’s writing style, and these days it leans heavily on computing and statistical analysis. Culturomics is the umbrella term used to describe rigorous quantitative inquiries in the social sciences and humanities.
“Some call it computer science and some call it statistics, but the essence is that these algorithmic methods are increasingly part of every discipline now.”
Cultural data analysts often adapt biological analogies to describe their work. For example: “Computing and Visualizing the 19th-Century Literary Genome.”
Such biological metaphors seem apt, because much of the research is a quantitative examination of words. Just as genes are the fundamental building blocks of biology, words are the raw material of ideas.
“What is critical and distinctive to human evolution is ideas, and how they evolve.”
Some projects mine the virtual book depository known as Google Books and track the use of words over time, compare related words and even graph them. Google cooperated and made the graphing software publicly available. The initial version of Google’s cultural exploration site began at the end of 2010, based on more than five million books dating from 1500. By now, Google has scanned 20 million books, and the site is used 50 times a minute. For example, type in “women” in comparison to “men,” and you see that for centuries the number of references to men dwarfed those for women. The crossover came in 1985, with women ahead ever since.
Researchers tapped the Google Books data to find how quickly the past fades from books. For instance, references to “1880,” which peaked in that year, fell to half by 1912, a lag of 32 years. By contrast, “1973” declined to half its peak by 1983, only 10 years later. “We are forgetting our past faster with each passing year.”
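The half-life computation described here is simple enough to sketch directly. The mention counts below are fabricated placeholders shaped to reproduce the article's figures; real numbers would come from the Google Books n-gram data.

```python
def half_life(series, peak_year):
    """Years until mentions of a year fall to half their peak frequency."""
    peak = series[peak_year]
    for year in sorted(y for y in series if y > peak_year):
        if series[year] <= peak / 2:
            return year - peak_year
    return None

# Synthetic mention frequencies (illustrative numbers, not real n-gram data):
mentions_1880 = {1880: 100, 1890: 80, 1900: 60, 1912: 50, 1920: 40}
mentions_1973 = {1973: 100, 1978: 70, 1983: 50, 1990: 30}

print(half_life(mentions_1880, 1880))  # 32 years, as in the article
print(half_life(mentions_1973, 1973))  # 10 years
```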
Other research approached collective memory from a very different perspective, focusing on what makes spoken lines in movies memorable. Sentences that endure in the public mind are evolutionary success stories, cf. “the fitness of language and the fitness of organisms.” As a yardstick, the researchers used the “memorable quotes” selected from the popular Internet Movie Database, or IMDb, and the number of times that a particular movie line appears on the Web. Then they compared the memorable lines to the complete scripts of the movies in which they appeared — about 1,000 movies. To train their statistical algorithms on common sentence structure, word order and most widely used words, they fed their computers a huge archive of articles from news wires. The memorable lines consisted of surprising words embedded in sentences of ordinary structure. “We can think of memorable quotes as consisting of unusual word choices built on a scaffolding of common part-of-speech patterns.”
Quantitative tools in the humanities and the social sciences, as in other fields, are most powerful when they are controlled by an intelligent human. Experts with deep knowledge of a subject are needed to ask the right questions and to recognize the shortcomings of statistical models.
“You’ll always need both. But we’re at a moment now when there is much greater acceptance of these methods than in the past. There will come a time when this kind of analysis is just part of the tool kit in the humanities, as in every other discipline.”
A funny thing happened the last time I was taking in a performance of Beethoven’s Fifth Symphony (just a few minutes ago).
Olivier Lartillot's insight:
"The Orchestra is a flat-out astounding new app produced by Touch Press, the Philharmonia Orchestra and its principal conductor Salonen. At $13.99, it’s not only one of the best albums—you know, a longish compilation of music—you could purchase for someone this holiday season; it’s an app that could easily change how you consume classical music outside of the concert hall. Or how we introduce new listeners to symphonic works in the first place.
Salonen’s venture wisely avoids trying to recapture the form or mediated rhythms of those storied successes. Physical copies of recordings, after all, are pretty much dead. Conductors aren’t going to be invited back to occupy whole hours of network TV time ever again. So: on to the app store.
The Philharmonia’s success here wasn’t guaranteed merely by its being the first orchestra to upload some videos to a tablet’s app store. Rather, their opening gambit was deeply thought through by people who understand both Mahler and the iPad.

Because the best thing about the app is its synchronous way of making you feel and see various musical values at once, you will derive the best experience of The Orchestra by listening only to the musicians, and having the rest of the app’s information delivered visually. The swooping and aggressive harp glissandos that come during the “Princesses Intercede …” movement of Igor Stravinsky’s “Firebird” ballet are exciting enough as pure sound, but this app gets carried right along with the music’s kinetic qualities: The score speeds expressively through each punchy liftoff in 6/8 time, while, above, a bird’s-eye “BeatMap” graphic of the orchestra pulses to signal which instruments are required at each second in order to whip up the overall noise.

The presentation of performance video and graphical information is where the app is elevated beyond being a pleasing curiosity and into something that feels legitimately groundbreaking in our appreciation of music—as though there might be a day when they give out Grammys for app-making. You needn’t be totally comfortable reading musical notation in order to find value in looking at a score; at one vivid juncture of Salonen’s own violin concerto, you can read how the drummer at a “heavy rock kit” is advised to “Go crazy.” (And if you can’t read music, there’s a tablature-style reduction that drives home basic information, in a way that will feel familiar to users of GarageBand.)
The app lets you into the music from many angles at once, giving new views on the artistry and technical prowess behind the writing, and playing, of some of the world’s greatest music.
Best of all, The Orchestra is no techno-utopian attempt to do away with the concert hall. Rather, it’s an invitation for new listeners to get comfortable with the density of informational delight that can be had there. Until these musicians come around to your town, consider dropping $14 to meet them in app form. Think of The Orchestra, and the wave of symphony apps that ought to follow in its wake, the way you once did of “albums”—as exquisitely good advertisements for a product that still manages to best your expectations once you travel to get it in the real world."
[Note #1 of Olivier Lartillot, curator: It would be great to add more adaptive Markov modeling on top of that. Cf for instance the Continuator project: http://www.youtube.com/watch?v=ynPWOMzossI]
[Note #2 of Olivier Lartillot, curator: I suggest a short name for the upcoming dedicated website: adlib.it! ^^]
With The Infinite Jukebox, you can create a never-ending and ever changing version of any song. The app works by sending your uploaded track over to The Echo Nest, where it is decomposed into individual beats. Each beat is then analyzed and matched to other similar sounding beats in the song. This information is used to create a detailed song graph of paths though similar sounding beats. As the song is played, when the next beat has similar sounding beats there’s a chance that we will branch to a completely different part of the song. Since the branching is to a very similar sounding beat in the song, you (in theory) won’t notice the jump. This process of branching to similar sounding beats can continue forever, giving you an infinitely long version of the song.
To accompany the playback, I created a chord diagram that shows the beats of the song along the circumference of the circle, along with chords representing the possible paths from each beat to its similar neighbors. When the song is not playing, you can mouse over any beat and see all of the possible paths for that beat. When the song is playing, the visualization shows the single next potential beat. I was quite pleased at how the visualization turned out. I think it does a good job of helping the listener understand what is going on under the hood, and different songs have very different looks and color palettes. They can be quite attractive.
I did have to adapt the Infinite Gangnam Style algorithm for the Infinite Jukebox. Not every song is as self-similar as Psy’s masterpiece, so I have to dynamically adjust the beat-similarity threshold until there are enough pathways in the song graph to make the song infinite. This means that the overall musical quality may vary from song to song depending on the amount of self-similarity in the song.
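A sketch of the two ideas above (a beat graph built from similarity, plus a dynamically loosened threshold) might look like this. The one-dimensional "timbre" values, the threshold step, and the branching probability are all fabricated stand-ins for the Echo Nest's much richer beat analysis.

```python
import random

def build_graph(beats, threshold):
    """Map each beat index to the other beats it may branch to."""
    return {i: [j for j, b in enumerate(beats)
                if j != i and abs(a - b) <= threshold]
            for i, a in enumerate(beats)}

def adaptive_graph(beats, step=0.1):
    """Loosen the similarity threshold until every beat has a branch."""
    threshold = 0.0
    graph = build_graph(beats, threshold)
    while any(not targets for targets in graph.values()):
        threshold += step
        graph = build_graph(beats, threshold)
    return graph, threshold

def infinite_walk(graph, n_beats, steps, branch_prob=0.3, seed=7):
    """Play beats in order, sometimes jumping to a similar-sounding beat."""
    rng = random.Random(seed)
    path, beat = [], 0
    for _ in range(steps):
        path.append(beat)
        if rng.random() < branch_prob:
            beat = rng.choice(graph[beat])   # branch to a similar beat
        else:
            beat = (beat + 1) % n_beats      # otherwise keep playing (loop)
    return path

beats = [0.1, 0.5, 0.12, 0.51, 0.9]          # 1-D "timbre" value per beat
graph, threshold = adaptive_graph(beats)
path = infinite_walk(graph, len(beats), steps=50)
```

The adaptive loop captures the trade-off described in the text: a song with little self-similarity forces a looser threshold, so its branches (and hence the playback) sound less seamless.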
Overall, the results sound good for most songs. I still may do a bit of tweaking on the algorithm to avoid some degenerate cases (you can get stuck in a strange attractor at the end of Karma Police for instance). Give it a try, upload your favorite song and listen to it forever. The Infinite Jukebox.
Thanks to advances in computing power, we can analyze music in radically new and different ways. Computers are still far from grasping some of the deep and often unexpected nuances that release our most intimate emotions. However, by processing vast amounts of raw data and performing unprecedented large-scale analyses beyond the reach of teams of human experts, they can provide valuable insight into some of the most basic aspects of musical discourse, including the evolution of popular music over the years. Has there been an evolution? Can we measure it? And if so, what do we observe?
In a recent article published in the journal Scientific Reports, authors used computers to analyze 464,411 Western popular music recordings released between 1955 and 2010, including pop, rock, hip-hop, folk and funk. They first looked for static patterns characterizing the generic use of primary musical elements like pitch, timbre and loudness. They then measured a number of general trends for these elements over the years.
Common practice in the growing field of music information processing starts by cutting an audio signal into short slices — in our case the musical beat, which is the most relevant and recognizable temporal unit in music (the beat roughly corresponds to the periodic, sometimes unconscious foot-tapping of music listeners).
For each slice, computers represented basic musical information with a series of numbers. For pitch, they computed the relative intensity of the notes present in every beat slice, thus accounting for the basic harmony, melody and chords. For timbre, what some call the “color” of a note, they measured the general waveform characteristics of each slice, thus accounting for the basic sonority of a given beat and the combinations of instruments and effects. And for loudness, they calculated the energy of each slice, accounting for sound volume or perceived intensity.
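A pure-NumPy sketch of this per-slice description: for each beat-length slice, a 12-bin pitch-class profile (folding spectral energy onto pitch classes), one crude timbre descriptor (the spectral centroid), and RMS loudness. The test signal and slice length are illustrative assumptions, and these descriptors are far simpler than the ones used in the study.

```python
import numpy as np

sr, beat_len = 22050, 11025            # pretend each beat slice lasts 0.5 s
t = np.arange(sr) / sr
# Test tone: A3 (220 Hz) plus a softer E4 (330 Hz)
signal = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 330 * t)

for start in range(0, len(signal), beat_len):
    sl = signal[start:start + beat_len]
    spectrum = np.abs(np.fft.rfft(sl))
    freqs = np.fft.rfftfreq(len(sl), 1 / sr)
    # pitch: fold spectral energy onto 12 pitch classes (a chroma profile)
    midi = 69 + 12 * np.log2(np.maximum(freqs, 1.0) / 440.0)
    chroma = np.zeros(12)
    np.add.at(chroma, np.round(midi).astype(int) % 12, spectrum)
    # timbre (one facet of it): spectral centroid, the spectrum's center of mass
    centroid = (freqs * spectrum).sum() / spectrum.sum()
    # loudness: root-mean-square energy of the slice
    loudness = np.sqrt((sl ** 2).mean())
    print(int(np.argmax(chroma)), round(centroid, 1), round(loudness, 2))
```

For this signal the dominant pitch class is A (index 9 counting up from C), and every slice yields the same numbers because the tone is steady; real music would give a different triple of descriptions per beat.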
They then constructed a music “vocabulary”: they assigned code words to slice-based numbers to generate a “text” that could represent the popular musical discourse of a given year or age. Doing so allowed them to discover static patterns by counting how many different code words appeared in a given year, how often they were used, and which were the most common successions of code words at a given point in time.
Interestingly, in creating a musical “vocabulary,” they found a well-known phenomenon common in written texts and many other domains: Zipf’s law, which predicts that the most frequent word in a text will appear twice as often as the next most frequent word, three times as often as the third most frequent, and so on. The same thing, they found, goes for music.
If we suppose that the most common note combination is used 100 times, the second most common combination will be used 50 times and the third 33 times. Importantly, they found that Zipf’s law held for each year’s vocabulary, from 1955 to 2010, with almost exactly the same “usage ordering” of code words every year. That suggests a general, static rule, one shared with linguistic texts and many other natural and artificial phenomena.
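This rank-frequency check is easy to sketch. The codeword stream below is fabricated to match the 100/50/33 illustration above; in the study, codewords come from discretized beat descriptions, not from hand-picked counts.

```python
from collections import Counter

# Fabricated codeword "text" whose counts follow the 100/50/33/25/20 pattern.
stream = (["cw1"] * 100 + ["cw2"] * 50 + ["cw3"] * 33 +
          ["cw4"] * 25 + ["cw5"] * 20)
counts = Counter(stream).most_common()
top = counts[0][1]

for rank, (word, count) in enumerate(counts, 1):
    predicted = top / rank               # Zipf's law: count proportional to 1/rank
    print(rank, word, count, round(predicted, 1))
```

Comparing the observed counts against `top / rank` for every year's vocabulary is, in essence, the test that showed Zipf's law holding from 1955 to 2010.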
Beyond these static patterns, they also found three significant trends over time. Again using pitch code words, they counted the different transitions between note combinations and found that this number decreased over the decades. The analysis also indicated that pop music’s variety of timbre has been decreasing since the 1960s, meaning that artists and composers tend to stick to the same sound qualities — in other words, instruments playing the same notes sound more similar than they once did. Finally, they found that recording levels had consistently increased since 1955, confirming a so-called race toward louder music.
It is not easy to measure the goodness or badness of singing. There is "no consensus on how to obtain objective measures of singing proficiency in sung melodies".
Music intelligence platform, The Echo Nest, has partnered with Reebok for a Spotify app for fitness fanatics or casual exercisers to get their groove on and motivate themselves into motion.
The study of patterns and long-term variations in popular music could shed new light on relevant issues concerning its organization, structure, and dynamics. More importantly, it addresses valuable questions for the basic understanding of music as one of the main expressions of contemporary culture: Can we identify some of the patterns behind music creation? Do musicians change them over the years? Can we spot differences between new and old music? Is there an ‘evolution’ of musical discourse?
Current technologies for music information processing provide a unique opportunity to answer the above questions under objective, empirical, and quantitative premises. Moreover, akin to recent advances in other cultural assets, they allow for unprecedented large-scale analyses. One of the first publicly available large-scale collections to be analyzed with standard music processing technologies is the Million Song Dataset. Among other things, the dataset includes year annotations and audio descriptions of 464,411 distinct music recordings (from 1955 to 2010), which roughly corresponds to more than 1,200 days of continuous listening. These recordings span a variety of popular genres, including rock, pop, hip hop, metal, and electronic. Explicit descriptions available in the dataset cover three primary and complementary musical facets: loudness, pitch, and timbre.
By exploiting tools and concepts from statistical physics and complex networks, we unveil a number of statistical patterns and metrics characterizing the general usage of pitch, timbre, and loudness in contemporary western popular music.
In order to build a ‘vocabulary’ of musical elements, we encode the dataset descriptions by a discretization of their values, yielding what we call music codewords. Next, to quantify long-term variations of a vocabulary, we need to obtain samples of it at different periods of time. For that we perform a Monte Carlo sampling in a moving window fashion. In particular, for each year, we sample one million beat-consecutive codewords, considering entire tracks and using a window length of 5 years. This procedure, which is repeated 10 times, guarantees a representative sample with a smooth evolution over the years.
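The sampling procedure can be sketched roughly as follows. Function and variable names are my own, the track representation is an assumption, and the target is scaled down from the paper's one million codewords:

```python
import random

def sample_codewords(tracks_by_year, year, window=5, target=1_000_000, seed=0):
    """Moving-window Monte Carlo sample: draw whole tracks (with replacement)
    from the years [year-2, year+2] until `target` codewords are collected.
    A track is represented as a list of beat-consecutive codewords."""
    rng = random.Random(seed)
    half = window // 2
    pool = [track for y in range(year - half, year + half + 1)
            for track in tracks_by_year.get(y, [])]
    sample = []
    while pool and len(sample) < target:
        sample.extend(rng.choice(pool))  # always take an entire track
    return sample[:target]
```

Repeating this for each year (and, as in the paper, repeating the whole procedure several times) yields comparable samples with a smooth evolution over the years.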
We first count the frequency of usage of pitch codewords (i.e. the number of times each codeword type appears in a sample). We observe that the most used pitch codewords generally correspond to well-known harmonic items, while unused codewords correspond to strange/dissonant pitch combinations. Sorting the frequency counts in decreasing order reveals a very clear pattern behind the data: a power law, which indicates that a few codewords are very frequent while the majority are highly infrequent (intuitively, the latter provide the small musical nuances necessary to make a discourse attractive to listeners). Nonetheless, it also implies that there is no characteristic frequency or rank separating the most used codewords from the largely unused ones (except for the largest rank values, due to the finiteness of the vocabulary). Another non-trivial consequence of power-law behavior is that extreme events (i.e. very rare codewords) will certainly show up in a continuous discourse, provided the listening time is sufficient and the pre-arranged dictionary of musical elements is big enough.
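A hedged sketch of this rank-frequency analysis (function names are mine, and a simple log-log least-squares fit stands in for the paper's more careful fitting procedure):

```python
import math
from collections import Counter

def rank_frequency(codewords):
    """Usage counts of each codeword type, sorted in decreasing order."""
    return sorted(Counter(codewords).values(), reverse=True)

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) vs log(rank). A power law
    appears as a straight line in log-log space; the slope is its exponent."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

On an ideal Zipfian sample (frequency proportional to 1/rank) the slope comes out close to −1.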
Importantly, we find this power-law behavior to be invariant across years, with practically the same fit parameters. However, it could well be that, even though the distribution is the same for all years, codeword rankings were changing (e.g. a certain codeword was used frequently in 1963 but became mostly unused by 2005). To assess this possibility we compute the Spearman's rank correlation coefficients for all possible year pairs and find that they are all extremely high. This indicates that codeword rankings practically do not vary with years.
Codeword frequency distributions provide a generic picture of vocabulary usage. However, they do not account for discourse syntax, just as a simple selection of words does not necessarily constitute an intelligible sentence. One way to account for syntax is to look at local interactions or transitions between codewords, which define explicit relations that capture most of the underlying regularities of the discourse and that can be directly mapped into a network or graph. Hence, analogously to language-based analyses, we consider the transition networks formed by codeword successions, where each node represents a codeword and each link represents a transition. The topology of these networks and common metrics extracted from them can provide us with valuable clues about the evolution of musical discourse.
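Mapping codeword successions to a graph is straightforward; a sketch (directed and unweighted, with names of my own choosing):

```python
from collections import defaultdict

def transition_network(codewords):
    """Directed transition graph: each node is a codeword, and an edge
    a -> b records that b was observed immediately after a."""
    edges = defaultdict(set)
    for a, b in zip(codewords, codewords[1:]):
        edges[a].add(b)
    return dict(edges)

net = transition_network(["C", "G", "C", "F"])
# "C" links to both "G" and "F"; "G" links back to "C".
```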
All the transition networks we obtain are sparse, meaning that the number of links connecting codewords is of the same order of magnitude as the number of codewords. Thus, in general, only a limited number of transitions between codewords is possible. Such constraints would allow for music recognition and enjoyment, since these capacities are grounded in our ability to guess/learn transitions, and a non-sparse network would increase the number of possibilities to the point where guessing/learning would become infeasible. Thinking in terms of originality and creativity, a sparse network means that there are still many ‘composition paths’ to be discovered. However, some of these paths could run into the aforementioned guessing/learning tradeoff. Overall, network sparseness provides a quantitative account of music's delicate balance between predictability and surprise.
In sparse networks, the most fundamental characteristic of a codeword is its degree, which measures the number of links to other codewords. With pitch networks, this quantity is distributed according to a power law with the same fit parameters for all considered years. We observe important trends in some network metrics, namely the average shortest path length l, the clustering coefficient C, and the assortativity with respect to random Γ. Specifically, l slightly increases from 2.9 to 3.2, values comparable to the ones obtained when randomizing the network links. The values of C show a considerable decrease from 0.65 to 0.45, and are much higher than those obtained for the randomized network. Thus, the small-worldness of the networks decreases with years. This trend implies that pitch codewords become harder to reach. The number of hops or steps needed to jump from one codeword to another (as reflected by l) tends to increase and, at the same time, the local connectivity of the network (as reflected by C) tends to decrease. Additionally, Γ is always below 1, which indicates that the networks are always less assortative than random (i.e. well-connected nodes are less likely to be connected among themselves), a tendency that grows with time if we consider the biggest hubs of the network. The latter suggests that there are fewer direct transitions between ‘referential’ or common codewords. Overall, a joint reduction of the small-worldness and the network assortativity shows a progressive restriction of pitch transitions, with fewer transition options and more defined paths between codewords.
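The clustering coefficient C measures local connectivity: for each node, the fraction of its neighbour pairs that are themselves linked. A minimal sketch for an undirected graph (my own simplified implementation, not the paper's code):

```python
def clustering_coefficient(adj):
    """Average local clustering coefficient.
    `adj` maps each node to the set of its neighbours (undirected)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        linked_pairs = sum(1 for u in nbrs for v in nbrs
                           if u < v and v in adj.get(u, set()))
        total += 2.0 * linked_pairs / (k * (k - 1))
    return total / len(adj)

# A triangle is maximally clustered:
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(clustering_coefficient(triangle))  # 1.0
```

A drop from C ≈ 0.65 to C ≈ 0.45, as reported above, means that on average fewer of a codeword's neighbours remain directly linked to one another.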
In contrast to pitch, timbre networks are more assortative than random. The values of l fluctuate around 4.8 and C is always below 0.01. Noticeably, both are close to the values obtained with randomly wired networks. This close-to-random topology quantitatively demonstrates that, as opposed to language, timbral contrasts (or transitions) are rarely the basis for a musical discourse. This does not mean that timbre is a meaningless facet. Global timbre properties, like the aforementioned power law and rankings, are clearly important for music categorization tasks (one example is genre classification). Notice, however, that the evolving characteristics of musical discourse have important implications for artificial or human systems dealing with such tasks. For instance, the homogenization of the timbral palette and general timbral restrictions clearly challenge tasks exploiting this facet. A further example is found with the aforementioned restriction of pitch codeword connectivity, which could hinder song recognition systems (artificial song recognition systems are rooted in pitch codeword-like sequences).
Loudness distributions are generally well-fitted by a reversed log-normal function. Plotting them provides a visual account of the so-called loudness race (or loudness war), a term used to describe the apparent competition to release recordings with increasing loudness, perhaps with the aim of catching potential customers' attention in a music broadcast. The empirical median of the loudness values x grows from −22 dBFS to −13 dBFS, with a least squares linear regression yielding a slope of 0.13 dB/year. In contrast, the absolute difference between the first and third quartiles of x remains constant around 9.5 dB, with a regression slope that is not statistically significant. This shows that, although music recordings become louder, their absolute dynamic variability has been conserved, understanding dynamic variability as the range between the higher- and lower-loudness passages of a recording. However, and perhaps most importantly, one should notice that digital media cannot output signals over 0 dBFS, which severely restricts the possibilities for maintaining the dynamic variability if the median continues to grow.
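As a back-of-the-envelope illustration of that last point (a naive linear extrapolation of my own, not a claim from the study): with a median of −13 dBFS and a slope of 0.13 dB/year, the median would reach the 0 dBFS ceiling in roughly a century.

```python
def years_until_ceiling(median_dbfs, slope_db_per_year, ceiling_dbfs=0.0):
    """Naive linear projection: years until the median loudness
    reaches the digital ceiling (0 dBFS by default)."""
    return (ceiling_dbfs - median_dbfs) / slope_db_per_year

print(round(years_until_ceiling(-13.0, 0.13)))  # 100
```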
Finally, the loudness transition networks have a one-dimensional character, implying that no extreme loudness transitions occur (one rarely finds loudness transitions driving a musical discourse). The very stable metrics obtained for loudness networks imply that, despite the race towards louder music, the topology of loudness transitions is maintained.
Some of the conclusions reported here have historically remained as conjectures, based on restricted resources, or rather framed under subjective, qualitative, and non-systematic premises. With the present work, we gain empirical evidence through a formal, quantitative, and systematic analysis of a large-scale music collection. We encourage the development of further historical databases to be able to quantify the major transitions in the history of music, and to start looking at more subtle evolving characteristics of particular genres or artists, without forgetting the whole wealth of cultures and music styles present in the world.
** I am adding a few personal comments between the excerpts I selected below. (Olivier) **
Shazam forced us to confront the fact that a computer could hear and process music in a way that we humans simply can’t. That insight is at the heart of a new kind of thinking about music—one built on the idea that by taking massive numbers of songs, symphonies, and sonatas, turning them into cold, hard data, and analyzing them with computers, we can learn things about music that would have previously been impossible to uncover.
** I disagree: the principle underlying Shazam is fundamentally different from the methodology of statistics-based musicology: Shazam simply matches the audio recorded on your iPhone against the audio in a big database of music; there is absolutely nothing musical in this process. (Olivier) **
Computational musicology, as the relatively young field is known within academic circles, has already produced a range of findings that were out of reach before the power of data-crunching was brought to bear on music. Douglas Mason, a doctoral student at Harvard, has analyzed scores of Beatles songs and come up with a new way to understand what Bob Dylan called their “outrageous” use of guitar chords. Michael Cuthbert, an associate professor at MIT, has studied music from the time of the bubonic plague, and discovered that during one of civilization’s darkest hours, surprisingly, music became much happier, as people sought to escape the misery of life.
Meanwhile, Glenn Schellenberg, a psychologist at the University of Toronto at Mississauga who specializes in music cognition, and Christian von Scheve of the Free University of Berlin looked at the composition of 1,000 Top 40 songs from the last 50 years and found that over time, pop has become more “sad-sounding” and “emotionally ambiguous.”
“You get a bird’s eye view of something where the details are so fascinating—where the individual pieces are so engrossing—that it’s very hard for us to see, or in this case hear, the big picture...of context, of history, of what else is going on,” said Cuthbert. “Computers are dispassionate. They can let us hear things across pieces in a way that we can’t by even the closest study of an individual piece.”
As more of the world’s concertos, folk songs, hymns, and number one hits are converted into data and analyzed, it’s turning out that listening is only one of the things we can do to try to understand the music we love. And it confronts us with a kind of irony: Only by transforming it into something that doesn’t look like music can we hope to hear all of its hidden notes.
When it comes to music, lyrics are only the beginning. In many cases, much more is communicated through the texture and sound of the music itself: the tune, the beat, the chord progressions, the tempo, and so on. It’s often these attributes, more than lyrics, that imbue a piece of music with the power to communicate a mood, hijack our emotions, invade our consciousness, or make us dance. And while most of us do all right when it comes to describing how a song makes us feel, we tend to fail miserably when asked to explain what it is about how it sounds—what it does, musically—that makes us feel that way.
Computer-assisted analysis can help bridge that gap—not by directly explaining our reactions, but by generating precise, technical descriptions of what is happening in a particular piece, putting it in a blender along with thousands of others, and allowing us to compare them to, say, other works by the same artist, or pieces of music from a different era. It’s no surprise that computers are well suited to this task, since math has been entwined with music theory from the beginning. The ancient Greeks understood that musical pitches have clear mathematical relationships to one another, and most music is built on time signatures, chords, and melodic structures that can be represented with numbers. In that sense, turning music into data comes relatively naturally.
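Those mathematical relationships are easy to make concrete. In equal temperament each semitone multiplies frequency by 2^(1/12), so an octave exactly doubles it and a perfect fifth comes out very close to the Pythagorean 3:2 ratio the Greeks knew (a standard music-theory fact, offered here as an illustration):

```python
def interval_ratio(semitones):
    """Equal-temperament frequency ratio for an interval of the given
    number of semitones: 2 ** (semitones / 12)."""
    return 2 ** (semitones / 12)

print(interval_ratio(12))  # 2.0 -- an octave doubles the frequency
print(interval_ratio(7))   # ~1.498 -- a perfect fifth, close to 3:2
```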
Even so, it’s not easy, and researchers are constantly working on getting better at it. The most straightforward method—and the most arduous—involves painstakingly converting sheet music, note by note, into a series of numbers that can then be interpreted and statistically analyzed with a computer. This kind of work has its roots in the 1960s, when two separate research teams used an elaborate mechanism involving thousands of punch cards to encode large collections of folk songs from around the world and then looked for patterns in performance style and composition structure—in effect, turning even the subjective parts of music into data. The ethnomusicologist Alan Lomax led one of these endeavors, which he called “Cantometrics”; his team sought to compare vocal styles favored by different cultures, picking out 37 “style factors” such as breathiness and rasp.
Today, the “electronic robots” of most computational musicologists live right on their computers, and consist of specialized software programs such as Humdrum or Music21, which provide scholars with the tools they need to analyze sheet music they’ve translated into data. And these systems are generating new information within music history that would have been difficult to gain by ear alone.
David Huron, a professor at Ohio State University who created Humdrum in the 1980s, has used his program to study the rise of syncopated rhythms. With a colleague, he showed that between 1890 and 1940, the number of syncopated beats in the average measure nearly doubled, and that during the 1910s, songwriters experimented with different kinds of syncopation that had never been tried before. “No scholar had ever identified it. But that’s what came out of the data.”
Using Music21, which was designed by Michael Cuthbert and his MIT colleague Christopher Ariza, Harvard physics doctoral student Douglas Mason analyzed Beatles songs, running more than 100 of them under the microscope and discovering that the majority of them were built around one—and only one—highly unexpected chord.
Up to now, progress in computational musicology has been slow in part because of how time-consuming it has been to turn musical notation into computer data by hand. It can take scholars years of laborious research to create a complete database, even if they’re studying something relatively narrow, like the life’s work of a particular composer.
One of the groups working on making computers better at listening is an academic organization called SALAMI, which helps digital musicologists and programmers test new analytical tools, and holds a contest every year to see who can come up with the most sensitive audio software. Another is Somerville-based Echo Nest, which formed in 2005 as an outgrowth of a project at the MIT Media Lab. It has since built a business out of helping companies like MTV and Clear Channel build music recommendation tools and online radio stations. At Echo Nest, computer engineers develop algorithms that can take any mp3 file and read the raw signal for so-called “psycho-acoustic attributes” that emerge from a song’s dominant melody, tempo, rhythm, and harmonic structure.
As excited as digital musicologists are to have this high-altitude approach within reach, they tend to feel that more traditional colleagues disapprove of their replacing careful, sensitive listening with statistics. “I think arts and humanities scholars, especially in the postmodern age, don’t like to talk in big sweeping generalities,” said Huron. “We like to emphasize the individual artist, and focus on what’s unique about a particular work of art. And we take a kind of pride in being the most sensitive, the most observant. I think for many scholars, numerical methods are antithetical to the spirit of the humanities.”
“We’re not really here to replace musicologists—I want to stress that, because our old school musicologists get upset by this,” said J. Stephen Downie, a professor at the University of Illinois who serves as a principal investigator on SALAMI. “But we can change the kinds of questions they can answer. No musicologist could ever listen to 20,000 hours of music. We can get a machine to do that, and find connections that they would miss.”
For those of us outside the academy, perhaps the best way to appreciate the potential of turning large quantities of music into data currently comes in the form of recommendation engines like Pandora, as well as quirky online apps that we can use to learn new things about our favorite songs. Peachnote, a project of Munich-based computer scientist Vladimir Viro, makes use of publicly available archives of sheet music to allow users to input a basic melodic phrase or chord progression and see how it has been used throughout musical history. (Put in the iconic notes that open Scott Joplin’s 1902 song “The Entertainer,” and Peachnote will tell you that the phrase experienced a gradual rise in popularity throughout the 1800s.) A Web app developed by Echo Nest’s Paul Lamere, “Looking for the Slow Build,” allows you to plot songs that gradually get louder and build toward a climax (“Stairway to Heaven”) against ones that stay steady (“California Gurls”). Another Lamere creation, “In Search of the Click Track,” reveals which of your favorite songs were recorded with the aid of a metronome or a drum machine.
But while such toys capture the magic many of us experienced when we first saw Shazam in action, the real payoff from crunching huge amounts of music data will go beyond the music itself—and instead will tell us something about ourselves. Consider the most notable feature of Spotify, the popular music streaming service, which informs you what everyone in your social network is listening to, in real time. What makes such information so thrilling is that we think of music as a mirror: To love a song is to identify with it; to love an artist is to declare that his or her view of the world resonates with our own. And just as we can learn a lot about our friends by scrolling through their mp3 libraries, so too could someone analyze, say, the last 30 years of recorded music and tell a new kind of story about our culture as a whole. And while it’s not clear we really want to know what the massive success of LMFAO says about us, there’s always the possibility the data will reveal that it was a symptom of powerful historical forces far beyond our control. Try proving that just by listening to “Party Rock Anthem” over and over.
Functional MRI of the listening brain found that different regions become active when listening to different types of music and instrumental versus vocals. Allie Wilkinson reports.
Olivier Lartillot's insight:
"Computer algorithms were used to identify specific aspects of the music, which the researchers were able to match with specific, activated brain areas. The researchers found that vocal and instrumental music get treated differently. While both hemispheres of the brain deal with musical features, the presence of lyrics shifts the processing of musical features to the left auditory cortex.
The Internet radio service has started to mine user data for the best ways to target advertising. It can deconstruct your song choices to predict, for example, your political party of choice.
Olivier Lartillot's insight:
“After years of customizing playlists to individual listeners by analyzing components of the songs they like, then playing them tracks with similar traits, the company has started data-mining users’ musical tastes for clues about the kinds of ads most likely to engage them.”
Art practice will gain a whole new status and role in future societies. Creativity will be key to harness the new possibilities offered by science and technology, and by the hyper-connected environments that will surround us, in useful directions. Art, science and humanities will connect to help boost this wave of change and creativity in Europe.
Olivier Lartillot's insight:
Here is first of all a bit of background related to this Futurium project from the European Commission:
("Why your vote is crucial") https://ec.europa.eu/digital-agenda/futurium/en/content/get-started
“If you are interested in policy-making, this is the right place to be! Have a say on eleven compelling themes that will likely shape policy debates in the coming few decades!
They are a synthesis of more than 200 futures co-created by hundreds of "futurizens", including young thinkers as well as renowned scientists from different disciplines, in brainstorming sessions, both online and actual events all around Europe.
The themes include many insights on how policy-making could evolve in the near future. They can potentially help to guide future policy choices or to steer the direction of research funding; for instance, because they cast new light on the sweeping changes that could occur in areas like jobs and welfare; also by furthering our understanding of new routes to the greater empowerment of human beings; and by exploring the societal impacts of the emergence of super-centenarians.
Everyone can now provide feedback and rate the relevance and timing of the themes.
Which one has the greatest impact? When will these themes become relevant?
Vote and help shape the most compelling options for future policies!”
Below is the theme “Art, sciences, humanities”. All these ideas seem to have important repercussions for music research. It would be splendid to see such ideals having an impact on future European research policies. So if you support these ideas, please vote for this theme in the poll, which closes at the end of the week.
“The challenges facing humanity are revealing themselves as increasingly global and highly interconnected. The next few decades will give us the tools to start mastering this complexity in terms of a deeper understanding, but also in terms of policy and action with more predictability of impacts.
This will result from a combination of thus far unseen Big Data from various sources of evidence (smart grids, mobility data, sensor data, socio-economic data) along with the rise of dynamical modelling and new visualisation, analysis, and synthesis techniques (like narrative). It will also rely on a new alliance between science and society.
The virtualisation of the scientific process and the advent of social networks will allow every scientist to join forces with others in the open global virtual laboratory. Human performance enhancement and embeddable sensors will enable scientists to perceive and observe processes in the real world in new ways. New ICT tools will allow better understanding of the social processes underlying all societal actions.
Digital games will increasingly be used as training grounds for developing worlds that work – from testing new systems of governance, to new systems of economy, medical and healing applications, industrial applications, educational systems and models – across every aspect of life, work, and culture.
Digital technologies will also empower people to co-create their environments, the products they buy, the science they learn, and the art they enjoy. Digital media will break apart traditional models of art practice, production, and creativity, making production of previously expensive art forms like films affordable to anyone.
The blurring boundaries between artist and audience will completely disappear as audiences increasingly ‘applaud’ a great work by replying with works of their own, which the originating artist will in turn build upon for new pieces. Digital media creates a fertile space for a virtuous circle of society-wide creativity and art production.
Art practice will gain a whole new status and role in future societies. Creativity will be key to harness the new possibilities offered by science and technology, and by the hyper-connected environments that will surround us, in useful directions. Art, science and humanities will connect to help boost this wave of change and creativity in Europe.
•How do we engage policy makers and civic society throughout the process of gathering data and analysing evidence on global systems? How do we cross-fertilise sciences, humanities and art?
•How do we ensure reward and recognition in a world of co-creation where everyone can be a scientist or an artist from his/her own desktop? How do we deal with ownership, responsibility and liability?
•How do we keep scientific standards alive as peer-reviewed research and quality standards are challenged by the proliferation of open-access publication? How do we assure the quality and credibility of data and models?
•How do we channel the force of creativity into areas of society that are critical but often slow to change, like healthcare, education, etc.?
•How do we ensure universal access and competency with emerging digital and creative technologies? Greater engagement of citizens in science and the arts? How do we disseminate learning about creativity and the arts to currently underserved populations?
•Equitable benefit distribution: how do we ensure that the benefits of scientific discoveries and innovations are distributed evenly in society?
•Clear, effective communication, across multiple languages: how do we communicate insights from complex systems analyses to people who were not participants in the process in ways that create value shifts and behavioural changes to achieve solutions to global issues?
•Can the development of new narratives and metaphors make scientific results accessible to all humanity to reframe global challenges?
•Can the virtualisation of research and innovation lifecycles, the multidisciplinary collaboration and the cross fertilisation with arts and humanities help improve the impact of research?
•Transformation of education: how might the roles of schools and professional educators evolve in the light of the science and art revolution? What might be the impact on jobs and productivity?
•How do we respond to the increasing demand for data scientists and data analysts?
•How do we cope with unintended and undesirable effects of pervasive digitization of society such as media addictions, IPR and authenticity, counterfeiting, plagiarism, life history theft? How do we build trust in both artists and audiences?
•How do we ensure that supercomputing, simulation and big data are not invasive to privacy and support free will and personal aspirations?
•Can crowd-financing platforms for art initiatives balance the roles in current artistic economies (e.g. arts granting agencies, wealthy patrons)?
•How do we harness digital gaming technologies, and developments in live gaming, to allow users to create imagined worlds that empower them and the communities they live within?”
Olivier Lartillot's insight:
[ Note from curator: Wired already wrote an article about Carlsson and his compressed sensing method 4 years ago.
There are interesting critical comments about this article in Slashdot: http://science.slashdot.org/comments.pl?sid=4328305&cid=45105969 ]
“It is not sufficient to simply collect and store massive amounts of data; they must be intelligently curated, and that requires a global framework. “We have all the pieces of the puzzle — now how do we actually assemble them so we can see the big picture? You may have a very simplistic model at the tiny local scale, but calculus lets you take a lot of simple models and integrate them into one big picture.” Similarly, modern mathematics — notably geometry — could help identify the underlying global structure of big datasets.
Gunnar Carlsson, a mathematician at Stanford University, is representing cumbersome, complex big data sets as a network of nodes and edges, creating an intuitive map of data based solely on the similarity of the data points; this uses distance as an input that translates into a topological shape or network. The more similar the data points are, the closer they will be to each other on the resulting map; the more different they are, the further apart they will be on the map. This is the essence of topological data analysis (TDA).
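The core step described here, turning pairwise distances into a network, can be sketched very simply. Real TDA pipelines (e.g. the Mapper algorithm) do considerably more; this toy version just links points whose distance falls below a threshold:

```python
def similarity_network(points, distance, threshold):
    """Edges (as index pairs) between every pair of points whose
    pairwise distance falls below `threshold`."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if distance(points[i], points[j]) < threshold]

# Two nearby points become connected; the distant third stays isolated.
edges = similarity_network([0.0, 0.1, 5.0], lambda a, b: abs(a - b), 1.0)
print(edges)  # [(0, 1)]
```

The resulting graph is the "map" Carlsson describes: similar points cluster together, dissimilar ones end up far apart or disconnected.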
TDA is an outgrowth of machine learning, a set of techniques that serves as a standard workhorse of big data analysis. Many of the methods in machine learning are most effective when working with data matrices, like an Excel spreadsheet, but what if your data set doesn’t fit that framework? “Topological data analysis is a way of getting structured data out of unstructured data so that machine-learning algorithms can act more directly on it.”
As with Euler’s bridges, it’s all about the connections. Social networks map out the relationships between people, with clusters of names (nodes) and connections (edges) illustrating how we’re all connected. There will be clusters relating to family, college buddies, workplace acquaintances, and so forth. Carlsson thinks it is possible to extend this approach to other kinds of data sets as well, such as genomic sequences.”
[… and music?!]
“One can lay the sequences out next to each other and count the number of places where they differ,” he explained. “That number becomes a measure of how similar or dissimilar they are, and you can encode that as a distance function.”
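Carlsson's "count the number of places where they differ" is exactly the Hamming distance; a minimal sketch:

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

print(hamming_distance("GATTACA", "GACTATA"))  # 2
```

Because it satisfies the usual metric axioms, this count can serve directly as the distance function fed into a topological or network analysis.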
The idea behind topological data analysis is to reduce large, raw, high-dimensional data sets to a compressed representation in lower dimensions without sacrificing the most relevant topological properties. Ideally, this reveals the underlying shape of the data. For example, a sphere technically exists in every dimension, but we can perceive only the three spatial dimensions. However, there are mathematical glasses through which one can glean information about these higher-dimensional shapes, Carlsson said. “A shape is an infinite number of points and an infinite amount of distances between those points. But if you’re willing to sacrifice a little roundness, you can represent [a circle] by a hexagon with six nodes and six edges, and it’s still recognizable as a circular shape.”
That is the basis of the proprietary technology Carlsson offers through his start-up venture, Ayasdi, which produces a compressed representation of high dimensional data in smaller bits, similar to a map of London’s tube system. Such a map might not accurately represent the city’s every last defining feature, but it does highlight the primary regions and how those regions are connected. In the case of Ayasdi’s software, the resulting map is not just an eye-catching visualization of the data; it also enables users to interact directly with the data set the same way they would use Photoshop or Illustrator. “It means we won’t be entirely faithful to the data, but if that set at lower representations has topological features in it, that’s a good indication that there are features in the original data also.”
Topological methods are a lot like casting a two-dimensional shadow of a three-dimensional object on the wall: they enable us to visualize a large, high-dimensional data set by projecting it down into a lower dimension. The danger is that, as with the illusions created by shadow puppets, one might be seeing patterns and images that aren’t really there.
It is so far unclear when TDA works and when it might not. The technique rests on the assumption that a high-dimensional big data set has an intrinsic low-dimensional structure, and that it is possible to discover that structure mathematically. Recht believes that some data sets are intrinsically high in dimension and cannot be reduced by topological analysis. “If it turns out there is a spherical cow lurking underneath all your data, then TDA would be the way to go,” he said. “But if it’s not there, what can you do?” And if your dataset is corrupted or incomplete, topological methods will yield similarly flawed results.
Emmanuel Candes, a mathematician at Stanford University, and his then-postdoc, Justin Romberg, were fiddling with a badly mangled image on his computer, the sort typically used by computer scientists to test imaging algorithms. They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images, expecting to see a slight improvement. What appeared on his computer screen instead was a perfectly rendered image. Candes compares the unlikeliness of the result to being given just the first three digits of a 10-digit bank account number, and correctly guessing the remaining seven digits. But it wasn’t a fluke. The same thing happened when he applied the same technique to other incomplete images.
The key to the technique’s success is a concept known as sparsity, which usually denotes an image’s complexity, or lack thereof. It’s a mathematical version of Occam’s razor: While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit. Out of this serendipitous discovery, compressed sensing was born. With compressed sensing, one can determine which bits are significant without first having to collect and store them all.
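The "simplest version is probably the best fit" principle can be made concrete with a toy underdetermined system: two measurements of a three-sample signal cannot pin the signal down uniquely, but among all consistent candidates there is a unique 1-sparse one. Real compressed sensing recovers it by L1 minimization; the brute-force search below is only for intuition, and all numbers are invented.

```python
# Toy illustration of the sparsity principle: among all signals
# consistent with a few linear measurements, prefer the one with
# the fewest nonzero entries.
A = [[1, 2, 1],
     [2, 1, 3]]    # 2 measurements of a 3-sample signal
y = [6, 3]         # observed measurements of the hidden signal

def recover_1_sparse(A, y):
    # try every 1-sparse candidate: a single nonzero in position j
    for j in range(len(A[0])):
        col = [row[j] for row in A]
        if any(c == 0 and yi != 0 for c, yi in zip(col, y)):
            continue  # this column cannot explain a nonzero measurement
        ts = {yi / c for yi, c in zip(y, col) if c != 0}
        if len(ts) == 1:  # every measurement agrees on the same scale t
            x = [0.0] * len(A[0])
            x[j] = ts.pop()
            return x
    return None

print(recover_1_sparse(A, y))  # → [0.0, 3.0, 0.0]
```

Two equations in three unknowns have infinitely many solutions, but only one of them has a single nonzero entry, and the search finds it.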
This approach can even be useful for applications that are not, strictly speaking, compressed sensing problems, such as the Netflix prize. In October 2006, Netflix announced a competition offering a $1 million grand prize to whoever could improve the filtering algorithm for their in-house movie recommendation engine, Cinematch. An international team of statisticians, machine learning experts and computer engineers claimed the grand prize in 2009, but the academic community in general also benefited, since they gained access to Netflix’s very large, high quality data set. Recht was among those who tinkered with it. His work confirmed the viability of applying the compressed sensing approach to the challenge of filling in the missing ratings in the dataset.
Cinematch operates by using customer feedback: Users are encouraged to rate the films they watch, and based on those ratings, the engine must determine how much a given user will like similar films. The dataset is enormous, but it is incomplete: on average, users only rate about 200 movies, out of nearly 18,000 titles. Given the enormous popularity of Netflix, even an incremental improvement in the predictive algorithm results in a substantial boost to the company’s bottom line. Recht found that he could accurately predict which movies customers might be interested in purchasing, provided he saw enough products per person. Between 25 and 100 products were sufficient to complete the matrix.
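A toy version of the matrix-completion idea behind this result: if the ratings matrix is (approximately) low rank, the observed entries constrain the missing ones. Below, a rank-1 matrix with one hidden entry is completed by alternating least squares. This is a sketch under invented data, not Recht's actual method; real recommender matrices are far larger and noisier.

```python
# Complete a rank-1 ratings matrix with one missing entry by
# alternating least squares: fit row factors from column factors
# using only observed entries, then vice versa, and repeat.
M = [[4.0, 5.0],
     [8.0, None],   # None marks the missing rating (true value 10)
     [12.0, 15.0]]

rows, cols = len(M), len(M[0])
u = [1.0] * rows   # row factors (e.g. users)
v = [1.0] * cols   # column factors (e.g. movies)

for _ in range(200):
    for i in range(rows):  # update row factors from observed entries
        num = sum(M[i][j] * v[j] for j in range(cols) if M[i][j] is not None)
        den = sum(v[j] ** 2 for j in range(cols) if M[i][j] is not None)
        u[i] = num / den
    for j in range(cols):  # update column factors symmetrically
        num = sum(M[i][j] * u[i] for i in range(rows) if M[i][j] is not None)
        den = sum(u[i] ** 2 for i in range(rows) if M[i][j] is not None)
        v[j] = num / den

print(round(u[1] * v[1], 3))  # predicted missing rating → 10.0
```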
“We have shown mathematically that you can do this very accurately under certain conditions by tractable computational techniques,” Candes said, and the lessons learned from this proof of principle are now feeding back into the research community.
Recht and Candes may champion approaches like compressed sensing, while Carlsson and Coifman align themselves more with the topological approach, but fundamentally, these two methods are complementary rather than competitive. There are several other promising mathematical tools being developed to handle this brave new world of big, complicated data. Vespignani uses everything from network analysis — creating networks of relations between people, objects, documents, and so forth in order to uncover the structure within the data — to machine learning, and good old-fashioned statistics.
Coifman asserts the need for an underlying global theory on a par with calculus to enable researchers to become better curators of big data. In the same way, the various techniques and tools being developed need to be integrated under the umbrella of such a broader theoretical model. “In the end, data science is more than the sum of its methodological parts,” Vespignani insists, and the same is true for its analytical tools. “When you combine many things you create something greater that is new and different.”
George Tzanetakis gives an overview of techniques, applications and capabilities of music information retrieval systems.
Olivier Lartillot's insight:
Great tutorial by George Tzanetakis about research on computational music analysis (a discipline known as Music Information Retrieval). The tutorial includes an introduction to the engineering techniques commonly used in this research.
Here are the discussion topics that you will find:
Music Information Retrieval
Audio Feature Extraction
Linear Systems and Sinusoids
Short Time Fourier Transform
Spectrum and Shape Descriptors
Mel Frequency Cepstral Coefficients
Audio Feature Extraction
Chroma – Pitch Perception
Automatic Rhythm Description
Content-based Similarity Retrieval (or query-by-example)
Polyphonic Audio-Score Alignment
Dynamic Time Warping
The MUSART system
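Several of the topics above (Linear Systems and Sinusoids, the Short Time Fourier Transform, Spectrum and Shape Descriptors) rest on the same first step: window a frame of the signal and compute its discrete Fourier spectrum. A minimal, unoptimized sketch of that step (real MIR toolkits use an FFT):

```python
# One frame of a Short Time Fourier Transform: apply a Hann window
# to a frame and compute its discrete Fourier spectrum. The peak
# magnitude bin matches the frequency of the sinusoid in the frame.
import cmath, math

N = 64
frame = [math.sin(2 * math.pi * 8 * n / N) for n in range(N)]  # 8 cycles/frame
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
windowed = [x * w for x, w in zip(frame, hann)]

def dft(x):
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

spectrum = [abs(c) for c in dft(windowed)]
peak = max(range(1, N // 2), key=lambda k: spectrum[k])
print(peak)  # → 8: the spectral peak sits at the sinusoid's frequency bin
```

In a full STFT this is repeated over overlapping frames, giving a time-frequency representation from which the spectral and chroma descriptors listed above are derived.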
After scientists earlier this year claimed to have proved that music has been sliding down a path of diminishing returns and really does all sound the same, musicologist Stephen Graham points out why pop music is probably as exciting now as it was in 1955.
Stephen Graham is a musicologist and music critic based at Goldsmiths College, and an Editor at the Journal of Music.
Olivier Lartillot's insight:
I "scooped" the scientific research criticized here in a post I wrote a few months ago, available here:
Below is a copy (with some short edits) of this critical essay by Stephen Graham, which you can read in full on thequietus.com (click on the title above).
Pedants and cranks have been predicting pop music's demise ever since its emergence in its modern form in the 1950s and 1960s. Always informed by a strong dose of cultural prejudice, sometimes these predictions take the form of blunt broadsides about perceived pop ephemerality, sometimes they are accompanied by chauvinistic gnashings of teeth about authenticity, and sometimes they are snuck in under the radar of 'objective' scientific analysis. I want to discuss something that falls most definitely into that insidious last camp.
A paper published in Scientific Reports on July 26 this year claims to 'measure the evolution of contemporary western popular music', statistically analysing a 'dataset' of 465,259 songs dating back to 1955 and distributed evenly across those years and widely across various popular music genres. The paper analyses its dataset under three main criteria: pitch, timbre and loudness. The authors of the paper found a growing homogenisation of the first two of these 'primary musical facets', and additionally detected 'growing loudness levels' in pop music of the past fifty-seven years.
These findings are not in themselves inherently problematic, although they derive from a deeply flawed methodology, to which I'll come below. What are deeply problematic are the conclusions drawn from them by the authors themselves, first, and by a range of journalists and music critics, second. My argument is that, in themselves, the measurements that the paper puts forward can basically be taken in good faith. However, since these measurements only take into account some elements of music on the one hand, and nothing of musical meaning and context on the other, the authors' attempt to build them into a grand narrative about pop music's evolving 'value' is shaky at best, and revealing of quite ugly cultural prejudices at worst.
Loud and homogenous pop?
The authors of the paper claim that they 'observe a number of trends in the evolution of contemporary popular music'. 'These point', they say, 'towards less variety in pitch transitions, towards a consistent homogenisation of the timbral palette, and towards louder and, in the end, potentially poorer volume dynamics'. As is their wont when confronted with apparently authoritative scientific 'data', journalists had a field day with the scientists' interpretations. Many used the paper's comparatively neutral conclusions as a springboard for a plethora of sweeping and condescending generalisations.
However, the Scientific Reports paper itself, and, consequently, many of the newspaper and web articles that reported its findings and conclusions with such misplaced ideological glee, suffers from two fundamental and fatal flaws.
First, the paper's analytical framework is inadequate. Its claims to authority are hampered by the absence from its supposedly representative dataset of one of the key elements of music, rhythm, whether that be harmonic rhythm, timbral rhythm, melodic rhythm, or the mensural rhythms of tempo and metre. Although the paper uses 'the temporal resolution of the beat' to aid 'discretisation' of its musical facets, the word 'rhythm' does not appear once in an article aspiring to answer questions about the inner nature of musical discourse and musical evolution.
Similarly, since we are dealing with popular music, the absence of language from the sample frame, such as that contained in titles, lyrics, slogans or other pertinent materials, is just as deleterious. Finally, harmony is also ignored to a significant degree: although the paper focuses on timbre and pitch, precise hierarchisations of pitch, such as chord voicings or layering of the musical texture in order to articulate bass, harmony and melody, are excluded from the analysis. Horizontal or consecutive pitch relationships are elevated over vertical or simultaneous ones.
A constituent issue of this first fundamental flaw, deriving more from the paper's methodology than its sample frame, relates to the fact that the authors isolate and thus absolutise various musical 'facets' (timbre, pitch and loudness). These 'facets', in themselves, have little business being isolated, since they gain meaning from various contexts; musical, social, cultural or otherwise. An abstracted set of pitches means little besides itself when considered separately from timbre, rhythm, phrasing, use of technology, language and other technical and expressive 'ingredients' of music. Although many valid and valuable analyses have been carried out under precisely this sort of isolationist rubric, the key point is that specific findings about pitch should not be extrapolated into generalised propositions about how music works; data about pitch organisation are just that, and are not in themselves anything more. The analytical framework of the paper thus pivots on the fallacy of misplaced concreteness, where constituent elements of music are seen as more distinct than they really are.
The second fundamental flaw of the paper also relates to this point about isolating and decontextualising musical 'facets', to continue to use the authors' terminology. I noted above that facets such as pitch gain meaning once they are situated in musical contexts. Equally important to this 'meaning' are the socio-cultural discourses through which music becomes encoded with conventionalised meanings.
Is objective 'data' about musical evolution possible?
As Roland Barthes famously wrote, 'a text's unity lies not in its origin, but in its destination'; an aphorism blithely ignored by the authors of the Scientific Reports paper and by the journalists who appropriated the scientists' findings, all of whose assertions about pop music stay at the level of technical design and thus ignore vital emergent phenomena and processes of perception, interpretation and meaning. It is indeed reasonable to attempt to generate 'objective' data about music in order to 'identify some of the patterns behind music creation'. But in doing so analysts must first of all ensure that the analytical framework is as all-encompassing as possible.
Second, they must avoid circularity in building conclusions, a pervasive fault of the paper here, where the authors claim that 'our perception of the new would be rooted on these changing characteristics' (i.e. on the criteria utilised in the paper). This is straight-up circular reasoning based on an exclusion bias. Music is reduced by the paper to loudness, timbre and pitch, and in doing so the horizon of the 'new' is likewise reduced to these facets. If 'music' is disclosed by the paper, then any possibility of it becoming 'new' must therefore derive from that disclosure. But there's much more to music than is captured here, and perceptions of the new have to do with a much fuller panoply of musical facets, as well as, of course, shifting patterns of meaning, than they are given credit for here.
Third, and finally, it is vital that if analysts or journalists are seeking to draw conclusions about music's meaning and value, then due heed must be paid to the socio-cultural discourses that largely generate music's meaning. Otherwise their analyses will simply serve to perpetuate antiquated ideas about what is and what is not musically worthwhile, and about what music might be seen to 'mean'.
One of the universal appeals of music lies in its mysterious ability to manipulate and reflect our emotions. Even the simplest of tunes can evoke strong feelings of joy, fear,...
Olivier Lartillot's insight:
“Similarly as looking at new ways of finding TV programmes by mood, similar research is applied to music.
As you can imagine, getting a computer to understand human emotions has its challenges - three, in fact. The first one is how to numerically define mood. This is a complicated task as not only do people disagree on the mood of a track, but music often expresses a combination of emotions. Over the years, researchers have come up with various models, notably Hevner's clusters which define eight mood categories, and Russell's circumplex model, which represents mood as a point on a two-dimensional plane. Both approaches have their drawbacks, so researchers at QMUL Centre for Digital Music are developing a model which combines the strengths of both. The model will be based on earlier research conducted on the emotional similarity of common keywords.
The next challenge is processing the raw digital music into a format that the computer can handle. This should be a small set of numbers that represent what a track sounds like. They are created by running the music through a set of algorithms, each of which produce an array of numbers called 'features'. These features represent different properties of the music, such as the tempo and what key it's written in. They also include statistics about the frequencies, loudness and rhythm of the music. The trick lies in finding the right set of features that describe all the properties of music that are important for expressing emotion.
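The feature-extraction step described above can be sketched with two of the simplest standard audio features: zero-crossing rate (a rough brightness/noisiness cue) and root-mean-square energy (loudness). The one-second "track" below is an invented sine wave; real systems extract dozens of such numbers per frame.

```python
# Reduce a raw waveform to a few summary numbers ("features"):
# zero-crossing rate and root-mean-square energy.
import math

def zero_crossing_rate(signal):
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    return crossings / (len(signal) - 1)

def rms_energy(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))

# a toy "track": one second of a 5 Hz sine at a 100 Hz sample rate,
# with a small phase offset so no sample lands exactly on zero
signal = [math.sin(2 * math.pi * 5 * n / 100 + 0.1) for n in range(100)]
features = [zero_crossing_rate(signal), rms_energy(signal)]
print([round(f, 3) for f in features])  # → [0.091, 0.707]
```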
Now for the final challenge. We need to find out exactly how the properties of the music work together to produce different emotions. Even the smartest musicologists struggle with this question, so - rather lazily - this is left to the computer to work out.
Machine learning is a method of getting a computer to 'learn' how two things are related by analysing lots of real-life examples. In this case, it is looking at the relationship between musical features and mood. There are a number of algorithms that could be used, but the popular 'support vector machine' (SVM) has been shown to work for this task and can handle both linear and non-linear relationships.
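As a stand-in for the SVM mentioned above, here is the learning idea in its simplest form: a nearest-centroid classifier over invented arousal/valence features. An SVM would instead fit a maximum-margin boundary, but the train-then-predict workflow is the same.

```python
# Minimal "learn a feature-to-mood mapping" sketch: average the
# feature vectors of each mood class during training, then label a
# new track by its nearest class centroid. Feature values and mood
# labels are invented for illustration.
import math

training = {
    "happy": [(0.8, 0.9), (0.7, 0.8), (0.9, 0.7)],
    "sad":   [(0.2, 0.1), (0.1, 0.3), (0.3, 0.2)],
}

centroids = {
    mood: tuple(sum(v[d] for v in vecs) / len(vecs) for d in range(2))
    for mood, vecs in training.items()
}

def classify(features):
    return min(centroids, key=lambda m: math.dist(features, centroids[m]))

print(classify((0.75, 0.85)))  # → happy
print(classify((0.15, 0.25)))  # → sad
```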
For the learning stage to be successful, the computer will need to be 'trained' using thousands of songs that have accompanying information about the mood of each track. This kind of collection is very hard to come across, and researchers often struggle to find appropriate data sets. Not only that, but the music should cover a wide range of musical styles, moods and instrumentation.
Although the Desktop Jukebox is mostly composed of commercial music tracks, it also houses a huge collection of what is known as 'production music'. This is music that has been recorded using session artists, and so is wholly owned by the music publishers who get paid each time the tracks are used. This business model means that they are keen to make their music easy to find and search, so every track is hand-labelled with lots of useful information.
Through project partners at I Like Music, the BBC Research and Development Group obtained over 128,000 production music tracks to use in our research. The tracks, which are sourced from over 80 different labels, include music from every genre.
The average production music track is described by 40 keywords, of which 16 describe the genre, 12 describe the mood and 5 describe the instrumentation. Over 36,000 different keywords are used to describe the music, the top 100 of which are shown in the tag cloud below. Interestingly, about a third of the keywords only appear once, including such gems as 'kangaroove', 'kazoogaloo', 'pogo-inducing' and 'hyper-bongo'.
In order to investigate how useful the keywords are in describing emotion and mood, the relationships between the words were analysed. The method was to calculate the co-occurrence of keyword pairs - that is, how often a pair of words appear together in the description of a music track. The conjecture was that words which appear together often have similar meanings.
Using the top 75 mood keywords, the co-occurrence of each pair in the production music database was calculated to produce a large matrix. In order to make any sense out of it, the keywords and the connections between them were visualized. Those with strong connections (that often appeared together) were positioned close to each other, and those with weak connections further apart.
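The co-occurrence computation can be sketched directly (the track tags below are invented):

```python
# For every pair of keywords, count how many tracks are tagged
# with both: the entries of the co-occurrence matrix.
from itertools import combinations
from collections import Counter

tracks = [
    {"happy", "upbeat", "energetic"},
    {"happy", "upbeat"},
    {"sad", "slow"},
    {"sad", "slow", "dark"},
]

cooc = Counter()
for tags in tracks:
    for a, b in combinations(sorted(tags), 2):
        cooc[(a, b)] += 1

print(cooc[("happy", "upbeat")])  # → 2: these tags often appear together
print(cooc[("happy", "sad")])     # → 0: these never co-occur
```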
The keywords arranged themselves into a logical pattern, with negative emotions on the left and positive emotions on the right, energetic emotions on top and lethargic emotions on the bottom. This roughly fits Russell's arousal-valence plane, suggesting that this model may be a suitable way to describe moods in the production music library; however, more research is required before a model is chosen.
The BBC Research and Development Group has been working with the University of Manchester to extract features from over 128,000 production music files using the N8 cluster. Once that work is complete, they will be able to start training and testing musical mood classifiers which can automatically label music tracks.”
“Jehan's research focused on teaching computers to capture the sonic elements of music, while Whitman's studied the cultural and social components. In combining the two approaches they created the Echo Nest, one of the most important digital music companies few have heard about.
Starting in 2005, they set about creating a vast database, a music brain that, based on your interest in Kanye West, can suggest you check out rapper Drake. Sound like Pandora? It's similar – but on a massive scale.
A computer program analyzes songs for their fundamental elements such as key and tempo.
[In Pandora, this music analysis is performed manually by humans – (Olivier)]
While Echo Nest's approach is unique, other firms, like Gracenote and Rovi, also compile and market music data. (Apple's iTunes relies on Gracenote, for instance.) Some services, notably Pandora, have built proprietary systems that could compete with Echo Nest.
A cool new music service you've never heard of has hit the web, offering deep insights into the structure and rhythm of popular songs.
Created by Japan’s National Institute of Advanced Industrial Science and Technology (AIST), Songle (SONG-lee) analyzes music tracks hosted on the web and reveals the underlying chords, beats, melodies and repeats. Listeners can see how a song is laid out, and jump immediately to the chorus if they choose. They can search for music based on chord, beat and melody structures, such as all songs with the chord progression Am, B, E. There is also a visualization engine synchronized to a song’s core elements.
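A chord-progression query of the kind described (all songs containing Am, B, E) amounts to searching for a contiguous run in each song's chord sequence. A sketch with invented song data; Songle's real index is presumably more sophisticated:

```python
# Find every song whose chord sequence contains a given progression
# as a contiguous run. Song data is invented for illustration.
songs = {
    "song_a": ["Am", "B", "E", "Am", "G"],
    "song_b": ["C", "G", "Am", "F"],
    "song_c": ["E", "Am", "B", "E"],
}

def contains_progression(chords, progression):
    n = len(progression)
    return any(chords[i:i + n] == progression
               for i in range(len(chords) - n + 1))

query = ["Am", "B", "E"]
matches = sorted(s for s, chords in songs.items()
                 if contains_progression(chords, query))
print(matches)  # → ['song_a', 'song_c']
```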
“This is a showcase for active music listening,” said Songle creator Dr. Masataka Goto, leader of the Media Interaction Group at AIST’s Information Technology Research Group. “Listeners can browse within songs.”
Goto cautioned against reading too much into Songle at this point, highlighting its educational and entertainment value instead.
Users submit links to tracks hosted online and Songle analyzes them in about 5-10 minutes, adding the metadata (but no copy of the song) to its database. The service, which launched in Japan in August, has analyzed about 80,000 tracks so far.
Songle has an embeddable player with a visualizer that adds graphics synchronized to a song to any web page using the embed code. It also allows listeners to provide corrections to the estimates created by its music analytics algorithm, potentially improving accuracy.
The ISMIR (International Society for Music Information Retrieval) conference is devoted to the processing, searching, organizing and accessing of music-related data. It attracts a research community intrigued by the revolution in music distribution and storage brought about by digital technology, which has generated considerable research activity and interest in academia as well as in industry.
In this discipline, referred to as Music Information Retrieval (or MIR for short), the topic is not so much to understand and model music (like in the field of music cognition), but to design robust and effective methods to locate and retrieve musical information, including tasks like query-by-humming, music recommendation, music recognition, and genre classification.
A common approach in MIR research is to use information-theoretic models to extract information from the musical data, be it the audio recording itself or all kinds of meta-data, such as artist or genre classification. With advanced machine learning techniques, and the availability of so-called ‘ground truth’ data (i.e., annotations made by experts that the algorithm uses to decide on the relevance of the results for a certain query), a model of retrieving relevant musical information is constructed. Overall, this approach is based on the assumption that all relevant information is present in the data and that it can, in principle, be extracted from that data (data-oriented approach).
Several alternatives have been proposed, such as models based on perception-based signal processing or mimetic and gesture-based queries. However, with regard to the cognitive aspects of MIR (the perspective of the listener), some information might be implicit or not present at all in the data. Especially in the design of similarity measures (e.g., ‘search for songs that sound like X’) it becomes clear quite quickly that not all required information is present in the data. Elaborating state-of-the-art MIR techniques with recent findings from music cognition seems therefore a natural next step in improving (exploratory) search engines for music and audio (cognition-based approach).
A creative paper, discussing the differences and overlaps between the two fields in dialog form, is about to appear in the proceedings of the upcoming ISMIR conference. Emanuel Bigand –a well-known music cognition researcher–, and Jean-Julien Aucouturier –MIR researcher–, wrote a fictitious dialog:
THIS has been the crossover year for Big Data — as a concept, as a term and, yes, as a marketing tool. Big Data has sprung from the confines of technology circles into the mainstream.
First, here are a few, well, data points: Big Data was a featured topic this year at the World Economic Forum in Davos, Switzerland, with a report titled “Big Data, Big Impact.” In March, the federal government announced $200 million in research programs for Big Data computing.
Rick Smolan, creator of the “Day in the Life” photography series, has a new project in the works, called “The Human Face of Big Data.” The New York Times has adopted the term in headlines like “The Age of Big Data” and “Big Data on Campus.” And a sure sign that Big Data has arrived came just last month, when it became grist for satire in the “Dilbert” comic strip by Scott Adams. “It comes from everywhere. It knows all,” one frame reads, and the next concludes that “its name is Big Data.”
The Big Data story is the making of a meme. And two vital ingredients seem to be at work here. The first is that the term itself is not too technical, yet is catchy and vaguely evocative. The second is that behind the term is an evolving set of technologies with great promise, and some pitfalls.
Big Data is a shorthand label that typically means applying the tools of artificial intelligence, like machine learning, to vast new troves of data beyond that captured in standard databases. The new data sources include Web-browsing data trails, social network communications, sensor data and surveillance data.
The combination of the data deluge and clever software algorithms opens the door to new business opportunities. Google and Facebook, for example, are Big Data companies. The Watson computer from I.B.M. that beat human “Jeopardy” champions last year was a triumph of Big Data computing. In theory, Big Data could improve decision-making in fields from business to medicine, allowing decisions to be based increasingly on data and analysis rather than intuition and experience.
“The term itself is vague, but it is getting at something that is real,” says Jon Kleinberg, a computer scientist at Cornell University. “Big Data is a tagline for a process that has the potential to transform everything.”
Rising piles of data have long been a challenge. In the late 19th century, census takers struggled with how to count and categorize the rapidly growing United States population. An innovative breakthrough came in time for the 1890 census, when the population reached 63 million. The data-taming tool proved to be machine-readable punched cards, invented by Herman Hollerith; these cards were the bedrock technology of the company that became I.B.M.
So the term Big Data is a rhetorical nod to the reality that “big” is a fast-moving target when it comes to data. The year 2008, according to several computer scientists and industry executives, was when the term “Big Data” began gaining currency in tech circles. Wired magazine published an article that cogently presented the opportunities and implications of the modern data deluge.
This new style of computing, Wired declared, was the beginning of the Petabyte Age. It was an excellent magazine piece, but the “petabyte” label was too technical to be a mainstream hit — and inevitably, petabytes of data will give way to even bigger bytes: exabytes, zettabytes and yottabytes.
Many scientists and engineers at first sneered that Big Data was a marketing term. But good marketing is distilled and effective communication, a valuable skill in any field. For example, the mathematician John McCarthy made up the term “artificial intelligence” in 1955, when writing a pitch for a Rockefeller Foundation grant. His deft turn of phrase was a masterstroke of aspirational marketing.
In late 2008, Big Data was embraced by a group of the nation’s leading computer science researchers, the Computing Community Consortium, a collaboration of the government’s National Science Foundation and the Computing Research Association, which represents academic and corporate researchers. The computing consortium published an influential white paper, “Big-Data Computing: Creating Revolutionary Breakthroughs in Commerce, Science and Society.”
Its authors were three prominent computer scientists, Randal E. Bryant of Carnegie Mellon University, Randy H. Katz of the University of California, Berkeley, and Edward D. Lazowska of the University of Washington.
Their endorsement lent intellectual credibility to Big Data. Rod A. Smith, an I.B.M. technical fellow and vice president for emerging Internet technologies, says he likes the term because it nudges people’s thinking up from the machinery of data-handling or precise measures of the volume of data.
“Big Data is really about new uses and new insights, not so much the data itself,” Mr. Smith says.
My Presentation of "An Integrated Framework for Transcription, Modal and Motivic Analysis of Maqam Improvisation" at the 2nd CompMusic workshop, in Istanbul (Jul 12-13, 2012).
The CréMusCult project is dedicated to the study of oral/aural creativity in Mediterranean traditional cultures, and especially in Maqam music. Through a dialogue between anthropological survey, musical analysis and cognitive modeling, one main objective is to bring to light the psychological processes and interactive levels of cognitive processing underlying the perception of modal structures in Maqam improvisations.
We propose a comprehensive modeling of the analysis of maqam music founded on a complex interaction between progressive bottom-up processes of transcription, modal analysis and motivic analysis and the impact of top-down influence of higher-level information on lower-level inferences. The model was first designed specifically for the analysis of a particular Tunisian Maqam, and is progressively generalized to other maqamat and to other types of maqam/makam music.