Beat Detection

Tempo is defined by Cambridge Dictionary as “the speed at which a piece of music is played” and is measured in beats per minute. Detecting a beat usually means trying to follow the tempo; this is relatively easy for humans to do by ear but surprisingly challenging for a computer.

The article at http://archive.gamedev.net/archive/reference/programming/features/beatdetection/ describes a range of techniques for detecting the beat of a song.

The first group of techniques revolves around “statistical streaming beat detection”, the first example of which is the analysis of sound energy. A sound is heard as a beat when its energy is “largely superior” to the energies that have come before it. This distinct variation in energy is a beat, so one approach to detecting beats is to look for peaks in sound energy.

To do this, compute the average sound energy of a signal and compare it to the “instant sound energy” at any given moment. The average energy here is not of the entire song as the beat of a song can change throughout, but rather the average sound energy of nearby samples. The article goes on to say:

an, bn = list of sound amplitude values captured every Te seconds for the left and right channels.
“instant sound energy” will be represented by 1024 samples from both an and bn. The article quotes 1024 samples as approximately 5×10^-2 seconds, although at a 44100 Hz sample rate 1024/44100 is closer to 2.3×10^-2 seconds.

To detect a beat, the instant energy of the sound has to be greater than the local average energy. The “local average” will be represented by 44032 samples (43 × 1024, the multiple of 1024 closest to 44100) from an and bn, which is approximately one second.

Equation for finding the “instant energy”
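A reconstruction consistent with the definitions above, assuming the energy is the sum of squared amplitudes over the 1024-sample window of both channels (i_0, the index of the first sample in the window, is a name introduced here):

```latex
e = \sum_{k=i_0}^{i_0+1023} \left( a_k^2 + b_k^2 \right)
```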

Equation for finding “local average energy”
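A reconstruction under the same assumptions, summing over the 44032 buffered samples. The factor 1024/44032 (which equals 1/43) is an assumed normalisation that scales the one-second sum down to the size of a single 1024-sample block so that it can be compared directly with e:

```latex
\langle E \rangle = \frac{1024}{44032} \sum_{i=0}^{44031} \left( B[0][i]^2 + B[1][i]^2 \right)
```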

B[0] and B[1] are just the buffered an and bn (the left and right channels).

Compare e to <E>*C, where C is a constant that adjusts the sensitivity of the algorithm: if e is greater than <E>*C then it is a beat.
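The comparison described above could be sketched in Python roughly as follows. This is a minimal illustration rather than the article's own code: the function names, the choice to average over the samples immediately before the current block, and the default C = 1.3 are all assumptions.

```python
def instant_energy(left, right, start, block=1024):
    """Sum of squared amplitudes of both channels over one block."""
    return sum(left[k] ** 2 + right[k] ** 2 for k in range(start, start + block))


def local_average_energy(left, right, start, block=1024, blocks=43):
    """Average per-block energy over the `blocks` blocks just before `start`."""
    begin = start - block * blocks
    if begin < 0:
        raise ValueError("not enough history before `start`")
    total = sum(left[k] ** 2 + right[k] ** 2 for k in range(begin, start))
    return total / blocks  # same scale as one block's instant energy


def is_beat(left, right, start, c=1.3, block=1024, blocks=43):
    """True if the block starting at `start` exceeds C times the local average."""
    e = instant_energy(left, right, start, block)
    avg = local_average_energy(left, right, start, block, blocks)
    return e > c * avg  # c = 1.3 is an assumed default, tune per material
```

With the defaults, `start` must be at least 44032 so that a full second of history is available.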

A way of improving this algorithm is to keep a history of the instant energies (E) instead of the raw samples. E has to cover approximately 1 second (which is 44032 samples). Since 44032/1024 is 43, E holds 43 values: E[0] contains the newest energy and E[42] contains the oldest. Every value in E is calculated on a group of 1024 samples, and the local average <E> becomes the mean of these 43 values.
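A minimal Python sketch of the history-based variant, assuming the audio arrives as successive 1024-sample blocks per channel. The class name, the default C = 1.3, and the decision to report no beat until the history is full are assumptions made for illustration.

```python
from collections import deque


class EnergyBeatDetector:
    """Keeps the last `history_len` instant energies; history[0] is the newest."""

    def __init__(self, c=1.3, history_len=43):
        self.c = c  # sensitivity constant (assumed default)
        self.history = deque(maxlen=history_len)

    def process(self, left_block, right_block):
        """Feed one block per channel; return True if it looks like a beat."""
        e = sum(a * a + b * b for a, b in zip(left_block, right_block))
        beat = False
        if len(self.history) == self.history.maxlen:
            avg = sum(self.history) / len(self.history)  # local average <E>
            beat = e > self.c * avg
        # appendleft on a full deque drops the oldest value from the right end
        self.history.appendleft(e)
        return beat
```

Storing one energy per block means the average costs 43 additions per block rather than a pass over 44032 samples.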

This algorithm is also shown at https://mziccard.me/2015/05/28/beats-detection-algorithms-1/.

The obvious drawback to this is how to choose the constant C. This value can be calculated using another algorithm.
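A reconstruction from the definitions in this post, assuming the usual population variance over the 43 stored energies in E, with <E> taken as their mean:

```latex
\langle E \rangle = \frac{1}{43} \sum_{i=0}^{42} E[i],
\qquad
V = \frac{1}{43} \sum_{i=0}^{42} \left( E[i] - \langle E \rangle \right)^2
```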

This finds the variance V of the instant energies in E about the average energy <E>. V is then used in this algorithm:
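The linear relation given in the linked GameDev article, as best it can be reproduced here (the coefficients should be checked against that article, and they depend on how the sample amplitudes are scaled):

```latex
C = -0.0025714 \, V + 1.5142857
```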

Again, if e > (<E> * C) then there is a beat.
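A short Python sketch of the variance-driven sensitivity, assuming the linear relation C = -0.0025714·V + 1.5142857 from the linked article. The function names are hypothetical, and the coefficients were fitted for that article's amplitude scale, so they would probably need re-tuning for samples normalised differently.

```python
def variance(energies):
    """Population variance of the stored instant energies."""
    avg = sum(energies) / len(energies)
    return sum((e - avg) ** 2 for e in energies) / len(energies)


def sensitivity(v):
    """Linear map from variance V to the constant C (coefficients assumed)."""
    return -0.0025714 * v + 1.5142857


def is_beat_adaptive(e, history):
    """Compare instant energy e with C * <E>, where C follows the variance."""
    avg = sum(history) / len(history)
    c = sensitivity(variance(history))
    return e > c * avg
```

A noisy history (high V) lowers C, so a smaller jump counts as a beat; a steady history (low V) raises C.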

Another method of beat detection is frequency analysis using the Fast Fourier Transform (FFT). This works on the frequency spectrum that the FFT produces: the spectrum is divided into sub-bands, and the energy in each sub-band is calculated and compared with that sub-band's recent average, much like the sound energy method.

If the number of sub-bands were set to 32 then we could compute the “instant energy” of each sub-band using:
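A reconstruction assuming the 1024 values in B are split evenly into 32 sub-bands of 32 values each, with the factor 32/1024 as an assumed normalisation:

```latex
E_s[i] = \frac{32}{1024} \sum_{k=32i}^{32i+31} B[k], \qquad i = 0, \dots, 31
```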

Es[i] is the energy of sub-band i, where i runs from 0 to 31. B contains the 1024 frequency amplitudes of the signal. The average energy of each sub-band is therefore computed using:
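A reconstruction that mirrors the 43-value energy history used earlier, assuming E_i[k] is a per-sub-band history of Es[i] values with E_i[0] the newest:

```latex
\langle E_i \rangle = \frac{1}{43} \sum_{k=0}^{42} E_i[k]
```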

<Ei> is the average energy of sub-band i over the last second of the signal. To see if there is a beat, the energy in a given sub-band has to be greater than that sub-band’s average multiplied by C, i.e. Es[i] > (<Ei> * C).
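The sub-band approach could be sketched in Python as below. This is an illustration under several assumptions: it takes a single (mono) block of samples, uses a small pure-Python radix-2 FFT so that it needs no external library, and treats B[k] as the squared magnitude of each FFT bin. The function names are hypothetical and C has no default, since a sensible value depends on how the samples are scaled.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def subband_energies(samples, bands=32):
    """Split the power spectrum into equal-width sub-bands and sum each one."""
    power = [abs(z) ** 2 for z in fft(samples)]  # assumed meaning of B[k]
    width = len(power) // bands
    scale = bands / len(power)  # e.g. 32/1024
    return [scale * sum(power[i * width:(i + 1) * width]) for i in range(bands)]


def subband_beats(energies, histories, c):
    """Per sub-band: True where Es[i] > C * average of that band's history."""
    return [e > c * (sum(h) / len(h)) for e, h in zip(energies, histories)]
```

For real-valued input the spectrum is mirrored, so the upper half of the sub-bands repeats the lower half and only the first half carries independent information.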

Both methods are implemented for Processing in the “Minim” library, which can be found here: http://code.compartmental.net/tools/minim/. Its examples show both sound energy and frequency energy analysis. These work on the audio they provide, which has a clear beat, but the detection is less reliable at finding the beat of other audio I have fed into it.

I know that beat detection will play a role in my visualisation; I am just unsure how to implement it properly and effectively.

Something else I found that could be useful, although this might be in too much detail.

I am not sure whether some of the processes used here could help.
