Naive Constant Q transform implementation in Haskell. The CQT is better suited to melodic input data than the discrete Fourier transform because its frequency resolution is finer in the lower frequency bands: the ratio of each bin's centre frequency to its bandwidth (the Q factor) stays constant.
Example of Constant Q transform of a violoncello sample
The code is structured as a Stack package. The build should be easily reproducible by running stack build in the root directory.
Usage: constant-q-exe [OPTIONS]... -i INPUTFILE -o OUTPUTFILE
Process INPUTFILE (in WAV format) and save the Constant Q transform
spectrogram to OUTPUTFILE (png image format).
Options:
-min FREQ Set minimum frequency to FREQ (default 110.0)
-max FREQ Set maximum frequency to FREQ (default 11000.0)
-b NUM Set the number of frequency bins in one octave (default 48)
-p NUM Set the hop size (default 1024)
-q NUM Set the Q factor (quality of frequency resolution) (default 72)
-h Print a help message and exit
stack exec -- constant-q-exe -i input/cello.wav -o doc/cello.png -q 50 -p 200 -min 220
Raising the minimum frequency to 220 Hz and lowering the Q factor to 50 speeds up the computation considerably. The hop size was reduced to 200 to produce a wider image.
The output image is shown in the introduction section of the readme.
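To illustrate why the minimum frequency and the Q factor affect the running time so much, here is a minimal sketch of the textbook constant Q relations. The function names below are hypothetical and are not taken from this package; the formulas (geometric bin spacing and a window length of Q * sampleRate / frequency) are the usual definitions and are assumed, not confirmed, to match what the program does.

```haskell
-- Hypothetical helpers; names and formulas are assumptions, not this package's API.

-- Centre frequency of bin k, with b bins per octave, starting at fMin.
binFrequency :: Double -> Int -> Int -> Double
binFrequency fMin b k = fMin * 2 ** (fromIntegral k / fromIntegral b)

-- Number of bins needed to cover the range from fMin up to fMax.
binCount :: Double -> Double -> Int -> Int
binCount fMin fMax b = ceiling (fromIntegral b * logBase 2 (fMax / fMin))

-- Window length in samples for a bin centred at fk: N = Q * sampleRate / fk.
windowLength :: Double -> Double -> Double -> Int
windowLength sampleRate q fk = ceiling (q * sampleRate / fk)
```

Under these assumptions, and further assuming a 44.1 kHz input (the sample rate of the cello file is not stated here), the lowest bin needs windowLength 44100 72 110 = 28866 samples with the default settings, but only windowLength 44100 50 220 = 10023 samples with the options used above. The bin count also drops from 319 to 271, so there are fewer and shorter sums to evaluate per column.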
The input file is read using the WAVE-0.1.3 library; only WAV files are supported. The input samples are sliced into a list of suffixes whose starting points are one hop size apart. The program then evaluates the CQ transform on each of these suffixes, which is similar to computing a short-time Fourier transform (through the direct, naive approach); the result for each suffix becomes one pixel column in the output image. To reduce spectral leakage across frequency bins, I used a Hann window.
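The pipeline described above can be sketched roughly as follows. This is not the code from this repository: all names are hypothetical, the window is assumed to be a periodic Hann window, and each bin is assumed to use a window of Q * sampleRate / frequency samples, normalised by its length. Suffixes shorter than the window are simply truncated, which amounts to zero padding.

```haskell
import Data.Complex (Complex (..), magnitude, mkPolar)

-- Suffixes of the input whose starting points are `hop` samples apart.
-- Assumes hop > 0; otherwise the list never ends.
hopSuffixes :: Int -> [a] -> [[a]]
hopSuffixes hop = takeWhile (not . null) . iterate (drop hop)

-- Periodic Hann window of length n, evaluated at index i.
hann :: Int -> Int -> Double
hann n i = 0.5 * (1 - cos (2 * pi * fromIntegral i / fromIntegral n))

-- Direct (naive) evaluation of one constant Q coefficient at frequency fk.
cqBin :: Double -> Double -> Double -> [Double] -> Complex Double
cqBin sampleRate q fk xs = sum terms / fromIntegral n
  where
    n = ceiling (q * sampleRate / fk) :: Int
    terms =
      [ ((hann n i * x) :+ 0) * mkPolar 1 (-2 * pi * fk * fromIntegral i / sampleRate)
      | (i, x) <- zip [0 .. n - 1] xs
      ]

-- One image column: the magnitude of every bin for a single suffix.
cqColumn :: Double -> Double -> [Double] -> [Double] -> [Double]
cqColumn sampleRate q freqs xs = [magnitude (cqBin sampleRate q f xs) | f <- freqs]

-- All image columns, one per suffix.
spectrogram :: Double -> Double -> Int -> [Double] -> [Double] -> [[Double]]
spectrogram sampleRate q hop freqs = map (cqColumn sampleRate q freqs) . hopSuffixes hop
```

For example, hopSuffixes 2 [1, 2, 3, 4, 5] yields [[1,2,3,4,5],[3,4,5],[5]]. Because every column depends only on its own suffix, the columns can be computed in any order.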
To speed up the computation, repeated computations are cached (see transformFactorMemoized). I tried straightforward parallelisation using the parallel package, but it did not help, and in some cases the run was actually slower. Unfortunately I don't understand why, since the image columns are computed independently of each other.
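One common way to cache the repeated work in a naive transform is to compute the per-bin factors (window value times complex exponential) once and share them across all columns. The sketch below illustrates that idea only; it is not the implementation of transformFactorMemoized, and every name, as well as the periodic Hann window and the Q * sampleRate / frequency window length, is an assumption.

```haskell
import Data.Complex (Complex (..), magnitude, mkPolar)

-- Hypothetical: the factors for one bin, i.e. Hann window times complex
-- exponential, which do not depend on the input samples.
binKernel :: Double -> Double -> Double -> [Complex Double]
binKernel sampleRate q fk =
  [ (w i :+ 0) * mkPolar 1 (-2 * pi * fk * fromIntegral i / sampleRate)
  | i <- [0 .. n - 1]
  ]
  where
    n = ceiling (q * sampleRate / fk) :: Int
    w i = 0.5 * (1 - cos (2 * pi * fromIntegral i / fromIntegral n))

-- Apply a precomputed kernel to one slice of samples.
applyKernel :: [Complex Double] -> [Double] -> Complex Double
applyKernel kernel xs =
  sum (zipWith (\k x -> k * (x :+ 0)) kernel xs) / fromIntegral (length kernel)

-- The kernels are bound once and shared by every column, so laziness
-- evaluates each factor at most once instead of once per column.
spectrogramCached :: Double -> Double -> [Double] -> [[Double]] -> [[Double]]
spectrogramCached sampleRate q freqs = map column
  where
    kernels = map (binKernel sampleRate q) freqs
    column xs = [magnitude (applyKernel k xs) | k <- kernels]
```

The trade-off is memory: the kernels stay alive for the whole run, and the kernels of low-frequency bins are the longest.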