andreasjansson / cantable-diffuguesion

Bach chorale generation and harmonization


Run time and cost

Predictions run on Nvidia A100 (40GB) GPU hardware.

Cantable Diffuguesion



You can use Cantable Diffuguesion to generate Bach chorales unconditionally, or harmonize melodies or parts of melodies.

For harmonization we use tinyNotation, with a few modifications:

* The ? symbol followed by a duration denotes a section that the model should in-paint, e.g. ?2 will in-paint a half note duration.
* The ?* symbol will in-paint everything between a defined beginning and an end, e.g. c2 ?* B4 c2 will start the piece with c2, then generate notes for the specified duration, and finally the melody will end with B4 c2.
* Optional bars | are ignored and can be used to make the melody notation more pleasing.
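The modifications above can be illustrated with a small tokenizer. This is a minimal sketch, not the model's actual parser: the `Token` class and `tokenize_melody` function are hypothetical names, and the sketch assumes the usual tinyNotation convention that a duration number n denotes a 1/n note (so 2 is a half note, i.e. two quarter lengths).

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    kind: str  # "note", "inpaint" (fixed length) or "inpaint_fill" (?*)
    text: str  # the token as written in the melody string
    quarter_length: Optional[float] = None  # only set for "inpaint"


def tokenize_melody(melody: str) -> List[Token]:
    """Split a melody string into notes and in-painting markers (hypothetical helper)."""
    tokens = []
    # Bars are optional and carry no meaning, so treat them as whitespace.
    for raw in melody.replace("|", " ").split():
        if raw == "?*":
            tokens.append(Token("inpaint_fill", raw))
        elif raw.startswith("?"):
            match = re.fullmatch(r"\?(\d+)", raw)
            if match is None:
                raise ValueError(f"malformed in-paint token: {raw!r}")
            # Assumption: duration number n means a 1/n note, i.e. 4/n quarter lengths.
            tokens.append(Token("inpaint", raw, 4.0 / int(match.group(1))))
        else:
            # Anything else is left untouched as ordinary tinyNotation.
            tokens.append(Token("note", raw))
    return tokens


for token in tokenize_melody("c2 ?* | B4 c2"):
    print(token)
```

Keeping ordinary notes as opaque strings means the sketch does not have to re-implement tinyNotation itself; only the two in-painting symbols and the bar lines are interpreted.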


Cantable Diffuguesion is a diffusion model trained to generate Bach chorales. Four-part chorales are presented to the network as 4-channel arrays. The pitches of the individual parts are activated in the corresponding channel of the array. Here is a plot of a single input example, where the four channels are plotted on separate images:
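To make the 4-channel representation concrete, here is a minimal sketch of how such an array could be built. The array layout (channel, pitch, time step), the 128-pitch MIDI range, and the `encode_chorale` helper are assumptions for illustration; the source does not state the actual pitch range or time resolution.

```python
import numpy as np

NUM_PITCHES = 128  # assumption: full MIDI pitch range


def encode_chorale(parts, num_steps):
    """Encode four parts as a (4, NUM_PITCHES, num_steps) array (hypothetical helper).

    `parts` is a list of four note lists, one per voice; each note is a
    (midi_pitch, start_step, end_step) tuple with an exclusive end step.
    """
    roll = np.zeros((4, NUM_PITCHES, num_steps), dtype=np.float32)
    for channel, notes in enumerate(parts):
        for pitch, start, end in notes:
            # Activate the pitch in this voice's channel for the note's duration.
            roll[channel, pitch, start:end] = 1.0
    return roll


# A C major chord held for four steps: soprano C5, alto G4, tenor E4, bass C3.
chord = [[(72, 0, 4)], [(67, 0, 4)], [(64, 0, 4)], [(48, 0, 4)]]
roll = encode_chorale(chord, num_steps=8)
print(roll.shape)
```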

As in Stable Diffusion, a U-Net is trained to predict the noise residual.
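For readers unfamiliar with noise-residual training, the sketch below shows the standard DDPM-style formulation in NumPy. It is an illustration under assumptions: the linear beta schedule, the 1000 steps, and the mean squared error loss are common defaults, not details confirmed by the source, and the U-Net itself is left out.

```python
import numpy as np

# Assumption: a linear beta schedule with 1000 steps, a common DDPM default.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)


def add_noise(x0, noise, alpha_bar_t):
    """Forward diffusion: mix the clean array with Gaussian noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


def noise_prediction_loss(predicted_noise, noise):
    """Mean squared error between the predicted and the true noise."""
    return float(np.mean((predicted_noise - noise) ** 2))


rng = np.random.default_rng(0)
x0 = np.zeros((4, 16, 32), dtype=np.float32)  # stand-in for a 4-channel chorale
noise = rng.standard_normal(x0.shape)
t = 500
x_t = add_noise(x0, noise, alpha_bars[t])
# During training, a network would receive (x_t, t) and its output would be
# compared with `noise`; a zero prediction is used here as a placeholder.
print(noise_prediction_loss(np.zeros_like(noise), noise))
```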

After training the generative model, we add 8 channels to the inputs: the middle four channels represent a mask, and the last four channels hold the masked chorales. We randomly mask each of the four channels individually, whereas Stable Diffusion Inpainting uses a one-channel mask.
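The channel layout described above can be sketched as follows. This is an illustration, not the training code: the helper names are hypothetical, masking a random time span per channel is only one possible way to mask channels individually, and the convention that a mask value of 1 marks the region to in-paint is an assumption.

```python
import numpy as np


def random_channel_mask(shape, rng):
    """Draw an independent random time span to mask in each channel (assumed scheme)."""
    channels, _, steps = shape
    mask = np.zeros(shape, dtype=np.float32)
    for c in range(channels):
        start, end = np.sort(rng.integers(0, steps + 1, size=2))
        mask[c, :, start:end] = 1.0  # assumption: 1 marks the region to in-paint
    return mask


def build_inpainting_input(noisy, chorale, mask):
    """Stack noisy input, mask and masked chorale into a 12-channel array."""
    masked_chorale = chorale * (1.0 - mask)  # hide the regions to in-paint
    return np.concatenate([noisy, mask, masked_chorale], axis=0)


rng = np.random.default_rng(0)
chorale = (rng.random((4, 16, 32)) > 0.9).astype(np.float32)  # toy 4-channel array
noisy = rng.standard_normal(chorale.shape).astype(np.float32)
mask = random_channel_mask(chorale.shape, rng)
model_input = build_inpainting_input(noisy, chorale, mask)
print(model_input.shape)
```

Because each voice gets its own mask channel, a single voice can be hidden while the others stay visible, which is what harmonizing a given melody requires.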

The two plots below show a mask and a masked input array:


We use all four-part pieces in the Music21 Bach Chorales corpus. 85% are used for training, the rest for validation and testing.
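A split like the one described could be sketched as below. The `split_pieces` helper is hypothetical, and dividing the remaining 15% evenly between validation and testing is an assumption, since the source does not say how the remainder is divided.

```python
import random


def split_pieces(piece_ids, train_fraction=0.85, seed=0):
    """Shuffle piece identifiers and split them into train/validation/test lists."""
    ids = list(piece_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = round(len(ids) * train_fraction)
    rest = ids[n_train:]
    # Assumption: the remainder is divided evenly between validation and testing.
    n_val = len(rest) // 2
    return ids[:n_train], rest[:n_val], rest[n_val:]


train, val, test = split_pieces(range(100))
print(len(train), len(val), len(test))
```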