GACELA: A generative adversarial context encoder for long audio inpainting

GACELA is an adversarial context encoder designed to restore missing segments of approximately one second in musical audio data, i.e., to perform audio inpainting of long gaps. When inpainting a long gap, the audio characteristics and underlying structure of the generated signal must precisely match those of the original signal, which makes the task particularly difficult. To address this, we use a generative adversarial network with five discriminators that simultaneously assess the quality of the generated signal and how well it respects the general structure of the original signal. Additionally, because the system models the uncertainty inherent to the task, it can generate different solutions for a single gap. We evaluated the system in listening tests on different types of music signals, with gaps ranging from 375 ms to 1500 ms in duration. We found that although subjects were often able to detect the signals inpainted by the system, they did not find the inpainting disturbing.
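To make the problem setup concrete, the following minimal sketch shows how a gap of a given duration could be cut out of a signal, leaving the surrounding context that an inpainting system would condition on. It is an illustration only and not the GACELA implementation: the function name `split_around_gap`, the 16 kHz sample rate, the gap position, and the 4-second silent test signal are all assumptions chosen for this example.

```python
def split_around_gap(signal, sample_rate, gap_start_s, gap_ms):
    """Split a signal into (context_before, gap, context_after).

    Hypothetical helper for illustration; not part of GACELA.
    """
    gap_len = round(sample_rate * gap_ms / 1000)  # gap length in samples
    start = round(sample_rate * gap_start_s)      # first sample of the gap
    end = start + gap_len
    if start < 0 or end > len(signal):
        raise ValueError("gap does not fit inside the signal")
    return signal[:start], signal[start:end], signal[end:]


SAMPLE_RATE = 16000                  # assumed sample rate in Hz
signal = [0.0] * (4 * SAMPLE_RATE)   # placeholder: 4 seconds of silence

# Gap durations within the range covered by the listening tests.
for gap_ms in (375, 750, 1500):
    before, gap, after = split_around_gap(signal, SAMPLE_RATE, 1.0, gap_ms)
    print(gap_ms, "ms ->", len(gap), "missing samples")
```

In this setup, the inpainting task amounts to generating a replacement for `gap` given only `before` and `after`.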

The adversarial context encoder

Effect of the complexity

Effect of the gap length

Latent variable

Comparison to SGA