The adversarial context encoder
A general description of the system.
GACELA is an adversarial context encoder designed to restore missing segments of approximately one second in musical audio, i.e., to perform audio inpainting of long gaps. When inpainting a long gap, the audio characteristics and underlying structure of the generated signal must closely match those of the original signal, which makes the task particularly difficult. To address this, we use a generative adversarial network with five discriminators that simultaneously assess the quality of the generated signal and how well it respects the general structure of the original signal. Because the system models the uncertainty inherent to the task, it can also generate different solutions for a single gap. We evaluated the system in listening tests on different types of music signals, with gaps ranging from 375 ms to 1500 ms. Although subjects were often able to detect the inpainted signals, they did not find the inpainting disturbing.
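To make the inpainting setup above concrete, here is a minimal sketch of how a gap and its surrounding context could be cut from a signal, and how different latent vectors yield different fills for the same context. It is not the GACELA implementation: the sample rate, the context length, the latent size, and the functions `split_around_gap` and `toy_generator` are all assumptions introduced for illustration, and `toy_generator` is a hand-written placeholder standing in for a trained generator network.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; assumed for illustration, not stated in the text


def split_around_gap(signal, gap_start, gap_ms, context_ms, sample_rate=SAMPLE_RATE):
    """Return (left context, original gap, right context) as array slices."""
    gap_len = int(round(gap_ms * sample_rate / 1000))
    ctx_len = int(round(context_ms * sample_rate / 1000))
    gap_end = gap_start + gap_len
    if gap_start - ctx_len < 0 or gap_end + ctx_len > len(signal):
        raise ValueError("signal too short for the requested gap and context")
    left = signal[gap_start - ctx_len:gap_start]
    gap = signal[gap_start:gap_end]
    right = signal[gap_end:gap_end + ctx_len]
    return left, gap, right


def toy_generator(left, right, z, gap_len):
    """Hypothetical placeholder for a trained generator (NOT the GACELA network).

    It ramps linearly from the last sample of the left context to the first
    sample of the right context and adds a small perturbation driven by the
    latent vector z, so that different z give different fills.
    """
    ramp = np.linspace(left[-1], right[0], gap_len)
    perturbation = 0.01 * np.resize(z, gap_len)  # repeat z to cover the gap
    return ramp + perturbation


# Four seconds of a 220 Hz tone as stand-in audio.
t = np.arange(4 * SAMPLE_RATE) / SAMPLE_RATE
signal = np.sin(2 * np.pi * 220.0 * t)

# Shortest gap length from the listening tests (375 ms); 1 s of context is assumed.
left, gap, right = split_around_gap(signal, 2 * SAMPLE_RATE, gap_ms=375, context_ms=1000)

# Fix the context and draw several latent vectors: each gives a different fill.
rng = np.random.default_rng(0)
fills = [toy_generator(left, right, rng.standard_normal(128), len(gap)) for _ in range(3)]
```

In a real system the placeholder would be replaced by the trained generator, and each candidate fill would be assessed by the discriminators rather than accepted as is.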
Sound samples for music pieces of different complexities.
Sound samples for different gap lengths.
Sound samples fixing the context and drawing different samples from the latent variable.
Sound samples from our system on the songs used in the listening test of the similarity graph algorithm.