Working under the supervision of Carmine Cella, CNMAT students Alois Cerbu and Luke Dzwonczyk published three papers this summer: one at the International Computer Music Conference (ICMC) in Seoul, Korea, and two at the International Conference on Digital Audio Effects (DAFx) in Surrey, UK.

Reprogrammable Effects Pedals on the Daisy Seed Platform

Alois Cerbu, Carmine Cella

ICMC 2024, Seoul, Korea

Many creative audio effects are available only as plug-ins for digital audio workstations or as boutique hardware devices. We describe the implementation of three realtime digital effects algorithms – a strobe tuner, a spectral processor (with denoising and phase vocoder frequency estimation), and a realtime granular synthesizer – designed to run on an inexpensive open-source hardware platform powered by the ElectroSmith Daisy Seed microcontroller. The platform has the form factor of a guitar stompbox; we discuss the merits of portable, reprogrammable hardware audio processing for live performance, and future directions for this project.

GitHub: https://github.com/amcerbu/StrobeSpectralGranular
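
The effects in the repository are written in C++ for the Daisy Seed. Purely as an illustration of one ingredient named in the abstract – phase vocoder frequency estimation, i.e. refining a bin's frequency from the phase advance between two overlapping STFT frames – here is a hedged Python sketch. It is not code from the project, and the parameter choices are arbitrary.

```python
# Illustrative sketch (not from the StrobeSpectralGranular repo): estimate
# per-bin instantaneous frequency from the phase advance between two frames
# separated by `hop` samples -- the phase-vocoder idea named in the abstract.
import numpy as np

def pv_frequencies(frame_a, frame_b, sr, hop):
    """Estimate per-bin frequencies (Hz) from two hop-separated frames."""
    n = len(frame_a)
    window = np.hanning(n)
    X0 = np.fft.rfft(window * frame_a)
    X1 = np.fft.rfft(window * frame_b)
    k = np.arange(len(X0))                            # bin indices
    expected = 2 * np.pi * hop * k / n                # phase advance at bin centers
    dphi = np.angle(X1) - np.angle(X0) - expected
    dphi = np.mod(dphi + np.pi, 2 * np.pi) - np.pi    # wrap deviation to [-pi, pi)
    return (k / n + dphi / (2 * np.pi * hop)) * sr    # refined frequency in Hz

# Example: a 440 Hz sine should read close to 440 Hz at its peak bin.
sr, hop, n = 48000, 256, 2048
t = np.arange(n + hop) / sr
x = np.sin(2 * np.pi * 440 * t)
freqs = pv_frequencies(x[:n], x[hop:hop + n], sr, hop)
peak = np.argmax(np.abs(np.fft.rfft(np.hanning(n) * x[:n])))
print(round(freqs[peak], 1))   # approximately 440.0
```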

Audio Visualization via Delay Embedding and Subspace Learning

Alois Cerbu, Carmine Cella

DAFx 2024, Surrey, UK

We describe a sequence of methods for producing videos from audio signals. Our visualizations capture perceptual features like harmonicity and brightness: they produce stable images from periodic sounds and slowly evolving images from inharmonic ones; they associate jagged shapes with brighter sounds and rounded shapes with darker ones. We interpret our methods as adaptive FIR filterbanks and show how, for larger values of the complexity parameters, we can perform accurate frequency detection without the Fourier transform. Accompanying the paper is a code repository containing the Jupyter notebook used to generate the images and videos cited; we also provide code for a realtime C++ implementation of the simplest visualization method. We discuss the mathematical theory of our methods in two appendices.

 

GitHub: https://github.com/amcerbu/Delay-Embedding-and-Subspace-Learning
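
As a rough illustration of the recipe the abstract describes – and not the authors' implementation, for which the repository's notebook is the reference – the minimal Python sketch below delay-embeds a window of audio, finds a two-dimensional subspace with an SVD, and plots the projected trajectory. Consistent with the abstract's claims, a periodic (harmonic) frame traces a stable closed curve; the embedding dimension and windowing here are arbitrary choices.

```python
# Minimal sketch of the delay-embedding + subspace-learning idea (not the
# authors' code): embed a frame of audio as lagged vectors in R^d, take the
# leading 2-D subspace of the point cloud via SVD, and draw the trajectory.
import numpy as np
import matplotlib.pyplot as plt

def embed(x, d, lag=1):
    """Stack delayed copies of x; each row is a point in R^d."""
    n = len(x) - (d - 1) * lag
    return np.stack([x[i * lag : i * lag + n] for i in range(d)], axis=1)

def project_2d(frame, d=32):
    pts = embed(frame, d)
    pts = pts - pts.mean(axis=0)
    # Right singular vectors span the best-fit subspaces of the point cloud.
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    return pts @ vt[:2].T            # coordinates in the leading 2-D subspace

# A harmonic frame (220 Hz + 660 Hz) produces a stable closed curve.
sr = 48000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220 * t) + 0.4 * np.sin(2 * np.pi * 660 * t)
xy = project_2d(frame)
plt.plot(xy[:, 0], xy[:, 1], linewidth=0.8)
plt.axis("equal")
plt.show()
```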

  

Network Bending of Diffusion Models for Audio-visual Generation

Luke Dzwonczyk, Carmine Cella, David Ban

DAFx 2024, Surrey, UK

In this paper we present the first steps towards the creation of a tool that enables artists to create music visualizations using pre-trained, generative machine learning models. First, we investigate the application of network bending, the process of applying transforms within the layers of a generative network, to image-generation diffusion models by utilizing a range of point-wise, tensor-wise, and morphological operators. We identify a number of visual effects that result from various operators, including some that are not easily recreated with standard image editing tools. We find that this process allows for continuous, fine-grained control of image generation, which can be helpful for creative applications. Next, we generate music-reactive videos using Stable Diffusion by passing audio features as parameters to network bending operators. Finally, we comment on certain transforms that radically shift the image, and on the possibility of learning more about the latent space of Stable Diffusion from these transforms.

Link to paper: https://arxiv.org/pdf/2406.19589

Supplemental images and videos: https://dzluke.github.io/DAFX2024/
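
As a hedged sketch of the basic mechanism – not the paper's code, operators, or settings – the Python snippet below applies a point-wise "bend" inside a pre-trained Stable Diffusion UNet via a PyTorch forward hook and drives its strength with an RMS loudness feature. The model checkpoint, the chosen layer, the audio file name, and the simple scaling operator are all illustrative assumptions.

```python
# Hedged sketch of network bending for audio-reactive generation: scale the
# activations of one UNet block by an amount derived from an audio feature.
# Checkpoint, layer, and operator choices here are assumptions for illustration.
import torch
import librosa
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Audio feature: mean RMS loudness of the clip, mapped to a bend amount.
audio, sr = librosa.load("input.wav", sr=22050)   # hypothetical input file
rms = float(librosa.feature.rms(y=audio)[0].mean())
bend = 1.0 + 5.0 * rms                            # >1 amplifies activations

def bend_hook(module, inputs, output):
    # Point-wise operator on the layer's activations (here: simple scaling).
    return output * bend

# Attach the hook to the UNet's mid block (an arbitrary example choice).
handle = pipe.unet.mid_block.register_forward_hook(bend_hook)

image = pipe("a landscape at dusk", num_inference_steps=30).images[0]
handle.remove()
image.save("bent_frame.png")
```

In a video setting, one analysis frame of audio would set `bend` for each generated frame, so the imagery reacts to the music over time.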