
Procedural Mesh Generation from Live Audio in Unreal Engine 4

Source: https://github.com/Tsarpf/UE4-Procedural-Audio

Here are two example videos of it running live. At the start of each I just press play in Foobar on Windows, and the visualization begins. Spotify, YouTube, or any other audio source on Windows works straight away as well.

A word of warning though: it’s only a proof of concept, so it is not stable! In-editor it generally works well but sometimes crashes, and the program does not release all the memory that it should. For a yet-unknown reason, the standalone version basically doesn’t work at all.

Thanks to other open source projects

For the mesh generation part I got a lot of help from SiggiG’s procedural UE4 project/tutorial which lives here: https://github.com/SiggiG/ProceduralMeshes

Some of the code for figuring out frequencies from audio chunks is from eXi’s sound visualization plugin, especially the original CalculateFrequencySpectrum function, and his use of the KissFFT library.

Because I’m proud that I was able to figure out the frequency-calculation part myself as well, I want to add that I have my own implementation of it (built with the help of the “ffft” library), but for this project I replaced it with eXi’s solution to rule out bugs in that area.

A very brief and dense overview of how it works

On Windows we’re in luck, because we’ve already done the audio capture part; now we just direct it towards UE4 instead of a file. An audio sink receives chunks of audio frames from the capturer, and the audio listener runs in its own thread within the visualizer process. The chunks get pushed into a queue from which the UE4 main thread dequeues them, and we calculate the sound spectrum for each audio chunk we receive. Finally, on each game tick we fetch the list of new frequencies and, if any were found, add them to the mesh and move the camera forward to keep up.
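The project’s actual hand-off uses its own classes, but the pattern is roughly the following (a minimal sketch with a mutex-guarded std::queue and a hypothetical AudioChunk type; UE4’s TQueue would do the same job):

#include <cstdint>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical chunk type: interleaved 16-bit PCM samples from one capture buffer.
using AudioChunk = std::vector<int16_t>;

class ChunkQueue
{
public:
    // Called from the capture/listener thread for every chunk the sink receives.
    void Enqueue(AudioChunk chunk)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(std::move(chunk));
    }

    // Called on the game thread each tick; returns false when nothing is ready.
    bool Dequeue(AudioChunk& outChunk)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty())
            return false;
        outChunk = std::move(m_queue.front());
        m_queue.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::queue<AudioChunk> m_queue;
};

Since chunks only arrive every few milliseconds, a plain lock is more than fast enough here.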

Feel free to ask me on Twitter or in the comments if something is unclear!

Making it work on platforms other than Windows

On Linux, capturing the audio should be very easy, for example by directing arecord’s standard output to the UE4 program’s standard input and going forward from there, but I haven’t gotten around to trying that yet. On OSX I would start searching for a solution with the help of the “Soundflower” project, but I’m not sure how easy that will be.

Thanks!

Maybe the proof of concept gives someone an idea for something awesome. Please make a new Audiosurf that takes in live audio and doesn’t need to process the whole song from a file first. Or make the sound waves collidable and get some sort of game mechanic out of that?

Programmatically capture audio playing on Windows


I wanted to do real-time audio visualization and didn’t want to fight with music streaming service libraries more than once (I’m looking at you, LibSpotify), so I thought I’d go with the most general solution — get the audio straight from the OS.

This post is written as a kind of information dump I would have wanted to read when I started figuring this all out.

I had wondered why there isn’t any software like virtual audio cable that would also provide programmatic access to what’s running through the virtual device. So I took a look at how to write my own, and apparently it’s really time-consuming and difficult. Not going to start there, then.

Anyway, it turns out that in Windows there’s something called WASAPI that provides a solution: “loopback recording”.

In loopback mode, a client of WASAPI can capture the audio stream that is being played by a rendering endpoint device.

And there’s an almost-ready-to-use example for it! Although it is a bit of a weird, goto-heavy, let’s-put-almost-everything-in-the-same-function kind of thing.

In the code example in Capturing a Stream, the RecordAudioStream function can be easily modified to configure a loopback-mode capture stream. The required modifications are:

- In the call to IMMDeviceEnumerator::GetDefaultAudioEndpoint, change the first parameter (dataFlow) from eCapture to eRender.
- In the call to IAudioClient::Initialize, change the value of the second parameter (StreamFlags) from 0 to AUDCLNT_STREAMFLAGS_LOOPBACK.
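Put together, the two changed calls look roughly like this (a sketch borrowing the variable names from the MSDN “Capturing a Stream” example, not a complete program):

hr = pEnumerator->GetDefaultAudioEndpoint(
    eRender,  // was eCapture: ask for the default *playback* device instead
    eConsole, &pDevice);

// ... activate the IAudioClient and get the mix format as in the original example ...

hr = pAudioClient->Initialize(
    AUDCLNT_SHAREMODE_SHARED,
    AUDCLNT_STREAMFLAGS_LOOPBACK,  // was 0: capture what the device is rendering
    hnsRequestedDuration,
    0, pwfx, NULL);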

I wasted a lot of time trying to understand the format the data was being delivered in by default, and how to change the format to PCM, but it turns out the beans are spilled right in the documentation.

Basically you fill a WAVEFORMATEX struct to describe the format, or modify the struct returned by a call to IAudioClient::GetMixFormat, which “retrieves the stream format that the audio engine uses for its internal processing of shared-mode streams.”
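The GetMixFormat call itself looks like this (a sketch; pAudioClient is the IAudioClient from the capture setup, and WASAPI allocates the struct with CoTaskMemAlloc, so it should eventually be freed with CoTaskMemFree):

WAVEFORMATEX* pwfx = nullptr;
HRESULT hr = pAudioClient->GetMixFormat(&pwfx);
// ... modify *pwfx as shown below and pass it to IAudioClient::Initialize ...
// later: CoTaskMemFree(pwfx);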

By the way, a format that keeps the mix format’s sample rate (e.g. 44.1 kHz) and channel count (2 for stereo) can often be provided by WASAPI straight away, so you don’t have to do any actual conversion yourself.

Here’s how the format for my system’s current configuration (in hindsight it would be a better idea to just fill the struct from scratch…) could be changed to 16-bit PCM:

pwfx->wBitsPerSample = 16;
pwfx->nBlockAlign = 4; // bytes per frame: nChannels (2) * wBitsPerSample (16) / 8
pwfx->wFormatTag = WAVE_FORMAT_PCM;
pwfx->nAvgBytesPerSec = pwfx->nSamplesPerSec * pwfx->nBlockAlign;
pwfx->cbSize = 0; // plain PCM carries no extra format data

IAudioClient::IsFormatSupported can be used to check whether the format you’d want to use will work, without having to call Initialize and see if it fails.
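For instance (a sketch; pAudioClient and pwfx come from the setup above, and in shared mode a closest-match format may be handed back that also has to be freed):

WAVEFORMATEX* pClosestMatch = nullptr;
HRESULT hr = pAudioClient->IsFormatSupported(AUDCLNT_SHAREMODE_SHARED, pwfx, &pClosestMatch);
if (hr == S_OK)
{
    // the requested format works as-is
}
else if (hr == S_FALSE)
{
    // not supported directly; pClosestMatch points at the nearest format that is
}
if (pClosestMatch)
    CoTaskMemFree(pClosestMatch);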

One more thing: if you’re not familiar with COM code, before calling IAudioClient::Initialize you have to initialize COM, which for me meant just calling CoInitialize(nullptr) once somewhere before initializing the audio client.
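That is (paired with a matching shutdown call):

CoInitialize(nullptr); // once, before any of the COM calls above
// ... all the WASAPI work ...
CoUninitialize();      // matching cleanup at shutdown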

In the code I wrote to try all this out, I just wrote the captured data to a file, which I then imported into Audacity to check for correctness.

Note that the number from IAudioCaptureClient::GetBuffer describing the amount of data we got out of it is in frames. This means that to get the byte (or char) count that, for example, ostream::write needs, we have to do something like this:

int bytesPerSample = m_bitsPerSample / 8; // e.g. 16 bits -> 2 bytes
unsigned int byteCount = numFramesAvailable * bytesPerSample * m_nChannels; // a frame holds one sample per channel
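That count is then what goes to the write call (assuming pData is the BYTE* that GetBuffer returned and outFile is the std::ofstream being dumped to):

outFile.write(reinterpret_cast<const char*>(pData), byteCount);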

Anyway, here’s my example implementation that you can check out if you get stuck with something: https://github.com/Tsarpf/windows-default-playback-device-to-pcm-file

Hope it’s of use to someone.