
Programmatically capture audio playing on Windows


I wanted to do real-time audio visualization and didn’t want to fight with music streaming service libraries more than once (I’m looking at you, LibSpotify), so I thought I’d go with the most general solution — get the audio straight from the OS.

This post is written as a kind of information dump I would have wanted to read when I started figuring this all out.

I had wondered why there isn’t any software like Virtual Audio Cable that also provides programmatic access to what’s running through the virtual device. So I took a look at how to write my own, and apparently it’s really time-consuming and difficult. Not going to start there, then.

Anyway, it turns out Windows has something called WASAPI (the Windows Audio Session API) that provides a solution, “loopback recording”:

In loopback mode, a client of WASAPI can capture the audio stream that is being played by a rendering endpoint device.

And there’s an almost-ready-to-use example for it! Although it’s a slightly weird, goto-heavy, let-us-put-almost-everything-in-the-same-function kind of thing.

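In the code example in Capturing a Stream, the RecordAudioStream function can be easily modified to configure a loopback-mode capture stream. The required modifications are:

In the call to IMMDeviceEnumerator::GetDefaultAudioEndpoint, change the first parameter (dataFlow) from eCapture to eRender.

In the call to IAudioClient::Initialize, set the AUDCLNT_STREAMFLAGS_LOOPBACK bit in the second parameter (StreamFlags).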

I wasted a lot of time trying to understand what format the data was delivered in by default, and how to change it to PCM, but it turns out the beans are spilled right here.

Basically you either fill a WAVEFORMATEX struct yourself to describe the format, or modify the struct returned from a call to IAudioClient::GetMixFormat, which “retrieves the stream format that the audio engine uses for its internal processing of shared-mode streams.”

By the way, if you keep the same sample rate (e.g. 44.1 kHz) and channel count (2 for stereo) as the mix format, WASAPI can often deliver the new format straight away, so you don’t have to do any actual conversion yourself.

Here’s how the format my system is currently configured with could be changed to 16-bit PCM (in hindsight it would be a better idea to just fill the struct):

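// pwfx points at the WAVEFORMATEX returned by IAudioClient::GetMixFormat.
pwfx->wFormatTag = WAVE_FORMAT_PCM;
pwfx->wBitsPerSample = 16;
// Bytes per frame: one sample for every channel (4 for 16-bit stereo).
pwfx->nBlockAlign = pwfx->nChannels * pwfx->wBitsPerSample / 8;
pwfx->nAvgBytesPerSec = pwfx->nSamplesPerSec * pwfx->nBlockAlign;
// Plain PCM carries no extra format bytes after the struct.
pwfx->cbSize = 0;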
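
The “just fill the struct” route could look roughly like the sketch below. So that it compiles outside Windows too, it uses a stand-in struct, PcmFormat, with the same field names as WAVEFORMATEX. Both PcmFormat and makePcm16 are names made up for this illustration; on Windows you would fill the real WAVEFORMATEX instead.

```cpp
#include <cstdint>

// Stand-in for WAVEFORMATEX (hypothetical, illustration only): same field
// names, fixed-width integer types instead of WORD/DWORD.
struct PcmFormat {
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
    std::uint16_t cbSize;
};

// Describe 16-bit integer PCM for the given sample rate and channel count.
PcmFormat makePcm16(std::uint32_t sampleRate, std::uint16_t channels) {
    PcmFormat f{};
    f.wFormatTag = 1;  // numeric value of WAVE_FORMAT_PCM
    f.nChannels = channels;
    f.nSamplesPerSec = sampleRate;
    f.wBitsPerSample = 16;
    // Bytes per frame: one sample for every channel.
    f.nBlockAlign = static_cast<std::uint16_t>(channels * f.wBitsPerSample / 8);
    f.nAvgBytesPerSec = sampleRate * f.nBlockAlign;
    f.cbSize = 0;  // no extra format bytes for plain PCM
    return f;
}
```

For 44.1 kHz stereo, makePcm16(44100, 2) gives nBlockAlign = 4 and nAvgBytesPerSec = 176400.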

IAudioClient::IsFormatSupported can be used to check whether the format you want will work, without having to call Initialize and see if it fails.

One more thing: if you’re not familiar with COM code, you have to initialize COM before calling IAudioClient::Initialize. For me that meant just calling CoInitialize(nullptr) once somewhere before initializing the audio client.

In the code I wrote to try all this out, I just wrote the captured data to a file, which I then imported into Audacity to check for correctness.

Note that the count IAudioCaptureClient::GetBuffer reports for the data it hands out is in frames, not bytes. A frame holds one sample for every channel, so to get the byte (or char) count that ostream::write, for example, needs, we have to do something like this:

int bytesPerSample = m_bitsPerSample / 8;
unsigned int byteCount = numFramesAvailable * bytesPerSample * m_nChannels;

Anyway, here’s my example implementation, which you can check out if you get stuck on something: https://github.com/Tsarpf/windows-default-playback-device-to-pcm-file

Hope it’s of use to someone.

How computer experts (don’t) play music from their computers

It’s 9 AM and you’ve just woken up and drunk some possibly expired pomegranate juice because it’s the only drink you have. And you’re so thirsty you don’t even really taste it, so it’s fine. Your head hurts. Hangover.

So first you slouch to the closest shop for frozen pizza, tons of orange juice, and Sprite. At the till, the familiar clerk chuckles a bit, probably at the combination of what you’re buying, the fact that paying seems challenging for you today, and your hangovery face.

You get home. You’d like some mellow music, so you boot up your desktop. The highlight of your day so far becomes nailing, on the first try, the difficult task of selecting Windows in the boot menu before it autoboots to your broken Linux installation. It’s such a great victory you decide to share it with someone.

While you’re typing, Windows reboots the computer because of updates, and it’s Linux after all. You sigh, press the power button, try again. The computer now freezes on the BIOS splash screen. And continues to do so despite multiple reboot attempts. It did serve without a hiccup for over 6 years, so that’s like, over 2000 days? But today it has had enough.

No worries, you have a laptop, you’ll use that. Right, so you only have Arch Linux on it because: reasons. You plug the USB cord in, hit play and… The sound is coming from the laptop’s speakers. And you couldn’t think of anything better to play than Wonderwall since someone sang it last night and it’s playing in your head. Great.

Somehow you can now taste the pomegranate juice you didn’t taste when you drank it an hour earlier.

Changing the audio output device is surprisingly difficult when you’ve installed every audio driver you could find, since you had no idea what the buggy and obscure program you found earlier was able to use for MIDI playback. So you don’t know which one is currently being used and thus where the settings are. You feel, possibly are, stupid.

You decide today’s (ad)ventures are asdgkdjfga enough to make them your second blog post, so you write that, and head back to bed hoping that beginning your day in the evening will work out better.