A Brief History of Windows Audio APIs

A few months ago, the audio programming bug bit me pretty hard. I’m not entirely sure why it took so long really. I’ve been recording and mixing music since college. And much of my software development career has been built on a giant mass of C/C++ code. But somehow these worlds never converged. Somehow, with all of the time I’ve spent in front of tools like Cakewalk Sonar and FruityLoops, it never occurred to me that I might be able to learn what these applications are doing under the hood.

Then my nightstand began accumulating books with titles like “The Theory of Sound” and “Who Is Fourier? A Mathematical Adventure”. I absorbed quite a bit of theoretical material in a relatively short amount of time. But as with anything, you must use it to learn it. So I looked to the platform I already use for all of my audio recording – Windows.

What I found was a dark, mysterious corner in the Windows platform. There’s not a lot in the way of introductory material here. As of this writing, I could find no books dedicated to Windows audio programming. Sure, there’s MSDN, but, um, it’s MSDN. I also spent some time digging through back issues of Windows Developer’s Journal, MSDN Magazine, Dr. Dobb’s, etc., and the pickings were slim. It seemed the best sources of information were blogs, forums, and StackOverflow. The trick was wading through the information and sorting it all out.

Developers new to Windows audio application development, like me, are often overwhelmed by the assortment of APIs available. I’m not just talking about third party libraries. I’m talking about the APIs baked into Windows itself. This includes weird sounding things like MME, WASAPI, DirectSound, WDM/KS, and XAudio2. There are a lot of different paths a developer could take. But which one makes the most sense? What are the differences between them all? And why are there so many options?

I needed a bit more information and context in deciding how I was going to spend my time. And for this, I had to go back to 1991.

1991 – Windows Multimedia Extensions (aka MME, aka WinMM): Ahhh…1991. That was the year both Nirvana’s “Nevermind” and “The Silence of the Lambs” entered pop culture. It was also the year of the first Linux kernel and the very first web browser. Most of us didn’t realize it at the time, but a lot of cool stuff was happening.

Most PCs of this vintage had tiny little speakers that were really only good at producing beeps and bloops. Their forte was square waves. They could be coerced into producing more sophisticated sounds using a technique called Pulse Width Modulation, but the quality wasn’t much to get excited about. That “Groove Is in the Heart” sound file being played through your PC speaker might be recognizable, but it certainly wasn’t going to get anybody on the dance floor.

Sound cards didn’t usually come bundled with name brand PCs, but they were becoming more and more popular all the time. Independently owned computer shops were building and selling homebrew PCs with sound cards from companies like Creative Labs and AdLib. Folks not lucky enough to buy a computer bundled with a sound card could buy an add-on card out of the back of a magazine like PC Computing or Computer Shopper and be up and running in no time.

The ’90s were also the golden age of the demo scene. Programmers pushed the limits of graphics and audio hardware in fewer bytes than most web pages weigh today. Amiga MOD files were a big deal too. They even inspired many audio enthusiasts to build their own parallel port DACs for the best audio experience. And then there were the video games. Game publishers like Apogee and Sierra Entertainment were cranking out awesome game titles, most of which could take advantage of Sound Blaster or AdLib cards if they were available.

Professional audio on the PC existed, but it was usually implemented using external hardware solutions, proprietary software, and proprietary communications protocols. Consumer grade sound card manufacturers were adding MIDI support in the form of a dual purpose joystick port that seemed oddly out of place. It was more of a marketing tactic than a useful feature. Most consumers had no idea what MIDI was.

It was at this point when Microsoft decided to add an audio API for Windows. Windows 3.0 had been out for a year and was in widespread use. So Microsoft released a version of Windows 3.0 called Windows 3.0 with Multimedia Extensions (abbreviated MME, sometimes referred to in software development circles as the waveOut API). MME has both a high-level and low-level API. The low-level API supports waveform audio and MIDI input/output. It has function names that start with waveIn, waveOut, midiIn, midiStream, etc. The high-level API, the Media Control Interface (MCI), is REALLY high level. MCI is akin to a scripting language for devices.

MME was the very first standard audio API for Windows. It’s evolved a bit over the years, to be sure. But it’s still around. And it works well, but with some caveats.

Latency is a problem with MME. Dynamic, near-real-time audio (e.g., game event sounds, software synthesizers, etc.) is a bit harder to do in a timely fashion. Anything that occurs 10ms later than the brain thinks it should is perceived to be out of sync. So that kind of programming is pretty much out of the question. However, pre-generated content (e.g., music files, ambient sounds, Windows system sounds, etc.) works well with MME. At the time, that was good enough.

MME is still around. Some might even use the word thriving. Historically, support for high quality audio has been a pain point for MME. Parts of the MME API (e.g., anything that deals with the device capability structures WAVEINCAPS and WAVEOUTCAPS) can only handle a maximum of 96kHz and 16-bit audio. However, in modern versions of Windows, MME is built on top of Core Audio (more on this later). You may find that even though a device can’t report itself as capable of higher quality audio, higher sample rates and bit depths work anyway.

1995 – DirectSound (aka DirectX Audio): When Windows 3.1 came out in 1992, MME was officially baked in. But Windows still left game developers uninspired. All versions of Windows up to this point were effectively shells on top of DOS. Windows was in the way. It consumed memory and other resources that the games desperately needed. DOS was well known and already a successful platform for games. With DOS, games didn’t have to compete for resources and they could access hardware directly. As a result, most PC games continued to be released as they had been – DOS only.

Along came Windows 95. Besides giving us the infamous “Start” button and the music video for Weezer’s “Buddy Holly”, Windows 95 brought with it DirectX. DirectX was core to Microsoft’s strategy for winning over game developers, whom they saw as important for the success of Windows 95.

DirectX was the umbrella name given to a collection of COM-based multimedia APIs, which included DirectSound. DirectSound distinguished itself from MME by providing things like on-the-fly sample rate conversion, effects, multi-stream mixing, alternate buffering strategies, and hardware acceleration where available (in modern versions of Windows, this is no longer the case; see the discussion of Core Audio below). Because DirectSound was implemented using VxDs, which were kernel mode drivers, it could work extremely close to the hardware. It provided lower latency and support for higher quality audio than MME.

DirectSound, like the rest of DirectX, wasn’t an instant hit. It took game developers time, and a bit of encouragement on the part of Microsoft, to warm up to it. Game development under DOS, after all, was a well worn path. People knew it. People understood it. There was also a fear that maybe DirectX would be replaced, just as its predecessor WinG (a “high-performance” graphics API) had been. But eventually the gaming industry was won over and DirectX fever took hold.

As it relates to professional audio, DirectSound was a bit of a game changer. There were PC-based DAW solutions before DirectX, to be sure. From a software perspective, most of them were lightweight applications that relied on dedicated hardware to do all of the heavy lifting. And with their hardware, applications did their best at sidestepping Windows’ driver system. DirectSound made it practical to interact with hardware through a simple API. This allowed pro-audio applications to decouple themselves from the hardware they supported. The umbilical cord between professional grade audio software and hardware could be severed.

DirectX also brought pluggable, software based audio effects (DX effects) and instruments (DXi Instruments) to the platform. This is similar in concept to VST technology from Steinberg. Because DX effects and instruments are COM based components, they’re easily discoverable and consumable by any running application. This meant effects and software synthesizers could be developed and marketed independently of recording applications. Thanks to VST and DX effects, a whole new market was born that continues to thrive today.

Low latency, multi-stream mixing, high resolution audio, pluggable effects and instruments – all of these were huge wins for DirectSound.

1998 – Windows Driver Model / Kernel Streaming (aka WDM/KS): After the dust settled with Windows 95, Microsoft began looking at their driver model. Windows NT had been around for a few years. And despite providing support for the same Win32 API as its 16-bit/32-bit hybrid siblings, Windows NT had a very different driver model. This meant if a hardware vendor wanted to support both Windows NT and Windows 95, they needed to write two completely independent drivers – drivers for NT built using the Windows NT Driver Model and VxDs for everything else.

Microsoft decided to fix this problem and the Windows Driver Model (WDM) was born. WDM is effectively an enhanced version of the Windows NT Driver Model, which was a bit more sophisticated than the VxDs used by Windows 95 and 3.x. One of the big goals for WDM, however, was binary and source code compatibility across all future versions of Windows. A single driver to rule them all. And this happened. Sort of.

Windows 98 was the first official release of Windows to support WDM, in addition to VxDs. Windows 2000, a derivative of Windows NT, followed two years later and only supported WDM drivers. Windows ME, the butt of jokes for years to come, arrived not long after. But ME was the nail in the coffin for the Windows 9.x product line. The technology had grown stale. So the dream of supporting a driver model across both the NT and the 9.x line was short lived. All versions of Windows since have effectively been iterations of Windows NT technology. And WDM has since been the lone driver model for Windows.

So what’s this WDM business got to do with audio APIs? Before WDM came about, Windows developers were using either DirectSound or MME. MME developers were used to dealing with latency issues. But DirectSound developers were used to working a bit closer to the metal. With WDM, both MME and DirectSound audio now passed through something called the Kernel Audio Mixer (usually referred to as the KMixer). KMixer was a kernel mode component responsible for mixing all of the system audio together. KMixer introduced latency. A lot of it. 30 milliseconds, in fact. And sometimes more. That may not seem like a lot, but for a certain class of applications this was a non-starter.

Pro-audio applications, such as those used for live performances and multitrack recording, were loath to embrace KMixer. Many developers of these types of applications saw KMixer as justification for using non-Microsoft APIs such as ASIO and GSIF, which avoided the Windows driver system entirely (assuming the hardware vendors provided the necessary drivers).

Cakewalk, a Boston-based company famous for their DAW software, started a trend that others quickly adopted. In their Sonar product line starting with version 2.2, they began supporting a technique called WDM/KS. The WDM part you know. The KS stands for Kernel Streaming.

Kernel streaming isn’t an official audio API, per se. It’s something a WDM audio driver supports as part of its infrastructure. The WDM/KS technique involves talking directly to the hardware’s streaming driver, bypassing KMixer entirely. By doing so, an application could avoid paying the KMixer performance tax, reduce the load on the CPU, and have direct control over the data delivered to the audio hardware. Latency wasn’t eliminated. Audio hardware introduces its own latency, after all. But the performance gains could be considerable. And with no platform components manipulating the audio data before it reached the hardware, applications could exert finer control over the integrity of the audio as well.

The audio software community pounced on this little trick and soon it seemed like everybody was supporting WDM/KS.

It’s worth noting at this point in the story that, in special circumstances, DirectSound could actually bypass KMixer. If hardware mixing was supported by both the audio hardware and the application, DirectSound buffers could be dealt with directly by the audio hardware. It wasn’t a guaranteed thing, though. And I only mention it here in fairness to DirectSound.

2007 – Windows Core Audio: It was almost 10 years before anything significant happened with the Windows audio infrastructure. Windows itself entered an unusually long lull period. XP came out in 2001. Windows Vista, which had begun development five months before XP had even been released, was fraught with missteps and even a development “reboot”. When Vista finally hit the store shelves in 2007, both users and developers were inundated with a number of fundamental changes in the way things worked. We were introduced to things like UAC, Aero, BitLocker, ReadyBoost, etc. The end user experience of Vista wasn’t spectacular. Today, most people consider it a flop. Some even compare it to Windows ME. But for all of its warts, Vista introduced us to a bevy of new technologies that we still use today. Of interest for this discussion is Windows Core Audio.

Windows Core Audio, not to be confused with OS X’s similarly named Core Audio, was a complete redesign of the way audio is handled on Windows. KMixer was killed and buried. Most of the audio components were moved from kernel land to user land, which was a boon for system stability. (Since WDM was accessed via kernel mode operations, WDM/KS applications could easily BSOD the system if not written well.) All of the legacy audio APIs we knew and loved were shuffled around and suddenly found themselves built on top of this new user mode API. This included DirectSound, which at this point lost support for hardware accelerated audio entirely. Sad news for DirectSound applications, but sadder news was to come (more on this in a bit).

Core Audio is actually four APIs in one – MMDevice, WASAPI, DeviceTopology, and EndpointVolume. MMDevice is the device discovery API. The API for interacting with all of the software components that exist in the audio path is the DeviceTopology API. For interacting with volume control on the device itself, there’s the EndpointVolume API. And then there’s the audio session API – WASAPI. WASAPI is the workhorse API. It’s where all of the action happens. It’s where the sausage, er, sound gets made.

Along with new APIs came a number of new concepts, such as audio sessions and device roles. Core Audio is much better suited to the modern era of computing. Today we live in an ecosystem of devices. Users no longer have a single audio adapter and a set of speakers. We have headphones, speakers, bluetooth headsets, USB audio adapters, webcams, HDMI connected devices, WiFi connected devices, etc. Core Audio makes it easy for applications to work with all of these things based on use-case.

Another significant improvement Core Audio brings us is the ability to operate in either shared mode or exclusive mode.

Shared mode has some parallels with the old KMixer model. With shared mode, applications write to a buffer that’s handed off to the system’s audio engine. The audio engine is responsible for mixing all applications’ audio together and sending the mix to the audio driver. As with KMixer, this introduces latency.

Exclusive mode is Microsoft’s answer to the pro-audio world. Exclusive mode has many of the same advantages as WDM/KS. Applications have exclusive access to hardware and audio data travels directly from the application to the driver to the hardware. You also have more flexibility in audio formats with exclusive mode as compared to shared mode. The audio data format can be whatever the hardware supports – even non-PCM data.

At this point, you might assume WDM/KS can go away. Well, it can’t. As I said before, it’s not really an API. It’s part of the WDM driver infrastructure, so it will continue to exist so long as WDM exists. However, there’s no compelling reason to use WDM/KS for modern audio applications. An exclusive mode audio session in Core Audio is safer and just as performant. Plus it has the advantage of being a real audio API.

As of this writing, Windows 10 is the latest version of Windows and Core Audio still serves as the foundation for platform audio.

2008 – XAudio2: Over the years, DirectX continued to evolve. The Xbox, which was built on DirectX technologies, was a significant source of influence in the direction DirectX took. The “X” in Xbox comes from DirectX, after all. When DirectX 10 came out in 2007, it was evident that Microsoft had gone into their latest phase of DirectX development with guns blazing. Many APIs were deprecated. New APIs appeared that started with the letter “X”, such as XInput and XACT3.

XAudio2 appeared in the DirectX March 2008 SDK and was declared the official successor to DirectSound. It was built from the ground up, completely independent of DirectSound. Its origins are in the original XAudio API, which was part of XNA, Microsoft’s managed gaming framework. And while XAudio was considered an Xbox API, XAudio2 was targeted at multiple platforms, including the desktop. DirectSound was given “deprecated” status (this is the sadder news I mentioned earlier).

XAudio2 offers a number of features missing from DirectSound, including support for compressed formats like xWMA and ADPCM, as well as built-in, sophisticated DSP effects. It’s also considered a “cross-platform” API, which really just means it’s supported on the Xbox 360, Windows, and Windows Phone.

It’s worth mentioning that while XAudio2 is considered a low-level API, it’s still built on other technology. For the desktop, XAudio2 sits on top of Core Audio like everything else.

You might read all of this business about XAudio2 and assume that DirectSound is dead. We’re quite a way off from that, I think. There’s still a lot of DirectSound based software out there. Given Microsoft’s commitment to backwards compatibility, some level of DirectSound support/emulation is liable to exist in perpetuity. However, unless you’re determined to support versions of Windows that even Microsoft has stopped supporting, there’s no compelling reason to support DirectSound in modern audio applications.

Honorable Mention – ASIO: There are plenty of third party audio APIs available for Windows that weren’t invented by Microsoft. Some of them, like GSIF used by TASCAM’s (formerly Nemesys) GigaStudio, are tied to specific hardware. Some of them, like PortAudio and JUCE (more than just an audio API), are open-source wrappers around platform specific APIs. Some of them like OpenAL are just specifications that have yet to gain widespread adoption. But none has had quite the impact on the audio industry as ASIO.

Steinberg, the same forward-thinking company that gave us VSTs and Cubase, introduced us to ASIO all the way back in 1997. ASIO was originally a pro-audio grade driver specification for Windows. Its popularity, however, has allowed it to gain some level of support on Linux and OS X platforms. Its primary goal was, and still is, to give applications a high quality, low latency data path direct from the application to the sound hardware.

Of course, the power of ASIO relies on hardware manufacturers providing ASIO drivers with their hardware. For applications that can support ASIO, all of the business of dealing with the Windows audio stack can be completely avoided. Conceptually, ASIO provides applications with direct, unfettered access to the audio hardware. Before Windows Vista, this could allow for some potentially significant performance gains. In the Core Audio world, this is less of a selling point.

The real-world performance of ASIO really depends on the quality of driver provided by the manufacturer. Sometimes an ASIO driver might outperform its WDM counterpart. Sometimes it’s the other way around. For that reason, many pro-audio applications have traditionally allowed the user to select their audio driver of choice. This, of course, makes life complicated for end-users because they have to experiment a bit to learn what works best for them. But such is life.

The waters get muddied even further with the so-called “universal” ASIO drivers, like ASIO4ALL and ASIO2KS. These types of drivers are targeted at low cost, consumer-oriented hardware that lack ASIO support out-of-the-box. By installing a universal ASIO driver, ASIO-aware applications can leverage this hardware. In practice, this type of driver merely wraps WDM/KS or WASAPI and only works as well as the underlying driver it’s built on. It’s a nice idea, but it’s really contrary to the spirit of the ASIO driver. Universal drivers are handy, though, if the audio application you’re trying to use only supports ASIO and you’ve got a cheap sound card lacking ASIO support.

ASIO, like MME, is an old protocol. But it’s very much still alive and evolving. Most pro-audio professionals hold it in high regard and still consider it the driver of choice when interfacing with audio hardware.

Conclusion: “Shane, where’s the code?” I know, I know. How do you talk about APIs without looking at code? I intentionally avoided it here in the interest of saving space. And, yet, this article still somehow ended up being long winded. In any case, I encourage you to go out on the Interwebs and look at as much Windows audio source code as you can find. Browse the JUCE and Audacity source repos, look at PortAudio, and peruse the sample code that Microsoft makes available on MSDN. It pays to see what everybody else is doing.

For new developers, the choice of audio API may or may not be clear. It’s tempting to make the following generalization: games should go with XAudio2, pro-audio should go with ASIO and/or Core Audio, and everybody else should probably go with MME. Truth is, there are no rules. The needs of every application are different. Each developer should weigh their options against effort, time, and money. And as we see more often than not, sometimes the solution isn’t a single solution at all.

(Shameless Plug: If you’re interested in learning how to use Core Audio, consider purchasing an early draft of “Appendix B: Introduction to Windows Core Audio” from the book I’m currently working on entitled, “Practical Digital Audio for C++ Programmers.”)

Back from CppCon 2015

Now that the dust has settled a bit and I’ve adjusted to being back on east coast time, I thought it’d be worth talking a little bit about CppCon 2015. CppCon, for those who haven’t heard about it, is a five day conference devoted to C++ (seven if you pay for an extra class). It’s a relatively new kid on the block, this year being only the second year. It’s certainly not the only C++ conference in town. But CppCon distinguishes itself from all the others in terms of scale. Attendance at C++Now, for example, is capped at around 150 people and features three tracks at any given time. C++ and Beyond describes itself as “small” and features only one track at a time over three days. This year, CppCon saw nearly 700 people in attendance. That’s nearly 15% growth over last year’s 600 attendee count. The days start early and end late. And at any given point, there could be up to six separate tracks going on. Presenters include folks from Google, Microsoft, Boost, Adobe, etc. As you can imagine, there’s enough content at CppCon to satiate even the thirstiest of minds.

Just like last year, CppCon was held at the Meydenbauer Center in beautiful Bellevue, Washington, a Seattle “suburb” (I use that word loosely. See Wikipedia.) that just so happens to be in Microsoft’s backyard. The conference center itself has four levels. The bottom two floors have amphitheatre-sized conference rooms that are used for keynotes and larger talks. The top floor has a number of smaller classroom sized conference rooms and is where most of the action actually takes place.

Most of the rock stars showed up again this year – Bjarne Stroustrup, Herb Sutter, Andrei Alexandrescu, John Lakos, etc. (Scott Meyers was noticeably MIA this year). Bjarne’s keynote, “Writing Good C++14”, set the tone for everything that was to come. The theme seemed to be “move forward” – abandon your old compilers, modernize your legacy codebase, and leave the past in the past. This was reflected by the significant number of talks that revolved around C++17 proposals and technical previews that will be appearing in a compiler near you.



Me with Bjarne Stroustrup

Like any conference, the quality of presentations was a mixed bag. There were great speakers and some not so great. Some presentations were practical, some were meta, and some were sales pitches for third party libraries. All tech conferences are like this to some degree. For conferences with only one or two tracks, this can be a mild annoyance. But the fact that there was so much happening at the same time allowed attendees to be a bit more discerning.

What about schwag? Schwag is something we all pretend to not care about. After all, we’re only there for the knowledge right? Mmhmm. 🙂 There actually wasn’t much schwag to speak of. This year, attendees received a plastic bag containing some flyers, a deck of “Women in Tech” playing cards, and a thumb drive containing an offline version of cppreference.com. There were no free shirts, despite being asked to provide shirt size at registration time. At one point, JetBrains started giving away yo-yos, CLion stickers, and copies of a mini-book entitled “C++ Today: The Beast is Back”, which happened to serve as the basis for Jon Kalb’s highly entertaining presentation of the same name. That was about it. Not even the meals were free, which seemed to surprise a lot of folks.

Apart from that, there weren’t many disappointments at CppCon. This conference has a lot to offer. The talks were great. All of the presenters were approachable and very personable. The atmosphere was positive. And, most importantly, it was FUN. Would I go back again? Definitely. Should you go? Absolutely.

C++ Exceptions: The Good, The Bad, And The Ugly

Recently, a recording of Titus Winters’ presentation from CppCon 2014 found its way around the office. In the video, Titus discusses style guides and coding conventions as they apply to Google’s C++ codebase. One of the contentious topics touched upon was the use of exceptions, which are apparently verboten at Google. That, of course, sparked a few questions from Titus’ audience and incited some debate within our own organization.

It’s interesting how polarized C++ developers are when it comes to the use of exceptions. It doesn’t matter if you talk to a junior developer or a 20 year veteran. You’ll almost always get a response that lies at one of two ends of the love/hate spectrum. At one end is “Exceptions are evil. Like, goto-evil, man. It’s chaos. Biggest wart in the C++ standard, bar none.” At the other end is “Dude, it’s 2015. WTF aren’t you using exceptions? They’re so chic, so modern. So much better than those geriatric return codes.” Of course, put a few beers in these folks and maybe some free food, and their postures will waver. Both parties eventually admit there’s some good and some bad when it comes to exceptions. As it turns out, the exception is a language feature that’s not as cut-and-dry/black-and-white/1-and-0 as we programmery folks care to admit.

For this blog entry, I thought it might be useful to take a step back and look at things from a 10,000 foot view (or 3,048 meters for my imperially challenged friends). Let’s spend some time picking apart the arguments for and against C++ exceptions. Even if we can’t arrive at some grand conclusion, it’ll at least allow us to appreciate the perspectives of our peers a little better. And who knows? Maybe we’ll find some middle ground.

So let’s start on a positive note. How about some pros?

The Pros:

Exceptions cannot be ignored.

In C, the convention for communicating errors is the much beloved error code. It’s succinct, there aren’t many surprises, and it gets the job done. None of us are new to error codes. We encounter them every day in system calls, standard library functions, third party libraries, and even in our own code.

The dirty truth is that in C, errors are ignored by default. A function caller is free to exercise their right to ignorance. As a result, “failable” function calls often go unchecked. And when an unexpected failure occurs, one of three things typically happens:

  1. Nothing. The failure wasn’t fatal. The code continues to operate just fine.
  2. The application doesn’t crash immediately, but it starts to behave strangely and perhaps crashes later.
  3. Boom. The application crashes and burns immediately.

The same code path with the same unchecked failure may even exhibit a random selection of one of these three behaviors every time it’s executed.

In contrast, exceptions cannot be ignored. An uncaught exception does one thing – crashes the application. It forces you, as the developer, to make error handling a priority.

Exceptions can carry more information than return codes.

To make sense of an error code, you usually need to look it up. You might have to refer to another source file, the API docs, or a sticky note in the guy’s cube next door. And even then, you might not be able to determine exactly what caused the error.

“File open failed.” – Ok. But what file? And why?

“Connection reset by peer.” – Which peer?

Maybe you don’t care about the specifics. But if you do and you’re not sure what exactly caused the problem, a bit of sleuthing may be required. That means more work on the error-handling side of the fence.

Ideally, you’d be able to capture more details on the error-causing side of the fence. Exceptions can help with this. In C++, anything copyable can be thrown. With a copyable user-defined type, you can capture as much context relating to an error as you’d like and communicate that to the caller just by throwing it.

Exceptions allow error-handling code to exist separately from the place in the code where the error was detected.

How many times have you seen code that looked similar to this?

bool success = doSomeWork();
if (!success)
{
    logError("Error Occurred");
    return;
}    
success = doMoreWork();
if (!success)
{
    logError("Error Occurred");
    return;
}    
success = doEvenMoreWork();
if (!success)
{
    logError("Error Occurred");
    return;
}
// etc. 
// etc.

This code snippet contains a lot of noise. And it’s a little repetitive in how the error is handled. What if we need to change our error handling behavior? We’ll need to visit each place where success equals false. How different might this look if we were using exceptions?

Let’s assume the functions doSomeWork(), doMoreWork(), and doEvenMoreWork() throw exceptions instead of returning a success value. Our code might then look like this…

try
{
    doSomeWork();
    doMoreWork();
    doEvenMoreWork();
}
catch (...) // Or some specific exception type.
{
    logError("Error occurred.");
    return;
}

The code footprint is smaller and we’ve concentrated our error-handling in one spot. In the normal, non-exceptional case, the code actually runs faster than the previous code snippet because it doesn’t have to constantly check return values.

Exceptions allow errors to propagate out of constructors.

Constructors don’t return anything. They can’t. The standard says so. So what happens if an error occurs during the execution of the constructor? How does client code know something went wrong?

There is, of course, a brute force way to accomplish this. You could include error flags in the class being constructed and set them/check them appropriately. This is a bit of work and it requires both the class and its client to have an agreed-upon contract for error checking. It works, but it’s not really ideal. It results in a bit of extra code and it’s easy to make mistakes.

A much easier way to communicate errors from the constructor is to throw an exception. It’s straightforward and doesn’t require the class implementer to pollute the class with error flags and error-related getters/setters.

That being said, throwing exceptions from constructors does have a gotcha. If an exception is thrown from a constructor, no instance of the class is created, and therefore no destructor will be called. Think about that statement for a second. What this means is that if there were any resources (heap allocations, opened handles, etc.) acquired in the constructor prior to the exception being thrown, they may be leaked unless appropriate steps are taken.

Exceptions are better than setjmp/longjmp.

The setjmp/longjmp dynamic duo comes to us from C. They provide a mechanism for performing what is referred to as a non-local jump, (sometimes called a non-local goto). setjmp is called to mark a point on the stack where program execution should return and longjmp is used to jump back to that point. The way it works is that setjmp records the contents of the CPU registers, which includes the stack pointer. When longjmp is called later, those register values are restored and the code executes as if it had just returned from the call to setjmp.

What happens to all the stuff that was on the stack between the call to setjmp and longjmp? As you might expect, it’s all discarded. It’s as if that stuff never existed.

You might be saying to yourself, “This just sounds like stack unwinding.” It’s certainly a form of it. But when C++ programmers think of stack unwinding, they imagine walking down the stack frame by frame and executing little bits of code along the way until their destination is reached. In traditional stack unwinding, destructors for stack-allocated objects get called and RAII objects get the opportunity to clean house. If we’re dealing with compilers with try-finally extensions (Visual Studio), even finally blocks get executed.

Unfortunately, that’s not the form of stack unwinding we’re dealing with when we work with setjmp/longjmp. longjmp literally jumps down the stack to the point recorded by setjmp in one fell swoop. It doesn’t do it frame-by-frame. It doesn’t call snippets of code along the way. It’s one second you’re here, the next second you’re there. Destructors for local variables don’t get called. RAII objects never do any cleanup. And finally blocks never get a chance to do their job.

And that’s why exceptions shine over the use of setjmp/longjmp. Exceptions allow for the form of stack unwinding that C++ developers are comfortable with. They can sleep easy at night knowing that local variables get destroyed as expected and destructors will execute even under the most exceptional of circumstances.

Exceptions are easier to propagate than return codes.

Conceptually, error code propagation seems like a no-brainer. Functions that call other “failable” functions check error codes, react appropriately, and propagate the error up the call stack when it’s appropriate. Pretty simple, eh? Not so quick. Checking error codes for function calls requires a certain amount of vigilance. It can be tedious. And, as shown in a previous pro, it results in a lot of extra, repetitive code. In practice, it’s not uncommon for many function calls to go unchecked. It’s usually just the “important” ones that get all of the attention. And because errors are ignored by default in C, many errors tend to slip through the cracks. As functions call other functions that call other functions, this problem compounds and an application can miss out on opportunities to react.

As mentioned before, exceptions are propagated by default. There’s no way to accidentally ignore an exception without crashing your application. If you’re writing a function that calls another function that throws, and you wish the caller of your function to handle all the errors, there’s nothing you need to do. (Disclaimer: In some circumstances, it may be desirable to catch and rethrow the exception. So you may actually need to do something. But it really depends on the needs of your function. See the references at the end of the article for situations where this might be appropriate.)

And now for the glass-half-empty list.

The Cons:

Exceptions are more complex than error codes.

Something an error code has over an exception is simplicity. The concept is so easy to grasp that someone completely new to coding can learn how to use and apply error codes almost immediately. The very first function any C/C++ developer writes is main(). And guess what? It returns an error code.

Exceptions aren’t as simple. Not only do you need to understand things like throw, try, catch, nothrow, and dynamic exception specifications (deprecated), but also an assortment of supporting functions and data types, such as std::exception, std::exception_ptr, std::rethrow_exception, std::nested_exception, std::rethrow_if_nested, etc.
There are also plenty of rules and best practices that must be followed, like…

“Don’t emit exceptions from destructors.”

“Constructors must clean up before throwing because a destructor will not be called.”

“Assign ownership of every resource immediately upon allocation to a named manager object that manages no other resources” (dubbed Dimov’s Rule by Jon Kalb)

etc.

It’s a lot to absorb. Really. And it can be intimidating to both new and seasoned C++ developers alike.

Writing exception-safe code is hard.

Even if you understand all of the exception-related concepts and jargon mentioned in the previous item, writing exception-safe code is still hard. As Scott Meyers says in “More Effective C++”, “Exception-safe programs are not created by accident.” Code absolutely must be designed with exceptions in mind. This includes the throwers, the catchers, and EVERYTHING in between.

Bad things can happen in functions that aren’t expecting to be short-circuited by an exception. That’s one of many reasons the introduction of exceptions to a legacy codebase that doesn’t already use exceptions is a very, very bad idea.

Exceptions make code harder to read because they create invisible exit points.

A function that returns an error cannot cause the calling function to prematurely return (we’ll ignore interrupts, abnormal program termination, and setjmp/longjmp for the moment). If a function decides to check the error code produced by another function and return based on some criteria, you’ll see that written down in the code. It’s explicit and not easy to hide.

The potential for a thrown exception can be subtle. If your function calls another function that throws, and your function doesn’t catch the exception, your function will stop executing. From a readability standpoint, this can be awful. It may not be obvious that any given block of code may be prematurely terminated, even if it’s completely designed with exceptions in mind.

Knowing what to catch can be tricky.

There are some languages that are aggressive about making sure you follow the contract when it comes to exceptions. Java, I’m talking about you. In Java, if your function throws an exception, it MUST have an exception specification that says so. (I’m strictly referring to checked exceptions here. Java also has unchecked exceptions. Those are harder to recover from and are exempt from the rules I’m discussing here.) If it doesn’t have an appropriate exception specification, you’ll get a compile time error. Also in Java, if your function calls another function that throws an exception, you MUST either catch the exception or have a compatible exception specification on your function. If you don’t, you’ll get a compile time error. The compiler holds your hand a bit here.

C++ doesn’t support such things. The compiler doesn’t care one way or the other if an exception is caught. It assumes, perhaps naively, that you know what you’re doing. C++ does have exception specifications. But historically, dynamic exception specifications like you see in Java never really helped us out in C++ land. They just muddied the waters. In C++, if a function doesn’t have an exception specification, it can throw anything it wants. On the other hand, if a function has a dynamic exception specification, it can only throw the types, or subtypes of the types, specified. Throwing anything else results in std::unexpected being called. Of course, this is a runtime check, and it does nothing for us at compile time.

As you might expect, dynamic exception specifications fell out of favor in C++. They were so loathed that they were deprecated in C++11. All we’re left with in the C++11/14 era is noexcept or nothing at all (which means anything can be thrown).

Another aspect of this “knowing what to catch can be tricky” thing is that there’s no universal base class. Sure, there’s std::exception. But that class was intended to be a base for exceptions thrown by the standard library. There’s nothing forcing you to use that for your own exception types. You can technically throw anything copyable in C++ – int, char *, std::vector, your own made-up type, etc.

So how do you know what needs to be caught when you call another function? The compiler certainly won’t help you. Unfortunately, you must turn to documentation, comments in code, and in the worst case, wading through actual source code. Blech.

Exceptions incur a cost.

Nothing is free. When an exception occurs, there is a cost. Older compilers may impose some performance overhead when executing a block of code whether it throws or not. Modern compilers only incur a performance cost when the exception actually occurs. Exceptions should be rare. So if there’s no throw, there should be no performance penalty.

Even if there’s no performance hit, there will always be a size cost. When an exception occurs, the application needs to do more work to make the exception work than it would for, say, a simple return statement. There’s more bookkeeping involved. And that means more code. In constrained embedded environments where every byte of program size is critical, this can be an automatic deal breaker.

Exceptions are easily abused as control flow mechanisms.

Have you ever seen code like this?

bool someCondition = false;
 
try
{
    if (checkSomeValue)
    {
        someCondition = true;
        throw 0;
    }
    if (checkSomeOtherValue)
    {
        someCondition = true;
        throw 0;
    }
    if (checkSomeOtherOtherValue)
    {
        someCondition = true;
        throw 0;
    }
 
    // ... More of the same
}
catch (...)
{
}
 
if (someCondition)
{
    // do something that relies on someCondition being true
}

This is just one example where an exception could be used to control the flow of code. It kind of works like a break. And it’s so very wrong. Why? When exceptions are thrown, the application suffers both a size cost and a performance cost. Performance-wise, this code is much slower than it should be.

How about this one?

try
{
    int a = std::stoi(someInput);
    printInt(a);
}
catch (const std::logic_error &)
{
    std::string b = someInput;
    printString(b);
}

Again, here we’re leveraging exceptions for the happy path. The code is slower than it should be. Remember, exceptions should be the exception. We’d actually get better performance in this example with error codes.

Something that’s guaranteed not to throw today may not have such a guarantee tomorrow.

Once a function is declared noexcept and starts being used, its interface SHOULD be set in stone. Consumers of that function will expect it to never throw. It’s easy to amass a large body of code dependent on such a contract. So what happens when someone comes along and decides said function really needs to throw an exception? Probably a significant amount of effort expended visiting every place in the source code where that function is called. It’s that, or nothing at all: cross your fingers and hope for the best. What if that function is a virtual method in a base class? What if the function is part of an API with consumers beyond your reach? There be dragons.

To declare a function noexcept requires careful consideration and perhaps even some fortune-telling abilities. Most folks generally don’t bother with noexcept unless they’re writing things like swap functions or move operations, where the guarantee really pays off. Tread lightly.

Conclusion

Those are the big-ticket items, in my opinion. I’m sure some of you will have your own pro/cons that you feel should be added to the list. I encourage you to leave them in comments below.

So where do I lie on the love/hate spectrum? It depends. There’s a time and a place for everything. And exceptions are no exception (bah dum dum). Exceptions work great in some scenarios (e.g., a modern C++ codebase designed with exceptions in mind) and poorly in others (e.g., a legacy C codebase, a library with bindings to other languages, etc.). As Kenny Rogers sang, “You gotta know when to hold ’em, know when to fold ’em.” Experience will be your guide.

If you’d like to learn how to be better at writing exception-safe code, below are a few resources for you.

Jon Kalb’s presentations from CppCon 2014 are excellent. He does a deep dive into modern C++ exception handling. Highly recommended.

Jon Kalb’s “Exception Safe Code Part 1”
Jon Kalb’s “Exception Safe Code Part 2”
Jon Kalb’s “Exception Safe Code Part 3”

Scott Meyers covers a great number of exception-related best practices in his Effective C++ books (Effective C++, More Effective C++, and Effective Modern C++), all of which are required reading for any C++ developer.

Andrei Alexandrescu gave the presentation “Systematic Error Handling in C++” at the “C++ and Beyond” seminar back in 2012 that’s very much still relevant today. In this talk, he explains a mechanism to bridge the error code/exception worlds with Expected and touches upon a newer version of ScopeGuard (something you should be intimately familiar with).

Good luck. And happy coding.