Software Teams: Dysfunction Junction, What’s Your Function?

My wife and I recently undertook a small home renovation project. We have a two-bedroom house and a baby on the way. As a result, I found myself trading in my posh office space for not-so-posh basement dwellings. Neither Leora nor I especially liked the idea of me sitting in some dark, damp concrete corner hovering over a keyboard. So we hired a contractor to build out a proper office space in the basement. We told the contractor what we wanted. He planned it all out, coordinated with subcontractors, and met with the code enforcement officer periodically. The whole thing was finished in about three weeks.

What impressed me the most about this project was how wonderfully uneventful it was. Granted, it wasn’t exactly a major construction project. It was just a room. But it involved a lot of people – insulators, framers, electricians, drywallers, and painters (my wife and I accepted this duty). All told, three subcontractors and at least ten different people contributed to the construction of this room (eleven if you count the code enforcement officer). Since each element of construction was built upon some previous element (e.g., framing could only be done after insulation, electrical after framing, etc.), it required a bit of coordination and an awareness of how one person’s work impacts the work of the next guy. Ten different people with a mix of skillsets and egos. It sounds like a recipe for frustration, but somehow it went smoothly and stayed on schedule.

Being a software guy, I immediately began comparing this experience to the software teams I’ve been involved with. I’ve worked with a variety of different teams over the years – big, small, agile, spiral, waterfall, fun, all-business, slow, nimble, etc. I’ve worked with, and learned from, some truly remarkable software developers on just about every team I’ve ever been on. And in most cases, the projects those teams worked on were delivered on-time (ballpark), on budget (mostly), and were considered successes. However, there’s only one team, maybe two, that I look back on and regard as laser focused and as well-coordinated as my basement room construction team. Is a team’s success a measure of their professionalism? Or do those successes hide the ugly truth? The truth being, of course, that many software development teams are fraught with dysfunction.

From the customer’s perspective, it doesn’t really matter. Customers only care about the sausage, not how it’s made. From the software developer’s perspective, it matters a lot. All software developers like to think of themselves as software professionals. But there’s a difference between a professional software developer and a professional software development team. A team of software professionals does not make a professional software team. Think about that for a moment. It’s an important distinction to make because it impacts the customer more than many folks realize.

Some might say the root cause of team dysfunction really boils down to two things – poor leadership and communication. I think that’s painting a picture with some rather large brush strokes. It’s an over-simplistic reduction. A lot of little things contribute to team dysfunction. And over time these things compound. Poor leadership and communication are often by-products rather than the cause.

In this article, I’d like to look at a few things that I think are important for software teams to get right. Understand that much of this is my own opinion. It’s based on my own experiences. No two teams are the same. And not everything I mention below may be appropriate for every team. I’m hopeful, however, you’ll find it relatable. And who knows? Perhaps you’ll discover a few ideas that may be useful for your own software team.

Let’s dive in.

Team Structure

There’s a growing trend among modern software development teams, especially the smaller Agile teams, for self-organization. Many take the position that if knowledge and responsibilities are collectively shared across the team, no functional area can be siloed, and anyone can contribute wherever help is needed. For some teams, this can work fantastically. For others, it’s disastrous. Whether or not it can work depends largely on the product and the team dynamic. For teams that need domain experts, this approach may not even be an option. It’s important that you structure your team appropriately for the product and not the other way around.

Many times a team’s structure is dependent upon the development methodology they choose to use. I talk specifically about methodology later on, but it’s worth mentioning here that a poor implementation, or even a poor choice, of a software development methodology often results in a poor team structure.

Look at your project carefully. Ask yourself questions such as the following. “Do we need domain experts?” “Do we already informally think of our team as being comprised of smaller teams?” “Is the scope of our project(s) appropriate for our team size?”

Teams need to be sized appropriately. If your team is too small, it’ll struggle to keep its head above water. If the team is too big, it can actually slow down under its own weight. Some methodologies even impose limits on team size. Don’t scale your team size for the sake of your methodology if it means that the size is inappropriate for the project.

Ensure your team is composed of folks with specializations appropriate for your product. Software development usually involves more than just writing code. Most software teams include testers, graphic designers, project managers, Scrum Masters, business analysts, product designers, etc. Even among coders, it’s not uncommon to have individuals who are experts in specific problem domains and whose time is dedicated almost exclusively to those efforts. Examples include imaging/optics, digital signal processing, chemistry, geospatial information systems, flight and aerial dynamics, advanced mathematics, etc.

Regarding your software engineers, it’s healthiest, I think, to have a good mix of experience levels. One of my first jobs out of college was with a company whose entire software development staff was composed of kids fresh out of college. It was a small company with a tight budget. Since junior level developers were cheap and eager to prove themselves, it seemed like a no-brainer for the company to staff up on people like me. The problem was, of course, we had no technical leadership. There was little to no design. Even the technologies we were using were unfamiliar to most of us. Lots of ugly code got written. Bad practices were rampant and often reinforced by our peers. The products were built and, for the most part, did what they were expected to do. But they were buggy, fragile, and nightmares to maintain.

Having a team of only senior level folks can have its disadvantages as well. Egos come into play, and conflict can arise as various folks attempt to establish themselves as the alpha-programmer. Differing, contentious opinions stall decision-making. Many teams in this situation find themselves struggling to make progress, especially in the early stages of a project.

Another problem can arise when it comes to the pressures of schedules. If you find that given your current team structure, it’s apparent you’re going to miss significant milestones or deadlines, be leery of hiring more people to throw at the problem. There’s a quote, often called Brooks’ Law, from Fred Brooks’ book “The Mythical Man-Month” that says, “adding manpower to a late software project makes it later.” And it’s true. The best you can do in these situations is to figure out where the obstacles are for your teammates and do your best to eliminate them. Learn from the situation and when your team is readying for the next phase of development, adjust the team accordingly. Adding to your team late in the project will usually do little to get you to your goal faster, and will often just slow you down.

Leadership

Occasionally, you’ll find a software manager that promotes the idea of empowering team members to effect change. It’s very progressive and it sounds awesome. The thinking is that by empowering an entire group of people instead of a lone individual, the group as a whole will benefit from a large pool of good ideas. Not to mention, everybody likes to feel empowered, right? The folly, of course, is that every team is chock full of “experts” and they all have different ideas of the “right way” to tackle any given problem. Guess what? We’re not all experts. Not all ideas are good or even practical. And it’s usually only the loudest that get heard. That sounds harsh, but it’s true.

Software teams need leadership that can make swift, deliberate, and informed decisions with the least amount of friction possible. This is true whether you’re part of a modern, self-organized team or one with a formally defined structure. The only way to avoid group-think paralysis is to empower a small number of informed and respected individuals, whose opinions and ideas the team trusts, with the authority to make decisions that affect the entire project and team.

The choice of leadership sets the tone for the entire team. A calm, good-natured leader can have a calming effect on the entire team, even during times of crisis. A loud, overly-aggressive leader can make the team feel anxious and panicky in the best of times. When choosing your technical leadership, remember that the demeanor of the technical leader can be just as important as their technical skillset.

Workflow

A clearly defined and communicated workflow is one of the most important things a software development team can have. It sets expectations and lays out the lifecycle for units of work. This should communicate, at a minimum, the following key points:

  • The developer’s next step once they’ve completed a code change.
  • How changes flow through the source code repository. If you’re using branches and/or forks, what’s the structure and how/when does code move from one branch or fork to another?
  • How code reviews are initiated, as well as the responsibilities of both the original code writer and the code reviewer.
  • When/how testers get their hands on changes.
  • How builds work and how to tell if a given change has made it into a build.
  • How progress is tracked.
  • How product versioning works.
  • When a given unit of work is considered done. This may seem obvious, but many teams struggle with this concept. Is it done when the developer considers it finished? Or do testers get the final say? Do other stakeholders need to chime in? Is there a difference between “complete” and “accepted”?
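One way to make a workflow like this unambiguous is to write the lifecycle down as an explicit state machine. Here’s a minimal sketch of that idea; the state names and transitions are my own illustrative assumptions, not taken from any particular tool — substitute whatever your team’s Jira or Rally configuration actually says.

```python
# Sketch: a work item's lifecycle as an explicit state machine.
# States and transitions are illustrative assumptions; adapt them to
# your own team's workflow definition.

ALLOWED_TRANSITIONS = {
    "open":        {"in_progress"},
    "in_progress": {"in_review", "open"},
    "in_review":   {"in_test", "in_progress"},   # a reviewer can bounce it back
    "in_test":     {"accepted", "in_progress"},  # here, testers get the final say
    "accepted":    set(),                        # terminal: this is where "done" lives
}

class WorkItem:
    def __init__(self, title):
        self.title = title
        self.state = "open"

    def move_to(self, new_state):
        # Refuse transitions the team's workflow doesn't define.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"Illegal transition: {self.state} -> {new_state}")
        self.state = new_state

item = WorkItem("Add login throttling")
item.move_to("in_progress")
item.move_to("in_review")
print(item.state)  # in_review
```

The payoff of writing it down this way (or in your issue tracker’s workflow editor) is that “where does my change go next?” stops being tribal knowledge: an illegal move fails loudly instead of silently drifting into an undefined state.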

Once you’ve decided on your development workflow, capture it in your tooling. Many issue tracking systems make this simple. Some systems, like Jira, have explicit workflow support baked right in. Other systems, like Rally, often require a little bit more work, forcing you to explicitly define tasks for each phase in an issue’s lifecycle. However you decide to do it, whatever tool you use, make sure it’s easy for everybody, including testers, business analysts, project managers, etc., to see it and use it. It should accurately reflect the state of the work at any given time. The importance of this can’t be overstated. The bigger the team, the more important it is.

Many teams are in a perpetual state of flux. They leave their workflow a bit open-ended, allowing it to evolve in hopes that it’ll settle into a natural rhythm that feels right for the team. For teams of more than a few people, this kind of approach usually feels chaotic.

Define your workflow, declare it so, capture it in your tools, and then stand back and watch. Make notes of what does and doesn’t work along the way. Allow your workflow wishlist to build and institute changes only at logical breaking points that carry the least amount of risk to your team’s work (e.g., after a product release). Making small changes frequently will fatigue your teammates and will often do little to alleviate the confusion.

Remember that communication is the litmus test for your workflow. If your team’s communication frequently breaks down, that’s usually a sign that your workflow has flaws.

The Religion of Methodology

Every software methodology has had its day in the sun. Even the waterfall model, which is often looked upon with disdain by folks today, was once all the rage. Methodologies, like many things in the world of computing, are akin to religions. For some developers, Scrum is the only true way to do software development. For others, it might be Spiral. But we oftentimes forget that a successful implementation of a software methodology is dependent largely upon the organization in which it’s implemented.

There is no one-size-fits-all software methodology. Just because you’ve had a long love-affair with Scrum or Lean doesn’t make it appropriate for your current team. There are usually forces external to your software team at play. And these forces can significantly impact your team’s ability to implement certain types of methodologies effectively. Believe it or not, there are some contexts in which Waterfall is still the most appropriate way to do things. *gasp*

Shoehorning the wrong methodology into your team results in a lot of wasted energy and can often be an exercise in frustration. If your team tries to implement a given methodology and realizes it’s not working, perhaps you should give it up and find something else. If you can’t find a methodology that’s a good fit for your team, don’t be afraid to make something up. The methodology you use doesn’t need a fancy name, an O’Reilly book written about it, or a yearly conference dedicated to it. The most important thing is that you’re putting into practice something that allows your team to work effectively, work comfortably, and maintain good relationships with other parts of the organization. That’s it. If you feel like your team has succeeded in doing that, congratulations! It ain’t broke.

On a related note, if you try to implement a particular methodology that doesn’t work, you probably shouldn’t continue to identify your team with that particular methodology. For example, if you try Scrum, and the only thing you manage to adopt is the concept of a sprint, guess what? You’re not really a Scrum team. Continuing to identify yourself as Scrum can result in a team identity crisis of sorts. Meetings can quickly devolve into debates about what your team does that is or isn’t Scrum. Team members may start championing practices that might never be practical for your team. If you know you can never truly do Scrum, stop identifying yourself as such and you’ll cut down on the noise.

Workspace

The working environment significantly impacts team member productivity, as well as their sense of satisfaction. A software team needs accommodations for both private, focused thinking and collaboration. Don’t mistakenly assume that those needs are in equal proportion, and certainly don’t get the proportion wrong. And, of course, the worst mistake you can make is sacrificing one for the sake of the other.

Software teams are composed of thought workers. And while it’s true that there are frequent occasions where collaboration is required, I expect you’ll find most of your team members’ time is spent alone and focused. As such, you should ensure your environment is conducive to that sort of activity. The more distractions there are, the more frustrated your thought workers will be.

In the early part of the 2000’s, collaborative, or open-concept, workspaces for software teams became extremely fashionable. Technology companies started paying attention to what was happening in Silicon Valley and felt the need to emulate it. Ignoring the fact that most Silicon Valley startups were lucky to even have office space in the first place, often finding themselves in lofts or shared spaces with other companies, the rest of the world’s technology companies charged forward and spent buckets of money renovating their workspaces. Down came the office walls (ironically, managers’ offices usually remained intact). Away went the cube walls. In came the foosball tables and beanbag chairs. Conference rooms were rebuilt with glass walls in an effort to maintain the open-space concept. Developers suddenly found themselves face-to-face with their neighbors. What do you suppose happened next? The sphere of distraction took hold. Things slowed down. The quality of work suffered. And people started working from home. They had to. It was the only way to get anything done.

Unfortunately, many companies haven’t gotten the memo that the Great Collaborative Workspace Experiment was a failure and continue to push forward in the name of innovation. To those of you in positions of influence in these companies, I beg of you, be considerate to your thought workers and their needs. If you’ve created a working environment that you couldn’t do your taxes in, you should probably rethink your workspace.

The extreme opposite of open-concept workspaces are cubicle farms and private office spaces. There are plenty of faux pas to be made there as well. The first being shared space. If you’ve ever had to share a cube or office with someone, you’ll know that while it’s significantly better than an open concept workspace, it’s far from ideal. Distractions abound. Another problem is giving your developers a space that’s too small. No one wants to work in a closet. And if you’re an embedded developer, chances are your space will be cluttered with devices and various hardwaremabobs. These take up room too.

Bottom line, your thought workers need privacy and adequate space to work comfortably. If they’re not comfortable, they won’t be doing their best. If you find that your hands are tied and that you simply can’t accommodate the needs of your developers, you may have to think a bit more creatively (hint: telecommuting).

Mentoring

I once worked for a company that hired summer interns, threw them in the deep end, and then expected great things. There was little orientation when it came to the codebase or tooling. There was certainly no mentoring. And worse, the interns were usually given the responsibility of implementing complex components that were critical to the success of the company’s flagship application. For all intents and purposes, the interns were treated like any other mid-to-senior level software engineer. How do you suppose that worked out?

Even if your team doesn’t hire interns, chances are you’ve probably got a few green software engineers on your team. It’s important that you recognize that not all software developers are equal. Those job titles (Software Engineer I, Software Engineer II, Senior Software Engineer, etc.) exist for a reason. Expect to see the same quality of work and pragmatism in your entire engineering staff regardless of experience level and you’ll be sorely disappointed with the results.

It’s critical to invest in your junior level software developers. Provide them with mentoring opportunities. It’s your team’s responsibility to teach them good habits and break them from the bad ones before they take root. Foster an environment of learning. We don’t graduate college knowing everything we need to know. I learned more in my first year on the job than I learned during my entire time in college. I’m sure for many of you, it was the same way.

Don’t forget what it was like being fresh out of school and eager to prove yourself. Remember the feelings you had when you first realized how much you didn’t know. And with that in mind, cultivate the environment that you wish you’d had when you first entered the workforce.

Junior developers won’t be junior forever. A lot of their personal growth hinges on the experiences your team provides. So make it count.

R&D

Many software development teams find themselves belonging to a larger organization-within-the-organization called R&D, or Research and Development. And unfortunately, many of those teams spend all their time on the “D” and very little on the “R”. Product development is important. But so is the research part.

From a business perspective, everyone understands the importance of investing in research. It’s all about diversifying and staying relevant. No company wants to watch their product line creeping towards obsolescence.

However, something that’s not often recognized is that devoting resources to research can also have a huge benefit to your software team. Software developers are a naturally curious breed. We don’t typically enjoy getting stuck on the same project for long periods of time. New technology excites us. Being able to work on something new and different from time to time can break the monotony and improve the overall sense of job satisfaction.

The way you allocate resources for research depends largely on your team’s commitments and priorities. I’m not going to suggest you do something as extreme as Google’s 20% time. But at the very least, you could encourage pet projects among your team members. I once worked at a company that formed a three-man “New Initiatives” team intended to develop proofs of concept and prototypes thought up by creative minds within the company. The plan was to rotate developers in and out of that team periodically. It didn’t really work for us, but it wasn’t because it was a bad idea. It failed mostly because it essentially turned into a product team overnight.

Regardless of how you choose to do the “R” of R&D, ensure that it’s a priority and communicate the status of those efforts to the rest of the team periodically. And most importantly, make sure each of your team members is able to contribute to the research efforts at some point along the way.

Training and Continued Education

I mentioned earlier the importance of investing in your team’s junior developers. This holds true for everybody else too. The software industry is changing at a faster and faster rate. Sure, developers need to be kept up to date with the latest trends, tools, and best practices. But so do business analysts, project managers, marketing folks, and the guys in QA.

Everyone on the team has a responsibility to continue to learn and develop their skillset. But the team’s leaders have the added responsibility to encourage their teammates and open doors to potential learning opportunities. Conferences, on-site training, online courses, books, trade magazines, and user-groups are all excellent sources of information.

Advocate for training and continued education to be included in the team’s budget. If money is tight, and it sometimes can be, encourage your team members to take the initiative and learn something on their own that they can present to the rest of the group. Weekly or monthly lunch-n-learns can be a lot of fun. Not to mention, they’re cheap.

Tooling

Software development teams use a lot of tools. Some of them are for the entire team, some are specific to the role an individual plays on the team. In all cases, it’s important for team members to know their tools.

On another team, I watched a project manager who had used the same issue tracking tool for years somehow continue to fumble their way through it meeting after meeting. Could it have been because the tool was hard to use? Maybe. Could it be because the project manager never bothered to really learn the tool? Possibly. The real eye opener in this case was realizing the rest of the team also lacked a basic proficiency with the tool. Most never even bothered to use it. “How were issues tracked effectively?” you ask. The answer: not well.

If your team is using a tool that it tolerates instead of embraces, that’s probably a sign you need to either switch tools or invest in training.

I would like to say a few things about developer-specific tools and builds, mostly because the developer’s point-of-view is my point-of-view, but also because developer tooling is essentially upstream tooling and has an implied impact on the entire team.

Let’s talk about builds first. There are essentially two types of builds – developer builds and build server builds. Ideally, these should work the same. There will already be plenty of head-scratching opportunities when things work in one place, but not the other. Why add to the complexity by introducing different mechanisms for performing builds?

Developers should find performing local builds as easy as pushing a button or issuing a simple command. That one button or command should automate the entire build process. This includes things that happen before the code compilation process, such as image and font generation, language translation support, resource file compilation, etc., as well as things that happen after code compilation, such as running automated tests and building installers. This provides a consistent way for your team to build the software and codifies any assumptions made about the built components. It also effectively documents the way the software is built.
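A one-command build driver doesn’t have to be fancy. Here’s a minimal sketch of the idea; the step names and the commands they run are placeholders of my own invention — a real project would invoke its actual resource compilers, build tools, and test runners in each slot.

```python
#!/usr/bin/env python3
# Sketch: a one-command local build that runs every pre- and post-compile
# step in order. The steps below are illustrative placeholders; swap in
# your project's real commands (resource compilers, compilers, test
# runners, installer builders, etc.).

import subprocess
import sys

BUILD_STEPS = [
    ("generate resources", [sys.executable, "-c", "print('generating images and fonts')"]),
    ("compile",            [sys.executable, "-c", "print('compiling sources')"]),
    ("run tests",          [sys.executable, "-c", "print('running automated tests')"]),
    ("build installer",    [sys.executable, "-c", "print('packaging installer')"]),
]

def build():
    """Run every build step in order; stop at the first failure."""
    for name, cmd in BUILD_STEPS:
        print(f"==> {name}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            sys.exit(f"Build failed at step: {name}")
    print("Build succeeded.")

if __name__ == "__main__":
    build()
```

The value is less in the script itself than in what it encodes: the exact order of steps, and the fact that the installer is never packaged from code that hasn’t passed the tests. New teammates can read the step list instead of interrogating a veteran.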

In addition to making software builds easy, developers must be able to debug without too much effort. Don’t let your build process or tooling get in the way. As soon as you require a developer to think about how to use his tools, you’ve derailed his train of thought. You’ve distracted him from the real problem he’s trying to solve, and a distraction like this can be a huge source of productivity loss.

A good rule of thumb is that developers should always be able to build and debug software from their local development environment. If there’s a separate team responsible for managing tools and builds, they should also be contributing developers who are able to eat their own dog food. They need to feel the same pain as everyone else. If they don’t understand how developers use the tools, they’re in no position to manage them.

Software Architecture

There’s a trend among some software teams, especially young Agile teams, to forego high-level, architectural design in favor of test driven or “just in time” design. The misconception is that big-picture design is a waste of time since things will probably change anyway and that you can refactor architecture as you encounter bumps in the road. This is a tad naive. Agile does not preclude architecture. To quote someone much beloved by the Agile community, Uncle Bob Martin,

“There has been a feeling in the Agile community since about ’99 that architecture is irrelevant, we don’t need to do architecture, all we need to do is write lots of tests and do lots of stories and do quick iterations and the code will assemble itself magically, and this has always been horse shit. I even think most of the original Agile proponents would agree that was a silliness.”

-Uncle Bob Martin, Coplien and Martin Debate TDD, CDD and Professionalism

Test driven design has its place, but that place is at the micro level. Function implementations, classes and interface designs, algorithms, etc. all benefit from test driven design. At the macro level, test driven design is a recipe for disaster. A software team needs a clear and unified vision of what they’re building. It’s critical to map out at a high-level the components of a system along with their contracts and responsibilities. This process is best performed by one, perhaps two, individuals and not the entire team.
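To make the micro-level point concrete, here’s test-driven design in miniature. The function and its contract are invented purely for illustration: the tests were written first and pin down the behavior; the implementation exists only to make them pass.

```python
# Micro-level TDD sketch: tests define the contract for a tiny helper
# before it is implemented. The clamp() function is a made-up example.

import unittest

def clamp(value, low, high):
    """Constrain value to the inclusive range [low, high]."""
    return max(low, min(value, high))

class ClampTests(unittest.TestCase):
    # These three cases were written first; they ARE the design.
    def test_within_range_passes_through(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_below_range_clamps_to_low(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_above_range_clamps_to_high(self):
        self.assertEqual(clamp(42, 0, 10), 10)

if __name__ == "__main__":
    unittest.main(exit=False)
```

This is exactly the scale at which the technique shines: a function, a class, an algorithm. Try to drive a whole system’s component boundaries the same way and you get the macro-level mess described above.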

If you fail to perform architectural due diligence early in your projects, you will most certainly regret it. Refactoring architecture is not a trivial task and it can introduce a significant amount of risk, usually at inopportune times in a project’s schedule. You’ll do your team well to give this some thought up front.

I debated including software architecture in this list because it’s a bit more techie than I wanted to get. But a poorly architected piece of software affects the entire team, not just the developers. While developers feel the effects of poor architecture when they dive into the code, the rest of the team feels poor architecture in the form of slow progress, higher incidence of regressions, developer lethargy, increased technical debt, scheduling problems, inability to plan ahead, scope creep, etc. Avoiding architecture early in the project will give you plenty of short term gains. But in the long term, it’s like driving with flat tires.

Documentation

Every team generates some amount of documentation. Requirements documents, software architecture and design documents, build and code quality metrics, team process specification, meeting notes, HOW-TO’s, the list goes on and on.

Whatever types of documentation your team does or does not produce, it’s important that your team knows what sort of documentation is available and how to find it. It’s also just as important to recognize what is lacking and should be created.

Types of documents that are often overlooked include things like project requirements (“What!?” Yes, you’d be surprised how often this doesn’t exist), getting started guides for new hires, and workflow/process documentation.

Take a hard look at the documentation you do have and then imagine yourself as someone new to the team. What would they need to know?

It’s not uncommon for teams to amass a large volume of documentation that’s scattered across a variety of locations. Keep your documentation organized and easy to access. Whether you use a Wiki or a shared network folder full of Word documents, it doesn’t matter so long as it’s easy to get to. Missing documentation, documentation that’s hard to find, or documentation that contradicts itself all work against you.

Conclusion

If you recognize dysfunction within your own software team, take comfort in knowing you’re in good company. We all experience it to one degree or another. The real challenge is in not becoming apathetic to the cause.

The number one goal for a software team is to develop reliable, maintainable software that meets or exceeds the customer’s expectations. Team dysfunction is an obstacle to achieving that goal. The bigger the team, the more opportunity there is for dysfunction.

I encourage you to think about some of the points mentioned above. And I invite you to share some of your own stories and ideas for mitigating team dysfunction. Please leave your comments below.

-Shane

A Brief History of Windows Audio APIs

A few months ago, the audio programming bug bit me pretty hard. I’m not entirely sure why it took so long really. I’ve been recording and mixing music since college. And much of my software development career has been built on a giant mass of C/C++ code. But somehow these worlds never converged. Somehow, with all of the time I’ve spent in front of tools like Cakewalk Sonar and FruityLoops, it never occurred to me that I might be able to learn what these applications are doing under the hood.

Then my nightstand began accumulating books with titles like “The Theory of Sound” and “Who Is Fourier? A Mathematical Adventure”. I absorbed quite a bit of theoretical material in a relatively short amount of time. But as with anything, you must use it to learn it. So I looked to the platform I already use for all of my audio recording – Windows.

What I found was a dark, mysterious corner in the Windows platform. There’s not a lot in the way of introductory material here. As of this writing, I could find no books dedicated to Windows audio programming. Sure, there’s MSDN, but, um, it’s MSDN. I also spent some time digging through back issues of Windows Developer’s Journal, MSDN Magazine, Dr. Dobbs, etc. and the pickings were slim. It seemed the best sources of information were blogs, forums, and StackOverflow. The trick was wading through the information and sorting it all out.

Developers new to Windows audio application development, like me, are often overwhelmed by the assortment of APIs available. I’m not just talking about third party libraries. I’m talking about the APIs baked into Windows itself. This includes weird sounding things like MME, WASAPI, DirectSound, WDM/KS, and XAudio2. There are a lot of different paths a developer could take. But which one makes the most sense? What are the differences between them all? And why are there so many options?

I needed a bit more information and context in deciding how I was going to spend my time. And for this, I had to go back to 1991.

1991 – Windows Multimedia Extensions (aka MME, aka WinMM): Ahhh…1991. That was the year both Nirvana’s “Nevermind” and “The Silence of the Lambs” entered pop culture. It was also the year of the first Linux kernel and the very first web browser. Most of us didn’t realize it at the time, but a lot of cool stuff was happening.

Most PCs of this vintage had tiny little speakers that were really only good at producing beeps and bloops. Their forte was square waves. They could be coerced into producing more sophisticated sounds using a technique called Pulse Width Modulation, but the quality wasn’t much to get excited about. That “Groove Is in the Heart” sound file being played through your PC speaker might be recognizable, but it certainly wasn’t going to get anybody on the dance floor.

Sound cards didn’t usually come bundled with name brand PCs, but they were becoming more and more popular all the time. Independently owned computer shops were building and selling homebrew PCs with sound cards from companies like Creative Labs and Adlib. Folks not lucky enough to buy a computer bundled with a sound card could buy an add-on card out of the back of a magazine like PC Computing or Computer Shopper and be up and running in no time.

The ’90s were also the golden age of the demo scene. Programmers pushed the limits of graphics and audio hardware in fewer bytes than most web pages take up today. Amiga MOD files were a big deal too. They even inspired many audio enthusiasts to build their own parallel port DACs for the best audio experience. And then there were the video games. Game publishers like Apogee and Sierra Entertainment were cranking out awesome titles, most of which could take advantage of Sound Blaster or Adlib cards if they were available.

Professional audio on the PC existed, but it was usually implemented using external hardware solutions, proprietary software, and proprietary communications protocols. Consumer grade sound card manufacturers were adding MIDI support in the form of a dual purpose joystick port that seemed oddly out of place. It was more of a marketing tactic than a useful feature. Most consumers had no idea what MIDI was.

It was at this point when Microsoft decided to add an audio API for Windows. Windows 3.0 had been out for a year and was in widespread use. So Microsoft released a version of Windows 3.0 called Windows 3.0 with Multimedia Extensions (abbreviated MME, sometimes referred to in software development circles as the waveOut API). MME has both a high-level and low-level API. The low-level API supports waveform audio and MIDI input/output. It has function names that start with waveIn, waveOut, midiIn, midiStream, etc. The high-level API, the Media Control Interface (MCI), is REALLY high level. MCI is akin to a scripting language for devices.

MME was the very first standard audio API for Windows. It’s evolved a bit over the years, to be sure. But it’s still around. And it works well, but with some caveats.

Latency is a problem with MME. Dynamic, near-real-time audio (e.g., game event sounds, software synthesizers, etc.) is hard to do in a timely fashion. Anything that occurs 10ms later than the brain thinks it should is perceived as out of sync. So that kind of programming is pretty much out of the question with MME. However, pre-generated content (e.g., music files, ambient sounds, Windows system sounds, etc.) works well. At the time, that was good enough.

MME is still around. Some might even use the word thriving. Historically, support for high quality audio has been a pain point for MME. Parts of the MME API (e.g., anything that deals with the device capability structures WAVEINCAPS and WAVEOUTCAPS) can only report a maximum of 96kHz, 16-bit audio. However, in modern versions of Windows, MME is built on top of Core Audio (more on this later). You may find that even though a device can’t report itself as capable of higher quality audio, higher sample rates and bit depths work anyway.
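For a sense of what the low-level MME API looks like, here’s a minimal sketch of pushing a buffer of 16-bit stereo PCM through the default output device. It’s illustrative only – error checking is omitted, and the format values are just assumptions for the example.

```cpp
// Minimal MME (waveOut) playback sketch. Link with winmm.lib.
#include <windows.h>
#include <mmsystem.h>

void PlayBuffer(const short* samples, DWORD numSamples)
{
    // Describe the data: 44.1kHz, 16-bit, stereo PCM (assumed for this example).
    WAVEFORMATEX fmt = {};
    fmt.wFormatTag      = WAVE_FORMAT_PCM;
    fmt.nChannels       = 2;
    fmt.nSamplesPerSec  = 44100;
    fmt.wBitsPerSample  = 16;
    fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    // Open the default device (WAVE_MAPPER picks one for us).
    HWAVEOUT hwo = nullptr;
    waveOutOpen(&hwo, WAVE_MAPPER, &fmt, 0, 0, CALLBACK_NULL);

    // Hand the device a header describing our buffer and start playback.
    WAVEHDR hdr = {};
    hdr.lpData         = (LPSTR)samples;
    hdr.dwBufferLength = numSamples * sizeof(short);
    waveOutPrepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutWrite(hwo, &hdr, sizeof(hdr));  // asynchronous; returns immediately

    // A real application would use a callback; polling WHDR_DONE keeps it short.
    while (!(hdr.dwFlags & WHDR_DONE)) Sleep(10);
    waveOutUnprepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutClose(hwo);
}
```

Streaming applications typically queue several smaller WAVEHDR buffers in rotation rather than one big one, which is exactly where MME’s latency troubles come from.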

1995 – DirectSound (aka DirectX Audio): When Windows 3.1 came out in 1992, MME was officially baked in. But Windows still left game developers uninspired. All versions of Windows up to this point were effectively shells on top of DOS. It was in the way. It consumed memory and other resources that the games desperately needed. DOS was well known and already a successful platform for games. With DOS, games didn’t have to compete for resources and they could access hardware directly. As a result, most PC games continued to be released as they had been – DOS only.

Along came Windows 95. Besides giving us the infamous “Start” button and the music video for Weezer’s “Buddy Holly”, Windows 95 brought with it DirectX. DirectX was core to Microsoft’s strategy for winning over game developers, whom they saw as important for the success of Windows 95.

DirectX was the umbrella name given to a collection of COM-based multimedia APIs, which included DirectSound. DirectSound distinguished itself from MME by providing things like on the fly sample rate conversion, effects, multi-stream mixing, alternate buffering strategies, and hardware acceleration where available (in modern versions of Windows, this is no longer the case. See the discussion on Core Audio below). Because DirectSound was implemented using VxDs, which were kernel mode drivers, it could work extremely close to the hardware. It provided lower latency and support for higher quality audio than MME.

DirectSound, like the rest of DirectX, wasn’t an instant hit. It took game developers time, and a bit of encouragement on the part of Microsoft, to warm up to it. Game development under DOS, after all, was a well worn path. People knew it. People understood it. There was also a fear that maybe DirectX would be replaced, just as its predecessor WinG (a “high-performance” graphics API) had been. But eventually the gaming industry was won over and DirectX fever took hold.

As it relates to professional audio, DirectSound was a bit of a game changer. There were PC-based DAW solutions before DirectX, to be sure. From a software perspective, most of them were lightweight applications that relied on dedicated hardware to do all of the heavy lifting. And with their hardware, applications did their best at sidestepping Windows’ driver system. DirectSound made it practical to interact with hardware through a simple API. This allowed pro-audio applications to decouple themselves from the hardware they supported. The umbilical cord between professional grade audio software and hardware could be severed.

DirectX also brought pluggable, software based audio effects (DX effects) and instruments (DXi Instruments) to the platform. This is similar in concept to VST technology from Steinberg. Because DX effects and instruments are COM based components, they’re easily discoverable and consumable by any running application. This meant effects and software synthesizers could be developed and marketed independently of recording applications. Thanks to VST and DX effects, a whole new market was born that continues to thrive today.

Low latency, multi-stream mixing, high resolution audio, pluggable effects and instruments – all of these were huge wins for DirectSound.
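The basic DirectSound workflow is small enough to sketch: create a device object, set a cooperative level, and describe a secondary buffer to hold your sound data. The flags and parameters below are illustrative assumptions, not the only valid choices, and all error handling is omitted.

```cpp
// Minimal DirectSound sketch. Link with dsound.lib.
#include <windows.h>
#include <dsound.h>

IDirectSoundBuffer* CreateSecondaryBuffer(HWND hwnd, DWORD bytes, WAVEFORMATEX* fmt)
{
    // Create the device object bound to the default audio device.
    IDirectSound8* ds = nullptr;
    DirectSoundCreate8(nullptr, &ds, nullptr);

    // DirectSound requires a cooperative level tied to a window.
    ds->SetCooperativeLevel(hwnd, DSSCL_PRIORITY);

    // Describe a secondary buffer for our sound data.
    DSBUFFERDESC desc = {};
    desc.dwSize        = sizeof(desc);
    desc.dwFlags       = DSBCAPS_CTRLVOLUME | DSBCAPS_GLOBALFOCUS;
    desc.dwBufferBytes = bytes;
    desc.lpwfxFormat   = fmt;

    IDirectSoundBuffer* buf = nullptr;
    ds->CreateSoundBuffer(&desc, &buf, nullptr);

    // Next steps: Lock() the buffer, copy samples in, Unlock(), then Play().
    return buf;
}
```

Secondary buffers are the interesting part: DirectSound mixes any number of them into the single primary buffer, which is where the multi-stream mixing and sample rate conversion mentioned above happen.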

1998 – Windows Driver Model / Kernel Streaming (aka WDM/KS): After the dust settled with Windows 95, Microsoft began looking at their driver model. Windows NT had been around for a few years. And despite providing support for the same Win32 API as its 16-bit/32-bit hybrid siblings, Windows NT had a very different driver model. This meant if a hardware vendor wanted to support both Windows NT and Windows 95, they needed to write two completely independent drivers – drivers for NT built using the Windows NT Driver Model and VxDs for everything else.

Microsoft decided to fix this problem and the Windows Driver Model (WDM) was born. WDM is effectively an enhanced version of the Windows NT Driver Model, which was a bit more sophisticated than the VxDs used by Windows 95 and 3.x. One of the big goals for WDM, however, was binary and source code compatibility across all future versions of Windows. A single driver to rule them all. And this happened. Sort of.

Windows 98 was the first official release of Windows to support WDM, in addition to VxDs. Windows 2000, a derivative of Windows NT, followed two years later and only supported WDM drivers. Windows ME, the butt of jokes for years to come, arrived not long after. But ME was the nail in the coffin for the Windows 9.x product line. The technology had grown stale. So the dream of a single driver model spanning both the NT and the 9.x lines was short lived. All versions of Windows since have effectively been iterations of Windows NT technology. And WDM has since been the lone driver model for Windows.

So what’s this WDM business got to do with audio APIs? Before WDM came about, Windows developers were using either DirectSound or MME. MME developers were used to dealing with latency issues. But DirectSound developers were used to working a bit closer to the metal. With WDM, both MME and DirectSound audio now passed through something called the Kernel Audio Mixer (usually referred to as the KMixer). KMixer was a kernel mode component responsible for mixing all of the system audio together. KMixer introduced latency. A lot of it. 30 milliseconds, in fact. And sometimes more. That may not seem like a lot, but for a certain class of applications this was a non-starter.

Pro-audio applications, such as those used for live performances and multitrack recording, were loath to embrace KMixer. Many developers of these types of applications saw KMixer as justification for using non-Microsoft APIs such as ASIO and GSIF, which avoided the Windows driver system entirely (assuming the hardware vendors provided the necessary drivers).

Cakewalk, a Boston-based company famous for their DAW software, started a trend that others quickly adopted. In their Sonar product line starting with version 2.2, they began supporting a technique called WDM/KS. The WDM part you know. The KS stands for Kernel Streaming.

Kernel streaming isn’t an official audio API, per se. It’s something a WDM audio driver supports as part of its infrastructure. The WDM/KS technique involves talking directly to the hardware’s streaming driver, bypassing KMixer entirely. By doing so, an application could avoid paying the KMixer performance tax, reduce the load on the CPU, and have direct control over the data delivered to the audio hardware. Latency wasn’t eliminated. Audio hardware introduces its own latency, after all. But the performance gains could be considerable. And with no platform components manipulating the audio data before it reached the hardware, applications could exert finer control over the integrity of the audio as well.

The audio software community pounced on this little trick and soon it seemed like everybody was supporting WDM/KS.

It’s worth noting at this point in the story that, in special circumstances, DirectSound could actually bypass KMixer. If hardware mixing was supported by both the audio hardware and the application, DirectSound buffers could be dealt with directly by the audio hardware. It wasn’t a guaranteed thing, though. And I only mention it here in fairness to DirectSound.

2007 – Windows Core Audio: It was almost 10 years before anything significant happened with the Windows audio infrastructure. Windows itself entered an unusually long lull period. XP came out in 2001. Windows Vista, whose development had begun 5 months before XP was even released, was fraught with missteps and even a development “reboot”. When Vista finally hit the store shelves in 2007, both users and developers were inundated with a number of fundamental changes in the way things worked. We were introduced to things like UAC, Aero, BitLocker, ReadyBoost, etc. The end user experience of Vista wasn’t spectacular. Today, most people consider it a flop. Some even compare it to Windows ME. But for all of its warts, Vista introduced us to a bevy of new technologies that we still use today. Of interest for this discussion is Windows Core Audio.

Windows Core Audio, not to be confused with OSX’s similarly named Core Audio, was a complete redesign of the way audio is handled on Windows. KMixer was killed and buried. Most of the audio components were moved from kernel land to user land, which improved system stability. (Since WDM/KS involved kernel mode operations, a poorly written WDM/KS application could easily BSOD the system.) All of the legacy audio APIs we knew and loved were shuffled around and suddenly found themselves built on top of this new user mode API. This included DirectSound, which at this point lost support for hardware accelerated audio entirely. Sad news for DirectSound applications, but sadder news was to come (more on this in a bit).

Core Audio is actually 4 APIs in one – MMDevice, WASAPI, DeviceTopology, and EndpointVolume. MMDevice is the device discovery API. The API for interacting with all of the software components that exist in the audio path is the DeviceTopology API. For interacting with volume control on the device itself, there’s the EndpointVolume API. And then there’s the audio session API – WASAPI. WASAPI is the workhorse API. It’s where all of the action happens. It’s where the sausage, er, sound gets made.

Along with new APIs came a number of new concepts, such as audio sessions and device roles. Core Audio is much better suited to the modern era of computing. Today we live in an ecosystem of devices. Users no longer have a single audio adapter and a set of speakers. We have headphones, speakers, bluetooth headsets, USB audio adapters, webcams, HDMI connected devices, WiFi connected devices, etc. Core Audio makes it easy for applications to work with all of these things based on use-case.
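As a small illustration of how device discovery and roles fit together, here’s a minimal MMDevice sketch that asks for the default render endpoint for the “console” role. Error handling is omitted, and the role chosen is just an example.

```cpp
// Minimal MMDevice discovery sketch (Core Audio).
#include <mmdeviceapi.h>

IMMDevice* GetDefaultRenderDevice()
{
    // Core Audio is COM based; the calling thread needs COM initialized.
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    // The device enumerator is the entry point to the MMDevice API.
    IMMDeviceEnumerator* enumerator = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    // Ask for the default playback (eRender) device for the eConsole role.
    // Other roles: eMultimedia (music/movies) and eCommunications (voice chat).
    IMMDevice* device = nullptr;
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

    enumerator->Release();
    return device;  // the caller typically Activate()s an IAudioClient on this
}
```

The returned IMMDevice is the handle everything else hangs off of – WASAPI sessions, DeviceTopology walks, and EndpointVolume control all start from a device obtained this way.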

Another significant improvement Core Audio brings us is the ability to operate in either shared mode or exclusive mode.

Shared mode has some parallels with the old KMixer model. With shared mode, applications write to a buffer that’s handed off to the system’s audio engine. The audio engine is responsible for mixing all applications’ audio together and sending the mix to the audio driver. As with KMixer, this introduces latency.

Exclusive mode is Microsoft’s answer to the pro-audio world. Exclusive mode has many of the same advantages as WDM/KS. Applications have exclusive access to the hardware, and audio data travels directly from the application to the driver to the hardware. You also have more flexibility in audio formats with exclusive mode as compared to shared mode. The audio data format can be whatever the hardware supports – even non-PCM data.
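The difference between the two modes shows up as little more than a flag at initialization time. The sketch below is a simplification – in particular, exclusive mode generally requires a format the hardware actually supports (negotiated via IsFormatSupported) rather than the engine’s mix format, and the buffer duration is an arbitrary example value.

```cpp
// WASAPI initialization sketch: shared vs. exclusive mode.
#include <mmdeviceapi.h>
#include <audioclient.h>

void InitClient(IMMDevice* device, bool exclusive)
{
    // Activate an audio client on a device obtained via the MMDevice API.
    IAudioClient* client = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&client);

    // The engine's mix format is what shared mode expects. Exclusive mode
    // should instead use a hardware-supported format, verified beforehand
    // with IAudioClient::IsFormatSupported (simplified away here).
    WAVEFORMATEX* fmt = nullptr;
    client->GetMixFormat(&fmt);

    // Buffer duration is in 100-nanosecond units; 10ms is an example value.
    REFERENCE_TIME bufferDuration = 10 * 10000;

    client->Initialize(
        exclusive ? AUDCLNT_SHAREMODE_EXCLUSIVE : AUDCLNT_SHAREMODE_SHARED,
        0,               // stream flags (e.g., AUDCLNT_STREAMFLAGS_EVENTCALLBACK)
        bufferDuration,
        0,               // periodicity; event-driven exclusive mode has stricter rules
        fmt,
        nullptr);        // audio session GUID (default session)

    // Next: GetService(__uuidof(IAudioRenderClient), ...) to get buffers to fill.
}
```

In shared mode the buffer you fill is handed to the system audio engine for mixing; in exclusive mode it goes straight to the driver, which is where the WDM/KS-like latency win comes from.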

At this point, you might assume WDM/KS can go away. Well, it can’t. As I said before, it’s not really an API. It’s part of the WDM driver infrastructure, so it will continue to exist so long as WDM exists. However, there’s no compelling reason to use WDM/KS for modern audio applications. An exclusive mode audio session in Core Audio is safer and just as performant. Plus it has the advantage of being a real audio API.

As of this writing, Windows 10 is the latest version of Windows and Core Audio still serves as the foundation for platform audio.

2008 – XAudio2: Over the years, DirectX continued to evolve. The Xbox, which was built on DirectX technologies, was a significant source of influence in the direction DirectX took. The “X” in Xbox comes from DirectX, after all. When DirectX 10 came out in 2007, it was evident that Microsoft had gone into their latest phase of DirectX development with guns blazing. Many APIs were deprecated. New APIs appeared that started with the letter “X”, such as XInput and XACT3.

XAudio2 appeared in the DirectX March 2008 SDK and was declared the official successor to DirectSound. It was built from the ground up, completely independent of DirectSound. Its origins are in the original XAudio API, which was part of XNA, Microsoft’s managed gaming framework. And while XAudio was considered an Xbox API, XAudio2 was targeted at multiple platforms, including the desktop. DirectSound was given “deprecated” status (this is the sadder news I mentioned earlier).

XAudio2 offers a number of features missing from DirectSound, including support for compressed formats like xWMA and ADPCM, as well as built-in, sophisticated DSP effects. It’s also considered a “cross-platform” API, which really just means it’s supported on the Xbox 360, Windows, and Windows Phone.

It’s worth mentioning that while XAudio2 is considered a low-level API, it’s still built on other technology. For the desktop, XAudio2 sits on top of Core Audio like everything else.
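Bringing the XAudio2 engine up is refreshingly brief. A minimal, illustrative sketch (error handling omitted):

```cpp
// Minimal XAudio2 startup sketch. Link with xaudio2.lib on Windows 8+.
#include <xaudio2.h>

void StartEngine()
{
    // XAudio2 is COM based; initialize COM on the calling thread.
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    // Create the engine itself.
    IXAudio2* engine = nullptr;
    XAudio2Create(&engine, 0, XAUDIO2_DEFAULT_PROCESSOR);

    // The mastering voice is the final mix stage, routed to the default device.
    IXAudio2MasteringVoice* master = nullptr;
    engine->CreateMasteringVoice(&master);

    // From here, each sound gets a source voice (CreateSourceVoice), is fed
    // XAUDIO2_BUFFERs via SubmitSourceBuffer, and is mixed into the mastering
    // voice, optionally through submix voices carrying DSP effects.
}
```

The voice graph (source voices feeding submix and mastering voices) is where XAudio2’s built-in DSP effects plug in, which is a large part of what it offers over DirectSound.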

You might read all of this business about XAudio2 and assume that DirectSound is dead. We’re quite a way off from that, I think. There’s still a lot of DirectSound based software out there. Given Microsoft’s commitment to backwards compatibility, some level of DirectSound support/emulation is liable to exist in perpetuity. However, unless you’re determined to support versions of Windows that even Microsoft has stopped supporting, there’s no compelling reason to support DirectSound in modern audio applications.

Honorable Mention – ASIO: There are plenty of third party audio APIs available for Windows that weren’t invented by Microsoft. Some of them, like GSIF used by TASCAM’s (formerly Nemesys) GigaStudio, are tied to specific hardware. Some of them, like PortAudio and JUCE (more than just an audio API), are open-source wrappers around platform specific APIs. Some of them like OpenAL are just specifications that have yet to gain widespread adoption. But none has had quite the impact on the audio industry as ASIO.

Steinberg, the same forward-thinking company that gave us VSTs and Cubase, introduced us to ASIO all the way back in 1997. ASIO was originally a pro-audio grade driver specification for Windows. Its popularity, however, has allowed it to gain some level of support on Linux and OSX platforms. Its primary goal was, and still is, to give applications a high quality, low latency data path direct from application to the sound hardware.

Of course, the power of ASIO relies on hardware manufacturers providing ASIO drivers with their hardware. For applications that can support ASIO, all of the business of dealing with the Windows audio stack can be completely avoided. Conceptually, ASIO provides applications with direct, unfettered access to the audio hardware. Before Windows Vista, this could allow for some potentially significant performance gains. In the Core Audio world, this is less of a selling point.

The real-world performance of ASIO really depends on the quality of driver provided by the manufacturer. Sometimes an ASIO driver might outperform its WDM counterpart. Sometimes it’s the other way around. For that reason, many pro-audio applications have traditionally allowed the user to select their audio driver of choice. This, of course, makes life complicated for end-users because they have to experiment a bit to learn what works best for them. But such is life.

The waters get muddied even further with the so-called “universal” ASIO drivers, like ASIO4ALL and ASIO2KS. These types of drivers are targeted at low cost, consumer-oriented hardware that lack ASIO support out-of-the-box. By installing a universal ASIO driver, ASIO-aware applications can leverage this hardware. In practice, this type of driver merely wraps WDM/KS or WASAPI and only works as well as the underlying driver it’s built on. It’s a nice idea, but it’s really contrary to the spirit of the ASIO driver. Universal drivers are handy, though, if the audio application you’re trying to use only supports ASIO and you’ve got a cheap sound card lacking ASIO support.

ASIO, like MME, is an old protocol. But it’s very much still alive and evolving. Most pro-audio professionals hold it in high regard and still consider it the driver of choice when interfacing with audio hardware.

Conclusion: “Shane, where’s the code?” I know, I know. How do you talk about APIs without looking at code? I intentionally avoided it here in the interest of saving space. And, yet, this article still somehow ended up being long winded. In any case, I encourage you to go out on the Interwebs and look at as much Windows audio source code as you can find. Browse the JUCE and Audacity source repos, look at PortAudio, and peruse the sample code that Microsoft makes available on MSDN. It pays to see what everybody else is doing.

For new developers, the choice of audio API may or may not be clear. It’s tempting to make the following generalization: games should go with XAudio2, pro-audio should go with ASIO and/or Core Audio, and everybody else should probably go with MME. Truth is, there are no rules. The needs of every application are different. Each developer should weigh their options against effort, time, and money. And as we see more often than not, sometimes the solution isn’t a single solution at all.

(Shameless Plug: If you’re interested in learning how to use Core Audio, consider purchasing an early draft of “Appendix B: Introduction to Windows Core Audio” from the book I’m currently working on entitled, “Practical Digital Audio for C++ Programmers.”)

Back from CppCon 2015

Now that the dust has settled a bit and I’ve adjusted to being back on east coast time, I thought it’d be worth talking a little bit about CppCon 2015. CppCon, for those who haven’t heard about it, is a five day conference devoted to C++ (seven if you pay for an extra class). It’s a relatively new kid on the block, this year being only the second year. It’s certainly not the only C++ conference in town. But CppCon distinguishes itself from all the others in terms of scale. Attendance at C++Now, for example, is capped at around 150 people and features three tracks at any given time. C++ and Beyond describes itself as “small” and features only one track at a time over three days. This year, CppCon saw nearly 700 people in attendance. That’s nearly 15% growth over last year’s 600-attendee count. The days start early and end late. And at any given point, there could be up to six separate tracks going on. Presenters include folks from Google, Microsoft, Boost, Adobe, etc. As you can imagine, there’s enough content at CppCon to satiate even the thirstiest of minds.

Just like last year, CppCon was held at the Meydenbauer Center in beautiful Bellevue, Washington, a Seattle “suburb” (I use that word loosely. See Wikipedia.) that just so happens to be in Microsoft’s backyard. The conference center itself has four levels. The bottom two floors have amphitheatre-sized conference rooms that are used for keynotes and larger talks. The top floor has a number of smaller classroom sized conference rooms and is where most of the action actually takes place.

Most of the rock stars showed up again this year – Bjarne Stroustrup, Herb Sutter, Andrei Alexandrescu, John Lakos, etc. (Scott Meyers was noticeably MIA this year). Bjarne’s keynote, “Writing Good C++14”, set the tone for everything that was to come. The theme seemed to be “move forward” – abandon your old compilers, modernize your legacy codebase, and leave the past in the past. This was reflected by the significant number of talks that revolved around C++17 proposals and technical previews that will be appearing in a compiler near you.



Me with Bjarne Stroustrup

Like any conference, the quality of presentations was a mixed bag. There were great speakers and some not so great. Some presentations were practical, some were meta, and some were sales pitches for third party libraries. All tech conferences are like this to some degree. For conferences with only one or two tracks, this can be a mild annoyance. But the fact that there was so much happening at the same time allowed attendees to be a bit more discerning.

What about schwag? Schwag is something we all pretend not to care about. After all, we’re only there for the knowledge, right? Mmhmm. 🙂 There actually wasn’t much schwag to speak of. This year, attendees received a plastic bag containing some flyers, a deck of “Woman in Tech” playing cards, and a thumb drive containing an offline version of cppreference.com. There were no free shirts, despite attendees being asked to provide a shirt size at registration time. At one point, JetBrains started giving away yo-yos, CLion stickers, and copies of a mini-book entitled “C++ Today: The Beast is Back”, which happened to serve as the basis for Jon Kalb’s highly entertaining presentation of the same name. That was about it. Not even the meals were free, which seemed to surprise a lot of folks.

Apart from that, there weren’t many disappointments at CppCon. This conference has a lot to offer. The talks were great. All of the presenters were approachable and very personable. The atmosphere was positive. And, most importantly, it was FUN. Would I go back again? Definitely. Should you go? Absolutely.