Hyrum’s Law

A few mornings ago, I was listening to CppCast Episode #70 on my commute into work. This particular episode featured an interview with Titus Winters. If you’re not familiar with Titus, he currently leads the C++ libraries team at Google and generally has lots to say regarding large codebases and sustainability. His CppCon talks are well worth watching.

During the interview, Titus shared a small anecdote regarding an attempt by his team to make broad code changes. They discovered weird behavioral dependencies that were never intended to be part of the API’s contract, never documented, and certainly never intended for client applications to rely upon. He followed the anecdote with the following “law”, attributed to his colleague Hyrum (presumably Hyrum Wright).

“With a sufficient number of users of an interface, it doesn’t matter what you promised in the interface contracts, all observable behaviors of your class or function or whatnot will be depended upon by somebody.”

Titus calls this Hyrum’s Law, and it’s a truism if there ever was one.

Most developers learn to use an API or library by experimentation, regardless of whether there’s documentation to lean on. It turns out that the human brain has an uncanny need to actually apply a solution to a given problem before fully appreciating and understanding the problem. Attempting to use someone else’s software component in your own code is where the real learning starts. And when a moment of success is achieved, it’s often where the learning stops.

As I see it, there are really two major contributors to Hyrum’s Law. The first is that once some minimal amount of success is achieved, developers working with an API or library don’t always verify that a given component’s contract matches their expectations or usage. We’ve got other things to do, after all. And besides, what we’re doing makes total sense and is not at all weird, right? Umm… maybe? Maybe not? The second is that interface authors aren’t always comprehensive when it comes to contracts. Sometimes there are side effects and pre/postconditions that escape our attention, or we make naïve assumptions about our users. For instance, threading restrictions, behavior in “undefined” cases, object lifetime, optional vs. required client data, and thrown exception types are all things that sometimes get glossed over when it comes time to document the contract. As such, users of our components are left to make assumptions about whatever we aren’t explicit about.

As users of others’ code, we must be careful not to assume our usage of an interface is as the author intended. When in doubt (and when possible), verify. And if there’s disparity between the contract and actual behavior, please notify the author!
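To make the law a little more concrete, here’s a minimal sketch. Everything in it is hypothetical and invented for illustration: active_ids() stands in for some library function whose documentation promises nothing about ordering, but whose current implementation happens to return sorted results. One client leans on that accident; the other sticks to what was promised.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical library function (not from any real API).
// Documented contract: returns the active IDs in UNSPECIFIED order.
// The current implementation happens to return them sorted, and that
// accident is observable by every caller.
std::vector<int> active_ids()
{
    return {3, 7, 12, 42};
}

// Fragile client: relies on the observed ordering. std::binary_search
// requires a sorted range, so this can silently give wrong answers if
// the library ever changes the order it returns.
bool is_active_fragile(int id)
{
    const std::vector<int> ids = active_ids();
    return std::binary_search(ids.begin(), ids.end(), id);
}

// Robust client: relies only on the documented contract, so a linear
// search is used and ordering no longer matters.
bool is_active_robust(int id)
{
    const std::vector<int> ids = active_ids();
    return std::find(ids.begin(), ids.end(), id) != ids.end();
}
```

Both clients give the same answers today, which is exactly the problem. Nothing tells the author of is_active_fragile() that they’ve stepped outside the contract until the library changes.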

As authors of interfaces, we must be careful to document and make explicit the intended contract. Of course, that’s often easier said than done. It’s hard to be comprehensive, but we should give it our best shot. And we should be prepared to refine the contract at opportune times as we learn where the holes are.
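As a rough illustration of what “explicit” might look like, here’s a small sketch. The function mean() is hypothetical and the contract wording is only one possible style, but it touches the items that tend to get glossed over: preconditions, lifetime, exceptions, and threading.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function, invented for illustration.
//
// Contract:
//   - Precondition: samples is not null and count is greater than zero.
//   - Lifetime: the pointer is only used during the call and is not retained.
//   - Exceptions: never throws.
//   - Threading: safe to call concurrently, provided no other thread
//     writes to the samples while the call is in progress.
//   - Postcondition: returns the arithmetic mean of the first count samples.
double mean(const double* samples, std::size_t count) noexcept
{
    // Check the documented precondition in debug builds.
    assert(samples != nullptr && count > 0);

    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        sum += samples[i];

    return sum / static_cast<double>(count);
}
```

The assert won’t stop anyone from depending on behavior that was never promised, but it does turn one part of the contract into something a misbehaving caller trips over early.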

C++: Heed the Signs!

We’ve all seen this warning at some point…

warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

Comparisons and arithmetic operations involving a mix of signed and unsigned numbers creep into code all the time. And when they do, compilers will sometimes produce helpful warnings such as the one shown above.

But what do you do if you see such a warning? Maybe you say to yourself, “What’s the big deal? The compiler should be able to figure out how to deal with mixed signage, right?” You then reassure yourself that it is in fact not a big deal and that the compiler is smarter than you, and you turn a blind eye. If you’re feeling particularly cocky, you may even disable that compiler warning altogether.

But what are you ignoring? Could there be something sinister lurking in the dark, waiting to strike when you’re not paying attention?

Treading Into Murky Water

Take a look at this snippet of code.

#include <iostream>
int main(int argc, char **argv)
{
    unsigned int a = 1;
    signed int b = -1;
 
    if (b < a)
        std::cout << "All is right with the world.\n";
    else
        std::cout << "Up is down and down is up!\n";
 
    return 0; 
}

The variable a, which is unsigned, is initialized to 1. The variable b, which is signed, is initialized to -1. And b is obviously less than a, right?

Both Visual Studio and GCC compile the code with a warning similar to that shown above. Running the code produces the following output.

Up is down and down is up!

“Hey now! That’s madness!” you exclaim.

Perhaps. But before I explain what’s happening, let’s journey a little farther down the rabbit hole with one more example…

#include <iostream>
int main(int argc, char **argv)
{
    unsigned int a = 1;
    signed int b = -1;
 
    std::cout << "1 + -1 = " << (a + b) << "\n";
 
    b = -2;
 
    std::cout << "1 + -2 = " << (a + b) << "\n";
 
    return 0; 
}

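In this example, both Visual Studio and GCC compile the code without error and without any sign-related warning at the usual warning levels (GCC’s -Wall and -Wextra, for example). GCC’s opt-in -Wsign-conversion option is the one intended to catch this kind of implicit conversion. The following output is produced.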

1 + -1 = 0
1 + -2 = 4294967295

“What in the world is going on here?!” In short – implicit conversions to unsigned.

Let’s see what the C++ standard has to say. You’ll find the following excerpt in Section 5, “Expressions”.

Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:

  • If either operand is of scoped enumeration type, no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
  • If either operand is of type long double, the other shall be converted to long double.
  • Otherwise, if either operand is double, the other shall be converted to double.
  • Otherwise, if either operand is float, the other shall be converted to float.
  • Otherwise, the integral promotions shall be performed on both operands. Then the following rules shall be applied to the promoted operands:
    • If both operands have the same type, no further conversion is needed.
    • Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
    • Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
    • Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
    • Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

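For our examples, the rule that matters is the third one in the nested list. An int and an unsigned int have the same conversion rank, so “...the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.” The final rule only comes into play when the signed type has the greater rank but still can’t represent every value of the unsigned type.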
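If you’d like to see these rules in action without running anything, the result types can be checked at compile time. This is just a sketch using decltype and static_assert (C++11 or later). The last check is written to adapt to the platform, since the result of mixing long with unsigned int depends on whether long is wide enough to hold every unsigned int value.

```cpp
#include <limits>
#include <type_traits>

// Floating-point operands win over any integer operand.
static_assert(std::is_same<decltype(1.0 + 1u), double>::value,
              "double + unsigned int yields double");
static_assert(std::is_same<decltype(1.0f + 1), float>::value,
              "float + int yields float");

// Integral promotions happen first: short is promoted to int.
static_assert(std::is_same<decltype(short(1) + short(1)), int>::value,
              "short + short yields int");

// Same signedness: the operand of lesser rank is converted.
static_assert(std::is_same<decltype(1 + 1L), long>::value,
              "int + long yields long");

// Mixed signedness, equal rank: the signed operand becomes unsigned.
static_assert(std::is_same<decltype(1u + (-1)), unsigned int>::value,
              "unsigned int + int yields unsigned int");

// Mixed signedness, signed operand has the greater rank: the result is
// long if long can represent every unsigned int value, and unsigned long
// otherwise (the final rule in the list).
using LongPlusUnsigned = decltype(1L + 1u);
using Expected = std::conditional<
    (std::numeric_limits<long>::digits >=
     std::numeric_limits<unsigned int>::digits),
    long, unsigned long>::type;
static_assert(std::is_same<LongPlusUnsigned, Expected>::value,
              "long + unsigned int depends on the width of long");
```

If any of these assumptions didn’t hold on a given compiler, the build would fail with the corresponding message.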

In our first example, the variable b is converted to an unsigned int during the comparison to a. What happens when you convert the value -1 to an unsigned value? The conversion is performed modulo 2^32 on a platform with 32-bit integers, so the result is UINT_MAX, or 4294967295. And, of course, that’s not smaller than 1, which is why we saw the output we did.
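One way to sidestep the surprise is to compare by mathematical value instead of letting the compiler pick a common type. The helper below, safe_less(), is a hypothetical name of my own and only a sketch for the int vs. unsigned int case. If you’re on C++20, the standard library offers std::cmp_less in <utility> for the same purpose.

```cpp
#include <climits>

// Converting -1 to unsigned int yields the largest unsigned int value.
static_assert(static_cast<unsigned int>(-1) == UINT_MAX,
              "-1 converts to UINT_MAX");

// Hypothetical helper: true if lhs is mathematically less than rhs.
constexpr bool safe_less(int lhs, unsigned int rhs)
{
    // A negative number is less than every unsigned value. Otherwise lhs
    // is non-negative, so converting it to unsigned preserves its value.
    return lhs < 0 || static_cast<unsigned int>(lhs) < rhs;
}

static_assert(safe_less(-1, 1u), "-1 is less than 1");
static_assert(!safe_less(2, 1u), "2 is not less than 1");
```

The explicit cast also documents the intent, which the original comparison left to the reader’s knowledge of the conversion rules.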

What about the second example? The first line of output seemed to work just fine. Only the second line produced unexpected results.

In the first line of output, we saw:

1 + -1 = 0

Here, a was equal to 1 and b was equal to -1. The code for this was…

std::cout << "1 + -1 = " << (a + b) << "\n";

However, if we substitute the values in for (a + b), applying the conversion to unsigned to b, we have the following (assuming 32-bit integers)…

std::cout << "1 + -1 = " << (1u + 4294967295u) << "\n";

In this particular case, adding 1 to 4294967295 wraps around to 0, because unsigned arithmetic is performed modulo 2^32. What’s especially interesting here is that 0 was our expected result, so this code actually provides us with a false sense that everything is working as it should.

It was only after we set b to a value of -2 that we saw weird things happen. Once again, here’s what the code looks like once b is converted to unsigned.

std::cout << "1 + -2 = " << (1u + 4294967294u) << "\n";

And this, of course, equals 4294967295.
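Here’s a small sketch of both the wraparound and one possible way around it. The helper add_by_value() is a hypothetical name, and it assumes long long is wider than unsigned int (true on platforms with 32-bit int and 64-bit long long), so every operand value fits without loss.

```cpp
#include <climits>

// Unsigned arithmetic wraps around modulo 2^N instead of going negative.
static_assert(1u + static_cast<unsigned int>(-1) == 0u,
              "1 + UINT_MAX wraps to 0");
static_assert(1u + static_cast<unsigned int>(-2) == UINT_MAX,
              "1 + (UINT_MAX - 1) is UINT_MAX");

// Hypothetical helper: performs the addition in a wider signed type so
// the result matches ordinary arithmetic.
// Assumption: long long can represent every unsigned int value.
constexpr long long add_by_value(unsigned int a, int b)
{
    return static_cast<long long>(a) + static_cast<long long>(b);
}

static_assert(add_by_value(1u, -1) == 0, "1 + -1 is 0");
static_assert(add_by_value(1u, -2) == -1, "1 + -2 is -1");
```

Converting both operands before the addition is the important part. Casting the result of (a + b) afterwards would be too late, because the wraparound has already happened.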

The Takeaway

Many C++ experts are adamant that the only time you should ever use an unsigned data type is when you need to store a bitmask. Signed data types should be used in ALL other cases. If you end up needing a value that exceeds the range of a given signed data type, use a bigger signed data type.

I agree with the sentiment of this. But the reality is that we don’t always have the luxury of picking our data types. We’re often at the mercy of APIs, sensor specifications, file formats, network protocols, etc.

Any time you encounter or write code that mixes signed and unsigned data types, proceed with caution. Think carefully about how the data is used, and apply a healthy dose of skepticism. And, of course, when in doubt, test, test, test.

* The STL makes heavy use of size_t, which is an unsigned type. When brought up in conversation, this point is often met with loud and uncomfortable groans. The general feeling is that allowing size_t into the STL was a mistake. But it’s something we’re stuck with for now.
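A classic place this bites is size() - 1 on an empty container. The sketch below uses a hypothetical helper, last_index(), and assumes the container’s size fits in std::ptrdiff_t. If C++20 is available, std::ssize in <iterator> gives you a signed size directly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the last element, or -1 if v is empty.
// Assumption: v.size() fits in std::ptrdiff_t.
std::ptrdiff_t last_index(const std::vector<int>& v)
{
    // Convert to a signed type first, then subtract. Writing
    // v.size() - 1 would be evaluated in an unsigned type and would wrap
    // around to a huge value for an empty vector.
    return static_cast<std::ptrdiff_t>(v.size()) - 1;
}
```

With this in place, a reverse loop such as “for (auto i = last_index(v); i >= 0; --i)” terminates as expected, whereas the same loop written with an unsigned index never would.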

Appendix B – Introduction to Windows Core Audio

For a little while now, I’ve been hard at work on a little side project. I almost hesitate to announce it at this point, because it’s still very early. But, what the heck. Why not. I’ve started writing a book. And no, it’s not one full of suspense and intrigue. Nor is it the next young adult break-out series. Turns out, I’m writing a programming book. The working title is “Practical Digital Audio for C++ Programmers”, which I admit is a mouthful. Henceforth (at least as far as this blog entry is concerned), I shall refer to it as PDA4CPP.

When I first started my audio programming journey, I quickly discovered there was a huge hole in the information available to newcomers to the field. There was plenty of material to be found regarding specific audio libraries. And there was even more material that discussed, in very mind-bendy ways, things like audio effects and synthesis that assumed you already had some level of comfort with digital audio programming. But there was very little in-between. And as a complete newb, I found it super discouraging. So I decided to do something about it. PDA4CPP is the fruit of my labor.

As I mentioned, the book is in its infancy. Only one chapter has been completed to date – Appendix B: Introduction to Windows Core Audio. But it’s a beast, coming in at 170 pages. In it, I talk about where Core Audio fits into the Windows story, the Windows audio architecture, device discovery, audio formats, WASAPI, audio rendering, and audio capturing.

Why did I start with Appendix B? Some of it was because of the questions and feedback I received from my blog entry, “A Brief History of Windows Audio APIs”. But mostly, I started with Appendix B because that’s where I needed to. Most of the book’s code will be implemented around a custom audio library that’s effectively a thin wrapper around platform-specific audio code. The Windows side of things provided as good a starting point as any.

Something I’m going to experiment with is making drafts of the book’s chapters available for purchase as I complete them. Not only will this help motivate me to keep writing, but it will also help me gauge interest. Appendix B is the first chapter available for purchase. Pricing will vary based on each chapter’s size and density. More information is available on the book’s page, under the “Pages” menu. An excerpt is available, as well as the chapter’s source code.

If you purchase the chapter and love it, hate it, or have ideas on how to improve it, please email me or leave a comment below.

Thanks!

-Shane