Troublesome enums

Originally published in Develop issue 71, April 2007. Reproduced here with kind permission of Michael French.

I was looking at some old code the other day, reminiscing about the time when, as an industry, we’d only just started using C++. Properties and long lists of states were kept together in a single header file, and we used enums for them.

Now, enums are pretty useful: they’re type-safe and lightweight, and they give a set of things useful named constants. What’s not to like?

And in the code it all felt right – things were kept together, with a single file to maintain. Two years down the line, though, the list had ballooned out of all control, and because so much code included that header, adding a single entry meant a painful rebuild.

Today we know better: we’ve felt the pain of these properties littering header files, so we keep them isolated to reduce dependencies. That’s great if you’re fortunate enough to be writing new code, or have the opportunity to refactor, but often we have to work with legacy code.

What we’d like to do is move the enumeration out of the header and into the compilation unit. The usual tool is a forward declaration, splitting the enum across two files. Let’s try:

cats.h:

enum CoolCats;			// Forward declare (1)

cats.inl:

enum CoolCats
{
    PANTHER,
    CLOUDED_LEOPARD,
    HOWARD_MOON,

    …	 // lots and lots of cats, you get the idea

    COUNT_OF_CATS
};

It looks good, and it compiles in Visual Studio. Not so fast, though: GCC, quite correctly, rejects it. Why should that be, when the same technique is fine for structs and classes?

If I’ve understood correctly, the reason is this: in C++, pointers to different built-in types can have different sizes. Why should this matter for enums? If you read section 7.2 of the standard (especially paragraphs 5 and 6), you’ll see that the underlying integral type of an enum is implementation-defined and depends on the range of the enumeration. Without the list of enumerators, the compiler can’t know the size of the enum, or even of a pointer to it, so an incomplete enum would be unusable.

So, back to the problem: how can we split the enum? Well, the final sentence of 7.2/6 says “it is possible to define an enumeration that has values not defined by any of its enumerators”. Aha!
So instead, how about trying:

cats.h:

enum CoolCats
{
    CATS_MIN = 0,
    CATS_MAX = 300 // problem here – how do we know what this should be?
};

cats.inl:

// Now here are the real enumerators. Note that the numbers could come from a
// “private” nameless enum local to this file, but without scoping nothing
// would stop client code from using it.

static const CoolCats panther = CoolCats( 1 );
STATIC_ASSERT( (panther >= CATS_MIN) && (panther <= CATS_MAX) );

// etc.

We’ve split the type declaration and the enumerators into separate files. But is this really any better?

It’s questionable. The fact that I’ve had to use a blunt instrument like a cast should ring alarm bells. Can the ability to define “enums” with values outside the existing enumerators really be classed as safe? We’ve created something less readable, and what should be named properties now show up only as numbers in the debugger.

But this solution might be one way to tackle another “issue” with the original enum. Is COUNT_OF_CATS really a type of cat? The enum has been used to represent two different things: types and counts. Sure, you could replace COUNT_OF_CATS with something like LAST_CAT, but even that has a slightly different meaning; and because we’ve moved the bulk into the .inl file, we’ve had to set an arbitrary CATS_MAX “enum”. Unsatisfactory. It’s a shame that in C++ we can’t actually enumerate enums, that is, iterate over their values.

Perhaps, in this example, it’s better to define:

cats.h:

enum CoolCats {}; 	// Empty enum, what’s the point?

static const CoolCats CATS_MIN = CoolCats(0);	// arbitrary, which isn’t great
static const CoolCats CATS_MAX = CoolCats(300);	// arbitrary, which still isn’t

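Is this really any better? It’s certainly more complicated. Notice that we’ve completely lost one of the key features of enums – named constants. So in some ways, what’s the point?! But there’s another problem: CATS_MAX might not be what you think it is. The standard treats an empty enum as if it had a single enumerator with value zero, so the compiler need only pick an underlying type that can hold zero, and the range of valid values is correspondingly tiny. Imagine this could be a byte! Even if it isn’t, 300 lies outside the enumeration’s range of values, so CoolCats(300) isn’t guaranteed to give you 300 back.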

The crux of the problem is that, for such a large list, using raw enums was the wrong solution. We should have developed some kind of “Cat” class, tailored to our hearts’ content, with features like default initial values (enums don’t have them), names (for debugging) and so on.

But now we’ve come full circle: unfortunately, we are working with legacy code. I’ve presented a few of the gotchas; the choice is up to you.

Thanks to Steven Tattersall and Thaddaeus Frogley.