The Flattening Problem

Given a C or C++ program, can you write a script that will inline all the #includes, so that compiling its output is identical to compiling the program directly?

I call this ‘the flattening problem,” and it sounds much easier than it actually is. My first attempt, for example, was foiled by include guards.

To overcome infinite include loops, everyone’s first inclination is to remember which files are already included, and include them only once. This seems reasonable, but it assumes that no header file can usefully be included more than once, which is generally untrue. Instead, you need to keep track of defined preprocessor symbols (and their values), so that you can simply elide unreachable #includes. If you do this, you’re left with two problems:

  1. Not all preprocessor symbols definitions are contained in a program’s source code. Defines can be automatic, like __cplusplus and __FILE__, or external, via the compiler’s -D switch.
  2. #include statements can take preprocessor symbols as their arguments. There is a rather famous IOCCC winner, (vanschintz.c) that abuses this feature to solve the Towers of Hanoi problem.

The combination of 1 and 2 means you may actually encounter include directives with externally-defined arguments. This makes the problem impossible, in the general case.

You can, however, get an approximate solution with some shortcuts. First, you can make assumptions about automatic preprocessor defines (for example, you assume that .cpp files implicitly define __cplusplus). Second, you must be willing to simply fail on any #include statement that takes an externally-defined preprocessor symbol as its parameter.

(Thanks to Weimin Chen, Wendy Thrash, and Jonathan Caves for their help on this one.)


%d bloggers like this: