### But… but… It’s still compiling!

Filed under: Algorithm, boost, C++, Metaprogramming, Programming, template |

A few weeks ago, a friend posted an article on his blog about python being slow by comparing implementations of the 8 queens puzzle in different languages using a recursive approach. Then just for fun, I decided to code the compile-time version with C++. I was also curious about the time it would take for the compiler to complete the calculation. Here’s the code:

#include <iostream> typedef unsigned long long uint64_t; template <uint64_t Cols = 0, uint64_t Diag135 = 0, uint64_t Diag45 = 0, uint64_t Solution = 0> struct state { static uint64_t const cols = Cols; static uint64_t const diag135 = Diag135; static uint64_t const diag45 = Diag45; static uint64_t const solution = Solution; }; template <int K, int J, typename State> struct test { static bool const result = ((State::cols & (1ull << J)) + (State::diag135 & (1ull << (J + K))) + (State::diag45 & (1ull << (32 + J - K)))) == 0; }; template <int K, int J, typename State> struct mark { typedef state < State::cols ^ (1ull << J), State::diag135 ^ (1ull << (J + K)), State::diag45 ^ (1ull << (32 + J - K)), State::solution > state_type; }; template <int StartValue, int Times, typename State> struct accumulate_result { typedef typename accumulate_result < StartValue + 1, Times - 1, state < State::cols, State::diag135, State::diag45, State::solution + test<0, StartValue, State>::result > >::state_type state_type; }; template <int StartValue, typename State> struct accumulate_result<StartValue, 0, State> { typedef State state_type; }; template <int Niv, int Dx, typename State> struct solve; template <bool Condition, typename State, int Current, int Niv, int Dx> struct result_from_test { typedef typename mark < Niv, Current, typename solve < Niv - 1, Dx, typename mark<Niv, Current, State>::state_type >::state_type >::state_type state_type; }; template <typename State, int Current, int Niv, int Dx> struct result_from_test<false, State, Current, Niv, Dx> { typedef State state_type; }; template <int Begin, int Times, typename State, int Niv, int Dx> struct process_queens { typedef typename process_queens < Begin + 1, Times - 1, typename result_from_test < test<Niv, Begin, State>::result, State, Begin, Niv, Dx >::state_type, Niv, Dx >::state_type state_type; }; template <int Begin, typename State, int Niv, int Dx> struct process_queens<Begin, 0, State, Niv, Dx> { typedef State state_type; }; template <int Niv, int Dx, typename State = state<> > struct solve { typedef typename process_queens<0, Dx, State, Niv, Dx>::state_type state_type; static uint64_t const result = state_type::solution; }; template <int Dx, typename State> struct solve<0, Dx, State> { typedef typename accumulate_result<0, Dx, State>::state_type state_type; static uint64_t const result = state_type::solution; }; template <int Dx> struct meta_queens: solve<Dx - 1, Dx> {}; int main() { std::cout << meta_queens<8>::result; }

That’s a good stress-test for the compiler’s template instantiation mechanism, below are the compilation time for N=2 to N=8 using gcc 4.4. I simply used the time command on the shell to get the results.

N = 2: 0m0.206s

N = 3: 0m0.209s

N = 4: 0m0.216s

N = 5: 0m0.260s

N = 6: 0m0.559s

N = 7: 0m7.545s

N = 8: 5m38.229s

Well, I was expecting it to be slow but to be honest, not that slow. We’re in 2010, with libraries like boost, template metaprogramming isn’t just for academic purposes anymore. At first, I thought I might have used a buggy version of gcc so I tried another one, gcc 4.2, but same results. Then I tried the same code on Visual C++ 2008 and the latest Visual C++ 2010 which is still Beta 2 but both had similar results. Finally I tried compiling it with clang, even though I wasn’t sure it would compile my code, because I’ve been following its development for the last few months and I have been very impressed so far. As you will see with the following results, there’s still hope.

N = 2: 0m0.212s

N = 3: 0m0.214s

N = 4: 0m0.219s

N = 5: 0m0.240s

N = 6: 0m0.319s

N = 7: 0m0.653s

N = 8: 0m2.164s

N = 9: 0m9.190s

Wow, it’s like day and night, clang is a lot faster. I even tried it with N=10, it took a fantastic 8 hours and 38 minutes but that result is flawed since I don’t have enough RAM(4GB on the test machine) and the hard drive was excessively swapping. Let’s say I wouldn’t try it with gcc or cl.exe (Visual C++).

Why is it so slow with gcc and Visual C++? I had the opportunity to get an answer from Douglas Gregor, a clang developer. He said that he doesn’t know about Visual C++ but for gcc, every time we name the type *state<Cols, Diag135, Diag45, Solution>* for some values *Cols*, *Diag135*, *Diag45* and *Solution*, it does a linear search through a list of every instantiation of “state” so it makes template instantiation quadratic in the number of instantiations. In the case of clang, it uses a hash table on the template arguments, so it gets amortized O(1) lookup for instantiations.

So, that’s why clang is so fast with template instantiation, I hope we’ll see that optimization used in other compilers soon because with libraries such as boost:: phoenix/spirit/fusion/proto making extensive use of template metaprogramming, compile time is getting more important than ever. As for Clang, even if its C++ support is still incomplete, it already starts to shine and is ahead of the competition in some areas.

Keep up the good work clang developers 🙂

Hello,

Extract from the GCC page about the next version (4.5) :

“Compilation time for code that uses templates should now scale linearly with the number of instantiations rather than quadratically, as template instantiations are now looked up using hash tables”.

Did you test this GCC 4.5 version ?

No I didn’t but that’s good news 🙂