## Archive for the ‘Algorithm’ Category

### But… but… It’s still compiling!

Filed under: Algorithm, boost, C++, Metaprogramming, Programming, template |

**Comments (2)**

A few weeks ago, a friend posted an article on his blog about python being slow by comparing implementations of the 8 queens puzzle in different languages using a recursive approach. Then just for fun, I decided to code the compile-time version with C++. I was also curious about the time it would take for the compiler to complete the calculation. Here’s the code:

#include <iostream> typedef unsigned long long uint64_t; template <uint64_t Cols = 0, uint64_t Diag135 = 0, uint64_t Diag45 = 0, uint64_t Solution = 0> struct state { static uint64_t const cols = Cols; static uint64_t const diag135 = Diag135; static uint64_t const diag45 = Diag45; static uint64_t const solution = Solution; }; template <int K, int J, typename State> struct test { static bool const result = ((State::cols & (1ull << J)) + (State::diag135 & (1ull << (J + K))) + (State::diag45 & (1ull << (32 + J - K)))) == 0; }; template <int K, int J, typename State> struct mark { typedef state < State::cols ^ (1ull << J), State::diag135 ^ (1ull << (J + K)), State::diag45 ^ (1ull << (32 + J - K)), State::solution > state_type; }; template <int StartValue, int Times, typename State> struct accumulate_result { typedef typename accumulate_result < StartValue + 1, Times - 1, state < State::cols, State::diag135, State::diag45, State::solution + test<0, StartValue, State>::result > >::state_type state_type; }; template <int StartValue, typename State> struct accumulate_result<StartValue, 0, State> { typedef State state_type; }; template <int Niv, int Dx, typename State> struct solve; template <bool Condition, typename State, int Current, int Niv, int Dx> struct result_from_test { typedef typename mark < Niv, Current, typename solve < Niv - 1, Dx, typename mark<Niv, Current, State>::state_type >::state_type >::state_type state_type; }; template <typename State, int Current, int Niv, int Dx> struct result_from_test<false, State, Current, Niv, Dx> { typedef State state_type; }; template <int Begin, int Times, typename State, int Niv, int Dx> struct process_queens { typedef typename process_queens < Begin + 1, Times - 1, typename result_from_test < test<Niv, Begin, State>::result, State, Begin, Niv, Dx >::state_type, Niv, Dx >::state_type state_type; }; template <int Begin, typename State, int Niv, int Dx> struct process_queens<Begin, 0, State, Niv, Dx> { typedef State state_type; }; template <int Niv, int Dx, typename State = state<> > struct solve { typedef typename process_queens<0, Dx, State, Niv, Dx>::state_type state_type; static uint64_t const result = state_type::solution; }; template <int Dx, typename State> struct solve<0, Dx, State> { typedef typename accumulate_result<0, Dx, State>::state_type state_type; static uint64_t const result = state_type::solution; }; template <int Dx> struct meta_queens: solve<Dx - 1, Dx> {}; int main() { std::cout << meta_queens<8>::result; }

That’s a good stress-test for the compiler’s template instantiation mechanism, below are the compilation time for N=2 to N=8 using gcc 4.4. I simply used the time command on the shell to get the results.

N = 2: 0m0.206s

N = 3: 0m0.209s

N = 4: 0m0.216s

N = 5: 0m0.260s

N = 6: 0m0.559s

N = 7: 0m7.545s

N = 8: 5m38.229s

Well, I was expecting it to be slow but to be honest, not that slow. We’re in 2010, with libraries like boost, template metaprogramming isn’t just for academic purposes anymore. At first, I thought I might have used a buggy version of gcc so I tried another one, gcc 4.2, but same results. Then I tried the same code on Visual C++ 2008 and the latest Visual C++ 2010 which is still Beta 2 but both had similar results. Finally I tried compiling it with clang, even though I wasn’t sure it would compile my code, because I’ve been following its development for the last few months and I have been very impressed so far. As you will see with the following results, there’s still hope.

N = 2: 0m0.212s

N = 3: 0m0.214s

N = 4: 0m0.219s

N = 5: 0m0.240s

N = 6: 0m0.319s

N = 7: 0m0.653s

N = 8: 0m2.164s

N = 9: 0m9.190s

Wow, it’s like day and night, clang is a lot faster. I even tried it with N=10, it took a fantastic 8 hours and 38 minutes but that result is flawed since I don’t have enough RAM(4GB on the test machine) and the hard drive was excessively swapping. Let’s say I wouldn’t try it with gcc or cl.exe (Visual C++).

Why is it so slow with gcc and Visual C++? I had the opportunity to get an answer from Douglas Gregor, a clang developer. He said that he doesn’t know about Visual C++ but for gcc, every time we name the type *state<Cols, Diag135, Diag45, Solution>* for some values *Cols*, *Diag135*, *Diag45* and *Solution*, it does a linear search through a list of every instantiation of “state” so it makes template instantiation quadratic in the number of instantiations. In the case of clang, it uses a hash table on the template arguments, so it gets amortized O(1) lookup for instantiations.

So, that’s why clang is so fast with template instantiation, I hope we’ll see that optimization used in other compilers soon because with libraries such as boost:: phoenix/spirit/fusion/proto making extensive use of template metaprogramming, compile time is getting more important than ever. As for Clang, even if its C++ support is still incomplete, it already starts to shine and is ahead of the competition in some areas.

Keep up the good work clang developers 🙂