## Archive for the ‘Algorithm’ Category

### But… but… It’s still compiling!

A few weeks ago, a friend posted an article on his blog about python being slow by comparing implementations of the 8 queens puzzle in different languages using a recursive approach. Then just for fun, I decided to code the compile-time version with C++. I was also curious about the time it would take for the compiler to complete the calculation. Here’s the code:

```#include <iostream>

typedef unsigned long long uint64_t;

template <uint64_t Cols = 0, uint64_t Diag135 = 0, uint64_t Diag45 = 0, uint64_t Solution = 0>
struct state
{
static uint64_t const cols = Cols;
static uint64_t const diag135 = Diag135;
static uint64_t const diag45 = Diag45;
static uint64_t const solution = Solution;
};

template <int K, int J, typename State>
struct test
{
static bool const result = ((State::cols    & (1ull << J)) +
(State::diag135 & (1ull << (J + K))) +
(State::diag45  & (1ull << (32 + J - K)))) == 0;
};

template <int K, int J, typename State>
struct mark
{
typedef state
<
State::cols    ^ (1ull << J),
State::diag135 ^ (1ull << (J + K)),
State::diag45  ^ (1ull << (32 + J - K)),
State::solution
> state_type;
};

template <int StartValue, int Times, typename State>
struct accumulate_result
{
typedef typename
accumulate_result
<
StartValue + 1,
Times - 1,
state
<
State::cols,
State::diag135,
State::diag45,
State::solution + test<0, StartValue, State>::result
>
>::state_type state_type;
};

template <int StartValue, typename State>
struct accumulate_result<StartValue, 0, State>
{
typedef State state_type;
};

template <int Niv, int Dx, typename State>
struct solve;

template <bool Condition, typename State, int Current, int Niv, int Dx>
struct result_from_test
{
typedef typename
mark
<
Niv,
Current,
typename solve
<
Niv - 1,
Dx,
typename mark<Niv, Current, State>::state_type
>::state_type
>::state_type state_type;
};

template <typename State, int Current, int Niv, int Dx>
struct result_from_test<false, State, Current, Niv, Dx>
{
typedef State state_type;
};

template <int Begin, int Times, typename State, int Niv, int Dx>
struct process_queens
{
typedef typename
process_queens
<
Begin + 1,
Times - 1,
typename result_from_test
<
test<Niv, Begin, State>::result,
State,
Begin,
Niv,
Dx
>::state_type,
Niv,
Dx
>::state_type state_type;
};

template <int Begin, typename State, int Niv, int Dx>
struct process_queens<Begin, 0, State, Niv, Dx>
{
typedef State state_type;
};

template <int Niv, int Dx, typename State = state<> >
struct solve
{
typedef typename
process_queens<0, Dx, State, Niv, Dx>::state_type state_type;
static uint64_t const result = state_type::solution;
};

template <int Dx, typename State>
struct solve<0, Dx, State>
{
typedef typename
accumulate_result<0, Dx, State>::state_type state_type;
static uint64_t const result = state_type::solution;
};

template <int Dx>
struct meta_queens:
solve<Dx - 1, Dx>
{};

int main()
{
std::cout << meta_queens<8>::result;
}
```

That’s a good stress-test for the compiler’s template instantiation mechanism, below are the compilation time for N=2 to N=8 using gcc 4.4. I simply used the time command on the shell to get the results.

N = 2: 0m0.206s
N = 3: 0m0.209s
N = 4: 0m0.216s
N = 5: 0m0.260s
N = 6: 0m0.559s
N = 7: 0m7.545s
N = 8: 5m38.229s

Well, I was expecting it to be slow but to be honest, not that slow. We’re in 2010, with libraries like boost, template metaprogramming isn’t just for academic purposes anymore. At first, I thought I might have used a buggy version of gcc so I tried another one, gcc 4.2, but same results. Then I tried the same code on Visual C++ 2008 and the latest Visual C++ 2010 which is still Beta 2 but both had similar results. Finally I tried compiling it with clang, even though I wasn’t sure it would compile my code, because I’ve been following its development for the last few months and I have been very impressed so far. As you will see with the following results, there’s still hope.

N = 2: 0m0.212s
N = 3: 0m0.214s
N = 4: 0m0.219s
N = 5: 0m0.240s
N = 6: 0m0.319s
N = 7: 0m0.653s
N = 8: 0m2.164s
N = 9: 0m9.190s

Wow, it’s like day and night, clang is a lot faster. I even tried it with N=10, it took a fantastic 8 hours and 38 minutes but that result is flawed since I don’t have enough RAM(4GB on the test machine) and the hard drive was excessively swapping. Let’s say I wouldn’t try it with gcc or cl.exe (Visual C++).

Why is it so slow with gcc and Visual C++? I had the opportunity to get an answer from Douglas Gregor, a clang developer. He said that he doesn’t know about Visual C++ but for gcc, every time we name the type state<Cols, Diag135, Diag45, Solution> for some values Cols, Diag135, Diag45 and Solution, it does a linear search through a list of every instantiation of “state” so it makes template instantiation quadratic in the number of instantiations. In the case of clang, it uses a hash table on the template arguments, so it gets amortized O(1) lookup for instantiations.

So, that’s why clang is so fast with template instantiation, I hope we’ll see that optimization used in other compilers soon because with libraries such as boost:: phoenix/spirit/fusion/proto making extensive use of template metaprogramming, compile time is getting more important than ever.  As for Clang, even if its C++ support is still incomplete, it already starts to shine and is ahead of the competition in some areas.

Keep up the good work clang developers