Archive for the ‘Metaprogramming’ Category

But… but… It’s still compiling!

A few weeks ago, a friend posted an article on his blog about python being slow by comparing implementations of the 8 queens puzzle in different languages using a recursive approach. Then just for fun, I decided to code the compile-time version with C++. I was also curious about the time it would take for the compiler to complete the calculation. Here’s the code:

#include <iostream>

typedef unsigned long long uint64_t;

template <uint64_t Cols = 0, uint64_t Diag135 = 0, uint64_t Diag45 = 0, uint64_t Solution = 0>
struct state
{
    static uint64_t const cols = Cols;
    static uint64_t const diag135 = Diag135;
    static uint64_t const diag45 = Diag45;
    static uint64_t const solution = Solution;
};

template <int K, int J, typename State>
struct test
{
    static bool const result = ((State::cols    & (1ull << J)) +
                                (State::diag135 & (1ull << (J + K))) +
                                (State::diag45  & (1ull << (32 + J - K)))) == 0;
};

template <int K, int J, typename State>
struct mark
{
    typedef state
            <
                State::cols    ^ (1ull << J),
                State::diag135 ^ (1ull << (J + K)),
                State::diag45  ^ (1ull << (32 + J - K)),
                State::solution
            > state_type;
};

template <int StartValue, int Times, typename State>
struct accumulate_result
{
    typedef typename
        accumulate_result
        <
            StartValue + 1,
            Times - 1,
            state
            <
                State::cols,
                State::diag135,
                State::diag45,
                State::solution + test<0, StartValue, State>::result
            >
        >::state_type state_type;
};

template <int StartValue, typename State>
struct accumulate_result<StartValue, 0, State>
{
    typedef State state_type;
};

template <int Niv, int Dx, typename State>
struct solve;

template <bool Condition, typename State, int Current, int Niv, int Dx>
struct result_from_test
{
    typedef typename
        mark
        <
            Niv,
            Current,
            typename solve
            <
                Niv - 1,
                Dx,
                typename mark<Niv, Current, State>::state_type
            >::state_type
        >::state_type state_type;
};

template <typename State, int Current, int Niv, int Dx>
struct result_from_test<false, State, Current, Niv, Dx>
{
    typedef State state_type;
};

template <int Begin, int Times, typename State, int Niv, int Dx>
struct process_queens
{
    typedef typename
        process_queens
        <
            Begin + 1,
            Times - 1,
            typename result_from_test
            <
                test<Niv, Begin, State>::result,
                State,
                Begin,
                Niv,
                Dx
            >::state_type,
            Niv,
            Dx
        >::state_type state_type;
};

template <int Begin, typename State, int Niv, int Dx>
struct process_queens<Begin, 0, State, Niv, Dx>
{
    typedef State state_type;
};

template <int Niv, int Dx, typename State = state<> >
struct solve
{
    typedef typename
        process_queens<0, Dx, State, Niv, Dx>::state_type state_type;
    static uint64_t const result = state_type::solution;
};

template <int Dx, typename State>
struct solve<0, Dx, State>
{
    typedef typename
        accumulate_result<0, Dx, State>::state_type state_type;
    static uint64_t const result = state_type::solution;
};

template <int Dx>
struct meta_queens:
    solve<Dx - 1, Dx>
{};

int main()
{
    std::cout << meta_queens<8>::result;
}

That’s a good stress-test for the compiler’s template instantiation mechanism, below are the compilation time for N=2 to N=8 using gcc 4.4. I simply used the time command on the shell to get the results.

N = 2: 0m0.206s
N = 3: 0m0.209s
N = 4: 0m0.216s
N = 5: 0m0.260s
N = 6: 0m0.559s
N = 7: 0m7.545s
N = 8: 5m38.229s

Well, I was expecting it to be slow but to be honest, not that slow. We’re in 2010, with libraries like boost, template metaprogramming isn’t just for academic purposes anymore. At first, I thought I might have used a buggy version of gcc so I tried another one, gcc 4.2, but same results. Then I tried the same code on Visual C++ 2008 and the latest Visual C++ 2010 which is still Beta 2 but both had similar results. Finally I tried compiling it with clang, even though I wasn’t sure it would compile my code, because I’ve been following its development for the last few months and I have been very impressed so far. As you will see with the following results, there’s still hope.

N = 2: 0m0.212s
N = 3: 0m0.214s
N = 4: 0m0.219s
N = 5: 0m0.240s
N = 6: 0m0.319s
N = 7: 0m0.653s
N = 8: 0m2.164s
N = 9: 0m9.190s

Wow, it’s like day and night, clang is a lot faster. I even tried it with N=10, it took a fantastic 8 hours and 38 minutes but that result is flawed since I don’t have enough RAM(4GB on the test machine) and the hard drive was excessively swapping. Let’s say I wouldn’t try it with gcc or cl.exe (Visual C++).

Why is it so slow with gcc and Visual C++? I had the opportunity to get an answer from Douglas Gregor, a clang developer. He said that he doesn’t know about Visual C++ but for gcc, every time we name the type state<Cols, Diag135, Diag45, Solution> for some values Cols, Diag135, Diag45 and Solution, it does a linear search through a list of every instantiation of “state” so it makes template instantiation quadratic in the number of instantiations. In the case of clang, it uses a hash table on the template arguments, so it gets amortized O(1) lookup for instantiations.

So, that’s why clang is so fast with template instantiation, I hope we’ll see that optimization used in other compilers soon because with libraries such as boost:: phoenix/spirit/fusion/proto making extensive use of template metaprogramming, compile time is getting more important than ever.  As for Clang, even if its C++ support is still incomplete, it already starts to shine and is ahead of the competition in some areas.

Keep up the good work clang developers 🙂

Advertisements

Understanding SFINAE

The SFINAE Gnomes

What is SFINAE and why do we care? SFINAE(Substitution Failure Is Not An Error) is an acronym that was first introduced by David Vandevoorde to describe the compiler’s process of eliding from the overload resolution set any template where a type substitution error would occur. Let’s take  a look at the following program I shamelessly stole from boost’s documentation:

int negate(int i){ return -i; }
template <typename T>
typename T::result_type negate(T const &t)
{
    return -t();
}

Let’s suppose you call negate(10), even though the first negate would be a good match, the compiler has to also take into consideration the second templated version and try to instantiate it with int, generating the following code:

int::result_type negate(int const &t)
{
    return -t();
}

Without SFINAE, the generated code would cause a compiler error since “int” doesn’t have a “result_type” member but fortunately we do, so the problematic overload is silently ignored by the compiler. As you will see, there are a few interesting and unexpected ways to use SFINAE. In the following example, we’ll check if a type is a container by checking if it has a iterator member with SFINAE via a technique called “the sizeof trick”.

First we need a type of some size, let’s take char:

typedef char true_type;

Then we need a type of some other size, it could be anything as long as its size is greater than the other we chosen before. An easy way to ensure it is to simply make a structure containing an array of more than one of the other type.

struct false_type{ true_type _[2]; };

Now, we need a function with two overloads, one that will be called if the type passed contains an iterator, returning our true_type and another one that will be called when it doesn’t, obviously returning our false_type.

template <typename T>
true_type has_iterator_checker(typename T::iterator *);

template <typename T>
false_type has_iterator_checker(...);

In the preceding code snippet, our good match is a function taking a iterator pointer as parameter and returning our true_type. Our worst match is the last match possible by the C++ conversion rules, the evil ellipsis, returning our false_type. As you probably already remarked, you just need the function signatures because those functions are actually never going to be called since everything is done at compilation. Now, it’s time to assemble the parts of the puzzle:

#include <iostream>
#include <vector>

typedef char true_type;
struct false_type{ true_type _[2]; };

template <typename T>
true_type has_iterator_checker(typename T::iterator *);

template <typename T>
false_type has_iterator_checker(...);

#define IS_CONTAINER(x) (sizeof(has_iterator_checker<x >(0)) == sizeof(true_type))

int main()
{
    std::cout << "Is container (int): "
              << IS_CONTAINER(int) << std::endl;
    std::cout << "Is container (std::vector<int>): "
              << IS_CONTAINER(std::vector<int>) << std::endl;
}

The only thing we haven’t seen yet in this code is the IS_CONTAINER macro. It compares the size of the type returned by the most compatible overload chosen by the compiler to our true_type. If the chosen overload returns true_type then our type is a container. We can now check if a type is a container at compile time. Our current solution is nice and functional but could really use better packaging. I’ll spare you the details and simply paste the code below:

template <typename T>
class is_container
{
    typedef char true_type;
    struct false_type{ true_type _[2]; };
    template <typename U>
    static true_type has_iterator_checker(typename U::iterator *);
    template <typename U>
    static false_type has_iterator_checker(...);
public:
    enum { value = (sizeof(has_iterator_checker<T>(0)) ==
                    sizeof(true_type)) };
};

Hurray! No more macros and we can easily check if a type is a container by simply doing is_container<my_type>::value. We’re now the proud owners of a nice little is_container class and of a bit more knowledge about SFINAE. Time to put that class to good use, but before, we will need another SFINAE trick to enable or disable code based on a condition. A very simple code that opens up a lot of possibilities:

template <bool Cond, class T = void>
struct enable_if
{
    typedef T type;
};

template <class T>
struct enable_if<false, T>
{};

To help explaining how it works, here’s a small usage example:

template <typename T>
typename enable_if<some_boolean_condition>::type
    foo(T const &t)
{
    std::cout << t << std::endl;
}

The enable_if template takes two parameters, a boolean condition and a return type, which is “void” by default, that is simply the one the function would return if we didn’t use the template. What’s important in that template and where the magic happens is the ::type part. If “some_boolean_condition” is true, then ::type returns the return type specified and the function is instantiated normally. If “some_boolean_condition” is false, the specialization enable_if<false, T> will be used and since it doesn’t have a ::type member, a substitution failure will occur causing the compiler to silently ignore the function.

With these new SFINAE based tools in our toolbox, we can easily make static dispatching based on the property of a type as you will see in the next example:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

/*enable_if and is_container should be here*/

template <typename T>
typename enable_if<!is_container<T>::value>::type
    super_print(T const &t)
{
    std::cout << t << std::endl;
}

template <typename T>
typename enable_if<is_container<T>::value>::type
    super_print(T const &t)
{
    typedef typename T::value_type value_type;
    std::copy(t.begin(),
              t.end(),
              std::ostream_iterator<value_type>(std::cout, ", "));
    std::cout << std::endl;
}

int main()
{
    super_print(10);
    std::vector<int> b;
    b.push_back(1);
    b.push_back(2);
    b.push_back(3);
    super_print(b);
}

That simple code resumes everything we learned so far to allow easy printing of variables and containers. Without using enable_if, there would be an ambiguity since they both have the signature “void(T const &)” but having enable_if, we can enable or disable either template overload depending on the fact that it contains an “iterator” member or not, solving the issue.

That’s it for today, as you saw, SFINAE is yet another of C++’s hidden powers giving you the opportunity to make things that the language wasn’t designed for.