But… but… It’s still compiling!

A few weeks ago, a friend posted an article on his blog about Python being slow, comparing implementations of the 8 queens puzzle in different languages using a recursive approach. Just for fun, I decided to write a compile-time version in C++. I was also curious how long the compiler would take to complete the calculation. Here’s the code:

#include <iostream>

typedef unsigned long long uint64_t;

template <uint64_t Cols = 0, uint64_t Diag135 = 0, uint64_t Diag45 = 0, uint64_t Solution = 0>
struct state
{
    static uint64_t const cols = Cols;
    static uint64_t const diag135 = Diag135;
    static uint64_t const diag45 = Diag45;
    static uint64_t const solution = Solution;
};

template <int K, int J, typename State>
struct test
{
    static bool const result = ((State::cols    & (1ull << J)) +
                                (State::diag135 & (1ull << (J + K))) +
                                (State::diag45  & (1ull << (32 + J - K)))) == 0;
};

template <int K, int J, typename State>
struct mark
{
    typedef state
            <
                State::cols    ^ (1ull << J),
                State::diag135 ^ (1ull << (J + K)),
                State::diag45  ^ (1ull << (32 + J - K)),
                State::solution
            > state_type;
};

template <int StartValue, int Times, typename State>
struct accumulate_result
{
    typedef typename
        accumulate_result
        <
            StartValue + 1,
            Times - 1,
            state
            <
                State::cols,
                State::diag135,
                State::diag45,
                State::solution + test<0, StartValue, State>::result
            >
        >::state_type state_type;
};

template <int StartValue, typename State>
struct accumulate_result<StartValue, 0, State>
{
    typedef State state_type;
};

template <int Niv, int Dx, typename State>
struct solve;

template <bool Condition, typename State, int Current, int Niv, int Dx>
struct result_from_test
{
    typedef typename
        mark
        <
            Niv,
            Current,
            typename solve
            <
                Niv - 1,
                Dx,
                typename mark<Niv, Current, State>::state_type
            >::state_type
        >::state_type state_type;
};

template <typename State, int Current, int Niv, int Dx>
struct result_from_test<false, State, Current, Niv, Dx>
{
    typedef State state_type;
};

template <int Begin, int Times, typename State, int Niv, int Dx>
struct process_queens
{
    typedef typename
        process_queens
        <
            Begin + 1,
            Times - 1,
            typename result_from_test
            <
                test<Niv, Begin, State>::result,
                State,
                Begin,
                Niv,
                Dx
            >::state_type,
            Niv,
            Dx
        >::state_type state_type;
};

template <int Begin, typename State, int Niv, int Dx>
struct process_queens<Begin, 0, State, Niv, Dx>
{
    typedef State state_type;
};

template <int Niv, int Dx, typename State = state<> >
struct solve
{
    typedef typename
        process_queens<0, Dx, State, Niv, Dx>::state_type state_type;
    static uint64_t const result = state_type::solution;
};

template <int Dx, typename State>
struct solve<0, Dx, State>
{
    typedef typename
        accumulate_result<0, Dx, State>::state_type state_type;
    static uint64_t const result = state_type::solution;
};

template <int Dx>
struct meta_queens:
    solve<Dx - 1, Dx>
{};

int main()
{
    std::cout << meta_queens<8>::result;
}
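
Before looking at the timings, here is a plain runtime translation of the same bitmask recursion the metaprogram encodes; it may help when reading the template version above. The function and variable names below are mine, not part of the original program.

#include <iostream>

//Count the solutions by trying every column of the current row (niv), marking
//the column and both diagonals, and recursing into the next row.  Passing the
//bitmasks by value makes the "unmark" step of the metaprogram implicit.
unsigned long long solve(int niv, int n,
                         unsigned long long cols,
                         unsigned long long diag135,
                         unsigned long long diag45)
{
    unsigned long long solutions = 0;
    for (int j = 0; j < n; ++j)
    {
        bool is_free = ((cols    & (1ull << j)) |
                        (diag135 & (1ull << (j + niv))) |
                        (diag45  & (1ull << (32 + j - niv)))) == 0;
        if (!is_free)
            continue;
        if (niv == 0)
            ++solutions; //last row reached: one complete placement
        else
            solutions += solve(niv - 1, n,
                               cols    ^ (1ull << j),
                               diag135 ^ (1ull << (j + niv)),
                               diag45  ^ (1ull << (32 + j - niv)));
    }
    return solutions;
}

int main()
{
    std::cout << solve(7, 8, 0, 0, 0) << std::endl; //prints 92 for N = 8
}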

The metaprogram is a good stress test for the compiler’s template instantiation mechanism. Below are the compilation times for N = 2 to N = 8 using gcc 4.4; I simply used the time command in the shell to get the results.

N = 2: 0m0.206s
N = 3: 0m0.209s
N = 4: 0m0.216s
N = 5: 0m0.260s
N = 6: 0m0.559s
N = 7: 0m7.545s
N = 8: 5m38.229s

Well, I was expecting it to be slow but, to be honest, not that slow. We’re in 2010; with libraries like boost, template metaprogramming isn’t just for academic purposes anymore. At first I thought I might have used a buggy version of gcc, so I tried another one, gcc 4.2, with the same results. Then I tried the same code on Visual C++ 2008 and the latest Visual C++ 2010 (still in Beta 2), but both gave similar results. Finally I tried compiling it with clang, even though I wasn’t sure it would accept my code, because I’ve been following its development for the last few months and have been very impressed so far. As you will see from the following results, there’s still hope.

N = 2: 0m0.212s
N = 3: 0m0.214s
N = 4: 0m0.219s
N = 5: 0m0.240s
N = 6: 0m0.319s
N = 7: 0m0.653s
N = 8: 0m2.164s
N = 9: 0m9.190s

Wow, it’s like night and day: clang is a lot faster. I even tried N = 10; it took a staggering 8 hours and 38 minutes, but that result is flawed since I don’t have enough RAM (4 GB on the test machine) and the hard drive was swapping excessively. Let’s just say I wouldn’t try it with gcc or cl.exe (Visual C++).

Why is it so slow with gcc and Visual C++? I had the opportunity to get an answer from Douglas Gregor, a clang developer. He said he doesn’t know about Visual C++, but in gcc, every time we name the type state<Cols, Diag135, Diag45, Solution> for some values of Cols, Diag135, Diag45 and Solution, the compiler does a linear search through a list of every instantiation of “state”, which makes template instantiation quadratic in the number of instantiations. Clang, on the other hand, uses a hash table keyed on the template arguments, so it gets amortized O(1) lookup for instantiations.

So that’s why clang is so fast at template instantiation. I hope we’ll see that optimization in other compilers soon, because with libraries such as boost::phoenix/spirit/fusion/proto making extensive use of template metaprogramming, compile time matters more than ever. As for clang, even if its C++ support is still incomplete, it is already starting to shine and is ahead of the competition in some areas.

Keep up the good work clang developers :)

Is std::string’s storage contiguous?

Take a look at the following code:

#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>

/* Dummy recv function */
size_t recv(int socket, void *buffer, size_t length, int flags)
{
    std::memcpy(buffer, "Some stuff", 10);
    return 10;
}

int main()
{
    //Create a string with space for some characters
    std::string x(64, char());

    std::cout << "x.size() = " << x.size() << ", x.capacity() = "
              << x.capacity() << std::endl;

    //filling the string with data assuming that &x[0]
    //is pointing at the beginning of a contiguous array
    std::size_t bytes_read = recv(0, &x[0], x.size(), 0);

    //Using the good old swap trick to free excess space
    std::string(&x[0], bytes_read).swap(x);

    std::cout << "x.size() = " << x.size() << ", x.capacity() = "
              << x.capacity() << std::endl;
    std::cout << x << std::endl;
}

Is that code… valid? I always thought it wasn’t, since I’d heard lots of people tell me that the storage of a std::string isn’t guaranteed to be contiguous (I’ve yet to see an implementation where it isn’t), so the preceding code could cause undefined behavior. A few days ago I was on Freenode’s ##C++ when someone asked that same question. I was giving him the classical answer… when another user, “Tinodidriksen”, replied that I was wrong. What?!? What if he’s right? I had never actually checked the standard on this; I only took the usual answer as the truth. Let’s see what the standard says about it.

1) basic_string constructor requirements (see tables 38-43 in the 14882:2003 standard):
Excerpt from table 39: “data() points at the first element of an allocated copy of rlen consecutive elements of the string controlled by str beginning at position pos”, rlen being equal to size() according to the same table.

2) 21.3.4 paragraph 1, basic_string indexed access:
Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT().

3) 21.3.6 paragraph 4, const charT* data() const:
Requires: The program shall not alter any of the values stored in the character array.

Well, these are the three relevant excerpts I found. The first guarantees that the storage is contiguous, because it is allocated as a single block of rlen consecutive elements. The second shows that when you modify the string through the index operator, you modify the buffer pointed to by data(). (In that sentence, data() most likely refers to the pointer to the underlying storage rather than to the data() member function itself, since calling the member function would force the implementor into an ugly const_cast and would violate the third point.)

My conclusion: the standard does not explicitly say “std::basic_string must have contiguous storage” the way it does for std::vector (ISO/IEC 14882:2003 only), but there are enough constraints to conclude that it can’t be otherwise. If I missed something or you have a different opinion, I’d love to hear it.
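
If you would rather not rely on this reading at all, reading into a std::vector<char> first sidesteps the question entirely, since vector’s contiguity is spelled out explicitly. Here is a minimal sketch reusing the dummy recv from above:

#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

/* Same dummy recv function as above */
std::size_t recv(int, void *buffer, std::size_t, int)
{
    std::memcpy(buffer, "Some stuff", 10);
    return 10;
}

int main()
{
    //A vector's storage is explicitly guaranteed to be contiguous,
    //so reading into &buffer[0] raises no questions at all.
    std::vector<char> buffer(64);
    std::size_t bytes_read = recv(0, &buffer[0], buffer.size(), 0);

    //Build the string from the filled part of the buffer only.
    std::string x(&buffer[0], bytes_read);
    std::cout << x << std::endl;
}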

Understanding SFINAE

The SFINAE Gnomes

What is SFINAE and why do we care? SFINAE (Substitution Failure Is Not An Error) is an acronym first introduced by David Vandevoorde to describe the compiler’s practice of removing from the overload resolution set any template for which a type substitution error would occur. Let’s take a look at the following example, which I shamelessly stole from boost’s documentation:

int negate(int i){ return -i; }
template <typename T>
typename T::result_type negate(T const &t)
{
    return -t();
}

Suppose you call negate(10). Even though the first negate would be a perfect match, the compiler also has to consider the second, templated version and try to instantiate it with int, which would produce the following code:

int::result_type negate(int const &t)
{
    return -t();
}

Without SFINAE, that substitution would cause a compile error, since “int” doesn’t have a “result_type” member; fortunately we do have SFINAE, so the problematic overload is silently ignored by the compiler. As you will see, there are a few interesting and unexpected ways to use SFINAE. In the following example, we’ll check whether a type is a container by detecting an iterator member with SFINAE, via a technique called “the sizeof trick”.
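
(Before diving into that, here is the negate example as a complete program you can compile. The little functor type “seven” is my own addition, just so the template overload has something to match.)

#include <iostream>

int negate(int i){ return -i; }

template <typename T>
typename T::result_type negate(T const &t)
{
    return -t();
}

//A small functor type of my own, so the template overload has a valid match.
struct seven
{
    typedef int result_type;
    int operator()() const { return 7; }
};

int main()
{
    //The template candidate fails substitution (int::result_type) and is
    //silently dropped, so the plain negate(int) is called.
    std::cout << negate(10) << std::endl;
    //Here only the template overload is viable; it prints -7.
    std::cout << negate(seven()) << std::endl;
}

Now, back to the container check.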

First we need a type of some size; let’s take char:

typedef char true_type;

Then we need a type of a different size; it could be anything as long as its size is greater than that of the type we chose before. An easy way to ensure that is to make a structure containing an array of more than one of the first type.

struct false_type{ true_type _[2]; };

Now we need a function with two overloads: one that will be selected if the type passed has an iterator member, returning our true_type, and another that will be selected when it doesn’t, returning our false_type.

template <typename T>
true_type has_iterator_checker(typename T::iterator *);

template <typename T>
false_type has_iterator_checker(...);

In the preceding snippet, the good match is a function taking an iterator pointer as parameter and returning our true_type. The worst match is the last resort allowed by the C++ conversion rules, the evil ellipsis, returning our false_type. As you have probably noticed, only the function declarations are needed, because these functions are never actually called: everything happens at compile time. Now it’s time to assemble the pieces of the puzzle:

#include <iostream>
#include <vector>

typedef char true_type;
struct false_type{ true_type _[2]; };

template <typename T>
true_type has_iterator_checker(typename T::iterator *);

template <typename T>
false_type has_iterator_checker(...);

#define IS_CONTAINER(x) (sizeof(has_iterator_checker<x >(0)) == sizeof(true_type))

int main()
{
    std::cout << "Is container (int): "
              << IS_CONTAINER(int) << std::endl;
    std::cout << "Is container (std::vector<int>): "
              << IS_CONTAINER(std::vector<int>) << std::endl;
}

The only thing we haven’t seen yet in this code is the IS_CONTAINER macro. It compares the size of the type returned by the overload the compiler selects against the size of our true_type: if the chosen overload returns true_type, then our type is a container. We can now check whether a type is a container at compile time. Our current solution is nice and functional, but it could really use better packaging. I’ll spare you the details and simply paste the code below:

template <typename T>
class is_container
{
    typedef char true_type;
    struct false_type{ true_type _[2]; };
    template <typename U>
    static true_type has_iterator_checker(typename U::iterator *);
    template <typename U>
    static false_type has_iterator_checker(...);
public:
    enum { value = (sizeof(has_iterator_checker<T>(0)) ==
                    sizeof(true_type)) };
};

Hurray! No more macros, and we can easily check whether a type is a container simply by writing is_container<my_type>::value. We’re now the proud owners of a nice little is_container class and of a bit more knowledge about SFINAE. Time to put that class to good use, but before that we’ll need another SFINAE trick to enable or disable code based on a condition. A very simple piece of code that opens up a lot of possibilities:

template <bool Cond, class T = void>
struct enable_if
{
    typedef T type;
};

template <class T>
struct enable_if<false, T>
{};

To help explain how it works, here’s a small usage example:

template <typename T>
typename enable_if<some_boolean_condition>::type
    foo(T const &t)
{
    std::cout << t << std::endl;
}

The enable_if template takes two parameters: a boolean condition and a type, “void” by default, which is simply the type the function would return if we weren’t using the template. What matters in this template, and where the magic happens, is the ::type part. If “some_boolean_condition” is true, ::type is the return type we specified and the function is instantiated normally. If “some_boolean_condition” is false, the specialization enable_if<false, T> is used, and since it has no ::type member, a substitution failure occurs and the compiler silently ignores the function.
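
To see both branches in action, here is a small self-contained sketch; the print_if_small function and its size-based condition are my own, purely for illustration:

#include <iostream>

/* enable_if from above goes here */

//Enabled only for types whose size is at most 4 bytes (an arbitrary condition).
template <typename T>
typename enable_if<(sizeof(T) <= 4)>::type print_if_small(T const &t)
{
    std::cout << t << std::endl;
}

int main()
{
    print_if_small(42);     //fine: sizeof(int) is 4 on common platforms
    //print_if_small(3.14); //error: no matching function; the overload was
                            //removed by SFINAE because sizeof(double) > 4
}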

With these new SFINAE-based tools in our toolbox, we can easily do static dispatch based on a property of a type, as you will see in the next example:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

/*enable_if and is_container should be here*/

template <typename T>
typename enable_if<!is_container<T>::value>::type
    super_print(T const &t)
{
    std::cout << t << std::endl;
}

template <typename T>
typename enable_if<is_container<T>::value>::type
    super_print(T const &t)
{
    typedef typename T::value_type value_type;
    std::copy(t.begin(),
              t.end(),
              std::ostream_iterator<value_type>(std::cout, ", "));
    std::cout << std::endl;
}

int main()
{
    super_print(10);
    std::vector<int> b;
    b.push_back(1);
    b.push_back(2);
    b.push_back(3);
    super_print(b);
}

That simple example brings together everything we have learned so far to allow easy printing of both plain variables and containers. Without enable_if there would be an ambiguity, since both overloads have the signature “void(T const &)”; with enable_if we can enable or disable either template overload depending on whether the type has an “iterator” member, which solves the issue.

That’s it for today. As you saw, SFINAE is yet another of C++’s hidden powers, giving you the opportunity to do things the language wasn’t really designed for.

Clang, a better compiler for C and its derivatives.

If there’s a project that has caught my interest lately, it’s Clang, a front-end for the LLVM compiler created as a “drop-in” replacement for gcc. It’s a brand new, state-of-the-art, BSD-licensed open-source compiler that is a lot faster and optimizes more aggressively while using far less memory than the competition. For now, only the C (yes, at last, a C99 compiler) and Objective-C support is complete and considered “production ready”. C++ support is still partial, but the situation is improving at a very fast pace. See http://clang.llvm.org/cxx_status.html for more information.

Having a compiler that produces good, optimized code is always welcome, but… is there anything in it for the programmer? Yes: clang gives the programmer clearer, better diagnostic messages, and it ships a static analyzer on top of that. It is also easier to integrate into IDEs and is already doing wonders in Apple’s Xcode 3.2. Let’s see what it can do.

For a short example, see this ill-formed code:

#include <stdio.h>

typedef const char *LPCTSTR; /*Let's say they're ugly          */
typedef char *LPSTR;         /*typedefs from some other header */

void print(LPSTR toprint)
{
    printf("%s", toprint)    /*Omitted ';' on purpose*/
}

int main(void)
{
    LPCTSTR a = "Hello";
    LPCTSTR b = "World";
    LPCTSTR c = a + b; /*cannot concatenate strings with operator +*/
    print(c); /*Passing a const char * to a function taking a char * */
    return 0;
}

Let’s compare clang to gcc and cl (Visual C++). All three compilers detect the two errors and the warning, but let’s see how good their diagnostics are:

First error: Semi-colon missing at the end of line 8

GCC:
test.c: In function ‘print’:
test.c:9: error: expected ‘;’ before ‘}’ token
CL:
...\test.c(9) : error C2143: syntax error : missing ';' before '}'
CLang:
test.c:8:26: error: expected ';' after expression
    printf("%s", toprint)
                         ^
                         ;

Both gcc’s and cl’s diagnostics are wrong: the semicolon should be placed at the end of line 8. Clang, on the other hand, found that a semicolon is missing at the 26th character of line 8 and even points at it with ^, a really nice improvement.

Second error: You can’t add two LPCTSTRs (const char * in disguise)

GCC:
test.c: In function ‘main’:
test.c:15: error: invalid operands to binary + (have ‘LPCTSTR’ and
           ‘LPCTSTR’)
CL:
...\test.c(15) : error C2110: '+' : cannot add two pointers
CLang:
test.c:15:19: error: invalid operands to binary expression
              ('LPCTSTR' (aka 'char const *') and 'LPCTSTR'
              (aka 'char const *'))
    LPCTSTR c = a + b;
                ~ ^ ~

All three compilers find the error, but the diagnostics from gcc and cl are not really helpful: they tell us we cannot add two LPCTSTRs but give no clue as to why. Clang’s message says much the same thing, but it also shows us that LPCTSTRs are simply char pointers in disguise and points at the exact source of the problem.

Third error: Passing a const char * to a function taking a char * could be dangerous

GCC:
test.c:16: warning: passing argument 1 of ‘print’ discards
           qualifiers from pointer target type
test.c:6: note: expected ‘LPSTR’ but argument is of type ‘LPCTSTR’
CL:
.../test.c(16) : warning C4090: 'function' : different 'const' qualifiers
CLang:
test.c:16:11: warning: passing 'LPCTSTR' (aka 'char const *') discards
              qualifiers, expected 'LPSTR' (aka 'char *')
    print(c);
          ^

Here gcc’s diagnostic is just plain vague; the fact that it mentions “pointer target type” hints that LPCTSTR might be a pointer, but const is hardly the only possible qualifier in C. cl does a bit better by telling us it’s a constness issue, but again clang is clearly the winner, letting us know that LPCTSTR is in fact a char const * and showing us where the problem is.

Some conservative “elitists” will probably say that gcc’s or cl’s diagnostics are fine the way they are, but the way I see it there’s always room for improvement, and if a tool like clang can increase my productivity while generating fast, optimized code then… why not? I encourage everyone to at least give it a try (for C and Objective-C) and even to contribute by improving its C++ support.

http://clang.llvm.org

C++ and functional programming idioms

If you’re curious like me, you’ve probably ventured at least once into the scary and mind-bending world of functional programming, come back, and told yourself: “It would be nice if I could do this or that in C++”. FP languages have been around for decades, but only recently have we started to see some of their techniques adopted in classical imperative languages: higher-order functions, closures/lambda functions, currying and lazy evaluation. For example, JavaScript has supported closures since version 1.7 and C# since 3.0.

Seeing how useful these techniques are, it’s natural to want them in our favorite programming language. We’re already doing a bit of FP without knowing it, thanks to the standard library algorithms: many of them take functors/predicates as arguments, so they mimic the behavior of higher-order functions fairly well. Beyond that, C++ has no built-in support for other idioms like lambda functions or closures, but we can achieve similar effects thanks to the lazy nature of templates and a technique known as “expression templates”. More on that technique in a future post…

To demonstrate my point, let’s take a small program that takes a string as input and returns its most frequent character. In the old classical C++ way, it could be implemented as follows:

#include <iostream>
#include <locale>
#include <map>
#include <string>

namespace
{

    char most_frequent_letter(const std::string &str)
    {
        typedef std::map<char, unsigned int> char_counts_t;

        char_counts_t char_counts;

        for(std::string::const_iterator itr = str.begin();
                itr != str.end(); ++itr)
            if(std::isalpha(*itr, std::locale()))
                ++char_counts[*itr];

        for(char_counts_t::const_iterator itr = char_counts.begin();
                itr != char_counts.end(); ++itr)
            std::cout << itr->first << " => " << itr->second << std::endl;

        if(!char_counts.empty())
        {
            char_counts_t::const_iterator highest_count = char_counts.begin();
            for(char_counts_t::const_iterator itr = ++char_counts.begin();
                    itr != char_counts.end(); ++itr)
                if(itr->second > highest_count->second)
                    highest_count = itr;
            return highest_count->first;
        }
        return ' ';
    }

}

int main(int argc, char *argv[])
{
    if(argc > 1)
    {
        std::string some_string = argv[1];
        std::cout << "The string is: " << some_string << "\n" << std::endl;
        std::cout << "The most frequent letter is: " <<
            most_frequent_letter(some_string) << std::endl;
    }
    else
        std::cout << "Usage: " << argv[0] << " <string>" << std::endl;
}

So far so good; it works and does the job. We put the characters into a map, using the character as the key and the count as the value, then print the contents of the map and finally iterate through it to find the character with the highest count. The problems with this code are that we’re reinventing parts already in the standard library and that the code lacks expressiveness. Let’s see how it could look if we used the standard algorithms.

//Same includes and main() as before, plus <algorithm> for std::for_each
//and std::max_element.
namespace
{

    template <typename map_t>
    struct map_filler
    {
        typedef void result_type;

        map_filler(map_t &map):
            map_(map)
        {
        }
        template <typename T>
        result_type operator()(const T &t) const
        {
            if(std::isalpha(t, std::locale()))
                ++map_[t];
        }
    private:
        map_t &map_;
    };

    struct pair_printer
    {
        typedef void result_type;

        template <typename pair_t>
        result_type operator()(const pair_t &pair) const
        {
            std::cout << pair.first << " => " << pair.second << std::endl;
        }
    };

    struct pair_value_comparer
    {
        typedef bool result_type;

        template <typename pair_t>
        result_type operator()(const pair_t &a, const pair_t &b)
        {
            return a.second < b.second;
        }
    };

    char most_frequent_letter(const std::string &str)
    {
        typedef std::map<char, unsigned int> char_counts_t;

        char_counts_t char_counts;

        std::for_each(str.begin(), str.end(),
                map_filler<char_counts_t>(char_counts));

        std::for_each(char_counts.begin(), char_counts.end(),
                pair_printer());

        char_counts_t::const_iterator result = std::max_element(
                char_counts.begin(),
                char_counts.end(),
                pair_value_comparer());

        return (result != char_counts.end()) ? result->first : ' ';
    }

}

Hmm… Okay… Our “most_frequent_letter” function is now using the standard library algorithms. That does make the function clearer and much more expressive, but at the cost of around 40 lines of “support code”, which in our case are functors. Even thinking in terms of reusability, the chance of needing that exact support code again is small, if not nonexistent. What would we do in an FP language? Use a small lambda function/closure instead. For this example I’m going to use boost::phoenix 2.0, an efficient FP library that is part of boost, which in my opinion is the best general, multi-purpose C++ library and a must-have for any serious C++ programmer. Let’s see what phoenix can do:

namespace
{

	namespace phx = boost::phoenix;
	using namespace phx::arg_names;
	using namespace phx::local_names;
	using phx::at_c;

	char most_frequent_letter(const std::string &str)
	{
		typedef std::map<char, unsigned int> char_counts_t;

		char_counts_t char_counts;

		std::for_each(str.begin(), str.end(),
				phx::if_(phx::bind(std::isalpha<char>, _1,
						phx::construct<std::locale>()))
				[
					phx::let(_a = phx::ref(char_counts)[_1])
					[
						++_a
					]
				]);

		std::for_each(char_counts.begin(), char_counts.end(),
				std::cout << at_c<0>(_1) << " => " << at_c<1>(_1) << std::endl);

		char_counts_t::const_iterator result = std::max_element(
				char_counts.begin(), char_counts.end(),
				at_c<1>(_1) < at_c<1>(_2));

		return (result != char_counts.end()) ? result->first : ' ';
	}

}

I added a few using statements to make the code easier to understand. Let’s take a look at the for_each statement: the first two arguments are the usual .begin() and .end(), but then you see that strange if_ as the third argument. if_, like every phoenix statement, returns a functor object built at compile time via template composition (expression templates). With this library you can create inline functors on the fly without the “support code” bloat. You can also use your own functors, as long as they’re lazy, meaning they don’t do anything until operator() is called on them. Fortunately, the library also provides wrappers for “normal” functions.

As for the code itself, there isn’t much to say about the if_ statement: it’s just a lazy version of the classic if keyword. phx::bind is one of the included wrappers; it creates a lazy version of the function passed as its first argument, bound to the arguments passed as the remaining parameters. _1 and _2 are placeholders, the actual parameters passed by the algorithm to the functor, and phx::construct<T> returns a new object of the type given as the template parameter. Knowing that, we can see that phx::bind(std::isalpha<char>, _1, phx::construct<std::locale>()) returns a lazy version of std::isalpha with the current element from std::for_each bound as the first argument and a std::locale object bound as the second. phx::let’s only purpose is to create scoped local variables, and phx::ref returns a reference to the object passed as its parameter. phx::at_c is simple: on a std::pair, phx::at_c<0> returns .first and phx::at_c<1> returns .second.

For more information, consult the boost::phoenix documentation.

With that new tool, we can now more easily than ever use C++ to imitate some FP idioms:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

#include <boost/fusion/include/std_pair.hpp>
#include <boost/spirit/home/phoenix/bind.hpp>
#include <boost/spirit/home/phoenix/core.hpp>
#include <boost/spirit/home/phoenix/fusion.hpp>
#include <boost/spirit/home/phoenix/object.hpp>
#include <boost/spirit/home/phoenix/operator.hpp>
#include <boost/spirit/home/phoenix/scope.hpp>
#include <boost/spirit/home/phoenix/statement.hpp>

int main(int argc, char *argv[])
{
	namespace phx = boost::phoenix;
	using namespace phx::arg_names;
	using namespace phx::local_names;

	std::vector<int> input;
	input.push_back(1);
	input.push_back(2);
	input.push_back(3);
	input.push_back(4);
	input.push_back(5);
	//map( Make a new sequence with all the elements multiplied by 2 )
	std::transform(input.begin(),
			input.end(),
			std::ostream_iterator<int>(std::cout, ", "),
			_1 * 2);
	std::cout << std::endl;

	//filter( Make a new sequence containing all the odd numbers )
	std::remove_copy_if(input.begin(),
			input.end(),
			std::ostream_iterator<int>(std::cout, ", "),
			!(_1 % 2));
	std::cout << std::endl;

	//fold/reduce (Builds up and returns a value based on the sequence)
	//I use std::string here because it makes it easier to show what is
	//going on exactly.
	std::vector<std::string> words;
	words.push_back("H");
	words.push_back("e");
	words.push_back("l");
	words.push_back("l");
	words.push_back("o");

	//foldl
	std::string result = std::accumulate(words.begin(),
			words.end(),
			static_cast<std::string>(""),
			_1 + _2);
	std::cout << result << std::endl;
	//foldr
	result = std::accumulate(words.rbegin(),
			words.rend(),
			static_cast<std::string>(""),
			_1 + _2);
	std::cout << result << std::endl;
}

In the near future, we’ll probably see more FP techniques applied to imperative languages, because they can make code cleaner and more expressive without a runtime penalty. In the case of C++, as we just saw, they can leverage the existing standard library algorithms and make them more convenient.

C++’s typesafety to the rescue?

Today I was reading a couple of blogs when I came across an interesting article from gnuvince about the ability, or inability, of compilers to reliably detect, catch and warn the programmer about possible errors, which can be quite difficult in loosely/weakly typed programming languages such as C and C++. I was looking at the problematic C code he uses to demonstrate the issue when I suddenly got an idea: how much can C++ improve this situation using its built-in facilities and stronger type checking?

First of all, here’s the original code:

#include <stdio.h>

void interleave(void **aggregated,
        void **array1,
        size_t size1,
        void **array2,
        size_t size2) {
    while (size1 && size2) {
        *aggregated++ = *array1++;
        *aggregated++ = *array2++;
        size1--;
        size2--;
    }
    while (size1) {
        *aggregated++ = *array1++;
        size1--;
    }
    while (size2) {
        *aggregated++ = *array2++;
        size2--;
    }
}

int main(void) {
    int xs[4] = {1, 2, 3, 4};
    double ys[4] = {1, 2, 3, 4}; /*   ???    */
    int zs[8];
    size_t i;

    interleave((void **)zs, (void **)xs, 4, (void **)ys, 4);
    for (i = 0; i < 8; ++i) {
        printf("%d ", zs[i]);
    }
    printf("\n");

    return 0;
}

The problem with this code is that it doesn’t take the size of the elements into consideration, it works through void pointers, and nothing warns you if you do something wrong like passing two arrays of different element types to the function. Still, it illustrates the issue well, because the code compiles silently, without warnings. In a large project, code like that could cause hours of frustration and debugging.

A possible working implementation would be:

/* Needs <stdint.h> for uint8_t and <string.h> for memcpy. */
void interleave(void *aggregated,
        void *array1,
        size_t size1,
        void *array2,
        size_t size2, size_t size_of_elem) {

    uint8_t *buf1 = array1;
    uint8_t *buf2 = array2;
    uint8_t *aggr = aggregated;

    while (size1 && size2) {
        memcpy(aggr, buf1, size_of_elem);
        memcpy(aggr + size_of_elem, buf2, size_of_elem);
        aggr += 2 * size_of_elem;
        buf1 += size_of_elem;
        buf2 += size_of_elem;

        size1--;
        size2--;
    }
    if (size1)
        memcpy(aggr, buf1, size1 * size_of_elem);

    if (size2)
        memcpy(aggr, buf2, size2 * size_of_elem);
}

Now that we have a working C version of the function, what more can we do to achieve, or at least increase, typesafety without killing reusability? Simply put: nothing. C is not typesafe here by design, because genericity is only achieved at the cost of discarding type information, so no tool or compiler is going to help us. Let the fun begin: let’s see what C++ can do in this situation in terms of typesafety and genericity.

The first anomaly a C++ programmer would see in the preceding code is the presence of those nasty and evil void pointers. Luckily, we can get rid of them by using templates.

Let’s make a first attempt at improving the typesafety of this code using C++:

template <typename T>
void interleave(T *aggregated,
        T *array1,
        size_t size1,
        T *array2,
        size_t size2) {
    while (size1 && size2) {
        *aggregated++ = *array1++;
        *aggregated++ = *array2++;
        size1--;
        size2--;
    }
    while (size1) {
        *aggregated++ = *array1++;
        size1--;
    }
    while (size2) {
        *aggregated++ = *array2++;
        size2--;
    }
}

Now we get typesafety from templates: types are never discarded because we don’t use void pointers, and the arrays passed must have exactly the same element type, or the compiler will scream with:

interleave.cpp: In function ‘int main()’:
interleave.cpp:30: error: no matching function for call to ‘interleave(int [10], int [6], int, double [4], int)’

Bingo, we did it: a typesafe, generic and reusable function. Well, not that reusable, since it only works with arrays. We can do far better; just look at how the standard library algorithms are implemented. They use an iterator interface and work with ranges. If we do the same, our function will work with arrays and with every standard library container; now that’s real reusability.
Here’s the code modified to follow the iterator pattern:

template <typename FwdInItr1, typename FwdInItr2, typename FwdOutItr>
void interleave(FwdInItr1 seq_beg_1, FwdInItr1 seq_end_1,
        FwdInItr2 seq_beg_2, FwdInItr2 seq_end_2,
        FwdOutItr dest)
{
    while(seq_beg_1 != seq_end_1 && seq_beg_2 != seq_end_2)
    {
        *dest++ = *seq_beg_1++;
        *dest++ = *seq_beg_2++;
    }

    while(seq_beg_1 != seq_end_1)
        *dest++ = *seq_beg_1++;

    while(seq_beg_2 != seq_end_2)
        *dest++ = *seq_beg_2++;
}

Let’s try to understand that code. I declare a template with three iterator types; they can differ, but they must be compatible or the code won’t compile. So I could take an array of int and an array of double as inputs and put the result in a standard library container of int, as long as the three iterators are “Forward Iterators”. As a bonus, you get a warning at compile time about the possible loss of data in the conversion from double to int. That’s the kind of information we’re looking for.
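
For instance, here is a small sketch of that mixed-type call, reusing the interleave template above; with conversion warnings enabled (for example -Wconversion on gcc), the double-to-int narrowing gets flagged:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

/* interleave template from above goes here */

int main()
{
    int xs[3] = {1, 3, 5};
    double ys[3] = {2.5, 4.5, 6.5};

    //Different input element types, output into a vector<int>: this compiles
    //because the iterators are compatible, but the implicit double -> int
    //conversion can be reported as a possible loss of data.
    std::vector<int> zs(6);
    interleave(xs, xs + 3, ys, ys + 3, zs.begin());

    std::copy(zs.begin(), zs.end(),
              std::ostream_iterator<int>(std::cout, " "));
    std::cout << std::endl;
}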

Now we have a nice, typesafe, generic and reusable algorithm that works with pretty much everything. There’s one last thing we can do to improve it: use the standard library algorithms where possible, which makes the code clearer and sometimes faster.

Let’s take a look at the final code in all its glory:

#include <iostream>
#include <iterator>
#include <list>

template <typename FwdInItr1, typename FwdInItr2, typename FwdOutItr>
void interleave(FwdInItr1 seq_beg_1, FwdInItr1 seq_end_1,
        FwdInItr2 seq_beg_2, FwdInItr2 seq_end_2,
        FwdOutItr dest)
{
    while(seq_beg_1 != seq_end_1 && seq_beg_2 != seq_end_2)
    {
        *dest++ = *seq_beg_1++;
        *dest++ = *seq_beg_2++;
    }

    if(seq_beg_1 != seq_end_1)
        std::copy(seq_beg_1, seq_end_1, dest);

    if(seq_beg_2 != seq_end_2)
        std::copy(seq_beg_2, seq_end_2, dest);
}

int main()
{
    int xs[4] = {1, 3, 5, 7};
    std::list<int> ys;
    ys.push_back(2);
    ys.push_back(4);
    ys.push_back(6);
    ys.push_back(8);
    ys.push_back(9);
    ys.push_back(10);

    interleave(xs, xs + 4, ys.begin(), ys.end(), std::ostream_iterator<int>(std::cout, " "));
}

That’s it. Why use std::copy, and how can it make the code faster? When it is given raw pointers to types that can be copied bitwise, std::copy will typically fall back on memcpy/memmove internally instead of iterating and copying one element at a time. You can also see how much more adaptable the new code is compared to the original. C++ improves a lot on C in terms of typesafety, with no runtime overhead because everything is done at compile time, and better typesafety means better checking from the compiler.
