C++’s typesafety to the rescue?

Today, i was reading a couple of blogs when i came across an interesting article from gnuvince about the capacity/incapacity of the compilers to reliably detect, catch and warn the programmer of possible errors which can be quite difficult in loosely/weakly typed programming languages such as C and C++. I was looking at his problematic code written in C to demonstrate the problem when i suddenly got an idea. “How much can C++ improve this situation by using its built-in facilities and stronger type checking?”.

First of all, here’s the original code:

#include <stdio.h>

void interleave(void **aggregated,
        void **array1,
        size_t size1,
        void **array2,
        size_t size2) {
    while (size1 && size2) {
        *aggregated++ = *array1++;
        *aggregated++ = *array2++;
        size1--;
        size2--;
    }
    while (size1) {
        *aggregated++ = *array1++;
        size1--;
    }
    while (size2) {
        *aggregated++ = *array2++;
        size2--;
    }
}

int main(void) {
    int xs[4] = {1, 2, 3, 4};
    double ys[4] = {1, 2, 3, 4}; /*   ???    */
    int zs[8];
    size_t i;

    interleave((void **)zs, (void **)xs, 4, (void **)ys, 4);
    for (i = 0; i < 8; ++i) {
        printf("%d ", zs&#91;i&#93;);
    }
    printf("\n");

    return 0;
}
&#91;/sourcecode&#93;

The problem with this code is that it's not taking in consideration the size of the elements, is dereferencing void pointers and does not warn if you do something wrong like passing 2 arrays of different types to the function. Still, it does show the problem because the code compiles silently without warnings. That code in a large project could cause hours of frustration and debugging.

A possible working implementation would be:

&#91;sourcecode language='c'&#93;
void interleave(void *aggregated,
        void *array1,
        size_t size1,
        void *array2,
        size_t size2, size_t size_of_elem) {

    uint8_t *buf1 = array1;
    uint8_t *buf2 = array2;
    uint8_t *aggr = aggregated;

    while (size1 && size2) {
        memcpy(aggr, buf1, size_of_elem);
        memcpy(aggr + size_of_elem, buf2, size_of_elem);
        aggr += 2 * size_of_elem;
        buf1 += size_of_elem;
        buf2 += size_of_elem;

        size1--;
        size2--;
    }
    if (size1)
        memcpy(aggr, buf1, size1 * size_of_elem);

    if (size2)
        memcpy(aggr, buf2, size2 * size_of_elem);
}
&#91;/sourcecode&#93;

Now, that we have a working C version of that function, what more can we do to achieve or at least increase typesafety without literally killing reusability? Simply, nothing. We can easily conclude that C is not typesafe at all by design because genericity is only achieved at the cost of discarding type information so there's no way any tool or compiler is going to help us there. Let the fun begins, let's see what C++ can do in that situation in terms of typesafety and genericity.

The first anomaly a C++ programmer would see in the preceding code is the presence of those nasty and evil void pointers. Luckily, we can get rid of them by using templates.

Let's make a first attempt to improve the typesafety of this code using c++:

&#91;sourcecode language='cpp'&#93;
template <typename T>
void interleave(T *aggregated,
        T *array1,
        size_t size1,
        T *array2,
        size_t size2) {
    while (size1 && size2) {
        *aggregated++ = *array1++;
        *aggregated++ = *array2++;
        size1--;
        size2--;
    }
    while (size1) {
        *aggregated++ = *array1++;
        size1--;
    }
    while (size2) {
        *aggregated++ = *array2++;
        size2--;
    }
}

Now, we managed to get typesafety by using templates, types are never discarded because we don’t use void pointers and the arrays passed must have the exact same type or else the compiler will scream with:

interleave.cpp: In function ‘int main()’:
interleave.cpp:30: error: no matching function for call to ‘interleave(int [10], int [6], int, double [4], int)’

Bingo, we did it, we have a typesafe, generic and reusable function. Oh Well, not that reusable since it only works with arrays. We could do far better, just look at how the standard library algorithms are implemented. They use an iterator interface and work with ranges. If we can do the same, our function will work with arrays and every standard library containers, now that’s real reusability.
Here’s the code modified to be adaptable to the iterator pattern:

template <typename FwdInItr1, typename FwdInItr2, typename FwdOutItr>
void interleave(FwdInItr1 seq_beg_1, FwdInItr1 seq_end_1,
        FwdInItr2 seq_beg_2, FwdInItr2 seq_end_2,
        FwdOutItr dest)
{
    while(seq_beg_1 != seq_end_1 && seq_beg_2 != seq_end_2)
    {
        *dest++ = *seq_beg_1++;
        *dest++ = *seq_beg_2++;
    }

    while(seq_beg_1 != seq_end_1)
        *dest++ = *seq_beg_1++;

    while(seq_beg_2 != seq_end_2)
        *dest++ = *seq_beg_2++;
}

Lets try to understand that code. I declare a template with three types of iterators, they can have different types but must be compatible else the code won’t compile. So i could take an array of int and an array of double as inputs and put the result in a standard library container of int as long as the three iterators are “Forward Iterators”. As a bonus, you will get a warning at compile time because of the possible loss of data from the conversion from double to int. That’s the kind of information we’re looking for.

Now, we have a nice, typesafe, generic and reusable algorithm that works with pretty much everything. There’s one last thing we can do to improve that code, use the standard library algorithms when possible, it makes the code clearer and sometimes faster.

Let’s take a look at the final code in all its glory:

#include <iostream>
#include <iterator>
#include <list>

template <typename FwdInItr1, typename FwdInItr2, typename FwdOutItr>
void interleave(FwdInItr1 seq_beg_1, FwdInItr1 seq_end_1,
        FwdInItr2 seq_beg_2, FwdInItr2 seq_end_2,
        FwdOutItr dest)
{
    while(seq_beg_1 != seq_end_1 && seq_beg_2 != seq_end_2)
    {
        *dest++ = *seq_beg_1++;
        *dest++ = *seq_beg_2++;
    }

    if(seq_beg_1 != seq_end_1)
        std::copy(seq_beg_1, seq_end_1, dest);

    if(seq_beg_2 != seq_end_2)
        std::copy(seq_beg_2, seq_end_2, dest);
}

int main()
{
    int xs[4] = {1, 3, 5, 7};
    std::list<int> ys;
    ys.push_back(2);
    ys.push_back(4);
    ys.push_back(6);
    ys.push_back(8);
    ys.push_back(9);
    ys.push_back(10);

    interleave(xs, xs + 4, ys.begin(), ys.end(), std::ostream_iterator<int>(std::cout, " "));
}

That’s it. Why use std::copy? How can this code be faster? When using raw pointers with std::copy, it will internally use memcpy/memmove to copy instead of iterating and copying one element at a time. You can also see how adaptable is the new code compared to the original. C++ improves a lot compared to C in terms of typesafety with no runtime overhead because everything is done at compile time and better typesafety means better checking from the compiler.

Advertisements

3 comments so far

  1. Steven Pigeon on

    Why use void* and cast to uint_t *? would it not be sufficient to declare parameters as char* and dispense of casts? unint_t brings nothing much to the safety of the test (the bytes aren’t operated upon except through memcpy/memmove).

    The std:: lib uses char* instead of void whenever possible mainly because it points to char, which has a sizeof, whereas void* points to void, and for void, sizeof is undefined.

    I also suggest using count (count1,count2) rather than size (size1,size2) where refering to the number of elements.

    btw, pretty cool trick with the std::ostream_iterator

  2. Maxime Roger Cyr on

    You’re totally right about the useless casts to uint8_t, but you’re wrong when stating that the std library is using char * instead of void * whenever possible, It was only the case in pre-ANSI C when void * was non-existant. Also, only void * have the ability to be implicitly converted from and to a pointer of any type, using char * would need a cast.

  3. Steven Pigeon on

    I was thinking of istream::read, for example. Its prototype is

    istream& read ( char* s, streamsize n );

    not void *. The fact that the ANSI-C (or even pre-ANSI, for which I really don’t care) used char * or not is mostly irrelevant. Using char* probably have more advantages than void*. I don’t consider the caller-side C-style cast as a major problem.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: