Archive for the ‘template’ Category
C++ and functional programming idioms
If you’re curious like me, you probably ventured at least once in the scary and mind-bending world of functional programming, came back and told yourself: “It would be nice if I could do this or that in c++”. FP languages have been present for decades but only recently, we’ve been starting to see the adoption of some of their techniques in classical imperative languages… like higher order functions, closures/lambdas functions, currying and lazy evaluation. For example, Javascript supports closures since version 1.7 and C# from 3.0.
Seeing how useful these techniques are, it’s normal to want them in our favorite programming language. We’re already doing a bit a FP without knowing thanks to the standard library algorithms. Lots of them takes functors/predicates as arguments so they mimics fairly well the behavior of higher order functions. Besides that, C++ has no built-in support for other idioms like lambda functions or closures but we can achieve similar effects due to the lazy nature of templates and a technique known as “expression templates”. More on that technique on a future post…
To demonstrate my point, let’s take a small program that takes a string as input and return the most frequent character. In the old classical C++ way, it could be implemented as follow:
#include <iostream>
#include <locale>
#include <map>
#include <string>
namespace
{
char most_frequent_letter(const std::string &str)
{
typedef std::map<char, unsigned int> char_counts_t;
char_counts_t char_counts;
for(std::string::const_iterator itr = str.begin();
itr != str.end(); ++itr)
if(std::isalpha(*itr, std::locale()))
++char_counts[*itr];
for(char_counts_t::const_iterator itr = char_counts.begin();
itr != char_counts.end(); ++itr)
std::cout << itr->first << " => " << itr->second << std::endl;
if(!char_counts.empty())
{
char_counts_t::const_iterator highest_count = char_counts.begin();
for(char_counts_t::const_iterator itr = ++char_counts.begin();
itr != char_counts.end(); ++itr)
if(itr->second > highest_count->second)
highest_count = itr;
return highest_count->first;
}
return ' ';
}
}
int main(int argc, char *argv[])
{
if(argc > 1)
{
std::string some_string = argv[1];
std::cout << "The string is: " << some_string << "\n" << std::endl;
std::cout << "The most frequent letter is: " <<
most_frequent_letter(some_string) << std::endl;
}
else
std::cout << "Usage: " << argv[0] << " <string>" << std::endl;
}
So far so good, it works and does the job. We’re putting the characters in a map using the character as the key and the count as the value. Then, we print the content of the map and finally iterate through it to find the character with the highest value. The problems with this code is that we’re reinventing parts already in the standard library and that code lacks expressiveness. Let’s see how the code could look like if we used the standard algorithms.
namespace
{
template <typename map_t>
struct map_filler
{
typedef void result_type;
map_filler(map_t &map):
map_(map)
{
}
template <typename T>
result_type operator()(const T &t) const
{
if(std::isalpha(t, std::locale()))
++map_[t];
}
private:
map_t &map_;
};
struct pair_printer
{
typedef void result_type;
template <typename pair_t>
result_type operator()(const pair_t &pair) const
{
std::cout << pair.first << " => " << pair.second << std::endl;
}
};
struct pair_value_comparer
{
typedef bool result_type;
template <typename pair_t>
result_type operator()(const pair_t &a, const pair_t &b)
{
return a.second < b.second;
}
};
char most_frequent_letter(const std::string &str)
{
typedef std::map<char, unsigned int> char_counts_t;
char_counts_t char_counts;
std::for_each(str.begin(), str.end(),
map_filler<char_counts_t>(char_counts));
std::for_each(char_counts.begin(), char_counts.end(),
pair_printer());
char_counts_t::const_iterator result = std::max_element(
char_counts.begin(),
char_counts.end(),
pair_value_comparer());
return (result != char_counts.end()) ? result->first : ' ';
}
}
Hmm… Okay… Let’s see, our “most_frequent_letter” function is now using the standard library algorithms. It does make the function clearer and way more expressive but at the cost of around 40 lines of “support code” whose are in our case, functors. Even when thinking in terms of reusability, the chance of needing that same support code in the future is small, if not inexistant. What would we do in that case in a FP language? Use a small lambda functions/closure instead. For that example, I’m going to use boost::phoenix 2.0, an efficient FP library part of boost which is in my opinion the best general, multi-purpose C++ library and a must-have for any serious C++ programmer. Let’s see what phoenix can do:
namespace
{
namespace phx = boost::phoenix;
using namespace phx::arg_names;
using namespace phx::local_names;
using phx::at_c;
char most_frequent_letter(const std::string &str)
{
typedef std::map<char, unsigned int> char_counts_t;
char_counts_t char_counts;
std::for_each(str.begin(), str.end(),
phx::if_(phx::bind(std::isalpha<char>, _1,
phx::construct<std::locale>()))
[
phx::let(_a = phx::ref(char_counts)[_1])
[
++_a
]
]);
std::for_each(char_counts.begin(), char_counts.end(),
std::cout << at_c<0>(_1) << " => " << at_c<1>(_1) << std::endl);
char_counts_t::const_iterator result = std::max_element(
char_counts.begin(), char_counts.end(),
at_c<1>(_1) < at_c<1>(_2));
return (result != char_counts.end()) ? result->first : ' ';
}
}
I made a few using statements to make the code easier to understand. Let’s take a look the for_each statement, the 2 first arguments are the usual .begin() and .end() but then you see that strange if_ as the third argument. if_, like every phoenix statement, returns a functor object created at compile time via template composition (Expression templates). So with this library, you can create inline functors on the fly without the “support code” bloat. You can use your own functors as long as they’re lazy which means they don’t do anything before the operator () is called on them. Fortunately, the lib also provides wrapper for “normal” functions.
Now for that code, nothing much to say for the if_ statement, it’s just a lazy version of the classic if keyword. phx::bind is one of the included wrappers, it creates a lazy version of a function passed as the first argument binded with the arguments passed as additional parameters. _1 and _2 are placeholders, they’re the actual parameters passed by the algorithm to the functor and phx::construct returns a new object of the type passed as the template parameter. Knowing that, we can now understand that “phx::bind(std::isalpha, _1, phx::construct())” returns a lazy version of std::isalpha with the current argument from std::for_each binded as the first argument to std::isalpha and an object of type std::locale binded as the second. phx::let’s only purpose is to create scoped local variables. phx::ref returns a reference to the object passed as the parameter. phx::at_c is simple, on a std::pair phx::at_c returns .first and phx::at_c .second.
For more information, consult the boost::phoenix documentation.
With that new tool, we can now more easily than ever use C++ to imitate some FP idioms:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>
#include <boost/fusion/include/std_pair.hpp>
#include <boost/spirit/home/phoenix/bind.hpp>
#include <boost/spirit/home/phoenix/core.hpp>
#include <boost/spirit/home/phoenix/fusion.hpp>
#include <boost/spirit/home/phoenix/object.hpp>
#include <boost/spirit/home/phoenix/operator.hpp>
#include <boost/spirit/home/phoenix/scope.hpp>
#include <boost/spirit/home/phoenix/statement.hpp>
int main(int argc, char *argv[])
{
namespace phx = boost::phoenix;
using namespace phx::arg_names;
using namespace phx::local_names;
std::vector<int> input;
input.push_back(1);
input.push_back(2);
input.push_back(3);
input.push_back(4);
input.push_back(5);
//map( Make a new sequence with all the elements multiplied by 2 )
std::transform(input.begin(),
input.end(),
std::ostream_iterator<int>(std::cout, ", "),
_1 * 2);
std::cout << std::endl;
//filter( Make a new sequence containing all the odd numbers )
std::remove_copy_if(input.begin(),
input.end(),
std::ostream_iterator<int>(std::cout, ", "),
!(_1 % 2));
std::cout << std::endl;
//fold/reduce (Builds up and returns a value based on the sequence)
//I use std::string here because it makes it easier to show what is
//going on exactly.
std::vector<std::string> words;
words.push_back("H");
words.push_back("e");
words.push_back("l");
words.push_back("l");
words.push_back("o");
//foldl
std::string result = std::accumulate(words.begin(),
words.end(),
static_cast<std::string>(""),
_1 + _2);
std::cout << result << std::endl;
//foldr
result = std::accumulate(words.rbegin(),
words.rend(),
static_cast<std::string>(""),
_1 + _2);
std::cout << result << std::endl;
}
In a near future, we’ll probably see more FP techniques being applied to imperative languages because they can make the code cleaner and more expressive without penalties. In the case of C++, like we just saw, they can leverage existing standard library algorithms and make them more convenient.
C++’s typesafety to the rescue?
Today, i was reading a couple of blogs when i came across an interesting article from gnuvince about the capacity/incapacity of the compilers to reliably detect, catch and warn the programmer of possible errors which can be quite difficult in loosely/weakly typed programming languages such as C and C++. I was looking at his problematic code written in C to demonstrate the problem when i suddenly got an idea. “How much can C++ improve this situation by using its built-in facilities and stronger type checking?”.
First of all, here’s the original code:
#include <stdio.h>
void interleave(void **aggregated,
void **array1,
size_t size1,
void **array2,
size_t size2) {
while (size1 && size2) {
*aggregated++ = *array1++;
*aggregated++ = *array2++;
size1--;
size2--;
}
while (size1) {
*aggregated++ = *array1++;
size1--;
}
while (size2) {
*aggregated++ = *array2++;
size2--;
}
}
int main(void) {
int xs[4] = {1, 2, 3, 4};
double ys[4] = {1, 2, 3, 4}; /* ??? */
int zs[8];
size_t i;
interleave((void **)zs, (void **)xs, 4, (void **)ys, 4);
for (i = 0; i < 8; ++i) {
printf("%d ", zs[i]);
}
printf("\n");
return 0;
}
The problem with this code is that it’s not taking in consideration the size of the elements, is dereferencing void pointers and does not warn if you do something wrong like passing 2 arrays of different types to the function. Still, it does show the problem because the code compiles silently without warnings. That code in a large project could cause hours of frustration and debugging.
A possible working implementation would be:
void interleave(void *aggregated,
void *array1,
size_t size1,
void *array2,
size_t size2, size_t size_of_elem) {
uint8_t *buf1 = array1;
uint8_t *buf2 = array2;
uint8_t *aggr = aggregated;
while (size1 && size2) {
memcpy(aggr, buf1, size_of_elem);
memcpy(aggr + size_of_elem, buf2, size_of_elem);
aggr += 2 * size_of_elem;
buf1 += size_of_elem;
buf2 += size_of_elem;
size1--;
size2--;
}
if (size1)
memcpy(aggr, buf1, size1 * size_of_elem);
if (size2)
memcpy(aggr, buf2, size2 * size_of_elem);
}
Now, that we have a working C version of that function, what more can we do to achieve or at least increase typesafety without literally killing reusability? Simply, nothing. We can easily conclude that C is not typesafe at all by design because genericity is only achieved at the cost of discarding type information so there’s no way any tool or compiler is going to help us there. Let the fun begins, let’s see what C++ can do in that situation in terms of typesafety and genericity.
The first anomaly a C++ programmer would see in the preceding code is the presence of those nasty and evil void pointers. Luckily, we can get rid of them by using templates.
Let’s make a first attempt to improve the typesafety of this code using c++:
template <typename T>
void interleave(T *aggregated,
T *array1,
size_t size1,
T *array2,
size_t size2) {
while (size1 && size2) {
*aggregated++ = *array1++;
*aggregated++ = *array2++;
size1--;
size2--;
}
while (size1) {
*aggregated++ = *array1++;
size1--;
}
while (size2) {
*aggregated++ = *array2++;
size2--;
}
}
Now, we managed to get typesafety by using templates, types are never discarded because we don’t use void pointers and the arrays passed must have the exact same type or else the compiler will scream with:
interleave.cpp: In function ‘int main()’:
interleave.cpp:30: error: no matching function for call to ‘interleave(int [10], int [6], int, double [4], int)’
Bingo, we did it, we have a typesafe, generic and reusable function. Oh Well, not that reusable since it only works with arrays. We could do far better, just look at how the standard library algorithms are implemented. They use an iterator interface and work with ranges. If we can do the same, our function will work with arrays and every standard library containers, now that’s real reusability.
Here’s the code modified to be adaptable to the iterator pattern:
template <typename FwdInItr1, typename FwdInItr2, typename FwdOutItr>
void interleave(FwdInItr1 seq_beg_1, FwdInItr1 seq_end_1,
FwdInItr2 seq_beg_2, FwdInItr2 seq_end_2,
FwdOutItr dest)
{
while(seq_beg_1 != seq_end_1 && seq_beg_2 != seq_end_2)
{
*dest++ = *seq_beg_1++;
*dest++ = *seq_beg_2++;
}
while(seq_beg_1 != seq_end_1)
*dest++ = *seq_beg_1++;
while(seq_beg_2 != seq_end_2)
*dest++ = *seq_beg_2++;
}
Lets try to understand that code. I declare a template with three types of iterators, they can have different types but must be compatible else the code won’t compile. So i could take an array of int and an array of double as inputs and put the result in a standard library container of int as long as the three iterators are “Forward Iterators”. As a bonus, you will get a warning at compile time because of the possible loss of data from the conversion from double to int. That’s the kind of information we’re looking for.
Now, we have a nice, typesafe, generic and reusable algorithm that works with pretty much everything. There’s one last thing we can do to improve that code, use the standard library algorithms when possible, it makes the code clearer and sometimes faster.
Let’s take a look at the final code in all its glory:
#include <iostream>
#include <iterator>
#include <list>
template <typename FwdInItr1, typename FwdInItr2, typename FwdOutItr>
void interleave(FwdInItr1 seq_beg_1, FwdInItr1 seq_end_1,
FwdInItr2 seq_beg_2, FwdInItr2 seq_end_2,
FwdOutItr dest)
{
while(seq_beg_1 != seq_end_1 && seq_beg_2 != seq_end_2)
{
*dest++ = *seq_beg_1++;
*dest++ = *seq_beg_2++;
}
if(seq_beg_1 != seq_end_1)
std::copy(seq_beg_1, seq_end_1, dest);
if(seq_beg_2 != seq_end_2)
std::copy(seq_beg_2, seq_end_2, dest);
}
int main()
{
int xs[4] = {1, 3, 5, 7};
std::list<int> ys;
ys.push_back(2);
ys.push_back(4);
ys.push_back(6);
ys.push_back(8);
ys.push_back(9);
ys.push_back(10);
interleave(xs, xs + 4, ys.begin(), ys.end(), std::ostream_iterator<int>(std::cout, " "));
}
That’s it. Why use std::copy? How can this code be faster? When using raw pointers with std::copy, it will internally use memcpy/memmove to copy instead of iterating and copying one element at a time. You can also see how adaptable is the new code compared to the original. C++ improves a lot compared to C in terms of typesafety with no runtime overhead because everything is done at compile time and better typesafety means better checking from the compiler.
Comments (2)