C++ and functional programming idioms

If you’re curious like me, you probably ventured at least once in the scary and mind-bending world of functional programming, came back and told yourself: “It would be nice if I could do this or that in c++”. FP languages have been present for decades but only recently, we’ve been starting to see the adoption of some of their techniques in classical imperative languages… like higher order functions, closures/lambdas functions, currying and lazy evaluation. For example, Javascript supports closures since version 1.7 and C# from 3.0.

Seeing how useful these techniques are, it’s normal to want them in our favorite programming language. We’re already doing a bit a FP without knowing thanks to the standard library algorithms. Lots of them takes functors/predicates as arguments so they mimics fairly well the behavior of higher order functions. Besides that, C++ has no built-in support for other idioms like lambda functions or closures but we can achieve similar effects due to the lazy nature of templates and a technique known as “expression templates”. More on that technique on a future post…

To demonstrate my point, let’s take a small program that takes a string as input and return the most frequent character. In the old classical C++ way, it could be implemented as follow:

#include <iostream>
#include <locale>
#include <map>
#include <string>

namespace
{

    char most_frequent_letter(const std::string &str)
    {
        typedef std::map<char, unsigned int> char_counts_t;

        char_counts_t char_counts;

        for(std::string::const_iterator itr = str.begin();
                itr != str.end(); ++itr)
            if(std::isalpha(*itr, std::locale()))
                ++char_counts[*itr];

        for(char_counts_t::const_iterator itr = char_counts.begin();
                itr != char_counts.end(); ++itr)
            std::cout << itr->first << " => " << itr->second << std::endl;

        if(!char_counts.empty())
        {
            char_counts_t::const_iterator highest_count = char_counts.begin();
            for(char_counts_t::const_iterator itr = ++char_counts.begin();
                    itr != char_counts.end(); ++itr)
                if(itr->second > highest_count->second)
                    highest_count = itr;
            return highest_count->first;
        }
        return ' ';
    }

}

int main(int argc, char *argv[])
{
    if(argc > 1)
    {
        std::string some_string = argv[1];
        std::cout << "The string is: " << some_string << "\n" << std::endl;
        std::cout << "The most frequent letter is: " <<
            most_frequent_letter(some_string) << std::endl;
    }
    else
        std::cout << "Usage: " << argv[0] << " <string>" << std::endl;
}

So far so good, it works and does the job. We’re putting the characters in a map using the character as the key and the count as the value. Then, we print the content of the map and finally iterate through it to find the character with the highest value. The problems with this code is that we’re reinventing parts already in the standard library and that code lacks expressiveness. Let’s see how the code could look like if we used the standard algorithms.

namespace
{

    template <typename map_t>
    struct map_filler
    {
        typedef void result_type;

        map_filler(map_t &map):
            map_(map)
        {
        }
        template <typename T>
        result_type operator()(const T &t) const
        {
            if(std::isalpha(t, std::locale()))
                ++map_[t];
        }
    private:
        map_t &map_;
    };

    struct pair_printer
    {
        typedef void result_type;

        template <typename pair_t>
        result_type operator()(const pair_t &pair) const
        {
            std::cout << pair.first << " => " << pair.second << std::endl;
        }
    };

    struct pair_value_comparer
    {
        typedef bool result_type;

        template <typename pair_t>
        result_type operator()(const pair_t &a, const pair_t &b)
        {
            return a.second < b.second;
        }
    };

    char most_frequent_letter(const std::string &str)
    {
        typedef std::map<char, unsigned int> char_counts_t;

        char_counts_t char_counts;

        std::for_each(str.begin(), str.end(),
                map_filler<char_counts_t>(char_counts));

        std::for_each(char_counts.begin(), char_counts.end(),
                pair_printer());

        char_counts_t::const_iterator result = std::max_element(
                char_counts.begin(),
                char_counts.end(),
                pair_value_comparer());

        return (result != char_counts.end()) ? result->first : ' ';
    }

}

Hmm… Okay… Let’s see, our “most_frequent_letter” function is now using the standard library algorithms. It does make the function clearer and way more expressive but at the cost of around 40 lines of “support code” whose are in our case, functors. Even when thinking in terms of reusability, the chance of needing that same support code in the future is small, if not inexistant. What would we do in that case in a FP language? Use a small lambda functions/closure instead. For that example, I’m going to use boost::phoenix 2.0, an efficient FP library part of boost which is in my opinion the best general, multi-purpose C++ library and a must-have for any serious C++ programmer. Let’s see what phoenix can do:

namespace
{

	namespace phx = boost::phoenix;
	using namespace phx::arg_names;
	using namespace phx::local_names;
	using phx::at_c;

	char most_frequent_letter(const std::string &str)
	{
		typedef std::map<char, unsigned int> char_counts_t;

		char_counts_t char_counts;

		std::for_each(str.begin(), str.end(),
				phx::if_(phx::bind(std::isalpha<char>, _1,
						phx::construct<std::locale>()))
				[
					phx::let(_a = phx::ref(char_counts)[_1])
					[
						++_a
					]
				]);

		std::for_each(char_counts.begin(), char_counts.end(),
				std::cout << at_c<0>(_1) << " => " << at_c<1>(_1) << std::endl);

		char_counts_t::const_iterator result = std::max_element(
				char_counts.begin(), char_counts.end(),
				at_c<1>(_1) < at_c<1>(_2));

		return (result != char_counts.end()) ? result->first : ' ';
	}

}

I made a few using statements to make the code easier to understand. Let’s take a look the for_each statement, the 2 first arguments are the usual .begin() and .end() but then you see that strange if_ as the third argument. if_, like every phoenix statement, returns a functor object created at compile time via template composition (Expression templates). So with this library, you can create inline functors on the fly without the “support code” bloat. You can use your own functors as long as they’re lazy which means they don’t do anything before the operator () is called on them. Fortunately, the lib also provides wrapper for “normal” functions.

Now for that code, nothing much to say for the if_ statement, it’s just a lazy version of the classic if keyword. phx::bind is one of the included wrappers, it creates a lazy version of a function passed as the first argument binded with the arguments passed as additional parameters. _1 and _2 are placeholders, they’re the actual parameters passed by the algorithm to the functor and phx::construct returns a new object of the type passed as the template parameter. Knowing that, we can now understand that “phx::bind(std::isalpha, _1, phx::construct())” returns a lazy version of std::isalpha with the current argument from std::for_each binded as the first argument to std::isalpha and an object of type std::locale binded as the second. phx::let’s only purpose is to create scoped local variables. phx::ref returns a reference to the object passed as the parameter. phx::at_c is simple, on a std::pair phx::at_c returns .first and phx::at_c .second.

For more information, consult the boost::phoenix documentation.

With that new tool, we can now more easily than ever use C++ to imitate some FP idioms:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

#include <boost/fusion/include/std_pair.hpp>
#include <boost/spirit/home/phoenix/bind.hpp>
#include <boost/spirit/home/phoenix/core.hpp>
#include <boost/spirit/home/phoenix/fusion.hpp>
#include <boost/spirit/home/phoenix/object.hpp>
#include <boost/spirit/home/phoenix/operator.hpp>
#include <boost/spirit/home/phoenix/scope.hpp>
#include <boost/spirit/home/phoenix/statement.hpp>

int main(int argc, char *argv[])
{
	namespace phx = boost::phoenix;
	using namespace phx::arg_names;
	using namespace phx::local_names;

	std::vector<int> input;
	input.push_back(1);
	input.push_back(2);
	input.push_back(3);
	input.push_back(4);
	input.push_back(5);
	//map( Make a new sequence with all the elements multiplied by 2 )
	std::transform(input.begin(),
			input.end(),
			std::ostream_iterator<int>(std::cout, ", "),
			_1 * 2);
	std::cout << std::endl;

	//filter( Make a new sequence containing all the odd numbers )
	std::remove_copy_if(input.begin(),
			input.end(),
			std::ostream_iterator<int>(std::cout, ", "),
			!(_1 % 2));
	std::cout << std::endl;

	//fold/reduce (Builds up and returns a value based on the sequence)
	//I use std::string here because it makes it easier to show what is
	//going on exactly.
	std::vector<std::string> words;
	words.push_back("H");
	words.push_back("e");
	words.push_back("l");
	words.push_back("l");
	words.push_back("o");

	//foldl
	std::string result = std::accumulate(words.begin(),
			words.end(),
			static_cast<std::string>(""),
			_1 + _2);
	std::cout << result << std::endl;
	//foldr
	result = std::accumulate(words.rbegin(),
			words.rend(),
			static_cast<std::string>(""),
			_1 + _2);
	std::cout << result << std::endl;
}

In a near future, we’ll probably see more FP techniques being applied to imperative languages because they can make the code cleaner and more expressive without penalties. In the case of C++, like we just saw, they can leverage existing standard library algorithms and make them more convenient.

About these ads

8 comments so far

  1. Adrian on

    Or, you could use a functional language:

    import List

    f = head.head.sortBy (\x y->compare (length y) (length x)).group.sort

  2. FalconNL on

    A slightly different approach that includes all the necessary code for the fully working program:

    import List

    main = getLine >>= print . snd . last . sort . map (\c -> (length c, head c)) . group . sort

  3. Steven Pigeon on

    Adrian: duh!

    Though, Brainiac above has a point, of sort. Even if C++ can allow some constructs to be functional-like, they remain typically imperative by nature, and not dynamically typed. For example, your examples for foldr and foldl are very verbose and do necessitate the inclusion of a few boost headers (14, if I counted correctly), and the use of relatively non-intuitive constructs.

    If you look at languages like Python (a hybrid functional/imperative language) that offers very simple to use (and more importantly, to understand) “list comprehensions”, your filter example reduces to:

    [ x for x in input_list if (x%2)!=0 ]

    which is clear, concise and a lot cuter than the C++ code based on iterators.

    Would you think that in many cases it would be preferable to hide the iterator altogether? For exemple, having something like

    int sum = std::accumulate(input_list,_1+_2);

    where most of the complexity is abstracted away?

    More is not always better, and you can hide more to use less ;)

    I still think the idea of having functional-like idioms brought in C++ is a great one. I’ll have to look at what pheonix has to offer.

  4. gnuvince on

    Steven: functional programming and dynamic typing are orthogonal concepts: there exist statically typed functional languages (Haskell, ML, Miranda) and there exist dynamically typed functional languages (Erlang, Clojure, Scheme.)

  5. Maxime Roger Cyr on

    About the number of headers to include. Phoenix 2.0 right now, as of boost 1.36, is only a functional programming library part of another one(boost::spirit) but will eventually become a full-fledged library replacing the current boost::lambda and boost::bind libraries. Before its full release as a stand-alone library, Phoenix will further be refined and probably that a single “include all” header will be available. Myself, i don’t like the fact of using at_c to access std::pair::first or std::pair::second and i think a “first” and “second” functor should be made before that release. (It’s easy to do but should be included by default)

    Thank you for your comments :)

  6. Dodheim on

    Steve, a few notes:

    1. Only two boost headers need to be included directly for foldr and foldl — core.hpp and operator.hpp. (As an aside, Phoenix is extremely modular; technically each operator has its own header, you can include each of them individually if it suits you, operator.hpp just includes all of them.)

    2. ‘Non-intuitive’ is certainly subjective. Do you think the Python list comprehension is intuitive to someone who doesn’t know Python?

    2. The lambda ‘_1 + _2′ would work given inputs of any type with an operator+, so what benefit would dynamic typing give in this context? I think quite the opposite — it would defeat the purpose of using C++ to begin with (compiler errors for types without an operator+, aggressive inlining with optimal code paths per type, etc). Template parameter deduction gives all the ‘dynamic typing’ needed for the sake of a lambda; what do you feel is lacking?

    3. Your proposed ‘std::accumulate(input_list, _1 + _2)’ syntax is granted through use of a different boost library, range_ex. So given that you can already have that abstraction, what is the real critique of the lambda expression? ‘_1 + _2′ is too verbose? If too unclear, an alternative syntax is ‘arg1 + arg2′. If still too unclear, you can rename the arguments anything you want (covered in the Phoenix docs).

  7. Steven Pigeon on

    You are right about what’s intuitive and what’s not; for lack of a better word, I meant to say that some statements are easier read than others, for equal tasks.

    for your 2nd number 2 ;) I don’t think in terms of defeating the purpose of using C, C++, or some other language. One construct hardly can justify abandoning a language for an other. C++ (and more so of C) is a language where you can have rather fine control on what exactly is going on in the machine, something that can’t be said from, ex, Python (and java, I would suppose though I suspect if you know the underlying JVM very well, it’s predictible enough). So probably chosing C++ for a given project from the start is a decision based on other criteria than merely “can I do a dropwhile that is all cute” (I exagerate, but you understand).

    For number 3: no, the critique was pointed at the for_each construct, for example. You are right saying that std::accumulate is clean enough; I have nothing much to say about it. _1 and _2 are probably good choices since it’s likely that arg1 and arg2 may clash with existing variable (though the problem might not be on Boost’s side, we agree).

  8. Michal Mocny on

    Great article, thanks.
    I think most will already know this, but I just wanted to point out that C++0x will have lambda functions/closures built in, with a far nicer syntax :)

    We _can_ have the best of both worlds (as soon as compilers let us :)


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: