Is std::string’s storage contiguous?

Take a look at the following code:

#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>

/* Dummy recv function: pretends that 10 bytes arrived on the socket */
std::size_t recv(int socket, void *buffer, std::size_t length, int flags)
{
    std::memcpy(buffer, "Some stuff", 10);
    return 10;
}

int main()
{
    // Create a string with space for some characters
    std::string x(64, char());

    std::cout << "x.size() = " << x.size() << ", x.capacity() = "
              << x.capacity() << std::endl;

    // Fill the string with data, assuming that &x[0]
    // points at the beginning of a contiguous array
    std::size_t bytes_read = recv(0, &x[0], x.size(), 0);

    // Use the good old swap trick to free excess space
    std::string(&x[0], bytes_read).swap(x);

    std::cout << "x.size() = " << x.size() << ", x.capacity() = "
              << x.capacity() << std::endl;
    std::cout << x << std::endl;
}

Is that code… valid? I always thought it wasn’t: lots of people had told me that the storage for a std::string isn’t guaranteed to be contiguous (although I’ve yet to see an implementation where it isn’t), so the preceding code could cause undefined behavior. A few days ago I was on Freenode’s ##C++ when someone asked the same question. I was giving him the classic answer when another user, “Tinodidriksen”, replied that I was wrong. What?!? What if he’s right? I had never actually checked the standard; I had simply taken the usual answer as the truth. Let’s see what the standard says about it.

1) basic_string constructor requirements (see tables 38-43 in ISO/IEC 14882:2003):
Excerpt from table 39: “data() points at the first element of an allocated copy of rlen consecutive elements of the string controlled by str beginning at position pos”, rlen being equal to size() according to the same table.

2) 21.3.4 paragraph 1, basic_string indexed access:
Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT().

3) 21.3.6 paragraph 4, const charT* data() const:
Requires: The program shall not alter any of the values stored in the character array.

Well, these are the three relevant excerpts I found. The first one guarantees that the storage is contiguous, because data() points at a single allocated block of rlen consecutive elements. The second shows that when you modify the string through the index operator, you modify the buffer pointed to by data(). (In that sentence, data() probably means the pointer to the underlying storage rather than the data() member function itself: calling the member function would force the implementor to use an ugly const_cast and would violate the third excerpt.)

My conclusion: the standard does not explicitly say “std::basic_string must have contiguous storage” the way it does for std::vector (a guarantee that was itself only added in ISO/IEC 14882:2003), but there are enough constraints to conclude that it can’t be otherwise. If I missed something or you have a different opinion, I’d love to hear it.