Copy File C++ Fstream Assignment

How to read in a file in C++

So here's a simple question: what is the correct way to read a file completely in C++?

People offer various solutions: some use the C API, some the C++ API, and some a variation of tricks with iterators and algorithms. Wondering which method is the fastest, I decided to put the various options to the test, and the results were surprising.

First, let me propose an API that we'll be using for the function. We'll pass the function a filename as a C string (const char *), and we'll get back a C++ string (std::string) holding the file contents. If the file cannot be opened, we'll throw an error indicating why. Of course you're welcome to change these functions to receive and return whatever format you prefer, but this is the prototype we'll be operating on:
std::string get_file_contents(const char *filename);
Our first technique to consider is using the C API to read directly into a string. We'll open a file using fopen(), calculate the file size by seeking to the end, and then size a string appropriately. We'll read the contents into the string, and then return it.

#include <string>
#include <cstdio>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::FILE *fp = std::fopen(filename, "rb");
  if (fp)
  {
    std::string contents;
    std::fseek(fp, 0, SEEK_END);
    contents.resize(std::ftell(fp));
    std::rewind(fp);
    std::fread(&contents[0], 1, contents.size(), fp);
    std::fclose(fp);
    return(contents);
  }
  throw(errno);
}

I'm dubbing this technique "method C". This is more or less what the code of any proficient C++ programmer who prefers C-style I/O would look like.
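
The function above trusts every C library call to succeed once the file is open. As a rough sketch only, here is how the same idea might look with the return values checked. The name get_file_contents_checked is hypothetical, and throwing the saved errno on any failure is an assumption made to stay close to the prototype above (a short read may leave errno at 0).

```cpp
#include <cerrno>
#include <cstdio>
#include <string>

// Hypothetical name; same contract as get_file_contents(), but every
// C library call is checked before its result is used.
std::string get_file_contents_checked(const char *filename)
{
  std::FILE *fp = std::fopen(filename, "rb");
  if (!fp)
    throw(errno);

  std::string contents;
  bool ok = false;
  if (std::fseek(fp, 0, SEEK_END) == 0)
  {
    const long size = std::ftell(fp);  // -1L signals failure
    if (size >= 0)
    {
      contents.resize(static_cast<std::string::size_type>(size));
      std::rewind(fp);
      // An empty file needs no read; otherwise demand the full size.
      ok = contents.empty() ||
           std::fread(&contents[0], 1, contents.size(), fp) == contents.size();
    }
  }

  const int saved_errno = errno;  // fclose() may overwrite errno
  std::fclose(fp);
  if (!ok)
    throw(saved_errno);
  return(contents);
}
```

The extra checks cost a few comparisons per file, so they should not change the timing picture in any meaningful way.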

The next technique we'll review is basically the same idea, but using C++ streams instead.

#include <fstream>
#include <string>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::string contents;
    in.seekg(0, std::ios::end);
    contents.resize(in.tellg());
    in.seekg(0, std::ios::beg);
    in.read(&contents[0], contents.size());
    in.close();
    return(contents);
  }
  throw(errno);
}

I'm dubbing this technique "method C++". Again, it is a more or less straightforward C++ implementation based on the same principles as before.

The next technique people consider is using istreambuf_iterator. This iterator is designed for really fast iteration out of stream buffers (files) in C++.

#include <fstream>
#include <streambuf>
#include <string>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    return(std::string((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>()));
  }
  throw(errno);
}

This method is liked by many because of how little code is needed to implement it, and you can read a file directly into all sorts of containers, not just strings. The method was also popularized by the Effective STL book. I'm dubbing the technique "method iterator".

Now some have looked at the last technique, and felt it could be optimized further, since if the string has an idea in advance how big it needs to be, it will reallocate less. So the idea is to reserve the size of the string, then pull the data in.

#include <fstream>
#include <streambuf>
#include <string>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::string contents;
    in.seekg(0, std::ios::end);
    contents.reserve(in.tellg());
    in.seekg(0, std::ios::beg);
    contents.assign((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    in.close();
    return(contents);
  }
  throw(errno);
}

I will call this technique "method assign", since it uses the string's assign function.

Some have questioned the previous function, as assign() in some implementations may very well replace the internal buffer, and therefore not benefit from reserving. Better to call push_back() instead, which will keep the existing buffer if no reallocation is needed.

#include <fstream>
#include <streambuf>
#include <string>
#include <algorithm>
#include <iterator>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::string contents;
    in.seekg(0, std::ios::end);
    contents.reserve(in.tellg());
    in.seekg(0, std::ios::beg);
    std::copy((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>(), std::back_inserter(contents));
    in.close();
    return(contents);
  }
  throw(errno);
}

Combining std::copy() and std::back_inserter(), we can achieve our goal. I'm labeling this technique "method copy".

Lastly, some want to try another approach entirely. C++ streams can copy very quickly to another stream by applying operator<< to their internal buffers. Therefore, we can copy directly into a string stream, and then return the string that the string stream holds.

#include <fstream>
#include <sstream>
#include <string>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::ostringstream contents;
    contents << in.rdbuf();
    in.close();
    return(contents.str());
  }
  throw(errno);
}

We'll call this technique "method rdbuf".
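
Whichever variant is picked, the caller sees the same interface: a string on success, or a thrown int carrying errno. The following is only a sketch of what a call site might look like. The helper file_size_or_error is a hypothetical name, and the sketch assumes the implementation actually sets errno when an ifstream fails to open, which is common in practice but not something the C++ standard promises.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <sstream>
#include <string>

// "Method rdbuf", repeated here so the example is self-contained.
std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::ostringstream contents;
    contents << in.rdbuf();
    in.close();
    return(contents.str());
  }
  throw(errno);
}

// Hypothetical caller: returns the file size in bytes, or -1 after
// storing a human-readable reason in `error`.
long file_size_or_error(const char *filename, std::string &error)
{
  try
  {
    return static_cast<long>(get_file_contents(filename).size());
  }
  catch (int code)  // the functions above throw errno, which is an int
  {
    error = std::strerror(code);
    return -1;
  }
}
```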


Now which is the fastest method to use if all you actually want to do is read the file into a string and return it? The exact speeds in relation to each other may vary from one implementation to another, but the overall margins between the various techniques should be similar.

I conducted my tests with libstdc++ and GCC 4.6; what you see may vary from this.

I tested with multiple megabyte files, reading in one after another, and repeated the tests a dozen times and averaged the results.
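
The harness behind these measurements isn't shown. As a minimal sketch of how such a timing loop might look, assuming a compiler with C++11's <chrono>, something like the following would do. The name average_ms and the choice of milliseconds are assumptions for illustration, not the units of the tables in this article.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical harness: calls `reader` on `filename` `runs` times
// (runs must be positive) and returns the mean wall-clock time per
// call in milliseconds.
template <typename Reader>
double average_ms(Reader reader, const char *filename, int runs)
{
  typedef std::chrono::steady_clock clock;
  volatile std::size_t sink = 0;  // consume the result so the call is not optimized away

  const clock::time_point start = clock::now();
  for (int i = 0; i < runs; ++i)
    sink = sink + reader(filename).size();
  const clock::time_point stop = clock::now();

  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  return elapsed.count() / runs;
}
```

A call might read average_ms(get_file_contents, "big.bin", 12), where big.bin stands in for whatever test file is at hand. Keep in mind that the operating system's file cache will make every run after the first faster, so the first run is often measured separately or discarded.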


Method      Duration
C           24.5
C++         24.5
Iterator    64.5
Assign      68
Copy        62.5
Rdbuf       32.5


Ordered by speed:


Method      Duration
C/C++       24.5
Rdbuf       32.5
Copy        62.5
Iterator    64.5
Assign      68


These results are rather interesting. There was no speed difference at all between the C and C++ APIs for reading a file. This should be obvious to us all, yet many people strangely think that the C API has less overhead. The straightforward vanilla methods were also faster than anything involving iterators.

C++ stream to stream copying is really fast. It probably only took a bit longer than the vanilla method due to some reallocations needed. If you're doing disk file to disk file though, you probably want to consider this option, and go directly from in stream to out stream.
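
For the disk file to disk file case, a sketch of going directly from in stream to out stream might look like this. The helper name copy_file_contents and its bool return are assumptions for illustration. One wrinkle worth knowing: operator<< on a stream buffer sets failbit on the output stream when it inserts no characters at all, so an empty source file is handled separately here.

```cpp
#include <fstream>

// Hypothetical helper: copies `source` to `destination` byte for byte.
// Returns true if both files opened and the copy succeeded.
bool copy_file_contents(const char *source, const char *destination)
{
  std::ifstream in(source, std::ios::in | std::ios::binary);
  if (!in)
    return false;

  // Opened only after the source is known to be readable, so a bad
  // source name does not truncate an existing destination.
  std::ofstream out(destination, std::ios::out | std::ios::binary | std::ios::trunc);
  if (!out)
    return false;

  // Empty source: nothing to copy, and `out << in.rdbuf()` would
  // report failure for inserting zero characters.
  if (in.peek() == std::ifstream::traits_type::eof())
    return true;

  out << in.rdbuf();  // stream buffer to stream buffer copy
  out.close();        // flushes; sets failbit if the flush or close fails
  return !out.fail();
}
```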

The istreambuf_iterator methods, while popular and concise, are actually rather slow. Sure, they're faster than istream_iterators (with skipping turned off), but they can't compete with more direct methods.

A C++ string's internal assign() function, at least in libstdc++, seems to throw away the existing buffer (at the time of this writing), so reserving then assigning is rather useless. On the other hand, reading directly into a string, or a different container for that matter, isn't necessarily the optimal solution where iterators are concerned. Using the external std::copy() function along with back inserting after reservation is faster than straight-up initialization. You might want to consider this method for inserting into some other containers. In fact, I found std::copy() of istreambuf_iterators with a back inserter into a std::deque to be faster than straight-up initialization (81 vs 88.5), despite a deque not being able to reserve room in advance (nor would that make sense with a deque).
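
To make the "other containers" idea concrete, here is a sketch that generalizes "method copy" to any char sequence with push_back(). The template name get_file_contents_as is hypothetical. It skips the reserve() step because not every container has one (std::deque does not); for a std::vector or std::string you could add the seek-and-reserve lines from "method copy" before the std::copy() call.

```cpp
#include <algorithm>
#include <cerrno>
#include <deque>    // for the usage shown below
#include <fstream>
#include <iterator>
#include <vector>   // for the usage shown below

// Hypothetical generalization of "method copy": Container can be any
// sequence of char with push_back(), such as std::vector<char>,
// std::deque<char> or std::string.
template <typename Container>
Container get_file_contents_as(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    Container contents;
    std::copy((std::istreambuf_iterator<char>(in)),
              std::istreambuf_iterator<char>(),
              std::back_inserter(contents));
    return(contents);
  }
  throw(errno);
}

// Usage ("data.bin" is a placeholder file name):
//   std::vector<char> v = get_file_contents_as<std::vector<char> >("data.bin");
//   std::deque<char>  d = get_file_contents_as<std::deque<char> >("data.bin");
```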

I also found this to be a cute way to get a file into a container backwards, despite a Deque being rather useless for working with file contents.

std::deque<char> contents;
std::copy((std::istreambuf_iterator<char>(in)),
          std::istreambuf_iterator<char>(),
          std::front_inserter(contents));

Now go out there and speed up your applications!

If there's any demand, I'll see about performing these tests with other C++ implementations.

Everything @Jamal said:

When reading user input, prefer not to mix std::getline() with operator>>, as one reads and discards the newline while the other does not, and this can cause confusion as to where on the input stream you are.

Secondly, user input (when entered manually) is usually line based (i.e. the input buffer is not flushed until the user hits return). So it is best to read a whole line at a time and then validate it.
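
As a sketch of the read-a-line-then-validate idea (the helper name read_int_line is made up for illustration, and "exactly one integer per line" is an assumed validation rule):

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one whole line from `in` and succeeds only
// if that line holds exactly one integer (surrounding whitespace allowed).
// `value` is left untouched on failure.
bool read_int_line(std::istream &in, int &value)
{
  std::string line;
  if (!std::getline(in, line))
    return false;               // end of input or stream error

  std::istringstream parser(line);
  int parsed;
  if (!(parser >> parsed))
    return false;               // e.g. "fred": not a number

  char extra;
  if (parser >> extra)
    return false;               // e.g. "10 apples": trailing junk

  value = parsed;
  return true;
}
```

Because the bad text is parsed from a throwaway string stream, a failed parse never leaves the real input stream in a failed state; the offending line has already been consumed.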

Remember scope:

You must also check for invalid input.

What happens if I type 'fred<enter>'?
Personally, I have forgotten whether the variable is even changed (it's in the standard somewhere). But the stream has its fail bit set. This means any further attempt to read from the stream will do nothing, so you can't even retry in a loop.

If I type 'fred' then this will enter an infinite loop (even if the user enters 10 next time around), because the fail bit has been set and nothing will be read from the stream while this bit is set (it's set because you are trying to read an int and you got fred, which is not an int).

Don't declare all your variables at the top of the function.

Declare them as close to the point where you use them as possible (it makes reading the code easier). Also, in C++ we sometimes (quite often) use the side effects of the constructor/destructor to do work, so you don't want those firing until you need them to fire.

When testing a stream state it is best to use fail() rather than any of the other states (as this catches all failures). Also note that when a stream is used in a boolean context (i.e. in a condition) it converts itself into a value compatible with the context, using the fail() method as the test (the result is true when fail() is false).

The same applies for as it does . Don't use it. Prefer .

Also note that the stream operator returns a reference to the stream, which (as stated above) converts itself to a useful value when used in a boolean context.

Conclusion:
