I think the larger problem is that defaults in standard libraries are rarely understood — in Python just as much as in C++.
I'm asking the larger Hacker News community here: how do you illustrate the dependencies of a standard library API call without making it more complicated? Syntax is nothing; understanding is everything.
When I/O is slow, the first thing to check is whether it is buffered. This is true in every language and library. It just turns out this guy ran into a default that matches the "read input from a terminal" case instead of the "process millions of lines" case.
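For the C-stdio side, a minimal sketch of what "check the buffering" can look like; the 64 KB size is an arbitrary illustration, not a recommendation:

    #include <cstdio>

    int main() {
        // Ask for a large, fully buffered block on stdin instead of
        // relying on whatever default the C library chose (often
        // line-buffered when attached to a terminal). setvbuf must
        // be called before any other I/O on the stream.
        static char buf[1 << 16];                 // 64 KB, arbitrary
        std::setvbuf(stdin, buf, _IOFBF, sizeof buf);

        char line[4096];
        long count = 0;
        while (std::fgets(line, sizeof line, stdin) != nullptr)
            ++count;
        std::printf("%ld lines\n", count);
        return 0;
    }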
The almighty cin has its disadvantages when it comes to performance. That's why I will always prefer scanf/gets, for a rather simple I/O performance boost.
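To make that concrete, a sketch of the scanf-style loop being advocated; the field width keeps the read inside the buffer, and the format and sizes are illustrative:

    #include <cstdio>

    int main() {
        char word[256];
        long count = 0;
        // %255s limits the read to the buffer size (minus the NUL),
        // which is the part people forget when reaching for scanf.
        while (std::scanf("%255s", word) == 1)
            ++count;
        std::printf("%ld tokens\n", count);
        return 0;
    }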
Using gets() also makes writing exploits much quicker - win-win! ;-)
Sorry, I couldn't resist - you are of course right about the general stdio-over-std::iostream point, though. I've also found that memory usage and executable size explode when using streams - though that's not C++'s fault per se, more a stdlib/compiler problem.
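For completeness, a sketch of the safe replacement: fgets() takes a buffer size, so it can't overrun the way gets() does (gets() was removed from the language entirely in C11):

    #include <cstdio>
    #include <cstring>

    int main() {
        char line[256];
        // fgets never writes more than sizeof line bytes, including
        // the terminating NUL; gets() had no such limit, which is why
        // it was the classic stack-smashing target.
        while (std::fgets(line, sizeof line, stdin) != nullptr) {
            line[std::strcspn(line, "\n")] = '\0';   // strip the newline
            std::puts(line);
        }
        return 0;
    }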
In a lot of cases, so far in my career, using iostreams, stringstreams, and fstreams in C++ is absolutely fine, because most of the program's time isn't spent there anyway - it's spent in the data-crunching algorithms.
I am a real-world C++ programmer - I can only claim about five years of experience, though - and I use iostream. So yes, your data is anecdotal, but so is mine, and we are complete opposites.
The raw number of years of experience doesn't mean much. I've seen developers who started using C++ in the early nineties but haven't bothered to update their knowledge of it since. If I have more years of experience than you, does that make my opinion more authoritative than yours?
It shows us why merely using C/C++ is not automatically a performance win. You need to spend more time developing the product, and then spend MORE time improving its performance.
I think it shows, rather, that just picking C/C++ over Python doesn't mean you automatically get awesome performance. You still need to know what you're doing.
OP intends to measure the time to read data from a file and maybe process it into an internal representation.
cat(1) almost certainly buffers data internally (32 KB here), so extra context switches occur. The shell creates a pipe, which is buffered inside the kernel.
All of this muddies the measurements.
What was one or several read(2) calls plus processing is now
one or several calls of possibly smaller sizes, plus whatever scheduling differences the pipe introduces - and in one of the examples OP also wrapped the whole thing in /usr/bin/time.
None of this is visible, of course, because the data were dumbed down by using time(), which has horrible granularity; with a finer-grained timer it would show up, I'm sure.
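A sketch of what a finer-grained measurement could look like - std::chrono::steady_clock around just the read loop, with a 32 KB buffer chosen to match cat's (buffer size and output format are illustrative):

    #include <chrono>
    #include <cstdio>

    int main() {
        using Clock = std::chrono::steady_clock;
        auto start = Clock::now();

        // Pull everything from stdin in 32 KB chunks, counting bytes.
        char buf[1 << 15];
        std::size_t total = 0, n;
        while ((n = std::fread(buf, 1, sizeof buf, stdin)) > 0)
            total += n;

        std::chrono::duration<double> elapsed = Clock::now() - start;
        std::fprintf(stderr, "read %zu bytes in %.6f s\n",
                     total, elapsed.count());
        return 0;
    }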
I agree, though I think it could fairly be taken as one small piece of evidence for "C++ has a lot of gotchas". In this case the culprit looks like C++'s C-compatibility-driven decision to sync with stdio by default, and therefore to avoid buffering input. Of course, if they had made the opposite decision on defaults, "C++ doesn't sync with stdio by default" would be a different, probably also common, variety of gotcha.
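For reference, the standard escape hatch being discussed - this is the well-known idiom, not anything specific to the OP's code:

    #include <iostream>
    #include <string>

    int main() {
        // Stop iostreams from synchronizing with C stdio on every
        // operation; this lets cin do its own buffered reads.
        std::ios_base::sync_with_stdio(false);
        // Also untie cin from cout, so cin doesn't flush cout
        // before each read.
        std::cin.tie(nullptr);

        std::string line;
        long count = 0;
        while (std::getline(std::cin, line))
            ++count;
        std::cout << count << " lines\n";
        return 0;
    }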
I've not implemented the C++ standard library, but my guess is that it's because iostreams need to implement their own buffering anyway, so buffering atop an already-buffering library would just add complexity and unpredictability.
I'm not sure this quite makes sense. The buffering can clearly already be disabled, since that's what's being discussed. The non-buffering implementation could easily be placed atop FILE (I don't know the details, but I can't imagine a FILE-based iostream implementation being especially complex), at which point you have a buffered implementation that also cooperates with pure C stdio. iostream would still need buffering for other operations, but it could just leave buffering off permanently for stdio, and the switch already exists.
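To make the "atop FILE" idea concrete, here's a minimal hypothetical sketch: a streambuf that keeps no buffer of its own and forwards every read to a FILE*, so the only buffer is stdio's and mixing with scanf/fgets stays coherent. The class name and structure are mine, purely illustrative:

    #include <cstdio>
    #include <iostream>
    #include <streambuf>
    #include <string>

    // Minimal input streambuf that forwards every read to a FILE*.
    // No get area of its own, so stdio's buffer is the only one.
    class stdio_inbuf : public std::streambuf {
    public:
        explicit stdio_inbuf(FILE* f) : file_(f) {}
    protected:
        int_type underflow() override {
            // Peek: read one char, push it back so it stays available.
            int c = std::fgetc(file_);
            if (c == EOF) return traits_type::eof();
            std::ungetc(c, file_);
            return traits_type::to_int_type(c);
        }
        int_type uflow() override {
            // Read and consume one char.
            int c = std::fgetc(file_);
            return c == EOF ? traits_type::eof()
                            : traits_type::to_int_type(c);
        }
    private:
        FILE* file_;
    };

    int main() {
        stdio_inbuf buf(stdin);
        std::istream in(&buf);
        std::string line;
        long count = 0;
        while (std::getline(in, line))
            ++count;
        std::cout << count << " lines\n";
        return 0;
    }

A real implementation would also override xsgetn() for bulk reads and handle putback more carefully, but the point stands: the plumbing is small.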