Why is reading lines from stdin much slower in C++ than Python? (stackoverflow.com)
138 points by Arkid on March 11, 2012 | 26 comments



Whenever this question is asked, it's almost always std::ios_base::sync_with_stdio.


Yes, and it's an easy fix as the accepted answer shows:

cin.sync_with_stdio(false);


I think it's a larger problem that defaults in standard libraries are rarely understood. For Python or for C++.

I'm asking the larger Hacker News community here: how do you illustrate the dependencies of a standard library API call without making it more complicated? Syntax is nothing, understanding is everything.


Reminds me of Joel's Law of Leaky Abstractions. At some point, you must deeply understand the entire stack.

http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...


When I/O is slow the first thing to check is to make sure it is buffered. This is true in every language and library. Just turns out this guy ran into something with a default that matches the "read input from terminal" case instead of the "process millions of lines" case.


The almighty cin has its disadvantages when it comes down to performance. That's why I will always prefer scanf/gets, for a rather simple I/O performance boost.


Using gets() also makes writing exploits much quicker - win-win! ;-)

Sorry, I couldn't resist - you are of course right with the general stdio over std::iostream thing, though. I've also found that memory usage and executable size explode when using streams - though that's not C++'s fault per se, more a stdlib/compiler problem.
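For the stdio route without the exploit risk, a bounded read with fgets is the usual substitute for gets, which was removed from the language in C11 and C++14. A minimal sketch, where read_line_bounded is a hypothetical helper name and lines longer than the buffer are simply split across calls:

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: reads at most cap - 1 characters of one line into buf
// and strips the trailing newline if present. Returns false on EOF or error.
bool read_line_bounded(std::FILE* in, char* buf, std::size_t cap) {
    if (cap == 0 || std::fgets(buf, static_cast<int>(cap), in) == nullptr) {
        return false;
    }
    buf[std::strcspn(buf, "\n")] = '\0';  // cut at the first newline, if any
    return true;
}
```

Unlike gets, this can never write past the end of buf, because fgets is told the buffer's capacity.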


Real programmers don't use iostream. Seriously.


Real programmers don't waste time making sweeping generalizations about other programmers.


To be more precise: real-world C++ programmers don't use iostreams, as I know from personal experience (more than 10 years of C++ programming).

Edit: Several (more or less successful) attempts have been made at an alternative library. See http://accu.org/index.php/journals/1539 for an overview.


In a lot of cases, so far in my career, using iostreams, sstreams and fstreams in C++ is absolutely fine because most of the program's time isn't even spent there; it is spent in the data crunching algorithms.

I am a real-world C++ programmer, though I can only claim about 5 years' experience, and I use iostream. So yes, your data is anecdotal, so is mine, and we are complete opposites.


Raw number of years of experience doesn't mean much. I've seen developers who started using C++ in the early nineties, but haven't bothered to update their knowledge of C++ since then. If I have more years of experience than you, does it make my opinion more authoritative than yours?


Indeed they don't. There are no iostreams in FORTRAN.


It shows us why just using C/C++ is not a performance-wise decision. You need to spend more time developing the product and spend MORE time to improve its performance.


I think it more shows that just picking C/C++ over Python doesn't mean you automatically get awesome performance. You still need to know what your doing.


like using `gettimeofday()` and not using cat(1), for starters.


What's bad about cat(1)?


Nothing, except it's entirely unnecessary for this task.

See "Useless use of cat awards" from days of yore: http://partmaps.org/era/unix/award.html#uucaletter


I still prefer cat, as one simple mistake of typing > instead of < and the file is gone.


The OP intends to measure the time to read data from a file and maybe process it into an internal representation.

cat(1) almost certainly buffers data internally (32KB here), so context switches occur. The shell creates a pipe, which is itself buffered inside the kernel.

All of this muffles measurements.

What was one or several read(2) calls plus processing is now one or several calls of possibly smaller sizes, plus whatever scheduling differences arise. In one of the examples the OP also ran /usr/bin/time over the whole thing.

This is of course not visible because the data were dumbed down by using time(), which has horrible granularity, but with a finer-grained timer it would be visible, I'm sure.


you're


That's quite a lot to read into a tiny example like this.


I agree, though I think it could fairly be taken as one small bit of evidence in favor of "C++ has a lot of gotchas". In this case it looks like the culprit is C++'s C-compatibility-driven decision to sync with stdio by default, and therefore to avoid buffering input. Of course, if they made the opposite decision on defaults, "C++ doesn't sync with stdio by default" would be a different, probably also common, variety of "gotcha".


Why isn't cin implemented on top of C's stdin and FILE? That way you get both buffering and compatibility.


I've not implemented the C++ std library, but my guess is it's because iostreams need to implement their own buffering anyway, so it would just add complexity and unpredictability to buffer atop an already-buffering library.


I'm not sure this quite makes sense. The buffering can already be disabled, clearly, since that's what's being discussed. The non-buffering implementation could be easily placed atop FILE (I don't know the details, but I can't imagine a FILE-based iostream implementation being at all complex) at which point you have a buffered implementation that also cooperates with pure C stdio. iostream would need buffering for other operations, but could just leave it off permanently for stdio, and the switch already exists.



