I think the larger problem is that defaults in standard libraries are rarely understood — in Python just as much as in C++.
I'm asking the larger Hacker News community here: how do you illustrate the dependencies of a standard library API call without making it more complicated? Syntax is nothing; understanding is everything.
When I/O is slow, the first thing to check is whether it is buffered. This is true in every language and library. It just turns out this guy ran into a default that matches the "read input from a terminal" case instead of the "process millions of lines" case.
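For the C-stdio side, a minimal sketch of what "check the buffering" can look like; the 64 KB size is an arbitrary illustration, not a recommendation:

    #include <cstdio>

    int main() {
        // Ask for a large, fully buffered block on stdin instead of
        // relying on whatever default the C library chose (often
        // line-buffered when attached to a terminal). setvbuf must
        // be called before any other I/O on the stream.
        static char buf[1 << 16];                 // 64 KB, arbitrary
        std::setvbuf(stdin, buf, _IOFBF, sizeof buf);

        char line[4096];
        long count = 0;
        while (std::fgets(line, sizeof line, stdin) != nullptr)
            ++count;
        std::printf("%ld lines\n", count);
        return 0;
    }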
The almighty cin has its disadvantages when it comes to performance. That's why I will always prefer scanf/gets, for a rather simple I/O performance boost.
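To make that concrete, a sketch of the scanf-style loop being advocated; the field width keeps the read inside the buffer, and the format and sizes are illustrative:

    #include <cstdio>

    int main() {
        char word[256];
        long count = 0;
        // %255s limits the read to the buffer size (minus the NUL),
        // which is the part people forget when reaching for scanf.
        while (std::scanf("%255s", word) == 1)
            ++count;
        std::printf("%ld tokens\n", count);
        return 0;
    }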
Using gets() also makes writing exploits much quicker - win-win! ;-)
Sorry, I couldn't resist - you are of course right about the general stdio-over-std::iostream point, though. I've also found that memory usage and executable size explode when using streams - though that's not C++'s fault per se, more a stdlib/compiler problem.
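For completeness, a sketch of the safe replacement: fgets() takes a buffer size, so it can't overrun the way gets() does (gets() was removed from the language entirely in C11):

    #include <cstdio>
    #include <cstring>

    int main() {
        char line[256];
        // fgets never writes more than sizeof line bytes, including
        // the terminating NUL; gets() had no such limit, which is why
        // it was the classic stack-smashing target.
        while (std::fgets(line, sizeof line, stdin) != nullptr) {
            line[std::strcspn(line, "\n")] = '\0';   // strip the newline
            std::puts(line);
        }
        return 0;
    }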
In a lot of cases, so far in my career, using iostreams, stringstreams, and fstreams in C++ is absolutely fine, because most of the program's time isn't spent there anyway - it's spent in the data-crunching algorithms.
I am a real-world C++ programmer - I can only claim about five years of experience, though - and I use iostream. So yes, your data is anecdotal, but so is mine, and we are complete opposites.
The raw number of years of experience doesn't mean much. I've seen developers who started using C++ in the early nineties but haven't bothered to update their knowledge of it since. If I have more years of experience than you, does that make my opinion more authoritative than yours?
It shows us why merely using C/C++ is not automatically a performance win. You need to spend more time developing the product, and then spend MORE time improving its performance.
I think it shows, rather, that just picking C/C++ over Python doesn't mean you automatically get awesome performance. You still need to know what you're doing.
OP intends to measure the time to read data from a file and maybe process it into an internal representation.
cat(1) almost certainly buffers data internally (32 KB here), so extra context switches occur. The shell creates a pipe, which is buffered inside the kernel.
All of this muddies the measurements.
What was one or several read(2) calls plus processing is now
one or several calls of possibly smaller sizes, plus whatever scheduling differences the pipe introduces - and in one of the examples OP also wrapped the whole thing in /usr/bin/time.
None of this is visible, of course, because the data were dumbed down by using time(), which has horrible granularity; with a finer-grained timer it would show up, I'm sure.
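A sketch of what a finer-grained measurement could look like - std::chrono::steady_clock around just the read loop, with a 32 KB buffer chosen to match cat's (buffer size and output format are illustrative):

    #include <chrono>
    #include <cstdio>

    int main() {
        using Clock = std::chrono::steady_clock;
        auto start = Clock::now();

        // Pull everything from stdin in 32 KB chunks, counting bytes.
        char buf[1 << 15];
        std::size_t total = 0, n;
        while ((n = std::fread(buf, 1, sizeof buf, stdin)) > 0)
            total += n;

        std::chrono::duration<double> elapsed = Clock::now() - start;
        std::fprintf(stderr, "read %zu bytes in %.6f s\n",
                     total, elapsed.count());
        return 0;
    }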
I agree, though I think it could fairly be taken as one small piece of evidence for "C++ has a lot of gotchas". In this case the culprit looks like C++'s C-compatibility-driven decision to sync with stdio by default, and therefore to avoid buffering input. Of course, if they had made the opposite decision on defaults, "C++ doesn't sync with stdio by default" would be a different, probably also common, variety of gotcha.
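For reference, the standard escape hatch being discussed - this is the well-known idiom, not anything specific to the OP's code:

    #include <iostream>
    #include <string>

    int main() {
        // Stop iostreams from synchronizing with C stdio on every
        // operation; this lets cin do its own buffered reads.
        std::ios_base::sync_with_stdio(false);
        // Also untie cin from cout, so cin doesn't flush cout
        // before each read.
        std::cin.tie(nullptr);

        std::string line;
        long count = 0;
        while (std::getline(std::cin, line))
            ++count;
        std::cout << count << " lines\n";
        return 0;
    }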
I've not implemented the C++ standard library, but my guess is that it's because iostreams need to implement their own buffering anyway, so buffering atop an already-buffering library would just add complexity and unpredictability.
I'm not sure this quite makes sense. The buffering can clearly already be disabled, since that's what's being discussed. The non-buffering implementation could easily be placed atop FILE (I don't know the details, but I can't imagine a FILE-based iostream implementation being especially complex), at which point you have a buffered implementation that also cooperates with pure C stdio. iostream would still need buffering for other operations, but it could just leave buffering off permanently for stdio, and the switch already exists.
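To make the "atop FILE" idea concrete, here's a minimal hypothetical sketch: a streambuf that keeps no buffer of its own and forwards every read to a FILE*, so the only buffer is stdio's and mixing with scanf/fgets stays coherent. The class name and structure are mine, purely illustrative:

    #include <cstdio>
    #include <iostream>
    #include <streambuf>
    #include <string>

    // Minimal input streambuf that forwards every read to a FILE*.
    // No get area of its own, so stdio's buffer is the only one.
    class stdio_inbuf : public std::streambuf {
    public:
        explicit stdio_inbuf(FILE* f) : file_(f) {}
    protected:
        int_type underflow() override {
            // Peek: read one char, push it back so it stays available.
            int c = std::fgetc(file_);
            if (c == EOF) return traits_type::eof();
            std::ungetc(c, file_);
            return traits_type::to_int_type(c);
        }
        int_type uflow() override {
            // Read and consume one char.
            int c = std::fgetc(file_);
            return c == EOF ? traits_type::eof()
                            : traits_type::to_int_type(c);
        }
    private:
        FILE* file_;
    };

    int main() {
        stdio_inbuf buf(stdin);
        std::istream in(&buf);
        std::string line;
        long count = 0;
        while (std::getline(in, line))
            ++count;
        std::cout << count << " lines\n";
        return 0;
    }

A real implementation would also override xsgetn() for bulk reads and handle putback more carefully, but the point stands: the plumbing is small.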