The most copied StackOverflow snippet of all time is flawed (2019) (programming.guide)
368 points by Decabytes on Sept 27, 2023 | 233 comments



I find it interesting that the answers using hardcoded values / if statements (or while) are all doing up to five comparisons.

It goes B, KiB, MiB, GiB, TiB, EiB and no more than that (in all the answers), so that can be solved with three if statements at most, not five.

I mean: if it's greater than or equal to GiB, you know it won't be B, KiB or MiB. Dichotomy search for the win!

Not one of the hardcoded solutions does it that way.

Now let's go up to ZiB and YiB: still only three if statements at most, vs up to seven for the hardcoded solutions.

I mention it because I personally would definitely not go for the whole log/pow/floating-point approach if I had to write a solution myself (because I know all too well the SNAFU potential).

I'd hardcode if statements... But while doing a dichotomy search. I must be an oddball.
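To make that concrete, a minimal sketch in Java (my own, hypothetical: binary units only, assumes a non-negative input, and truncates instead of rounding, so it sidesteps the article's rounding question entirely):

    // Seven units, at most three comparisons per call.
    static String humanReadable(long bytes) {
        if (bytes >= 1L << 30) {
            if (bytes >= 1L << 50)
                return bytes >= 1L << 60 ? (bytes >> 60) + " EiB"
                                         : (bytes >> 50) + " PiB";
            return bytes >= 1L << 40 ? (bytes >> 40) + " TiB"
                                     : (bytes >> 30) + " GiB";
        }
        if (bytes >= 1L << 10)
            return bytes >= 1L << 20 ? (bytes >> 20) + " MiB"
                                     : (bytes >> 10) + " KiB";
        return bytes + " B";
    }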

P.S: no horse in this race, no hill to die on, and all the usual disclaimers


I would expect your binary search solution to be slower than just doing 6 sequential checks, because the latter only ever takes one branch (the others fall through, which predicts well). Mispredicted branches are very slow. You want to keep code going in a straight line as much as possible.


Yup, know your hardware and know your problem. Dichotomic search is wonderful when your data can't fit in RAM and it starts being more efficient to cut down on the number of nodes traversed.

For a problem space limited by your input size (a signed 64-bit number) to a 6-entry dictionary? At best you may want to optimize some in-lining or compiler hints if your language supports it. Maybe set up some batching operations if this is called hundreds of times a frame, so you're not creating/destroying the stack frame every time (even then, the compiler can probably optimize that).

But otherwise, just throw that few-dozen-byte lookup table into the registers and let the hardware chew through it. Big-O notation isn't needed for data at this scale.


It depends on the input distribution. If it’s very common to have smaller values then the linear search could be superior.


Your comment and mine are basically the same. This is what I call terrible engineering judgement. A random co-worker could review the simple solution without much effort. They could also see the corner cases clearly and verify the tests cover them. With this code, not so much. It seems like a lot of work to write slower, more complex, harder to test and harder to review code.



Thanks! Macroexpanded:

The most copied StackOverflow snippet of all time is flawed (2019) - https://news.ycombinator.com/item?id=27533684 - June 2021 (334 comments)

The most copied StackOverflow snippet of all time is flawed - https://news.ycombinator.com/item?id=21698619 - Dec 2019 (88 comments)

The most copied StackOverflow snippet of all time is flawed - https://news.ycombinator.com/item?id=21693431 - Dec 2019 (3 comments)


I don't understand. There are 7 suffixes, can't you pick the right one with binary search? That would be 3 comparisons. Or just do it the dumb way and have 6 comparisons. How are two log() calls, one pow() call and ceil() better than just doing it the dumb way? The bug being described is a perfect example of trying to be too clever.


The author apparently went back to using a loop after recognizing that it's not readable: https://programming.guide/java/formatting-byte-size-to-human...

Notably, it's still slightly better than the first code example in the original article, as it takes the rounding bug into account.
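From memory, the loop version is in the spirit of this sketch (not the author's verbatim code; the 999_950 cutoff is what absorbs the rounding bug, so that e.g. 999,999 bytes prints as "1.0 MB" rather than "1000.0 kB"):

    import java.text.CharacterIterator;
    import java.text.StringCharacterIterator;

    // SI units (powers of 1000), one decimal of output.
    static String humanReadableSI(long bytes) {
        if (-1000 < bytes && bytes < 1000) return bytes + " B";
        CharacterIterator ci = new StringCharacterIterator("kMGTPE");
        while (bytes <= -999_950 || bytes >= 999_950) {
            bytes /= 1000;
            ci.next();
        }
        return String.format("%.1f %cB", bytes / 1000.0, ci.current());
    }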


The author says at the beginning that it’s not actually better than the loop.

Also, 6 comparisons only happen if you have the max value, which seems unlikely in actual usage. Linear could be better if most of the time values are in the B or KB ranges.


Shameless plug: as another option to format sizes in a human-readable way quickly and correctly (other than copying from S/O), you can use one of our open source PrettySize libraries, available for Rust [0] and .NET [1]. They also make performing type-safe logical operations on file sizes safe and easy!

The snippet from S/O may be four lines, but these are much more extensive: they come with tests, output formatting options, conversion between sizes, and more.

[0]: https://github.com/neosmart/prettysize-rs

[1]: https://github.com/neosmart/PrettySize.net


Replacing 4 line solutions with extensive libraries is what caused left-pad.


I understand where you're coming from here, but the whole point of this article is that the 4-line solution is wrong (and the author specifically mentioned that every other answer on the Stack Overflow post was wrong in the same way as well). "Seemingly-simple problem where every naïve solution contains a subtle bug" is exactly the right use case for a well-designed library method.


> “It’s wrong”

But in a completely benign way. I question why a few edge cases of writing 1000kb instead of 1Mb—so not even a misrepresentation—would ever be worth the code bloat. This is about making stuff slightly more convenient to read.


I agree with you— that was a lot of drumming for what turned out to be kind of a nothingburger as far as the "bug".

At the same time, putting this kind of thing in a library (or even a language's stdlib) is worthwhile for exactly this kind of reason— it allows devs to confidently reach for code that other smart people have really agonized over and which definitely covers the corner cases, similar to other common utilities such as sort methods.


One example: I display available memory in my status bar, which expects strings to be a constant width. If it displayed 1000kb, it would cause alignment issues and annoy the heck out of me


Yeah, copying an incorrect answer from SO thousands of times is much better!

(The subject at hand isn't whether libraries are good or not, it's whether copying something off the internet is. In the post, it turns out it isn't. If it was a library, the author could have fixed and updated the library, and the issue would be fixed for everyone that uses it. left-pad isn't an issue with libraries per se, it's an issue with library management)


No. left-pad was placing a 4-line solution in a library. prettysize is well deserving of library status.


What caused left-pad is the ability to delete published code


You should see the implementation of `std::midpoint`[1].

Accounting for correctness even in edge-cases is what large libraries do better than throwaway bits of code.
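(The textbook case it guards against, as a hypothetical sketch in Java rather than the STL's actual code:)

    // The naive midpoint overflows once lo + hi exceeds the type's range.
    static long midNaive(long lo, long hi) { return (lo + hi) / 2; }

    // The standard overflow-free form, assuming lo <= hi.
    static long midSafe(long lo, long hi) { return lo + (hi - lo) / 2; }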

[1]: https://github.com/microsoft/STL/blob/6735beb0c2260e325c3a4c...


Out of curiosity, is there a sizable number of developers that just copy and paste untrusted code from StackOverflow into their applications?

The claim that people just copy from StackOverflow is obviously popular, but I always thought it was just conjecture and humor until I saw someone do it. Don't get me wrong, I use StackOverflow to give me a head start on solving a problem in an area I'm not as familiar with yet, but I've never just straight copied code from there. I don't do that because rarely does the snippet do exactly and only exactly what I need. It requires me to look at the APIs and form my own solution from the explained approach. StackOverflow has pointed me in the direction of some niche APIs that are useful to me, especially in Python.


I once worked with a developer who wouldn’t let anything come between him seeing an answer and copying it into his code. He wasn’t even reading the question to make sure it was the same problem he was having, let alone the answer. He would literally go Google => follow the first link to Stack Overflow he saw => copy and paste the first code block he saw. Sometimes it wasn’t even the right language. People had to physically take the input away from him if they were pairing with him because there was nothing anybody could say to stop him, and if you tried to tell him it wasn’t right then he’d just be pasting the second code snippet on the page before you could get another word out. He was freakishly quick at it.

Now he was an extreme case, but yes, there are a lot of developers out there with the mindset of “I need code; Stack Overflow has code; problem solved!” that don’t put any thought at all into whether it’s an appropriate solution.


During a hiring round nearly two decades ago, we realised something was off with the answers to the usual pre-phone-interview screening questions. They were simple, and we asked people to only spend like 20 minutes on them. We knew people would "cheat", but they were only there to lighten our load a little bit, so it was ok if they let through some bad candidates.

But for whatever reason, in one hiring round the vast majority had cut and pasted answers from search results verbatim (we dealt with a new recruiter, and I frankly suspected this new recruiter was telling them this was ok despite the instructions we'd given).

These were not subtle. But the very worst was one who behaved just like the developer you described: he'd found a forum post about a problem pretty close to the question and had cut and pasted the code from the first answer he found.

He'd not even bothered to read a few comments further down in the replies where the answer in question was totally savaged by other commenters explaining why it was entirely wrong.

This was someone who was employed as a senior developer somewhere else, and it was clear in retrospect looking at his CV that he probably kept "fleeing the scene of the crime" on a regular basis before it was discovered he was a total fraud. We regularly got those people, but none that delivered such obviously messed up answers.

For every developer like this, you're probably right that there will be a lot more who are less extreme about it, and more able to make things work well enough that they're not discovered.


It is hard for some people to grasp the sheer amount of fraud in this industry. A while back I worked with two guys, one with a Master's and the other with a PhD. One day they came to me asking for help, because the program they'd written (in Python) wouldn't run. It was supposed to analyze some text, and spit out whatever the result of the analysis was.

The problem? They were passing the input text as hardcoded plaintext, i.e. it wasn't even a string with quotes or anything -- just `foo(here is my raw, non-string input, no quotes necessary lol)`, and they could not conceive of what the issue might be.


That has to be bug blindness? I.e. they have decided that there is no bug at that line, and can't see it afterwards. How could they even write the program in the first place, if they were not aware of string literals?


Did they write code in notepad? How did that not get detected by the LSP?


This is like grading calculus exams. Student gives the memorized answer which most resembles (in his mind) the question asked.


If you're paying a developer by the hour, and want your app released in the app store using as few hours as possible, then this approach can be the most cost efficient one.

Sure, it isn't good practice. Sure, it probably isn't what NASA should be doing. But if you're literally building yet another uber-like app, you probably shouldn't be spending too long thinking about details.


> this approach can be the most cost efficient one.

No it can’t. Quick and dirty? Sure. Take on some tech debt to get to market quicker. Blindly copying and pasting? You’re never going to build functional software that way. This guy was committing code with syntax errors that he’d obviously never even run. How are you going to get to market quickly that way?


The comment you're responding to said the guy was copying the wrong language at times. Code that won't even compile isn't making it into the app store.


Yeah, those details like whether or not it works really don't matter. NASA is overrated.


Rarely are things so black and white. If you're just pushing out an MVP, something that takes 5 seconds and is 95% correct is often better than 30 minutes and 100% correct.


I'm willing to entertain the idea that copy/paste from SO may the right option in some cases, but you have to apply at least a little scrutiny. I'm not sure exactly where the bar should be for an MVP, but "[s]ometimes it wasn’t even the right language" is definitely below it.


Maybe if you don't give a fuck about your users or the future maintainers. But for the time span of just 30m, you can make sure there are no bugs and that it's easy to maintain. MVP or not, you're still a bad engineer if you actually do this.

Correct and broken are black and white if you can divide the problem correctly, and there's no excuse for shipping broken code. At some point someone has to take responsibility for not shipping garbage. I get that you, me, or any engineer don't always have that luxury, but it should be a shameful thing not something you accept as normal or ok.


Maybe spending 30 minutes on one bug is worth it, maybe not. If you're pre-revenue / pre-product-market-fit and you compound tens to hundreds of these 5s to 30m decisions, you're risking running out of time or money before anyone even uses your product.

I would argue it's much worse "engineering" to have no product at all.


> pre-product-market-fit

Is that a euphemism for wandering around aimlessly? Committing random code to see what works? That's also not good engineering....

Not saying it won't end in the outcome you want; people gamble all the time. I'm just saying it's bad engineering.


I mean, the default of any new company is pre-product-market-fit, no? How else could you start something new? During such an early stage much of your code may be written as a very rough MVP that you're only really using to validate a concept. Sometimes, you're going to just have to trash all of it because the idea was totally wrong and people don't actually care about the problem you're solving.

Those, among others, are the types of cases where spending extra time getting something exactly right (even if just a few hours) is just not worth it.


No, the vast majority of companies are entering a mature market with a product that is based on what the market wants, but with their own value proposition.

It seems that only in Silicon Valley startups and the like do people start companies with only the vaguest idea of what they are actually going to build and no idea of whether or not they're solving an actual problem that anyone cares about.


That's not software development. That's wild guessing.


I've seen "wild guessing" quite a bit when people don't actually understand the problem they're solving. Mostly students, but it happens in professional contexts as well.

I'm not sure why, maybe people are missing knowledge that would allow them to understand, so they just try random things in the hope that it works? It surprises me every time it happens.


To some that is the same. Try and modify until it sort of works.


I don't think you would be able to solve complex problems or development tasks with such an approach as described above (if that's what you're referring to). That's something I could expect from a bloody junior, but not from a seasoned professional.


In combination with ChatGPT-4 and the like and lots of iterations, you probably would get pretty far today. But I agree, this is not software engineering and I would not use such code for anything important. But if someone makes a (sandboxed) game with it, it still might be fun.


Just out of curiosity… what was his salary and how long did it take to fire him? Did they fire the HR manager as well?


No idea. I left before he did.


This is basically how GitHub Copilot works.


Worse, even, because some of the copy/pasters at least remember to paste in the StackOverflow URL too. GitHub Copilot doesn't even give you that.


> People had to physically take the input away from him if they were pairing with him because there was nothing anybody could say to stop him, and if you tried to tell him it wasn’t right then he’d just be pasting the second code snippet on the page before you could get another word out. He was freakishly quick at it.

Sounds like this guy understands concurrency. :)


Just wait til that guy discovers ChatGPT.


I won’t be surprised if that guy is ChatGPT’s main audience.

Personally I can’t see how it would be faster to ask ChatGPT for an answer then carefully scrutinize the output to make sure I understand what it’s doing. Code is often easier to write than read - especially when it’s not your code.

In hindsight the solution is obvious, just run the code without reading it then try to fix it if it doesn’t produce acceptable results.


ChatGPT could help this dev if they understood the problems they are trying to solve. That is such a fundamental flaw in this. They will be on a PIP and out of a job in any respectable workplace. That would be a mercy.


Yes, and it happens more for things that feel out of scope for the part of the program that I'm interested in. After all, we import library code from random strangers into our programs all the time for the parts we consider "plumbing" and beneath notice. If I wanted to dig in and understand something, I would be more likely to write my own. But if I want this part over here to "just work" so I can get on with the project, it's compiler-error-driven development.


Same, and even more so if it's something that feels like it should be in the library code in the first place.

My most copy-pasted code is projecting a point onto a line segment. I end up needing it all the time, it's never in whatever standard library for vector math I'm using, and it's faster to find on SO than to find and translate the code out of whatever my last project that needed it is. Way faster than re-deriving it.

Your vector math library is probably already code imported from random strangers, likely even imported by random strangers, so adding one more function from a random stranger feels entirely appropriate.
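For reference, the usual formulation, as a sketch in plain Java (the names are mine; every vector library spells this differently):

    // Closest point to p on the segment [a, b], in 2D.
    static double[] projectOntoSegment(double ax, double ay, double bx, double by,
                                       double px, double py) {
        double dx = bx - ax, dy = by - ay;
        double lenSq = dx * dx + dy * dy;
        if (lenSq == 0) return new double[] { ax, ay }; // degenerate segment
        // Normalized position of the projection along the segment,
        // clamped to [0, 1] so the result stays on the segment.
        double t = ((px - ax) * dx + (py - ay) * dy) / lenSq;
        t = Math.max(0, Math.min(1, t));
        return new double[] { ax + t * dx, ay + t * dy };
    }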


I hardly ever just copy and paste, for the exact reason the author talks about. Instead, I try to make sense of the solution, and if I have to, I'll hand-copy it line by line to make sure I properly understand it, and refactor from there. I also rename variables, since often there are so many foos and bars and bazes that it's completely unreadable by a human.

Also if I come across the problem a second time, I'll have better luck remembering what I did (as opposed to blindly copying).


Yes, people do that. After looking at a huge amount of incorrect TLS-related code and configuration on SO, I'm now pretty sure that most systems run without validating certificates properly.


This was more true when libraries and tooling defaulted to not checking.

Somewhere in my history is a recent HN (or maybe Reddit) post where somebody insists curl has been 100% compatible from day one, and like, no: originally curl ignored certificates; today you need to ask for that explicitly if it's what you want.

I think (but don't take my word for it) that Requests (the Python library) was the same. Initially it didn't check, then years back the authors were told that if you don't check you get what you didn't pay for (ie nothing) and they changed the defaults.

Python itself is trickier because it was really hard to convince Python people that DNS names, the names we actually care about in certificates, aren't Unicode. I mean, they can be (IDNs), but not in a way that's useful to a machine. If your job is "Present this DNS name to a user" then sure, here's a bunch of tricky and maybe flawed code to best efforts turn the bytes into human Unicode text, but your cert checking code isn't a human, it wants bytes and we deliberately designed the DNS records and the certificate bytes to be identical, so you're just doing a byte-for-byte comparison.

The Python people really wanted to convert everything messily to Unicode, which is - at best if you do it perfectly - slower with the same results and at worst a security hole for no reason.

OpenSSL is at least partly to blame for terrible TLS APIs. OpenSSL is what I call a "stamp collector" library. It wants to collect all the obscure corner cases, because some of its authors are interested. Did the Belgian government standardise a 54-bit cipher called "Bingle Bongle" in 1997? Cool, let's add that to our library. Does anybody use it? No. Should anybody use it? No. But it exists so we added it. A huge waste of everybody's time.

The other reason people don't validate is that it was easier to turn it off and get their work done, which is a big problem that should be addressed systemically rather than by individually telling people "No".

So I'd guess that today out of a thousand pieces of software that ought to do TLS, maybe 750 of them don't validate certificates correctly, and maybe 400 of those deliberately don't do it correctly because the author knew it would fail and had other priorities.


Apache used to not reject SNI hostname headers ending in a dot, in contravention of RFC 6066. Firefox notoriously didn't strip the trailing dot before sending the header. Some versions of curl (or the underlying libraries?) did, some didn't. I filed a bug at bz.apache.org about it.


requests pulls in certifi (Firefox's trust store, repackaged) via urllib3, so it probably uses those root certs by default, not the system store.


To be fair that might be partly the fault of TLS libraries. There should be a single sane function that does the least surprising thing and then lower level APIs for everything else. Currently you need a checklist of things that must be checked before trusting a connection.


Oh boy, where to begin. You obviously haven't had the pleasure of working in a codebase written by Adderall-fueled 23-year-olds.


What about Adderall-fueled 35 year olds?


What about Red Bull-fueled 43 year olds?


What about retirement driven 30 year olds?



I think the section "A Study on Attribution" and the associated paper might be as good an answer as you'll get to that.


Well. You (collective you) start by copying and pasting a code snippet first, and then modifying it as needed. Does that count? If no modifications are needed, then it stays.


That's what I do. I almost always rename things to match the coding style of the codebase I'm working on, though.


Plenty of developers paste arbitrary bash commands posted on sites like GitHub without thinking, I suppose, because they look "legit". I see it similarly to you: StackOverflow (and Copilot) can be helpful to get started, but it's no substitute for understanding the code you ship.

Had an exchange like this some time ago:

Me: Hey, I'm reviewing your PR. Looks pretty fine to me. Except for this function which looks like it was copy-pasted from SO: I literally found the same function in an answer on SO (it was written in pure JS while we were using TS in our project).

Dev: Yes, everyone copies from SO.

Me: Well, in that case I hope you always copy the right thing. Because this code might run but it is not good enough (e.g. the variable names are inexpressive, it creates DOM elements without removing them after they are not needed anymore).


There really is, but people do give it a cursory read. See also: https://en.wikipedia.org/wiki/Underhanded_C_Contest


Yes. I was told from a reliable source that at one point they tried to log all the copy and paste events and it brought their systems to their knees.


I wouldn't do it in most professional settings due to licensing...

But for personal projects where I just want to get something running, then yes, I would copy paste and barely even read the code.

I don't really care about bugs like this either - I'm happy to make something that works 99% of the time, and only fix that last 1% if it turns out to be an issue.


> I wouldn't do it in most professional settings due to licensing...

Underrated comment. I think most tech companies' General Counsel would have a heart attack if they were aware of StackOverflow copy-pasting by their developers. I highly doubt some rando engineer who pastes bubblesort code into their company's code base gave even a passing thought to what license the SO code was under, what license his own company's code was under, and whether they were compatible.

The big (FAANG) tech companies I've worked at all have written policies about copying and pasting code from external sources (TLDR: Don't), but I've seen even medium-sized (~1000+) companies with zero guidance for their developers.


In the server-side JavaScript world, absolutely; it seems like it's standard practice. People are injecting entire dependencies without even remotely looking at the code, bringing in an entire library for a single function that could be accomplished in a couple of lines (and usually is, posted below the fold).


...you would not believe...

not long ago I worked on a team that actively chose libraries and frameworks based on how likely they felt it was that their questions would be answered on StackOverflow.


Yes.

This is why PHP got such a bad reputation. A lot of new developers were copying and pasting quick example code from Stack Overflow, or code from other new developers who only kind of knew what they were doing.


> This is why PHP got such a bad reputation.

I don't think that's the only reason, lol.


What? SO launched in 2008 and PHP had a bad reputation prior to that.


The point stands, it just wasn't SO they were getting the bad information from prior to 2008.


You're right, prior to that it was random forums,


and the comment section in the php.net documentation.


Less and less every day. Now they are using ChatGPT.


When I had to use Python, I felt like copy-pasting anything was out of scope due to indentation errors.


Millions.


Wait til you find out about chatGPT


I don't understand why you'd use floating point logarithms if you want log 2?

Unless I'm missing something, this gives you an accurate value of floor(log2(value)) for anything positive less than 2^63 bytes, and it's much faster too:

  Long.bitCount( (Long.highestOneBit(value) << 1) - 1) - 1
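(Equivalently, and arguably more direct: 63 - Long.numberOfLeadingZeros(value) gives the same floor(log2(value)) for positive inputs.)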


The “common” units are powers of 10 so this doesn’t work


The original SO question did actually state they wanted powers of two (kilobyte as 1024 bytes). Although, to be pedantic, they should have used KiB, GiB, etc. instead.


But you can avoid binary search entirely, because there is at most one power of ten between 2^k and 2^(k+1). So you can turn it into a lookup table problem.
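A sketch of that idea in Java (hypothetical; decimal units, a table built once from floor(log2) to unit index, then a single comparison fixes the boundary cases):

    static final long[] POW1000 = { 1L, 1_000L, 1_000_000L, 1_000_000_000L,
            1_000_000_000_000L, 1_000_000_000_000_000L, 1_000_000_000_000_000_000L };

    // UNIT_FOR_BIT[k] = largest u with 1000^u <= 2^k.
    static final int[] UNIT_FOR_BIT = new int[63];
    static {
        for (int k = 0; k < 63; k++) {
            int u = 0;
            while (u + 1 < POW1000.length && POW1000[u + 1] <= (1L << k)) u++;
            UNIT_FOR_BIT[k] = u;
        }
    }

    // value > 0; returns an index into {B, kB, MB, GB, TB, PB, EB}.
    static int unitIndex(long value) {
        int u = UNIT_FOR_BIT[63 - Long.numberOfLeadingZeros(value)];
        // At most one power of 1000 lies between 2^k and 2^(k+1),
        // so one comparison corrects the table's estimate.
        if (u + 1 < POW1000.length && value >= POW1000[u + 1]) u++;
        return u;
    }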


I took one look at the snippet, saw a floating-point log operation and divisions applied to integers, and mentally discarded the entire snippet as too clever by half and inherently bug-prone.


That’s basically the point of the article


Knowledge cascades all the way down; it goes to show how difficult it is to 'holster' even the smallest piece of knowledge once it's drawn.

I wonder, given the rate at which Stack Exchange is losing active contributors, what it would take for 'fastest gun' answers that are later found to be off the mark to be corrected, and what it would mean for our collective knowledge once these 'slightly off' answers are further cemented in our annals of search and, increasingly, LLM history.


This reminds me of when I was in basic training. The drill sgts would give us new recruits a task that none of us knew how to do, purposefully without guidance, and then leave. One guy would try and start doing it, always the incorrect way, and everyone else would just copy that person.


I wonder if this is exacerbated by human tendencies to not want to look bad relative to others, even if it leads to silly outcomes like intelligent people following a bad or rushed idea.

Something similar happens in public economic forecasts because those who get it wrong when others get it right are treated much more harshly than those who get it wrong when others get it wrong too.


What was the goal of this?


"Don't jump off a cliff just because everyone else is doing it" basically

I guess the next logical exercise would be asking them to do something with instructions that are complete but incorrect, or at least inefficient, to teach the lesson of questioning superior orders rather than just peers. Actually, I'm honestly not sure if that's desired in military discipline or not (no direct experience here)


I drove a forklift one summer for a manufacturing plant.

I had a supervisor tell me to do something that was clearly not right and I refused. I came in the next day and they tried to write me up and I refused to sign the paperwork for it.

The one thing no one could accurately describe is why the supervisor was right.

I agree with the idea of being willing to go against authority but disagree that it's always a good career move :)

Of course it was easier for me, it was just a summer job, I was going back to Uni in the fall.


The usual goal of anything in military training, being cruel to new recruits?


In a way, I don't even consider floating point errors to be "flaws" with an algorithm like this. If the code defines a logical, mathematically correct solution, then it's "right". Solving floating point errors is a step above this, and only done in certain circumstances where it actually matters.

You can imagine some perfect future programming language where floating point errors don't exist and don't have to be accounted for. That's the language I'm targeting with 99% of my algorithms.


This reminds me of a weirdness with some sat navs: the distance to your exit/destination is displayed as: 12 ... 11 ... 10 ... 10.0 ... 9.9 ... 9.8 ... with the value 10.0 shown only while the distance is between 9.95 and 10. It's not really a bug but it's strange seeing the display update from 10 to 10.0 as you pass the imaginary ten-mile milestone so perhaps it's a distraction worth avoiding.
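(A guess at how that happens: the formatter switches to one decimal below ten and only rounds afterwards, something like this hypothetical sketch in Java:)

    // 9.96 formats as "10.0", while 10.2 formats as "10".
    static String distance(double miles) {
        return miles < 10 ? String.format("%.1f", miles)
                          : String.format("%.0f", miles);
    }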


Mercedes for a while had a fuel gauge that showed 1/4 1/2 3/4 1/1

They had another one that went R 2/4 4/4

I'm still undecided which was more weird. You can see them both on eBay.


There's nothing weird here. Those are very common fractions used across several domains, including cooking.

But one thing that I would really love to see are actual liters or gallons (depending on the country where I am at the moment).


Almost every top stack overflow answer is wrong. The correct one is usually at rank 3. The system promotes answers which the public believes to be correct (easy to read, resembles material they are familiar with, follows fads, etc).

Pay attention to comments and compare a few answers.


Years ago I tried to respond to a comment on StackOverflow, but I didn't have enough points to comment. So I tried to answer some questions so that I could get enough points to comment. But when looking at the new questions, it seemed to be mostly a pile of "I have a bug in my code please fix it" type stuff. Relatively simple answers to "What is the stack and the heap?" had thousands of points, but also already had tons of answers (though I suppose one of the reasons why people keep answering is to harvest points). I was able to answer a question on an obscure issue that no one had answered yet, but received no points.

Then I saw that you could get points for editing answers. OK, I thought, I can get some points by fixing some bugs. I found a highly upvoted post with code that didn't work, found that it was because one section had used the wrong variable, and tried to fix it. Well, the change was too small to meet the required six-character minimum for an edit (it was something like changing "foo" to "bar").

I went to see what other people did in these situations, and they suggested just adding unnecessary edits in order to reach the character limit.

At that point, I just left the bug in, and gave up on trying to contribute to Stack Overflow.


I was active on the statistics Stack Exchange for a while in grad school. There were generally plenty of interesting questions to answer, but the obsession some people (the most active people, generally) had with the points system became really unpleasant after a while.

My breaking point was when I saw a question with an incorrect answer. I posted a correct answer, explained why the other answer was incorrect, and downvoted the incorrect answer. The author of the incorrect answer then posted a rant as a comment on my answer about how I shouldn't have downvoted their answer because they were going to fix it, and a couple other people chimed in agreeing that it was inconsiderate or inappropriate of me to have downvoted the other answer.

I decided Stack Exchange was dumb and stopped spending time there, which was probably good for my PhD progress.


The trick to getting a lot of reputation on Stack Overflow and the like is to have posted a long time ago and then just leave it alone.

I was quite active on stack overflow back around 2010, asking a lot of questions, answering questions when I knew the answers, and so on. The idea of getting a gold badge seemed wildly crazy, and someone who had one (or even two!) was clearly a sign that they knew what was what. I used it for a while, never made much of a reputation, but did manage to earn a small handful of silver badges which I was quite proud of.

Then I forgot about it for quite a while.

Fast forward to today. My reputation chart just keeps going up at a steady linear rate. At this point I am in the top 3% of users with 14,228 reputation and 25 gold badges. I haven't been active in a decade. I don't know what most of my badges even are.

---

Most of my reputation comes from my questions. In case you're wondering what a top-3%er's top questions looks like, they are:

Apr 15, 2011 (207) -- CSS: bolding some text without changing its container's size

Aug 19, 2009 (110) -- How long should SQL email fields be? [duplicate]

Jun 29, 2010 (89) -- php: check if an array has duplicates

Jul 3, 2010 (63) -- centering a div between one that's floated right and one that's floated left

Jan 5, 2010 (44) -- CodeIgniter sessions vs PHP sessions

Apr 12, 2011 (40) -- Java: what's the big-O time of declaring an array of size n?

Jan 11, 2011 (28) -- Javascript / CSS: set (firefox) zoom level of iframe?

Jul 15, 2010 (25) -- Javascript: get element's current "onclick" contents

Aug 22, 2009 (21) -- SQL: what exactly do Primary Keys and Indexes do?

Jul 3, 2010 (20) -- Getting the contents of an element WITHOUT its children [duplicate]

For anyone keeping score, that last one was marked as a duplicate of a question that was asked a year after mine, and which seems similar on the surface to someone who does not have a good understanding of the DOM structure but is actually not the same thing.


Exactly this. I have a very, very high point score well beyond yours for being very active 13 years ago.

I have well over 50 gold badges.

I haven’t used stackoverflow in at least 5 years, probably longer, and I stopped contributing about 10 years ago.


I have a similar experience. About 10 years ago, I had some time on my hands for about 6 months, and answered a bunch of questions, with a small handful of them (3-4) getting a lot of upvotes. I haven't answered a question in years and years, but those same few questions keep getting new upvotes every month, so my progress continues to climb sort of linearly. I'm in the top 7% of contributors this year, while contributing exactly nothing new...


From a cursory glance, would you say these are still issues people run into? Aggregating these initial questions and the amount of activity they generate up until this day should tell us much about the progress and stagnation of certain programming languages/libraries/frameworks/else and their usage barriers.


In most cases, yes, but I don't think it implies stagnation. With the exception of the CSS ones which have been obsoleted by modern flexbox, those questions are mostly basic enough to defy change:

php: check if an array has duplicates

Java: what's the big-O time of declaring an array of size n?

SQL: what exactly do Primary Keys and Indexes do?


I agree, plateauing may be more apt in this case. I wonder to what extent exemplary questions like these remain universal, or have an expiry date that just isn't known at this time.


> I suppose one of the reasons why people keep answering is to harvest points

It's interesting to see some of the top (5- or 6-digit SO scores) people's activity charts.

They usually have a 3-5-digit answer history, and a 1-digit question history, with the digit frequently being "0."

In my case, I have asked almost twice as many questions as I have given answers [0].

For a long time, I had a very low SO score (I've been on the platform for many years), but some years ago, they decided to award questions the same score as answers (which pissed a lot of people off), and my score suddenly jumped up. It's still not a top score, but it's a bit less shabby.

Over the years, I did learn to ask questions well (which means they get ignored, as opposed to insulted - an improvement), but these days I don't bother going there anymore.

[0] https://stackoverflow.com/users/879365/chris-marshall


If you get enough points on one of the more niche and less toxic StackExchange sites, it'll also let you comment, vote, etc. network-wide.

I had gotten most of my points by asking and answering things about Blender workflow/API/development specifics, so I got to skip some of the dumb gatekeeping on StackOverflow.

Worldbuilding's fun, too — Codegolf's not bad either, if you can come up with an interesting way to do it — Arqade looks good, and so does Cooking — Literature, English, Scifi, etc. look interesting — If you program software, I suppose CodeReview might be a safe bet.


Yeah ... the extra-critical nature of SO is why their lunch is being eaten by LLMs. I once asked a buddy (who is now super duper senior at Amazon, working on the main site) to post his question on SO, and he flat out said no, because he'd had hostile interactions before when asking questions. Right or wrong, the reputation they've developed has hurt them a ton.


>it seemed to be mostly a pile of “I have a bug in my code please fix it” type stuff.

It's mostly people asking you to do their comp sci homework.


The edit queue was sitting at over 40k at one point.

Unfortunately, people trying to game the system create enormous work for those who can review.

(Not saying you were doing anything wrong just pointing out why there are automated guards)


You need to focus on niche tags to find worthwhile unanswered questions. Browsing the $foolang tag is just for the OCD FOMO types who spend their day farming rep.


Back in ye olden days, almost every answer involving a database contained a SQL injection vulnerability.


To their credit, a lot of people went back a decade later and fixed those. Although it doesn't stop people from repeating the mistakes.

I just got beaten up on HN for asking how the hell SQL injection is still a problem. People get defensive, apparently.


Sounds about right.

Not even a few years ago I worked with people who insisted it was ok to write injection-unsafe code if you knew for sure that you owned the injected values. Didn't matter that maybe one day that function would change to accept user-supplied data; that's not their problem! It was a Rails app and they were literally arguing for doing:

    .where("id = #{id}")
over:

    .where("id = ?", id)
in those certain situations. So, you know, it takes all kinds, I guess.


This is a case of militancy.

If we're talking about a typed integer there is no chance of that turning into an sql injection attack.

If we're talking about a string, I'd probably insist on parameterizing it even if we completely own it just on the off chance that the future changes.

To draw an analogy, gun safety is important and everyone knows it. But I don't practice gun safety while watching television on my couch because the gun is locked away. I practice gun safety when I'm actually handling the thing that is dangerous.

And yes, I realize it being locked away is technically gun safety, it's an imperfect analogy, please roll with it.


Your analogy is not flawed, but your conclusion is.

It is a perfect analogy because you are practicing gun safety by locking the gun away. If someone that you are not expecting wanders into your home while you are sitting on the couch, such as a child, they will not suddenly have access to the firearm. This is exactly why you don't assume that you will never receive unsafe input in this situation.


and as you're sitting on that couch watching television you're also practicing car safety because you're not actively breaking any traffic laws.

IOW, you're free to make that claim and you're not wrong per se, but you're not right and it doesn't refute the point.


The equivalent analogy is that you didn't leave the car in neutral on the top of a hill.

The number one rule of firearm safety - Treat every firearm as if it were loaded.

And yet children shoot themselves or others all the time because a gun was not safely stored.

But I digress...


To be pedantic, just being "typed" is not enough these days with dynamically-typed server code.


I disagree with you: if it's typed, it's safe. The issue is if it's untyped or the type isn't enforced (by the runtime, by the compiler, or by the code itself).

I understand your point, I'm just saying if it's actually typed, it's safe.


> If we're talking about a typed integer there is no chance of that turning into an sql injection attack.

Unless the database table switches to non-integer ids at some point.


Ruby is a dynamic language.


I think I agree with your coworkers. If the data is predefined constants, then you don't need to worry about injection. All functions have preconditions which must be met for them to work. As long as that's specified, that's acceptable.

Imagine the internals of a database. An outer layer verifies some data is safe, and then all other functions assume it's safe.

The example you're sharing is a bit of straw man. It's just as easy to use the parameter, so of course that's the right thing. But interpolating a table name into the string from a constant isn't wrong.


I'm not sure if this is a troll or not and I don't really want to debate this kind of thing on HN, but you've baited me. It is not a straw man. As I said, the source of the input could change in the future and it could be missed. The safe version is no more complicated than the unsafe version, so why wouldn't you just do the safe one? There is zero advantage to the unsafe way and it's straight up reckless to defend it.

I'm one of those people who moved from Ruby to Elixir. Ecto, Elixir's de facto database wrapper, will throw an exception if you try to write interpolated code like this, so luckily I don't have to have these insane arguments anymore (well, I work alone now, so there are several reasons I don't have to have them).

EDIT: My bad, I glazed past the last part of your statement.

Ya, I think this is probably where some of the defensiveness comes from: using a library vs rolling your own. If you're rolling your own, of course you're going to need to interpolate table names and whatnot, but it shouldn't even be possible to interpolate values. My example and argument are based on Rails, though, where you never specify a table name or anything like that. So in the specific case of my coworkers, they were wrong.


Yeah, bad code doesn't stop being bad code just because it is correct. Good code not only is correct, but it is obviously so. There are zero excuses in a case like this to write it in the unsafe way. Just because you know a gun is not loaded, doesn't mean you should play with it.


Yeah, if a codebase is full of stuff like this, auditing it is awful. It's like, instead of employing computers to check the details of your code, forcing it to be done manually (in an error-prone way).


This is nonsensical. When you use a function, how do you know what it will do? You guess from its name?

> auditing it is awful.

If a function specifies a requirement, you look at the callers and see if that requirement is met. If it's easy to verify in code, you can assert. Is there an easier way to audit correctness?


Idk. I have some pieces of production code that need to inject `$tableIdentifier`.`$field` into a query, where both are nominally passed from the client. I don't rely strictly on a list of constants in those cases. I take the user request, check the table name against a constant list, then run a query (every time) to describe the fields in that table and type-check them against what's in the user-submitted variables. Then escape them. Anything mismatched in name or shape at any stage of that is considered malicious.


The only principle I want to defend is that a function is correct relative to its preconditions. If the caller doesn't meet them, that's on them.


That kind of reasoning only works if the language or ecosystem has some kind of compile-time error or linter or comprehensive testing that will catch the error if the preconditions ever change. One way of doing this is encoding the preconditions in the type system. Another is through fuzzing.

If you keep the preconditions informal and never check them, the code becomes brittle to modifications and refactoring. For a sufficiently large codebase you almost guarantee that at some point you will have a SQL injection bug.

That said, using prepared statements isn't the only way to guard against SQL injections. You can also use a query builder that will properly escape all data (provided the query builder itself is hardened against bugs). Using dynamic SQL is the only way to make some kinds of queries, so a query builder is a must in those cases.

What you shouldn't do is to use string concatenation to build query strings in your business logic. It may or may not contain a bug right now, but it is brittle to changes in the codebase.


> That kind of reasoning only works if the language or ecosystem has some kind of compile time error or linter or comprehensive testing that will catch the error if the preconditions ever change.

Most requirements can't be verified at compile time, or even at runtime in a feasible amount of time.

If you expect functions to do things that they don't say they do, I don't know what to tell you. Conventions and specs are the best we have.


I think you were broadly misunderstood. If the defined constants come from or are checked against the ones stored in the database, fair play. If they're floating around in some static consts in a code file, also ok as long as that's extremely well documented and someone knows what's what. If some boss pays to cut corners for it to be written with magical constants like "WHERE life.meaning!=42" and then fires the person who they hired to write that script, they deserve whatever they get.

Just like the meaning of life, it's best not to come to premature conclusions. Could all work out, or it could be a funny joke for aliens in the end.


> I just got beaten up in HN for asking how the hell sql injection is still a problem.

It's possible for developers to think they're actually doing the right thing, but it turns out they're not.

https://www.npmjs.com/package/mysql#escaping-query-values

> This looks similar to prepared statements in MySQL, however it really just uses the same connection.escape() method internally.

And depending on how the MySQL server is configured, connection.escape() can be bypassed.


Yeah, the Nodejs ecosystem is sketchy in this regard. I've never put a Node-mysql site into production. Basically everything I write that runs DB queries is in PHP with PDO. But I got interested in Node for side projects and spotted this escaping flaw in node-mysql. That npm package also has two escaping modes, one which it calls "emulated" and which is probably less trustworthy. It doesn't seem like it was ever ready for primetime. I don't know if node-mysql2 addresses that... I ended up writing a promise wrapper for the original one that also turns everything into prepared statements. You still need to make sure NO_BACKSLASH_ESCAPES is off, although I have no idea why you'd ever turn it on.

So yeah, I'm coming from a PHP mindset where you can generally trust your engine to bind and escape values. My experience with Nodejs in this particular area caused me to write a lot of excess code (mostly to satisfy my own curiosity) and still convinced me not to trust it for the purpose.

In that light, I can understand how someone who jumped into the Nodejs ecosystem would think they were dealing with reliably safe escaping, and didn't realize what they were actually getting if they didn't read the fine print.


Hi! Sorry to report this, but I've pushed a SQL injection vuln to prod when I was still very green.

In my defense, we trusted the input. But that's post-rationalisation, because I simply didn't know what I was doing at the time.

It gets worse. If I'd done it properly, my senior would have beaten me up in code review for "complexity". That was a man who would never use a screwdriver when a hammer was already in his hand.


I once argued with a senior dev (later engineering manager, I guess he is a director of development now somewhere), that storing password hashes in unsalted SHA1 was bad.

His defense? "This system is internal only and never connected to the internet"

Senior titled devs don't necessarily know their shit.
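(For anyone wondering what the baseline alternative looks like, a minimal JDK-only sketch; the parameters are illustrative, not a recommendation:)

    import java.security.SecureRandom;
    import javax.crypto.SecretKeyFactory;
    import javax.crypto.spec.PBEKeySpec;

    // A fresh random salt per user defeats precomputed (rainbow) tables.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // PBKDF2 is salted and deliberately slow, unlike bare unsalted SHA1.
    static byte[] hashPassword(char[] password, byte[] salt) throws Exception {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 256);
        return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
    }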


A little off topic, but I love how you mention his career progression before sharing the example of his ignorance, because this seems to be a pretty common theme in tech companies (I've witnessed it more times than I can remember or count). The people I knew in my career who were most full of shit are pretty much all now Directors and VPs, enjoying a life of success, and the ones who were the most actually knowledgable are still grinding away as IC's, worried about layoffs. This industry is really bad about rewarding competence.


> This industry is really bad about rewarding competence.

If you promote the competent people, you leave the incompetent ones to do the actual work.


The trick then is not hiring bozos in the first place.


The team I described in GGGP were all strong in the roles they were originally hired for. The company likes to promote internally, which mostly works out for them. This shit team was an edge case.


This is a good counterpoint that explains why, maybe as roles change or companies grow, people who weren't exceptionally good at one role end up overseeing it. The pithy / laconic observation I was immediately responding to was pretty spot on though, and still seems to pertain (in general).

Breaking it down: That the most diligent / irreplaceable people who know the guts of the machine tend to be chained to their roles with occasional raises seems fairly logical from a C-Suite perspective. The tendency to promote incompetence - particularly overconfident incompetence - is the part that bears more scrutiny. If it were isolated to a few companies, it wouldn't be so relatable. I have a theory that it has to do with certain kinds of communication skills (specifically, bullshitting), being selected for in certain roles. And being able to write good code and explain why it has to be done that way requires the opposite of bullshitting.


Non security expert here. Walk me through the attack scenario here.

The database has access control right? So only a few people in the org can read the data. And you are imagining a case where they:

a) find an inverse image of a password hash and use that login as another person to do something bad.

b) reverse the password from the hash to use in another context.

If a is an issue, why does this individual have sensitive data access in the first place? b is still unlikely. Any inverse image is unlikely to be the password if there is salting.

It sounds like an improvement could be made, but maybe not the highest priority. Can you inform me?


To be fair, I’ve pushed vulnerabilities to prod when considered a senior and with 10+ years of experience. Nobody is immune to their own stupidity and hubris.


People who don't understand things often get cranky when they're told it's easy. Seems fair though, it does seem rude to tell someone missing a leg it's easy to run... But it also seems rude to get upset at someone who's good at something they've studied so perhaps everyone is bad at understanding the person they're talking to, and people should assume more good faith.


That’s why I prefer to use “straightforward” rather than “easy.”

People seem to take that much better.


I also like "simple". Lots and lots of very hard things are not at all complicated.


Hitting a homerun is straightforward, but it’s not easy.


I would argue the concept of hitting a homerun is straightforward, but the preparation, training and execution are not.

You’re arguing semantics.

The two words are synonymous in most casual conversation where you would be in danger of offending by saying something is easy or simple.


I think I was trying to agree with OP. Just giving an example that came to mind.

Conversely, setting up Jira is neither straightforward, easy or simple.


If you ever have an issue with the Requests library in Python, just try again with verify=False.


Easier than getting the app team to fix their TLS.


Or the corporate IT team to remove their TLS-trashing MITM attack (because their Firewall Vendor claims that's still "Best Practice" in 2023 and/or the C-Suite loves employee surveillance).


Just be sure to try running the program with sudo first, before trying shitty solutions like that.


That seems insecure; just chmod -R 777 /


At least Node has an environment variable to disable checks globally (NODE_TLS_REJECT_UNAUTHORIZED=0).


Good thing we trained all those AIs with these answers.


What if that was the goal all along? Time traveling freedom fighters set up SO so that the well for AI would be poisoned, freeing us from our future overlords!


StackOverflow and those AIs optimise for the same thing - something that looks correct regardless of how actually correct it is.


A couple months ago, someone commented that one of my answers was wrong. Well, sure, in the years since answering, things changed. It was correct when I wrote it. Otherwise it wouldn't have taken so long for someone to point out that it's wrong. The public may have believed it to be the correct answer because it was at that time.


> The system promotes answers which the public believes to be correct

Well.. duh?

Until AI takes over the world, this will be correct for everything. News, comments, everything.


Mmm... no? StackOverflow is powered by voting. Not all forums work like that (it was a questionable choice at the time StackOverflow started).

I've been a moderator on a couple of ForumBB kind of forums and the idea of karma points was often brought up in moderator meetings. Those with more experience in this field would usually try to dissuade the less experienced mods from implementing any karma system.

Moderators used to have ways of promoting specific posts. In the context of ForumBB you had a way to mark a thread as important or to make it sticky. Also, a post by a moderator would stand out (or could be made to stand out), so that other forum users would know if someone speaks from a position of experience / authority or is this yet to be determined.

Social media went increasingly in the direction of automating moderator's work by extracting that information from the users... but this is definitely not the only (and probably not the best) way of approaching this problem. Moderators are just harder to make and are more expensive to keep.


I hold little hope that LLMs will help us to reason through "correctness." If these AIs scour the troves of idiocy on the internet, believing what they will according to patterns and not applying critical reasoning skills, they too will pick up the bandwagon's opinions and perpetuate them. Ad populum will continue to be a persistent fallacy if we humans don't learn appropriate reasoning skills.


They've already proven that LLMs are capable of creating an internal model of the world (or, in the case of the study that proved it, a model of the game it was being trained on). If LLMs have a world model, then they are fully capable of generating truth beyond whatever they are trained on. We may not be there yet (and who knows how long it will take), but it is in principle true that LLMs can move beyond their training data.


AI isn't going to do better in current paradigms; it has exactly the same flaw.


Of course, consensus is a difficult philosophical topic. But not every system is based on public voting.


I sure hope people don’t copy stuff from SO before they understand what the code does.


People are writing entire programs with ChatGPT. These are the same people who previously would cobble together multiple copy-pasted SO answers; now it's just copy-pasting the entire script from a single response.


ROFLMAO!

Please, tell me that was sarcastic.


I refuse to believe anything else ;-)


Yeah, I never look at just the top comment. If it isn’t wrong, it’s suboptimal.


> easy to read

Sounds like you're counting that as a negative. Obviously it depends on the use case, but more often than not I'll lean towards the easier-to-read code over the most optimal one.


Easy to read is good, but it doesn’t trump correct.


Sure, but it's also generally a lot easier to tell whether simple code is correct (the loop over powers of 10) than more complex code (using log and pow), especially when it comes to edge conditions.


> The correct one is usually at rank 3

This has generally been my experience.


A long time ago, when ActionScript was a thing, there was this one snippet in the ActionScript documentation that illustrated how to deal with event dispatching, handling, etc. To illustrate the concept, the official documentation provided a code snippet that created a dummy object, attached handlers to it, and in those handlers defined some way of processing... I think it was XML loading and parsing; well, something very common.

The example implied that this object would be an instance of a class interested in handling events, but didn't want to blow up the size of this example with not so relevant bits of code.

There was a time when I very actively participated in various forums related to ActionScript. And, as you can imagine, loading XML was paramount to success in that field. Invariably, I'd encounter code that copied the documentation example and kept this useless dummy object with handlers defined (whose authors then struggled to extract the information thus loaded).

It was simply amazing how, regardless of the overall skill of the programmer or the purpose of the applet, the same exact useless object would appear in the same situation -- be it an XML socket or XML loaded via HTTP, submitted and parsed by the user... it was always there.

----

Today, I often encounter code like this in unit tests in various languages. Often programmers will copy some boilerplate code from an example in the manual and create hundreds or even thousands of unit tests, all with the same unnecessary code duplication / unnecessary objects. Not sure why in this specific area, but it looks like programmers treat these kinds of tests both as some sort of magic and as unimportant, worthless code that doesn't need attention.

----

Finally, specifically on the subject of human-readable encoding of byte sizes: do you guys like parted? Because it's so much fun to work with precisely because of this issue! You should try it, if you have some spare time and don't feel misanthropic enough for today.


I feel like there ought to be a software analogue to that aphorism about models (if it doesn’t exist already) — maybe something like:

All code is wrong, but some is useful.


Agreed, but is code not a model?


Why do you need a 4-line dependency?

This is the reason.


There is still the chance that the person who created the 4-line dependency also just copy-pasted it from the flawed StackOverflow answer. Or they're the same person. Or they're just as much a random person creating a package as the random person who created the SO answer. I'm not sure why random_person1 should be more trusted to produce non-flawed code than random_person2.

OTOH: it's at least easily upgradeable, so it has that advantage.


> There is still the chance

There's no chance if you avoid random_person1 and use known_oss_provider’s package instead. At the very least, look at the tests.

Any package with tests is all but guaranteed to be more correct than a never-before-run SO answer.


There is still the chance. As the article states, OpenJDK copied from the Stack Overflow answer.


Sure, but if OpenJDK is exposing that function then anyone who is using it will get the correct output when OpenJDK fixes the problem. If everyone copies the function into their own code then in many cases it's likely to never be corrected.


What if you write the code and test in your project?


The most impressive suggestion Copilot has given me was a solution to this that used a loop to divide and index further into an array of units.

It never dawned on me to approach it that way, and I had never seen that solution (not that I ever looked). Not sure where it got that from, but it was pretty cool and... yeah, it gets simple stuff wrong all the time haha.
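If I had to reconstruct it, the shape was roughly this (a sketch from memory, not Copilot's verbatim output):

    // Divide down and walk an index into the units array (IEC units here).
    static String humanReadable(long bytes) {
        String[] units = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};
        double value = bytes;
        int i = 0;
        while (value >= 1024 && i < units.length - 1) {
            value /= 1024; // divide...
            i++;           // ...and index further into the units array
        }
        return String.format("%.1f %s", value, units[i]);
    }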


I was surprised to find log implementations are loopless. Cool.

https://github.com/lattera/glibc/blob/master/sysdeps/ieee754...


It basically has the loop unrolled. But it looks like it's evaluating a polynomial approximation, so I suppose it makes sense.
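For the curious, "evaluating a polynomial approximation" means something like Horner's scheme over precomputed coefficients. A toy sketch (glibc's coefficients are carefully fitted; these are just placeholders):

    // Horner's scheme: p(x) = c[0] + x*(c[1] + x*(c[2] + ...)).
    static double poly(double x, double[] c) {
        double r = c[c.length - 1];
        for (int i = c.length - 2; i >= 0; i--) {
            r = r * x + c[i];
        }
        return r;
    }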


When StackOverflow was new, it was an incredible resource. Unfortunately, so much cruft has accumulated that it is now nearly useless. Even if an answer was once correct (and many are not), it is likely years out of date and no longer applicable.


While reading I was wondering why Stack Overflow doesn't "mandate" that solutions have tests, so that this problem isn't left to everyone else; ref. the comment at the end of the article:

Test all edge cases, especially for code copied from Stack Overflow.


How does the author determine this is the "most copied snippet" on SO? The question/answer has only been viewed 351k times. There are posts with many millions of views, e.g. https://stackoverflow.com/questions/927358/how-do-i-undo-the... which have definitely been copy-pasted more often. Yes, there may be many instances of this Java function on GitHub, but only because the people doing the copying are too lazy to think about how it works, never mind alter the function name. If there's a bug, just update the SO answer and fix the problem. No need to write a lengthy self-promoting post about it.


Third paragraph of the post:

It's according to this paper: https://link.springer.com/article/10.1007/s10664-018-9650-5


> How does the author determine this is the "most copied snippet" on SO?

According to this paper (https://link.springer.com/article/10.1007/s10664-018-9650-5) it's the most copied of the Java answers on SO.


It's mentioned in the article

> A PhD student by the name Sebastian Baltes publishes a paper in the journal of Empirical Software Engineering. The title is Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects [...] As part of their analysis they extracted code snippets from the Stack Overflow data dump and matched them against code from public GitHub repos.


It's described in the article...


Read the article. The methodology is flawed. It should say "most copy-pasted Java function on GitHub."


it does say that: "We present results of a large-scale empirical study analyzing the usage and attribution of non-trivial Java [...]"


You asked:

> How does the author determine this is the "most copied snippet" on SO?

People answered. Should they not have?


Robert, indeed I’m grateful for the answers people have given. Perhaps I asked the wrong question to begin with. I know better for next time to be more specific. Thanks.


Well - I suppose it makes sense. SO isn't built for correctness, it's built for upvotes that just depend on whether the people upvoting like the answer or not (regardless of correctness).


Read: The most common answer to that question from LLMs is flawed.


Sounds like someone bumped into Zeno's paradox...

https://www.youtube.com/watch?v=VI6UdOUg0kg


Should have just stuck with the loop. You could change the thresholds to 99.995% of 10^whatever to accommodate the desired output rounding (999,950 bytes should already print as "1.0 MB" with one decimal shown).
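A sketch of what I mean (the 999.95 cutoff assumes one decimal of output):

    // Advance to the next unit as soon as the value would *display*
    // as 1000.0 anyway; 999.95 is the "%.1f" rounding cutoff.
    static String humanReadableSI(long bytes) {
        String[] units = {"B", "kB", "MB", "GB", "TB", "PB", "EB"};
        double value = bytes;
        int i = 0;
        while (value >= 999.95 && i < units.length - 1) {
            value /= 1000;
            i++;
        }
        return String.format("%.1f %s", value, units[i]);
    }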


Plot twist: they were hired by Oracle since they were the author of the most copied StackOverflow snippet (!)


Just divide by 1000 until x < 1000 and return int(x), plus a map from the number of times you divided by 1,000 to the MB, GB, ... string.

It's an O(1) operation because numeric types have a bounded size.


I'm curious what answer GPT will return.


Probably this one, as it's the most common in the corpus used to train it.


GPT-3.5 returns:

    public static String convertBytes(long bytes) {
        String[] suffixes = {"B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"};
        if (bytes < 1024) return bytes + " " + suffixes[0];
        int exp = (int) (Math.log(bytes) / Math.log(1024));
        return String.format("%.2f %s", bytes / Math.pow(1024, exp), suffixes[exp]);
    }


So the code from the dude in the blog post here


Not quite. ChatGPT mixes up SI and non-SI units: it divides by 1024 but labels the results with SI-style suffixes (KB, MB, ...).
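One way to make it consistent while keeping the 1024 base would be to switch to IEC suffixes:

    String[] suffixes = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"};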


Given how unreliable it is, probably 418 - I'm a teapot.


Classic off by 1 :)


tl;dr When in the 999+ petabyte range, it gives inappropriately rounded results.

And the key takeaway is "Stack Overflow snippets can be buggy, even if they have thousands of upvotes."

I don't disagree, but is this really the example to prove it.....


Processors are inherently awesome at branching, adding, shifting, etc. And shifting to get powers of 2 (i.e., KiB vs. GiB) is a superpower of its own. They're a little less awesome when it comes to Math.pow(), Math.log(), and Math.log() / Math.log().
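For instance, the exponent can come from a leading-zero count and the divisor from a shift, with no floating point until the final division (a sketch, assuming IEC units; it shares the display-rounding caveat from the article):

    static String humanReadableBin(long bytes) {
        if (bytes < 1024) return bytes + " B";
        String[] units = {"KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};
        int exp = (63 - Long.numberOfLeadingZeros(bytes)) / 10; // floor(log2(bytes)) / 10
        return String.format("%.1f %s",
            bytes / (double) (1L << (10 * exp)), units[exp - 1]);
    }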

That 300K+ people copied this in the first place shows some basic ignorance of what's happening under the hood.[1]

As someone who's been at this for decades now and knows my own failings better than ever, it also shows how developers can be too attracted by shiny things (ooh look, you can solve it with logs instead, how clever!) at the expense of readable, maintainable code.

[1] But hey, maybe that's why we were all on StackOverflow in the first place


> Processors are inherently awesome at branching, adding, shifting, etc. And shifting to get powers of 2 (i.e., KiB vs. GiB) is a superpower of its own. They're a little less awesome when it comes to Math.pow(), Math.log(), and Math.log() / Math.log().

And here's something to consider -- if you're converting a number to human-readable format, it's more likely than not you're about to do I/O with the resulting string, which is probably going to be an order of magnitude more expensive than this little function.


Great point, I wish I'd mentioned it. The expense of the printf dwarfs the log / log (a double divided by a double, then cast to an int), which itself is greater than the cost of some repeated comparisons in a for loop.

It's key to be able to recognize this when thinking about performant code.

In other words, the entire exercise is silliness because the eventual printf is going to blow away any nanoseconds of savings by a smarter/shorter routine.


[flagged]


It's not that we think it's arcane or that we are in our own "bubbles of thought", it's that we aren't doing math. We're programming a computer. And a competent programmer would know, or at least suspect, that doing it with logarithms will be slower and more complicated for a computer. The author even points out that he wouldn't use his own solution.

P.S. Please look up the word literally.


I'm having a hard time imagining a situation where "printing out the number in a human readable format" is more time consuming than "figuring out what the number is".

I think a competent programmer might also ask themselves "am I prematurely optimizing?" if their first instinct is to pick the method that only works on a computer. I've operated in this space long enough that bit shifting is synonymous with doing the logarithm in my mind, but if I had to explain how my code works, I would use the logarithm explanation. I would be sure to point out that the computer does log (base 2) of a number much much MUCH faster than any other base.

It's probably excessive to say that literally everyone is taught logarithms as the ideal solution to this problem, but logarithms are almost universally introduced by explaining that the number of digits in a base-10 number is floor(log10(n)) + 1. So if you completed a high school education in the United States, you have almost certainly heard that much at least.

edit: printing out the number is almost always gonna be faster than figuring out the value of the number, if the speed of the operation matters. My original post implied the opposite. Part of being a competent programmer is recognizing that optimizing is sometimes bikeshedding.


The author's final suggested solution at the bottom of the article still relies on logarithms.

> doing it with logarithms will be slower and more complicated for a computer

This is a fascinating point of view, and while it isn't wrong from certain "low-level optimization golf" viewpoints, it is in part based on old assumptions from early chipsets that haven't been true in decades. Most FPUs in modern computers will do basic logarithms in nearly as many cycles as any other floating-point math. It is marvelous technology. That many languages wrap these CPU features in what look like library function calls like Math.log(), instead of having some sort of "log operator", is as much an historic accident of mathematical notation and of the fact that logarithms were extremely slow for a human.

Logarithms used to be the domain of lookup books (you might have one or more volumes, if not a shelf-full); they were one of the keys to the existence of slide rules, and the reason an engineer would actually own a set of slide rules in different logarithmic bases. Mathematicians would spend lifetimes doing the calculations to fill a single book of logarithm tables.

Today's computers excel at it. Early CPU designs saved transistors and left logarithms to application/language design. Some of the most famous games did interesting hacks, pre-computing logarithm tables for a specific set of needs and embedding them in ROM as a memory-versus-CPU-time trade-off. Today's CPU designs have transistors to spare, and hardware logarithm support is just about guaranteed. (And that's just CPUs; GPUs can be logarithmic monsters in how many logarithms they compute, and how fast.)

Yesterday's mathematicians envy the speed at which a modern computer can calculate logarithms.

In 2023 if you are trying to optimize an algorithm away from logarithms to some other mix of arithmetic you are either writing retro games for a classic chipset like the MOS 6502, stuck by your bosses in a history-challenged backwards language such as COBOL, or massively prematurely optimizing what the CPU can already better optimize for you. I wish that was something any competent programmer would know or at least suspect. It's 2023, it's okay to learn to use logarithms like a mathematician, because you aren't going to need that "optimization" of bit shifts and addition/subtraction/multiplication/division that obscures what your actual high-level algorithmic need and complexity is.


> what are you people even programming that you need to know so absolutely little about how anything else in the entire world works

Feoren, your comment takes an incredibly superior attitude and accuses its reader, every reader, of being stupid.

When taking the log of a number, the value in general requires an infinite number of digits to represent. Computing log(100) / log(10) should return 2.0 exactly, but since log(100) returns a fixed number of digits and log(10) returns a fixed number of digits, are you 100% confident that the ratio will be exactly 2.0?

Maybe you test it and it does return exactly 2.0 (to the degree floating point can be exactly any value). Are you confident that such a calculation will also work for every power of 10? Maybe they all work on this Intel machine -- does it work on every Arm CPU? Every RISC-V CPU? Etc. I wouldn't be, but if I wrote a dumb "for" loop I'd be far more confident that I'd get the right result in every case.
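A quick way to see the hazard (the exact output can vary with the platform's libm, which is precisely the point):

    public static void main(String[] args) {
        double r = Math.log(1000) / Math.log(10);
        System.out.println(r);       // e.g. 2.9999999999999996 on common IEEE-754 doubles
        System.out.println((int) r); // which then truncates to 2, not 3
    }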


> your comment takes an incredibly superior attitude and accuses its reader, every reader, of being stupid.

It's also an incredibly superior attitude to think that the discipline of software development is so uniquely special that other subjects, even basic math, have nothing to offer it, and that one could be an effective and productive software developer without having to besmirch your perfect code with concepts from other schools of thought.

And "stupid" would mean "incapable of understanding basic math". This is more like "unwilling to even try". Mere stupidity would be fine: stupid people need jobs too. But a statement that the operation everyone else in the world would use is "unmaintainable" because the programmer is unwilling to refresh themselves on how logarithms work with a quick scan of its Wikipedia article, that's not stupidity. That's bordering on malpractice.

> When taking the log of a number, the value in general requires an infinite number of digits to represent.

So does taking a third of a number. So? Do you consider the code "x / 3.0" unmaintainable?

> Computing log(100) / log(10) should return 2.0 exactly, but since log(100) returns a fixed number of digits and log(10) returns a fixed number of digits, are you 100% confident that the ratio will be exactly 2.0?

Exactness was never a requirement. Do you really never use floating point? The reality is that showing "1000 kB" the 1% of the time you should have shown "1.0 MB" is actually fine -- nobody cares, and everyone understands what it means -- and that applies to almost all floating-point imprecision. It's important to know when it does matter, but it usually doesn't, and it's important for a professional to know when not to care. How much of your client's money are you going to spend worrying about tiny details they don't care about?

> Are you confident that such a calculation will also work for any power of 10? Maybe they all work on this intel machine -- does it work on every Arm CPU? Every RISCV CPU? Etc. I wouldn't be, but if I wrote dumb "for" loop I'd be far more confident that I'd get the right result in every case.

Except a 0.00001% imprecision doesn't matter for most cases, but an off-by-one error does. For loops are much more common sources of error than logarithms are.


> You're all literally writing CRUD React front-end javascript by copy-pasting "for" loops from StackOverflow?

To an approximation, yes.

The underlying calculations at my bank were probably written once in 1970 in COBOL and haven't changed meaningfully since. But the front-end UI to access it has gone from teletypes and punch cards to glass terminals to networked DOS to Win32 to ActiveX to Web 2.0 to React and mobile apps. Lots and lots of churn and work on the CRUD part, zero churn and work on the "need to remember logarithms" part.

AI? You have core teams building ChatGPT, Midjourney, etc. Then huge numbers of people accessing those via API, building CRUD sites to aggregate midjourney results and prompts, etc etc. Even Apple has made a drag-and-drop UI to train an AI object classifier, the ratio of people who had to know the math to make that vs the people using it is probably way above 1:100,000

Is this that surprising?


Well, maybe not exactly unmaintainable, but I think most of us have learned that floating-point operations are not to be trusted, especially if the code needs to run on different processors. Furthermore, calling such math functions is overkill most of the time. I would definitely never consider it for such a simple operation. I actually agree that it might look cleaner and easier to understand, but in my mind it would be such heavyweight overkill that I would never use it.


Obligatory, my favourite StackOverflow answer of all time: https://stackoverflow.com/a/1732454


And yet it’s wrong like all the rest


How so?


The answer is amusing, but it seems the author either didn't read the question properly or didn't read their formal-languages textbook properly, and rushed ahead with an answer that isn't really correct.

For one thing, it assumes "regex" as used in programming is the same as "regular expressions" (defining regular languages) in the formal sense. More info on that [1]

But the question isn't even about fully parsing HTML, with bracket balancing. It's just about syntactically matching all the opening tags -- more "lexing" than "parsing". Instinctively that does look like a simple regular language to me, though I'm not claiming certainty. The part of HTML that goes beyond regular comes from nested elements, but it's just the tag syntax this user cares about, with no context-sensitivity.

One red herring is comments and CDATA sections, but since they cannot be nested, they don't change the language class: you just transition to a skip state and back when you see the start/end markers. They do make the expression much uglier, of course.

[1] https://en.wikipedia.org/wiki/Regular_expression#Patterns_fo...
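To make the "lexing" claim concrete, here is a crude, hypothetical sketch; it deliberately ignores comments/CDATA and doesn't exclude self-closing tags, but it stays regular even with '>' inside quoted attribute values:

    import java.util.regex.Pattern;

    // Tag name, then attribute content that is either a quoted string
    // or any single character that isn't a quote or '>'.
    Pattern openTag = Pattern.compile(
        "<[A-Za-z][A-Za-z0-9]*(\"[^\"]*\"|'[^']*'|[^'\">])*>");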


Pretty awesome stuff. This is what Hacker News is for!


wtf, why would someone downvote this? This is prime Hacker News shit and why I come here!


I didn't downvote, but I would guess it's due to the general idea that if you just approve or disapprove of a post you should simply vote that way instead of expressing it in a comment. Personally, while I agree there's a logic to that, I find it a little cold for positive sentiments. I couldn't find it, but I think there's a pg or dang comment to the effect that "I like this" as a comment is explicitly not discouraged on HN, though obviously that doesn't mean everyone agrees.



