
Almost every top Stack Overflow answer is wrong; the correct one is usually at rank 3. The system promotes answers the public believes to be correct: easy to read, resembling material they are familiar with, following fads, and so on.

Pay attention to comments and compare a few answers.



Years ago I tried to reply to a comment on StackOverflow, but I didn’t have enough points to comment. So I tried to answer some questions in order to earn enough points to comment. But when looking at the new questions, it seemed to be mostly a pile of “I have a bug in my code please fix it” type stuff. Relatively simple answers to “What is the stack and the heap?” had thousands of points, but also already had tons of answers (though I suppose one of the reasons people keep answering is to harvest points). I was able to answer a question on an obscure issue that no one had answered yet, but received no points.

Then I saw that you could get points for editing answers. OK, I thought, I can get some points by fixing some bugs. I found a highly upvoted post that had code that didn’t work, found that it was because one section had used the wrong variable, and tried to fix it. Well, the fix was too short to meet the six-character minimum that edits require (something like changing “foo” to “bar”).

I went to see what other people did in these situations, and they suggested just adding unnecessary edits in order to reach the character limit.

At that point, I just left the bug in, and gave up on trying to contribute to Stack Overflow.


I was active on the statistics Stack Exchange for a while in grad school. There were generally plenty of interesting questions to answer, but the obsession some people (the most active people, generally) had with the points system became really unpleasant after a while.

My breaking point was when I saw a question with an incorrect answer. I posted a correct answer, explained why the other answer was incorrect, and downvoted the incorrect answer. The author of the incorrect answer then posted a rant as a comment on my answer about how I shouldn't have downvoted their answer because they were going to fix it, and a couple other people chimed in agreeing that it was inconsiderate or inappropriate of me to have downvoted the other answer.

I decided Stack Exchange was dumb and stopped spending time there, which was probably good for my PhD progress.


The trick to getting a lot of reputation on Stack Overflow and the like is to have posted a long time ago and then just leave it alone.

I was quite active on stack overflow back around 2010, asking a lot of questions, answering questions when I knew the answers, and so on. The idea of getting a gold badge seemed wildly out of reach, and someone who had one (or even two!) clearly knew what was what. I used it for a while, never made much of a reputation, but did manage to earn a small handful of silver badges which I was quite proud of.

Then I forgot about it for quite a while.

Fast forward to today. My reputation chart just keeps going up at a steady linear rate. At this point I am in the top 3% of users with 14,228 reputation and 25 gold badges. I haven't been active in a decade. I don't know what most of my badges even are.

---

Most of my reputation comes from my questions. In case you're wondering what a top-3%er's top questions look like, they are:

Apr 15, 2011 (207) -- CSS: bolding some text without changing its container's size

Aug 19, 2009 (110) -- How long should SQL email fields be? [duplicate]

Jun 29, 2010 (89) -- php: check if an array has duplicates

Jul 3, 2010 (63) -- centering a div between one that's floated right and one that's floated left

Jan 5, 2010 (44) -- CodeIgniter sessions vs PHP sessions

Apr 12, 2011 (40) -- Java: what's the big-O time of declaring an array of size n?

Jan 11, 2011 (28) -- Javascript / CSS: set (firefox) zoom level of iframe?

Jul 15, 2010 (25) -- Javascript: get element's current "onclick" contents

Aug 22, 2009 (21) -- SQL: what exactly do Primary Keys and Indexes do?

Jul 3, 2010 (20) -- Getting the contents of an element WITHOUT its children [duplicate]

For anyone keeping score, that last one was marked as a duplicate of a question that was asked a year after mine, one that looks similar on the surface to someone without a good understanding of the DOM structure but is actually not the same thing.


Exactly this. I have a very, very high point score well beyond yours for being very active 13 years ago.

I have well over 50 gold badges.

I haven’t used stackoverflow in at least 5 years, probably longer, and I stopped contributing about 10 years ago.


I have a similar experience. About 10 years ago, I had some time on my hands for about 6 months, and answered a bunch of questions, with a small handful of them (3-4) getting a lot of upvotes. I haven't answered a question in years and years, but those same few answers keep getting new upvotes every month, so my reputation keeps climbing more or less linearly. I'm in the top 7% of contributors this year, while contributing exactly nothing new...


From a cursory glance, would you say these are still issues people run into? Aggregating these initial questions and the amount of activity they generate up until this day should tell us much about the progress and stagnation of certain programming languages/libraries/frameworks/else and their usage barriers.


In most cases, yes, but I don't think it implies stagnation. With the exception of the CSS ones which have been obsoleted by modern flexbox, those questions are mostly basic enough to defy change:

php: check if an array has duplicates

Java: what's the big-O time of declaring an array of size n?

SQL: what exactly do Primary Keys and Indexes do?
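The first of those really has stayed a one-liner for over a decade. As a rough illustration (in Python rather than PHP, for brevity), the whole answer still fits in one comparison, assuming the values are hashable:

```python
def has_duplicates(values):
    """True if any value appears more than once (hashable values assumed)."""
    # A set drops duplicates, so a length mismatch means at least one repeat.
    return len(set(values)) != len(values)

print(has_duplicates([1, 2, 3]))     # False
print(has_duplicates([1, 2, 2, 3]))  # True
```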


I agree, plateauing may be more apt in this case. I wonder to what extent exemplary questions like these remain universal, or have an expiry date that just isn't known at this time.


> I suppose one of the reason why people keep answering is to harvest points

It's interesting to see some of the top (5- or 6-digit SO scores) people's activity charts.

They usually have a 3-5-digit answer history, and a 1-digit question history, with the digit frequently being "0."

In my case, I have asked almost twice as many questions as I have given answers [0].

For a long time, I had a very low SO score (I've been on the platform for many years), but some years ago, they decided to award questions the same score as answers (which pissed a lot of people off), and my score suddenly jumped up. It's still not a top score, but it's a bit less shabby.

Over the years, I did learn to ask questions well (which means they get ignored rather than insulted, an improvement), but these days I don't bother going there anymore.

[0] https://stackoverflow.com/users/879365/chris-marshall


If you get enough points on one of the more niche and less toxic StackExchange sites, it'll also let you comment, vote, etc. network-wide.

I had gotten most of my points by asking and answering things about Blender workflow/API/development specifics, so I got to skip some of the dumb gatekeeping on StackOverflow.

Worldbuilding's fun, too. Codegolf's not bad either, if you can come up with an interesting way to do it. Arqade looks good, and so does Cooking. Literature, English, Scifi, etc. look interesting. If you program software, I suppose CodeReview might be a safe bet.


Yeah ... the extra-critical nature of SO is why their lunch is being eaten by LLMs. I once asked a buddy, now super duper senior at Amazon working on the main site, to post his question on SO, and he flat out said no because he'd had hostile interactions before when asking questions. Right or wrong, the reputation they've developed has hurt them a ton.


>it seemed to be mostly a pile of “I have a bug in my code please fix it” type stuff.

it's mostly people asking you to do their comp sci homework.


The edit queue was sitting at over 40k at one point.

Unfortunately people trying to game the system creates enormous work for those who can review.

(Not saying you were doing anything wrong just pointing out why there are automated guards)


You need to focus on niche tags to find worthwhile unanswered questions. Browsing the $foolang tag is just for the OCD FOMO types who spend their day farming rep.


Back in ye olden days, almost every answer involving a database contained a SQL injection vulnerability.


To their credit, a lot of people went back a decade later and fixed those. Although it doesn't stop people from repeating the mistakes.

I just got beaten up on HN for asking how the hell SQL injection is still a problem. People get defensive, apparently.


Sounds about right.

Not even a few years ago I worked with people who insisted it was ok to write injection-unsafe code if you knew for sure that you owned the injected values. Didn't matter that maybe one day that function would change to accept user-supplied data, that's not their problem! It was a Rails app and they were literally arguing for:

    .where("id = #{id}")
over:

    .where("id = ?", id)
in those certain situations. So, you know, it takes all kinds, I guess.
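For anyone outside the Rails world, the same distinction can be sketched in stdlib Python with sqlite3 (the table and the attacker string here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO users VALUES (2, 'bob')")

user_input = "1 OR 1=1"  # attacker-controlled string

# Unsafe: interpolation splices the string into the SQL itself,
# so the attacker can change the shape of the query.
rows_unsafe = conn.execute(
    f"SELECT * FROM users WHERE id = {user_input}"
).fetchall()  # matches every row

# Safe: the driver binds the value through a placeholder, so it can
# only ever be compared as data, never executed as SQL.
rows_safe = conn.execute(
    "SELECT * FROM users WHERE id = ?", (user_input,)
).fetchall()  # no row has that literal value as its id
```

The safe version costs nothing extra to write, which is the point being argued above.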


This is a case of militancy.

If we're talking about a typed integer there is no chance of that turning into an sql injection attack.

If we're talking about a string, I'd probably insist on parameterizing it even if we completely own it just on the off chance that the future changes.

To draw an analogy, gun safety is important and everyone knows it. But I don't practice gun safety while watching television on my couch because the gun is locked away. I practice gun safety when I'm actually handling the thing that is dangerous.

And yes, I realize it being locked away is technically gun safety, it's an imperfect analogy, please roll with it.


Your analogy is not flawed, but your conclusion is.

It is a perfect analogy because you are practicing gun safety by locking the gun away. If someone that you are not expecting wanders into your home while you are sitting on the couch, such as a child, they will not suddenly have access to the firearm. This is exactly why you don't assume that you will never receive unsafe input in this situation.


and as you're sitting on that couch watching television you're also practicing car safety because you're not actively breaking any traffic laws.

IOW, you're free to make that claim and you're not wrong per se, but you're not right and it doesn't refute the point.


The equivalent analogy is that you didn't leave the car in neutral on the top of a hill.

The number one rule of firearm safety - Treat every firearm as if it were loaded.

And yet children shoot themselves or others all the time because a gun was not safely stored.

But I digress...


To be pedantic, just being "typed" is not enough these days with dynamically-typed server code.


I disagree with you, if it's typed it's safe. The issue is if it's untyped or the type isn't enforced (by the runtime, by the compiler, or by the code itself).

I understand your point, I'm just saying if it's actually typed, it's safe.


> If we're talking about a typed integer there is no chance of that turning into an sql injection attack.

Unless the database table switches to non-integer ids at some point.


Ruby is a dynamic language.


I think I agree with your coworkers. If the data is predefined constants, then you don't need to worry about injection. All functions have preconditions which must be met for them to work. As long as that's specified, that's acceptable.

Imagine the internals of a database. An outer layer verifies some data is safe, and then all other functions assume it's safe.

The example you're sharing is a bit of a straw man. It's just as easy to use the parameter, so of course that's the right thing. But interpolating a table name into the string from a constant isn't wrong.


I'm not sure if this is a troll or not and I don't really want to debate this kind of thing on HN, but you've baited me. It is not a straw man. As I said, the source of the input could change in the future and it could be missed. The safe version is no more complicated than the unsafe version, so why wouldn't you just do the safe one? There is zero advantage to the unsafe way and it's straight up reckless to defend it.

I'm one of those people who moved from Ruby to Elixir. Ecto, Elixir's de facto database wrapper, will throw an exception if you try to write interpolated code like this, so luckily I don't have to have these insane arguments anymore (well, I work alone now, so there are several reasons I don't have to have them).

EDIT: My bad, I glossed over the last part of your statement.

Ya, I think this is probably where some of the defensiveness comes from: using a library vs rolling your own. If you're rolling your own, of course you're going to need to interpolate table names and whatnot, but it shouldn't even be possible to interpolate values. My example and argument are based on Rails, though, where you never specify a table name or anything like that. So in the specific case of my coworkers, they were wrong.


Yeah, bad code doesn't stop being bad code just because it is correct. Good code not only is correct, but it is obviously so. There are zero excuses in a case like this to write it in the unsafe way. Just because you know a gun is not loaded, doesn't mean you should play with it.


Yeah, if a codebase is full of stuff like this, auditing it is awful. It's like, instead of employing computers to check the details of your code, you force it to be done manually (in an error-prone way).


This is nonsensical. When you use a function, how do you know what it will do? You guess from its name?

> auditing it is awful.

If a function specifies a requirement, you look at the callers and see if that requirement is met. If it's easy to verify in code, you can assert. Is there an easier way to audit correctness?


Idk. I have some pieces of production code that need to inject `$tableIdentifier`.`$field` into a query, where both are nominally passed from the client. I don't rely strictly on a list of constants in those cases. I take the user request, check the table name against a constant list, then run a query (every time) to describe the fields in that table and type-check them against what's in the user-submitted variables. Then escape them. Anything mismatched in name or shape at any stage of that is considered malicious.
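That pattern (whitelist the identifiers, ask the database what actually exists, parameterize the values) can be sketched roughly like this; this is a sqlite-flavored illustration with invented names, not the commenter's production code (MySQL would use DESCRIBE instead of PRAGMA):

```python
import sqlite3

ALLOWED_TABLES = {"users", "events"}  # hypothetical constant list

def safe_select(conn, table, field, value):
    """Vet client-supplied identifiers, then bind the value normally."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # Ask the database itself which columns exist rather than trusting
    # the client-supplied field name.
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if field not in cols:
        raise ValueError(f"unknown column: {field!r}")
    # Identifiers are now vetted; the value still goes through a placeholder.
    return conn.execute(
        f"SELECT {field} FROM {table} WHERE {field} = ?", (value,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
print(safe_select(conn, "users", "name", "alice"))
```

Anything that fails the identifier checks raises instead of reaching the query string, which matches the "considered malicious" stance above.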


The only principle I want to defend is that a function is correct relative to its preconditions. If the caller doesn't meet them, that's on them.


That kind of reasoning only works if the language or ecosystem has some kind of compile-time error, linter, or comprehensive testing that will catch the error if the preconditions ever change. One way of doing this is encoding the preconditions in the type system. Another is through fuzzing.

If you keep the preconditions informal and never check them, the code becomes brittle to modifications and refactoring. For a sufficiently large codebase you almost guarantee that at some point you will have a SQL injection bug.

That said, using prepared statements isn't the only way to guard against SQL injection. You can also use a query builder that properly escapes all data (provided the query builder itself is hardened against bugs). Dynamic SQL is the only way to write some kinds of queries, so a query builder is a must in those cases.

What you shouldn't do is to use string concatenation to build query strings in your business logic. It may or may not contain a bug right now, but it is brittle to changes in the codebase.
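The core of such a query builder is small: it composes the dynamic parts of the statement while funneling every value into a placeholder. A minimal sketch (invented helper, not any particular library; column names here must still come from trusted code or a whitelist):

```python
def build_where(filters):
    """Compose a dynamic WHERE clause from (column, value) pairs.

    Columns are assumed to come from trusted code, never from user input;
    values only ever become placeholders, never part of the SQL string.
    """
    clauses, params = [], []
    for column, value in filters:
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = " AND ".join(clauses) or "1=1"  # no filters: match everything
    return sql, params

sql, params = build_where([("status", "active"), ("age", 30)])
# sql is "status = ? AND age = ?", params is ["active", 30]
```

Business logic then passes `filters` around, and the string concatenation lives in exactly one audited place instead of being scattered through the codebase.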


> That kind of reasoning only works if the language or ecosystem has some kind of compile time error or linter or comprehensive testing that will catch the error if the preconditions ever change.

Most requirements can't be verified at compile time, or even at runtime in a feasible amount of time.

If you expect functions to do things that they don't say they do, I don't know what to tell you. Conventions and specs are the best we have.


I think you were broadly misunderstood. If the defined constants come from or are checked against the ones stored in the database, fair play. If they're floating around in some static consts in a code file, also ok as long as that's extremely well documented and someone knows what's what. If some boss pays to cut corners for it to be written with magical constants like "WHERE life.meaning!=42" and then fires the person who they hired to write that script, they deserve whatever they get.

Just like the meaning of life, it's best not to come to premature conclusions. Could all work out, or it could be a funny joke for aliens in the end.


> I just got beaten up in HN for asking how the hell sql injection is still a problem.

It's possible for developers to think they're actually doing the right thing, but it turns out they're not.

https://www.npmjs.com/package/mysql#escaping-query-values

> This looks similar to prepared statements in MySQL, however it really just uses the same connection.escape() method internally.

And depending on how the MySQL server is configured, connection.escape() can be bypassed.


Yeah, the Nodejs ecosystem is sketchy in this regard. I've never put a Node-mysql site into production. Basically everything I write that runs DB queries is in PHP with PDO. But I got interested in Node for side projects and spotted this escaping flaw in node-mysql. That npm package also has two escaping modes, one which it calls "emulated" and which is probably less trustworthy. It doesn't seem like it was ever ready for primetime. I don't know if node-mysql2 addresses that... I ended up writing a promise wrapper for the original one that also turns everything into prepared statements. You still need to make sure NO_BACKSLASH_ESCAPES is off, although I have no idea why you'd ever turn it on.

So yeah, I'm coming from a PHP mindset where you can generally trust your engine to bind and escape values. My experience with Nodejs in this particular area caused me to write a lot of excess code (mostly to satisfy my own curiosity) and still convinced me not to trust it for the purpose.

In that light, I can understand how someone who jumped into the Nodejs ecosystem would think they were dealing with reliably safe escaping, and didn't realize what they were actually getting if they didn't read the fine print.


Hi! Sorry to report this, but I've pushed a SQL injection vuln to prod when I was still very green.

In my defense, we trusted the input. But that's post-rationalisation, because I simply didn't know what I was doing at the time.

It gets worse. If I'd done it properly, my senior would have beaten me up in code review for "complexity". That was a man who would never use a screwdriver when a hammer was already in his hand.


I once argued with a senior dev (later engineering manager, I guess he is a director of development now somewhere) that storing password hashes as unsalted SHA-1 was bad.

His defense? "This system is internal only and never connected to the internet"

Senior titled devs don't necessarily know their shit.


A little off topic, but I love how you mention his career progression before sharing the example of his ignorance, because this seems to be a pretty common theme in tech companies (I've witnessed it more times than I can remember or count). The people I knew in my career who were most full of shit are pretty much all now Directors and VPs, enjoying a life of success, and the ones who were the most actually knowledgable are still grinding away as IC's, worried about layoffs. This industry is really bad about rewarding competence.


> This industry is really bad about rewarding competence.

If you promote the competent people, you leave the incompetent ones to do the actual work.


The trick then is not hiring bozos in the first place.


The team I described in GGGP were all strong in the roles they were originally hired for. The company likes to promote internally, which mostly works out for them. This shit team was an edge case.


This is a good counterpoint that explains why, maybe as roles change or companies grow, people who weren't exceptionally good at one role end up overseeing it. The pithy / laconic observation I was immediately responding to was pretty spot on though, and still seems to pertain (in general).

Breaking it down: That the most diligent / irreplaceable people who know the guts of the machine tend to be chained to their roles with occasional raises seems fairly logical from a C-Suite perspective. The tendency to promote incompetence - particularly overconfident incompetence - is the part that bears more scrutiny. If it were isolated to a few companies, it wouldn't be so relatable. I have a theory that it has to do with certain kinds of communication skills (specifically, bullshitting), being selected for in certain roles. And being able to write good code and explain why it has to be done that way requires the opposite of bullshitting.


Non security expert here. Walk me through the attack scenario here.

The database has access control right? So only a few people in the org can read the data. And you are imagining a case where they:

a) find an inverse image of a password hash and use that login as another person to do something bad.

b) reverse the password from the hash to use in another context.

If a is an issue, why does this individual have sensitive data access in the first place? b is still unlikely: any preimage you find is unlikely to be the actual password if there is salting.

It sounds like an improvement could be made, but maybe not the highest priority. Can you inform me?


To be fair, I’ve pushed vulnerabilities to prod when considered a senior and with 10+ years of experience. Nobody is immune to their own stupidity and hubris.


People who don't understand things often get cranky when they're told it's easy. Seems fair, though: it does seem rude to tell someone missing a leg that it's easy to run. But it also seems rude to get upset at someone who's good at something they've studied, so perhaps everyone is bad at understanding the person they're talking to, and people should assume more good faith.


That’s why I prefer to use “straightforward” rather than “easy.”

People seem to take that much better.


I also like "simple". Lots and lots of very hard things are not at all complicated.


Hitting a homerun is straightforward, but it’s not easy.


I would argue the concept of hitting a homerun is straightforward, but the preparation, training and execution are not.

You’re arguing semantics.

The two words are synonymous in most casual conversation where you would be in danger of offending by saying something is easy or simple.


I think I was trying to agree with OP. Just giving an example that came to mind.

Conversely, setting up Jira is neither straightforward, easy or simple.


If you ever have an issue with the Requests library in Python, just try again with verify=False.


Easier than getting the app team to fix their TLS.


Or the corporate IT team to remove their TLS-trashing MITM attack (because their Firewall Vendor claims that's still "Best Practice" in 2023 and/or the C-Suite loves employee surveillance).
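For what it's worth, the fix that keeps verification on is usually to point requests at the intercepting proxy's root certificate instead of disabling checks. A sketch, with a hypothetical path to the corporate CA bundle:

```python
import os

# requests honors this environment variable for its CA bundle; point it
# at the MITM proxy's root cert (path is hypothetical) rather than
# passing verify=False everywhere.
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/corp-proxy-ca.pem"

# From here on, requests.get(...) in this process verifies against
# that bundle instead of the default one.
```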


Just be sure to try running the program with sudo first, before trying shitty solutions like that.


That seems insecure; just chmod -R 777 /


At least node has an environment variable (NODE_TLS_REJECT_UNAUTHORIZED) to disable checks globally.


Good thing we trained all those AIs with these answers.


What if that was the goal all along? Time traveling freedom fighters set up SO so that the well for AI would be poisoned, freeing us from our future overlords!


StackOverflow and those AIs optimise for the same thing - something that looks correct regardless of how actually correct it is.


A couple months ago, someone commented that one of my answers was wrong. Well, sure, in the years since answering, things changed. It was correct when I wrote it. Otherwise it wouldn't have taken so long for someone to point out that it's wrong. The public may have believed it to be the correct answer because it was at that time.


> The system promotes answers which the public believes to be correct

Well.. duh?

Until AI takes over the world, this will be correct for everything. News, comments, everything.


Mmm... no? StackOverflow is powered by voting. Not all forums work like that (it was a questionable choice at the time StackOverflow started).

I've been a moderator on a couple of ForumBB kind of forums and the idea of karma points was often brought up in moderator meetings. Those with more experience in this field would usually try to dissuade the less experienced mods from implementing any karma system.

Moderators used to have ways of promoting specific posts. In the context of ForumBB you had a way to mark a thread as important or to make it sticky. Also, a post by a moderator would stand out (or could be made to stand out), so that other forum users would know if someone speaks from a position of experience / authority or is this yet to be determined.

Social media went increasingly in the direction of automating moderators' work by extracting that information from the users... but this is definitely not the only (and probably not the best) way of approaching this problem. Moderators are just harder to find and more expensive to keep.


I hold little hope that LLMs will help us to reason through "correctness." If these AIs scour the troves of idiocy on the internet, believing what they will according to patterns without applying critical reasoning skills, they too will pick up the bandwagon's opinions and perpetuate them. Ad populum will continue to be a persistent fallacy if we humans don't learn appropriate reasoning skills.


They've already proven that LLMs are capable of creating an internal model of the world (or, in the case of the study that proved it, a model of the game it was being trained on). If LLMs have a world model, then they are fully capable of generating truth beyond whatever they are trained on. We may not be there yet (and who knows how long it will take), but it is in principle true that LLMs can move beyond their training data.


AI isn’t going to do better in current paradigms, it has exactly the same flaw.


Of course, consensus is a difficult philosophical topic. But not every system is based on public voting.


I sure hope people don’t copy stuff from SO before they understand what the code does.


People are writing entire programs with ChatGPT. These are the same people who previously would copy&paste multiple SO answers cobbled together. Now it's just copy&pasting the entire script from a single response.


ROFLMAO!

Please, tell me that was sarcastic.


I refuse to believe anything else ;-)


Yeah, I never look at just the top comment. If it isn’t wrong, it’s suboptimal.


> easy to read

Sounds like you're counting that as a negative. Obviously it depends on the use case, but more often than not I'll lean towards the easier to read code than the most optimal one.


Easy to read is good, but it doesn’t trump correct.


Sure, but it's also generally a lot easier to tell whether simple code is correct (the loop over powers of 10) than more complex code (using log and pow), especially when it comes to edge conditions.
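A concrete instance of that tradeoff, sketched in Python with digit counting: the loop is trivially checkable, while the log version quietly fails on an edge case because converting a large int to an IEEE double can round it up past a power of ten.

```python
import math

def digits_loop(n):
    """Count digits of a positive int by looping over powers of 10."""
    count, p = 1, 10
    while p <= n:
        count += 1
        p *= 10
    return count

def digits_log(n):
    """Looks clever, but trusts floating point."""
    return int(math.log10(n)) + 1

n = 10**18 - 1            # an 18-digit number
print(digits_loop(n))     # 18
print(digits_log(n))      # 19: float(n) rounds up to exactly 1e18,
                          # so log10 returns 18.0 and we overcount
```

Verifying the loop takes a glance; verifying the log version requires reasoning about double rounding, which is exactly the kind of edge condition a top answer can get wrong while still looking clean.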


> The correct one is usually at rank 3

This has generally been my experience.



