It sucks as a metric, but it does correlate roughly with quality in most cases, and I'm not aware of any better easily measurable metric - if you have one in mind, it'd be great to hear. The alternative of having a bureaucrat "simply judge quality" is IMHO even worse: even less objective, and even more prone to being gamed.

The main problem is that various stakeholders have an objective need (or desire?) for some kind of metric they can use to roughly evaluate the quality or quantity of a scientist's work, with the caveat that people outside the field need to be able to use it. I.e. let's assume we have a university or government official who, for some valid reason (there are many of them), needs to be able to compare two mathematicians without spending excessive time on it. Let's assume that the official is honest, competent and in fact a scientist him/herself, and so can do the evaluation "in the way that scientists want" - but that official happens to be, say, a biologist or a linguist. What process should be used? How should that person distinguish insightful, groundbreaking, novel and important research from pseudoscience or salami-sliced papers that bring nothing new to the field? I can evaluate papers and people in my research subfield, but not far outside of it. Peer review for papers exists because we consider that people outside the field are not qualified to directly tell whether a paper is good or bad.

The other problem, of course, is how to compare across fields - what data allows you to see that (for example) your history department is doing top-notch research but your economics department is not respected in its field?

I'm not sure that a good measurement can exist, and despite all their deep flaws it seems that we actually can't do much better than the currently used bibliometric indicators and judgement by proxy of journal rankings.

Saying "metric X is bad" doesn't mean "metric X shouldn't get used" unless a better solution is available.




I think a problem here is Goodhart's law: "When a measure becomes a target, it ceases to be a good measure." [1] And it seems like there's an element of the streetlight effect [2], too; sometimes a bad metric really is worse than no metric.

Also, I really question your notion that people outside a field should be able to evaluate the quality of someone's work, especially in academia, where the whole point is to be well ahead of what most people can understand. That theory seems like part of managerialism [3], which I'll grant is the dominant paradigm in the western corporate world.

I understand why a managerialist class would like to set themselves up as the well-paid judges of everybody else. But I'm not seeing why anybody would willingly submit themselves to that. It's a commonplace here on HN that we avoid letting managers make technical decisions, however fancy their MBA, because they're fundamentally not competent to do it. That seems much more important for people doing cutting-edge research.

[1] https://en.wikipedia.org/wiki/Goodhart%27s_law

[2] https://en.wikipedia.org/wiki/Streetlight_effect

[3] https://en.wikipedia.org/wiki/Managerialism


> the whole point is to be well ahead of what most people can understand

That’s not the case at all. Being at the leading edge of research should mean that you are creating new knowledge. That doesn’t imply that people cannot understand it. This expectation that laypeople cannot possibly understand science is one of the reasons so many papers are written so densely and abstrusely. “They” can’t understand it anyway, right?

Feynman said if he couldn’t explain it to freshmen he didn’t understand it himself.


I think a lot of cutting-edge work is also done at the edge of understanding, and that's fine. It can be hard enough to explain groundbreaking work to experts with deep context; it's reasonable to me that it takes more time and work to find the explanations that make sense to the average person.

I do agree that researchers should be able to give decent "here's what I do" explanations to the general public. But that's very different than a member of the general public understanding the context well enough that they can judge the value of the work to the field.


Okay, I'll try to clarify what exactly I mean by "people outside a field should be able to evaluate the quality of someone's work" - especially because, as I said regarding peer review, we generally consider that it's impossible to do so directly.

It's about the question of resource allocation. Pretty much every subfield of academia is a net consumer of resources, i.e. someone outside of that subfield is funneling resources to it. That someone - no matter if it's a university, or some foundation, or a gov't agency, or a philanthropist - needs to make a decision on how to allocate resources. And, in general, they honestly want to make a good, informed decision on which projects and researchers to support; but nonetheless they have to make a decision according to some criteria. So there's no choice of "no metric"; there will always be a metric, and we can only argue that it should be better. And the answer to "why anybody would willingly submit themselves to that" is that, duh, you don't get a choice - you can suggest a better method to fulfil their goal of allocating resources in a way that is (also in their opinion) fair and objective; but you can't get around the fact that scientists are generally funded by nonscientists. And those funders need (or want) to make decisions.

They could delegate that, but delegation doesn't solve the question of criteria - if they delegate to universities, they still have to decide how to allocate between departments; if they delegate to scientist councils uniting all the departments in the country working on some subfield, they have to decide how to allocate between the different organizations. So no matter what, you have to compare not only the quality of similar scientists, but also of dissimilar scientists working in different (sub)fields. And delegation doesn't absolve you from responsibility, so if the money is (or looks!) wasted, then that's a failure - so when you delegate, you want to require them to use objective criteria. Which is hard - I could tell you which researchers in my subfield are doing excellent work and which are useless; but if I had to justify those judgements, to demonstrate that they're not just my bias because of politics/liking certain methods/gender/ethnicity/etc., then it would actually be tricky; and I think that I'd actually reach for these metrics. And I'm quite certain that the metrics (for the people I have in mind) would agree with my subjective opinion; on average, the great research gets cited much more and appears in higher-ranking venues, while the lousy stuff gets no citations apart from the author's only grad student.

Also, there's a lack of trust (IMHO not totally unwarranted). Even if you could get a bunch of experts who are qualified to decide who gets what amount of resources, you can't rely on them actually doing so fairly - if we take spiders as a totally random example, in general you're qualified to distinguish which spider research is good and which is useless only if you actually work on spider research, most likely in one of those teams - and the expected result is nepotism, allocating resources based on purely (intra-field) political reasons. And who'd decide how to split resources between spider research and bird research? Do you expect the spider guys and the bird guys to reach a consensus? Or would it all go to whatever field the dean is in? This is a big problem even currently, and a big part of why the metrics are being gamed - but at least metrics require effort to game and can't be gamed totally; if we did away with them, we'd be left with absolutely arbitrary political allocation, which would be even worse.

So at the end of the day "they" need some way to transform the only reasonable source of truth - actual peer review - into something that "non-peers" can use to judge what the aggregate of that peer review says. That need is IMHO not negotiable; I really believe that they do actually need it - they don't want to do resource allocation totally arbitrarily, they want to do it well, they need (because of external pressures) objectivity and accountability, and currently this (journal rankings, bibliometrics, etc.) is the best we have come up with to summarize the results of that peer review.

If I had to write a law draft for a better process of allocating resources, what should be written in it?


Just to be clear, I understood what you were saying before. I just disagree. I think the right approach is to select trusted experts and let them make the decisions about their fields.

Again, since this is a tech community, let me use that for an analogy. It's a classic problem for non-technical founders to evaluate their technical hires. They aren't qualified.

The right solution is not to find some gameable metric of tech-ness, like LoC/day or Github stars. Instead one uses either direct experience-based trust or some sort of indirect trust, like where you have a technical expert you trust and have that person interview your first tech hires.

Yes, having expert humans make the decisions is imperfect. But it's not like a managerialist approach is perfect either. And the advantage of using expert humans, rather than a gameable metric and managerial control, is that we have centuries of experience in how people go wrong and many good approaches for countering it.


This raises a simple issue: how would the non-experts in an area choose the most appropriate set of experts? In this case it would correspond to funding agencies or governments needing to decide on a "fair" way to establish the right experts to ask. It is very difficult to suggest a way to do this that would not correlate strongly with "highly successful under the current system". That group of experts would, of course, have a strong bias towards the current system.


I think we solve these problems not through finding a universal approach, but through heterogeneity.

We fund academic work because we see value in it. But there are many different kinds of value. So I think it's appropriate that we have many different universities with many different departments, many different funding agencies and many different foundations. Each group has its own heuristics for picking the seed experts.

There are still systemic biases, of course, but that's true of any approach. And distributed power is much more robust to that than centralized power or a single homogeneous system.


It seems like "Selecting trusted experts" alone would defer more to human subjectivity and biases than would be necessary if objective measures were utilized as much as possible.

Existing community/expertise-based moderation and reputation systems might not be directly transferable or adequate. But they show there are new ways to think about more decentralized measures of reputation - ways that are new to this century and haven't been tried, and that may be preferable to a small group of kingmakers.

I think the biggest problem is leadership and cooperation of community to try something different. It's not just that there is no person who can mandate these things, it's that multiple constituencies have widely diverging interests, i.e. authors, universities, corporations, journals.


I understand why it seems that nominally objective measures would be better. But I don't think cross-field, non-gameable objective measures of research quality are practically possible.

I also don't think it's a problem that different groups have different interests, etc. As I say elsewhere, I think that diversity is the solution.


You could be right, but I don't see how it can be known with any confidence until a few approaches are given extended good faith trials. There are anecdotal examples supporting both scenarios and the problem simply seems too unknowable and important not to test drive whatever the top 2 or 3 approaches end up being.

>I also don't think it's a problem that different groups have different interests, etc.

I don't see how you can disagree that getting the community to cooperate on trying something different is a major hurdle.

How many years has it been since important issues in the academic process were widely known? How much success in adoption has there been to date, regarding any fundamental changes?

It seems on its face to be crucial.


I doubt there's a single solution, so I think trying to get people in many, many fields to coordinate will just slow down improvement. If anything, I think the drive to centralize and homogenize, which is part of managerialism, is a big part of some of the prominent problems in academia.


Require all academic scientists to be self funded...no universities, no gov't agencies, no foundations, no philanthropists...problem solved.


I wonder about the idea of measuring a metric and cutting off the tails of the distribution. This presumes we have a regular means of sampling the metric to build up a time profile of it.
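A minimal sketch of what I mean, assuming we already have repeated samples of some metric over time (the numbers, the function name and the 10% trim are made up for illustration):

    from statistics import mean

    def trimmed_mean(samples: list[float], trim_fraction: float = 0.1) -> float:
        """Average of the samples after dropping the lowest and highest
        trim_fraction of values (the tails of the distribution)."""
        ordered = sorted(samples)
        k = int(len(ordered) * trim_fraction)
        kept = ordered[k:len(ordered) - k] if k else ordered
        return mean(kept)

    # Example: extreme outliers at either end barely move the result.
    print(trimmed_mean([1, 50, 52, 49, 51, 48, 50, 53, 47, 500]))  # 50.0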


Why not simply use replication as a measure? Have your studies been replicated? How many other studies have you replicated?

That would both help solve the replication crisis and resolve this problem.

Of course, then you might have 10 000 studies replicating the same easy-to-do study... which is why the "score" should be reduced based on how many other times that study has already been replicated.
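Something like this is what I mean by reducing the score (a rough sketch only; the 1/(1+n) falloff and the function names are made up for illustration):

    def replication_credit(times_already_replicated: int) -> float:
        """Credit awarded for one more replication of a study.

        Diminishing returns: the first replication is worth 1.0,
        later ones are worth progressively less, so piling onto an
        already well-replicated study yields little extra score.
        """
        return 1.0 / (1 + times_already_replicated)

    def researcher_score(replications_performed: list[int]) -> float:
        """Total score for a researcher, given how many prior replications
        each study they replicated already had."""
        return sum(replication_credit(n) for n in replications_performed)

    # Replicating two fresh studies beats replicating the same
    # heavily replicated study three times.
    print(researcher_score([0, 0]))        # 2.0
    print(researcher_score([10, 11, 12]))  # ~0.25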


This solution seems to assume that all studies are equal, and they're not.

An insightful study that's replicable (but has not yet been) is valuable. A lousy study that's been replicated five times (not because it's interesting, but because it was easy to do, and the replicators knew that they'd be rewarded for replicating anything) is not valuable.

A metric that says "number of studies" is IMHO even more arbitrary, more gameable, and more detached from actual value than citation count - which at least captures some notion that your study actually matters to other people; that it was worth writing that paper because someone read it.


This might work for hard sciences, but not for mathematics.

Or, I dunno, paleontology or sociology or other stuff.


Indeed. My research (in statistics) is primarily methodological: I invent and describe methods that might be useful, and on a good day prove some theoretical results demonstrating that they might be useful. There's nothing to replicate there.

Citations can be a useful metric here, particularly if you can identify citations of people actually using the method (as opposed to people just mentioning it in passing, or other methodological researchers comparing their own methods to it).


Wouldn't replication here just be peer review?


If you judge contributions by just getting papers through peer review then that's even worse than using citations.


Well, I'm talking about a math paper. So to me, it's the same as a code review. Someone has to go over the logic and proofs, and double check no mistakes are made.

The number of people who did and gave their approval would be a good indicator I can trust the paper.

What does a citation do that's better than this?

For experiments, or non-math papers, you might need something more robust - mostly because reviewing the paper isn't really reviewing the full study, but only what the researcher put in the paper. So it is very hard to review the methodology and details and be sure they followed proper protocols, etc. You'd need someone to review the study as it is happening, not just the output paper from it.


A citation indicates that other people actually care about the content of the paper.

Consider Researcher A, who has one paper with a hundred citations, and Researcher B, who has ten papers with two citations each. Probably Researcher A has made a larger contribution.

Whether you're in math or any other field, the fact that a paper is correct or reasonable enough to make it past peer review doesn't mean anybody gives a shit about it.


The issue is how many researchers want to spend their time replicating the research of other people rather than doing their own original work. Getting funding is already incredibly hard, plus no-one is going to give you tenure or promote you for replicating the work of others.


In academic situations yeah, but in industry? You can make money replicating other people’s work.


Extreme self replication!


Yes, bizarrely, reputation / trust is still the primary foundation of academia from a pragmatic perspective, even though it is the antithesis of science. At least some disciplines can have replication studies cross-culturally. It’s a hard problem to solve; knowledge is inherently a social/shared experience.


Trust is a curious one: it's not the antithesis of science; indeed, it's required for science to actually work rather than simply be an idea.

You trust in people, in consistency of physical laws, in coherency of your own mind, in constancy of temporal flow, in so many things because - as the Pyrrhonist refrains - nothing is certain, not even this.


Blaming the authors for demonstrating the deep flaws in this metric is certainly wrong, however. This article comes close to accusing some rigorous scientists - who choose to publish often and diligently reference their prior work to encourage fact-checking and peer review - of being frauds.

I disagree with the assertion that bad metrics should be used if there are no alternatives. Bad metrics give wrong answers, and only the illusion of meaningful information. The most common use of bad metrics is to lie to people, and it isn't the scientists using the metrics but the organizations that employ them.


>I'm not aware of any better easily measurable metric

Why should an easily measurable metric which has meaningful value exist? It doesn't seem obvious to me that it should at all. Determining the capability of a researcher is inherently a very complex intellectual task. The desire is to reduce that task to something which removes the need for the person doing the evaluation to read and understand the produced research, or to even understand the field of study in many cases. Perhaps, instead, those who are put in charge of things like awarding grant funding, granting tenure at universities, and deciding who to hire to teach ought to be expected and required to evaluate the research on its merits. This would greatly increase the intellectual sophistication and capability needed for people in those positions, but the alternative will always be fairly easily exploitable because it is easier to goose a metric than to do solid research.

We see the shortcomings of trying to reduce complex intellectual challenges to checklists or metrics all the time. And we simply ignore the alternative of relying upon intellectually capable people meeting the challenge. Personally, I don't understand why.


I think that we could add some random (i.e. pure luck) factor to the evaluation. Although this may sound unfair, almost all optimization processes do this, whether natural or man-made. Evolution does it by adding random gene mutations, and machine learning does it by randomizing certain parameters to avoid getting stuck in local minima. In theory, the right mixture of rigid metrics and randomization can produce a better result.
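A rough sketch of what that mixture could look like (purely illustrative; the candidate scores, the 30% luck weight and the function name are made up):

    import random

    def funding_ranking(candidates: dict[str, float],
                        luck_weight: float = 0.3,
                        seed=None) -> list[str]:
        """Rank candidates by a blend of their metric score and pure luck.

        Each candidate's metric score (assumed normalized to [0, 1]) is
        mixed with a uniform random draw; luck_weight controls how much
        the lottery component matters, roughly like the noise used to
        escape local minima in optimization.
        """
        rng = random.Random(seed)
        blended = {
            name: (1 - luck_weight) * score + luck_weight * rng.random()
            for name, score in candidates.items()
        }
        return sorted(blended, key=blended.get, reverse=True)

    # Strong candidates usually come out on top, but middling ones
    # sometimes get a shot.
    print(funding_ranking({"A": 0.9, "B": 0.6, "C": 0.55}, seed=42))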


>The alternative of having a bureaucrat "simply judge quality"

The bureaucrat in this case would be another university professor working in the same field.


How about PageRank instead of only using the number of citations?
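Roughly like this, treating the citation network as a graph so that citations from highly cited papers carry more weight than citations from obscure ones (a toy sketch; the papers and parameters are made up):

    def pagerank(cites: dict[str, list[str]], damping: float = 0.85,
                 iterations: int = 50) -> dict[str, float]:
        """PageRank over a citation graph: cites maps each paper to the
        papers it cites; rank flows along citation edges."""
        papers = set(cites) | {p for targets in cites.values() for p in targets}
        rank = {p: 1.0 / len(papers) for p in papers}
        for _ in range(iterations):
            new_rank = {p: (1 - damping) / len(papers) for p in papers}
            for paper, cited in cites.items():
                if cited:
                    share = damping * rank[paper] / len(cited)
                    for target in cited:
                        new_rank[target] += share
                else:  # papers citing nothing spread their rank evenly
                    for target in papers:
                        new_rank[target] += damping * rank[paper] / len(papers)
            rank = new_rank
        return rank

    # Toy citation graph: A and B cite C, C cites D, D cites nothing.
    citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
    print(sorted(pagerank(citations).items(), key=lambda kv: -kv[1]))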



