I think a problem here is Goodhart's law: "When a measure becomes a target, it ceases to be a good measure." [1] And it seems like there's an element of the streetlight effect [2], too; sometimes a bad metric really is worse than no metric.
Also, I really question your notion that people outside a field should be able to evaluate the quality of someone's work, especially in academia, where the whole point is to be well ahead of what most people can understand. That theory seems like part of managerialism [3], which I'll grant is the dominant paradigm in the western corporate world.
I understand why a managerialist class would like to set themselves up as the well-paid judges of everybody else. But I'm not seeing why anybody would willingly submit themselves to that. It's a commonplace here on HN that we avoid letting managers make technical decisions, however fancy their MBA, because they're fundamentally not competent to do it. That seems much more important for people doing cutting-edge research.
> the whole point is to be well ahead of what most people can understand
That’s not the case at all. Being at the leading edge of research should mean that you are creating new knowledge. That doesn’t imply that people cannot understand it. This expectation that laypeople cannot possibly understand science is one of the reasons so many papers are written so densely and abstrusely. “They” can’t understand it anyway, right?
Feynman said if he couldn’t explain it to freshmen he didn’t understand it himself.
I think a lot of cutting-edge work is also done at the edge of understanding, and that's fine. It can be hard enough to explain groundbreaking work to experts with deep context; it's reasonable to me that it takes more time and work to find the explanations that make sense to the average person.
I do agree that researchers should be able to give decent "here's what I do" explanations to the general public. But that's very different than a member of the general public understanding the context well enough that they can judge the value of the work to the field.
Okay, I'll try to clarify what exactly I mean by "people outside a field should be able to evaluate the quality of someone's work" - especially because, as I said regarding peer review, we generally consider that it's impossible to do so directly.
It's about the question of resource allocation. Pretty much every subfield of academia is a net consumer of resources, i.e. someone outside of that subfield is funneling resources to it. That someone - no matter if it's a university, or some foundation, or a gov't agency, or a philanthropist - needs to make a decision on how to allocate resources. And, in general, they honestly want to make a good, informed decision on which projects and researchers to support; but nonetheless they have to make a decision according to some criteria. So there's no choice of "no metric": there will always be a metric, and we can only argue that it should be a better one. And the answer to "why anybody would willingly submit themselves to that" is that, duh, you don't get a choice - you can suggest a better method to fulfil their goal of allocating resources in a way that is (also in their opinion) fair and objective; but you can't get around the fact that scientists are generally funded by nonscientists. And those funders need (or want) to make decisions.
They could delegate that, but that doesn't solve the question about the criteria - if they delegate to universities, they still have to decide how to allocate between departments; if they delegate to scientific councils uniting all the departments in the country working on some subfield, they have to decide how to allocate between the different organizations. So no matter what, you have to compare not only the quality of similar scientists, but also of dissimilar scientists working in different (sub)fields. And delegation doesn't absolve you of responsibility, so if the money is (or looks!) wasted, then that's a failure - so when you delegate, you want to require them to use objective criteria. Which is hard - I could tell you which researchers in my subfield are doing excellent work and which are useless; but if I had to justify those judgments, to demonstrate why they're not just my bias because of politics/liking certain methods/gender/ethnicity/etc., then it would actually be tricky; and I think I'd actually reach for these metrics. And I'm quite certain that the metrics (for the people I have in mind) would agree with my subjective opinion; on average, the great research gets cited much more and appears in higher-ranking venues, while the lousy stuff gets no citations apart from the author's only grad student.
Also, there's a lack of trust (IMHO not totally unwarranted). Even if you could get a bunch of experts who are qualified to evaluate who gets what amount of resources, you can't rely on them actually doing so impartially - if we take spiders as a totally random example, in general you're qualified to distinguish which spider research is good and which is useless only if you actually work on spider research, most likely in one of those teams - and the expected result is nepotism, allocating resources based on purely (intra-field) political reasons. And who'd decide how to split resources between spider research and bird research? Do you expect the spider guys and bird guys to reach a consensus? Or would it go to whatever field the dean is in? This is a big problem even now, and a big part of why the metrics are being gamed - but at least metrics require effort to game and can't be gamed totally; if we did away with them, we'd be left with absolutely arbitrary political allocation, which would be even worse.
So at the end of the day "they" need some way to transform the only reasonable source of truth - actual peer review - into something that "non-peers" can use to judge what the aggregate of that peer review says. That need is IMHO not negotiable, I really believe that they do actually need it - they don't want to do resource allocation totally arbitrarily, they want to do it well, they need (because of external pressures) objectivity and accountability, and currently this (journal rankings, bibliometrics, etc.) is the best we have come up with to summarize the results of that peer review.
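To make "bibliometrics" a bit more concrete, here's a minimal sketch (in Python, with invented citation counts) of one of the simplest such summaries, the h-index: the largest h such that an author has h papers with at least h citations each.

    def h_index(citations):
        """Return the h-index for a list of per-paper citation counts."""
        ranked = sorted(citations, reverse=True)
        h = 0
        for rank, cites in enumerate(ranked, start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    # Hypothetical author with six papers; four of them have >= 4 citations.
    print(h_index([42, 17, 9, 4, 1, 0]))  # -> 4

Of course, this is exactly the kind of number that non-peers can compare across people and that researchers can try to game, which is the whole tension here.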
If I had to write a law draft for a better process of allocating resources, what should be written in it?
Just to be clear, I understood what you were saying before. I just disagree. I think the right approach is to select trusted experts and let them make the decisions about their fields.
Again, since this is a tech community, let me use that for an analogy. It's a classic problem for non-technical founders to evaluate their technical hires. They aren't qualified.
The right solution is not to find some gameable metric of tech-ness, like LoC/day or Github stars. Instead, one uses either direct experience-based trust or some sort of indirect trust, like finding a technical expert you trust and having that person interview your first tech hires.
Yes, having expert humans make the decisions is imperfect. But a managerialist approach isn't perfect either. And the advantage of using expert humans, rather than a gameable metric and managerial control, is that we have centuries of experience in how people go wrong and many good approaches for countering it.
This raises a simple issue: how would the non-experts in an area choose the most appropriate set of experts? In this case it would correspond to funding agencies or governments needing to decide on a "fair" way to establish the right experts to ask. It is very difficult to suggest a way to do this that would not correlate strongly with "highly successful under the current system". That group of experts would, of course, have a strong bias towards the current system.
I think we solve these problems not by finding a universal approach, but through heterogeneity.
We fund academic work because we see value in it. But there are many different kinds of value. So I think it's appropriate that we have many different universities which have many different departments. Many different funding agencies and many different foundations. Each group has their own heuristics for picking the seed experts.
There are still systemic biases, of course, but that's true of any approach. And distributed power is much more robust to that than centralized power or a single homogeneous system.
It seems like "Selecting trusted experts" alone would defer more to human subjectivity and biases than would be necessary if objective measures were utilized as much as possible.
Existing community/expertise-based moderation and reputation systems might not be directly transferable or adequate. But they show there are ways to think about decentralized measures of reputation that are new to this century and haven't been tried. Ways that may be preferable to a small group of kingmakers.
I think the biggest problem is leadership and getting the community to cooperate on trying something different. It's not just that there is no person who can mandate these things, it's that multiple constituencies have widely diverging interests, e.g. authors, universities, corporations, journals.
I understand why it seems that nominally objective measures would be better. But I don't think cross-field, non-gameable objective measures of research quality are practically possible.
I also don't think it's a problem that different groups have different interests, etc. As I say elsewhere, I think that diversity is the solution.
You could be right, but I don't see how it can be known with any confidence until a few approaches are given extended good faith trials. There are anecdotal examples supporting both scenarios and the problem simply seems too unknowable and important not to test drive whatever the top 2 or 3 approaches end up being.
>I also don't think it's a problem that different groups have different interests, etc.
I don't see how you can claim that getting the community to cooperate on trying something different is not a major hurdle.
How many years has it been since important issues in the academic process were widely known? How much success in adoption has there been to date, regarding any fundamental changes?
I doubt there's a single solution, so I think trying to get people in many, many fields to coordinate will just slow down improvement. If anything, I think the drive to centralize and homogenize, which is part of managerialism, is a big part of some of the prominent problems in academia.
I wonder about the idea of measuring a metric and cutting off the tails of the distribution. This presumes we have a regular means of sampling the metric so that it has a time profile.
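Roughly, something like this minimal sketch, assuming the metric is sampled at regular intervals and the tails are simply trimmed before aggregating (the sample values, the name monthly_citations, and the 10% trim fraction are arbitrary choices for illustration):

    def trimmed_mean(samples, trim_fraction=0.10):
        """Average the samples after dropping the extreme tails on both sides."""
        ranked = sorted(samples)
        k = int(len(ranked) * trim_fraction)
        kept = ranked[k:len(ranked) - k] if k > 0 else ranked
        return sum(kept) / len(kept)

    monthly_citations = [0, 1, 2, 2, 3, 3, 4, 5, 6, 40]  # hypothetical time series
    print(trimmed_mean(monthly_citations))  # -> 3.25; the outlier month no longer dominates

Whether trimming actually blunts gaming or just hides the most visible gaming is, I suppose, the same Goodhart question again.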
[1] https://en.wikipedia.org/wiki/Goodhart%27s_law
[2] https://en.wikipedia.org/wiki/Streetlight_effect
[3] https://en.wikipedia.org/wiki/Managerialism