If I publish a paper saying I have an algorithm which can factor large composites, and in the paper publish the factors to all of the RSA numbers listed at https://en.wikipedia.org/wiki/RSA_Factoring_Challenge , then I think people will take it seriously, and not consider it at the superficial level.
Even if I don't publish the algorithm. ("Because of the security implications of this work, I have decided to withhold publication for a year.")
Furthermore, some things are worth publishing even if the method was "it came to me in a dream" à la Kekulé's snake. If you can demonstrate a sorting network of size 47 for n=14 inputs (the known lower bound) then you can publish that exemplar, even without publishing the method used to generate it.
(If you used computer assistance then that method would likely also be publishable, but that's a different point. Newton famously used the calculus to solve problems, but published the proofs using more traditional approaches.)
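For what it's worth, checking such a claim doesn't require the method either: by the zero-one principle, a comparator network on n wires sorts every input iff it sorts all 2^n binary inputs, which for n=14 is only 16,384 cases. A minimal sketch, using a toy 3-wire network rather than the claimed 47-comparator one:

    from itertools import product

    def sorts_correctly(n, comparators):
        """Zero-one principle check: a comparator network on n wires sorts
        every input iff it sorts all 2**n sequences of zeros and ones."""
        for bits in product((0, 1), repeat=n):
            wires = list(bits)
            for i, j in comparators:            # compare-exchange wires i and j
                if wires[i] > wires[j]:
                    wires[i], wires[j] = wires[j], wires[i]
            if wires != sorted(wires):          # every binary input must come out sorted
                return False
        return True

    # Toy example: a 3-comparator network for n = 3.
    print(sorts_correctly(3, [(0, 2), (0, 1), (1, 2)]))   # True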
If you can come up with a protein model that is a significantly better fit to the X-ray diffraction data, then that's publishable too, no matter how you came up with that model.
In all of these cases, there are ways to verify the validity of the results without reproducing the methods used to come up with the result.
This won’t work for empirical research. I vividly recall weeks spent trying to reproduce a paper on information retrieval (a deep learning model). What saved me was skimming through the author's codebase and chancing upon an undocumented sampling step: they were using only the first and last passage of each document as training data, plus a uniform sample of 10% of the remaining passages, and the paper didn't mention this. Once I adopted their sampling strategy, I was able to obtain their results.
My argument is that there are nuances and subtleties that are often omitted in a paper (accidentally or otherwise), but are nevertheless required to reproduce the research.
My example of a protein model is an example of empirical research, yes?
My understanding is that the X-ray data gives you a diffraction pattern which is hard to invert to a structure, while if you have the structure, the diffraction pattern is easy to compute. The diffraction pattern therefore gives you a way to verify that one model is a better fit than another.
It may not be perfect, certainly not. It might not even be correct once more data arrives. But if you predict a novel fold, and that fold matches the diffraction pattern significantly better than the current model, then it doesn't matter how you came up with the new fold, does it?
It could have been a dream. It could have been search software. The result is still publishable.
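As a toy illustration of that verification step (ignoring form factors, thermal motion, symmetry, and everything else a real refinement handles): compute structure factors from each candidate model and score the fit against the measured amplitudes with a crystallographic R-factor. The coordinates and reflection list below are made up.

    import numpy as np

    def structure_factors(coords, hkl):
        """Toy structure factors F(hkl) = sum_j exp(2*pi*i*(h*x_j + k*y_j + l*z_j)),
        treating every atom as a unit point scatterer (no form factors or B-factors)."""
        phases = 2j * np.pi * (hkl @ coords.T)        # shape: (n_reflections, n_atoms)
        return np.exp(phases).sum(axis=1)

    def r_factor(f_obs, f_calc):
        """Crystallographic R-factor between observed and calculated amplitudes,
        after fitting a least-squares scale factor k to |F_calc|."""
        f_calc = np.abs(f_calc)
        k = np.sum(f_obs * f_calc) / np.sum(f_calc ** 2)
        return np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs)

    # Made-up "measured" data and two candidate models; the one with the
    # lower R-factor is the better fit, however it was produced.
    rng = np.random.default_rng(0)
    hkl = np.array([[h, k, l] for h in range(3) for k in range(3) for l in range(3)][1:])
    true_coords = rng.random((20, 3))                  # fractional coordinates, stand-in for reality
    good_model  = true_coords + 0.01 * rng.random((20, 3))
    bad_model   = rng.random((20, 3))
    f_obs = np.abs(structure_factors(true_coords, hkl))
    print(r_factor(f_obs, structure_factors(good_model, hkl)))   # small R: good fit
    print(r_factor(f_obs, structure_factors(bad_model, hkl)))    # larger R: poor fit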
All of what you have said is true, but my point is that for some research, being able to verify the correctness of the result is all that matters, not being able to reproduce the research.
What do you see as the fundamental point of scientific communication? In your counterpoints you home in on papers as a means of communicating concepts or proof of work. In this view, showing the process itself is pointless, or at least irrelevant to that main aim.
However, others (myself included) see the communication of methods as a primary function of the literature, because this is what enables others to understand, critique, and build upon the idea.
If you want to be that broad about it, science journals publish a lot more than just method development, including obituaries and opinion pieces on where funding should be directed.
> A direct search on the CDC 6600 yielded 27⁵ + 84⁵ + 110⁵ + 133⁵ = 144⁵ as the smallest instance in which four fifth powers sum to a fifth power. This is a counterexample to a conjecture by Euler [1] that at least n nth powers are required to sum to an nth power, n > 2.
Do I need to know how the direct search was carried out to confirm Euler's conjecture was false?
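Checking the counterexample is a single line of exact integer arithmetic; the CDC 6600 search that found it plays no part:

    # Lander & Parkin's counterexample to Euler's conjecture for n = 5.
    assert 27**5 + 84**5 + 110**5 + 133**5 == 144**5
    print("Euler's conjecture fails for n = 5")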
And now that you know it isn't true, you might adjust which project areas to spend your time on. Which is part of what we get from scientific publications.
Just because you prefer one sort of scientific research doesn't mean other forms aren't science.
Again, is Kekulé's model of the benzene ring less scientific because it came to him in a daydream?
We accept Newton's publications where he secretly used the calculus, even though he didn't publish the calculus, because they could be proved through other more laborious means.
Why is it not scientific to write publications which use secret software, so long as we can verify the results?
For the article on Euler's conjecture: I am aware of this paper, and it serves as sufficient proof of work for publication. It provides context for its purpose by citation, and the methods for verification were well established. There is a class of literature where this type of structure works.
For the Kekulé paper [1] there is a significant amount of information about the context and reasoning behind the claim. It was not an isolated concept: he wrote at length about why the idea might be plausible given the evidence of the time. He could also have written solely about the dream, without context, but that would lack grounding in the reality he was attempting to describe.
If it is possible to write a paper where the result is possible to verify using already-known methods, then by all means write in that style. But this is a subset of the useful papers to be written, and in my experience a small one.
> But this is a subset of the useful papers to be written, and in my experience a small one.
Certainly. I never claimed otherwise.
But bloaf and lonesword seem to think such papers are of only superficial merit at best, and that detailed steps to reproduce the research are essential.
Yes, there are occasional exceptions where you don't have to repeat or replicate the experiments reported in a paper to verify them. But that is very much the exception.
Generally you are expected to explain what you did in enough detail that the reader can replicate your experiment. If you're fitting a protein model to X-ray diffraction data, you aren't expected to include all the other protein models you considered that didn't fit, or to explain your procedure for generating candidate models, but you are expected to explain how you measured the fit to the X-ray diffraction data (with what algorithms or software, etc.) so that the reader can in principle do the same thing themselves.
Sure, but "I found the structure after 5 months playing around with it in Foldit" isn't that reproducible or informative either.
The result is still the same: a novel fold which is a significantly better fit than existing models, based on measured vs. predicted X-ray diffraction patterns and whatever other data you might have.
Which is publishable, yes?
When the Wikipedia entry at https://en.wikipedia.org/wiki/Foldit says "Foldit players reengineered the enzyme by adding 13 amino acids, increasing its activity by more than 18 times", how is that much different than "A magical wizard added 13 amino acids, increasing its activity by more than 18 times"?
Or "secret software".
What's publishable is that the result is novel (and hopefully interesting), and can be verified. The publication does not require that every step can be repeated.
Unfortunately we have a long way to go to make it easy to repeat the calculation that a novel structure is "a significantly better fit than existing models, based on measured vs. predicted X-ray diffraction patterns". (If I run STEREOPOLE and it says the diffraction pattern from your new structure is a worse fit, is that because I'm running a different version of IDL? Maybe there's a bug in my FPU? Or in the version of BLAS my copy of IDL is linked against? Or you're using a copy of STEREOPOLE in which a previous grad student fixed a bug, while my copy still has the bug? And stochastic software like GAtor is potentially even worse.)
This is something we could and should completely automate. There's been work on this by people like Konrad Hinsen, Yihui Xie, Jeremiah Orians, Eelco Dolstra, Ludovic Courtès, Shriram Krishnamurthi, Ricardo Wurmus, and Sam Tobin-Hochstadt, but there's a long way to go.
>Yes, there are occasional exceptions where you don't have to repeat or replicate the experiments reported in a paper to verify them. But that is very much the exception.
And even in this exceptional case, the algorithm itself is interesting above and beyond the fact of its existence.
It is, but if the algorithm produces a result such as a protein structure or a sorting network that is itself novel and verifiable, you can very reasonably publish that result separately, as long as replicating the result (say, that the sorting network sorts correctly) doesn't require knowing the search algorithm, which it wouldn't.
> If I publish a paper saying I have an algorithm which can factor large composites, and in the paper publish the factors to all of the RSA numbers
If some of the factors you list are themselves large composites, then without access to a good algorithm nobody can truly verify your claims.
If not, and you include all of those factors in a form that's easily digestible for computers to process (let's call that "code"), it will be easy for anyone to reproduce your results: run the code, which multiplies all the factors and gets back the RSA numbers.
With code, they could also easily check that there's no error in your verification method (e.g. broken large-number multiplication).
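A minimal sketch of that check; the table of claimed factors below is hypothetical, and real entries would come from the paper and from the published challenge numbers:

    # Hypothetical table of claimed factorizations.
    claimed_factorizations = {
        3233: [53, 61],     # toy stand-in, not an actual RSA challenge number
    }

    for n, factors in claimed_factorizations.items():
        product = 1
        for f in factors:
            assert 1 < f < n, f"trivial factor {f} claimed for {n}"
            product *= f
        assert product == n, f"claimed factors do not multiply back to {n}"
    print("all claimed factorizations check out")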
This would achieve both goals: you'd withhold your algorithm for security reasons, and your results would be easier to verify.
Edit: but to be honest, I think withholding the research is a bit of a special case. You are doing it on purpose, and you can easily offer a service to prove your algorithm works (e.g. imagine a "factoring" web service that instantly gives you a hash of the resulting sequence of factors, and then only mails you the actual sequence two days later).