
It depends; feedback is rarely as polite as "I noticed you used this other dataset". The feedback would probably look more like:

- "Nice paper, however the results are not relevant to current research due to the use of X dataset rather than Y or Z datasets score 2/5 do not accept."

- "Nice paper, however the results are of unknown quality due to the use of X dataset 3/5 recommend poster track".

In fact, I'd say most paper reviews would drop the first three words of those comments. It's not an unreasonable assertion that progress is measured on standard datasets - but it's also necessary to push back on this.




Absolutely. Failure to report results on a popular benchmark suggests to some reviewers that you have something to hide - even though those benchmarks might be computationally expensive to run or tangential to the main point of the work.


If a non-standard dataset is being used, I would expect a discussion/analysis of what characteristics of the standard datasets made them unsuitable for this paper. Especially if a proposed model is being compared against models that were trained on those standard datasets.

If you are establishing new baselines by running those same models on your non-standard dataset, then one would expect you to put in a good amount of effort to tune all the knobs and get a reasonable result. If the authors are able to put in that much effort, then that kind of feedback is definitely unreasonable.
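
To make "tune all the knobs" concrete, here is a minimal sketch of what re-tuning baselines on a new dataset might look like. The scikit-learn models, hyperparameter grids, and synthetic data below are purely illustrative assumptions on my part, not anything from the papers or reviews being discussed:

    # Minimal sketch: re-tuning baseline models on a non-standard dataset
    # instead of reusing hyperparameters tuned on the standard benchmarks.
    # Dataset, models, and grids are illustrative placeholders.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Stand-in for the non-standard dataset.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each baseline gets its own hyperparameter search on this dataset.
    baselines = {
        "logreg": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
        "svm": (SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}),
    }

    for name, (model, grid) in baselines.items():
        search = GridSearchCV(model, grid, cv=5, n_jobs=-1)
        search.fit(X_train, y_train)
        print(name, search.best_params_, round(search.score(X_test, y_test), 3))

The point is simply that reported baseline numbers on a new dataset should come from a search like this, not from defaults carried over from other benchmarks.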


>> If a non-standard dataset is being used, I would expect a discussion/analysis of what characteristics of the standard datasets made them unsuitable for this paper.

Unfortunately that just adds more work for the reviewer, which gives many reviewers a motive to reject the paper so they don't have to do the extra work.

That sounds mean, so I will quote (yet again) Geoff Hinton on things that "make the brain hurt":

GH: One big challenge the community faces is that if you want to get a paper published in machine learning now it's got to have a table in it, with all these different data sets across the top, and all these different methods along the side, and your method has to look like the best one. If it doesn’t look like that, it’s hard to get published. I don't think that's encouraging people to think about radically new ideas.

Now if you send in a paper that has a radically new idea, there's no chance in hell it will get accepted, because it's going to get some junior reviewer who doesn't understand it. Or it’s going to get a senior reviewer who's trying to review too many papers and doesn't understand it first time round and assumes it must be nonsense. Anything that makes the brain hurt is not going to get accepted. And I think that's really bad.

https://www.wired.com/story/googles-ai-guru-computers-think-...

Basically, a new dataset is like a new idea: it makes the brain hurt for the overburdened experienced researcher and the inexperienced junior researcher alike. Testing a new approach on a new dataset? That makes the brain go boom.

Which is a funny state of affairs. Not so long ago, one sure-fire way to make a significant contribution and give your paper a leg up over the competition was to create a new dataset. I was advised as much at the start of my PhD (four-ish years ago). It seems like that has already changed.


You might be right here. My comment was more about my expectation, as a reader, of what should be present in such a paper.



