
Hi! I'm another author on this paper. To answer your questions:

1. Monday puzzles are the easiest for our model, and Thursdays are the most difficult. You can see a graph of day-by-day performance here: https://twitter.com/albertxu__/status/1527704535912787968

2. Our current system doesn't have any handling for rebuses or similar tricks, although Dr. Fill does. I think this is part of why Thursday is the hardest day for us, even though Saturday is usually considered the most difficult.

3. We trained it with 6.4M clues. As new crosswords get published, we could theoretically retrain our model with more data, but we aren't currently planning to do that.




I don't suppose you gave more weight to more recent puzzles? Is there a time period or puzzle setter that was harder to solve because they favored an unusual clue type?


We didn't give more weight to recent puzzles. In fact, we trained on pre-2020 data, validated on data from 2020, and evaluated on post-2020 data.

Our model seems to perform well despite this "time generalization" split, but there are a couple of instances where it struggled with new words. For example, we got the answer "FAUCI" wrong in a puzzle from May 2021. Even though Fauci was in the news before 2020, I guess he wasn't famous enough to show up in crosswords, so his name wasn't in our training data.
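To make the split above concrete, here is a minimal sketch of a time-based train/validation/test split, plus a check for test answers that never appear in training (the "FAUCI" situation). This is not our actual pipeline: the record layout, the `time_split` and `unseen_answers` helpers, the placeholder clues, and the exact year boundaries are all assumptions for illustration.

```python
from datetime import date

def time_split(records, val_year=2020):
    """Split records into (train, val, test) by publication year.

    Assumed boundaries: before val_year -> train, val_year -> validation,
    after val_year -> test.
    """
    train, val, test = [], [], []
    for rec in records:
        year = rec["date"].year
        if year < val_year:
            train.append(rec)
        elif year == val_year:
            val.append(rec)
        else:
            test.append(rec)
    return train, val, test

def unseen_answers(train, test):
    """Return test answers that never occur as an answer in the training set."""
    seen = {rec["answer"] for rec in train}
    return sorted({rec["answer"] for rec in test} - seen)

# Hypothetical records; the clue texts are placeholders, not real clues.
records = [
    {"date": date(2018, 3, 5), "clue": "placeholder clue A", "answer": "OREO"},
    {"date": date(2019, 11, 14), "clue": "placeholder clue B", "answer": "ERA"},
    {"date": date(2020, 6, 1), "clue": "placeholder clue C", "answer": "OREO"},
    {"date": date(2021, 5, 20), "clue": "placeholder clue D", "answer": "FAUCI"},
    {"date": date(2021, 8, 2), "clue": "placeholder clue E", "answer": "ERA"},
]

train, val, test = time_split(records)
new_answers = unseen_answers(train, test)
print(len(train), len(val), len(test), new_answers)
```

Splitting by date rather than at random means the test set can contain answers that entered crosswords only after the training cutoff, which is exactly the case a random split would hide.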

I think evaluating performance by constructor would be really interesting! But we haven't done that.



