Hi! I'm another author on this paper. To answer your questions: 1. Monday puzzle...

sp332 · on May 20, 2022

I don't suppose you gave more weight to more recent puzzles? Is there a time period or puzzle setter that was harder to solve because they favored an unusual clue type?

nickatomlin · on May 20, 2022

We didn't give more weight to recent puzzles. In fact, we trained on pre-2020 data, validated on data from 2020, and evaluated on post-2020 data.
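For anyone curious what a split like that looks like in practice, here is a minimal sketch. It assumes each puzzle carries a publication date and that the boundaries fall on calendar years; the function name `assign_split` and the toy puzzle records are made up for illustration and are not taken from the paper's code.

```python
from datetime import date

def assign_split(puzzle_date: date) -> str:
    """Map a publication date to a split (assumed calendar-year boundaries)."""
    if puzzle_date.year < 2020:
        return "train"        # pre-2020 data
    if puzzle_date.year == 2020:
        return "validation"   # data from 2020
    return "test"             # post-2020 data

# Toy records, invented for illustration only.
puzzles = [
    {"id": "p1", "date": date(2018, 3, 5)},
    {"id": "p2", "date": date(2020, 7, 14)},
    {"id": "p3", "date": date(2021, 5, 2)},
]

splits = {"train": [], "validation": [], "test": []}
for puzzle in puzzles:
    splits[assign_split(puzzle["date"])].append(puzzle["id"])

print(splits)
```

Splitting by date rather than at random means the test puzzles can contain answers that did not exist in crosswords when the training data was collected.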

Our model seems to perform well despite this "time generalization" split, but there were a couple of instances where it struggled with new words. For example, we got the answer "FAUCI" wrong in a puzzle from May 2021. Even though Fauci was in the news before 2020, I guess he wasn't famous enough to show up in crosswords, so his name wasn't in our training data.
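One simple way to spot answers like this is to list the test-set answers that never appear in the training answers. The sketch below is only an illustration: the helper `unseen_answers` and the word lists are hypothetical, and it does not describe how the paper's analysis was done.

```python
def unseen_answers(train_answers, test_answers):
    """Return test answers that never occur in the training answers."""
    # Crossword answers are conventionally uppercase; normalize before comparing.
    seen = {answer.upper() for answer in train_answers}
    candidates = {answer.upper() for answer in test_answers}
    return sorted(candidates - seen)

# Toy word lists, invented for illustration only.
train = ["OREO", "AREA", "ERIE"]
test = ["OREO", "FAUCI", "area"]

print(unseen_answers(train, test))  # ['FAUCI']
```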

I think evaluating performance by constructor would be really interesting! But we haven't done that.