The paper linked in the blog was actually published back in June. And as the sibling comment says, people have been trying to use attention mechanisms to augment recurrent networks for some time. The novel idea here seems to be dropping the recurrent network entirely and keeping just the attention mechanism.