
At which NLP tasks are they state of the art? The only one where they are really competitive is dependency parsing (based on my own case-by-case review).

Also, have there been any real new SOTA results on any NLP tasks since last summer? I feel like accuracy progress has frozen.

What I would love is to get a notification/email when a task I subscribed to gets a new SOTA (from paperswithcode.com, obviously).




Graph NNs aren't really used for SOTA NLP tasks.

On the other hand, larger and larger transformer networks are constantly improving.

Assuming you are in the Northern Hemisphere, so "last summer" means July-ish, Google's T5[1] and Microsoft's Turing-NLG[2] come to mind.

I find keeping an eye on HuggingFace's models list[3] is useful for this.

[1] https://arxiv.org/abs/1910.10683

[2] https://www.microsoft.com/en-us/research/blog/turing-nlg-a-1...

[3] https://github.com/huggingface/transformers#model-architectu...
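
For what it's worth, here's a minimal sketch of pulling T5 down from the HuggingFace hub (Python; "t5-small" is just the smallest published checkpoint, chosen here for illustration, and you need the transformers and sentencepiece packages installed):

    # Minimal sketch: load a T5 checkpoint from the HuggingFace hub and run
    # one of its text-to-text tasks via the "summarize:" prefix. "t5-small"
    # is the smallest published checkpoint, not the SOTA-sized model.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    text = ("summarize: Graph neural networks operate on graph-structured "
            "data, while transformers currently dominate most NLP benchmarks.")
    input_ids = tokenizer.encode(text, return_tensors="pt")
    output_ids = model.generate(input_ids, max_length=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

The same from_pretrained pattern works for most of the architectures on the list in [3].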


> At which NLP tasks are they state of the art? The only one where they are really competitive is dependency parsing (based on my own case-by-case review).

At tasks that actually involve graphs, presumably. https://arxiv.org/pdf/1901.00596.pdf is a survey of those; https://paperswithcode.com/task/graph-classification has a bunch of GNNs ranked #1.
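
To make that concrete, here's a toy sketch of the graph-classification setup those leaderboards rank (using PyTorch Geometric, which is my choice rather than something named in the thread; the graphs, feature sizes and class count are made up):

    # Toy graph classification with PyTorch Geometric: a GCN encodes each
    # graph's nodes, global mean pooling collapses them to one vector per
    # graph, and a linear layer predicts the graph label.
    import torch
    import torch.nn.functional as F
    from torch_geometric.data import Data, Batch
    from torch_geometric.nn import GCNConv, global_mean_pool

    class GraphClassifier(torch.nn.Module):
        def __init__(self, in_dim, hidden, num_classes):
            super().__init__()
            self.conv1 = GCNConv(in_dim, hidden)
            self.conv2 = GCNConv(hidden, hidden)
            self.lin = torch.nn.Linear(hidden, num_classes)

        def forward(self, x, edge_index, batch):
            h = F.relu(self.conv1(x, edge_index))
            h = F.relu(self.conv2(h, edge_index))
            h = global_mean_pool(h, batch)   # one vector per graph
            return self.lin(h)

    # Two tiny made-up graphs with 3-dimensional node features.
    g1 = Data(x=torch.randn(4, 3), edge_index=torch.tensor([[0, 1, 2], [1, 2, 3]]))
    g2 = Data(x=torch.randn(3, 3), edge_index=torch.tensor([[0, 1], [1, 2]]))
    batch = Batch.from_data_list([g1, g2])

    model = GraphClassifier(in_dim=3, hidden=16, num_classes=2)
    logits = model(batch.x, batch.edge_index, batch.batch)  # shape [2, 2]

The ranked models are far more elaborate, but most of them share this encode-nodes / pool-to-a-graph-vector / classify shape.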

> Also, have there been any real new SOTA results on any NLP tasks since last summer? I feel like accuracy progress has frozen.

That's pretty normal for winter/spring; it's not conference season.


> it's not conference season

Weird ^^ Imagine that I'm a scientist and I made a big discovery X during winter, but for audience/visibility I only want to publish my results at conference Y during the summer. Somebody else, right before summer, makes the same discovery as mine. How do I establish that I was the first to discover it if I publish after them?


You post it on arxiv


So the answer would be the SOTA results are on arxiv but are posted on paperswithcode.com leaderboards only at conference time? Sounds unlikely.


> Also, have there been any real new SOTA results on any NLP tasks since last summer? I feel like accuracy progress has frozen.

Just in the last week there have been two papers which claim SOTA on different tasks.

Microsoft released Turing-NLG (https://www.microsoft.com/en-us/research/blog/turing-nlg-a-1...) recently, which claims SOTA on a couple of tasks. It seems to be the same transformer architecture, but with more layers and parameters, made feasible by training efficiency improvements. The biggest one seems to be partitioning the training state (how the model learns) across different processes instead of replicating it on each one, which significantly reduces memory overhead and communication and improves training speed.
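
To make the partitioning point concrete, here is a back-of-the-envelope sketch (plain NumPy, single process, made-up sizes -- not the DeepSpeed/Turing-NLG code) of why sharding per-parameter optimizer state across workers instead of replicating it saves memory:

    # Back-of-the-envelope: with replicated data parallelism every worker
    # holds optimizer state (e.g. Adam's two moment vectors) for ALL
    # parameters; with partitioning, each worker holds it only for its own
    # shard. Sizes are invented for illustration.
    import numpy as np

    n_params = 1_000_000        # pretend model size
    n_workers = 8
    moments_per_param = 2       # Adam keeps two running moments per parameter

    # Replicated: every worker stores moments for every parameter.
    replicated_per_worker = moments_per_param * n_params

    # Partitioned: each worker stores moments only for its slice of parameters.
    shards = np.array_split(np.arange(n_params), n_workers)
    partitioned_per_worker = moments_per_param * len(shards[0])

    print(f"optimizer state per worker, replicated : {replicated_per_worker:,} floats")
    print(f"optimizer state per worker, partitioned: {partitioned_per_worker:,} floats")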

Deepmind released the Compressive Transformer (https://deepmind.com/blog/article/A_new_model_and_dataset_fo...), which claims SOTA on two other "long-range" benchmarks. My understanding of the improvement is that instead of discarding older states, as the traditional attention layer does, the compressive transformer learns which states to keep and which to remove.
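
A toy sketch of the compressed-memory bookkeeping as I understand it (plain PyTorch, not DeepMind's code; the paper explores several compression functions, and simple mean pooling stands in for them here, with arbitrary memory sizes):

    # Toy compressive memory: states that would fall out of the normal
    # attention memory are mean-pooled in blocks of `rate` timesteps and
    # kept in a second, coarser memory instead of being thrown away.
    import torch

    def update_memories(memory, compressed, new_states, mem_len=4, rate=2):
        """memory, compressed: [t, d] tensors; new_states: [n, d]."""
        memory = torch.cat([memory, new_states], dim=0)
        overflow = memory.shape[0] - mem_len
        if overflow > 0:
            old, memory = memory[:overflow], memory[overflow:]
            # Pool blocks of `rate` evicted states into single compressed
            # states (any remainder is simply dropped in this toy).
            usable = (old.shape[0] // rate) * rate
            if usable > 0:
                pooled = old[:usable].reshape(-1, rate, old.shape[1]).mean(dim=1)
                compressed = torch.cat([compressed, pooled], dim=0)
        return memory, compressed

    d = 8
    memory = torch.zeros(0, d)       # recent, uncompressed states
    compressed = torch.zeros(0, d)   # older, compressed states
    for step in range(4):
        new = torch.randn(2, d)      # pretend these are this segment's activations
        memory, compressed = update_memories(memory, compressed, new)
    print(memory.shape, compressed.shape)

The attention layer would then attend over the current segment plus both memories, so old information survives in coarser form instead of being dropped.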

I think these are two good examples of paper archetypes -- one where SOTA is achieved through more layers/training data/neurons (and the more interesting contribution is the improvements to model parallelism in training), and one where SOTA is achieved through a new/improved model architecture.

I wonder how useful either paper is for most industrial practitioners, though. The Microsoft paper helps with training billion-parameter models, but most won't train a model that deep; the Deepmind paper helps with training models over very long sequences, but most people aren't using book-length sequences.*

* I remember reading somewhere that attention mechanisms tend to only remember around 5 states (would love to see a source or study on this), which is pretty short, so it would be interesting to try this model and see if the compressive transformer/attention mechanism can remember longer sequences.


Neither of these are Graph NNs though (although it was a weird question because Graph NNs solve a different problem)


Yes, true. There has been progress on NLP SOTA, just not from graph NN architectures. I have thought about using graph NNs where I am currently using RNNs, but they seem fairly immature / difficult to train over very large datasets, so I have not made much progress.

I was following the work of https://www.octavian.ai/, but have not seen much recently. http://web.stanford.edu/class/cs224w/info.html also looks interesting. Looking briefly over the student projects, there appear to be a couple of NLP projects (e.g. http://web.stanford.edu/class/cs224w/project/26418192.pdf), but most leverage knowledge graph concepts.


On the contrary, I've found Graph NNs extraordinarily useful and very easy to train!

I'm not sure what kind of problems you are trying to use them for, but generally they are really useful for node classification or recommendation type tasks.

So for example in the knowledge graph context, you can do things like give it a tiger, jaguar and panther and it will find nodes like lions and leopards.

https://arxiv.org/pdf/1901.00596.pdf is a survey paper which has a decent overview of the tasks they are useful for. http://research.baidu.com/Public/uploads/5c1c9a58317b3.pdf is using them for QA, but I'm not familiar enough with the dataset to evaluate it fairly.
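
As a toy version of the tiger/jaguar/panther example above (PyTorch Geometric again; the animal graph, edges and the untrained GCN are all invented for illustration -- in practice you'd train the embeddings on a link-prediction or node-classification objective first):

    # Toy node similarity: a GCN turns graph structure into node embeddings,
    # and cosine similarity then surfaces nodes "like" a query node.
    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv

    nodes = ["tiger", "jaguar", "panther", "lion", "leopard", "wolf", "dog"]
    # undirected edges, written once per direction below
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 3), (4, 0), (5, 6)]
    edge_index = torch.tensor(edges + [(j, i) for i, j in edges]).t().contiguous()
    x = torch.eye(len(nodes))                 # one-hot node features

    conv1, conv2 = GCNConv(len(nodes), 16), GCNConv(16, 16)
    h = conv2(F.relu(conv1(x, edge_index)), edge_index)   # untrained embeddings

    query = nodes.index("tiger")
    sims = F.cosine_similarity(h[query].unsqueeze(0), h)
    for i in sims.argsort(descending=True)[1:4]:
        # nodes whose aggregated neighborhoods most resemble tiger's
        print(nodes[i])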


I spent some time experimenting with https://github.com/facebookresearch/PyTorch-BigGraph and a few other libraries, but I ran into some challenges given that my dataset was very large (O(100M) edges) and very sparse, and didn't pursue it further.

The papers you linked are very interesting, I will have to dig further! One more recent writeup: https://eng.uber.com/uber-eats-graph-learning/ -- a real production use case that seems promising to explore further.


I like https://github.com/facebookresearch/Starspace#graphspace-lin... for fast and easy graph embedding generation via a GNN.



