Deep learning with text: Learning when to skim and when to read

Frenchgeek · on March 15, 2017

So... We are now automating laziness? There goes my last stronghold...

frostirosti · on March 15, 2017

Not quiet! Choosing between using a fast mostly accurate model and a slow very accurate model. Time saved vs accuracy gained.

nl · on March 16, 2017

Not quiet!

No usually "that person", but: quite

Only pointing it out because of the irony: Choosing between using a fast mostly accurate model and a slow very accurate model.

samscully · on March 16, 2017

No usually

You did that on purpose ;)

nl · on March 16, 2017

Wow, I deserve that!

thisisdave · on March 16, 2017

see also: Muphry's Law: https://en.wikipedia.org/wiki/Muphry's_law

r00fus · on March 15, 2017

Isn't automation "laziness incarnate"?

lechiffre10 · on March 15, 2017

Automation doesn't necessarily equate to laziness. You can automate the boring shit so that you can focus on more productive things.

taurath · on March 16, 2017

Boring things can be productive things!

baq · on March 15, 2017

automate the automation so you don't have to.

stephengillie · on March 16, 2017

Automate the process of automating things.

ruleabidinguser · on March 15, 2017

Skimming is definitely not a bad thing.

rimliu · on March 16, 2017

Not so sure. I do skim sometimes, but it always leaves me with a dirty feeling of cheating. Also, one needs to go back and check whether something important was missed, and this undoes most of the benefit of skimming. The proper place for skimming would be before reading, just to get an overview of what's to come, or while evaluating whether the thing is worth reading in the first place.

ruleabidinguser · on March 16, 2017

The only reason I've ever had not to skim was because I was asked not to. But at school, for example, my purpose is really to learn, not to satisfy the desires of my teachers, and most of the time reading, taking notes, and so on doesn't help me at all. It's a much smarter and more effective strategy. Trim the fat, as the great Adam Levine would say.

itschekkers · on March 16, 2017

cool paper, i enjoyed following the post (and really appreciated the effort put into the bokeh viz!)

i wonder if this would have been improved by being clearer about the motivation. the authors frame it as though the ~60ms penalty for using the LSTM for prediction is a huge burden, and i can imagine situations where it is. however, if this is the case, it seems like we need some real life/"scaled out" examples of how this solution would work in practice. e.g. how long does the decision logic take to execute (maybe 5ms?), and what proportion of the time will you have to run the LSTM after the BoW model anyway? note that in those instances you are worse off than if you had just run the LSTM in the first place (total time = BoW time + decision time + LSTM time). once you have all these numbers you can run the math and know (on average) how much time you'll actually save, and how much performance you sacrifice.

alrojo · on March 16, 2017

In the Baseline section, you will find a sentence with the following: "However, there is a cost to using the strategies. We have to run all of the sentences through the bag-of-words model first, to determine if we should use the bag-of-words or the LSTM." followed by some math that takes the added cost into account, which is also what we based our plots/results on.
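The cost accounting discussed above can be sketched in a few lines. This is a minimal illustration, not the authors' code: every sentence pays for the bag-of-words pass and the decision step, and only the fraction escalated to the LSTM also pays the LSTM cost. The 60 ms LSTM figure and the 5 ms decision time come from the comment above; the 2 ms BoW time and the 25% escalation rate are hypothetical numbers chosen only for the example.

```python
def expected_latency_ms(t_bow: float, t_decide: float, t_lstm: float, frac_lstm: float) -> float:
    """Average per-sentence latency of a skim-then-read strategy.

    Every sentence runs through the bag-of-words model and the decision step;
    only the fraction frac_lstm that gets escalated also runs the LSTM.
    """
    if not 0.0 <= frac_lstm <= 1.0:
        raise ValueError("frac_lstm must be in [0, 1]")
    return t_bow + t_decide + frac_lstm * t_lstm


def break_even_fraction(t_bow: float, t_decide: float, t_lstm: float) -> float:
    """Largest escalation rate at which the strategy is no slower than LSTM-only.

    Solves t_bow + t_decide + f * t_lstm <= t_lstm for f.
    """
    return 1.0 - (t_bow + t_decide) / t_lstm


if __name__ == "__main__":
    # Hypothetical timings: 2 ms BoW (assumed), 5 ms decision step, 60 ms LSTM.
    print(expected_latency_ms(2.0, 5.0, 60.0, 0.25))  # 22.0, versus 60.0 for LSTM-only
    print(break_even_fraction(2.0, 5.0, 60.0))  # about 0.883
```

With these assumed numbers, the strategy only saves time if fewer than roughly 88% of sentences are escalated; sending every sentence to the LSTM (frac_lstm = 1.0) is strictly slower than LSTM-only, which is the worst case pointed out above.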

sixhobbits · on March 16, 2017

A 'batch' is how much of the data you put in memory at once while training the NN. To train even a small language model, you'll go through 1000s of batches, so the time difference is way bigger than it sounds. I agree a more practical example would have been nice -- maybe it'll come out in the paper.

itschekkers · on March 16, 2017

my impression was that this was about the time taken to make each prediction, not to train the model? and yep, looking forward to the paper!

alrojo · on March 16, 2017

It was based on test-time prediction: given that you have received a sentence, how long does it take to compute the prediction with either a bag-of-words model or an LSTM?
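A rough sketch of test-time routing between the two models follows. The thread does not spell out the decision mechanism, so this uses a simple confidence threshold on the bag-of-words output as an assumption. The names `skim_or_read`, `toy_bow`, `toy_lstm` and the 0.9 threshold are hypothetical, and the toy models are stand-ins that return fixed class probabilities rather than trained networks.

```python
from typing import Callable, Sequence, Tuple

Probs = Sequence[float]
Model = Callable[[str], Probs]  # hypothetical interface: sentence -> class probabilities


def argmax(probs: Probs) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def skim_or_read(sentence: str, bow: Model, lstm: Model, threshold: float = 0.9) -> Tuple[int, str]:
    """Return (predicted class, which model produced it).

    The cheap bag-of-words model always runs first. If its top class
    probability clears the threshold we trust it ("skim"); otherwise the
    sentence is escalated to the slower LSTM ("read").
    """
    bow_probs = bow(sentence)
    if max(bow_probs) >= threshold:
        return argmax(bow_probs), "bow"
    return argmax(lstm(sentence)), "lstm"


# Toy stand-in models (hypothetical): confident on an obvious cue, unsure otherwise.
def toy_bow(sentence: str) -> Probs:
    return [0.95, 0.05] if "great" in sentence else [0.55, 0.45]


def toy_lstm(sentence: str) -> Probs:
    return [0.2, 0.8]


if __name__ == "__main__":
    print(skim_or_read("a great movie", toy_bow, toy_lstm))   # (0, 'bow')
    print(skim_or_read("not bad at all", toy_bow, toy_lstm))  # (1, 'lstm')
```

Raising the threshold sends more sentences to the LSTM (slower, presumably more accurate); lowering it trusts the bag-of-words model more often. That is the same time-versus-accuracy trade-off described earlier in the thread.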

When you say practical example, would that be in the scenario that you have an API server running? So to consider such costs as latency, data transfer, API overhead etc.?

Thanks for your feedback!

everling · on March 16, 2017

It irks me that BoW is referred to as an algorithm. They do refer to it as a model later, so why conflate the two?

muyun_ · on March 16, 2017

toread