Hacker News

The hashing trick (a la sklearn's FeatureHasher) can make a huge difference.
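For readers unfamiliar with it, a minimal sketch of the hashing trick using scikit-learn's FeatureHasher (the tiny n_features=16 is just for illustration; real uses pick something much larger):

```python
# The hashing trick: map feature names to column indices via a hash
# function, so no vocabulary dict ever has to be built or stored.
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=16, input_type="dict")
X = hasher.transform([{"cat": 1, "dog": 2}, {"dog": 1, "bird": 1}])
print(X.shape)  # (2, 16): fixed-width sparse output, no fitting step
```

Because the mapping is stateless, transform works on streaming data without a prior pass over the corpus.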

I'm not a huge fan of that. Hash collisions can lead to unexpected behaviors in production and make feature attribution for debugging harder.

It's slightly more effort to implement, but with a trie data structure you can store even the biggest feature mapping in memory.




Does the fact that sklearn reverses signs for features that collide change your opinion?
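For context, this refers to FeatureHasher's alternate_sign parameter (True by default): a second bit of the hash flips the sign of each value, so colliding features tend to cancel rather than always add up. A small sketch, with n_features deliberately tiny to force collisions:

```python
# With alternate_sign=True (the default), each hashed value is multiplied
# by +1 or -1 derived from the hash, so collisions cancel in expectation
# instead of systematically inflating the shared bucket.
from sklearn.feature_extraction import FeatureHasher

h = FeatureHasher(n_features=4, input_type="dict", alternate_sign=True)
row = h.transform([{"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}]).toarray()[0]
print(row)  # five unit values folded into four signed buckets
```

The cancellation keeps the hashed representation an unbiased estimate of the original inner products, though it doesn't help with attribution: a nonzero bucket still can't tell you which feature produced it.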


How are you compressing the feature space in that case, by truncating the trie?


You could use clustering, dimensionality reduction, or feature selection.

However, the way I have seen the hashing trick used is not to compress the feature space. For most problems it would be a bad idea to lump your most discriminative features together with random other ones. Instead, people choose a very large feature space, which makes collisions unlikely. For model implementations that use sparse matrices, it doesn't matter if the feature space is very large. The main advantage is that you don't have to keep an expensive hash map of your vocabulary in memory (hence my suggestion to use a trie).
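To illustrate the "very large space, sparse storage" point, a sketch with scikit-learn's HashingVectorizer: the output has about a million columns, but the sparse matrix only stores the handful of nonzeros per row, and no vocabulary is ever fitted.

```python
# A 2**20-column output space makes collisions rare, and because the
# result is a scipy sparse matrix, the width is essentially free.
from sklearn.feature_extraction.text import HashingVectorizer

vec = HashingVectorizer(n_features=2**20)
X = vec.transform(["the quick brown fox", "jumps over the lazy dog"])
print(X.shape)   # (2, 1048576)
print(X.nnz)     # only the few nonzero entries are actually stored
print(hasattr(vec, "vocabulary_"))  # False: stateless, nothing to keep in RAM
```

Compare CountVectorizer, which must fit and hold a vocabulary_ dict whose size grows with the corpus; the hashing variant trades exact feature names for constant memory.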



