I have only worked with text classification methods where I chose the features myself. As I understand it, a deep network still has (like a 'traditional'/non-deep ANN) a fixed number of inputs in its input layer, i.e. one would have to process each input text somehow before feeding it into the network (to make the input sizes equal). Is there a usual way to do this without doing feature-extraction?
If using a raw bag of words or bag of n-grams, why not hash the strings into, say, 2^13-1 slots with something like MurmurHash3 (or with multiple hash functions to reduce the impact of collisions), and then use that sparse vector as the input to a deep learning model?
The only parameter of this preprocessing step would then be the number of slots (= the number of input units of the deep NN).
And the transformation of the text into a bag of words / n-grams would not count as feature engineering, or at most as 'low-level' feature engineering; the higher-level features would be learned by the deep network itself.
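For concreteness, here is a minimal sketch of what I mean by the hashing trick, assuming the `mmh3` Python bindings for MurmurHash3; the slot count matches the number above, and using two hash functions is just an arbitrary illustrative choice:

```python
import mmh3  # one Python binding for MurmurHash3; any stable string hash would do

NUM_SLOTS = 2 ** 13 - 1   # 8191 slots = number of input units of the network
NUM_HASHES = 2            # illustrative choice: >1 hash spreads out collision damage

def hash_ngrams(text, n=2):
    """Map a text of any length to a fixed-length count vector via the hashing trick."""
    tokens = text.lower().split()
    ngrams = [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    vec = [0.0] * NUM_SLOTS
    for g in ngrams:
        for seed in range(NUM_HASHES):
            slot = mmh3.hash(g, seed) % NUM_SLOTS  # signed hash, modulo is non-negative
            vec[slot] += 1.0
    return vec

# every text now yields a vector of the same length, ready for a dense input layer
x = hash_ngrams("the quick brown fox jumps over the lazy dog")
print(len(x))  # 8191
```

(As far as I know, scikit-learn's `HashingVectorizer` does essentially the same thing, also using MurmurHash3, if one doesn't want to roll this by hand.)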
I guess one could go lower-level still and do away with bag-of-words / n-grams entirely: limit the text length to e.g. 20000 characters, represent each character by a numerical value (e.g. its ASCII code point when dealing with mostly English text), and simply feed this vector of code points to the input layer of the deep network. Given enough training data, it should learn location-invariant representations like bag-of-words / n-grams (or even better ones) by itself, right?
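A sketch of that character-level encoding, with the 20000-character limit from above; the padding value of 0 and the clamping of non-ASCII characters are arbitrary choices for illustration, and the resulting integer vector would presumably be one-hot encoded or embedded by the network's first layer rather than used as raw magnitudes:

```python
import numpy as np

MAX_LEN = 20000   # fixed input length: shorter texts are padded, longer ones truncated
PAD = 0           # hypothetical padding id (0 is not used by printable ASCII)

def encode_chars(text):
    """Turn a text into a fixed-length vector of character code points."""
    codes = [min(ord(c), 127) for c in text[:MAX_LEN]]  # crudely clamp non-ASCII for this sketch
    codes += [PAD] * (MAX_LEN - len(codes))
    return np.array(codes, dtype=np.int64)

x = encode_chars("Deep networks need equal-sized inputs.")
print(x.shape)  # (20000,)
```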