While it is true that CNTK can use CuDNN LSTM, if the recurrence does not fall into the four recurrence types that CuDNN supports, CNTK is still much faster. The simplest way to verify this is to take a Keras script that uses whatever recurrent network you want and run it (on a GPU) with the TensorFlow backend and with the CNTK backend, as sketched below. Some anecdotal evidence suggests an easy 3x speedup.
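A minimal sketch of that benchmark, assuming Keras 2.x with both backends installed; the layer sizes and the random data are arbitrary placeholders. Run it once with `KERAS_BACKEND=tensorflow` and once with `KERAS_BACKEND=cntk` and compare the timings:

```python
# Run twice:
#   KERAS_BACKEND=tensorflow python bench.py
#   KERAS_BACKEND=cntk       python bench.py
import time
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Swap LSTM for whatever recurrent layer you actually care about.
model = Sequential([
    LSTM(256, input_shape=(100, 64)),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

# Synthetic data, just to exercise the training loop.
x = np.random.rand(512, 100, 64).astype('float32')
y = np.random.rand(512, 10).astype('float32')

start = time.time()
model.fit(x, y, batch_size=64, epochs=3, verbose=0)
print('avg seconds per epoch: %.2f' % ((time.time() - start) / 3))
```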
There are actually two 1-bit things going on with CNTK 2.0.
One is the 1-bit SGD that has long been in CNTK and has been criticized for its weird license. I am not a lawyer, but my understanding is that it says something like: you cannot use this unless you call it from inside CNTK. The terms are not that bad, and you don't have to use 1-bit SGD if you don't like them.
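For context, the core idea of 1-bit SGD (independent of the license question) is to quantize each gradient element down to a single bit and carry the quantization error forward into the next minibatch. A toy numpy sketch of that idea follows; it is not CNTK's implementation, and the names are made up:

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize grad (plus carried-over error) to one bit per element."""
    g = grad + residual                # add back last step's quantization error
    scale = np.mean(np.abs(g))         # one shared magnitude per tensor
    quantized = np.where(g >= 0, scale, -scale)
    residual = g - quantized           # error feedback for the next step
    return quantized, residual

# Usage: keep one residual per parameter tensor across steps.
residual = np.zeros(5)
grad = np.array([0.3, -0.1, 0.05, -0.4, 0.2])
q, residual = one_bit_quantize(grad, residual)
print(q, residual)
```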
CNTK 2.0 has another 1-bit thing going on as well, which is binary convolution. This uses the Halide compiler to generate code that is 10x faster than optimized 32-bit convolution. It still seems to be at the proof-of-concept stage.
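To see where a speedup like that can come from: once weights and activations are constrained to {-1, +1}, each multiply-accumulate in a convolution collapses to an XNOR plus a popcount over packed bits. Here is a toy sketch of that arithmetic (pure Python, only mimicking the bit tricks; it is not the Halide code path):

```python
import numpy as np

def pack(v):
    """Pack a {-1,+1} vector into an int, one bit per element (+1 -> 1)."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1,+1} vectors via XNOR + popcount."""
    xnor = ~(a_bits ^ b_bits)                    # matching bits -> 1
    matches = bin(xnor & ((1 << n) - 1)).count('1')
    return 2 * matches - n                       # matches minus mismatches

# Check against the ordinary floating-point dot product.
a = np.where(np.random.randn(32) >= 0, 1, -1)
b = np.where(np.random.randn(32) >= 0, 1, -1)
assert binary_dot(pack(a), pack(b), 32) == int(np.dot(a, b))
```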