Yeah, I figured as much. ZeroMQ in Python is not slow though :-)
I could probably port the service to C++ or Go, since it's really just some string parsing and a hash table lookup of sorts. But when my PoC Python version does 160k lookups a second, I don't feel the need to spend the time :-)
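For a rough feel of what "string parsing plus a hash table lookup" costs in plain Python, here is a minimal sketch. The thread doesn't describe the actual service, so the `GET <key>` request format, the `TABLE` contents, and the function names are all made up for illustration. It also measures only the in-process work and leaves out any ZeroMQ or network overhead.

```python
import timeit

# Hypothetical data: the real service's table is not described in the thread.
TABLE = {f"host{i}.example.com": i for i in range(100_000)}

def lookup(request: str):
    """Parse an assumed 'GET <key>' request and look the key up in a dict."""
    command, _, key = request.partition(" ")
    if command != "GET":
        return None
    # Normalize the key (trim the newline, lowercase) before the dict lookup.
    return TABLE.get(key.strip().lower())

def lookups_per_second(n: int = 200_000) -> float:
    """Time n parse-and-lookup calls and return the measured rate."""
    elapsed = timeit.timeit(lambda: lookup("GET Host42.example.com\n"), number=n)
    return n / elapsed

if __name__ == "__main__":
    print(f"{lookups_per_second():,.0f} lookups/second (in-process only)")
```

The number will vary by machine and Python version, so treat it as a sanity check rather than a benchmark of the service itself.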
"On Python" can mean a few different things. It can mean a straight port running in the Python interpreter, or it can mean Cython (or similar), with all of the tight loops running as auto-generated compiled C code.
NumPy is a great example of this. All of the numerical operations run as very fast compiled code, and being good at writing fast NumPy involves knowing the ins and outs of how to minimize passing information between the slow Python interpreter and the fast numerical engines. You want to do all of the computation 'inside' of NumPy, and then get the result at the end.
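A small sketch of the difference, assuming NumPy is installed. The function names and the sum-of-squares task are just for illustration. The first version crosses the interpreter boundary once per element, the second hands the whole array to NumPy in a single call and only pulls out the final scalar.

```python
import numpy as np

def sum_of_squares_loop(values: np.ndarray) -> float:
    # Slow path: every iteration pulls one element out of the array
    # into a Python-level object and does the arithmetic in the interpreter.
    total = 0.0
    for v in values:
        total += v * v
    return float(total)

def sum_of_squares_numpy(values: np.ndarray) -> float:
    # Fast path: a single call; the multiply-and-accumulate loop
    # runs entirely in compiled code.
    return float(np.dot(values, values))

if __name__ == "__main__":
    import timeit

    values = np.arange(1_000_000, dtype=np.float64)
    print("loop :", timeit.timeit(lambda: sum_of_squares_loop(values), number=1))
    print("numpy:", timeit.timeit(lambda: sum_of_squares_numpy(values), number=1))
```

Both functions return the same value, so the timing gap comes down to where the loop runs.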
Yeah, I'm not sure how optimized the Python protocol buffer stuff is. Years ago I benchmarked the pure-Python protobuf lib and it was terribly slow.
gRPC was nice to work with though. I generated the stubs, stuck my logic in there, and had a working client/server in about 20 minutes. The streaming request/reply stuff was crazy easy to use, though I don't know if it does pipelining.