Thanks for the article. I am wondering if anyone here has experience using Storm to work with real-time market (financial) data. Taking FX market data as an example, you could be looking at multiple feeds, each updating a few times a second. Is a framework like Storm suitable for these types of applications, which are very sensitive to end-to-end latency?
Also, given a set of nodes and a set of input data, is the end-to-end latency consistent?
There's at least one company using it for algorithmic trading, although I don't know the details of how they're using it.
Storm isn't intended for sub-millisecond processing, but latencies on the order of milliseconds are certainly doable on Storm. Obviously a lot of that depends on how complex your processing is.
Thanks Nathan. The sort of latency we are working with is on the order of milliseconds. There are many HFT shops out there that have invested in FPGAs or other hardware-based solutions to get sub-millisecond latencies for processing incoming data, but I think they are fairly specialized firms.
Currently we are using a proprietary CEP platform that has issues with scaling as the number of deployed models (i.e. data consumers) goes up.
Storm's topology-based approach may be worth some investigation because it would allow the addition of more compute nodes transparently as our needs expand. I will take a more detailed look.
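To make that concrete for myself, here is a rough sketch of how I imagine the wiring would look. The FxTickSpout and ModelBolt classes are hypothetical stand-ins and the packages are the pre-Apache backtype.storm ones, so treat this as an assumption about the API rather than anything we have running:

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

import java.util.Map;
import java.util.Random;

public class FxTopology {

    // Hypothetical spout that would wrap a market data feed handler;
    // here it just emits synthetic EUR/USD ticks.
    public static class FxTickSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final Random random = new Random();

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            collector.emit(new Values("EUR/USD", 1.30 + random.nextGaussian() * 0.001));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("symbol", "price"));
        }
    }

    // Hypothetical bolt standing in for one deployed model (data consumer).
    public static class ModelBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String symbol = tuple.getStringByField("symbol");
            double price = tuple.getDoubleByField("price");
            // ... run the model against the tick ...
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt, emits nothing
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // The parallelism hints are where extra compute capacity would be added as needs grow.
        builder.setSpout("ticks", new FxTickSpout(), 2);
        builder.setBolt("model", new ModelBolt(), 8)
               .fieldsGrouping("ticks", new Fields("symbol")); // keep a given symbol on one task

        Config conf = new Config();
        conf.setNumWorkers(4); // spread tasks over worker processes / machines

        new LocalCluster().submitTopology("fx-models", conf, builder.createTopology());
    }
}
```

The appeal for our scaling problem is that adding models or capacity should just mean raising the parallelism hints and worker count, rather than re-architecting the pipeline.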
By the nature of the business, most of these shops are secretive, but there are some technology vendors out there building products based on FPGAs. Stone Ridge Technology is one of them (I saw an article in a magazine, but have no experience with them myself). I have heard that FPGAs are mainly used to parse and normalize incoming market data, for example a data feed coming from NASDAQ in the ITCH format.
Other hedge funds are also using hardware-based solutions, but more likely GPU-based solutions for number crunching.
If you are interested, I would recommend checking out the forums at wilmott.com or the Nuclear Phynance [sic] board. Wilmott magazine has also had a few articles on models written in CUDA for pricing options, etc.
I've thought about this, and the answer is probably no for any real-time trading platform.
Storm is meant to be distributed across multiple nodes, and even if the processes end up on the same hardware, I think whatever communication system Storm uses is going to be orders of magnitude slower than a direct memory bus. So if latency is your goal, you probably have to stick to the tried and true, whatever that may be (GPU server farms?).
However, I bet it'd be perfect for a market analytics system, where it can scale to consume and create massive amounts of data in near real time, maybe as a complement to your latency-sensitive system in some way.
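That said, if someone did try it in a latency-sensitive path, the only knobs I can think of are keeping all the tasks in one worker process (so tuples never cross the network) and capping the number of tuples in flight. A rough, assumption-level sketch using Storm's standard Config setters:

```java
import backtype.storm.Config;

public class LatencyConf {
    // Assumption-level tuning, not something I've benchmarked.
    public static Config make() {
        Config conf = new Config();
        conf.setNumWorkers(1);         // one worker JVM: task-to-task transfer stays in-process
        conf.setMaxSpoutPending(1000); // cap un-acked tuples so queueing delay stays bounded
        return conf;
    }
}
```

Even then you are still paying for serialization and queueing between tasks, which is the part I doubt ever gets close to a direct memory bus.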
I think Storm is more for "web" real time, with deadlines on the order of seconds, rather than traditional real-time systems, which have deadline accuracy requirements on the order of milliseconds to microseconds.