Thanks for the article. I am wondering if anyone here has experience using Storm to work with real-time market (financial) data. Taking FX market data as an example, you could be looking at multiple feeds, each updating a few times a second. Is a framework like Storm suitable for these types of applications, which are very sensitive to end-to-end latency?
Also, given a set of nodes and a set of input data, is the end-to-end latency consistent?
There's at least one company using it for algorithmic trading, although I don't know the details of how they're using it.
Storm isn't intended for sub-millisecond processing, but latencies on the order of milliseconds are certainly doable on Storm. Obviously a lot of that depends on how complex your processing is.
Thanks Nathan. The sort of latency we are working with is on the order of milliseconds. There are many HFT shops out there that have invested in FPGAs or other hardware-based solutions to get sub-millisecond latencies for processing incoming data, but I think they are fairly specialized firms.
Currently we are using a proprietary CEP platform that has issues with scaling as the number of deployed models (i.e. data consumers) goes up.
Storm's topology-based approach may be worth some investigation because it would allow the addition of more compute nodes transparently as our needs expand. I will take a more detailed look.
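To make that concrete for myself, here is a rough sketch of how I imagine the wiring would look. The FxTickSpout and ModelBolt classes are hypothetical stand-ins and the packages are the pre-Apache backtype.storm ones, so treat this as an assumption about the API rather than anything we have running:

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

import java.util.Map;
import java.util.Random;

public class FxTopology {

    // Hypothetical spout that would wrap a market data feed handler;
    // here it just emits synthetic EUR/USD ticks.
    public static class FxTickSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final Random random = new Random();

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            collector.emit(new Values("EUR/USD", 1.30 + random.nextGaussian() * 0.001));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("symbol", "price"));
        }
    }

    // Hypothetical bolt standing in for one deployed model (data consumer).
    public static class ModelBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String symbol = tuple.getStringByField("symbol");
            double price = tuple.getDoubleByField("price");
            // ... run the model against the tick ...
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt, emits nothing
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // The parallelism hints are where extra compute capacity would be added as needs grow.
        builder.setSpout("ticks", new FxTickSpout(), 2);
        builder.setBolt("model", new ModelBolt(), 8)
               .fieldsGrouping("ticks", new Fields("symbol")); // keep a given symbol on one task

        Config conf = new Config();
        conf.setNumWorkers(4); // spread tasks over worker processes / machines

        new LocalCluster().submitTopology("fx-models", conf, builder.createTopology());
    }
}
```

The appeal for our scaling problem is that adding models or capacity should just mean raising the parallelism hints and worker count, rather than re-architecting the pipeline.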
By the nature of the business, most of these shops are secretive, but there are some technology vendors out there building products based on FPGAs. Stone Ridge Technology is one of them (I saw an article in a magazine, but have no experience with them myself). I have heard that FPGAs are mainly used to parse and normalize incoming market data, for example a data feed coming from NASDAQ in the ITCH format.
Other hedge funds are also using hardware-based solutions, but more likely GPU-based solutions for number crunching.
If you are interested, I would recommend checking out the forums at wilmott.com or the Nuclear Phynance [sic] board. Wilmott magazine has also had a few articles on models written in CUDA for pricing options, etc.
I've thought about this, and the answer is probably no for any real-time trading platform.
Storm is meant to be distributed across multiple nodes, and even if the processes end up on the same hardware, I think whatever communication system Storm uses is going to be orders of magnitude slower than a direct memory bus. So if latency is your goal, you probably have to stick to the tried and true, whatever that may be (GPU server farms?).
However, I bet it'd be perfect for a market analytics system, where it can scale to consume and create massive amounts of data in near real time, maybe as a complement to your latency-sensitive system in some way.
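That said, if someone did try it in a latency-sensitive path, the only knobs I can think of are keeping all the tasks in one worker process (so tuples never cross the network) and capping the number of tuples in flight. A rough, assumption-level sketch using Storm's standard Config setters:

```java
import backtype.storm.Config;

public class LatencyConf {
    // Assumption-level tuning, not something I've benchmarked.
    public static Config make() {
        Config conf = new Config();
        conf.setNumWorkers(1);         // one worker JVM: task-to-task transfer stays in-process
        conf.setMaxSpoutPending(1000); // cap un-acked tuples so queueing delay stays bounded
        return conf;
    }
}
```

Even then you are still paying for serialization and queueing between tasks, which is the part I doubt ever gets close to a direct memory bus.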
I think Storm is more for "web" real time, with deadlines on the order of seconds, rather than traditional real-time systems, which have deadline accuracy requirements on the order of milliseconds to microseconds.