I'd argue that a lot of the hard parts of algo writing are solved by Quantopian. The hard parts:
* Data. You need to test your idea. Most historical stock data (like Yahoo) excludes companies that went bankrupt or were bought or otherwise disappeared. That's called survivorship bias. If you run a backtest on the finance industry and you don't include things like Lehman, you're going to get the wrong answer. Add in things like Hurricane Sandy, MLK Day, 9/11, mergers, acquisitions, stock splits, etc. and data is very painful to put together.
* A backtester. Once you have your data, what do you put it into? How do you calculate commissions? How do you calculate slippage (your order affects the price, remember)? How do you avoid look-ahead bias and other bugs that plague backtesters? (A rough sketch of these pieces follows this list.)
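To make that concrete, here's a minimal sketch of the pieces a backtester has to model: bar-by-bar fills, a per-share commission, and a crude slippage charge. This is not Quantopian's code, the fee and impact numbers are made up, and real slippage models are far more involved; it's just meant to show where the bugs like to hide.

```python
# Minimal event-driven backtest loop (illustrative only, not Quantopian's code).
# Assumptions: bar-by-bar prices, a flat per-share commission, and a naive
# slippage model where a market order moves the fill price against you.

COMMISSION_PER_SHARE = 0.005   # assumed broker fee
SLIPPAGE_BPS = 5               # assumed 5 basis points of price impact

def fill_price(price, shares):
    """Market-order fill: pay up (buys) or down (sells) by a fixed fraction."""
    impact = price * SLIPPAGE_BPS / 10_000
    return price + impact if shares > 0 else price - impact

def run_backtest(bars, strategy, cash=100_000.0):
    position = 0
    for bar in bars:                      # bars arrive strictly in time order:
        shares = strategy(bar, position)  # no peeking ahead, no look-ahead bias
        if shares:
            px = fill_price(bar["close"], shares)
            cash -= shares * px + abs(shares) * COMMISSION_PER_SHARE
            position += shares
    return cash + position * bars[-1]["close"]

# Toy data and a toy strategy: buy 100 shares on the first bar, then hold.
bars = [{"close": p} for p in (10.0, 10.2, 10.1, 10.4)]
print(run_backtest(bars, lambda bar, pos: 100 if pos == 0 else 0))
```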
Coming up with an idea to trade is hard, but it's only a part of the problem. I'd say it's the most fun part of the problem, but it's only a part. Quantopian is trying to remove all of the hard parts and let you do the easy parts. We have tens of thousands of lines of code (backtester, IDE, etc.) and we're leaving the most exciting 100 lines of code to our members.
I would describe data collection/mangling/processing as more annoying and tedious than "hard". I'd seen the Quantopian github account before, and I appreciate what y'all have done. I didn't know there was a business model behind it, but I get it now.
You said it solves "the hard parts of algo writing". I guess that's true if speaking of algos in a general sense: it makes it easier to get started. But if the goal is to write profitable algorithms, then it's much less true. I just looked again at the API, and using it to create a profitable strategy would actually be more difficult (and less profitable) than it would be to trade elsewhere.
The main reason is because the function to make a trade will only place market orders. It's well known (in the financial literature) that an algorithm which trades with limit orders will almost always outperform one which trades with market orders (pretty much obviously true, since the limit order is the better price). That's because where a passive trading (market making) algorithm is earning the spread, with market orders you have to beat the spread just to break even.
Of course, that's why many brokers only allow market orders (they collect the spread as their trading fee).
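To put toy numbers on the spread point (the quotes are made up, and this ignores the catch that a limit order is never guaranteed to fill):

```python
# A stock quoted 10.00 bid / 10.02 ask: the spread is 2 cents.
bid, ask = 10.00, 10.02

# Round trip with market orders: buy at the ask, later sell at the bid.
# If the quotes don't move, you lose the full spread per share.
market_order_pnl = bid - ask          # -0.02

# Round trip with filled limit orders: buy at the bid, sell at the ask.
# A market maker who gets both fills earns the spread per share.
limit_order_pnl = ask - bid           # +0.02

print(round(market_order_pnl, 2), round(limit_order_pnl, 2))
```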
Also, I don't see how the backtester avoids look-ahead bias. Backtesting like that just encourages Data Dredging[1]: an algorithm overfitted to the test set. It would be better to backtest on random subsets of the data (cross-validation and all that). But I can understand why that isn't done (it would be harder for users to create should-be-profitable algorithms and start trading with them).
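For what it's worth, the dredging effect is easy to demonstrate: test enough random strategies against one fixed data set and the best of them will look good purely by chance. A toy simulation (random returns, coin-flip signals, nothing real):

```python
import random

random.seed(0)

# One fixed "historical" daily return series: pure noise by construction.
returns = [random.gauss(0, 0.01) for _ in range(252)]

def avg_daily_pnl(signal, returns):
    """Average daily P&L of a long/short strategy given +1/-1 signals."""
    return sum(s * r for s, r in zip(signal, returns)) / len(returns)

# Dredge: try 1,000 random strategies and keep the best in-sample performer.
best = max(
    avg_daily_pnl([random.choice((-1, 1)) for _ in returns], returns)
    for _ in range(1000)
)
print(f"best in-sample avg daily return: {best:.4%}")  # looks great, is noise
```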
> The main reason is because the function to make a trade will only place market orders.
We (I work for Quantopian) will surely support other order types when we support live trading.
> Also, I don't see how the backtester avoids look-ahead bias.
We provide over ten years of historical minute bar data for U.S. equities, with no survivorship bias. This means two things:
1. The amount of data we provide is sufficiently large that if you test your algorithm against a bunch of stocks over that entire period of time and it performs reasonably well, it's unlikely that it's overfitted to the data in a way that is going to bite you on the ass in live trading.
2. But just to be even more paranoid, the smart algorithm writer will do just what you describe -- divide the available data into lots of subsets, randomly pick which subsets of the data to test against each time you backtest, and not start live trading an algorithm until you've confirmed that it performs well on random subsets of data that you haven't previously tested it on. Right now on Quantopian you'd have to do all that data segmentation and selection by hand (a rough sketch of what that might look like is below), but I suspect that we will eventually add features to make it easier to do automatically.
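Something like this, for example (the window length and split ratio are arbitrary, and it assumes roughly ten years of daily history):

```python
import random
from datetime import date, timedelta

# Carve ~10 years of history into contiguous 6-month windows, hold some out,
# and only ever tune the algorithm on the windows that are not held out.
start, end = date(2002, 1, 1), date(2012, 1, 1)
window = timedelta(days=182)

windows, cursor = [], start
while cursor + window <= end:
    windows.append((cursor, cursor + window))
    cursor += window

random.seed(42)
random.shuffle(windows)
split = int(len(windows) * 0.7)
train_windows = sorted(windows[:split])    # tweak and re-test on these
holdout_windows = sorted(windows[split:])  # touch these only once, at the end

print(len(train_windows), "training windows,", len(holdout_windows), "held out")
# e.g. run each backtest only over (start, end) pairs drawn from train_windows
```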
> We (I work for Quantopian) will surely support other order types when we support live trading.
I suspect they won't be true limit orders, but stop orders (true limit orders wouldn't slip). Also, for a passive trading algorithm it's important to have level II data (the order book), or the algorithm is flying blind.
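By level II I mean the resting bids and asks. Even a toy snapshot (made-up prices and sizes) shows the minimum a passive strategy needs from it: where the spread is, and which side of the book is heavier:

```python
# Toy level II snapshot: (price, size) levels, best prices first.
bids = [(10.00, 500), (9.99, 1200), (9.98, 800)]
asks = [(10.02, 300), (10.03, 900), (10.04, 1500)]

best_bid, best_ask = bids[0][0], asks[0][0]
spread = best_ask - best_bid
mid = (best_bid + best_ask) / 2

# Crude imbalance measure: more size on the bid side suggests short-term
# upward pressure, which matters when deciding where to rest your quotes.
bid_size = sum(size for _, size in bids)
ask_size = sum(size for _, size in asks)
imbalance = (bid_size - ask_size) / (bid_size + ask_size)

print(f"spread={spread:.2f} mid={mid:.3f} imbalance={imbalance:+.2f}")
```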
Re 1, it's not the size of the data set that prevents overfitting (larger data sets actually make it more likely), but how it is used when developing/training the algorithm. Repeated testing and tweaking of the algorithm will overfit, unless one is careful to keep the complexity of the algorithm in check.
Anyway, best of luck to your users! But they should be warned that active trading on signals (price prediction) is the hard way to algorithmic profit. That's why market making and arbitrage are the bread and butter of the pros, not signal trading.[1]
In that video the quotes to which I guess you are referring are "[investing] is almost entirely a non-professional role" (08:40), "[Market-making] is a highly professionalized role" (09:13), and "[an arbitrageur is a] highly professional role" (11:22).
I thought the video had something in support of your "price prediction is the hard way to algorithmic profit" statement, but I can't find anything very explicit in support of that (what I thought was your implication). He kind of implies something like that at 15:00 when he says "we have no idea how [price prediction] works [at longer than a few days in the future]", but that's not really very strong.
When you said "active trading on signals [...] is the hard way", did you mean active trading at minute-to-day or greater holding periods? That seems a) right :); but b) slightly at odds with your "it's important to have level II data", since I would have thought that is less important at an hourly-to-day or greater holding period.
Backtesting is only 1/2 to 1/3 of the actual task of algorithmic trading. When it comes to actually trading money, if you're not careful and don't cover every corner case, you really could lose your shirt. I can't believe that 100 lines of code is all you would need to cover all the edge and failure cases that could lose you a lot of money.
What happens if the network drops while you're in a trade, or in the middle of executing an order? What if you have a partial fill, and you have a partial buy order hanging around and a full sell order out there? What if you have a stop limit order to exit a trade, and it blows through your limit? Do you have a backup stop order just in case? There are a lot of issues that you can't backtest that can only be learned once you start trading real money. I've had situations where I ran my algos on the DAX overnight, and I woke up to find that the exit order never executed, and I was 1000 euros away from where I should have gotten out. Luckily, this trade was in my favor, but it scared the shit out of me because it could have easily gone the other way. I've also had my internet connection drop overnight, and I had to scramble to figure out how to get out of a trade I was in.
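To give a flavor of the defensive code I mean: after a disconnect you have to reconcile against whatever the broker says actually happened, not what your program remembers. The Broker class below is a made-up stub, not any real API; only the shape of the reconcile step matters:

```python
# Hypothetical post-disconnect reconciliation. Broker/Order are stand-in stubs.

class Order:
    def __init__(self, order_id):
        self.id = order_id

class Broker:
    """Stub broker: pretend a partial fill happened while we were offline."""
    def position(self, symbol): return 60              # wanted 100, got a partial
    def open_orders(self, symbol): return [Order(1)]   # the rest is still resting
    def cancel(self, order_id): print("cancel order", order_id)
    def market_order(self, symbol, qty): print("market order", symbol, qty)

def reconcile(broker, symbol, expected_position):
    """On reconnect, trust the broker's state, not your in-memory copy."""
    actual = broker.position(symbol)           # includes any partial fills
    for order in broker.open_orders(symbol):   # cancel stale resting orders first
        broker.cancel(order.id)
    drift = expected_position - actual
    if drift:
        broker.market_order(symbol, drift)     # last resort: trade back to plan
    return actual

reconcile(Broker(), "SPY", expected_position=100)
```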
The other issue is interpreting backtesting data, and knowing the difference between over-optimized results (i.e. curve fitting) and something with an actual edge. You can make almost any algo look profitable if you curve-fit; even a simple MA crossover can show extremely profitable results if you over-optimize on the data. But it won't work in real life. So being able to sift falsely good algos from actually profitable algos is very, very hard, and takes experience. This is the biggest problem with trying to find an algo with an actual edge: it's very, very hard.
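As an illustration of the trap: sweep the two moving-average lengths over a random walk (so there is, by construction, no edge to find) and the "best" pair still looks profitable in-sample. The data and parameter ranges here are made up, and costs are ignored:

```python
import random

random.seed(1)

# Random-walk prices: any in-sample "edge" here is pure curve fitting.
prices = [100.0]
for _ in range(1000):
    prices.append(prices[-1] * (1 + random.gauss(0, 0.01)))

def sma(xs, n, i):
    """Simple moving average of the n values ending just before index i."""
    return sum(xs[i - n:i]) / n

def ma_cross_return(prices, fast, slow):
    """Long next bar when fast SMA > slow SMA, flat otherwise; no costs."""
    total = 0.0
    for i in range(slow, len(prices) - 1):
        if sma(prices, fast, i) > sma(prices, slow, i):
            total += prices[i + 1] / prices[i] - 1
    return total

# "Optimize" by trying every parameter pair and keeping the best in-sample one.
results = [(ma_cross_return(prices, f, s), f, s)
           for f in range(5, 50, 5) for s in range(20, 200, 20) if f < s]
best, fast, slow = max(results)
print(f"best in-sample: fast={fast} slow={slow} total return={best:.1%}")
# On fresh data (or live), this "best" pair is no better than chance.
```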
I wrote my own backtester and I download my own data nightly, and that definitely takes a lot of time and effort, but the hardest part is the actual trading, by far. The psychology involved with trading is an order of magnitude harder than coding, and the hardest thing I've ever done in my life. I blew through a shitload of money, just to learn the ropes.
In terms of algo trading books, I really don't have any recommendations. I found most of the algo trading books are similar, telling you to watch out for curve fitting, etc. Where they lack is helping you come up with actual trades. My recommendation is a book called "Mastering the Trade" by John Carter. He gives out trade setups that he actually used. They may no longer be profitable, but it's the closest thing you can get to actually learning various day trading techniques, and you can implement those to get an idea, and then work your way from there.
I'm not hating on Quantopian, it looks pretty nicely done, and hopefully it takes off. But from experience, I know that algorithmic trading is by far the hardest thing I've done. If inexperienced traders jump in, they'll spend a lot of money on "tuition" for sure.
Try the paper "The Winners and Losers of the Zero-Sum Game" by Lawrence Harris. I found it most enlightening. I'm paying it forward, since it was here that I saw it recommended (hat tip to tptacek iirc).
What makes a zero-sum game so hard? Competition. Read Nate Silver's book chapter on Poker for a great explanation of that. In summary: he won a lot of money playing online poker when the services were growing and there was a large supply of fishy players (losers). When the growth of new poker players stopped, and the fish had mostly left (after losing their money), then he started losing.
What makes it so hard--trying to come up with a good model?