The paper trains an XGBoost model to predict investment outcomes for startup investments from the time period 2014-2016. The model is trained on investments prior to 2014, and uses features from Pitchbook. It says the most important features are text descriptions of the company and CEO, which are featurized as unigram and ngram TF-IDF counts.
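For concreteness, a minimal sketch of that kind of pipeline (not the paper's actual code; the file name, column names, and hyperparameters are all assumed):

```python
# Minimal sketch of a description-text -> XGBoost pipeline like the one described.
# "deals.csv", the column names, and the hyperparameters are all hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier

df = pd.read_csv("deals.csv", parse_dates=["incubator_end"])
train = df[df.incubator_end.dt.year < 2014]              # pre-2014 cohorts
test = df[df.incubator_end.dt.year.between(2014, 2016)]  # 2014-2016 holdout

vec = TfidfVectorizer(ngram_range=(1, 2), min_df=5)      # unigram + bigram TF-IDF
X_train = vec.fit_transform(train.description)
X_test = vec.transform(test.description)

model = XGBClassifier(n_estimators=500, max_depth=4, learning_rate=0.05)
model.fit(X_train, train.good_outcome)                   # 1 = "good" investment
print("holdout accuracy:", model.score(X_test, test.good_outcome))
```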
The paper avoids the very first methodological problem one would check for, which is using a held-out set that is mixed in time. That is: at least it uses a heldout _period_ (2014-2016), instead of individually held-out samples. Using individually held-out samples would let the model learn about future trends. If you know some companies that will be successful in 2026, that will help a lot in deciding what to invest in today, even if you can't pick those companies. So at least the paper doesn't do that.
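The difference is easy to state in code (a sketch only; "deal_date" is an assumed column):

```python
# A random holdout mixes 2014-2016 deals into training, so the model can learn
# which sectors/themes did well during the very period it is scored on.
# A temporal holdout keeps everything the model sees strictly before that window.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("deals.csv", parse_dates=["deal_date"])   # hypothetical data

train_rand, test_rand = train_test_split(df, test_size=0.2, random_state=0)  # leaky

train_time = df[df.deal_date < "2014-01-01"]                                  # clean
test_time = df[(df.deal_date >= "2014-01-01") & (df.deal_date < "2017-01-01")]
```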
However, the paper doesn't seem to have limited itself to information written and documented about the companies at the time of the investment decisions. Descriptions of the CEO will change over time. Once a company is looking to go public, maybe they emphasise that their CEO has an MBA. If your company died at seed, maybe this detail isn't included in the text. Even who is listed as a founder of the company is something that might be revised over time (Tesla is a famous example).
The paper really doesn't give us much insight into what the model learns. This means that we need to consider competing explanations for the model's accuracy --- we're not forced to accept the author's conclusion that the investments were indeed "predictably bad" in a way that could be profitably implemented as a trading strategy.
By itself, the fact that a black-box machine learning model with a large number of features can distinguish between two classes on a held-out set really isn't very strong evidence that the model has learned something of practical importance, that will generalise beyond the train and held-out sets you've collected. You need to show us what it's actually learned. If it's supposed to identify wolves, is it looking at the canines or the snow? For a study like this, I'd want to see feature pruning down to where the model is still accurate but with only a few textual features, and then show us what those features are. If you can't do that in a way that actually leads to human insight, it's extremely unlikely that the model is learning anything real.
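Continuing the pipeline sketch above, that pruning check might look something like this (k is arbitrary; the point is that the surviving terms should be legible to a human):

```python
# Keep only the top-k TF-IDF terms by importance and retrain. If accuracy holds
# up, the surviving terms tell us what is actually driving the predictions.
import numpy as np

k = 25
top_k = np.argsort(model.feature_importances_)[::-1][:k]
print("surviving terms:", np.array(vec.get_feature_names_out())[top_k])

model_small = XGBClassifier(n_estimators=500, max_depth=4, learning_rate=0.05)
model_small.fit(X_train[:, top_k], train.good_outcome)
print("pruned accuracy:", model_small.score(X_test[:, top_k], test.good_outcome))
```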
> The one data point that is not specific to the incubator deal is the company description.
> Pitchbook does not store or provide a time-varying description of the company. Instead, for each startup I have a current description of the firm, its product, and activities, independent of its health or status. I use this information in my analyses as a way to extract information about the company’s product and thus refer to this field as product information. A key assumption in this approach is that the descriptions found in Pitchbook do not evolve as a function of early-stage funding or late-stage success which would mechanically make the descriptions predictive. To the best of my knowledge the company descriptions available to me today are not systematically different from the descriptions available to early-stage investors when they would have been engaged in due diligence. Further, all results are robust to the exclusion of the product information.
This is a prime place to go looking for a leak of signal from the future that inflated the performance of the model.
It would be interesting to go look at Pitchbook's descriptions of successful and failed startups and see if these descriptions seem to leak any information about the later success or failure of the firm.
And I agree, without some kind of sensitivity analysis, partial dependence analysis, etc. it's hard to draw conclusions about what, if anything, the model has learned.
It's also particularly important to test your model against simulated data with a known effect built into it. Your model should be able to learn real effects and avoid learning spurious effects. Simulation studies can be time-consuming and difficult to design, but not much moreso than a good test suite for a piece of software. I don't know why this technique isn't more common in statistics and ML, even in the world of traditional probability models. It really should be taught in stats and ML courses.
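A bare-bones version of such a check, using purely synthetic data with placeholder effect sizes (nothing here is taken from the paper):

```python
# Simulation check: plant one real effect among pure-noise features and verify
# that the fitted model's importances concentrate on the planted feature.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n, p = 5000, 50
X = rng.normal(size=(n, p))
y = (X[:, 0] + rng.normal(size=n) > 0).astype(int)   # outcome depends only on feature 0

model = XGBClassifier(n_estimators=200, max_depth=3)
model.fit(X, y)

imp = model.feature_importances_
print("planted feature share of importance:", imp[0] / imp.sum())
# A model that assigns substantial importance to the noise features here would
# be equally happy to "find" spurious signals in the real data.
```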
"Several of these quantities are not provided in the data and therefore need to be imputed assumed. First, the valuation of the firm at t=2 in model time is defined as the latest associated with a “late-stage” round by the end of 2020. If no such rounds exist the company valuation is set to zero. If such a round exists but the valuation is provided by Pitchbook, I set it to zero. Next, Pitchbook typically provides the initial stake of the early stage investors. If it does not then I assume the early stake is Finally, I assume a constant dilution factor of .75."
I've made assumptions about the thing that affects what I try to predict the most, and have found that my algorithm says people are wrong.
Sounds like those people who first learn about ML, then immediately run off to build a stock market predictor, before inevitably learning about the investment world's definition of "data mining".
My naive belief is that VCs invest more in companies that they think will succeed (i.e. larger funding rounds).
Is it possible the model is partially latching onto this, meaning their result could actually be just saying "companies that VCs back more are also the ones that succeed", presumably because VCs have some of their own reasonable criteria for doing this.
I didn't see this discussed in the paper. I wonder if its results would be as strong if they excluded the size of funding rounds as an input.
This is a super interesting paper, but I wouldn't put much trust in the author's suggestion that their machine learning model knows how to make better investment decisions than VCs do.
Of course their model soundly beat the returns of real investors. Its training dataset contained hard data about the future of the tech industry. Real investors don't have hard data on the future, they have to guess.
The dataset includes descriptions of each of the firms. The model gets to learn from the training dataset whether sectors like VR, 3D printing, business collaboration, clean tech, or etc. were good or bad areas to invest in in 2014, and then gets to apply that knowledge to the test dataset which covers the same epoch. (EDIT ADDED: to quote another commenter here, "[The paper] says the most important features are text descriptions of the company and CEO, which are featurized as unigram and ngram TF-IDF counts.")
Real investors making real investment decisions don't know in advance whether sectors like VR or 3D printing or clean tech or low-code/no-code or etc. are going to be hot or duds, they have to guess. Equally importantly, this model doesn't know which of the sectors being touted as hot investment opportunities today will, in hindsight, prove to have actually been hot. It only knows which ones were hot in 2014.
>Real investors making real investment decisions don't know in advance whether sectors like VR or 3D printing or clean tech or low-code/no-code or etc. are going to be hot or duds, they have to guess
Did you just baseline against a random walk?
If so, then at least 60 years of quant investing will beat the model control sample.
As with earthquakes, successful startups are notoriously hard to predict. We should be thankful that VCs invest so "broadly". In the same vein, where a founder has previously worked should be viewed as just a luck indicator rather than a predictor.
Previous success doesn’t just reveal inherent quality, it’s also a learning experience: people who have succeeded know the amount of work, frustration, and personal transformation that goes into founding a company and successful growth. They can reasonably choose to do it again.
I don’t think that first-timers ignore the amount of work that much, but simply having an idea beforehand helps. Knowing what it’s like to change yourself is also a very difficult thing to anticipate.
Exactly, perhaps if you could find the people who were aware of their own luck and the role it had played in their successes to date, they would be the people to back. This would rule out the inflated egos of Elizabeth Holmes, Adam Neumann, etc.
If the CEO speaks lovingly about the beauty of an idea like Jodorowsky and says he's "the valuable idea guy" (who doesn't do any measurable work)...
If the COO says Papa Murphy's stole their idea from business school...
If the CTO talks about placing tubes in the sky for renewable zero point energy...
If the CEO has a long pinky fingernail...
If no one can explain how money could ever possibly be made...
If no one can explain how it's uniquely better, necessary, and defensible...
If the stress level is constantly crash schedule burnout...
If like a country club meetup, the 100 employees without measurable sales or more than 2 months of runway go on retreats, do team building exercises more than actually working...
Unstable, immature founders constantly shouting at each other unproductively while one does no work and another is likely crazy...
CEO is too cheap to hire employees who could speed things up, buy laptops or chairs for software engineers, or purchase office supplies despite having funding...
With 50 employees, 5 layers of management and CEO salary is $1m / yr pre-product ...
Yeah, someone’s parents are fucking rich or someone on the founding team took a shine to someone who never succeeded as an entrepreneur or both.
The sad thing isn’t that we’re still money-laundering success through those channels.
The sad thing is that we’re discussing this on a forum dedicated to the idea that knowing people isn’t everything and the guy who got the job of running it by flying his only startup into the side of a mountain isn’t even in charge anymore, he’s too busy, he’s long since handed it off to the guy who told Autodesk that SocialCam was going to be the next YouTube.
> Anyone at a C-level who has to shout at staff is a red flag for me.
The current generation of very large market cap companies had CEOs like this.
It's harder to find a current unicorn or now established +$10B company that didn't have a C-level like this than ones that did (there are some good eggs at that level, but there's a hell of a lot of bad ones too).
I briefly worked at a smaller unicorn. During "all hands" meetings, there would be frequent swearing. One of the "senior executives" gets up there, tells us we need to work harder because he has a ton of stock and he needs to get rich off of it. "Let's f-in get this done guys!!!" Real professionals.
At an earlier company (not near unicorn levels, but raised over $100 million), the CEO gets up on stage. He tells us we need to be willing to work nights and weekends on [major project]. If we don't believe in the vision and the company, we should get out now. "The stairs are in the back of the room!"
We can at least be sure that Energy Vault (traded as NRGV) is a bad investment.
Their original crane tree for concrete blocks was an obvious failure: it would not work in any breeze. And even if you (e.g.) put a dome over it to deal with the wind, it would still only be able to store a tiny amount of energy.
Their current scam, the hi-rise condo for concrete blocks, could in principle work, between failures. It too could store only a tiny amount of energy, but is also extremely expensive and unreliable. It seems difficult to imagine energy storage way more expensive than batteries, but they succeeded.
I expect NRGV to pivot shortly to a third model that will be harder for any random civil engineer to prove is stupid. Chances are it still won't work, though.
No non-crooked utility will ever buy from NRGV; it is a pure ploy to scam investors. The successful bulk storage solutions will (1) be very cheap, (2) cost mainly for kW in or out, not kWh stored, (3) cheaply expand to store more, if needed, (4) not necessarily be very round-trip efficient.
Storage using liquefied ammonia (at room temp. under light pressure) or hydrogen (in underground cavities) will be common despite higher cost because there will be such insatiable demand for both of those. Utilities will overbuild generation capacity just to keep their synthesizers busy.
Cheapest storage will rely on gravity using natural elevation differences, or compressed air relying on earth or water to contain pressure. Undersea methods are attractive for simplicity, relying on depth for pressure, or on buoyancy. Batteries will always be more expensive than what utilities use for bulk storage, but they are the only method usable "off-grid". Those will not use lithium.
Seasonal storage will be rare. Most utilities will import ammonia from numerous competing solar farms in the tropics to burn in their existing combined-cycle turbines when local generation flags.
What is most absurd about Energy Vault is that simply using water (a kind of mini-pumped hydro) would be orders of magnitude cheaper to build and far more efficient.
Gravity storage is not new. It predates the electric motor by centuries. You store power by using it to lift a heavy thing and then let that heavy thing drop and spin a wheel or a turbine when you want that energy back. The weight can be anything; a bag of rocks, a gallon of water, or a custom-form concrete block.
From an engineering perspective, water is the way to go. It's cheap, easy to work with, and turbines/pumps are efficient. Best of all, you don't need to build some bizarre, towering structure to hold it. Just use tanks if there are no natural elevations.
In fact, you could put the whole thing underground if you wanted to. All that matters for gravity storage is the relative distance you move the mass.
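Back-of-the-envelope, the stored energy is just E = m·g·h, which is why the amounts are so small (the block mass and lift height below are illustrative guesses, not Energy Vault's actual specs):

```python
# E = m * g * h for one block-lift cycle (illustrative numbers only).
g = 9.81            # m/s^2
mass_kg = 35_000    # a ~35-tonne block
lift_m = 100        # usable lift height
kwh = mass_kg * g * lift_m / 3.6e6
print(f"{kwh:.1f} kWh per block per cycle")   # ~9.5 kWh, on the order of one home battery
```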
Energy Vault seems to be taking us back to the medieval bag-of-rocks method of energy storage.
I've never understood why people consider them innovative.
Water is great mainly because we have so much equipment already invented for handling it automatically. Imagine if you had to invent pumps and pipes before you could demonstrate your idea.
Cables, pulleys, and winches are another mature technology. If you can provide a large constant force on a cable over long distance, reversibly, without need to build anything that long, you might have a winner. You should be able to share the expensive bit, the motor/generator/winch, with lots of cable reels. And, it is better if the winch and reel are off-the-shelf items.
So, put a float with hundreds of tons of buoyancy on a cable through a pulley screwed into the sea floor, thence to a reel and winch on shore. You can share the expensive motor/generator/winch among as many reels as you like. The float might be a sealed balloon full of lithium-saturated liquid ammonia, which is half as dense as the water it displaces.
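Roughly the same arithmetic applies here, the stored energy being net buoyant force times travel distance (the numbers below are placeholders, not a worked design):

```python
# E = F * d for one float cycle: net buoyancy (as mass-equivalent) times the
# depth the float is winched down. Placeholder numbers only.
g = 9.81
net_buoyancy_kg = 200_000    # "hundreds of tons" of net buoyancy
travel_m = 500               # depth of travel along the cable
kwh = net_buoyancy_kg * g * travel_m / 3.6e6
print(f"{kwh:.0f} kWh per float per cycle")   # ~270 kWh with these numbers
```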
Air handling is a little less mature. Pump air into a big, light tank well anchored to the sea floor, displacing water. Let the air out through a turbine. People invent elaborate systems to avoid losing heat, but when marginal cost of generation is zero, efficiency doesn't matter so much. The tank only has to be strong enough to hold back the buoyancy of the air. You can add as many tanks as you like. Or, use natural cavities beneath the water table. Maybe pump hydrogen down there, instead. Or hydrogen in one, air in another.
There will be zillions of storage schemes, that all work, until we settle on a few we like best for their secondary benefits. Certainly pumped hydro will remain important.
The main problem for energy storage investment is that investors demand a complex, patentable system, but the best systems are too simple for that. So, there will be a rash of over-complicated storage scheme startups, and not enough simple ones.
So to field a good system, you need to rely on grants, instead. A grant might suffice to prove viability, but it takes investment to build a factory to build systems in the huge volume we will need. So, the parts for your system need to be already manufactured in volume for some other purpose. And you probably need a utility to commit to a huge order and pay in advance.
We need an investment fund that selectively funds scaling up the simple storage systems venture capital can't.
I'm struggling to understand how the methodology avoided peeking into the future. The author says he trained on companies that completed accelerators before 2014, leaving 2014-2016 as a holdout period. But then in the Outcomes section, he is focused on "post-incubator early-stage investment and late-stage exit".
The author's claim of "predictably bad investment" requires a specific time at which this machine learning algorithm could have been trained based on past data and then predicted the future. What was that time here? It couldn't have been 2014-01-01, since the algorithm needs to be trained on an outcome, and it sounds like he's relying on post-incubator data to get that outcome. Meaning at least some of the data needed to train his model comes from after 2014-01-01... right?
Am I missing anything here? Isn't he cheating by peeking into the future?
If you have two sets of companies, those who participated pre-2014 and those from 2014-2016, you can train on the earlier ones and validate on the later ones without peeking into the future. I haven't read enough to know that's what happened, but that's a simple way to avoid this.
But if you need information from after 2014 to define the outcome metric that's used to train your model, you're still peeking into the future, even if you've limited yourself to companies that completed the incubator before 2014.
He's relying on "post-incubator early-stage investment and late-stage exit" to train the model. Unless I'm missing something, that information would not have been available at the beginning of 2014, so the model could not have been trained at that time.
In order to say the human investments were predictably bad, we need to specify a time at which all necessary data to train the model was available, and predict from that time point forward only.
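A minimal sketch of that discipline, with hypothetical column names: both the features and the training labels have to be filtered to what was recorded before the claimed prediction date.

```python
# As-of-date filter: anything used to train (features AND outcome labels) must
# have been observable before the claimed prediction date. Columns are hypothetical.
import pandas as pd

AS_OF = pd.Timestamp("2014-01-01")
df = pd.read_csv("deals.csv", parse_dates=["incubator_end", "outcome_observed_on"])

trainable = df[(df.incubator_end < AS_OF) & (df.outcome_observed_on < AS_OF)]
holdout = df[(df.incubator_end >= AS_OF) & (df.incubator_end < pd.Timestamp("2017-01-01"))]
# If outcomes like "late-stage exit" are mostly observed after AS_OF, `trainable`
# shrinks dramatically -- which is exactly the objection being raised.
```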
The paper made me wonder: How many false negatives? I.e., did the model declare an investment as "bad" even though the investment turned out well?
Essentially: I was looking for a scatter plot with "expected return" on one axis and "actual return" on the other. (Maybe I missed it: The paper is pretty long/dense, and I only skimmed.)
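In code, the check being asked for is short (a sketch only; the outcome and return columns are assumed, not taken from the paper):

```python
# False negatives in the commenter's sense: deals the model called "bad" that
# actually turned out well. Continues the earlier pipeline sketch; column names
# are hypothetical.
import numpy as np
import matplotlib.pyplot as plt

pred_good = model.predict(X_test).astype(bool)           # 1 = predicted good outcome
actually_good = test.good_outcome.to_numpy(dtype=bool)
print("called bad but did well:", np.sum(~pred_good & actually_good))

plt.scatter(test.expected_return, test.realized_return, s=8)   # the missing plot
plt.xlabel("expected return"); plt.ylabel("actual return")
plt.show()
```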
But the real question is: How did the return on the good investments compare to the costs of the bad investments? And how did the overall net return compare to, say, S&P?
One hint is that although half of the investments were bad, they also only accounted for 10% of the money invested.
> How did the return on the good investments compare to the costs of the bad investments? And how did the overall net return compare to, say, S&P?
Both answered in the paper, so not sure how they are “the real question”.
The goal is always to make better investment decisions. If I can make 55% returns instead of 40% returns by making better decisions, the s&p returning less than 40% is irrelevant. I want to make the 55%.
The real missing, unanswerable alternative is: if a better decision framework had been used, which unfunded startups would have been funded, and how much would they have returned?
But if, in practice, the only realistic way to make better-enough decisions to actually realize that 55% is to receive a letter from your future self telling you whether it was good or bad, then "wanting to make the 55%" is irrelevant.
It is frequently clear in hindsight which decisions were better, but the point of hindsight is that you have information you did not have when you made the decision—and, in most cases, could not possibly have without violating causality.
I would also question whether investments that are unlikely (or less likely) to make a direct return are always "bad".
The key points are that investment is an iterated game, deals are not necessarily independent and "networks" of people exist.
Consider a successful serial entrepreneur; you might fund something you consider "not so great" right now because you value the relationship and want access to future deals with that person or their network. Or their uncle, whatever.
Some sub-optimal deals might lead to more investments or opportunities due to "goodwill" and be net positives when the broader picture is taken into account.
From this perspective it is not guaranteed that making "locally optimal" decisions at every turn would actually improve overall returns.
Unless I am missing something, that's how you would expect it to be. The failure rate is very high for venture capital and returns are concentrated in a handful of outsized winners, so you would expect, from a large sample size, investments to be predictably poor no matter what public information or indicator you use, such as eye color, hair color, sector, whatever. EMH only applies to public markets, not early stage private ones anyway. I am skeptical of anything published on SSRN, zero peer review to publish there.
SSRN is a pre-print repository, so that’s a strange criticism: it’s exactly what it says it is.
And this article isn’t really about EMH, it’s about the quality of capital allocation decisions based on the information that was available when the decisions were made.
The paper doesn’t say VC risk is high. It says that 50% of the investments are predictably bad and can be safely dropped because there are better risk adjusted alternatives in the public markets (stocks or bonds) and dropping these would boost returns.
I know he seems to score returns on features such as funding raised and the founder’s background (he claims VCs over value the background). But how do they compare those features with bonds and stocks? I was expecting sales growth, margins or P/S ratios to be the features.
This seems to miss the point. If you are pursuing a particular rate of return, you can be almost guaranteed of not reaching it with bonds and traditional equities. Risk adjusted return is only one framework for examining your portfolio.
The inherent risk in a given asset should not be a constraint.
You can simply lever bonds, equities or most other investments by a variety of means. For example you can get way more risk in 2-year US gov securities via futures than would be sensible.
Likewise, you can de-risk an asset by holding cash alongside that asset.
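A minimal illustration of both moves, with placeholder numbers (not a recommendation or a forecast):

```python
# Leverage scales exposure (financed at roughly the cash rate); blending with
# cash scales it down. Numbers below are placeholders.
asset_return, cash_rate = 0.04, 0.02

leverage = 3.0   # e.g. notional exposure via futures on 2-year notes
levered = leverage * asset_return - (leverage - 1) * cash_rate        # 0.08

cash_weight = 0.5
blended = (1 - cash_weight) * asset_return + cash_weight * cash_rate  # 0.03
print(levered, blended)
```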
The point is not to stop investing in VC, but stop investing in the subset of VC that doesn’t measure up to your otherwise available investments. (About half the shots).
You can redistribute the same amount of money into the other (better) half.
Absolutely. But in this case the author claims he does know which half. That’s what the whole paper is really about: figuring out which half. (Not saying you have to agree with him)
Need to be clear about the definition of ‘investors’. Institutional investors, referred to in the abstract, make bets on VC firms. The VC firms bet on startup candidates. Unless the institutional investor has an in-house VC team.
Venture Capitalism is called "Venture" for a reason. It is a high risk, high reward activity.
But I won't call it predictable. It is only predictable after the fact. Things that were extremely useful for society were extremely bad investments, short term, like the Gutenberg press (Gutenberg went bankrupt), the Columbus expeditions, the first aeroplanes (most early aviators died or had brain contusions for life), the Apollo landings.
Today, for example, the return on all investments in nuclear fusion has been negative. In fact I have invested lots of money in a nuclear fusion startup that has made lots of progress recently. I believe that even if I lose all my invested money, it is worth it.
We recently had a bubble of malinvestments, but it has way more to do with states having access to the money printing press than with the companies themselves. That correlation is total, 100%. With 0% interest rates that you can pay with debt, it makes sense to "invest" in 20 projects expecting one of them to be the unicorn that explodes.
Interesting results; it would be even more interesting if the importance of the features in the trained model had been analysed with a method that considers the interactions between them, like SHAP values.
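For instance, something along these lines with the shap package, assuming an XGBoost-style model and a holdout feature matrix like the ones discussed above (names assumed):

```python
# SHAP values for a tree model: per-sample attributions plus pairwise
# interaction values. Assumes `model` is a fitted XGBoost model and X_test a
# (dense) holdout feature matrix.
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)                 # per-feature attributions
shap.summary_plot(shap_values, X_test)                      # global importance view
interactions = explainer.shap_interaction_values(X_test)    # pairwise interaction terms
```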
> ...approximately half of the investments were predictably bad...
I thought it would be higher than that. Investing in startups seems to be predictably bad right off.
If this gentleman has solid evidence that there are predictors of success for startups, that would be a much more interesting signal than noting that most startup investments are bets that went bad.
It would be more useful if there were such a thing as a checklist for good bets, but it's fairly interesting (at least to me) that the founder's background is consistently over-valued in the results.
This has always seemed fairly logical to me - at the end of the day the users are investing in a product while the investors are investing in a person. The mismatch here is bound to create loss, but this presents data to support it.
I'm curious whether anything in the model came out as undervalued for predictive value.
Human sight is overvalued in the context of identifying optical illusions, but we don't have a lot of effective alternatives for making decisions about how we move about the world.
Isn't background pretty much the deciding factor when investing in a company? If we looked at the background of the founders of all the YC companies, I suspect almost all would have a "good" background.
The point here at least is that the data is showing that this is a poor way to judge the outcome of the company.
This stands to reason, valley investment is often a conveyor belt of:
Ostensibly good school -> Ivy League STEM programme -> Tech job / investment
That's a very narrow selection pool to part with your money on the basis of, given that by default it will produce candidates with little experience of the real world or the realities of competing in a marketplace right before they compete in a global marketplace.
There are obviously exceptions to the above (there are a great many even within YC), but it just reinforces the point that having a few people pick other people to give money to, based on them being their kind of people, and expecting them to outcompete a globally large group of yet more people, performs relatively poorly as an investment strategy.
By predictably bad, it means that the model said it would be bad and it was bad. There are also "unpredictably bad" outcomes. E.g. if 19 out of 20 that are about to receive investment are bad, the claim is that the model could identify and eliminate ~10 bad companies at that point and let you invest in a portfolio where only 9 out of 10 companies are bad.
The paper has a thorough discussion about the "selective label" problem, which is largely why the claim is in this form.
> I find that despite the fact VCs outperform the market on average, the returns are driven by the top half of the predicted quality distribution. By dropping the bottom half of investments and instead investing in the market, returns would have increased by 7 to 41 percentage points. This qualitative finding is robust to a set of outside options (stock market or bond market). Together the results suggest that there is significant room to improve how venture capitalists select into investments.
Assuming the trained model is actually correct, this aligns with my view of VC - sharp investments, padded out with a bunch of crap because investing in VC has become a status symbol and the capital has to go somewhere.