The unreasonable effectiveness of Soccermatics? (2017) (interaliamag.org)
95 points by mpweiher on July 26, 2018 | 43 comments



Funny to stumble across this on Hacker News.

I'm the analyst mentioned in the article. Happy to answer any questions about our weird little industry.


From following various soccer blogs, it seems that defensive stats aren't as polished and explored as offensive ones. I'm curious what stats beyond things such as tackles/interceptions are looked at. For example, Maldini has the reputation of being one of the top defenders ever, but is also known for his quote "If I have to tackle, then I have already made a mistake" (paraphrased). His tackling stats seem to support that style of play in that he made fewer challenges than most. Perhaps he made tons of interceptions, and in that sense, never had to tackle? I'm unable to find a good source of his playing stats.

Anyways, what sort of things do you look at for defensive players? When I look at things such as WhoScored's statistical team of the season, it has players such as Mustafi, who generally has a negative reputation for his play. I suspect he is rated so highly because the rating metric used by WhoScored overvalues the offensive contributions of defenders, whereas pundits are more likely to look at a defender's defensive contributions. Are there any 'advanced stats' for defenders, beyond the basic measured stats of challenges, interceptions, etc., that you and/or the industry look for?

https://www.whoscored.com/Statistics


Defensive metrics are very difficult for a couple of reasons.

The first problem is the data. The soccer viewing public is largely familiar with event-level data, typically provided by (my previous employer) Opta. They've done a great job normalizing soccer statistics on the cultural level, but the information they've collected at scale isn't that useful for creating good defensive metrics.

Other companies have sensed an opportunity here and have started providing more detailed data around things such as defensive pressure. Suddenly, you can contextualize each offensive event with the level of defensive pressure applied to it. I think this will be a game changer, but we're in early days there.

Other companies provide player-tracking solutions that give you real-time position of all players on the field. This is great because you have a "complete" picture of the game, but it requires a lot of work to build more sophisticated spatial/geometric models.

There's also the "Howard Effect", coined largely as a basketball term, but it's similar to the Maldini example you provided. Some defenders are so good that they don't have to be "active" defenders. That's something which is really difficult to adequately control for.


Thanks! Also, what are some sources for stats that would be available for free online? I've looked at some 538 stuff, some Statsbomb stuff, whoscored, and football-data.


Legally, there isn't much. Most of those sites are powered by Opta data, which is exclusively for-purchase.

Statsbomb is a little different. They've started collecting their own data and are offering some free data from various women's leagues.


Thanks! Appreciate your comments


This repo should grow each week, and eventually catch up to the real world in terms of NWSL data:

https://github.com/statsbomb/open-data


There are similar things in other sports, like American Football. You don't identify an amazing cornerback by just counting how many passes he broke up or intercepted: you also look at how often the player he was covering was targeted at all, and compare that to the receiver's baseline. When a receiver who is normally amazing barely gets passed to, and the plays are not very fruitful when he does, that's down to the defender. When it comes to easy-to-digest statistics, that's handled by counting passing attempts towards the defender's area.

You can do the exact same thing in soccer, if you have the data: You can assign responsibilities to players, just through computer vision. If Messi gets 3 touches in an attacking position when Sergio Ramos is defending him, you can credit that to Ramos, and compare that to Messi's touches vs the average Barcelona opponent.
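
To make that comparison concrete, here is a minimal sketch. It assumes you already have touch counts and minutes from some tracking source; the function name and the numbers are hypothetical illustrations, not real data for any player.

```python
def suppression_ratio(touches_vs_defender, minutes_vs_defender, baseline_touches_per_90):
    """Attacker's touch rate against one defender, relative to his usual rate.

    A value below 1.0 means the attacker saw less of the ball than normal.
    All inputs are assumed to come from your own tracking data.
    """
    if minutes_vs_defender <= 0 or baseline_touches_per_90 <= 0:
        raise ValueError("minutes and baseline must be positive")
    rate_per_90 = touches_vs_defender * 90.0 / minutes_vs_defender
    return rate_per_90 / baseline_touches_per_90


# Hypothetical numbers: 3 attacking touches in 90 minutes,
# against a baseline of 12 attacking touches per 90.
print(suppression_ratio(3, 90, 12.0))  # 0.25
```

A single match is a tiny sample, so a ratio like this would need to be aggregated over many games before crediting it to the defender rather than to the team's scheme.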


What Maldini meant by that quote was that as a defender, you should be in the right position so that the dangerous pass is never made, or the attacker doesn't have the chance to run past you. Tackling is sort of a last-ditch effort to stop the other player, and comes with risks of conceding fouls/penalties.


Hmm yes, my interest is that this sort of defender may be hard to identify purely from event driven stats, as stats mostly track actions done (tackles succeeded/attempted, interceptions made). A player such as Maldini wouldn't have shown up on lists sorted for tackles made, although he may have shown up on tackle success % or interceptions (not sure because I can't find a good source of data for his playing stats). Despite this, he has the reputation of being one of the best defenders to play the game. I'm just curious about what sort of metrics could identify a player such as Maldini.


Do you have any recommendations for how a hobbyist can get more into the field? Someone who has a technical background, but works in a different field unrelated to sports at all. Any reading material to get familiar with some of the stats and metrics that are used in the industry?


There aren't a ton of great public resources out there. For the most part, anyone who's writing smart analysis in the public space gets hired away to a club. It's exactly what happened in other sports.

But, I would read "The Numbers Game: Why Everything You Know About Soccer Is Wrong" and read the backlogs of the StatsBomb blog. That will get you up to speed pretty quickly.


It doesn't really explain what 'Premier League analyst' is. Is this someone who works for a particular club? Or writes for some publication? Or walks the earth like Caine of Kung Fu, offering analysis to those in need?


I'm the other analyst mentioned in the article, but my position is probably quite similar.

I work with a team's coaching staff and management to help them make more data-driven decisions. This ranges from topics around opposition analysis to player recruitment.


Oh, I coffeelessly missed that bit. Analyst for an MLS team sounds even more interesting. You should do an AMA, perhaps.


Why does being an analyst for an MLS team sound more interesting than being an analyst for a Premier League team?


It seems more interesting to hear from an insider under those circumstances. For starters, an MLS analyst might actually be able and willing to chitchat with you. MLS is an odd place compared to the top European leagues - there's no threat of relegation, no scrambling to earn Champions or Europa League spots and, comparatively, also no money. There's the business with the 'designated players'. The dominant sport in this particular analyst's market is ice hockey. Etc.


'Premier' meaning the English soccer competition


That's not what I was asking but thanks.


How did you end up in your current line of work? Ever used any of your stats for gambling?


I stumbled into it largely by coincidence and luck. As you could imagine, there aren't many people who study computer science and were also relatively high-level soccer players. I'm very fortunate to occupy the very narrow intersection of that obscure Venn diagram.

I stumbled across some high resolution soccer data during college and began writing a blog that became popular in soccer analytics circles. The company that produced the data that I was scraping eventually hired me to their data science team. I spent a few years with them before I was recruited to my current team.

I have never worked in gambling but there are a few people in my sort of role that have that sort of background. It requires a pretty similar set of skills. I am not sure that I could beat the market by a large enough margin to make it worthwhile. It's quite efficient. But for the most part, I'm more interested in understanding the underlying mechanics of tactics than I am interested in predicting the result of games.


Thanks for your reply. Is your blog still going?


For those interested in this sort of thing, 'packing' and 'impect' are also interesting to read about. They were developed in Germany but still haven't spread into the mainstream the way more traditional stats such as possession, pass completion %, or meters run have. These traditional stats very often do not explain why team A or B actually won the game. Therefore, two new statistical measures were developed, which better explain why a particular team won a game.

Packing is the measurement of how many opposition players are beaten by a pass (or dribble or other move).

Impect is the number of deep lying defenders beaten, which is obviously more valuable compared to high pressing strikers.

The key insight is that defenders beaten, particularly deeper-lying defenders, is the measurement needed to identify who is expected to win (and therefore to assess the value of passing players, or defenders).

http://bundesligafanatic.com/20160610/impect-packing-the-fut...
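
As a rough sketch of how the two counts could be computed, assuming you have each opponent's position at the moment of the pass: the one-dimensional pitch coordinate, the `Opponent` class, and the `is_defender` flag are all simplifying assumptions of mine, not the actual Impect methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opponent:
    x: float            # metres up the pitch in the attacking direction (assumed 1-D)
    is_defender: bool   # hypothetical role flag


def goal_side_count(opponents, ball_x):
    """Number of opponents still between the ball and the goal being attacked."""
    return sum(1 for o in opponents if o.x > ball_x)


def packing(opponents, start_x, end_x):
    """Opponents taken out of the game by moving the ball from start_x to end_x.

    Backward moves score 0 here, matching the article's description.
    """
    beaten = goal_side_count(opponents, start_x) - goal_side_count(opponents, end_x)
    return max(0, beaten)


def defenders_beaten(opponents, start_x, end_x):
    """Same count, restricted to players flagged as defenders."""
    return packing([o for o in opponents if o.is_defender], start_x, end_x)


# Hypothetical snapshot of six opponents.
opponents = [
    Opponent(40, False), Opponent(55, False), Opponent(60, False),
    Opponent(75, True), Opponent(78, True), Opponent(100, True),
]
print(packing(opponents, 50, 77))           # 3
print(defenders_beaten(opponents, 50, 77))  # 1
print(packing(opponents, 77, 50))           # 0
```

A real implementation would need two-dimensional positions and some rule for whether the receiver actually controls the ball, neither of which this sketch attempts.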


I'm not that familiar with soccer stats, but this section needs a reply:

> A “shutdown” cornerback like Richard Sherman can be a star in the NFL thanks to interceptions, broken up plays and tackles. “Lock down NBA defenders” like Bruce Bowen and Dennis Rodman can prove their worth with steals, blocks and rebounds. Football has goals and assists, that’s it.

Just as in soccer, those "counting stats" are not great measures of defensive ability. The recently retired cornerback Darrelle Revis was regarded as the best defensive player in the NFL from about 2009-2011. Yet he did not rack up interceptions or pass breakups -- in fact, in 2010, he had 0 interceptions and only 9 pass breakups over 13 games. Why? Because the receivers he was covering were never open, so quarterbacks rarely attempted to pass in his direction.

Similarly, steals, blocks, and rebounds are only a vague indicator of defensive ability; it's never a bad thing to get a steal or a block, but if you routinely leave your man to try to poke the ball away from someone else's man, you're likely hurting your team overall. The NBA has been working on developing better stats, including deflections (you get your hand on a pass but don't necessarily come away with the steal) and shots defended (you're within a short distance of a shooter). But Daryl Morey, general manager of the Houston Rockets and a well-known stats nerd, has said in a Reddit AMA that no publicly available defensive statistic is useful.

Part of this is because all three sports in question are team sports, which means a great deal depends on the defensive scheme. You may not block shots, but is it your job to block shots or to stop the ballhandler from getting near the basket? You don't get tackles, but is it your job to get tackles or to funnel the ballcarrier right into the linebackers?

It's extremely difficult for any outsider to determine the defensive effectiveness of a player. The only thing we can offer is guesswork.


Sounds very similar to the problem of accurately assessing performance of software developers working as a team.

Sure, you have people who write a ton of code, implement a lot of features, and fix a lot of bugs, who clearly are contributing a lot. But you also have people who, through other means (code reviews, refactoring and other code-health work, etc.), ensure that a project is maintainable and sustainable.

How do you measure the value of 100 bugs that never made it to production because of high quality code-reviews? Or those 5 high-value features which were a snap to implement because somebody took the time to clean up all the cruft from Mr Rockstar Bro who made a gigantic mess?


You can begin to measure a player's defensive performance by working out what they allow the _opponent_ to do (or indeed not do), especially in something like soccer with larger play areas and slightly less fluidity of movement. There's a section in the Soccermatics book about an old metric I worked on called PATCH which implemented this quite naively based on defender territory and ball progression, which nevertheless flagged up people like Umtiti before his Barcelona move.

Of course there's also stuff like GoalImpact which just tries to apply plus minus to football, but all the old arguments about football being a low scoring game apply here.
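
For anyone unfamiliar with plus-minus, a minimal sketch of the raw version follows. This is plain on-pitch goal difference per 90 minutes, not GoalImpact's actual method, and the stint tuples are hypothetical.

```python
def plus_minus_per_90(stints):
    """Raw plus-minus: team goal difference while the player was on the pitch,
    scaled to 90 minutes.

    stints: iterable of (minutes, goals_for, goals_against) tuples.
    """
    stints = list(stints)
    minutes = sum(m for m, _, _ in stints)
    if minutes <= 0:
        raise ValueError("player has no minutes")
    goal_diff = sum(gf - ga for _, gf, ga in stints)
    return goal_diff * 90.0 / minutes


# Hypothetical stints: a full match, a 60-minute start, a 30-minute cameo.
print(plus_minus_per_90([(90, 2, 1), (60, 0, 1), (30, 1, 0)]))  # 0.5
```

With so few goals per match, one deflected shot can swing the number, which is the low-scoring objection in a nutshell.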


That was a really interesting read, thanks for sharing it.

> Impect disregards all passes that go backwards or don’t beat any defender. It’s a correct assessment, a pass that doesn’t travel forwards, doesn’t help you score. Can’t argue with that.

This is a really odd statement. Plenty of passes don't go forward but absolutely do increase the likelihood of scoring. For example, the final pass here [0] is backwards, might even allow an additional defender time to track back (-1 packing?) but opens up the space required to take a shot and score.

[0] https://www.youtube.com/watch?v=w0oihU_fLas


The article looks like it's just a generic sports journalist reporting on a novel statistical technique, with cross-language translation added into the mix for extra confusion. I wouldn't take it too literally.

I'd guess that either "Impect disregards all passes that go backwards or don’t beat any defender" would be better phrased as "Impect disregards passes unless they go forward or beat defenders" or the statisticians would defend the original solely in terms of it being a heuristic that makes the problem more tractable.

Additionally, your linked example, with #25 picking up the ball in a danger position after a deflection, is the kind of "penalty box slop" that, naively at least, seems extremely difficult to handle analytically.


Isn’t a pass going backward about avoiding a defender they can’t get past?

Useful backward passes fall into at least two categories. The ones that eventually result in an impect, and the ones that run down the clock at the end of the game.


> Isn’t a pass going backward about avoiding a defender they can’t get past?

Yes, that's basically what I'm saying is bad about the statement that backwards passes don't help you score goals.

> Useful backward passes fall into at least two categories. The ones that eventually result in an impect, and the ones that run down the clock at the end of the game.

I think this is almost correct, but I think there is a third category: backwards passes that immediately result in a shot and a goal. In the example video I posted above, the final pass is backwards and is immediately shot into the net. This doesn't increase impect (i.e., Messi dribbles past 4 players, then shoots and scores past 3 more, is only impect +4, according to the article).


And a third category, whatever Spain was doing against Russia.


> Packing is the measurement of how many opposition players are beaten by a pass (or dribble or other move).

This is what the Italians really do well with their "verticalizzazione" (I guess "verticalization" would be the closest English translation, even though I'm not sure that's a word yet), i.e. deep passing when you're (usually) 30 to 50 meters outside the opposing post. For an excellent example see this YT video (https://www.youtube.com/watch?v=wWr1tTFt7a4) of Sarri's Napoli doing it (Sarri is now Chelsea's coach, that should be interesting).


> The fact that non-mathematicians can produce sharp and correct criticism of an applied mathematician’s work shows us that the success of mathematics does not rest on its abstract beauty. The ability of researchers not versed in the subtleties of mathematics to help develop models contradicts Wigner’s ‘unreasonable effectiveness’ argument. It tells us that people not trained in mathematics are also able to give deep insights to the subject. Mathematics is not there to be discovered, it is part of the patterns of reasoning in all of our brains.

The key insight. It kind of reminds me of how physicists like Robbert Dijkgraaf (I think) have argued that physics and other fields are now contributing to mathematics through their own outlandish needs for weird maths.


I found this article a bit odd; in my mind, some parts seemed to contradict others in strange paradoxical ways.

For example, from your quoted section:

> Mathematics is not there to be discovered, it is part of the patterns of reasoning in all of our brains.

Isn't that the very definition of elegance? He later says, talking about A.J. Ayer's definition of "non-sense":

> Wigner freely admits that his idea about maths comes from a feeling that can’t be verified by known scientific methods

Elegance, while there is often a wide collective understanding of what it means in mathematics, is essentially a subjective, aesthetic property; an intuitive one. Is this not exactly what:

(a) Wigner means in that quote,

(b) A.J. Ayer means in his definition, AND

(c) the author refers to as "patterns of reasoning in all of our brains"?


We like compression. That's our main requirement. We may not like hard-coded constants in our physical theories, but that's only because we'd prefer a single constant that uncompresses to all six. Or a single law about laws that uncompresses to the entire theory. See constructor theory [1].

We also like symmetry but symmetry is just a way to identify attack surfaces for compression.

[1] https://www.edge.org/conversation/david_deutsch-constructor-...


Is this an article or a book teaser?

(DDG !libgen soccermatics)

Ah, looks very good!


There are two editions of Soccermatics, and his new book Outnumbered.


Related research on basketball: https://www.physics.umass.edu/events/2015-09-09-statistics-b... (arXiv: https://arxiv.org/abs/1503.03509)

I had the pleasure of attending the mentioned colloquium by Sid Redner back in 2015 and I was absolutely blown away by the effectiveness of a simple random-walk model.
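
To give a flavour of what a random-walk model looks like, here is a toy sketch. It assumes every scoring event moves the lead by one unit toward either team with equal probability, which is my simplification rather than the model from the talk; the function names are made up for illustration.

```python
import random


def simulate_lead(n_events, rng):
    """Lead of team A after each scoring event, as an unbiased +1/-1 walk."""
    lead = 0
    path = []
    for _ in range(n_events):
        lead += rng.choice((1, -1))
        path.append(lead)
    return path


def count_lead_changes(path):
    """Count how often the lead passes from one team to the other (ties ignored)."""
    changes = 0
    last_sign = 0
    for lead in path:
        sign = (lead > 0) - (lead < 0)
        if sign != 0:
            if last_sign != 0 and sign != last_sign:
                changes += 1
            last_sign = sign
    return changes


rng = random.Random(0)  # fixed seed so runs are repeatable
games = [count_lead_changes(simulate_lead(100, rng)) for _ in range(1000)]
print(sum(games) / len(games))  # average lead changes per simulated game
```

Comparing a histogram of `games` with lead-change counts from real box scores would be one simple way to check how far such a crude model gets you.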


> The ability of researchers not versed in the subtleties of mathematics to help develop models contradicts Wigner’s ‘unreasonable effectiveness’ argument. It tells us that people not trained in mathematics are also able to give deep insights to the subject.

This would seem to be consistent with the notion of "common sense" that it's possible even for an average person to be able to ask meaningful questions and cast doubt on subjects they're not specifically trained in.


Newton developed the mathematics of calculus to solve physics problems. That goes against Wigner’s hypothesis. I believe that it is difficult to pinpoint where inspiration comes from. However, mathematical rigor can often help. You could also argue reductionistically that, since we all live in a physical universe, we are always inspired by something physical.


interesting read. some of the passing diagrams and such are quite interesting. arsenal fan fyi, COYG and FOYS (ef off) spuds fans. with wenger gone arsenal will overtake =)

from a blog post, shows passing networks and has some vector fields overlaid on the soccer pitch. https://www.fourfourtwo.com/features/soccermatics-how-mesut-...


"The unreasonable effectiveness" is the new "considered harmful". Be more creative!


You should write an article called 'The unreasonable effectiveness of "The unreasonable effectiveness" considered harmful'



