I have worked for some pretty arrogant business types who fancied themselves "data driven" but actually knew nothing about statistics. In practice that meant they forced us to run A/B tests for every change, and when the tests nearly always came back statistically insignificant, they would accept the insignificant result if it supported their agenda, or, if it went against their desired outcome, run the test longer until it happened to flip the other way. The whole thing was such a joke. You definitely need some very smart math people to do this in a way that isn't pointless.
This. Ron Kohavi [1] has some excellent resources on this [2]. There is a lot of noise in data that is very often misattributed to 'findings' in the context of A/B testing.
Replication of A/B tests should be much more common in the CRO (conversion rate optimization) industry; it can lead to surprising yet sobering insights into which effects are real.
A/B testing works fine even at a hundred users per day. More visitors mean you can run more tests and detect smaller differences, but that's also a lot of work that smaller sites don't really justify.
Isn't the biggest problem with A/B testing that very few websites even have enough traffic to properly measure statistically significant differences?
Essentially making A/B testing useless for 99.9% of websites.
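To put rough numbers on that, here's a back-of-the-envelope power calculation. The baseline conversion rate (3%), the hoped-for lift (+20% relative, i.e. 3% -> 3.6%), and the alpha/power thresholds are hypothetical, just to illustrate the scale; the formula is the standard two-proportion z-test sample-size approximation.

    # Rough sample-size sketch for a two-proportion A/B test.
    # The 3% baseline and the 3% -> 3.6% lift are hypothetical numbers.
    from statistics import NormalDist

    def visitors_per_variant(p_control, p_variant, alpha=0.05, power=0.80):
        """Approximate visitors needed per variant (standard two-proportion
        z-test sample-size formula, two-sided test)."""
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
        z_beta = NormalDist().inv_cdf(power)           # desired power
        variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
        return (z_alpha + z_beta) ** 2 * variance / (p_control - p_variant) ** 2

    n = visitors_per_variant(0.03, 0.036)
    print(f"~{n:,.0f} visitors per variant")  # roughly 14,000 per arm, ~28,000 total

At a hundred visitors a day split across two variants, that's the better part of a year for a single test, which is why low-traffic sites can realistically only detect fairly large effects.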