
How do you test without a feature flag?



The flag becomes a percentage. The feature isn't always off or always on, it's on for a small fraction of workload, and then that fraction grows as you demonstrate it's safe. Ideally you want metrics that tell you whether the experiment is worse or better than the control group.
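
A minimal sketch of that kind of percentage flag, assuming you bucket by a stable identifier so the same user stays in the same group as the percentage grows (the function and flag names here are invented for illustration):

```python
import hashlib

def flag_enabled(user_id: str, rollout_percent: float, flag_name: str = "new_checkout") -> bool:
    """Deterministically bucket a user into the experiment.

    Hashing (flag_name, user_id) gives each user a stable bucket in
    [0, 100), so raising rollout_percent only ever adds users to the
    "on" group -- nobody flips back and forth between runs.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10000 / 100.0  # stable value in [0, 100)
    return bucket < rollout_percent
```

Seeding the hash with the flag name means two concurrent experiments bucket users independently instead of always picking on the same cohort.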


What you're describing is a common way of using feature flags, except the percentage part comes from how you manage the servers running the binary via config. I.e., on day one, 5% of the servers in the cluster get True for the flag value. Then double the percentage every day until 100%, or roll back if it's a bad cut.
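
That doubling schedule can be sketched in a couple of lines (a minimal helper; the 5% starting point follows the example above, the function name is invented):

```python
def rollout_percent(day: int, start: float = 5.0) -> float:
    """Fraction of servers (0-indexed by day) that get True for the flag,
    doubling daily from the starting percentage and capping at 100."""
    return min(start * (2 ** day), 100.0)

# Day 0: 5%, day 1: 10%, day 2: 20%, day 3: 40%, day 4: 80%, day 5: 100%.
```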


Then rolling forwards and backwards is a whole deployment away, or means mucking about with infrastructure, versus tweaking a percentage flag somewhere.

If you want to get fancy with changes (and I've seen it done), you have something else capable of controlling that percentage setting that is tied into your monitoring. Start out low, say 1% of requests hitting the new path. Automatically ramp up over time to the full 100%. If you see failures, automatically drop back to 0% until it can be ascertained that the failure didn't come from the new code.
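
A sketch of that control loop, assuming monitoring is exposed as a callable returning the current error rate (the class name and parameters are invented for illustration, not any particular system's API):

```python
class AutoRamp:
    """Ramp a percentage flag toward 100%, dropping to 0% on failures."""

    def __init__(self, error_rate, step: float = 1.0, threshold: float = 0.01):
        self.error_rate = error_rate  # callable: current error fraction from monitoring
        self.step = step              # percentage points added per tick
        self.threshold = threshold    # max tolerable error rate
        self.percent = 0.0

    def tick(self) -> float:
        """Called periodically (e.g. once a minute) by a scheduler."""
        if self.error_rate() > self.threshold:
            self.percent = 0.0  # hard rollback until a human rules out the new code
        else:
            self.percent = min(self.percent + self.step, 100.0)
        return self.percent
```

The asymmetry is deliberate: ramp-up is gradual, but rollback snaps straight to 0% because during an incident you want the old path back immediately, not a slow decay.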


Partitioning by instance works if you have enough instances to avoid big increases, but at that point you can just deploy known-good and new-feature builds. Runtime checking helps if it's a lot faster than rolling back to the known-good build, or if you're doing concurrent experiments (you may not have enough instances to try every possible combination).


Having done it both ways this would not be my recommendation unless it's necessary - I think it adds a fair amount of complexity.

Some considerations: you'll need some sort of storage mechanism for these flags. Is that a centralized configuration service for all your services? Maybe just a table in your database? But database/network calls are expensive to add every single time your code executes the path in question, so maybe it makes sense for your service to cache these values locally... but then doesn't that lose part of the purpose of 'fast rollbacks'? Maybe instead of a local cache you spin up a Redis instance, but what if that goes down? Will all your instances default to the same value? Etc, etc, etc.
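
One common compromise between per-call cost and rollback speed is a short-TTL local cache; a minimal sketch, where `fetch` stands in for the hypothetical config-service or database call:

```python
import time

class CachedFlagStore:
    """Cache flag values locally with a short TTL.

    The TTL bounds both the call rate to the backing store and how stale a
    rollback can be: with ttl_seconds=10, an instance serves at most 10
    seconds of traffic on the old value after the flag is flipped.
    """

    def __init__(self, fetch, ttl_seconds: float = 10.0):
        self.fetch = fetch            # e.g. a config-service or DB lookup
        self.ttl = ttl_seconds
        self._cache = {}              # name -> (value, fetched_at)

    def get(self, name: str, default: bool = False) -> bool:
        entry = self._cache.get(name)
        now = time.monotonic()
        if entry is not None and now - entry[1] <= self.ttl:
            return entry[0]           # fresh enough: no network call
        try:
            value = self.fetch(name)
        except Exception:
            # Backing store down: serve the last known value, else the default,
            # so all instances degrade the same predictable way.
            return entry[0] if entry is not None else default
        self._cache[name] = (value, now)
        return value
```

The explicit `default` answers the "will all your instances default to the same value?" question in code rather than leaving it to whatever each call site happens to do.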

I'm not saying this approach is bad, only that it has complexity, and I find I generally can get away without it.


But how do you test the feature without a boolean flag that you can set to enable the feature?


I think it might be less confusing to say: how do you verify the feature? As in: how do engineers, product managers, and designers know that the flagged behavior is correct, and how do you verify that in production if all you do is ramp up? How do you make sure the interested parties are always bucketed into the 'on' arm of the experiment?
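
One common answer to that last question is an allowlist override layered on top of the percentage bucketing; a sketch with invented names:

```python
import hashlib

# Hypothetical allowlist of stakeholders who should always see the new behavior.
ALWAYS_ON = {"pm-alice", "designer-bob"}

def in_experiment(user_id: str, rollout_percent: float) -> bool:
    if user_id in ALWAYS_ON:
        return True  # interested parties are always in the "on" arm
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent
```

If the rollout doubles as an A/B experiment, allowlisted users should be excluded from the metrics so internal traffic doesn't contaminate the comparison.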


You mean unit testing? You can add a way in your framework to force the flag on.
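
For example, with a simple in-process flag registry (names illustrative, not any particular framework's API), a test can pin the flag on and restore it afterward:

```python
from contextlib import contextmanager

_flags = {"new_checkout": False}  # illustrative in-process flag registry

def flag_enabled(name: str) -> bool:
    return _flags.get(name, False)

@contextmanager
def force_flag(name: str, value: bool):
    """Temporarily pin a flag in tests, restoring the old value on exit."""
    old = _flags.get(name)
    _flags[name] = value
    try:
        yield
    finally:
        _flags[name] = old

def test_new_checkout_path():
    with force_flag("new_checkout", True):
        assert flag_enabled("new_checkout")  # exercise the flagged path
    assert not flag_enabled("new_checkout")  # override did not leak
```

The try/finally matters: a test that fails inside the block must still restore the flag, or one failure poisons every test that runs after it.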


Then it's not a feature flag, it's A/B testing.



