Uber open-sources tool to automatically clean up stale code (infoq.com)
142 points by imheretolearn on June 21, 2020 | 63 comments



Piranha, Uber's code cleanup tool, was previously discussed at https://news.ycombinator.com/item?id=23516823.

The paper Piranha: Reducing Feature Flag Debt at Uber is at https://github.com/uber/piranha/blob/master/report.pdf.


We just wrote one that does something similar for TypeScript if anyone wants us to OSS it... The idea is that any stale code causes a HUGE amount of headache and removing it can be a lifesaver.


Piranha author here -- would you be willing to contribute it to Piranha?


@burtonator yes curious what you have or maybe it's a better fit under Piranha?


What does OSS stand for? I know the OS stands for Open Source, but can't quite figure out how this would be used as a verb...


it stands for "open-source software". It's never abbreviated as "OS" for obvious reasons.


Thanks, I guess I'll add "Open Source Software" to my list of unusual yet widely understood verbs.

The English language confuses a native German speaker once again!


To be fair, I am a native English speaker and this is the first time I've seen it used as a verb. "OSS" generally acts as a noun. In this instance I would have used "open source" as the verb for what you could do to your proprietary software, after which it would become OSS.


Other commenters saying it isn't used as a verb are wrong. They're correct that people never say "I open-source softwared my last project."

However you can definitely use "open-source" as a verb: "I open-sourced my last project." It's pretty common. And in this case "OSS" stands for "open-source", not "open-source software".

What's confusing you is that when we use the abbreviation "OSS" we really intend people to read it as "open-source", even though the abbreviation technically means "open-source software". btw, you would never verbally spell out "OSS", chiefly because it's the same number of syllables as "open-source".


I wouldn’t add that to your list of widely understood verbs based on a single HN comment. Any noun can be (mis)used as a verb in modern English, if speaking very informally, and often with a slight tongue-in-cheek humour about the clunky incorrectness of it. Some examples eventually become mainstream (to google something, to text someone, to roadmap it).


I wouldn't call it unusual although I hear FOSS more often.


As a noun I'd agree on it not being unusual, but as a verb it's weird.


If you were to OSS it I would definitely use it!


I'd run something like that over my TS repos for sure!


Can't wait to give it a try


+1 !


Please do!


A similar tool, Vulture, exists for Python: https://github.com/jendrikseipp/vulture

I haven't used it yet myself, but discovered it during a search inspired by this post and thought it would be worth sharing. Definitely a trickier problem to solve for dynamic languages, but looks useful.


I'm surprised to see that Boolean feature flags are common. We almost always ramp up percentages of UUID space or randomly chosen requests, to reduce the blast radius of a bad change.


At $JOB, we have feature flags of innumerable shapes and sizes. Some are based on account standing, some are % gradual rollout at random, others are a more thoughtful low-risk to high-risk rollout across customers and hosts. Some are manually flipped per customer or only on certain dev hosts. Literally anything you can think of, we have tied behavior to it.

But, we've got good frameworks in place such that at the call sites where behavior diverges, it's just checking a boolean.

if PermissionController.get().get(MyPerm.class): doA() else: doB().

I suspect this is pretty common, and it's still easy to do the dead code elimination on.


> if PermissionController.get().get(MyPerm.class): doA() else: doB().

oh, but if a behavior changes when a feature flag is active, there's a very strong case for it to be pluggable behavior strategy, so I like these so much better as an unconditional call to `self.getThingStrategy().execute()`


Won't `getThingStrategy()` do something like `if PermissionController.get().get(MyPerm.class): doA() else: doB()` ?


Yes, but you move the logic for deciding which branch to take into an underlying function. It pollutes the parent function less, but results in more functions overall and more places for logic to hide.

Also, if the logic gets more complicated than just an if statement (if Permission... and date < cutoffdate: etc) you don't further pollute the parent function.


Naah, but at least that would make the decision simple and clear (if condition return doA else return doB).

Better would be for whatever ThingFactory or getThingService instantiates the Thing to make the decision, and compute it up front.

If-else statements in application logic tangle the concepts of "what should be done and why?" with "let me do this Way 1" and "let me do this Way 2". Ideally a typical service (or model or similar) shouldn't be aware that "feature flags" as a concept exist, and this should be regarded as inimical to their encapsulation. It should just know that it delegates a decision to Way N.
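
To make the "compute it up front" idea concrete, here is a minimal Python sketch. All names (`ThingService`, `make_thing_service`, the `my_perm` flag key) are hypothetical and only loosely mirror the pseudocode quoted above; it assumes flags arrive as a plain mapping of booleans.

```python
from typing import Callable, Mapping


def do_a() -> str:
    return "A"


def do_b() -> str:
    return "B"


class ThingService:
    """Knows nothing about flags; it only delegates to the strategy it was given."""

    def __init__(self, strategy: Callable[[], str]) -> None:
        self._strategy = strategy

    def run(self) -> str:
        return self._strategy()


def make_thing_service(flags: Mapping[str, bool]) -> ThingService:
    """The only place that reads the flag; the decision is made once, up front."""
    strategy = do_a if flags.get("my_perm", False) else do_b
    return ThingService(strategy)
```

With this shape, `make_thing_service({"my_perm": True}).run()` takes Way A, and cleaning up the flag later only touches the factory.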


That sounds a lot like enterprisey Java programming. I'm not always thrilled to work with the results of that. :-)


Many feature flag systems do that ramp-up on the config server side. So when your client requests feature flags, it gets assigned true/false on that basis.


How do you test without a feature flag?


The flag becomes a percentage. The feature isn't always off or always on, it's on for a small fraction of workload, and then that fraction grows as you demonstrate it's safe. Ideally you want metrics that tell you whether the experiment is worse or better than the control group.
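
One common way to implement a percentage flag is to hash a stable ID into a bucket, so the same user or request always lands on the same side while the fraction grows. A minimal sketch, assuming SHA-256 bucketing and hypothetical function names (nothing here is taken from a specific flag system):

```python
import hashlib


def bucket(flag_name: str, unit_id: str) -> float:
    """Map (flag, id) to a stable value in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{unit_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then scale to a percentage.
    value = int.from_bytes(digest[:8], "big")
    return (value % 10_000) / 100.0


def is_enabled(flag_name: str, unit_id: str, rollout_percent: float) -> bool:
    """True if this id falls inside the currently enabled fraction."""
    return bucket(flag_name, unit_id) < rollout_percent
```

Because the bucket is fixed per ID, raising `rollout_percent` from 5 to 10 only adds IDs to the enabled group; nobody who already had the feature loses it.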


What you're describing is a common way of using feature flags, except the percentage part comes from how you manage the servers running the binary with config. I.e. on day one, 5% of servers in the cluster get True for the flag value. Then double the percentage every day until 100%, or otherwise roll back if it's a bad cut.


Then rolling forwards and backwards is a whole deployment away, or mucking about with infrastructure, vs tweaking a percentage flag somewhere.

If you want to get fancy with changes (and I've seen it done) you have something else capable of controlling that percentage setting that is tied in to your monitoring. Start out low, say 1% of requests hitting the new path. Automatically ramp up over time to full 100%. If you see failures, automatically drop back to 0% until it can be ascertained that the failure didn't come from the new code.
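
The automatic ramp described above could be sketched roughly like this. The class name, the doubling schedule, and the error-rate threshold are all assumptions for illustration; a real setup would be wired into whatever monitoring you have.

```python
class RampController:
    """Doubles the rollout percentage on healthy checks, drops to 0 on failure."""

    def __init__(self, start_percent: float = 1.0, error_threshold: float = 0.01) -> None:
        self.percent = start_percent
        self.error_threshold = error_threshold
        self.halted = False

    def observe(self, new_path_error_rate: float) -> float:
        """Feed in the latest error rate for the new path; returns the new percentage."""
        if self.halted:
            return self.percent
        if new_path_error_rate > self.error_threshold:
            # Failure seen: drop back to 0% and stay there until someone resets it.
            self.percent = 0.0
            self.halted = True
        else:
            self.percent = min(100.0, self.percent * 2)
        return self.percent

    def reset(self, start_percent: float = 1.0) -> None:
        """Resume ramping once the failure has been ruled out or fixed."""
        self.percent = start_percent
        self.halted = False
```

The `halted` state models the "until it can be ascertained that the failure didn't come from the new code" part: the controller will not ramp again on its own.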


Partitioning by instance works if you have enough instances to avoid big increases, but at that point you can just deploy known-good and new-feature builds. Runtime checking helps if it's a lot faster than rolling back to the known-good build, or if you're doing concurrent experiments (you may not have enough instances to try every possible combination).


Having done it both ways this would not be my recommendation unless it's necessary - I think it adds a fair amount of complexity.

Some considerations: you'll need some sort of storage mechanism for these flags - is that a centralized configuration service for all your services? Maybe just a table in your database? But database / network calls are expensive to be adding to every single time your code executes the path in question - maybe it makes sense for your service to cache these values locally...but then doesn't that lose part of the purpose of 'fast rollbacks'? Maybe instead of a local cache you spin up a redis instance - but what if this goes down? Will all your instances default to the same value? Etc, etc, etc.

I'm not saying this approach is bad, only that it has complexity, and I find I generally can get away without it.
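
For what it's worth, the local-cache trade-off mentioned above looks something like the following sketch. `CachedFlags` and its parameters are hypothetical; `fetch` stands in for whatever call reaches your config service, database table, or redis instance.

```python
import time
from typing import Callable, Dict, Tuple


class CachedFlags:
    """Caches flag values locally for ttl_seconds and survives store outages."""

    def __init__(
        self,
        fetch: Callable[[str], bool],
        ttl_seconds: float = 30.0,
        default: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._default = default
        self._clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}

    def get(self, name: str) -> bool:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]
        try:
            value = self._fetch(name)
        except Exception:
            # Store is unreachable: prefer the last known value, else the default.
            return hit[0] if hit is not None else self._default
        self._cache[name] = (value, now)
        return value
```

The TTL is exactly the "fast rollback" cost: a flag flipped off can stay on for up to `ttl_seconds` on any given instance, and the explicit `default` answers the "what if the store goes down" question for instances that never fetched a value.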


But how do you test the feature without a boolean flag that you can set to enable the feature?


I think it might be less confusing to say, how do you verify the feature? As in: how do engineers and product managers and designers know that the flagged behavior is correct and how do you verify that in production if all you do is ramp up? How do you make sure the interested parties are always bucketed into the on experiment?


You mean unit testing? You can add a way in your framework to force the flag on.
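
A rough sketch of what "force the flag on" could look like in a test helper. The names `flag_enabled` and `force_flag` are made up for the example, and the normal lookup is stubbed out as always-off.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

_overrides: Dict[str, bool] = {}


def flag_enabled(name: str) -> bool:
    """Overrides win; otherwise fall back to the normal lookup."""
    if name in _overrides:
        return _overrides[name]
    return False  # stand-in for the real percentage/config lookup


@contextmanager
def force_flag(name: str, value: bool) -> Iterator[None]:
    """Temporarily pin a flag to a value, restoring the previous state on exit."""
    missing = object()
    previous = _overrides.get(name, missing)
    _overrides[name] = value
    try:
        yield
    finally:
        if previous is missing:
            _overrides.pop(name, None)
        else:
            _overrides[name] = previous
```

In a test you would wrap the code under test in `with force_flag("new_checkout", True): ...`, which also covers the "interested parties" case if the same override can be keyed on a user.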


Then this is not a feature flag, it's A/B testing.


The latest place I work has a dusty monolith full of skeletons and zombies. They’ve been migrating off it to micro services for like 5 years now.

It’d be nice to have a tool to clean it up. But the use case is slightly different than this one.


Can you elaborate further? It will be interesting to see other use cases for code cleanup.


I like the concept of using an analyzer like this, but I've been looking for a tool that removes code that isn't stale per se but hasn't been executed for, say, x amount of time.

We have quite a few online services in production which I'm sure have a lot of code that isn't executed/touched by our users.

As an example I'm thinking about code in if statements that is never reached because the condition always evaluates to False. The tools discussed here aren't capable of detecting such stale code.

Any suggestions?


Do you already know which code is not executed? If not, dynamic program analyses can help identify the regions of code that are untouched (e.g., take a look at javaagent for Java programs). Subsequently, you can use some form of reachability analysis to determine which code blocks to delete without causing a compilation failure.


Is this drastically different than https://unused.codes/ ?


From a quick reading, yes. Your tool identifies code that is never run. The featured tool identifies code that never _will be_ run if we make X change to the API.


oh interesting, thanks for the clarification!


For Piranha: a) code related to stale flags is deleted, b) staleness is determined from the status of the feature, c) a patch is created automatically and, in a majority of cases, compiles and passes tests.

Based on my understanding of unused.codes: a) it is used to delete dead code independent of features, b) dead code is determined from its usage in tests, c) it is unclear whether the code is flagged for deletion or a patch is created.
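
To show the general idea of deleting code tied to a stale flag, here is a small Python sketch using the standard `ast` module (Python 3.9+ for `ast.unparse`). This is not how Piranha itself is implemented; the `flag_enabled("...")` call shape and all names below are assumptions made for the example.

```python
import ast


class StaleFlagRemover(ast.NodeTransformer):
    """Inline `if flag_enabled("<flag>")` using the flag's final value."""

    def __init__(self, flag: str, final_value: bool) -> None:
        self.flag = flag
        self.final_value = final_value

    def _is_flag_check(self, test: ast.expr) -> bool:
        return (
            isinstance(test, ast.Call)
            and isinstance(test.func, ast.Name)
            and test.func.id == "flag_enabled"
            and len(test.args) == 1
            and isinstance(test.args[0], ast.Constant)
            and test.args[0].value == self.flag
        )

    def visit_If(self, node: ast.If):
        self.generic_visit(node)  # clean up nested ifs first
        if not self._is_flag_check(node.test):
            return node
        kept = node.body if self.final_value else node.orelse
        # Returning a list splices the kept statements in place of the `if`;
        # fall back to `pass` so an emptied block stays syntactically valid.
        return kept or ast.Pass()


def remove_stale_flag(source: str, flag: str, final_value: bool) -> str:
    tree = StaleFlagRemover(flag, final_value).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Given an `if flag_enabled("new_ui"): render_new() else: render_old()` block, `remove_stale_flag(src, "new_ui", True)` leaves just `render_new()`. A real tool also has to handle negations, flag values stored in variables, and cleaning up the flag definition itself.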


lol, I just thought about writing a script that does exactly that for JavaScript code. For easy implementation I thought about using annotations/comments that tell the script what to delete once a feature switch is being deleted.
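
A comment-annotation approach like that can be very small. The sketch below is written in Python and operates on JavaScript source as text; the `// remove-with:<flag>` and `// end-remove-with:<flag>` marker syntax is invented for the example.

```python
from typing import List


def strip_flag_blocks(source: str, flag: str) -> str:
    """Remove every region between the begin and end markers for `flag`."""
    begin = f"// remove-with:{flag}"
    end = f"// end-remove-with:{flag}"
    kept: List[str] = []
    skipping = False
    for line in source.splitlines():
        marker = line.strip()
        if marker == begin:
            skipping = True
        elif marker == end and skipping:
            skipping = False
        elif not skipping:
            kept.append(line)
    if skipping:
        raise ValueError(f"unterminated block for flag {flag!r}")
    return "\n".join(kept)
```

The obvious downside compared to an analyzer is that someone has to write and maintain the markers, and nothing checks that the annotated region is actually the right code to delete.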


Total nitpick, but I think Piranha is a totally wrong name here. Piranhas mostly feed on fresh meat; it's known that they get excited and eager to attack when a victim moves in the water, which reassures the predator that it's healthy meat. A much better name would be Scavenger, or Vulture, a bird of prey that feeds on dead meat.

I assume you would rather clean the dead code out of your program than strip it down to the bone, live functions and all :)


https://en.wikipedia.org/wiki/Maggot_therapy might be a better analogy - you don't want to lose everything, just to clean out the necrotic bits.


As far as I know that aspect of Piranhas is overhyped in popular culture, and they're pretty much omnivores, hunting fresh prey, scavenging on carcasses and eating plant matter. Agreed though that naming it after a pure scavenger would make more sense.


Wikipedia (and the sources listed) contradict this comment, saying Piranhas are vicious carnivores, with some even exhibiting cannibal behavior.

https://pt.m.wikipedia.org/wiki/Piranha


English: https://en.wikipedia.org/wiki/Piranha

> Although generally described as highly predatory and primarily feeding on fish, piranha diets vary extensively, leading to their classification as omnivorous. In addition to fish (occasionally even their own species), documented food items for piranhas include other vertebrates (mammals, birds, reptiles), invertebrates (insects, crustaceans), fruits, seeds, leaves and detritus. The diet often shifts with age and size.

> In another study of more than 250 Serrasalmus rhombeus at Ji-Paraná (Machado) River, 75% to 81% (depending on season) of the stomach content was fish, but about 10% was fruits or seeds


Piranhas are scavengers.

The typical diet of red-bellied piranhas includes insects, worms, crustaceans, and fish.[13] In packs up to hundreds, piranhas have been known to feed on animals as large as egrets or capybara. Despite the piranha's reputation as a dangerous carnivore, it is actually primarily a scavenger and forager, and will mainly eat plants and insects during the rainy season when food is abundant. ~ Wikipedia.


Could also change it to ¬Piranha


/jokes

How does Piranha clean itself?


I find it strange that a company which provides taxi hailing/routing technology felt the need to write a code cleanup tool.


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html


A company with billions in revenue that does live routing updates around traffic, has complex machine learning models predicting ride times and costs, and operates in something like 65 countries.

Uber isn’t trivial


I didn’t know they did their own navigation, I assumed they would use one of the map services


Those services charge corporate users. If Uber develops nothing homegrown then it is subject to the whims of the major map services, mostly Google.

Most drivers use a 3rd party app in practice but Uber probably needs to run mapping to avoid a source of weakness/cost.

(Could be totally wrong about incentive structure)


I thought between Google, VLS, TomTom, and I can't imagine that I just coughed up a complete list, there'd be enough competition to just procure this function competitively. I assumed it was more the Not Invented Here syndrome of a VC-funded company that (until recently) had little of a cap on its engineering spend.


You can make most of the Big N sound silly like this. Amazon? Basically a warehouse. Netflix, YouTube? They just stream video. Facebook, Twitter? CRUD websites.

It all sounds like anyone can put something like that together, but try scaling up to billions of users.


Exactly. And each one of these companies has built its business around a key idea. I interview a lot of data scientist candidates, and one of my favorite questions is what makes one of these companies what they are.


For sure, if you remove the scalability challenges and profit optimization techniques, these websites aren't that exciting for engineers.


Uber's article [1] linked on the page mentions it at the start:

> These nonfunctional feature flags represent technical debt, making it difficult for developers to work on the codebase, and can bloat our apps, requiring unnecessary operations that impact performance for the end user and potentially impact overall app reliability.

> Removing this debt can be time-intensive for our engineers, preventing them from working on newer features.

[1] https://eng.uber.com/piranha/



