Hacker News
Python at Scale: Strict Modules (instagram-engineering.com)
225 points by rbanffy on Dec 18, 2020 | 102 comments




> 90. Consider Static Analysis via typing to Obviate Bugs

Personally I wouldn't recommend using typings (we tried them around 1 year ago for multiple greenfield services, ~5k LOC each):

- you will be better off investing in proper testing (I mean code coverage here), refactoring, and onboarding documentation/handholding sessions with devs who are on top of the codebase

- in practice annotations make code LESS readable, not more (unless variable names and general code structure and conventions are bad)

- remember most Python devs have little experience with statically typed languages, so in practice your team will be skipping annotations - e.g. using Dict[str, Any], which defeats the whole purpose of annotations; actually even seasoned C++ devs will skip them once they see they can get away with it

Typings are still at version 0.x and constantly changing - do it for fun or to put typings on your CV, but don't expect it to help with readability, bug prevention or documentation. Those are solved with proper structure, conventions, testing and developer prep (readme + pairing).

If you believe bigger projects need annotations, tell me why Sentry didn't go with them yet?


> in practice annotations make code LESS readable, not more

Definitely not true, unless you are writing C++ in Python.

> so in practice your team will be skipping annotations - e.g using Dict[str, Any], which defeats the whole purpose of annotations

Using “Dict[str, Any]” doesn’t defeat the whole point of using annotations at all, and adds useful documentation for callers and IDEs. It can also be gradually improved later.

> Typings are still in 0.x version and are constantly changing

Typing is part of the standard library. Some new things are added, but the interface is very stable. Mypy is constantly changing.

> These are solved with proper structure, conventions, testing and developer prep (readme + pairing).

And with some degree of type checking/type annotations.

> If you believe bigger projects need annotations tell me why Sentry didn't go with them yet?

What an odd point to make.


> and adds useful documentation for callers and IDEs.

THIS. This is why I started using them, along with standardizing my docstrings. Enormously helpful, when you're trying to remember what a function is expecting, to just hover over it and find out.


Yes, they are probably most useful to me for the IDE completions and docs as well. Raymond Hettinger gave at least one talk [1] where he discussed the tradeoffs: reasons why you might use them and reasons why you might not. At the end of the day you can pass whatever you want to a function, and the only reason to use types is if they make your code better; but there are other tools that might be more worth your time in some situations.

[1]: https://www.youtube.com/watch?v=ARKbfWk4Xyw


I see your points, and I went through pretty much all of them when deciding whether to go with typings or not (eventually we did, because I wanted to try it out and because of pressure from Java people who found it odd that we don't care about types - actually we do, but instead of types we have strict naming conventions).

The point with Sentry was that when making a decision in a field you don't have much experience in, you look at what the other guys with resources are doing, since they have had these conversations already.

IMO in practice you will realize that in most cases typings are more of a burden than a help - devs don't know typings and will spend a lot of time thinking about how to annotate, making constant decisions about whether to annotate or not (aka incremental typing). That inconsistency drives me nuts - aka defeats the whole purpose - you do something right or don't do it at all, right?

I see the point about documentation; yes, it is nice to have, but more important, like I said before, is a proper onboarding readme plus docs, and the willingness of the other devs working on the codebase to handhold when necessary.


> you do something right or don’t do it at all

Down that path lies madness. The Boy Scout rule is to leave the campground cleaner than you found it. It doesn't say the campground has to be spotless.


Great analogy.

I see how this sort of attitude can be seen as mad/neurotic. However, the good side is that it prevents adding more complexity (rules for when to annotate or not annotate are themselves added complexity).


>> Using “Dict[str, Any]” doesn’t defeat the whole point

I think you’ve misunderstood the parent.


Python allows gradual typing everywhere. Adding “Dict[str, Any]” gives you value and lets you catch bugs. Not as much as a more specific signature would, but far more than having no annotations.


> Adding “Dict[str, Any]” gives you value and lets you catch bugs

What types of bugs are you speaking of? Can you share an example?

Remember the constructive feedback method - what, why and how.


The classic bugs are passing the wrong type of argument and calling a method on None.

PyCharm did static analysis and caught a lot of these, but it wasn't always that obvious. With a type hint, it's easy for the IDE to spot things. It's easy to integrate mypy into the CI process, etc.

We inherited a py2.7 codebase, written by one dev. It uses a library written also by that dev. We had absolutely no idea what worked how. Sure, looking at the code gave us a very rough big picture, but adding typing helps with the details.


> We inherited a py2.7 codebase, written by one dev. It uses a library written also by that dev. We had absolutely no idea what worked how. Sure, looking at the code gave us a very rough big picture, but adding typing helps with the details.

I see how this could be helpful for a legacy codebase - add typings, run mypy and see if it breaks. Good point - thanks.

How many lines of code did that inherited codebase have, btw?


Quite small in LOC, less than 10K. (A horribly non-idiomatic custom ETL lib, and a big ball of MVC stuff using pyramid and sqlalchemy/zope.transaction.)


e.g. say you're writing some code that does something like this:

    account = queryDatabase(email)
    updatedUser = update(account)
    ping(account)

queryDatabase returns a dictionary. Later on, someone decides that a single username can have multiple accounts, and changes queryDatabase to return a list of dictionaries instead of a single dictionary. Now the above code is broken, since update expects a single dictionary. Of course, you could grep through the whole codebase to try to find all the problems, but it would be easier and less prone to human error to rely on the type system to fail, especially if the change were something less contrived, and if changes like this were happening all the time.

The effect of this could be limited by other means (e.g. naming the function better) but type systems are just another way of adding robustness to a program.
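A minimal sketch of that failure mode with annotations added (the function names and fields here are hypothetical, echoing the example above):

```python
from typing import Any, Dict, List

# Hypothetical reconstruction: queryDatabase originally returned one
# account dict, then was changed to return a list of account dicts.
def query_database(email: str) -> List[Dict[str, Any]]:
    return [{"email": email, "id": 1}]

def update(account: Dict[str, Any]) -> Dict[str, Any]:
    # still expects a single account dict
    return {**account, "updated": True}

accounts = query_database("user@example.com")
# With the annotations in place, mypy flags `update(accounts)` as an
# incompatible argument type before the code ever runs; without them
# the mismatch only surfaces at runtime.
updated = update(accounts[0])
```

The point is that the checker turns the silent signature change into a static error at every call site, instead of leaving grep as the only tool.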


You’re asking how type annotations can help you catch bugs. I’m not particularly inclined to synthesise a bunch of examples (especially after the highly condescending initial wording of your comment), it seems fairly obvious and I’m sure you’re smart enough to figure it out.


Can you show me this "initial highly condescending wording of your comment"?


I think we’ll leave this thread here, but if in the future you need to ask something from someone in good faith I would work on refining your wording a bit.


> I’m sure you’re smart enough to figure it out

You are making it personal (well, actually trying to insult me), where I was asking you to provide more value.


That makes two of us. What was the parent trying to say, then?


I recently converted a medium-size project from Python 2 to 3 by first adding type annotations to everything, then getting mypy to pass with both versions of Python, and then fixing up the remaining issues in the unit tests. The functionality of the code depends a lot on external services that are accessed via HTTP/subprocesses. This makes it pretty hard to test well; at best your tests end up exercising an inaccurate partial facsimile of the external service.

Overall I didn't really love mypy; the type system it implements seems relatively simplistic, and it was frequently easier to change the code than to work out how to write working annotations for the existing code. The lack of support for decorated property setters in particular was a big problem; this particular codebase makes extensive use of those to annotate methods that require mutable access to the underlying data. Nevertheless, I'm convinced that this approach to the migration was much faster and less painful than trying to figure out all the string/bytes differences without any tooling support except the tests. And post-migration the code seems overall somewhat higher quality than at the start.

On balance, I think the lesson was actually that Python isn't a great choice for this kind of project. When you depend on external services, the ability to unit test everything is greatly diminished, but the ability to make type assertions is not. So for correctness you want the latter to do as much lifting as possible. That suggests that other languages such as TypeScript (if you still want a language in vaguely the same category as Python) or Rust (if you don't), which come with more expressive, better-integrated type systems, might be superior options.


What version of Py3 were you using? There have been a lot of additions to type annotations in recent releases that have made them better.

I'm not sure what you mean by lack of support for decorated property setters? Can't you annotate them like any other method?


Mypy doesn't like function attributes, https://github.com/python/mypy/issues/2087


> you will be better off investing into proper testing (I mean here code coverage)

Type safety and tests are completely orthogonal issues. Gary Bernhardt explains this really well:

https://www.destroyallsoftware.com/talks/ideology


That is really on point, thanks!

ps. watched the presentation - the idea here is that static type checking can eliminate entire categories of possible bugs which are impossible to eliminate with tests - for example, enforcing that variables are of a specific type only.

It doesn't work that well when typings are made optional in the most bug-prone places.


I'm actually quite curious as to where annotations make things less readable. Do you have any examples? I can imagine that if the annotations are wrong and somehow pass checks this would be true - which is a possibility; I haven't worked with type annotations in Python at all.


In our case the biggest problem arose with pandas, where each dataframe transform would need its own type. When there are a lot of transforms it is more readable to infer what type a variable has just by looking at the transforms (more or less), or, when you need to be precise, to step through the code with ipdb/jupyter etc.

What about incremental typing? That's what we did, and to be honest I wouldn't do it again. Yes, it's nice to see what type a function is returning and which arguments it is accepting, but if the name is shit the type won't tell you much - you are better off spending more time figuring out a better naming convention (prepend dataframes with df_, for example) and keeping at it.

Another problem with incremental typing is that now, of course, you will have to decide whether to annotate or not, and when reading code you'll be wondering why this is annotated and that is not.


Right. Similar problem to TypeScript - generally its best users are those already sold on static types; for many others the temptation of <any> and friends is too much, and interacting with untyped/poorly typed libraries plus poor type inference can be really frustrating. Although in that case it's still a massive gain for new projects, in my experience.


Yet another solution is going back to one of the fundamental principles of message-based OO: leaving the interpretation of the objects they receive, and how to handle them, to the objects themselves.

I've always disliked the idea of using static code-evaluation tools on dynamic object-oriented languages; it kind of defeats the point, which is runtime dynamism.


That's not the solution, it's the problem. A large system is impossible to understand when state could be anywhere and any part could be coupled to any other part.

I think you're partly right, but ultimately runtime dynamism has failed; even in Python, the costs of being truly dynamic are higher than the benefits.


The intention of that proposed solution is radical decoupling, though. Think of objects being used analogously to network servers. The internet doesn't go down because someone's server changes (the BEAM VM probably comes closest as far as languages are concerned).

Objects communicate messages and status and when something crashes it's up to the object to do something, often restart, but either way it's a matter of interpretation.

It's the static typing that couples because it introduces global behaviour through its types, which impose meaning from the top. In a truly dynamic system, nobody needs to understand the entire system.

I admit it's a fairly theoretical point, because I don't think many people even embrace this paradigm any more, and ironically, if they do, it's in functional languages. But to say it has failed I think is not right, because nowadays people write dynamic languages basically like they'd write Java; they don't embrace the dynamic aspects.


> It's the static typing that couples because it introduces global behaviour through its types, which impose meaning from the top.

I don't think that's true; static typing on a local scale or at interface boundaries is useful precisely because it enables local reasoning. Without those constraints you have no chance of understanding any part of the system without understanding the whole thing. If you go so far as to expose all state manipulation in the type system then that becomes sort of global, but even then I'd argue what you're actually doing is not introducing global coupling but rather exposing the global coupling that was already there.

> But to say it's failed I think is not right, because nowadays people write dynamic languages basically like they'd write Java, they don't embrace the dynamic aspects.

I think that very fact is the proof that it has failed; even in languages that were designed for it, people prefer not to follow that paradigm.


My personal anecdotal experience:

3 weeks ago I had a new pet project, and decided to go with Python again, having left it many years ago, before Python typing was a thing.

I didn't know where to start, so I started by creating a file with the data models I would need - basically C struct-like / record models. All the classes inherited from NamedTuple, and I typed all the attributes. I then created a few pure functions based on those classes. NewType is a great addition: when you work with many int or float variables representing different things in real life (price, amount, quantities, etc.), it makes the code cleaner and catches some mistakes when you are tired at the end of the day.
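A small sketch of that pattern (the domain names here are made up, not from the comment): distinct NewTypes let mypy catch a Price passed where a Quantity belongs, while at runtime they stay plain float/int.

```python
from typing import NamedTuple, NewType

# Hypothetical domain types: zero runtime cost, static distinction.
Price = NewType("Price", float)
Quantity = NewType("Quantity", int)

class OrderLine(NamedTuple):
    sku: str
    price: Price
    quantity: Quantity

def line_total(line: OrderLine) -> float:
    return line.price * line.quantity

line = OrderLine(sku="ABC-1", price=Price(9.99), quantity=Quantity(3))
total = line_total(line)
# OrderLine(sku="ABC-1", price=Quantity(3), quantity=Price(9.99))
# would be flagged by mypy, though it runs fine without a checker.
```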

I gradually stopped typing things when I introduced pandas into the project and the types started to be way more abstract. There is probably a way to make typing work with pandas, but I wanted to learn pandas first and get the project running, so I didn't really try. I expected pandas to do dynamic things under the hood, so I didn't bother to understand how types work with it.

Of course, being a hobby project, there is no unit test and no documentation, so I may not see all the pros and cons of typing. But I have a feeling gradual typing is an interesting idea that will continue to improve.

It's nice to have the support of static typing when you need to, but also be able to "just hack" a solution real quick when needed.


> there is probably a way to make typing work with pandas

So, we used a lot of pandas as well. The way to make it work is to create custom types, and of course each dataframe will have its own type, which is going to be a total mess.


Typing pandas seems like an R&D-level problem, as it seems quite close to tricky areas like row types / dependent types that mostly only work in theory.

Attempting manual workarounds like pure nominal typing for it, as you describe, goes against the grain of the type system and pushes a lot of work onto the user. We just... don't. We stick with the class (pd.DataFrame), and the one extension we are considering is the Index, while getting into actual columns gets into a mess quite quickly.

I tried to engage some PL researchers on this a while back, but no go. IMO it would be a great project for type-system research students.


Not exactly this, but I've had some success using logic programming (minikanren) to statically assert certain facts about dataframes in spark. E.g. "if Row_a is filtered to Row_b, and Row_a is not null, Row_b is also not null".


> in practice annotations make code LESS readible, not more

Agree, and disagree. I think it depends on lots of things, including how "heavy" you are with type annotations - I think annotating everything (or mandating it), including inline variable definitions, is over-the-top and usually makes the code harder to read. Likewise, chasing down every minor type error in the corner cases can end up wasting loads of time for almost no gain.

I also won't argue that sometimes the type definitions are excessively verbose and get in the way of reading code, and your point about less experienced devs is also true - for developers who aren't so familiar with type annotations, at worst, they make it harder to understand.

But carefully used, they can really aid understanding of a function or codebase. We have a large, somewhat messy open source academic codebase where the input to a function could come from dynamic dispatch some ~5 layers away, and usually the first task in diagnosing a problem is working out what you've actually been passed. Type annotations help solve this problem, as somewhat statically verifiable documentation of inputs. We've found bugs using them.

It's probably not controversial to say that expected types/inputs aren't always obvious from the context of a function, and so need to be documented to help people using the code. The alternatives to annotations are: documenting types in docstrings, which defines them but in a decoupled, non-statically-checkable way, so they easily drift out of sync when developers are undisciplined; naming (hard, and messy for compound types); pyi files (checkable, and they keep annotations out of the way, but they decouple definitions from code); or type comments, which solve some of the readability and checkability issues but are still harder to keep up to date than inline annotations.

So, overall I like them and have seen them bring value to the code I work with, even if that's just documentation, but I don't think it's a black-and-white issue. I've certainly eschewed them when helping less experienced developers write small personal scripts, but I use them in my own code, or when refactoring something vaguely defined, to cross-check my assumptions.


> remember most Python devs have little experience with statically typed languages, so in practice your team will be skipping annotations - e.g. using Dict[str, Any], which defeats the whole purpose of annotations

Using mypy or Pylance gives benefits even without annotations, and while you can often do better than Dict[str, Any], rough-cut annotations like that often, IME, provide value. (For one, they'll catch paths that would allow either the dict or a key in it to be None when that isn't expected.)
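A hedged illustration of that None-path point (function names and config keys are invented for the example): even the rough Dict[str, Any] annotation forces callers to handle the missing case.

```python
from typing import Any, Dict, Optional

# Hypothetical loader: returns None when the config is absent.
def load_config(path: str) -> Optional[Dict[str, Any]]:
    return {"debug": True} if path == "app.cfg" else None

def apply_config(cfg: Dict[str, Any]) -> bool:
    # rough-cut annotation: values are Any, but cfg itself can't be None
    return bool(cfg.get("debug"))

cfg = load_config("app.cfg")
# mypy rejects a bare `apply_config(cfg)` here, since cfg may be None;
# the annotation forces the explicit check below.
enabled = apply_config(cfg) if cfg is not None else False
```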


Are there statically typed languages that do support reloading parts of the codebase?

Type hints in an interpreted language just don't seem to make much sense to me. As in, why use a dynamically typed language when you actually want a statically typed one? What are the benefits of staying in between the two?

Besides of course a legacy code base...


> If you believe bigger projects need annotations tell me why Sentry didn't go with them yet?

Because we just came off of Python 2 and typing under Python 2 is awful.


I couldn’t tell: is __strict__ something they’ve implemented in-house, or does it exist in CPython? I can’t find any references to it outside that article.


From the last paragraph:

> Strict modules are still experimental. We have a working prototype and are in the early stages of rolling it out in production.

Sounds like it's in-house to me.


Yeah, that was my impression, as well.

Does anyone know whether they've released the code of their prototype somewhere?


"Many thanks to Dino Viehland and Shiyu Wang, who implemented strict modules and contributed to this post."


It's a fork of the CPython interpreter, right?


How can one learn about advanced Python concepts like this - slots, metaprogramming, etc.? Is there a good book for this?


Serious Python is a short, quick overview of some higher-level concepts I wasn't aware of before. I'm working through High Performance Python at the moment, and it's a much more in-depth treatment of similar concepts.

https://nostarch.com/seriouspython

https://www.amazon.com/High-Performance-Python-Performant-Pr...


Effective Python is one of my favorite intermediate-level books on the subject https://effectivepython.com/


"Fluent Python" qualifies, iirc.


The Python manuals, actually. Or, if you want to dive into the ideas that Python copies from, Lisp books like "The Art of the Metaobject Protocol".


This was a great article that I learned a lot from. One part stood out a bit though.

    def myview(request):
        SomeClass.id = request.GET.get("id")
Is this pattern common in Python projects? A coworker and I were discussing solutions to a tricky problem the other day and one solution that involved a step like this was immediately dismissed as likely to cause a huge mess. We're far from world class Python developers and would definitely struggle with the other issues outlined in the post, but this one seems pretty clear.


If you're careful, global state is useful. It provides a sort of tele-interface - tunneling a value from the top-level scope down to the bottom where it is used.

Imagine a larger CLI application. You get all of the command line flags at once at the beginning, but some of them are used at the bottom of the implementation. Passing them all across all the layers gets tedious so you would invent some sort of "context" object passed through your functions. Another example is Flask's global request object.

If done correctly, it may be much cleaner than passing the values through the stack. You must not overuse it, and you have to think about thread safety. It's useful to ensure the values are set once and then immutable, to avoid surprises, and you must ensure the global value is cleaned up when no longer valid.
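One concurrency-safe way to do this kind of tunneling in modern Python is contextvars (a sketch under assumed names; the flag is hypothetical, not from the comment):

```python
import contextvars

# A hypothetical CLI flag: set once near the entry point, read at the
# bottom of the call stack without being threaded through every layer.
verbose = contextvars.ContextVar("verbose", default=False)

def low_level_step() -> str:
    # deep in the implementation, no parameter plumbing needed
    return "detailed output" if verbose.get() else "summary"

def main(args_verbose: bool) -> str:
    verbose.set(args_verbose)  # set once at the top level
    return low_level_step()
```

Unlike a module-level global, each thread or asyncio task sees its own value, which addresses the thread-safety concern above.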


Of course global state is useful. All interactions with the universe (all I/O, anything with time) involve global state. So what makes DBs a smart pattern but global variables usually bad?

It gets hairy whenever you have "behind your back" mutation within a single scope. Because db calls are costly, people usually batch the transaction and pass the result to mostly pure functions. If you were to do something silly like wrap a Redis get/set in a Mapping, it'd feel an awful lot like a global dict.

This is why things like transactions, event sourcing, and the reactor pattern exist. You batch all I/O to the global state so that you a) deal with immutable structures everywhere else, b) have ACID, c) have traceability.


Yes. And also transactions are interesting because of their explicit nature. Including STM. They lead you to think at least a bit about all the concurrent traffic.


Unless there was a great case for it, it would be dismissed as you said.

I had a reasonably legitimate use case for this pattern recently, wherein there is a singleton, but this singleton acts as a kind of metaclass interface for an instantiator under the hood.

The user would inherit from a base class, define the required abstract methods, and create a singleton for use in decorations. The invocation of this singleton spawns an instance of the internal engine and gives it access to the custom functionality the user defined in the abstract methods, by assigning the bound methods to the internal instance.


IMHO the cleaner way to achieve this is some sort of message/reactor pattern, passing data to some in-process event-sourcing store; that way you don't have things altering globals willy-nilly.

However, I recently needed to just yeet some large data objects across two contexts. The solution I used was the Borg pattern with a Mapping interface and with an RLock around getitem/setitem.

It feels surprisingly clean. It behaves much like an external message queue, which is what it was replacing.

https://github.com/faif/python-patterns/blob/master/patterns...
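A rough sketch of what such a Borg-with-Mapping store might look like (details assumed; this is not the poster's actual code):

```python
import threading
from collections.abc import MutableMapping

class SharedStore(MutableMapping):
    # Borg pattern: every instance shares the same __dict__, so all
    # instances see the same data dict and the same RLock.
    _shared_state: dict = {}

    def __init__(self):
        self.__dict__ = self._shared_state
        if "data" not in self.__dict__:  # first instance sets things up
            self.lock = threading.RLock()
            self.data = {}

    def __getitem__(self, key):
        with self.lock:
            return self.data[key]

    def __setitem__(self, key, value):
        with self.lock:
            self.data[key] = value

    def __delitem__(self, key):
        with self.lock:
            del self.data[key]

    def __iter__(self):
        with self.lock:
            return iter(dict(self.data))  # snapshot for safe iteration

    def __len__(self):
        return len(self.data)

SharedStore()["payload"] = [1, 2, 3]   # one context writes...
received = SharedStore()["payload"]    # ...another reads the same state
```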


If you're referring to the monkeypatching, I would generally try to avoid it, but that doesn't mean I wouldn't ever do it. Not sure why you would do it in this example, though.


It seems like it's mostly used for patching in tests, e.g.

    def test():
        SomeClass.do_network_call = lambda x: x
        < test actual logic >


This will break other tests, because there is no logic here for reverting SomeClass back to its previous behaviour. Use the patch function as a decorator or as a context manager.
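For example, with unittest.mock the patch is undone automatically when the context exits (SomeClass and its method here are stand-ins mirroring the snippet above):

```python
from unittest import mock

class SomeClass:
    def do_network_call(self, x):
        # stand-in for the real network call
        raise RuntimeError("would hit the network")

# mock.patch.object restores the original attribute on exit, so other
# tests see the unpatched behaviour again.
with mock.patch.object(SomeClass, "do_network_call", lambda self, x: x):
    patched_result = SomeClass().do_network_call(41)

# outside the context manager the real method is back
try:
    SomeClass().do_network_call(41)
    restored = False
except RuntimeError:
    restored = True
```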


Yes, this is an anti-pattern, not an example of a useful piece of code :)


It can be common to use thread-local variables to store state for a single WSGI request. I can't say I'd recommend that either, but that's the most common use case I've seen for something like that.


The problem with this is something most Pythonistas don’t think about: the return type of the function.

Officially it’s either None or int, but most people would just say int. That will obviously cause problems when not properly handled.


Python type annotations, especially the Optional type annotation in conjunction with a good typechecker like mypy, alleviate this problem.
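A minimal sketch of that, loosely modeled on the article's view example (the helper name and params are hypothetical): the Optional annotation makes the None case explicit, and mypy forces callers to handle it before treating the result as an int.

```python
from typing import Optional

# Hypothetical request-param helper: "id" may be absent.
def extract_user_id(params: dict) -> Optional[int]:
    raw = params.get("id")
    return int(raw) if raw is not None else None

present = extract_user_id({"id": "42"})
missing = extract_user_id({})
# mypy flags e.g. `extract_user_id({}) + 1` because the result
# might be None; the caller must check first.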


Agreed; there is a perverse attitude where people jump into Python for the “it just works!” mentality and hate the idea of type annotations.


I'm confused, the function doesn't return anything so why would it be an int return type? It should be "None" because it's a void function.


Oof, well, let’s just assume it did; it’s a view about a specific user ID, so in theory it would return that.

I misread, and I’ll blame my mind for assuming an implicit return on the last line ;)


This is a pattern to avoid because SomeClass is shared across requests.
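A toy illustration of that sharing problem (the names echo the article's snippet; the "requests" are simulated with plain calls):

```python
# Assigning to a class attribute mutates state shared by every caller.
class SomeClass:
    id = None

def myview(request_id):
    # the pattern from the article: stash per-request data on the class
    SomeClass.id = request_id

myview("alice")
first = SomeClass.id
myview("bob")
second = SomeClass.id
# the second "request" clobbered the first one's value for everyone
```

In a real multi-threaded server the overwrite can happen mid-request, which is far harder to reproduce than this sequential version.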


> log_to_network(f"{func.__name__} called!")

I'm seeing a lot of examples of logs using f-strings, but I was taught that you should use the % string syntax because it handles the string more optimally when logging is filtered, disabled, etc. Is my understanding faulty, outdated, or correct but unused by others?


I guess by %-syntax you mean template strings, something like logging.info('logging object %s', obj)? (as opposed to the 'old' printf-style formatting with the % operator)

This is a fair concern, but in my personal experience it was never an issue in Python. I sometimes had similar problems in C++ code, but then, if you choose C++, it's probably for the very reason that you expect to squeeze every last bit of performance.

Another potential benefit of template strings is that they allow for more defensive behaviour:

- e.g. imagine the __str__ method of obj throws an exception; then f'logging object {obj}' will crash immediately, while logging.info('object %s', obj) can catch the exception from __str__, log it, and continue

- or imagine a typo in the logging pattern, something like logging.info('objects %s %s', obj1). Depending on the logger configuration, it might just log the error (e.g. 'missing value') and carry on without crashing.

Such defensive behaviour may or may not be desirable, of course. In my experience such errors just never happen (especially the latter; if you use f-strings and mypy, most of these can be statically checked). But YMMV.
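The deferral the question asks about is easy to demonstrate: with %-style arguments, formatting only happens if a handler actually emits the record, while an f-string is built eagerly even for a filtered-out message (a small sketch; the Expensive class is made up to count __str__ calls):

```python
import logging

logging.basicConfig(level=logging.WARNING)  # DEBUG messages are filtered
log = logging.getLogger("demo")

class Expensive:
    calls = 0
    def __str__(self):
        Expensive.calls += 1  # count how often formatting runs
        return "expensive repr"

obj = Expensive()

# %-style: formatting is deferred, so the dropped DEBUG record
# never calls __str__.
log.debug("object %s", obj)
after_percent = Expensive.calls   # still 0

# f-string: the message is built eagerly, even though it is dropped.
log.debug(f"object {obj}")
after_fstring = Expensive.calls   # now 1
```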


I worked on a Django codebase that took > 5s to start (after removing all side effects and moving optional stuff to be lazily loaded). Incremental reload would be useful, for sure. Even with times in the range of multiple seconds, the reload slows you down noticeably.

Compared to Java: we had slower startup times with Java, but then hot reload worked very well, directly from the IDE.


“the order of imports is [..] an emergent property of the imports present in all modules in the entire system (and can also vary based on the entry point to the system).”

I think this sounds beautiful.


This sounds like chaos and most likely "hard to solve" errors to me :)


I fully agree with you (it's both, in my opinion).


Footguns upon footguns and armored boots made out of footguns.


To be fair, from the article:

> But we’re past the point of codebase size where a rewrite is even feasible.

So realistically, what else can they do?


They've built themselves a special kind of hell. I call this sort of thing Winchester Mystery House development. It's amazing what you can get away with when you're sitting on a fire hydrant of money. Entropy, gradients, etc.; like radiotrophic fungi in a cracked reactor, this codebase has burgeoned, and now they have a tiger by the tail - a mutant three-headed tiger with laser eyes.

Anyway...

Imagining myself coming in as CTO or whatever, with the mandate to "fix" the situation (is Instagram in any sort of trouble? I don't follow such things. The point is, maybe it isn't broken by the metrics that IG cares about)... this is what I'd do.

Since I don't have a lot of data on the details of IG's code (and I can't bring myself to reread that horror story this early in the morning), I can only speak in generalities. The most important thing would be a real mandate to "get 'er done". With that hypothetical postulated, I'd issue some top-level directives:

#1) Reduce LoC by 10% per month.

#2) 1/5 of all coders are on refactoring duty, not features.

#3) Start counting the beans. Track the cost of existing features to be able to predict the cost of proposed features.

Not being a particularly original thinker, I'd lean heavily on prior art cribbed from the movie industry. To wit: "Hollywood Secrets of Project Management Success" by James Persse https://www.microsoftpressstore.com/store/hollywood-secrets-...

If you reduce LoC, prioritize refactoring, and track your actual costs transparently, I think that would have a chance of righting IG's software boat.


What would your replacement language of choice be? Personally I'd balk at forcing all my developers to learn the new, untested language. The grass always seems greener before jumping over the fence.


You don't necessarily need to replace the language. Python can be used for large projects with some discipline, I feel. But I might look into compiling with Cython or Nuitka (I have no idea why Nuitka isn't a bigger deal in the Python ecosystem!)


They should have looked into Nim.


Yes, they should have ported their hundreds of thousands of lines of code to a new, less popular, less mature, less supported language with a hugely smaller ecosystem and fewer libs, to solve one specific issue they have...

/s


They should rewrite it in Rust!! /s


Why not just use Clojure?


Look, here we have the humble “Just”, sitting low in the wordy shrubs, almost hidden from view, pecking at its surroundings with confidence. Its natural habitat rife with nutrition, the Just pecks in any direction it likes, and, with ease reminiscent of more rare species, like the Merely, or the Simply, fills itself up for blind and rampant reproduction across all the fields of human thought.


So funny. Love this. I imagined David Attenborough's voice.


They have a problem in their Python codebase, and someone says, "why not just use Clojure". Now they have 50 problems.

1) Porting hundreds of thousands of lines of code 2) New infrastructure (for building, testing) 3) Training programmers in Clojure and waiting for them to get good 4) New Clojure dev hires 5) New bugs due to porting 6) Packages they rely on not being available 7) ...

I mean, do people even stop to consider things when making such suggestions?


from tfa:

"One reasonable take might be that we’re stretching Python beyond what it was intended for. It works great for smaller teams on smaller codebases that can maintain good discipline around how to use it, and we should switch to a less dynamic language.

But we’re past the point of codebase size where a rewrite is even feasible. "

YMMV.


I've been doing this for long enough to know that if you think static typing will save us from the complexity of large organically grown code bases, you're sorely mistaken.

The code and system design will reflect your culture, per Conway's law. If you value fast delivery it'll be full of dirty hacks throughout, piled one on top of another. If you have a single tight-knit team it'll ram a million and one concerns into the same system, and so on.

Languages will not save us from ourselves, ever.


That's like saying a workout plan can't keep you in shape because you need discipline to stay in shape. Static typing is the only way I've found to save you from yourself; it's how you can set guardrails that make it easier to do the right thing than the wrong thing. If not static typing then what?


I agree static typing is no silver bullet. But when there is a choice between looking at Python code written some time ago and wondering (and at times speculating) what type the function args are, versus inferring it straight from the function signature (say in Go), the latter is a smoother experience, especially when debugging a thorny heisenbug causing an outage.
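To make the contrast concrete, here's a minimal Python sketch (the function and names are hypothetical, not from IG's codebase): the annotated version answers the "what are the args?" question in the signature itself.

```python
# Unannotated: while debugging, a reader must speculate whether `user`
# is an id, a username, or an object, and what unit `ttl` is in.
def cache_key(user, ttl):
    return f"session:{user}:{ttl}"

# Annotated: the same questions are answered at a glance, and a static
# checker can verify call sites.
def cache_key_typed(user_id: int, ttl_seconds: int) -> str:
    return f"session:{user_id}:{ttl_seconds}"

print(cache_key_typed(42, 300))  # -> session:42:300
```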


Don't worry, golang will have its day of horror-story code bases, where developers made all sorts of wacky decisions around pointer passing and badly specified interfaces in code bases 100kloc too large.


FYI: Perl5 has always had moderately strong static typing and runtime warnings, as well as forward references. I always wince when people try to tell me that python is "better" or "more modern."

To enable in Perl:

  use strict;
  use diagnostics;


Indeed. In fact, a strong contender for the absolute worst codebase I've worked on was in Go. No type system is going to save you if you don't follow pretty much any good practice. Conversely, well-modeled systems can be coded in any language, as long as good practices and good communication are continuously enforced.

There are even studies on this and, so far, there's no strong correlation between type system (or paradigm, for that matter) and number of bugs in a codebase.

In fact, one recent study on the matter ended up turning into a flamewar between two different groups of scientists, the ones who first published the study and the ones trying to replicate and double-check the data... As discussions on type systems and paradigms are wont to do. Evidently not even academia is safe.

I wish there actually were empirical evidence to support one's preference in that area, if only to get rid of those flamewars that have been going on for decades. Alas, until that happens, what language one prefers mostly remains opinion or personal experience.


> In fact, a strong contender for the absolute worst codebase I've worked on was in Go.

So the most notoriously limited typesystem of the century, the one that ignored decades of progress, lead to a bad codebase? And you see this as an argument against type systems?


I don't think I was making any arguments. Furthermore, if there's a takeaway from my comment, is that I don't think there are arguments to be said in that particular discussion, only opinions supported by one's specific experience. There's certainly a dearth of solid empirical evidence one way or the other.

I was also agreeing with GP's saying that "static typing won't save us". There are many things to say about Go's type system, but it is definitely more static than Python's, which I've also worked with.


Go-style type systems won't save us, but Go's type system is literally from the early 1950s. It's closer to Python's type system (i.e. unityping - "static type system" is a tautology, the things that Python calls types are not types) than it is to a post-ML type system. Judging the limits of type systems based on Go is like judging the limits of VR based on the Virtual Boy.


Can you send a link to the study? Sounds interesting!


My favorite summary of the whole thing is here: https://www.hillelwayne.com/post/this-is-how-science-happens...

It was shared here a while ago (https://news.ycombinator.com/item?id=22531453), if you also want to read the HN discussion about it.


I learned during the first .com wave, while using our own in-house ActiveAOL-inspired server and watching friends doing Zope projects, not to take shortcuts with dynamic languages at scale.

Static typing does solve some issues, provides tooling opportunities, and offers easier ways to generate native code when that performance is needed and, as here, the application is beyond any hope of getting rewritten.

In fact, so much so that all dynamic languages that got widespread adoption are now trying to fit some kind of static typing support into their infrastructure.
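That retrofitting usually takes the form of gradual typing. A minimal Python sketch (names hypothetical): a deliberately loose annotation that documents intent today and can be tightened later without touching callers.

```python
from typing import Any

# First pass: loose on purpose. It already tells callers this takes a
# string-keyed mapping, and a static checker accepts it as-is.
def normalize_config(raw: dict[str, Any]) -> dict[str, Any]:
    return {k.lower(): v for k, v in raw.items()}

print(normalize_config({"HOST": "db1", "PORT": 5432}))
```

Later, `dict[str, Any]` can be narrowed to a `TypedDict` describing the exact keys, with no change needed at call sites that already pass valid data.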


Giving up and not using superior tools won't help us, either.


I think this is true; however, there is a tendency to think that not using Python can only mean using C++, something on the JVM, etc. C++ is an awful language; you can do better (albeit it's potentially not for the faint of heart).

Static Typing is important, but the real issue with Python is the lack of a compiler - compilers (ignoring efficiency) catch bugs now rather than halfway through a job. (Obviously linters do exist, but a sufficiently advanced linter is not only repeating work a compiler would've done, it also probably requires some kind of help, e.g. annotations.)
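A sketch of that "halfway through a job" failure mode (data and names hypothetical): the module loads fine and the job starts, and the bug only surfaces at runtime once the bad value is reached, whereas a checker run over the annotated signature would flag the call up front.

```python
def total_bytes(sizes: list[int]) -> int:
    return sum(sizes)

# The str sneaks in from, say, a CSV parse; nothing complains at import.
records = [100, 250, "300"]

try:
    total_bytes(records)  # fails mid-sum, only when the str is reached
except TypeError as exc:
    print(f"runtime failure: {exc}")
```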

An example of something that demonstrates Static Typing working and not working at the same time: compare the code of GCC and LLVM. They're both in C++ (now), but one is hugely better structured and easier to read. GCC grew organically, which is part of the problem, but LLVM was specifically designed around a certain architecture (and still grew quickly).

To paraphrase Batman, "It's not what type-system I use underneath, but what I do that defines me" definitely applies but drawing equivalence between all code regardless of language isn't right either - the language you use affects the way you code and how you structure your programs.


> Static Typing is important, but the thing with Python is really the lack of a compiler - compilers (ignoring efficiency) catch bugs now rather than half way through a job

Nope, that's static type checkers (which are often part of compilers), not compilers as a separate thing from static type checking. In fact, Python has a compiler run in the course of normal execution.

And Python also has static type checkers available (and good editor integrations for them), they are just separate from the compiler.
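That split can be demonstrated with nothing but the standard library: Python's built-in `compile` step checks syntax, not types, which is exactly the gap a separate static checker such as mypy fills.

```python
# Two obvious type errors, but syntactically valid Python.
src = 'x: int = "not an int"\nlen(5)'

# The compiler accepts it without complaint; only syntax is checked.
code = compile(src, "<demo>", "exec")
print(type(code).__name__)  # -> code

# A static checker run over the same source would reject both lines.
```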

> Obviously Linters do exist but a sufficiently advanced Linter is not only repeating work a compiler would've done, but also probably requires some kind of help e.g. annotation)

Yes, static type checkers whether incorporated in a compiler or separately tend to need type annotations to do their best work, especially if the type system isn't specifically crafted to maximize inferrability (Haskell can get away with fewer annotations than Java, often.)


It's odd. The crux of this post seems to be about not allowing new committers to do many of the tricks that got the code to where it is.

Which is fair, but it also muddies a point: many will just point out that they shouldn't have done all of those tricks in the first place. Many of the tricks actually sound like straw men, laughably so. But they got them to where they are.



