As a long-time Java developer, Python was a beauty. It brings back the joy of pr...

kuschku · on Oct 12, 2017

Python is only fun for tiny projects. Once you reach 120k LOC in a project, refactoring in Python is insanity even with PyCharm, and debugging becomes impossible, too.

Have you tried Kotlin?

chirau · on Oct 12, 2017

Data Engineer here... How do you get to 120K LOC without splitting up your infrastructure? If anything, it is poor design on your part. Python is beautiful. I've used it at 3 different companies now, at 2 of which I encouraged them to try it out, and they have nothing but love for it.

kuschku · on Oct 12, 2017

Even if you split your infrastructure, have you ever tried refactoring larger projects, with many contributors, while ensuring API contracts are kept?

Without a strict and static type system it becomes quite problematic to ensure new code keeps the API contract, unless you have unit tests for every possible value.

A good type system accelerates your coding speed, compared to writing equivalent unit tests, and it improves your quality, compared to no testing.
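A minimal sketch of the point about API contracts, assuming a hypothetical `total_price` function (not from any real project). The annotations state the contract, but plain Python does not enforce them at runtime, so a bad call only fails when that line actually executes. A static checker such as mypy would be expected to flag it before the code runs.

```python
def total_price(quantity: int, unit_price: float) -> float:
    """Hypothetical API: annotations document the contract, nothing enforces it at runtime."""
    return quantity * unit_price


def broken_caller() -> float:
    # Contract violation: a str is passed where an int is expected.
    # Nothing complains until this function is actually called.
    return total_price("3", 2.5)


# The valid call works as expected.
valid_result = total_price(3, 2.5)

# The invalid call only surfaces as a runtime TypeError on this code path.
try:
    broken_caller()
    contract_violation_detected = False
except TypeError:
    contract_violation_detected = True
```

Without static checking, catching this needs a test that exercises `broken_caller`; with it, the mismatch shows up without running anything.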

tanilama · on Oct 12, 2017

> A good type system accelerates your coding speed, compared to writing equivalent unit tests, and it improves your quality, compared to no testing.

Coding speed is the least of my concerns for an ML project, to be honest. And unit tests aren't useful either, since ML by and large is not deterministic. A lot of what you said is true for web applications, but it doesn't really apply to an ML project.

nielsbot · on Oct 12, 2017

“accelerates coding speed”

this is debatable

kuschku · on Oct 12, 2017

Well, the comparison was to writing unit tests that provide the same safety as an equivalent type system.

And compared to that, the type system is certainly faster.

lstmemery · on Oct 12, 2017

NumPy enforces type consistency within arrays. Type errors are still possible, but they are generally rarer and are noticed sooner than in base Python.

nielsbot · on Oct 13, 2017

valid point

HelloNurse · on Oct 12, 2017

Once you reach 120k LOC in a machine learning project, you should have split it up into disparate projects (input adapters, interactive applications, transforming results...) many orders of magnitude ago, even in verbose and refactoring-friendly languages like Java.

real-hacker · on Oct 24, 2017

Most deep learning programs are in the range of hundreds of lines of code, even for quite complicated models.

bertomartin · on Oct 12, 2017

Can you be specific on what makes this a headache? Your "refactoring" tells me that the code probably wasn't well structured in the first place and if so this would make refactoring difficult for any language, particularly dynamically typed ones.

grtrans · on Oct 12, 2017

> Your "refactoring" tells me that the code probably wasn't well structured in the first place

Well, considering this is a realistic scenario for fallible humans, it’s still decent advice to keep your exploratory projects in Python small to avoid ridiculous tech debt. It’s not quite as bad as with Ruby, but it’s close.

kuschku · on Oct 12, 2017

I've got exactly that experience.

Many languages make it hard to keep code actually bug-free and maintainable. Python and especially Ruby are problematic in that respect, while Java and Kotlin, and even C++ (with a strict style guide), are a lot nicer to work with at scale.

If you want to keep consistent APIs between modules, strict types and checked exceptions are very helpful, while with Python one typo in an attribute name can silently create a new attribute, so the intended access is lost. That is why so many use slots nowadays, and TypedPython, and annotations. But if I do that, I might as well use Java or Kotlin, and get a better IDE.

Compared to unit tests, strict and static types are faster; compared to no testing, static types are safer.
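A minimal sketch of the typo problem and the slots workaround mentioned above, using hypothetical `Plain` and `Slotted` classes and a deliberately misspelled attribute name. On a regular class the typo quietly creates a new attribute; with `__slots__` the same assignment raises `AttributeError`.

```python
class Plain:
    def __init__(self):
        self.timeout = 30


class Slotted:
    # Only the names listed here may be assigned on instances.
    __slots__ = ("timeout",)

    def __init__(self):
        self.timeout = 30


plain = Plain()
plain.timeuot = 60  # typo: silently creates a new attribute, timeout stays 30

slotted = Slotted()
try:
    slotted.timeuot = 60  # same typo, but rejected because of __slots__
    typo_rejected = False
except AttributeError:
    typo_rejected = True
```

This only catches the mistake at runtime, when the assignment executes, which is weaker than a compile-time check in Java or Kotlin.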

bertomartin · on Oct 12, 2017

I probably could have put that better, so here goes: shitty code can be a challenge to refactor regardless of language

tanilama · on Oct 12, 2017

Python in data/ML rarely reaches that scale. It is used for training: several thousand lines per project at most, which is tractable, and I don't think machine learning models can really be refactored or debugged like a web application.