As a long-time Java developer, I found Python a beauty. It brings back the joy of programming and makes data manipulation a breeze. C# is better than Java, but it's still not as elegant/simple/clean as Python for data science.
Python is only fun for tiny projects. Once you reach 120k LOC in a project, refactoring in Python is insanity even with PyCharm, and debugging becomes impossible, too.
Data Engineer here... How do you get to 120K LOC without splitting up your infrastructure? If anything, it is poor design on your part. Python is beautiful. I've used it at 3 different companies now, and at 2 of them I was the one who encouraged them to try it out. They have nothing but love for it.
Even if you split your infrastructure, have you ever tried refactoring larger projects, with many contributors, while ensuring API contracts are kept?
Without a strict and static type system it becomes quite problematic to ensure new code keeps the API contract, unless you have unit tests for every possible value.
A good type system accelerates your coding speed, compared to writing equivalent unit tests, and it improves your quality, compared to no testing.
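To make the contract point concrete, here is a minimal sketch in Python with type annotations. The names `Prediction` and `top_label` are made up for illustration. It assumes you run an external static checker such as mypy, since plain Python does not enforce annotations at runtime.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical record type used only for this example."""
    label: str
    score: float


def top_label(predictions: list[Prediction]) -> str:
    """Return the label of the highest-scoring prediction."""
    return max(predictions, key=lambda p: p.score).label


# Correct call: matches the annotated contract.
best = top_label([Prediction("cat", 0.9), Prediction("dog", 0.4)])

# Contract violation: tuples instead of Prediction objects.
# A static checker can flag this call without running anything;
# plain Python only fails once the lambda touches `.score`.
try:
    top_label([("cat", 0.9)])
    failed_at_runtime = False
except AttributeError:
    failed_at_runtime = True
```

Without the checker, the bad call is only caught if some test happens to execute that exact path, which is the trade-off being argued about here.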
> A good type system accelerates your coding speed, compared to writing equivalent unit tests, and it improves your quality, compared to no testing.
Coding speed is the least of my concerns for an ML project, to be honest. And unit tests aren't useful either, since ML by and large is not deterministic. A lot of what you said is true for web applications, but it doesn't really apply to an ML project.
Once you reach 120k LOC in a machine learning project, you should have split it up into disparate projects (input adapters, interactive applications, transforming results...) many orders of magnitude ago, even in verbose and refactoring-friendly languages like Java.
Can you be specific on what makes this a headache? Your "refactoring" tells me that the code probably wasn't well structured in the first place and if so this would make refactoring difficult for any language, particularly dynamically typed ones.
> Your "refactoring" tells me that the code probably wasn't well structured in the first place
Well, considering this is a realistic scenario for fallible humans, it’s still decent advice to keep your exploratory projects in Python small to avoid ridiculous tech debt. It’s not quite as bad as with Ruby, but it’s close.
Many languages make it hard to keep code actually bug-free and maintainable, and Python and especially Ruby are problematic in that respect, while Java and Kotlin, and even C++ (with a strict style guide), are a lot nicer to work with at scale.
If you want to keep consistent APIs between modules, strict types and checked exceptions are very helpful, while with Python one typo in an attribute name can silently create a new attribute instead of raising an error, which is why so many use `__slots__` nowadays, and TypedPython, and annotations. But if I do that, I might as well use Java or Kotlin, and get a better IDE.
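For anyone who hasn't run into the typo problem, a small sketch of what it looks like and how `__slots__` changes it. The class and attribute names are invented for the example.

```python
class Config:
    """Plain class: any attribute name can be assigned."""

    def __init__(self) -> None:
        self.learning_rate = 0.1


class SlottedConfig:
    """Same data, but __slots__ restricts which attributes may exist."""

    __slots__ = ("learning_rate",)

    def __init__(self) -> None:
        self.learning_rate = 0.1


plain = Config()
plain.learning_rtae = 0.01  # typo: silently creates a new attribute
typo_went_unnoticed = plain.learning_rate == 0.1  # real value untouched

slotted = SlottedConfig()
try:
    slotted.learning_rtae = 0.01  # same typo, now rejected
    typo_caught = False
except AttributeError:
    typo_caught = True
```

Note that this only catches the typo when the line actually runs. A statically typed language, or a static checker over annotated Python, would report it before execution.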
Compared to unit tests, strict and static types are faster; compared to no testing, static types are safer.
Python in data/ML rarely reaches that scale. It is used for training. A few thousand lines per project at most, so it is tractable, and I don't think machine learning models can really be refactored or debugged like a web application.