I've been coding in Python professionally for 4 years now, and I'm currently working on the biggest project I've ever worked on.
To me, being scared of changing a function's signature because the static analyser won't be able to spot all the places I've used that function is a real problem (along with incomplete autocomplete). I do unit test everything, but I'd like to reserve unit tests for things a computer theoretically cannot do.
Since I'm still in the early phase of the project, I know that Python's expressiveness is an edge, but I'm looking right now at what's going to be the "definitive" language I'll rebuild my product in for the next 3 to 4 years.
Python badly needs optional typing. Really. I'm pretty sure that would solve both the speed and tooling issues. Right now, for me, it starts to become unsuitable as soon as you reach 5-10k lines of code and a team of 2.
I'm surprised you're hitting that boundary at just 5-10k lines. I work on a team of 6* with about 60,000 logical Python statements (not including tests), and we aren't running into any language- or framework-induced roadblocks.
And this isn't even a case of "if you do everything perfectly like me, [language] works great!" We only have 45% code coverage, and our code architecture, in certain places, is really sub-optimal (caused by us, not the language).
When I'm about to make a major change to a function (different inputs or different outputs), I'll always start by grepping to get an idea of what I'm about to get myself into, write or adjust tests for the new 'signature', change the function, and code/debug/test until it works. My team is in the loop before my code is committed to a shared branch.
* We have other responsibilities besides the Python code.
Edit:
If your team is stepping all over each other at such a low LOC, my hunch is that it's related to a communication problem within the team, code that is too tightly coupled, or not enough architecture planning (too much is bad, but so is too little).
I guess it really depends on what you're building. Just to give you a taste, here's the kind of thing I'm doing:
- comparing tree-structured, SQLAlchemy-managed objects, generating diffs, then applying those diffs to the trees and persisting everything. I'm using SQLAlchemy's declarative approach.
The diff application is performed in a Celery background task, reusing my Flask configuration.
So, in the worst case, I have to deal at the same time with:
- business logic on a bunch of SQLAlchemy ORM objects (declarative approach)
- a Flask request context
- a Celery task context
- an SQLAlchemy session
At that point, I'm changing the signature of a function that takes pieces of all of those parts to perform some business logic. And I'm telling you that the IDE (PyCharm, the best one) and Python's "compile phase" don't give you a CLUE about what you're doing.
You're dealing with so much "magic" that it becomes unmanageable. You don't need many lines of code to reach that point.
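For what it's worth, stripped of the SQLAlchemy/Flask/Celery machinery, the core of the tree-diff idea described above can be sketched in a few lines of plain Python. All names here are invented for illustration; a real version would walk mapped ORM objects and persist changes through the session:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A plain stand-in for an ORM-mapped tree node."""
    name: str
    value: int = 0
    children: dict = field(default_factory=dict)  # child name -> Node

def diff(old, new, path=""):
    """Compare two trees, yielding (path, old_value, new_value) changes."""
    if old.value != new.value:
        yield (path or "/", old.value, new.value)
    for name, child in new.children.items():
        if name in old.children:
            yield from diff(old.children[name], child, path + "/" + name)
        else:  # node added in the new tree
            yield (path + "/" + name, None, child.value)
    for name, child in old.children.items():
        if name not in new.children:  # node removed in the new tree
            yield (path + "/" + name, child.value, None)

def apply_diff(tree, changes):
    """Apply additions/updates from a diff to a tree in place."""
    for path, _old, new in changes:
        if new is None:  # deletions elided for brevity
            continue
        node = tree
        for part in [p for p in path.split("/") if p]:
            node = node.children.setdefault(part, Node(part))
        node.value = new
```

The hard part the comment describes isn't this logic; it's doing it while juggling the session, request, and task contexts, which no tool checks for you.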
EDIT: you've got 6 people working on 60k LOC spread over 8 discrete apps. That's about the same amount of isolated LOC per person as me (one person dealing with a group of 5-10k LOC).
I'm not sure the complexity breaks down that way. More people + more code means things need to be better organised because things might change under you without warning (or you may have to work with an unfamiliar part of the system).
How is your app structured? I'm guessing that Flask is the top level glue and everything else is scattered around the Flask app. That's the general approach (in most modern MVC frameworks) and I think it's also the root cause of complexity.
Celery and Flask (and SQLAlchemy too) should really be asides to the main codebase. The code should be layered, with discrete libraries handling different parts of the system. If you have 6k LOC that all cross-reference one another, then you have problems in any language. Presumably there are a number of different components in there. Each should stand on its own with as simple an API as possible. As ever, too much coupling is going to make it impossible to reason about your code.
If you're about to change a function's signature, it should already be fairly obvious where it is called from. If not, you need to ask yourself why. What is this function that's so fundamental to the system that it could be called by any module? Why is it buried in another module and being accessed from elsewhere?
My current app has about 4k loc in python and the same again in js (angular). It's broken into dozens of parts that I only connect where needed through a simple api.
At the core is a sort of image processing library (that itself contains lots of different components). On top of that is a system that works with the image processing. Above that another system that interacts with the data models, uses the system below and farms out processing to picloud (though could use celery). Finally, the Flask layer just provides a web interface to talk to the system that handles that business processing. I can tap into any of those layers to drive them. The point is that I can operate at a high level without needing to consider any details of deeper parts of the system.
These are the layers of abstraction that make a system understandable and stop it from being brittle.
The problem lies in the interfaces. Components, even when they are independent, expose interfaces. Those interfaces act as a contract between the component and its users.
Python needs a way to make those interfaces automatically verifiable.
It's even worse once you start to use big libraries. If you're using top-level functions, then maybe your IDE can help you, but as soon as you're dealing with magical properties or parameters, it becomes a mess.
Take, for example, the magical "desc" function in SQLAlchemy, in things like "order_by" on relationships. That's extremely useful and clever, but I'd really like Python to give me an "OK, you're not doing things wrong" message as I'm typing.
Even better, once I enter a relationship declaration, it should give me a list of all the parameters I can use, along with the values those properties accept. That way I wouldn't have to check the documentation every time I write one.
It could also let me discover new things as I'm typing ("hey, what's that property doing? That looks interesting..."). Autocompletion is another way of discovering APIs.
But for that, you need type declarations.
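To make that concrete: the `order_by` of a relationship can be a plain string that SQLAlchemy only evaluates when the mappers are configured, so no IDE or "compile phase" can check it as you type. The models here are invented for illustration:

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # The string below is evaluated lazily, at mapper-configuration time;
    # a typo in "desc" or "position" only blows up at runtime.
    children = relationship("Child", order_by="desc(Child.position)")

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))
    position = Column(Integer)
```

Misspell `position` in that string and everything still imports cleanly; the error only appears the first time the relationship is actually configured and used.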
EDIT: as for my API, it's really nothing fancy. It's structured in three big parts: "admin / common / public". They each have their "model / business / service" layers, and each has its modules. Only the service layer is impacted by Flask. I have some "utils" modules for very low-level stuff (JSON serialization, etc.).
The Flask configuration is used a bit everywhere, because I want my API to have only one configuration file.
Nothing special, really.
It's not a total solution, but I find the zope.interface and zope.component libraries really good for this. I believe that zope.component also provides some tools for building unit tests that verify that an interface is correctly implemented/provided.
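As a point of comparison, the stdlib's abc module gives a rough (much weaker) version of this: it can at least refuse to instantiate a class that hasn't implemented the contract, though unlike zope.interface's verify helpers it checks nothing about method signatures. A made-up example:

```python
import abc

class IRepository(abc.ABC):
    """An interface-style contract (hypothetical example)."""

    @abc.abstractmethod
    def get(self, key):
        ...

    @abc.abstractmethod
    def put(self, key, value):
        ...

class MemoryRepository(IRepository):
    """Implements the full contract, so it can be instantiated."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

class BrokenRepository(IRepository):
    """Forgot to implement put(); instantiating it raises TypeError."""
    def get(self, key):
        return None
```

`MemoryRepository()` works; `BrokenRepository()` fails at instantiation time rather than at the first missed call, which is at least earlier than nothing.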
I definitely understand where you're coming from. On the discovery aspect, maybe that's just not something I notice because I work in ipython so it's interactive anyway.
Not to come across as too much of a Haskell fanboy, but this is almost the exact sweet spot for Haskell: typesafe operations over recursive structures. I'd say C# and F# would also be good choices. I completely understand and agree with your concerns about Microsoft, but there's always Mono :-)
On a separate note: SQLAlchemy is one of the best pieces of software I've ever used in any language!
Biggest benefit is probably the flexibility. SA allows every pattern from raw table access right up to complex object hierarchies mapping to joins or views. If you've already got an existing database structure it's invaluable to be able to hide the "implementation" in this way (I know that's not really the right term) and just present the API that better reflects the domain.
On the other hand it's also a really good tool for creating the tables yourself from the python table declarations.
I've used SA both in enterprise environments, mapping really hairy old database schemas, and in greenfield web apps, and I've never found anything it can't do well. I'd go so far as to say that if you're doing relational database stuff in Python and you're not using SA, you're doing it wrong :-)
Great, thanks for the summary :) I'm not using Python but I'm writing an alternative HaskellDB API, so it's good to know the strengths of other systems.
Would definitely like to hear more about that! I'd like to build the backend of whatever I build next in Haskell. Is it on GitHub or accessible anywhere?
It's currently internal only, but hopefully I'll be able to release it to the public soon. I'm happy to correspond by email about it. How should I get in touch with you?
What's your test coverage? It's usually not that hard to reach 100% if you have a decent coverage analyzer. Or even if you don't - one easy way is to never write a line of code unless you have a failing test that exercises it.
Changing a function signature is the kind of thing that tests can & should catch. You need them anyway to catch edge cases in the logic and document the code, and then once you've done that you usually get pretty good coverage for free.
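A minimal sketch of why this works (function names invented): any call site exercised by a test dies immediately with a TypeError when a required parameter is added, so call-site coverage doubles as a crude signature check:

```python
def apply_discount(price, rate):
    """Old signature. Suppose it later grows a required third argument."""
    return price * (1 - rate)

def checkout(prices):
    # A call site somewhere else in the codebase.
    return sum(apply_discount(p, 0.1) for p in prices)

def test_checkout():
    # As soon as apply_discount gains a required third parameter,
    # this test dies with a TypeError at the call site above --
    # no assertion about discounts even needs to run.
    assert checkout([100, 200]) == 270.0

test_checkout()
```

The catch, as the reply below notes, is that this only fires when you run the tests, not as you type.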
My unit tests did catch it, but if you've ever (seriously) coded in a statically typed language, you really come to like having those errors show up almost as you type.
To give you an example of how I do small refactorings in Objective-C:
I change the function (or class) signature I know I need to change. I compile, then Xcode shows me every single line of code I need to change.
Most of the time, it shows me places I didn't remember also used that function (I'm talking about "utils"-like functions).
There's nothing that prevents an IDE from running all the tests associated with a project and then highlighting the line numbers of any exceptions that get thrown. Infinitest will do this for Java in Eclipse and IntelliJ, though I don't think it works with Python.
Python has an unfortunate culture of "just an editor, please", so the main IDEs for it are light-years behind Java IDEs, but you could probably easily whip up a vimscript or .el that does this and highlights the line in the editor.
It's also possible you have a 60k LOC Python app because you aren't able to get near the same amount of code reuse or abstraction due to the lack of static typing, and this is why you haven't run into issues with refactoring.
It wins you more code reusability, with less code written to accomplish that reusability.
I always run into silly inheritance chains, type coercion, or adapter functions/methods to handle reusing code across unlike types, rather than relying on a common base of functionality regardless of where it was derived from (function API, inheritance, mixin, etc.).
The best way I've seen this done is in Clojure: a good mix of the best of both worlds in terms of static and dynamic typing, with multimethods, protocols, essentially structurally typed arguments, etc.
Cython can turn Python source into a binary `.so` that you import, thus removing the overhead of the bytecode compiler/interpreter.
You can also annotate any or all of your Python variables with C types. This way, Cython will generate raw, high-performance C from Python syntax. Other Python functions can call your optimized C versions, allowing you to optimize only the hotspots.
Further, if you need to integrate with other languages, Cython can import other libraries' C functions and call them. Conversely, it can generate a header file for your Cython code, thus allowing other languages like C to call your existing Python or Cython functions.
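As a rough sketch (file and function names invented), the difference between plain Python and a typed Cython hotspot is just a few declarations:

```cython
# fib.pyx -- build with: cythonize -i fib.pyx
def fib(int n):
    # Typed locals compile down to plain C ints:
    # no PyObject boxing inside the loop.
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a
```

After building, ordinary Python code just does `from fib import fib`; callers can't tell it isn't pure Python.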
My experience with Cython is limited to some hacking on lxml, but I found the compilation phase extremely slow. So, sure, you get performance but you get a big hit in terms of development speed.
I don't recall Cython's default optimization level, but you may be able to decrease compilation time by adding a -O0 flag for debug builds. This StackOverflow question addresses adding optimization flags to a Cython setup.py script: http://stackoverflow.com/q/16285011/1628916
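Following that answer, the flag goes into the extension's extra_compile_args. A sketch of such a setup.py (module name invented; note this is a build-config fragment, and it mostly speeds up the C compiler's share of the build, not the Cython-to-C translation itself):

```python
# setup.py -- debug build: skip C-level optimization to speed up compiles
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    "fib",                        # hypothetical module from the parent comment
    sources=["fib.pyx"],
    extra_compile_args=["-O0"],   # gcc/clang flag; MSVC would use /Od
)

setup(ext_modules=cythonize([ext]))
```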
Huh. I find that if you have a short Cython script, you can just hack around in the IPython Notebook. Evaluating a 200-line Cython script takes under a second for me, which is fine for my purposes.
I'm in a similar boat... been programming Python for ages, and now working on a huge, shared "hyper-agile" (yes, they really call it that) code base. It's constantly breaking, even with unit tests!
Personally, I'd love to be doing multi person large project stuff in Haskell... I can dream :-)
I think Haskell has the same problem as Lisp. Since you can abstract lots and lots of things away, there are tons of codebase-specific abstractions that developers must keep in their heads before they can start coding.
That said, the Haskell situation is better than Lisp's, because it has a big set of "default" abstractions that have become part of the language, reducing the number of codebase-specific ones.
Interesting that you would say that. To me, Haskell often feels like the most reads-like-English language; e.g. Parsec parsers written in applicative style, or functions in point-free style. Of course, that is not true of all Haskell constructs...
I find that the unit of abstraction is smaller and more manageable in Haskell. What I mean by that is that there is a core set of abstractions, based on mathematical laws, that are ubiquitous and composable. Once you understand them and how they work together, you see them everywhere and can very quickly understand new code. Contrast this with Python or other more mainstream languages: you have similar "patterns" with inconsistent naming and implementation details, which you first have to grok - every time, mind - before you can start to understand the code.
To see what Haskell is like for larger projects, you could go and have a look at GHC's or xmonad's code, although GHC does use some conventions that most Haskell code doesn't.
bsaul, are you on Windows by any chance? We're looking for some good large-scale test cases for our Python static analysis engine. Would love to see how PTVS (pytools.codeplex.com) does on your project. Thanks.
Basically, you should do a "Create project from existing code", then wait for a while until the analysis engine parties on your code. If you run into any issues/questions, feel free to contact me at zatroms@microsoft.com (my login is spelled backwards). Thanks!
That's a good question. Here was my line of reasoning:
1. I just coded something close to what I'm doing in Java (using Spring), and that made me want to stay as far away as I could from that bloated ecosystem (although the language itself is OK).
2. I looked at Go but was scared away when I saw that there weren't generics or anything that would let me write "meta-level" functions. Plus, it still felt not broadly enough adopted at the time. Yet it IS the language I'm planning to rebuild my project in for now (although I'll probably first rebuild just a small part of my API to see how the language behaves with business-process modeling and DB interactions, because I'm still very skeptical).
3. I love C#; frankly, the best language hands down. Yet I don't like the idea of depending on the MS ecosystem (I've always felt like one day they'll come and make me pay big time). Plus, Windows seems like a falling platform to me.
4. I just coded a moderately big website in Django, and a big sudoku-like generator in pure Python, and loved the expressiveness and power of the language (as long as you're using the default data structures). Speed was never an issue, and since it was basically just a website, reading the Django documentation was everything I needed. I had a look at Ruby, but frankly there wasn't a big enough difference to make me want to switch.
I know some people (at the European Southern Observatory here in Santiago) that use Zope interfaces to address your concerns. I find the whole idea somewhat odd, but they seem to like it and make decent software, from what I've seen.
+1. The Zope component architecture, properly used, is a thing of beauty. To see it in practice, probably the best example is the Pyramid web framework. Most of the important things in Pyramid are expressed in terms of interfaces and components, so if you want to use your own, you just plug it in. You've got a much higher chance of it working in Pyramid than in (for example, not to single it out) Django.
Yeah that's only Python 3, which most people aren't using yet. Also it's not optional typing -- it's optional annotations. Someone would still have to actually implement the type checking -- not even sure that's been done.
I have wanted to play around with implementing some type checking, but none of my stuff is Python 3 unfortunately.
I read that and it gave me 5 seconds of hope. But then
"The only way that annotations take on meaning is when they are interpreted by third-party libraries"
and
"Following from point 2, this PEP makes no attempt to introduce any kind of standard semantics, even for the built-in types. This work will be left to third-party libraries"
means the author really isn't serious about it. If you don't have standard semantics, you've got nothing useful.
Running tests, even with coverage and QuickCheck, does not replace static verification. At very best, such tests can assert that a program will probably work, with some known probability. In contrast, strong statically checked typing may be used to prove that a program produces correct results, to the extent the type system can express the necessary proof.
Most type systems in statically typed languages can't make that guarantee. For example I can use Java reflection to call a method, change the method's parameters, and my code will only fail at runtime.
That's kind of silly. If you're using Java and altering method signatures at runtime using reflection you're either in a very niche field where perhaps you shouldn't be using Java, or you're nuts.
Just because you can get around a feature doesn't invalidate that feature.
I'm not sure how I feel about static typing in Java, but it does do what the OP talks about, because by default you're not playing devil's advocate with reflection.
With Java + Spring + XML you can get one language for the price of two!
How useful is Java's type checking when one typo in an XML file breaks your program?
At least in Python I can follow the logic of the language and find where the mistake is. Spring's XML files don't have such logic and usually all you have is a confusing traceback to work with.
Static typing can guarantee the functionality of quite a lot of code, but it cannot do this for any arbitrary piece of code. As an example, no compiler in any language can know whether a user will enter numbers or garbage when you're expecting numbers. You need to check for an exception in Java even in this simple case.
When you're dealing with real-world systems you're always dealing with "probably work with some known probability". The hardware might burn out, or someone might trip over the cable, or a cosmic ray might randomly flip a bit and make your program go haywire. A proven-correct program doesn't help when you can't trust the physical infrastructure it runs on.
Instead, you need to figure out what level of reliability you require and then work out a strategy to get there. Static-checking can be one element of this strategy, since you can use it to eliminate whole classes of bugs. But I've usually found that architectural solutions like canaries, automatic restarts, feature disablement, fallback codepaths, etc. are more effective at reducing errors per programmer-hour spent on them, and when you have those the reliability thresholds for the actual code you write can be much looser. (You probably still want to verify the hell out of the framework code that implements these architectural mechanisms, though.)
Hah, clever. Bonus points if you don't need to run the code to figure it out (since the cipher is trivial and the ciphertext isn't particularly well obfuscated in the Haskell version).
I've written plenty of Python but only played a little with Haskell (just in GHCI, to see what the excitement is about).
Yet, looking at his two scripts, the Haskell was so transparent as to not need running, whereas the Python was opaque (probably because I don't use numpy).
The best part is that it's still his name at a common domain. I didn't bother with the code, but you can pick the common letters and match the positions to be pretty confident. The hardest part is his middle initial.
With the info you have, you can easily figure out the cipher and apply it to the middle initial. This being said, it still means a recruiter has to go to some effort to scan the code to find where the ciphertext is and examine it.
I imagine webmaster@stephendiehl.com would also get to him.
Lisp has had metaprogramming, optional type declarations, speed, and DSLs since at least the '80s. This isn't some magic new technology just introduced by the shiny new languages cited by the author.
Lisp's optional type declarations don't provide any compile-time guarantees though, they're just hints. (Although some implementations do use them to make compile-time checks.)
Depends what you are going for. If the point of type declarations is speed, I think most high-quality CL implementations will take advantage of a type declaration. If the point is compile-time type safety, yeah, that's a more obscure feature. Of course you can always use check-type for run-time type safety though.
The article doesn't mention recent/ongoing improvements to PyPy in the "speed" section, and doesn't mention concurrent.futures in the "asynchronous programming" section. Seems incomplete to me.
I take issue with the idea that you can count PyPy as a +1 for Python's speed without also counting it as a -1 for libraries. Especially considering SciPy and NumPy are not yet supported by PyPy, which means you lose out on a lot of libraries that depend on them. (For example, sci-kit learn).
You can't say that Python has both speed and great libraries. It has one or the other. Hopefully this will change at some point and I'll be able to reap the benefits of both.
It is not up to PyPy to support non-pure-Python libraries; it is up to those libraries to fix themselves by evicting their C and FORTRAN balls and chains.
It might not be PyPy's fault, but it is PyPy's problem.
If I have code that wants those libraries, PyPy is not an option. As an end user, I don't care who should fix it. I just know that I can't get my job done with PyPy.
Absolutely. But that's somewhat tangential to the question of whether or not Python is fast AND has good libraries. Regardless of blame, it doesn't change the fact that Python doesn't meet that criteria.
"there are variety of technologies encroaching on Python’s niche"
It wasn't clear to me what in particular the author thinks is Python's "niche", so I didn't understand the point of the article as an article, although the content was interesting.
I think Python's niche is understood to be as a readable, concise, batteries-included language suitable for scripting, prototyping, and application development on a small-to-medium scale (and large scale for some people, but that's controversial,) and serving as a convenient interface to C and C++ libraries in all those roles. Python differentiates itself from similar languages in the niche by elevating simplicity, explicitness, and readability over conciseness and expressiveness. That's the conventional wisdom, anyway.
I'd say the biggest challenge to Python in that niche is the emergence of type inference for statically typed languages. Not too long ago, the difference between a statically typed language and Python was
    List<Foo> foos = new ArrayList<Foo>();
    foos.add(new Foo(2));
    foos.add(new Foo(4));
versus
    foos = [Foo(2), Foo(4)]
Now, a static language might look more like this:
    val foos = List(Foo(2), Foo(4))
As far as I know, there still isn't a statically typed language that matches Python's simplicity and low barrier to entry, but I say that as someone who knows little about Go.
Even the fact that the 2 and 4 become Foos can be inferred, as long as Foo behaves like a number and you make it an instance of the Num typeclass. It's pretty cool!
So, oddly enough, static typing can actually make your code less noisy rather than more.
I've been writing a lot of Groovy recently, and it feels like Scala without type checking. That's not the worst thing in the world, I'd rather just use Scala. With Groovy, I get a lot of the expressiveness of Scala, but the lack of static type checking combined with Groovy doing its best to be magical is a recipe for bugs. For example, in Groovy, you can overload methods accidentally without a warning, which can create a hell of a surprise when somebody passes a null value. Rather than throw an error, Groovy dispatches the call to one of the methods even if neither one is more specific than the other.
There's also this classic bug:
    def printOne(Collection c) {
        if (c.empty) {
            print("Collection is empty")
        } else {
            print(c.iterator().next())
        }
    }
Can you spot the bug? This code works for all Collections... except Maps. If c is a Map, Groovy translates c.empty to c.get("empty"). Constantly having to be on my toes to avoid stuff like that is a pain.
This might seem very petty, but I avoid python just because of the dichotomy between Python 2 and Python 3. For me, there are plenty of other scripting languages at my disposal, and for things like numpy, or twisted, I have better alternatives too.
2.7 is a dead language that won't receive new features. That's mostly irrelevant if you're not investing in the long, long term with your projects. Just a tradeoff to consider, since it could mean one day it'll be worth it to migrate to Python 3, depending on the sort of project you're working on. Yes, it's ok to be using a "frozen" language for many cases.
Honest question. Outside of matlab and octave (both of which fall down in a bunch of other obvious areas), what scripting language offers anything close to an alternative to numpy/scipy?
What do you think of Mathematica? I've been toying with the idea of using it for sketching out ideas related to my computer graphics work, but it's a big investment. :/
I found its utility is primarily in dynamically controlled visualizations; for example, when I wanted to see how residues were distributed in the C^2 plane for a certain class of complex functions.
I'm not certain what sort of graphics work you are considering, but if it involves a lot of nonlinearity or higher level functions, it may be a good fit.
Although there's a very sophisticated set of bindings that makes it almost trivial to call Julia code from Python and vice versa, making it possible to use matplotlib instead of waiting for a solution written in Julia. Take a look at this [1] presentation from this year's SciPy conference (skip to the ten-minute mark for the PyCall stuff).
Except modify the scenario such that jQuery announced a few years ago that it will not be adding any more features and will only do sporadic bug-fix releases, and they suggest moving all development to MooTools. However, almost all code is written for jQuery, and it's not trivial to port libraries and applications from jQuery to MooTools.
tl;dr: I don't think the Python2/3 : jQuery/mootools analogy is that fitting.
On the other hand, I still happily use Python(2) for some things, but that is primarily because of familiarity and awesome tools like what was mentioned in the article (namely Theano and pandas). Once the Rust language becomes more stable, I hope to move most of my development to that.
Both Ruby and Perl are in active development. Both jQuery and MooTools are in active development. Python 2 is not in active development, and most of the useful Python libraries are not ported to Python 3.
> most of the useful python libraries are not ported to python 3.
Obviously this depends on what libraries are useful to you, but of the common libraries that people talk about, most of them do support Python 3. Numpy, matplotlib, Django & PyQt, for example, all do.
Agreed, I should have cited my examples more carefully. A better example of an active Python 2 project is Ansible. There were others when I looked into Python several months ago, but I can't remember them all off the top of my head.
I agree with your comment about 2 and 3 being different languages but, to be fair, Ruby and Perl don't share one set of libraries/extensions that may work on both versions yet may work on only one. I don't find the jQuery/MooTools comparison relevant, because neither of them is (or even claims to be) a language of its own.
Or Perl hackers. "I think there's still a big streak of Modernism running through the middle of computer science, and a lot of people are out of touch with their culture. On the other hand, I'm not really out to fight Modernism, since postmodernism includes Modernism as just another valid source of ideas." http://www.wall.org/~larry/pm.html
I like the hackerish approach of extending (abusing) a much-loved language to bring in new ideas.
But I am wary, despite being a Python bigot myself, of using one language for all these things. At a very early point down this road, it is simply better to pay the cost of adding a new platform and use Clojure to make my DSL.
In the end, for production, there is a fine line between bending and breaking.
I think OCaml or SML should be added to the mix. Neither has async built in, but the other points are handled quite well. http://ocaml.org/description.html
Another main strength of Go is that it has no runtime dependency, making it highly distributable. It also cross-compiles across different operating systems and CPU architectures. This should have been mentioned in the article too!