
This is why I see little hope for Python, which is to say that while I'm sure it will continue to have a large following for many years a la C, C++, etc, I don't have hope for it being an exciting language or one that is particularly productive. Python already has performance and packaging problems which don't seem to be easily divorced from CPython, since virtually the whole reference implementation is depended upon directly by much of the ecosystem due to the sprawling C-extension interface.

Pypy has done yeoman's work in improving performance while maintaining compatibility with an impressive amount of the ecosystem, and even still there are many important packages which aren't compatible with Pypy and for which Pypy-compatible analogs don't exist (or aren't supported/maintained). For example, the only Pypy-compatible Postgres drivers were unsupported last I checked.

Moreover, the Python community (or at least its leadership) seems to have very little energy around tackling these longstanding problems. Meanwhile, there are many other languages which are not only performant, but which are rapidly encroaching on Python's historically unique(ish) "easiness" (in the sense that Python is considered "easy", which is to say for people who don't have to manage build/deploy/packaging/etc or otherwise have performance issues). Further, many of these languages continue to improve at a remarkable pace, while Python is content to rest on the laurels of its scientific computing mindshare--and given the rather poor nature of the numeric computing package APIs and their somewhat low performance ceiling, I don't expect Python to be so dominant in this domain in another 5-10 years, especially as more companies need to figure out how to productionize scientific workloads.




A different perspective is that Python leadership has been overwhelmed addressing the concerns of the enormous and growing Python community, for whom generally performance is not yet the primary concern - believe it or not. It's probably fair to suggest PSF has stumbled in executing some of their goals, most notably and publicly the transition to v3, but overall it seems like the general Python community is most interested in language and ecosystem features, while performance-critical workloads are already being addressed in a number of projects (not just PyPy, but Numba, Cython, and others).

Data science may be at the heart of Python's strengths, but it's simply not accurate to suggest Python (whoever that is!) is resting on its laurels, as evidenced by the very active, albeit sprawling, package ecosystem. Meanwhile, the CPython devs seem to have found their stride in the 3.x releases.

It's also inaccurate to suggest numeric computing in Python has a "low performance ceiling," considering you can get near-C performance via JIT or AOT using a package like Numba, and in most cases it's not even necessary because of Numpy and the many other highly optimized compute packages that can do most of the heavy lifting.

I think the main draw of Python is not just that the syntax and language features are approachable, but that the package ecosystem is so broad and active, you are likely to make lighter work of the same job done in another language. I think to displace Python you would have to displace the package ecosystem, which seems as big and broad as it's ever been.


I agree with your overall characterization. As someone who works a lot with Python and is very familiar with its package ecosystem, I find the lack of leadership from PSF to be discouraging and upsetting. The past few Python releases have what I would characterize as cosmetic improvements while repeatedly missing opportunities to improve interpreter, packaging, and interface fundamentals. The position that CPython has to be kept simple as a reference implementation is untenable in the absence of an active collaboration or effort to produce a performance oriented implementation.

As far as I'm concerned, CPython as an interpreter technology has not advanced in the past decade - and Python has grown thanks to the data science community efforts and excellent library ecosystem you mention, not due to PSF. PSF can only miss so many opportunities before something comprehensively better starts to eclipse it.


I do get the impression that position is not the overwhelming consensus among CPython core devs (nor does it really make sense, given the huge dependence of many packages on the C API), whether or not it is what PSF may have publicly communicated. There are a few glimmers that interpreter improvements are on the horizon, with some proposals like subinterpreters and GIL mitigation getting a lot of attention, all of which (to my knowledge) are necessary prerequisites to serious, bold performance improvements.

I agree performance is important, and I think we have reason to be optimistic, but with the understanding that that level of improvement, even if currently underway, will just take a lot of effort and time. Meanwhile, as someone fairly new to Python and working on several performance critical pieces, I've been pretty impressed with what you can do with the current compute packages (after taking months to work my way through most of them).


> It's probably fair to suggest PSF has stumbled in executing some of their goals, most notably and publicly the transition to v3

In a recent post linked on HN, Steve Yegge basically nailed it:

> How much new software was written in something other than Python, which might have been written in Python if Guido hadn’t burned everyone’s house down? It’s hard to say, but I can tell you, it hasn’t been good for Python. It’s a huge mess and everyone is miserable.

Note to future language maintainers: don't burn everyone's house down.


I did RTFA, but I haven't been on the Python scene long enough to have a truly educated opinion about the transition to v3. It's not unusual for a language to make breaking changes in early versions. What happens when you realize you need to do it late in the game? Tough call. I think it's probably impossible to know whether the cost of breaking changes in the short term (people leaving Python for Go, etc.) was justified relative to the cost of a crufty and inadequate Python in the future (people leaving Python for Go, etc.). I'm generally sympathetic, but I'm glad I didn't get into Python until the transition was well underway.


Link?



Not just the package ecosystem, but you also have to have a large number of developers and jobs, so you can do hiring. Developing a project in a non-mainstream language means more difficult hiring.


> Meanwhile, there are many other languages which are not only performant, but which are rapidly encroaching on Python's historically unique(ish) "easiness"

What are those languages? I may have a blindspot, but the languages that get enough buzz for me to notice are either not competing with Python in important dimensions (e.g. Rust) or have a narrower focus (e.g. Julia). Elixir maybe? JavaScript and its derivatives? But these lack the scientific programming ecosystem that's helped drive Python recently.

I have no doubt that languages exist that might fit the bill of being both as easy as and more performant than Python -- there are a lot of languages out there. But I'm unaware of any whose mindshare has been sufficiently growing that it threatens Python in the mid-term.

If I'm off base please let me know. I'd love to find a viable competitor to Python that's strictly better than it.


> I'd love to find a viable competitor to Python that's strictly better than it.

I strongly recommend Go as a better Python. Personally, I think it's easier to write than Python (although people who care very little about correctness will be bothered a bit by the type checker), and the tooling is many times better (single-binary deployments, great dependency management, etc are awesome). Also, the performance is about 100-1000 times better for serial execution, and Go's goroutines allow you to take advantage of multiple cores much more easily than with Python.

JavaScript and TypeScript are similarly easy-to-use, performant languages with a better-than-Python tooling story. I've also heard similar things about Elixir, Clojure, and Kotlin.


Go and Python have pretty minimal overlap IMO. If you are using Python for anything other than a server or CLI, Golang is not a very good replacement for Python.

Some of Python's strengths that I work with regularly are dynamism, easy data exploration, visualizations, succinct & customizable syntax, extremely strong data science libraries, REPL/Jupyter, C bindings, easy to use packaging solution via PyPi (bet some people are going to disagree with that one), quick to prototype code. If you really need performance in Python, you can probably get it out of the box (if you can run deep learning with Python, performance isn't a limitation).

I love Go and use it for a bunch of projects, but I've only once wanted to move a project from Python to Go and that was a performance-centric CLI that was only written in Python originally because of how quickly it let us prototype in comparison to Go.


Julia covers your use cases and is overwhelmingly fast.


My gripe with Julia etc. as replacements, is that Python is duct tape. I don't need fast duct tape, I need duct tape that is understood and used by essentially everyone I work with, and that has native, fast handling of large amounts of data (NumPy, Pandas). Good user experience as duct tape.

From my perspective Julia is sacrificing some amount of "duct tape UX" to gain speed, and that's the wrong direction.

Whenever we need more speed, we just pull the slow bits down into compiled languages, and scale them out to many cores with solutions like MPI.

R is another language that is mainly duct tape for stringing pieces of compiled code together. If it had a better UX for developers than Python, I think we would see it dominating much more today, without having any speed advantage.


I agree with you right now, but think that Julia will end up dominating over the long-term, because dropping down into another language absolutely sucks for data scientists without engineering support.

Interestingly, R is probably a better UX for statisticians/data scientists than Python is (almost all the good parts of Numpy/Pandas were in R first), but it really suffers from not being well known by developers.

To be fair to R though, it's much, much easier to deploy than Python, which is a shocking indictment of the current Python packaging ecosystem.


> Whenever we need more speed, we just pull the slow bits down into compiled languages, and scale them out to many cores with solutions like MPI.

This only works sometimes--for problems that allow you to do a relatively large amount of computation in the compiled language to justify the cost of marshalling Python data structures into native data structures. For matrices of scalar values, this works well. For many other problems (consider large graphs of arbitrarily-typed Python objects, or even a dataframe on which you need to invoke a Python callback on each element), it doesn't. If you rewrite a big enough piece of your Python codebase in the compiled language, then it will work, but now you're maintaining a significant C/C++/etc code base and the bindings and the build/packaging system that knows how to integrate the two on all of your target platforms. Python really doesn't have a good answer for these kinds of problems, and these are by far the more common case (though perhaps not more common in data science specifically).
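A rough standard-library illustration of the marshalling point: one call that stays in compiled code for the whole buffer, versus a Python-level callback invoked once per element. Here `bytes.translate` stands in for "the compiled language"; the helper names and the sample data are invented for this sketch, and the timings it prints will vary by machine.

```python
import timeit

# Lookup table that uppercases ASCII letters, byte for byte.
TABLE = bytes(range(256)).upper()


def upper_bulk(data: bytes) -> bytes:
    # One crossing into C; the loop over elements runs in compiled code.
    return data.translate(TABLE)


def upper_one(b: int) -> int:
    # Hypothetical per-element callback written in Python.
    return TABLE[b]


def upper_callback(data: bytes) -> bytes:
    # The interpreter calls back into Python once for every element.
    return bytes(upper_one(b) for b in data)


data = b"the quick brown fox " * 10_000

bulk = timeit.timeit(lambda: upper_bulk(data), number=5)
per_element = timeit.timeit(lambda: upper_callback(data), number=5)
print(f"bulk: {bulk:.4f}s  per-element: {per_element:.4f}s")
```

Both functions return the same bytes; only the number of trips through the interpreter differs.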


> From my perspective Julia is sacrificing some amount of "duct tape UX" to gain speed, and that's the wrong direction.

What particular language features of Julia make that trade-off? (Not a rhetorical question, I'm not disagreeing with you, just curious; I'm familiar with Python, not really familiar with Julia.)


Probably the usual complaints about package load and compilation time.

These aren't fundamental issues with the language and will be solved by tiered compilation (there's already an interpreter mode, just have to integrate that with normal use), separate compilation and incremental sysimage creation which works with the package manager.


Possibly - I'm keeping my eye on it. But I can't stand the matlab-esque syntax.


I'll be honest, as someone whose non-Python programming (paltry as it is) is mostly done in statically-typed functional languages, I have a bit of a bias against Go for the whole generics thing. I'll give it a closer look at some point.


I gotta be honest, the lack of generics has never bothered me.

The type-assertion escape hatch has always been largely sufficient for all but the most performance-critical projects (of which I have exactly 1, and it was a side project), and runtime panics due to failed assertions are quite easy to eliminate via wrapper types.

My advice: just go for it. You almost certainly won't miss generics.

(Edit: I've broken my own rule and given advice without first asking what kind of programming you do. My assertion holds for most run-of-the-mill stuff, e.g. writing REST interfaces, network servers, etc. If you're within 2-sigma of the industry, you won't miss generics.)


I understand; lots of people have this sentiment. It’s an inconvenience in many cases, but we’re comparing it against Python, which has no type safety at all much less generics (apart from Mypy, which has many, many other issues).


All dynamic languages are generic by default.


If that's how you define "generic", then Go is also generic by virtue of `interface{}`.


No, because interface{} is an empty type that needs to be type cast to the actual type before use.

Dynamic languages do that implicitly.

Implement max in Go with interface{} without casts and reflection:

    def max(a, b):
        if a >= b:
            return a
        else:
            return b
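For reference, a runnable version of the dynamic-language side of this argument might look like the sketch below. The name `largest` is made up here to avoid shadowing Python's builtin `max`; the only assumption is that the two arguments support `>=` with each other.

```python
def largest(a, b):
    """Return the larger of two values that can be compared with >=."""
    # No casts or reflection: the comparison is dispatched at runtime
    # on whatever types the caller passes in.
    if a >= b:
        return a
    return b


print(largest(3, 7))            # ints
print(largest("pear", "fig"))   # strings compare lexicographically
print(largest((1, 2), (1, 3)))  # tuples compare element by element
```

Mixing types that cannot be ordered, e.g. `largest(1, "a")`, raises `TypeError` at runtime rather than failing at compile time, which is the trade-off the static-typing side of the thread is pointing at.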


I'm not aware of any definition of 'generic' that prohibits casting. It certainly seems like a very arbitrary condition. Basically I'm familiar with two definitions:

1. The abstract idea of writing an algorithm that supports a variety of types. This allows for casting, reflection, dynamic typing, etc.

2. The specific idea of a type system that allows for parameterized types (aka "typesafe generics"). This definition excludes casting, reflection, and dynamic typing.

Typically "typesafe generics" is what people talk about when they discuss "generics", but since you chose to pick the "dynamically typed languages are generic" nit, I assumed you were talking about (1).


> single-binary deployments

More than performance (pure Python can be too slow, but NumPy is usually fast enough), this is what I miss when using Python.

There's currently no good solution for bundling Python code and the interpreter into a single binary. PyInstaller works, but the resulting binaries write lots of files to `/tmp` every time they run, which is a hack that leads to long startup times. PyOxidizer avoids this, but includes the entire standard library in every binary, making them too large by an order of magnitude.


What's Go like for REPL-driven / exploratory development? That's mostly how I use Python.


Go has nothing on Python in this regard. I write Go every day, and come from a Python background. I often describe Go as the strongly-typed, more performant version of Python. I say this mostly because my Go code isn't too dissimilar from my Python code (structure, naming, packages). But I still drop into Python if I want to do something quickly. I don't really know why. Maybe it's the Go tooling, e.g. unused variables cause compilation errors, so things like this slow me down. Or maybe it's because Python just offers _so_ much out of the box, e.g. all the data structures you'll ever need (list, set, dict, tuples), and all small things you take for granted (like writing "is a in list", which would require a function in Go). The REPL is Python's killer feature.
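A small sketch of the "is a in list" convenience mentioned above, using only Python builtins. The variable names and values are illustrative.

```python
langs = ["python", "go", "julia"]

# Membership is an operator, not a helper function you write yourself.
has_go = "go" in langs

# The same operator works across the built-in containers.
seen = set(langs)
counts = {"python": 3, "go": 1}

print(has_go)              # membership in a list (linear scan)
print("rust" in seen)      # membership in a set (hash lookup)
print("python" in counts)  # membership among a dict's keys
```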


> I often describe Go as the strongly-typed, more performant version of Python.

I've heard Go described this way several times, but I've found it to be a significantly lower-level language than Python.

For example, it's much more verbose. In this recent blog post [0], the author converts some C++ code to Go - and it gets longer. 57 lines of C++ become 65 lines of Go. The same code in Python is about 20 lines.

In particular, the author's Go code requires five lines to do the equivalent of Python's `with open(path) as f` and four lines for the equivalent of `word_array = list(word_counts.items())`:

    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    // [...]

    wordArray := make([]WordCount, 0, len(wordCounts))
    for word, count := range wordCounts {
        wordArray = append(wordArray, WordCount{word: word, count: count})
    }
[0] http://jmoiron.net/blog/cpp-deserves-its-bad-reputation/
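For comparison, here is roughly what the Python side of that claim could look like. This is a sketch, not the code from the linked post: the function name `count_words`, the whitespace splitting, and the sort order are assumptions made here for illustration.

```python
from collections import Counter


def count_words(path):
    """Count whitespace-separated words in a text file.

    Returns (word, count) pairs sorted by descending count,
    with ties broken alphabetically.
    """
    word_counts = Counter()
    # The with-block closes the file even if reading raises.
    with open(path, encoding="utf-8") as f:
        for line in f:
            word_counts.update(line.split())

    # One line to turn the mapping into a list of pairs.
    word_array = list(word_counts.items())
    word_array.sort(key=lambda pair: (-pair[1], pair[0]))
    return word_array
```

Calling `count_words("some_file.txt")` (a placeholder path) returns a list such as `[(word, count), ...]`; error handling is left to the exception raised by `open`.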


Exactly my opinion! I have been writing Python for a living for more than 13 years, and I find Go awfully verbose. It makes simple things feel like a chore. A three-to-one ratio of lines of code sounds about right. That's not a trade-off I can make, no matter how big the performance improvements.


> That's not a trade-off I can make, no matter how big the performance improvements.

This is crazy. Many of those lines are closing brackets or whitespace. But moreover, optimizing for characters or LOC is absurd. Optimize for maintainability or readability, at which point Go is at least as good as Python (I would argue better). Optimize for tooling, especially package management and build tooling--Go is many times better than Python here. Optimize for performance--Go is literally hundreds or thousands of times better here. Optimize for breadth and quality of ecosystem. Optimize for deployment story (single small artifact vs hundreds of megabytes of dependencies). These are the things that matter, not lines of code.


I think Julia is strictly better than Python, both for data oriented applications and for web development.


Julia might be. It's certainly higher performance from what I understand, and it offers some syntactic flexibility that I miss from R when using Python (Python could never have a fully complete dplyr). It seems from afar to be rather more complex than those languages, though.


I actually think that from a programming perspective, Julia is much closer to R than Python, because of the focus on generic functions rather than objects as a means of abstractions.

It's still a little too wild-west for my tastes right now, but I want it to improve as I think it's got real potential.


I agree that packaging and dependency resolution can be a total nightmare at times. Anaconda helps to an extent, but it's far from ideal especially compared to the default options for other languages.

The main problem I have with Python is its maintainability, especially when you have multiple developers working in the same code base. I would never choose a dynamically typed language again to build something that is going to be more than several thousand lines of code, especially because of how it cripples your IDE which is essential for helping junior developers understand the existing codebase. Type hints help to an extent, but they're just a small bandaid over an oozing sore. Languages like Go are significantly better if you're going to build large codebases that need to survive for a long period of time.


100% agree. To a very real extent, Go probably only exists at all because Google poured an infinite amount of effort into Unladen Swallow--a project designed to remove the GIL from Python 2 to make it more usable for concurrent programs and add an LLVM-based JIT compiler to improve performance--and Python's response was to not only not merge it, but to break the entire Python ecosystem for a decade by forking the language... while somehow leaving "concurrency" and "performance" off the list of goals while making their backwards incompatible change (which of course also destroyed all the work on Unladen Swallow); hell: early versions of Python 3 were benchmarking universally slower than Python 2 :/. Google seems to have "gotten the hint", and around the same time (2009) hired Rob Pike to do Go, and has since "moved on" from Python and taken a massive part of the ecosystem of users Python used to have with it (the rest having bailed for node.js; remember when Python was the future of web development, competing with Ruby thanks to Django? those were the days): people still use Python, but it is now an entirely different crop of data scientists and AI people... all of whom are starting to run into the performance issues in even their glue layer, and so if the Swift tensorflow efforts ever work out, Python is done.


Google did not hire Rob Pike with Go in mind. Go did not even come around as an idea until Pike had worked at Google for a while. [1]

Don't forget Rob Pike created Sawzall at Google first (2005). [2]

[1] https://golang.org/doc/faq#history

[2] https://research.google/pubs/pub61/


The early Go compiler reused the code generator from Limbo. To the point that many comments referred to Limbo. So while Rob Pike might not have joined Google to work on Go or had Go in mind, the idea and direction for Go was probably preordained.

It doesn't really support or attack your claims. It's just a thing I wanted to share.


Small mistake: there are references to inferno, the operating system which limbo was a key component in. The broad idea is the same: Go has a clear heritage in systems that Rob Pike already worked on.


Unladen Swallow was only an internship project.

http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospect...


Didn’t the original Go creators mention on multiple occasions they intended to build a replacement for C, and only accidentally produced an application language? Assuming that’s true, Go would have happened regardless. Google couldn’t have pulled out of Unladen Swallow before Go was usable, since they could not have known it was actually an alternative at the time.


C++ actually, but most of us did not care.

https://commandcenter.blogspot.com/2012/06/less-is-exponenti...

If this version of generics finally makes it, then I care, otherwise only when dealing with Docker and Kubernetes eco-system.


> Google poured an infinite amount of effort into Unladen Swallow

Google most certainly did not.


The glue layers speedups are really bothering me. I absolutely fail to understand how a data science heavy language in 2020 can have such convoluted parallel processing. Dask, numba, joblib all are unfinished projects and absolutely not headache-free. CuPy works great, but it is not multi-GPU capable as of today.

And anytime you point this out, people will trot out a toy problem in Cython and try to prove you wrong, which has zero relevance to real world issues. Hell, Cython cannot even compile something as basic as Numpy FFTs, something that is absolutely critical in signal processing.


There's so little truth in this it is hard to know where to start.


As other people have pointed out, almost none of this comment is accurate:

Google put some effort, not that much, into Unladen Swallow, and the people who worked on it admitted that it didn't achieve the goals they set. The Python community was well on their way to creating Python 3 before work started on Unladen Swallow (so it was not in any way "Python's response"). As other comments pointed out, Google did not hire Rob Pike to "do" Go. Go was initially a side project of a few people and had nothing whatsoever to do with Python. The Go team was surprised when Python users started migrating to it. Google never used Python for web development, with one major exception, Youtube, an acquisition. I also don't understand how Google has "moved on" from Python (this seems unlikely) and how what they use Python for internally has anything at all to do with the fate of Django and Python webdev.

I don't mean to pile on, but I kinda have to ask, what were you thinking when you wrote this comment? Do you believe all those statements, or were you taking huge liberties with facts and making guesses to tell a story that supports a statement you wanted to make? You seem like a valued member of the community and this kinda makes me distrust the accuracy of everything else you write. (And to be honest, makes me distrust more of what I read in general, which is probably good but sad.)

A couple references for history/dates:

https://en.wikipedia.org/wiki/CPython#Unladen_Swallow

https://www.python.org/dev/peps/pep-3000/


Huh, I was wondering why Unladen Swallow faded rapidly since 2009 and why Python seems secondary at Google compared to Java and Go.

Silence and distance seem to be a common echo of failed projects.


I'd be curious which languages you feel are closing the gap, and which gaps are getting closer by them.

For example, I do a bit of Swift in addition to Python, and sometimes I've heard people try to compare the two. But I vastly prefer Python to Swift when I can afford to.


Scipy has a poor performance ceiling? Numpy has a poor API? Compared to what? Eigen? Whatever the Scala guys use? That sounds kind of silly to me, especially when hardly anyone is actually CPU-bound, anyway.


WRT performance ceiling, I'm mostly talking about things like Pandas which eagerly evaluate and which aren't amenable to a parallel execution model (multiple threads operating on the same data frame with minimal contention).

WRT poor APIs, I'm talking about things like matplotlib or pandas or etc that take a whole slew of arguments and try to guess the caller's intent by inspecting the types of the arguments. The referent isn't "some other scientific computing API" (although I'm sure there are some sane scientific computing APIs), but rather "other APIs in general" since there's nothing inherent to any particular domain that demands this kind of 'magical' API.

WRT 'hardly anyone is CPU bound'--the context is numeric computing; what are people bound by if not CPU? I've seen several projects where web endpoints were timing out while grinding in Pandas, largely because there weren't good options for taking advantage of multiple processors. Based on prototypes I did, I'm confident that other languages could serve those requests in single-digit seconds if not sub-second.


You're getting some pushback, but I tend to agree with you on matplotlib and pandas. Great libraries are designed so that you can get a feel for them and -- with practice -- use them intuitively. Even after years of (admittedly light) use I still find pandas' multi-indexes confusing, and I always have to look up the best of myriad ways to do something in matplotlib. In comparison, R's dplyr and ggplot have stuck with me even ages after giving up day-to-day use of R.


Pandas is really similar to base-R, which accounts for much of the weirdness (but at least R can claim to be copying a language developed around the same time as C).


So who does it right? If all these APIs suck compared to an imaginary perfect library, then that isn’t a useful comparison.

Also, if an endpoint is spending minutes to respond, then I would think actually profiling the application would be a good start. Maybe researching prior art in the problem domain would be good too. If nobody can be bothered to explore the several solutions to distributing pandas computations over multiple cores, like Dask, and get the NPV of just buying more or faster cores, then “Python sucks” isn’t your problem.


That’s quite a rant with a lot of assumptions. Just about every library has a better API than matplotlib or pandas. Requests has a pretty good API IMO. The team who was responsible for the slow endpoint did investigate dask and alternatives, and they probably will end up on something like spark because they didn’t feel like they have better options. Maybe our team is just stupid and Python isn’t for mere mortals, I don’t know, but I do know that these problems don’t exist in other languages.


I would certainly hope a minimal HTTP library would be simpler than a suite of functions to manipulate and plot tabular data.

“My application is slow, the language sucks!” Doesn’t indicate a very serious investigation into the problem.


> I would certainly hope a minimal HTTP library would be simpler than a suite of functions to manipulate and plot tabular data.

HTTP is pretty complex, but that's neither here nor there. The relevant bit is that there is no domain for which guessing caller intent based on reflection over argument types is appropriate.

> “My application is slow, the language sucks!” Doesn’t indicate a very serious investigation into the problem.

I was pretty explicit above and elsewhere in this thread about why Python's performance is miserable; I'm not sure why you would invoke such a poorly constructed straw man when everyone can look upthread and see my actual arguments.


IMO pandas' lenient inputs are a godsend when you are working with real-world, dirty data regularly. It's my favorite API I've ever worked with because it lets me focus on my high-level tasks and it takes care of the things I don't really care about like whether I am working with a list of dicts or a dict of lists or whatever.

But once you've done the cleaning/exploration, you should move any heavy computing to a high-performance library like numpy.


> there's nothing inherent to any particular domain that demands this kind of 'magical' API

Plotting seems to tend towards magic because plots are basically art, with all the desire for aesthetic customization that applies, and it's a very common task so users also want brevity (magic). The result is a plot() function with a gazillion options hidden behind keyword arguments.

I agree that matplotlib has a sprawling interface, and this can be annoying, but I'm still not sure what "guess the caller's intent by inspecting the types of the arguments" means. Sure, the functions have multiple call signatures, but that's not exactly unusual in libraries or languages. I don't understand the context that brings guesswork into the picture. Skimming the manual—are you using the data keyword argument and hitting the `plot('n', 'o', data=obj)` ambiguity [0]? Or calling plot through `pyplot.plot` &c. (which rely on state) instead of `Axes.plot` &c.?

Asking because if there's an interface trap I'm unaware of I'd like to learn about it before walking into it blindly.

Pandas I sort of agree with; I personally find it harder to remember how to use pandas than dplyr, despite using pandas more often and spending more time reading the pandas documentation. I also find it inconvenient to represent missing values in Pandas (`None` and `NaN` are overloaded, and `None` forces the `object` dtype). But maybe the problem is on my end.

[0] https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.p...


"Based on prototypes I [spent a limited amount of time on and didn't research better methods], I'm confident..."


In the context of pandas, 3 GB of (raw, uncompressed) data could easily require 30 GB of RAM, and that kind of overhead adds up quickly.


Pandas is not some mysterious black box. If you need predictable runtime performance or bounded memory usage, you have to figure it out. Pandas doesn't inherently have a staggering or unpredictable amount of overhead, given that it's a statistical analysis package. There are ways to mitigate Pandas memory usage (10x is a sign that something has gone very horribly wrong), and sometimes Pandas is simply the wrong tool for the job.


10x reflects both experience and expert recommendations. You may recognize the author [1]:

> Nowadays, my rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset

[1] https://wesmckinney.com/blog/apache-arrow-pandas-internals/


I don't doubt Wes' upper bound for Pandas OOTB, without optimization. The context was web applications. If you're seeing 10x on a web app, either something is wrong or you probably shouldn't be using Pandas.


I think most of us use Pandas for data exploration and one-offs, so it is completely reasonable to discuss what our likely use of RAM is going to be in this circumstance.

Web application using Pandas and "highly optimized web application" would seem to be nearly disjoint sets...


matplotlib and pandas were designed with the idea of mimicking interfaces more popular than the project (when they were first conceived). The "easy" interface is a large part of why those projects are now more popular than their inspirations.


very true; I found matplotlib very appealing because I didn't have to relearn anything coming from matlab


Ah, that makes sense (I suppose I should have guessed from the name). I never understood matplotlib's popularity, but if it's a matlab clone that makes way more sense.


many scientific computing applications are considered to be bounded by io


rather famously, one needs an intense operation like matrix multiplication to get cpu bound (an operation that has enough arithmetic operations per data element for I/O not to dominate).
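A back-of-the-envelope way to put numbers on that: count arithmetic operations per array element touched. The function names below are made up for this sketch, and it assumes dense n x n matrices, the naive multiplication algorithm, and no cache effects.

```python
def matmul_intensity(n: int) -> float:
    """Arithmetic operations per element touched for a naive n x n matmul."""
    # Each of the n*n outputs needs n multiplies and n - 1 adds.
    flops = n * n * (2 * n - 1)
    # Two input matrices are read and one output matrix is written.
    elements = 3 * n * n
    return flops / elements


def elementwise_add_intensity(n: int) -> float:
    """Arithmetic operations per element touched for an n x n matrix add."""
    flops = n * n         # one add per output element
    elements = 3 * n * n  # two inputs read, one output written
    return flops / elements


for n in (10, 1_000, 10_000):
    print(n, matmul_intensity(n), elementwise_add_intensity(n))
```

Under these assumptions the matmul ratio grows linearly with n while the elementwise ratio stays at 1/3, which is why the former can keep a CPU busy and the latter tends to wait on memory or I/O.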


Numpy has a very poor API compared to Julia, Matlab, Mathematica, R. That’s just me comparing to the ones I know. It’s a mishmash of methods and functions, in-place operations and non-modifying operations, confusing indexing and broadcasting API. There are much better things available for array manipulation.


For all of numpy's faults I find it odd that you choose its indexing and broadcasting API as the thing that's confusing and worse than equivalents in Mathematica and R. I can't speak to Julia or Matlab, but I find working with arrays in the other two is like pulling teeth.


I'm curious what the outcome would be if we ran a poll asking what people find easier. I find numpy's API more intuitive, but I could be a minority.

I will say that the differences between python and R syntax are almost trivial; I've taught classes with python and R example scripts that are almost _exactly_ the same, and run in both languages


I like R, but there is nothing elegant or consistent about its standard library. I don’t think you’re making a good-faith argument.


Don’t just randomly accuse people of making bad faith arguments. I’ve used R a lot and while it has its issues, it has a much better interface than numpy.


Well they're different, right?

R provides an interface to dataframes and matrices, whereas numpy is just for matrices (and their generalisations, arrays). I think the appropriate comparison is between base R and Numpy + pandas. (FWIW, I agree with your major point, but then I learned R first so that may be biasing me).


It’s hardly random. R functions could be noun-adjective or adjective-noun or underscored or camelCase or dotted ... and that’s just naming conventions. If someone, like you, says that Numpy sucks, but R is the real masterpiece, then that’s totally insincere. The good thing about R is that it’s free, the C FFI is OK and RStudio is decent. Its API is a patchwork. It’s cool to say Java sucks or Python sucks on the orange site because they’re popular and if you say something popular sucks, well you must be a pretty cool guy who is smarter than all those rubes out there. The arguments are almost always total nonsense, though.


My work is 100% CPU bound, and I am very angry at Python for wasting so much of my time.



