Probably 80% of notable performance problems I’ve seen in the kinds of systems that things like Django and Ruby get used for have been terrible queries or patterns of use for databases (I’ve seen 1,000x or worse costs for this versus something more-correct) and nearly all of the other 20% has been areas that plainly just needed some pretty straightforward caching.
The nice thing about that is that spotting those, and the basic approach to fixing them, if not the exact implementation details, are cross-platform skills that apply basically anywhere.
I actually can’t recall any other notable performance problems in those sorts of systems, over the years. Those two are so common, and the fixes so effective, that I guess the rest has just never rated attention. I’ve seen different problems in long-lived worker processes, though (“make it streaming; everything becomes streaming when scale gets big enough” is the usual platform-agnostic magic bullet in those cases).
A bunch of TFA is basically about those things, so I’m not correcting it, more like nodding along.
Oh wait, I just thought of another I’ve seen: serving large files through a scripting language, as in, reading the file in and writing it back out from application code. You run into trouble at even modest scale. There’s a magic response header for that: make Nginx or Apache or whatever serve it for you. The fix is typically deleting a bunch of code and replacing it with one or two lines. Or else just use S3 and maybe signed URLs like the rest of the world. Problem solved.
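For anyone who hasn't seen the header trick, here is a minimal sketch of it as a bare WSGI app. Nginx calls the header `X-Accel-Redirect` (Apache's mod_xsendfile uses `X-Sendfile` instead). The `/protected/` prefix and the `app` name are made up for illustration, and this assumes the web server has a matching internal location configured, so treat it as a rough outline rather than a drop-in.

```python
from urllib.parse import quote

# Hypothetical prefix: assumed to match an `internal` location in the
# Nginx config that maps onto the directory holding the real files.
INTERNAL_PREFIX = "/protected/"


def app(environ, start_response):
    """WSGI app that hands the file transfer off to the web server."""
    # Keep only the last path segment so callers cannot walk directories.
    filename = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    if not filename or filename in (".", ".."):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    headers = [
        ("Content-Type", "application/octet-stream"),
        # The application never opens the file. The front-end server sees
        # this header and serves the file itself. Percent-encoding here is
        # an assumption that the header value is interpreted as a URI.
        ("X-Accel-Redirect", INTERNAL_PREFIX + quote(filename)),
    ]
    start_response("200 OK", headers)
    return [b""]
```

The point is that the scripting language only does the auth and bookkeeping part; the bytes go out through the server that is good at pushing bytes.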
Knowing SQL and how relational databases actually work is one of the best superpowers a backend developer can have.
If you want to go deeper than your database manual, the best place is Andy Pavlo's database course, freely available on YouTube. I don't write databases, but after watching it I understand trade-offs and performance considerations much better, and feel much more comfortable reading the PostgreSQL manual.
> Probably 80% of notable performance problems I’ve seen in the kinds of systems that things like Django and Ruby get used for have been terrible queries or patterns of use for databases (I’ve seen 1,000x or worse costs for this versus something more-correct)
The ActiveRecord pattern saves you a few lines of code now, and blows your foot off later.
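The classic way this bites is the N+1 query: load a list of rows, then lazily fetch a related row for each one inside a loop. A rough sketch using only the standard library's `sqlite3`, with made-up `author` and `book` tables, so it is an illustration of the pattern and not anyone's actual ORM internals:

```python
import sqlite3


def make_db(n_authors=50):
    """Build a throwaway in-memory database with one book per author."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            author_id INTEGER REFERENCES author(id),
            title TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO author (id, name) VALUES (?, ?)",
        [(i, f"author-{i}") for i in range(n_authors)],
    )
    conn.executemany(
        "INSERT INTO book (author_id, title) VALUES (?, ?)",
        [(i, f"book-{i}") for i in range(n_authors)],
    )
    return conn


def titles_n_plus_one(conn):
    """One query for the books, then one more query per book."""
    queries = 1
    books = conn.execute(
        "SELECT author_id, title FROM book ORDER BY id"
    ).fetchall()
    result = []
    for author_id, title in books:
        # This is roughly what a lazy `book.author.name` access amounts to.
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        queries += 1
        result.append((title, name))
    return result, queries


def titles_joined(conn):
    """Same data in a single round trip."""
    rows = conn.execute(
        """
        SELECT book.title, author.name
        FROM book JOIN author ON author.id = book.author_id
        ORDER BY book.id
        """
    ).fetchall()
    return rows, 1
```

Both functions return the same rows; the first just takes one query per book to get there. Against an in-memory SQLite the difference is small, but with a network hop per query it adds up fast. In Django the usual fix is `select_related` (or `prefetch_related` for to-many relations), which gets you closer to the joined version.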
I have Django code which creates a tar file on the fly from a list of requested files, and it works well. It doesn't use intermediate storage. The tar format can be pretty simple. I got most of the way into implementing an uncompressed zip version, but then I realised that tar was good enough for my site.
Mmm. If you had the right library, you might be able to stream it as it’s being created, which might help at least with perceived performance, but yeah, that’s a fun one.
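For what it's worth, Python's standard `tarfile` module can do this in its stream mode (`"w|"`), which only ever writes forward. Below is a minimal sketch, not the code from the comment above: `stream_tar` and `_ChunkBuffer` are hypothetical names, and it assumes each file's contents are already in memory as bytes, which a real version would want to avoid for large files.

```python
import io
import tarfile


class _ChunkBuffer:
    """Write-only file-like object that collects bytes until drained."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def drain(self):
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


def stream_tar(files):
    """Yield an uncompressed tar archive piece by piece.

    `files` is an iterable of (name, payload_bytes) pairs.
    """
    buf = _ChunkBuffer()
    # "w|" is tarfile's non-seekable stream mode: it never seeks backwards,
    # so output can be handed on while the archive is still being built.
    with tarfile.open(fileobj=buf, mode="w|") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
            chunk = buf.drain()
            # tarfile buffers internally, so small files may not produce
            # any output until later; skip empty chunks.
            if chunk:
                yield chunk
    # Closing the archive flushes the remaining blocks and the end marker.
    tail = buf.drain()
    if tail:
        yield tail
```

In Django, a generator like this could be passed to `StreamingHttpResponse`, so the client starts receiving bytes before the last file has been added.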