Looking Back at Postgres (arxiv.org)
290 points by craigkerstiens on June 15, 2020 | 74 comments



As someone who bet his startup in 2000 on Postgres (with little experience at the time) and registered on HN just to praise Tom Lane for being exceptionally helpful and forthcoming on the mailing lists, I enjoyed this but don't think it's very relevant to Postgres' success in the past 20-odd years - that was entirely due to a few good people who had the right priorities and made good decisions without drama and without being overly opinionated. A very notable exception in the world of open source...


> without being overly opinionated

My take is Postgres is extremely opinionated, but in such a way that it is beneficial for the long run.

PG adheres to the SQL standard far more strictly than MySQL does (e.g. SELECT without ORDER BY, ACID). From 2000-2010 that severely slowed feature progress compared to MySQL (clustering was a big topic for a while).

But the problems of hacks, speed, and (probably the most significant one) management/ownership eventually caught up with MySQL.

Feels like tortoise & the hare.


"Postgres is extremely opinionated"

When I think of the term "opinionated" as it applies to software platforms, I think of things that have intended purposes in mind and optimize for the ease of those use cases, potentially to the detriment of people doing other things. It also suggests a "one way to do it" approach.

If that is the definition, I don't think Postgres fits. Postgres tries pretty hard to meet users where they are, and tackle uncommon or emerging use cases. Examples include JSON support, support for writing functions in various languages (javascript, python, perl, R, etc.), and C extensions that can hack the engine in innumerable ways.
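
For example, here's a minimal sketch of the JSON side (the events table and its fields are made up purely for illustration):

    CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);
    INSERT INTO events (payload)
        VALUES ('{"type": "signup", "user": {"name": "alice"}}');
    -- A GIN index lets the containment query below stay fast
    CREATE INDEX ON events USING gin (payload);
    SELECT payload -> 'user' ->> 'name' AS name
      FROM events
     WHERE payload @> '{"type": "signup"}';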


I don't think "opinionated" in the context of a database system means refusing to implement useful features. When I think of unopinionated software, I think of software like Jenkins; it feels like a plugin glue framework that does barely anything useful out-of-the-box and leaves it up to the user to create a working configuration out of the tons of plugins available.

Aside from being immediately useful just as a basic relational data store, my impression of PostgreSQL is that the "opinion" is apparent in the way that new things get done. There seems to be a culture of either implementing things well or not at all, and to avoid rushing things just because people want the new feature right now.


Hrm, I can't disagree with you, because I do read it as suggesting a "one way to do it" approach.

Could be I misinterpreted the OP's statement:

> few good people who had the right priorities and made good decisions without drama and without being overly opinionated

I attached the "opinionated" to the people as opposed to the software probably because of the way the sentence was structured.

Perhaps we're all agreeing and saying the same thing.


The PG dev community puts data correctness above all else. It is a good, strong opinion to have for database software, but it does slow down feature development. The most requested feature for years was UPSERT, but it wasn't implemented until 9.5 (as INSERT ... ON CONFLICT) because until then they couldn't figure out how to do it without gotchas.
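
For the curious, the syntax that eventually shipped looks like this (a minimal sketch; the counters table is made up):

    CREATE TABLE counters (key text PRIMARY KEY, hits bigint NOT NULL);
    -- Atomic insert-or-update: no race between the existence check and the write
    INSERT INTO counters (key, hits) VALUES ('home', 1)
    ON CONFLICT (key) DO UPDATE SET hits = counters.hits + EXCLUDED.hits;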


> The highest-order lesson I draw comes from the fact that Postgres defied Fred Brooks’ “Second System Effect” [Bro75]. Brooks argued that designers often follow up on a successful first system with a second system that fails due to being overburdened with features and ideas. Postgres was Stonebraker’s second system...

I really hope we pull off some of these software adages.


I don't think the fact that someone made a second version without succumbing to second system effect disproves the adage outright; for the vast majority of new programmers it's still worthwhile advice to adhere to.

Like any principle or adage, if you are experienced enough to deeply understand the reason for it, you will know when it does or does not apply.


I've always heard of it as "third system syndrome". You screw up the first one (prototype), learn from it and build the second one (product), and then get overconfident, expand into unknown areas and repeat the process (third system).


> if you are experienced enough to deeply understand the reason for it, you will know when it does or does not apply.

You won't have that experience (or use it, really) if you stop at your first system!

Seriously though, I agree; adages reveal truth by definition. This one just feels so pessimistic and boring.

My version of this would be "the third system is the best one". Make it once, learn the space and the limitations of your solution. Make it a second time, learn what isn't useful. Make it just right on the third try!


Postgres is truly an exception that proves the rule. "Sure, not every second system falls prey to this phenomenon, but you practically have to be Postgres to avoid it."


>Postgres is truly an exception that proves the rule.

That's not what "an exception that proves the rule" means. It means that a sign saying "you may walk here" implies (proves) the existence of a rule that you can't walk elsewhere.

In any event, plenty of second systems are fine. It's ridiculously outdated advice. Any time someone replaces their Flask web app with a Go or Rust version, QED.


> It's ridiculously outdated advice.

I dunno what your experience is like, but I have seen this scenario repeat itself many many times in many places: someone is rewriting something without fully understanding what the old thing did, blindly proclaiming that their rewrite won't have any of the problems of that old clunky thing, introducing new bugs and limitations in the process without adequately solving the problem the old thing was addressing.

All. The. Fucking. Time. Big companies are especially bad at this.

Seems to be a variant of the phenomenon second system syndrome is meant to describe.


> I dunno what your experience is like, but I have seen this scenario repeat itself many many times in many places: someone is rewriting something without fully understanding what the old thing did, blindly proclaiming that their rewrite won't have any of the problems of that old clunky thing, introducing new bugs and limitations in the process without adequately solving the problem the old thing was addressing.

That's not Second System Syndrome. That's Joel Spolsky's "Things You Should Never Do".


Wikipedia gives a good summary of the various meanings of that idiom: https://en.wikipedia.org/wiki/Exception_that_proves_the_rule

Also, I don't think the advice is outdated at all. I've seen it happen multiple times. However, it describes a tendency to be on guard against. It's not some sort of law of the universe.

It's also not about rewriting in Go or Rust (you may be thinking of a completely different piece of advice about not doing ground up rewrites.) The second system effect is about being overconfident from the success of an initial system with limited scope, and being too ambitious in a second system.


Your meaning is not what "the exception that proves the rule" means in my experience from ~30 years ago.

In my experience, it means:

"This thing stands out to us as exceptional because it is exceptional; there is a rule which normally applies, but we tend not to notice the rule until an exception to the rule reminds us of the rule by contrast".


> Any time someone replaces their Flask web app with a Go or Rust version, QED

Wait... are you suggesting that replacing a Flask web app with a Go or Rust version is in support of or against second system syndrome? That sounds like a textbook example to me - I've seen so many second versions get built mainly for the sake of a new technology that miss a lot of the original requirements the first version solved just fine.


An "exception proving the rule" is a colloquial phrase that can be interpreted in several ways. Yours is one of them. I'm using it here as a fun way to underscore the decades of consistent greatness of Postgres[QL], and the folly of using one of the most successful open source projects in history as a counterexample to Brooks's "second system".


The word "proves" in that maxim means "tests", in much the same way that bulletproof armour once meant that the actual armour in question had withstood a bullet (and you could see the dent) or that the proof of the pudding is in the eating.


On the other hand, a lot of what made the original Postgres distinctive has ended up being either removed or considerably downplayed: the "object-relational" features, rules, versioned data, and so on.

(I think what the paper refers to as "versioned data" is the "time-travel queries" stuff, not multi-version concurrency control which was added after Stonebraker's time.)


I would agree with you on the rules point... at least relatively recently, once INSTEAD OF triggers for views were on the scene... but I see some fairly robust use of the object-relational features and don't see much downplaying of them. Mind you, that doesn't mean they're necessarily commonplace (or necessarily should be)...

My guess is the audience here on HN is less likely to run into these features in their day-to-day lives. By my observations, the HN audience tends to be generalist developer types that will readily just hand off database specialization to the most popular ORM layer for their first class development environment. And I would agree they'll not encounter these things... and if they do, they'll largely be annoyances.

I work in "enterprise" type software and I see and do a lot of specifically database development (for PostgreSQL usually these days). If we consider PostgreSQL's type system as part of the object-relational feature set, I find I make use of that fairly frequently, for example compound types... also understanding that object-relational shouldn't be confused with "object-oriented". In my own work it's primarily the more advanced typing that shows up in both procedural code I write as well as queries in certain cases. Less so, though not completely absent is the inheritance feature; there are uses for it for sure, but a rather narrow set of cases (I mean outside of partitioning, which is the most common use case).

Anyway... I'm not so sure it's really a matter of downplayed as much as it is less generally understood as PostgreSQL has become more popular.


Sadly, I've still seen this happen too many times (Second System Syndrome, where all scope of the original 'tech refresh' is lost), and there's often nothing (or no one powerful enough) to push back against people who fear being the hated authors of the new system the way they hated the author of the old system...

The righteous "this time we'll do it right" is a powerful drug, and fear of missing something can drive even the most experienced people to stupidity...


>Postgres was Stonebraker’s second system.

What was his first system?


Ingres. https://en.m.wikipedia.org/wiki/Ingres_(database)

However, if you look at the projects he’s worked on, he has a track record of writing database engines that end up in delivered database systems.


Ingres. The "post" in Postgres refers to the fact that it was the follow-up to Ingres.


I am really interested in his latest work with SciDB and VoltDB also. Cool stuff to check out if you haven't yet.


One of the first C programs I wrote pulled data from an Ingres Database, all running on a PDP 11/34. I now use PostgreSQL heavily on AWS. Thanks Michael Stonebraker!


I'm still developing web services connected to an Ingres instance running on a rickety ~22 year old Sun server with lots of bits and pieces cannibalized from its long dead siblings.

We have the replacement system almost ready but won't really be able to swap the thing until the project owner retires (yay politics!).

It's important but not time-critical, so if it dies before he goes we can just eat the downtime and take a couple of weeks to migrate.


See also: "The Postgres and Illustra Codelines" by Wei Hong:

https://dl.acm.org/doi/pdf/10.1145/3226595.3226623


I have done some work on parallelism in PostgreSQL, and I was fascinated to discover Wei Hong's work in this area from more than a quarter century ago:

https://www2.eecs.berkeley.edu/Pubs/TechRpts/1992/ERL-92-3.p...

The POSTGRES 4.2 tarball includes a bunch of his parallel query work, which you can find wrapped in #ifdef sequent. I assume that this was a precursor to XPRS, and it was ripped out of regular POSTGRES after that. I wonder if anyone knows what happened to the XPRS sources...


I love PostgreSQL history, so this was tons of fun for me to read.

While reading, I also ended up highlighting and sharing some of my favorite parts on twitter: https://twitter.com/felixge/status/1272613965219139585


Is this the same Postgres as the PostgreSQL database we all know and love here on HN?


Yessir. Originally inspired by a db system called "Ingres", it was called "Postgres" and then eventually "PostgreSQL" to emphasize the SQL part.


Yes, although originally it did not support SQL but QUEL (strictly speaking, the POSTQUEL variant).

see https://en.wikipedia.org/wiki/PostgreSQL#History and https://en.wikipedia.org/wiki/QUEL_query_languages


I wonder how things would be if QUEL had become the standard. The language looks closer to the access patterns that ORMs use.
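
For anyone who hasn't seen it, the classic example goes roughly like this (from memory of old Ingres material, so treat the QUEL as a sketch), with an SQL translation for comparison:

    -- QUEL (the Ingres style that Postgres's POSTQUEL grew out of):
    --   range of e is employee
    --   retrieve (e.name, e.salary) where e.salary > 50000
    -- Rough SQL equivalent:
    SELECT e.name, e.salary FROM employee AS e WHERE e.salary > 50000;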


Yes, the abstract doesn't mention it, but the PDF has this–

    > 1 OPENING
    > 
    > Postgres was Michael Stonebraker’s most ambitious project—his grand effort to build a one-size-fits-all database system.


I wish he had described more of the problems of their insular approach of avoiding SQL. It used something like POSTQUEL, which nobody used. It only shone through a C/C++ API, and was not usable for end users, just for developers to map their objects to. An opinionated OO database which didn't follow the standards. And there were much better OO-DB persistence mappings in those days, like for Lisp, where you had to write no glue code. Postgres was a turkey.

But everything changed when someone added SQL. Soon it became the very best SQL standards implementation. That's when I picked it up again, and stayed with it.


Totally off-topic, but anybody who is into databases should at least have heard the name dBase, which was the most successful database in the 80s. However, they totally fucked up the release of version IV, which had serious reliability issues (not something you want in a database, in case you did not know). This led to a dramatic decline and eventually the death of dBase. Who knows what the database and software landscape would look like had they released a version without major bugs.


I think it wasn't version IV that killed it; after all, there were many dBase-compatible programs at the time, e.g. FoxBase, which were at least as good and comparatively bug-free.

I suspect it was actually Microsoft Office, with Access and a mature Excel, that did it. Everyone was using Office at the time (for Word, Excel, PowerPoint and the emerging Outlook), so Access was "free"; and many things that dBase & co. were used for were actually becoming easier to do in Excel.


> at least have heard the name dBase

Oh man, you just wet my eyes! What a trip into the past!

I wrote my first paid project in dBase in 1982, when no one knew what a "computer mouse" was. I answered an ad in the newspaper; initially they wanted me to help them organize text files. I told them about dBase, and that became my first project: a simple database to manage their parking lots. It was such early days of programming, we had basically zero agreement! Just half a piece of paper saying they'd pay me 10% in advance, and a short list of features to get the remaining 90%. And the funny thing is: no copyright agreement! So while I made a few months of rent from their project, I ended up paying off my first mortgage just by offering the same piece of software to many other parking corps! It was an incredible feeling! Like creating money out of thin air: they sponsored my plane/hotel, and I just told them what PC (with DOS) I needed to be ready. Then, after just a few hours of installation and training, I was on my way home with a nice check. I frankly don't even remember how much it was back then...

Good times!


In my dad's office they were analyzing market research data. A database query that takes seconds (or maybe less than a second) today was started on Friday afternoon. On Tuesday morning the database was finished processing :)


Amazing story. How did you get known to the other parking companies? Did you ring them up or something?

I suppose the secret is quickly delivering software that does what someone wants.


Yes. Back then cold calling was welcomed. Everyone picked up the phone, excited to find out who was calling. All I had to tell them was the name of the company and how they could benefit from the software. Honestly, the conversion rate was close to 100%; it's just that some locations had headquarters across the country and I didn't feel like flying in winter time lol.


Not kidding, my dad still uses dBase databases in his five-man company to this day (a non-IT company). I even re-wrote some of his programs in C# 10 years ago so he can use some of them with a modern GUI, but without replacing the database, to stay compatible with his original software and tools.

I was surprised that the databases are contained in *.dbf files, with no real database service present to process database calls.


At the time, and depending on the OS...where would the service be? ;)


I think I do not understand the question ... it is a little warehouse, if that is your question. :-)


MS-DOS doesn't support service (daemon) programs, so it's natural that a local db file would be used directly by the application.


The dBASE file format lives on in an unlikely place: the "shapefile" format supported by nearly all Geographic Information System (GIS) software.

A shapefile is actually a collection of files, typically using the same filename with different extensions.

There is a .shp file containing the actual geographical "shapes": points, lines, polygons, etc.

Alongside it is a .dbf (dBASE) file with information about each shape. For example, if you have a list of cities, each city may have a boundary polygon (or multipolygon) in the .shp file, and the city name, population, and other demographics in the .dbf file.

The other required file is a .shx containing indices into the .shp file for faster navigation through it.

There may be other files as well; a common one is the .prj file that describes the projection used in this shapefile.

This collection of files, foo.shp, foo.dbf, foo.shx, etc. is referred to together as a "shapefile".

https://en.wikipedia.org/wiki/Shapefile

Bringing us full circle, GIS programmers and users often import a shapefile into PostGIS for processing. And PostGIS is, of course, a GIS extension to PostgreSQL!

https://postgis.net/
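
A minimal sketch of that round trip (shp2pgsql ships with PostGIS; the file, table, and column names here are invented):

    -- Shell: turn the .shp/.dbf/.shx trio into SQL and load it
    --   shp2pgsql -s 4326 cities.shp public.cities | psql -d gisdb
    -- Then the .dbf attributes and the geometry are queryable side by side:
    SELECT name, population, ST_Area(geom::geography) AS area_m2
      FROM cities
     WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(-122.42, 37.77), 4326));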


Since this is the dBase file format, does this mean shapefiles can do more than alternative formats like GeoJSON?


I have exactly the right link to answer this: http://switchfromshapefile.org

Funny enough, two popular contenders are built on SQLite.


That's an interesting question. In practice I've found it to be the other way around, especially the ten-character limit on .dbf column names. If your incoming data has a field named "Population in 2010" you have to map it to something shorter like "PopIn2010". Not a big deal, but it gets annoying.


dBase III+ came with a bit of an embedded programming language, the logical extension of which was the Clipper programming language (from CA by the time I came along). It never got me to do Windows programming (I suppose the later versions must have eventually caught up), so I moved on. But it was a great combo for making database-based applications for me as an aspiring programmer. I was a teenager, and probably the best feature of the dBase/Clipper combo was that I had a great mentor in my neighborhood. I never left programming, and gave up other passionate hobbies for it.


I find Postgres a really good database but with really poor tools.

I don't want to stir up a religious battle here, but the tooling around MSSQL is just that much better.

And the SQL dialect is more to my taste, imo.


Not sure why you're being downvoted, the lack of quality FOSS IDEs for Postgres is well known.

I do feel that Postgres administration mostly caters to command line fans (like myself); the people more used to visual tools like SQL Server Management Studio tend to find it difficult to transition because of this. Even if they might be absolute wizards at writing SQL otherwise.

Part of it is familiarity of course, but there's always something ever so slightly embarrassing about showing people psql when they're used to advanced IDEs.


I also found this when I switched from MySQL. After a while you discover the right tool for the job, and often some tools that don't exist for other databases (things like PostgREST come to mind). Sometimes I wish it was a bit more idiomatic though.

We’re working on a lot of tooling at https://supabase.io. We’re open source too. If there is anything in particular you’re missing, let me know


Since you guys are doing something I highly agree with and I replied to your announcement post as well...

How about a UML/ERD diagram tool which can spit out schemas and, given a schema to start with, migrations?

There are a couple dozen of these being offered as SAAS with varying levels of usability and complexity. If you built one of those you'd soon have the attention of everyone who is bitching about paying 20 dollars a month to see their own database in a way that management can visualize for their charts and graphs.

ERD diagrams have existed for decades. Even Oracle gives such a tool away. But for Postgres, the easiest way to create a schema with migrations without a credit card is still to write a Django model by hand, afaik.


That's a great idea for the diagrams. I'm slowly working towards dumping schemas/types for the database in various formats over here: https://supabase.github.io/pg-api/

Once we've stabilized this we will start on git integrations - link up your repo to keep everything in code: your types, schema, tables, functions, etc. We are also brainstorming ways that we can offer branching - since we give you a full database server, we can possibly "branch" the default "postgres" database into various schemas which you can test with.

> ERD diagrams have existed for decades

If you know of a good open source tool which we should be supporting for this, let me know. Otherwise we don't mind building from scratch!


Here are a couple of examples in React:

https://github.com/projectstorm/react-diagrams (diagram library)

https://www.npmjs.com/package/react-database-diagram (an example using the above library)

There is a Node migration library that seems well received but as I hinted above I have always trusted Django's migration system and still use it to bang out schemas if I know the schema will change, so I can't vouch for this lib personally...

https://www.npmjs.com/package/db-migrate


> projectstorm/react-diagrams

woah. This looks powerful.

Thanks for the libraries - I'll research more and loop back here if we end up using them.


If this is worth pitching in a meeting, I'd point out that the creator of GraphQL is also the creator of Airtable, which seems to be a thriving startup. The whole business model there is to give people visual tools to get into a database and out of a spreadsheet.

The story in all of this is that the only thing keeping Joe Q. Public from using a database instead of a spreadsheet is the very first step: designing a schema. Database design isn't really an impossible task for a non-developer to learn, if it can be visualized. If you start out at a command line and try to explain many-to-many relationships to them in code, you just lost them. People are visual creatures.

Since we're talking about giving people an API that takes care of CRUD and basic view logic at the database itself, and returns the data and the success/fail/delay messaging in a way that is standardized on the UI via javascript objects and async browser functions, the only hurdle that's left is that first one...

...designing a schema.

edit: on your topic of branching into test environments, I am in the process of evaluating low-code platforms for non-profits with limited development money but complex data needs at the moment, so I think we're thinking about the same things.

Everyone and their brother is working on a low-code platform at the moment. A big part of whether those platforms succeed or not is how they handle "last mile" complex logic. By last mile I mean: you can feasibly deliver basic CRUD and data-analysis visualizations with drag and drop and query binding in generic javascript UI components, but there's still going to be the 10% of complex business-logic cases that fall outside the 90% of an application that the generic components and boilerplate do cover.

Off the top of my head, you could guide a user through some form logic that picks the first hundred or fifty or whatever rows from their existing tables that match a certain subset of shared keys, and tells them to flag columns which have sensitive data. Once the columns are flagged the tool could jumble the data in those columns and spit out a sandbox without any identifiable sensitive data that they could then safely hand to a contractor or freelancer to work on for custom development.


> I'd point out that the creator of GraphQL is also the creator of Airtable, which seems to be a thriving startup.

This is news to me. Any sources for this?


I saw it in a conference talk, looking...

edit: https://medium.com/@calebmer, https://github.com/calebmer, https://calebmer.com/

I misspoke, not a founder of the company but an engineering hire there after he left Facebook, and the creator of PostGraphile. The gist of what I was getting at is there is a lot of activity swirling around putting business logic back on databases, it seems.

We might look back at these days as the high water mark of "peak javascript."


I'm curious to the particulars (I have no opinion on the technical merits of Microsoft's offering as I've never tried it). What specifically does it do better?


Last I checked, it has 3 layers of included BI/reporting tools. Plus a lot more.


A good read, but man, these PDFs are terrible to read on mobile.


On an iPhone, I don't have much difficulty as double-tapping a column zooms into that column in a readable way.


Do submitters write academic papers in "native" LaTeX these days? Or do they write in some intermediary format like Markdown?

If it's the latter, it'd be pretty trivial to generate a more mobile-friendly HTML format. I wonder if arXiv supports something like this.


You would be unsurprised how many working scientists I know who confess to doing their papers in Word or LO and setting the font to Computer Modern.

There's even a (humorous) economics paper on why this is an advantageous method: https://onlinelibrary.wiley.com/doi/abs/10.1111/joes.12318


As others said, yes, academic papers are still written in LaTeX. However, arXiv does not actually accept a PDF for submissions, but requires the actual LaTeX sources, and they generate their own PDF. As LaTeX-to-HTML tooling improves, they could always generate HTML for all of the submissions for which they have the source.


That would be fantastic! I hope it's on their long-term roadmap :). They probably have the biggest corpus of LaTeX source in the world; that'd be pretty useful to the project maintainers.


Most academic papers are still written directly in LaTeX.


(In computer science, physics and mathematics.)


LaTeX2html is a thing too.


Academic papers have been like this for years. And yeah, I don't enjoy reading papers on mobile.



