Over my career I've certainly written more SQL than any other type of code. Not ...

bob1029 · 2025-01-02T20:55:59 1735851359

Over the years I have been pushing more and more responsibilities into the RDBMS. I now see things mostly in terms of ETL, SQL and schema. Virtually every conversation I've ever had about the application of technology to the business could be expressed in these terms. Business logic defined as SQL queries can be quite intuitive when the schema is structured well and aligned to the business stakeholders' perspectives.

Code, frameworks, ORMs, "best practices", patterns, et al. are ultimately a distraction. There are a million ways to get the data in & out of the database. Moving the bits around is not valuable. There are many overblown software solutions out there that could have been a simple merge statement or CSV import job.
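To make the "simple merge statement or CSV import job" point concrete, here is a minimal sketch using Python's built-in `sqlite3` and `csv` modules. SQLite has no `MERGE` statement, so this swaps in its `INSERT ... ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24+). The `customers` table and the CSV contents are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export from some upstream system.
CSV_DATA = """id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
# A pre-existing row that the import should update rather than duplicate.
conn.execute("INSERT INTO customers VALUES (1, 'Alice', 'old@example.com')")

# Parse the CSV; convert the key to an integer to match the column type.
rows = [{**r, "id": int(r["id"])} for r in csv.DictReader(io.StringIO(CSV_DATA))]

# Upsert: insert new ids, overwrite name/email for ids that already exist.
conn.executemany(
    """
    INSERT INTO customers (id, name, email)
    VALUES (:id, :name, :email)
    ON CONFLICT(id) DO UPDATE SET name = excluded.name, email = excluded.email
    """,
    rows,
)
conn.commit()

result = conn.execute("SELECT id, name, email FROM customers ORDER BY id").fetchall()
print(result)  # [(1, 'Alice', 'alice@example.com'), (2, 'Bob', 'bob@example.com')]
```

The whole "sync job" is one statement; the application code only parses the file.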

I think that a lot of the misconceptions and bad moods about SQL come out of being forced to work with nasty schemas. The language itself is really domain-specific. Consider that one wouldn't complain as much about a super fucked up nested query (and resulting SQL syntax woes) if it wasn't necessary to write such a query in the first place. Aligning tuples and relations to the way the business typically talks about them means you will be less likely to be fighting these things over time. Often, it isn't possible to refactor the schema from scratch, but you can put replicas/views around a "bad" schema and target it with your new development & refactors.
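A minimal sketch of putting a view around a "bad" schema, using Python's `sqlite3` module. The legacy table `tbl_cst_v2`, its columns, and the assumption that status code 1 means "active" are all hypothetical; the idea is that new code queries `active_customers` in business terms and never touches the cryptic names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical legacy table with cryptic names and magic status codes.
    CREATE TABLE tbl_cst_v2 (c_id INTEGER, c_nm TEXT, st_cd INTEGER, crt_dt TEXT);
    INSERT INTO tbl_cst_v2 VALUES
        (1, 'Acme Ltd', 1, '2024-01-15'),
        (2, 'Globex',   9, '2023-06-01'),
        (3, 'Initech',  1, '2024-11-30');

    -- A view that speaks the business's language; new code targets this instead.
    CREATE VIEW active_customers AS
    SELECT c_id         AS customer_id,
           c_nm         AS customer_name,
           date(crt_dt) AS signed_up_on
    FROM tbl_cst_v2
    WHERE st_cd = 1;  -- assumption: 1 means 'active' in the legacy system
    """
)

rows = conn.execute(
    "SELECT customer_name FROM active_customers WHERE signed_up_on >= '2024-06-01'"
).fetchall()
print(rows)  # [('Initech',)]
```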

fifilura · 2025-01-02T21:08:51 1735852131

> Consider that one wouldn't complain as much about a super fucked up nested query (and resulting SQL syntax woes) if it wasn't necessary to write such a query in the first place.

And in "modern" SQL this is solved with CTEs. Use them to unwrap the schema the way you want it first, before implementing the business logic.
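A minimal sketch of that CTE-first style, run through Python's `sqlite3` module. The `raw_orders` feed and its quirks (padded, mixed-case country codes; amounts stored as text; a duplicated row) are hypothetical. The first CTE does all the cleanup, so the business logic below it reads as a plain aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (order_ref TEXT, country TEXT, amount_txt TEXT);
    INSERT INTO raw_orders VALUES
        ('A-1', ' se ', '100.50'),
        ('A-2', 'SE',   '49.50'),
        ('A-3', 'no',   '20'),
        ('A-3', 'no',   '20');  -- duplicate row from a re-sent feed
    """
)

query = """
WITH clean_orders AS (
    -- Step 1: unwrap the input into the shape we want.
    SELECT DISTINCT
        order_ref,
        upper(trim(country))     AS country,
        CAST(amount_txt AS REAL) AS amount
    FROM raw_orders
)
-- Step 2: the business logic only ever sees the cleaned relation.
SELECT country, SUM(amount) AS revenue
FROM clean_orders
GROUP BY country
ORDER BY country
"""
result = conn.execute(query).fetchall()
print(result)  # [('NO', 20.0), ('SE', 150.0)]
```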

larodi · 2025-01-02T22:56:38 1735858598

This is poor man's SQL: using it to unwrap the schema for business logic. SQL is so much more about everything else, and so little about the trouble of mapping business logic onto storage.

bbkane · 2025-01-03T02:43:03 1735872183

Could you please add some examples? I'm still living in poor man's SQL land

solumunus · 2025-01-03T06:32:44 1735885964

Please elaborate.

larodi · 2025-01-03T10:55:50 1735901750

CTEs can indeed align mismatched and denormalised storage to an E/R model that follows some business logic; that much is true. But CTEs also allow recursion, and they save time when you need complexity but want to follow precise step-by-step reasoning.
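For readers who haven't seen the recursive form, here is a small sketch using SQLite via Python's `sqlite3` module: walking a hypothetical `employees` reporting hierarchy step by step, with an anchor query plus a recursive step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ada', NULL),
        (2, 'Ben', 1),
        (3, 'Cy',  2),
        (4, 'Di',  2),
        (5, 'Ed',  1);
    """
)

query = """
WITH RECURSIVE chain(id, name, depth) AS (
    -- Anchor: start from the top of the hierarchy.
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    -- Recursive step: add everyone whose manager is already in the chain.
    SELECT e.id, e.name, c.depth + 1
    FROM employees AS e
    JOIN chain AS c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth, name
"""
result = conn.execute(query).fetchall()
print(result)  # [('Ada', 0), ('Ben', 1), ('Ed', 1), ('Cy', 2), ('Di', 2)]
```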

This did not arise as a need to solve business mess, but as a need to skip the DDL for views/materialized views.

You can slice arbitrary dimensions in CTEs, which does not by itself imply a goal of aligning with the business schema.

Besides, the top SQL devs I’ve met don’t waste time aligning the schema at all; they write the DB table and column names as they are, because, well… they were usually the people who created the mess in the first place.

Many reports, such as month-to-month sales growth or other statistical stuff, are much more readable when implemented with CTEs. Besides, older versions of DB software didn’t always have window functions or proper ranking, so problems such as top-n were not (and on some occasions still are not) trivial to write unless a view or CTE is used. We’re talking pages for a single query here, not textbook examples.
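A minimal sketch of a month-to-month report built as a chain of CTEs, assuming a hypothetical `sales` table and SQLite (via Python's `sqlite3`) as the engine. The previous month is found with a self-join rather than `LAG()`, which is the kind of workaround needed on engines without window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (sold_on TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('2024-01-05', 100), ('2024-01-20', 50),
        ('2024-02-11', 180),
        ('2024-03-02', 90),  ('2024-03-28', 45);
    """
)

query = """
WITH monthly AS (
    -- Step 1: one row per month.
    SELECT strftime('%Y-%m', sold_on) AS month, SUM(amount) AS total
    FROM sales
    GROUP BY month
),
with_previous AS (
    -- Step 2: pair each month with the previous one via a self-join,
    -- which works even on engines without LAG() or other window functions.
    SELECT cur.month, cur.total, prev.total AS prev_total
    FROM monthly AS cur
    LEFT JOIN monthly AS prev
      ON prev.month = strftime('%Y-%m', date(cur.month || '-01', '-1 month'))
)
-- Step 3: the report itself (change is NULL when there is no prior month).
SELECT month, total, total - prev_total AS change
FROM with_previous
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2024-01', 150, None)
# ('2024-02', 180, 30)
# ('2024-03', 135, -45)
```

Each CTE is one named step, so a long report reads top to bottom instead of inside out.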

So it is really about slicing and dicing in a convenient way, and less about the business schema or the ideal E/R model.

fifilura · 2025-01-03T11:40:24 1735904424

Maybe I misunderstood, and if I did it makes for an interesting discussion, but your explanation is mainly about what CTEs can be used for, not what they cannot be used for.

In my experience, normalizing tables is mostly for simplifying inserts. When you write business logic it is rather the opposite, you want non-normalized data because this is where the data is all in one place. And this de-normalization is a great use of CTEs. As well as a general cleanup of the input data, which was my original point.
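A small sketch of de-normalizing with a CTE, assuming hypothetical normalized `customers`, `products` and `orders` tables in SQLite (via Python's `sqlite3`). The CTE flattens them into one wide relation, and the business rule (revenue per segment) is then written against that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, segment TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER,
                            product_id INTEGER, qty INTEGER);
    INSERT INTO customers VALUES (1, 'Acme', 'enterprise'), (2, 'Bob''s Bikes', 'smb');
    INSERT INTO products  VALUES (1, 'Widget', 2.5), (2, 'Gadget', 10.0);
    INSERT INTO orders    VALUES (1, 1, 1, 4), (2, 1, 2, 1), (3, 2, 2, 3);
    """
)

query = """
WITH order_facts AS (
    -- De-normalize: one wide row per order with everything the logic needs.
    SELECT c.name AS customer, c.segment, p.name AS product,
           o.qty, o.qty * p.price AS line_total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    JOIN products  AS p ON p.id = o.product_id
)
-- Business logic runs against the flat relation, not the normalized tables.
SELECT segment, SUM(line_total) AS revenue
FROM order_facts
GROUP BY segment
ORDER BY segment
"""
result = conn.execute(query).fetchall()
print(result)  # [('enterprise', 20.0), ('smb', 30.0)]
```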

solumunus · 2025-01-04T13:22:49 1735996969

That makes sense. I misunderstood your OP.

fifilura · 2025-01-04T16:12:50 1736007170

I am happy you understood! I still feel this discussion is above my head, so if you are still there I would love an explanation!

fifilura · 2025-01-02T20:22:06 1735849326

I second all of that!

I wish more people would see the beauty. After a session of SQL, I take a step back and think:

"Hold on. What I have been doing lately is just pure logic. No library dependency resolution, no concurrency problems (even though massive concurrency is certainly under the hood). No mutability issues. Just logic."

SQL obviously has its warts, some of them serious, like testability. But at the end of the day, I wish all programming was like that. Let the computer decide how to do stuff under the hood. And let the human focus on the logic.

I have, somewhat half-assedly, tried to read up on Prolog to take it to the next level, but have failed so far, unfortunately. (It was also a goal to try to unlearn some SQL to avoid getting stuck in some local optimum.) Maybe somewhere between SQL and Prolog is the future of programming.

Rendello · 2025-01-02T20:46:07 1735850767

> Maybe somewhere between SQL and Prolog is the future of programming.

Must be Datalog then ;)

coliveira · 2025-01-02T22:15:23 1735856123

Prolog is very powerful, if you see what professionals can do with it it's eye opening. Unfortunately, it takes a complete relearning of programming to achieve that proficiency level. And after you reach it, you probably cannot use it in your day job...

arkh · 2025-01-03T14:36:00 1735914960

My problem with Prolog is every time I want to start using it I feel like I'm populating a database then doing queries on it. So it feels like SQL with another syntax and less power.

I'm sure I'm wrong and missing something but that's where I stop.

diggan · 2025-01-03T15:48:01 1735919281

> I feel like I'm populating a database then doing queries on it. So it feels like SQL

I'm not sure there is any database technology or data store that doesn't involve doing those two things to get started.

I haven't done any Prolog, but I have used Datalog, which is similar and focused on querying data. For me, the benefit of Datalog is that you can write medium-complex and complex queries a lot more easily than in SQL. Simpler queries are just fewer characters, but pretty much the same beyond that.

It probably helps that the Clojure ecosystem embraced Datalog, so it mostly feels like writing Clojure code although it's really Datalog. You don't get that same feeling with SQL, whatever library/DSL you use.

coliveira · 2025-01-03T18:31:18 1735929078

Prolog is in fact a kind of database. However it is untrue that it has less power than SQL, just the opposite. Probably it feels harder to use because you're already so used to how SQL works.

larodi · 2025-01-02T22:55:31 1735858531

> Maybe somewhere between SQL and Prolog is the future of programming.

It was, and it most probably is.

crazygringo · 2025-01-02T21:08:40 1735852120

> Once you stop thinking iteratively and start thinking in set operations it becomes quite natural and powerful.

I dunno... I've written a tremendous amount of SQL, and I still have to think imperatively (iteratively) in order to write queries that are actually performant, and to know which indexes need to exist.

It would be wonderful if I could just think in terms of set operations, but that tends to result in queries that take 5 minutes to execute rather than 5 milliseconds.

My entire thought process is basically -- what table do I start with, what rows in what order, joining to what, under what conditions, aggregating how, rinse and repeat... It's entirely a mental model of loops and aggregation, never of set operations.

fifilura · 2025-01-02T21:13:38 1735852418

It may be true, until you do your ETL in an index-less database such as BigQuery or Trino. Postgres will always be faster for optimized, end-user-facing queries.

But BigQuery allows you to scale it to 100s of CPUs without having to worry about indexes.

liontwist · 2025-01-03T01:15:17 1735866917

This sounds awful.

I would do almost any amount of iteration and index tuning to keep the query on a single machine rather than deal with a networked distributed system.

When you get slow queries, the real problem is algorithmic complexity, and adding workers buys at most a linear speedup, which can only do so much.

RobinL · 2025-01-03T08:32:10 1735893130

You can use duckdb on a single machine. It's also indexless (or more accurately, you don't have to explicitly create indexes)

fifilura · 2025-01-03T03:54:03 1735876443

You are rightfully proud of your skills!

crazygringo · 2025-01-02T22:42:59 1735857779

Yes, I'm talking about end user queries. Not reports that take 2 hours to run.

But even with BigQuery, you've still got to worry about partitioning and clustering, and yes, they've even added indexes now.

The only time you really just get to think in sets, is when performance doesn't matter at all and you don't mind if your query takes hours. Which maybe is your case.

But also -- the issue isn't generally CPU, but rather communications/bandwidth. If you're joining 10 million rows to 10 million rows, the two biggest things that matter are whether those 10 million rows are on the same machine, and whether you're joining on an index. The problem isn't CPU-bound, and more CPUs aren't going to help much.

fifilura · 2025-01-03T03:53:28 1735876408

Of course there are optimizations to be made, such as not joining on the raw data or saving the ORDER BY for last. And avoiding outer joins between two large partitioned tables.

But to me those optimizations are not imperative in nature.

(And BQ will probably eat the 10 million to 10 million join for breakfast...)

griomnib · 2025-01-02T22:24:56 1735856696

Being able to master the theoretical, practical, and skill-based components of designing a good database schema is the absolute truest test of understanding any systems design.

People skip ahead to all kinds of nonsense; but most of software engineering is putting the right data into the right format, and moving it around reliably.

I just did a major refactor of a complex distributed code base. I pretty much only count the work I did on schema re-design as the actual “job”; the rest was many hours of coding, but that’s really just implementation.

There are other ways to define schema than SQL of course, but it’s really the perfect way to learn true systems engineering.

SoftTalker · 2025-01-02T23:20:30 1735860030

Very true. My manager at one of my first jobs liked to say "get the data model right and everything else will be easy" and that has largely been proven true in my experience (and it even applies if you're not using an RDBMS).

maCDzP · 2025-01-02T22:19:17 1735856357

SQL clicked once I read the original paper and it explained it in terms of sets.

pjmlp · 2025-01-03T07:17:44 1735888664

Yeah, I also kind of like coding in SQL, with PL/SQL being my favourite extension language, which is kind of heresy on HN, but whatever.