I like it, it's readable; unlike some SQL alternatives I've seen, it doesn't make me feel like I'm dumb and don't understand what a query even is.
I can't decide if it would be better or worse if it stuck more closely to SQL keywords. You use "from" and "select", but not "where", "order by", "group by". There's some danger of it being in an uncanny valley of SQLish, but I'm pretty sure I'd prefer just using those terms verbatim (including the space in "order by"... that style is less common in modern languages but it's not really that much harder to parse).
I'd like to see more examples of joins and composing SQL. Does this language make it easier to make more general SQL queries? Can I take two queries and squash them together in a reliable way? I feel like I end up with a lot of theme and variation in my queries, often involving optional filters.
I might even like a notion of encapsulation that could help this query language when it's embedded in other languages. Like if I could say, in the language itself, that a query has certain unbound variables (and not just ? or other placeholders). This language seems like it would be better for generating than SQL, and sometimes generation is just necessary (like in any application that supports data exploration), but for most common cases I'd hope to avoid that. Defining inputs and then making whole filter sections or other statements conditional on those inputs would help here.
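For comparison, the usual SQL workaround for an "optional filter" looks something like the following sketch (`:country` is just an illustrative bind-parameter placeholder, not any particular driver's syntax):

    -- when :country is not supplied (NULL), the filter collapses to a no-op
    SELECT id, name
    FROM employees
    WHERE (:country IS NULL OR country = :country);

Declaring that input in the query language itself, and making the whole filter clause conditional on it, is the kind of encapsulation I mean.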
Yup, I like a lot of things about the way this looks. In particular, I like how friendly this looks to be for things like auto complete (pretty annoying to need to practically type the entire sql query only to go back and fix up the columns in order to get autocomplete to work).
Specific things I'd like to see:
How do you handle column ambiguity? In the examples, they show a join of positions to employees on employee_id == id. But what happens when you have two columns with the same name that you are joining on (like employee_id to employee_id in some mapping table)?
Subqueries are pretty important in what I do, so what do those look like (perhaps covered by the "thinking about CTEs" section)?
How about opportunities for optimization hints? In T-SQL you can hint at which index the optimizer should prefer for a specific query.
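Roughly like this in T-SQL (IX_employees_country is a made-up index name, purely for illustration):

    SELECT *
    FROM employees WITH (INDEX(IX_employees_country))  -- table hint steering the optimizer to one index
    WHERE country = 'USA';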
Common SQL patterns would also be interesting. Like, how would you do keyset pagination?
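For reference, the usual SQL shape of keyset (seek) pagination, which I'd want to see expressed in the new language (Postgres-style row comparison; the :prev_* names are placeholder bind parameters):

    -- fetch the next page by seeking past the last (last_name, id) already shown,
    -- instead of using OFFSET
    SELECT id, last_name
    FROM employees
    WHERE (last_name, id) > (:prev_last_name, :prev_id)
    ORDER BY last_name, id
    LIMIT 20;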
Edit: Also, I'd like a discussion about null. SQL null handling rules are terrible. I understand them, I work with them, but at the same time, they are so different from other languages' concept of "null" that they are easy to trip over.
> SQL null handling rules are terrible. I understand them, I work with them, but at the same time, they are so different from other languages' concept of "null" that they are easy to trip over.
Could you elaborate? I'm really only versed in the MySQL accent, but I don't find anything unusual or unexpected about NULLs in MySQL. If there are any pitfalls that I should be aware of, I'd love to know about them here before my users start complaining about bugs.
In SQL, NULL does not mean "no value"; it means "unknown value". The existence of such a value introduces three-valued logic, because the expression "NULL = <anything>" is neither true nor false. This makes queries harder to understand without any benefit.
Additionally "unknown value" concept is not used consistently. Things like DISTINCT, or UNIQUE indexes (in some databases) treat NULL as single "no value".
The value in your if is not false, it's NULL. NULL=NULL behaves exactly like NULL=42, the value is NULL. Which is what the parent was trying to explain.
Not the person you replied to, but I don't think by “from other languages” he means other dialects of SQL.
Instead, I think other languages away from the database are being referred to - in many of those NULL is treated like any other value², for instance in Javascript¹ null==null is true and null!=null is false, and due to type coercion null on its own is “falsey”. Personally I have no problem with SQL's handling of NULL with one exception, and find that other languages treating it as a single value rather than an unknown feels odd.
The one thing that I have occasionally tripped over with NULL in SQL is the effect of “<val-or-var> NOT IN (<set>)” when NULL is one of the entries in <set> - it makes sense when you think about it because the IN operator can only return true or false and it can't definitively say the searched for value isn't equal to the unknown one(s)³ but this doesn't seem intuitive.
Some SQL dialects do handle NULL a little differently, more like languages such as JS. MS SQL Server can be made to do so with SET ANSI_NULLS OFF, which forces its ancient, non-standards-compliant behaviour⁴.
[1] quick & easy to test in your browser's console
[2] well, technically in JS I think null is specifically a null object reference, that being one of the differences between null and undefined
[3] more concretely, “var NOT IN (1, 2, NULL)” being equivalent to “var<>1 AND var<>2 AND var<>NULL”, which becomes “true AND true AND NULL”, which is NULL, since AND only yields TRUE when both operands are definitely TRUE.
[4] though note that this option is officially deprecated, as of at least 2016, and might be removed or just ignored in future versions
I think supporting variables and functions already solves most of my composability gripes with SQL.
Another problem that I have with composing SQL is that large queries quickly become unreadable, and error messages are also often not terribly helpful. I think having a more expressive type system would help with the error messages.
Do you have any plans on adding a type system to PRQL?
> Can I take two queries and squash them together in a reliable way? I feel like I end up with a lot of theme and variation in my queries, often involving optional filters.
That's essentially what SQL views do. Each view is a query and then you can treat it like a table and filter/join on it.
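For instance (a sketch reusing the thread's employees example):

    -- package the reusable part of the query as a view...
    CREATE VIEW usa_employees AS
    SELECT *
    FROM employees
    WHERE country = 'USA';

    -- ...then treat it like a table: filter, join, or aggregate on top of it
    SELECT title, COUNT(*) AS headcount
    FROM usa_employees
    GROUP BY title;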
Of course, then the problem becomes whether or not the query planner can see through the view to the underlying tables to optimize correctly.
> I can't decide if it would be better or worse if it stuck more closely to SQL keywords. You use "from" and "select", but not "where", "order by", "group by". There's some danger of it being in an uncanny valley of SQLish, but I'm pretty sure I'd prefer just using those terms verbatim (including the space in "order by"... that style is less common in modern languages but it's not really that much harder to parse)
I agree 100% here. As a SQL veteran, it would make the transition a lot easier if you used common SQL keywords like group by, order by, limit, etc.
e.g.
from employees
where country = "USA"
derive [
gross_salary: salary + payroll_tax,
gross_cost: gross_salary + benefits_cost
]
where gross_cost > 0
group by:[title, country] [
average salary,
sum salary,
average gross_salary,
sum gross_salary,
average gross_cost,
sum_gross_cost: sum gross_cost,
count,
]
order by:sum_gross_cost
where count > 200
limit 20
I like the flow direction compared to standard SQL. SQL is supposed to read like a sentence, I suppose, but I have many times looked at it and really wanted things to be in a more logical order.
My main suggestion would be to be a bit less terse and introduce a bit firmer formatting. I'm not a huge fan of the term "split" and feel like jazzing that up to "split over" or even just reviving "group by" would improve readability. Additionally, the aliasing could use work; I'd suggest reversing the assignment to something closer to `use salary + payroll_tax as gross_salary`. In terms of firm formatting, unless I'm missing something there isn't any reason to allow a filter statement before any aliases - so you could force two fixed positions for filter clauses, which would make it always legal to reference aliases in filters.
On the brief topic of form vs. flexibility: SQL is a thing that, when complex, is written by many people over the course of its lifetime - removing the ability to make bad decisions is better than enabling the ability to write simple things even simpler - those silly do-nothing queries like `SELECT * FROM customer WHERE deleted='f'` are written once[1] in a moment's time and never inspected again. The complex queries are what you want to optimize for.
1. If they even are - with ORMs available a lot of those dead simple queries just end up being done through an ORM.
> On the brief topic of form vs. flexibility. SQL is a thing that, when complex, is written by many people over the course of its lifetime - removing the ability to make bad decisions is better than enabling the ability to write simple things even simpler
Hallelujah! But, to your footnote, this is a major reason why I despise ORMs. In my mind they make writing simple code slightly easier, but they make complicated SQL statements - especially when you get some weird issue under load and you're trying to debug why your DB is falling over - a ton more difficult, and you spend so much time just battling your ORM.
On ORMs, the best use I see of them is for “transparent” queries that you don’t define.
Like fetching a record by id, or a single record and all of its related properties. Or a list of all the record in a table matching a simple filter.
That’s 98% of what we do against the DB, and I’m all for having it basically invisible.
Then let’s just bypass the ORM altogether the minute we think about joining or grouping things together. There are libs in most languages that help just sanitize queries, so it’s not really difficult.
With a middle ground like a micro ORM those transparent queries are barely visible anyway, literally a line or two of embedded sql strings. Especially micro ORMs that can handle dynamic filters. They're generally write once and only get looked at again when modifications are necessary, so they're not worth "optimizing" by adding the complexity of an ORM.
A common pattern seems to be over engineering these simple scenarios though. Someone decides that embedded sql is evil and needs to be extracted out of normal code, often to stored procs. Then these simple queries have enough friction that an ORM starts to look good, then you end up with an ORM generating simple queries dynamically in the same place that used to have a simple embedded string.
The fundamental problem with an ORM is that you're using a lower level language to compile to a higher level language. This is completely backwards. It's like having a framework in your assembly to generate Java code for you, so you don't have to bother with all that "weird" Java, and can just stay in your comfort zone.
Isn’t it more important that the query you write with the ORM is readable than the underlying SQL it spits out? Using an ORM I can get reusable parts of a query, while writing complex joins, I’m not sure why skipping that part is good?
In my experience, a lot of very semantically reasonable and readable code ends up with very penalizing SQL at the end, and it's a real challenge to then rewrite the whole thing into decent queries.
There can be parts of an app where a very bad query here and there is not important, but more often than not it creeps into key parts of the user experience, and it becomes very hard to untangle once it is important enough to thoroughly optimize, but also complex enough that the existing tests only cover a tiny portion of the important use cases (if you're reusing a bunch of query bits, you're probably working with a wide combination of inputs/outputs). I've seen literally weeks spent on trying to optimize ORM chained subqueries.
My experience is that the ORMs I've used most (LINQ and Ruby's Sequel) can produce far more efficient SQL than a human can, and if not, you change the code, just as you would have to if you wrote a slow SQL query.
Of course, I've not seen every query in existence so it's more than possible you've seen bad SQL from an ORM, but the untangling part would again fall to those skilled in the language of the ORM - unless the ORM can't produce efficient SQL in a particular case. And just as it would if the query was originally written in SQL, you'd need someone skilled in SQL to untangle that.
What would that case (where an ORM cannot produce efficient SQL) look like?
I have some serious doubts that an ORM can tune queries as well as a human, due to the fact that the ORM lacks one key piece of information that both me (and the database planning a query) can leverage - table statistics. An ORM can produce a query that will behave well in the best general circumstances, but as soon as you get into topics like subquery performance fencing, ORMs simply have no ability to compute optimality on the fly. One specific example I've seen is where multiple paths exist through the database to transit from one fact to another, with one path being more strictly optimal and the other path having more associations that may be needed - this can affect the join strategies you want to use, so if your ORM is anything more than "I'll essentially tell you the SQL but in a weird syntax" then there's a good chance it'll choose the wrong path.
But you write the code via the ORM, so I don't see the difference between you writing the SQL with knowledge of the table statistics and you writing the code via the ORM with knowledge of the table statistics.
I haven't got the chance to try Sequel; on the Ruby side I played more with plain ActiveRecord and querying layers like ransack (my predecessor on the job loved abstraction layers).
In general ORM queries become ugly at three to four levels of joins and/or excluding under non-trivial conditions (e.g. finding users that have not participated in a specific set of events). They will spit out something that works, but it will take a few orders of magnitude more time than an optimized query.
As you say, there is the option to play jenga with the ORM code to hit the right combination that produces a better output. But that feels like teaching a toddler to solve a puzzle that you already solved and are keeping the cheat sheet in your pocket. I personally don't see the beauty of it and would prefer to directly use the right SQL and call it a day.
On people skilled in SQL, you should have a few onboard anyway if you're doing more than basic CRUD on the DB, and it's easier to find than ORM gurus IMHO.
> I personally don't see the beauty of it and would prefer to directly use the right SQL and call it a day.
I think that's fair enough, there are enough ways to do things now that it should be possible to accommodate both.
> On people skilled in SQL, you should have a few onboard anyway if you're doing more than basic CRUD on the DB, and it's easier to find than ORM gurus IMHO.
I agree but I'm not sure there are more SQL gurus than those used to ORMs nowadays. Lately I've favoured using SQL but even 15 years ago most devs I knew couldn't use it well, I can't see devs used to Rails et al having the chops for it, sadly. What was once convenient easily becomes one's master.
> I haven't got the chance to try Sequel
If you get the chance, I think it's worth it. It's easy to drop into plain SQL without dumping the ORM, and I've never had a problem with the stuff it generates. It's a pity ActiveRecord gets all the love instead.
I'm not so sure it's always best to optimize for absolute performance; how you should code a solution to a specific problem is always dependent on its context IMO.
I work on a lot of smaller IT projects for SMEs: internal tools and platforms that are made on a small budget and thus end up having a tight deadline in order to not go over budget.
The vast majority of these projects are versions of CRUD apps for this and that, we tend to value readable code over performant hard-to-read code as it makes code review faster.
I think it’s all nice and fine if performance has no impact (which can be the case). For instance if your client can wait 30s for a report, no big deal.
Things go down when a query that used to take 4s now takes 30s as the product has taken off, handles 10x more data, and it’s not one user but a few hundreds having their request queued.
You won't have the luxury of not re-writing that part in (probably) hard-to-read performant code. It can be a huge enough effort to blow away your deadlines and budget, and to sour the relationship pretty hard if your team struggles with something they aren't used to doing at all.
In my experience neither the output or the input is readable when using an ORM to generate complex queries. For simple queries they're great. Writing raw SQL also makes it much easier to jump out into SQL-specific tooling to debug a query, then copy it straight back into the code once you're done.
> Isn’t it more important that the query you write with the ORM is readable than the underlying SQL it spits out?
I would look at the issue from a slightly different angle. Performance issues aside, I personally prefer either the ORM or the SQL depending on which is easier for the guy maintaining it to understand. Getting a row from the database and transforming it into an object? ORM. Generating a report on historical data across half a dozen tables? SQL.
I like the flow direction specifically for intellisense/autocomplete. I'm sure it would be easier to provide hints when the table name is known immediately.
FWIW the separate `group_by()` is one of my greatest design regrets with dplyr — I wish I had made `by` a parameter of `summarise()`, `mutate()`, `filter()` etc.
Is it too deeply entrenched to change? The number of times I have had a data.frame grouped when it wasn't supposed to be, I can count on my fingers. But the hours that I spent trying to figure it out must amount to a paycheck or two.
There's a lot of dplyr code out there, and a lot of people who know most every part of the tidyverse by heart; making breaking changes like this so far into a framework's life would cause a lot of unnecessary work re-coding old code, as well as requiring people to re-learn syntax.
IMO for such a small adjustment the benefits don't outweigh the costs.
Yeah, that's my current position. It's possible that we might be able to add it in optionally (by adding a new `.by` argument to summarise and friends), but just the analysis to determine how it would affect existing code is a lot of work.
Now this is actually nice, unlike the other suggestion posted today[1].
Maybe I'm just too used to non-standard extensions of our database but the SQL example could, at least for our db, be rewritten as
SELECT TOP 20
title,
country,
AVG(salary) AS average_salary,
SUM(salary) AS sum_salary,
AVG(gross_salary) AS average_gross_salary,
SUM(gross_salary) AS sum_gross_salary,
AVG(gross_cost) AS average_gross_cost,
SUM(gross_cost) AS sum_gross_cost,
COUNT(*) as count
FROM (
SELECT
title,
country,
salary,
(salary + payroll_tax) AS gross_salary,
(salary + payroll_tax + healthcare_cost) AS gross_cost
FROM employees
WHERE country = 'USA'
) emp
WHERE gross_cost > 0
GROUP BY title, country
HAVING count > 200
ORDER BY sum_gross_cost
This cuts down the repetition a lot, and can also help the optimizer in certain cases. Could do another nesting to get rid of the HAVING if needed.
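i.e. something along these lines (trimmed to one aggregate to keep it short):

    SELECT *
    FROM (
        SELECT
            title,
            country,
            SUM(gross_cost) AS sum_gross_cost,
            COUNT(*) AS count
        FROM (
            SELECT
                title,
                country,
                (salary + payroll_tax + healthcare_cost) AS gross_cost
            FROM employees
            WHERE country = 'USA'
        ) emp
        WHERE gross_cost > 0
        GROUP BY title, country
    ) agg
    WHERE count > 200
    ORDER BY sum_gross_cost;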
Still, think the PRQL looks very nice, especially with a "let" keyword as mentioned in another thread here.
with usa_employees as (
SELECT
title,
country,
salary,
(salary + payroll_tax) AS gross_salary,
(salary + payroll_tax + healthcare_cost) AS gross_cost
FROM employees
WHERE country = 'USA'
AND (salary + payroll_tax + healthcare_cost) > 0
)
select title,
country,
AVG(salary) AS average_salary,
SUM(salary) AS sum_salary,
AVG(gross_salary) AS average_gross_salary,
SUM(gross_salary) AS sum_gross_salary,
AVG(gross_cost) AS average_gross_cost,
SUM(gross_cost) AS sum_gross_cost,
COUNT(*) as emp_count
from usa_employees
group by title, country
having count(*) > 200
order by sum_gross_cost
limit 3
Readability is pretty similar to prql. It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.
To me those nested sub sub sub SQL queries come from a similar place as beginner coders who tend to make nested IF statements - a lack of experience with the language.
For very complicated stuff, SQL does become very hard to read compared to e.g. tidyverse + targets in R.
In some cases for removing repeating (intermediate) calculations, I generally find it easier to use a lateral join (in postgres), like
select
title,
country,
avg(salary) as average_salary,
sum(salary) as sum_salary,
avg(gross_salary) as average_gross_salary,
sum(gross_salary) as sum_gross_salary,
avg(gross_cost) as average_gross_cost,
sum(gross_cost) as sum_gross_cost,
count(*) as emp_count
from
employees,
lateral ( select
(salary + payroll_tax) as gross_salary,
(salary + payroll_tax + healthcare_cost) as gross_cost
) employee_ext
where
country = 'USA'
and gross_cost > 0
group by title, country
having count(*) > 200
order by sum_gross_cost
limit 3;
So now we have easily come up with three different ways of rewriting the query to avoid that duplication (which obviously was not a problem at all to begin with): subquery, CTE and lateral join. And there are also several more well-known ways (views, custom functions, computed columns etc), so is the whole premise for even inventing a "better" language than SQL then false? Or what am I missing?
It's also weird how people always argue for immutability and eliminating local state, when using procedural languages, but as soon as they switch to SQL, that actually works like this, they immediately want to introduce mutability and local state.
> so the whole premise now for even inventing a “better” language than SQL is then false?
I don't think anyone is using the above examples to try to invalidate PRQL, just suggesting the baseline for comparisons should account for all constructs available in the SQL standards and common implementations thereof.
> Or what am I missing.
The statement “I can do X better than <SQL example> with <something else>” does not properly show the benefit of <something else> if “I can do X better than <SQL example> with <another SQL example>” is also true (assuming <another SQL example> is actually agreed to be better, not for instance convoluted/confusing/long-winded/other so just replacing some problems with others).
If there's multiple ways to do the same thing that's usually a BAD thing in terms of language design. Especially if some approaches are just newbie traps that experts learn to avoid, or if deciding the best method is a really subtle context-dependent decision. The ideal design is that the language encourages the one obviously "good" way to do it.
I'm not aware of any general purpose programming language that doesn't have multiple ways to achieve a specific goal. Can you give an example of language with good design?
Column aliases would have saved me hundreds of hours over the course of my career. Sorely missing from standard SQL, and would make the need for PRQL less acute.
Snowflake lets you refer to column aliases, and it's great!
There's the slight issue of shadowing of table column names, which they resolve by preferring columns to aliases if both are named the same. So sometimes my aliases end up prefixed with underscores, but that's not a big deal.
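Something like this works there, as I understand it (reusing the thread's example columns):

    SELECT
        salary + payroll_tax          AS gross_salary,
        gross_salary + benefits_cost  AS gross_cost   -- later items can reuse the earlier alias
    FROM employees;
    -- caveat: if the table also had a real column named gross_salary,
    -- Snowflake would resolve the reference to the column, not the alias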
The trade-off is that a schema change (adding a column) unrelated to your query can modify its behavior.
Favoring aliases over columns instead has the potential to introduce irresolvable ambiguities as you can’t “qualify” a column alias with a SELECT list or subquery ID the way you can qualify a column by its table/view alias.
Not all database systems can optimize queries well over CTE boundaries. I believe this is still true for PostgreSQL (edit: no longer true, see below -- it was true a few years ago). So there's a potential performance hit for (the otherwise excellent advice of) writing with CTEs.
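For what it's worth, recent PostgreSQL versions let you control this per CTE (Postgres 12+, if I remember correctly):

    -- NOT MATERIALIZED asks the planner to inline the CTE into the outer query;
    -- MATERIALIZED forces the old optimization-fence behaviour
    WITH usa_employees AS NOT MATERIALIZED (
        SELECT * FROM employees WHERE country = 'USA'
    )
    SELECT title, COUNT(*)
    FROM usa_employees
    GROUP BY title;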
In Microsoft SQL cross apply can be used for this in even more situations and with less repetition:
select top(20)
title,
country,
...
avg(gross_salary) as average_gross_salary,
...
from employees
cross apply ( select
gross_salary = employees.salary + employees.payroll_tax, -- or "as .."
gross_cost = ...
) v -- some name required but don't need to use it if column names are unique
where ...
Very cool! A couple questions/suggestions off the top of my head:
1. Did you consider using a keyword like `let` for column declarations, e.g. `let gross_salary = salary + payroll_tax` instead of just `gross_salary = salary + payroll_tax`? It's nice to be able to scan for keywords along the left side of the window, even if it's a bit more verbose.
2. How does it handle the pattern where you create two moderately complex CTEs or subqueries (maybe aggregated to different levels of granularity) and then join them to each other? I always found that pattern awkward to deal with in dplyr - you have to either assign one of the "subquery" results to a separate dataframe or parenthesize that logic in the middle of a bigger pipeline. Maybe table-returning functions would be a clean way to handle this?
> Did you consider using a keyword like `let` for column declarations
Yeah, the current design for that is not nice. Good point re the keyword scanning. I actually listed `let` as an option in the notes section. Kusto uses `extend`; dplyr uses `mutate`; pandas uses `assign`.
Awesome that you're responding to feedback like this!
Another suggestion around `let`: consider splitting it into two operations, one for creating a new column and one for modifying an existing one. E.g. called `let` and `set`. Those are in effect pretty different operations: you need to know which one is happening to know how many columns the table will have, and renaming a table column can, with your current system, change which operation is happening.
Splitting them into separate operations would make things easier on the reader: they can tell what's happening without having to know all the column names of the table. And it shouldn't really be harder for the writer, who ought to already know which they're doing.
I encountered something like this at my previous job. We had a DSL with an operation that could either create or modify a value. This made the code harder to read, because you had to have extra state in your head to know what the code was doing. When I rewrote the DSL (the rewrite was sorely needed for other reasons), I split the operation in two. I was worried people would have been too used to the old language, but in practice everyone was happy with it.
> Another suggestion around `let`: consider splitting it into two operations, for creating a new column and for modifying an existing one. E.g. called `let` and `set`.
Couldn’t we just not allow modifying an existing column? I.e. we would not allow
count = count + 1
But force the use of a new variable name:
new_count = count + 1
I think this makes for much more readable code, since the value of a variable does not depend on line number.
> 2. How does it handle the pattern where you create two moderately complex CTEs or subqueries (maybe aggregated to different levels of granularity) and then join them to each other? I always found that pattern awkward to deal with in dplyr - you have to either assign one of the "subquery" results to a separate dataframe or parenthesize that logic in the middle of a bigger pipeline. Maybe table-returning functions would be a clean way to handle this?
I don't have an example on the Readme, but I was thinking of something like (toy example):
table newest_employees = (
from employees
sort tenure
take 50
)
from newest_employees
join salary [id]
select [name, salary]
Or were you thinking something more sophisticated? I'm keen to get difficult examples!
There, each variable can be referenced by downstream steps. Generally, the prior step is referenced. Without table variables, your language implicitly pipes the most recent one. With table references, you can explicitly pipe any prior one. That way, you can reference multiple prior steps for a join step.
I haven't thought through that fully, so there may be gotchas in compiling such an approach down to SQL, but you can already do something similar in SQL CTEs anyway, so it should probably work.
Maybe you'd like to check FunSQL.jl, my library for compositional construction of SQL queries. It also follows an algebraic approach and covers many analytical features of SQL including aggregates/window functions, recursive queries and correlated subqueries/lateral joins. One thing where it differs from dplyr and similar packages is how it separates aggregation from grouping (by modeling GROUP BY with a universal aggregate function).
FunSQL.jl requires Julia to run (obviously, as it is a Julia library) but it produces standard SQL, so Julia in this case is just an implementation language. I have re-implemented parts of FunSQL in Python and OCaml (the one I have ended up using) and have added a concrete syntax similar to what you have in PRQL.
from employees
define
salary + payroll_tax as gross_salary,
gross_salary + benefits_cost as gross_cost
where gross_cost > 0 and country = 'usa'
group by title, country
select
title,
country,
avg(salary) as average_salary,
sum(salary) as sum_salary,
avg(gross_salary) as average_gross_salary,
sum(gross_salary) as sum_gross_salary,
avg(gross_cost) as average_gross_cost,
sum(gross_cost) as sum_gross_cost,
count() as count
order by sum_gross_cost
where count > 200
limit 20
But, in my mind, the biggest difference between PRQL and FunSQL is the way FunSQL treats relations with `GROUP BY` - as just another kind of namespace, allowing you to defer specifying aggregates. A basic example:
from users as u
join (from comments group by user_id) as c on c.user_id = u.id
select
u.username,
c.count() as comment_count,
c.max(created_date) as comment_last_created_date
The `c` subrelation is grouped by `user_id` but it doesn't specify any aggregates - they are specified in the `select` below, so you have all selection logic co-located in a single place.
I think this approach is very powerful as it allows you to build reusable query fragments in isolation but then combine them into a single query which fully specifies what's being selected.
Writing an alternative syntax is straightforward. Perhaps prototype PRQL using xi's excellent FunSQL backend? This way it's working out of the gate. Once syntax+semantics are pinned down, writing another backend in the language of your choice would then be easier. Getting the backend correct is non-trivial work, and xi has done this already. Besides, we need a sandbox syntax anyway, so it might be fun to collaborate.
I like the explicit pipelining idea, seems much easier to reason about. Some comments:
I found the "# `|` can be used rather than newlines." a bit odd. So when using let, you're only transforming one column? I think the example would look weird with returns instead of |.
Depending on your intended target, it might help adoption if you stay closer to the naming conventions of that target. If you're targeting mainstream Java/Python/C#/Javascript etc. then functions need parentheses, "take 20" may be worse than slice, etc.
I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to make backwards-incompatible changes that cannot be handled by clever hacks in the runtime.
Also I think you should try writing one or more native wrappers in your intended target languages to make sure it's easy to interface between the two, even if it means you'd have to use dots in that language.
I could imagine an end game where the ergonomics were so good that a database like Postgres ends up with a native PRQL frontend. Not sure you're there yet, though. IMHO SQL as a query language suffers from a) sometimes really bad ergonomics, b) it's hard to wrap in another programming language (also ergonomics), c) it has far too many concepts - it's not orthogonal.
> I think annotating microversions would get tiresome fast. I think the right way to think of this is that you put in a single version number like 1, and then only ever change that if you need to do backwards-compatible changes that cannot be handled by clever hacks in the runtime.
Thanks, good idea - I just changed this to remove the microversions. If we use SemVer, then before `1.0` we'd hold versions compatible within each 0.X, and after that within each major version.
IMO you are at the forefront of where query languages need to and will go.
Some programmers like you see that SQL ordering is backwards to human thinking, except in the simplest cases. But many people with practice and sunk costs in their SQL expertise will be resistant. The resistance usually wins the day.
But sometimes, a useful tool gets created by one person, and a rift is created in that resistance. Think John Resig creating jQuery, leading to LINQ and many other similar patterns. You could be that person for database query languages, but how do you ensure that?
Maybe imagine what made jQuery easy to adopt and indispensable for programmers: easy availability as a simple .js download; solved the problem of DOM differences between browsers. Good luck to you, and thanks for sharing.
wrt linq to sql: the difference is linq works by mapping objects to tables and dotnet primitives to sql types, often producing really poor queries as a result
This is a nice idea, especially given all the work people have done recently to make in-language querying nicer (Spark comes to mind).
My only gripe is the 'auto-generated' column names for aggregates. This seems like a recipe for disaster - what if there is already a column (as there almost certainly will be) named "sum_gross_cost"? The behavior also just seems rather unexpected and implicit. My suggestion would be simple syntax that lets you optionally give a name to a particular aggregate column:
...
filter gross_cost > 0
aggregate by:[title, country] [
average salary,
sum gross_salary,
average gross_cost,
let sum_gc = sum gross_cost,
count,
]
sort sum_gc
While it might seem a little uglier, it seems much more sustainable in the long run. If this is really too gross, I'd advocate some token other than underscore that is reserved for aggregation variables; perhaps `sum@gross_cost` or `sum#gross_cost`.
I'm quite opposed to the idea "from should be first".
I want to understand what exactly the query returns, not the implementation detail of the source of this data (that can later be changed).
Literally first example from page - I have no idea what is being returned:
from employees
filter country = "USA" # Each line transforms the previous result.
let gross_salary = salary + payroll_tax # This _adds_ a column / variable.
let gross_cost = gross_salary + benefits_cost # Variables can use other variables.
filter gross_cost > 0
aggregate by:[title, country] [ # `by` are the columns to group by.
average salary, # These are the calcs to run on the groups.
sum salary,
average gross_salary,
sum gross_salary,
average gross_cost,
sum gross_cost,
count,
]
sort sum_gross_cost # Uses the auto-generated column name.
filter count > 200
take 20
of course, similar things are happening to SQL too, with CTEs becoming more widespread and the "real" list of columns hidden somewhere inside, but it's still parseable
Quick, what is this query about? What's ironic is that I think you have it backwards: the columns are the implementation detail, not the table. The table is the context: you can't change that without having to change everything else. But columns are the last step, the selection after the filters, joins, etc. They can be changed at any time without affecting the logic.
This is... an odd choice. I'd assume I'm rarely looking at a query without the context to know why I would want those columns.
And the autocomplete story is backwards. Often I know what columns I want, but I'm not clear what table I need to get them from. Such that, if you made the suggestions in the from smarter, to only include tables that have the columns, I'd be much happier.
Just throwing in another point of anecdata onto this pile: "Often I know what columns I want, but I'm not clear what table I need to get them from" does not make sense to me. I don't relate at all to there being a global namespace of columns, rather than a namespace of tables, each with its own columns specific to its context.
I challenge this. I accept that there are ambiguities, but I assert that you can go really fast by just telling someone to fetch a few columns by name.
I further assert that if your database is filled with "Id" and "name" columns, instead of "department_name" and similar, you are probably as likely to mess up a join as to gain any benefit from the name being short. (And really, what advantage is there in short names nowadays?)
That all said. I worded my take too strongly. My point should have been that auto suggest should not be confined in either direction.
I think we have just done most of our data work in different environments.
When I'm trying to query stuff, the first question is "which service's database is that in?", so I can guess "user_service" (or whatever I think it is called), but I have no idea what they call anything in their schema, but now that the autocomplete system knows what table I'm interested in, it can help me figure that out.
I'm used to emacs, with a global namespace. Such that I'm used to searching all variables globally. Feels that searching all columns would be just as easy, all told.
That said, I want to be clear that I think both methods are valid and work.
Yeah, it's just that I don't know what the columns are called. The namespaces (databases and tables) are what guides me to the columns, not vice versa.
I can think back to projects / companies where this may have been different, basically just with fewer different distinct schemas, but it just isn't my recent experience.
(I appreciate your magnanimity in these later comments by the way.)
It's a query asking for the id, name, and author fields. Very straightforward, I have no idea how this is confusing.
> The table is the context: you can't change that without having to change everything else.
Except even in the provided single-table example this isn't true - you're getting subselected/CTEd results. No functional joins are demonstrated unfortunately.
For example:
from employees
left_join positions [id=employee_id]
...is equivalent to...
SELECT * FROM employees LEFT JOIN positions ON id = employee_id
No data is selected from positions in either example, and it's unclear why we're joining that table (other than just for the heck of it). It's not a workable example.
You restated the query; I was asking what it's about. Is it a query across publications? Or is it a query over news articles? That context changes everything: how the query is written, what it can be joined with, how it can be filtered, how it is used, etc. Putting the FROM clause first means that you immediately have context to understand the rest of the query.
What fields am I expecting in the resultset? Sure I have some context, I know it'll be about articles, but I have no idea what actual data I care about.
You're arguing that:
FROM articles
SELECT id, name, author
Is substantially superior to:
SELECT id, name, author
FROM articles
I don't see them as markedly different with such a small example.
HOWEVER, where the difference comes in is the "at a glance what am I getting in my resultset" data that is much easier to see in the latter, second easiest in the former, and not at all present in the linked article.
I think what this boils down to is what you (any reader, not you specifically) individually expect to need knowledge of when you're writing SQL. In most cases when sitting down to write a brand new piece of code to pull some data from the database, putting the list of tables involved in the query first matters most to some, whereas putting the list of fields to expect in the resultset matters most to others (I put myself in this camp).
For what it's worth, as I've stated elsewhere while I don't prefer it and would find it annoying to debug personally, I do recognize that the idea of SQL that allows you to list FROM / JOIN / etc. first is very appealing to some. What I think is completely off is the near-obfuscation of the examples in the linked article.
> I don't see them as markedly different with such a small example.
Of course not, because we're talking about a fundamental change in syntax that affects more than just two-line queries. PRQL puts the SELECT clause practically at the end of the query, after every join, filter, and aggregate. If you only care about the output, that's cool too: just look at the end. But if I want to understand a query in SQL, I have to read the entire thing backwards, clause by clause! By contrast, if the SELECT is at the end, that's not much of a problem.
Now you'll come back and say "but it's the same if you care about context: just look at the end!" And that's where we differ. I care more about writing, debugging, and understanding queries, whereas you think it's more important that the column names are up front, even if it makes understanding a nontrivial query much, much harder.
The big advantage of "from first" like we have in Kusto KQL (a database we use at Microsoft) is that it provides much better autocomplete (if I write the `from` it can easily autocomplete the projection).
If you want an interesting example of how a query language built for developer experience and autocompletion looks, definitely check it out!
That's interesting because it also explains why I was going to say I do like having from first. When trying to reason about a query, I mentally go through the following:
1. What tables are being pulled from? This speaks to the potential domain of the query.
2. What data is being selected (I can now know what is or isn't being pulled from the aforementioned tables...)
3. What operations, aggregations, groupings, etc. are being performed on the pulled data
Of course from vs select ordering is completely arguable, but my thinking process seems to follow that of the auto complete--in other words that my cognitive load of looking at the select statement is lessened when I know from what the columns are being selected.
It also follows (at least to me) the mental process of writing the query. First look at the tables, then decide what columns, then decide what functions to apply.
I said it in a sibling, but I feel this is somewhat missed. Auto complete that simply lists the tables is easier if from is first. But... Auto complete that helps me know what tables can give me my requested columns works the other direction.
You would think that but having used both I find writing Kusto/KQL much smoother, neater and faster and if I have to choose between writing a query in either one I'd pick KQL.
I understand this is just an opinion but it's an opinion held by everyone in my org who writes both.
(I can see the result type both by hovering on the query but also by just looking at the end of it - and in SQL most of the SELECTed items in complex queries are from subqueries anyway - at least in my use case)
Building for autocomplete is building for human understanding. If it is impossible for a computer to determine the context of your query, why would a human do much better?
Technically to be equivalent you need to wrap the second one in parentheses so you can use ToList() on it. Unfortunately a bit ugly. I'm not sure why they didn't add one more keyword to handle pipelining into other functions. Something like "feed", "into", or "pipe". Or just pluck the |> operator from F#.
I’ve been frustrated by toilets where I have to contort my body to reach the dispenser. Similarly, I’ve had dispensers intrude on the space where my legs would normally be and make it awkward to even just sit on the toilet.
Toilets are absolutely designed to make the dispenser placement convenient. You just don’t think about it because 95% of toilets get it right, so it just doesn’t bother you that much that it can be wrong.
In SQL, some decisions are right about 10% of the time and are annoying and awkward the other 90%.
That’s why the order matters. Because everything else got it right.
I'd agree if there was any way whatsoever of fixing this issue, but there simply isn't. The editor can't even begin to guess what you might want until you write your FROM.
Maybe in gigantic systems with more tables than makes sense. Realistically, all of the columns available in a database can be fit in memory with ease.
Then, the ide could basically fill my from out for me, based on what I'm asking for. Can even suggest what join I will need, if I list columns from multiple tables.
>Maybe in gigantic systems with more tables than makes sense. Realistically, all of the columns available in a database can be fit in memory with ease.
Every table has more than one column.
So there's always more columns to remember than tables, and tables are generally pretty easy (users, invoices, and so on).
I worked with systems that had like 500 tables, some of them with 20-50 columns.
You really want good intellisense in such an environment.
500 times 50 is still not a big number. And you could make decent statistical suggestions based on the columns currently in the query.
Good intelligent suggestions is, of course, helpful. And I agree that suggesting one of 500 is easier than the other. That said, neither is hard for a computer. And even asking friends what table I want will often be done with starting with the actual columns I want.
And one of the use cases is writing queries, which it helps immensely.
The best of both worlds would be to allow both orders.
Just automatically transform the query to the usual form after its execution.
I think it's quite a common convention in engineering - not just software - that the input to a process "goes in the top and out the bottom". We humans read top->bottom (regardless of left/right/vertical, I don't know any languages that write bottom up). Conventional voltage in circuit diagrams usually flow top to bottom. Gravity loads in schematics flow top to bottom. Chemical pathways are usually written top to bottom. And of course functions take arguments up top and return at the bottom, maybe with some short circuits. I think the only counter example of note is distillation columns.
Where is the data coming from? Employees table. What's coming out? 20 rows of sum_gross_cost.
What could improve this is function signatures. It's kind of nice to have the whole abstraction up top...like an abstract.
I agree that the columns of the results should be more obvious. But I am a proponent of "from should be first". I have never written a SQL query without thinking about the contents of a table or its relations. If it was my way, I would describe where the data I'm pulling from, then describe any filters/joins, then describe the columns that I'm interested in (last).
It's a fair sentiment, but it can be handled without losing directional flow and composability, some of the bigger advantages of reworking SQL.
One idea would be along the lines of a function prototype: a declaration, up front, about the columns and types that a query is expected to return. It's a good place to put documentation, it's redundant information which should protect against mistakes but not so redundant that it would be too taxing - the author should know what the query returns. The prototype would only be used for validation of column names and types.
Another idea would be requiring the last element in a query to be a projection, a bit like the return statement in a function body: here's what I'm returning out of the grand set of symbols available (e.g. via various joins) in scope from previous operations in the flow.
Without the `let` I would imagine having trouble reading it as well, I'm not sure if that would go away with familiarity but my instinct is that it's a useful addition.
This feels like an English-language thing. In English we tend to put our adjectives first, it feels natural, "Where is my red, round ball?", rather than some other languages (like German) where you put the noun first. Equivalent of "Where is my ball, red & round?"
While it inherently feels unnatural, I do agree with the others here that the context is actually easier to understand once over the initial discomfort.
_from_ is kind of one of the most important context about the data being returned. It provides the type information. Columns you select are just properties of that type.
In SQL, where _from_ is placed at the end, we are essentially writing the equivalent of 'property.object'; e.g.: name.person, age.person
Both CTEs and this idea address the same problem: poor readability of complex SQL queries. Compared to CTEs, the author takes the idea to split the complex query into parts to the next level.
To your point - a solid IDE will show you what's being processed at each line (or returned, if the cursor is on the last line) - in an autocomplete window or a side panel.
First, kudos because it takes courage to take on SQL in this way.
Second, this kind of reversed SQL (filter-first, select-last) is much easier to reason about than the original and keep in mind that I prefer to code complex queries in SQL than to build or translate them in the ORM of the project I'm working on.
Maybe a transpiler is an inevitable first step but I think that any SQL replacement should be itself the target of ORMs and run directly in the database CLI tools (psql / mysql ...) or IDEs (pgAdmin, MySQLAdmin, ...). What's the long term plan of the project?
> Second, this kind of reversed SQL (filter-first, select-last) is much easier to reason about than the original
Given that SQL clauses tend to be unambiguously terminated by the start of the next clause or the end of the statement, it surprises me that no engine has gone to accepting otherwise standard(-ish, as much as real DB vendor dialects are) SQL but without a mandated order of clauses.
And then combine that with dev tools that allow easy rearrangement of clauses, perhaps based on configured preferences so that you don’t even see the original if its not your preferred order, so that “Bob likes old-school SELECT FROM WHERE GROUP BY and Alice likes FROM WHERE GROUP BY SELECT” isn’t a problem.
I agree that integrating with the DB would allow much more from a lang. But PRQL is a bet that languages which start there (e.g Kusto) get lost because it requires changing DB, which is really hard. I worry EdgeDB may hit this issue too (but I'm really hoping it works, and they have an excellent team).
As I think you're suggesting — you could imagine a language starting out as a transpiler, and then over time DBs working with it directly, cutting out some of the impedance mismatch.
Malloy [1] is another point in space — it targets existing DBs through SQL queries but can also ask for schemas etc while developing.
I'm kinda surprised that the list of influences doesn't mention XQuery. Yes, it's not a relational query language... but it covers much of the same ground in practice, especially the part that they call "FLWOR expressions" (for/let/where/order/return) that operate on "tuple streams":
And it has grouping, windowing functions etc. I bet you could define a subset that is specifically tailored to the same use cases as SQL - basically, get rid of everything to do with elements and attributes, and only allow scalars and sequences (and maybe maps?). But otherwise keep XDM data types and their semantics.
shakti / K / kdb+ implements "real SQL", which is concise but readable, and could give you a few ideas. Here's a copy-paste from https://shakti.sh/ under document/sql.d (cannot deep link, unfortunately). The most magical aspects are automatic joins - both left joins and "foreign key chase" joins. The fk-chase joins, in particular, should be part of every query language, and can possibly be added in a backward-compatible way to existing SQL implementations.
example: TPC-H National Market Share Query 8 http://www.qdpma.com/tpch/TPCH100_Query_plans.html
what market share does supplier.nation BRAZIL have by order.year for order.customer.nation.region AMERICA and part.type STEEL?
real: select revenue avg supplier.nation=`BRAZIL by order.year from t where order.customer.nation.region=`AMERICA, part.type=`STEEL
ansi: select o_year,sum(case when nation = 'BRAZIL' then revenue else 0 end) / sum(revenue) as mkt_share from (
select extract(year from o_orderdate) as o_year, revenue, n2.n_name as nation
from t,part,supplier,orders,customer,nation n1,nation n2,region
where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and
c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and
s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'STEEL') as all_nations
group by o_year order by o_year;
Thanks for the tip. That automatic "foreign key chasing" looks phenomenal. Byebye, much of that big tedious chunk in the middle of your 2nd example... Wish I had that for more of the SQL I write.
I see EdgeQL as an excellent replacement for SQL in OLTP settings — it has great language integration and a unified relational & typing approach. (Please correct me if this is mistaken though).
I wrote the PRQL proposal for analytical / OLAP queries, where the pipeline of transformations is more important, and relations and typing are relatively less important.
EdgeQL is getting support for generic partitioning/aggregating `GROUP` very soon [1], so we are giving some love to the analytical side of things too :-)
We definitely need more collective effort put into "Better SQL", so PRQL is a welcome sight!
1) SQL also allows you to define windowing reusably, like this
select
sum(blah) over window_abc,
avg(blah) over window_abc
from table_xyz
window window_abc as (partition by x order by y)
so that second example could be written somewhat less repetitively but it wouldn't change the whole point.
2) Sadly my main pain point with SQL for ETL is not possible to solve with a transpiler - SQL has exactly one target, so doing things like "I want these records to go to table A and those records to go to table B" is not possible with one query.
3) It would be cool to see how this does typically annoying and repetitive cases from analytics / data warehousing world. I'm thinking like SCD1/2 implementation. But I don't even know if mutation is there yet.
4) I would recommend investing in one canonical formatter, like Go has, so that there isn't an infinite number of ways the same query could be formatted for people to argue over.
EDIT:
5) Since this seems to be focused on analytics (by the choice of queries and Snowflake in examples), I want to highlight that someone suggested to use TPC-H (or TPC-DS) queries as a benchmark. It does sound like a good idea.
Awkward syntax — when developing the query, commenting out the final line of the SELECT list causes a syntax error because of how commas are handled, and we need to repeat the columns in the GROUP BY clause in the SELECT list.
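i.e. the classic annoyance (and the reason some people resort to leading commas):

    SELECT
        title,
        country,
        salary        -- comment this line out and the comma after "country" becomes a syntax error
    FROM employees;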
There are some SQL varieties that actually allow a hanging comma! Also, the provided examples seem comma-dependent, no?
As someone who writes a ton of analytical SQL, i think this would get super messy for long, complex queries with casting, case statements, windows functions, etc.
Most people just need to learn to write better SQL!
There’s a lot to like here. The ordering and the ability to write functions.
I’m not a big fan of the ternary operator. I think the ‘? :’ is hard to read and caters to programmers from system programming languages. A query language should cater to BI and stats people as they typically have a harder time learning syntax than a CS person has learning non-C-like syntax.
I’ve always liked Python's ‘val if condition else other_val’. It’s easy to read for newcomers, while the ‘? :’ is a devil to google for.
In the second example I don’t understand why there are both let statements and select statements for the same columns. I also don’t understand why select uses square brackets. Maybe I’m too used to sql but why not just not have brackets?
To be fair, lateral joins (cross/outer apply in mssql) can help with name aliasing and table functions give sql some reusability. I think the main pain points for sql are pivots and window functions.
A lot of the time you just want to transpose your result, but you have to choose an aggregate and handle null cases to force pivot to work the way you want it.
And a lot of the time you want to aggregate a window but keep the ids of the row so you avoid the having keyword altogether and go for row_number and dense_rank to get your aggregate results.
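A rough sketch of that pattern (a hypothetical orders table; a windowed aggregate standing in for GROUP BY/HAVING):

    -- attach a per-customer total to every row, keep the row ids,
    -- and filter on the aggregate without GROUP BY or HAVING
    SELECT id, customer_id, amount
    FROM (
        SELECT
            id,
            customer_id,
            amount,
            SUM(amount) OVER (PARTITION BY customer_id) AS customer_total
        FROM orders
    ) t
    WHERE customer_total > 1000;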
If I were to write a query language, I would discard group by and having and make it easier to apply transpose and window functions.
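A minimal sketch of the window-function pattern described above, with hypothetical table and column names: attach the aggregate to every row and filter on row_number, keeping the row ids instead of collapsing them with GROUP BY and HAVING.

SELECT *
FROM (
  SELECT o.id, o.customer_id, o.amount,
         SUM(o.amount) OVER (PARTITION BY o.customer_id) AS customer_total,
         ROW_NUMBER() OVER (PARTITION BY o.customer_id ORDER BY o.amount DESC) AS rn
  FROM orders o
) t
WHERE rn = 1;  -- one row per customer: their largest order, with its id, plus the aggregate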
I love quality language proposals like this. I'm not so much in data processing/bigdata, but have had to interact with SQL a lot.
This syntax is lovely! It's more intuitively readable (and SQL is not that bad in that regard).
My feedback:
1. Lower case, underscored everything makes the terms a bit hard to differentiate.
Maybe set some classes of symbols in CamelCase, or add !@#$%& prefixes to them, to make it more readable.
2. I don't like using another language (SQL or PRQL for that matter) for db interaction; I like to write queries in the language that I'm developing in.
There are ORMs in this design space, but I'm a little fed up with them.
In Java there's jOOQ. Other less-OO-more-functional ORMs exist in Rust and Haskell land. These often have a code generation step: a library is generated that guarantees some type safety for a given schema version. Some are more SQL-like, some provide a different API.
PRQL diverges much more from SQL than these do, and for good reasons.
Maybe several languages could easily have libs like this building on top of JPQL?
3. Your solution is a bit like GraphQL in some regards, where a tool is needed to convert the query to SQL.
Tools like this exist, Hasura and the like; Hasura does a lot more. To me GraphQL has the huge advantage of serving a schema so that clients can be generated. I can interact with GraphQL in a type-safe fashion by generating a client in, say, Elm.
The generated client lib does not allow me to write syntax errors in my queries and ensures all type conversions are sound.
Maybe PRQL can also be a language like GraphQL in that regard, and provide a schema too.
4. JPQL.
It's close to SQL and improves on it, but I never found it enough of an improvement to justify the cost. I think your proposal is better. Still, I think JPQL deserves a mention as maybe one of the most widespread compile-to-SQL languages.
Awesome! Would love to see an implementation. I worked on something similar over the summer. It’s just relational algebra with pipes for composition. If you are interested, we could get an ANTLR grammar going and plug it into this basic execution engine to get a feel for the language.
I've always wondered why there aren't query languages that embrace algebraic data types and pattern matching. Seems like an obvious fit to me. There are many times when you'd want to model a table that has either this schema or that schema.
They can work well. In the project I'm working on the database uses algebraic datatype keys (i.e. tags and tag-dependent columns) to make the database faster and smaller than an equivalent relational schema, but the database is used via API rather than via a query language.
There was this professor of language who would say "Do you think the question ('are carpets furniture?') tells you something about the ambiguity of the word carpet, or do you think it tells you something about the ambiguity in the world?"
Similarly, I think joins are "tough" not because of the way SQL expresses them but because the logical possibilities of merging data from multiple tables are varied.
There is no such thing as a domain-agnostic SQL database that holds up under this kind of semantic scrutiny. I don't think that there ever could be.
If you are rolling a SQL schema for a home improvement contractor, it is extraordinarily unlikely that their specific business would expect any scenarios in which carpets are sometimes known as furniture.
Having a bounded context to operate within is what makes SQL magical for me. When people don't understand the business or simply the game around how you talk about the business, things start getting messy wrt joins.
The carpet discussion was simply to say that you can't take out all the complexity of a language if the domain it is meant to describe is complex. The language has a limit to how simple it can be.
I was not proposing a SQL database of carpets, or furniture, as a thought experiment.
SQL has effectively failed as a standard, despite its ubiquity. It's literally being aged out, which creates opportunities for PRQL etc. to fill pragmatic gaps.
e.g. the lack of default column aliasing from joins:
SELECT
A.id AS A__id,
A.name AS A__name,
B.id AS B__id,
B.name AS B__name
FROM A
LEFT JOIN B
ON A.other_id = B.other_id
When you could have:
SELECT
A.*,
B.*
FORMAT (TABLE__)
FROM A
LEFT JOIN B
ON A.other_id = B.other_id
IMO, this appears not to be something that solves the SQL learning curve, but rather something that improves the usability of a query language with tooling.
I don't think there is much that could be done to address left, right, inner, outer join semantics. It's just something you have to learn if you want to do a lot of SQL (though, you are likely only ever going to use left and inner joins).
Looks really nice. I've been scribbling away in a little notebook all the things I would do in "akdor's dream sql", and what you have here matches it pretty much exactly.
Wondering about generic use of `let` - you have let for col defns, but `func` for functions and a TODO for tables/CTEs - could/should `let` do the lot? (Like another commenter posted, this is how MS's M language, used in PowerQuery in PowerBI and Excel works). Could enable an escape from point-free for entire queries if taken to extreme generality, not sure if that's a good thing, maybe it could be?
Bikeshedding: even with some OCaml/F# experience, I find `f x y` harder to read than `f(x, y)`.
At the moment `let` is used to add a column as part of an existing pipeline. [1]
`func` is the start of new expressions / pipelines. And I just added a proposal for `table = `, which would be the same.
Does that make sense? Very open to more feedback...
[1] I just added `let` based on feedback here, it's better than it was, but not perfect, as it can be confused for a new pipeline given its use in other langs.
I like that everyone is trying to make something like SQL that reads more naturally to them. More alternatives are good! SQL is a widely accepted standard, with strictly defined and super broadly accepted semantics.
As someone who has written quite a few half-baked-for-general-use but fit-for-purpose SQL generator utilities over the years, I'll suggest that if you intend for a novel syntax to be a general SQL replacement then being isomorphic to SQL would massively increase usefulness and uptake:
1. novel syntax to SQL; check! Now novel syntax works with all the databases!
2. any valid SQL to novel syntax; a bit harder, but I'd start by using a SQL parser like https://github.com/pganalyze/libpg_query and translating the resulting AST into the novel syntax.
3. novel syntax to SQL back to novel syntax is idempotent; a nice side effect is a validator/formatter for "novel syntax"
4. SQL to novel syntax back to SQL is idempotent; a nice side effect is a validator/formatter for SQL, which would be awesome. (See also https://go.dev/blog/gofmt, which is where I learned this "round trip as formatter" trick.)
I don't mean for this to sound negative, and I know that 2, 3, and 4 are kind of hard. Thank you for building prql!
This actually looks like an improvement (and I like SQL). This feels closer to non-programmers, contrary to some other SQL "competitors" like that query language from InfluxDB.
I both love this (it's very well done!) and hate this, because I feel it misses the forest for the trees.
It's like that meme "Nobody says 'I want to be an Excel guru when I grow up.'"
SQL is just the means to an end - to alter or retrieve data in a system.
We shouldn't be coming up with nicer ways to handle autocompletion, or recursive semantics, or windowing functions.
SQL was created to be declarative and human-friendly, to be read aloud.
The next evolution of SQL should be natural language.
It should be bounded at the schema and enriched with as much context as it can, both about the domain of the data, the data itself, and its relationships within itself and to other schemas.
I just want to see all the people in my organization who haven't submitted their timesheet this week; all the orders in my ERP that are waiting for parts from a specific vendor; how much lift I'm getting from my targeted marketing campaign. I just want to delete any contact from CRM who hasn't responded to an email in the last 45 days!
I don't want SQL; I want something that translates natural language into a logical plan, and a physical plan; and a separate tool that allows me to express many, many natural language concepts for my bounded schema ... perhaps in a SQL-like way.
Like a metrics / definition store on steroids, or just a sea of rich aliases and computed columns and subqueries that can be composed without stressing over syntax or newlines or ordering or complex join conditions.
This only targets SELECT queries right? I guess there's not much to improve on for INSERT, UPDATE and DELETE queries.
BTW window definitions are reusable using the WINDOW clause, there's no need to define it over and over in SELECT.
SELECT
date,
CASE WHEN is_valid_price
THEN price_adjusted / LAG(price_adjusted, 1) OVER w - 1 + dividend_return
ELSE NULL
END AS return_total,
CASE WHEN is_valid_price
THEN price_adjusted_usd / LAG(price_adjusted_usd, 1) OVER w - 1 + dividend_return
ELSE NULL
END AS return_usd,
CASE WHEN is_valid_price
THEN price_adjusted / LAG(price_adjusted, 1) OVER w - 1 + dividend_return - interest_rate / 252
ELSE NULL
END AS return_excess,
CASE WHEN is_valid_price
THEN price_adjusted_usd / LAG(price_adjusted_usd, 1) OVER w - 1 + dividend_return - interest_rate / 252
ELSE NULL
END AS return_usd_excess
FROM
prices
WINDOW
w AS (PARTITION BY sec_id ORDER BY date)
;
I am writing a language and CLI that mixes SQL with Python (spyql: https://github.com/dcmoura/spyql), and it is interesting to see that we are tackling some of the same problems:
- code/formula reuse, where we need to repeat logic across the query
- functionalities like `EXCEPT` and `REPLACE` modifiers for `SELECT *`, as in Google BigQuery (in most SQL databases it's frustrating when you have a large number of columns and you only want to hide or replace a couple of them; see the sketch below)
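For reference, the BigQuery modifiers look roughly like this (hypothetical column names); most other dialects have no equivalent:

SELECT * EXCEPT (internal_id, audit_ts)      -- hide two columns
FROM wide_table;

SELECT * REPLACE (ROUND(price, 2) AS price)  -- replace one column in place
FROM wide_table;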
I do think SQL is all over the place, and while not perfect, it's familiar and we've got used to expressing our queries the SQL way. In the end, since you are generating SQL to interact with databases, you would still have to understand SQL in order to optimise your queries (it might be challenging to get the perfect SQL query from PRQL, as you do not know statistics about the tables or which indexes are available).
With SPyQL I am taking a different approach that tries to extend simple SQL SELECT statements so that some of these annoying features are tackled. In addition, by using Python to define expressions and conditions you solve another problem typically present in databases: extensibility. By including an IMPORT clause in your query you can import any Python module, so the sky is the limit. You also get a simple and intuitive way to work with objects and hierarchical data (like JSON).
I do find the language you are proposing very readable and flexible, bringing several advantages over SQL. If you build a parser I would love to bring it to spyql :-) The issues I brought up earlier would not be a problem for spyql, since it is a tool for querying files and data streams on the command line.
>Compatible — PRQL transpiles to SQL, so it can be used with any database that uses SQL. Where possible PRQL can unify syntax across databases. PRQL should allow for a gradual onramp — it should be practical to mix SQL into a PRQL query where PRQL doesn't yet have an implementation.
Awesome.
I hate SQL so much, I know for personal projects this is gold. I imagine actually using it at work might draw some questions though
SPARQL. Representing human information in relational tables goes against how people actually think and use information. We humans think in tremendous numbers of nested hierarchies, and recursive hierarchy traversal is a nightmare in relational databases. A graph is the structure for data that works best, is most efficient, and actually reflects how things are connected in our brains.
I'm a big fan of SPARQL, but the one thing that would concern me about trying to use it outside of the SemWeb context is simply that it assumes data is stored in <S,P,O> triples. Legacy databases by and large are not, so you need an adapter to bridge the representations. And while I know some exist, I haven't really used them and am not sure about the performance impact.
You can get quite far mapping the triple concept to (PK, column, value) or (PK, FK, related-row) and transpiling from there.
(I played around with this some years back, not to the point where anything came out of it worthy of publishing, but enough to be pleasantly surprised how far 'quite far' turned out to be in practice)
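A hedged sketch of that mapping in plain SQL, with a hypothetical employees table: every row fans out into one <S,P,O> triple per column, with foreign keys becoming related-row predicates.

SELECT id AS subject, 'name' AS predicate, name AS object FROM employees
UNION ALL
SELECT id, 'salary', CAST(salary AS TEXT) FROM employees
UNION ALL
SELECT id, 'manager', CAST(manager_id AS TEXT) FROM employees;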
I'm really excited about languages that build on or are compiled to SQL, in the long-term (because I think it will take a very long time to build adoption).
The ones that particularly excite me are shorthands for SQL, even though their heavy use of symbols may be a detriment. One particular use case is easily defining static authorization policy-queries that are backed by database data plus request variables injected during evaluation.
I am not very excited by datalog/prolog-based languages because I think logic languages are too unnatural to ever go mainstream. But I'd be excited to be wrong or for logic languages to become more friendly.
In a way SQL is Prolog, and all reasoning about improving SQL should start from Prolog, because that is where SQL started from. The expressive power of both languages is theoretically the same, even though SQL is much more comprehensible. But then again, certain complex tasks turn SQL into a difficult-to-comprehend series of nested declarative operations on algebraic sets.
We don't need better SQL. We need a programming language with relational concepts built into its collections library and a syntax and type system that makes using it easy.
The fact that you have to go through all these intermediate layers to access your data is just stupid. Data should just be "there."
I hope PRQL has a better fate! Unfortunately, by deviating from SQL lexical conventions (using :, using [], etc.) we lose the ability to copy-paste SQL code from elsewhere.
I want a better SQL, but I also want some compatibility. Like typescript is for javascript.
The link says that dplyr is not able to use databases, but there is actually a database backend for it in the dbplyr package. See https://dbplyr.tidyverse.org/
> Unnecessary repetition — the calculations for each measure are repeated, despite deriving from a previous measure. The repetition in the WHERE clause obfuscates the meaning of the expression
In my own in-house SQL-like language I solve this simple issue by allowing previously defined columns to be reused:
select
salary + payroll_tax as _gross_salary,
_gross_salary + benefits_cost as gross_cost,
...etc...
The leading underscore means those columns are just temporaries, excluded from the output.
I do not believe the other issues are actually issues, and am quite surprised by the volume of interest in this SQL alternative. Can't be because Rust is mentioned, can it?
From dbplyr experience, folks want to be able to do stuff like `across(which(is.numeric), mean)`, which you can't do currently because dplyr doesn't know the column types (although it does maintain a list of the column names).
I wasn't thinking it'd be useful, but more that it's good to remain as free as possible from any assumptions about the schema of the data. I often work with tables that have unusually large schemas (> 100MB) and have seen some products' performance severely degrade as a function of schema complexity.
(But otherwise +1 to schema awareness during authoring.)
This is awesome, I really like the ML syntactic approach. I also really like the composability. I might just have to sign myself up as a contributor, because this checks off a huge chunk of my list of things that is wrong with SQL, and I'd love to see this succeed.
I know this is meant to transpile to SQL and so maybe this language is the wrong place to do this, but my biggest pet peeve with SQL is the ternary logic introduced by SQL nullability. I'm begging and pleading for this wart to go away, and I would love to see some algebraic sum types (Optional/Maybe, etc.) used in their place.
>PRQL is intended to be a modern, simple, declarative language for transforming data
It's not declarative. It's functional.
I believe that the approach that is followed by PRQL is more practical than SQL. We've implemented a similar approach in our visual ETL tool for non-technical people (https://easymorph.com) and it works wonderfully. Other cool things you can do with this approach (and can't with SQL):
* Modify existing columns without re-selecting the whole dataset
* Loops (iterations)
* Conditional IF/THEN/ELSE branching as a workflow statement
Love it. This reads so much better in my mind than SQL. I love how easy it is to abstract with functions.
My one suggestion: use a distinct symbol for assignment and equality, either :=/==, =/==, or even :=/=. I kinda like the Go way of doing things, := assigns and initializes, == is equality (not sure you have a need for = re-assignment, but maybe). But I would definitely warn against = for context-dependent assignment or equality comparison.
You've got "let" so I guess that is kind of a syntactical difference; I guess the trippy part is seeing "=" do equality comparisons.
from population
select country, rollup(city), count(*)
sort
Can represent this repetitive SQL query:
select country, city, count(*)
from population
group by country, rollup(city)
order by country, city
Information in group by is often redundant. You can tell which columns are measures vs dimensions by examining the 'aggregate' function - rollup or no function vs sum, count, avg. Order by can have a default to sort by all columns instead of naming them one by one.
Nice. Why OCaml though? I think using a more conventional language to construct queries could yield more adoption. It also seems that ORMs kind of exist to tackle a similar issue, at least in part.
It would definitely be interesting to have a TypeScript of sorts, but for SQL.
So a more practical and prettier syntax, like what I'm seeing here, that compiles to SQL queries.
Go to https://sqlframes.com/demo and in the code editor enter the following and execute (this example is taken from the first example on PRQL github page). It generates SQL, but it also computes and displays the results within the browser (though the data set below gives no results).
TypeScript is more verbose than JavaScript. While I love to use TypeScript I don't think I'd categorize it as prettier than JavaScript. And practical... well if you mean it is more maintainable then yes but if you mean faster to write then no.
I don't want a more verbose SQL I want a less verbose SQL!
I meant more the aspect that with TypeScript people prefer to write in it and then compile to JavaScript because there is a benefit. With PRQL compiling to SQL, we would reap the benefit of a more practical and better syntax. In both cases they bring benefits, just not in the same way.
Yup, I posted that yesterday too. I think Malloy is really interesting — compile to SQL but give more integrations to the DB, like schema-during-development. It has a proper team, led by Lloyd Tabb.
I would also take a look at MDX, which comes from Microsoft, for classical analytical processing workloads. I feel that it provides a nice way to think of data in terms of dimensions/attributes, hierarchies and measures, giving a good way to think about what to aggregate vs which fields to use to do that aggregation.
SQL feels a little more generic, and it would be nice for it to be a little more schema-aware, specifically for analytical workloads.
This is another in a series of these kinds of proposals that look excellent at first glance for perhaps the 75% case, but start getting syntactically messy when I want to customize the result set returned.
On the surface, they're always neat but when you start to dig into how you'd implement something in an RDBMS, it begins to fall apart.
Let's look at the example syntax:
from employees
filter country = "USA" # Each line transforms the previous result.
gross_salary = salary + payroll_tax # This _adds_ a column / variable.
gross_cost = gross_salary + healthcare_cost # Variable can use other variables.
filter gross_cost > 0
aggregate split:[title, country] [ # Split are the columns to group by.
average salary, # These are the calcs to run on the groups.
sum salary,
average gross_salary,
sum gross_salary,
average gross_cost,
sum gross_cost,
count,
]
sort sum_gross_cost # Uses the auto-generated column name.
filter count > 200
take 20
Where in here is it clearly stated which fields are returned?
In the original SQL it's right up front but here it's buried into the "aggregate" function, and I'm not clear that this isn't an oversight.
Another example that speaks to the "how do I implement this" side of the equation:
from employees
filter country = "USA" # Each line transforms the previous result.
gross_salary = salary + payroll_tax # This _adds_ a column / variable.
gross_cost = gross_salary + healthcare_cost # Variable can use other variables.
filter gross_cost > 0
Does this mean that the database must scan all records of the employee table in order to return the result before moving to the next step in the query? Must I index all fields? If not, how does a query planner prepare for this scenario?
The major tradeoff you make in most ORMs is exactly this: You lose out on being able to be explicit about how many queries are sent to the DB (and in many cases how efficient those queries are). Now this would become a language feature? What do I gain for that loss?
I'm not saying that SQL Syntax is perfect; far from it. I'm not seeing how this is an improvement.
I think if you want traction though, a proof of concept using an existing RDBMS would go a long way into providing evidence that this will work and is sufficiently thought out to deal with even the basics of what existing SQL databases have to. Query planning is hard, especially if you want it to be fast.
> Where in here is it clearly stated which fields are returned? In the original SQL it's right up front but here it's buried into the "aggregate" function, and I'm not clear that this isn't an oversight.
It's in the aggregate portion, like you said. Other example queries have a select portion. Why does it matter that it's not in the leading position like SQL?
> Does this mean that the database must scan all records of the employee table in order to return the result before moving to the next step in the query? Must I index all fields? If not, how does a query planner prepare for this scenario?
No, they are just describing how the statement is supposed to be interpreted by a human. I think you can basically just shuffle all the filter statements to the end and keep it logically equivalent.
This is a proposal for a "transpiles to SQL" language. So long as that transpilation is predictable, you cannot run into the sort of issues you are describing.
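As a hedged illustration of that predictability (one plausible transpilation, not necessarily the SQL PRQL actually emits), the first employees pipeline upthread compiles to a single conventional query: pre-aggregation filters become WHERE, post-aggregation filters become HAVING, and take becomes LIMIT.

SELECT title, country,
       AVG(salary) AS avg_salary, SUM(salary) AS sum_salary,
       AVG(salary + payroll_tax) AS avg_gross_salary,
       SUM(salary + payroll_tax) AS sum_gross_salary,
       AVG(salary + payroll_tax + healthcare_cost) AS avg_gross_cost,
       SUM(salary + payroll_tax + healthcare_cost) AS sum_gross_cost,
       COUNT(*) AS count
FROM employees
WHERE country = 'USA'
  AND salary + payroll_tax + healthcare_cost > 0
GROUP BY title, country
HAVING COUNT(*) > 200
ORDER BY sum_gross_cost
LIMIT 20;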
> It's in the aggregate portion, like you said. Other example queries have a select portion. Why does it matter that it's not in the leading position like SQL?
I don't mind it not being in the leading position. The author provided a very simple query and in that case it's not immediately apparent what fields to expect the resultset to contain when returned to the consumer.
This is a troubleshooting issue more than anything else. IMO placing the "selected fields" into the very centre of the query is distracting and obfuscates what is happening.
> This is a proposal for a "transpiles to SQL" language. So long as that transpilation is predictable, you cannot run into the sort of issues you are describing.
I think a good test of whether any transpiled language works well is to look at whether it could work on its own as a language. See: Typescript.
I don't think it's falling apart at all. Personally, I would require that what columns get returned be explicit (optionally with a * type syntax that you have to enable - the defaults should be safe and * has its risks). For one thing, you don't necessarily want to return all the columns you have aggregated. E.g., you may be running the equivalent of a HAVING clause on an aggregate column, so don't need the value returned.
"Each line transforms the previous result." - I assume this is referring to the order that transpilation happens, so you can read it top to bottom and understand the flow easily.
One thing I would like to see is how a recursive CTE might look.
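For context, the SQL being asked about is the standard recursive CTE form, e.g. walking a hypothetical employees org chart:

WITH RECURSIVE reports AS (
  SELECT employee_id, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id IS NULL          -- start at the top of the hierarchy
  UNION ALL
  SELECT e.employee_id, e.manager_id, r.depth + 1
  FROM employees e
  JOIN reports r ON e.manager_id = r.employee_id
)
SELECT * FROM reports;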
Using COMMON TABLE EXPRESSIONS (CTEs) can greatly improve the readability of complex SQL queries. Adding "flow" just feels like a variation on sequential programming.
I am against "improving" SQL. Instead, I think a whole rethink of the engineering behind relational engines needs to occur. For example, why can't a relational database support both SQL and other languages simultaneously, instead of being so black-boxish?
It is a language created to read like natural language, to facilitate its adoption.
Not only developers use SQL; less technical users also use it, especially in the finance industry.
Now, do you use BASIC for professional software development? While some do, most don't. So why do we use SQL then?
An imperative language that compiles to SQL can be attractive, but unless the compilers come with good warnings, we may end up creating bloated database code.
Jeez I don’t know - SQL is the eternal language and people are always trying to replace it or come up with alternatives so they don’t have to learn it, but those efforts have always turned into their own obscure learning domains with the exact same problems people complained about in the first place (I have to learn this obscure syntax to get and shape data). Just learn SQL fundamentals and move forward.
Interesting, this language reminds me a lot of Ecto[1] (the DSL "ORM" for Elixir).
It took a bit of getting used to but I'm coming to really enjoy Ecto and especially now that sqlite is fully supported, I'm finding myself using it in places I normally wouldn't have.
Given that a lot of the time (most of the time?) SQL is written by applications, not humans, how about a syntax based on some data language (e.g. JSON)? It always seemed strange to me that the API to our databases is (SQL) strings... the order isn't even important. Something like YAML or JSON seems like a better fit? Then of course it would be trivial to build other languages (PRQL?) on top of that.
To the author, if he's reading this: I wish you luck! To get this adopted, try to make it modular so maybe it can be made a core module of PostgreSQL, MariaDB and MySQL. If you somehow get into those three, I wouldn't be surprised if Oracle, SQL Server and DB2 integrate it themselves just to keep up.
Ah, Presto could be another popular target.
It's a huge effort but something which could have colossal payback.
This language transpiles to SQL, so it can be implemented entirely client-side. No need for modules for the database engines, just wrappers round the clients.
I prefer "from first". When I write a SQL query, 99 times out of 100, I'll start with "select * from", then fill in the query, then go back and select the columns. After a few basic joins, "easy" column names have probably grown aliases or have been subsumed entirely. For me, "select * from" is automatic.
This is what I'm most interested in. This is where SQL becomes very awkward and repetitive.
select
customer,
( select sum(revenue) from orders where customer=accounts.customer) as revenue,
( select count(transactionid) from orders where customer=accounts.customer) as orders
from
accounts;
That's a very simple example, but it can get VERY wordy and complex, and in my dreams I'd be able to write something like:
getrevenue(cust) is select sum(revenue) from orders where customer=cust
getorders(cust) is select count(transactionid) from orders where customer=cust
select
customer,
getrevenue(customer),
getorders(customer)
from
accounts;
In a way, it's just dynamic sql without hacking strings together in unmaintainable ways.
My SQL life is seemingly dominated by correlated subqueries and self-joins where the join checks another row does _not_ exist (the row R with the largest column X is the one where there is no row R1 with a larger X1).
I don't believe SQL queries like these can be improved by a metalanguage. They _can_ be meaningfully improved by indentation, whitespace, careful naming and detailed comments.
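A minimal sketch of that anti-join pattern, with hypothetical names: the latest reading per device is the one for which no later reading exists.

SELECT r.*
FROM readings r
WHERE NOT EXISTS (
  SELECT 1
  FROM readings r1
  WHERE r1.device_id = r.device_id
    AND r1.recorded_at > r.recorded_at   -- no row R1 with a larger X1
);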
PartiQL[0] is an open-source library, a superset of SQL, that I really like. It supports querying nested structures inside columns, so if a column contains some JSON data you can use standard dot notation to query the nested data directly.
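Something like the following, assuming a hypothetical orders table whose shipping column holds JSON; the nested path navigation is PartiQL's extension, the rest is ordinary SQL:

SELECT o.id, o.shipping.address.city, o.shipping.address.zip
FROM orders o
WHERE o.shipping.carrier = 'UPS';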
This looks like a Turing-complete language if you add functions.
One benefit of current SQL is that it can be (and often is) rewritten to be executed more efficiently. This requires a simpler, declarative model - far from a programming language.
If you want simple 'forward declaration', take a look at the SQL WITH clause.
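A quick sketch of that forward-declaration style with WITH, using hypothetical tables: each named query can be referenced by the ones that follow.

WITH usa_orders AS (
  SELECT * FROM orders WHERE country = 'USA'
),
big_orders AS (
  SELECT * FROM usa_orders WHERE amount > 100
)
SELECT customer_id, COUNT(*) AS n
FROM big_orders
GROUP BY customer_id;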
\tangent What are today's data transformation needs (do they differ from Codd's inspiration?), how does relational algebra serve them, and how would you design an "SQL" around that? There's got to be a 10x leapfrog in benefit for some niche in there, and that's the gateway to adoption.
Could there be a tool that would translate PRQL to SQL? One could then write ~/my_scripts/closest_points.prql, run a command to get the SQL equivalent, and use that in existing SQL tools that do not currently support PRQL (like Postgres).
I think it would be worthwhile to develop a shorthand of the same thing, suitable for use at the command prompt. Something using symbols as synonyms for keywords. Less legible, but more useful in a future when shell tools understand this syntax.
I worked on a language at one of the big banks that looked very similar. The goal was to have a better-than-SQL language on top of Spark and I think we succeeded at that very well. Unfortunately, politics killed the language and platform.
Have you considered making this a frontend to Apache Arrow or DataFusion? It seems like Flink or ksqlDB could use this as an alternative syntax to produce the same physical queries. Love Kusto!!
Misses autocompletion opportunities: SELECT sucks because the tables follow the columns, while something like SELECT FROM <tables> COLUMNS <columns> would enable autocompletion.
For any who want this kind of pipelining way-of-writing-SQL that has the benefit of existing in live production databases today, I highly, highly recommend looking into Postgres' and Snowflake's LATERAL JOIN keyword.
The TL;DR is that they allow you to reuse annotated and aggregated columns in an incredibly elegant way. Compared to OP's proposal, you still do need to start the query with what columns you want to come out at the end, and normal SQL weirdnesses still apply - but it's far, far, easier when writing massive analytics queries to see the flow of variables from one stage to another.
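A hedged sketch of that style in Postgres syntax, borrowing the gross_salary / gross_cost example from earlier in the thread: each LATERAL step can reuse the columns computed by the previous one.

SELECT e.name, s.gross_salary, c.gross_cost
FROM employees e
CROSS JOIN LATERAL (SELECT e.salary + e.payroll_tax AS gross_salary) AS s
CROSS JOIN LATERAL (SELECT s.gross_salary + e.healthcare_cost AS gross_cost) AS c
WHERE e.country = 'USA'
  AND c.gross_cost > 0;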
"sort sum_gross_cost # Uses the auto-generated column name." ... seems like a huge landmine. Languages really should not have any implicit way of constructing identifiers (among other reasons it is not easily greppable).
You might consider using a syntax like `sum:gross_cost` which can function as a sort parameter and an aggregation, but is actually recognizable as an object instead of having an implicit transformation going on in the background. Like this:
Yes, I agree with the downsides of the existing approach, and you're absolutely right re "Languages really should not have any implicit way of constructing identifiers".
I'd still say it's fine if it implicitly names output columns, for convenience.
The fact that someone felt the need to add that comment hints at a design mistake. Synthesizing symbols is weird, unnecessary and is probably a violation of the principle of least surprise.
Otherwise I think PRQL has some value. Nice work. I strongly suspect that if SQL looked more like this there would be a lot more people willing to use the query language directly and perhaps fewer that are compelled to bury it under shifting layers of fragile abstractions.
There are plenty of good ideas in here, I really like it.
Here's a few things I can say from my experience (I have been working on a "better SQL" for almost 10 years now):
- Your "aggregate by" taking a set of calculations to be performed on the groups is an excellent idea: the fact that "group by X" isn't a separate statement from the "select" that describes the calculations, means you can keep the idea that each line is an operation applied to the result of the previous line.
- Be very careful with auto-generated column names, they will bite you when you implement "Go To Definition" or "Rename Symbol" tools.
- The use of "filter" instead of "where" is a missed opportunity to be easily understood by people who know SQL. I suspect that you wanted to avoid the WHERE/HAVING confusion, but I'm not sure it is worth it. I do appreciate "sort" instead of "order by" (it's good to have only one keyword).
- To support a space-based call convention `f a b` you will pay a very heavy price in the language grammar. Also, allowing optional arguments in this convention will prevent you from implementing first-class, higher-order or partial functions. This may be a price you have decided to pay, if not, look into the OCaml rules for optional arguments, they are very well thought-out (the general idea being, you must have at least one non-optional positional argument _after_ the optional ones).
- Having [X, Y] be your list syntax, and X be a shorthand notation for [X], works out pretty well in practice, so long as you have specific positions in your grammar where lists are expected, so there is no ambiguity between X-as-a-column and X-as-a-shorthand-for-[X]
- General syntax opinions: "from" is fine, your CTE syntax (non-point-free) is fine. Boolean operators should be usable as infix keywords (or prefix if unary), just like arithmetic ones, because that is the first thing users will attempt.
- Raw syntax: for the sake of your syntax highlighter, I recommend having asymmetrical delimiter pairs like [|SQL|] instead of symmetrical pairs like `SQL`.
In addition to the above, one of the issues I have with SQL in an analytics context is that it tends to "lose" the actual tables: since it operates by joining/filtering/aggregating tables into new tables, your "products" table from the database schema is quickly replaced by a CTE named "products5" which (in theory) contains the same lines as the original, plus some additional columns, but (in practice) some lines have been lost or duplicated in the middle. I called this the "vanishing schema" paradox[1]. Your design, with the "let" statements to add columns to a table, solves one portion of the problem, but I wonder if you would consider this worth solving more fully.
Absolutely. If I use a SQL db for my applications (I'm a software dev, for context), I generally write raw SQL rather than using an ORM. I find the long-term issues of an ORM not worth it compared to investing in and understanding SQL.
I'm also not having to learn a new library, in addition to the standard DB connection libraries, ~if~ when I switch a language or platform for some project.
I prefer to write SQL as most alternatives ether have runtime surprises or require more roundtrips to the database. I mostly work on line-of-business software, so if I was doing simple CRUD apps I might have a different opinion.
You kinda have to. Assuming that you're using an ORM, you still need to understand how it translates to SQL, and help it do the translation correctly.
Personally I've seen developers use the Django ORM and create applications with terrible performance. By tweaking the queries, you can help guide the ORM to generate better SQL, which in turn will affect your performance greatly.
We're currently facing a problem with a customer who has an application with terrible performance/scaling issues. The entire thing is very database heavy, but interaction is done solely via Hibernate. I have nothing against Hibernate, it's a fine ORM, but you need to understand it well enough that you can guide it towards better queries (which sometimes involves actually writing SQL). At some point you need to decide whether your time isn't better spent learning SQL directly, as that will always provide you with better access to the functionality provided by the database.
Yes, quite a lot I would imagine. I use it extensively at work and similarly sql heavy software companies in the past. That being said, I’ve also worked at places where they’ve avoided it like the plague – largely because few people were competent at it – and were moving away from relational DBs due to scale.
Of course. Large companies like Pepsi have teams of analysts that only write SQL. I applied for a programming job there a long time ago and didn’t follow up when they explained in the interview that’s the only language they used.
Why not use R's data.table? You can achieve literally the same pipeline style flow, internally it's written in C, it's not the fastest but gets the job done.
If you want real speed but no external package, kdb/q is probably better (has q-sql).
Good work on the effort etc., but I often don't see the value added by these toy projects apart from personal development and a bullet point on the author's CV.