Pql, a pipelined query language that compiles to SQL (pql.dev)
262 points by ejcx 11 months ago | 131 comments



This is really great! Maybe I'll incorporate this into my own software (scratchdata/scratchdb)

Question: it looks like you wrote the parser by hand. How did you decide that that was the right approach? I myself am new to parsers and am working on implementing the PostgREST syntax in go using PEG to translate to Clickhouse, which is to say, a similar mission as this project. Would love to learn how you approached this problem!


I also wrote a parser (in TypeScript) for Postgres (https://github.com/ivank/potygen), and it turned out to be quite the educational experience - I learned _a lot_ about the intricacies of SQL, and about how to build parsers in general.

It turned out that in webdev there are a lot of instances where you actually want a parser - legacy places where they used to save things in plain text, for example - and I started seeing the pattern everywhere.

Where I would have reached for some monstrosity of a regex to solve this, now I just whip out a recursive descent parser and call it a day - it takes a surprisingly small amount of code! (https://github.com/dmaevsky/rd-parse)
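To make that concrete, here's a toy recursive descent parser in Python (not the rd-parse library, just the pattern): one method per grammar rule, each consuming tokens and calling the rules below it. A real SQL grammar is much bigger, but it has exactly this shape.

```python
import re

def tokenize(s):
    # Split input into numbers and the operators we understand
    return re.findall(r"\d+|[+*()]", s)

class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def eat(self):
        tok = self.toks[self.i]
        self.i += 1
        return tok

    # expr := term ('+' term)*
    def expr(self):
        v = self.term()
        while self.peek() == "+":
            self.eat()
            v += self.term()
        return v

    # term := atom ('*' atom)*
    def term(self):
        v = self.atom()
        while self.peek() == "*":
            self.eat()
            v *= self.atom()
        return v

    # atom := NUMBER | '(' expr ')'
    def atom(self):
        if self.peek() == "(":
            self.eat()
            v = self.expr()
            self.eat()  # consume ')'
            return v
        return int(self.eat())

def evaluate(s):
    return Parser(tokenize(s)).expr()

print(evaluate("2+3*(4+1)"))  # 17
```

The nice property is that precedence falls out of the call structure (expr calls term calls atom), which is why these stay readable even as the grammar grows.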


It looks a lot like Kusto query language. Here is a kusto query:

    StormEvents
    | where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31)) 
    and DamageCrops > 0
    | summarize EventCount = count() by bin(StartTime, 7d)
edit: yes, it indeed was inspired by Kusto, as they mention in the GitHub README: https://github.com/runreveal/pql


Is that what AWS CloudWatch Insights uses?


We use KQL for Axiom too (well, a version of it) - it's a great query language and very flexible for unstructured + structured data.


Azure appinsights logs


This is actually pretty awesome! I use KQL every few days for reading logs from Azure App Insights. The syntax is pretty nice and you can build pretty complex stuff out of it. But that's it - I can't use KQL anywhere outside Azure. With this, I can show off my KQL-fu to my teammates and surprise them with how fast you can write KQL-like syntax compared to SQL.


What's the reason to go for this over PRQL?


The simple answer is that it's too distinct from where we're trying to meet our users.

We're not anti-PRQL, but our users (folks in security) coming from Kusto, Splunk, SumoLogic, LogScale, and others have expressed that they have a preference for this syntax over PRQL syntax.

I wouldn't be surprised if we end up supporting both and letting folks choose the one they're most happy with using.


The big advantage of this project in my opinion is that it will pass functions it doesn’t recognize to the underlying DB.

With prql, if they don’t support your favorite operator then you’re out of luck.
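The pass-through idea is simple enough to sketch. This is a toy illustration (not pql's actual compiler): the translator rewrites the function calls it knows about, and anything it doesn't recognize is emitted verbatim for the database to resolve, which is how an engine-specific function like ClickHouse's lower() can still be used.

```python
# Functions the translator knows how to rewrite; everything else passes through.
KNOWN = {"count": "COUNT(*)"}

def compile_call(name, args):
    if name in KNOWN:
        return KNOWN[name]
    # Unknown function: emit it untouched and let the database decide
    # whether it exists (e.g. ClickHouse's lower(), toIntervalDay(), ...)
    return f"{name}({', '.join(args)})"

print(compile_call("count", []))              # COUNT(*)
print(compile_call("lower", ['"email"']))     # lower("email")
```

The trade-off is that typos in function names only surface as errors from the underlying database, not from the compiler.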



Did not know about that!


Looks like PRQL doesn't have a Go library so I guess they just really wanted something in Go?

I would guess they didn't wrap the main PRQL library (which is written in Rust) because Go code is a lot easier to deal with when it's pure Go. And they probably didn't just write a Go version of PRQL because that would be a mountain of work.

Still I think that's a mistake. PRQL is a far more mature project and has things like IDE support and an online playground which they are never going to do...

Better just to bite the bullet and wrap the Rust library.


It's not a problem with binding to Rust. Python can do it, Zig can do it, C can do it, even JS can do it. It is Go which doesn't integrate well with anything that isn't Go.


Where did you dream up that idea? PRQL integrates with Go just fine. The project even provides an example in Go.


He said it doesn't integrate well, and he's right. Go works best when you only have Go. (Ok that's true of most languages but it's especially true of Go.)

This article has some details: https://dave.cheney.net/2016/01/18/cgo-is-not-go

Note that this isn't true of languages like Python (CPython) or Rust which are much more closely tied to C than Go is. Go is a "from scratch as if C never existed" language which is great because it doesn't have any C baggage but it does make integrating with C more awkward.


> This article has some details

Couldn't find your own words? The details don't say much. Sure, you have to bend to C to some degree, but that's true of every language that wants to integrate with C. Go integrates no less well than any other language on that front.

And it's not even really all that accurate. Consider "Performance will always be an issue" – gccgo and tinygo have shown that you don't have to have any call overhead. They can call C functions as fast as C can. That is in no way a limitation of Go.

That is only a limitation of gc, and even then the overhead is only a few nanoseconds these days. You're never going to notice. These may have been concerns in 2016 – indeed, overhead was a lot higher back then – but time marches forward. Things change. The link is fast approaching being a decade old by this point. At least give us something from 2024, about the current release that is 17 versions newer than the one referred to in the link, if you really don't know how to formulate your own thoughts.

But you should be formulating your own thoughts if you want to participate in a discussion. Outsourcing thoughts to other people is nonsensical. If those other people want to participate in the discussion, they can write their own comments, but that is for them to do.

> or Rust

PRQL is written in Rust, so it would be quite strange if it wasn't true of Rust. Interestingly, the Javascript bindings use a WASM target. Go could also use the WASM target if there was some reason to avoid more traditional linking. It is curious that you didn't mention that. Again, reason to use your own brain. If you have to outsource discussion, why bother participating at all?


> Okay, sure, you have to bend to C to some degree, but that's true of every language that wants to integrate with C.

Most of the languages I'm aware of integrate with the C runtime much more naturally than go is capable of. Go has its own type of stack, which means it's a pain in the ass to embed into the C runtime and it's a pain in the ass to embed the C runtime into it.

> gccgo and tinygo

Does anyone use these implementations? I honestly had no idea either project existed.


> Most of the languages I'm aware of integrate with the C runtime much more naturally than go is capable of.

In what way? Is it because you call `C.function_name` instead of `function_name`, the latter of which some other languages will allow? I don't see how that is a meaningful difference. Especially when Go developers are already accustomed to referencing pure Go functions in the same way.

> Go has its own type of stack

gc brings its own type of stack, but that's not a feature of Go. Let's not confuse an implementation with a language. Go says nothing about stack layout.

> which means it's a pain in the ass to embed into the C runtime and it's a pain in the ass to embed the C runtime into it.

There might be a pain in the ass for the gc maintainers, but that's not you. You will never notice or care. It is fully abstracted away.

There is some overhead cost to that, but it has shrunk so dramatically over the years, you're not going to notice it anymore either. Protip: Don't use version 1.5 like the previous link is talking about. That was 17 major versions ago.

> I honestly had no idea either project existed.

How'd you miss gccgo, at least? It's maintained by the official Go team. It is the second compiler they always talk about – the one they use to ensure that the standard isn't defined by an implementation.


> Is it because you call `C.function_name` instead of `function_name`, the latter of which some other languages will allow?

No, it's because they use completely different stacks. It has nothing to do with the aesthetics of the language.

> You will never notice or care. It is fully abstracted away.

I mean... aesthetically, sure. How the runtime works still matters a lot.


> it's because they use completely different stacks.

1. Again, that's implementation dependent, not a constraint of Go. The language says absolutely nothing about stacks. tinygo, for instance, uses a C stack.

2. Even in the case of gc, where the stack is unusual, it doesn't really matter. Once upon a time there was some meaningful latency introduced because of it, but that's not true anymore.

What is still something to think about, albeit unrelated to the stack, is whether you want to statically link a cross-compiled C library. Cross-compiling C programs is a really hard problem. Of course, that is a hard problem no matter what. It is no easier to cross-compile a C library for a Python program, but the static linking adds an additional complication.

But, at the same time, you don't have to statically link libraries. You can opt to dynamically load libraries, thereby having no cross-compilation issues (assuming the shared library is already compiled for your target platform). This is how most of those other languages we've talked about are doing it to get around the linking challenges where cross-platform execution is pertinent. You can do the same in Go. Go binaries normally being statically linked end-to-end is generally considered a nice feature, but if you're willing to accept dynamic linking in another language...
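For comparison, this is what the dynamic-loading route looks like in Python via ctypes - no link step at build time, the shared library is resolved at runtime. The library name lookup is platform-specific; this sketch assumes a system where libm is findable (the "libm.so.6" fallback is a Linux assumption).

```python
import ctypes
import ctypes.util

# Resolve the C math library at runtime; fall back to the common
# glibc soname if find_library can't locate it (platform-dependent).
path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(path)

# Declare the C signature so ctypes marshals arguments correctly
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # 3.0
```

The point of the comparison: none of this required cross-compiling anything, because the shared library already exists on the target system - which is the same escape hatch available to Go.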

Naturally, if you never build for systems outside of the system you are using, that's moot. Compiling and linking a C target that matches the platform it is being performed on is easy.

> How the runtime works still matters a lot.

Not really. Anything that might matter is abstracted away – at least to the same extent as other languages. Not your problem.

Interfacing with C is a first-class feature of Go. I don't know where you got the idea that it isn't. There are a lot of good reasons to keep all of your code in the same language (true of Go and every other language in existence), but it is hardly the end of the world if you have a solid reason to turn elsewhere.


> Looks like PRQL doesn't have a Go library so I guess they just really wanted something in Go?

There's some C bindings and the example in the README shows integration with Go:

https://github.com/PRQL/prql/tree/main/prqlc/bindings/prqlc-...


Ah yeah I wonder why they don't mention that in their docs.


I had the exact same question when I saw the post.


Looks similar to PRQL[0].

Neither PRQL nor Pql seem to be able to do anything outside of SELECT like Preql[1] can.

I propose we call all attempts at transpiling to SQL "quels".

[0] https://prql-lang.org/ [1] https://github.com/erezsh/Preql


We're developing TQL (Tenzir Query Language, "tea-quel") that is very similar to PQL: https://docs.tenzir.com/pipelines

Also a pipeline language, PRQL-inspired, but differing in that (i) TQL supports multiple data types between operators, both unstructured blocks of bytes and structured data frames as Arrow record batches, (ii) TQL is multi-schema, i.e., a single pipeline can have different "tables", as if you're processing semi-structured JSON, and (iii) TQL has support for batch and stream processing, with a light-weight indexed storage layer on top of Parquet/Feather files for historical workloads and a streaming executor. We're in the middle of getting TQL v2 [@] out of the door with support for expressions and more advanced control flow, e.g., match-case statements. There's a blog post [#] about the core design of the engine as well.

While it's a general-purpose ETL tool, we're primarily targeting operational security use cases where people today use Splunk, Sentinel/ADX, Elastic, etc. So some operators are very security'ish, like Sigma, YARA, or Velociraptor.

Comparison:

    users
    | where eventTime > minus(now(), toIntervalDay(1))
    | project user_id, user_email
vs TQL:

    export
    where eventTime > now() - 1d
    select user_id, user_email
[@] https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bf...

[#] https://docs.tenzir.com/blog/five-design-principles-for-buil...


InfluxDB tried to do this with InfluxQL but abandoned it, and are now back to SQL. The biggest problem I had with it when I tried it was that it was simply too slow: queries were on average 6x slower than their SQL equivalents. I think a language like this is just too hard to optimize well.


This is incorrect. It was their query engine that was hard to optimize, not the language. InfluxDB has been working on a new query engine based off Apache DataFusion to fix this.

If you squint, this query language is very similar to Polars, which is state-of-the-art for performance. I expect Pql could be as performant with sufficient investment.

The real problem is that creating a new query language is a ton of work. You need to create tooling, language servers, integrate with notebooks, etc… If you use SQL you get all of this for free.


I don't think they have abandoned InfluxQL. They are still supporting it in InfluxDB 3 as far as I know. But they are abandoning Flux, which was a mess and a pain to use.


Wow, so many query languages, right? Do we really need another one? What's the story behind that decision? Cheers.


This is answered on their blog:

https://blog.runreveal.com/introducing-pql/


Cheers, mate! The blog cleared up a chunk of my question and the chat here gave me a better grasp of why it's over PRQL.


Reminds me of this classic: https://xkcd.com/927/


This is cool. Splunk Search Processing Language (SPL) is a real vendor lock-in feature. Once the team has invested time to get ramped up on SPL, and it gets integrated in your workflows, ripping out Splunk has an even higher switching cost.


Similarities to PRQL: https://prql-lang.org/


PRQL is actually a first class functional programming language, with syntax for supporting readable query processing pipelines. The documentation for PQL is a bit light, so I don't know if it's as powerful as PRQL.


Their first example doesn't look idiomatic at all:

SELECT * FROM "users" WHERE like ("email", 'gmail')

Should "like" here be a user-defined function? Because that's not the syntax for SQL's LIKE. To which SQL dialect will Pql translate its queries?


Further down is an example using "minus (now (), toIntervalDay (1))" which is also non-standard SQL. I have no idea which DBMS they are targeting.


> The where operator will validate that the syntax is valid, but it will pass unknown function calls through to the underlying database. In RunReveal's case, we use Clickhouse under the hood, so if we wanted to do a case-insensitive match we could still use Clickhouse's lower function.

From the release blog [1] they mention that unknown functions are passed through to the underlying SQL engine -- this lets them target anything from MySQL, Postgres, and ClickHouse to proprietary engines like Snowflake.

1. https://blog.runreveal.com/introducing-pql/


For anyone using anything more than basic SQL functionality, so far this looks very limiting: no window functions, no aggregate filtering, no data-type-specific functions (ranges).


How do I get parametered queries into this? can I? should I?

edit: guess I can't

https://pkg.go.dev/github.com/runreveal/pql


Is the "piping" associative? As in, does it allow me to put `where eventTime > minus(now(), toIntervalMinute(15)) | count` into a variable so I can use it later on multiple different tables/queries? I remember failing to do the same thing with ggplot2 when I wanted to share styling between components. If the operator is not associative, then the reading order will have to be mixed since composing will require functions (and Go doesn't have pipes/UFCS)
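Nothing in the linked Go API suggests first-class fragment values, but you can get the effect at the string level. A toy sketch (hypothetical, not pql's API): hold the operator list as a value and prepend whichever source table you need.

```python
# A reusable pipeline fragment, stored as a list of pql operators
recent_count = [
    "where eventTime > minus(now(), toIntervalMinute(15))",
    "count",
]

def pipeline(table, ops):
    # Render "table | op1 | op2 ..." in pql's one-operator-per-line style
    return "\n".join([table] + ["| " + op for op in ops])

print(pipeline("users", recent_count))
print(pipeline("logins", recent_count))
```

Since the source table is always the leftmost element, reading order stays left-to-right even when the fragment is shared - which is the property the ggplot2 comparison is getting at.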


One potential advantage of these compile-to-SQL languages seems to be - they might be easier to tune a codegen LLM on. SQL is very verbose and IME, english -> SQL doesn't really work too well, even in high end products (i.e. pricy SaaS offerings using presumably GPT-4)

My hunch is tuning english prompts on less verbose, more "left to right" flowing languages might yield better results.


At this stage I feel that the natural evolution for SQL is instead to use english to describe what you want and have an LLM generate SQL. Often with comments.

For some reason, a lot of these SQL alternatives seem to be syntactic preference and not much simpler or clearer than the original.


Does this really appeal to developers? SQL is an extremely expressive language and while there are things in it I would change they are mostly minor points. Writing a query in english where the actual SQL it translates to is dependent on the specific version of the LLM seems dangerous - and forcing the query expression to be in a language that not everyone speaks either as a first language or at all seems to make code less accessible.

When properly styled with good indentation and syntax habits SQL is extremely readable.


It's good for the case where you want a somewhat complex analytical query generated on the spot.

But this is only for initial generation. After that, you should be using pure SQL.


I’ve never understood why people want a more verbose version of sql.

I think what people really want is business rules and data cleaning and schema discovery.

If you had to use English against multiple source systems to and tons of joins, the sentence would be paragraphs.

Where I think there’s value is in using something like a data catalog to label business rules against a data warehouse, tied to dashboard queries and other common ones.

But that’s a hard problem and a unique model to every customer. And always changing.


Combining schema discovery and data catalog seems like it might be a hard problem requiring a lot of LLM prompt engineering gymnastics but maybe I underestimate the state of the art.


We might eventually get a good LLM to SQL tool, but my experience with them is that they make slick sales demos, but are worse than useless in the real world.

You have to know SQL to use them. They produce a lot of code that looks correct and produces correct-looking results with subtle errors. So you can't just hand it to someone who doesn't know SQL and let them query the database, but that's the use case where something like this would be valuable. You have to be experienced with SQL and know all the peccadillos of the DB you're working with to check the query and output for correctness.

For someone like me who is experienced with SQL, I can write simple queries just as fast as I can figure out how to prompt the LLM to get what I want. Where a tool like this would be really helpful is if it could help me write more complex queries more quickly. However, it is non-trivial to get the LLM to generate complex queries that take into account all the idiosyncrasies of your specific data model. So again it ends up being much faster for me to just write the query myself and not involve the LLM.

Where I think LLMs go wrong with SQL is that to write good SQL you have to have a deep knowledge of the underlying data model, and the LLMs aren't good at that yet.


Strongly agree. LLMs need something more than just the DDL of the tables and an instruction to write useful SQL in the real world. However, I've had decent success by (1) integrating heavily with a semantic layer on top of your database, and (2) going the agent approach where the LLM is allowed to run different queries and explore the data before writing the final query.


Asking the LLM if the query can be optimized a few times and then checking it in a planner works surprisingly well for me but YMMV.


We built LLM-to-SQL before this at RunReveal, and while it's useful and gets queries mostly correct 80% of the time, 20% of the time it's way off or requires nontrivial manual intervention.

We're still fairly bullish on the LLM-to-SQL front though, but in the meantime PQL is a good bridge.


As a company that's invested in this: would you mind talking about why you don't want to use raw SQL - are there particular deficiencies you've found in it?


See tools like pandas and Polars. These database libraries are abstractions that give you a spray of SQL functionality. I prefer using these libraries because it feels much more intuitive (and works with the Python/arrow ecosystem). (I'm also biased since I make a portion of my living off of pandas training material.)


We do use raw SQL, but we're a security business, which tends to mean heavy reliance on other languages that have a similar syntax to pql


Here is a long blogpost "against SQL" which lists many deficiencies of it: https://www.scattered-thoughts.net/writing/against-sql/

In short, it has a longer spec than famously-complex C++ while making a much less expressive language out of it.


A thought experiment:

Rather than an LLM, you can send your request for an SQL query directly to Donald D. Chamberlin, one of the original designers of SQL. Furthermore, he gets an ERD for your database.

What odds you get back a query that gives you correct answers?


we (https://www.definite.app/) do this.

The SQL generation works well out of the box and works better as you update the semantic layer. The semantic layer includes things like joins and measures (e.g. aggregate functions) that you'd want standard definitions for. For example, you don't want an LLM creating a definition for MRR on the fly. All the semantic definitions are plain SQL.

quick demo: https://www.loom.com/share/a0d3c0e273004d7982b2aed24628ef40?...


SQL is part of a generation of languages which attempted to have (within strict syntactical constraints), a natural language-sounding approach to begin with (another one that comes to mind is AppleScript).


Their very first example has issues:

    users
    | where like(email, 'gmail')
    | count

becomes

    WITH
      "__subquery0" AS (
        SELECT *
        FROM "users"
        WHERE like ("email", 'gmail')
      )
    SELECT COUNT(*) AS "count()"
    FROM "__subquery0";

Fetching everything from the users table can be a ton slower than just running a count on that table, if the table is indexed on email. I had to deal with that very problem this week.


That is the same query plan for any contemporary query planner.

(Just like any C compiler will produce the same output for `x += 2` and `x += 1 + 1`.)

---

A notable exception was PostgreSQL prior to version 12, which treated CTEs as optimization fences.
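You can check the flattening claim on at least one engine from Python with the stdlib sqlite3 module. SQLite flattens non-materialized CTEs, so the wrapped and unwrapped counts agree, and EXPLAIN QUERY PLAN shows what the planner actually does (plan text is engine- and version-specific, so don't assert on it):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE users (email TEXT)")
cur.executemany("INSERT INTO users VALUES (?)",
                [("a@gmail.com",), ("b@example.com",)])

# Roughly the shape of pql's generated SQL vs. the hand-written query
cte = ("WITH sub AS (SELECT * FROM users WHERE email LIKE '%gmail%') "
       "SELECT COUNT(*) FROM sub")
direct = "SELECT COUNT(*) FROM users WHERE email LIKE '%gmail%'"

print(cur.execute(cte).fetchone()[0], cur.execute(direct).fetchone()[0])

# Inspect the plan the CTE form actually gets
for row in cur.execute("EXPLAIN QUERY PLAN " + cte):
    print(row)
```

Whether a given production engine does the same is exactly the question raised below, but at minimum the CTE wrapper is not an automatic pessimization.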


I'd be hesitant to assume the generated CTEs are always going to be amenable to optimization. The examples on the linked page are pretty trivial queries - I wonder what happens when that ceases to be the case, as seems very likely with a tool that apparently doesn't do a great deal to promote understanding what goes on under the hood.


It's possible for sure.

It's just worth recognizing that SQL itself is a tool that doesn't do a great deal to promote understanding what goes on under the hood. (I've witnessed that firsthand many times.)


Granted, but I've not once yet seen it help to add a second hood for things to be going on under.

If nothing else, having to write SQL tends to lead engineers to the engine manual, where they have at least a chance to become aware of the existence of query planners.


Should not be a problem on modern Postgres.


Sorry I might be dumb but why do we need this?


Because SQL is a nightmare. (Standard is in thousands of pages; nobody fully implements it; not actually a single language as usually each db has deviations and extensions; nonmodular, composability is hard - an afterthought)

And the worst part: nothing better exist; single’ish bad language is better than dozens of new shortlived ones that have quirks in various other places.

But somebody needs to be idealist and keep trying.


That is quite a hyperbolic statement and I have to disagree with you. I've used SQL on a variety of databases over 15 years, and yes while some like MySQL have poorly implemented anything other than basic SQL/RDBMS features, most are very similar in feature sets. There are vendors-specific additional features like how Oracle supports hierarchical queries with CONNECT, but you don't have to use them.

- CTEs are very close if not the same across Oracle, PostgreSQL, DB2, Hive, Snowflake, and MS SQL Server; I believe even Sybase too, but it's been a while.

- Joins all work largely the same, even though a couple of those support additional join types, especially when you want to join on functions that return data sets.

- Window functions are supported by every major DB with similar or the same syntax too. Any differences take five seconds to look up in documentation.

My only complaint is loading data is highly vendor specific.


I used SQL in various implementations for about 15 years. I didn't find much fault in it until I started using KQL (the language which seems to have inspired Pql).

The difference in enjoyability is stark: I truly hate SQL now.

More robust criticism is provided here (https://carlineng.com/?postid=sql-critique#blog). The quote I usually drag out is from Chris Date, who helped pioneer relational DBs:

"At the same time, I have to say too that we didn’t realize how truly awful SQL was or would turn out to be (note that it’s much worse now than it was then, though it was pretty bad right from the outset)."

https://www.red-gate.com/simple-talk/opinion/opinion-pieces/...


    For example, why not allow the following expression as a legal statement:

    tablename;
FYI, in Postgres you can write

    TABLE tablename;


MySQL has supported CTEs since v8.0. Sadly, still no support for RETURNING though, which limits their usefulness, but it does in fact support CTEs now.


SQL is indeed pretty terrible IMO. Here's my favorite source to trot out when people inevitably disagree: https://www.scattered-thoughts.net/writing/against-sql/


Most of these problems are pretty trivial to solve by simply reading the manual for the database engine you happen to be using, which typically will document deviations from the standard in serviceable detail.

I'd be more sympathetic to concerns about switching engines if I had, at any point in a career now heading for its 25th year, ever seen that occur.

Given the frequency with which ORMs are used in greenfield to defend against this notional problem, and the many problems that using an ORM always inflicts, this may well be the costliest form of premature optimization I've ever seen. It certainly can't be outside the top three.


Many other DSLs have been transformed to SQL. And now we have LLMs that can do it in a much better way.


The first example is worse. Why learn a piping syntax instead of learning SQL?


The R {dbplyr} package is also a very good way in practice to pipe SQL.


Also AFAICT the only one that is fully integrated with a general purpose programming language with an excellent IDE.


What target database is that in the examples?

     like ("email", 'gmail')
     minus (now (), toIntervalDay (1))
are non-standard functions/conditions


Maybe I’m totally missing it but why would I use it over sql? All those companies have their own flavor DSL so are you saying this is to standardize using it? Thanks


My primary reason for caring ATM is because a better query language for SQL, one that specifies what tables to operate on before what to do, could simplify tooling quite a bit.


Pipelining is cool, though this could've easily just been a library with nice chaining and combinators in your language of choice (seems to be Go here).


Yeah, but isn't the nice thing here that you can run it directly in your database? Which has all the data and probably a fair bit more compute power than your laptop/PC.

Edit: my comment acted like ORM-type libraries that execute within databases, like Ibis, don't exist. My bad!


So basically the same as C# LINQ feature that allows you to automatically generate SQL and perform database operations without using SQL language.


I never liked the original SQL syntax, it's weird. It's one of the few places where I think using s-exprs would've made more sense.


I'm glad this exists, but I'd caution that extensibility is the most important thing for devs to consider when picking their "ORM" stack, especially in terse Golang.

For that I use Squirrel, which uses the builder pattern to compose SQL strings. Keeping it as strings and interfaces allows it to be very easily extended; for example, I was able to customize it to speak SOQL (Salesforce). Plenty of downsides though.


Serious question - in this day and age, why not just ask GPT-4 in English to write the SQL you need?


Do you want your SQL to be correct?


Reminds me of how ActiveRecord works in Rails


I don't get it. These examples are kinda uninspiring, generated SQL output being unnecessarily complicated doesn't help. I haven't used PRQL, but at least it's pretty obvious from examples, how it's nicer to use than SQL. But this one — yeah, examples on the left are "nicer" than convoluted output on the right, but if you write SQL normally, it's basically just a lot of "|" and table name in the beginning, instead of in the middle. So what's the point?


Another day, another framework that tries to reinvent a limited subset of C#'s LINQ query syntax :)


SQL in the examples is deliberately convoluted, to make pql look more elegant.


Not to be confused with Prql.


The only thing I like in SQL is that it has been almost the same language for decades. Learn it once and you're done. If you really need to, you can write macros yourself. I don't see the value of learning a new language to do the same thing.


Given how many new programming languages have become widely used in the time since SQL's creation, it seems many others do see some value. For a mostly direct parallel to PQL, see CoffeeScript and TypeScript.


I love LINQ just because it made HOF look like SQL


SQL's nonsensical handling of null is reason enough to learn other query languages.


Do you mean that NULL <> NULL and NULL infects boolean logic?

NULL is always an awkward thing to deal with - how you want to handle it depends on the specific thing you're trying to accomplish. I'd probably prefer it if NULL equaled NULL when dealing with where conditions but it actually makes join evaluations a lot cleaner - if NULL equaled NULL then joining on columns with nulls would get really weird.

At the end of the day IS NULL and IS DISTINCT FROM/IS NOT DISTINCT FROM exist so you can handle cases where it'd be weird.
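The behavior in question is easy to poke at from Python's stdlib sqlite3 (SQLite follows the same three-valued logic here): `NULL = NULL` evaluates to NULL (unknown), not true, while `IS NULL` gives a plain two-valued answer.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Equality with NULL is itself NULL (surfaced as None in Python)
null_eq = cur.execute("SELECT NULL = NULL").fetchone()[0]
print(null_eq)  # None

# IS NULL is the two-valued test: 1 (true)
null_is = cur.execute("SELECT NULL IS NULL").fetchone()[0]
print(null_is)  # 1
```

And since WHERE only keeps rows where the condition is true, a `WHERE col = NULL` filter silently matches nothing - the classic trap.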


The best way to handle nulls is with Option / Maybe types - that is, without null at all.

Unfortunately they were not invented at the time SQL was created.


I think that's just a question on syntactic sugaring here - so, concretely, what would that mean for comparison operators? If I wanted to `id = id` and both were nullable would I need to express that as two layers of statements where I tried to unwrap both sides first or would we have a maybe vs maybe comparison operator - if we had such an operator what would it do in this case?


You'd have to unwrap, usually with a convenient mechanism for it: pattern matching.

The problem is that id = id is fundamentally incorrect for a nullable column. You should have done id is not null and id = id. And you shouldn't have been allowed to do the first anyway, because nothing good can come of it (there is no sane semantics for stuffing a ternary logic into a boolean algebra, and SQL chooses one of the many insane options, leading to both false positive and false negative matches depending). The only correct answer is not to do that.
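The two options in this exchange can be sketched side by side in Python with Optional standing in for Maybe (hypothetical helper names). SQL's semantics propagate the unknown; the "unwrap first" style the comment advocates forces a two-valued answer.

```python
from typing import Optional

def eq3(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    # SQL-style three-valued equality: unknown if either side is missing
    if a is None or b is None:
        return None
    return a == b

def eq_unwrapped(a: Optional[int], b: Optional[int]) -> bool:
    # Explicit unwrap first, i.e. "id is not null and id = id"
    return a is not None and b is not None and a == b

print(eq3(None, 1))          # None (unknown)
print(eq_unwrapped(None, 1)) # False
print(eq3(2, 2))             # True
```

The unwrapped version is more to type, but every caller is forced to decide what missing means, instead of the language silently picking for them.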


I can understand how that would make things more explicit - but given how common nullability is in the realm of databases I think it'd be a bad idea to force that much syntax in a situation where id=id is essentially what we want to discuss - especially since columns becoming potentially null is so trivial to achieve (even when impossible due to model constraints) as soon as you start throwing left joins in the mix.


Sometimes you really want explicit, rather than dealing with the errors caused by implicitness. And with a language that has proper support for things like Maybe, there isn't that much syntax.

It's true that you have to adopt a completely different language, but when that language saves you from potentially expensive bugs, it becomes appealing.


You might be interested in my attempt: https://docs.racket-lang.org/plisqin/Nullability.html

I think it handles all the 3VL problems I've encountered in SQL, but that doesn't mean it handles all possible 3VL problems. It also might not make any sense to anyone except me.


It's been a while since relational algebra so I don't feel confident saying "Yes" or "No" (I guess you could say my reply is Null) - but that seems like a really good base approach - especially having a state for being uncertain whether something is nullable or not... having an operator-embedded fallback approach also seems a lot more usable for the developer.

Also, this is an aside, but is your thing named after Snake Plissken?


Yeah, kind of. I have seen Escape From New York but I don't really remember it. I was more aware of the character via the reference/homage in Metal Gear Solid 2. In any case, I don't love the name but the obvious choice (Racquel) was already taken and "Plisqin" had zero Google results so I said "meh, good enough."


Using this language on top won’t solve that though, it still compiles into sql, warts and all.


Pql uses the same NULL handling, since it just passes it through to the generated SQL.


> The only thing I like in SQL is that is almost the same language in decades

So you do agree that the rest of SQL is broken. That’s why there is a value in creating (and learning) such new languages.


I strongly disagree that SQL is broken. Try using JPA or your favorite ORM on a large project with a complex entity graph. It is a nightmare having to ctrl-click deep into the entity graph to see what annotations are on some great-grandchildren to get an idea as to what query is being executed. When working with relational data SQL has always been the ideal tool for the job on the projects I've worked on.


ORMs are fundamentally broken. Hiding the relational structure of a relational database is the act of madmen.

Using the psychosis of ORMs to defend the psychosis of SQL is itself a form of psychosis


I think a lot of times people say “ORM” when they mean “Hibernate” or some other ORM implementation.

In your codebase, do you stick raw SQL all over the place and iterate over rows exclusively? Or instead, as a convenience, do you write helpers that map objects into SQL statements and map result rows into objects? If so, congratulations, you’re using an ORM. The concept of ORMs is not bad. It’s a logical thing to do. Some ORM _implementations_ have some very serious issues, but that does not make ORMs as a whole bad.
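The "helpers that map result rows into objects" being described can be as small as this (a minimal sketch using sqlite3 and a hypothetical `fetch_all` helper):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

def fetch_all(conn, cls, sql, params=()):
    # Map each result row onto the dataclass by position: the thin end
    # of the ORM wedge, with no lazy loading or entity graph in sight.
    return [cls(*row) for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
print(fetch_all(conn, User, "SELECT id, name FROM users"))
# [User(id=1, name='ada')]
```

Ten lines of this is still "object-relational mapping"; the contentious part is everything Hibernate layers on top.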


I'm not sure I'd consider query builders to be ORMs. You do inevitably have a mapping from query result set to list<obj>, so it does meet the name itself, but there's clearly a very large difference between Hibernate and a query builder. If you don't want "ORM" to be the term that calls out Hibernate and friends specifically, let me know what you'd use instead - but as far as I know, colloquially and commonly, ORM refers to the former, not the general act of having any mapping at all.


The "Why?" questions are getting downvoted, but to dissect the why section from the page a little...

"designed to be small and efficient" – adding a layer on top of SQL is necessarily less efficient, adding a layer of indirection on underlying optimisations means it is likely (but not guaranteed) to also generate less efficient queries.

"make developing queries simple" – this seems to be just syntactic preference. The examples are certainly shorter, but in part that's the SQL style used.

I think it either needs more evidence that the syntax is actually better - cases it simplifies, or optimisations it enables that are hard in SQL - or it perhaps needs to be more honest that the intent of the project is just to be a different interface for those who prefer it that way.

It's an interesting exercise, and I'm glad it exists in that respect, and hope the author enjoyed making it and learnt something from it. That can be enough of a why!


The main goal was to help security engineers / analysts, who _loathe_ sql (for better or worse).

I tend to think this is a little more user friendly, personally, and it's nice to give some open-source competition to the major languages that are used in security (SPL, Sumologic, KQL, and ES|QL).

We were surprised that there weren't syntactic competitors (i.e. -- while prql has some similar goals, the syntax and audience in mind were very different)


How's perf for the compiled queries? The first thing I see in the examples is what appears to be a CTE-by-default approach that, in most (all?) engines, means the generated query ultimately runs over an unindexed (and maybe materialized!) intermediary resultset.
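For concreteness, the CTE-chaining shape in question looks something like this - a hypothetical sketch of a pipeline compiled stage-by-stage, not pql's actual output, run through sqlite3 just to show it executes:

```python
import sqlite3

# Each pipe stage becomes its own WITH step; depending on the engine,
# these intermediates may be materialized as unindexed result sets.
sql = """
WITH step1 AS (SELECT * FROM events WHERE status = 'error'),
     step2 AS (SELECT host, COUNT(*) AS n FROM step1 GROUP BY host)
SELECT * FROM step2 ORDER BY n DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (host TEXT, status TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("a", "error"), ("a", "error"), ("b", "ok"), ("b", "error")])
print(conn.execute(sql).fetchall())  # [('a', 2), ('b', 1)]
```

Whether this hurts in practice depends on how aggressively the engine inlines CTEs (Postgres has done so by default since v12, for example).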


The join syntax in particular seems really clunky compared to what it could have been - it's unclear if they support joining on different column names (i.e. fooid = foo.id or even fooid = parentid) which would be really restricting if unsupported. It'd also be nice if they used the USING/ON distinction that SQL has instead of just supporting USING but calling it ON since that's a weird thing to mentally translate.

Pipe-driven SQL alternatives have been interesting in the past, but I much prefer when you start with a big blob of joins to define the data source before getting into the applied operations - the SQL PQL produces looks like it would have really large performance issues due to forcing certain join orders and limiting how much the query planner can rearrange operations.


> I much prefer when you start with a big-blob of joins to define the data source before getting into the applied operations

That's essentially the model we've chosen for XTQL, with the addition of a logic var unification scope for even more concise joins: https://docs.xtdb.com/intro/what-is-xtql#unify

Also, anyone interested in this post-SQL space would probably enjoy this recent paper: https://www.cidrdb.org/cidr2024/papers/p48-neumann.pdf


Higher level languages often have opportunities for additional optimisation over the straightforward implementation in the target language. This is because the semantics of the high-level language offer guarantees that the target doesn't, or because the intent is more clearly preserved in the source language.

Whether or not this is true for PQL/SQL, I don't know enough to say. But I do know that I don't write SQL at a high-enough level to be sure that a wrapper couldn't compile to something more efficient than what I produce, especially for complicated queries.


When dealing with SQL, these higher-level languages that are converted to SQL (such as FetchXML and Pql) have traditionally been quite limited in the use cases they suit. They are not suitable for complex joins and grouping, particularly across large tables and result sets, nor for joining remote data sources with large result sets.

They are suitable for reads on tables that are tuned specifically for the desired use cases. That assumption should intrinsically limit the amount of optimization required in the conversion, and the higher-level language should be strict enough that the conversion never has to choose between alternative SQL approaches (because the results of such a choice will be unpredictable due to table sizes etc.)


SQL also has a lot of edge cases where, because half the syntax got bolted on after the language was initially finalized, a conceptually simple request can turn into a multi-level mess of a query.
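A sketch of one such case - "latest row per group", which is conceptually one sentence but needs a ranked subquery layered under the real SELECT (sqlite3 for illustration; the `readings` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                 [("a", 1, 10.0), ("a", 2, 12.0), ("b", 1, 7.0)])

# "Give me the latest reading per sensor" takes a window function
# (SQLite >= 3.25) wrapped in a subquery, then a filter on the rank.
sql = """
SELECT sensor, value FROM (
    SELECT sensor, value,
           ROW_NUMBER() OVER (PARTITION BY sensor ORDER BY ts DESC) AS rn
    FROM readings
) WHERE rn = 1 ORDER BY sensor
"""
print(conn.execute(sql).fetchall())  # [('a', 12.0), ('b', 7.0)]
```

Before window functions were bolted on, this same request meant a correlated subquery or a self-join, which is exactly the multi-level mess being described.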


> adding a layer on top of SQL is necessarily less efficient

It's not if it's compile time


Not adding a compile step to SQL is more efficient than adding one. Whether it's workflow efficiency, simpler tooling, or actual runtime cost, a process that goes straight to SQL is necessarily more efficient than one that goes via something else and ends up with SQL.

That cost may be offset by other benefits, but that isn't obviously true in this case.


This seems really cool. This is not meant to be a negative comment - why does it matter that it's written in Go? It's stated multiple times but could this be written in multiple other languages and still be functionally the same?


[flagged]


[flagged]


mmmm, muffins


[flagged]


Agreed! Azure’s best product isn’t CosmosDB, AKS, or their AI cognitive services - it’s Kusto! Sadly no one knows about it.

If you’re interested, you can try it here: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/...


There is also the Kusto Detective Agency site where you learn by pretending to be a detective investigating leads using the data. You also get a little certificate at the end. https://detective.kusto.io/


Yes, me too!! I think the most natural way to think about more complex data manipulations is as a series of functions.

I think there's a lot of love for SQL, and people can tend to get a little defensive around new query languages. But I think a better-suited query language is often a much nicer solution than an ORM.


[flagged]


Tcl wants its namespace back - it predates all these. Additionally there is PECL (again, ancient) but the popularity of that has been waning for quite some time with composer being the main package management system for PHP now.


Tcl and PECL are ancient and don't have the community described above. Neither have serious name collisions that I know of. Syntax highlighting is going to be hard when you name things this way. Documentation is going to be hard to find.

I'm just saying: why do this to your users? It's a name. It is one of those things you have complete and total creative control over.


Maybe it's meant to be pronounced 'pea-qul'


Piping the results of each query into text is definitely NOT efficient



