Timeouts on calls are, as the OP mentions, a thing in Erlang. Inter-process and inter-computer calls in QNX can optionally time out, and this includes all system calls that can block. Real-time programs use such features. Probably don't want it on more than that. It's like having exceptions raised in things you thought worked.
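For a feel of what a timeout on an ordinary call looks like from the caller's side, here is a minimal Python sketch. It is not how QNX or Erlang do it: `call_with_timeout` and `slow_call` are hypothetical names, and the overdue worker thread is only abandoned, not cancelled, which a kernel-level timeout would handle properly.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a worker thread and wait at most timeout_s seconds.

    Hypothetical helper for illustration. On timeout the caller gets an
    exception, but the worker thread keeps running until fn returns.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn, *args)
        return future.result(timeout=timeout_s)
    finally:
        # Do not make the caller wait for an overdue call to finish.
        pool.shutdown(wait=False)


def slow_call(delay_s):
    """Stand-in for a call that blocks, e.g. a message to another process."""
    time.sleep(delay_s)
    return "reply"


if __name__ == "__main__":
    print(call_with_timeout(slow_call, 1.0, 0.01))
    try:
        call_with_timeout(slow_call, 0.05, 0.5)
    except concurrent.futures.TimeoutError:
        print("call timed out")
```

The second call is the "exceptions raised in things you thought worked" case: a function that cannot fail on its own now has a failure path at every call site.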
- Capabilities
They've been tried at the hardware level, and IBM used them in the System/38, but they never caught on. They're not really compatible with C's flat memory model, which is partly why they fell out of fashion.
Capabilities mean having multiple types of memory. Might come back if partially-shared multiprocessors make a comeback.
- Production-Level Releases
That's kind of vague. Semantic versioning is a related concept. It's more of a tooling thing than a language thing.
- Semi-Dynamic Language
I once proposed this for Python. The idea was that, at some point, the program made a call that told the system "Done initializing". After that point, you couldn't load more code, and some other things that inhibit optimization would be prohibited. At that point, the JIT compiler runs, once. No need for the horrors inside PyPy which deal with cleanup when someone patches one module from another.
Guido didn't like it.
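A toy version of the "Done initializing" switch, as a Python sketch. This only shows the sealing step on one namespace object; the actual proposal was for the whole interpreter and would trigger the JIT at that point. `SealableModule` and `done_initializing` are hypothetical names, not an existing Python feature.

```python
class SealableModule:
    """Hypothetical namespace that can be frozen once start-up is over."""

    def __init__(self):
        # Bypass our own __setattr__ so the flag can be created.
        object.__setattr__(self, "_sealed", False)

    def done_initializing(self):
        """After this call, attributes cannot be added, replaced or deleted."""
        object.__setattr__(self, "_sealed", True)

    def __setattr__(self, name, value):
        if self._sealed:
            raise RuntimeError(f"cannot rebind {name!r} after done_initializing()")
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        if self._sealed:
            raise RuntimeError(f"cannot delete {name!r} after done_initializing()")
        object.__delattr__(self, name)


mod = SealableModule()
mod.area = lambda w, h: w * h  # patching is still allowed during start-up
mod.done_initializing()        # from here on, the bindings are fixed
```

Once the bindings are fixed, an optimizer could assume `mod.area` never changes, which is the property that makes a single JIT pass plausible.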
- Value Database
The OP has a good criticism of why this is a bad idea. It's an old idea, mostly from LISP land, where early systems saved the whole LISP environment state.
Source control? What's that?
- A Truly Relational Language
Well, in Python, almost everything is a key/value store. The NoSQL people were going in that direction. Then people remembered that you want atomic transactions to keep the database from turning to junk, and mostly backed off from NoSQL where the data matters long-term.
- A Language To Encourage Modular Monoliths
Hm. Needs further development. Yes, we still have trouble putting parts together.
There's been real progress. Nobody has to keep rewriting Vol. I of Knuth algorithms in each new project any more. But what's being proposed here?
- Modular Linting
That's mostly a hack for when the original language design was botched.
View this from the point of view of the maintenance programmer - what guarantees apply to this code? What's been prevented from happening? Rust has one linter, and you can add directives in the code which allow exceptions. This allows future maintenance programmers to see what is being allowed.
I read it, and to me it seems like they're worried about the wrong things. As I understand it, they're worried about the difficulty and hassle of calling unrelated code in the monolith, and proposing things that would make it easier. But that's wrongheaded. Monoliths don't suffer because it's too hard to reuse functionality. They suffer because it's too easy. Programmers create connections and dependencies that shouldn't exist, and the monolith starts to strangle itself (if you have good tests) or shake itself to pieces (if you don't) because of unnecessary coupling. You need mechanisms that enforce modularity, that force programmers to reuse code at designated, designed module interfaces, not mechanisms that make it easier to call arbitrary code elsewhere in the monolith.
In my opinion, a great deal of the success of microservices is due to the physical impossibility of bypassing a network API and calling the code of another service directly. Because of this, programmers respect the importance of designing and evolving APIs in microservices. Essentially, microservices enforce modularity, forcing programmers to carefully design and evolve the API to their code, and this is such a powerful force for good design that it makes microservices appealing even when their architectural aspects aren't helpful.
A language that made it possible to enforce modularity in a monolith as effectively as it is enforced in microservices would make monoliths a no-brainer when you don't need the architectural aspects of microservices.
Not really. Look at any open source Java library, and you'll see that unless it's small enough to implement in a single package, it probably contains public classes that aren't meant to be part of the public API but are declared public so they can be used in other packages in the library codebase.
That's why they introduced a module system for Java in Java 9. It sounded pretty cool to me when they announced it, but it was a bit too late to make much difference in the Java ecosystem (Java 9 came out 21 years after Java 1) and I haven't heard much about it since then.
The module system is used extensively by the core Java ecosystem (e.g. the jlink tool will create lean "JREs" based on which modules are actually used), and is getting more and more common in projects that are using a relatively modern Java version.
Besides, it makes sense to encapsulate at different scopes - a class, a package, a module, a library/application. These are "abstractions" on different levels, semantics, constraints.
> Capabilities ... They're not really compatible with C's flat memory model ... Capabilities mean having multiple types of memory
C is not really dependent on a flat memory model - instead, it models memory allocations as separate "objects" (quite reminiscent of "object orientation in hardware" which is yet another name for capabilities), and a pointer to "object" A cannot be offset to point into some distinct "object" B.
> A Truly Relational Language
This is broadly speaking how PROLOG and other logic-programming languages work. The foundational operation in such languages is a knowledge-base query, and "relations" are the unifying concept as opposed to functions with predefined inputs and outputs.
(This is one of those times where the C memory model as described in the spec is very different from the mental PDP-11 that C programmers actually use to reason about)
> In general, while I can’t control how people react to this list, should this end up on, say, Hacker News, I’m looking more for replies of the form “that’s interesting and it makes me think of this other interesting idea” and less “that’s stupid and could never work because X, Y, and Z so everyone stop talking about new ideas” or “why hasn’t jerf heard of this other obscure language that tried that 30 years ago”. (Because, again, of course I don’t know everything that has been tried.)
- Everything except C now has standard strings, not just arrays of characters. Almost all languages now have some standard way to do key/value sets. What else ought to be standard?
-- Arrays of more than one dimension would be helpful for numerical work. Most languages descended from C lack this. They only have arrays of arrays. Even Rust lacks it. Proposals run into bikeshedding - some people want rectangular slices out of arrays, which means carrying stride info around.
-- Standard types for 2, 3 and 4-element vectors would help in graphics work. There are too many different implementations of those in most languages and too much conversion.
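To make the "carrying stride info around" point concrete, here is a minimal Python sketch of a 2D view over flat storage. `View2D`, `sub` and `matrix` are hypothetical names for illustration, not any language's standard type. A rectangular slice shares the backing data and differs only in offset and shape, which is why the strides have to travel with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View2D:
    data: list      # flat backing storage, shared between views
    offset: int     # position of element (0, 0) in data
    shape: tuple    # (rows, cols) of this view
    strides: tuple  # steps in data per row and per column

    def __getitem__(self, index):
        r, c = index
        rows, cols = self.shape
        if not (0 <= r < rows and 0 <= c < cols):
            raise IndexError(index)
        return self.data[self.offset + r * self.strides[0] + c * self.strides[1]]

    def sub(self, r0, c0, rows, cols):
        """Rectangular slice: no copy, only a new offset and shape."""
        if min(r0, c0, rows, cols) < 0:
            raise IndexError("negative slice bounds")
        if r0 + rows > self.shape[0] or c0 + cols > self.shape[1]:
            raise IndexError("slice outside view")
        new_offset = self.offset + r0 * self.strides[0] + c0 * self.strides[1]
        return View2D(self.data, new_offset, (rows, cols), self.strides)


def matrix(rows, cols, values):
    """Build a row-major view over a fresh flat list."""
    values = list(values)
    if len(values) != rows * cols:
        raise ValueError("wrong number of values")
    return View2D(values, 0, (rows, cols), (cols, 1))


grid = matrix(3, 4, range(12))
block = grid.sub(1, 1, 2, 2)  # rows 1..2, columns 1..2 of grid
```

Note that `block` keeps the row stride of 4 even though it is only 2 columns wide; an array-of-arrays representation has nowhere to put that information.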
Things to think about:
- Rust's ownership restrictions are harsh. Can we keep the safety and do more?
-- The back-reference problem needs to be solved somehow. Back references can be done with Rc and Weak, but it's clunky.
-- Can what Rust does with Rc, RefCell, and .borrow() be checked at compile time? That allows eliminating the run-time check, and provides assurance that the run-time check won't fail. Something has to look at the entire call tree at compile time, and sometimes it won't be possible to verify this at compile time. But most of the time, it should be.
-- There's a scheme for ownership where there's one owning reference and N using references. The idea is to verify at compile time that the using references cannot outlive the owning one. Then there's no need for reference counts.
-- Can this be extended to the multi-thread case? There have been academic demos of static deadlock detection, but that doesn't seem to have made it into production languages.
-- A common idiom involves things being owned by handles, but also indexed for lookup by various keys. Dropping the handle drops the object and removes it from the indices. Is that a useful general purpose operation? It's one that gets botched rather often.
-- Should compilers have SAT-solver level proof systems built in?
-- Do programs really have to be in monospaced fonts? (Mesa on the Alto used the Bravo word processor as its text editor. Nobody does that any more.)
-- There's async, there are threads, and there are "green threads", such as Go's "goroutines". Where's that going?
-- Can we have programs which run partly in a CPU and partly in a GPU, compiled together with the appropriate consistency checks, so the data structures and calls must match to compile?
-- How about "big objects?" These are separately built program components which have internal state and some protection from their callers. Microsoft OLE did that, some .dll files do that, and Intel used to have rings of protection and call gates to help with that, hardware features nobody used. But languages never directly supported such objects.
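The handle-plus-indices idiom from the list above can be sketched in Python with weak references. This is a run-time approximation under stated assumptions, not the compile-time mechanism being asked for: `Player` and `Registry` are hypothetical names, and the prompt removal relies on CPython's reference counting (other implementations may wait for a garbage collection).

```python
import gc
import weakref


class Player:
    def __init__(self, name, email):
        self.name = name
        self.email = email


class Registry:
    """Indexes objects by several keys without owning them."""

    def __init__(self):
        # Entries disappear by themselves when the object is collected.
        self.by_name = weakref.WeakValueDictionary()
        self.by_email = weakref.WeakValueDictionary()

    def create(self, name, email):
        player = Player(name, email)
        self.by_name[name] = player
        self.by_email[email] = player
        return player  # the caller's reference is the owning handle


registry = Registry()
handle = registry.create("ann", "ann@example.com")
print(registry.by_email["ann@example.com"].name)  # lookup while the handle lives

handle = None   # drop the handle
gc.collect()    # not needed on CPython, harmless elsewhere
print("ann" in registry.by_name)
```

The part that gets botched by hand is the last step: every index has to forget the object at the same moment the owner lets go.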
> Everything except C now has standard strings, not just arrays of characters. Almost all languages now have some standard way to do key/value sets. What else ought to be standard?
I think that character strings should not be restricted to (or generally expected to be) Unicode (although using Unicode and other character sets will still be possible).
I also think that key/value lists should allow any or most types as keys, including references to objects. (PostScript allows any type to be used as keys except strings (they are converted to names if you use them as keys) and nulls.)
I think that big integers (which can have a program-specified limited length in programming languages with typed variables) and arrays of fixed-length records (which C already has; JavaScript has typed arrays which is a more limited implementation of this) are also things that would be helpful to include as standard.
> Arrays of more than one dimension would be helpful for numerical work.
I agree with this too; it is a good idea.
> Standard types for 2, 3 and 4-element vectors would help in graphics work.
This is probably helpful, too, although they can be used for stuff other than graphics work as well.
> Do programs really have to be in monospaced fonts?
No, but that is due to how it is displayed and is not normally a feature of the program itself. Many people including myself do use monospace fonts, but this should not usually be required.
> There's async, there are threads, and there are "green threads", such as Go's "goroutines". Where's that going?
I had read about "green threads" and I think that it is a good idea.
> How about "big objects?" These are separately built program components which have internal state and some protection from their callers. Microsoft OLE did that, some .dll files do that, and Intel used to have rings of protection and call gates to help with that, hardware features nobody used. But languages never directly supported such objects.
I also think it is sensible to have components that can be separated from the callers, and that operating system support (and perhaps hardware support) for such things might be helpful. I would design a programming language for such an operating system that would directly support such objects.
Are C multidimensional arrays guaranteed to be contiguous in memory? In practice they are, but can one iterate through them just by incrementing a pointer which points to the first element without UB?
Yes, but is one allowed to move a pointer inside it as they see fit? On a one-dimensional array, one can iterate through it starting with a pointer pointing to the first element and ending with a pointer pointing one position past the last element (which the user is not allowed to dereference). For multidimensional arrays, the element type is an array too (with a smaller rank than the original one), so one could perform that type of iteration with a pointer to an array. My question is whether a pointer to the underlying scalar type can freely move inside the multidimensional array without UB, since it may have to actually leave the array it was originally part of. If that's not allowed, how could one build slices and other view types?
Yes, you can do that, it's fine as long as you stay within the bounds of the indexes. Under the hood, it's a single contiguous block of memory.
Although at least with 2d arrays I prefer to just use a 1d array and index it with [x * width + y], because one problem with multidimensional arrays in C is they need multiple allocations/frees.
Double indirection arrays with multiple allocations are 25 years obsolete (ok, there are some use cases) but since C99 we prefer to do it like the parent.
In your code link you over allocate memory, sizeof *arr is enough and you need to dereference like with (*arr)[i][j].
You need to dereference it because it is a pointer to an array, if you dereference you get an array. You can also let the first dimensions decay then it looks like:
> Do programs really have to be in monospaced fonts?
Of course not. I've been using a proportional font for at least 10 years and I'm still in business working on code bases shared with developers using monospaced fonts. Both work, neither disturbs the other, and proportional is easier to read, as any book can demonstrate. Alignment doesn't matter much.
How do you deal with writing code with multicursors when you have to type the same thing multiple times? With monospace I just ctrl+alt+down a couple times on aligned text and then type. With proportional fonts I don't suppose it's easy to align text exactly, so do you just not use multicursors or is there a solution you came up with that works?
I don't use multicursors much, not every year. I'm using emacs. I register a sequence of keys and apply it multiple times with control e, control e, control e etc.
I’ve been a professional programmer for 17 years and have never used multicursors. I don’t even fathom under what conditions you’d want to. I use Find and Replace.
Everyone programs a little differently. I often use it when e.g. using intrinsics and I want to change types. Find and replace isn't especially helpful when they're different names with substructure you need to modify locally.
Other than comptime, most of Zig ideas were already present in Modula-2, @ all over the place gives a "I miss Objective-C" flavour to it, and only source code ecosystem on a language supposed to be a systems programming language?
> Well, in Python, almost everything is a key/value store.
Why would that be anywhere near an adequate substitute? KV stores are not relational, they don't support relational algebra. KV stores in PLs are common as dirt, so if they were relevant to the question of ending relations in a language I think the author would have noticed.
Many of these things (not only what you describe here but also the linked article) are stuff that I had intended to be available in the built-in command shell (called "Command, Automation, and Query Language", which is meant to describe some of the intentions) of an operating system design, so that they would have support from the operating system.
About capabilities, I think that capabilities should be a feature of the operating system, although hardware support would be helpful. However, I think that it could be done with tagged memory, without necessarily needing multiple types of memory, and programming languages such as C could still be capable of using them (although some things might not work as it would be expected on other computers, e.g. if you try to copy a reference to a capability into a memory area that is expected to be a number and then try to perform arithmetic on that number, the program is likely to crash even if the result is never dereferenced).
However, my idea also involves "proxy capabilities" too, so that you can effectively make up your own capabilities and other programs receive them without necessarily knowing where they came from (this allows supporting many things, including (but not limited to) many of the ideas of "divergent desktop" of Arcan).
He proposes that there is a need for a way to connect modules, i.e. dependency injection, without the modules having explicit knowledge of each other, with compile-time verification that the modules being connected are compatible, without the interface song and dance.
> The OP has a good criticism of why this is a bad idea. It's an old idea, mostly from LISP land, where early systems saved the whole LISP environment state. Source control? What's that?
Symbolics Genera can save (incremental and complete) images (-> "Worlds"). The image tracks all the sources loaded into it. The sources/files/docs/... of the software is stored on a central (or local) file server.
I can for example start an initial world and load it with all the wanted software in the various versions I want. Maybe I save a new world from that.
I can also start an pre-loaded world and incrementally update the software: write patches, create new minor/major versions, load patches and updates from the central server, install updates from distributions, ... Maybe save new worlds.
The "System Construction Tool" tracks what code is loaded in what version from where.
> The OP has a good criticism of why this is a bad idea.
They simply assert "twiddling a run-time variable for debugging in your staging environment can propagate straight into a bug on production".
As-if straight into production without re-testing.
> Source control? What's that?
"ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers."
> Capabilities
>
> Capabilities mean having multiple types of memory. Might come back if partially-shared multiprocessors make a comeback.
I found this description amusing because all modern memory safe languages have capabilities, and they all have multiple types of memory: that's what an object is! Memory safety partitions memory into different types, and object references are capabilities!
What languages do next is where they break the security properties of capabilities: they add "ambient authority" and "rights amplification". Quick primer:
Ambient authority is basically the same problem as globally mutable state. Globally mutable state impedes modular reasoning about code, but if that state also carries authority to do something dangerous in the real world, like launch missiles, then it also impedes modular reasoning about security for the exact same reasons.
Rights amplification is the ability to turn a reference to an object with little to no authority, into a reference to an object with more authority. File.Open is the quintessential example, where you can turn an immutable string that conveys no authority, into a file handle to your root file system!
File.Open is also typically accessible to all code, meaning it's also ambient authority. This classical file API is completely bonkers from a security perspective.
So we already have capabilities, what we really need to do is to stop adding all of this insecurity! The developers of the E language actually showed that this could be done by making a capability secure subset of Java called Joe-E, which removed the parts of Java and the standard library that exposed ambient authority or rights amplification patterns. Most Java code could run unmodified.
And as for whether capability security will ever be adopted elsewhere, it already has been! WASM's WASI has capability security in its core design, because capability security is exactly what you need for good isolation and virtualization, which are the core properties WASM needs.
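The difference between the two API shapes can be sketched in Python. This only illustrates the style: Python itself does not enforce it, since `open` remains available to all code, and the function and class names here are hypothetical.

```python
import io


def count_lines_ambient(path):
    """Takes a plain string and amplifies it into file access via open()."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def count_lines_capability(readable):
    """Takes an already-open readable object and can touch nothing else."""
    return sum(1 for _ in readable)


class ReadOnly:
    """Attenuated capability: exposes only line iteration of a stream."""

    def __init__(self, stream):
        self._stream = stream

    def __iter__(self):
        return iter(self._stream)


source = io.StringIO("one\ntwo\nthree\n")
print(count_lines_capability(ReadOnly(source)))
```

With the second style, the caller decides exactly which object the function may read, and the function's signature shows everything it has authority over.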
I think Squeak had Monticello for source control with their image-based approach 20+ years ago, and there was something else for Smalltalk in the '80s too.
But yeah people like text and hate images, and I believe Pharo switched back to some git integration.
"ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers."
Smalltalk implementations have had text export/import for ages, and image based source control as you point out, is also quite old, Monticello wasn't the first.
I agree about relational languages. It's absurd when I think that SQL and Datalog came from the same foundations of relational calculus. It's just so much lost expressive power.
I really like what PRQL [1] did, at least it makes table operations easily chainable. Another one that comes to mind is Datomic [2].
I was struggling with doing interesting things with the semantic web circa 2007 and was thinking "OWL sucks" and looking at Datalog as an alternative. At that time Datalog was an obscure topic and it was hard to find information about it. 10 years later it was big.
(Funny after years of searching I found somebody who taught me how to do really complex modelling in OWL DL but from reading the literature I'm pretty sure the average PhD or prof in the field has no idea.)
I wrote up what I learned in a technical report that got sent to the editors at ISO a month or so ago and ought to appear pretty soon. Look up my profile and send me a note.
I have spent a lot of time trying to understand how we ended up with SQL. Best I can determine, we got SQL because it isn't relational, it is tablational. Tables are a lot easier than relations to understand for the layman, and they successfully pushed for what they were comfortable with, even if to the chagrin of technical people.
"RM:
What was key to SQL becoming the standard language for relational databases in the mid-1980s? Was all down to good marketing?
CJD:
In other words, why did SQL became so popular? Especially given all its faults? Well, I think this is rather a sorry story. I said earlier that there has never been a mainstream DBMS product that’s truly relational. So the obvious question is: Why not? And I think a good way for me to answer your questions here is to have a go at answering this latter question in their place, which I’ll do by means of a kind of Q&A dialog. Like this:
Q:
Why has no truly relational DBMS has ever been widely available in the marketplace?
A:
Because SQL gained a stranglehold very early on, and SQL isn’t relational.
Q:
Why does SQL have such a stranglehold?
A:
Because SQL is “the standard language for RDBMSs.”
Q:
Why did the standard endorse SQL as such and not something else-something better?
A:
Because IBM endorsed SQL originally, when it decided to build what became DB2. IBM used to be more of a force in the marketplace than it is today. One effect of that state of affairs was that-in what might be seen as a self-fulfilling prophecy-competitors (most especially Relational Software Inc., which later became Oracle Corp.) simply assumed that SQL was going to become a big deal in the marketplace, and so they jumped on the SQL bandwagon very early on, with the consequence that SQL became a kind of de facto standard anyway.
Q:
Why did DB2 support SQL?
A:
Because (a) IBM Research had running code for an SQL prototype called System R and (b) the people in IBM management who made the decision to use System R as a basis on which to build DB2 didn’t understand that there’s all the difference in the world between a running prototype and an industrial strength product. They also, in my opinion, didn’t understand software (they certainly didn’t understand programming languages). They thought they had a bird in the hand.
Q:
Why did the System R prototype support SQL?
A:
My memory might be deficient here, but it’s my recollection that the System R implementers were interested primarily in showing that a relational-or “relational”-DBMS could achieve reasonable performance (recall that “relational will never perform” was a widely held mantra at the time). They weren’t so interested in the form or quality of the user interface. In fact, some of them, at least, freely admitted that they weren’t language designers as such. I’m pretty sure they weren’t all totally committed to SQL specifically. (On the other hand, it’s true that at least one of the original SQL language designers was a key player in the System R team.)
Q:
Why didn’t “the true relational fan club” in IBM-Ted and yourself in particular-make more fuss about SQL’s deficiencies at the time, when the DB2 decision was made?
A:
We did make some fuss but not enough. The fact is, we were so relieved that IBM had finally agreed to build a relational-or would-be relational-product that we didn’t want to rock the boat too much. At the same time, I have to say too that we didn’t realize how truly awful SQL was or would turn out to be (note that it’s much worse now than it was then, though it was pretty bad right from the outset). But I’m afraid I have to agree, somewhat, with the criticism that’s implicit in the question; that is, I think I have to admit that the present mess is partly my fault."
> Why has no truly relational DBMS has ever been widely available in the marketplace?
Postgres was "truly relational" for a significant portion of its life before finally losing the battle with the SQL virus. There is probably no DBMS more widely available. Granted, it wasn't widely used until the SQL transition.
> SQL isn’t relational.
This is key. Relations are too complicated for the layman, the one who is paying for it, to understand. Tables are more in tune to what is familiar to them. The hardcore math/software nerds might prefer relationality, but they aren't the ones negotiating multi-million dollar contracts with Oracle/IBM.
I remember when Postgres moved to SQL. People started billing it as being Oracle, but free. That got non-technical manager attention. Without that marketing success appealing to the layman, I expect nobody would be using it today.
I also agree that a relational approach to in-memory data is a good, effective idea.
I recently compiled some of my C code with the SQLite database, and I'm starting to think about how a SQL model of my code's data could be used as the actual implementation language for in-memory operations.
Instead of writing the hundredth loop through objects I just write a SQL query instead with joining with seeing the internal data representation of the software as an information system instead of bespoke code.
I was hoping to make it possible to handle batches of data and add parallelism because arrays are useful when you want to parallelise.
I was thinking, wouldn't it be good if you could write your SQL queries in advance of the software and then parse them and then compile them to C code (using an unrolled loop of the SQLite VM) so they're performant. (For example, instead of a btree for a regular system operation, you can just use a materialised array a bit like a filesystem so you're not rejoining the same data all the time)
I was thinking of ways of representing actors somehow communicating by tables but I do not have anything concrete for that.
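As a small illustration of replacing a hand-written loop with a query over in-memory data, here is a Python sketch using the standard `sqlite3` module and an in-memory database. The `users` and `orders` tables and the `spend_per_user` function are made-up examples, and this shows only the query side, not the compile-to-C idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")]
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 30), (2, 1, 12), (3, 2, 5)]
)


def spend_per_user(conn):
    """One join plus aggregate instead of nested loops over two collections."""
    rows = conn.execute(
        """
        SELECT u.name, COALESCE(SUM(o.total), 0)
        FROM users AS u
        LEFT JOIN orders AS o ON o.user_id = u.id
        GROUP BY u.id, u.name
        ORDER BY u.name
        """
    )
    return rows.fetchall()


print(spend_per_user(conn))
```

The left join keeps users with no orders, a case that is easy to forget in the hand-written loop version.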
DataDraw is an ultra-fast persistent database for high performance programs written in C. It's so fast that many programs keep all their data in a DataDraw database, even while being manipulated in inner loops of compute intensive applications. Unlike slow SQL databases, DataDraw databases are compiled, and directly link into your C programs. DataDraw databases are resident in memory, making data manipulation even faster than if they were stored in native C data structures (really). Further, they can automatically support infinite undo/redo, greatly simplifying many applications.
For anyone happy enough to consider dealing with the JVM instead of C, and Clojure instead of SQL, I think this CINQ project can deliver on much of what you're looking for here: https://github.com/wotbrew/cinq
> I just write a SQL query instead with joining with seeing the internal data representation of the software as an information system instead of bespoke code
This sounds very similar to how CINQ's macro-based implementation performs relational optimizations on top of regular looking Clojure code (whilst sticking to using a single language for everything).
For semi-dynamic language, Julia definitely took the approach of being a dynamic language that can be (and is) JITed to excellent machine code. I personally have some larger projects that do a lot of staged programming and even runtime compilation of user-provided logic using Julia. Obviously the JIT is slower to complete than running a bit of Lua or whatever, but the speed after that is phenomenal and there’s no overhead when you run the same code a second time. It’s pretty great and I’d love to see more of that ability in other languages!
Some of the other points resonate with me. I think sensible dynamic scoping would be an easy way to do dependency injection. Together with something like linear types you could do capabilities pretty smoothly, I think. No real reason why you couldn’t experiment with some persistent storage as one of these dependencies, either. Together with a good JIT story would make for a good, modular environment.
Oh and Zig is another option for allowing injections that are checked when used at a call site rather than predefined through interfaces.
AFAIK it doesn’t have closures (it’s too C-like) so you need to use methods for all your (implicit) interfaces, but that’s okay…
I think the “exemplars” could be automatically yoinked from documentation and tests and existing usage of the function in the code base. Work needs to be done on the IDE front to make this accessible to the user.
Julia is kind of Dylan's revenge, even if it doesn't take over the whole world, it is already great if it gets its own corner, and from the looks of it that is going alright.
You might be interested in looking at the Lima programming language: http://btetrud.com/Lima/Lima-Documentation.html . It has ideas that cover some of these things. For example, it's intended to operate with fully automatic optimization. This assumption allows shedding lots of complexity that arises from needing to do the same logical thing in multiple ways that differ in their physical efficiency characteristics. Like instead of having 1000 different tree classes, you have 1 and optimisers can then look at your code and decide what available tree structures make most sense in each place. Related to your async functions idea, it does provide some convenient ways of handling these things. While functions are just normal functions, it has a very easy way to make a block of async (using "thread") and provides means of capturing async errors that result from that.
I'm surprised these are called "programming language ideas". They seem to be solvable, at least many of them, with libraries. For example, my Haskell effect system Bluefin can be seen as a capability system for Haskell. My database library Opaleye is basically a relational query language for Haskell. Maybe I'm short-sighted but I haven't seen the need for a whole new language to support any of that functionality. In fact one gets huge benefits from implementing such things in an existing language.
One advantage (which is touched on in the logging section) is that having it provided by the language makes it clear what the default is, and sets expectations. Essentially, lifting it into the language is a way of coordinating the community.
> Smalltalk and another esoteric programming environment I used for a while called Frontier had an idea of a persistent data store environment. Basically, you could set global.x = 1, shut your program down, and start it up again, and it would still be there.
Frontier! I played with that way back when on the Mac. Fun times.
But as for programming language with integrated database... MUMPS! Basically a whole language and environment (and, in the beginning, operating system) built around a built-in global database. Any variable name prefixed with ^ is global and persistent, with a sparse multi-dimensional array structure to be able to organize and access the variables (e.g. ^PEOPLE(45,"firstname") could be "Matthew" for the first name of person ID 45). Lives on today in a commercial implementation from Intersystems, and a couple Free Software implementations (Reference Standard M, GT.M, and the GT.M fork YottaDB). The seamless global storage is really nice, but the language itself is truly awful.
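For readers who have never seen MUMPS globals, here is a rough Python approximation of the idea: a persistent sparse array addressed by subscripts. This is only a sketch under assumptions. The `Global` class and the file path are made up for illustration, it leans on the standard `shelve` module, and it has none of the ordering or locking features of a real M implementation.

```python
import os
import shelve
import tempfile


class Global:
    """Hypothetical stand-in for a MUMPS global such as ^PEOPLE."""

    def __init__(self, path):
        self._path = path

    def __setitem__(self, subscripts, value):
        # shelve keys must be strings, so the subscript tuple is stored as its repr.
        with shelve.open(self._path) as db:
            db[repr(subscripts)] = value

    def __getitem__(self, subscripts):
        with shelve.open(self._path) as db:
            return db[repr(subscripts)]


path = os.path.join(tempfile.mkdtemp(), "people")

people = Global(path)
people[45, "firstname"] = "Matthew"  # roughly: SET ^PEOPLE(45,"firstname")="Matthew"

# A fresh object (think: a later run of the program) still sees the value.
people_again = Global(path)
first_name = people_again[45, "firstname"]
```

The point is that there is no explicit save or load step in the calling code; assignment is persistence.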
TADS, an OOP language + VM for interactive fiction, has this "value database" model. Once loaded into memory, the compiled image can be updated with values stored in a separate save file. The compiled image itself could store updated values as well.
In fact, it does this during a "preinit" stage that runs immediately after compilation. Once all preinit code finishes executing, the compiled image is overwritten with the updated state. The language includes a "transient" keyword to permit creating objects that should not be stored.
This same mechanism permits in-memory snapshots, which are used for the game's UNDO feature. No need to rewind or memento-ize operations, just return to a previous state.
It's not a general-purpose mechanism. After all, the language is for building games with multiple player-chosen save files, and to permit restarting the game from a known Turn 0 state.
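The snapshot-based UNDO described above can be sketched in a few lines of Python. This is not TADS code and says nothing about how the TADS VM does it internally; the `Game` class and its fields are invented for illustration, and `copy.deepcopy` stands in for the VM's in-memory snapshot.

```python
import copy


class Game:
    """Hypothetical game whose whole state lives in one dictionary."""

    def __init__(self):
        self.state = {"turn": 0, "room": "cellar", "inventory": []}
        self._snapshots = []

    def do_turn(self, room, item=None):
        # Save a full copy of the state before changing anything.
        self._snapshots.append(copy.deepcopy(self.state))
        self.state["turn"] += 1
        self.state["room"] = room
        if item is not None:
            self.state["inventory"].append(item)

    def undo(self):
        # No per-command "reverse" logic: just restore the previous snapshot.
        if self._snapshots:
            self.state = self._snapshots.pop()


game = Game()
game.do_turn("kitchen", item="lamp")
game.do_turn("garden")
game.undo()
```

After the `undo()` the game is back in the kitchen on turn 1 with the lamp, without any command having to know how to reverse itself.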
The MUMPS database is wild. When I was working in MUMPS, it was so easy and fun to whip up an internal tool to share with my coworkers. You don't have to give any special thought at all to persistence, so you're able to stay in the flow of thinking about your business logic.
But as you said, the language itself is almost unbearable to use.
Image persistence was one of the cool ideas of Smalltalk. And in practice, one of the biggest drawbacks. Cruft and old values accumulated steadily, with very little way to find and eliminate them. Transient execution has some cons. But on the pro side, every run starts from a "clean slate."
Save image... for short-term convenience; build clean every week from archived text files.
----
1984 "Smalltalk-80 The Interactive Programming Environment" page 500
"At the outset of a project involving two or more programmers: Do assign a member of the team to be the version manager. … The responsibilities of the version manager consist of collecting and cataloging code files submitted by all members of the team, periodically building a new system image incorporating all submitted code files, and releasing the image for use by the team. The version manager stores the current release and all code files for that release in a central place, allowing team members read access, and disallowing write access for anyone except the version manager."
I believe it's just a git repo behind the scenes. Not sure if the UI exposes those things as I never used that in multi-developer scenarios!
Give it a go and see.
This may fall in the "you think you do, but you don't" category, but I've always wanted a Smalltalk (or similar, not that picky) with a persistent virtual memory.
That is, the VM is mapped to a backing file, changes persisted automatically, no "saving", limited by drive space (which, nowadays, is a lot). But nowadays we also have vast memory space to act as a page cache and working memory.
My contrived fantasy use case was having a simple array named "mail", which is an array containing all of my email messages (as email objects, of course). Naturally, as you get more mail, the array gets longer. Also, as you delete mail, the array shifts. It's roughly no different from the classic mbox format, except that it's not just text, it's objects.
You can see that if you delete an email from a large (several GBs) array, there would be a lot of churn. That implies maybe it's not a great idea to use that data structure, but that's not the point. You CAN use that data structure if you like (just like you can use mbox if you like).
Were it to be indexed, that would be done with parallel data structures (trees or hashes or whatever).
But this is all done automagically. Just tweaks to pages in working memory backed by the disk using the virtual memory manager. Lots and lots of potential swapping. C'est la vie, no different from anything else. This is what happens when you map 4TB into a 16GB work space.
The problem with such a system is how fragile it potentially is. Corrupt something and it happily persists that corruption, wrecking the system. You can't reboot to fix it.
Smalltalk suffers from that today. Corrupt the image (oops, did I delete the Object become: method again?), and it's gone for good. This is mitigated by having backup images, and the changelist to try to bring you back to the brink but no further.
I'm guessing a way to do that in this system is to use a copy-on-write facility. Essentially, snapshot the persistent store on each boot (or whatever), and present a list of previous snapshots at start up.
Given the structure of a ST VM you'd like to think this is not that dreadful to work up. I'd like to think a paper napkin implementation PoC would be possible, just to see what it's like. One of those things where the performance isn't really that great, but modern systems are so fast that we don't really notice it in human terms.
> oops, did I delete the Object become: method again?), and its gone for good.
And then you admit actually it's not gone for good because if you created the method that will be recorded in the changes.log file and if it was a provided method that will still be in the provided sources file.
Have you looked at Pharo? Their git integration makes it relatively easy to export and backup parts of your main image, and to pull the things back into a fresher one once you mess up.
Interesting that E is cited under “capabilities”, but not under “loosen up the functions”. E’s eventual-send RPC model is interesting in a number of ways. If the receiver is local then it works a bit like a JavaScript callback in that there’s an event loop driving execution; if it’s remote then E has a clever “promise pipelining” mechanism that can hide latency. However E didn’t do anything memorable (to me at least!) about handling failure, which was the main point of that heading.
For “capabilities” and “A Language To Encourage Modular Monoliths”, I like the idea of a capability-secure module system. Something like ML’s signatures and functors, but modules can’t import, they only get access to the arguments passed into a functor. Everything is dependency injection. The build system determines which modules are compiled with which dependencies (which functors are passed which arguments).
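As a rough illustration of "everything is dependency injection", here is a Python sketch where each "module" is a function that only sees what it is handed. Python cannot actually forbid imports, so this shows the shape of the idea rather than the security guarantee; `make_logger`, `make_greeter` and the wiring at the bottom are all hypothetical names.

```python
from types import SimpleNamespace


def make_logger(write):
    """A 'functor': builds a logger module from the one capability it is given."""
    def info(message):
        write("INFO " + message)
    return SimpleNamespace(info=info)


def make_greeter(logger):
    """Gets a logger and nothing else; it never reaches for a file or network API."""
    def greet(name):
        logger.info("greeting " + name)
        return "Hello, " + name
    return SimpleNamespace(greet=greet)


# The "build system": the only place that decides who receives what.
lines = []
logger = make_logger(lines.append)
greeter = make_greeter(logger)
result = greeter.greet("Ada")
```

Swapping `lines.append` for a function that writes to disk changes what the whole program is able to do, and that decision lives in exactly one place.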
An existing “semi-dynamic language” is CLOS, the Common Lisp object system. Its metaobject protocol is designed so that there are clear points when defining or altering parts of the object system (classes, methods, etc.) at which the result is compiled, so you know when you pay for being dynamic. It’s an interesting pre-Self design that doesn’t rely on JITs.
WRT “value database”, a friend of mine used to work for a company that had a Lisp-ish image-based geospatial language. They were trying to modernise its foundations by porting to the JVM. He had horror stories about their language’s golden image having primitives whose implementation didn’t correspond to the source, because of decades of mutate-in-place development.
The most common example of the “value database” or image-based style of development is in fact your bog standard SQL database: DDL and stored procedures are very much mutate-in-place development. We avoid the downsides by carefully managing migrations, and most people prefer not to put lots of cleverness into the database. The impedance mismatch between database development by mutate-in-place and non-database development by rebuild and restart is a horribly longstanding problem.
As for “a truly relational language”, at least part of what they want is R style data frames.
- I like the idea of a multiparadigm programming language (many exist) but where you can write part of the code in a different language, not trying to embed everything in the same syntax. I think in this way you can write code and express your ideas differently.
- A [social] programming language where some variables and workflows are shared between users [1][2].
- A superreflective programming language inspired by Python, Ruby, and others where you can override practically everything to behave differently. For example, in Python you can override a function call for an object but not for the base system; the globals() dict cannot be overridden. See [3]. In this way you save a lot of time writing a parser and the language's basic logic.
- A declarative language to stop reinventing the wheel: "I need a website with a secure login/logout/forgot_your_password_etc, choose a random() template". It doesn't need to be in natural language though.
Egont sounds a bit like SQL, no? A social way to share data and work with it ... a shared RDBMS where everyone has a user account and can create tables/share them with other users, built in security, etc. Splat a GUI on top and you have something similar.
Modern web frameworks are getting pretty declarative. If you want a basic web app with a log in/out page that's not hard to do. I'm more familiar with Micronaut than Spring but you'd just add:
micronaut.security.authentication=cookie
and the relevant dependencies. Now you write a class that checks the username/password, or use LDAP, or configure OAuth and the /login URL takes a POST of username/password. Write a bit of HTML that looks good for your website and you're done.
> Egont sounds a bit like SQL, no? A social way to share data and work with it ... a shared RDBMS where everyone has a user account and can create tables/share them with other users, built in security, etc. Splat a GUI on top and you have something similar.
Yes, SQL or a global spreadsheet. I would say that it is like SQL plus a DAG or, we can imagine an aggregation of SQLs. The interesting thing is that parts of the global system are only recalculated if there is a change, like in a spreadsheet.
> a shared RDBMS where everyone has a user account and can create tables/share them with other users, built in security, etc. Splat a GUI on top and you have something similar.
We need a little bit more but not much more: security by namespaces and/or rows, so the same database is shared but you can restrict who changes what: your "rows" are yours. I think something like OrbitDB but with namespaces would be cool.
> Modern web frameworks are getting pretty declarative.
Yes but my proposal was at a higher level. I don't want to know what a cookie is when I just want to create a website. I am not saying that you can create complex components with this idea but you can create common use cases.
I love these ideas! I've been thinking about the "fully relational" language ever since I worked with some product folks and marketers at my start up 15 years ago who "couldn't code" but were wizards at cooking up SQL queries to answer questions about what was going on with our users and product. There was a language written in Rust, Tablam[0], that I followed for a while, which seemed to espouse those ideas, but it seems like it's not being worked on anymore. And Jamie from Scattered Thoughts[1] has posted some interesting articles in that direction as well. He used to work on the old YC-company/product LightTable or Eve or something, which was in the same space.
I've also always thought Joe Armstrong's (RIP) thought of "why do we need modules" is really interesting, too. There's a language I've seen posted on HN here a couple times that seems to go in that approach, with functions named by their normalized hash contents, and referred to anywhere by that, but I can't seem to remember what it's called right now. Something like "Universe" I think?
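To make "functions named by their normalized hash contents" a bit more concrete, here is a small Python sketch. It is an assumption-laden toy, not how any particular language does it: it normalizes only layout, comments and the function's own name (by hashing the parsed AST), and it does not normalize local variable names.

```python
import ast
import hashlib


def content_hash(source):
    """Hash a function by its parsed structure rather than its text."""
    tree = ast.parse(source)
    func = tree.body[0]
    func.name = "_"  # ignore the function's own name
    normalized = ast.dump(func)  # comments and layout do not appear in the AST
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


a = content_hash("def double(x):\n    return x * 2\n")
b = content_hash("def twice(x):  # same body, other name\n        return x * 2\n")
c = content_hash("def triple(x):\n    return x * 3\n")
```

Here `a` and `b` come out equal while `c` differs, so the name becomes a label attached to the hash rather than the identity of the code.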
> with functions named by their normalized hash contents, and referred to anywhere by that, but I can't seem to remember what it's called right now. Something like "Universe" I think?
The issue with the while is that more often than not you need to do some preparations before the condition. So you need to move that to a function, or duplicate it before and inside the loop. Do-while doesn't help, since with that you can't do anything after the condition.
The alternative is a while(true) with a condition in the middle.
while(true){
    prepare;
    if(!check) break;
    process
}
But what if there was a language construct for this? Something like
do{prepare}while(condition){process}
Is there a language that implements this somehow? (I'm sure there is, but I don't know of one.)
The best thing is that this construct can be optimized in assembly perfectly:
...
jump-always > start
after:
process
start:
prepare
condition
branch-if-true > after
...
(prog1 foo bar*)
Evaluates foo, then bar(s), and returns the result of evaluating foo and discards the results of bar(s).
Useful if `foo` is the condition and you need to perform some change to it immediately after, eg:
(while (prog1 (< next prev) (setq prev next)) ...)
---
(prog2 foo bar baz*)
Evaluates foo, then bar, then baz(s) (if present), returns the result of evaluating bar and discards the results of evaluating foo and baz(s).
Might be what GP wants. `foo` is the preparation, `bar` is the condition, and `baz` can be some post-condition mutation on the compared value. Not too dissimilar to
for (pre, cond, post) {}
With `prog2` you could achieve similar behavior with no built in `for`:
(while (prog2 pre cond post) ...)
---
(progn foo*)
Evaluate each foo in order, return the result of evaluating the last element of foo and discard all the others.
`progn` is similar to repeated uses of the comma operator in C, which GP has possibly overlooked as one solution.
Learning to embrace the LOOP construct has been an experience, for me. Same with the FORMAT abilities. It is amazing how much hate they both get, for how capable they both are.
C-style for-loop is kinda sorta this. Although the "prepare" part has to be an expression rather than a statement, given that you have the comma operator and ?: you can do a lot there even in C. In C++, you can always stick a statement in expression context by using a lambda. So:
for ([]{
    /*prepare*/
}(); /*condition*/;) {
    /*body*/
}
However, the most interesting take on loops that I've seen is in Sather, where they are implemented on top of what are, essentially, coroutines, with some special facilities that make it possible to exactly replicate the semantics of the usual `while`, `break` etc in this way: https://www.gnu.org/software/sather/docs-1.2/tutorial/iterat...
In some way it's the dual of break, in that you want to jump into the middle of the loop, while break is to jump out of it.
Let's rewrite the loop this way, with 'break' expanded to 'goto':
while (true) {
    prepare...
    if (!cond) goto exitpoint;
    process...
}
exitpoint:
The dual would be:
goto entrypoint;
do {
    process...
entrypoint:
    prepare...
} while(cond);
Both constructs need two points: where the jump begins and where it lands. The 'break' is syntactic sugar that removes the need to specify the label 'exitpoint'. In fact with 'break' the starting point is explicit, it's where the 'break' is, and the landing point is implicit, after the closing '}'.
If we want to add the same kind of syntactic sugar for the jump-in case, the landing point must be explicit (no way for the compiler to guess it), so the only one we can make implicit is the starting point, that is where the 'do' is.
So we need: a new statement, let's call it 'entry', that is the dual of 'break' and a new semantic of 'do' to not start the loop at the opening '{' but at 'entry'.
do {
    process...
    entry;
    prepare...
} while (cond);
Is it more readable than today's syntax? I don't know...
do {
    Get(Current_Character);
    if (Current_Character == '*') break;
    print(Current_Character);
} while (true);
I don't see why this needs a new construct in languages that don't already have it. It's just syntactic sugar that doesn't actually save any work. The one with the specialized construct isn't really any shorter and looks pretty much the same. Both have exactly one line in the middle denoting the split. And both lines look really similar anyway.
Well, I am in the process of making a language where general loops will look like
loop
    prepare;
while check;
    process;
end;
I also think you'd enjoy Knuth's article "Structured Programming with go to Statements" [0]. It's the article that gave us the "premature optimization is the root of all evil" quote, but that's probably the least interesting part of it. Go read it, it has several sections that discuss looping constructs and possible ways to express them.
do {
    let value = prepare();
} while (value.is_valid) {
    process(value);
}
Can the second block of the do-while see `value` in its lexical scope? If yes, you have this weird double brace scope thing. And if no, most non-trivial uses will be forced to fall back to `if (...) break;` anyway, and that's already clear enough imo.
The scope should be unique, yes. In your example value should be visible.
You are right about the weird double braces, but I can't think of an alternative syntax other than just removing the braces around the while. But in that case it may seem odd to have a keyword that can only be used inside a specific block... which is basically a macro for an if(.)break;
Maybe I'm too used to the c/java syntax, maybe with a different way of defining blocks?
That seems more like a programmer expectations issue than something fundamental. Essentially, you have "do (call some function that returns a chunk of state) while (predicate that evaluates the state) ..."
Hard to express without primitives to indicate that, maybe.
Yeah, `while...else` in Python does the wrong thing. Executes `else` block when the loop finished normally (not through `break`).
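For anyone who hasn't run into it, this is the Python behaviour being complained about. The `find` helper is just an illustrative name; the `while ... else` semantics are standard Python.

```python
def find(items, target):
    """Return a description of which path the loop took."""
    i = 0
    while i < len(items):
        if items[i] == target:
            outcome = "found, left via break"
            break
        i += 1
    else:
        # Runs only when the condition became false without a break,
        # including the case where the body never ran at all.
        outcome = "not found, else block ran"
    return outcome


hit = find([1, 2, 3], 2)
miss = find([1, 2, 3], 9)
empty = find([], 9)
```

Note that `miss` and `empty` take the same path, so the `else` cannot tell "the loop ran and finished" apart from "the loop never ran".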
Scala for example has a `breakable {}` block that lets you indicate where you should land after a `break`
import scala.util.control.Breaks._

breakable {
  while (condition) {
    // loop body
    if (otherCondition) break()
    // rest of the body
  }
  // body of pythonic else block
} // you land here if you break
However I have no idea how to implement the kind of `else` I described in any language without checking the condition twice.
You mean like a shell's while-do-done? It's just about allowing statements as the conditions, rather than just a single expression. Here's an example from a repl I wrote:
repl_prompt="${repl_prompt-repl$ }"
while
  printf "%s" "$repl_prompt"
  read -r line
do
  eval "$line"
done
echo
The `printf` is your `prepare`.
This should also be doable in languages where statements are expressions, like Ruby, Lisp, etc.
Here's a similar Ruby repl:
while (
  print "repl> "
  line = gets
) do
  result = eval line
  puts result.inspect
end
puts
Exactly, here you are basically keeping it as a while with a condition but allowing it to be any code that at the end returns a boolean, although you need to make sure that variables defined in that block can be used in the do part.
Sidenote: I wasn't aware that shell allows for multiple lines, good to know!
PowerShell can process 0..n input objects from the pipeline using BEGIN {...} PROCESS {...} END {...} blocks.
I find this so incredibly useful, that I miss it from other languages.
Something related that I've noticed with OO languages such as Java is that they tend to result in "ceremony" getting repeated n times for processing n objects. A well-designed begin-process-end syntax for function calls over iterables would be amazing. This could apply to DB connection creation, security access checks, logging, etc...
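A library-level approximation of begin/process/end over an iterable might look like this in Python. It is a sketch, not PowerShell semantics: `pipeline` and the three callbacks are hypothetical names, and the "connection" is just a dictionary standing in for a shared resource.

```python
def pipeline(items, begin, process, end):
    """Run begin once, process per item, end once (even on failure)."""
    context = begin()
    try:
        return [process(context, item) for item in items]
    finally:
        end(context)


events = []


def begin():
    events.append("open connection")
    return {"prefix": "user:"}  # stands in for a shared resource


def process(context, item):
    return context["prefix"] + item


def end(context):
    events.append("close connection")


names = pipeline(["ann", "bob"], begin, process, end)
```

The ceremony (open, close) happens once regardless of how many items flow through, which is the part that tends to get repeated per object otherwise.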
This runs all 3 in order every iteration but quits if condition evaluates to false. It just uses the fact that value of a block is the value of the last expression in the block.
Scala has a lot of syntax goodies although some stuff is exotic. For example to have a 'break' you need to import it and indicate where from exactly you want to break out of.
I don't write a lot of while loops so this is just a bit unfamiliar to me, but I'm not really understanding how this isn't the same as `do{block}while(condition);`? Could you give a simple example of what kind of work `prepare` is doing?
Think of a producer (a method that returns data each time you request one, like reading a file line by line or extracting the top of a queue for example) that you need to parse and process until you find a special element that means "stop".
I'm aware this example can be trivially replaced with a while(data=parse(producer.get())){process(data)} but you are forced to have a method, and if you need both the raw and parsed data at the same time, either you mix them into a wrapper or you need to somehow transfer two variables at the same time from the parse>condition>process
A do-while here also has the same issue, but in this case after you check the condition you can't do any processing afterwards (unless you move the check and process into a single check_and_process method...which you can totally do but again the idea is to not require it)
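Spelled out in Python, the producer case looks like the sketch below, using the plain `while True` plus `break` form discussed earlier in the thread. `parse`, `run` and the "STOP" marker are made up for the example; the point is only that both the raw and the parsed value stay visible to the processing step without a wrapper.

```python
def parse(raw):
    """Hypothetical parser: turns a raw line into an int, or None for 'stop'."""
    return None if raw == "STOP" else int(raw)


def run(lines):
    producer = iter(lines)
    processed = []
    while True:
        # prepare: several statements, all in the loop's scope
        raw = next(producer, "STOP")  # running out of input also means stop
        parsed = parse(raw)
        # condition in the middle
        if parsed is None:
            break
        # process: sees both the raw and the parsed value
        processed.append((raw, parsed * 2))
    return processed


result = run(["1", "20", "STOP", "300"])
```

A dedicated `do {prepare} while (cond) {process}` construct would express the same three phases; what it would add is making the split point part of the loop's syntax instead of an `if ... break` in the body.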
I am interested in the "No Hidden I/O" section. Can you store, say, the system value (?) to a list? Because I'm afraid this is not enough to properly track side-effects.
Unfortunately I think capabilities/an effect system is required for that (so that any point that actually uses a side effect taints its whole call tree, the same way async/checked exceptions do, both of which are specific Effects).
Yes, you can store the system value (or indeed any other object capability) in a list.
If a function takes in a List[NodeSystem] parameter, then the function can perform side effects through the methods on NodeSystem.
However, if a function takes in a List[T] where T is a type parameter, then that is not enough to perform side effects, even if that function is later called with a List[NodeSystem] argument. There is no way for the function to access the methods of the concrete type of T.
Hence, you can tell whether or not a function may have side effects from the function signature alone.
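A loose Python analogue of that argument, using type hints, is below. Big caveat: Python does not enforce any of this at runtime, so this only shows what the signatures communicate; the claim about a checker rejecting the commented-out call is an expectation about tools like mypy, not something the code itself proves. `NodeSystem.write_file` is an invented method.

```python
from typing import List, TypeVar


class NodeSystem:
    """Hypothetical capability object; the name comes from the comment above."""

    def __init__(self):
        self.log = []

    def write_file(self, path, text):
        self.log.append((path, text))  # stands in for a real side effect


T = TypeVar("T")


def save_all(systems: List[NodeSystem]) -> None:
    # The signature names NodeSystem, so side effects are possible here.
    for system in systems:
        system.write_file("out.txt", "hello")


def count(items: List[T]) -> int:
    # The signature only knows T. A checker such as mypy is expected to reject
    # items[0].write_file(...) here, because T has no known methods.
    return len(items)


system = NodeSystem()
n = count([system])  # no effect, even though the elements are NodeSystems
save_all([system])   # the effect happens here
```

In a language that enforces this, reading the two signatures is enough to know which function can touch the outside world.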
I think the coloured function problem boils down to the fact that async functions are not naturally a specific kind of sync function, but the other way around.
Functions are so ubiquitous we forget what they really are: a type of guarantee about the conditions under which the code within will run. Those guarantees include the availability of arguments and a place to put the return value (on the stack).
One of the key guarantees about sync functions is the call structure: one thread of execution will be in one function and one function only at any point during the program; the function will only be exited on return (or exception, or panic) or call of another function; and all the local data will be available only for the duration of that function call.
From that perspective, async functions are a _weakening_ of the procedural paradigm where it is possible to "leave behind" an instruction pointer and stack frame to be picked up again later. The ability to suspend execution isn't an additional feature, it's a missing guarantee: a generalisation.
There is always an interplay between expressiveness and guarantees in programming languages. Sometimes, it is worth removing a guarantee to create greater expressiveness. This is just an example of that.
I mentioned exceptions earlier — it's no wonder that exceptions and async both get naturally modelled in the same way (be it with monads or algebraic effects or whatever). They are both examples of weakening of procedural guarantees. Exceptions weaken the guarantee that control flow won't exit a function until it returns.
I think the practical ramifications of this are that languages that want async should be thinking about synchronous functions as a special case of suspendable functions — specifically the ones that don't suspend.
As a counterpoint, I can imagine a lot of implementation complexities. Hardware is geared towards the classical procedural paradigm, which provides an implementation foundation for synchronous procedures. The lack of that for async can partially explain why language authors often don't provide a single async runtime, but have this filled in by libraries (I'm thinking of Rust and Kotlin here).
The meaningful distinction seems to me to be the temporal guarantees of execution, not the call structure. Imagine a sequence of async functions that are sleep sorted to execute 1 day apart. A sufficiently smart compiler could compile those to the sync equivalent because it can see the ordering. Similarly, imagine an async runtime that just calls everything synchronously. I've BS'd an interview with that one before.
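The "async runtime that just calls everything synchronously" is small enough to write down in Python, and it also illustrates the parent's point that a synchronous function is a suspendable one that happens never to suspend. `run_without_suspending` is a made-up helper; it relies only on the standard coroutine protocol (`send`, `StopIteration.value`, `close`).

```python
def run_without_suspending(coro):
    """Drive a coroutine like a plain call; fail if it ever suspends."""
    try:
        coro.send(None)
    except StopIteration as finished:
        return finished.value
    coro.close()
    raise RuntimeError("coroutine tried to suspend")


async def add(a, b):
    return a + b


async def add_three(a, b, c):
    # Awaiting a coroutine that never suspends does not suspend either.
    return await add(await add(a, b), c)


total = run_without_suspending(add_three(1, 2, 3))
```

Anything that genuinely needs to wait (a timer, a socket) would hit the `RuntimeError` branch, which is exactly where the missing guarantee shows up.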
The "sync guarantees" don't really exist either. If you have a(); b(); the compiler may very well reorder them to b(); a(); and give you similar issues as async. It may elide a() entirely (and reclaim the call structures), or the effects of a() might not be visible to b() yet. Synchronous functions also can and do suspend with all the associated issues of async. That comes up frequently in cryptography, kernel, and real time code.
My comment was really about language semantics. Compilers should respect the semantics and, for instance, only re-order a(); and b(); if there is no data dependency between them and therefore no consequence to exchanging them. But that's in theory: all abstractions leak.
You can, I believe, emulate async with call/cc (call-with-current-continuation), but I'm not aware of work in that area, and not a lot of non-Lisp languages support this kind of continuation.
I really wish more languages would "steal" grammars from Raku (formerly Perl 6).
A grammar is basically a class (or role/trait), and it can contain regexes (and regular methods). Those regexes have backtracking control (for simple tokens, you don't want to try to parse a string any other way than the first, obvious match).
This makes it much easier to write composable and understandable parsers.
I know that, technically, you could do that in a library, but somehow that's never the same; if it's not baked into the language, the hurdle to introduce another dependency is always looming, and then if there's more than one such library, the parsers aren't composable across libraries and so on.
We have built something that hits on points 1, 3, 5, and 7 at https://reboot.dev/ ... but in a multi-language framework (supporting Python and TypeScript to start).
The end result is something that looks a lot like distributed, persistent, transactional memory. Rather than explicit interactions with a database, local variable writes to your state are transactionally persisted if a method call succeeds, even across process/machine boundaries. And that benefits point 7, because transactional method calls compose across team/application boundaries.
[1] Loosen Up The Functions
[3] Production-Level Releases
[5] Value Database
[7] A Language To Encourage Modular Monoliths
They are related, for sure. But one of the biggest differences is that operations affecting multiple Reboot states are transactional, unlike Azure's "entity functions".
Because multiple Azure entity functions are not updated transactionally, you are essentially always implementing the saga pattern: you have to worry about cleaning up after yourself in case of failure.
In Reboot, transactional function calls automatically roll back all state changes if they fail, without any extra boilerplate code. Our hypothesis is that that enables a large portion of an application to skip worrying about failure entirely.
Code that has side-effects impacting the outside world can be isolated using our workflow mechanism (effectively durable execution); workflows can themselves be encapsulated inside of libraries and composed. But we don't think that that is the default mode that developers should be operating in.
> Code that has side-effects impacting the outside world can be isolated using our workflow mechanism (effectively durable execution
Sounds very interesting!
I have been thinking about something like this for a new PL, and many kinds of side effect can actually be reversed, as if it never happened.
I have also read that exceptions can complicate control flow, disallowing some optimizations - but if they are transactional, then we can just add their reverse to the supposedly already slow error path, and enjoy our performance boost!
Every method in Reboot has a type: reader, writer, transaction, or workflow. Our retry semantics are such that any method can always be retried from the top, but for different reasons:
In readers, no state changes are possible. And in writers and transactions, retry/abort is always safe because no state changes occur until the method completes successfully.
In workflows, retry is always safe, and is in fact required due to the primitives we use to implement durable execution (we will publish more docs on this soon!). The workflow retries durably until it eventually completes, one way or another.
That means that a workflow is always the right spot to execute an external side effect: if a reader/writer/transaction want to execute a side effect, they do so by spawning a task, which is only actually spawned if the method completes successfully. And we do "effect validation" (effectively: running your method twice!) to make it very hard to write a side effect in the wrong place.
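To make the "no state changes until the method completes, side effects only spawned on success" idea concrete, here is a toy Python sketch. It is emphatically not Reboot's API or implementation; `run_writer`, the task list and the example methods are all invented, and a deep copy stands in for whatever real transactional machinery is used.

```python
import copy


def run_writer(state, method):
    """Run method on a copy; commit state and tasks only if it succeeds."""
    draft = copy.deepcopy(state)
    tasks = []  # side effects are queued here, not performed
    try:
        method(draft, tasks)
    except Exception:
        return state, []  # abort: caller keeps the old state, no tasks
    return draft, tasks   # commit


def deposit(draft, tasks):
    draft["balance"] += 10
    tasks.append("send receipt email")


def bad_withdraw(draft, tasks):
    draft["balance"] -= 500
    tasks.append("send receipt email")
    if draft["balance"] < 0:
        raise ValueError("insufficient funds")


state = {"balance": 100}
state, tasks_ok = run_writer(state, deposit)
state, tasks_failed = run_writer(state, bad_withdraw)
```

Because the failed call leaves neither a state change nor a queued task behind, retrying it from the top is always safe, and the method body needs no cleanup code.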
> I have also read that exceptions can complicate control flow, disallowing some optimizations - but if they are transactional, then we can just add their reverse to the supposedly already slow error path, and enjoy our performance boost!
Somewhat...! When you write a transaction method in Reboot, code that fails with an exception cannot have had a side effect on the outside world, and all state changes will vanish if the transaction aborts. So there is never any need to clean something up, unless you are using exceptions to implement control flow.
For relational, look into term-rewriting systems which just keep transforming specified relationships into other things. Maude’s rewriting logic and engine could probably be used for relational programming. It’s fast, too.
My wild idea is that I'd like to see a modern "high-level assembler" language that doesn't have a callstack. Just like in the olden days, all functions statically allocate enough space for their locals. Then, combine this with some semi-convenient facility for making sure that local variables for a given function always fit into registers; yes, I admit that I'm strange when I say that I dream of a language that forces me to do manual register allocation. :P But mostly what I want to explore is if it's possible to create a ""modern"" structured programming language that maps cleanly to assembly, and that provides no optimization backend at all, but has enough mechanical sympathy that it still winds up fast enough to be usable.
A useful purpose for such a thing is in certain embedded, hard-real-time, or mission-critical scenarios.
Many such programming environments need strict control over stack sizes to avoid any possibility of stack overflow.
I had a similar notion a few years back, thinking about a somewhat wider range of "scoped guarantees". The compiler would compute things such as the maximum stack usage of a function, and this would "roll up" to call sites automatically. This could also be used to enforce non-usage of certain dangerous features such as locks, global flags, or whatever.
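The "roll up" of maximum stack usage is easy to sketch if you assume the compiler already knows each function's frame size and the static call graph. The Python below is a toy under those assumptions; the frame sizes and function names are invented, and it simply refuses recursion since that makes the bound unknown.

```python
def max_stack(function, frames, calls, _active=()):
    """Worst-case stack bytes for `function`, given a static call graph."""
    if function in _active:
        raise ValueError("recursion makes the bound unknown: " + function)
    deepest_callee = max(
        (max_stack(callee, frames, calls, _active + (function,))
         for callee in calls.get(function, [])),
        default=0,
    )
    return frames[function] + deepest_callee


# Hypothetical per-function frame sizes (bytes) and call graph.
frames = {"main": 32, "parse": 64, "emit": 16, "log": 48}
calls = {"main": ["parse", "emit"], "parse": ["log"], "emit": ["log"]}

worst = max_stack("main", frames, calls)
```

Here the worst path is main, parse, log, so `worst` is 32 + 64 + 48. The same traversal could carry other "scoped guarantees", such as "nothing reachable from here takes a lock".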
> all functions statically allocate enough space for their locals.
Would you still have distinct activation records per call or forfeit the ability to have reentrant functions and recursion?
That's one of the main reasons to move to dynamic (as in a call stack) allocation of your activation records versus a single static allocation per function.
In this hypothetical language I'm assuming that recursion is unsupported and that if threading is supported at all, then each thread has its own copy of every function's locals (or at least every function that can be called concurrently; structured concurrency might be leveraged to prove that some functions don't need to be reentrant, or maybe you just chuck a mutex in each function prologue and YOLO). However, while enforcing that basic recursion is forbidden isn't too difficult (you make the language statically-typed, all names lexically-scoped, and don't support forward declarations), it does probably(?) mean that you also lose first-class functions and function pointers, although I haven't thought deeply about that.
I think lambdas or function pointers can be possible if they assume all of the scope of where they are called rather than where they are declared; that would prevent them from allowing recursion through forward declarations.
It would be awkward to work with since you'd have to be aware of all of the eventual caller scopes rather than your current local scope when defining it.
I suppose it would be like macros instead of true functions.
Have you thought about what happens if you want to read and parse a file? Do you declare the maximum filesize you want to support and statically allocate that much memory?
I'm not intending to imply that the language I'm describing can't support heap-allocated memory; Rust shows us that it's even possible to do so without having to manually deallocate, if you're okay with a single-ownership discipline (which is a rather simple analysis to implement, as long as you don't also want a borrow checker along for the ride). Instead, this is about trying to make a language that makes it easy to keep locals in registers/cache, rather than relying on the compiler backend to do register allocation and hoping that your CPU can handle all that cache you're thrashing.
No, you have a scoped pointer to dynamically allocated memory; when the scoped pointer is destroyed/cleaned up/released at the end of the function, it releases the allocated memory.
> modern "high-level assembler" language that doesn't have a callstack
PIC16 has a hardware stack of only a few return addresses, and therefore imposes the "all register allocation is static" + "no recursion" that you're asking for.
Why would you like to have this language? Is it about control over the execution? About better ways to personally optimize? Or just intellectual pleasure? Or is it about reliving the olden days of assembly language programming but with modern conveniences?
I would simply find pleasure in being able to understand basically every level of the stack. For a RISC architecture, it's not too hard to get a grasp on how it works. Likewise for a simple-enough programming language. The problem(?) is that in between these two is an opaque black box--the optimization backend, which I feel I have no hope of understanding. So instead I wonder if it's possible to have a "safe" (safer than C) and "high-level" (more abstractive than C) language that is still useful and semi-performant, and I'm wondering how much ergonomics would need to be sacrificed to get there. It's a thought experiment.
Starlark, a variant of Python, can be thought of as semi dynamic: all mutation in each file happens once, single threaded, and then that file and all its data structures are frozen so downstream files can use it in parallel
A lot of "staged" programs can be thought of as semi dynamic as well, even things like C++ template expansion or Zig comptime: run some logic up front, freeze it, then run the rest of the application later
Google’s build system uses Starlark definition files for this reason. Very easy to write flexible configurations for each project and module but building is of course very parallel.
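The "mutate once, then freeze" pattern can be approximated in Python, though only as an analogy: Starlark freezes values deeply and automatically, while this sketch freezes one level by hand. `load_config` and its keys are made up for illustration.

```python
from types import MappingProxyType


def load_config():
    """Initialization phase: ordinary mutable Python, run once."""
    config = {"targets": ["app", "lib"], "opt_level": 2}
    config["targets"].append("tests")
    # Freeze phase: lists become tuples, the dict becomes a read-only view.
    frozen = {k: tuple(v) if isinstance(v, list) else v for k, v in config.items()}
    return MappingProxyType(frozen)


FROZEN = load_config()
# Readers can now share FROZEN freely; FROZEN["opt_level"] = 3 raises TypeError.
```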
I've added this feature to QuickJS [1] and it works quite well in Sciter as persistent Storage [2] mechanism, used in Sciter.Notes [3] for example.
let storage = Storage.open(filename);
let persistentData = storage.root;
if( !persistentData ) storage.root = persistentData = {... initial storage structure ...};
Everything written to persistentData will be persistent between runs.
= Semi-Dynamic Language
De facto, we have already been using a similar approach for quite a while, in the form of GPU shaders or WebAssembly. The solution is not to make the scripting language JIT-friendly (that is against its nature) but to offer the option of native/compiled/loadable modules written in languages that were designed to be compilable from the beginning.
My Sciter, as an embeddable engine, is an example of such an environment. The native host application exposes native functions/classes to the HTML/CSS/JS engine that implements the UI layer of the application. UI is dynamic (fluid, styleable, etc.) by nature, while the application backend (a.k.a. business logic layer) is more static and linear by nature.
I wanted to comment on that persistent data point that on Apple platforms, you’ll have NSUserDefaults (objc) and UserDefaults.myVar (Swift). It works just like the author wishes. Other than single thread access, there’s no major problems with it. If you design your program in a way that it falls apart if the use of it is not regulated, that’s on you.
> on Apple platforms, you’ll have NSUserDefaults (objc) and UserDefaults.myVar (Swift). It works just like the author wishes.
You could write something similar to NSUserDefaults in any language for any computing environment, but that's not what the author is talking about. He is talking about the same basic concept, yes, but where it exists within the language itself, not something implemented on top.
NSUserDefaults is a key-value storage. Same as localStorage in browsers/js.
On the other hand, persistence in QuickJS is more than that. Essentially it is a NoSQL DB integrated into the language and its runtime. For example, you can write
let uname = root.users[2].firstName;
to access the data using pure language constructs, while with NSUserDefaults you need to call the DB's facade methods like objectForKey("some") and so on.
Also, in QuickJS, Storage does not read the whole DB into memory but fetches/unloads data on demand, transparently for the user. You can think of the content of the DB as a genuine language data structure with its root at storage.root
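For comparison, Python's standard-library `shelve` module gives a rough library-level analogue of a persistent root object. It is a sketch of the idea only, not the QuickJS/Sciter Storage API: `shelve` pickles each top-level key as a whole rather than paging nested data in and out on demand. The helper names and the file path are made up.

```python
import os
import shelve
import tempfile


def add_user(path: str, first_name: str) -> None:
    # writeback=True lets nested objects be mutated in place and saved on close.
    with shelve.open(path, writeback=True) as root:
        users = root.setdefault("users", [])
        users.append({"firstName": first_name})


def first_names(path: str) -> list:
    with shelve.open(path) as root:
        return [user["firstName"] for user in root.get("users", [])]


db_path = os.path.join(tempfile.mkdtemp(), "storage")
add_user(db_path, "Ada")
add_user(db_path, "Grace")  # a separate open(), as if from a later run
names = first_names(db_path)
```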
Well OP, are you me? Everything you listed is also in my short wishlist for a programming language (well, except for the value database: once you have first-class relational tables in your language, persistence can be tied to the table identity and doesn't need to be implicit).
Capabilities and dynamic scoping for "modularisation" nicely lead to implicit variables instead of truly global dynamically scoped variables. Implicit variables also probably work well to implement effect systems which means well behaved asyncs.
Edit: other features I want:
- easy embedding in other low level languages (c++ specifically)
- conversely, easy to embed functions written in another language (again c++).
- powerful, shell-like, process control system (including process trees and pipelines), including across machines.
- built-in cooperative shared memory concurrency, and preemptive shared nothing.
"Semi-dynamic" is one of the most common architectures there is for large & complex systems. AAA games are usually written in a combination of C++ and a scripting language. GNU Emacs is a Lisp application with a custom interpreter that is optimized for writing a text editor. Python + C is a popular choice as well as Java + Groovy or Clojure, I've even worked with a Lua + FORTRAN system.
I also think "parsers suck". It should be a few hundred lines at most, including the POM file, to add an "unless" statement to the Java compiler. You need to (1) generate a grammar which references the base grammar and adds a single production, (2) create a class in the AST that represents the "unless" statement and (3) add a transformation that rewrites
unless(X) {...} -> if(!X) {...}
You should be able to mash up a SQL grammar and the Java grammar so you can write
var statement = <<<SELECT * FROM page where id=:pageId>>>;
This system should be able to export a grammar to your IDE. Most parser generators are terribly unergonomic (cue the event-driven interface of yacc) and not accessible to people who don't have a CS education (if you need a bunch of classes to represent your AST, shouldn't these get generated from your grammar?). When you generate a parser you should get an unparser. Concrete syntax trees are an obscure data structure, but they were used in obscure RAD tools in the 1990s that would let you modify code visually and make the kind of patch that a professional programmer would write.
The counter to this you hear is that compile time is paramount, and there's a great case for that in large code bases. (I had a system with a 40 minute build.) Yet there's also a case that people do a lot of scripty programming, and trading compile time for ergonomics can be a win (see Perl and REBOL).
I think one goal in programming languages is to bury Lisp the way Mark Antony buried Caesar. Metaprogramming would be a lot more mainstream if it was combined with Chomsky-based grammars, supported static typing, worked with your IDE and all that. Graham's On Lisp is a brilliant book (read it!) that left me disappointed in the end because he avoids anything involving deep tree transformations or compiler theory: people do much more advanced transformations to Java bytecodes. It might be easier to write those kinds of transformations if you had an AST composed of Java objects instead of the anarchy of nameless tuples.
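Steps (2) and (3) of the "unless" example, plus the unparser, can be sketched on a toy AST. This is an illustration in Python with made-up node classes, not javac's real tree API, and it skips the grammar step entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Var:
    name: str


@dataclass
class Not:
    operand: object


@dataclass
class Call:
    func: str


@dataclass
class If:
    cond: object
    body: List[object]


@dataclass
class Unless:
    cond: object
    body: List[object]


def desugar(stmt):
    """Rewrite unless(X) {...} into if(!X) {...}, recursing into bodies."""
    if isinstance(stmt, Unless):
        return If(Not(stmt.cond), [desugar(s) for s in stmt.body])
    if isinstance(stmt, If):
        return If(stmt.cond, [desugar(s) for s in stmt.body])
    return stmt


def unparse_expr(expr) -> str:
    if isinstance(expr, Not):
        return "!" + unparse_expr(expr.operand)
    return expr.name


def unparse(stmt, indent: int = 0) -> str:
    """Turn a statement node back into source text."""
    pad = "    " * indent
    if isinstance(stmt, Call):
        return f"{pad}{stmt.func}();"
    keyword = "unless" if isinstance(stmt, Unless) else "if"
    body = "\n".join(unparse(s, indent + 1) for s in stmt.body)
    return f"{pad}{keyword} ({unparse_expr(stmt.cond)}) {{\n{body}\n{pad}}}"


source_tree = Unless(Var("ready"), [Call("retry")])
rewritten = desugar(source_tree)
```

Because the nodes are named classes rather than anonymous tuples, the rewrite is a plain type-directed function, which is the ergonomic point being made above.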
An interesting problem I've played around with a fair bit is the idea of a maximally expressible non-Turing-complete language, trying to make a language that is at least somewhat comfortable to use for many tasks, while still being able to make static assertions about runtime behavior.
The best I've managed is a functional language that allows for map, filter, and reduce, but forbids recursion or any other looping or infinite expansion in usercode.
The pitch is that this kind of language could be useful in contexts where you're executing arbitrary code provided by a potentially malicious third party.
I think you're asking for Starlark (https://starlark-lang.org), a language that strongly resembles Python but isn't Turing-complete, originally designed at Google for use in their build system. There's also Dhall (https://dhall-lang.org), which targets configuration use cases; I'm less familiar with it.
One problem is that, while non-Turing-completeness can be helpful for maintainability, it's not really sufficient for security. Starlark programs can still consume exponential amounts of time and memory, so if you run an adversary's Starlark program without sandboxing it, you're just as vulnerable to denial-of-service attacks as you'd be with a Turing-complete language. The most common solution is sandboxing, wherein you terminate the program if it exceeds time or memory limits; however, once you have that, it's no longer necessary for the language to not be Turing-complete, so you might as well use a popular mainstream language that's easy to sandbox, like JavaScript.
One other intriguing option in the space is CEL (https://cel.dev), also designed at Google. This targets use cases like policy engines where programs are typically small, but need to be evaluated frequently in contexts where performance matters. CEL goes beyond non-Turing-completeness, and makes it possible to statically verify that a program's time and space complexity are within certain bounds. This, combined with the lack of I/O facilities, makes it safe to run an adversary's CEL program outside a sandbox.
Non-Turing-completeness doesn’t buy you that much, because you can still easily multiply runtime such that it wouldn’t terminate within your lifetime. With just map you can effectively build the cross product of a list with itself. Do that in an n-times nested expression (or nested, non-recursive function calls), and for a list of length k the result is a list of length kⁿ. And with reduce you could then concatenate a string with itself those kⁿ times, resulting in a string (and likely runtime and memory usage) of length 2^kⁿ.
If you want to limit the runtime, you need to apply a timeout.
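The blow-up described above is easy to reproduce at toy scale. In this sketch `cross` is a made-up helper built from a comprehension (standing in for map plus concatenation), nested by hand three times, followed by a reduce that doubles a string once per row. There is no recursion and no loop statement, yet the sizes grow as kⁿ and then 2^(kⁿ).

```python
from functools import reduce


def cross(rows, xs):
    """Extend every tuple in `rows` with every element of `xs`."""
    return [row + (x,) for row in rows for x in xs]


xs = [0, 1]  # k = 2
# n = 3 textual nestings give k**n = 8 rows.
n_fold = cross(cross(cross([()], xs), xs), xs)
# Doubling a string once per row gives 2**(k**n) = 256 characters.
doubled = reduce(lambda s, _row: s + s, n_fold, "a")
```

With k = 10 and n = 3 the same program would ask for a string of 2^1000 characters, which is why a time or memory limit is still needed.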
If you're interested in prior art, Ian Currie's NewSpeak was an attempt at a non-Turing complete language for safety critical systems. Most of the search results are for a different language with the same name, but "RSRE currie newspeak" should find relevant links.
Idris is Turing complete [1] and has a very advanced and expressive type system (dependent types) where type checking is still guaranteed to halt.
That's because the language has a notion of `total` functions [2], and only `total` functions can be used for computing type signatures. These `total` functions must terminate in finite time and not crash. AFAIK, they aren't Turing complete, but they're still pretty expressive. `partial`, Turing complete functions are still allowed at runtime (outside of type signatures) [1].
If I understand correctly, running Idris in the type-checking mode (`--check`) should give you what you want.
To me, this idea seems so, so insane (especially for things like extraction: you start extracting a zip on one device, it is partially extracted, and then you continue extracting it on the other). (Yes, sure, you could loop over each file, keep a list of the files already unzipped, and unzip only the files that haven't been unzipped yet.)
But imagine if the zip contains a single file (like a 100 GB file).
I don't know. I have played with this using CRIU and it worked. QEMU can also work. But this idea is cool.
Instead of using a default storage where entropy can hit, I would personally like it if the values were actually stored in SQLite, maybe combined with the Truly Relational Language as well (but it doesn't truly require you to learn SQLite).
I had posted about this on Hacker News as well, and theoretically it's possible with the brainfu* interpreter in SQLite that I had found. But I don't know... If anybody knows of a new language, or a method for integrating this into new languages, it would be pretty nice.
Oh my god, another banger is the modular monolith part. I personally believe the Java/Kotlin ecosystem, Golang with NATS, and Elixir/Erlang can be considered that.
Another cool way is using Encore in Golang or TypeScript and then hosting the AWS stack yourself or running Encore locally (I am not sure).
> What about a language where for any given bit of code, the dynamicness is only a phase of compilation?
This is (essentially) Crystal lang's type system. You end up with semantic analysis/compilation taking a significant amount of time, longer than other comparable languages, and using a lot of resources to do so.
It's not very convincing to me when the article talks about truly relational language but fails to mention Prolog and anything that we learned from it.
Logic languages are definitely not what I'd expect a relational-first language to look like.
What we learned from Prolog is mostly that starting from an exponentially-complex primitive and then trying to beat it into submission doesn't work at scale. Relational DBs don't have that problem. They do go n-squared and n-cubed and so forth easily, but there are lots of solutions to that as well.
I'm not sure what you mean with "an exponentially-complex primitive". In my opinion, Prolog lets you start with simple relations (n-squared, using your terms) and then enables you to build more complex relations using them.
The section about language support for modular monoliths reminds me of John Lakos's "Large-Scale C++ Software Design", which focuses on the physical design/layout of large C++ projects to enforce interfaces and reduce coupling and compilation time. Example recommendations include defining architecture layers using subdirectories and the PImpl idiom. It's pretty dated (1996, so pre-C++98), but still a unique perspective on an overlooked topic.
> Some Lisps may be able to do all this, although I don’t know if they quite do what I’m talking about here; I’m talking about there being a very distinct point where the programmer says “OK, I’m done being dynamic” for any given piece of code.
In Common Lisp there are tricks you can pull like declaring functions in a lexical scope (using labels or flet) to remove their lookup overhead. But CL is generally fast enough that it doesn't really matter much.
You can declaim inline a toplevel function. That doesn't necessarily mean that it will be integrated into callers. Among the possible effects is that the dynamism of reference can be culled away. If a function A calls B where B is declaimed inline then A can be compiled to assume that B definition. (Such that if B is redefined at run-time, A can keep calling the old B, not going through the #'B function binding lookup.).
I seem to remember that Common Lisp compilers are allowed to do this for functions that are in the same file even if they are not declaimed inline. If A and B are in the same file, and B is not declaimed notinline (the opposite of inline), then A can be translated to assume the B definition.
So all your helper functions in a Lisp module are allowed to be called more efficiently, not having to go through the function binding of the symbol.
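The difference between going through the function binding on every call and assuming a fixed definition has a loose Python analogue, shown here only to illustrate the semantics, not how a Lisp compiler implements it. `late_caller` looks up the global name each time, while `early_caller` captured the function object when it was defined, so it keeps calling the old definition after a redefinition.

```python
def helper():
    return "old"


def late_caller():
    # Looks up the global name `helper` on every call.
    return helper()


def early_caller(_helper=helper):
    # The default argument captured the function object at definition time.
    return _helper()


def redefine_helper():
    """Rebind the global name, as a run-time redefinition would."""
    global helper

    def helper_v2():
        return "new"

    helper = helper_v2
```

After `redefine_helper()`, `late_caller()` sees the new definition and `early_caller()` does not, which mirrors a caller compiled against an inlined B continuing to use the old B.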
For "Semi-Dynamic Language" it might be worth looking into RPython: interpreters written in RPython have two phases. In the first phase one has full Python semantics, but in the second phase everything is assumed to be less dynamic and more restricted (the R of RPython?), so the residual interpreter is then transpiled to C sources, which, although compiled, can also make use of the built-in GC and JIT.
I immediately thought of Julia as a semi-dynamic language. Julia is a dynamic language, but (as I understand it) the first time a function is called with a specific type signature, that specific method is JIT compiled as static LLVM.
Which is then used for future dispatches on that same signature and gives it very good performance. Julia is dynamic, and definitely beats the 10x slower than C barrier jerf mentioned.
For what I was using it for at the time (~3 years ago when I used it seriously) it offered performance close to the compiled orbital analysis code we had (in more conventional languages, Fortran and C) but with the flexibility of Python and other dynamic/interactive languages for developing models. An excellent tradeoff: very small performance cost for better interactivity and flexibility.
Well, I'm thrilled by something in the neighbourhood, but your widespread adoption there is more a "well known in small circles" kind of thing: instead of selling shovels to the gold miners, the target of demosceners+analysts sounds like selling paint brushes to the garret-dwelling artists?
For me, I guess the killer checklist item is "[X] Programming in this language is an adequate punishment for inventing it".
For this putative APL-kith, I'd guess the combination of "[X] You require the compiler to be present at runtime [X] You require the language runtime to be present at compile-time" would be killer; after all wirth-style compilers, these days, can run out of L1$. However, does sufficient staging run into the "fewer than 100 programmers un-algoled* enough" problem?
(could relative popularity of the square bracket language be because phykers display formulae in order to convey insights, but for imkers the calculations are their own point?)
* on the "[X] Rejection of orthodox systems programming without justification" front, what might be to algol-inspired programming as Thistlethwaite's algorithm is to plain old cube solving? https://news.ycombinator.com/item?id=42426716 instead of reducing the group operators as one proceeds, one would reduce the dynamism as the loops nested... (one of the things I find impressive about the rpython JIT is that it roughly manages to implicitly do that reduction!)
Some phykers will admit that making useful (as opposed to publication-quality) graphics in bracket-language remains a PitA, these days of overpowered GPUs..
Do you have a better combination of target audience?
the uiua drawing their logo in uiua made me hopeful that something like that could go viral on yt/tt (& get undergrads running the gauntlet en masse hunting navierstokes singularities on their (still wirthy?) HBMs)
I helped on a language called Eve about 10 years ago. A truly relational language was exactly what that language was supposed to be, or at least that's what we were aiming at as a solution for a user-centric programming language.
The language we came up with was sort of like Smalltalk + Prolog + SQL. Your program was a series of Horn clauses backed by an Entity-Attribute-Value relational database. So you could write queries like "Search for all the clicks and get those whose target is a specific id, then as a result create a new fact that indicates a button was pressed. Upon the creation of that fact, change the screen to a new page". We even played around with writing programs like this in natural language before LLMs were a thing (you can see some of that here https://incidentalcomplexity.com/2016/06/10/jan-feb/)
It's very declarative, and you have to wrap your brain around the reactivity and working with collections of entities rather than individual objects, so programming this way can be very disorienting for people used to imperative OOP languages.
But the results are that programs are much shorter, and you get the opportunity for really neat tooling like time travel debugging, where you roll the database back to a previous point; "what-if" scenarios, where you ask the system "what would happen if x were y" and you can potentially do that for many values of y; "why not" scenarios, where you ask the system why a value was not generated; value provenance, where you trace back how a value was generated... This kind of tooling just doesn't exist with most languages because they are built to throw away as much information as possible at each stage of compilation. The kind of tooling I'm describing requires keeping and logging information about your program, and then leveraging it at runtime.
Most compilers and runtimes throw away that information as the program goes through the compilation process and as it's running. There is a cost to pay in terms of memory and speed, but I think Python shows that interpretation speed is not that much of a barrier to language adoption.
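To make "keep the information around" concrete, here is a small sketch of an append-only log of entity-attribute-value facts in which every derived fact records which entries produced it. It is not how Eve was implemented; `FactLog`, `button_rule` and the fact names are invented for illustration. Time travel becomes a slice of the log, and provenance becomes a walk back through the recorded supports.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Fact = Tuple[str, str, object]  # (entity, attribute, value)


@dataclass(frozen=True)
class Entry:
    fact: Fact
    derived_from: Tuple[int, ...]  # indices of the supporting log entries
    rule: Optional[str]            # None for facts that came from outside


class FactLog:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def add(self, fact: Fact, derived_from=(), rule=None) -> int:
        self.entries.append(Entry(fact, tuple(derived_from), rule))
        return len(self.entries) - 1

    def as_of(self, t: int) -> List[Fact]:
        """Time travel: the facts known after the first `t` entries."""
        return [entry.fact for entry in self.entries[:t]]

    def why(self, index: int) -> List[Fact]:
        """Provenance: the input facts that transitively support an entry."""
        entry = self.entries[index]
        if not entry.derived_from:
            return [entry.fact]
        support: List[Fact] = []
        for i in entry.derived_from:
            support.extend(self.why(i))
        return support


def button_rule(log: FactLog, button_id: str) -> List[int]:
    """For every click whose target is `button_id`, derive a 'pressed' fact."""
    derived = []
    for i, entry in enumerate(list(log.entries)):
        _entity, attribute, value = entry.fact
        if entry.rule is None and attribute == "target" and value == button_id:
            derived.append(log.add((button_id, "pressed", True), [i], "button_rule"))
    return derived


log = FactLog()
log.add(("click1", "target", "save"))
log.add(("click2", "target", "cancel"))
pressed = button_rule(log, "save")
```

The memory cost mentioned above is visible even here: nothing is ever overwritten, and every derived fact carries its supports.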
But like I said, that was many years ago and that team has disbanded. I think a lot of what we had in Eve still hasn't reached mainstream programming, although some of what we were researching found its way into Excel eventually.
> Loosen Up The Functions... Capabilities... Production-Level Releases... Semi-Dynamic Language... Modular Monoliths
I really like where the author's head is at. I think we have similar ideas about programming, because I've been developing a language called Mech that fits these descriptors to some degree since Eve development was shut down.
So this language is not supposed to be relational like Eve, but it's more like Matlab + Python + ROS (or Erlang if you want to keep it in the languages domain).
I have a short 10 min video about it here: https://www.hytradboi.com/2022/i-tried-rubbing-a-database-on... (brief plug for HYTRADBOI 2025, Jamie also worked on Eve, and if you're interested in the kinds of thing the author is, I'm sure you'll find interesting videos at HYTRADBOI '22 archives and similarly interested people at HYTRADBOI '25), but this video is out of date because the language has changed a lot since then.
Mech is really more of a hobby than anything since I'm the only one working on it aside from my students, who I conscript, but if anyone wants to tackle some of these issues with me I'm always looking for collaborators. If you're generally interested in this kind of stuff drop by HYTRADBOI, and there's also the Future Of Coding slack, where likeminded individuals dwell: https://futureofcoding.org. You can also find this community at the LIVE programming workshop which often coincides with SPLASH: https://liveprog.org
I remember both LightTable and Eve. At the time I thought they were both really interesting ideas but wasn't sure where they were going.
Re-reading the Eve website now, with 10+ years more experience and understanding of languages, I'm really astounded at how brilliant Eve was, and how far ahead of its time it was (and still is). Also at how rare it is to have any revolutionary ideas in modern programming language design make it out of theory in contemporary times. There were many radical ideas in the 60s and 70s, but so much now is incremental.
It's a shame Eve couldn't continue, just to see what it would've become and the influence it would have had on language expectations. Really cool stuff in there. While not likely, I hope someone picks up those ideas and continues them.
Did the effort just run out of funding? Or did it hit a stumbling block?
Thanks for the kind words, and I agree it is a shame, but we ran out of money! Chris raised $2m from investors, and we spent that over 3 years with a pretty minimal burn rate as far as SF startups go. We couldn't really show a path to making money at that time (although I still think we were on to some things), so we couldn't raise any more. We tried for an acquihire but nah.
As for continuing the ideas, I'm putting a lot of what we learned into Mech. I know that Chris and Josh went to work at Relational AI, but I'm not sure exactly what work they got up to there. But Chris recently posted about generative AI and how it goes back to things we were thinking about in 2015 with respect to Eve: https://x.com/ibdknox/status/1630548754238435330
So in that sense generative AI can be a vector for some of the ideas in Eve coming to the mainstream.
The only thing I want added to every programming language I use is the ability to call functions and handle data structures provided by libraries and services written in other languages without me having to write arcane wrappers.
Thanks - this was one of the more interesting things I've read here in a while.
I wonder if "Programming languages seem to have somewhat stagnated to me.", a sentiment I share, is just me paying less attention to them or a real thing.
I think there is innovation, but there's more than innovation required to be a good language. If an innovative feature is the cornerstone of a language, it frequently means that the language neglects pragmatic coding features that, while not particularly special, contribute to the language being nice to use.
I feel like the next few years in languages will bring things like Rust descendants, where people with experience using Rust want to keep what works for them but scale back some of the rigidity in favour of pragmatism.
It's also worth noting that there are existing languages that are changing over time. Freepascal has developed a lot of features over the years that make it fairly distant from original Pascal. More recent languages like Haxe are still developing into their final form. TypeScript has gone from a language that provided a tangible solution to an existing problem to a quagmire of features that I'd rather not have.
One thing I would like in PyTorch is for a tensor’s shape to be a fundamental part of its type. That is, disable implicit broadcasting and if an operation would require adding dimensions to inputs, require those inputs to be type cast to the correct shape first.
I can’t tell you how much time I have wasted on broadcasting bugs, where operations “work” but they aren’t doing what I want them to.
Jax can do this but no one uses Jax because of other reasons.
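A typical instance of the bug: subtracting targets of shape (100, 1) from predictions of shape (100,) silently produces a (100, 100) result. The sketch below models only the shape arithmetic in pure Python, using the NumPy-style broadcasting rule that PyTorch follows; `strict_shape` is a hypothetical check showing what "shape is part of the type" would reject, not an existing PyTorch or JAX function.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def broadcast_shape(a: Shape, b: Shape) -> Shape:
    """NumPy-style rule: align trailing dimensions, size 1 stretches."""
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + a
    b = (1,) * (ndim - len(b)) + b
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
    return tuple(out)


def strict_shape(a: Shape, b: Shape) -> Shape:
    """Hypothetical strict rule: operands must already have equal shapes."""
    if a != b:
        raise TypeError(f"shape mismatch {a} vs {b}: reshape explicitly first")
    return a


predictions, targets = (100,), (100, 1)
silent_result = broadcast_shape(predictions, targets)  # (100, 100), not (100,)
```

Under the strict rule the same subtraction fails immediately, and the fix (reshaping `targets` to `(100,)`) has to be written out at the call site.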
As far as "semi-dynamic" goes, C# has an interesting take coming from the other direction - i.e. a fully statically typed language that bolted on dynamic duck typing later.
It's done in a way that allows for a lot of subtlety, too. Basically you can use "dynamic" in lieu of most type annotations, and what this does is make any dispatch (in a broad sense - this includes stuff like e.g. overload resolution, not just member dispatch) on that particular value dynamic, but without affecting other values involved in the expression.
Folks, this is not a process that converges. We've now had 60 years of language design, use and experience. We're not going to get to an ideal language because there are (often initially hidden) tradeoffs to be made. Everyone has a different idea of which side of each tradeoff should be taken. Perhaps in the future we can get AI to generate and subsequently re-generate code, thereby avoiding the need to worry too much about language design (AI doesn't care that it constantly copies/pastes or has to refactor all the time).
It's bimodal. Most of the world uses one paradigm of programming (declarative programming via Excel and SQL), while developers use another paradigm (imperative programming via Python/C/C++/Javascript et al.).
I think the problem with "big" language ideas is, that as long as they match exactly your needs, they're great, but if they're slightly off, they can be a pain in the ass.
I'm wondering if languages could provide some kind of meta information, hooks or extension points, which could be used to implement big ideas on top. These big ideas could then be reused and modified depending on the needs of the project.
Totally agree that programming languages are a bit stagnant, with most new features being either trying to squeeze a bit more correctness out via type systems (we're well into diminishing returns here at the moment), or minor QoL improvements. Both are useful and welcome but they aren't revolutionary.
That said, here's some of the feedback of the type you said you didn't want >8)
(1) Function timeouts. I don't quite understand how what you want isn't just exceptions. Use a Java framework like Micronaut or Spring that can synthesize RPC proxies and you have things that look and work just like function calls, but which will throw exceptions if they time out. You can easily run them async by using something like "CompletableFuture.supplyAsync(() -> proxy.myCall(myArgs))" or in Kotlin/Groovy syntax with a static import "supplyAsync { proxy.myCall(myArgs) }". You can then easily wait for it by calling get() or skip past it. With virtual threads this approach scales very well.
The hard/awkward part of this is that APIs are usually defined these days in a way that doesn't actually map well to standard function calling conventions because they think in terms of POSTing JSON objects rather than being a function with arguments. But there are tools that will convert OpenAPI specs to these proxies for you as best they can. Stricter profiles that result in more idiomatic and machine-generatable proxies aren't that hard to do, it's just nobody pushed on it.
(2) Capabilities. A language like Java has everything needed to do capabilities (strong encapsulation, can restrict reflection). A java.io.File is a capability, for instance. It didn't work out because ambient authority is needed for good usability. For instance, it's not obvious how you write config files that contain file paths in systems without ambient authority. I've seen attempts to solve this and they were very ugly. You end up needing to pass a lot of capabilities down the stack, ideally in arguments but that breaks every API ever designed so in reality in thread locals or globals, and then it's not really much different to ambient authority in a system like the SecurityManager. At least, this isn't really a programming language problem but more like a standard library and runtime problem.
(3) Production readiness. The support provided by app frameworks like Micronaut or Spring for things like logging is pretty good. I've often thought that a new language should really start by taking a production server app written in one of these frameworks and then examining all the rough edges where the language is mismatched with need. Dependency injection is an obvious one - modern web apps (in Java at least) don't really use the 'new' keyword much which is a pretty phenomenal change to the language. Needing to declare a logger is pure boilerplate. They also rely heavily on code generators in ways that would ideally be done by the language compiler itself. Arguably the core of Micronaut is a compiler and it is a different language, one that just happens to hijack Java infrastructure along the way!
What's interesting about this is that you could start by forking javac and go from there, because all the features already exist and the work needed is cleaning up the resulting syntax and semantics.
(4) Semi-dynamic. This sounds almost exactly like Java and its JIT. Java is a pretty dynamic language in a lot of ways. There are even "invokedynamic" and "constant dynamic" features in the bytecode that let function calls and constants be resolved in arbitrarily dynamic ways at first use, at which point they're JIT-compiled like regular calls. It sounds very similar to what you're after, and performance is good despite the dynamism of features like lazy loading, bytecode generated on the fly, every method being virtual by default, etc.
(5) There's a library called Permazen that I think gets really close to this (again for Java). It tries to match the feature set of an RDBMS but in a way that's far more language integrated, so no SQL, all the data types are native etc. But it's actually used in a mission critical production application and the feature set is really extensive, especially around smooth but rigorous schema evolution. I'd check it out, it certainly made me want to have that feature set built into the language.
(6) Sounds a bit like PL/SQL? I know you say you don't want SQL but PL/SQL and derivatives are basically regular programming languages that embed SQL as native parts of their syntax. So you can do things like define local variables where the type is "whatever the type of this table column is" and things like that. For your example of easily loading and debug dumping a join, it'd look like this:
DECLARE
  -- Define a custom record type for the selected columns.
  -- (PL/SQL identifiers are case-insensitive, so the type and the
  -- variable need distinct names.)
  TYPE EmpDeptRec IS RECORD (
    name   employees.first_name%TYPE,
    salary employees.salary%TYPE,
    dept   departments.department_name%TYPE
  );
  empDept EmpDeptRec;
BEGIN
  -- Select columns from the joined tables into the record
  SELECT e.first_name, e.salary, d.department_name INTO empDept
  FROM employees e JOIN departments d ON e.department_id = d.department_id
  WHERE e.employee_id = 100;
  -- Output the data
  DBMS_OUTPUT.PUT_LINE('Name: ' || empDept.name);
  DBMS_OUTPUT.PUT_LINE('Salary: ' || empDept.salary);
  DBMS_OUTPUT.PUT_LINE('Department: ' || empDept.dept);
END;
It's not a beautiful language by any means, but if you want a natively relational language I'm not sure how to make it moreso.
(7) I think basically all server apps are written this way in Java, and a lot of client (mobile) too. It's why I think a language with integrated DI would be interesting. These frameworks provide all the features you're asking for already (overriding file systems, transactions, etc), but you don't need to declare interfaces to use them. Modern injectors like Avaje Inject, Micronaut etc let you directly inject classes. Then you can override that injection for your tests with a different class, like a subclass. If you don't want a subtyping relationship then yes you need an interface, but that seems OK if you have two implementations that are really so different they can't share any code at all. Otherwise you'd just override the methods you care about.
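A toy version of the "inject a class, override it with a subclass in tests" idea, sketched in Python rather than Java. The `Injector` class and its methods are invented for this example and are not the API of Avaje Inject or Micronaut.

```python
import time
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")

class Injector:
    """Toy injector: one instance per class, with optional overrides."""
    def __init__(self) -> None:
        self._overrides: Dict[type, type] = {}
        self._instances: Dict[type, Any] = {}

    def override(self, cls: Type[T], replacement: Type[T]) -> None:
        self._overrides[cls] = replacement

    def get(self, cls: Type[T]) -> T:
        if cls not in self._instances:
            impl = self._overrides.get(cls, cls)
            self._instances[cls] = impl()
        return self._instances[cls]

class Clock:                      # a concrete class, no interface declared
    def now(self) -> int:
        return int(time.time())

class FixedClock(Clock):          # test double: overrides only what it needs
    def now(self) -> int:
        return 1000

class Greeter:
    def __init__(self, clock: Clock) -> None:
        self.clock = clock

    def greet(self) -> str:
        return f"hello at {self.clock.now()}"

test_injector = Injector()
test_injector.override(Clock, FixedClock)
greeter = Greeter(test_injector.get(Clock))
```

`Greeter` never learns that it got a `FixedClock`; the production wiring would simply skip the `override` call.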
Automatically working out the types of parameters sounds a bit like Hindley-Milner type inference, as seen in Haskell.
(8) The common way to do this in the Java world is have an annotation processor (compiler plugin) that does the lints when triggered by an annotation, or to create an IntelliJ plugin or pre-canned structural inspection that does the needed AST matching on the fly. IntelliJ's structural searches can be saved into XML files in project repositories and there's a pretty good matching DSL that lets you say things like "any call to this method with arguments like that and which is inside a loop should be flagged as a warning", so often you don't need to write a proper plugin to find bad code patterns.
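For a feel of what "any call to this method inside a loop" looks like as an AST match, here is a small sketch using Python's `ast` module instead of IntelliJ's DSL. The function name `calls_inside_loops` and the sample source are assumptions for illustration; it only matches calls by bare name and treats the loop header as part of the loop.

```python
import ast

def calls_inside_loops(source: str, func_name: str) -> list[int]:
    """Return line numbers of calls to `func_name` that sit inside a for/while loop."""
    hits: list[int] = []

    class Visitor(ast.NodeVisitor):
        def __init__(self) -> None:
            self.loop_depth = 0

        def _visit_loop(self, node: ast.AST) -> None:
            self.loop_depth += 1
            self.generic_visit(node)
            self.loop_depth -= 1

        visit_For = _visit_loop
        visit_While = _visit_loop

        def visit_Call(self, node: ast.Call) -> None:
            if (self.loop_depth > 0
                    and isinstance(node.func, ast.Name)
                    and node.func.id == func_name):
                hits.append(node.lineno)
            self.generic_visit(node)

    Visitor().visit(ast.parse(source))
    return hits

sample = """\
load_config()
for item in items:
    load_config()
while True:
    other()
"""
flagged = calls_inside_loops(sample, "load_config")
```

Only the call on the third line of `sample` is flagged; the top-level call is left alone.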
I realize you didn't want feedback of the form "but X can do this already", still, a lot of these concepts have been explored elsewhere and could be merged or refined into one super-language that includes many of them together.
The issue with stuff like SQL/JRT is that often stored procs are just running a few SQL statements in a row, the stored procs are being used for security or latency optimization but not because there's some truly complex logic. And if you want a language that has the basics like variables, loops, conditionals, data structures etc but which also makes issuing SQL statements natural, then it's hard to beat PL/SQL.
I really mean that: all those approaches using Perl, Java, or .NET as stored procedures never gained much love among DBAs or the market in general, and I always favour approaches where I am not the only person in the building that can change something.
Also, those approaches lack end-to-end tooling, so you're basically back at printf debugging.
This is actually quite relevant to language design: too many people still get lost discussing grammar and language semantics, forgetting everything else that matters when choosing language A over language B.
As for "capabilities", I'm not sure I fully understand how that is advantageous to the convention of passing the helper function ("capability") as an argument to the "capable" function.
For instance, in Zig, you can see that a function allocates memory (capability) because it requires you to pass an allocator that it can call!
I'd like to see if others are more creative than me!
In Zig it's conventional to pass an allocator, but any code can do an end run around the convention by reaching for page_allocator or c_allocator behind your back. Capabilities upgrade that convention into a guarantee.
That's pretty much how it plays out, as I understand it.
The trick is making sure that that object is the only possible way to do the thing, and then making more features work like that: networking, file I/O, etc.
In which Jerf longs for PHP.
Every single point has been in, and actively used, for a long while.
The __call() & friends is particularly nifty - simple mental model, broad applicability, in practice used sparingly to great effect.
For "value database", it seems to me that the trick is, you can't just ship the executable. You have to ship the executable plus the stored values, together, as your installation image. Then what you run in production is what you tested in staging, which is what you debugged on your development system.
I mean, there still may be other things that make this a Bad Idea(TM). But I think this could get around the specific issue mentioned in the article.
If it's about well-contained applications in a well designed (and user-centric) OS with a proper concept of "application" and "installation", with a usable enough mechanism, I don't see anything that would make it bad.
On Windows it's a disaster. To the point that dumping random text files around in Linux works better.
It is not. I didn't want to give a half explanation, but it is another case of the increasing difficulty in coming up with good Google searches anymore.
But you use capabilities all the time... operating system users work that way. As a user, you can't "just" execute some binary somewhere and thereby get access to parts of the system your user doesn't have rights to. (Forget setuid for a second, which is intended precisely to get around this, and let's just look at the underlying primitive.)
Capabilities in programming languages take the granularity further down. You might call some image manipulation code in a way that it doesn't have the capability to manipulate the file system in general, for example, or call a function to change a user's login name with capabilities that only allow changing that user, even if another user ID somehow gets in there.
It would be a fairly comprehensive answer to the software dependency issues that continue to bubble up; it would matter less if a bad actor took over "leftpad" if leftpad was actively constrained by the language to only be able to manipulate strings, so the worst an actor could do is make it manipulate strings the wrong way, rather than start running arbitrary code. Or put another way, if the result of the bad actor taking the package wasn't that people got hacked but users started getting
compile error in file.X:28: library "leftpad" tried to open a file without file system capabilities
compile error in file.X:30: library "leftpad" tried to open a socket without network capabilities
which would immediately raise eyebrows.
It's not a new idea, in that E already tried it, and bits and pieces of it are everywhere ("microkernels" is another place where you'll see this idea, but at the OS level and implemented in languages that have no native concept of the capabilities), but for the most part our programming languages do not reflect this.
> But you use capabilities all the time... operating system users work that way.
Most operating systems don't have proper capabilities - they use things like ACLs, RBAC, MAC, etc. for permissions.
The golden rule of capabilities is that you should not separate designation from authority. The capability itself represents the authority to access something, and designates what is being accessed.
For the equivalent in operating systems land, look at the respective manual pages for Linux capabilities[1] or OpenBSD pledge[2] and unveil[3]. The general idea is that there are some operations that might be dangerous, and maybe we don't want our program to have unrestricted access to them. Instead, we opt-in to the subset that we know we need, and don't have access to the rest.
There's some interest in the same thing, but at the programming language level. I'm only aware of it being implemented academically.
I don't think that Linux capabilities have much to do with the capabilities that the OP intends.
In a capabilities system, a program has permission to act on any object if it has a reference (aka a capability) to the object; there is no other access control. A program acquires a capability either by receiving it from its parent (or caller, in the case of a function) or some other way like message passing. There is no other source of capabilities and they are unforgeable.
Unix file descriptors act in many ways as capabilities: they are inherited by processes from their parents and can be passed around via Unix sockets, and grant to the FD holder the same permissions to the referenced object as the creator of the file descriptor.
Of course, since Unix has ways of creating file descriptors other than inheritance and message passing, it is not truly a capabilities system.
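To see the "pass a descriptor over a Unix socket" part in action, here is a small sketch using `socket.send_fds` / `socket.recv_fds` (Python 3.9+, Unix only). A socket pair inside one process stands in for two separate processes, which is a simplification on my part.

```python
import os
import socket
import tempfile

# A connected pair of Unix-domain sockets stands in for two processes.
holder, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

with tempfile.TemporaryFile() as f:
    f.write(b"secret contents")
    f.flush()
    # Delegate the capability: send the open descriptor, not a path.
    socket.send_fds(holder, [b"fd"], [f.fileno()])

    # The receiver gets a new descriptor referring to the same open file.
    msg, fds, _flags, _addr = socket.recv_fds(receiver, 1024, 1)
    received_fd = fds[0]
    os.lseek(received_fd, 0, os.SEEK_SET)   # the file offset is shared, so rewind
    data = os.read(received_fd, 100)
    os.close(received_fd)

holder.close()
receiver.close()
```

The receiver never sees a path and never calls `open`; holding the descriptor is what lets it read the file.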
It's implemented in Java! .NET tried it too, UNIX file descriptors are capabilities, Mach ports are capabilities. Capabilities are widely used far outside of academia and have been for a long time.
What people often mean when they say this is a so-called pure capability system, where there are no ambient permissions at all. Such systems have terrible usability and indeed have never been made to work anywhere, not even in academia as far as I know.
> This is not a new idea, so I won’t go deeply into what it is
So, no, the author claims it too.
Capabilities are a way to do access control where the client holds the key to access something, instead of the server holding a list of what is allowed based on the clients' identities.
But when people use that word, they are usually talking about fine-grained access control. On a language level, that would mean not granting access for example for a library to do network connections, even though your program as a whole has that kind of access.
For example, consider a simple function to copy files. We could implement it like this:
def copy(fs: Filesystem, in: Path, out: Path) {
  inH: HandleRead = fs.openRead(in);
  outH: HandleWrite = fs.openWrite("/tmp/TEST_OUTPUT");
  finished: Boolean = false;
  while (!finished) {
    match (inH.read()) {
      case None: finished = true;
      case Some(data): outH.write(data);
    }
  }
  inH.close();
  outH.close();
}
However, there are many ways that things could go awry when writing code like this; e.g. it will write to the wrong file, since I forgot to put the real `out` value back after testing (oops!). Such problems are only possible because we've given this function the capability to call `fs.open` (in many languages the situation's even worse, since that capability is "ambient": available everywhere, without having to be passed in like `fs` above). There are also other capabilities/permissions/authorities implicit in this code, since any call to `fs.open` has to have the right permissions to read/write those files.
In contrast, consider this alternative implementation:
def copy(inH: HandleRead, outH: HandleWrite) {
  finished: Boolean = false;
  while (!finished) {
    match (inH.read()) {
      case None: finished = true;
      case Some(data): outH.write(data);
    }
  }
  inH.close();
  outH.close();
}
This version can't use the wrong files, since it doesn't have any access to the filesystem: there's literally nothing we could write here that would mean "open a file"; it's unrepresentable. This code also can't mix up the input/output, since only `inH` has a `.read()` method and only `outH` has a `.write()` method. The `fs.open` calls will still need to be made somewhere, but there's no reason to give our `copy` function that capability.
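For anyone who wants to run the second version, a rough Python equivalent follows. It is only a sketch of the calling convention: Python still has an ambient `open`, and both `BytesIO` objects can be read and written, so nothing here is enforced the way it would be in a capability-safe language.

```python
import io
from typing import BinaryIO

def copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = 4096) -> None:
    """Copy everything from an already-open reader to an already-open writer."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # empty bytes means end of input
            break
        dst.write(chunk)

# The caller decides which streams exist; copy() never sees a path.
source = io.BytesIO(b"hello capabilities")
sink = io.BytesIO()
copy(source, sink)
```

Unlike the pseudocode, this version leaves closing the handles to whoever opened them.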
In fact, we can see the same thing on the CLI:
- The first version is like `cp oldPath newPath`. Here, the `cp` command needs access to the filesystem, it needs permission to open files, and we have to trust that it won't open the wrong files.
- The second version is like `cat < oldPath > newPath`. The `cat` command doesn't need any filesystem access or permissions, it just dumps data from stdin to stdout; and there's no way it can get them mixed up.
The fundamental idea is that trying to choose whether an action should be allowed or not (e.g. based on permissions) is too late. It's better if those who shouldn't be allowed to do an action, aren't even able to express it at all.
You're right that this can often involve "keys", but that's quite artificial: it's like adding extra arguments to each function, and limiting which code is scoped to see the values that need to be passed as those arguments (e.g. `fs.openRead(inPath, keyThatAllowsAccess)`), when we could have instead scoped our code to limit access to the functions themselves (though for HTTP APIs, everything is a URL; so "unguessable function endpoint URL" is essentially the same as "URL with secret key in it")
The key property being that everything can only be accessed via handles, including, recursively, other handles (i.e. to get a handle to an object you first need to already have a handle to the handle-giver for that object).
A capability is basically a reference which both designates some resource to be accessed and provides the authority to access it. The authority is not held somewhere else like an Access Control List - the reference is the authority. Capabilities must be unforgeable - they're obtained by delegation.
---
To give an example of where this has been used in a programming language, Kernel[1] uses a capability model for mutating environments. Every function (or operative) has an environment which holds all of its local variables, the environment is encapsulated and internally holds a reference to the parent environment (the surrounding scope). The rule is that we can only mutate the local variables of an environment to which we have a direct reference, but we cannot mutate variables in the parents. In order to mutate the variables in the parent, we must have a direct reference to the parent, but there is no mechanism in the language to extract the parent reference from the environment it is encapsulated in.
For example, consider the following trivial bit of code: We define some variable `x` with initial value "foo", we then mutate it to have the value "bar", then look up `x`.
($define! x "foo")
($set! (get-current-environment) x "bar")
x
As expected, this returns "bar". We have a direct reference to the local environment via `(get-current-environment)`.
Technically we could've just written `($define! x "bar")`, where the current environment is assumed, but I used `$set!` because we need it for the next example.
When we introduce a new scope, the story is different.
($define! x "foo")
($define! foo
($lambda ()
($set! (get-current-environment) x "bar")))
(foo)
x
Here we create a function foo, which has its own local environment, with the top-level environment as its parent. We can read `x` from inside this environment, but we can't mutate it. In fact, this code inserts a new variable `x` into the child environment which shadows the existing one within the scope of the function, but after `foo` has returned, this environment is lost, so the result of the computation is "foo". There is no way for the body of this lambda to mutate the top-level environment here because it doesn't have a direct reference to it.
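If Kernel is unfamiliar, Python's `collections.ChainMap` has the same read/write asymmetry: lookups fall through to the parents, writes land only in the first map. It is only an analogy, since `ChainMap` openly exposes `.parents` and `.maps`, so nothing is encapsulated the way it is in Kernel.

```python
from collections import ChainMap

top = ChainMap({"x": "foo"})
child = top.new_child()     # empty local scope whose parent is `top`

before = child["x"]         # lookup falls through to the parent
child["x"] = "bar"          # the write lands in the child's own map only
```

After this runs, `child` sees the shadowing binding while `top` is unchanged.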
So far basically the same static scoping rules you are used to, but environments in Kernel are first-class, so we can get a reference to the top-level environment which grants the child environment the authority to mutate the top level environment.
($define! x "foo")
($define! env (get-current-environment))
($define! foo
($lambda ()
($set! env x "bar")))
(foo)
x
And the result of this computation is "bar".
However, by binding `env` in the top-level environment, all child scopes can now have the ability to mutate the top-level.
To avoid polluting the environment in such a way, the better way to write this is with an operative (as opposed to $lambda), which implicitly receives the caller's environment as an argument, which it binds to a variable in its local environment.
($define! x "foo")
($define! foo
(wrap ($vau () caller-env
($set! caller-env x "bar"))))
(foo)
x
Now `foo` specifically can mutate its caller's local environment, but it can't mutate the variables of the caller of the caller, and we have not exposed this authority to all children of the top-level.
---
This is only a trivial example, but we can do much more clever things with environments in Kernel. We can construct new environments at runtime, and they can have multiple parents, ultimately forming a DAG, where environment lookup is a Depth-First-Search, but the references to the parent environments are encapsulated and cannot be accessed, so we cannot mutate parent scopes without a direct reference - we can only mutate the root node of the DAG for an environment to which we have a direct reference. The direct reference is a capability - it's both the means and the authority to mutate.
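A toy model of that lookup rule, written in Python for readers who don't know Kernel. The `Env` class is my own invention, and the underscore-prefixed attributes are private by convention only, so unlike Kernel this gives no real encapsulation.

```python
class Env:
    """Toy environment: reads search the parents depth-first, writes stay local."""
    def __init__(self, *parents: "Env") -> None:
        self._bindings: dict = {}
        self._parents = parents      # hidden by convention only in Python

    def define(self, name: str, value: object) -> None:
        self._bindings[name] = value

    def lookup(self, name: str) -> object:
        if name in self._bindings:
            return self._bindings[name]
        for parent in self._parents:         # depth-first, left to right
            try:
                return parent.lookup(name)
            except KeyError:
                continue
        raise KeyError(name)

math_env = Env()
math_env.define("pi", 3)
io_env = Env()
io_env.define("write", print)

child = Env(math_env, io_env)        # two parents, forming a small DAG
child.define("pi", 4)                # shadows locally; math_env is untouched
```

Holding `child` lets you read through to both parents, but `define` can only ever touch `child` itself.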
---
We can use these first-class environments in conjunction with things like `$remote-eval`, which evaluates some piece of code in an environment provided by the user, which may contain only the bindings they specify, and does not capture anything from the surrounding scope.
We get an error, `write` is unbound - even though `write` is available in the scope in which we performed this evaluation. We could catch this error with a guarded continuation so the program does not crash.
This combination of features basically let you create "mini sandboxes", or custom DSLs, with more limited capabilities than the context in which they're evaluated. Most languages only let you add new capabilities to the static environment, by defining new functions and types - but how many languages let you subtract capabilities, so that fewer features are available in a given context? Most languages do this purely at compile time via a module/import system, or with static access modifiers like `public` and `private`. Kernel lets you do this at runtime.
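The closest everyday analogue is probably `eval` with an explicit namespace. The sketch below shows the "subtract capabilities" effect in Python; `restricted_eval` is a name I made up, and Python's `eval` is emphatically not a security boundary (attribute tricks can escape it), so treat this as an illustration of scoping only.

```python
def restricted_eval(expression: str, bindings: dict) -> object:
    """Evaluate `expression` with only the names in `bindings` visible."""
    # An empty __builtins__ hides print, open, etc. from the expression.
    return eval(expression, {"__builtins__": {}}, dict(bindings))

allowed = {"add": lambda a, b: a + b, "x": 2}
value = restricted_eval("add(x, 40)", allowed)

try:
    restricted_eval("print(x)", allowed)   # print exists out here, but not in there
    outcome = "ran"
except NameError:
    outcome = "unbound"
```

The second call fails with an unbound name even though `print` is available in the scope that performed the evaluation.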
---
One thing missing from this example, which is required for true capabilities, is the ability to revoke the authority. The only way we could revoke the capability of a function to mutate an environment is to suspend the program.
Proper capabilities allow revocation at any time. If the creator of a capability revokes the authority, this should propagate to all duplicated, delegated, or derived capabilities with immediate effect. The capabilities that were held become "zombies", which no longer provide the means nor the authority - and this is why it is essential that we don't separate designation from authority, and why these should both be encapsulated in the capability.
This clearly makes it difficult to provide proper capabilities in programming languages, because we have to handle every possible error where we attempt to access a zombie capability. The use of such capabilities should be limited to where they really matter such as access to operating system resources, cryptographic keys, etc. where it's reasonable to implement robust error handling code. We don't want capabilities for every programming language feature because we would need to insert error checks on every expression to handle the potential zombie. Attempting to check if a capability is live before using it is no solution anyway, because you would have race conditions, so the correct approach to using them is to just try and catch the error if it occurs.
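A common way to get revocation in an ordinary language is a forwarder that the grantor can switch off. Here is a minimal sketch of that pattern in Python; `make_revocable` and `Revoked` are hypothetical names, and Python closures can be introspected, so this shows the shape of the idea rather than a secure implementation.

```python
class Revoked(Exception):
    """Raised when a revoked ("zombie") capability is used."""

def make_revocable(target):
    """Return (call, revoke): a forwarder to `target` and its off switch."""
    state = {"target": target}

    def call(method, *args):
        live = state["target"]
        if live is None:
            raise Revoked(method)
        return getattr(live, method)(*args)

    def revoke():
        state["target"] = None

    return call, revoke

log = []
log_cap, revoke = make_revocable(log)
log_cap("append", "first")          # works while the capability is live
revoke()                            # the grantor withdraws authority

try:
    log_cap("append", "second")     # just try it, and handle the failure
    status = "ok"
except Revoked:
    status = "revoked"
```

The grantee only ever receives `log_cap`; the grantor keeps `revoke`. Every copy of `log_cap` dies at the same moment, and the caller handles the error instead of checking liveness first.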
Another take-away from this is that if capabilities are provided in a language via a type system, it must be a dynamic type system. You cannot grant authority in a static type system at compile time if the capability may have already been revoked by the time the program is run. Capabilities are inherently dynamic by nature because they can be revoked at any time. This doesn't mean you can't use capabilities in conjunction with a static type system - only that the static type system can't really represent capabilities.
You can find out a lot more about them on the erights page that others have linked, and I would recommend looking into seL4 if you're interested in how they're applied to operating systems.
Perhaps the programming language is not the right level of abstraction at which to implement capabilities, and they need strong support from the hardware and OS. An OS built on CPU-based memory segmentation is an old idea probably worth re-exploring.
Implementing capabilities in programming language constructs will only increase cognitive load and will not help programmer productivity [1].
b) Semi-Dynamic Language:
Dynamic language is what we want but static language is what we need [TM]. Instead of making dynamic language more static why not make static language more dynamic?
I think the D language is moving in the right direction with its default GC, rdmd-based scripting, CTFE, and Variant-based standard library features [2].
c) A Truly Relational Language:
Relational is but one technique of data processing; other popular ones are spreadsheets, graphs and matrices.
Rather than constrain programming languages with native relational constructs, it is better to provide generic mechanisms with fundamental constructs for data, such as associative array algebra [2].
d) Value Database:
This is closely related to (c), and could be an excellent by-product of solving it.
e) A Language To Encourage Modular Monoliths:
I think this is the best point and idea in the entire article, but as it rightly points out, this is mainly an architecture problem and the programming language plays a supporting role. It's the same way the Internet is based on packet switching rather than circuit switching, regardless of which languages the RFCs and standards are implemented in.
However, for OS architecture, with regard to the Linus vs. Tanenbaum debate, the modular monolith is what we have now and the most popular form, in Linux, Windows and, some say, macOS; together they cover more than 99% of our OSes.
Regarding the 'Modular Monoliths' bit, I wholeheartedly agree. I always found it kind of disappointing that while we're told in our OOP classes that using interfaces increases modularity and cohesion and decreases coupling, in reality in most programming languages you're relying on the nominal type of said interface regardless. All libraries have to use a common interface at the source code level, which is obscenely rare. For interfaces to truly live up to what they're describing, they merely ought to be structural (or whatever the equivalent to functions is that structural typing is to data).
Edit, since I remembered Go has this behaviour: I think Go's auto-interfaces are easily one of its biggest selling points.
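Go isn't the only place this shows up; Python's `typing.Protocol` gives a similar structural flavour. A small sketch, with `Closer`, `TempResource` and `shut_down` invented for the example:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Closer(Protocol):
    """Anything with a close() method satisfies this, no inheritance needed."""
    def close(self) -> None: ...

class TempResource:              # never mentions Closer
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

def shut_down(resource: Closer) -> None:
    resource.close()

res = TempResource()
shut_down(res)
```

A library defining `TempResource` and another defining `Closer` never have to know about each other, which is the decoupling the nominal version can't give you. The static check is done by a type checker, not by the interpreter.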
I feel the ideas listed here range from nice-to-haves to fundamental weaknesses of all languages that derived from the C/Python/Java family, which almost all mainstream languages seem to be.
I'm working on a platform that tries to remedy some of those, so I'll enum the overlap I have with the author about what's fundamental:
1. The distinction between sync and async should be a runtime/scheduler decision, the way it is say in your CPU scheduler, we shouldn't have to separate functions into sync and async (red and blue, etc.). This is a legacy quirk of trying to bolt asynchrony on runtimes which are intimately dependent on a call stack, while asynchrony can't use the call stack (it uses so called "spaghetti stack" or inverse tree where children link to a parent, or the way say LISP environment frames link to one another).
2. Persistence ("relational programming / value database"). I'd go farther than the author and suggest both his concerns with relational programming and value database have a shared solution: the language SHOULD be the database. It shouldn't be something you code for, then the code becomes SQL, and then the SQL is executed elsewhere. This is also a historical quirk. The relational storage should come with every programming language. This was also the original way in which SQL was intended to be used. Unfortunately SQL databases failed to evolve in terms of general purpose programming and failed to provide also clear ways to separate "clean code" from a deployment with state. You need persistence in programs, but you also need a clear line between initial state (construction from source code only) and deployment with state. I can discuss further how this is done if someone is interested.
Two more things I didn't see mentioned, but are related:
3. Pass by value. If you want to bridge the gap between remote and local calls, timeouts and asynchrony is one thing, but another is that you have no direct zero-latency access to someone else's memory. So you can't pass things by handle or reference like we do in common OOP languages. Runtimes need to lean more heavily into pass-by-value, and optimize large structure copying with concepts like copy-on-write, and other algorithms from "persistent data structures" we're well familiar with. Mutable structures being a configuration of smaller immutable structures also permits seamless, effective caching between remote clients, so they can reference immutable facts that don't "go out of date" or change.
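A tiny illustration of the "mutable structure as a configuration of immutable pieces" idea, using frozen dataclasses in Python. The `User` and `Address` types are made up; the point is only that an "update" builds a new value and shares the untouched parts instead of copying them.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Address:
    city: str

@dataclass(frozen=True)
class User:
    name: str
    address: Address
    tags: Tuple[str, ...]

original = User("ann", Address("Oslo"), ("admin",))

# "Modify" by building a new value; unchanged fields are shared, not copied.
renamed = replace(original, name="anne")
```

Because `original` can never change, a remote client that cached it still holds a true fact, and `renamed.address` is the very same object as `original.address`.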
4. Return to "code is data" and "data is code" i.e. so called homoiconicity. So much mindless boilerplate in your programs is the endless reimplementation of proto-interpreters for custom input data, unrecognized Domain Specific Languages. If this DSL was written in a compact, expressive manner, and if you could transform that DSL to code and run it instead, and know it'll be treated just as well as any other code in the system, you wouldn't have to write, and just as importantly look at that boilerplate ever again. This of course requires a runtime with strong, clear safety boundaries on IO and side effects, so that if you fat-finger the conversion and permit injections, you can still feel safe the code can't take over your system, the worst it can do is error out or provide a bad answer which you'll catch on return.
* In general, whenever I hear "compiler will optimize this", I die a little on the inside. Not even because it's delegating solution of the newly created problem to someone else, but because it creates a disconnect between what the language tells you is possible and what actually is possible. It encourages this kind of multi-layer lie that, in anger, you will have to untangle later, and will be cursing a lot, and will not like the language one bit.
* Capabilities. Back in the days when ActionScript 3 was relevant, there was a big problem of dynamic code sharing. Many services tried to implement module systems in AS3, but the security was not done well. To give you an example: a gaming portal written in AS3 wants to load games written by programmers who aren't the portal programmers (and could be malicious, i.e. trying to steal data from other programs, or cause them to malfunction, etc.). ActionScript (and by extension the ECMAScript 4 proposal) had a concept of namespaces borrowed from XML (so not like in C++), where the availability of a particular function was, among other things, governed by whether the caller was allowed to access the namespace. There were some built-in namespaces, like "public", "private", "protected" and "internal", that functioned similarly to Java's namesakes. But users were allowed to add any number of custom namespaces. These namespaces could then be shared through a function call in a public namespace. I.e. the caller would have to call the function and supply some kind of password, and if the password matched, the function would return the namespace object, and then the caller could use that namespace object to call the functions in that namespace. I tried to promote this concept in the Flex Framework for dealing with module loading, but it was never seriously considered... Also, people universally hated XML namespaces (similar to how people seem to universally hate regular expressions). But I still think that it could've worked...
* All this talk about "dynamic languages"... I really don't like it when someone creates a bogus category and then says something very general about it. That whole section has no real value.
* A Truly Relational Language -- You mean, like Prolog? I wish more relational databases exposed their content via a Prolog-like language in addition to SQL. I believe it's doable, but very few people seem to want it, and so it's not done.
“It feels like programming languages are stagnating.”
As they should be. Not every language needs to turn into C++, Rust, Java, C#, or Kotlin.
The only group I see lamenting about features these days are PL theorists, which is fine for research languages that HN loves but very few use outside the bubble.
Timeouts on calls are, as the OP mentions, a thing in Erlang. Inter-process and inter-computer calls in QNX can optionally time out, and this includes all system calls that can block. Real-time programs use such features. Probably don't want it on more than that. It's like having exceptions raised in things you thought worked.
- Capabilities
They've been tried at the hardware level, and IBM used them in the System/38, but they never caught on. They're not really compatible with C's flat memory model, which is partly why they fell out of fashion. Capabilities mean having multiple types of memory. Might come back if partially-shared multiprocessors make a comeback.
- Production-Level Releases
That's kind of vague. Semantic versioning is a related concept. It's more of a tooling thing than a language thing.
- Semi-Dynamic Language
I once proposed this for Python. The idea was that, at some point, the program made a call that told the system "Done initializing". After that point, you couldn't load more code, and some other things that inhibit optimization would be prohibited. At that point, the JIT compiler runs, once. No need for the horrors inside PyPy which deal with cleanup when someone patches one module from another.
Guido didn't like it.
- Value Database
The OP has a good criticism of why this is a bad idea. It's an old idea, mostly from LISP land, where early systems saved the whole LISP environment state. Source control? What's that?
- A Truly Relational Language
Well, in Python, almost everything is a key/value store. The NoSQL people were going in that direction. Then people remembered that you want atomic transactions to keep the database from turning to junk, and mostly backed off from NoSQL where the data matters long-term.
- A Language To Encourage Modular Monoliths
Hm. Needs further development. Yes, we still have trouble putting parts together. There's been real progress. Nobody has to keep rewriting Vol. I of Knuth algorithms in each new project any more. But what's being proposed here?
- Modular Linting
That's mostly a hack for when the original language design was botched. View this from the point of the maintenance programmer - what guarantees apply to this code? What's been prevented from happening? Rust has one linter, and you can add directives in the code which allow exceptions. This allows future maintenance programmers to see what is being allowed.