Arc is underrated as an information management tool. There's something to be said for having a web framework that works out of the box. Rails is probably the only other framework that makes it as easy to "just make some forms that pass data around and run some code on that data." But not quite -- I haven't seen arc's closure-storing technique used in any other web framework.
The main issue that arc solves is that it gives you a full pipeline for managing "objects with properties" via the web. It's so flexible. I wrote a thread on how we're using it in production to manage our TPUs: https://twitter.com/theshawwn/status/1247570883306119170
You haven't been around long enough, at least under the same name, to remember when there was an implicit time limit on the comment reply page, inflicted by those stored closures silently timing out. Having long comments so often eaten that way was actually the specific thing that annoyed me into first installing It's All Text.
It's an interesting approach, as attempts to force statefulness on a stateless-by-design protocol go, but I don't know that I like how it scales.
My account was reset. I've been around since day two. :)
You're right, it has some downsides. But a lot of the time it simply doesn't matter. All the links at https://www.tensorfork.com/tpus are dynamic, and the speed you gain by being able to whip up a feature in 10 minutes is worth the pain of an occasional dead link.
On the other hand, I did some work for "deduplicating fnids": http://arclanguage.com/item?id=20996 which the site is using, so the links possibly last much longer than the early-HN links.
(Basically, we calculate the fnid key based on the lexical environment, rather than using a random ID each time the page is loaded. So each link gets a unique ID based on the code structure rather than a random ID. Meaning, instead of millions of random links to store, you end up with a few tens of thousands.)
It's short for function ID. If you want to route requests to specific closures, you have to have some sort of ID that you can send down to the user. Arc stores closures in a hash table keyed by random ID, but we use lexical structure plus lexical values (like username) to make the key deterministic. It greatly cut down on the number of closures that needed to be stored.
Basically, if you have a closure that does a certain action – e.g. editing a comment – Arc generates a new closure on every page refresh. That was the root cause of the "dead link problem" during the early days of HN. I reworked it to make the closure IDs deterministic.
If you walk the source code from the point of the closure up to the root, stick all the local variables in a list, and hash that list along with the actual source code, you end up with something that (a) has a very low probability of collision, and (b) is deterministic each time the page refreshes, unless the variables' values change. (E.g. if you store the current time in a local variable, the value changes, so the fnid will change as well, since that value is captured by the closure and therefore the closure has to do something different by definition.)
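Here's roughly what that looks like if you sketch it in Javascript instead of Arc (the function name and the captured-value object are just illustrative; the real thing walks the Arc source tree):

const crypto = require('crypto');

// The fnid is a hash of the closure's source text plus the lexical values
// it captures, so identical page state always produces the identical link.
function deterministicFnid(fn, capturedValues) {
  const h = crypto.createHash('sha256');
  h.update(fn.toString());                   // the closure's source code
  h.update(JSON.stringify(capturedValues));  // the values it closes over
  return h.digest('hex');
}

// Two page loads for the same user produce the same fnid; change a captured
// value (say, the current time) and the fnid changes with it.
const user = 'pg';
const fn = (req, res) => res.send(`editing as ${user}`);
const fnid = deterministicFnid(fn, { user });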
That sounds like a complicated fix! How many tens of minutes did it take? And how many tens of minutes did the bug cost, in effort spent on comments that got eaten instead of posted?
(I know there's no way to answer that last question, but that doesn't mean its answer is equal to zero.)
It was complicated, but at one point I was so in love with Arc that I wanted to give it a real shot at taking over the world. It seemed like a necessary change to make, so that the moment someone brought up dead links as a slight against arc, I could point to the change and say, "Already fixed!"
the speed you gain by being able to whip up a feature in 10 minutes is worth the pain of an occasional dead link
This is everything that is wrong with the software industry, summarized in one sentence. Speed gain enjoyed by developers is paid for by the users in pain.
It used to be that developers would go through tremendous amounts of pain just to squeeze out a few instructions from a UI drawing routine in order to make it just a little bit smoother for the users. Now those developers are derided as "greybeards."
I'm sorry, I probably could have found a less harsh and cynical way of writing all that, but I feel like the Internet is getting worse every day and there's not enough urgency among tech people.
Closures on a server are a powerful way of representing data flows. But they come at a cost: the links expire after some time. How do you strike a balance?
The simplest way is to put in the extra time to make everything into a persistent link. But, that's equivalent to removing all the benefits of lexical scoping. If you've ever created an inner function before, you know how powerful that technique can be. You can encode the entire state of the program into closures -- no need to store things in databases. Want a password reset link? Spin up a closure, email the link, done. Literally identical to storing a reset token in a database, except there's no database.
Another solution is to fix the root problem. Does the closure really need a random ID every time the page refreshes? The closure links die because they have to be GC'd periodically, to keep memory usage down. Even if you cache the page for 90 seconds, that's still 960 refreshes per day for logged-out users. Then if you have a few hundred regular users, that's at least another factor of two. And certain pages might create hundreds of closure links each refresh, so it quickly gets out of hand.
Ironically, the solution was emacs -- in emacs, they store closures in a printable way. A closure is literally a list of variable values plus the function's source code. That got me thinking -- why not use that as the key, instead of making a random key each time the page refreshes? After all, if the function's lexical variables are identical, then it should produce identical results each time it's run. No need to create another one.
That's what I did. It took a week or so, which is a week I'll never get back for building new features. But at least users won't have to deal with dead links anymore.
Clever readers will note a theoretical security flaw: an attacker might be able to guess your function IDs if they knew the entire state of the closure + the closure's source code (which is the default case for an open source project). That might give them access to e.g. your admin links. But that's not an indictment of the technique; it's easily solved by concatenating the key with a random ID generated at startup, and hashing that. I'm just making a note of it here in case some reader wants to try implementing this idea in their own framework.
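In Javascript terms, the fix is just one extra ingredient in the hash -- again a sketch, not the actual Arc code:

const crypto = require('crypto');
const SERVER_SECRET = crypto.randomBytes(32);  // generated once at startup

// Same deterministic key as before, but unguessable without the secret.
function saltedFnid(fn, capturedValues) {
  return crypto.createHash('sha256')
    .update(SERVER_SECRET)
    .update(fn.toString())
    .update(JSON.stringify(capturedValues))
    .digest('hex');
}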
The closure technique has nontrivial productivity speedups (that I think someone will rediscover some years from now). I hope the idea becomes more popular over time.
How about a keepalive from the client side: little bit of JavaScript that somehow tells the server that the session is still alive, so don't blow away the closure or continuation.
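Roughly like this, say -- hypothetical names, and it assumes the server stores each closure alongside a last-seen timestamp that the reaper checks before discarding anything:

// Client side: a small script in the page pings the server every minute.
// FNID here is whatever id the server baked into the page when rendering it.
setInterval(() => {
  fetch(`/keepalive?fnid=${encodeURIComponent(FNID)}`);
}, 60 * 1000);

// Server side (Express-style): bump the timestamp so the closure stays alive.
app.get('/keepalive', (req, res) => {
  const entry = closures[req.query.fnid];  // assumes entries look like { fn, lastSeen }
  if (entry) entry.lastSeen = Date.now();
  res.sendStatus(204);
});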
Since this is getting a surprising amount of interest, let me sum up the technique here. It's really not hard to implement it in Javascript using Express.
1. inside of your express endpoint, create a closure that captures some state. For example, the user's IP address.
EDIT: I updated this to capture the date + time the original page was loaded, which is slightly more compelling than a mere IP address.
app.get('/', function (req, res) {
  let ip = req.headers['x-forwarded-for'] || req.connection.remoteAddress;
  let now = new Date();
  let date = now.getFullYear() + '-' + (now.getMonth() + 1) + '-' + now.getDate();
  let time = now.getHours() + ":" + now.getMinutes() + ":" + now.getSeconds();
  // The closure captures ip, date, and time from this particular page load.
  let fn = (req, res) => {
    res.send(`hello ${req.query.name}. On ${date} at ${time}, your IP address was ${ip}`)
  }
  // ... store fn and send the link; see step 2 below ...
})
2. insert that closure into a global hash table, keyed by a random ID.
const crypto = require('crypto');  // for generating random IDs
const g_fnids = {};                // global table of stored closures

app.get('/', function (req, res) {
  let fn = ...  // the closure from step 1
  let id = crypto.randomBytes(16).toString('hex');  // random ID
  g_fnids[id] = fn;
  res.send(`<a href="/x?fnid=${id}&name=bob">Say hello</a>`);
})
3. create an endpoint called /x which works like `/x?fnid=<function id>&foo=1&bar=2`. Use <function id> to look up the closure. Call the closure, passing the request to it:
app.get('/x', function (req, res) {
  let id = req.query.fnid;
  let fn = g_fnids[id];
  if (!fn) return res.send('Unknown or expired link.');  // the closure was GC'd or never existed
  fn(req, res)
})
Done.
Congratulations, your closure is now an express endpoint. Except you didn't have to name it. You can link users to it like `<a href="/x?fnid=<function id>&name=bob">Say hello</a>`.
The reason this is a powerful technique is that you can use it with forms. The form target can be /x, and the query params are whatever the user types into the form fields.
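For example, continuing the sketch above (same g_fnids table and /x endpoint; the /ask route is just for illustration):

app.get('/ask', (req, res) => {
  const askedAt = new Date().toISOString();  // captured by the closure below
  const id = crypto.randomBytes(16).toString('hex');
  g_fnids[id] = (req2, res2) => {
    res2.send(`You typed "${req2.query.answer}" on a form served at ${askedAt}`);
  };
  // The form's target is /x; the hidden fnid routes the submission to the closure.
  res.send(`<form action="/x" method="get">
              <input type="hidden" name="fnid" value="${id}">
              <input name="answer">
              <button>Submit</button>
            </form>`);
});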
I bet you already see a few interesting use cases. And you might notice that this makes scaling the server a little more difficult, since incoming requests have to be routed to the server containing the actual closure. But in the meantime, you now have "inner functions" in your web framework. It makes implementing password reset functionality completely trivial, and no database required.
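For instance, password reset in this style is just one more closure. (sendEmail and setPassword are hypothetical helpers here, and in real life you'd POST the new password rather than put it in the query string.)

// The closure itself plays the role of the reset token: it remembers who asked.
app.get('/request-reset', (req, res) => {
  const user = req.query.user;
  const id = crypto.randomBytes(16).toString('hex');
  g_fnids[id] = (req2, res2) => {
    setPassword(user, req2.query.newPassword);  // hypothetical helper
    res2.send('Password updated.');
  };
  // In practice the emailed link would lead to a form like the one above,
  // with fnid in a hidden field and newPassword typed by the user.
  sendEmail(user, `https://example.com/x?fnid=${id}`);  // hypothetical helper
  res.send('Check your email.');
});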
If it seems slightly annoying to use – "I thought you said this was a productivity boost. But it's annoying to type all of that!" – lisp macros hide all of this boilerplate code, so there's zero extra typing. You can get lisp macros for Javascript using Lumen lisp: https://github.com/sctb/lumen
Even without macros, though, I bet this technique is shorter. Suppose you had to store the date + time + IP address somewhere. Where would you put it? I assume some sort of nosql database like firebase. But wouldn’t that code be much longer and more annoying to write? So this technique has tremendous value, and I’m amazed no one is using it circa 2020.
This is really funny: back in the Warcraft 3 days, we used to do the same thing inside its scripting language --- to attach some data to a timer, we would exploit the fact that a timer is in fact just a 'void *' underneath, so the pointer address gave us a unique ID. We would stash the data associated with the timer in a global hash table. Then, in the timer's callback, we would read the data back out of the global hash table!
Your exposition took me a trip down memory lane to middle/high school. Thank you for this :)
Well, this is certainly the coolest thing I've read today. I've been trying to grok closures and this helps a bit. Is there a reason not to use the global hash table itself to store the state instead of a closure? This seems to be trading a database for memory. It also seems harder to interrogate: if I want to go in and see what's currently outstanding, instead of going to Firebase I'll have to walk the hash table and inspect the contents of each function.
I think I may be missing something that someone who's actually worked with Lisp can see. To me, closures, recursion, and functional programming are cool, but I can do everything it's showing off using the standard fare of loops and databases.
The biggest difference is the “...” assignment to fn. The idea is similar to AWS Lambda — write functions, store those functions, and then call them later when you need them. I've minimal Lisp experience, but from my perspective, a closure is a function you can store in a variable, together with the scope it was defined in — the set of arguments and variables it uses, which often (but not always) includes variables from the parent scope, usually only when the function actually references them.

Because each closure needs its own independent scope, mutable values generally have to be kept or copied per closure — alternatively, you can use immutable data structures, which are cheaper to share. The big difference then between closures and other styles of code often comes down to how much immutability is used, and whether you call functions that assume or share state (more OO, or non-FP), or pass functions along with their state to other functions (FP — though composability and other properties matter too when defining FP; this is a simplification). This is a bit of a vague answer, perhaps others can chime in with a better one. And if you're not careful, FP can introduce problems too, though that happens more often with distributed, multi-threaded, or recursive programs, which can be hard to write in a non-FP style as well.
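A tiny JavaScript example of what I mean by capturing variables from the parent scope:

function makeCounter() {
  let count = 0;           // lives in makeCounter's scope
  return () => ++count;    // the closure keeps count alive after makeCounter returns
}

const next = makeCounter();
next();  // 1
next();  // 2 -- the captured variable persists between calls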
It's not all that similar to AWS Lambdas in concept or in execution. Those are stateless; to a very good first approximation, they're just a single-route web server with all the boilerplate abstracted away, and that starts up a fresh instance to handle each request and is shut down again immediately after.
What 'sillysaurusx describes is much more similar to what, in Scheme and elsewhere but these days mainly there, is called "continuation-passing style". It's a way of pausing a partially completed computation indefinitely by wrapping it up in a function that closes over the state of the computation when you create it, and calling that function later to pick up from where you left off when you're ready to proceed again.
I suppose you could maybe do that with an AWS Lambda, but because the technique relies strongly on the runtime instance staying around until the computation finishes, it would probably get expensive. Lambdas aren't priced to stay running, after all.
As a side note, it's worth mentioning that the "AWS Lambda" product, which whatever its virtues isn't actually a lambda, derives its name from the lambda calculus, where I believe the concept of anonymous first-class functions originates. I don't recommend reading about the lambda calculus itself unless you're up for a lot of very heavy theory, but it's worth knowing that, especially in the Lisp world and realms adjacent, you'll often see the term 'lambda' used in a sense which has nothing to do with the AWS product, but rather refers to a form of abstraction that relies on defining functions which retain access to the variable ("lexical") scopes in which they were created, even when called from outside those scopes. Javascript functions have this property, which is why they're capable of expressing the technique 'sillysaurusx describes, and it gives them a lot of other useful capabilities as well.
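To make the pause-and-resume idea concrete, here's a rough JavaScript sketch (the names are purely illustrative):

// checkout() gets partway through a computation, then hands back a
// continuation that closes over everything computed so far.
function checkout(cart) {
  const total = cart.reduce((sum, item) => sum + item.price, 0);
  // Pause here: return a function that finishes the job later.
  return function resume(paymentToken) {
    return { charged: total, via: paymentToken };
  };
}

const resume = checkout([{ price: 5 }, { price: 7 }]);
// ...time passes, the user comes back with payment details...
resume('tok_123');  // picks up where we left off: { charged: 12, via: 'tok_123' }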
True. Good distinctions. To re-iterate the above, the approximation to AWS Lambda would require dynamic AWS Lambda functions -- as in code that creates a Lambda with specific state embedded in it -- then tracks each of those by their unique Lambda identifier and ... yeah, that's where this breaks down because it's not all that similar to Lambda if the best use for a Lambda is repeated invocations of the same code. And Lambda IDs presumably aren't based on a hash of their contents and variables the way this is. But dynamic AWS Lambda functions are possible, so there's that. You could write this in Lambda, it just might be expensive if API calls to create and destroy one-time Lambdas are expensive enough. It's a lot cheaper and faster to build functions and store references to them in a hash table in memory.
Another similarity to this hashing of a function's scope is memoization: caching output based on input. You hash a function's inputs and store, under that hash, a copy of the output from running the function with those inputs. Then the next time you see the same inputs, you can hash them and skip re-running the function. You have to be sure the function has no side effects and no behaviour or inputs outside what the memoization hash captures, though. "Pure" functions are best for this use case.
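A minimal memoizer along those lines, keying on the serialized arguments (so it only suits pure functions whose inputs survive JSON round-tripping):

function memoize(fn) {
  const cache = new Map();
  return (...args) => {
    const key = JSON.stringify(args);          // hash of the inputs, in effect
    if (!cache.has(key)) cache.set(key, fn(...args));
    return cache.get(key);
  };
}

const slowSquare = (n) => n * n;               // stand-in for something expensive
const fastSquare = memoize(slowSquare);
fastSquare(4);  // runs slowSquare, caches 16
fastSquare(4);  // returns the cached 16 without re-running it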
Memoization is usually preferable if you can do it, sure. But you can't memoize a continuation, because what it expresses is a computation that has yet to complete and produce the result you'd need in order to memoize. And the use of the g_fnid hash table doesn't qualify as memoization, either, because the keys aren't arguments to the function that produced the values; what it actually is is a jump table, cf. https://en.m.wikipedia.org/wiki/Branch_table#Jump_table_exam...
Thanks for your reply. I ended up looking for a bit more on continuations from the perspective of JS Promises and found https://dev.to/homam/composability-from-callbacks-to-categor... which was a pretty easy-to-follow read if you take the time to understand the JS. There might be better references to continuations elsewhere; this was just one of the first I found.
It works a lot better in a proper Lisp, where the REPL and debugger are first-class citizens. In Javascript, you can do it, but it's a dancing bear at best; as you note, the observability is poor to nil without heroic effort, and scalability's a problem too.
I mean, I can tell you right now why I'm not using it circa 2020, nor do I expect I shall in future. For sure, it's clever and it's elegant, a brilliant hack - but it's not durable, and in my line of work that counts for more.
On the one hand, as you note, this can't scale horizontally without the load balancer knowing where to route a request based on the fnid, which means my load balancer now has to know things it shouldn't - and that knowledge has to be persisted somewhere, or every session dies with the load balancer.
On the other hand, even if I teach nginx to do that and hang a database or something off it so that it can, with all the headaches that entails - this still can't scale horizontally, because when one of my containers dies for any reason - evicted, reaped, crashed, oomkilled because somebody who doesn't like me figured out how to construct a request that allocates pathologically before I figured out how to prevent it, any number of other causes - every session it had dies with it, because all that state is internal to the runtime instance and can't be offloaded anywhere else.
So now my cattle are pets again, which I don't want, because from a reliability standpoint shooting a sick cow and replacing it with a fresh one turns out to be very much preferable to having to do surgery on a sick or dying pet. Which I will have to do, because, again, all the persisted state is wrapped up tight inside a given pod's JS runtime, so I can't find out anything I didn't know ahead of time to log without figuring out how to attach a debugger and inspect the guts of state. Which, yes, is doable - but it's far from trivial, the way Lisps make it, and if the pod dies before I can find out what's wrong or before I'm done in the debugger, I've got a lot less to autopsy than a conventional approach would give me. And that's no less a problem than the rest of it.
Yes, granted, the sort of software you describe is incredibly elegant, a beautifully faceted gem. It's the sort of thing to which as a child I aspired. But as it turns out, here thirty years on, I'm not a jeweler, and the sort of machine my team and I build has precious little need for that sort of beauty - and less still for the brittleness that comes with it. Durability counts for much more, because if our machines break and stay broken long enough, the cost is measured in thousands or millions of dollars.
That's not hyperbole, either! Early one morning last November, I ran two SQL queries, off the top of my head, in the space of two thirds of a minute. When all was eventually said and done, the real value of each of those forty seconds, in terms of revenue saved, worked out to about $35,000 - about $1.4 million, all told, or seven hundred thousand dollars per line of SQL. And not one of the people who gave us all that money ever even knew anything had been wrong.
Granted that a couple of unprecedented SQL queries like the ones I describe, written on nothing but raw reflex and years of being elbow deep in the grease and guts of that machine and others like it, constitute a large and blunt hammer indeed. But - because we built that machine, as well as we knew how, to be durable and maintainable above all else - in a moment which demanded a hammer and where to swing it, both were instantly to hand. In a system built as you describe, all gleaming impenetrable surfaces between me and the problem that needed solving right then, how could I have hoped to do so well?
Only through genius, I think. And don't get me wrong! Genius is a wonderful thing. I wish I had any of it, but I don't. All I know how to be is an engineer. It's taken me a long time to see the beauty in that, but I think I'm finally getting a handle on it, these days. It's a rougher sort of beauty than that to which I once aspired, that I freely concede, and the art that's in it is very much akin to something my grandfathers, both machinists and one a damned fine engineer in his own right, would have recognized and I hope might have respected, had they lived to see it.
Do you know, one of those grandfathers developed a part that went on to be used in every Space Shuttle orbiter that ever flew? It wasn't a large part or a terribly critical one. You wouldn't think much of it, to look at it. But he was the man who designed it, drew it out, and drew it forth from a sheet metal brake and a Bridgeport mill. He was the man who taught other men how to make more of them. And he was a man who knew how to pick up a hammer and swing it, when the moment called for one. He was possessed of no more genius than am I, and his work had no more place in it for the beauty of perfectly cut gemstones than does mine. But he was a smart man, and a knowledgeable man, and not least he was a dogged man. And because he was all those things, my legacy includes a very small, but very real, part in one of the most tangible expressions of aspiration to greater, grander things that our species has ever yet produced. Sure, the Space Shuttle was in every sense a dog, a hangar queen's hangar queen. But, by God, it flew anyway. It 'slipped the surly bonds of Earth, and touched the face of God' - and next time, we'll do better, however long it takes us. And, thanks to my grandfather's skill and effort, that's part of who and what I am - and there's a part of me in that, as well.
No gemstone that, for sure! It has its own kind of beauty, nonetheless - the kind that leaves me feeling no lack in my paucity of genius, so long as I have an engineer's skill to know when and how to swing a hammer, and an engineer's good sense to leave myself a place to land it. If that was ever in doubt, I think it can only have been so until that morning last November, when I saved ten years' worth of my own pay in the space of forty seconds and two perfect swings of exactly the right hammer.
There's a place for the beauty of gemstones, no doubt - for one thing, in seeing to it this very long comment of mine isn't lost to the vagaries of a closure cache. And I appreciate that, for sure! It'd be a shame to have wasted the effort, to say nothing of any small value that may cling to these words.
But there's a place for the beauty of hammers, too.
The vast majority of websites don't need to scale beyond what a single computer can do, especially with an efficient runtime. You're right that if you're building Wikipedia or Amazon you need to scale horizontally. But most sites aren't Wikipedia or Amazon.
It's true that JS systems like Node aren't really designed for this kind of thing, although they could have been. Arc is.
Yup. I somehow became a graybeard. I really didn't fit in at my last three gigs doing "backend" work.
I always play to win, so try to understand why & how I failed.
My current theory:
I had good successes doing product development. Shipping software that had to be pretty close to correct.
Today's "product development" is really IT, data processing. Way more forgiving of stuff that's not quite right. Often not even close to right. (Guessing that about 1/3rd of the stuff I supported didn't actually do what the original author thought it did, and no one was the wiser, until something didn't seem quite right.)
One insightful coworker said it best: "I learned to do everything to 80% completion."
My observation is that most teammates created more bugs than they closed. Maybe incentivized by "agile" methods' notion of "velocity". And they were praised for their poor results.
Whereas my tortoise strategies nominally took longer. So I had fewer, larger commits. Way fewer "points" on the kanban board. Created far fewer lines of code.
(When fixing [rewriting] other people's code, mine was often 50% to 80% smaller. Mostly by removing dead code and deduplication.)
I was able to bang out new stuff and beat deadlines when I was working solo.
I think the difference between solo and team play is mostly due to style mismatches. It's very hard for me to collaborate with teammates who are committing smaller, more frequent, often broken, code changes.
Anyway. That's my current best guess at what's happening to this graybeard.
More optimistically...
I'm very interested in the "Test Into Prod" strategies advocated by the CTO from Gilt (?). It's the first QA/Test strategy (for an "agile" world) that makes any kind of sense to me. So I think I could adapt to that work style.
(I served as SQA Manager for a while. It's hard to let go of those expectations. It's been maybe 20 years since I've seen anyone doing actual QA/Test. I feel bad for today's business analysts (BAs) who get stuck doing requirements and monkey style button pushing. Like how most orgs functioned in the 80s.)
I see more gray in my beard every morning. And the thing about "...to 80% completion" is that the first 80% of value is captured in the first 80% of effort, and the last 20% of value in the other 80% of effort. It's important to know when to follow that ROI graph past the knee, for sure. But it's just as important to know when not to.
(I mind me of a time a few years back when I was surprised to learn that the right method of exception handling, for a specific case in some work I was doing on a distributed system, was none - just letting it crash and try again when the orchestrator stood up a fresh container. It felt wrong at first; ever before I'd have instead gone to a lot of painstaking effort to reset the relevant state by hand, and my first instinct was to do the same here. But crashing turned out to be the right thing to do, because it worked just as well and took no time at all to implement.)
HN still uses it extensively for its more obscure operations—the ones that don't need to scale. We switched all the most common handlers to regular links years ago. It's hard to remember now that the most common complaint here (by far) used to be about those "Unknown or expired link" messages. In fact, you can tell from the histogram of https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... when it was that we fixed this—six years ago—because that's when the complaints slow down to a trickle.
As long as you don't use it for things that you need a lot of, it's a great approach that holds up well. The primary downside is that they all get discarded when we restart the server process. Another downside is that they don't work well with the back button.
Edit: I just remembered another issue with them. Sometimes browsers pre-visit these links, which 'uses' them so that by the time the user goes to click on it, it has already expired. (Yes, this is an issue with using GET for these.)
It's way easier in elisp than in racket, but the idea is to write out the function + the closure variables to disk.
The hard part is that you'd have to fix up object references. (If two closures both capture x, and x is a hash table, then the values of x for both closures should be the same hash table after a reboot.) But it's doable.
And of course, if it's possible to write the closures to disk, that means you can write them out to a database shared by multiple Arc webservers. As long as the state is also shared (perhaps the values of the variables can also be stored in the database?) then this means the technique can horizontally scale, just like any other.
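Here's the shape of it in Javascript, with a Map standing in for the shared store and ignoring the object-reference fixups mentioned above:

// Store the closure as data -- its source text plus the values it captured --
// rather than as a live function, so any server can rebuild it later.
const store = new Map();

function saveClosure(fnid, source, env) {
  store.set(fnid, JSON.stringify({ source, env }));
}

function loadClosure(fnid) {
  const { source, env } = JSON.parse(store.get(fnid));
  return eval(source)(env);  // rebuild the function and re-close over the saved values
}

saveClosure('abc123',
            '(env) => (name) => `hello ${name}, you are logged in as ${env.user}`',
            { user: 'bob' });
const fn = loadClosure('abc123');
fn('world');  // "hello world, you are logged in as bob"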
I spent like a year trying to brainstorm ways of showing that Arc can go toe-to-toe with any of the popular frameworks, with no downsides. "Reboots wipe the closures" implies "Arc can't scale horizontally," which would be a serious limitation in a corporate setting. But in principle I think you could write closures to disk.
That would also result in a funny situation: if closures persist forever, it means that a closure could potentially be activated years after it was first stored. So it'll run with years-old code, rather than the latest version. :) But if people are using global names for functions, then it'll just call the latest versions of those functions, which will probably work fine in most cases.
I know I've seen other work on serializing closures, but it was years ago and it just left me with the impression "hard". Maybe one could get a subset of it working nicely for the cases that an application like HN needs.
Some arc code I wrote back in the day, for generating textual descriptions from structured data using a web frontend, is still in production at a previous company of mine (as far as I know). It was indeed a useful tool.
It's a shame arc didn't have persistent data structures (besides alists :-) and a native hashmap type though.
I'm sure somebody has written an ADVENTURE front-end for a LISP debugger.
You are in a CONS cell. There is a CAR to the left, leading off to another CONS, and a CDR to the right, containing NIL. The garbage collector briefly enters the CONS, marks it, quickly glances at the NIL in the CDR, smiles quietly to itself, and looking relieved, hurriedly sweeps away through the CAR.
>GIVE CONS TO CDR
You give the CONS cell to its CDR, creating a circular list.
I am ready to be corrected, but I'm fairly sure there have been a few other continuation-based web frameworks; Seaside in Smalltalk was the one that made the idea popular, if I recall correctly. "Href considered harmful" comes from that.
problem with those frameworks that "just work" is that they get old pretty darn fast.
it handles all your sql and auth cookies? too bad, now there's an easy way for every script kiddie to log in as admin or guess cookies without the security header du jour.
It's all nice and all when it's being actively updated. But arc, rails, phoenix-ecto, node/react, drupal, spring, etc. all get old pretty soon when the core maintainers lose focus, and then instead of just keeping an eye on the latest best practices and implementing them yourself, you have to dive deep down into years of feature creep and bad coding practices to do the little thing you need to keep things going.
i think seaside (a smalltalk web framework) pioneered the continuation-based technique, though it could have been around earlier. http://www.seaside.st/