"The Step Functions free tier includes 4,000 free state transitions per month." .. "$0.025 per 1,000 state transitions thereafter"
So Amazon has created a new calling convention, where conditional logic now also requires a context switch and JSON serialization. Then they charge you for the call .. $tdcall?
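For a sense of scale, here is a rough sketch of what the quoted pricing works out to. It assumes billing is prorated per transition rather than rounded up to the next block of 1,000, which the quote does not settle; the function name is made up for illustration.

```python
# Figures taken from the pricing quote above.
FREE_TRANSITIONS_PER_MONTH = 4_000
PRICE_PER_1000_USD = 0.025


def monthly_cost_usd(transitions: int) -> float:
    """Estimated monthly bill for a given number of state transitions.

    Assumption: cost is prorated per transition, not rounded up per 1,000.
    """
    billable = max(0, transitions - FREE_TRANSITIONS_PER_MONTH)
    return billable / 1000 * PRICE_PER_1000_USD


if __name__ == "__main__":
    for n in (4_000, 1_004_000, 100_004_000):
        print(f"{n:>12,} transitions -> ${monthly_cost_usd(n):,.2f}")
```

So every branch, retry, or hop in a workflow is a metered event; a workflow with ten states run a million times a month is roughly ten million transitions.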
This will certainly have some useful applications; maybe someone will build an inexpensive data processing pipeline on top of it. I have seen many visual workflow tools through the years, and most simplify a complex underlying process, but these step functions and state transitions model basic internal control flow with complex abstractions and little benefit other than retries with backoff.
The pure functional immutable nature of the data flow is ideally nice but tainted by the JSON. The parallelism is interesting, but it seems bolted on instead of a more powerful central part of the design.
Why do something like this in JSON? I don't work in JavaScript, but surely JavaScript itself would have been far simpler to use. Restrict it to a subset if you have to, but this is absurd:
It looks like they want to build a user interface on top of it. It is very difficult to make a user interface that reads and writes a regular programming language, as there are multiple ways to write the same expression (with spaces, without spaces, with newlines, etc.).
It's one thing to serialise your states to JSON, quite another to try to recreate JavaScript or any other low-permission language in it.
I know _why_ they did it, and I bet a fair few pure managerial types will fall for the bait and commission a project or a thousand on it, locking themselves in to Amazon's absurd pricing model for production systems. But everyone's got to make a buck, eh.
Do you remember when XML was all the rage, and we had silly implementations of pretty much everything using XML? For no good reason other than the fact that it was XML. Then everybody moved to JSON because it was simpler.
Well, it seems history is being repeated. Just because something exists that a lot of people use doesn't mean you need to build everything on top of it.
Yeah, we still have a few critical components that serialise to xml rather than json, although they have mostly been replaced.
What's new this time is the pricing model. It used to be we could run what we wanted on our own hardware on our own time. SaaS is new, mostly impossible to pirate (hacking AWS keys aside), and priced to make your eyes water.
I have been working on a moon-shot project for more than a decade that seeks to model distributed systems as if they were digital logic circuits. One of the natural areas of interest is of course FSMs, in particular declarative specifications of FSMs. I continue to believe (although it's a wildly unpopular notion among some of my machine learning friends) that this obsession with the mechanics of declaring design intent with models and then synthesizing runtime code will fundamentally transform software engineering over time. But declarative FSMs aren't enough to make it practical, IMHO.
Several things I believe are actually essential to make use of any of these ideas at scale:
- There needs to be an ad-hoc extensible standard for notating serialized data with markers, tags, semantics, metadata (whatever you care to call it). It is not practical to do unsupervised feature extraction on internal message streams. And, it's _insanity_ to write/test/maintain custom validation/normalization logic.
- Given the above, FSM declarations must be encoded with labels (as above) so that generic code can easily effect interop.
- Small FSMs are reasonably easy to comprehend. But very few systems can be modeled with simple FSMs. Rather, real systems can be modeled as complex directed graphs where edges represent the flow of observable state from one FSM to another (vertices represent individual FSMs).
- Given that real systems can be modeled using non-trivial graph models of FSMs (as above), building reusable components by splicing and dicing the graph up is logically possible. But this is not something that mortals can do by hand. Considerable tooling is required to make it practical to design systems like this.
If you're interested in these topics, and want to help, look me up @Encapsule.
UML Statecharts are definitely an interesting model, being HFSMs, and thus modular/extensible. But I think there is a lot to be said about context and application - this kind of thing needs to have a revolutionary 'right place, right time' opportunity.
Kind of like Hypertext and Hypermedia, which languished as an academic pipe dream for decades with occasional commercial moments of brilliance (Hypercard), until Tim Berners Lee figured out the right mix.
I have a project dating back maybe ten or 15 years that could read the text format (XML, IIRC) of Dia drawings and execute them.
The main problems included the diagrams being very hard to understand visually, very ugly, and subject to all sorts of edge-case errors when running. Basically, it was easier to write code and then compile UML from that, for anyone crazy enough to want it.
The crux of the problem is verbosity. Once systems reach a reasonable level of complexity, the UML diagrams can cover the walls of a large room.
I tend to agree with you on this. My focus has been on the application of declarative Hierarchical FSMs, Behaviour Trees, and/or Hierarchical Task Networks (using automated planning) as a way of describing FSMs declaratively while still handling the state explosion problem. The intent is to enable better integration and interoperability on the Web - basically, getting rid of the centralized monopolies (Twitter, Facebook, etc.) on "write" functionality on the Web. ("Read" functionality, i.e. web crawling / HTTP GET / Google, is also monopolized, but that's less due to architectural problems and more to economic ones.)
I have given a couple of keynotes on this topic at the W3C and RESTfest over the years, but just haven't done a lot of the grunt work since I have a day job.
I have felt this train of thought could be useful for a general purpose approach to software engineering beyond distributed systems interop. Unfortunately this has been a hobby horse of mine for about 10 years that I don't have a lot of time to dedicate to....
> I have felt this train of thought could be useful for a general purpose approach to software engineering beyond distributed systems interop
One of the best articles I've read in recent years on the topic is 'On the Industrial Adoption of Model Driven Engineering. Is your company ready for MDE?': http://www.uajournals.com/ijisebc/journal/1/4.pdf
> Unfortunately this has been a hobby horse of mine for about 10 years that I don't have a lot of time to dedicate to....
It's a fun horse to ride, if a bit wild and tiring.
By chance, do you want to buy some bubble sort for $5/month? That's less than .005 per sort. I also have some 24-bit IEEE floating points as a service (IEEEFPaSS), on sale as well.
Hashtables are by far not the only way to associate a number of keys with respective values. They have some nice properties and some rather ugly properties (memory usage, growing/shrinking, iteration, seeding / table poisoning). Most RDBMS (for example) tend to use trees instead for indices. And then there's the whole category of tries.
Interesting that they're using JSONPath, which isn't even specified formally anywhere. The only other major implementor that I know about is Kubernetes, which has some odd extensions for templating. (JSONPath itself, of course, isn't very well designed in the first place.)
We're working on a new variant of JSONPath that we're hoping to publish as a formal, comprehensive specification. It's essentially a superset of JSONPath with some syntax warts fixed (like the need to start with $). I wrote a little about it on HN a week ago [1].
Perhaps the most formally specified JSON-addressing dense declarative syntax is JSON Pointer (RFC 6901), but it's very limited: it only has exact index selectors, an end-of-array selector, and an exact object name selector. Still, given how JSON-Patch (RFC 6902) depends on it, it may be worthwhile to pursue a notation that extends it formally.
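To show just how small JSON Pointer is, here is a minimal sketch of a resolver written from my reading of RFC 6901. The function name is made up, and error handling is only roughly what a real library would do.

```python
from typing import Any


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer against parsed JSON (dicts and lists)."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = doc
    for raw in pointer[1:].split("/"):
        # Unescape '~1' before '~0', the order the RFC prescribes.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if token == "-":
                # '-' names the (nonexistent) element after the last one.
                raise IndexError("'-' does not refer to an existing element")
            valid = token.isascii() and token.isdigit()
            if not valid or (len(token) > 1 and token[0] == "0"):
                raise ValueError(f"invalid array index: {token!r}")
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(f"cannot descend into a scalar with {token!r}")
    return current


if __name__ == "__main__":
    doc = {"foo": ["bar", "baz"], "a/b": 1, "m~n": 8}
    print(resolve_pointer(doc, "/foo/1"))
    print(resolve_pointer(doc, "/a~1b"))
```

That is the whole addressing model: exact object names and exact array indices, with no wildcards, filters, or slices, which is why it is easy to specify and too limited for the kind of selection JSONPath is used for.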
Remember when we used XML, and then people started making DSLs in XML and XML was the worst and too 'heavy' and having to write schemas was enterprise and awful and we are totes using lightweight schema free sexy JSON now?
I wonder what the new thing to replace enterprise JSON will be.
It's still going to be XML. JSON is fairly limited, it's literally the serialization format for JavaScript variables, and that one size does not fit all.
Yes, Ansible supports conditionals and loops in its playbooks. Ansible 2.0 also introduced blocks, which are effectively a try-catch-finally in disguise.
This is interesting; it has a lot of similarities (not surprisingly) with the state machine we build workflows on in our bot platform, although of course ours is specifically fixed around chat/messages as a key interaction point.
Major differences that I can see are we enable multiple functions to be sent per state, and that the output data from any state is referenceable by any other state, not just passing it down in turn through the states.
We support fallback states, but in a different way, and we don't support the retry concept directly within the state language itself; it has to be built as a set of states that loop to attempt a retry.
We don't support parallel stages, but do support branches, and remerging of those branches.
Probably the final difference I can see, is one of our options when running a function allows you to actually append additional states to the machine during the runtime process.
People needed a state engine for sewing Lambda functions together. For example, try doing retry with exponential back-off in Lambda. You quickly run into a number of problems that are difficult without an execution context outside the Lambda itself.
Step Functions give you this external context for doing retry, conditional trigger of downstream functions, parallel trigger of additional functions and more. Execution time of a state machine can last for up to a year, so this also gives you a way to do more than 5 minutes of work at a time.
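A rough sketch of the orchestrator side of that, i.e. the loop that has to live outside the function being retried. The parameter names mirror the `IntervalSeconds`, `MaxAttempts`, and `BackoffRate` retry fields of the state language; the helper names and the idea of passing in a `sleep` function are my own assumptions, not how the service is implemented.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_delays(interval_seconds: float = 1.0,
                 max_attempts: int = 3,
                 backoff_rate: float = 2.0) -> List[float]:
    """Wait before each retry: interval * rate**n for n = 0 .. max_attempts - 1."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


def run_with_retry(task: Callable[[], T],
                   delays: List[float],
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call task; on failure wait the next delay and try again.

    The wait happens here, in the caller, so the task itself never has to
    stay alive (and be billed) while backing off.
    """
    for delay in delays:
        try:
            return task()
        except Exception:
            sleep(delay)
    return task()  # final attempt; any exception now propagates


if __name__ == "__main__":
    print(retry_delays(interval_seconds=2, max_attempts=4, backoff_rate=2.0))
```

Inside a single Lambda invocation the equivalent loop would burn execution time while sleeping and would be cut off by the time limit, which is the problem described above.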
They needed a syntax that was easy to transform into usage of other Amazon resources. I'm guessing JSON was by far the most straightforward for them, not to mention that they've been using the system themselves for quite a while. But I'm just guessing.
You wait, Google is going to create one in XML, then Apple is going to invent a cool minimalistic pseudo-Lisp that doesn't require any brackets or colons. Then Windows is going to try to create one that only runs on .NET. Finally, someone will create a format that can be read by all of them, and someone is going to implement another LISP in that format.
The Apple one attempts to use natural language, the Google one gets discontinued after it becomes somewhat well-known with a cult following, and Microsoft tries to push a competitor until after the Google one fails, at which point they adopt the standard even though their platform store is filled with junkware and no one cares anymore.
Meanwhile the AWS service is still there plugging away, like SimpleDB, because core infra in AWS was built on it and they don't want to anger customers.
"This document describes a JSON-based language used to describe state machines declaratively. The state machines thus defined may be executed by software. In this document, the software is referred to as “the interpreter”."
Tooling. IDEs and special languages to generate the files for the state transitions, and the body of the lambdas themselves. (reading this kind of made me want to go write a little haskell dsl)
If they're a little undisciplined, they'll probably add stuff to implement counting and comparison directly, to put a hard limit on loops.
I'd also guess an addition of a couple special tasks, perhaps append to log in s3 bucket and continue, that perhaps come with a discount.
There are about 3 million old flowchart tools out there, any feature you see tacked on is a candidate.
Tangentially, other organizations will be inspired by this, and implement their own language in json, but this time they'll do it "the right way" then you'll get a working group to try to reconcile all the competing standards.
Or maybe not. kind of what happened with XML though.
In explaining the choice of JSON as the description language for state machines Tim Bray briefly said, "I couldn't find a good reason not to express it as JSON and in this day and age, you need a good reason for it to not be JSON."
He also mentioned that because it was a formally specified syntax, you could, should you choose to, build other more convenient syntaxes that reduce to it. It won't surprise me to see that happen fairly quickly.
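As a guess at what a "more convenient syntax that reduces to it" might look like, here is a tiny sketch that turns a list of steps into a linear state machine document. The `StartAt`, `States`, `Type`, `Resource`, `Next`, and `End` fields are from the state language; the `chain` helper and the resource strings are placeholders I made up.

```python
import json
from typing import Any, Dict, List, Tuple


def chain(steps: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Build a linear state machine from (state name, resource) pairs.

    Each step becomes a Task state pointing at the next one; the last
    step is marked as the end of the machine.
    """
    if not steps:
        raise ValueError("need at least one step")
    states: Dict[str, Any] = {}
    for i, (name, resource) in enumerate(steps):
        state: Dict[str, Any] = {"Type": "Task", "Resource": resource}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


if __name__ == "__main__":
    # Placeholder resource identifiers, not real ARNs.
    machine = chain([("Fetch", "arn:example:fetch"),
                     ("Store", "arn:example:store")])
    print(json.dumps(machine, indent=2))
```

Two lines of input expand to a dozen lines of JSON, which is about the ratio you would expect a front-end syntax to buy you.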
As verbose as CloudFormation is, I really like it. You can represent the entire state of your architecture in a single JSON file: VPCs, EC2 instances, ElastiCache clusters, security groups, IAM roles, everything. Updates to the CloudFormation template are atomic and can be rolled back, the template saves you the trouble of writing a long runbook to install your system, and it can be kept in version control.
You want to talk weirdly over-engineered, check out SWF
We break our CloudFormation stacks up and reference the outputs of those stacks in consuming stacks. You have a VPC, IAM, IAM policy, security group, network ACL, and then your application stacks. So, if you have an application that references a security group from its respective stack and you want to change those rules, you just update the resource in the security group stack and the application stack never needs to get updated. (edit: by separating stacks I mean that they live in different templates, not just different stacks in one file)
I don't quite see the advantage. If you update a stack it will only do the minimal required changes. So if you have an EC2 instance and a security group defined in one stack and you update the security group the EC2 instance won't be affected, it's not like it would be shut down and an identical one would be spun up.
Late reply, but mainly shared resources like security groups (like a specific security group for NTP or HTTP/S), IAM roles, DB's, Kinesis streams, etc. The other reason being the limit on template length. You can't spec out a 3-tier VPC with network ACLs and security groups in a single template.
SWF is awesome, but it can feel very abstract for most use-cases. In light of this, one of my good colleagues wrote a library called Super Simple Workflow. It provides the level of simplicity that one would have expected from SWF to begin with. We open sourced it at my company, check it out[1]. We use this library for production services.
(Full disclosure: I work at Bazaarvoice, and my colleague does, too!)
The user interface of SWF looks like it is the most neglected thing in AWS. Far too much whitespace and crappy layout make navigating it a real PITA.
The Step Functions UI on the other hand is really rather intuitive.
Nice! SWF is sadly an under-utilized service outside of Amazon - great to see more accessible abstractions being built on top of it. We wrote a simple abstraction around SWF to make it easier to work with:
This could be useful. Currently there is no easy way to add a delay when invoking Lambdas from an event. Perhaps this could be used to create delayed execution triggers (e.g. invoking a Lambda 10 minutes after a DynamoDB entry is updated).
It seems some people would rather die than use parentheses and acknowledge that those damn lispers had a good idea, after all.
When curly braces fall out of fashion, what will come next?
I vote for "263D first quarter moon" and "263E last quarter moon", which cannot be displayed here.
For me it's not curly braces vs parentheses, but about unnecessary commas and quotes in JSON. I would rather use EDN than JSON (https://github.com/edn-format/edn) - no unnecessary noise. I regard EDN as a kind of S-expressions.
Looks like any other brittle FSM. It's missing an important layer of logic. And yeah, I'm biased because I'm finishing up an FSM that crushes that brittleness.