Amazon States Language – A JSON-based language to describe state machines

termie · on Dec 3, 2016

"The Step Functions free tier includes 4,000 free state transitions per month." .. "$0.025 per 1,000 state transitions thereafter"

So Amazon has created a new calling convention, where conditional logic now also requires a context switch and JSON serialization. Then they charge you for the call .. $tdcall?

This will certainly have some useful applications; maybe someone will build an inexpensive data processing pipeline on top of it. I have seen many visual workflow tools through the years, and most simplify a complex underlying process, but these step functions and state transitions model basic internal control flow with complex abstractions and little benefit other than retries with backoff.

The pure functional immutable nature of the data flow is ideally nice but tainted by the JSON. The parallelism is interesting, but it seems bolted on instead of a more powerful central part of the design.

mSparks · on Dec 3, 2016

wait, what..!!

is that entire system state to state or per state?

i.e. if i have 100 states each transitioning 365 times bootstrapped a thousand times

they'll charge nearly a million dollars or nearly a thousand?

noting that takes like a minute on my desktop.
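
A rough check of that arithmetic, using only the two prices quoted at the top of the thread. It assumes "100 states each transitioning 365 times, bootstrapped a thousand times" means 100 × 365 × 1,000 billable transitions in one month; that reading, and the name `monthly_cost`, are assumptions made for this sketch.

```python
# Prices as quoted from the Step Functions pricing page earlier in the thread.
FREE_TRANSITIONS = 4_000   # free state transitions per month
PRICE_PER_1000 = 0.025     # dollars per 1,000 transitions after the free tier


def monthly_cost(transitions: int) -> float:
    """Cost in dollars for one month, after subtracting the free tier."""
    billable = max(0, transitions - FREE_TRANSITIONS)
    return billable / 1000 * PRICE_PER_1000


# Assumed reading of the question: 100 states x 365 transitions x 1,000 runs.
transitions = 100 * 365 * 1000
cost = monthly_cost(transitions)
print(f"{transitions:,} transitions -> ${cost:,.2f}")
```

Under that reading the bill is a little over $900, so "nearly a thousand" rather than "nearly a million".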

DigitalJack · on Dec 3, 2016

Why do something like this in JSON? I don't work in JavaScript, but surely JavaScript itself would have been far simpler to use. Restrict it to a subset if you have to, but this is absurd:

  "ChoiceStateX": {
    "Type": "Choice",
    "Choices": [
      {
        "Not": {
          "Variable": "$.type",
          "StringEquals": "Private"
        },
        "Next": "Public"
      },
      {
        "And": [
          {
            "Variable": "$.value",
            "NumericGreaterThanEquals": 20
          },
          {
            "Variable": "$.value",
            "NumericLessThan": 30
          }
        ],
        "Next": "ValueInTwenties"
      }
    ],
    "Default": "DefaultState"
  }
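
To make the semantics concrete, here is a minimal Python sketch of how an interpreter might walk the Choice state above. It is not Amazon's implementation: `lookup`, `matches`, and `next_state` are names made up for illustration, only the operators that appear in the example are handled, and `$.field` paths are resolved naively rather than with real JSONPath.

```python
from typing import Any


def lookup(path: str, data: dict) -> Any:
    """Resolve only simple '$.a.b' style paths (an assumption, not full JSONPath)."""
    value: Any = data
    for key in path.split(".")[1:]:  # drop the leading '$'
        value = value[key]
    return value


def matches(rule: dict, data: dict) -> bool:
    """Evaluate one choice rule against the input data."""
    if "Not" in rule:
        return not matches(rule["Not"], data)
    if "And" in rule:
        return all(matches(sub, data) for sub in rule["And"])
    value = lookup(rule["Variable"], data)
    if "StringEquals" in rule:
        return value == rule["StringEquals"]
    if "NumericGreaterThanEquals" in rule:
        return value >= rule["NumericGreaterThanEquals"]
    if "NumericLessThan" in rule:
        return value < rule["NumericLessThan"]
    raise ValueError(f"unsupported rule: {rule}")


def next_state(choice_state: dict, data: dict) -> str:
    """Return the Next of the first matching rule, else the Default."""
    for rule in choice_state["Choices"]:
        if matches(rule, data):
            return rule["Next"]
    return choice_state["Default"]


# The same Choice state as in the JSON above, written as a Python dict.
CHOICE_STATE = {
    "Type": "Choice",
    "Choices": [
        {"Not": {"Variable": "$.type", "StringEquals": "Private"}, "Next": "Public"},
        {
            "And": [
                {"Variable": "$.value", "NumericGreaterThanEquals": 20},
                {"Variable": "$.value", "NumericLessThan": 30},
            ],
            "Next": "ValueInTwenties",
        },
    ],
    "Default": "DefaultState",
}

print(next_state(CHOICE_STATE, {"type": "Private", "value": 22}))
```

The rules are tried in order, so an input whose type is anything other than "Private" goes to "Public" before the numeric test is ever looked at.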

murukesh_s · on Dec 3, 2016

It looks like they want to make a User interface on top of it. It is very difficult to make a user interface that reads and writes from a regular programming language as there are multiple ways to write the same expression (with space, without space, with new line etc).

murukesh_s · on Dec 3, 2016

Looks like it's already there - https://aws.amazon.com/step-functions/

This is the underlying spec of Amazon step functions.

mSparks · on Dec 3, 2016

18 years late.

https://en.m.wikipedia.org/wiki/Dia_(software)

murukesh_s · on Dec 3, 2016

Not at all related. Amazon step functions are executable state machines, while Dia is just a general-purpose diagramming tool.

mSparks · on Dec 3, 2016

dia creates xml almost identical in nature to this json spec.

and if you need json, plenty of xml to json convertors knocking around.

mSparks · on Dec 3, 2016

completely agree.

it's one thing to serialise your states to json. quite another to try and recreate javascript or any other low permission language in it.

I know _why_ they did it, and i bet a fair few pure managerial types fall for the bait and commission a project or thousand on it, locking themselves in to Amazon's absurd pricing model for production systems. But everyone's got to make a buck eh.

lokedhs · on Dec 4, 2016

Do you remember when XML was all the rage, and we had silly implementations of pretty much everything using XML? For no good reason other than the fact that it was XML. Then everybody moved to JSON because it was simpler.

Well, it seems history is being repeated. Just because something exists that a lot of people use doesn't mean you need to build everything on top of it.

mSparks · on Dec 4, 2016

Yeah, we still have a few critical components that serialise to xml rather than json, although they have mostly been replaced.

What's new this time is the pricing model. It used to be we could run what we wanted on our own hardware on our own time. SaaS is new, mostly impossible to pirate (hacking AWS keys aside), and priced to make your eyes water.

junke · on Dec 3, 2016

Because Javascript doesn't have macros.

junke · on Dec 3, 2016

If you squint your eyes, it really looks like a cond:

    (cond
      ((not (eq (type $) :private)) :public)
      ((and (<= 20 (value $)) (< (value $) 30)) :value-in-twenties)
      (t :default-state))

ChrisRus · on Dec 3, 2016

I have been working on a moon-shot project for more than a decade that seeks to model distributed systems as if they were digital logic circuits. One of the natural areas of interest is of course FSMs, in particular declarative specifications of FSMs. I continue to believe (although it's a wildly unpopular notion among some of my machine learning friends) that this obsession with the mechanics of declaring design intent with models and then synthesizing runtime code will fundamentally transform software engineering over time. But declarative FSMs aren't enough to make it practical IMHO.

I wrote a short essay about this work in which I argue that the software engineering community needs to embrace the design methodologies and rigour of hardware designers: https://medium.com/@alpinelakes/on-monday-i-learned-i-got-ac...

See also: http://blog.encapsule.org/early-encapsule-project-history/20... (old code but same ideas as what I'm building now in Node.js/HTML5 @Encapsule).

Several things I believe are actually essential to make use of any of these ideas at scale:

- There needs to be an ad-hoc extensible standard for notating serialized data with markers, tags, semantics, metadata (whatever you care to call it). It is not practical to do unsupervised feature extraction on internal message streams. And, it's _insanity_ to write/test/maintain custom validation/normalization logic.

- Given the above, FSM declarations must be encoded with labels (as above) so that generic code can easily effect interop.

- Small FSMs are reasonably easy to comprehend. But very few systems can be modeled with simple FSMs. Rather, real systems can be modeled as complex directed graphs where edges represent the flow of observable state from one FSM to another (vertices represent individual FSMs).

- Given that real systems can be modeled using non-trivial graph models of FSMs (as above), building reusable components by slicing and dicing the graph up is logically possible. But this is not something that mortals can do by hand. Considerable tooling is required to make it practical to design systems like this.

If you're interested in these topics, and want to help, look me up @Encapsule.

jnwatson · on Dec 3, 2016

You're about 30 years late. Programming with visual state machines started with Harel and Booch and led to UML (see https://en.wikipedia.org/wiki/Shlaer%E2%80%93Mellor_method#/...). Executable UML (xUML) is a thing. It just never caught on.

parasubvert · on Dec 3, 2016

UML Statecharts are definitely an interesting model, being HFSMs, and thus modular/extensible. But I think there is a lot to be said about context and application - this kind of thing needs to have a revolutionary 'right place, right time' opportunity.

Kind of like Hypertext and Hypermedia, which languished as an academic pipe dream for decades with occasional commercial moments of brilliance (HyperCard), until Tim Berners-Lee figured out the right mix.

mSparks · on Dec 3, 2016

i have a project dating back around maybe ten or 15 years that could read the text format (xml iirc) of dia drawings and execute them.

the main problems included being actually very hard to understand visually, very ugly, and subject to all sorts of edge case errors when running. basically easier to write code then compile uml from that for anyone crazy enough to want it.

the crux of the problems is verbosity. Once systems start to get to a reasonable level of complexity the uml diagrams can cover the walls of a large room.

vs 1 page of a4 for pseudocode.

parasubvert · on Dec 3, 2016

I tend to agree with you on this. My focus has been on the application of declarative Hierarchical FSMs, Behaviour Trees, and/or Hierarchical Task Networks (using automated planning) as a way of describing FSMs declaratively and yet handling the state explosion problem. The intent is to enable better integration and interoperability on the Web - basically, getting rid of the centralized monopolies (Twitter, Facebook, etc.) on "write" functionality on the Web. ("Read" functionality, i.e. web crawling / HTTP GET / Google, is also monopolized, but that's less due to architectural problems and more economic.)

I have given a couple of keynotes on this topic at the W3C and RESTfest over the years, but just haven't done a lot of the grunt work since I have a day job.

See - http://www.slideshare.net/StuC/ill-see-you-on-the-write-side...

Also - http://www.slideshare.net/StuC/linking-data-and-actions-on-t...

And per your point about how you need serialized data notation + FSMs, see http://web.archive.org/web/20160410102032/http://www.stuchar...

I have felt this train of thought could be useful for a general purpose approach to software engineering beyond distributed systems interop. Unfortunately this has been a hobby horse of mine for about 10 years that I don't have a lot of time to dedicate to....

ChrisRus · on Dec 3, 2016

Cool! Thanks for the links and the reply.

> I have felt this train of thought could be useful for a general purpose approach to software engineering beyond distributed systems interop

One of the best articles I've read in recent years on the topic is 'On the Industrial Adoption of Model Driven Engineering. Is your company ready for MDE?': http://www.uajournals.com/ijisebc/journal/1/4.pdf

> Unfortunately this has been a hobby horse of mine for about 10 years that I don't have a lot of time to dedicate to....

It's a fun horse to ride, if a bit wild and tiring.

lacampbell · on Dec 3, 2016

A new market to disrupt - undergraduate computer science projects as a service.

Wait until you guys check out my hashtable implementation in the cloud.

delbel · on Dec 3, 2016

By chance you want to buy some bubble sort for $5/month? That's less than .005 per sort. I also have some 24 bit IEEE floating points as a service (IEEEFPaSS), on sale as well.

automatwon · on Dec 3, 2016

Is a KeyValue database not a hashtable in the cloud?

dom0 · on Dec 3, 2016

Hashtables are by far not the only way to associate a number of keys with respective values. They have some nice properties and some rather ugly properties (memory usage, growing/shrinking, iteration, seeding / table poisoning). Most RDBMS (for example) tend to use trees instead for indices. And then there's the whole category of tries.

novembermike · on Dec 3, 2016

Yeah, but most of the large key-value stores are distributed hash tables.

paulddraper · on Dec 4, 2016

So... DynamoDB?

michaelsbradley · on Dec 3, 2016

See also SCXML and SCION:

https://www.w3.org/TR/scxml/

https://github.com/jbeard4/SCION

And for background, see the pioneering work of Dr. David Harel:

A list of all of his papers

http://www.wisdom.weizmann.ac.il/~harel/papers.html

A few on Statecharts

http://www.wisdom.weizmann.ac.il/~harel/SCANNED.PAPERS/Seman...

http://www.wisdom.weizmann.ac.il/~harel/reactive_systems.htm...

http://www.wisdom.weizmann.ac.il/~harel/papers/RhapsodySeman...

http://www.wisdom.weizmann.ac.il/~harel/papers/Statecharts.H...

Prof. Harel the dreamer...

http://www.wisdom.weizmann.ac.il/~harel/papers/LiberatingPro...

atombender · on Dec 3, 2016

Interesting that they're using JSONPath, which isn't even specified formally anywhere. The only other major implementor that I know about is Kubernetes, which has some odd extensions for templating. (JSONPath itself, of course, isn't very well designed in the first place.)

We're working on a new variant of JSONPath that we're hoping to publish as a formal, comprehensive specification. It's essentially a superset of JSONPath with some syntax warts fixed (like the need to start with $). I wrote a little about it on HN a week ago [1].

[1] https://news.ycombinator.com/item?id=13032391

niftich · on Dec 3, 2016

Perhaps the most formally specified JSON-addressing dense declarative syntax is JSON Pointer (RFC 6901), but it's very limited: it only has exact index selectors, an end-of-array selector, and an exact object name selector. Still, given how JSON-Patch (RFC 6902) depends on it, it may be worthwhile to pursue a notation that extends it formally.
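
To show how little there is to it, here is a minimal Python sketch of JSON Pointer resolution as I read RFC 6901. The function name `resolve_pointer` is made up for illustration, and the sketch only reads values: the end-of-array selector `-` refers to a nonexistent element, so it is deliberately left unsupported here.

```python
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Return the value that a JSON Pointer string refers to inside document."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape '~1' before '~0' so that '~01' decodes to '~1', not '/'.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # exact index selector
        else:
            current = current[token]       # exact object name selector
    return current


doc = {"foo": ["bar", "baz"], "": 0, "a/b": 1, "m~n": 8}
print(resolve_pointer(doc, "/foo/0"))
print(resolve_pointer(doc, "/a~1b"))
```

There are no wildcards, filters, or slices, which is the limitation described above: every token names exactly one key or one index.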

devj · on Dec 3, 2016

IBM JSONata is another open source alternative. Check it out: https://developer.ibm.com/open/jsonata/

murukesh_s · on Dec 3, 2016

JSONata - weird name but seems like a good alternative to XPath.

JonnieCache · on Dec 3, 2016

Very nice. Xpath is the saving grace of dealing with XML, so I'll definitely keep an eye on this.

Marazan · on Dec 3, 2016

Remember when we used XML, and then people started making DSLs in XML and XML was the worst and too 'heavy' and having to write schemas was enterprise and awful and we are totes using lightweight schema free sexy JSON now?

I wonder what the new thing to replace enterprise JSON will be.

aindhaden · on Dec 3, 2016

It's still going to be XML. JSON is fairly limited, it's literally the serialization format for JavaScript variables, and that one size does not fit all.

Marazan · on Dec 3, 2016

I'm totally fine with XML. I'm just amazed at the cognitive dissonance anti-Xml pro-Json peeps are starting to display.

supergreg · on Dec 3, 2016

Yaml looks like the perfect candidate

allengeorge · on Dec 3, 2016

Isn't it already? After all - YAML in Ansible now includes some control structures no? (Or am I misremembering that?)

DCoder · on Dec 4, 2016

Yes, Ansible supports conditionals and loops in its playbooks. Ansible 2.0 also introduced blocks, which are effectively a try-catch-finally in disguise.

tonylucas · on Dec 3, 2016

This is interesting; it has a lot of similarities (not surprisingly) with the state machine we build workflows on in our bot platform, although of course ours is specifically fixed around chat/messages as a key interaction point.

Major differences that I can see are we enable multiple functions to be sent per state, and that the output data from any state is referenceable by any other state, not just passing it down in turn through the states.

We support fallback states but in a different way, and don't support the retry concept directly within the state language itself; it has to be built as a set of states that loop to attempt a retry.

We don't support parallel stages, but do support branches, and remerging of those branches.

Probably the final difference I can see, is one of our options when running a function allows you to actually append additional states to the machine during the runtime process.

Swizec · on Dec 3, 2016

I did it better 5 years ago. https://swizec.com/blog/a-turing-machine-in-133-bytes-of-jav...

133 byte interpreter in JavaScript. Input is JSON specifying state name, write, move direction, and next state. Turing machines, basically.

Mine was for fun, but why is Amazon doing this?

slowmovintarget · on Dec 3, 2016

People needed a state engine for sewing Lambda functions together. For example, try doing retry with exponential back-off in Lambda. You quickly run into a number of problems that are difficult without an execution context outside the Lambda itself.

Step Functions give you this external context for doing retry, conditional trigger of downstream functions, parallel trigger of additional functions and more. Execution time of a state machine can last for up to a year, so this also gives you a way to do more than 5 minutes of work at a time.
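
As a rough illustration of what that external context does, here is a Python sketch of retry with exponential back-off driven from outside the task. The parameter names mirror the Retry fields in the States Language spec as I understand them (IntervalSeconds, MaxAttempts, BackoffRate); `retry_delays` and `call_with_retry` are hypothetical helpers, not part of any AWS SDK.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_delays(interval_seconds: float = 1,
                 max_attempts: int = 3,
                 backoff_rate: float = 2.0) -> List[float]:
    """Delay before each retry: interval, interval*rate, interval*rate^2, ..."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


def call_with_retry(task: Callable[[], T],
                    delays: List[float],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call task; on failure wait for the next delay and try again."""
    for delay in delays:
        try:
            return task()
        except Exception:
            sleep(delay)  # the waiting happens out here, not inside the task
    return task()  # final attempt; any error now propagates to the caller


print(retry_delays(interval_seconds=2, max_attempts=4, backoff_rate=2.0))
```

The waiting is done by the caller, which is the part that is awkward inside a Lambda: a function that sleeps between its own retries is using up its own execution time while it waits.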

Swizec · on Dec 3, 2016

That's cool.

But is there no existing language that can be used to describe state machines?

slowmovintarget · on Dec 3, 2016

:)

They needed a syntax that was easy to transform into usage of other Amazon resources. I'm guessing JSON was by far the most straightforward for them, not to mention that they've been using the system themselves for quite a while. But I'm just guessing.

automatwon · on Dec 3, 2016

⊆ {"Money", "Power", "Respect"}

nighthawk454 · on Dec 2, 2016

Some more info on Tim Bray's site: https://www.tbray.org/ongoing/When/201x/2016/12/01/J2119-Val...

timbray · on Dec 3, 2016

Glad to hear that so many people already did this (only better), years ago. Nice to have company.

The only thing new or interesting about States is it has a product behind it that implements it at scale, available now; give it a try.

I think it's quite likely that this syntax is state-machine assembler, and smart people will find nicer expressions of this and compile them down.

In particular, some people prefer dependency graphs to explicit state machines for this sort of thing.

trickyager · on Dec 3, 2016

Amazon has also created a ruby gem to lint JSON state machines. https://github.com/awslabs/statelint

TeMPOraL · on Dec 3, 2016

Wait, did they just make a Lisp in JSON?

c3534l · on Dec 3, 2016

You wait, Google is going to create one in XML, then Apple is going to invent a cool minimalistic pseudo-Lisp that doesn't require any brackets or colons. Then Windows is going to try to create one that only runs on .NET. Finally, someone will create a format that can be read by all of them, and someone is going to implement another LISP in that format.

seanp2k2 · on Dec 3, 2016

The Apple one attempts to use natural language, the Google one gets discontinued after it becomes somewhat well-known with a cult following, and Microsoft tries to push a competitor until after the Google one fails, at which point they adopt the standard even though their platform store is filled with junkware and no one cares anymore.

grogenaut · on Dec 3, 2016

Meanwhile the aws service is still there plugging away, like simple db, because core infra in aws was built on it and they don't want to anger customers.

absrnd · on Dec 3, 2016

Greenspun's tenth rule?

tree_of_item · on Dec 3, 2016

No, what makes you say that?

hacker_9 · on Dec 3, 2016

"This document describes a JSON-based language used to describe state machines declaratively. The state machines thus defined may be executed by software. In this document, the software is referred to as “the interpreter”."

Oh no.

snoman · on Dec 3, 2016

Everything old is new again.

So, now that it is happening to me, am I allowed to apologize to the numerous old-coders that tried to tell me this when I was coming up?

jessep · on Dec 3, 2016

If you've seen this before, I'd love to know what came next last time. How did this evolve and what were the issues with it?

jfoutz · on Dec 3, 2016

Tooling. IDEs and special languages to generate the files for the state transitions, and the body of the lambdas themselves. (reading this kind of made me want to go write a little haskell dsl)

If they're a little undisciplined, they'll probably add stuff to implement counting and comparison directly, to put a hard limit on loops.

I'd also guess an addition of a couple special tasks, perhaps append to log in s3 bucket and continue, that perhaps come with a discount.

There are about 3 million old flowchart tools out there, any feature you see tacked on is a candidate.

Tangentially, other organizations will be inspired by this, and implement their own language in json, but this time they'll do it "the right way" then you'll get a working group to try to reconcile all the competing standards.

Or maybe not. kind of what happened with XML though.

edit

ah, here you go. https://www.w3.org/TR/scxml/ that stuff.

ChuckMcM · on Dec 3, 2016

Yes you must :-)

It would be fun to implement Zork with this system.

coredog64 · on Dec 3, 2016

You are in a twisty little maze of [standards], all alike.

slowmovintarget · on Dec 3, 2016

In explaining the choice of JSON as the description language for state machines Tim Bray briefly said, "I couldn't find a good reason not to express it as JSON and in this day and age, you need a good reason for it to not be JSON."

He also mentioned that because it was a formally specified syntax, you could, should you choose to, build other more convenient syntaxes that reduce to it. It won't surprise me to see that happen fairly quickly.

jonstewart · on Dec 3, 2016

Last one to implement x86 on this is a rotten egg.

Rapzid · on Dec 3, 2016

He should talk with the Cloud Formation team about their reasons for supporting (it's a full shift in reality) YAML.

JonnieCache · on Dec 3, 2016

Fun Fact: json is a subset of yaml.

haimez · on Dec 3, 2016

Everything at Amazon is like this. You should try to use cloud formation.

openasocket · on Dec 3, 2016

As verbose as cloud formation is, I really like it. You can represent the entire state of your architecture in a single JSON file: VPCs, EC2 instances, ElastiCache clusters, security groups, IAM roles, everything. Updates to the cloud formation template are atomic and can be rolled back, it saves you the trouble of writing a long runbook to install your system, and your template can be kept in version control.

You want to talk weirdly over-engineered, check out SWF

jeletonskelly · on Dec 3, 2016

We break our cloud formation stacks up and reference the outputs of those stacks in consuming stacks. You have a VPC, IAM, IAM policy, security group, network acl, and then your application stacks. So, if you have an application that references a security group from its respective stack and you want to change those rules, you just update the resource in the security group stack and the application stack never needs to get updated. (edit: by separating stacks I mean that they live in different templates, not just different stacks in one file)

openasocket · on Dec 3, 2016

I don't quite see the advantage. If you update a stack it will only do the minimal required changes. So if you have an EC2 instance and a security group defined in one stack and you update the security group the EC2 instance won't be affected, it's not like it would be shut down and an identical one would be spun up.

What does breaking it up like that give you?

jeletonskelly · on Dec 6, 2016

Late reply, but mainly shared resources like security groups (like a specific security group for NTP or HTTP/S), IAM roles, DB's, Kinesis streams, etc. The other reason being the limit on template length. You can't spec out a 3-tier VPC with network ACLs and security groups in a single template.

jonaf · on Dec 3, 2016

SWF is awesome, but it can feel very abstract for most use-cases. In light of this, one of my good colleagues wrote a library called Super Simple Workflow. It provides the level of simplicity that one would have expected from SWF to begin with. We open sourced it at my company, check it out[1]. We use this library for production services.

(Full disclosure: I work at Bazaarvoice, and my colleague does, too!)

[1]: https://github.com/bazaarvoice/super-simple-workflow

rurounijones · on Dec 3, 2016

The user interface of SWF looks like it is the most neglected thing in AWS. Far too much whitespace and crappy layout makes navigating it a real PITA.

The Step Functions UI on the other hand is really rather intuitive.

They really need to improve the SWF UI

timbray · on Dec 3, 2016

BTW, the Step Functions product that interprets this uses the SWF back-end.

muyfine · on Dec 4, 2016

Nice! SWF is sadly an under-utilized service outside of Amazon - great to see more accessible abstractions being built on top of it. We wrote a simple abstraction around SWF to make it easier to work with:

https://github.com/swift-nav/wolf

Excited about the potential of using lambda functions with SWF - we have workflows with predominantly idle workers that would greatly benefit from it!

ape4 · on Dec 3, 2016

So JSON has officially Jumped The Shark (like XML)

plandis · on Dec 3, 2016

Seems like a decent way to glue AWS Lambda functions together?

Why: "oh no"?

ajkjk · on Dec 3, 2016

devj · on Dec 3, 2016

Wondering why they aren't using their own data format(Amazon Ion - https://amznlabs.github.io/ion-docs/spec.html) instead of JSON.

rurounijones · on Dec 3, 2016

Nothing here requires Ion as far as I can see.

flaviuspopan · on Dec 2, 2016

This strikes me as the underlying spec for the new Batch service mentioned at re:invent.

slowmovintarget · on Dec 3, 2016

It's the underlying spec for the JSON description of state machines in AWS Step Functions.

https://aws.amazon.com/step-functions/

flaviuspopan · on Dec 3, 2016

.....and just when I thought I've seen all the new services, haha. Thank you.

neurostimulant · on Dec 3, 2016

This could be useful. Currently there is no easy way to add a delay when invoking lambdas from an event. Perhaps this could be used to create delayed execution triggers (e.g. invoking a lambda 10 minutes after a DynamoDB entry is updated).
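
A minimal sketch of what such a definition might look like, built as a Python dict and printed as JSON. It uses the Wait and Task state types from the spec as I read it; the helper `delayed_invocation`, the state names, and the ARN are placeholders made up for illustration, and something would still have to start an execution when the table changes.

```python
import json


def delayed_invocation(function_arn: str, delay_seconds: int) -> dict:
    """Build a two-state machine: wait, then run one task."""
    return {
        "Comment": "Wait, then invoke a single Lambda function",
        "StartAt": "Delay",
        "States": {
            "Delay": {
                "Type": "Wait",
                "Seconds": delay_seconds,
                "Next": "Invoke",
            },
            "Invoke": {
                "Type": "Task",
                "Resource": function_arn,
                "End": True,
            },
        },
    }


# Placeholder ARN: substitute a real region, account id, and function name.
PLACEHOLDER_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME"

definition = delayed_invocation(PLACEHOLDER_ARN, delay_seconds=600)
print(json.dumps(definition, indent=2))
```

Each execution here costs a handful of state transitions, so the pricing quoted at the top of the thread applies per delayed call.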

avodonosov · on Dec 3, 2016

JSON? Somebody forbade s-expressions?

TeMPOraL · on Dec 3, 2016

For some weird reason people these days believe curly brackets are the best brackets.

junke · on Dec 3, 2016

It seems some people would rather die than use parentheses and acknowledge that those damn lispers had a good idea, after all. When curly braces fall out of fashion, what will come next?

I vote for "263D first quarter moon" and "263E last quarter moon", which cannot be displayed here.

https://en.wikibooks.org/wiki/Unicode/List_of_useful_symbols...

avodonosov · on Dec 3, 2016

For me it's not curly braces vs parentheses, but about unnecessary commas and quotes in JSON. I would rather use EDN than JSON (https://github.com/edn-format/edn) - no unnecessary noise. I regard EDN as a kind of S-expressions.

rhizome · on Dec 3, 2016

JATEOAS

bunderbunder · on Dec 3, 2016

Context: https://spring.io/understanding/HATEOAS

swehner · on Dec 3, 2016

Doesn't look too pretty, ugly in parts. Something strange about it. The ARN business looks very suspicious.

Maybe this is supposed to be pre-alpha?

Consistent with the Tim Bray blog article saying the validator's "implementation is kind of gross."

But then again: stay out of trouble, avoid Amazon.

shoefly · on Dec 3, 2016

Looks like any other brittle FSM. It's missing an important layer of logic. And yeah, I'm bias because I'm finishing up an FSM that crushes that brittleness.

rspeer · on Dec 3, 2016

That doesn't tell us very much, and the word you mean is "biased".

elcct · on Dec 3, 2016

Language? That's bold

bbcbasic · on Dec 3, 2016

Bah! Why JSON not XML?

RandomOpinion · on Dec 3, 2016

Already exists, it's called SCXML and is a W3C standard.

https://www.w3.org/TR/scxml/

danielpatrick · on Dec 3, 2016

... what is a single reason to use XML over JSON beyond legacy purposes?

bbcbasic · on Dec 3, 2016

Schema. Validation.

jack9 · on Dec 3, 2016

JSON Schema validation exists (e.g. the IAB standards).

joe-user · on Dec 3, 2016

Extensibility.

chajath · on Dec 3, 2016

Job security