Workflow languages are programming languages. I'd be really interested to know if 1) there's a textual representation of the visual workflow design that you can easily export/import, and 2) the syntax and semantics of the language. I've seen a _lot_ of visual workflow editors which have a tendency towards limited expressiveness; there seems to be a major disconnect between the worlds of workflow authoring and programming.
I noticed that they provide the ability to specify a Dockerfile which contains the necessary facilities to run arbitrary code. But I can't help but think there has to be a middle ground between the two. I've written about this in the past, arguing that applying concepts from traditional programming language theory (in particular functional programming) to the design of workflow languages can be fruitful.
It's common among workflow languages to represent everything as a graph. What I think is more appropriate is an expression-like approach, where the relationships between the different actions are implicit in the syntax.
For example, if you have a graph like the following:
node A
node B
node C
node D
edge B -> A
edge C -> A
edge D -> B
edge D -> C
you could write that as an expression like this:
A (B D) (C D)
In the Actions example, the "needs" field is a dependency on another task, or in other words a subexpression. If there are multiple independent expressions that all depend on it, you could use something like let/letrec to assign the result of "Provision Database" to a variable, and then reference that variable from the other expressions.
Basically, the action syntax they have is like writing an abstract syntax tree where every node explicitly lists its dependencies. An expression is a much more compact form and the subexpressions are implicit in the AST produced by the parser; this is the approach used by most programming languages. But for reasons I still don't fully understand, workflow languages (which I consider to be a specific class of programming language) don't seem to adopt this more compact representation.
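To make that concrete, here's a minimal sketch (Python, with hypothetical names) of a parser that turns an expression like `A (B D) (C D)` into the explicit edge list above; the dependencies fall out of the nesting, with no per-node "needs" field required:

```python
# Sketch: parse a prefix expression like "A (B D) (C D)" into the
# explicit dependency edges it implies. All names are hypothetical.
import re

def tokenize(src):
    return re.findall(r"[()]|[A-Za-z_]\w*", src)

def parse(tokens):
    """Parse one expression: either a bare name or '(' head args... ')'."""
    tok = tokens.pop(0)
    if tok == "(":
        head = tokens.pop(0)
        args = []
        while tokens[0] != ")":
            args.append(parse(tokens))
        tokens.pop(0)  # consume ')'
        return (head, args)
    return (tok, [])

def parse_workflow(src):
    # Top level: a head action followed by its argument expressions.
    tokens = tokenize(src)
    head = tokens.pop(0)
    args = []
    while tokens:
        args.append(parse(tokens))
    return (head, args)

def edges(node, acc=None):
    """Collect 'dependency -> dependent' edges from the parse tree."""
    if acc is None:
        acc = []
    head, args = node
    for arg in args:
        acc.append((arg[0], head))  # arg must run before head
        edges(arg, acc)
    return acc

print(edges(parse_workflow("A (B D) (C D)")))
# [('B', 'A'), ('D', 'B'), ('C', 'A'), ('D', 'C')]
```

The printed edges are exactly the `edge B -> A` style list from the graph form, recovered from the much more compact expression.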
We've recently been working on a workflow tool for people significantly less technical than GitHub's users, and we started out with an approach somewhat similar to the one you outline. We developed a compact DSL that maps almost 1-1 with the UI we wanted to build, and inferred the dependency graph from the preconditions set on each action and the data it used. The thinking was that there was no sense in explicit ordering, since it could only slow down execution and introduce errors, and that allowing preconditions would create a declarative way to ensure "after" behavior when necessary.
But what we found in user testing is that our users kept wanting to manually re-order actions and were really uncomfortable with the system just "figuring out" execution order on its own. We had to introduce the concept of UI-only ordering (the backend still tries as hard as possible to execute in parallel without violating the dependency DAG) to give them the illusion of control.
But our users aren't programmers, so GitHub might have a bit more leeway to push complex topics like these onto users.
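As a rough illustration of the inference idea described above (the data model and action names here are invented, not the actual DSL): dependencies can be derived purely from what each action reads and writes, with no explicit ordering at all:

```python
# Sketch (hypothetical data model): infer a dependency DAG from the
# data each action reads and writes, instead of explicit ordering.
actions = {
    "provision_db":   {"reads": set(),                 "writes": {"db_url"}},
    "run_migrations": {"reads": {"db_url"},            "writes": {"schema"}},
    "seed_data":      {"reads": {"db_url", "schema"},  "writes": set()},
}

def infer_deps(actions):
    deps = {name: set() for name in actions}
    for name, a in actions.items():
        for other, b in actions.items():
            # If we read something another action writes, run after it.
            if other != name and a["reads"] & b["writes"]:
                deps[name].add(other)
    return deps

print(infer_deps(actions))
# {'provision_db': set(),
#  'run_migrations': {'provision_db'},
#  'seed_data': {'provision_db', 'run_migrations'}}
```

Anything not connected by a read/write relationship is free to run in parallel, which is exactly why the backend can parallelize aggressively while the UI shows users a linear order.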
For non-programmers, the visual approach is very attractive, and provides an intuitive way to display the workflow in a manner that is easy to make sense of.
For programmers (i.e. pretty much everyone that uses GitHub), I think a more DSL-like approach would be appropriate, though this doesn't preclude a visual editor as an alternative. As gbaygon mentioned, they do have a DSL, but I think the approach is suboptimal.
In a project I'm working on at the moment (which extends the work from my thesis), we're targeting two audiences - programmers write the workflows, but "business" people can see a visualisation. Essentially we take the Scheme code that comprises the workflow and render it as a graph, similar to BPMN. I think that's a nice approach when you have people on staff who have the necessary programming skills and are working alongside non-technical people. But that doesn't apply in every situation, so visual design can be useful when you don't have experienced programmers creating the workflows.
I'm an experienced programmer and I want effective visualizations. I strongly believe that domain-specific visualizations are the way forward to real progress, even if we haven't created a practical one yet.
Totally agree. I have come to integrate a live module dependency graph visualization[1] in my JS workflow, it really helps me when apprehending a new project or prototyping ideas.
More flexible representations (at the function scale for instance) would probably help even more.
Visualizing and manipulating should probably be seen as separate domains though, or at least as working on separate levels of abstraction: effective visualization implies hiding operations; you don't want to view, or visually edit, your string manipulations. But re-ordering the functions called from main() might be more malleable.
But even then, how often does such a reordering not constitute detail changes as well?
I think visualizations are much more useful for “reading” code, than for writing it. Which is why visual editors are so appealing: they’re showing off the reading aspect.
I'm pretty sure something like the grandparent, or that visual Haskell project whose name I can't remember (where code and visualization are directly equivalent), is the way to go: there's no paradigm shift to be had here.
Sure, what kinds of questions do you have? We're still pretty early in the process and I think we made a few mistakes that we'll need to correct. But we made a number of choices that have worked out really well despite the fears of some of the team.
A few notables:
- Written in Rust. This caused some early struggles, but has been paying dividends ever since we got it working. The thing is rock solid and easily handles tens of thousands of automation runs per second. Also, Pest is awesome; writing a complex lexer/parser implementation is really easy these days.
- We're currently triggering automations with HTTP calls, but we want to move to primarily triggering with some sort of work queue.
- We didn't consider aggregates in the first iteration of the product and now we're feeling that pain and looking at solutions.
- Tracking the types of all data through every step of the automations was a lot of work to set up, but is hugely valuable. Being able to suggest which values/operations are available to users in the UI, as well as doing AOT type checks when saving automations, means a lot fewer errors at runtime.
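A toy sketch of what such an AOT type check can look like (the step model here is hypothetical, not their actual schema): each step declares the type it consumes and the type it produces, and the whole chain is validated before the automation is saved:

```python
# Sketch: ahead-of-time type checking for a chain of automation steps.
# Hypothetical step model: each step declares an input and output type.
steps = [
    {"name": "fetch_user",   "in": "UserId", "out": "User"},
    {"name": "format_email", "in": "User",   "out": "Email"},
    {"name": "send_email",   "in": "Email",  "out": "Receipt"},
]

def type_check(steps, initial_type):
    """Walk the chain, checking each step's input against the value
    produced so far. Returns a list of error messages (empty = OK)."""
    current = initial_type
    errors = []
    for step in steps:
        if step["in"] != current:
            errors.append(f"{step['name']} expects {step['in']}, got {current}")
        current = step["out"]
    return errors

assert type_check(steps, "UserId") == []  # well-typed chain
print(type_check(steps[1:], "UserId"))
# ["format_email expects User, got UserId"]
```

The same table of declared types is what lets a UI offer only the values and operations that are legal at each point, so most errors never happen in the first place.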
- How have users taken to the tool? Have they needed ongoing support, or once trained they understand what to do?
- Were alternatives considered? Or was the complexity such that a workflow was the only way users could control this?
- Do people ever manage to design impossible flows?
- Anything about the use-case you're able to say, for where the tool is needed by users (and not just as an easier way for developers to adjust the system)
Sorry I missed your reply...I stopped following the thread. But in case you're still reading, here are some responses:
> How have users taken to the tool? Have they needed ongoing support, or once trained they understand what to do?
It's been a bit of a struggle. Once people understand how to use the UI, they go to town and get a lot of value out of it. But we've found it's not approachable and basically requires us to teach them how to use it. We're continuing to experiment with it. The good part is that everything we're trying is supported by the underlying DSL and workflow engine and we really haven't had to make more than a couple of tweaks to that.
> Were alternatives considered? Or was the complexity such that a workflow was the only way users could control this?
We looked into off-the-shelf options, but we didn't think they'd give us the level of control we wanted to build a product around. As mentioned above, the hardest part is the UI, and if we're building this as a product, we need to build that anyway.
> Do people ever manage to design impossible flows?
No, that's impossible through the UI. Since we're tracking the types of all data throughout the execution of the flow, we're able to analyze the flow statically before it's saved to the database and give users an error. But they basically can't even hit that error, because our UI prevents them from choosing illegal values or setting up infinite dependency chains.
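For anyone curious, the "no infinite dependency chains" guarantee amounts to cycle detection over the dependency graph. A minimal sketch (hypothetical shape, not their actual code), using a standard depth-first search with three-color marking:

```python
# Sketch: reject a workflow with circular dependencies before saving.
# `deps` maps each step to the steps it depends on.
def has_cycle(deps):
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {node: WHITE for node in deps}

    def visit(node):
        color[node] = GRAY
        for dep in deps.get(node, ()):
            if color.get(dep, WHITE) == GRAY:
                return True  # back edge: we're inside our own ancestry
            if color.get(dep, WHITE) == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in deps)

assert not has_cycle({"a": [], "b": ["a"], "c": ["a", "b"]})  # a valid DAG
assert has_cycle({"a": ["b"], "b": ["a"]})                    # a -> b -> a
```

Run once at save time; if it returns True, the flow is rejected with an error instead of ever reaching the execution engine.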
> Anything about the use-case you're able to say, for where the tool is needed by users
It's designed to be kinda like Zapier, but for a much more specific audience who are generally less technically adventurous. In talking with these users, many of whom use Zapier, we've identified that they find it difficult to use and not really suited to their use case, so we're hoping that something that's purpose built for that use case will make their lives easier and convince them to switch.
That's exactly what I've explored in a Haskell project: DepTrack.
https://github.com/lucasdicioccio/deptrack-project Basically, it decorates expressions like in your example to collect actions. It's strongly typed. As a result, with a bit of Haskell type-system knowledge you can easily enforce invariants like "I won't run a command if you don't install it first" and "if you don't tunnel/proxy a given service then the config will not compile".
There are other niceties that the Haskell type system gives you by playing with the underlying effects (e.g., forbidding IO enforces that the same config always gives the same result; using a List lets you cleanly handle heterogeneous-platform concerns), but these are advanced topics.
Most workflows are a DAG[1], not a general graph, which makes them representable as Tuple[List[Step], Dict[Step, List[Step]]]. In other words, (List[Step], Map[Step, Dependencies]), so your example could be ([A, B, C, D], {A: [B, C], B: [D], C: [D]}),
which is clearer than the graph representation. Notably, your syntax also assumes a DAG; it can't represent an arbitrary graph, so the graph syntax is more "powerful", though unnecessarily so.
The expression-y representation doesn't scale well. If you consider that workflows are mostly linear but have branches, the syntax you provided gets ugly fast.
This is one of those weird things that is very much not obvious without hindsight, but try describing a workflow with a critical path of length 10 or 15, and some subchains that overlap but aren't exactly the same. Formatting the expression-based form you suggest quickly becomes a bit of a nightmare. In the extreme, consider representing a git commit graph, which is also a DAG, in the various syntaxes proposed. Then consider trying to modify that structure. It's not very ergonomic.
[1]: Anything loopy or graph-requiring should be factored out into its own sub-flow implemented in a turing complete construct. A workflow should be a composition of such turing complete sub pieces.
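As a sketch of how directly the (List[Step], Map[Step, Dependencies]) form maps to execution: Python's stdlib `graphlib` consumes exactly such a step-to-dependencies mapping and yields batches of steps that are safe to run in parallel:

```python
# Sketch: feed a (List[Step], Map[Step, Dependencies]) workflow straight
# into Python's stdlib topological scheduler. Each batch returned by
# get_ready() can run concurrently.
from graphlib import TopologicalSorter

steps = ["A", "B", "C", "D"]
deps = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}  # step -> its dependencies

ts = TopologicalSorter({s: deps.get(s, []) for s in steps})
ts.prepare()  # also raises CycleError if the "DAG" has a cycle
while ts.is_active():
    ready = sorted(ts.get_ready())  # everything here can run in parallel
    print(ready)
    ts.done(*ready)
# ['D']
# ['B', 'C']
# ['A']
```

Note that `prepare()` rejects cyclic input for free, which is one more argument for keeping the workflow layer restricted to DAGs.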
Oh god. HCL is one of the worst parts of Terraform. I really wonder what other declarative languages were considered for this, because it feels like a very weird base to start with :/
Is it possible some of your perception is shaped by Terraform's implementation? I'm not a huge fan of json/yaml/etc as a DSL, but Terraform makes/made some choices that leave me empty inside. But only after I read the original issue with hundreds of comments, which was then migrated by a bot to an issue in a new repo, which was then closed after hundreds more comments because a more recent, more specific issue was around, and then that one was closed after dozens of comments because there's a roll-up issue.
Maybe. On the other hand, a lot of Terraform's implementation has been shaped by how they themselves designed HCL, right? Things like multi-nesting function calls inside quotes and whatnot. All the dollar signs. I can't even tell if that's Terraform or HCL, to be honest.
At first I thought it was just another CI tool exclusive to github.com, but the fact that you can hook up more than just the commit push events makes it interesting for orchestrating a bunch of workflows around issues and pull requests.
Also great to see that it supports both UI and code definition.
The only big missing feature in my opinion is a shared library support, because it will soon be tedious to copy/paste the same generic docker build commands across repositories.
I think it's referring to copying just part of the flow, where the terminal ends might not be connected, so per project/target you just wire up a repository to side A and a deploy target to side B. Flows become a shareable description of part of the processing pipeline, where the testing/slack notification/approval/building workflow is standardized, perhaps... (maybe I've been reading too much Akka Streams)
Wow, this could be really useful for my team. We've built a ton of customized workflows on top of github. We use a modified git-flow process, and have bespoke solutions for automatically tagging branches when branches w/ name "hotfix" get into master, when long-living releases get merged, tons of logic for getting commits on a "release-*" branch into dev and other branches.
We ended up building a custom github worker that listens to all of this, but it's opaque and our Bus factor is 1 for that tool. Putting it on Github where anyone can change the rules and see them cleanly is fantastic!
Yeah, but why shy away from the fact that any of us could randomly die at any moment? It may not be pleasant to think about, but it's true. And important to plan for in critical systems.
That's a worse analogy. The entire point of "bus factor" is that it assumes the person is dead afterwards, so there's no way to recover the knowledge and this eventuality needs to be planned for beforehand.
Whereas even someone who won the lottery will take a $100k/hr consulting gig, or might take pity on his ex-coworkers and explain his choices in a one-hour phone call while sipping margaritas on the beach.
Hi Dan,
I am currently working on something that seems to be very fitting for your use case, and looking for early adopters. Can you please email me at meow@softkitteh.com if you find this interesting?
Thanks!
Word of advice: It's very unlikely someone is going to mail with such a hand-wavy pitch. If you have something that addresses a problem, describe it on a landing page and post the link.
This UI is beautiful. I hope GitLab looks into implementing something similar. Their CI is already so powerful, it would be great to be able to have a UI to build pipelines.
I'd rather describe workflows in files (such as .circleci/config.yml) which are diff-able, copy-able, shareable and live happily in repositories. A UI could be built from there to show me the flow (again, like the appealing CircleCI dashboard).
Thanks for your kind words about GitLab CI. What are the things that a visual editor is really great at? Getting a good overview of the relations between tasks? Seeing what secrets are available?
Just another kind comment for GitLab CI: we deployed it on prem and built a whole data analysis and quality tracking system for a clinical trial on top. I know it’s not the most powerful thing out there but for our small deadline-ridden team the comprehensive and all in one nature + on-prem is just awesome.
It's a statistical model for epilepsy. We have GitLab running alongside a dedicated NFS server for the data, build Docker images with the code, run them with access to the data, and put quality-control images and reports on GitLab CI as artifacts. Each patient gets a branch, which triggers a CI runner generating patient reports. It took a while to figure out, but it's really good for a dev team lacking in discipline ;)
Briefly, but the ETL stuff is a little enterprisey for me. The ability to incorporate notebooks as part of the workflow definitely looks interesting, but I don't think the heavy data sets we have would warehouse well in Postgres; we would need blobs or an NFS filer anyhow.
Sorry for being a little off-topic, but speaking of GitLab CI, I really wish Auto DevOps wouldn't be enabled by default/mysteriously when there's no .gitlab-ci.yml and it hasn't been manually enabled. You end up with the red 'build: failed' icon, which appears to be impossible to delete/hide even if you disable Pipelines under project permissions and disable everything under CI/CD.
I believe one thing that causes it is changing a repo from private to public without unchecking the Pipelines permission (enabled by default). The new GitLab profile pages look absolutely gorgeous, but I suspect many will be turned off by the highly visible 'build: failed' icons next to the repos, even if those repos contain nothing but a JSON file. I personally had to resort to deleting the repo and creating a new one to get rid of it. And on that note, clicking the next button on '/users/username/projects?limit=10&page=2' results in the Overview tab turning blank.
FWIW, if you disable Pipelines and push another commit to the repo, you'll clear the failed icon from the project page. The project page only shows the last commit and the failed status is associated with the specific commit, not the project as a whole.
I am a very visual person. The first step before I start writing a workflow in gitlab is, that I draw the whole pipeline on a whiteboard as a tree/graph. We discuss it with the team and only then I start writing the .gitlab-ci.yml file.
Having something digital to substitute the whiteboard would be fantastic. Not just for designing the pipeline, but also for seeing the results of an actual run of the pipeline.
I've just been drafting a .gitlab-ci.yml file with random sleeps. Hell, I might expand upon that and make the sleeps sort of simulate the real thing, but on a faster timescale for demo purposes.
I think it would be amazing if GitLab had a visual editor for one-time runs. Like, "rerun this one step in the pipeline as if this was this new yaml". It would prevent the annoying aspect of building a pipeline: when your 3rd, 4th, etc. step isn't correct, you have to edit the yaml and wait for steps 1 and 2 on every single change.
Thanks for the suggestions. It would be great to have more details about it. Could you please open a feature proposal about that over at https://gitlab.com/gitlab-org/gitlab-ce/issues? We'd love to follow up on it.
Workflows are typically I/O bound (where I/O == tasks that are executed). In most cases they tend to include little or no actual computation. So the raw computation speed is of little consequence, and an interpreted approach is fine.
Implementations of workflow languages optimised for compute performance (e.g. using JIT compilation) do exist, but are not widely used outside of situations where the workflow combines both compute-intensive work and coordination of external tasks.
Microsoft is gigantic; they lose absolutely nothing by competing against themselves.
People get caught up in the one-size-fits-all mentality. If you are a 20-person start-up, it's stupid to make 3 separate solutions that all compete. When you employ north of a hundred thousand people, it's less of an issue.
From a certain size onwards, not competing against yourself seems almost irresponsible. Just imagine the fate of Intel without any internal competition to the Itanium, or without the Haifa team that insisted on pitting a refined P6 against NetBurst for mobile.
Totally agree with this. I've heard Amazon does similar things, like multiple teams competing with each other to accomplish similar goals. Can be very healthy for the company if managed correctly.
That particular example, I'm not even sure if it's an issue of teams competing for the king of the messaging hill. And it's of particular interest to me as a developer who works in the messaging space.
It's not hard to get right, and it's not difficult to get wrong either; everyone has their spin on messaging, and that kind of choice is perfectly fine. We don't need one to rule them all, at least IMO.
But goodness gracious, Google just seems to have NO clue what they're doing with messaging. Which is frustrating, because in the sliver of time they got it right, they got it right (that time when Hangouts was actually kind of great, it was well integrated, and it looked like Google was actually trying to make it better? Member those days?), and then, as expected, they stripped the car for parts and we ended up with two communication platforms (Duo and Allo) that really should have been one feature-rich solution.
Microsoft also has Flow, which can kind of do a lot of the same things as well, although more on the if-this-then-that type of model. It's actually pretty handy, since that seems to be the best way to do workflows that create issues into VSTS/AzureDevOps, or generate notifications out of it.
Logic Apps and Flow are built on the same tech and platform (shared code base). One requires an Azure subscription and the other doesn't, so it's aimed at Office/IT rather than devs and platform people.
Presumably, like every other GitHub feature, the people paying for GitHub pay for it, and free users get it as part of the ecosystem-building promotion that the free tier provides to bring in paying (not purely public/OSS, team features) customers, with the assumption that the surplus on the paid accounts will, over the long term if not immediately, cover the cost of the free use.
Or they’ll make it a separately charged feature once they evaluate demand and usage in the beta, but AFAIK that would be new for GitHub.
> Over time — and Lambert seemed to be in favor of this — GitHub could also allow developers to sell their workflows and Actions through the GitHub marketplace.
Is this made to replace things like Travis and Jenkins? (I have very little experience with them.) The GUI looks very sleek, I'll definitely be checking this out to see how to integrate it into my SE course.
More like replace Atlassian Bamboo. I've had traumatic experiences with sleek CIs. To me it's like the Scratch "programming language". Good for beginners, and then you find yourself wanting to copy/paste your workflow or make any sort of involved change and having to rewrite everything in a Jenkinsfile.
Now, I'd like to be wrong. But I doubt a UI can get close to plain text for this kind of thing; it's just very difficult for software to translate boxes into code.
Until then, there's a reason our tooling is text-based to this day.
The new UI that GitHub implemented for its Actions is really slick!
As far as implementation goes, I'm starting to wonder if anyone actually uses BPMN[0]... It might be nice if we had a standardized way to do these orchestrations, and I thought BPMN was it.
Are you sure they don't have Dockerfiles? It looks like the Dockerfiles don't have to be in the root directory. You can have a repository with multiple Dockerfiles stored in subdirectories, which results in the repository hosting multiple actions.
This looks very promising, and I'm eager to try it out for some hobby projects.
One thing I haven't found skimming the docs is a manual approval gate, which would be very useful for projects that don't have full automated test coverage (so, nearly all of them) before a production deployment.
I'm very hopeful this can replace at least some of our hooks. We have a fairly automated process using GitHub labels, and it would be great to be able to move some of it off our hook server.
Great news. Also the right way to do CI/CD with each step being a separately configured container with a persistent workspace throughout. Makes it very fast and easy to chain together steps while using small, focused, and updated images.
There are only a few providers that seem to get this right, so it's nice to see it included in GitHub. I was just talking to the Azure DevOps people about this kind of functionality, so it seems like GH is, and will continue to be, run independently of MS/Azure.
This is very cool. I maintain a GitHub App (pullreminders.com) and I can see Actions helping with setting up additional custom alerts that users sometimes ask for.
I wonder if actions on public / private repos will run in public / private containers (and be stored in public / private registries) by default?
The closest that I found is [1]:
> Actions are defined in a Docker container. Actions run in an environment where they have access to the code in your repository, variables you define, and secrets you make available to the action.
I get the impression from that and [2] that it's only private containers on GitHub servers for now.
Another interesting observation from the docs [3] currently:
> You can only create workflows in private repositories.
I imagine that's only a temporary constraint?
Edit: Found more info on the runtime environment [4].
If you're as eager as I am to check this out, try this:
1. Go here as a logged in user: https://github.com/actions/docker/blob/master/.github/main.workflow
2. Click Edit on the file (top right corner, pencil icon)
3. Edit existing workflow, or click "Create a new workflow".
“Imagine an infinitely more flexible version of shortcut, hosted on GitHub and designed to allow anyone to create an action inside a container to augment and connect their workflow"
Err... so, like code? If he phrases it like that, does that mean the target audience isn't developers? I mean, why else go with such an analogy?
We should be worried about locking our workflow to a specific provider, but the fact that a provider makes things easier or provides tools isn't in itself lock-in.
Even then we shouldn't be particularly worried. Vendor lock-in is an unusual problem; deciding to change vendor is rare and if there's a good reason to do it then it's worth spending resources. The only time it's a real problem is when you absolutely have to change and you don't have the resources to do the necessary work.
Interesting that GitHub is continuing to actively develop independent features after the acquisition. I almost started to say that this seemed like a pushback against Microsoft's newly-rebranded Azure DevOps before I even remembered...
Wouldn't it make more sense for Microsoft to merge GitHub with their TFS workflow that already has build / test etc. integration to some extent (along with hosting code)? I don't understand this development.
https://www.pmkelly.net/publications/wage2008.pdf
https://www.pmkelly.net/publications/thesis.pdf