Workflow Definition Format

proposal

#1

Within this topic, I aim to find a native/standard definition format for creating workflows.

The Problem

We want to provide the best user experience for creating MESG Applications (also referred to as ‘apps’ or ‘workflows’).

The current way of creating apps is to code them in any programming language. A MESG Application is just a piece of software that interacts with Core’s ‘core’ gRPC APIs. These APIs are ListenEvent(), ListenResult() & ExecuteTask().
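To give a feeling of what coding an app by hand looks like, here is a rough sketch. It does not use the real gRPC client: FakeCore, Event, listen_event and execute_task are hypothetical in-memory stand-ins I made up for illustration, and the service, event and task names are assumptions too.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    service: str
    key: str
    data: dict


@dataclass
class FakeCore:
    """In-memory stand-in for Core (hypothetical, NOT the real gRPC client)."""
    pending: list
    executed: list = field(default_factory=list)

    def listen_event(self, service):
        # Hypothetical counterpart of ListenEvent(): yield the events of one service.
        for event in self.pending:
            if event.service == service:
                yield event

    def execute_task(self, service, task, inputs):
        # Hypothetical counterpart of ExecuteTask(): just record the call.
        self.executed.append((service, task, inputs))


def run_app(core):
    # The only workflow-relevant logic: on a "request" event, send an email.
    for event in core.listen_event("webhook"):
        if event.key != "request":
            continue
        core.execute_task("email", "send", {"body": event.data["message"]})


core = FakeCore(pending=[
    Event("webhook", "request", {"message": "hello"}),
    Event("webhook", "ping", {}),
])
run_app(core)
print(core.executed)  # [('email', 'send', {'body': 'hello'})]
```

Even in this toy version, most of the lines are plumbing; the actual workflow is the three lines inside run_app.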

Using full-featured programming languages for creating workflows:

  • is boring.

  • is time consuming and might require boilerplate code that is not relevant to the workflow itself.

  • is as complex as the programming language being used.

  • requires real programming skills, which is hard for non-programmers.

  • is not natural, because in workflows we should only describe the data flow between services and should not do any advanced programming.

  • is too flexible, with lots of advanced (unneeded) features, and this leads programmers to make the mistake of putting service logic right inside the applications.

  • provides no standard for creating workflows.

  • leads to poor optimization, because everything is up to the developer’s code style and the programming language used.

These are the reasons why we shouldn’t use an existing, full-featured scripting language like Lua, JS or Lisp to define/create workflows.

The Solution

Instead of using a full-featured scripting language, we need a standardized, small and targeted configuration language that is specifically designed for creating workflows.

Workflows are a sort of configuration that defines the data flow between services. They don’t require all the features of an actual programming language, but they still need some programming primitives like arithmetic, conditional operators and basic loops to manipulate data and create dynamic executions.

A configuration language with these basic features, and a VM at the backend to parse and understand this syntax and execute tasks depending on the state of conditions, is actually what we are looking for. This VM should also manage (deploy, start etc.) the dependency services and do smart optimizations on listening to events and results and on executing tasks.

What is a MESG App (Workflow)?

Creating a workflow is about:

  • listening to some events from some services,

  • but filtering these events by their keys or payload data.

  • executing tasks from various services depending on events or on results produced by other executions.

  • executing a fixed or dynamic number of tasks depending on conditions from the root event or previously completed executions (task results).

  • executing tasks synchronously (depending on each other), asynchronously, or by mixing both, all decided by conditions applied to the information from the root event or previously completed executions (task results).

  • continuously manipulating the data from events and results and dynamically creating new data as inputs to task executions.
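The first few points above (listen, filter, execute) can be sketched as plain data plus a small matching function. This is only an illustration under my own assumptions: the keys trigger, filter and executions, and the service and task names, are hypothetical and not an agreed format.

```python
def matches(trigger, event):
    """Return True when the event has the trigger's service and key and passes its payload filter."""
    if event["service"] != trigger["service"] or event["key"] != trigger["key"]:
        return False
    wanted = trigger.get("filter", {})
    return all(event["data"].get(name) == value for name, value in wanted.items())


def plan(workflow, event):
    """List the (service, task, inputs) executions an event would trigger; empty if filtered out."""
    if not matches(workflow["trigger"], event):
        return []
    return [(e["service"], e["task"], event["data"]) for e in workflow["executions"]]


# Hypothetical workflow: react only to "request" events whose payload has kind == "order".
workflow = {
    "trigger": {"service": "webhook", "key": "request", "filter": {"kind": "order"}},
    "executions": [
        {"service": "invoice", "task": "create"},
        {"service": "email", "task": "send"},
    ],
}

order = {"service": "webhook", "key": "request", "data": {"kind": "order", "id": 7}}
refund = {"service": "webhook", "key": "request", "data": {"kind": "refund", "id": 8}}

print(len(plan(workflow, order)))   # 2
print(len(plan(workflow, refund)))  # 0
```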

A workflow’s life cycle:

  • usually starts by executing some tasks at the beginning to do application-specific things like data migrations or triggering the root (first) events to start sub life cycles.

  • starts listening for some events, where each event can be seen as the start point and root of its own sub life cycle (which can be called a ‘sub workflow’).

  • each sub workflow usually consists of a mix of task executions where some depend on each other and some run in parallel. a task execution running in parallel might trigger a series of other dependent or parallel task executions as well.

  • executions might be statically defined and fixed in number, or can be created dynamically depending on conditions from the root event or the results of other executions.

  • data from the root event or results from different executions can be combined in different ways, depending on some logic, to be used as inputs of other task executions; otherwise the life cycle ends for that session.
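The mix of dependent and parallel executions inside a sub workflow can be pictured as a dependency graph. Below is a minimal sketch, assuming Python 3.9+ for the standard graphlib module; the task names are made up, and a real runner would execute each batch concurrently instead of just listing it.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group executions into batches; everything inside one batch can run in parallel.

    `dependencies` maps an execution name to the set of executions it waits for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical sub workflow: resize and scan both need the download, notify needs both.
deps = {
    "resize": {"download"},
    "scan": {"download"},
    "notify": {"resize", "scan"},
}

print(execution_batches(deps))  # [['download'], ['resize', 'scan'], ['notify']]
```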

Expectations from a Good Workflow Definition Format

  • it should fulfill the needs discussed in the What is a MESG App? and The Solution sections.

  • it should be easy to create for non-programmers. so it shouldn’t feel like a complicated programming language; it needs to be more like a configuration language with a simple syntax & basic programming features. it needs to be in the sweet spot.

  • it should be possible to convert the definition to JSON and back from JSON. this way it’ll be easier to create it programmatically, and JSON is available in browsers and most programming languages. (creating it programmatically is especially needed while building workflows via user interfaces)

  • the JSON representation shouldn’t feel like an abstract syntax tree of a complicated programming language. it should be very close to the actual definition format. this is possible because we don’t want a full-featured programming language. this way, it’ll be easily understandable by humans and easier to create programmatically.
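As a small illustration of the JSON expectation, here is a sketch of a definition being built programmatically (the way a UI builder might) and round-tripped through JSON. The function build_definition and the keys name, on and executions are hypothetical, chosen only for this example.

```python
import json


def build_definition(name, event, tasks):
    """Assemble a workflow definition as plain data (hypothetical shape)."""
    return {
        "name": name,
        "on": {"service": event[0], "event": event[1]},
        "executions": [{"service": service, "task": task} for service, task in tasks],
    }


definition = build_definition("welcome", ("users", "created"), [("email", "send")])

encoded = json.dumps(definition, indent=2)  # what a browser or another language would receive
decoded = json.loads(encoded)               # and what comes back on our side

print(decoded == definition)  # True
```

Because the structure mirrors the definition itself rather than a syntax tree, the same dict is readable by a human and trivial to produce from code.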

Finding a Workflow Definition Format

Applications are a bunch of configurations that define the data relationships between services. They are about describing the data flow, which is why they’re called workflows. But to describe how data should actually flow, we need to be able to do some sort of programming while defining workflows.

As a result, we need to introduce some programming primitives into the workflow definition format, like arithmetic, conditional operators and basic loops, to manipulate data and create dynamic executions.

This means a workflow definition format needs to accept some basic programming keywords alongside workflow-specific definitions like workflow id, name, tasks, events, executions and so on. We also need an interpreter in the backend to parse & understand this whole definition format and run a special virtual machine for this definition format/language, with the purpose of executing tasks in the described ways and under the predefined conditions.
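To show how small such an interpreter could start out, here is a sketch that expands one execution entry using a condition and a loop. Everything here is an assumption for illustration: the keys when, for_each and task, the three-element condition format and the supported operators are hypothetical, not part of any existing format.

```python
import operator

# Comparison operators the sketch understands (an arbitrary, minimal choice).
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def expand(execution, event_data):
    """Turn one execution entry into concrete (task, inputs) pairs.

    "when" is an optional [field, operator, value] condition on the event data;
    "for_each" optionally names a list in the event data to loop over.
    """
    condition = execution.get("when")
    if condition is not None:
        field, op, value = condition
        if not OPS[op](event_data[field], value):
            return []
    items = event_data[execution["for_each"]] if "for_each" in execution else [None]
    return [(execution["task"], {"item": item}) for item in items]


execution = {"task": "email.send", "when": ["total", ">", 100], "for_each": "recipients"}

big = {"total": 150, "recipients": ["a@example.com", "b@example.com"]}
small = {"total": 50, "recipients": ["a@example.com"]}

print(len(expand(execution, big)))    # 2
print(len(expand(execution, small)))  # 0
```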

Since we also want this definition format to be easy to generate programmatically, we definitely need to have it on top of JSON.

Considering all of this together with the expectations from a good workflow definition format, and after the experiments I did with HCL2, it seems that HCL2 was created with just this kind of need in mind. HCL2 is very close to JSON under the hood and it also has the flexibility we expect from a simple programming language to create workflows.

I don’t like to depend on custom languages like HCL2 and would like to stick with JSON as a data interchange format. But it seems that when HCL2 is converted to JSON, it’s still not complicated and actually looks like something we could have created ourselves for defining workflows. HCL2 is also very flexible and extensible, and has a nice interpreter that could remove a lot of work from our VM. So, all things considered, I’m really positive about HCL2 and would like to give it a good try.

Why to Choose HCL2

  • it has built-in support for variables, whose values can be injected from the interpreter or created inside the workflows.

  • it has conditional loops for generating custom data from other data.

  • it’s extensible with 3rd party extensions, so it’s actually possible to extend the HCL2 language to meet our special needs.

  • it has a data types extension.

  • it has an extension for dynamically defining configuration sections, which we can use to define executions dynamically depending on the state of conditions.

  • it has good packages (a parser, a printer and others) for dealing with the language, which we can use to convert to or from JSON.

  • it has a good interpreter, a small runtime to understand and execute HCL2 programs, which could remove a lot of work from our VM.

  • and most importantly, the JSON generated from an HCL2 configuration is very nice and clean, which means that if we came up with our own perfect implementation on top of JSON instead of using HCL2, it could be very similar. so, this is a very comforting reason to give HCL2 a try. if we think that HCL2 is not enough at some point, we can always create our own definition format and interpreter but still keep this underlying JSON format.

  • using HCL2 will reduce the time we would invest in creating a workflow definition format and interpreter from scratch. HCL2 already has a good amount of packages and tooling around it that we can benefit from.

Details About HCL2

Resources

Please Read All of These

Random

HCL2 Extensions

Cons and Tricks

HCL2 <> JSON

Even though converting HCL2 to or from JSON is possible, it’s not 100% convenient.

For example, function calls or the built-in for expression are represented as plain strings when converted to JSON, where I would expect them to be converted to structured JSON objects. That way, creating them programmatically would be easier.

But this can be accomplished later by improving the existing HCL2 <> JSON tools. For now, we can work around this by using code templates while building JSON representations.
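A rough sketch of that code-template workaround: in HCL2’s JSON syntax, an expression lives inside a string as a "${...}" template, so a tiny helper can render the string for us. The helper call_expr, the block names execution and notify, the reference event.name and the upper function are all hypothetical; functions in particular would have to be registered by our own interpreter.

```python
import json


def call_expr(function, *args):
    """Render a function call in the string form used inside JSON, e.g. "${upper(event.name)}"."""
    return "${" + f"{function}({', '.join(args)})" + "}"


# Hypothetical workflow fragment, built programmatically.
doc = {
    "execution": {
        "notify": {
            "inputs": {"title": call_expr("upper", "event.name")},
        }
    }
}

print(doc["execution"]["notify"]["inputs"]["title"])  # ${upper(event.name)}
print(json.dumps(doc, indent=2))
```

The call is still an opaque string on the JSON side, which is exactly the inconvenience described above; the helper only hides the string building.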

Some Hints

  • @Anthony we talked about directly passing structured data to the workflow runner before here. I think HCL2’s JSON representation can be very close to that structure. What do you think?

Introducing Workflow Files
#2

~reserved for a new post~


#3

Thanks for this post, really nice feedback.

I think HCL is a good candidate for the definition of the workflow, but I don’t think we should focus on the representation of this workflow for now. For me HCL and JSON are exactly the same, and we should figure out what data we actually need rather than how to represent it.

I disagree with that. Arithmetic can be done with services, as can conditions, and loops can be done with recursion, so this is not necessary, and I would strongly suggest not going down this path yet as it will add a lot of complexity. We have different ways to reproduce that without the workflow system itself managing it, so I don’t see the need for this extra complexity. Thus a few of the advantages you point out for HCL are irrelevant.

The JSON representation will be an AST, and there is nothing wrong with that. It is all the instructions needed to execute something (basically the compiled version of it), and that’s what actually matters: the list of all the necessary instructions for the workflow, e.g. definition, tasks to execute, events to watch, dependencies between them etc…
If we don’t have operations, loops and all the stuff like that, then this AST will be quite simple, and I really think this is what we should think about: what kind of data/instructions are needed in here. For the rest, we can always compile yaml/text/hcl/whatever down to that, maybe even js haha.

We need an interpreter to read variables from other tasks/events that happened previously in the workflow, but that’s it. We don’t need to recreate a programming language with arithmetic, loops etc…
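A minimal sketch of what “reading variables” could mean, assuming references are written as dotted paths into a context of past events and results. The functions resolve and build_inputs, the path syntax and the context layout are hypothetical, just to show how little is needed.

```python
def resolve(reference, context):
    """Follow a dotted path such as "event.data.email" through nested dicts."""
    value = context
    for part in reference.split("."):
        value = value[part]
    return value


def build_inputs(template, context):
    """Build the inputs of a task execution by resolving every reference in the template."""
    return {name: resolve(reference, context) for name, reference in template.items()}


# What has happened so far in this (hypothetical) workflow run.
context = {
    "event": {"data": {"email": "a@example.com"}},
    "results": {"render": {"html": "<p>hi</p>"}},
}

inputs = build_inputs({"to": "event.data.email", "body": "results.render.html"}, context)
print(inputs)  # {'to': 'a@example.com', 'body': '<p>hi</p>'}
```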


In general I agree that HCL is a good format, but we shouldn’t really focus on that; we should focus more on what will be compiled for the core to run this workflow. Also, I think a lot can be simplified by taking a more functional programming approach. If you think about Lisp, for example, there are no operations and no loops, only functions (which can be aliased by an operator but are still just functions), e.g. (+ 1 2), (add 1 2)
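To illustrate the “only functions” idea, here is a small sketch where nested lists play the role of Lisp forms and "+" is nothing more than an alias for add. The FUNCTIONS table and evaluate are hypothetical; in our case each function could just as well be a task of a service.

```python
import operator

# Every "operation" is a named function; operators are only aliases.
FUNCTIONS = {"add": operator.add, "mul": operator.mul}
FUNCTIONS["+"] = FUNCTIONS["add"]
FUNCTIONS["*"] = FUNCTIONS["mul"]


def evaluate(expr):
    """Evaluate a form like ["+", 1, 2]; anything that is not a list is a literal value."""
    if not isinstance(expr, list):
        return expr
    name, *args = expr
    return FUNCTIONS[name](*(evaluate(arg) for arg in args))


print(evaluate(["+", 1, 2]))                  # 3
print(evaluate(["add", 1, 2]))                # 3
print(evaluate(["add", 1, ["mul", 2, 3]]))    # 7
```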

For me, arithmetic, conditions and loops should not be part of this discussion. HCL is good, but it is something that has to be compiled into an AST, and I think it’s more important now to work out what the data of this AST will be. We will always be able to compile HCL/JSON/xxx into it.

EDIT:

Just to show why I think expressions are not necessary and should not be implemented: it can be done with different services (that could be core services).

Example: iterating over and filtering some data to get a list of data.

[screenshot]

similar to

def filter_data(data, expression):
    # Collect every item of data for which expression(item) holds.
    result = []
    for d in data:
        if expression(d):
            result.append(d)
    return result