Introducing Workflow Files

#2

Thanks for this post, it’s really nice :slight_smile:

Syntax

I like your syntax; it’s really close to the API we have right now, which is nice. I have one concern, though: in the future we need to be able to chain executions. In my first proposal I was thinking of flattening all the executions and resolving their dependencies. I’m afraid that with your syntax it might be hard to chain executions.
It’s something we should think about. Maybe something like this:

when:
  serviceX:
    event:
      eventX:
        execute:
          nameofexecution:
            serviceY: taskY
            result:
              resultY:
                map:
                  foo: $nameofexecution.resultY.outputY
                  bar: $event.dataX
                execute:
                  nameofexecution2:
                    serviceZ: taskZ
                    ...

With something like that we could even flatten all the executions and resolve them based on the data they need. This might be too much to implement for now, but I want a syntax that we can easily migrate to support that.
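To make "resolve them based on the data they need" concrete, here is a minimal Python sketch. It assumes the nested executions have been flattened into one dict keyed by execution name, and that any `$name.` reference in a `map` value points at another execution (`$event` is trigger data, not an execution). The function names are hypothetical; the run order comes from the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical sketch: names are illustrative, not part of MESG.
# Matches a reference such as "$nameofexecution.resultY.outputY" and captures
# the leading name ("nameofexecution"). "$event" is captured too, but it is
# filtered out below because it is not an execution.
REF = re.compile(r"\$([A-Za-z_][\w-]*)\.")


def dependencies(executions):
    """Map each execution name to the other executions its `map` reads from."""
    deps = {}
    for name, spec in executions.items():
        refs = set()
        for value in spec.get("map", {}).values():
            if isinstance(value, str):
                refs.update(REF.findall(value))
        deps[name] = {ref for ref in refs if ref in executions and ref != name}
    return deps


def execution_order(executions):
    """One valid run order for the flattened executions (CycleError on cycles)."""
    return list(TopologicalSorter(dependencies(executions)).static_order())


# Flattened version of the nested example: nameofexecution2 reads from
# nameofexecution, so it has to run after it.
flat = {
    "nameofexecution2": {
        "map": {"foo": "$nameofexecution.resultY.outputY", "bar": "$event.dataX"},
    },
    "nameofexecution": {"map": {}},
}
print(execution_order(flat))  # ['nameofexecution', 'nameofexecution2']
```

Because dependencies are inferred from the references themselves, the nesting in the YAML becomes optional: the same graph can be built from a nested or a flat definition.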

New APIs

All good there. I would just remove the update part; let’s keep it simple for now: we delete and create a new one, like we do for services. We can add an ID system on top of that later to mimic an update.
I would be careful to keep consistent naming between the service and workflow actions, e.g. remove vs delete.

Multiple task executions

Totally necessary. It’s somewhat related to my first point, but I think here you are talking about executing them all in parallel rather than chained. We should cover that too, and we will have the same problems: the execution can be extended, but the mapping might be totally different, which is why I think we should group the mapping inside the execution part.

Filters

This one is tricky, especially if we want something simple. We definitely need a filter system. For me, filters should be based on all the data from the execution (data, tag, outputKey) but also on the data from the parent execution (in the case of nested executions). As for the kinds of filters, at least equality is necessary, and all the other primitives would be perfect, but for now, as you propose, we can use services for that. Let’s make sure we have a design where we will be able to add filters later, but for now we can have dedicated services for that.
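As an illustration of equality-only filters over data, tag and outputKey, including the parent execution, here is a minimal Python sketch. The filter shape (`outputKey`, `tag`, `data`, nested `parent`) and the `Execution` type are hypothetical; none of these names come from the MESG API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch: the filter shape and field names are illustrative only.


@dataclass
class Execution:
    output_key: str
    data: dict
    tags: list = field(default_factory=list)
    parent: Optional["Execution"] = None  # set for nested executions


def matches(execution, flt):
    """True if every equality condition in `flt` holds for `execution`.

    Filter shape: {"outputKey": ..., "tag": ..., "data": {...}, "parent": {...}}
    where "parent" is itself a filter applied to the parent execution.
    """
    if "outputKey" in flt and execution.output_key != flt["outputKey"]:
        return False
    if "tag" in flt and flt["tag"] not in execution.tags:
        return False
    for key, expected in flt.get("data", {}).items():
        if execution.data.get(key) != expected:
            return False
    if "parent" in flt:
        # A parent condition can only hold for nested executions.
        if execution.parent is None or not matches(execution.parent, flt["parent"]):
            return False
    return True


paid = Execution(output_key="success", data={"status": "paid"})
mail = Execution(output_key="sent", data={"to": "alice"}, tags=["email"], parent=paid)
print(matches(mail, {"outputKey": "sent", "tag": "email", "parent": {"data": {"status": "paid"}}}))  # True
print(matches(mail, {"data": {"to": "bob"}}))  # False
```

Other primitives (not-equal, greater-than, regexp) could later be added as extra keys without changing the overall shape.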

Data composition

I think this one is too much. I would recommend going with a service for that: we will never be able to cover all the different needs, so let’s not try.

Architecture

I think we should have something that always reacts to services’ events. For now we can have something simple and listen using the API we already have, based on the workflow information, which is basically what you’ve already done. But we should store all this workflow information in a database and, for every event, query this database to see whether we need to execute a task. This way we remove the whole “listening” part, which is not really scalable and is hard to manage.
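A rough sketch of the "query on every event" idea, assuming workflows can be indexed by the (service, event) pairs they react to. A Python dict stands in for the database here; the class and method names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: an in-memory stand-in for the workflow database.


class WorkflowIndex:
    """Index workflows by the (service, event) pairs they react to.

    When an event arrives, the matching workflows are looked up instead of
    keeping one open listener per workflow. A real deployment would run the
    same lookup as a database query.
    """

    def __init__(self):
        self._by_trigger = defaultdict(set)

    def add(self, workflow_name, triggers):
        for service, event in triggers:
            self._by_trigger[(service, event)].add(workflow_name)

    def remove(self, workflow_name):
        for names in self._by_trigger.values():
            names.discard(workflow_name)

    def lookup(self, service, event):
        """Workflows to run for one incoming event, sorted for stable output."""
        return sorted(self._by_trigger.get((service, event), ()))


index = WorkflowIndex()
index.add("notify", [("serviceX", "eventX")])
index.add("audit", [("serviceX", "eventX"), ("serviceY", "eventY")])
print(index.lookup("serviceX", "eventX"))  # ['audit', 'notify']
```

Deleting a workflow only removes index entries, which fits the delete-and-recreate approach to updates.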


In conclusion, it’s really nice, and for now we can use the listener system, but we should keep in mind that this will evolve toward a database (maybe even a distributed database) and that the syntax needs to be “future friendly”. I really think we should name the executions and do the processing inside them; that way we will be really flexible. But I might be biased by my previous research, so I’m definitely open to rethinking that.


#3

I would split inspect into two:

  • first, for getting the workflow definition: workflow get-def id-or-name
  • second, for inspecting the workflow: workflow inspect id-or-name

Because the workflow definition is a static resource, while everything else is more dynamic.

Apart from that, everything is OK.

We should provide an option to execute multiple tasks

See the new proposal (triggers.when.outputs). Also, we should avoid manipulating outputs, because we would need to create a kind of DML for JSON (I haven’t seen a successful project for that).

First, let’s settle on a proposal for the workflow file; then we can talk about its architecture.

So here is my proposal:

# name of the workflow
name: email-notification

# description of the workflow
description: Workflow for notify when email is sent

# service aliases that could be accessible in the actions
services:
  - email: 6b0884a06e169c095ed8c412c3afc398
  - slack: 474cb31a6264142684d314d6f2ec650a
  - forum: 174cd4d4ba541fda5cf46d0d74e1102a

# triggers is a list of all services and the events MESG will listen for.
triggers:
  # the name of the trigger, used in eventflows
  public-email:
    description: "email with topic and message"
    # id or name of the service
    service: email # 6b0884a06e169c095ed8c412c3afc398
    # when filters the service's events.
    when:
      # event name from the service's mesg.yml
      event: EmailSent
      # event tags
      tags:
        - t1 # simple tag name
        - /^\w+$/ # maybe regexp?

      # event outputs (from the service's mesg.yml)
      # NOTE: this is the core of the workflow file
      # the listed outputs will be passed to the execution
      # of the next service
      outputs:
        - topic
        - message

  # another trigger (same service and event, but with different outputs)
  private-email:
    description: "email with topic only"
    service: email
    when:
      event: EmailSent
      outputs:
        - topic

# list of actions (service tasks) to execute
actions:
  # name of the action
  send-to-slack:
    description: "send to slack"
    # id or name of the service
    service: slack
    # name of the task from the service's mesg.yml
    task: send-to-channel
    # provide inputs (they will be combined with the trigger's outputs)
    inputs:
      apikey: "80676bd37b0636dc11828d7f23cdafbb3889aba8"
      channel: "notification"

  post-on-forum:
    description: "post on forum"
    service: forum
    task: post
    inputs:
      user: "root"
      password: "pass"

# eventflows combine triggers with actions
eventflows:
  # eventflow name
  public:
    # on: the single trigger that fires this eventflow
    on: public-email
    # execute: the action, or list of actions, to run on the given trigger
    execute: send-to-slack
  private:
    on: private-email
    execute:
      - send-to-slack
      - post-on-forum

The key features:

  • the syntax is not deeply nested
  • triggers.when.outputs and actions.inputs correspond 1:1 with the mesg.yml definitions
  • it has three main parts: triggers, actions, and the bindings between them (eventflows)
  • you can chain triggers and actions (although you can’t create multi-step chains such as "on A do B, then C", because such chains require keeping state).
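To check that triggers, actions and eventflows compose as intended, here is a minimal Python sketch that turns one incoming event into the task executions this proposal would produce. It assumes the YAML has already been parsed into dicts with the same keys and skips resolving service aliases to IDs. Treating `/.../` tags as regular expressions (the open "maybe regexp?" question) and letting forwarded outputs override static inputs on a key clash are assumptions, not part of the proposal; the function names are hypothetical.

```python
import re

# Hypothetical sketch: function names are illustrative, not part of MESG.


def tag_matches(pattern, tags):
    """A pattern is a literal tag or, tentatively, a /regexp/."""
    if len(pattern) > 2 and pattern.startswith("/") and pattern.endswith("/"):
        return any(re.fullmatch(pattern[1:-1], tag) for tag in tags)
    return pattern in tags


def trigger_matches(trigger, service, event, tags):
    when = trigger["when"]
    return (
        trigger["service"] == service
        and when["event"] == event
        # Assumption: every listed pattern must match at least one event tag.
        and all(tag_matches(p, tags) for p in when.get("tags", []))
    )


def plan(workflow, service, event, tags, data):
    """Return the (service, task, inputs) executions triggered by one event."""
    calls = []
    for flow in workflow["eventflows"].values():
        trigger = workflow["triggers"][flow["on"]]
        if not trigger_matches(trigger, service, event, tags):
            continue
        # Forward only the outputs listed in when.outputs (KeyError if one is missing).
        forwarded = {key: data[key] for key in trigger["when"].get("outputs", [])}
        names = flow["execute"]
        if isinstance(names, str):  # execute: a single action or a list of actions
            names = [names]
        for name in names:
            action = workflow["actions"][name]
            # Assumption: forwarded outputs win over static inputs on a key clash.
            inputs = {**action.get("inputs", {}), **forwarded}
            calls.append((action["service"], action["task"], inputs))
    return calls


# Trimmed-down version of the YAML example, already parsed into dicts.
workflow = {
    "triggers": {
        "public-email": {"service": "email", "when": {
            "event": "EmailSent", "tags": ["t1", r"/^\w+$/"], "outputs": ["topic", "message"]}},
        "private-email": {"service": "email", "when": {
            "event": "EmailSent", "outputs": ["topic"]}},
    },
    "actions": {
        "send-to-slack": {"service": "slack", "task": "send-to-channel",
                          "inputs": {"channel": "notification"}},
        "post-on-forum": {"service": "forum", "task": "post", "inputs": {"user": "root"}},
    },
    "eventflows": {
        "public": {"on": "public-email", "execute": "send-to-slack"},
        "private": {"on": "private-email", "execute": ["send-to-slack", "post-on-forum"]},
    },
}

for call in plan(workflow, "email", "EmailSent", ["t1"], {"topic": "hi", "message": "body"}):
    print(call)
```

Each eventflow is evaluated independently against the event, which is why a single-level chain needs no state while "on A do B, then C" would.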

#4

It’s good if we can reduce nested executions for readability. Functionally, they all seem the same. We need to make sure we have a nice syntax for both serial (dependent) and parallel task executions.

I’m throwing out an idea that extends the original syntax I provided to cover both parallel and serial task executions without a nested syntax. I’m introducing a new dependsOn field and a named-executions pattern inspired by @Anthony’s proposal.

@Anthony I think you mentioned having map inside the execution. This makes sense, and it’s needed to make multiple task executions with different input data possible. The example below also adopts that part.

There is also an example at the bottom of using a wildcard to listen to all events or results, which I forgot to mention in the first post.

when:
  serviceA:
    event:
      eventX:
        execute:
          # this execution runs in parallel because it doesn't depend
          # on any other executions.
          execution1:
            map:
              field1: $event.data.fieldX
            serviceX: taskX
          # this execution runs in parallel because it doesn't depend
          # on any other executions.
          execution2:
            map:
              field1: $event.data.fieldY
            serviceY: taskY
          # this execution waits for execution1 to complete with resultX &
          # execution2 to complete with resultY.
          execution3:
            dependsOn:
              execution1: resultX
              execution2: resultY
            map:
              foo: $execution1.result.data.fieldX
              bar: $execution2.result.data.fieldY
              baz: $event.data.fieldY
            serviceZ: taskZ
          # this execution waits for execution3 to complete with resultX.
          execution4:
            dependsOn:
              execution3: resultX
              ...
  serviceB:
    result:
      # listens to all results from serviceB.
      '*':
        execute:
          logExecution:
            map:
              message: $result.data
              serviceName: serviceB
            logger: log

There can also be multiple executions waiting for the same execution to complete with the same or different output keys. This kind of syntax opens up a wide range of possibilities for dependent executions.

WSS will analyse all the dependent executions and run them serially or in parallel, depending on how they’re defined and which executions each one depends on.
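One way WSS could do this analysis is a level-by-level topological sort (Kahn's algorithm) over the `dependsOn` keys, plus a run-time check of the expected output keys. The sketch below is a hypothetical Python illustration of that idea, not WSS code.

```python
# Hypothetical sketch: stage grouping over dependsOn, names are illustrative.


def parallel_stages(executions):
    """Group executions into stages that can each run in parallel.

    `executions` maps a name to its spec; `dependsOn` maps each upstream
    execution to the output key it must finish with.
    """
    deps = {name: set(spec.get("dependsOn", {})) for name, spec in executions.items()}
    for name, upstream in deps.items():
        unknown = upstream - deps.keys()
        if unknown:
            raise ValueError(f"{name} depends on unknown executions {sorted(unknown)}")
    stages, done = [], set()
    while len(done) < len(deps):
        # Ready = not yet scheduled and every upstream already scheduled.
        ready = sorted(n for n, up in deps.items() if n not in done and up <= done)
        if not ready:
            raise ValueError("dependsOn contains a cycle")
        stages.append(ready)
        done.update(ready)
    return stages


def should_run(spec, outputs):
    """Run only if each upstream finished with the expected output key.

    `outputs` maps finished execution names to the output key they produced.
    """
    return all(outputs.get(up) == key for up, key in spec.get("dependsOn", {}).items())


executions = {
    "execution1": {},
    "execution2": {},
    "execution3": {"dependsOn": {"execution1": "resultX", "execution2": "resultY"}},
    "execution4": {"dependsOn": {"execution3": "resultX"}},
}
print(parallel_stages(executions))  # [['execution1', 'execution2'], ['execution3'], ['execution4']]
```

Running whole stages is the simplest model; a real scheduler would more likely start each execution as soon as its own dependencies finish, so an execution never waits for unrelated executions in the same stage.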


#5

@krhubert Yes, I agree that we may need to provide a command that shows the user the underlying workflow.yml. It’d be nice :slight_smile:.

I think we can use one of the command names below for this:

  • $ mesg-core workflow dump ID
  • $ mesg-core workflow definition ID

#6

I’m thinking about adding MongoDB as a dependency to WSS so we can query workflows based on incoming events and results in order to execute tasks. Note that we may not need this right now: we could just load the saved workflows on startup and keep their definitions in memory. We’ll see over time while experimenting.

I’d like to see how a distributed database will work together with WSS. @Anthony already pointed out that we’ll need a distributed database in the future, so MongoDB can be a good start, and we can always change it later if needed. I don’t want to use a simple key-value database like LevelDB for workflows, because we’ll need some querying.

@core team please give feedback :slight_smile:.

This is my current TODO list:

  • Create a base for the workflow feature and add dummy create & delete features. #541
  • Create the simplest possible VM implementation in WSS to run workflows. It should be able to run the sample workflow service from the first post and use MongoDB to save and query workflows. #559
  • Implement the workflow logs command so we can easily debug running workflows. #559

Later on, improve the syntax and the VM that runs workflows, and implement the remaining CLI commands / gRPC APIs.


#7

Workflow Running Policy

This is another thing we need to discuss: how to deal with disconnected services so that workflows run reliably.

Scenario #1

What should we do if a service that a workflow depends on cannot be started when the workflow is first created?

Should the workflow create feature return an error, or create the workflow and keep trying to start and listen to the services at intervals until all of them are responsive? We can still log this process to the workflow’s own log stream so devs are aware of the workflow’s life cycle.

I prefer the second way: leave this management to WSS so it can deal with services under the hood instead of failing on workflow creation.
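The second option could look roughly like the sketch below. It assumes a hypothetical `start(service)` callback that reports whether a service is running and a `log` callback that writes to the workflow’s log stream. A fixed interval is used for simplicity; exponential backoff would be a natural refinement.

```python
import time

# Hypothetical sketch: start/log callbacks and names are illustrative.


def start_with_retries(services, start, log, interval=5.0, max_attempts=None, sleep=time.sleep):
    """Try to start every service, retrying the ones still down every `interval` seconds.

    Returns the number of rounds needed. With max_attempts=None it keeps trying
    forever, i.e. the workflow is created and WSS waits instead of failing.
    """
    pending = list(services)
    rounds = 0
    while pending:
        rounds += 1
        still_down = []
        for service in pending:
            if start(service):
                log(f"service {service} started")
            else:
                still_down.append(service)
        pending = still_down
        if not pending:
            break
        if max_attempts is not None and rounds >= max_attempts:
            raise RuntimeError(f"services still down after {rounds} attempts: {pending}")
        log(f"waiting for {pending}, retrying in {interval}s")
        sleep(interval)
    return rounds


# Simulated services: "slack" only comes up on its second start attempt.
attempts = {}


def fake_start(service):
    attempts[service] = attempts.get(service, 0) + 1
    return service != "slack" or attempts[service] >= 2


workflow_log = []
print(start_with_retries(["email", "slack"], fake_start, workflow_log.append, sleep=lambda _: None))  # 2
```

Injecting `sleep` keeps the policy testable; the log messages are what devs would see in the workflow’s log stream.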

Scenario #2

What should we do if a service that a workflow depends on gets disconnected?

This could be a service that tasks are executed on, or a service whose events and results are being listened to. A workflow can depend on several services, and some or all of them can be disconnected/unresponsive.

In this case, the workflow cycle will not proceed properly because of the disconnected services. For example, a task may get executed after an event is received from a service, but some other tasks may not be executed because their services are down, or some results or events may not be received for the same reason.

Should we completely pause the execution of a workflow when at least one of the services it depends on is unresponsive (I think yes)? In that case the workflow can continue once its services are fully responsive again. To make this possible, we need to make sure we keep the workflow’s state (unhandled events, results, inputs/outputs, executed/non-executed tasks, etc.) correctly; otherwise we could miss some task executions along the way, which could introduce weird behaviour into the application.
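A minimal sketch of this pause-and-resume policy, assuming WSS is notified when a service goes down or comes back: events are buffered while any dependency is down and replayed in arrival order afterwards. The class is hypothetical and keeps its state in memory only; a real implementation would have to persist this state to avoid losing executions.

```python
from collections import deque

# Hypothetical sketch: in-memory only, names are illustrative.


class WorkflowRunner:
    """Pauses a workflow while any of its services is down and replays buffered events."""

    def __init__(self, services, handle):
        self.services = set(services)  # services this workflow depends on
        self.down = set()              # services currently unresponsive
        self.handle = handle           # runs the workflow logic for one event
        self.backlog = deque()         # events received while paused

    @property
    def paused(self):
        return bool(self.down)

    def service_down(self, service):
        if service in self.services:
            self.down.add(service)

    def service_up(self, service):
        self.down.discard(service)
        # Replay in arrival order, stopping again if another service went down.
        while not self.paused and self.backlog:
            self.handle(self.backlog.popleft())

    def on_event(self, event):
        if self.paused:
            self.backlog.append(event)
        else:
            self.handle(event)


handled = []
runner = WorkflowRunner(["email", "slack"], handled.append)
runner.on_event("e1")
runner.service_down("slack")
runner.on_event("e2")        # buffered: slack is down
runner.service_up("slack")   # replays e2
print(handled)  # ['e1', 'e2']
```

Pausing the whole workflow, rather than individual tasks, keeps the replay order simple at the cost of delaying tasks whose own services are still up.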

I think WSS should manage all the services like this and have restart and re-listen policies for services. It should also log to the workflow’s log stream any information about the status of services and the listening/execution state of the workflow.


#8

For now we don’t have any kind of registry to map service IDs to their Git URLs. Because of that, we’re not able to automatically deploy the services a workflow depends on. So we’re thinking about supporting repo URLs and local paths alongside service IDs in the definition.

e.g.

name: ...
description: ...

services:
  # with service id.
  serviceA: 5baa5a2f1ecdda9a25a15e350f0a94730ca2ad3b
  # with git host.
  serviceB: https://github.com/mesg-foundation/service-influxdb#also-supports-branches
  # with absolute path.
  serviceC: /Users/ilgooz/Programs/go/src/github.com/ilgooz/vuejsapp
  # with relative path.
  serviceD: ./another/service
...
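A possible way to tell these four kinds of references apart, assuming IDs are 32 to 64 lowercase hex characters (as in the examples in this thread), Git references start with `http(s)://` and may carry a `#branch` suffix, and paths start with `/`, `./` or `../`. The names and the `base_dir` parameter are hypothetical; the actual deployment step for each kind is left out.

```python
import posixpath
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch. Assumption: service ids are lowercase hex strings
# like the ones used in this thread.
SERVICE_ID = re.compile(r"[0-9a-f]{32,64}")


@dataclass
class ServiceRef:
    kind: str                     # "id", "git" or "path"
    location: str
    branch: Optional[str] = None  # only for git references


def parse_service_ref(value, base_dir="."):
    if value.startswith(("https://", "http://")):
        url, _, branch = value.partition("#")  # the "#branch" suffix is optional
        return ServiceRef("git", url, branch or None)
    if value.startswith(("/", "./", "../")):
        # Relative paths are resolved against the workflow file's directory.
        return ServiceRef("path", posixpath.normpath(posixpath.join(base_dir, value)))
    if SERVICE_ID.fullmatch(value):
        return ServiceRef("id", value)
    raise ValueError(f"unrecognised service reference: {value!r}")


for alias, ref in {
    "serviceA": "5baa5a2f1ecdda9a25a15e350f0a94730ca2ad3b",
    "serviceB": "https://github.com/mesg-foundation/service-influxdb#also-supports-branches",
    "serviceD": "./another/service",
}.items():
    print(alias, parse_service_ref(ref, base_dir="/work/app"))
```

Keeping the parsing separate from deployment would make it easy to add other resolvers later without touching the workflow syntax.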

#9

I really love the idea of having this deployment part directly in the workflow, so that we can just provide the workflow and it installs and starts everything.
Let’s definitely keep this in mind. Maybe it’s not a priority for now, but it’s really good: we could have a kind of workflow service resolver that is a database resolver for now but later can be git, path, tar, ipfs…


#10

We may support configuring services inside workflows as mentioned in another proposal.


#11

Let’s keep the deployment idea but implement it later :wink:


#12

We can implement a lock file to lock services’ versions. See: Improve relation between SID and Hashes
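As a rough idea of what such a lock file could contain, here is a sketch assuming a JSON format that records, for each service alias, the source it was declared with and the hash it resolved to. `resolve` is a hypothetical callback; the file format and names are assumptions.

```python
import json

# Hypothetical sketch: lock file format and names are illustrative.


def write_lock(services, resolve):
    """Pin each service alias to its declared source and the hash it resolved to."""
    entries = {alias: {"source": ref, "hash": resolve(ref)} for alias, ref in services.items()}
    return json.dumps(entries, indent=2, sort_keys=True)  # sorted keys keep diffs small


def check_lock(services, lock_text):
    """Aliases that are new or whose declared source changed since the lock was written."""
    locked = json.loads(lock_text)
    return sorted(
        alias for alias, ref in services.items()
        if alias not in locked or locked[alias]["source"] != ref
    )


services = {
    "serviceA": "5baa5a2f1ecdda9a25a15e350f0a94730ca2ad3b",
    "serviceD": "./another/service",
}
# Stand-in resolver; a real one would build or fetch the service to get its hash.
lock = write_lock(services, lambda ref: "hash-of-" + ref)
print(check_lock(services, lock))                                     # []
print(check_lock({**services, "serviceD": "./other/service"}, lock))  # ['serviceD']
```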


#13

Here is a proposal for a workflow UI for applications/workflows that will not overload the CLI / core for now and lets us experiment before adding this feature to the core.


#14

I remember that we had some ideas about executing tasks on pre-created applications, like we do with MESG services. For example, an application could have some tasks/funcs that actually make various task executions on services under the hood and produce a result.

This way it’s possible to create reusable & configurable applications as well.

@Anthony can you share your vision for this about how we should implement this feature in workflows?


#15

I’m not sure this is a good thing to do. I think it’s best to keep it simple: if there are tasks for preprocessing or post-processing, we just put them before and after the task we want.

If these tasks are really independent of the application that needs them for deployment, developers can just create an app that executes a task and then uses the deploy API with the result; even in that case we still don’t need any pre/post-processing.


#16

@Anthony I created another proposal for this and explained the idea a bit more: Reusable Workflows & Workflow Marketplace


#17

I’ve been looking at GitHub workflows, the config behind GitHub Actions.
You can actually create your own tasks based on Docker with the connections you want; that might be a good source of inspiration.


#18

FYI, it looks like HCL syntax.

For the syntax, we first need to decide whether we want to use YAML or HCL (JSON doesn’t support comments and is hard to read; TOML has its own quirks). Other languages are not so popular.


#19

HCL seems to be a much more human-readable format than YAML. It also supports arithmetic and registering custom funcs to work with data.

Creating custom funcs is very useful for us, especially funcs that handle the data we get from events and task results and that create input data for task executions. We can create some useful funcs to easily work with service data inside workflows.

And, of course, we can always use MESG services such as objects and logic to work with data, but experimenting with HCL’s funcs and solving this problem without going over the network might be nicer.

Having arithmetic and handling data with custom funcs directly inside HCL might decrease the complexity of workflows and make them much more readable, because we won’t be making lots of additional task executions just to handle the data.

Working with data in workflows is essential and a big part of workflow logic, but if it becomes too hard to manage, it will distract developers while they create workflows. So we should provide the simplest solutions.

See index.js of application-marketplace to get an idea about how objects & logic services are used in the marketplace application.

HCL also supports JSON, so converting it to JSON and YAML is very easy. But supporting these two formats alongside HCL will require a bit more work, because we would need to create a small runtime to interpret the arithmetic and custom funcs. Supporting JSON and YAML can be discussed later; I don’t think they’re necessary for now.
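To give an idea of the size of such a runtime, here is a minimal sketch that evaluates arithmetic and registered custom funcs in expression strings, as they might appear inside YAML or JSON values. It reuses Python’s own expression parser (`ast`) and accepts only a whitelist of node types; the expression syntax and the `upper`/`concat` funcs are assumptions, not HCL’s.

```python
import ast
import operator

# Hypothetical sketch of a tiny expression runtime; syntax and names are illustrative.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}


def evaluate(expr, variables, funcs):
    """Evaluate e.g. "event.data.price * 2" or "upper(event.data.name)".

    Only literals, dict lookups written as attributes, + - * / %, unary minus
    and calls to registered funcs are allowed; anything else is rejected.
    """
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in variables:
                raise ValueError(f"unknown variable {node.id!r}")
            return variables[node.id]
        if isinstance(node, ast.Attribute):  # event.data -> event["data"]
            return ev(node.value)[node.attr]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and not node.keywords:
            if node.func.id not in funcs:
                raise ValueError(f"unknown function {node.func.id!r}")
            return funcs[node.func.id](*(ev(arg) for arg in node.args))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return ev(ast.parse(expr, mode="eval").body)


variables = {"event": {"data": {"price": 10, "name": "mesg"}}}
funcs = {"upper": str.upper, "concat": lambda *parts: "".join(map(str, parts))}
print(evaluate("event.data.price * 2 + 1", variables, funcs))  # 21
print(evaluate("upper(event.data.name)", variables, funcs))    # MESG
```

A whitelist like this is what keeps such a runtime small: every new feature (comparisons, conditionals) is an explicit extra case rather than general code execution.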


#20

I’m excited to see what you can come up with using HCL :wink:


#21

I created a sub-topic to discuss the workflow definition format; please check it here.