Add more detailed data in the mesg.yml

v0-7
proposal

#1

Problem

Right now the mesg.yml can only handle a few types of data:

  • Number
  • String
  • Boolean
  • Object

The Object type is not really clear: it can be an object, an array, a string, whatever…

It can be confusing and we are losing a lot of information. We should be able to add more details.

Proposition

We need to handle two other kinds of data, Array and NestedType, while keeping the possibility to handle unstructured data.

Array

For the array part we could have a repeated attribute in the data

data:
  foo:
    type: String
    repeated: true
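
As a rough illustration of what a repeated attribute could mean for validation, here is a minimal Go sketch. The Parameter struct, the matches helper and the Validate method are hypothetical names made up for this example, not the core's actual API, and the sketch assumes values arrive decoded from JSON (so numbers are float64 and arrays are []interface{}).

```go
package main

import "fmt"

// Parameter is a hypothetical, simplified model of one entry under `data`.
type Parameter struct {
	Type     string // "String", "Number" or "Boolean"
	Repeated bool   // true when the value is an array of Type
}

// matches reports whether a single decoded value has the declared scalar type.
func matches(typ string, v interface{}) bool {
	switch typ {
	case "String":
		_, ok := v.(string)
		return ok
	case "Number":
		_, ok := v.(float64)
		return ok
	case "Boolean":
		_, ok := v.(bool)
		return ok
	}
	return false
}

// Validate checks v against p. When p.Repeated is set, v must be an array
// and the scalar check is applied to every element.
func (p Parameter) Validate(v interface{}) error {
	if !p.Repeated {
		if !matches(p.Type, v) {
			return fmt.Errorf("expected %s, got %T", p.Type, v)
		}
		return nil
	}
	items, ok := v.([]interface{})
	if !ok {
		return fmt.Errorf("expected array of %s, got %T", p.Type, v)
	}
	for i, item := range items {
		if !matches(p.Type, item) {
			return fmt.Errorf("item %d: expected %s, got %T", i, p.Type, item)
		}
	}
	return nil
}

func main() {
	foo := Parameter{Type: "String", Repeated: true}
	fmt.Println(foo.Validate([]interface{}{"a", "b"}) == nil) // true
	fmt.Println(foo.Validate("a") == nil)                     // false
}
```

The same scalar check is reused for both cases, which is the main appeal of a boolean flag over a separate array type.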

Nested type

For the nested type we need to be able to define the nested attributes

data:
  foo:
    type: Object
    object:
       attrX:
         type: String
       attrY:
         type: Number
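
To make the nested definition more concrete, here is a hedged Go sketch of how such a definition could be checked recursively against incoming data. The Parameter struct and its Validate method are assumptions for illustration only, and the data is assumed to be JSON-decoded (objects as map[string]interface{}, numbers as float64).

```go
package main

import "fmt"

// Parameter is a hypothetical model of one data entry.
// Object holds the nested attribute definitions when Type is "Object".
type Parameter struct {
	Type   string
	Object map[string]*Parameter
}

// Validate checks a decoded value against the definition,
// recursing into nested objects.
func (p *Parameter) Validate(v interface{}) error {
	switch p.Type {
	case "String":
		if _, ok := v.(string); !ok {
			return fmt.Errorf("expected String, got %T", v)
		}
	case "Number":
		if _, ok := v.(float64); !ok {
			return fmt.Errorf("expected Number, got %T", v)
		}
	case "Boolean":
		if _, ok := v.(bool); !ok {
			return fmt.Errorf("expected Boolean, got %T", v)
		}
	case "Object":
		fields, ok := v.(map[string]interface{})
		if !ok {
			return fmt.Errorf("expected Object, got %T", v)
		}
		// Every declared attribute must be present with the right type.
		for name, def := range p.Object {
			if err := def.Validate(fields[name]); err != nil {
				return fmt.Errorf("%s: %w", name, err)
			}
		}
	default:
		return fmt.Errorf("unknown type %q", p.Type)
	}
	return nil
}

func main() {
	foo := &Parameter{Type: "Object", Object: map[string]*Parameter{
		"attrX": {Type: "String"},
		"attrY": {Type: "Number"},
	}}
	good := map[string]interface{}{"attrX": "hello", "attrY": 42.0}
	bad := map[string]interface{}{"attrX": "hello", "attrY": "42"}
	fmt.Println(foo.Validate(good)) // <nil>
	fmt.Println(foo.Validate(bad))  // attrY: expected Number, got string
}
```

In this sketch every declared attribute is treated as required; whether attributes should be optional is left open here.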

Any

We still need to make sure that we can send any kind of data. In this case the core will not do any validation, and it will be the user’s responsibility to know what kind of data the application will process.

data:
  foo:
    type: Any

Milestone v0.6
Milestone v0.7
#2

Just an idea that came to my mind: why do we have data types in the first place?

Can’t we just pass the data and let the service be the one that validates whether the data is correct for it?
Why does the core need to do this? Where is the benefit?

For example, if a service wants binary data, how do we pass it through the core?
Also, we need to figure out what the data representation is. What if we just define the tasks and that’s all?

This is something like static vs dynamic typing


#4

I see many reasons why the core needs to have this information:

  • developer experience (validation of data, documentation and things like that), and I agree with you, ultimately this one could be removed
  • workflow connections: the workflow needs to take the outputs of one task and convert them into the inputs of the next task, or at least make sure that the connection is possible (for example, not putting a string into an int).
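
The workflow-connection point can be sketched as a compatibility check between an output definition and an input definition. This is only an illustration under assumptions: the Parameter struct and the Compatible function are hypothetical, and the rule that an Any input accepts everything is a guess at how Any would behave.

```go
package main

import "fmt"

// Parameter is a hypothetical model of a typed entry in mesg.yml.
type Parameter struct {
	Type     string
	Repeated bool
	Object   map[string]*Parameter
}

// Compatible reports whether a value described by out can be fed
// into an input described by in. It returns nil when the connection is possible.
func Compatible(out, in *Parameter) error {
	if in.Type == "Any" {
		return nil // assumption: an Any input skips all checks
	}
	if out.Type != in.Type {
		return fmt.Errorf("type mismatch: %s -> %s", out.Type, in.Type)
	}
	if out.Repeated != in.Repeated {
		return fmt.Errorf("repeated mismatch")
	}
	// For nested types, every attribute the input expects must be
	// provided by the output with a compatible definition.
	for name, inField := range in.Object {
		outField, ok := out.Object[name]
		if !ok {
			return fmt.Errorf("missing attribute %q", name)
		}
		if err := Compatible(outField, inField); err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
	}
	return nil
}

func main() {
	out := &Parameter{Type: "String"}
	fmt.Println(Compatible(out, &Parameter{Type: "Number"})) // type mismatch: String -> Number
	fmt.Println(Compatible(out, &Parameter{Type: "Any"}))    // <nil>
}
```

A check like this can run when the workflow is defined, before any task is executed, which is only possible if the core knows the types statically.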

Of course you’re right about the static vs dynamic comparison. It’s exactly the same debate: it’s either all static or all dynamic. I personally feel that static is way easier as a bottom layer, and we can always build some dynamic stuff on top of that later.


#5

From a user’s point of view, static typing will improve the experience a lot. With it, docs can be automatically generated, data can be fully validated, and a GUI could even be created with a simple connection from an event’s data to a task’s inputs.


@Anthony I totally agree with everything you wrote. I really like the possibility of mixing array and nested types :slight_smile:

I only suggest deprecating or directly removing the Object type.


#6

I only suggest deprecating or directly removing the Object type.

+1

And I prefer a syntax like []String or String[], as it’s more compact: you see right away what type of data it is and you don’t need to look for an additional field.


#7

Having []string or string[] is easy to read but limiting, because what happens if we have nested data (the current Object type)? Do we need to define a typeX with the definition and then use []typeX?

With repeated, we can apply it to any type and turn any type (even nested ones) into an array.


#8

So something like:

data:
  foo:
    repeated: true
    type:
       attrX:
         type: String
       attrY:
         type: Number

lgtm


assigned ilgooz #11

#12

implemented by https://github.com/mesg-foundation/core/pull/646


#14

Because of the implementation complexity of using the type parameter for both Number/String/Boolean/Any and the nested definition, @Anthony and I suggest adding a new parameter, object, to describe the nested definition.
When the property type is Object, the object definition should be given in a property named object.

Before:

data:
  foo:
    type:
       attrX:
         type: String
       attrY:
         type: Number

After

data:
  foo:
    type: Object
    object:
       attrX:
         type: String
       attrY:
         type: Number

In Go the implementation is super simple. We just need to add this to the Parameter struct:

type Parameter struct {
	// ... existing fields ...

	Object map[string]*Parameter `yaml:"object"`
}

The JSON Schema should be updated to return an error if type=Object but object=nil. If it is too complex to do in JSON Schema, it should be done manually in Go in service/importer/service_file.go.
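
If the check ends up being done manually in Go, it could look roughly like the sketch below. This is an assumption-laden illustration: only the Object field comes from the struct above, while the Type field, its yaml tag, the validateDefinition function and the ErrMissingObject error are hypothetical names, and the real code in service/importer/service_file.go may be organized differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Parameter mirrors the struct from the post.
// The Type field and its tag are assumed for this sketch.
type Parameter struct {
	Type   string                `yaml:"type"`
	Object map[string]*Parameter `yaml:"object"`
}

// ErrMissingObject is a hypothetical sentinel error for type=Object with object=nil.
var ErrMissingObject = errors.New(`type is Object but "object" is not defined`)

// validateDefinition walks a parameter definition and rejects any
// Object-typed parameter that has no nested object definition.
func validateDefinition(path string, p *Parameter) error {
	if p == nil {
		return fmt.Errorf("%s: empty definition", path)
	}
	if p.Type == "Object" && p.Object == nil {
		return fmt.Errorf("%s: %w", path, ErrMissingObject)
	}
	// Nested attributes can themselves be objects, so recurse.
	for name, child := range p.Object {
		if err := validateDefinition(path+"."+name, child); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	invalid := &Parameter{Type: "Object"}
	valid := &Parameter{Type: "Object", Object: map[string]*Parameter{
		"attrX": {Type: "String"},
		"attrY": {Type: "Number"},
	}}
	fmt.Println(validateDefinition("foo", invalid)) // foo: type is Object but "object" is not defined
	fmt.Println(validateDefinition("foo", valid))   // <nil>
}
```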


@ilgooz @krhubert what do you think? Do you like the name object in the mesg.yml file to describe the object properties?


#15

I agree that there is a little bit of complexity in the code, but it doesn’t require us to change the yml format. I feel it’s easier for devs to use the current syntax instead of adding an object field for nested parameter definitions. I’d like to keep the mesg.yml format simple (KISS) for the devs.


#16

It’s ok to change the yml: it’s a new feature, and developers who want to use it can add it.
Also, it is quite normal for a developer to have something like this:

variable Type
struct Type {
  ...
}

This is something common in many languages so I really don’t see any problems with that


#17

For nested type,

  • we can respect json style
type: object
properties:
  attrX:
    type: String
  attrY:
    type: Number

  • or use data, because we already use data for most root-level definitions

type: data
data:
  attrX:
    type: String
  attrY:
    type: Number

#18

I’m ok with type: Object because I didn’t find any other name.

Some notes from what I have tried:

  • our yaml starts looking like json and it’s weird
  • if someone needs more nested structs, readability suffers
  • maybe we should provide a function in the go package like:
func CheckTypes(i interface{}) error {
	// pass the service interface
	// read the mesg.yaml file
	// check with the reflect package that the interface matches the mesg.yaml definitions
	return nil // placeholder
}
  • why are there multiple outputs for a task but only one input (consistency)?
  • I’ve also come up with a draft of a new mesg.yaml which represents objects in a separate section (it also has drawbacks), just to show it.
objects:
  - name: t1
    type:
      var: Number

  - name: t2
    type:
      var2: repeated String

  - name: t3
    type:
      var3: t1
      var4: repeated t2

  - name: n1
    type:
      url: String
      body: String

--- with objects
tasks:
  execute:
    inputs: n1
    outputs:
      success: t1
      error: t3

--- without objects
tasks:
  execute:
    inputs:
      url:
        type: String
      body:
        type: String
    outputs:
      success:
        data:
          var:
            type: Number
      error:
        data: 
          var3: 
            type: Object
            object:
              var: Number
          var4:
            type: Object
            repeated: true
            object:
              var2: String
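
The CheckTypes idea from the notes above could be sketched with the reflect package roughly as follows. This is a simplified illustration, not a proposal for the final API: unlike the signature in the notes, it takes the definitions as an argument instead of reading the mesg.yaml file, the Parameter struct and kindName helper are hypothetical, fields are matched through their json tag (an assumption), and nested structs are only checked at the top level.

```go
package main

import (
	"fmt"
	"reflect"
)

// Parameter is a hypothetical model of one mesg.yaml definition.
type Parameter struct {
	Type     string
	Repeated bool
}

// kindName maps a Go kind to the mesg.yaml type name it is assumed to correspond to.
func kindName(k reflect.Kind) string {
	switch k {
	case reflect.String:
		return "String"
	case reflect.Bool:
		return "Boolean"
	case reflect.Int, reflect.Int64, reflect.Float64:
		return "Number"
	case reflect.Struct:
		return "Object"
	}
	return ""
}

// CheckTypes compares the fields of struct i with defs, keyed by the
// field's `json` tag. Nested structs are not inspected further.
func CheckTypes(i interface{}, defs map[string]*Parameter) error {
	t := reflect.TypeOf(i)
	if t == nil || t.Kind() != reflect.Struct {
		return fmt.Errorf("expected a struct, got %v", t)
	}
	for n := 0; n < t.NumField(); n++ {
		f := t.Field(n)
		name := f.Tag.Get("json")
		def, ok := defs[name]
		if !ok {
			return fmt.Errorf("field %s: no definition for %q", f.Name, name)
		}
		ft := f.Type
		if def.Repeated {
			// A repeated definition must map to a slice; check its element type.
			if ft.Kind() != reflect.Slice {
				return fmt.Errorf("%s: expected a slice", name)
			}
			ft = ft.Elem()
		}
		if got := kindName(ft.Kind()); got != def.Type {
			return fmt.Errorf("%s: expected %s, got %s", name, def.Type, ft.Kind())
		}
	}
	return nil
}

func main() {
	type request struct {
		URL  string   `json:"url"`
		Tags []string `json:"tags"`
	}
	defs := map[string]*Parameter{
		"url":  {Type: "String"},
		"tags": {Type: "String", Repeated: true},
	}
	fmt.Println(CheckTypes(request{}, defs)) // <nil>
}
```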

To summarize: I’m ok with object :smiley: It would be good to take some time in the next few months to rethink how we want to represent those values, so that they are easily readable in code and in docs, testable, etc.


#19

What about the name of the property to define the object properties and types?


#20

Here is a summary so we can reach consensus about the nested declaration:

#1

type: Object
object:
  attrX:
    type: String
  attrY:
    type: Number

#2

type: Object
properties:
  attrX:
    type: String
  attrY:
    type: Number

#3

type: Data
data:
  attrX:
    type: String
  attrY:
    type: Number


unassigned ilgooz #21

assigned krhubert #22

#23

Implemented here:




unassigned krhubert #24