Filter events and results by data

From https://github.com/mesg-foundation/core/issues/195

Events/results filtering should be possible on values.

Let’s make a list of operators to implement:

  • Equal
  • Contain
  • lower, lower or equal, greater, greater or equal

We should do more research on this issue. We need a really efficient filtering system.

Once the network is ready, the core may have to filter thousands and thousands of events per second.


Reply from @Anthony

Also, the goal with the network is to reverse this: instead of having many applications that listen and filter some events, have an event that can find all the associated applications that need this event, with the filter applied.

We can easily implement this if we rely only on the event key, with a query like service.all.where(listeners.contains(eventKey))
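As a rough illustration of the key-only case, here is a minimal Go sketch of a reverse index from event key to listening applications. The names ListenerIndex, Register and Lookup are made up for this example and are not part of the core.

```go
package main

// ListenerIndex maps an event key to the IDs of the applications that
// listen to it. Hypothetical type, for illustration only.
type ListenerIndex map[string][]string

// Register records that appID listens to eventKey.
func (idx ListenerIndex) Register(eventKey, appID string) {
	idx[eventKey] = append(idx[eventKey], appID)
}

// Lookup returns the applications listening to eventKey with a single
// map access, whatever the total number of applications.
func (idx ListenerIndex) Lookup(eventKey string) []string {
	return idx[eventKey]
}
```

With this, an incoming event only needs idx.Lookup(event.Key) to find its listeners.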

But it gets trickier when we want to search for the applications that match a specific filter.

Example:
an application that needs to listen to an event eventX that contains {foo: String, bar: String} with the filter foo=hello.

When an event {foo: "hello", bar: "world"} arrives, we should be able to find the application without having to replay the filters of all matching applications.

I’m pretty sure we can find a way to hash these filters if we only use the equal operator, and get the matching applications for a specific event. But if we add more complex operations (gt, lt…), I don’t see how. So maybe we should implement only the equal operator in the core and do the other filters on the application side only.
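To make the hashing idea concrete, here is a minimal Go sketch under a few assumptions: each filter is a single field=value equality, values are scalars, and all names (EqualityIndex, filterKey, ...) are hypothetical. An incoming event costs one lookup per field it carries, instead of one filter evaluation per application.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// filterKey hashes an (event key, field, value) triple into a map key.
// %#v keeps the string "10" distinct from the number 10.
func filterKey(eventKey, field string, value interface{}) [32]byte {
	return sha256.Sum256([]byte(fmt.Sprintf("%q|%q|%#v", eventKey, field, value)))
}

// EqualityIndex maps a hashed equality filter to the applications using it.
type EqualityIndex map[[32]byte][]string

// Register stores the filter "field == value on eventKey" for appID.
func (idx EqualityIndex) Register(appID, eventKey, field string, value interface{}) {
	k := filterKey(eventKey, field, value)
	idx[k] = append(idx[k], appID)
}

// Match returns, in sorted order, the applications whose filter matches
// the event. An application registered with several matching filters
// appears once per matching filter.
func (idx EqualityIndex) Match(eventKey string, data map[string]interface{}) []string {
	var apps []string
	for field, value := range data {
		apps = append(apps, idx[filterKey(eventKey, field, value)]...)
	}
	sort.Strings(apps)
	return apps
}
```

This only covers equality. A gt or lt filter has no single value to hash, which is the difficulty described above.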

We need to re-open the conversation about filters as we implement workflows.
A few starting points for discussion:

With following event definition

// Event stores all information about Events.
type Event struct {
  Hash         hash.Hash              `hash:"-"`
  InstanceHash hash.Hash              `hash:"name:instanceHash"`
  Key          string                 `hash:"name:key"`
  Data         map[string]interface{} `hash:"name:data"`
}

The only things that require filtering are equality on key and instance hash.

There is a proposal to use protobuf Any for Data instead of JSON, which I hope we will have (in general, mixing two formats is strange). If we do so, how do we plan to support the contains syntax?

But even without speaking of filter syntax, I think if someone needs only a specific subset of events, it could be created in mesg.yaml, e.g.:

events:
   transferMESGTokens:
     .....
   transferMESGTokensFromCreatorWallets:

And in the workflow, someone just listens to [transferMESGTokensFromCreatorWallets, …]

So for me, event filtering is not needed now (and maybe it won’t ever be needed), and we shouldn’t focus on it nor complicate it, as there are more important things to do with workflows.

Only equality on key and instance hash should be possible.
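For reference, equality-only filtering on these two fields can stay very small. The sketch below uses a trimmed stand-in for the Event struct, with a plain byte slice in place of hash.Hash, and a hypothetical EventFilter type where an empty field means "match any".

```go
package main

import "bytes"

// Event is a trimmed stand-in for the struct above. InstanceHash is a
// plain byte slice here rather than the project's hash.Hash (assumption).
type Event struct {
	InstanceHash []byte
	Key          string
}

// EventFilter matches on equality only. An empty field matches anything.
type EventFilter struct {
	InstanceHash []byte
	Key          string
}

// Match reports whether the event satisfies every non-empty field.
func (f EventFilter) Match(e Event) bool {
	if f.Key != "" && f.Key != e.Key {
		return false
	}
	if len(f.InstanceHash) != 0 && !bytes.Equal(f.InstanceHash, e.InstanceHash) {
		return false
	}
	return true
}
```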

I agree that we can use different events to do this filtering but we don’t necessarily have access to the service and/or want to modify it for a specific reason. Example: I don’t want to have to update the ethereum service just to send me an event when there is exactly my address…

There is still a way for the workflow to exist without filters: using other custom services that either return a result or trigger an error if the match doesn’t work (and break the chain of execution). But this is kind of a hack that has its limitations too.

Anyway, this is about filters in general, not only in the context of workflows but also for the streams of executions/events.

If we want a generic filter system, we need to know exactly what we want from it and where we want to apply these filters.

I don’t think we need a generic filter for everything.

We certainly need one for the “unstructured” data (the map[string]interface{} that we have for event data or execution outputs), but that’s it. For the rest, everything is typed and known, so it’s better to have well-defined filters.

What a filter could be

For me, this is all the parts that could be needed.

  • Comparison of attribute’s value (of course this should be based on the type of data we check)
    • equals
    • greater than
    • less than
    • different
    • contains
  • Support of AND conditions eg: gt(xxx, 0) AND different(xxx, 10)
  • Support of OR conditions eg: eq(xxx, 'hello') OR eq(xxx, 'world')
  • Nested filtering. Being able to filter on data inside an object or array

Here is a way we could create this kind of filter. I’ve used a similar structure in a previous project and it covered all the needs.

[
  { attribute: 'xxx', value: 'xxx', predicate: 'eq' },
  { attribute: 'yyy', value: ['aa', 'bb'], predicate: 'eq' }, // This is an OR `yyy = 'aa' || yyy = 'bb'`
  { attribute: 'zzz', value: 10, predicate: 'gt' },
  { attribute: 'zzz', value: 100, predicate: 'lt' }, // This is an AND `zzz > 10 && zzz < 100`
  { attribute: 'data.object', value: 'xxx', predicate: 'neq' }, // This is for nested filtering
  { attribute: 'data.array[0].xx', value: 'xxx', predicate: 'eq' } // Filter based on the array item
]
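To see how such a list could be evaluated against map[string]interface{} data, here is a minimal Go sketch. It is only one possible reading of the structure above: the Condition type, the lookup helper and the rule that a missing attribute never matches are assumptions made for the example, and only eq, neq, gt and lt are handled.

```go
package main

import (
	"reflect"
	"strconv"
	"strings"
)

// Condition mirrors one entry of the filter list above.
type Condition struct {
	Attribute string      // dotted path, e.g. "data.array[0].xx"
	Value     interface{} // a scalar, or []interface{} meaning "any of"
	Predicate string      // "eq", "neq", "gt" or "lt"
}

// lookup resolves a dotted path with optional [i] indexes in nested data.
func lookup(data interface{}, path string) (interface{}, bool) {
	// Turn "a[0].b" into "a.0.b" so every segment is a key or an index.
	path = strings.NewReplacer("[", ".", "]", "").Replace(path)
	cur := data
	for _, seg := range strings.Split(path, ".") {
		switch node := cur.(type) {
		case map[string]interface{}:
			v, ok := node[seg]
			if !ok {
				return nil, false
			}
			cur = v
		case []interface{}:
			i, err := strconv.Atoi(seg)
			if err != nil || i < 0 || i >= len(node) {
				return nil, false
			}
			cur = node[i]
		default:
			return nil, false
		}
	}
	return cur, true
}

// toFloat converts the numeric types this sketch supports.
func toFloat(v interface{}) (float64, bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case int:
		return float64(n), true
	}
	return 0, false
}

// compare applies "eq", "gt" or "lt" to a single pair of values.
func compare(got, want interface{}, predicate string) bool {
	g, gok := toFloat(got)
	w, wok := toFloat(want)
	switch predicate {
	case "eq":
		if gok && wok {
			return g == w
		}
		return reflect.DeepEqual(got, want)
	case "gt":
		return gok && wok && g > w
	case "lt":
		return gok && wok && g < w
	}
	return false
}

// Match evaluates one condition. A list value is an OR over its items;
// "neq" is the negation of "eq". A missing attribute never matches.
func (c Condition) Match(data map[string]interface{}) bool {
	got, ok := lookup(data, c.Attribute)
	if !ok {
		return false
	}
	wants, isList := c.Value.([]interface{})
	if !isList {
		wants = []interface{}{c.Value}
	}
	pred, negate := c.Predicate, false
	if pred == "neq" {
		pred, negate = "eq", true
	}
	for _, want := range wants {
		if compare(got, want, pred) {
			return !negate
		}
	}
	return negate
}

// MatchAll is the AND of all conditions in the list.
func MatchAll(conds []Condition, data map[string]interface{}) bool {
	for _, c := range conds {
		if !c.Match(data) {
			return false
		}
	}
	return true
}
```

Every condition here is replayed for each event, so this fits application-side filtering better than the hashed lookup discussed earlier in the thread.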

Not sure all that is needed but I’m sure there are libraries that can do that for us:

To keep in mind

Also, something to keep in mind:

  • For the workflow, we might need to find the data based on a query and its filter, or index the data based on it, for optimization purposes.