The OSyRIS Workflow Platform

ABOUT

The OSyRIS (Orchestration System using a Rule based Inference Solution) Workflow Platform is partially supported by the GiSHEO (http://gisheo.info.uvt.ro) ESA PECS project. It is developed by the project team located at the West University of Timisoara, Faculty of Mathematics and Computer Science. The Workflow Engine is based on the DROOLS rule engine and is not related to the DROOLS FLOW API developed by the DROOLS/JBOSS team. It is currently in beta and aims to offer a simplified, minimal, yet universal language for expressing general scientific/data workflows.

A rule based approach was chosen because it offers runtime adaptiveness and dynamism: rules and facts can be introduced or removed while the workflow is running. This makes it possible to change the structure of the workflow at runtime or, going one step further, to derive the workflow that solves a given problem from one or several rule sets. The engine does not rely on any particular transmission protocol and views tasks as simple Web Services.

Self-adaptation to resource failure is embedded inside the engine. Each time a resource fails to solve a specific task, the engine tries to find a new one until all options are exhausted.

The OSyRIS Engine also provides an API for creating your own rule based workflow applications. A SOA approach has been incorporated inside the engine, as resources are viewed as services. Furthermore, the engine itself can be exposed as a service.

Current capabilities include:

  • simplified language (called SILK or SImple Language for worKflows) with support for sequences, parallel threads, synchronised splits and joins, and loops. As the language is rule based there is only one implicit construct, the sequence; all the other constructs follow naturally by introducing conditions on rules and allowing rules to execute in parallel.
  • support for multiple rule domains. This allows users to divide their rules into groups. Only one group can be active at any given moment and thus only rules inside it can fire. This behaviour allows for simulating chemical reactions (using multiple solutions and sub-solutions) and membrane calculus (together with the fact that tasks can have multiple instances).
  • support for handling multiple task instances. This feature is directly integrated inside the SILK language.
  • support for both parallel (in case several rules can be fired simultaneously) and sequential rule firing.
  • support for distributing the workflow across several resources, including engine self-healing in case of resource failures (D-OSyRIS).
  • workflow information stored inside a PostgreSQL database.
  • support for automatically generating a rule set given a goal and a rule base.
  • support for visual workflow design.
  • administrator interface for viewing workflow and task statuses.

Future enhancements:

  • extend the language to support map-reduce operations as a language component, as well as accumulator variables
  • extend the Visual Designer so that it can offer advice based on the rules in the existing rule base and at the same time increase the knowledge stored inside it
  • model checking of the created workflow in order to ensure correctness.
  • create our own version of RETE for matching the rules, without needing to rely on DROOLS
  • create a visual interface for deploying and managing workflows (stop/resume/cancel actions)

The second draft of the OSyRIS v0.95 technical document is online. You can access it at http://gisheo.info.uvt.ro/trac/attachment/wiki/Workflow/osyris.pdf

License

The OSyRIS engine and associated tools are available under the terms of the GNU GENERAL PUBLIC LICENSE version 3 or newer. You can find a copy of the license here.

Download

The OSyRIS platform can be downloaded from the Mercurial repository at the following link: http://code.ieat.ro/hg/hgwebdir.cgi/People/marc/OSyRIS.

Example:

mkdir OSyRIS
hg clone http://code.ieat.ro/hg/hgwebdir.cgi/People/marc/OSyRIS/ OSyRIS

The folder contains the engine as well as the additional tools. Beware that the tools are beta versions and might not work as planned; the engine is the only officially released component.

Features

Workflow Engine

The engine's initial aim was to provide a simple workflow platform for complex image processing requests. During the initial development phase it was noted that the engine could be designed to cope with a wider range of workflows, especially service oriented scientific workflows. In this direction, the platform was intended to offer a simple interface and a simplified yet general enough language which allows users to submit general workflows in a natural fashion.

The engine is built on top of the DROOLS (http://drools.org/) inference engine, which uses a modified, object oriented version of the RETE algorithm for rule matching. The engine can either be embedded inside a Java application or exposed as a WS which can later be invoked by any SOAP enabled client. Users submit workflows in the form of SILK rules which are then translated into DROOLS rules; any syntactic errors are handled at this point. Before tasks can be executed by the engine, a special abstract class needs to be implemented, as it deals with application specific details: it is responsible for discovering proper resources for the execution of tasks and for the communication between them and the engine. Once this interface is implemented, the DROOLS inference engine handles the actual rule execution.

The engine can either be called from inside a custom Java application or as an external WS. Embedding the engine inside an application requires extending the OSyRISwf base class. This class offers basic functionality such as executing the workflow synchronously, querying its result (the result of the last executed task in the workflow), finding the output of a particular workflow task, and retrieving the workflow (or task) status. Each workflow is uniquely identified by an ID which can be used when asynchronous workflow execution is considered. Asynchronous execution can be obtained by starting the workflow in a different thread or by exposing it as a WS which is then periodically queried for information.

Every custom application using the OSyRIS workflow needs to implement its own workflow class derived from the OSyRISwf base class. Inside this extension users can add their own custom methods and functionality.
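
As an illustration, the following is a minimal sketch of such an extension. The constructor signature (including the parameter types) is assumed from the invocation example below, and the printWorkflowId helper is purely illustrative, not part of the released API.

// Hypothetical sketch of a custom workflow class; apart from getWorkflowID(),
// which is used in the example below, the names here are illustrative.
public class MyOSyRISwf extends OSyRISwf {

    public MyOSyRISwf(String ruleFile, String settingsFile,
                      String parentWorkflowId, boolean fromDB) {
        // delegate to the base class, which parses the SILK rules and
        // prepares the underlying DROOLS session
        super(ruleFile, settingsFile, parentWorkflowId, fromDB);
    }

    // example of an application specific helper added by the user
    public void printWorkflowId() {
        System.out.println("Running workflow " + getWorkflowID());
    }
}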

The following fragment of code shows how we can create a workflow class instance with parallel rule firing enabled, start it and query the result after its completion:

// create the workflow instance from a SILK rule file and a settings file
MyOSyRISwf wf = new MyOSyRISwf(rulefile, settingsFile, parentWorkflowId, fromDB);
// the unique workflow ID can be used later to query results and statuses
String wfID = wf.getWorkflowID();
// run the workflow synchronously
wf.execute();
System.out.println("The result of workflow " + wfID + " is: " + MyOSyRISwf.getResult(wfID, settingsFile));

For more details see the Executor Example class inside the OSyRIS archive.

Resource selection is important in any workflow system, as poor selection can lead to load imbalances across the Grid. This issue is usually addressed by using scheduling policies that optimize the workload. Resource selection can be handled by the OSyRIS engine in different ways:

  • By using an external Resource Selector engine - This approach relies on the fact that each OSyRIS task is solved by an external WS. An intermediate Broker Service can be added between the actual service and the engine and will be responsible for dispatching tasks coming from the engine to appropriate resources. Additionally the Broker can be designed with scheduling decision capabilities.
  • By implementing additional functionality inside the extended Executor class - In this case the scheduling decision is moved from the Broker Service into an additional class called by the application specific Executor class. After taking a decision, the Executor sends the task directly to the WS without any intermediate Broker Service (see the sketch after this list).
  • By adding resource selection decisions inside the rules - When using this approach the scheduling decision is inserted directly inside the rules. For example we can decide to execute a task only if its predecessors have completed their execution and appropriate services have been found. More insight on how this can be achieved is given in the SILK examples later in this document. Two different approaches can be considered in this case. The first one requires modifying the SILK2Rules class and using the WFResource collection, which contains a list of available services. In this case a distinct rule condition must be added inside each of the OSyRIS generated DROOLS rules. This condition will contain all the required scheduling decisions and will return one or more potential services where the task can be safely executed. The second approach implies using a separate service similar to the Broker Service. This service is attached to a task whose sole purpose is scheduling other tasks. This task is then attached to each rule and its output, consisting of a set of available services, is transmitted as input to the tasks that need to be solved.
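
The following is a minimal sketch of the second approach above. The ResourceSelector interface, the RoundRobinSelector class and the selectService method are purely illustrative names, not part of the released OSyRIS API; a real selector could apply a more elaborate scheduling policy.

// Hypothetical scheduling hook called from the application specific Executor
// before a task is sent to a WS; all names below are illustrative.
interface ResourceSelector {
    // return the endpoint of the service chosen to solve the given task
    String selectService(String taskName);
}

class RoundRobinSelector implements ResourceSelector {
    private final java.util.List<String> endpoints;
    private int next = 0;

    RoundRobinSelector(java.util.List<String> endpoints) {
        this.endpoints = endpoints;
    }

    public synchronized String selectService(String taskName) {
        // simple round-robin policy over the known service endpoints
        String endpoint = endpoints.get(next);
        next = (next + 1) % endpoints.size();
        return endpoint;
    }
}

The extended Executor would call selectService for each task and then invoke the returned WS directly, without going through a Broker Service.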

Distributed OSyRIS

D-OSyRIS allows several OSyRIS engines to cooperate in solving a particular workflow. The workflow is split into domains (solutions), each being executed independently. Communication and coordination are achieved by message passing. In case of engine failures, an intelligent feedback control loop re-deploys the failed component. Engines can be deployed either manually or by using the provided deployment facilities.

SImple Language for worKflows

SILK offers a number of elementary constructs such as sequence, parallel, split, join, decision or loop. Besides these constructs it also offers the possibility to define tasks, which represent the atoms of the workflow. Tasks are viewed as black boxes which have certain mandatory attributes and an arbitrary number of meta-attributes attached to them. In order to maintain the generality of the language, the only mandatory attributes which can be assigned to tasks are the input/output ports and two special meta-attributes which will be addressed in greater detail in what follows. Each task belonging to a workflow must have at least one input and one output port, defined by using the keywords with the same names (e.g. i1:input). Besides these, users can also define their own workflow specific meta-attributes. Meta-attributes are defined as "name"="value" pairs. The following code fragment shows how we can define a task with one input port, one output port and one meta-attribute:

A := [i1:input, o1:output, "processing"="grayscale-extraction"];

Ports can also be initialised with certain values by using the assignment operator (=):

A := [i1:input="image.jpg", o1:output, "processing"="grayscale-extraction"];

Any value is admitted for storage inside the meta-attributes or as the initial value of a certain port. This value can contain either plain text to be sent directly to the service, XML information, or a pointer (ID) to a database (table, element, etc.) or file (line, etc.) which is interpreted by the custom Executor class and whose content is then forwarded to the WS for handling.

Another use for meta-attributes is in defining and using task ontologies. Ontologies require users to define a set of concepts (tasks in our case), attributes related to them and relationships between them. SILK makes it easy to define all of these by letting users define tasks in a simple manner as previously shown, attach attributes to them by using meta-attributes, and express relationships between them through inference rules, as detailed in what follows. A SILK file can therefore be seen not only as a workflow formalisation but also as an actual task ontology. Besides defining relationships in this manner, users can also embed them inside task definitions as meta-attributes.

The SILK language also permits line comments. A commented line must start with the # character and all subsequent text on that line is ignored. Comments can be used for documenting language files, as shown in the next example:

# This line is commented out and everything is ignored including A := [i1:input, o1:output];

Inside a SILK file the task definitions must precede the rules, as the latter are checked by the OSyRIS engine for validity against already known tasks.

Rules are defined by using a LHS → RHS | cond , salience syntax where the condition and the salience are optional. The RHS (Right Hand Side) and LHS (Left Hand Side) are made up of tasks separated by commas and linked with each other through the use of variables, as the following example shows:

A[a=o1] -> B[i1=a];

In the previous example a represents a variable which binds the output port o1 of task A to the input port i1 of task B. More than one port can be bound to a variable inside a rule. For example, an RHS task could be bound to two variables belonging to two different LHS tasks. In this case the bindings are separated by the # character:

A[a=o1], B[b=o1] -> C[i1=a#i2=b];

Inside the LHS of a rule there can exist tasks which are not bound to any variable. In this case they are called synchronisation tasks. Synchronisation tasks are used when it is desired to delay the execution of the RHS tasks until the synchronisation tasks get executed. The result of the latter is however not used when executing the RHS tasks:

A[a=o1], B -> C[i1=a];

If more than one task exists in the RHS then each of them will be executed in parallel by the OSyRIS engine inside a separate thread.

As previously mentioned, a rule can have an optional condition and salience. The salience is an integer number (negative or positive) representing the importance of the rule, with larger values making the rule more important and increasing its chance of execution in case multiple rules are ready for firing at the same time. The condition can be any logical statement which uses integer, floating point or string operands. The following examples show a condition having integer or floating point operands:

A[a=o1] -> C[i1=a] | a > 0 , 4;
A[a=o1] -> B[i1=a] | a <= 0 , 4;

or how string operands can be used:

A[a=o1], B[b=o1] -> C[i1=a#i2=b] | a != "yes" and b > 0;

Inside rules users can also manipulate task behaviours. More precisely, the behaviour of an LHS task refers to whether its instance gets consumed or not after triggering a rule, while the behaviour of an RHS task refers to how many instances get created when it completes execution. The following example shows how we can tell the engine not to consume the LHS task and to create 5 instances of the resulting RHS task:

A[a=o1#consume=false] -> C[i1=a#instances=5];

It should be mentioned that the default behaviour is consume=true and instances=1. While rules allow for implicit parallelism through the concept of task instances, the same concept also makes it possible to enforce explicit rule sequencing. As an example we can consider the following two rules, where task A has only one instance:

A[a=o1#consume=true] -> B[i1=a],A[i1=a#instances=1];
A[a=o1#consume=true] -> C[i1=a],A[i1=a#instances=1];

In the previous example the rules cannot fire simultaneously and one of them needs to wait for the other to produce another instance of task A.

Newly created task instances are added to the already existing ones. For example, consider the following fragment of code:

A[a=o1] -> C[i1=a#instances=3];
B[b=o1] -> C[i1=b#instances=2];

If both task A and task B have one instance, then they will trigger the execution of the two rules and the resulting number of instances for task C will be 5.

Among the meta-attributes there exists a special one called instances, which is implicitly supported by the engine and determines the number of initial instances a task has:

A := [i1:input, o1:output, "instances"="5"];

By default, a task has zero initial instances.

By using the simple rule syntax previously described it is possible to simulate basic workflow constructs such as sequence, parallel, split, join, decision, synchronisation, loop or timers:

  • sequence:
A[a=o1] -> B[i1=a];

where task B cannot get executed until task A completes.

  • parallel: allows for RHS tasks to get executed in parallel after the LHS tasks have completed their execution:
A[a=o1],B[b=o1] -> B[i1=a], C[i1=b];

Moreover, parallelism is implicit in rule based approaches: when more than one rule can be executed at a given moment, they are all triggered simultaneously.

  • split: allows for two or more tasks to be executed based on the result of one (or more) LHS task:
A[a=o1] -> B[i1=a], C[i1=a];
  • join: represents the opposite action of the previous construct by allowing several LHS tasks to merge their outputs into executing one (or more) RHS task:
A[a=o1],B[b=o1] -> C[i1=a#i2=b];

It should be noted that in the case of a join construct each variable linked to the join node should point to a different input port. This is the case in the previous example; a construct such as the following is not valid:

# Invalid join:
A[a=o1],B[b=o1] -> C[i1=a#i1=b];

This restriction also applies to any RHS task linked to multiple variables, as each variable should be linked to a different input port. Generally speaking, it is prohibited to have multiple incoming edges attached to a single input port. This restriction does not apply to LHS tasks, where the same output port can be linked to more than one RHS input port, as in the case of the split construct.

  • decision: allows for different paths of execution to be followed based on the outcome of certain LHS tasks. This behaviour paves the way for self-adapting workflows:
A[a=o1] -> B[i1=a] | a > 0;
A[a=o1] -> C[i1=a] | a <= 0;
  • synchronisation: has been previously presented and allows RHS tasks to wait for the completion of LHS tasks without actually being linked to them.
  • loop: allows for repeatable actions to be incorporated inside the workflow:
A[a=o1] -> B[i1=a];
B[b=o1] -> C[i1=b];
C[c=o1] -> A[i1=c] | c < 10;
C[c=o1] -> D[i1=c] | c >= 10;

where the last two rules express the reiteration and the exit condition from the loop with task A being the entry point.

Loop constructs can also be built by using only the number of task instances as an iterator. In this case no optional condition is added to any rule; instead the rule triggers repeatedly as long as LHS task instances exist. The following example shows such a scenario where task C gets executed 11 times: one time as a result of Rule 1 and 10 times as a result of Rule 2.

# Rule 1:
A[a=o1] -> B[i1=a#instances=10], C[i1=a];
# Rule 2:
C[c=o1], B -> C[i1=c];
  • timers: are not explicitly defined in the SILK language but can be simulated by using a so-called idle service which receives as argument a numerical constant representing a time in milliseconds and returns after that time period has elapsed, as sketched below.
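
A minimal Java sketch of such an idle service could look as follows; the class and method names are illustrative, not part of the OSyRIS distribution.

// Sketch of an idle service used to simulate timers: it receives a duration in
// milliseconds and returns only after that period has elapsed.
public class IdleService {
    public void idle(long milliseconds) {
        try {
            Thread.sleep(milliseconds);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}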

SILK also offers users the possibility to specify rule domains. Each rule domain is identified by an integer number; when not explicitly attached to a domain, a rule is considered to belong by default to the solution identified by domain ID 0. The following example shows how this can be expressed in SILK:

# Rule in domain 1:
1 : A[a=o1] -> B[i1=a], C[i1=a];
# Rule in domain 2:
2 : C[c=o1], B -> C[i1=c];

At any given time only one domain can be active and only the rules inside it can execute. However, transition rules can be defined in which the LHS tasks belong to one domain and all the RHS tasks belong to another. In this case, after firing, the engine automatically selects the new domain (the one the RHS tasks belong to) as the active one. The next code fragment shows such an example:

# Rule 1 in domain 1:
1 : A[a=o1] -> B[i1=a], 2:C[i1=a];
# Rule 2 in domain 1:
1 : C[c=o1] -> D[i1=c];
# Rule 3 in domain 2:
2 : C[c=o1] -> E[i1=c];

As mentioned in the beginning, the OSyRIS engine allows rules to be fired either in sequence or in parallel (when multiple rules satisfy the triggering criteria at the same time). In order to allow parallel rule firing, each exit point of the workflow needs to be explicitly marked by using the special meta-attribute "isLast"="value". The reason for this lies in the engine design and is independent of any SILK syntax restrictions.

Restriction: due to the internal OSyRIS parser for SILK files, characters such as , or " are not allowed inside meta-attribute values or names. The same is true for the explicit values used in output ports.

Automatic workflow extractor

There are cases when the user does not know, or simply does not have the time, to design the chain of rules leading to a desired output. In this case it is essential to be able to automatically generate the workflow given a final goal task. This approach is also called backwards chaining, in contrast to the forward chaining used by the RETE algorithm. There must exist an initial rule base containing all the allowed rules between tasks, which is queried during the automatic workflow generation. The backward chaining phase uses a depth first algorithm. The workflow execution phase begins with the user choosing the desired goal from a list of available tasks extracted from the RHS tasks inside the rule base. Then the final rules are extracted from the rule base using backwards chaining. After several workflow solutions have been generated the user, depending on available input, can choose one of them and execute it by using the OSyRIS engine.
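
To illustrate the idea only (this is not the OSyRIS implementation), the sketch below performs a depth first backward traversal over a rule base in which every rule is reduced to its LHS and RHS task names. Starting from the goal task it collects every rule that can contribute to producing it; the actual extractor additionally separates the result into distinct workflow solutions, as the example further below shows.

// Illustrative backward chaining over a simplified rule representation;
// class and field names are generic, not taken from the OSyRIS sources.
import java.util.*;

class SimpleRule {
    final Set<String> lhs; // tasks consumed by the rule
    final Set<String> rhs; // tasks produced by the rule
    SimpleRule(Set<String> lhs, Set<String> rhs) { this.lhs = lhs; this.rhs = rhs; }
}

class BackwardChainer {
    // collect, depth first, all rules needed to produce the goal task
    static List<SimpleRule> chain(String goal, List<SimpleRule> ruleBase) {
        List<SimpleRule> selected = new ArrayList<>();
        visit(goal, ruleBase, selected, new HashSet<String>());
        return selected;
    }

    private static void visit(String task, List<SimpleRule> ruleBase,
                              List<SimpleRule> selected, Set<String> visited) {
        if (!visited.add(task)) return;            // task already expanded
        for (SimpleRule r : ruleBase) {
            if (r.rhs.contains(task)) {            // a rule that produces the task
                if (!selected.contains(r)) selected.add(r);
                for (String dep : r.lhs)           // recurse into its preconditions
                    visit(dep, ruleBase, selected, visited);
            }
        }
    }
}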

The following code exemplifies a possible rule base:

A:=[i1:input="initial input A", o1:output, "proc"="operation-A"];
B:=[i1:input, o1:output, "proc"="operation-B"];
C:=[i1:input, o1:output, "proc"="operation-C"];
D:=[i1:input, o1:output, "proc"="operation-D"];
E:=[i1:input, o1:output, "proc"="operation-E"];
M:=[i1:input, o1:output, "proc"="operation-M"];
F:=[i1:input, i2:input, o1:output, "proc"="operation-F"];
P:=[i1:input="initial input P", o1:output, "proc"="operation-P"];

A[a=o1] -> B[i1=a], C[i1=a];
B[b=o1] -> D[i1=b];
D[d=o1] -> E[i1=d];
C[c=o1] -> E[i1=c];
E[e=o1],C[c=o1] -> F[i1=e#i2=c];
M[m=o1] -> F[i1=m];
P[p=o1] -> M[i1=p];

Given the desired output task F the following two possible workflows are automatically created:

M:=[i1:input, o1:output, "proc"="operation-M"];
F:=[i1:input, i2:input, o1:output, "proc"="operation-F"];
P:=[i1:input="initial input P", o1:output, "proc"="operation-P"];

M[m=o1] -> F[i1=m];
P[p=o1] -> M[i1=p];

and respectively:

A:=[i1:input="initial input A", o1:output, "proc"="operation-A"];
B:=[i1:input, o1:output, "proc"="operation-B"];
C:=[i1:input, o1:output, "proc"="operation-C"];
D:=[i1:input, o1:output, "proc"="operation-D"];
E:=[i1:input, o1:output, "proc"="operation-E"];

A[a=o1] -> B[i1=a], C[i1=a];
B[b=o1] -> D[i1=b];
D[d=o1] -> E[i1=d];
C[c=o1] -> E[i1=c];
E[e=o1],C[c=o1] -> F[i1=e#i2=c];

Note that two separate workflows were not produced, one for the rule chain A → C → F and another for A → B → D → E → F, as task F is produced by the join of tasks E and C and not separately by either of them.

Example

In what follows we present an example of how the workflow language can actually be used: a simple workflow for computing the sum of the first 10 natural numbers. First we start with the initial sum of 0 (output of task B) and the first number in the list, which is 1 (output of task A), and we make sure that we have enough LHS task instances to trigger the rule chain. Then the sum and the next number in the list are computed in parallel as long as we have numbers to add. The following fragment of SILK code details this process:

A := [i1:input, o1:output="1", "processing"="return-successor", "instances"="1"];
B := [i1:input, i2:input, o1:output="0", "processing"="add", "instances"="1"];

# Compute the sum and the next number in the list as long as it is smaller than 10
A[a=o1], B[b=o1] -> B[i1=a#i2=b], A[i1=a] | a < 10;

For more details on this example take a look at the code found on the SVN, which also provides a working example based on it.

In geography, a scenario in which the vegetation index for a particular area needs to be determined can be expressed by the following workflow: extract the red and infrared bands in parallel, then use the resulting images to compute the Normalized Difference Vegetation Index (NDVI).

# Initial activation task
A0:=[o1:output="imageId", "instances"="1"];
# The following tasks belong to the processing workflow
A:=[i2:input, o1:output, "datagroup"="UUID", "processing"="image extract-band(band#image)", "argument-list"="<band=red>"];
B:=[i2:input, o1:output, "datagroup"="UUID", "processing"="image extract-band(band#image)", "argument-list"="<band=infrared>"];
C:=[i1:input, i2:input, o1:output, "dataset"="UUID","processing"="image compute-ndvi(image#image)","isLast"="true"];

# Extract red and infrared bands from the initial image
A0[a=o1] -> A[i2=a], B[i2=a];
# Compute NDVI using red and infrared bands
A[a=o1], B[b=o1] -> C[i1=a#i2=b];

In the previous case the argument-list meta-attribute allows users to predefine the values of certain arguments sent to the method (exposed as a service and identified by the processing meta-attribute). Other arguments not expressed in this way can be set by binding input ports to variables inside rules. The list can contain multiple <name=value> pairs separated by the # character.
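
As an illustration, a custom Executor might split such an argument-list value into a name/value map as sketched below; the helper (and the <size=256> pair used in the comment) is hypothetical and simply follows the format described above.

// Sketch: parse an argument-list value such as "<band=red>#<size=256>" into a
// name -> value map; illustrative helper, not part of the OSyRIS API.
static java.util.Map<String, String> parseArgumentList(String value) {
    java.util.Map<String, String> args = new java.util.LinkedHashMap<String, String>();
    for (String pair : value.split("#")) {
        // strip the surrounding angle brackets and split on the first '='
        String trimmed = pair.trim().replaceAll("^<|>$", "");
        int eq = trimmed.indexOf('=');
        if (eq > 0) {
            args.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
        }
    }
    return args;
}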

A page detailing the SOAP message exchange format between the various GiSHEO components can be found here.

Visual Workflow Designer

In order to facilitate the creation of consistent workflows we have developed a visual interface which allows users to seamlessly create them using a drag and drop approach. By using it, a user can create a workflow while being sure that the generated rules will reflect the desired task chaining. The designer allows users to define tasks, set their ports and number of instances, and define optional meta-attributes. Rules are defined by linking tasks together and by adding optional conditions. Restrictions such as the ones defined previously are also taken care of at this stage. After visually creating a workflow, users have the option of exporting it in the SILK format; other pluggable language formats will be added later.

Visual Workflow Designer Features

Visual Workflow Administrator

Each workflow task is executed by services running on distributed resources. This approach may lead to delay problems due to resource unavailability or overuse and has a direct consequence on the time needed to execute the workflow. As a result, it is desirable to be able to overview and, to some extent, control where tasks are executed and to create statistics based on this data. These statistics can later be used for actions such as task scheduling or other administrative issues. The OSyRIS platform places at the user's disposal an interface where the status of each workflow task, where tasks are executed, how long they took to execute, how heavily a certain resource is used, etc. can be observed.

Workflow Admin Features

Acknowledgements

The following people have contributed to the development of the OSyRIS platform (alphabetically):

  • PhD students:
    • Ciprian Craciun for help and ideas in designing the SILK language (December 2008 - present)
    • Marc Frincu for designing the SILK engine and the workflow platform (December 2008 - present)
  • Master students:
    • Codruta David for help in implementing the Backwards Chaining Module (May 2009)
    • Vlad Flore for help in implementing the Backwards Chaining Module (May 2009)
    • Claudiu Mardarasevici for help in designing and implementing the Workflow Administrator (May 2009)
    • Staniloiu Mircea for help in designing and implementing the Workflow Administrator (May 2009)
    • Andrei Niciu for help in designing and implementing the Visual Workflow Designer and part of the Workflow Administrator (May-June 2009)
    • Mihaela Radescu for help in designing and implementing the Visual Workflow Designer and part of the Workflow Administrator (May-June 2009)

This research is supported by ESA PECS Contract no. 98061 GiSHEO - On Demand Grid Services for High Education and Training in Earth Observation.
