COMPSs Programming Framework


COMPSs is a framework, composed of a programming model and a runtime system, that aims to ease the development and deployment of distributed applications and web services. The core of the framework is its programming model, which lets the programmer write applications sequentially and execute them on heterogeneous infrastructures, exploiting the inherent parallelism of the applications. The COMPSs programming model is task-based: the programmer selects the methods of the sequential application that are to be executed remotely. This selection is made in an annotated interface, where every method to be treated as a task is declared with annotations describing its data accesses and its constraints on the execution resources. At execution time the runtime uses this information to build a dependency graph and orchestrate the tasks on the available resources.

COMPSs programming model

The programming model syntax enables the easy development of applications as composite services. A composite, called an Orchestration Element (OE), is written as a sequential program that calls other services and regular methods, known as Core Elements (CE). Composites can therefore be hybrid codes that reuse functionality wrapped in services or methods, adding value to create a new product that can itself be published as a service. All the information needed for data-dependency detection and task-based parallelization is kept in a separate annotated Core Element Interface (CEI).

Any COMPSs application can be composed of two different kinds of CE: Method CE and Service CE. Method CEs are regular methods of the application selected to be run remotely. To pick a method CE, the programmer declares the method in the CEI, adding the @Method annotation indicating the implementing class.

Service CEs, in turn, correspond to SOAP Web Service operations described in WSDL documents. To select a SOAP operation as a CE, the developer declares the service operation together with the @Service annotation describing the service details (namespace, service name and service port). The location of the service is not included in the CEI but in the runtime configuration, and it is the runtime that decides which server will run the task; thus, the programming model syntax remains completely unaware of the underlying infrastructure.

The following code is an example of a COMPSs application in Python. The script performs some computation for a number of steps (line 5) and then merges the partial results, each a dictionary, into a final dictionary (line 6, variable result). Each computation receives a configuration parameter, initialised in line 3. The script can be executed as a sequential Python program, but in order to parallelise it with COMPSs, three functions called by the script are defined as tasks: init_conf, compute_step and merge.

 

  1. result = {}
  2. num_steps = 3
  3. conf = init_conf()
  4. for i in range(num_steps):
  5.     step_res = compute_step(i, conf)
  6.     merge(result, step_res)
  7. from pycompss.api.api import compss_wait_on
  8. result = compss_wait_on(result)
  9. print("Result:", result)

A task is defined in PyCOMPSs by means of Python decorators, which are part of the standard Python syntax and make it possible to wrap calls to functions with some additional behaviour. In particular, the user adds a @task decorator that describes the task before the definition of the function. Continuing with the example, the code below shows the aforementioned functions together with their @task decorators. Function init_conf returns an object of class Configuration (defined in line 1), as stated by its decorator (line 4). Similarly, compute_step returns a dictionary (as specified in the decorator in line 7) and receives two parameters: an integer and a Configuration object. Finally, merge receives two dictionary parameters and merges the second into the first (line 13); to declare that the first dictionary is modified by the task, the decorator defines it as an input-output parameter (line 11).

 

  1. class Configuration(object): pass  # class body omitted
  2. from pycompss.api.task import task
  3. from pycompss.api.parameter import *
  4. @task(returns = Configuration)
  5. def init_conf():
  6.     return Configuration()
  7. @task(returns = dict)
  8. def compute_step(step, conf):
  9.     res = do_some_computation(step, conf)
 10.     return res
 11. @task(dict1 = INOUT)
 12. def merge(dict1, dict2):
 13.     dict1.update(dict2)
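To make the decorator mechanism itself concrete, the following is a minimal sketch in plain Python of how a decorator can attach metadata to a function and add behaviour around each call. The names toy_task, calls, returns and directions are hypothetical and exist only for this illustration; the sketch runs the function immediately and is not how PyCOMPSs implements @task.

```python
import functools


def toy_task(returns=None, **directions):
    """Hypothetical stand-in for a task decorator (NOT the PyCOMPSs @task)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Additional behaviour around the call: log the invocation.
            # A real runtime would register a task here instead of
            # running the function right away.
            wrapper.calls.append((func.__name__, args))
            return func(*args, **kwargs)

        # Metadata supplied through the decorator arguments.
        wrapper.calls = []
        wrapper.returns = returns
        wrapper.directions = directions
        return wrapper
    return decorator


@toy_task(dict1="INOUT")
def merge(dict1, dict2):
    dict1.update(dict2)


result = {}
merge(result, {"a": 1})
```

After the call, result holds the merged entry and merge.directions records that dict1 was declared as an input-output parameter, which is the kind of information a runtime needs for dependency detection.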

 

Figure 1 depicts the task dependency graph built on the fly by COMPSs for the Python example. The first asynchronous task created corresponds to function init_conf; the main program then proceeds immediately to the loop that generates the computation and merge tasks. Inside the loop, a total of 3 compute_step tasks are generated, all depending on the previous init_conf task because they receive the configuration object conf as an input parameter (if no direction is specified for a parameter, it defaults to IN). The loop also generates 3 merge tasks, each depending on its corresponding compute_step task for the partial result of the iteration (variable step_res); moreover, each merge task depends on the result produced by the merge of the previous iteration (stored in result), so the merge tasks are ordered in a chain. Once the loop completes, the program reaches lines 7-9, where the final value of variable result is printed. Before printing, though, the script needs to synchronise on the last value of result, produced by the last merge task. To do so, PyCOMPSs provides an API function, compss_wait_on, which stalls the main control flow until that value is available. Hence, the call to compss_wait_on (line 8) waits for the last merge task to finish before obtaining and returning the final result, so that it can be printed (line 9).

Figure 1 – Task dependency graph corresponding to the example script
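The dependencies described for the example can be reproduced with a small toy model. The sketch below is an assumption-laden simplification, not the COMPSs runtime: it only tracks read-after-write dependencies by remembering the last task that wrote each datum, and the class DependencyGraph, its method add_task and the datum names are hypothetical.

```python
class DependencyGraph:
    """Toy read-after-write dependency detection (not the COMPSs runtime)."""

    def __init__(self):
        self.last_writer = {}  # datum name -> task that last wrote it
        self.edges = []        # (producer task, consumer task, datum name)

    def add_task(self, task, reads=(), inout=(), writes=()):
        # A task depends on the last writer of every datum it reads or updates.
        for datum in list(reads) + list(inout):
            producer = self.last_writer.get(datum)
            if producer is not None:
                self.edges.append((producer, task, datum))
        # The task then becomes the last writer of what it updates or produces.
        for datum in list(inout) + list(writes):
            self.last_writer[datum] = task


def build_example_graph(num_steps=3):
    """Replay the task creation order of the example script."""
    graph = DependencyGraph()
    graph.add_task("init_conf", writes=["conf"])
    for i in range(num_steps):
        step_res = "step_res_%d" % i
        graph.add_task("compute_step_%d" % i, reads=["conf"], writes=[step_res])
        graph.add_task("merge_%d" % i, reads=[step_res], inout=["result"])
    return graph


graph = build_example_graph()
for producer, consumer, datum in graph.edges:
    print("%s -> %s (%s)" % (producer, consumer, datum))
```

Under these assumptions every compute_step task depends on init_conf through conf, every merge task depends on its own compute_step through the partial result, and the merge tasks are chained through result. The first merge has no predecessor for result because the dictionary is created by the main program rather than by a task.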

COMPSs runtime

One important feature of the COMPSs runtime is its ability to exploit cloud elasticity by adjusting the amount of resources to the current workload. When the number of tasks is higher than the number of available cores, the runtime turns to the cloud, looking for the provider that offers the type of resources that best meets the requirements of the application at the lowest economic cost. Analogously, when the runtime detects an excess of resources for the actual workload, it powers off unused instances in a cost-efficient way. These decisions are based on the resource type information, which contains the details of the software images and instance templates available for every cloud provider. Since each cloud provider offers its own API, COMPSs defines a generic interface to manage resources and to query details about the execution cost, so that multiple cloud providers can be used during one and the same execution. The implementations of this interface, called connectors, are responsible for translating the generic requests into calls to the actual provider's API.
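As an illustration of the connector idea and of the cost-driven choice of resources, the sketch below shows one possible shape for a generic interface and a cheapest-match selection. Every name in it (InstanceTemplate, Connector, InMemoryConnector, cheapest_matching, scaling_decision), the template fields and the scaling rule are assumptions made for this example; they do not reproduce the COMPSs connector API or its scheduling policy.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceTemplate:
    # Hypothetical description of one instance type offered by a provider.
    provider: str
    name: str
    cores: int
    memory_gb: float
    price_per_hour: float


class Connector(ABC):
    """Generic resource-management interface; one subclass per provider."""

    @abstractmethod
    def power_on(self, template):
        """Start an instance of the given template and return its identifier."""

    @abstractmethod
    def power_off(self, instance_id):
        """Stop a previously started instance."""


class InMemoryConnector(Connector):
    """Fake provider, present only to make the sketch runnable."""

    def __init__(self):
        self.running = {}
        self._next_id = 0

    def power_on(self, template):
        self._next_id += 1
        instance_id = "%s-%d" % (template.provider, self._next_id)
        self.running[instance_id] = template
        return instance_id

    def power_off(self, instance_id):
        del self.running[instance_id]


def cheapest_matching(templates, min_cores, min_memory_gb):
    """Return the lowest-cost template that meets the constraints, or None."""
    candidates = [t for t in templates
                  if t.cores >= min_cores and t.memory_gb >= min_memory_gb]
    return min(candidates, key=lambda t: t.price_per_hour, default=None)


def scaling_decision(pending_tasks, available_cores):
    """Deliberately naive rule: compare pending tasks with available cores."""
    if pending_tasks > available_cores:
        return "scale_up"
    if pending_tasks < available_cores:
        return "scale_down"
    return "keep"
```

A caller would combine the pieces by asking scaling_decision whether more resources are needed, picking a template with cheapest_matching, and passing it to the connector of the chosen provider. Keeping provider-specific calls behind the Connector interface is what lets the rest of the logic stay provider-agnostic.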

COMPSs provides more than a programming model. The framework is complemented by a set of platform tools that facilitate (i) the development of COMPSs applications by means of an Integrated Development Environment (IDE); (ii) the deployment of applications on distributed infrastructures by means of the Programming Model Enactment Service (PMES); and (iii) the monitoring of executions by means of the Monitoring and Tracing tools. Figure 2 shows a diagram of all the tools composing the framework and their place in the service lifecycle.

Figure 2 – COMPSs Framework architecture

The transparent deployment of COMPSs applications on cloud infrastructures is delegated to the PMES PaaS component, whose architecture is depicted in Figure 3. Through a Basic Execution Service (BES) interface, the PMES exposes the operations needed by the COMPSs IDE and deals with the intricacies of deployment and contextualization, including the installation of the application packages, the required libraries and the monitoring processes. A dashboard (Figure 4) is also available for configuring the user's cloud environment.

Figure 3 – PMES architecture

Figure 4 – PMES Dashboard

The runtime of COMPSs provides some information at execution time so that the user can follow the progress of the application through a web interface that shows real-time information on the tasks being executed and on the usage of the resources (Figure 5).

 

Figure 5 – COMPSs execution monitor

At the end of each execution or file transfer, the COMPSs runtime also creates usage records. The usage records contain information about the resources involved in the task execution, the source and destination resources in data transfers, and the start and end time of each operation. Once the application completes, all these usage records can be processed by the Tracing tool in order to perform a postmortem reconstruction of the application execution across the different cloud resources. This reconstruction can be visualized by tools such as Paraver in order to detect bottlenecks and unbalanced parts of the application which could be fixed to increase the application performance (Figure 6).
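To give an idea of how such records can expose unbalanced executions, here is a minimal sketch that sums the busy time of each resource and compares the busiest one with the average. The UsageRecord fields, the function names and the sample values are assumptions for illustration only; the actual record format produced by the COMPSs runtime is not described here, and data transfers (with their source and destination resources) are left out for brevity.

```python
from collections import defaultdict, namedtuple

# Hypothetical record layout: one entry per task execution.
UsageRecord = namedtuple("UsageRecord", ["operation", "resource", "start", "end"])


def busy_time_per_resource(records):
    """Add up the duration of all operations executed on each resource."""
    totals = defaultdict(float)
    for record in records:
        totals[record.resource] += record.end - record.start
    return dict(totals)


def imbalance(records):
    """Busiest resource divided by mean busy time (1.0 means balanced)."""
    totals = busy_time_per_resource(records)
    if not totals:
        return 1.0
    mean = sum(totals.values()) / len(totals)
    return max(totals.values()) / mean


# Made-up sample values, in seconds since the start of the execution.
records = [
    UsageRecord("compute_step", "vm1", 0.0, 4.0),
    UsageRecord("compute_step", "vm2", 0.0, 2.0),
    UsageRecord("merge", "vm1", 4.0, 6.0),
]
print(busy_time_per_resource(records))
print(imbalance(records))
```

A ratio well above 1.0 suggests that one resource carries most of the work, which is the kind of situation a visual trace analysis would also reveal.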

 

Figure 6 – Paraver analysis of COMPSs execution traces