Managing a Pipeline

Creating a Pipeline

./pl create --data "Example;example@example.com;123456;www.example.com"

This command creates a pipeline. The --data argument provides the pipeline information (orgtype, name, email, web, orgid). There is no need to provide a --type argument if the data source is RDF, since DCAT is chosen by default. And since a recipe already exists for creating a pipeline that handles RDF, there is no need to provide the transforms the pipeline consists of either.
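
Since the --data value is a single semicolon-separated string, it can be convenient to build it in a shell variable first so the individual fields stay readable. This is a minimal sketch assuming a POSIX-compatible shell, not a feature of the tool itself:

# Convenience only: keep the semicolon-separated data string in a variable.
DATA="Example;example@example.com;123456;www.example.com"
./pl create --data "$DATA"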

Pre-defined pipelines exist for other use cases as well, such as handling CKAN and Inspire data sources. If the use case at hand is not covered by any of these recipes, a custom pipeline has to be created. This is done by setting the --type flag to custom, while also passing the pipeline transforms as an argument to the --pipeline flag:

./pl create -t custom -p data/pipeline.json -d "Example;example@example.com;123456;www.example.com"
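
The exact schema of the pipeline file is defined by the tool and is not documented here. Purely as an illustration, a file listing the transforms a custom pipeline consists of might be written out like this; the structure and transform names below are placeholders, not real identifiers:

# Illustration only: the real schema of data/pipeline.json is tool-defined,
# and the transform names below are placeholders.
cat > data/pipeline.json <<'EOF'
{
  "transforms": [
    { "type": "fetch" },
    { "type": "validate" }
  ]
}
EOF

The resulting file is then passed to the --pipeline (-p) flag as in the command above.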

Listing all Pipelines

./pl ls

This command displays all the pipelines that exist in the repository.

Executing a Pipeline

./pl exec --context 120

Once a pipeline has been created, it can be executed in order to instantiate a data processing job. In the example above, we have specified exactly which pipeline to execute by providing its context ID. From this specific pipeline, a job will be created, denoted as a Pipeline Result.
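
Putting the commands together, a typical workflow is to look up a pipeline's context ID with ls and then pass it to exec. A minimal sketch using only the commands shown above, where 120 is the context ID from the example:

# List pipelines to find the context ID of the one to run, then execute it.
./pl ls
./pl exec --context 120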

The Pipeline Result

A Pipeline Result has a status, which is initially set to Pending. Once the job is run, it may have any of the following statuses: InProgress, Succeeded, or Failed. Running a job is carried out by the Harvester.