Managing a Pipeline¶
Creating a Pipeline¶
./pl create --data "Example;example@example.com;123456;www.example.com"
This command is all that is needed to create a pipeline. The --data argument provides the pipeline information (orgtype, name, email, web, orgid). There is no need to provide a --type argument if the data source is RDF, since DCAT is chosen by default. Since there already exists a recipe for creating a pipeline that handles RDF, there is also no need to provide the transforms that the pipeline consists of.
There are pre-defined pipelines for other use cases as well, such as for handling CKAN and Inspire data sources. If the use case at hand is not covered by any of these recipes, a custom pipeline has to be made. This is done by setting the --type flag to custom, while also passing the pipeline transforms as an argument to the --pipeline flag:
./pl create -t custom -p data/pipeline.json -d "Example;example@example.com;123456;www.example.com"
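The file passed to --pipeline describes the transforms that the custom pipeline consists of. The exact schema of this file is defined by the tool itself; purely as an illustration, and assuming a JSON document holding a list of transforms, such a file could look something like the following, where both the transforms key and the individual transform names are hypothetical placeholders:
{
  "transforms": [
    { "name": "fetch", "args": { "source": "www.example.com" } },
    { "name": "validate" },
    { "name": "publish" }
  ]
}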
Listing all Pipelines¶
./pl ls
This command will display all the pipelines that exist in the repository.
Executing a Pipeline¶
./pl exec --context 120
Once a pipeline has been created, it can be executed in order to instantiate a data processing job. In the example above, we have specified exactly which pipeline we want to execute by providing its context ID. From this specific pipeline, a job will be created, denoted as a Pipeline Result.
The Pipeline Result¶
A Pipeline Result has a status, and initially this is set to Pending. Once the job is run, the job may have any of the following statuses: InProgress, Succeeded, Failed. Running a job is carried out by the Harvester.