
Harvesting sources

Before a data catalog can be harvested, its harvesting source must be added. An organization may have one or more harvesting sources, each with its own harvesting web address.

Adding a harvesting source

To add a new harvesting source, click the "Create" button.

Create button

You need to enter the name of the organization, a short description, and the web address of the harvesting source.

Create source dialog

When you are finished, the harvesting source is saved in Registry and will be harvested automatically. It may take a little time before the web address has been checked and its data harvested. Automated harvests run once a night.

Listed sources

Check harvesting status

If you click on the source, you can see the statuses of the latest harvests. You can also click the "Reharvest" button to manually trigger a harvesting attempt, which usually takes only a few minutes.

Harvesting status for a source

If you click on the harvest row, you get more details about that particular harvesting attempt.

Detailed harvesting status

Edit harvesting source

If you want to view or edit the information about a harvesting source, such as its name, description or web address, click on "Information" in the left menu. Then click the "Edit" button to make changes.

Edit harvesting source

An edit dialog opens where you can change the source's name, description and web address.

Edit dialog

Validation

If you click on "Validation", you can see whether there are any issues with the metadata. A summary of the validation results is shown per entity type/class.

Sources, Validation

If you click on a row for an entity type, errors (red) and warnings (yellow) are listed for the different instances. You can also choose to view all metadata fields by clicking the "Validation report" button.

Sources, Validation, clicked row

The fields are then displayed in the same order as they appear in EntryScape Catalog, with any warning icons and errors described under each field.

Sources, Validation, validation report

Statistics

You can view statistics for an organization's metadata if you click on "Statistics" in the left menu.

Statistics

Under "Link check report" you can see how many links are working and how many are broken in the harvesting source's data catalog.

Link check report

Currently, the link check report is configured for DCAT-AP metadata, and the fields checked for each type are as follows:

  • dcat:Distribution
    • dcterms:conformsTo
    • dcat:accessURL
    • dcat:downloadURL
  • dcat:Dataset
    • dcat:landingPage
    • dcterms:conformsTo
    • foaf:page
    • owl:versionInfo
  • dcat:DataService
    • dcat:endpointDescription
    • dcat:landingPage
    • foaf:page
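
The list above can be read as a mapping from entity type to the properties whose values are link-checked. The following is only an illustrative sketch in Python, not EntryScape's own code: the `entity` dictionary layout and the `links_to_check` helper are assumptions made for the example.

```python
# Properties checked per entity type, copied from the list above.
CHECKED_FIELDS = {
    "dcat:Distribution": [
        "dcterms:conformsTo",
        "dcat:accessURL",
        "dcat:downloadURL",
    ],
    "dcat:Dataset": [
        "dcat:landingPage",
        "dcterms:conformsTo",
        "foaf:page",
        "owl:versionInfo",
    ],
    "dcat:DataService": [
        "dcat:endpointDescription",
        "dcat:landingPage",
        "foaf:page",
    ],
}


def links_to_check(entity):
    """Return (property, url) pairs for the checked properties of one entity.

    `entity` is a hypothetical plain-dict representation, for example:
    {"type": "dcat:Dataset",
     "properties": {"dcat:landingPage": ["https://example.org/data"]}}
    """
    fields = CHECKED_FIELDS.get(entity["type"], [])
    properties = entity.get("properties", {})
    return [(field, url) for field in fields for url in properties.get(field, [])]
```

Properties that are not in the mapping, and entity types other than the three listed, yield no links in this sketch.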

Each link is first requested with the HTTP HEAD method. If the response has status 400 or higher, the request is repeated with the GET method. Each call has a timeout, currently 5000 ms. To avoid overloading domains that have many links in the data catalog, there is a pause of at least 1000 ms between calls to links within the same domain. Once the call has been made, the link's status is recorded in the report.
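
The procedure described above can be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not the actual link checker. The names `http_status`, `check_link`, `check_links` and `DomainThrottle` are hypothetical, and two details are assumptions: a call that gets no answer at all (for example a timeout) is reported as `None`, and the pause is measured from the start of the previous call to the same domain.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlparse

TIMEOUT_S = 5.0       # 5000 ms timeout per call
DOMAIN_PAUSE_S = 1.0  # at least 1000 ms between calls to the same domain


def http_status(url, method, timeout=TIMEOUT_S):
    """Send one request and return the HTTP status, or None if unanswered."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4xx/5xx responses as exceptions
    except (urllib.error.URLError, OSError, ValueError):
        return None      # timeout, DNS failure, refused connection, bad URL


class DomainThrottle:
    """Enforce a minimum pause between calls to the same domain."""

    def __init__(self, pause=DOMAIN_PAUSE_S, clock=time.monotonic, sleep=time.sleep):
        self.pause = pause
        self.clock = clock
        self.sleep = sleep
        self.last_call = {}  # domain -> start time of the most recent call

    def wait(self, url):
        domain = urlparse(url).netloc
        last = self.last_call.get(domain)
        if last is not None:
            remaining = self.pause - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_call[domain] = self.clock()


def check_link(url, throttle, fetch=http_status):
    """Try HEAD first; if the answer is status 400 or higher, retry with GET."""
    throttle.wait(url)
    status = fetch(url, "HEAD")
    if status is not None and status >= 400:
        throttle.wait(url)  # the GET retry is also a call to the same domain
        status = fetch(url, "GET")
    return status


def check_links(urls, fetch=http_status, throttle=None):
    """Check every URL and return a report mapping URL -> final status."""
    if throttle is None:
        throttle = DomainThrottle()
    return {url: check_link(url, throttle, fetch) for url in urls}
```

Calling `check_links(["https://example.org/a", "https://example.org/b"])` would perform real HTTP requests. The `fetch`, `clock` and `sleep` parameters exist only so that the logic can be exercised without network access or real waiting.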

Notifications

Read about notifications here.