Harvesting sources
To be able to harvest a data catalog, its harvesting source must first be added. An organization may have one or more harvesting sources with separate harvesting web addresses.
Adding a harvesting source
To add a new harvesting source, click the "Create" button.
You need to enter the name of the organization, a short description, and the web address of the harvesting source.
When you are finished, the harvesting source is saved in Registry and will be harvested automatically. It may take some time before the web address is checked and its data is harvested. Automated harvests run once a night.
Check harvesting status
If you click on the source, you can see statuses for the latest harvests. You can also click the button "Reharvest" to manually trigger a harvesting attempt, which usually takes only a few minutes.
If you click on the harvest row, you get more details about that particular harvesting attempt.
Edit harvesting source
If you want to view or edit the information about a harvesting source, such as its name, description, or web address, click on "Information" in the left menu. Then click the "Edit" button to make changes.
An edit dialog opens where you can change the source's name, description and web address.
Validation
If you click on "Validation" you can see if there are any issues with the metadata. You will then see a summary of the validation results per entity type/class.
If you click on a row for an entity type, errors (red) and warnings (yellow) are listed for the different instances. You can also choose to view all metadata fields by clicking the "Validation report" button.
The fields are then displayed in the same order as they appear in EntryScape Catalog, with any warning icons and errors described under each field.
Statistics
You can view statistics for an organization's metadata if you click on "Statistics" in the left menu.
Link check
Under "Link check report" you can see how many links are working and how many are broken in the harvesting source's data catalog.
Currently, the link check report is configured for DCAT-AP metadata, and the fields checked for each type are as follows:
- dcat:Distribution
  - dcterms:conformsTo
  - dcat:accessURL
  - dcat:downloadURL
- dcat:Dataset
  - dcat:landingPage
  - dcterms:conformsTo
  - foaf:page
  - owl:versionInfo
- dcat:DataService
  - dcat:endpointDescription
  - dcat:landingPage
  - foaf:page
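As an illustration, the list above can be read as three entity types, each followed by the properties that are checked for it, and written as a lookup table. The sketch below is not taken from EntryScape itself: the `CHECKED_FIELDS` name, the `links_to_check` helper, the dictionary shape of the metadata, and the choice to keep only `http`/`https` values are all assumptions made for the example.

```python
# Hypothetical lookup table: entity type -> properties whose values are link-checked.
# The grouping follows the list in the text; the names are illustrative only.
CHECKED_FIELDS = {
    "dcat:Distribution": ("dcterms:conformsTo", "dcat:accessURL", "dcat:downloadURL"),
    "dcat:Dataset": ("dcat:landingPage", "dcterms:conformsTo", "foaf:page", "owl:versionInfo"),
    "dcat:DataService": ("dcat:endpointDescription", "dcat:landingPage", "foaf:page"),
}


def links_to_check(entity_type, metadata):
    """Return (property, url) pairs that would be link-checked for one entity.

    `metadata` is assumed to be a dict mapping property names to a string
    or a list of strings. Values that are not http(s) URLs are skipped,
    which is an assumption of this sketch.
    """
    pairs = []
    for prop in CHECKED_FIELDS.get(entity_type, ()):
        values = metadata.get(prop, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            if isinstance(value, str) and value.startswith(("http://", "https://")):
                pairs.append((prop, value))
    return pairs
```

Properties that are not in the table for the given type, and entity types that are not listed at all, simply yield no links.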
Each link is first requested with the HTTP HEAD method. If the response has status 400 or higher, the link is requested again with the GET method. Each call has a timeout, currently 5000 ms. To avoid overloading domains that have many links in the data catalog, there is a pause of at least 1000 ms between successive calls to links within the same domain. After the call, the status of the link is recorded in the report.
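The procedure can be sketched in Python as follows. This is a minimal illustration of the described behaviour, not the actual implementation: the names `http_status`, `DomainThrottle`, and `check_link` are hypothetical, and several details are assumptions, namely that the pause applies to every request (including the GET fallback), that it is measured from the end of the previous call to the same host, that a timed-out HEAD is not retried with GET, and that a link counts as working when the final status is below 400.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit

TIMEOUT_S = 5.0    # 5000 ms timeout, as stated in the text
MIN_PAUSE_S = 1.0  # 1000 ms minimum pause per domain, as stated in the text


def http_status(method, url, timeout=TIMEOUT_S):
    """Send one request and return its status code, or None on timeout/network error."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4xx/5xx responses as exceptions
    except OSError:
        return None      # URLError and socket timeouts are OSError subclasses


class DomainThrottle:
    """Enforce a minimum pause between calls to the same host."""

    def __init__(self, min_pause=MIN_PAUSE_S, clock=time.monotonic, sleep=time.sleep):
        self.min_pause = min_pause
        self.clock = clock
        self.sleep = sleep
        self._last_finished = {}  # host -> time the previous call ended

    def run(self, url, call):
        host = urlsplit(url).hostname or ""
        last = self._last_finished.get(host)
        if last is not None:
            remaining = self.min_pause - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        try:
            return call()
        finally:
            self._last_finished[host] = self.clock()


def check_link(url, throttle, request=http_status):
    """Try HEAD first; fall back to GET if HEAD is answered with status >= 400."""
    status = throttle.run(url, lambda: request("HEAD", url))
    if status is not None and status >= 400:
        status = throttle.run(url, lambda: request("GET", url))
    ok = status is not None and status < 400  # assumption: below 400 means "working"
    return {"url": url, "status": status, "ok": ok}
```

Calling `check_link(url, DomainThrottle())` for each link, with one shared throttle, would send real requests. The `request`, `clock`, and `sleep` parameters exist only so the logic can be exercised without network access or real waiting.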
Notifications
Read about notifications here.