Harvest report
Overview¶
The harvest report is a visual summary of how many organizations publish open data and provide a webpage about them.
Reports¶
Under Reports you can also view harvest reports for each source.
If you click on a harvesting source you can see the harvest details. There are five tabs: Status, Information, Validation, Statistics and Link Check Report. Each view also has a direct link, so you can open it in a separate browser tab or share it with someone else.
Status¶
Under the "Status" tab, you will find the daily harvest status for the last 30 days, as well as a list of the latest harvest attempts.
You can click on the row for a harvest attempt to get more information about it.
In addition, if you are logged in and looking at your own data, you will also see a "Reharvest" button, which you can click to start a harvest immediately.
You can also see a link to a validation report for your catalog.
Information¶
Under the "Information" tab you will find general information about the organization.
If you are logged in and reviewing your own organization, you may edit the information by clicking the "Edit" button.
Validation¶
If you click on the "Validation" tab you can see if there are any issues with the metadata. You will then see a summary of the validation results per entity type/class.
If you click on a row for an entity type, errors (red) and warnings (yellow) are listed for the different instances. You can also choose to view all metadata fields by clicking the "Validation report" button.
The fields are then displayed in the same order as they appear in EntryScape Catalog, with any warning icons and errors described under each field.
Statistics¶
Under the "Statistics" tab you can see how many entities of different kinds have been harvested as well as how many of their mandatory input fields or recommended input fields are filled in.
Link Check Report¶
The last tab, "Link Check Report", shows how many working or broken links are found in the metadata.
Currently, the link check report is configured for DCAT-AP metadata, and the fields checked for each type are as follows:
- dcat:Distribution
    - dcterms:conformsTo
    - dcat:accessURL
    - dcat:downloadURL
- dcat:Dataset
    - dcat:landingPage
    - dcterms:conformsTo
    - foaf:page
    - owl:versionInfo
- dcat:DataService
    - dcat:endpointDescription
    - dcat:landingPage
    - foaf:page
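As an illustration only, the checked fields can be pictured as a mapping from entity type to properties. The sketch below is not the actual configuration format, which this page does not describe; the names `CHECKED_FIELDS` and `links_to_check`, and the shape of the `metadata` argument, are assumptions made for the example.

```python
# Hypothetical representation of the checked fields listed above.
# The real configuration format is not described in this document.
CHECKED_FIELDS = {
    "dcat:Distribution": ["dcterms:conformsTo", "dcat:accessURL", "dcat:downloadURL"],
    "dcat:Dataset": ["dcat:landingPage", "dcterms:conformsTo", "foaf:page", "owl:versionInfo"],
    "dcat:DataService": ["dcat:endpointDescription", "dcat:landingPage", "foaf:page"],
}


def links_to_check(entity_type, metadata):
    """Return the values of the checked fields for one entity.

    `metadata` is assumed to map property names to lists of values.
    Entity types that are not listed above yield no links.
    """
    links = []
    for prop in CHECKED_FIELDS.get(entity_type, []):
        links.extend(metadata.get(prop, []))
    return links
```

For a distribution that only has `dcat:accessURL` and `dcat:downloadURL` values, this helper would return those two values and ignore every other property.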
Each link is first requested with the HEAD method. If the response has status 400 or higher, the request is repeated with the GET method. Each call has a timeout, currently 5000 ms. To avoid overloading domains that have many links in the data catalog, there is a pause of at least 1000 ms between successive calls to links within the same domain. Once the call has completed, the status of the link is recorded in the report.
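The procedure described above can be sketched roughly as follows. This is a minimal illustration using Python's standard library, not the actual link checker: the names `http_status`, `check_link`, `DomainThrottle` and `check_links` are hypothetical. The page does not say what happens when a HEAD call times out, so the sketch simply reports `None` in that case without retrying.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit

TIMEOUT_S = 5.0  # 5000 ms timeout per call, as stated above
PAUSE_S = 1.0    # at least 1000 ms between calls to the same domain


def http_status(method, url, timeout=TIMEOUT_S):
    """Return the HTTP status of one request, or None on timeout/network error."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4xx/5xx responses as exceptions
    except (urllib.error.URLError, OSError):
        return None


def check_link(url, fetch=http_status):
    """Try HEAD first; repeat with GET if HEAD answers with status 400 or higher."""
    status = fetch("HEAD", url)
    if status is not None and status >= 400:
        status = fetch("GET", url)
    return status


class DomainThrottle:
    """Enforce a minimum pause between calls to the same domain."""

    def __init__(self, pause=PAUSE_S, clock=time.monotonic, sleep=time.sleep):
        self.pause = pause
        self.clock = clock
        self.sleep = sleep
        self.last_call = {}  # domain -> time of the most recent call

    def wait(self, url):
        domain = urlsplit(url).netloc
        last = self.last_call.get(domain)
        if last is not None:
            remaining = self.pause - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_call[domain] = self.clock()


def check_links(urls, fetch=http_status, throttle=None):
    """Check every URL and return a {url: status} report."""
    throttle = throttle or DomainThrottle()

    def paced(method, url):
        throttle.wait(url)  # pause before every call, HEAD and GET alike
        return fetch(method, url)

    return {url: check_link(url, paced) for url in urls}
```

Passing `fetch`, `clock` and `sleep` as parameters is a design choice for the sketch: it lets the HEAD-to-GET fallback and the per-domain pause be exercised without network access or real waiting.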














