Managing organizations and harvesting sources
In Registry, each organization is also a harvesting source. Before a data catalog can be harvested, the organization that publishes it must first be added.
Adding an organization and harvesting source
To add a new organization, click the "Create" button.
Enter the name of the organization, a short description, and the web address of the harvesting source.
When you are finished, the organization is saved in Registry and harvested automatically. It may take some time before the web address has been checked and its data harvested. Automated harvests run once a night.
Check harvesting status
Click an organization to see the status of its most recent harvests. You can also click the "Reharvest" button to trigger a harvest manually, which usually takes only a few minutes.
Click a harvest row to see more details about that particular harvesting attempt.
Edit organization and harvesting source
To view or edit information about an organization, such as its name, description, or the web address of its harvesting source, click "Information" in the left menu, then click the "Edit" button.
An edit dialog opens where you can change the organization's name, description, and harvesting source URL.
Statistics
To view statistics for an organization's metadata, click "Statistics" in the left menu.
Link check
Under "Link check report", you can see how many links in the organization's catalog are working and how many are broken. The link check report is currently configured for DCAT-AP metadata, and the following properties are checked for each type:
- dcat:Distribution
  - dcterms:conformsTo
  - dcat:accessURL
  - dcat:downloadURL
- dcat:Dataset
  - dcat:landingPage
  - dcterms:conformsTo
  - foaf:page
  - owl:versionInfo
- dcat:DataService
  - dcat:endpointDescription
  - dcat:landingPage
  - foaf:page
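The list above can be read as a mapping from each type to the properties whose values are checked. The following Python sketch is illustrative only and is not Registry's implementation: it assumes the metadata has already been parsed into plain (subject, predicate, object) string tuples using prefixed names, the names `LINK_FIELDS` and `links_to_check` are hypothetical, and it assumes that only values starting with `http://` or `https://` are treated as links.

```python
# Hypothetical mapping of DCAT-AP types to the properties the link check covers.
LINK_FIELDS = {
    "dcat:Distribution": ["dcterms:conformsTo", "dcat:accessURL", "dcat:downloadURL"],
    "dcat:Dataset": ["dcat:landingPage", "dcterms:conformsTo", "foaf:page", "owl:versionInfo"],
    "dcat:DataService": ["dcat:endpointDescription", "dcat:landingPage", "foaf:page"],
}


def links_to_check(triples):
    """Return (subject, property, url) for every checked property value that looks like a link.

    Assumes `triples` is a list of (subject, predicate, object) strings with prefixed names.
    """
    # First pass: collect the rdf:type values of each subject.
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    # Second pass: keep values of properties that are checked for the subject's type(s).
    found = []
    for s, p, o in triples:
        checked = {prop for t in types.get(s, ()) for prop in LINK_FIELDS.get(t, ())}
        # Assumption: only absolute http(s) values are treated as links.
        if p in checked and o.startswith(("http://", "https://")):
            found.append((s, p, o))
    return found
```

A property such as `foaf:page` is only picked up on subjects whose type lists it, so a `foaf:page` on a distribution is ignored.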
Each link is first requested with the HTTP HEAD method. If the server responds with status 400 or higher, the link is requested again with the GET method. Each request has a timeout, currently 5000 ms. To avoid overloading domains that have many links in the data catalog, there is a pause of at least 1000 ms between consecutive requests to links within the same domain. Once the request has completed, the resulting status of the link is added to the report.
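The request strategy described above can be sketched in Python with the standard library's `urllib.request`. This is a hypothetical illustration, not Registry's implementation: the function and class names are invented, links are checked one at a time, the GET retry is assumed to count as a separate call for the per-domain pause, and a request that gets no response at all (for example a timeout) is reported as `None` without a GET retry, since that case is not covered by the description.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit

TIMEOUT_S = 5.0       # per-request timeout: 5000 ms
DOMAIN_PAUSE_S = 1.0  # minimum pause between requests to the same domain: 1000 ms


def request_status(url, method, timeout=TIMEOUT_S):
    """Send one HTTP request; return its status code, or None if there was no response."""
    try:
        req = urllib.request.Request(url, method=method)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses are raised as HTTPError
        return err.code
    except (urllib.error.URLError, OSError, ValueError):  # bad URL, DNS failure, timeout
        return None


class DomainThrottle:
    """Keeps at least `pause` seconds between requests to the same host (hypothetical helper)."""

    def __init__(self, pause=DOMAIN_PAUSE_S, clock=time.monotonic, sleep=time.sleep):
        self.pause = pause
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}  # host -> time of the previous request to that host

    def wait(self, url):
        host = urlsplit(url).hostname or ""
        previous = self.last_request.get(host)
        if previous is not None:
            remaining = self.pause - (self.clock() - previous)
            if remaining > 0:
                self.sleep(remaining)
        self.last_request[host] = self.clock()


def check_link(url, request=request_status, throttle=None):
    """Try HEAD first; if the answer is status 400 or higher, retry with GET."""
    throttle = throttle or DomainThrottle()
    throttle.wait(url)
    status = request(url, "HEAD")
    if status is not None and status >= 400:
        throttle.wait(url)  # assumption: the GET retry is also subject to the pause
        status = request(url, "GET")
    return status


def run_link_check(urls, request=request_status, throttle=None):
    """Check links sequentially and collect url -> status for the report."""
    throttle = throttle or DomainThrottle()
    return {url: check_link(url, request, throttle) for url in urls}
```

The `request`, `clock`, and `sleep` parameters are injectable so the logic can be exercised without network access; calling `run_link_check(urls)` with the defaults sends real requests.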
Notifications
Read about notifications here.