
Managing organizations and harvesting sources

In Registry, each organization is also a harvesting source. Before a data catalog can be harvested, its organization must first be added.

Adding an organization and harvesting source

To add a new organization, click the "Create" button.

Create button

You need to enter the name of the organization, a short description, and the web address of the harvesting source.

Dialog Create organization

When you are finished, the organization is saved in Registry and will be harvested automatically. It may take some time before the web address is checked and its data is harvested. Automated harvests run once a night.

Listed organizations

Check harvesting status

If you click on an organization, you can see the status of its most recent harvests. You can also click the "Reharvest" button to trigger a harvesting attempt manually, which usually takes only a few minutes.

Harvesting status for an organization

If you click on a harvest row, you get more details about that particular harvesting attempt.

Detailed harvesting status

Edit organization and harvesting source

If you want to view or edit the information about an organization, such as its name, description, or the web address of its harvesting source, click "Information" in the left menu. Then click the "Edit" button to make changes.

Edit organization and harvesting source

An edit dialog opens where you can change the organization's name, its description, and the URL of its harvesting source.

Edit dialog

Statistics

You can view statistics for an organization's metadata by clicking "Statistics" in the left menu.

Statistics

Under "Link check report", you can see how many links in the organization's catalog are working and how many are broken. Currently, the link check report is configured for DCAT-AP metadata, and the following fields are checked for each type:

  • dcat:Distribution
    • dcterms:conformsTo
    • dcat:accessURL
    • dcat:downloadURL
  • dcat:Dataset
    • dcat:landingPage
    • dcterms:conformsTo
    • foaf:page
    • owl:versionInfo
  • dcat:DataService
    • dcat:endpointDescription
    • dcat:landingPage
    • foaf:page
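
For readers who want to reproduce the selection of links on their own metadata, the list above can be written as a small lookup table. This is only an illustrative sketch, not how Registry is implemented: the names CHECKED_FIELDS and links_to_check are hypothetical, and a resource is assumed to be a plain dictionary that maps property names to lists of values.

```python
# Fields checked per DCAT-AP type, copied from the list above.
CHECKED_FIELDS = {
    "dcat:Distribution": ["dcterms:conformsTo", "dcat:accessURL", "dcat:downloadURL"],
    "dcat:Dataset": ["dcat:landingPage", "dcterms:conformsTo", "foaf:page", "owl:versionInfo"],
    "dcat:DataService": ["dcat:endpointDescription", "dcat:landingPage", "foaf:page"],
}


def links_to_check(resource_type, properties):
    """Return the values of a resource that fall under the link check.

    `properties` is a hypothetical representation: property name -> list of values.
    Types and properties that are not in CHECKED_FIELDS are ignored.
    """
    fields = CHECKED_FIELDS.get(resource_type, [])
    return [url for field in fields for url in properties.get(field, [])]


distribution = {
    "dcterms:title": ["Example file"],
    "dcat:accessURL": ["https://data.example.org/file"],
    "dcat:downloadURL": ["https://data.example.org/file.csv"],
}
print(links_to_check("dcat:Distribution", distribution))
```

In this sketch the title is skipped, because dcterms:title is not one of the checked fields for dcat:Distribution.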

Each link is first requested with the HTTP HEAD method. If the response has status 400 or higher, the GET method is used instead. Each call has a timeout, currently 5000 ms. To avoid overloading domains that have many links in the data catalog, there is a pause of at least 1000 ms between calls to links within the same domain. Once the call is made, the status of the link is recorded in the report.
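
The procedure described above can be approximated with the following Python sketch, which may help if you want to test your own links before a harvest. It is not Registry's actual implementation. The function names are hypothetical, the pause is assumed to apply to every call including the GET fallback, and a call that gets no answer at all (for example a timeout) is assumed to be reported as None without a GET retry.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlparse

TIMEOUT_S = 5.0       # 5000 ms timeout per call
DOMAIN_PAUSE_S = 1.0  # at least 1000 ms between calls to the same domain


def fetch_status(url, method, timeout=TIMEOUT_S):
    """Return the HTTP status code of one call, or None if no answer was received."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx answers are raised as exceptions
    except (OSError, ValueError):
        return None  # timeout, connection failure, or malformed URL


def check_link(url, fetch=fetch_status):
    """Try HEAD first; use GET instead if HEAD is answered with status 400 or higher."""
    status = fetch(url, "HEAD")
    if status is not None and status >= 400:
        status = fetch(url, "GET")
    return status


def check_links(urls, fetch=fetch_status, sleep=time.sleep, clock=time.monotonic):
    """Check every URL and return a report that maps each URL to its final status."""
    last_call = {}  # domain -> time at which the previous call to it finished

    def throttled(url, method):
        domain = urlparse(url).netloc
        if domain in last_call:
            wait = DOMAIN_PAUSE_S - (clock() - last_call[domain])
            if wait > 0:
                sleep(wait)  # pause before calling the same domain again
        status = fetch(url, method)
        last_call[domain] = clock()
        return status

    return {url: check_link(url, throttled) for url in urls}
```

Calling check_links(["https://example.org/data.csv"]) returns a dictionary with one status per link. The fetch, sleep, and clock parameters exist only so that the sketch can be tried without network access or real waiting.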

Notifications

Read about notifications here.