Datasets and distributions

Control datasets

It's a good idea to start by trying to avoid redundant effort. That's why we start by checking whether anyone has already worked on the data we want to publish or improve.

Search datasets

If you cannot find the dataset you have upgraded from a suggestion, use the search function to search the available datasets in the catalog.

Screenshot dataset search

Update an existing dataset

Choose dataset

When you have found the dataset you want to describe - click on the name of the dataset you have upgraded.

Screenshot edit existing dataset

This will take you to an overview view of the dataset you have selected.

Screenshot edit dataset

Then click on Edit dataset.

Application profile (standard)

At Select profile: select the application profile for metadata that fits your dataset. Before you start describing the dataset, always select the application profile in the top right corner.

Screenshot choose metadataprofile

The organization can publish according to different application profiles, e.g. for PSI it can be DCAT-AP and for the INSPIRE Directive it can be a combination of NMDP and DCAT-AP. The most popular profile is usually preset by default. If you are working with geodata, you should use the GeoDCAT-AP profile.

Screenshot chosen geodataprofile

Then edit more metadata and press Save changes.

Screenshot edit and save dataset

Describe datasets

In this step you add or edit descriptions (metadata fields).

A dataset can be described with the DCAT-AP metadata standard by clicking on the list line or via the Edit menu. Like the metadata dialog for the catalog itself, the dataset properties are organized into mandatory, recommended and optional.

Screenshot of metadatadialogue

The fields you must fill in according to the organization's requirements are marked with asterisks to indicate that they are mandatory and are located under the Mandatory tab. This is always visible by default.

You can activate the fields that are recommended or optional according to the organization's procedure and the selected application profile by clicking on Recommended and Optional in black. Keep in mind that it may be mandatory in your organization to fill in mandatory metadata fields in multiple languages. When editing a dataset, you may want to make sure to enter some fields in both your language and an additional language to make the dataset more visible, understood and used. These fields may include: title, description, and keywords. In this way, colleagues and other users of datasets can more easily find and make more use of them.
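The multilingual check described above can be sketched in a few lines of Python. This is only an illustration: the `dataset` dictionary, its field names and the language codes are hypothetical and do not reflect how EntryScape stores metadata.

```python
# Hypothetical in-memory description of a dataset; not an EntryScape structure.
dataset = {
    "title": {"sv": "Badplatser", "en": "Bathing sites"},
    "description": {"sv": "Kommunens badplatser."},
    "keywords": {"sv": ["bad", "strand"], "en": ["bathing"]},
}


def missing_translations(metadata, languages,
                         fields=("title", "description", "keywords")):
    """Return {field: [languages without a value]} for the given fields."""
    gaps = {}
    for field in fields:
        values = metadata.get(field, {})
        # A language counts as missing if it is absent or has an empty value.
        missing = [lang for lang in languages if not values.get(lang)]
        if missing:
            gaps[field] = missing
    return gaps


print(missing_translations(dataset, ["sv", "en"]))
```

Here the description is reported as lacking an English version, which is the kind of gap worth closing before publishing.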

Title - Add or edit

This title should be the title you entered when the dataset was a suggestion. You can update it to clarify and improve it. Remember to also add the title in English if it was not included in previous steps.

Screenshot of add or edit title

Description - Add or edit

The description can be edited here and improved from the one you put in the suggestions view.

Screenshot of add or edit description

Publisher

Add your organization as publisher. You can either search, add a new publisher or choose an existing publisher.

Screenshot of add or edit publisher

Contact point

A dataset needs a contact, usually the data owner, to keep the organization in control of which department or person is updating and maintaining the dataset. This can be an individual or an organization (department or unit) within the organization with a non-personal mail address. Select a contact from the list or create one if there is none.

Choose contact point

Select a contact from the drop-down list.

Screenshot of add or edit existing contact

Otherwise you need to create a contact, either individual or organization, e.g. generic mail address for the project, department or unit.

Create contact point

Click on the magnifying glass to browse all contacts or create a contact.

Screenshot of browse or create contact

If you cannot select a contact from the list - click on "+ Create".

Screenshot of create contact

Select Individual or organization and fill in the required fields, preferably also recommended fields. Optional fields are not available. Then click on Save.

Screenshot of create individual or organization

Then the contact point is filled in the dataset description.

To make the dataset searchable, and thus discoverable, you need to describe it by defining keywords, categories and subject terms (e.g. via linked or imported terminologies such as GEMET from the EU). Filling in the various fields may require some thought and work, but they are very important for making the datasets findable, both in our own catalog and in other catalogs that harvest our catalog (e.g. dataportal.se, govdata.de or opendata.swiss).

Keywords

Describe the dataset with your own keywords, one per line without commas. Create a new row with the plus sign (+).

Screenshot edit keywords
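If your keywords currently live in a comma-separated list, a small helper can turn them into one keyword per entry before you type them in. The sketch below is a hypothetical preparation step outside EntryScape; the function name and the choice to drop case-insensitive duplicates are assumptions.

```python
def split_keywords(raw):
    """Split pasted text on commas and line breaks into one keyword per entry."""
    seen = set()
    keywords = []
    for part in raw.replace("\n", ",").split(","):
        keyword = part.strip()
        # Skip empty entries and case-insensitive duplicates.
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords


for keyword in split_keywords("air quality, Emissions,\nair quality, , traffic"):
    print(keyword)
```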

Category (Data Theme Vocabulary)

Choose one or more Data Theme Vocabulary categories where your dataset fits.

Screenshot metadata category

Special: Custom fields

In some cases, there may be custom fields that your organization has in its custom application profile. This is only available in your own instance of EntryScape and not in EntryScape Free.

Example: GEMET terminology from EntryScape Terms

For example, your organization may have chosen to import controlled subject terms from the GEMET (GEneral Multilingual Environmental Thesaurus) glossary into EntryScape. GEMET is an established European thesaurus with over 40 themes (top level) and contains a total of about 5000 terms. As there are so many terms to choose from, the easiest way to select appropriate subject terms is to use the search function.

Custom field from Inspire application profile

Your organization may also have a custom application profile for e.g. INSPIRE.

Example: Categories for Inspire

If you have selected the Inspire profile, the Inspire Theme and Subject Category fields are also visible to describe the dataset.

Date – Release date – Date modified

Screenshot Date – Release date – Date modified

Specify the release date of the dataset and, where applicable, the date it was last modified, e.g. after maintenance.

Time period

The Time period metadata field under the Optional tab lets you specify the time period to which the dataset applies, e.g. a number of months in the same year, a number of years, or days in the same week. For both the start and the end of the time period, you can enter the date or click on the calendar icon.

Screenshot metadata time period start end
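Before entering a time period, it can help to check that the start does not fall after the end. The following Python sketch assumes dates written as YYYY-MM-DD; the function name is hypothetical and the check is not part of EntryScape.

```python
from datetime import date


def validate_time_period(start, end):
    """Parse ISO dates (YYYY-MM-DD) and check that start is not after end."""
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError(f"start {start} is after end {end}")
    return start_date, end_date


start, end = validate_time_period("2023-01-01", "2023-06-30")
# Number of days covered by the period, counting both start and end.
print((end - start).days + 1)
```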

Language

Add one or more languages in which the dataset is available.

Screenshot of language

Landing page

This property refers to a web page that provides access to the dataset or its distributions and/or additional information. It should point to a landing page of the original data provider and not to a page on a third-party website, such as an aggregator.

Screenshot landing page

Conforms to

This property allows you to refer to an implementing regulation or other specification. For example, you can refer to a specification you have imported or uploaded with the Document function, or one you imported from a data portal.

Screenshot of conformsTo

Named geographical area

Enter the geographical area here. Example with coordinates for Schaffhausen can be found at: geonames.org.

Screenshot geographical area

You can select from geographical areas at a detailed level in the menu.

Screenshot choose geographical area

Access rights for classified datasets

For datasets that contain sensitive personal data or other classified data, you need to set the correct access rights. Under the Access Rights field, specify Restricted to make it clear that the dataset with its distributions is not available as open data. Note that the metadata should be published, but distributions should not be created for these datasets.

Screenshot of Access Rights

Table view

You can also edit descriptions and set metadata via table view on multiple datasets simultaneously.

Screenshot of table view creation dialog

Hover over the field you want to edit.

Screenshot of table view creation dialog

Then click on the field you want to edit and describe it with a value from the drop-down list or free text.

Screenshot of table view creation dialog

Screenshot of table view creation dialog

Click anywhere other than the popup box and see that the row in the list is marked as modified in a field. Then click "Save" to save your changes or "Discard" if you don't want to save one or more changes you made.

Screenshot of table view creation dialog

Add dataset

Create dataset from a template

You can also create a dataset from a template.

Copy dataset

You can copy datasets contained in your catalogs. This is practical for reusing general metadata and descriptions that you or other users would otherwise need to fill in again and again. You can also avoid this by customizing your metadata profile with pre-filled fields, but even then the copy function is available.

The interface allows you to copy a dataset within the same catalog, so you can create one or more datasets without distributions that serve as templates for other datasets. These template datasets can be tailored to specific application profiles, for example open data according to the Open Data Act and the PSI Directive, or open geodata according to the INSPIRE Directive.

Start by creating a dataset that you describe with proper metadata. You can create different specific templates, e.g. one for open data, one for shared data, one for open geodata and so on.

Screenshot of the table view creation dialog

The next step is to press the "Copy" button to copy the proposal or dataset.

Screenshot of the table view creation dialog

Then confirm that you want to copy the dataset without distributions.

Screenshot of table view creation dialog

Then change the title of your copy to a unique title and continue describing it with unique metadata for the new dataset based on your copy. Click save when you are done.

Distributions

Adding distributions

Distributions are the manifestation of a dataset. Usually there is one distribution per format. For example, a dataset has two distributions if it can be downloaded as a CSV file and can also be accessed via an API. There is no limit to the number of distributions per dataset, but it is recommended to have no more than one distribution per technically available format.
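The recommendation of one distribution per format can be checked with a short script if you keep a list of your distributions somewhere. The list below is made up for illustration, and the field names `title` and `format` are assumptions rather than EntryScape identifiers.

```python
from collections import Counter

# Hypothetical list of distributions for one dataset.
distributions = [
    {"title": "Download as CSV", "format": "text/csv"},
    {"title": "API", "format": "application/json"},
    {"title": "CSV copy", "format": "text/csv"},
]


def duplicated_formats(dists):
    """Return the formats that occur in more than one distribution."""
    counts = Counter(d["format"].strip().lower() for d in dists)
    return sorted(fmt for fmt, n in counts.items() if n > 1)


print(duplicated_formats(distributions))
```

A non-empty result points at formats where two distributions could probably be merged into one.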

A distribution is created for a dataset by clicking on the create button in the list row of the dataset.

Screenshot of the distribution creation dialog

A single distribution can consist of several files. This is particularly useful if a dataset contains time series that are regularly updated/expanded. It is sufficient to just add new data to an existing distribution, instead of replacing all the data for the distribution.

Create an API from tabular data

A distribution with tabular data (currently CSV files) can be used to automatically create an API.

There are a few conditions that must be met for the API generation to work:

  • The first row of the table should contain short names for each column as they are used as variable identifiers in the API. The column titles will be trimmed and converted to lower case on import.
  • String values such as column titles or cell contents may only contain Unicode characters.
  • Commas (",") must be used as column separators. Detection is available for CSV files that use semicolons (";") as separators, but it is recommended to use commas.
  • Double quotes (""") must be used as quote characters.
  • Double backslash ("\") must be used as escape character.
  • Line feed ("\n") or carriage return followed by line feed ("\r\n") must be used to indicate a new line.
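The conditions above can be checked locally before uploading a file. The sketch below uses Python's standard `csv` module and is not the importer EntryScape uses. It assumes that the escape character is a single backslash, and the uniqueness and row-length checks are additional sanity checks that the list above does not mention.

```python
import csv
import io


def read_for_api(text):
    """Parse CSV text with the dialect described above and normalize headers."""
    reader = csv.reader(
        io.StringIO(text, newline=""),  # accept both "\n" and "\r\n"
        delimiter=",",
        quotechar='"',
        escapechar="\\",  # assumption: one backslash escapes the next character
        doublequote=False,
    )
    rows = list(reader)
    if not rows:
        raise ValueError("the file is empty")
    # Column titles are trimmed and converted to lower case, as on import.
    header = [name.strip().lower() for name in rows[0]]
    if any(not name for name in header):
        raise ValueError("every column needs a name in the first row")
    # Extra sanity checks (assumptions, not documented requirements).
    if len(set(header)) != len(header):
        raise ValueError("column names must be unique after normalization")
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(
                f"row {number} has {len(row)} cells, expected {len(header)}"
            )
    return header, rows[1:]


sample = ' Name ,Amount\r\n"Smith, A",10\r\n"say \\"hi\\"",20\r\n'
header, rows = read_for_api(sample)
print(header)
print(rows)
```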

An automatically generated API is available via its REST interface and a simple web interface can be accessed with a web browser. The web interface contains links to more detailed Swagger-based API documentation.

Activate API for a distribution

It is beneficial to enable the API for a file distribution if possible. If you have uploaded a file as a distribution, you can enable it for use through the API. To do this, go to the dataset overview and click on the three-dot menu on the right, then click Activate API.

Screenshot of activate API

If the dataset is not published, a warning appears stating that the dataset will be published when you activate API generation. If you are OK with this, click "Yes".

Screenshot of API generation warning

If you have already published the dataset or clicked "Yes" on the warning, API generation will begin.

Screenshot of API generation

Once the API has been generated, click Close.

Screenshot of close after API generation

The URL of the API can be seen in the API information dialog box.

Screenshot after API generation

Describe distribution

License

This field is under distributions and it is mandatory to specify the license for the distribution. Select the license that applies to the distribution even if you also specified it for the dataset. Creative Commons Zero Public Domain 1.0 (CC0 1.0) is often the standard license for open data. The license that the organization has chosen to work with is often explained in the organization's Information Reuse Policy document.

Format

Describe the format of your distribution. Choose from the list of different technical formats that the data recipient will receive. You can choose from the lists of Common mediatypes, Geographic mediatypes or enter Other mediatypes as text. To specify other mediatypes in MIME format, a complete list is available from the IANA organization.

External API or file as distribution

This way of making data available is done through a URL link you provide to an external API or external file area. That system updates its API when there are updates. This approach is suitable when your organization has chosen not to work with uploaded files in EntryScape.

Start from the overview of the dataset. Once the description is correct, you can add a Distribution.

To do so, click on the (+) Plus icon to the right of Distributions.

Screenshot of Add Distribution

Then select "Web address" and paste a link from an API ending in ".json" for example.

Screenshot of Create distribution metadata

You can specify the update frequency from the API, for example daily, weekly, monthly or other.

Then click on Recommended and also Optional.

Fill in the title, description, file format and the license that the organization uses (probably CC0 1.0). This makes it clearer to other publishers and data consumers how they can access the distribution and how they may use it. In this case they may use it however they want, as it is licensed as public domain (open data).

Remember to click Create at the bottom to save.

Visualizations

From EntryScape 3.4 onwards, you can create visualizations as bar charts, line charts, maps and tables. These can be displayed automatically through EntryScape Blocks, for example on data portals that support embedded visualizations.

To add a visualization, go to the dataset you want to edit.

Adding visualizations

Then, when you want to create a visualisation, press the "+" button to the right of Visualisations.

Screenshot of Create visualisation

Next, in the drop-down list, select which distribution from the dataset you want to use.

Screenshot of Choose distribution for visualisation

Select a distribution that fits one of the visualization modes. For CSV files, bar chart, line chart, map and table are suitable depending on the data model and the data itself. For WMS services, map is suitable.

Bar chart

Select bar chart to create a bar chart.

Screenshot of Create visualisation

Then set an appropriate title. For example, for a dataset about purchases, it could be "Bar chart visualizing amount and date of purchase".

Line chart

Depending on the data model, you can create a line chart.

Screenshot of Create visualization

Map

If you want to create a map visualization, you need coordinates in a datafile or a WMS service as a distribution.

Screenshot of Create visualisation

Table

You can create a visualization with a table. It creates a preview.

Screenshot of Create visualisation

Publish

Make dataset publicly available

To check whether the catalog you are working in is published and therefore public, look for the Public button: it should be visible and clickable. If the button is not visible or cannot be clicked, you or the colleague in charge of the group needs to publish the catalog first. In that case, go to the overview of catalogs, click on the red button for the unpublished catalog and activate it.

Preview before publishing

Before publishing, you can preview the dataset. Exit the overview view and select Preview. By default, this opens a new tab to switch to.

Screenshot of Preview before publishing

This is to see that the dataset and descriptions are correct and to get an idea of what it will look like for data consumers on e.g. dataportals.

Screenshot of the Preview button before publishing

Once you have finished viewing the preview, you can close the tab or go back and either continue describing through "Edit" or go to Status.

Publish with the Status button

When you have finished describing your dataset and distributions, you will be in the overview view of your dataset.

Screenshot of Dataset unpublished

There is a button on the right called Status. If the toggle is set to the left and says Unpublished underneath, the dataset is unpublished. If you or a colleague has already published the dataset, the toggle is shifted to the right and says Published, indicating that publishing is active.

Next, you can work on how you and your unit will maintain the dataset.

PSI - Publish as open data

To publish the dataset as open data, you need to make sure that the "PSI" button is enabled.

Screenshot of Dataset as PSI

If the dataset is published and PSI is not active, it means that the dataset is published as shared data for internal use within the organization. For example, the shared dataset can be consumed on an intranet through embedded EntryScape Blocks.

Maintenance

Edit

To edit and update the metadata of a distribution, click on the distribution or the options menu and then on "Edit". You can then edit the required, recommended and optional fields as you did when you created the distribution. You can change descriptions in existing fields and add metadata in fields that you did not previously fill in. For example, press "Recommended" and "Optional" to show additional fields and enhance the metadata.

Screenshot of Update metadata

Statistics

When you are in the overview view of a published dataset, you can see statistics about its usage. In EntryScape, you can get statistics about distributions that are uploaded files or APIs generated from uploaded files. To get statistics about a specific distribution, click on the options menu and then on Statistics.

Screenshot of Statistics option menu

If there are no statistics for the selected time period, you can change the period to see other statistics. More functionality can be found in the statistics function.

Screenshot of change time for statistics

Versions

To view versions of a file distribution, you can click on the options menu of a file distribution and then "Versions".

Screenshot of Version menu

Then you can choose which version you want to see.

Screenshot of Choose version

Once you've chosen the version you want to see more about, you'll see the metadata for that particular version.

Screenshot of Version info

Click close to return to the dataset overview.

Remove

If you want to delete a distribution, click on the options menu and delete.

Screenshot of Remove Menu

If you generated an API from the file distribution, the removal is blocked. You first need to remove the "Auto-generated API" distribution in the same way.

Screenshot of Remove API distribution

Once you have removed the distribution that was the API generated from the file, you can try again. Confirm the removal by clicking Yes.

Screenshot of Remove Distribution

Manage files

Screenshot of Manage files menu

Overview

Here you can see an overview of the different files in the file distribution. This feature is useful if you are adding one or more updated files to the distribution per new time period, e.g. data for a new month on a monthly basis, or if you are replacing the file with additional updated data.

Add file

To add a file, click on "+ Add File".

Screenshot of Manage files add file

Then select the file to upload and describe with metadata.

Screenshot of Add file metadata

Then click save.

Screenshot of Save added file

Replace and update file

When updating your dataset, you may need to replace and update the file you previously uploaded as a distribution. For example, the data in the previous version may not have been correct and the file needs to be updated with corrections. Click on the three-dot menu on the right side of the distribution under the overview view and click Manage Files.

Then click on the More menu for a file and click "Replace file". Follow the same procedure if you want to edit, download or delete the file.

Screenshot of Replace and update file

If you have a linked API or visualization, you will get the warning "There is at least one visualization for this file. If you replace the file, related visualizations will be deleted. Are you sure you want to continue?". You will need to create the visualization again if you replace the file. If you are OK with this click "Yes".

Frequency of update

Based on what you have described in the Frequency of update field, you should update your dataset according to the frequency specified. You may want to set reminders in your calendar or project management tools to remind you to update the dataset according to the specified update frequency. A dataset that is not updated at the specified update frequency may result in data consumers contacting you or your organization to notify you that the dataset is not updated. This can create additional work and it is preferable to update proactively according to the update frequency.

Screenshot of the Update Frequency field
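As an alternative to calendar reminders, the due date for the next update can be calculated from the date of the last update. The sketch below is an illustration only: the frequency labels and the number of days assigned to each are assumptions and are not taken from the EntryScape vocabulary.

```python
from datetime import date, timedelta

# Assumed mapping from a frequency label to a number of days.
FREQUENCY_DAYS = {"daily": 1, "weekly": 7, "monthly": 30, "annual": 365}


def next_update_due(last_updated, frequency):
    """Return the date by which the next update is expected."""
    return last_updated + timedelta(days=FREQUENCY_DAYS[frequency])


def is_overdue(last_updated, frequency, today):
    """Return True if the expected update date has already passed."""
    return today > next_update_due(last_updated, frequency)


last = date(2024, 3, 1)
print(next_update_due(last, "weekly"))
print(is_overdue(last, "weekly", date(2024, 3, 10)))
```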