Datasets and distributions¶
Control datasets¶
It's a good idea to start by avoiding redundant effort. That is why we begin by checking whether anyone has already worked on the data we want to publish or improve.
Search datasets¶
If you cannot find the dataset you upgraded from a suggestion, use the search function to search the datasets available in the catalog.
Update an existing dataset¶
Choose dataset¶
When you have found the dataset you want to describe, click its name.
This takes you to the overview of the selected dataset.
Then click on Edit dataset.
Application profile (standard)¶
Under Select profile, choose the metadata application profile that fits your dataset. Always select the application profile, in the top right corner, before you start describing the dataset.
An organization can publish according to different application profiles: for PSI it can be DCAT-AP, and for the INSPIRE Directive it can be a combination of NMDP and DCAT-AP. The most commonly used profile is usually preset by default. If you are working with geodata, use the GeoDCAT-AP profile.
Then edit the remaining metadata and click Save changes.
Describe datasets¶
In this step you add or edit descriptions (metadata fields).
You can describe a dataset with the DCAT-AP metadata standard by clicking its row in the list or via the Edit menu. As in the metadata dialog for the catalog itself, the dataset properties are organized into mandatory, recommended and optional fields.
The fields you must fill in according to your organization's requirements are marked with an asterisk and are located under the Mandatory tab, which is always visible by default.
You can show the recommended or optional fields, according to your organization's procedures and the selected application profile, by clicking Recommended and Optional. Keep in mind that your organization may require mandatory metadata fields to be completed in several languages. Even where this is not required, consider entering some fields, such as title, description and keywords, both in your own language and in an additional language. This makes the dataset easier for colleagues and other users to find, understand and use.
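Behind the form, the title, description and keyword fields correspond to DCAT-AP properties, and values entered in several languages become language-tagged literals. The Python sketch below renders a hypothetical dataset as Turtle to show what a bilingual description might look like. The dataset URI, the values and the helper names `literal` and `dataset_turtle` are illustrative assumptions, and EntryScape's own export may be laid out differently.

```python
from __future__ import annotations

# Hypothetical illustration: the dataset URI and all values below are made up.
DCAT_PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def literal(text: str, lang: str) -> str:
    """Render one Turtle string literal with a language tag, e.g. "Bathing sites"@en."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def dataset_turtle(uri: str, title: dict[str, str], description: dict[str, str],
                   keywords: dict[str, list[str]]) -> str:
    """Describe one dcat:Dataset with a title, description and keywords per language."""
    titles = ", ".join(literal(t, lang) for lang, t in sorted(title.items()))
    descriptions = ", ".join(literal(d, lang) for lang, d in sorted(description.items()))
    words = ", ".join(literal(k, lang) for lang, ks in sorted(keywords.items()) for k in ks)
    return (
        DCAT_PREFIXES
        + f"\n<{uri}> a dcat:Dataset ;\n"
        + f"    dct:title {titles} ;\n"
        + f"    dct:description {descriptions} ;\n"
        + f"    dcat:keyword {words} .\n"
    )


ttl = dataset_turtle(
    "https://example.org/dataset/bathing-sites",
    title={"sv": "Badplatser", "en": "Bathing sites"},
    description={"sv": "Kommunens badplatser.", "en": "The municipality's bathing sites."},
    keywords={"sv": ["bad"], "en": ["bathing", "beaches"]},
)
```

Each language version is a separate literal on the same property, so a catalog harvesting the metadata can pick the language that matches its users.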
Title - Add or edit¶
The title should be the one you entered when the dataset was a suggestion. You can update it to clarify and improve it. Remember to also add the title in English if it was not included in previous steps.
Description - Add or edit¶
Here you can edit and improve the description you entered in the suggestions view.
Publisher¶
Add your organization as publisher. You can search for a publisher, choose an existing one or add a new one.
Contact point¶
A dataset needs a contact point, usually the data owner, so that the organization stays in control of which department or person updates and maintains the dataset. The contact can be an individual or an organization (a department or unit within the organization) with a non-personal email address. Select a contact from the list, or create one if none exists.
Choose contact point¶
Select a contact from the drop-down list.
If no suitable contact exists, create one, either an individual or an organization, e.g. with a generic email address for the project, department or unit.
Create contact point¶
Click on the magnifying glass to browse all contacts or create a contact.
If you cannot select a contact from the list - click on "+ Create".
Select Individual or Organization and fill in the required fields, and preferably the recommended fields as well. Optional fields are not available here. Then click Save.
The contact point is then filled in on the dataset description.
Keywords, categories and related terminology¶
To make the dataset searchable, and thus discoverable, you need to describe it with keywords, categories and subject terms (e.g. from linked or imported terminologies such as GEMET from the EU). Filling in these fields may require some thought and work, but they are essential for finding the datasets, both in our own catalog and in other catalogs that harvest it (e.g. dataportal.se, govdata.de or opendata.swiss).
Keywords¶
Describe the dataset with your own keywords, one per line without commas. Create a new row with the plus sign (+).
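If your keywords start out as a comma-separated list, for example copied from another system, they need to be split into one keyword per row before you enter them. A minimal Python sketch of that clean-up; `split_keywords` is a hypothetical helper, not part of EntryScape:

```python
def split_keywords(raw: str) -> list:
    """Split comma- or newline-separated text into unique, trimmed keywords, in order."""
    keywords = []
    for part in raw.replace("\n", ",").split(","):
        keyword = part.strip()
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return keywords


for keyword in split_keywords("water quality, bathing\nbeaches, ,water quality"):
    print(keyword)  # one keyword per row, as the Keywords field expects
```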
Category (Data Theme Vocabulary)¶
Choose one or more Data Theme Vocabulary categories where your dataset fits.
Special: Custom fields¶
In some cases, your organization's custom application profile may include custom fields. These are only available in your own EntryScape instance, not in EntryScape Free.
Example: GEMET terminology from EntryScape Terms¶
For example, your organization may have chosen to import controlled subject terms from the GEMET (GEneral Multilingual Environmental Thesaurus) glossary into EntryScape. GEMET is an established European thesaurus with over 40 themes (top level) and contains a total of about 5000 terms. As there are so many terms to choose from, the easiest way to select appropriate subject terms is to use the search function.
Custom field from Inspire application profile¶
Your organization may also have a custom application profile, e.g. for INSPIRE, that adds its own fields.
Example: Categories for Inspire¶
If you have selected the Inspire profile, the Inspire Theme and Subject Category fields are also available for describing the dataset.
Date – Release date – Date modified¶
Specify the release date of the dataset and, if applicable, the date it was last modified, e.g. after maintenance.
Time period¶
The Time period field under the Optional tab lets you specify the time period the dataset covers, e.g. a few months within one year, several years, or a few days within one week. For both the start and the end of the period, enter the date or click the calendar icon.
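In DCAT-AP, a time period like this is typically expressed with dct:temporal and a start and end date. The Python sketch below is an assumption about how such a value could be rendered in Turtle (it presumes the dct:, dcat: and xsd: prefixes are declared elsewhere), and it also checks that the period does not end before it starts. `temporal_turtle` is a hypothetical name.

```python
from datetime import date


def temporal_turtle(start: date, end: date) -> str:
    """Render a hypothetical dct:temporal value; assumes dct:, dcat: and xsd: prefixes exist."""
    if end < start:
        raise ValueError("the time period must not end before it starts")
    return (
        "dct:temporal [ a dct:PeriodOfTime ;\n"
        f'    dcat:startDate "{start.isoformat()}"^^xsd:date ;\n'
        f'    dcat:endDate "{end.isoformat()}"^^xsd:date ] ;'
    )


# Example: a dataset that covers January to March of one year (made-up dates).
snippet = temporal_turtle(date(2023, 1, 1), date(2023, 3, 31))
```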
Language¶
Add one or more languages in which the dataset is available.
Landing page¶
This property refers to a web page that provides access to the dataset or its distributions and/or additional information. It should point to a landing page of the original data provider, not to a page on a third-party website such as an aggregator.
Conforms to¶
This property lets you refer to an implementing regulation or another specification, for example something you have imported or uploaded with the Document function, or a specification you imported from a data portal.
Named geographical area¶
Enter the geographical area here. An example with coordinates for Schaffhausen can be found at geonames.org.
You can select from geographical areas at a detailed level in the menu.
Access rights for classified datasets¶
For datasets that contain sensitive personal data or other classified data, you need to set the correct access rights. In the Access Rights field, specify Restricted to make it clear that the dataset and its distributions are not available as open data. Note that the metadata should still be published, but no distributions should be created for such datasets.
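The rule above can be expressed as a simple check. This Python sketch assumes dict-shaped dataset records with hypothetical keys (`contains_sensitive_data`, `access_rights`, `distributions`) and uses access-right URIs from the EU Publications Office vocabulary, which may differ from the values your application profile actually stores.

```python
# Access-right URIs from the EU Publications Office vocabulary (an assumption about
# what the Access Rights field stores in your application profile).
ACCESS_RIGHT = "http://publications.europa.eu/resource/authority/access-right/"
PUBLIC = ACCESS_RIGHT + "PUBLIC"
RESTRICTED = ACCESS_RIGHT + "RESTRICTED"


def access_rights_problems(dataset: dict) -> list:
    """Check a hypothetical dataset record against the access rights rule."""
    problems = []
    restricted = dataset.get("access_rights") == RESTRICTED
    if dataset.get("contains_sensitive_data") and not restricted:
        problems.append("sensitive dataset must have Access Rights set to Restricted")
    if restricted and dataset.get("distributions"):
        problems.append("restricted dataset should not have distributions")
    return problems
```

Publishing the metadata of a restricted dataset is fine under this check; only the combination of Restricted and existing distributions is flagged.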
Table view¶
You can also edit descriptions and set metadata via table view on multiple datasets simultaneously.
Hover over the field you want to edit.
Then click on the field you want to edit and describe it with a value from the drop-down list or free text.
Click anywhere outside the pop-up box; the row in the list is then marked as modified. Click "Save" to save your changes, or "Discard" if you do not want to keep one or more of them.
Add dataset¶
Create dataset from a template¶
You can also create a dataset from a template.
Copy dataset¶
You can copy datasets in your catalogs. This is practical for reusing general metadata and descriptions that you or other users would otherwise need to fill in again and again. A customized metadata profile with pre-filled fields also reduces this work, but even then the copy function is useful. The copy is created within the same catalog, so you can create one or more datasets without distributions to serve as templates for other datasets. Such templates can be tailored to specific application profiles, for example open data according to the Open Data Act and the PSI Directive, or open geodata according to the INSPIRE Directive.
Start by creating a dataset that you describe with proper metadata. You can create different specific templates, e.g. one for open data, one for shared data, one for open geodata and so on.
The next step is to press the "Copy" button to copy the proposal or dataset.
Then confirm that you want to copy the dataset without distributions.
Then change the title of your copy to a unique title and continue describing it with unique metadata for the new dataset based on your copy. Click save when you are done.
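The copy steps can be pictured as a deep copy of the metadata with the distributions dropped and a new, unique title. The Python sketch below is a hypothetical model of that behaviour on a dict-shaped record, not EntryScape's implementation; the function name and keys are assumptions.

```python
import copy


def copy_without_distributions(dataset: dict, new_title: str, existing_titles: set) -> dict:
    """Hypothetical model of Copy: same metadata, no distributions, new unique title."""
    if new_title in existing_titles:
        raise ValueError(f"the title {new_title!r} is already used in the catalog")
    new_dataset = copy.deepcopy(dataset)  # nested lists and dicts are copied too
    new_dataset["distributions"] = []
    new_dataset["title"] = new_title
    return new_dataset


source = {
    "title": "Open geodata: parks",
    "publisher": "Example municipality",
    "keywords": ["geodata"],
    "distributions": [{"title": "Shapefile", "media_type": "application/zip"}],
}
bathing_sites = copy_without_distributions(source, "Open geodata: bathing sites", {source["title"]})
bathing_sites["keywords"].append("bathing")  # does not affect source["keywords"]
```

The deep copy is the design choice that matters here: with a shallow copy, adding a keyword to the new dataset would silently change the original as well.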
Distributions¶
Adding distributions¶
Distributions are the manifestations of a dataset. Usually there is one distribution per format: for example, a dataset has two distributions if it can be downloaded as a CSV file and can also be accessed via an API. There is no limit on the number of distributions per dataset, but it is recommended to have no more than one distribution per technically available format.
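To make the one-distribution-per-format recommendation concrete, here is a small Python model of a dataset and its distributions that reports media types used by more than one distribution. The classes and field names are hypothetical and only mirror the idea, not EntryScape's data model.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Distribution:
    title: str
    media_type: str  # e.g. "text/csv" for a download, "application/json" for an API


@dataclass
class Dataset:
    title: str
    distributions: list = field(default_factory=list)

    def duplicate_formats(self) -> list:
        """Media types that occur in more than one distribution."""
        counts = Counter(d.media_type for d in self.distributions)
        return sorted(mt for mt, n in counts.items() if n > 1)


ds = Dataset("Bathing sites", [
    Distribution("CSV download", "text/csv"),
    Distribution("JSON API", "application/json"),
])
```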
A distribution is created for a dataset by clicking on the create button in the list row of the dataset.
A single distribution can consist of several files. This is particularly useful if a dataset contains time series that are regularly updated/expanded. It is sufficient to just add new data to an existing distribution, instead of replacing all the data for the distribution.
Create an API from tabular data¶
A distribution with tabular data (currently CSV files) can be used to automatically create an API.
There are a few conditions that must be met for the API generation to work:
- The first row of the table should contain short names for each column as they are used as variable identifiers in the API. The column titles will be trimmed and converted to lower case on import.
- String values such as column titles or cell contents may only contain Unicode characters.
- Commas (,) must be used as column separators. CSV files that use semicolons (;) as separators are detected, but commas are recommended.
- Double quotes (") must be used as quotes.
- Double backslash (\) must be used as the escape character.
- Line feed (\n) or carriage return followed by line feed (\r\n) must be used to indicate a new line.
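Some of the conditions above can be checked before uploading. The following Python sketch, using only the standard csv module, flags an empty first row, semicolon separators, lone carriage returns, and column titles that would be empty or collide once trimmed and lower-cased. `check_csv_for_api` is a hypothetical pre-flight check and does not reproduce EntryScape's own import logic.

```python
import csv


def check_csv_for_api(text: str) -> list:
    """Return problems found by a few pre-upload checks; an empty list means none were found."""
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        return ["the first row must contain column titles"]
    problems = []
    # A carriage return that is not followed by a line feed is not an accepted line break.
    if "\r" in text.replace("\r\n", ""):
        problems.append("use \\n or \\r\\n line endings, not a lone \\r")
    header_line = lines[0]
    delimiter = ","
    if "," not in header_line and ";" in header_line:
        problems.append("semicolon separators detected; commas are recommended")
        delimiter = ";"
    # Parse only the header row, honouring double-quoted titles.
    header = next(csv.reader([header_line], delimiter=delimiter, quotechar='"'))
    # Titles are trimmed and lower-cased on import, so compare them in that form.
    names = [title.strip().lower() for title in header]
    if any(not name for name in names):
        problems.append("every column needs a title in the first row")
    if len(set(names)) != len(names):
        problems.append("column titles collide after trimming and lower-casing")
    return problems
```

For example, a header row of `Name, name ` is reported because both titles become `name` after import.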
The automatically generated API is available via its REST interface, and a simple web interface can be opened in a web browser. The web interface links to more detailed, Swagger-based API documentation.
Activate API for a distribution¶
It is beneficial to enable the API for a file distribution where possible. If you have uploaded a file as a distribution, you can make it available through an API: go to the dataset overview, click the three-dot menu on the right, then click Activate API.
If the dataset is not published, a warning appears saying that the dataset will be published when you activate API generation. If that is acceptable, click "Yes".
If you have already published the dataset or clicked "Yes" on the warning, API generation will begin.
Once the API has been generated, click Close.
The URL of the API can be seen in the API information dialog box.
Describe distribution¶
License¶
This field is under distributions, and specifying the license for each distribution is mandatory. Select the license that applies to the distribution even if you have also specified it for the dataset. Creative Commons Zero Public Domain 1.0 (CC0 1.0) is often the default choice for open data. The license your organization has chosen to work with is often explained in its Information Reuse Policy document.
Format¶
Describe the format of your distribution, i.e. the technical format the data recipient will receive. You can choose from the Common mediatypes or Geographic mediatypes lists, or enter the format as text under Other mediatypes. A complete list of media types (MIME types) is available from IANA.
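When filling in Format and License, DCAT-AP-based catalogs typically store the media type as an IANA media type URI and the license as a URI. The sketch below guesses a media type from a file name with Python's standard mimetypes module and builds those two values. The URI forms, the CC0 1.0 default and the helper names are assumptions to check against your application profile.

```python
import mimetypes
from typing import Optional

# Assumed URI forms; check which values your application profile actually expects.
IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"
CC0_1_0 = "http://creativecommons.org/publicdomain/zero/1.0/"

_TYPES = mimetypes.MimeTypes()  # Python's mapping of file extensions to media types


def suggest_media_type(filename: str) -> Optional[str]:
    """Guess a media type such as "text/csv" from the file extension, or None."""
    media_type, _encoding = _TYPES.guess_type(filename)
    return media_type


def distribution_metadata(filename: str, license_uri: str = CC0_1_0) -> dict:
    """Build hypothetical Format and License values for one distribution."""
    media_type = suggest_media_type(filename)
    if media_type is None:
        raise ValueError(f"unknown format for {filename!r}; enter it under Other mediatypes")
    return {"dcat:mediaType": IANA_MEDIA_TYPES + media_type, "dct:license": license_uri}
```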
External API or file as distribution¶
With this approach, you make data available through a URL to an external API or an external file area. The external system is responsible for updating its API when the data changes. This approach suits organizations that have chosen not to work with uploaded files in EntryScape.
Start from the overview of the dataset. Once the description is correct, you can add a Distribution.
To do so, click on the (+) Plus icon to the right of Distributions.
Then select "Web address" and paste the link to the API, for example a URL ending in ".json".
You can specify how often the API is updated, for example daily, weekly, monthly or other.
Then click Recommended and Optional to show more fields.
Fill in the title, the description, the file format and the license your organization uses (probably CC0 1.0). This makes it clearer to other publishers and data consumers how they can access the distribution and how they may use it; with CC0 1.0 the data is dedicated to the public domain (open data), so they may use it as they wish.
Remember to click Create at the bottom to save.
Visualizations¶
From EntryScape 3.4 onwards, you can create visualizations as bar charts, line charts, maps and tables. These can be displayed automatically through EntryScape Blocks, for example on data portals that support embedding visualizations with EntryScape Blocks.
To add a visualization, go to the dataset you want to edit.
Adding visualizations¶
To create a visualization, click the "+" button to the right of Visualizations.
Next, in the drop-down list, select which distribution from the dataset you want to use.
Select a distribution that fits one of the visualization modes. For CSV files, bar chart, line chart, map and table are suitable depending on the data model and the data itself. For WMS services, map is suitable.
Bar chart¶
Select Bar chart to create a bar chart.
Then set an appropriate title, for example "Bar chart visualizing amount and date of purchase".
Line chart¶
Depending on the data model, you can create a line chart.
Map¶
To create a map visualization, you need a distribution that is either a data file with coordinates or a WMS service.
Table¶
You can also create a table visualization, which gives a preview of the data.
Publish¶
Make dataset publicly available¶
To check whether the catalog you are working in is published, and therefore public, look at the Public button: it should be visible and clickable. If it is not visible or cannot be clicked, you or the colleague in charge of the group need to publish the catalog. To do so, go to the overview of catalogs, click the red button for the unpublished catalog and activate it.
Preview before publishing¶
Before publishing, you can preview the dataset. In the overview, select Preview. By default, the preview opens in a new tab.
Use it to check that the dataset and its descriptions are correct and to get an idea of how they will look to data consumers, e.g. on data portals.
Once you have finished viewing the preview, you can close the tab or go back and either continue describing through "Edit" or go to Status.
Publish with the Status button¶
When you have finished describing your dataset and distributions, you will be in the overview view of your dataset.
On the right there is a toggle called Status. When it is switched to the left with Unpublished underneath, the dataset is unpublished. If you or a colleague has already published the dataset, the toggle is switched to the right with Published underneath, indicating that publishing is active.
Next, you can work on how you and your unit will maintain the dataset.
PSI - Publish as open data¶
To publish the dataset as open data, you need to make sure that the "PSI" button is enabled.
If the dataset is published but PSI is not active, the dataset is published as shared data for internal use within the organization. For example, the shared dataset can be embedded on an intranet through EntryScape Blocks.
Maintenance¶
Edit¶
To edit and update the metadata of a distribution, click the distribution, or open its options menu and click "Edit". You can then edit the required, recommended and optional fields as you did when you created the distribution: change descriptions in existing fields and add metadata to fields you did not fill in before. Click "Recommended" and "Optional" to show more fields and enhance the metadata.
Statistics¶
In the overview of a published dataset, you can see statistics about its usage. EntryScape provides statistics for distributions that are uploaded files or APIs generated from uploaded files. To get statistics for a specific distribution, click its options menu and then Statistics.
If there are no statistics for the selected time period, select another period. More statistics features are available in the Statistics function.
Versions¶
To view versions of a file distribution, you can click on the options menu of a file distribution and then "Versions".
Then you can choose which version you want to see.
Once you've chosen the version you want to see more about, you'll see the metadata for that particular version.
Click close to return to the dataset overview.
Remove¶
To delete a distribution, click its options menu and then Delete.
If you generated an API from the file distribution, the deletion is blocked; you first need to delete the distribution "Auto-generated API" in the same way.
Once you have deleted the auto-generated API distribution, try again and confirm the removal by clicking Yes.
Manage files¶
Overview¶
Here you can see an overview of the files in the file distribution. This is useful if you add one or more files to the distribution for each new time period, e.g. a new file with a month's data every month, or if you replace the file with one containing additional, updated data.
Add file¶
To add a file, click on "+ Add File".
Then select the file to upload and describe it with metadata.
Then click Save.
Replace and update file¶
When updating your dataset, you may need to replace and update the file you previously uploaded as a distribution. For example, the data in the previous version may not have been correct and the file needs to be updated with corrections. Click on the three-dot menu on the right side of the distribution under the overview view and click Manage Files.
Then click on the More menu for a file and click "Replace file". Follow the same procedure if you want to edit, download or delete the file.
If you have a linked API or visualization, you will get the warning "There is at least one visualization for this file. If you replace the file, related visualizations will be deleted. Are you sure you want to continue?". You will need to create the visualization again if you replace the file. If you are OK with this click "Yes".
Frequency of update¶
Based on what you have described in the Frequency of update field, you should update your dataset according to the frequency specified. You may want to set reminders in your calendar or project management tools to remind you to update the dataset according to the specified update frequency. A dataset that is not updated at the specified update frequency may result in data consumers contacting you or your organization to notify you that the dataset is not updated. This can create additional work and it is preferable to update proactively according to the update frequency.
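Calendar reminders can be derived from the Frequency of update value. The Python sketch below computes when the next update is due and whether a dataset is overdue. It assumes frequencies expressed as codes from the EU frequency vocabulary (DAILY, WEEKLY, MONTHLY, QUARTERLY, ANNUAL); frequencies without a fixed interval, such as irregular updates, are rejected. The function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Move d forward by whole months, clamping to the last day of a shorter month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_update_due(last_update: date, frequency: str) -> date:
    """frequency is assumed to be an EU frequency code such as "MONTHLY"."""
    if frequency == "DAILY":
        return last_update + timedelta(days=1)
    if frequency == "WEEKLY":
        return last_update + timedelta(weeks=1)
    months = {"MONTHLY": 1, "QUARTERLY": 3, "ANNUAL": 12}.get(frequency)
    if months is None:
        raise ValueError(f"no fixed interval for frequency {frequency!r}")
    return add_months(last_update, months)


def is_overdue(last_update: date, frequency: str, today: date) -> bool:
    """True once today is past the date the next update was due."""
    return today > next_update_due(last_update, frequency)


print(next_update_due(date(2024, 1, 31), "MONTHLY"))  # 2024-02-29
```

Clamping to the last day of the month keeps a monthly schedule that starts on the 31st from skipping February.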