Uploading Data and related Metadata in the Upload interface

Upload Interface – Presentation and Structure

Log in to the Interface

Once an agreement has been reached on the overview table (at the latest), the Project manager is provided with a single login and password which gives access to the Project account on the Database Upload interface. Once logged in, the manager reaches the Project Dashboard.

Reviewers guideline

To create a new project, go to the “Home” page of the ESPON 2020 Data and Metadata Upload System and click on the “+ Add” button to add a project.

To create a new login and password, open the project to which you want to add a manager and click on the “+” next to the manager. From there, you may choose an already registered manager or create a new one. When a new manager is created, he/she receives an e-mail with connection details. Please advise the manager that he/she has to “reset” the password (as if it had been lost) to create a new password.

Project Dashboard

The Dashboard first displays basic information about the project and its data delivery (project leader, manager, URL on the ESPON website, status of the delivery). The Abstract provides an overview of the content of the data delivery (in 5-6 sentences).

The Project Dashboard then displays two tabs, one for each delivery channel (see screenshot). A third tab, visible to reviewers and approvers only, keeps track of the overall validation process.

  • The section Main Data allows the manager to add, visualise, update and delete datasets.
  • The section Other Data allows the manager to add, visualise, update and delete ‘Other Data’ elements.

The manager should complete the project’s basic information using the ‘Edit’ button next to the name of the Project.

Reviewers guideline

The project Dashboard has one more tab in the reviewer’s view: the Validation process tab, which allows the reviewer to visualise and keep track of the delivery as a whole.

Approvers guideline

The project Dashboard has one more tab in the approver’s view: the Validation process tab, which allows the approver to visualise and keep track of the delivery as a whole.

Once the overview table has been approved by all stakeholders, the approver only intervenes at the very end of the process to make sure that the proposed delivery is coherent with the content of the overview table.

Uploading ‘Main Data’

The delivery of ‘Main Data’ is structured by datasets. Each dataset consists of one or several indicators. Each dataset is validated independently.


Dataset workflow

Introduction to the workflow

The following workflow describes the different steps required to validate a dataset. The progress through this workflow is monitored under the “Validation process” tab of each dataset.

Figure 6 Dataset workflow

Add a dataset

The process starts by adding a dataset. This is done using the button ‘Add dataset’ in the Main Data section (see arrow 1).

Dataset characteristics may be added on the Add/Edit Dataset page. A dataset is characterised by several fields: a name, an abstract, a spatial extent and a topic category (see arrow 2).

Once a dataset is created, it appears in the Main Data section together with a visual which reports the status of the dataset towards validation (see arrow 3).


By clicking on the dataset name, dataset information appears on top of a five-tab structure (see screenshot):

  • Indicators. This section provides a list of all indicators under the dataset.
  • Contacts. This section provides information on the team that elaborated the dataset. Contact details are provided (for one or several team members) in case a data user has questions in relation to an indicator that belongs to the dataset.
  • Sources & Preliminary processing. This section lists all sources and preliminary processing which are flagged in the data file.
  • Data. This section is dedicated to the upload of the data file.
  • Validation process. This section allows the manager to monitor the dataset delivery process. It works as a dashboard from which submission, validation and checking reports are provided.

Add Indicators

By clicking on the name of a dataset, one may visualise underlying indicators and add new ones. To add a new indicator, click on one of the two ‘Add indicator’ buttons (see screenshot).

General procedure (Standard, Single)

The Add standard indicator button creates a new standard indicator, i.e. an indicator that is based on a standard nomenclature.

An indicator is characterised by a number of fields: a Code, a Name, an Abstract, Territorial Information, a Genealogy Status (key / background), a Structural Type (Single, Multi, Dimension and Class), a Data Type (Integer, Float, Boolean, Enumerated), a Numerator / Denominator Name, a Numerator / Denominator Scale, a Nature Type, Main Theme(s), and Access and Use Constraints. Additional fields appear upon the selection of some of the options related to the aforementioned fields.

In addition, several syntactic constraints are defined between the fields which describe indicators, e.g. a non-standard indicator cannot be set as a key indicator. Most fields are compulsory.

Non-standard indicators

The Add non-standard indicator button allows the manager to create a non-standard indicator, i.e. an indicator that does not refer to a standard nomenclature (be it statistical or geospatial data).

Non-standard indicators may only be added as ‘background’ indicators in a dataset. The specificity of non-standard indicators is that the data file has to be uploaded in a specific section under the “Data” tab of the dataset, where all non-standard indicators are listed. Most information regarding the content of the data file should be provided in the indicator Abstract.

Multi indicators

‘Dimensions’ and ‘Classes’ of a ‘Multi’ indicator may be created from the indicator page, under the ‘Dimension’ tab. The ‘Dimension’ tab also provides a view of the internal structure of the ‘Multi’ indicator.

Declare Genealogy

Once a few indicators have been inserted, genealogy relations between indicators can be added. Each operation involving one or several parent indicators and resulting in one child indicator has to be declared. To declare genealogy relations, click on the ‘Genealogy’ tab (see screenshot).

Declaring a genealogy implies:

  • selecting the indicator whose genealogy needs to be described. A genealogy panel pops up (see screenshot);
  • choosing one or several ‘parents’ among (1) already declared indicators, (2) ESPON base indicators, (3) other ESPON Project indicators;
  • describing the ‘Methodology’ which was used to process ‘parent’ indicators into the ‘child’ indicator.

Fill in contact details

For each dataset, contact details are to be provided under the ‘Contacts’ tab (see arrow 1 on screenshot). A variety of contacts can be added. The following two contact types are crucial:

  • A Responsible Party is the person responsible for the Project, who should be contacted for further questions on the methodology and on the use of the data content.
  • A Metadata Contact is the person who brought together metadata and data and delivered the data.

Contacts may be added (see the “+ Add” button) and edited (see arrow 2 on screenshot). At least one contact is required for each dataset before it can be submitted to the next step.

Add Sources and Preliminary Processing

Sources and preliminary processing are key features that allow for data traceability. For each dataset, these are managed under a specific tab (‘Sources and preliminary processing’ – see arrow 1 on screenshot).

Sources

Each data cell (in the data file) needs to refer to at least one source. A source may be added through the button “+ Add” (see arrow 2 on screenshot).

To declare a source, four types of information are needed (see arrow 3 on screenshot):

  • a Flag number (1,2,3,4… preferably starting from 1);
  • a Description (name of the institution, person or report considered to be the source of the original information);
  • a Date which corresponds to the date when the information was issued (for a report – the publication date) or the date when the information was retrieved from its original location (for a statistical dataset from a national institution – the download date);
  • a URL (if available) to locate the information on the Internet.

Each source declared in metadata should be used in the data file. A source flag used in the data file should be declared under the dedicated tab (see the section ‘Structure data file’).


Preliminary Processing

Preliminary processing flags are used to highlight pre-processing applied to a group of entities before they were compiled into an indicator (aggregation, disaggregation, estimation, etc.).

To declare a preliminary processing, two types of information are needed:

  1. a Flag letter (a, b, c… preferably starting from ‘a’);
  2. a Description of the operation which was performed.

Each preliminary processing declared under the Source and Preliminary Processing tab should be used in the data file. A preliminary processing flag used in the data file should be declared under the Source and Preliminary Processing tab (see the section ‘Structure data file’).
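
The two symmetry rules above (every declared flag must be used, every used flag must be declared) can be illustrated with a minimal sketch in Python. The file name, column names and CSV layout are hypothetical, since the actual upload template is generated by the system:

    # Minimal sketch: cross-check source and preliminary-processing flags
    # used in a data file against the flags declared in metadata.
    # File name, column names and CSV layout are hypothetical.
    import csv

    declared_sources = {"1", "2", "3"}      # Flag numbers declared in metadata
    declared_processing = {"a", "b"}        # Flag letters declared in metadata

    used_sources, used_processing = set(), set()
    with open("data_file.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("source_flag"):
                used_sources.add(row["source_flag"])
            if row.get("processing_flag"):
                used_processing.add(row["processing_flag"])

    # Both directions must hold: used-but-undeclared flags are errors,
    # declared-but-unused flags should be removed from the metadata.
    print("Sources used but not declared:", used_sources - declared_sources)
    print("Sources declared but not used:", declared_sources - used_sources)
    print("Processing used but not declared:", used_processing - declared_processing)
    print("Processing declared but not used:", declared_processing - used_processing)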

Submit metadata

Once the manager considers that, for a given dataset:

  • all indicators are created and well described;
  • all genealogies are declared;
  • all sources and preliminary processing are listed;
  • contact details are complete;


the manager may submit the dataset for Semantic and Genealogy check (SG check). This is done under the ‘Validation process’ tab of the dataset to be submitted (see screenshot).

Semantic and genealogy check (SG check)

The SG check consists of a visual verification of the metadata content by the data reviewer. The following points are checked:

  • Coherence of the dataset with the expected indicators (as laid out in the overview table);
  • Language proofreading of all textboxes (dataset abstract, indicator abstracts, genealogy methodology, etc.);
  • Genealogy: every background indicator contributes directly or indirectly to the genealogy of at least one key indicator.

The SG check results either in:

  • a validation of the metadata (SG checked). Once the metadata of the dataset are validated by the reviewer, they are locked against any further modification. The data manager may then upload data.
  • a request for further elaboration, in the form of a report highlighting changes to be implemented and other points to be checked. Exchanges between the data manager and the data reviewer take place through the ‘Submission information’ dialogue box of the ‘Validation process’ tab.


Reviewers guideline

The Semantic and Genealogy check is a key step for ensuring the quality of the delivery. A list of points to be checked is provided in the Reviewer toolbox under \Metadata and data checks\01_Main_Data_SG_check.xlsx. This file can be filled in and shared with the data manager to highlight points to be corrected.

Upload a data file (standard indicators)

Once metadata are validated, the manager is requested to upload a data file encompassing all standard data as declared in metadata. This file is to be handed in under the ‘Data’ tab of a dataset (see screenshot).

A template in which data should be uploaded is provided, based on the submitted metadata (including all unit codes, object types and indicator/year headings on the first row of each column). The project manager must download it, transfer the data into it, and then upload it back to the interface.

The data manager is asked not to modify the structure of the data upload file (modifications such as adding a column, changing indicator codes or years, or deleting entities without data). Doing so may jeopardise the upload process because of the resulting non-conformity between the data file and the metadata. Any structural error in the file is rooted in an error in the metadata; hence any change has to be applied to the metadata first. If the data manager spots a mistake at this point, he/she shall ask the data reviewer to send the dataset back to the open state for modification.

Once the file is uploaded, the manager submits it so that an automatic Spatial and Data check can be performed by the system.


Figure 7 Data file formal constraints



Spatial and Data check

The Spatial and Data check (SD check) is performed automatically by the system upon upload of the standard data file.

The Spatial and Data check results in a report which appears under the dataset’s ‘Validation process’ tab. This report includes:

  • ‘Errors’ (syntactic inconsistencies between metadata and data). These must be corrected before the upload procedure can continue;
  • ‘Warnings’ (information which is of interest to the manager and may induce him/her to revise data and/or metadata);
  • ‘Information’ (information which could be of interest to the manager and which provides an overview of the content of the dataset).

‘ERRORS’ cover the detection of logical input errors resulting from inconsistencies between metadata and data. The procedure checks for the following (a minimal sketch of the type checks appears after this list):

  • Consistency of Spatial Objects (or unit codes) used in ‘data’ with the nomenclatures declared in ‘metadata’
  • Consistency of Spatial Objects (or unit codes) used in ‘data’ with the spatial extent declared in ‘metadata’
  • Consistency of the data type in ‘data’ with the Data Type property declared in ‘metadata’
  • Consistency of value labels used as ‘data’ with the value labels declared in ‘metadata’ (for Data Type ‘enumerated’)
  • Consistency of the ‘sources’ used (flags) in ‘data’ with the sources declared in metadata
  • Consistency of the ‘preliminary processing’ used in ‘data’ with the ‘preliminary processing’ declared in metadata
  • Existence of a source flag for every data cell containing data.
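
As an illustration of the type-related checks in this list, here is a minimal sketch of how a cell value could be tested against a declared Data Type. It is not the system’s actual implementation; the function and sample values are illustrative only:

    # Minimal sketch of the data-type consistency checks listed above.
    # Not the system's actual implementation; values are illustrative.
    def check_value(value, data_type, labels=None):
        """Return True if a cell value is consistent with the declared Data Type."""
        if data_type == "Integer":
            return value.lstrip("-").isdigit()
        if data_type == "Float":
            try:
                float(value)
                return True
            except ValueError:
                return False
        if data_type == "Boolean":
            # Accepted spellings are an assumption for this sketch.
            return value.lower() in {"true", "false", "0", "1"}
        if data_type == "Enumerated":
            # Value labels must match those declared in metadata.
            return labels is not None and value in labels
        return False

    assert check_value("42", "Integer")
    assert not check_value("3.14", "Integer")
    assert check_value("urban", "Enumerated", labels={"urban", "rural"})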

‘WARNINGS’ include automatic completion tests and outlier detection. The report provides:

  • Completion rate for each indicator/year
  • A list of potential statistical outliers (for stock and ratio indicators). The system highlights entities lying considerably outside the quartiles (below Q1 or above Q3 by more than 1.5 times the interquartile range) – see the sketch after this list;
  • A list of potential trend outliers for stock and ratio indicators. The system spots suspicious movements in time series.
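
The statistical outlier rule quoted above is the classic interquartile-range (Tukey) fence. A minimal sketch, assuming the same 1.5 × IQR threshold (the system’s actual implementation may differ):

    # Minimal sketch of the 1.5 * IQR outlier rule described above.
    # The system's actual implementation may differ.
    import statistics

    def iqr_outliers(values):
        """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
        q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        return [v for v in values if v < lower or v > upper]

    print(iqr_outliers([2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 42.0]))  # -> [42.0]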

‘INFORMATION’ includes:

  • Basic descriptive statistics on each indicator/year
  • Typology issues [for typology indicators]. The system provides the proportion of entities under each category for each indicator/year. It may draw attention to over- or under-represented categories of the typology (i.e. categories with a doubtful number of associated spatial entities); a minimal sketch follows this list.
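
In practice, this information item amounts to a frequency count per category. A minimal sketch, with illustrative values only:

    # Minimal sketch of the typology information item: share of spatial
    # entities per category for one indicator/year. Values are illustrative.
    from collections import Counter

    values = ["urban", "urban", "rural", "rural", "rural", "intermediate"]
    counts = Counter(values)
    total = sum(counts.values())
    for category, n in counts.most_common():
        print(f"{category}: {n}/{total} ({100 * n / total:.0f}%)")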

Based on this report the data manager has three options:

  1. If there is no ‘error’, submit data for quality control to the reviewer;
  2. Discard data and upload a revised version of the file. This option implies that the metadata are deemed correct but the data file needs to be revised;
  3. Discard data and revise metadata. This option implies that metadata will be modified by the manager and checked again for semantic and genealogy consistency.


Reviewers guideline

Reviewers should be ready to answer questions on a variety of issues raised by the automatic checking. For more detailed information, see the list of automatic checks performed by the system provided in the Reviewer toolbox under \Metadata and data checks\02_Raw list of automatic SD_checks.xlsx.

Upload data file (non-standard indicators)

In parallel with the upload of the main data file, the project manager shall upload the data files associated with non-standard indicators. This has to be done in the “non-standard data” section of the ‘Data’ tab (see arrow 1 on screenshot). This section provides a list of all non-standard indicators declared so far. For each indicator in the list, a data file should be added (see arrow 2 on screenshot). This file can be deleted and re-uploaded as long as the dataset is at step 2 or 3 of the validation process. Note that only one file can be uploaded for each non-standard indicator. If an indicator requires multiple files, they should be bundled in a single .zip file.

Non-standard data files are checked on the basis of a multi-criteria evaluation performed by the reviewer. The following points are checked:

  • Consistency of the file(s) with the metadata
  • Integrity of the file (can it be opened with appropriate software?)
  • Basic assessment of the content (min-max, existence of data for all fields provided)


Quality check (QC check)

Once the data manager submits data for quality control, the database reviewer performs a final quality check on the dataset delivery. It consists of:

  • For standard indicators: (1) an assessment of the remaining ‘warnings’ and of their potential consequences on the quality of the delivery; (2) a visual check of the data delivery (through the button ‘Display uploaded data’).
  • For non-standard indicators: the multi-criteria evaluation (see the previous section).

The DB reviewer then has three options:

  • Validate the dataset.
  • Send back data and request that the manager upload a revised version of the file, taking into account one or several warnings or comments. This option implies that metadata are deemed correct, but some data files need to be revised.
  • Discard data and open metadata for revision. This option implies that the metadata need revision based on the information provided through the data file. Any revision of the metadata implies that they need to be checked again for semantic and genealogy consistency.

Exchanges between the reviewer and the manager on further improvement of the delivery shall be done through the ‘Submission information’ dialogue box of the ‘Validation process’ tab.

Validated datasets are no longer editable by the project manager.

Reviewers guideline

The quality check is essential for ensuring the quality of the delivery.

Comments related to the quality of the data delivery should be sent through the ‘Submission information’ dialogue box of the ‘Validation process’ tab. In case of light modifications, the reviewer may use the “comment” box and provide a list of improvements needed. In case of complex modification requests (e.g. modifications requested on several indicators), comments may be sent via a dedicated file attached to the decision. A template for providing comments is provided in the Reviewer toolbox under \Metadata and data checks\03_Main_Data_QC_check.xlsx. This file can be filled in and shared with the data manager to highlight points to be corrected.

‘Other Data’ upload

‘Other Data’ upload is based on ‘elements’. An element is a thematically coherent set of data files that may include data of various kinds (statistical, geospatial, other). 

Each data file contains one or more ‘indicators’. These files may be of different formats (xls, tiff, shp, etc.).

For statistical data, a template is provided in order to ensure a common visual identity for ESPON Data. This template (‘Other Data’ template) can be downloaded under the ‘Other Data’ tab (see arrow 1 on screenshot).

Fill in the ‘Other Data’ template (compulsory for statistical data)

The Other Data template is composed of three spreadsheets:

  • ‘Metadata’, which contains the metadata of all indicators. One column corresponds to one indicator.
  • ‘Data’, which contains the values of all indicators. One column corresponds to one ‘indicator/year’.
  • ‘Distributor’, which contains ESPON EGTC contact information. This spreadsheet should not be edited.

The Metadata table provides information for each indicator and includes the following fields: Code, Name, Abstract, Years available, Methodology description, Metadata date, Use constraints, Point of contact, Project, How to source this indicator. The bottom part of the indicator description is dedicated to ‘Source’, and one or several sources may be added to each indicator. The Source box (including Provider Name, Reference, Copyright, Publication title, URL) can be copied and pasted as many times as needed to include all Sources.


Table 4 Example of metadata information for 'Other Data' indicators


Create and submit an element

‘Other Data’ files are uploaded from the ‘Other Data’ tab, using the following procedure:

  • Create an element (click on ‘Add an element’);
  • Fill in the metadata related to this element: a name and an abstract (500 characters max.), and add the files related to that element. Save it.
  • Submit the element for review: click on the element to get to the Other Data submission page (see screenshot), add a comment to the reviewer (if needed), then click on the submit button.


Quality check of ‘Other Data’

‘Other Data files’ are subject to a quality check which, in the case of statistical data, includes the following verifications:

  • Integrity of the files (integrity check)
  • Proofreading of language (semantic check)
  • Consistency of the metadata and data: indicators declared vs. indicators delivered (syntactic check; see the sketch after this list)
  • Credibility of min and max value (data check)
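
For the syntactic check, a minimal sketch of the declared-vs-delivered comparison is given below. It assumes, for illustration only, that indicator codes sit on the first row of the ‘Metadata’ and ‘Data’ sheets (after a first label column) and that ‘Data’ headers take a hypothetical ‘CODE/YEAR’ form; the openpyxl package is used to read the workbook:

    # Minimal sketch of the syntactic check: indicators declared in the
    # 'Metadata' sheet vs. indicator columns delivered in the 'Data' sheet.
    # The header layout ('CODE/YEAR' headers, codes on the first row,
    # first column reserved for labels) is an assumption for illustration.
    from openpyxl import load_workbook

    wb = load_workbook("other_data_template.xlsx", read_only=True)

    declared = {
        str(c.value)
        for c in next(wb["Metadata"].iter_rows(max_row=1))[1:]
        if c.value is not None
    }
    delivered = {
        str(c.value).split("/")[0]   # strip the hypothetical "/YEAR" suffix
        for c in next(wb["Data"].iter_rows(max_row=1))[1:]
        if c.value is not None
    }

    print("Delivered but not declared:", delivered - declared)
    print("Declared but not delivered:", declared - delivered)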

A report is issued by the reviewer and made available to the manager with the data file.

The DB Reviewer then has two options:

  • Validate the file (and its related metadata)
  • Ask for revisions to the file or to the abstract.

Validated files are no longer editable by the manager.

Reviewers guideline

Comments related to the quality of the data delivery should be sent through the ‘Submission information’ dialogue box of each element. In case of light modifications, the reviewer may use the “comment” box and provide a list of improvements needed. In case of complex modification requests (e.g. modifications requested on several indicators), comments may be sent via a dedicated file attached to the decision. A template for providing comments is available in the reviewer package under \03 Metadata and data checks\04_Other_Data_checklist.xlsx.

Validation of the Project’s delivery by ESPON EGTC

Once all ‘Main Data’ datasets and all ‘Other Data’ elements are delivered and approved, the reviewer may submit the whole delivery to the approver. The approver checks the integrity of the delivery (list of datasets, indicators and files), compares it to the overview table and decides whether or not it satisfies expectations. The approver has three options:

  • Validate the delivery. This triggers the integration of data and metadata in the ESPON 2020 Database Portal and closes the delivery procedure.
  • Ask for complementary information as regards the delivery.
  • Request changes in Main Data or Other Data. This implies re-opening the ‘Main Data’ and/or the ‘Other Data’ procedure.


Reviewers guideline

The validation of a project’s delivery is done under the ‘Validation process’ tab (only visible to the data reviewer and the approver at ESPON EGTC). This overall validation process aims at including ESPON EGTC project experts in the final approval of the delivery.

When a reviewer considers that the delivery is complete, he/she:

  • Checks the coherence of the delivery with the expectations expressed in the overview table.
  • Validates the delivery under the ‘Validation process’ tab (which brings the process to state 2).

A notification is automatically sent to the approver who is then invited to check and validate the delivery.

  • If the approver approves the delivery, data are ready to be inserted in the ESPON Database.
  • If not, the reviewer has to take the approver’s comments into account and re-open defective datasets / elements or request missing datasets / elements from the data manager, until approval.

Approvers guideline

The validation of a project’s delivery is done under the ‘Validation process’ tab (only visible to the data reviewer and the approver at ESPON EGTC). This overall validation process aims at including ESPON EGTC project experts in the final approval of the delivery.

When a reviewer considers that the delivery is complete, he/she:

  • Checks the coherence of the delivery with the expectations expressed in the overview table.
  • Validates the delivery under the ‘Validation process’ tab (which brings the process to state 2).

A notification is automatically sent to the approver who is then invited to check and validate the delivery.

  • If the approver approves the delivery, data are ready to be inserted in the ESPON Database.
  • If not, the reviewer has to take the approver’s comments into account and re-open defective datasets / elements or request missing datasets / elements from the data manager, until approval.