The following good practices can help the Project Data Manager to:
(1) Work according to the DB concepts. This means that people involved in data collection and processing in an ESPON Project should get acquainted with major concepts, and keep in mind that the project will have to:
- select and deliver key indicators and their background indicators. This means that one should keep track of the main calculation steps in order to reflect them in the metadata;
- deliver all other indicators so that they can be included in the Project Archive. This means that one should store all data adequately so that they can be retrieved in the final stage of the project.
(2) Use metadata templates. ESPON Projects are invited to use the two metadata and data templates provided by the ESPON DB team (on the ESPON Database Portal):
- the Main Data template can help to store metadata for indicators which may be selected as key or background indicators. Most information needed to declare a dataset under Main Data may be stored there. It is recommended to create as many indicators (blue frames) as needed, standard or non-standard, and add any useful information. This template is used as an intermediary step toward providing Main Data indicators in the Upload interface. As a consequence, Projects can add any remark or highlight any fact that is deemed useful at a later stage;
- the Other Data template can be used to store the Other Data indicators during the Project. This template is the one that will actually be used to deliver Other Data. As a consequence, Projects are asked not to restructure the template. The metadata spreadsheet allows multiple indicators to be stored (one per column) and as many sources as needed to be added (at the bottom of the column).
Some basic rules can help to decide which indicators your Project needs to highlight and select as key indicators.
Key indicators must be innovative. In order to avoid overlaps in the database, a key indicator must be different from what is already part of the database: a base indicator or a key indicator already declared by another Project cannot be one of a Project's key indicators (even if this base indicator is central to your analysis). Hence, in most cases, key indicators should be the results of the Project's own data collection and/or calculation.
Key indicators must be well enriched with exhaustive metadata, including the documentation of their calculation process. This is a necessary precondition to declare the indicator and build the dataset around it. Sources, methodology as well as the original data used to calculate the values are part of what will be requested during the delivery process.
Key indicators must be as complete as possible from a territorial point of view. Whatever the spatial nomenclature used, the level of completeness shall be good (more than 90% of the entities is a good target), so that the key indicator can be used for interterritorial comparison. If some data are not provided by main data providers (Eurostat, World Bank, OECD), Projects are invited to look at national data providers (national statistical agencies) and other reliable sources to improve the coverage.
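The completeness target above can be checked before delivery. The following is a minimal sketch, assuming that values are held in a dictionary keyed by spatial entity code and that missing values are represented by None; the function name, the codes and the figures are hypothetical and are not part of the ESPON templates.

```python
def completeness(values, expected_codes):
    """Share of the expected spatial entities that have a non-missing value."""
    expected = set(expected_codes)
    if not expected:
        raise ValueError("expected_codes must not be empty")
    filled = {code for code, value in values.items()
              if code in expected and value is not None}
    return len(filled) / len(expected)


# Hypothetical example: four expected entities, one of them without a value.
expected_codes = ["R1", "R2", "R3", "R4"]
values = {"R1": 12.5, "R2": 8.1, "R3": None, "R4": 10.0}

share = completeness(values, expected_codes)
meets_target = share > 0.90  # the "more than 90%" target mentioned above
```

In this example three entities out of four are filled, so the indicator falls short of the target and additional sources would be needed to improve the coverage.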
The indicator name is an essential field that is used in the ESPON 2020 Database to identify indicators. Indicators are defined independently from years, spatial extents and nomenclatures. In addition, indicator names should be concise and self-explanatory. The following rules of thumb can help to draft them. Indicator names shall:
- NOT include any reference to the years delivered, to the spatial extent or to nomenclatures.
- NOT include any abbreviations or acronyms.
- Start with the key concept of the indicator (e.g. Employment (total) in construction rather than Total number of jobs in construction).
- Include any specificity that differentiates the indicator from another indicator in the Database (e.g. Population (total) on 1st January, which is different from Population (total) legal population).
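Some of the rules above can be partly checked automatically. The following is a rough sketch, not an official validator: the regular expressions are assumptions about what a year, a nomenclature reference and an acronym look like, and the function name is hypothetical. References to the spatial extent and the ordering of concepts are not covered and still need human review.

```python
import re

# Assumed patterns (heuristics only).
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")           # e.g. 2015
NOMENCLATURE = re.compile(r"\b(?:NUTS|LAU)\s*\d*\b")  # e.g. NUTS3, LAU 2
ACRONYM = re.compile(r"\b[A-Z]{2,}\d*\b")          # e.g. GDP


def check_indicator_name(name):
    """Return a list of likely problems found in an indicator name."""
    problems = []
    if YEAR.search(name):
        problems.append("contains a year")
    if NOMENCLATURE.search(name):
        problems.append("refers to a nomenclature")
    # Remove nomenclature references so they are not reported twice.
    remainder = NOMENCLATURE.sub("", name)
    if ACRONYM.search(remainder):
        problems.append("contains an abbreviation or acronym")
    return problems


ok = check_indicator_name("Employment (total) in construction")
bad = check_indicator_name("GDP per capita 2015")
```

An empty list only means that none of these heuristics fired; it does not guarantee that the name complies with every rule.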
Concerning the multis, dimensions and classes:
- The name of a dimension is formed using the name of the multi: <name of the multi> by <dimension>, e.g. Population (total) on 1st January by age group
- The name of a class is formed using the name of the dimension: <name of the dimension> - <class>, e.g. Population (total) on 1st January by age group - age group 15+
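The two naming patterns above can be applied mechanically. The following short sketch assumes that names are plain strings joined with " by " and " - " as in the patterns; the function names are hypothetical.

```python
def dimension_name(multi, dimension):
    """<name of the multi> by <dimension>"""
    return f"{multi} by {dimension}"


def class_name(dimension_full_name, class_label):
    """<name of the dimension> - <class>"""
    return f"{dimension_full_name} - {class_label}"


multi = "Population (total) on 1st January"
dim = dimension_name(multi, "age group")
cls = class_name(dim, "age group 15+")
```

Building the names this way keeps the name of the multi identical across all its dimensions and classes.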
Main Data is primarily meant for pan-European data. To be considered pan-European, a dataset should cover most of the countries of the European Union as well as Iceland, Norway, Liechtenstein and Switzerland. A wider European coverage is also possible, which includes all EU candidate and potential candidate countries. The three pan-European spatial coverages are:
- EU28 (European Union)
- EU 28+4 (European Union + Iceland, Norway, Liechtenstein, Switzerland)
- EU 28+4+CC (EU28 + Iceland, Norway, Liechtenstein, Switzerland + Albania, Bosnia and Herzegovina, Kosovo, Montenegro, North Macedonia, Serbia, Turkey)
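The three coverages can be represented as nested sets of country codes, which makes it easy to measure how much of a coverage a dataset reaches. This is a sketch under assumptions: the two-letter codes follow Eurostat usage (EL for Greece, UK for the United Kingdom, XK for Kosovo), the dictionary keys and function name are hypothetical, and no threshold for "most of the countries" is implied, since none is given above.

```python
EU28 = frozenset(
    "AT BE BG HR CY CZ DK EE FI FR DE EL HU IE IT LV LT LU MT NL "
    "PL PT RO SK SI ES SE UK".split()
)
PLUS4 = frozenset("IS NO LI CH".split())           # Iceland, Norway, Liechtenstein, Switzerland
CANDIDATES = frozenset("AL BA XK ME MK RS TR".split())

COVERAGES = {
    "EU28": EU28,
    "EU28+4": EU28 | PLUS4,
    "EU28+4+CC": EU28 | PLUS4 | CANDIDATES,
}


def coverage_share(countries_with_data, coverage):
    """Share of the countries of a coverage for which the dataset has data."""
    target = COVERAGES[coverage]
    return len(set(countries_with_data) & target) / len(target)


# A dataset covering exactly the EU28 reaches 28 of the 32 countries of EU28+4.
share = coverage_share(EU28, "EU28+4")
```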
Data that do not cover most of the ESPON space may still qualify as Main Data if they cover one of the European transnational cooperation areas (TCAs), i.e. North Sea, North West Europe, Northern Periphery and Arctic, Baltic Sea, Danube Area, Atlantic Area, Alpine Space, Central Europe, Adriatic Ionian, Balkan-Mediterranean, South-West Europe, Mediterranean Area.
An indicator can be delivered for one or several of these TCAs at the same time.
One shall keep in mind that the purpose of structuring deliveries according to genealogies is:
(1) to put forward some indicators (key) and structure ESPON outputs for the end-user of the ESPON 2020 Database Portal;
(2) to identify links between ESPON activities;
(3) to provide some basic conditions for the reproducibility of ESPON Projects' results.
Genealogies imply first the identification of the most relevant background data (from external sources) and intermediary calculation steps, and second the declaration of the relations between these background indicators.
Figure 4 provides an example of a key-indicator genealogy based on several data sources:
- A is based on two background indicators taken from external sources;
- B is based on an ESPON base indicator (the list of base indicators is provided together with the overview table);
- C is based on two indicators, one being an indicator from a previous ESPON Project.
The Upload interface allows the inclusion of these three types of indicators as part of genealogies.
Background indicators from external sources shall be declared and uploaded by the Project. Base indicators and indicators from previous ESPON Projects are already available in the ESPON Database and should not be declared and uploaded again. The genealogy tool of the Upload interface allows these existing indicators to be called as inputs of a processing operation.
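The distinction between inputs that must be uploaded and inputs that already exist in the Database can be illustrated with a small data structure. This is only a sketch of the idea and does not reflect how the Upload interface is implemented; the class names, the origin labels and the example indicators are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    EXTERNAL = "background indicator from an external source"
    ESPON_BASE = "ESPON base indicator"
    PREVIOUS_PROJECT = "indicator from a previous ESPON Project"
    PROJECT = "indicator calculated by the Project"


@dataclass
class Indicator:
    name: str
    origin: Origin
    inputs: list = field(default_factory=list)


def indicators_to_upload(root):
    """Walk a genealogy and list the indicators the Project must declare itself."""
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Base indicators and previous-Project indicators already exist
        # in the Database, so they are only referenced, not uploaded.
        if node.origin in (Origin.EXTERNAL, Origin.PROJECT) and node.name not in result:
            result.append(node.name)
        stack.extend(node.inputs)
    return result


# Hypothetical genealogy: a key indicator based on one external indicator
# and one indicator from a previous ESPON Project.
external = Indicator("External background indicator", Origin.EXTERNAL)
previous = Indicator("Previous Project indicator", Origin.PREVIOUS_PROJECT)
key = Indicator("Key indicator", Origin.PROJECT, inputs=[external, previous])

to_upload = indicators_to_upload(key)
```

Here only the key indicator and the external background indicator are listed for upload; the indicator from the previous Project is simply referenced as an input.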
Figure 5 Types of inputs in genealogy relations
Other Data elements are flexible containers for data that do not comply with the Main Data constraints. Each element may contain one or several files.
Here are general rules of thumb that can help the manager to decide which files should be grouped into which element:
- By default each data file (be it statistical data, geographical data or survey data) should be delivered in a separate element.
- Exceptions: several files should be bundled together in an element in the following cases:
  - Thematic proximity: several files related to a well-identified sub-theme of the study (e.g. agricultural development, transport, ageing).
  - Case studies: several files related to a specific case study (e.g. social innovation in Corsica, transport in Northern Sweden or education in Bulgaria).
Besides these rules, common sense applies. Anything that can make the element more user-friendly is appreciated.
A Project has developed a new calculation of a time-distance indicator to a specific service of general interest on a 1 km x 1 km grid (spatial data). How should it be delivered?
Grid data is non-standard data. The indicator may thus be delivered either
- as background data of a key indicator, i.e. any indicator based on this specific time-distance indicator and provided in a standard nomenclature (e.g. median time-distance to school in NUTS3 regions);
- or as an Other Data element that will be provided in the Project Archive.
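The first option, deriving an indicator in a standard nomenclature from the grid, can be sketched as follows. The sketch assumes that each grid cell has already been assigned to one region (for instance through a spatial join carried out beforehand) and that cells without a value are given as None; the region codes, the function name and the figures are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_region(cells):
    """Median of the cell values per region, ignoring cells without a value."""
    by_region = defaultdict(list)
    for region, minutes in cells:
        if minutes is not None:
            by_region[region].append(minutes)
    return {region: median(values) for region, values in by_region.items()}


# Hypothetical (region code, time-distance in minutes) pairs, one per grid cell.
cells = [
    ("REGION_A", 12.0), ("REGION_A", 30.0), ("REGION_A", 18.0),
    ("REGION_B", 8.0), ("REGION_B", 10.0), ("REGION_B", None),
]

result = median_by_region(cells)
```

The grid-level values would then be declared as the background data of the regional indicator, so that the calculation step is visible in the genealogy.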
Values may be missing for some spatial entities of the defined combination of nomenclature and spatial extent. In this case, the data cell of the data sheet shall be left blank (i.e. no specific text referring to the absence of data). In addition, the adjacent cell for sources and preliminary processing shall also be left blank (i.e. no specific source is expected to justify an absence of value).
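The rule above can be enforced when the data sheet is produced by a script. The following sketch assumes a simplified layout with three columns (code, value, source) written as CSV; the actual template is a spreadsheet and its layout may differ, so the column names and the function name are hypothetical.

```python
import csv
import io


def write_data_sheet(rows):
    """Write rows as CSV, leaving value and source blank when the value is missing."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["code", "value", "source"])
    for code, value, source in rows:
        if value is None:
            # No placeholder text such as "n/a", and no source either.
            writer.writerow([code, "", ""])
        else:
            writer.writerow([code, value, source])
    return buffer.getvalue()


rows = [("R1", 3.5, "Eurostat"), ("R2", None, "Eurostat")]
sheet = write_data_sheet(rows)
```

The second row is written with two empty cells, even though a source was supplied for it in the input.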