ESPON Nomenclatures Support

Preparation of a Statistical Units Nomenclature for Integration into ESPON Database

ESPON M4D

Revision History
Revision 0.16 (svn rev 2420), 2014-12-19 17:32:37
Please consult the Changes appendix for an overview of main changes regarding this document.

Abstract

This document explains how to prepare a new nomenclature (or a new version of a supported nomenclature) for the integration into ESPON database.


Table of Contents

1. Introduction
1.1. Terminology
1.2. Data Formats
1.3. Common Rules
1.4. Structure of the Package
2. Nomenclature file
2.1. About Sheet
2.2. Version Sheet
2.2.1. version sheet layout example: NUTS 2006
2.2.2. version sheet layout example: UNEP 2006
2.3. Units Sheet
2.4. Names Sheet
2.5. Hierarchy Sheet
2.6. Changes Sheet
2.6.1. Typology of Statistical Units Evolution
2.6.2. Changes sheet table example
2.7. Equivalence Sheet
2.8. Derivations Sheet
2.9. study_area Sheet
3. Geometries File
A. Changes
B. Templates and Examples
C. References
D. About

List of Figures

1.1. Main elements of a nomenclature description

List of Tables

1.1. Package contents
2.1. nomenclature.xls file structure
2.2. about sheet layout
2.3. version sheet layout example
2.4. version sheet introducing NUTS 2006 nomenclature
2.5. version sheet introducing UNEP 2006 nomenclature
2.6. units sheet layout example
2.7. names sheet layout example
2.8. hierarchy sheet layout example
2.9. changes sheet layout example
2.10. Equivalence sheet layout example
2.11. Derivations sheet layout example
2.12. study_area sheet layout example
A.1. Update of Unit Codes for Croatia in NUTS 2010 Nomenclature

Chapter 1. Introduction

During the ESPON 2013 Database Phase 2 project (also known as M4D, Multi Dimensional Database Design and Development), we aim to extend the database's support for different geographical objects. The existing library of NUTS units, which covers all public official versions, will be complemented by libraries covering nomenclatures such as WUTS (World Unified Territorial System), UMZ (Urban Morphological Zones), LUZ (Larger Urban Zones), FUA (Functional Urban Areas), LAU (Local Administrative Units) and, probably, some others.

Several datasets created during the first phase of the ESPON Database project use statistical units of these nomenclatures but do not provide information about the units themselves. Before such datasets can be integrated, the database needs some details about the statistical units used, their hierarchy and their relationship with other nomenclatures. Once this information has been collected into the ESPON database, any number of new datasets can be integrated using the dedicated statistical units library.

To build the library of statistical units, an automatic analyzer created for this purpose reads the description of a nomenclature of statistical units, checks its validity and coherence, and then stores the analyzed information in the ESPON database.

This document describes the structure of the package and the layout to use in order to prepare a new nomenclature (a new version of an existing nomenclature) for the integration into the ESPON Database. The schema shown in Figure 1.1 illustrates the main elements that compose the description of a nomenclature.

Figure 1.1. Main elements of a nomenclature description

This schema gives an overview of the main characteristics that compose a nomenclature description. It also shows the links (derivation, equivalence) that may be specified between spatial units of different nomenclature versions.


1.1. Terminology

The present document uses the following terms:

  • The Database - the ESPON Database.

  • The Library - the library of statistical units supported by the ESPON Database.

  • The nomenclature - a new nomenclature of statistical units OR a new version of an existing nomenclature, being prepared for integration into the ESPON Database.

1.2. Data Formats

The data (file) formats used to prepare a nomenclature for integration into the Database are the following:

  • Tabular format. This is the main form of data exchange in the ESPON Database project. In this context, it is represented mainly by Microsoft Excel (.xls). This is the default case and does not need any special documentation. If other tabular formats are used, the documentation accompanying the package must specify the name and version of the software able to open the files and, where necessary, the steps to follow in order to obtain all the information contained in the files without loss or transformation compared with their state when they were sent.

    The recommended tabular file format is Microsoft Excel (.xls) in its 97/2003 version. The recommended software for creating tabular files is Microsoft Excel (proprietary) or OpenOffice (public and free to use).

  • GIS format. To represent geographic data (statistical units geometries), the .shp (ESRI shape) format must be used. Details on its usage are given below in the present document (see Geometries file).

All of the files describing a nomenclature must be collected in a single .zip file, with the name composed of the prefix "Nomenclature_", followed by the acronym of the nomenclature, an underscore and then a mark indicating its version. For example, a package describing the NUTS nomenclature of the 2010 version would be put in the file called "Nomenclature_NUTS_2010.zip".
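As an illustration, the naming rule above can be written as a one-line function. This is only a sketch: the name package_file_name is hypothetical and does not belong to any ESPON tool.

```python
def package_file_name(acronym: str, version: str) -> str:
    """Build the .zip file name of a nomenclature package.

    Follows the rule stated above: the prefix "Nomenclature_", the
    nomenclature acronym, an underscore, then the version mark.
    Hypothetical helper, not part of the ESPON tooling.
    """
    return f"Nomenclature_{acronym}_{version}.zip"


print(package_file_name("NUTS", "2010"))  # Nomenclature_NUTS_2010.zip
```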

Templates and examples for different files cited in the present specification can be found in Appendix B.

1.3. Common Rules

Character Encoding

Any file or database source that is part of a package with a nomenclature description must use the UTF-8 character encoding.

Tabular Spreadsheet Structure

Because the .xls file format is limited to 65536 rows per sheet, a large nomenclature description may not fit in a single sheet of a spreadsheet. In that case it may be distributed among several sheets. Each of these sheets must respect the same layout and naming convention as explained in the respective part of the specification.

Each tabular sheet must follow the naming rule and use the column header labels shown in the first row of each example below. The header labels are not case sensitive.

There must not be any empty rows before the list header labels, or any empty column between the columns with values.

Values Restrictions

Most of the values used to describe statistical units are character strings. No leading or trailing spaces are allowed for any of them. For certain properties (e.g., unit codes, nomenclature acronyms), the values are not case sensitive. During the data processing, they will be converted to upper case. For geographical names, general conventions on proper nouns should be applied, if the official names are not strictly specified (for example, for NUTS0 units the names must be in upper case without accents).

Boolean values are represented by English literals commonly used for them: true or false.
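The restrictions on values can be sketched as a few small checks. The function names are hypothetical, and the case-insensitive reading of the boolean literals is an assumption of this sketch rather than something the specification states.

```python
def check_string(value: str) -> str:
    """Reject values with leading or trailing whitespace."""
    if value != value.strip():
        raise ValueError(f"leading or trailing spaces in {value!r}")
    return value


def normalize_code(value: str) -> str:
    """Unit codes and acronyms are not case sensitive: store them in upper case."""
    return check_string(value).upper()


def parse_boolean(value: str) -> bool:
    """Accept only the English literals true and false.

    Assumption: the literals are compared without regard to case.
    """
    literals = {"true": True, "false": False}
    key = check_string(value).lower()
    if key not in literals:
        raise ValueError(f"not a boolean literal: {value!r}")
    return literals[key]


print(normalize_code("be100"))  # BE100
print(parse_boolean("false"))   # False
```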

1.4. Structure of the Package

The package describing a nomenclature must contain the files described in the table below. The file name example given for .xls (default) format can be changed for other tabular formats by replacing the file name extension.

Table 1.1. Package contents

File name | Obligation | Description
nomenclature.xls | Mandatory | Describes the nomenclature. This is the main file of the package.
geometries.zip | Mandatory | Zip archive containing the shapes/geometries of the statistical units.


[Important]

The zip files must have a flat structure: they must contain neither path prefixes nor subdirectories. For example, unzipping the archive myNomenclatureArchive.zip in a parent directory named directory/ must return:

directory/
|- myNomenclatureArchive.zip
|- geometries.zip
|- nomenclature.xls

Then, unzipping geometries.zip must return the following files:

directory/
|- myNomenclatureArchive.zip
|- geometries.zip
|- myGeometry.shp
|- myGeometry.shx
|- myGeometry.dbf
|- nomenclature.xls

See also Geometries file for a description of the expected geometries files.
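A package originator may want to verify the flat layout before sending an archive. The following sketch uses the Python standard zipfile module. The function name nested_entries is hypothetical, and the two in-memory archives exist only to exercise the check.

```python
import io
import zipfile


def nested_entries(archive) -> list[str]:
    """Return the entries of a zip archive that are not at its top level."""
    with zipfile.ZipFile(archive) as zf:
        # Zip entry names use "/" as separator, so any name containing
        # "/" is a directory or lies inside one.
        return [name for name in zf.namelist() if "/" in name]


# A flat archive, as required by the specification.
flat = io.BytesIO()
with zipfile.ZipFile(flat, "w") as zf:
    zf.writestr("nomenclature.xls", b"")
    zf.writestr("geometries.zip", b"")

# An archive with a path prefix, which must be rejected.
nested = io.BytesIO()
with zipfile.ZipFile(nested, "w") as zf:
    zf.writestr("package/nomenclature.xls", b"")

print(nested_entries(flat))    # []
print(nested_entries(nested))  # ['package/nomenclature.xls']
```

The same function accepts a file path instead of an in-memory buffer.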

Chapter 2. Nomenclature file

The nomenclature.xls file is the main file of a nomenclature description. It details all the information needed to integrate the nomenclature into the Library.

This file must have a strict structure defined in the present section.

The entire spreadsheet must be divided into several sheets according to the following rules:

  • Each sheet must have the title corresponding to its contents, as defined in Table 2.1. For example, the list of statistical units composing a nomenclature must be put in the sheet called "units".

  • Each sheet can contain no more than 65536 rows (limitation of the Excel format). If the actual number of entries is greater, they must be divided between several sheets. The name of each such sheet, chosen according to the previous rule, must be suffixed with a numeric index giving the ordinal of the part it contains. For example, a nomenclature composed of 80000 units will have the sheets units1 and units2, where the first contains 65536 units and the second the remaining 14464 units.
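The splitting rule can be sketched as follows. The function name split_into_sheets is hypothetical. The default limit follows the example above and ignores the header row. If the header row counts towards the limit, an assumption the specification does not settle, the caller would pass 65535 instead.

```python
MAX_ROWS = 65536  # row limit per .xls sheet, as used in the example above


def split_into_sheets(base_name, entries, max_rows=MAX_ROWS):
    """Map sheet names to chunks of entries, adding a numeric index when needed."""
    if len(entries) <= max_rows:
        return {base_name: entries}
    chunks = [entries[i:i + max_rows] for i in range(0, len(entries), max_rows)]
    return {f"{base_name}{n}": chunk for n, chunk in enumerate(chunks, start=1)}


# 80000 placeholder unit codes, as in the example above.
sheets = split_into_sheets("units", [f"U{i}" for i in range(80000)])
print({name: len(rows) for name, rows in sheets.items()})
# {'units1': 65536, 'units2': 14464}
```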

The nomenclature.xls spreadsheet can contain the following sheets:

Table 2.1. nomenclature.xls file structure

Sheet name | Obligation | Description
about | Mandatory | Contains general information about the package and its originator.
version | Mandatory | Contains general details on the nomenclature described.
units | Mandatory | Identifies the statistical units composing the nomenclature.
names | Mandatory | Lists the names of the statistical units.
hierarchy | Mandatory for hierarchical nomenclatures. Meaningless for non-hierarchical ones. | Establishes hierarchical relations between the units of the nomenclature.
changes | Mandatory for new versions of supported nomenclatures. Meaningless for new (original) nomenclatures. | Lists the changes that characterize the described version of the nomenclature as compared with the previous one.
derivations | Optional for derived nomenclatures. Meaningless for original ones. | Provides the information about the relationship between the units of a derived nomenclature and the ones of the original one.
equivalence | Optional | Establishes relationship between the units of the described nomenclature and other supported nomenclatures.
study_area | Optional | Establishes relationship between the units code and the country it belongs to. Aims at associating the units codes with a study area for the "where" filter of the ESPON Database Portal Search Query.

The names of the sheets are not case sensitive.

The following sections give detailed descriptions of the contents of these sheets.

2.1. About Sheet

The about sheet provides additional information about the originator of the nomenclature description and some details on its use. This sheet must have the layout as shown in Table 2.2 (examples of values are given in italics).

Table 2.2. about sheet layout

  | A | B
1 | contact_name | Ronan Ysebaert
2 | email | [email protected]
3 | package_version | v0 last updated 2017-05-31
4 | software_used | Open Office v 5.3 and Quantum GIS 1.7
5 | geometries_copyright | Produced by ESPON Database Project. Free public distribution through ESPON DB Web Application.
6 | geometries_scale | 1:3000000
7 | additional_info | Please note that all the names of statistical units are now translated in esperanto (EO) and volapük (VO), the new official EU languages.

The expected cells in this sheet are detailed below:

contact_name

MANDATORY. The person who created the files in the package or the person responsible for their creation.

email

MANDATORY. The email address of the person mentioned by the previous property.

package_version

MANDATORY. The version of the package (may be the date of the last update) describing the nomenclature.

software_used

MANDATORY. The names and the versions of the software used to create the nomenclature.xls and the geometries files.

geometries_copyright

OPTIONAL. Information on the copyright of the geometries in the package. It must detail the source of production of the geometries and the restrictions on their distribution, if they exist.

geometries_scale

OPTIONAL. Information on the scale, the level of detail and the file sizes of the geometries. The recommended scales are 1:3000000, 1:10000000, 1:20000000 and 1:60000000 (3M, 10M, 20M and 60M).

additional_info

OPTIONAL. Any details that the originator of the package may consider as useful.

[Important]

The data_format field has been removed since the revision 0.9 of these specifications. Please consult Appendix A for further information.

2.2. Version Sheet

The version sheet gives general information about the nomenclature described by the package. This sheet has the following layout (examples of values are given in italics):

Table 2.3. version sheet layout example

   | A | B
1  | nomenclature_name | Nomenclature of Territorial Units for Statistics
2  | nomenclature_acronym | NUTS
3  | is_official | true
4  | URL | http://www.example.org/nomenclatures/n/2019.pdf
5  | version_name | 2019
6  | version_start | 2020-01-01
7  | version_previous | 2015
8  | similar_to |
9  | derived_from |
10 | dimension |
11 | level |

The expected cells in this sheet are detailed below:

nomenclature_name

MANDATORY if this nomenclature is not supported yet.

OPTIONAL for new versions of an already supported nomenclature.

The nomenclature_name property specifies the full name of the nomenclature.

nomenclature_acronym

MANDATORY

The nomenclature_acronym property specifies the acronym used for the nomenclature. For new versions of supported nomenclatures, the value of this cell must correspond to a supported nomenclature acronym (see the Appendix B).

is_official

MANDATORY

The is_official property shows if this nomenclature (version) is official. Official nomenclatures are standardized and legally approved by regulations on statistical units. For example, the NUTS nomenclature is official because it is approved by the European Commission and is published in the Official Journal of the European Union. If the nomenclature is not a part of a standard, it cannot be considered as official.

URL

MANDATORY

The URL property specifies the link to the on-line publication of documents describing the nomenclature (version).

version_name

MANDATORY

The version_name property specifies the name/title used to denote the nomenclature version described by the package.

version_start

MANDATORY

The version_start property specifies the date when this nomenclature version comes into force.

version_previous

MANDATORY if the package describes a new version of a nomenclature already supported by the Database.

OPTIONAL if it describes a new original nomenclature of statistical units; in that case this cell must be left blank.

The version_previous property specifies the name/title of the version that is chronologically previous to the described one.

similar_to

MANDATORY only if this is a new (original) nomenclature AND if its structure is similar to one of other supported nomenclatures.

OPTIONAL in other cases; the cell should then be left blank.

This field aims at referencing an already supported nomenclature which is similar in its structure: for example, EFTA (European Free Trade Association, see http://www.efta.int/ for further information), and CC (Candidate Countries) nomenclatures are similar to NUTS.

The expected pattern for a similar_to item is an already supported nomenclature acronym and its version, separated by a pipe "|" character. For example, if EFTA nomenclature version 2003 was to be introduced for the first time into the database, this cell would have had the value NUTS|2003. Several similar nomenclatures items can be listed, separated by a comma. Example: NUTS|2003,NUTS|2006.
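A minimal sketch of how a similar_to cell could be read, following the pattern described above. The function name parse_similar_to is hypothetical. Acronyms are converted to upper case in line with the Values Restrictions section.

```python
def parse_similar_to(cell: str) -> list[tuple[str, str]]:
    """Split a similar_to cell into (acronym, version) pairs."""
    pairs = []
    if not cell.strip():
        return pairs  # blank cell: no similar nomenclature declared
    for item in cell.split(","):
        acronym, sep, version = item.strip().partition("|")
        if not sep or not acronym or not version:
            raise ValueError(f"expected ACRONYM|VERSION, got {item!r}")
        pairs.append((acronym.upper(), version))
    return pairs


print(parse_similar_to("NUTS|2003,NUTS|2006"))
# [('NUTS', '2003'), ('NUTS', '2006')]
```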

derived_from

The derived_from property is used when the nomenclature is derived, that is, when its units are built upon the units of other supported nomenclature(s). The value must refer to the nomenclature and the version from which the described nomenclature is produced. For example, the units of the NUTS2-3 nomenclature were produced from the NUTS nomenclature, so this is specified as NUTS_1989 (for the 1989 version of NUTS). If this is not a derived nomenclature, this cell must be left blank.

If the described nomenclature derives from more than one nomenclature, each of them must be mentioned on a separate row by repeating the label in the "A" column and giving the reference to the source nomenclature in the "B" column.

dimension

MANDATORY only for new (original) nomenclatures OR for a nomenclature version that brings changes in the hierarchy of levels.

OPTIONAL in other cases; the cell must then be left blank.

If set, this property must be repeated as many times as there are dimensions in the hierarchical structure of the nomenclature. After each dimension row, there must be rows that declare the levels corresponding to this dimension. After them, another dimension row can appear to introduce the next dimension. Please see the example of the file for the UNEP (United Nations Environment Programme) nomenclature below.

The order in which this property appears in the table is relevant: dimensions with higher priority must appear above dimensions with lower priorities. The priority of a dimension is defined by its importance in the list of all dimensions. For example, in UNEP nomenclature, geographical dimension has higher priority than the political one.

level

MANDATORY (at least one level) IF the dimension property is set.

OPTIONAL only after each dimension property OR after another level property. It introduces the identifier of a nomenclature level. All the levels belonging to the same dimension must be listed after the respective dimension property, beginning with the most general level and ending with the most detailed one.

[Important]

Since revision 0.10 of these specifications, the definition of dimension and level has been revised, and the levels_count and dimensions_count fields have been removed. Please consult Appendix A for further information.

To give more detailed examples, the following sub sections propose two complete "version" sheet layouts covering different nomenclatures.

2.2.1. version sheet layout example: NUTS 2006

The following layout describes the NUTS nomenclature of 2006 version, as if it were not yet supported by the Database, but the previous NUTS versions were already supported.

Table 2.4. version sheet introducing NUTS 2006 nomenclature

  | A | B
1 | nomenclature_name |
2 | nomenclature_acronym | NUTS
3 | is_official | true
4 | URL | http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2007:039:0001:0037:EN:PDF
5 | version_name | 2006
6 | version_start | 2008-01-01
7 | version_previous | 2003

The property nomenclature_name is empty because this nomenclature is a new version of the NUTS nomenclature, already supported by the Database.

The property similar_to is empty because the NUTS nomenclature is a base one (but there are other nomenclatures that are similar to NUTS: in principle, this relationship is reversible).

The property derived_from is empty because the NUTS nomenclature is not derived from any other.

In the NUTS 2006 version, there have not been any changes to the hierarchical structure of the nomenclature, so it is not necessary to specify the properties that may follow the derived_from one.

2.2.2. version sheet layout example: UNEP 2006

The next layout introduces the UNEP nomenclature of 2006 version, as if it were not yet supported by the Database. It would be the first supported version of this nomenclature.

Table 2.5. version sheet introducing UNEP 2006 nomenclature

   | A | B
1  | nomenclature_name | United Nations Environment Programme
2  | nomenclature_acronym | UNEP
3  | is_official | false
4  | URL | http://geodata.grid.unep.ch/extras/geosubregions.php
5  | version_name | 2006
6  | version_start | 2006-01-01
7  | version_previous |
8  | similar_to |
9  | derived_from | ISO-3166
10 | dimension | geographical
11 | level | Global
12 | level | Region
13 | level | Subregion
14 | level | National
15 | dimension | political
16 | level | Sovereign
17 | level | National

The is_official property is set to false because, although this nomenclature is widely used by the UNEP, no official documentation was found detailing it, apart from the list of units available at the cited URL.

The version_previous property is empty because it is the very first available version of the nomenclature.

The similar_to property is empty because no other known nomenclatures use exactly the same structure.

The derived_from property is set to ISO-3166 because, at the National level, the UNEP nomenclature uses exactly the same units as those referenced by the ISO-3166 standard, including the numeric unit codes.

Then two dimensions are set:

  1. one dimension named geographical, composed of the following levels:

    1. Global

    2. Region

    3. Subregion

    4. National

  2. one dimension named political, composed of the following levels:

    1. Sovereign

    2. National

[Note]

The geographical dimension is described before the political one because it has higher priority.

The levels of each dimension are described from the most to the least general.

The values used to set the dimension and level properties must be used by the corresponding columns in the hierarchy sheet (see Section 2.5).
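To illustrate the ordering rules for dimension and level rows, the sketch below groups the level rows of a version sheet under the dimension row that precedes them. The function name read_dimensions and the (label, value) representation of the rows are assumptions made for this example.

```python
def read_dimensions(rows):
    """Group level rows under the dimension row that precedes them.

    rows is a list of (label, value) pairs taken from columns A and B
    of the version sheet; rows with other labels are ignored.
    """
    dimensions = {}
    current = None
    for label, value in rows:
        label = label.lower()  # header labels are not case sensitive
        if not value:
            continue  # blank cell: the property is not set
        if label == "dimension":
            current = value
            dimensions[current] = []
        elif label == "level":
            if current is None:
                raise ValueError("level row found before any dimension row")
            dimensions[current].append(value)
    empty = [name for name, levels in dimensions.items() if not levels]
    if empty:
        raise ValueError(f"dimensions without levels: {empty}")
    return dimensions


# Rows 9 to 17 of the UNEP 2006 version sheet.
unep_rows = [
    ("derived_from", "ISO-3166"),
    ("dimension", "geographical"),
    ("level", "Global"),
    ("level", "Region"),
    ("level", "Subregion"),
    ("level", "National"),
    ("dimension", "political"),
    ("level", "Sovereign"),
    ("level", "National"),
]
print(read_dimensions(unep_rows))
# {'geographical': ['Global', 'Region', 'Subregion', 'National'], 'political': ['Sovereign', 'National']}
```

Because Python dictionaries preserve insertion order, the result also keeps the priority order of the dimensions and the general-to-detailed order of the levels.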

2.3. Units Sheet

The units sheet gives the full and exhaustive list of territorial/statistical units composing the nomenclature.

This sheet must respect the layout shown in Table 2.6 (examples of values are given in italics).

Table 2.6. units sheet layout example

  | A | B
1 | unit_code | is_territorial
2 | BE | true
3 | BE1 |
4 | BE10 | true
5 | BE100 |
6 | Z | false

Details on the columns are given below:

unit_code

MANDATORY

Contains the codes of statistical units. The expected data type is a character string.

is_territorial

OPTIONAL

The is_territorial column holds a boolean value that shows whether the statistical unit of this row has a territory. Some statistical units are not associated with a particular geographic area, for example the NUTS units having the codes **Z, **ZZ and **ZZZ. For the units that do not have their own geographic area, the value in this column must be false. The expected data type is a boolean.

[Note]

Only units with the value false in the is_territorial column are considered as units that do not have a territory. Consequently, "true" and empty cells are equivalent (rows 2, 3, 4, 5 in Table 2.6).

Data constraints. Unit codes are identifiers of statistical units. They cannot be duplicated in this sheet.
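The rules for the units sheet can be sketched as follows. The function name read_units and the representation of rows as (unit_code, is_territorial) string pairs are assumptions of this example.

```python
def read_units(rows):
    """Return {unit_code: is_territorial} from (unit_code, is_territorial) rows."""
    units = {}
    for code, flag in rows:
        code = code.upper()  # unit codes are not case sensitive
        if code in units:
            raise ValueError(f"duplicated unit code {code!r}")
        # Only an explicit false marks a unit without territory;
        # true and empty cells are equivalent.
        units[code] = flag.strip().lower() != "false"
    return units


# Rows 2 to 6 of Table 2.6.
rows = [("BE", "true"), ("BE1", ""), ("BE10", "true"), ("BE100", ""), ("Z", "false")]
print(read_units(rows))
# {'BE': True, 'BE1': True, 'BE10': True, 'BE100': True, 'Z': False}
```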

2.4. Names Sheet

The names sheet contains the list of the names of all statistical units referenced in the units sheet. This sheet is necessary because each statistical unit can have more than one name (official or unofficial) in more than one language. This multiplicity is very difficult to lay out on a single sheet.

Each statistical unit must have at least one name specified. This helps users to identify the unit without looking into the nomenclature documentation, because the code of the unit may provide no extra-nomenclature information about the statistical unit itself, while the name is more commonly used in everyday life.

The names sheet must respect the layout shown in Table 2.7 (examples of values are given in italics).

Table 2.7. names sheet layout example

  | A | B | C | D
1 | unit_code | unit_name | is_official | language_code
2 | BE | BELGIQUE | true | FR
3 | BE | BELGIE | true | NL
4 | BE | BELGIUM | false | EN
5 | BE1 | RÉGION DE BRUXELLES-CAPITALE | true | FR
6 | BE1 | BRUSSELS HOOFDSTEDELIJK GEWEST | true | NL

The aims of columns are detailed below:

unit_code

MANDATORY

References the code of the statistical unit present in the units sheet (see Section 2.3). All statistical units must have at least one name specified, that is to say, all codes from the units sheet must be associated with at least one name. The expected value in this column is a character string that is also present in the units sheet.

unit_name

MANDATORY

Contains statistical units names.

is_official

OPTIONAL

Shows whether the name in the previous column is an official name of the statistical unit. Empty values will be considered as true values. There may be more than one official name specified per unit, for example, when it is officially used in several languages. Each of the official names of the unit must be presented on a separate row. The expected value is a boolean: true or an empty cell for official names, false for unofficial ones. For example, the official names of regions in the European Union for the NUTS 2006-EU27 nomenclature are available on the Eurostat web site [3].

language_code

MANDATORY

Specifies the language in which the name is given. The expected value is a character string of two letters, in upper or lower case, corresponding to the ISO-639-1 code of the respective language. The official publication of the ISO-639-1 standard can be found on the ISO site. The full updated list of codes and language names is publicly available at Wikipedia.

Data constraints:

  • According to this layout, no more than one official or unofficial name can be specified for a statistical unit in the same language.

  • Any combination of unit_code + is_official + language_code is unique in the table.
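The uniqueness constraint can be checked with a set of keys, as in the sketch below. The function name check_names and the tuple representation of the rows are assumptions of this example. An empty is_official cell is read as true, as stated above.

```python
def check_names(rows):
    """Return the number of distinct name entries.

    Raises ValueError if a (unit_code, is_official, language_code)
    combination appears more than once.
    """
    seen = set()
    for code, name, is_official, language in rows:
        # An empty is_official cell counts as true.
        official = is_official.strip().lower() != "false"
        key = (code.upper(), official, language.upper())
        if key in seen:
            raise ValueError(f"duplicate name entry for {key}")
        seen.add(key)
    return len(seen)


# Rows 2 to 6 of Table 2.7.
names = [
    ("BE", "BELGIQUE", "true", "FR"),
    ("BE", "BELGIE", "true", "NL"),
    ("BE", "BELGIUM", "false", "EN"),
    ("BE1", "RÉGION DE BRUXELLES-CAPITALE", "true", "FR"),
    ("BE1", "BRUSSELS HOOFDSTEDELIJK GEWEST", "true", "NL"),
]
print(check_names(names))  # 5
```

Adding a second official French name for BE, even with an empty is_official cell, would raise the error.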

2.5. Hierarchy Sheet

The hierarchy sheet describes the hierarchy of statistical units inside the nomenclature. If the nomenclature does not have a hierarchy of units, this sheet is not relevant.

Hierarchies of units depend on nomenclatures. The general structure of the hierarchy is defined in the version sheet. The hierarchy sheet must refer to the values present in the version one.

Generally, a statistical unit can have one super-unit. For example, in NUTS nomenclature, any unit of NUTS3 level has a super-unit of NUTS2 level; NUTS2 units have parents in NUTS1 level, etc.

In some particular cases, a statistical unit can have more than one super-unit. This happens when the nomenclature defines several dimensions that cannot be merged. For example, the UNEP nomenclature [4] defines four levels in the geographical dimension (Global, Region, Subregion and National) and two levels in the political dimension (Sovereign and National). The lowest level (National) is shared between the two dimensions, and one unit code can have a super-unit code for the geographical dimension and another super-unit code for the political dimension.

The units of the top level of the hierarchy do not have super-units and must reference themselves in the table.

The hierarchy sheet must respect the layout detailed in Table 2.8 (examples of values are given in italics). For this example, we suppose that:

  • the version sheet specifies a single default dimension for the mono-dimensional hierarchy used;

  • the levels 0, 1, 2 and 3 are specified (from top to bottom) in the version sheet for the default dimension.

Table 2.8. hierarchy sheet layout example

  | A | B | C | D
1 | unit_code | dimension | level | super_unit_code
2 | BE | | 0 | BE
3 | BE1 | | 1 | BE
4 | BE10 | | 2 | BE1
5 | BE100 | | 3 | BE10

The aims of the columns are detailed below:

unit_code

MANDATORY

References the code of the statistical unit present in the units sheet (see Section 2.3). In a hierarchical nomenclature, all statistical units must have at least one super-unit specified, that is to say, all codes from the units sheet must be associated with at least one super-unit. Top-level units reference themselves. The expected value in this column is a character string that is also present in the units sheet.

dimension

OPTIONAL

References a dimension defined in the version sheet (see Section 2.2.2). If the nomenclature is mono-dimensional, the cells in this column can be left blank. Otherwise, the value must match one of the values of the dimension fields present in the version sheet.

level

MANDATORY

Specifies the level to which the unit belongs. The expected value must match one of the values of the level fields defined in the version sheet for the dimension given in the previous column.

super_unit_code

MANDATORY

Specifies the code of the statistical unit that is immediately superior to the one specified in the unit_code column. The expected value in this column is a character string that is also present in the units sheet.

[Important]

The dimension and level labels have replaced the previous level_dimension and unit_level labels since the revision 0.10 of these specifications. Please consult Appendix A for further information.

Data constraints. Any combination of unit_code + dimension is unique in the table.
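The rules of this section lend themselves to a simple consistency check. The sketch below is not the analyzer used by the Database: the function name check_hierarchy and the tuple representation of the rows are assumptions of this example. It verifies that every code and super-unit code exists in the units sheet, that each unit appears once per dimension, and that every unit has at least one hierarchy row.

```python
def check_hierarchy(rows, unit_codes):
    """Return a list of problems found in hierarchy rows.

    rows holds (unit_code, dimension, level, super_unit_code) tuples;
    unit_codes is the set of codes declared in the units sheet.
    """
    problems = []
    seen = set()
    for code, dimension, level, super_code in rows:
        if code not in unit_codes:
            problems.append(f"{code}: not in the units sheet")
        if super_code not in unit_codes:
            problems.append(f"{code}: super-unit {super_code} not in the units sheet")
        if (code, dimension) in seen:
            problems.append(f"{code}: repeated for dimension {dimension!r}")
        seen.add((code, dimension))
    # Every unit of a hierarchical nomenclature needs at least one row.
    missing = unit_codes - {row[0] for row in rows}
    problems.extend(f"{code}: no hierarchy row" for code in sorted(missing))
    return problems


# Rows 2 to 5 of Table 2.8; the top-level unit BE references itself.
units = {"BE", "BE1", "BE10", "BE100"}
hierarchy = [
    ("BE", "", "0", "BE"),
    ("BE1", "", "1", "BE"),
    ("BE10", "", "2", "BE1"),
    ("BE100", "", "3", "BE10"),
]
print(check_hierarchy(hierarchy, units))      # []
print(check_hierarchy(hierarchy[:3], units))  # ['BE100: no hierarchy row']
```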

2.6. Changes Sheet

The changes sheet is mandatory only if the nomenclature to integrate represents a new version of a nomenclature already supported by the Database. This sheet tracks the changes that characterize the new version as compared with the previous one.

Before introducing the template of the sheet, the typology of statistical units evolution is presented here in order to explain the conventions to use.

2.6.1. Typology of Statistical Units Evolution

The typology of statistical units changes is exhaustively described in [2]. The present section gives a brief introduction to it, with some adaptation to the ESPON database implementation.

Generally, there are three types of changes or events that may occur to a statistical unit during its evolution in a nomenclature. This may be an existential change, a territorial change or a non-territorial one.

For each significant type of a unit's change the following sections specify the label to use in the changes sheet.

Existential Changes

Existential changes concern life events in a unit's history. They occur when a statistical unit is created or terminated.

A statistical unit is considered to be created when it appears for the first time in the list of statistical units of the nomenclature. It is considered to be terminated when it disappears from this list.

The creation of a unit may be absolute. This is the case when a new territory is added to the nomenclature, where no units existed before, or when it is impossible to determine which units of the previous version served as ancestors for the new unit. This type of the unit's creation must be labelled "new unit".

It can also be relative. It occurs when the new unit results from the modifications applied to the units that already existed in the previous version of the nomenclature.

The termination of a unit may be absolute. This is the case when a part of territory has been excluded in the new version of the nomenclature, or when it is not possible to determine which units appear in the new version in the area formerly occupied by the unit that was terminated.

It can also be relative. It occurs when new units are created in the area of the previous one, that is to say that the disappearing unit is an ancestor for one or more units being created in the new version.

Territorial Changes

Territorial changes occur when the territory of a unit is modified, but the unit still exists in both the versions of the nomenclature. It continues to be considered as the same entity.

As for existential changes, territorial ones can be absolute and relative.

An absolute territorial change occurs when the boundaries of the area covered by the unit are completely shifted, without intersecting with the previous territory. This case is very rare, but can occur theoretically.

A relative territorial change occurs when modifications are made to the unit and its neighbors, so that the boundaries are no longer the same in the new version.

Non-Territorial Changes

Non-territorial changes occur when the unit is still considered as the same entity as in the previous version, but its name or code change in the new version. These changes are not accompanied by modifications of the unit's area.

A code change usually happens when the nomenclature undergoes a harmonization after a series of modifications in previous versions.

A name change can happen between any versions and may be caused by different factors of historical, political, economic or other contexts.

Code and name changes may occur separately or simultaneously. In the first case, they must be labelled "code change" and "name change" respectively. In the second case, they must be labelled "code and name change".

Code and name changes are tracked only if they represent independent events, happening without existential changes of the unit. In fact, an existential change already implies a code and a name creation and termination.

Statistical Units Changes Case-by-Case

Following the nomenclature of territorial units events ([2], page 148), the changes of statistical units may be represented as follows.

  1. Merge changes happen when two or more units of the previous version are merged into another one. They must be labelled with the "merge" keyword.

    1. The resulting unit may be a new unit:

      The description of these changes is the following:
      • GU1 merges into GU3

      • GU2 merges into GU3

      This description is minimal and sufficient to deduce that GU1 and GU2 are terminated by merge (relative termination) and GU3 is a new unit created from merge (relative creation).

    2. The resulting unit may be one of the units that participated in the merge:

      The description of these changes is the following:
      • GU2 merges into GU1

      • GU1 is changed by merge

      This description is minimal and sufficient to deduce that GU2 is terminated by merge (relative termination) and a relative territorial change (by merge) occurred to GU1.

  2. Split changes happen when a unit is divided into two or more units in the new version. They must be labelled with the "split" keyword.

    Several cases of split change are described below:

    1. A split change can cause the original unit's termination:

      The verbose description is the following:
      • GU3 is split into GU1

      • GU3 is split into GU2

      This is minimal and sufficient to deduce that GU3 is terminated by split (relative termination) and that new units GU1 and GU2 are created from this split (relative creation).

    2. If the original unit is not terminated in the new version, this is a case of extraction:

      The verbose description is the following:
      • GU3 is changed by split

      • GU3 is split into GU4

      This is minimal and sufficient to deduce that GU3 is changed by split (relative change) and that a new unit GU4 is created from this split (relative creation).

  3. Redistribution changes happen when at least one original unit continues its existence or disappears and partial territorial changes are made to all the units concerned. These cases are mixed situations of merges and splits, where it is difficult or impossible to define the main characteristics of the change. These changes must be labelled as "redistributed".

    If a territorial change happens to two or more units, at least one of them disappears or at least one new unit appears in the new version, this is a reallocation change:

    The description of these changes is the following:
    • GU1 terminates by reallocation

    • GU2 changes by reallocation

    • GU3 is created from reallocation

    This description is minimal and sufficient to deduce that GU2 is changed by reallocation (relative change), GU1 is terminated by reallocation (relative termination) and GU3 is created by reallocation (relative creation).

    If the territory of two or more units has been revised in the new version, but no unit has disappeared or appeared, this is a rectification change:

    The description of these changes is the following:
    • GU1 changes by rectification

    • GU2 changes by rectification

    This description is minimal and sufficient to deduce the respective relative territorial changes.
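The deductions stated for the two merge cases above can be illustrated with a short Python sketch. It is not part of the specification: the function name deduce_merge_events, the changed parameter and the wording of the returned labels are assumptions chosen for illustration only.

```python
def deduce_merge_events(merges, changed=frozenset()):
    """Deduce the event undergone by every unit involved in a merge.

    merges  -- iterable of (source, target) pairs, read "source merges into target"
    changed -- codes explicitly described as "changed by merge"
               (units that exist in both versions of the nomenclature)
    The returned labels are illustrative, not literals of the changes sheet.
    """
    events = {}
    for source, target in merges:
        # A unit that merges into another one disappears from the new version.
        events[source] = "relative termination (by merge)"
        if target in changed:
            # The target already existed: only its territory is modified.
            events[target] = "relative territorial change (by merge)"
        else:
            # The target did not exist before: it is created by the merge.
            events[target] = "relative creation (from merge)"
    return events


# Case 1: GU1 and GU2 merge into the new unit GU3.
case_1 = deduce_merge_events([("GU1", "GU3"), ("GU2", "GU3")])
# Case 2: GU2 merges into GU1, which is itself changed by the merge.
case_2 = deduce_merge_events([("GU2", "GU1")], changed={"GU1"})
```

The split cases could be handled symmetrically by swapping the roles of the source and target units.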

2.6.2. Changes sheet table example

The case-by-case analysis made in the previous section provides a pattern to follow when tracing the evolution of a nomenclature. The example below lists changes observed between the NUTS 2006 and 2010 versions.

The changes sheet must respect the layout shown in Table 2.9 (examples of values are given in italics).

[Important]

To be parsed and taken into account, the changes sheet requires a non-empty value for the version_previous field of the version sheet.

Table 2.9. changes sheet layout example

    A                   B                    C               D                E
1   unit_code_previous  unit_level_previous  unit_code_this  unit_level_this  change
2   DE41                2                    DE40            2                merge
3   DE42                2                    DE40            2                merge
4   GR                  0                    EL              0                code change
5   IE024               3                    IE024           3                name change
6   GRZ                 1                    ELZ             1                code change
7   GRZ                 1                    ELZ             1                name change
8   ITC45               3                    ITC4C           3                split
9   ITC45               3                    ITC4D           3                split

Details on the columns are given below:

unit_code_previous

MANDATORY (except for a new unit, e.g. when the change cell value is one of created, created from merge or created from redistribution).

References the code of the changing unit of the previous nomenclature version. It must be one of the unit codes already present in the version of the nomenclature supported by the Database.

unit_level_previous

MANDATORY

References the level label of the changing unit of the previous nomenclature version. It must be one of the unit levels already present in the version of the nomenclature supported by the Database. An empty value references the default level of a non-hierarchical nomenclature.

unit_code_this

MANDATORY (except for a termination, e.g. when the change cell value is one of termination, split, or redistribution).

Contains the code of the unit of the described version of nomenclature. It must be also present in the units sheet (see Section 2.3).

unit_level_this

MANDATORY

References the level label of the changing unit in this nomenclature version. It must be one of the unit levels declared in the version sheet. An empty value references the default level of a non-hierarchical nomenclature.

change

MANDATORY

Contains the literal (case-insensitive) corresponding to the type of change, as described in the subsections on the typology of changes. Two simultaneous changes (example: a code change and a name change) must be mentioned on two lines (see lines 6 and 7 in Table 2.9). The expected literals are:

created

New unit creation event.

created from merge

New unit creation from merge event.

created from split

New unit creation from split event.

created from redistribution

New unit creation from territorial redistribution event.

name change

Name change event.

code change

Code change event.

territory change

Territorial change event.

territory merge

Territorial change event caused by a merge.

territory split

Territorial change event caused by a split.

territory redistribution

Territorial change event caused by a redistribution.

termination

Unit termination event.

merge

Unit termination by merge event.

split

Unit termination by split event.

redistribution

Unit termination by redistribution event.

The line is rejected if the value in this column does not belong to this enumeration.

Data constraints. Any combination of unit_code_previous + unit_code_this + change is unique in this table.
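As an illustration of the rules above, the following Python sketch checks the change literal (case-insensitively) and the uniqueness constraint on the rows of Table 2.9. It is only a sketch under stated assumptions: the function name validate_changes, the representation of rows as tuples, the rejection of duplicated combinations and the invalid literal "fusion" are hypothetical and do not describe the actual parser of the Database.

```python
ALLOWED_CHANGES = {
    "created", "created from merge", "created from split",
    "created from redistribution", "name change", "code change",
    "territory change", "territory merge", "territory split",
    "territory redistribution", "termination", "merge", "split",
    "redistribution",
}


def validate_changes(rows):
    """Split rows into accepted rows and (row, reason) rejections.

    Each row is a tuple (unit_code_previous, unit_level_previous,
    unit_code_this, unit_level_this, change).
    """
    accepted, rejected, seen = [], [], set()
    for row in rows:
        code_previous, _, code_this, _, change = row
        literal = change.strip().lower()  # the literal is case-insensitive
        key = (code_previous, code_this, literal)
        if literal not in ALLOWED_CHANGES:
            rejected.append((row, "unknown change literal"))
        elif key in seen:
            # Assumption: a repeated combination is rejected, not merged.
            rejected.append((row, "duplicate combination"))
        else:
            seen.add(key)
            accepted.append(row)
    return accepted, rejected


TABLE_2_9 = [
    ("DE41", "2", "DE40", "2", "merge"),
    ("DE42", "2", "DE40", "2", "merge"),
    ("GR", "0", "EL", "0", "code change"),
    ("IE024", "3", "IE024", "3", "name change"),
    ("GRZ", "1", "ELZ", "1", "code change"),
    ("GRZ", "1", "ELZ", "1", "name change"),
    ("ITC45", "3", "ITC4C", "3", "split"),
    ("ITC45", "3", "ITC4D", "3", "split"),
]

# Two deliberately faulty rows: a duplicate of line 4 and an unknown literal.
faulty = [("GR", "0", "EL", "0", "Code Change"), ("DE41", "2", "DE40", "2", "fusion")]
accepted, rejected = validate_changes(TABLE_2_9 + faulty)
```

Lines 6 and 7 of Table 2.9 are both accepted because they differ in the change column, which is part of the uniqueness key.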

2.7. Equivalence Sheet

The equivalence sheet is optional and specifies the units of the nomenclature that have equivalences in other nomenclatures supported by the Database. These links with other nomenclatures are used to make automatic conversions of statistical data between nomenclatures.

[Note]

An equivalence is a particular case of a derivation whose relationship value is EQUALS. See Section 2.8 for more information.

This sheet is valid only if the referenced nomenclature and its version are supported by the Database.

It is not necessary to establish equivalence links with the units of other versions of the same nomenclature. This will be automatically done during the integration of the nomenclature into the library.

Two statistical units are considered to be equivalent if they are equal semantically (a country referenced by a nomenclature may be equivalent to a country in another one, but not to a group of countries) and geographically (both units have the same shape and boundaries).

The equivalence sheet must respect the layout shown in Table 2.10 (examples of values are given in italics).

Table 2.10. Equivalence sheet layout example


Details on the columns are given below:

unit_code

MANDATORY

References the code of the statistical unit present in the units sheet. Theoretically, a statistical unit can be equivalent to any number of statistical units in other nomenclatures. The expected value in this column is a character string that is also present in the units sheet (see Section 2.3).

unit_level

MANDATORY

References the level of the statistical unit, as declared in the version sheet. An empty value references the default level of a non-hierarchical nomenclature.

equivalent_unit_code

MANDATORY

References the code of the equivalent unit valid in another nomenclature. The expected value in this column must be a statistical unit code already registered by the Database.

equivalent_unit_level

MANDATORY

References the level of the equivalent unit. An empty value references the default level of a non-hierarchical nomenclature.

equivalent_nomenclature

MANDATORY

References the nomenclature to which the equivalent unit code belongs. The expected value in this column must be an acronym of a nomenclature supported by the Database [1].

equivalent_version

MANDATORY

References the version of the nomenclature mentioned in the previous column, where the equivalence between the units is established. The expected value must be an identifier of a nomenclature version supported by the Database [1].

Data constraints. Any combination of unit_code + equivalent_nomenclature + equivalent_version is unique in this table.
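The uniqueness constraint above is what makes an automatic conversion well defined: for a given target nomenclature and version, each unit code maps to at most one equivalent code. The Python sketch below illustrates this idea only; the function name convert_values, the unit codes FRA and DEU, the target version 2006 and the indicator values are hypothetical, and levels are ignored for brevity.

```python
def convert_values(values, equivalences, nomenclature, version):
    """Re-key indicator values onto the equivalent units of another nomenclature.

    values       -- {unit_code: value} in the described nomenclature
    equivalences -- rows of the equivalence sheet, as dictionaries
    Units without an equivalence in the target nomenclature are skipped.
    """
    mapping = {
        row["unit_code"]: row["equivalent_unit_code"]
        for row in equivalences
        if row["equivalent_nomenclature"] == nomenclature
        and row["equivalent_version"] == version
    }
    return {mapping[code]: value for code, value in values.items() if code in mapping}


# Hypothetical equivalence rows of a custom nomenclature.
equivalences = [
    {"unit_code": "FRA", "unit_level": "", "equivalent_unit_code": "FR",
     "equivalent_unit_level": "0", "equivalent_nomenclature": "NUTS",
     "equivalent_version": "2006"},
    {"unit_code": "DEU", "unit_level": "", "equivalent_unit_code": "DE",
     "equivalent_unit_level": "0", "equivalent_nomenclature": "NUTS",
     "equivalent_version": "2006"},
]

# Hypothetical values; XYZ has no equivalence and is therefore dropped.
converted = convert_values({"FRA": 10, "DEU": 20, "XYZ": 3}, equivalences, "NUTS", "2006")
```

A real conversion would also have to take the unit levels into account, since unit codes may be duplicated across levels in some nomenclature versions.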

2.8. Derivations Sheet

The derivations sheet contains information about the units of other nomenclatures that have been used to create a new nomenclature. This file can exist only for derived nomenclatures (produced using statistical units of other nomenclatures) and is optional. Establishing links with original units will allow the application to convert data between related nomenclatures.

Three types of links can be established between the original and the derived units in the current version of the specification. These types must be indicated in the derivations sheet. They are the following:

  • The derived unit can be a union of two or more original units, representing an aggregation or a generalization of the original units. In this case (parent relationship), the label "INCLUDES" must be used.

  • The derived unit can be equal to the original. The label to use is "EQUALS".

  • Two or more derived units can represent parts of the original unit, such that the union of the derived units is equal to the original one. In this case (child relationship), the label to use is "INCLUDED".

The derivations sheet must respect the layout shown in Table 2.11. In this example, the original units (BE, NL and LU) are used by a derived nomenclature to form an aggregated unit having the code BENELUX in this custom nomenclature. The aggregated unit can be characterized as a super-structure, a parent for the original ones, existing inside the described nomenclature.

Table 2.11. Derivations sheet layout example

    A                   B                    C                           D                      E                  F                   G
1   unit_original_code  unit_original_level  unit_original_nomenclature  unit_original_version  unit_derived_code  unit_derived_level  relationship
2   BE                  0                    NUTS                        2003                   BENELUX                                includes
3   NL                  0                    NUTS                        2003                   BENELUX                                includes
4   LU                  0                    NUTS                        2003                   BENELUX                                includes

[Note]

The lines of the table must be read and understood as follows: the unit encoded BENELUX in this nomenclature description includes (or "derives in an inclusive relationship from") the units encoded BE (line 2), NL (line 3) and LU (line 4), which are available in the nomenclature encoded NUTS in its version 2003. This example assumes that the NUTS version 2003 is already stored in the database.

Details of columns are given below:

unit_original_code

MANDATORY

Contains the code of the original unit used to produce the unit of the described nomenclature. The original unit must be already registered in the Database and its code must be valid.

unit_original_level

MANDATORY

Contains the level label of the original unit used to produce the unit of the described nomenclature. The original level must be already registered in the Database and its code must be valid. An empty value references the default level of a non-hierarchical nomenclature.

unit_original_nomenclature

MANDATORY

Contains the literal of the nomenclature where the original unit exists. This column is added to the file layout because a custom nomenclature can derive from more than one other nomenclature.

unit_original_version

MANDATORY

Contains the title of the version of the nomenclature where the original unit exists. This column is added to the file layout because a custom nomenclature can derive from more than one other nomenclature.

unit_derived_code

MANDATORY

Must contain the code of the unit from the described nomenclature version (it must be present in the units sheet, see Section 2.3).

unit_derived_level

MANDATORY

Contains the level label of the present unit. An empty value references the default level of a non-hierarchical nomenclature.

relationship

MANDATORY

Characterizes the link between the unit referenced in the currently described nomenclature (unit_derived_code column) and the original unit described by columns A to D (original unit code, level, nomenclature and version). The possible values in this relationship column are:

  • INCLUDES - if the current derived unit fully includes (is a parent of) the original unit.

  • EQUALS - if the current derived unit and the original unit are equal.

  • INCLUDED - if the current derived unit is included in (is a child of) the original unit.

  • INTERSECTS - if the derived unit and the original one intersect without full inclusion or equality.

The line is rejected if the value in this column does not belong to this enumeration.
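To illustrate how an INCLUDES relationship could support data conversion between related nomenclatures, the Python sketch below sums the values of the original units BE, NL and LU of Table 2.11 to obtain a value for BENELUX. This is only a sketch: the function name aggregate_derived and the indicator values are hypothetical, summation is only meaningful for additive (count) indicators, and treating the relationship label case-insensitively is an assumption.

```python
from collections import defaultdict


def aggregate_derived(original_values, derivations):
    """Sum the values of original units into the derived units that include them.

    original_values -- {unit_original_code: value} for an additive indicator
    derivations     -- rows of the derivations sheet, as dictionaries
    A derived unit is returned only if all of its original units have a value.
    """
    members = defaultdict(list)
    for row in derivations:
        # Assumption: the relationship label is compared case-insensitively.
        if row["relationship"].strip().upper() == "INCLUDES":
            members[row["unit_derived_code"]].append(row["unit_original_code"])
    return {
        derived: sum(original_values[code] for code in codes)
        for derived, codes in members.items()
        if all(code in original_values for code in codes)
    }


# Rows 2 to 4 of Table 2.11 (levels, nomenclature and version omitted for brevity).
derivations = [
    {"unit_original_code": "BE", "unit_derived_code": "BENELUX", "relationship": "includes"},
    {"unit_original_code": "NL", "unit_derived_code": "BENELUX", "relationship": "includes"},
    {"unit_original_code": "LU", "unit_derived_code": "BENELUX", "relationship": "includes"},
]

# Hypothetical indicator values for the original units.
totals = aggregate_derived({"BE": 10, "NL": 16, "LU": 1}, derivations)
```

Skipping a derived unit when one of its original units has no value avoids publishing a partial sum as if it were the total.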

2.9. study_area Sheet

This sheet is optional. It has been integrated into these specifications for the particular case of the UMZ nomenclature. The "where" filter of the Search Query in the ESPON Database Portal allows the user to select a study area based on the names of the countries (example: "FR", "UK") or on a set of countries (example: EU 28). In the NUTS nomenclature definitions, the prefix of a unit code gives the country the unit belongs to (example: "FR123" is in France). For the UMZ nomenclature, the unit code is an integer: the study_area sheet therefore associates a country code with each unit code. Consequently, each unit code of the nomenclature can be associated with a study area.

This sheet must respect the layout shown in Table 2.12.

Table 2.12. study_area sheet layout example

    A          B
1   unit_code  country_code
2   123456     FR
3   789012     UK
4   159357     NL

Details on the columns are given below:

unit_code

MANDATORY

Contains the codes of statistical units.

country_code

MANDATORY

Contains the codes of the countries.

Data constraints. Unit codes are identifiers of statistical units. They cannot be duplicated in this sheet. The country codes must be given with 2 characters.
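The constraints above, and the way a study area filter could use this sheet, can be sketched in Python as follows. The function names validate_study_area and units_in_study_area are hypothetical, and the sketch does not describe the actual implementation of the ESPON Database Portal.

```python
def validate_study_area(rows):
    """Return a list of (line_number, message) for rows breaking the constraints.

    rows -- list of (unit_code, country_code) pairs; line numbers start at 2
            because line 1 of the sheet holds the column headers.
    """
    seen, errors = set(), []
    for line, (unit_code, country_code) in enumerate(rows, start=2):
        if unit_code in seen:
            errors.append((line, "duplicate unit_code"))
        seen.add(unit_code)
        if len(country_code) != 2:
            errors.append((line, "country_code must have 2 characters"))
    return errors


def units_in_study_area(rows, countries):
    """Return the unit codes attached to one of the given country codes."""
    return [unit_code for unit_code, country_code in rows if country_code in countries]


# Rows 2 to 4 of Table 2.12.
TABLE_2_12 = [("123456", "FR"), ("789012", "UK"), ("159357", "NL")]

errors = validate_study_area(TABLE_2_12)
selected = units_in_study_area(TABLE_2_12, {"FR", "NL"})
```

A study area made of several countries is simply passed as a set of country codes.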

Chapter 3. Geometries File

The geometries.zip file must contain the geometries of all the units of the nomenclature that have territorial representation. This file is mandatory. It is an archive wrapping several files required by the ESRI Shapefile format (see below). This format is mandatory for nomenclature geometries in the context of the present specification.

Each geometry present in the shapefile must be associated with an attribute corresponding to the code of the statistical unit. The name of the attribute is unit_code.

The geometries_scale property in the about sheet (nomenclature.xls file) must detail the level of the generalization of the geometries defined in the present file.

Data constraints. Each geometry must correspond to a tuple unit_code + is_territorial in the units.xls file, where the value of the is_territorial property is true.

According to Wikipedia Shapefile page (last visit: 2011-10-27), the description of a geometry with the ESRI shp file format must be composed of the following set of files:

  • Mandatory files:

    • .shp: shape format; the feature geometry itself.

    • .shx: shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly.

    • .dbf: attribute format; columnar attributes for each shape, in dBase IV format.

  • Optional files:

    • .prj: projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format.

    • etc (other optional files with their specific extensions exist).

For further information about the shape file format, please consult The ESRI Shape File Technical Description [5].
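Before submitting a package, the presence of the mandatory shapefile parts in the geometries.zip archive can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name missing_shapefile_parts is hypothetical, and the check looks at file extensions only (it does not verify that the parts share the same base name, nor that the unit_code attribute exists).

```python
import os
import zipfile

# Extensions of the mandatory files of an ESRI Shapefile.
MANDATORY_EXTENSIONS = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(archive):
    """Return the sorted list of mandatory extensions absent from the archive.

    archive -- path to a zip file, or a binary file-like object
    """
    with zipfile.ZipFile(archive) as zf:
        present = {os.path.splitext(name)[1].lower() for name in zf.namelist()}
    return sorted(MANDATORY_EXTENSIONS - present)
```

For example, missing_shapefile_parts("geometries.zip") returns an empty list when the three mandatory files are present in the archive.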

Appendix A. Changes

This appendix describes the main changes and evolutions of these specifications, regarding mainly the expected content of the nomenclature description xls file.

  • Revision 0.16 (2014-07-23)

    The ESPON NUTS 2010 extended Nomenclature has been updated to take into account the changes in the Eurostat official publication for NUTS 2010-2013 [7]. In this document, the spatial units in Croatia identified by the codes HR01 and HR02 have been merged into a single spatial unit whose code is now HR04. As listed in Table A.1, the codes of the sub-level spatial units of HR01 and HR02 are now prefixed with HR04.

    Table A.1. Update of Unit Codes for Croatia in NUTS 2010 Nomenclature
    Previous version                                    Updated version (July 2014)
    unit code  unit name                                unit code  unit name
    HR01       Sjeverozapadna Hrvatska
    HR02       Središnja i Istočna (Panonska) Hrvatska
                                                        HR04       Kontinentalna Hrvatska
    HR011      Grad Zagreb                              HR041      Grad Zagreb
    HR012      Zagrebačka županija                      HR042      Zagrebačka županija
    HR013      Krapinsko-zagorska županija              HR043      Krapinsko-zagorska županija
    HR014      Varaždinska županija                     HR044      Varaždinska županija
    HR015      Koprivničko-križevačka županija          HR045      Koprivničko-križevačka županija
    HR016      Međimurska županija                      HR046      Međimurska županija
    HR021      Bjelovarsko-bilogorska županija          HR047      Bjelovarsko-bilogorska županija
    HR022      Virovitičko-podravska županija           HR048      Virovitičko-podravska županija
    HR023      Požeško-slavonska županija               HR049      Požeško-slavonska županija
    HR024      Brodsko-posavska županija                HR04A      Brodsko-posavska županija
    HR025      Osječko-baranjska županija               HR04B      Osječko-baranjska županija
    HR026      Vukovarsko-srijemska županija            HR04C      Vukovarsko-srijemska županija
    HR027      Karlovačka županija                      HR04D      Karlovačka županija
    HR028      Sisačko-moslavačka županija              HR04E      Sisačko-moslavačka županija

  • Revision 0.15 (2013-05-31)

    The Appendix B has been updated to take into account the reference to the new geographical objects that have been available in the ESPON Database since the June 2013 delivery: UMZ, FUA and MUA nomenclatures.

  • Revision 0.14 (2013-04-15)

    The NUTS and EFTACC nomenclatures are now deprecated; they have been replaced by the extended NUTS nomenclature. The Appendix B has been updated to be consistent with the changes regarding the integration of this "extended NUTS" nomenclature.

    For the integration of the UMZ nomenclature into the database, it has been necessary to add a new sheet entitled study_area to these specifications. This sheet is described in Section 2.9.

  • Revision 0.13 (2013-01-18)

    The Appendix B has been refactored to clearly identify the integrated nomenclatures in the ESPON Database, i.e. the nomenclatures that can be referenced from ESPON TPGs Key Indicators Datasets.

  • Revision 0.12

    Significant modifications in the sheets entitled Changes, Derivations and Equivalences: in order to avoid ambiguity about the referenced spatial units, the levels must now systematically be mentioned. Indeed, in the NUTS nomenclature versions 1995 and 1999 for example, unit codes may be duplicated on several levels.

  • Revision 0.11

    • In the sheet entitled Changes, the list of possible spatial event types has been updated and completed. Moreover, the combinations of simultaneous atomic changes (example: code and name change) must be mentioned on several lines, i.e. each line of the table must mention only one atomic change. See Table 2.9 and the possible values for the change column.

    • In the sheet entitled Derivations, the list of possible values for the relationship column has changed. See Table 2.11 and the possible values for the relationship column.

  • Revision 0.10

    Important changes have affected the layout and content of the nomenclature description xls file since the revision 0.9 of the document:

    • In the sheet entitled Version, the definition of levels and dimensions has been reviewed. The levels_count and dimensions_count have been removed. Please consult Section 2.2 for the new expected layout.

    • In the sheet entitled Hierarchy, the column header label level_dimension has been renamed dimension, the column header label unit_level has been renamed level, please consult Section 2.5 for the new expected layout.

  • Revision 0.9

    In the sheet entitled About, the data_format field has been removed. Please consult the Section 2.1 for the new expected layout.

Appendix B. Templates and Examples

This appendix proposes a set of examples of Nomenclatures Input Packages as attached resources.

  • Nomenclature_template.zip

    This example is an empty template showing an example of the expected input nomenclature description package. It proposes a nomenclature.xlt empty nomenclature description file which is valid in terms of layout and expected labels, but without any value. It can be used to build a new nomenclature description from scratch.

  • Available nomenclatures in the ESPON Database:

    • "Extended" NUTS revisions (including EU 28 and EFTA countries):

    • Nomenclature_NUTS23_2006.zip

      Particular NUTS 2/3 2006 Nomenclature.

    • The following "urban" nomenclatures have been integrated in the ESPON Database since the June 2013 delivery:

      • Nomenclature_cities_UMZ.zip

        Geometries of Urban Morphological Zones (UMZ) for cities over 10 000 inhabitants (4300 statistical units in Europe). This nomenclature covers the ESPON area (EU 28 + EFTA) and Western Balkans.

        [Note]

        For the needs of the study area filters of the ESPON Database Portal Search Query interface, an additional sheet has been added to the UMZ nomenclature definition xls file: study_area. As described in Section 2.9, this sheet contains two columns that associate each territorial unit of the nomenclature with the country code it belongs to. Most of the territorial units belong to the EU28, EU28+4 (i.e. EU28 + Iceland + Liechtenstein + Norway + Switzerland) or EU27+4+CC (EU27+4 + Candidate Countries) study areas. Please take into account two particular cases regarding these associations:

        • The territorial unit Monaco Menton Beausoleil (code: 99990) has been attached to France.

        • The territorial unit San Marino (code: 98561) has been attached to Italy.

      • Nomenclature_cities_FUA.zip

        Nomenclature for the Functional Urban Area (FUA) geographical object.

      • Nomenclature_cities_MUA.zip

        Nomenclature for the Morphological Urban Area (MUA) geographical object.

  • Other nomenclatures given as examples (not intended to be available in the ESPON Database):

    [Note]

    The geometries are excluded from these publicly available packages: the expected geometries.zip archive in these input packages only contains a README file. A real nomenclature geometry archive must include the files described in Chapter 3, Geometries File. Nevertheless, these packages provide examples of the expected nomenclature description xls files.

Appendix C. References

[1] Anton Telechev and Benoit Le Rubrus. ESPON Data and Metadata Specifications. ESPON Database Portal (last visit: 2013-01-18).

[2] Christine Plumejeaud. Modèles et méthodes pour l’information spatio-temporelle évolutive. Thèse soutenue le 22 septembre 2011 à l'Université de Grenoble. Full text in PDF (last visit: 2011-09-30).

[3] eurostat EUROPEAN COMMISSION. Regions in the European Union. Nomenclature of territorial units for statistics NUTS 2006 / EU-27. Edition 2007. ISSN 1977-0375. Full text in PDF (last visit: 2012-03-13).

[4] United Nations Environment Programme. UNEP Environmental Data Explorer. Search - Map - Graph - Download. http://geodata.grid.unep.ch/extras/geosubregions.php (last visit: 2012-03-13).

[5] ESRI. ESRI Shape File Technical Description. An ESRI White Paper - July 1998. Full text in PDF (last visit: 2012-03-23).

[6] International Organization for Standardization. ISO 3166 Maintenance agency (ISO 3166/MA) ISO's focal point for country codes. http://www.iso.org/iso/country_codes.htm (last visit: 2012-03-30).

[7] Eurostat. NUTS 2010 - NUTS 2013 (Excel file). Publication (last visit: 2014-07-18).

Appendix D. About

This document is part of the ESPON 2013 Database Phase 2 project, also known as M4D (Multi Dimensional Database Design and Development). It was generated on 2014-12-19 17:32:40 from the sources of the m4d forge imag project at svn rev 2420.

The main authors of this document are Anton Telechev and Benoit Le Rubrus (LIG STeamer), with the collaboration of UMS RIATE and LIG STeamer M4D Partners.

For any comment question or suggestion, please contact .

Colophon

Based on DocBook technology [1], this document is written in XML format. The sources are validated with DocBook DTD 4.5CR3, then transformed to HTML and PDF formats by using DocBook XSLT 1.73.2 stylesheets. The generation of the documents is automated thanks to the docbench LIG STeamer project, which is based on Ant [2], Java [3], and the Xalan [4] and FOP [5] processors. Note that the standard XSLT stylesheets are customized in order to get a better image resolution in the generated PDF output for admonition icons: the generated sizes of these icons were reduced from 30 to 12 pt.



[1] [on line] DocBook.org (last visit: July 2011)

[2] [on line] Apache Ant - Welcome. Version 1.7.1 (last visit: July 2011)

[3] [on line] Developer Resources For Java Technology (last visit: July 2011). Version 1.6.0_03-b05.

[4] [on line] Xalan-Java Version 2.7.1 (last visit: 18 November 2009). Version 2.7.1.

[5] [on line] Apache FOP (last visit: July 2011). Version 0.94.