Revision History | |
---|---|
Revision 0.16 (svn rev 2420) | 2014-12-19 17:32:37 |
Please consult the Changes appendix for an overview of main changes regarding this document. |
Abstract
This document explains how to prepare a new nomenclature (or a new version of a supported nomenclature) for integration into the ESPON database.
The ESPON 2013 Database Phase 2 project (also known as M4D, Multi Dimensional Database Design and Development) aims to extend the database's support for different geographical objects. The existing library of NUTS units, which covers all public official versions, will be complemented by libraries covering nomenclatures such as WUTS (World Unified Territorial System), UMZ (Urban Morphological Zones), LUZ (Larger Urban Zones), FUA (Functional Urban Areas), LAU (Local Administrative Units) and, probably, some others.
Several datasets created during the first phase of the ESPON Database project already use statistical units of these nomenclatures, but do not provide information about these units. Before such datasets can be integrated, the database needs some details about the statistical units used, their hierarchy and their relationship with other nomenclatures. Once collected into the ESPON database, this information makes it possible to integrate any number of new datasets that use the dedicated statistical units library.
To build the library of statistical units, an automatic analyzer created for this purpose reads the data describing a nomenclature of statistical units, checks its validity and coherence, and then stores the analyzed information in the ESPON database.
This document describes the structure of the package and the layout to use in order to prepare a new nomenclature (or a new version of an existing nomenclature) for integration into the ESPON Database. The schema shown in Figure 1.1 illustrates the main elements that compose the description of a nomenclature.
The present document uses the following terms:
The Database - the ESPON Database.
The Library - the library of statistical units supported by the ESPON Database.
The nomenclature - a new nomenclature of statistical units OR a new version of an existing nomenclature, being prepared for integration into the ESPON Database.
The data (file) formats used to prepare a nomenclature for integration into the Database are the following:
Tabular format. This is the main form of data exchange in the ESPON Database project. In this context, it is represented mainly by Microsoft Excel (.xls). This is the default case and does not need any special documentation. If other tabular formats are used, the documentation delivered with the package must specify the name and the version of the software able to open the files and, if necessary, the steps to follow in order to retrieve all the information contained in the files without loss or transformation compared with their state when they were sent.
The recommended tabular file format is Microsoft Excel (.xls) of the 97/2003 version. The recommended software for creating tabular files is Microsoft Excel (proprietary) or OpenOffice (public and free to use).
GIS format. To represent geographic data (the geometries of statistical units), the .shp (ESRI shape) format must be used. Details on its usage are given below in the present document (see Geometries file).
The package must be delivered as a single .zip file whose name is composed of the prefix "Nomenclature_", followed by the acronym of the nomenclature, an underscore and then a mark indicating its version. For example, a package describing the 2010 version of the NUTS nomenclature would be put in the file called "Nomenclature_NUTS_2010.zip".
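The naming rule for the package file can be illustrated with a minimal Python sketch. The function name `package_file_name` is hypothetical and is not part of the ESPON tooling; only the pattern (prefix, acronym, underscore, version mark) comes from this specification.

```python
def package_file_name(acronym: str, version: str) -> str:
    """Build a package file name following the 'Nomenclature_<acronym>_<version>.zip' rule.

    Hypothetical helper, written only to illustrate the naming convention.
    """
    acronym = acronym.strip()
    version = version.strip()
    if not acronym or not version:
        raise ValueError("both the nomenclature acronym and the version mark are required")
    return f"Nomenclature_{acronym}_{version}.zip"


print(package_file_name("NUTS", "2010"))
```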
Templates and examples for different files cited in the present specification can be found in Appendix B.
Every file or database source composing a package with a nomenclature description must use the UTF-8 character encoding.
Because the .xls file format is limited to 65536 rows per sheet, a large nomenclature description may not fit into a single sheet of a spreadsheet. In that case, it may be distributed among several sheets. Each of these sheets must respect the same layout and naming convention as explained in the respective part of the specification.
Each tabular sheet must respect the naming rule and the header labels for the columns (the first row of each example below), case insensitive.
There must not be any empty rows before the list header labels, or any empty column between the columns with values.
Most of the values used to describe statistical units are character strings. No leading or trailing spaces are allowed in any of them. For certain properties (e.g., unit codes, nomenclature acronyms), the values are not case sensitive; during data processing, they are converted to upper case. For geographical names, general conventions on proper nouns should be applied if the official names are not strictly specified (for example, the names of NUTS0 units must be in upper case without accents).
Boolean values are represented by the English literals commonly used for them: true or false.
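The value conventions above can be sketched in Python as follows. This is an illustration only, not the analyzer's actual code: the helper names are hypothetical, and accepting boolean literals in any letter case is an assumption that this specification does not state.

```python
def check_string(value: str) -> str:
    """Reject values with leading or trailing spaces, as the specification forbids them."""
    if value != value.strip():
        raise ValueError(f"leading or trailing spaces are not allowed: {value!r}")
    return value


def normalize_code(value: str) -> str:
    """Unit codes and acronyms are not case sensitive: convert them to upper case."""
    return check_string(value).upper()


def parse_boolean(value: str) -> bool:
    """Read the literals 'true' / 'false' (any letter case is accepted here by assumption)."""
    literal = check_string(value).lower()
    if literal == "true":
        return True
    if literal == "false":
        return False
    raise ValueError(f"not a boolean literal: {value!r}")
```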
The package describing a nomenclature must contain the files described in the table below. The file name example given for the .xls (default) format can be adapted to other tabular formats by replacing the file name extension.
Table 1.1. Package contents
File name | Obligation | Description |
---|---|---|
nomenclature.xls | Mandatory | Describes the nomenclature. This is the main file of the package. |
geometries.zip | Mandatory | Zip archive containing the shapes/geometries of the statistical units. See also Geometries file for a description of the expected geometries files. |
The zip files must not be recursive, i.e. they must contain neither prefixes nor subdirectories. For example, unzipping the archive myNomenclatureArchive.zip must return the following hierarchy of files:
directory/
|- myNomenclatureArchive.zip
|- geometries.zip
|- nomenclature.xls
Then, unzipping the geometries.zip must return the following hierarchy of files:
directory/
|- myNomenclatureArchive.zip
|- myGeometries.zip
|- myGeometry.shp
|- myGeometry.shx
|- myGeometry.dbf
|- nomenclature.xls
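The rule that an archive must contain neither prefixes nor subdirectories can be checked before a package is sent. The following is a minimal sketch using Python's standard `zipfile` module; the function name `has_flat_layout` is hypothetical.

```python
import zipfile


def has_flat_layout(archive) -> bool:
    """Return True if no entry of the zip archive is stored under a directory prefix.

    `archive` may be a path or a binary file-like object. Entry names in a zip
    archive use "/" as separator, so any "/" reveals a prefix or a subdirectory.
    """
    with zipfile.ZipFile(archive) as zf:
        return all("/" not in name for name in zf.namelist())
```

For example, `has_flat_layout("geometries.zip")` would be expected to return True for an archive that directly contains the .shp, .shx and .dbf files.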
The nomenclature.xls file is the main file composing the description of a nomenclature. It details all the information needed to integrate the nomenclature into the Library. This file must follow the strict structure defined in the present section.
The spreadsheet must be divided into several sheets according to the following rules:
Each sheet must have a title corresponding to its contents, as defined in Table 2.1. For example, the list of statistical units composing a nomenclature must be put in the sheet called "units".
Each sheet can contain no more than 65536 rows (limitation of the Excel format). If the effective number of entries is greater, they must be divided between several sheets. In that case, each sheet name, chosen according to the previous rule, must be completed with a numeric index corresponding to the ordinal of the part of the items it describes. For example, a nomenclature composed of 80000 units will have the sheets units1 and units2, where the first one contains 65536 units and the second one the remaining 14464 units.
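The sheet-splitting rule can be sketched as follows. The helper `sheet_names` is hypothetical. Like the example above, it counts 65536 entries per sheet and does not reserve a row for the header labels, which is an assumption.

```python
MAX_ROWS_PER_SHEET = 65536  # .xls limit quoted in this specification


def sheet_names(base_name: str, entry_count: int) -> list[str]:
    """Return the sheet names needed to hold `entry_count` entries.

    A single sheet keeps the plain name; several sheets get a 1-based numeric index.
    """
    if entry_count <= MAX_ROWS_PER_SHEET:
        return [base_name]
    # Ceiling division without floating point arithmetic.
    parts = -(-entry_count // MAX_ROWS_PER_SHEET)
    return [f"{base_name}{index}" for index in range(1, parts + 1)]


print(sheet_names("units", 80000))
```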
The nomenclature.xls spreadsheet can contain the following sheets:
Table 2.1. nomenclature.xls file structure
Sheet name | Obligation | Description |
---|---|---|
about | Mandatory | Contains general information about the package and its originator. |
version | Mandatory | Contains general details on the nomenclature described. |
units | Mandatory | Identifies the statistical units composing the nomenclature. |
names | Mandatory | Lists the names of the statistical units. |
hierarchy | Mandatory for hierarchical nomenclatures. Meaningless for non-hierarchical ones. | Establishes hierarchical relations between the units of the nomenclature. |
changes | Mandatory for new versions of supported nomenclatures. Meaningless for new (original) nomenclatures. | Lists the changes that characterize the described version of the nomenclature as compared with the previous one. |
derivations | Optional for derived nomenclatures. Meaningless for original ones. | Provides the information about the relationship between the units of a derived nomenclature and the ones of the original one. |
equivalence | Optional | Establishes relationship between the units of the described nomenclature and other supported nomenclatures. |
study_area | Optional | Establishes relationship between the units code and the country it belongs to. Aims at associating the units codes with a study area for the "where" filter of the ESPON Database Portal Search Query. |
The following sections give detailed descriptions of the contents of these sheets.
The about sheet provides additional information about the originator of the nomenclature description and some details on its use. This sheet must have the layout shown in Table 2.2 (examples of values are given in italics).
Table 2.2. about sheet layout
Row | A | B |
---|---|---|
1 | contact_name | Ronan Ysebaert |
2 | [email protected] | |
3 | package_version | v0 last updated 2017-05-31 |
4 | software_used | Open Office v 5.3 and Quantum GIS 1.7 |
5 | geometries_copyright | Produced by ESPON Database Project. Free public distribution through ESPON DB Web Application. |
6 | geometries_scale | 1:3000000 |
7 | additional_info | Please note that all the names of statistical units are now translated in esperanto (EO) and volapük (VO), the new official EU languages. |
The expected cells in this sheet are detailed below:
contact_name
MANDATORY. The person who created the files in the package or the person responsible for their creation.
Row 2 (contact e-mail address)
MANDATORY. The email address of the person mentioned by the previous property.
package_version
MANDATORY. The version of the package (may be the date of the last update) describing the nomenclature.
software_used
MANDATORY. The names and the versions of the software used to create the nomenclature.xls and the geometries files.
geometries_copyright
OPTIONAL. Information on the copyright of the geometries in the package. It must detail the source of production of the geometries and the restrictions on their distribution, if any.
geometries_scale
OPTIONAL. Information on the scale, the level of detail and the sizes of the files for the geometries. The recommended scales are 1:3000000, 1:10000000, 1:20000000 and 1:60000000 (3M, 10M, 20M and 60M).
additional_info
OPTIONAL. Any details that the originator of the package may consider useful.
The data_format field was removed in revision 0.9 of these specifications. Please consult Appendix A for further information.
The version sheet gives general information about the nomenclature described by the package. This sheet has the following layout (examples of values are given in italics):
Table 2.3. version sheet layout example
Row | A | B |
---|---|---|
1 | nomenclature_name | Nomenclature of Territorial Units for Statistics |
2 | nomenclature_acronym | NUTS |
3 | is_official | true |
4 | URL | http://www.example.org/nomenclatures/n/2019.pdf |
5 | version_name | 2019 |
6 | version_start | 2020-01-01 |
7 | version_previous | 2015 |
8 | similar_to | |
9 | derived_from | |
10 | dimension | |
11 | level | |
The expected cells in this sheet are detailed below:
nomenclature_name
MANDATORY if this nomenclature is not supported yet. OPTIONAL for new versions of an already supported nomenclature.
Specifies the full name of the nomenclature.
nomenclature_acronym
MANDATORY
Specifies the acronym used for the nomenclature. For new versions of supported nomenclatures, the value of this cell must correspond to a supported nomenclature acronym (see Appendix B).
is_official
MANDATORY
Shows whether this nomenclature (version) is official. Official nomenclatures are standardized and legally approved by regulations on statistical units. For example, the NUTS nomenclature is official because it is approved by the European Commission and published in the Official Journal of the European Union. If the nomenclature is not part of a standard, it cannot be considered official.
URL
MANDATORY
Specifies the link to the on-line publication of documents describing the nomenclature (version).
version_name
MANDATORY
Specifies the name/title used to denote the nomenclature version described by the package.
version_start
MANDATORY
Specifies the date when this nomenclature version comes into force.
version_previous
MANDATORY if the package describes a new version of a nomenclature already supported by the Database. OPTIONAL if it describes a new original nomenclature of statistical units; in that case, this cell must be left blank.
Specifies the name/title of the version that chronologically precedes the described one.
similar_to
MANDATORY only if this is a new (original) nomenclature AND its structure is similar to that of another supported nomenclature. OPTIONAL in other cases; the cell should then be left blank.
This field references an already supported nomenclature with a similar structure: for example, the EFTA (European Free Trade Association, see http://www.efta.int/ for further information) and CC (Candidate Countries) nomenclatures are similar to NUTS.
The expected pattern for a similar_to item is an already supported nomenclature acronym and its version, separated by a pipe "|" character. For example, if the EFTA nomenclature version 2003 were introduced into the database for the first time, this cell would have the value NUTS|2003.
Several similar nomenclature items can be listed, separated by commas. Example: NUTS|2003,NUTS|2006.
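The similar_to pattern (acronym and version separated by a pipe, items separated by commas) can be read with a few lines of Python. This is only a sketch: the function `parse_similar_to` is hypothetical, and converting the acronym to upper case follows the general rule on acronyms given earlier in this document.

```python
def parse_similar_to(cell: str) -> list[tuple[str, str]]:
    """Split a similar_to cell such as 'NUTS|2003,NUTS|2006' into (acronym, version) pairs.

    An empty cell yields an empty list.
    """
    items = []
    if not cell:
        return items
    for item in cell.split(","):
        acronym, separator, version = item.partition("|")
        if not separator or not acronym or not version:
            raise ValueError(f"expected 'ACRONYM|version', got {item!r}")
        items.append((acronym.upper(), version))
    return items


print(parse_similar_to("NUTS|2003,NUTS|2006"))
```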
derived_from
Used when the nomenclature is derived, i.e. when its units are built upon the units of other supported nomenclature(s). The value must refer to the nomenclature and the version from which the described nomenclature is produced. For example, in the NUTS2-3 nomenclature, the units were produced from the NUTS nomenclature, so this is specified as NUTS_1989 (for the NUTS of version 1989). If this is not a derived nomenclature, this cell must be left blank.
If the described nomenclature derives from more than one nomenclature, each of them must be mentioned on a separate row, repeating the label in the "A" column and giving the reference to the source nomenclature in the "B" column.
dimension
MANDATORY only for new (original) nomenclatures OR for a nomenclature version that brings changes to the hierarchy of levels. OPTIONAL in other cases; the cell must then be left blank.
If set, this property must be repeated as many times as there are dimensions in the hierarchical structure of the nomenclature. After each dimension row, there must be rows that declare the levels corresponding to this dimension. After them, another dimension row can appear to introduce the next dimension. Please see the example of the file for the UNEP (United Nations Environment Programme) nomenclature below.
The order in which this property appears in the table is relevant: dimensions with higher priority must appear above dimensions with lower priority. The priority of a dimension is defined by its importance in the list of all dimensions. For example, in the UNEP nomenclature, the geographical dimension has higher priority than the political one.
level
MANDATORY (at least one level) IF the dimension property is set. OPTIONAL only after each dimension property OR after another level property.
Introduces the identifier of a nomenclature level. All the levels belonging to the same dimension must be listed after the respective dimension property, beginning with the most general level and ending with the most detailed one.
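The ordering rules for the dimension and level rows can be sketched in Python. The function `parse_dimensions` is hypothetical and only illustrates one possible reading of the rules: rows are given as (label, value) pairs taken from columns A and B, and a level row is accepted only directly after a dimension row or another level row.

```python
def parse_dimensions(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect the levels declared under each dimension row of a version sheet.

    The returned dict keeps the declaration order, i.e. the priority order
    of the dimensions and the general-to-detailed order of the levels.
    """
    dimensions: dict[str, list[str]] = {}
    current = None
    for label, value in rows:
        label = label.lower()  # header labels are case insensitive
        if label == "dimension":
            if value in dimensions:
                raise ValueError(f"dimension declared twice: {value!r}")
            current = value
            dimensions[current] = []
        elif label == "level":
            if current is None:
                raise ValueError(f"level {value!r} does not follow a dimension or level row")
            dimensions[current].append(value)
        else:
            current = None  # any other property ends the current dimension block
    for name, levels in dimensions.items():
        if not levels:
            raise ValueError(f"dimension {name!r} declares no level")
    return dimensions


unep_rows = [
    ("derived_from", "ISO-3166"),
    ("dimension", "geographical"),
    ("level", "Global"), ("level", "Region"), ("level", "Subregion"), ("level", "National"),
    ("dimension", "political"),
    ("level", "Sovereign"), ("level", "National"),
]
print(parse_dimensions(unep_rows))
```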
Since revision 0.10 of these specifications, the definitions of dimension and level have been revised, and the levels_count and dimensions_count fields have been removed. Please consult Appendix A for further information.
To give more detailed examples, the following subsections propose two complete "version" sheet layouts covering different nomenclatures.
The following layout describes the 2006 version of the NUTS nomenclature, as if it were not yet supported by the Database while the previous NUTS versions were already supported.
Table 2.4. version sheet introducing NUTS 2006 nomenclature
Row | A | B |
---|---|---|
1 | nomenclature_name | |
2 | nomenclature_acronym | NUTS |
3 | is_official | true |
4 | URL | http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2007:039:0001:0037:EN:PDF |
5 | version_name | 2006 |
6 | version_start | 2008-01-01 |
7 | version_previous | 2003 |
The nomenclature_name property is empty because this nomenclature is a new version of the NUTS nomenclature, already supported by the Database.
The similar_to property is empty because the NUTS nomenclature is a base one (other nomenclatures are similar to NUTS, and in principle this relationship is reversible).
The derived_from property is empty because the NUTS nomenclature is not derived from any other.
In the NUTS 2006 version, there have not been any changes to the hierarchical structure of the nomenclature, so it is not necessary to specify the properties that may follow the derived_from one.
The next layout introduces the 2006 version of the UNEP nomenclature, as if it were not yet supported by the Database. It would be the first supported version of this nomenclature.
Table 2.5. version sheet introducing UNEP 2006 nomenclature
Row | A | B |
---|---|---|
1 | nomenclature_name | United Nations Environment Programme |
2 | nomenclature_acronym | UNEP |
3 | is_official | false |
4 | URL | http://geodata.grid.unep.ch/extras/geosubregions.php |
5 | version_name | 2006 |
6 | version_start | 2006-01-01 |
7 | version_previous | |
8 | similar_to | |
9 | derived_from | ISO-3166 |
10 | dimension | geographical |
11 | level | Global |
12 | level | Region |
13 | level | Subregion |
14 | level | National |
15 | dimension | political |
16 | level | Sovereign |
17 | level | National |
The is_official property is set to false because, although this nomenclature is widely used by the UNEP, no official documentation detailing it was found, apart from the list of units available at the cited URL.
The version_previous property is empty because this is the very first available version of the nomenclature.
The similar_to property is empty because no other known nomenclature uses exactly the same structure.
The derived_from property is set to ISO-3166 because, at the National level, the UNEP nomenclature uses exactly the same units as those referenced by the ISO-3166 standard, including the numeric unit codes.
Then two dimensions are set:
one dimension named geographical, composed of the following levels: Global, Region, Subregion, National;
one dimension named political, composed of the following levels: Sovereign, National.
The geographical dimension is described before the political one because it has higher priority. The levels of each dimension are described from the most to the least general. The values used to set the |
The units sheet gives the full and exhaustive list of territorial/statistical units composing the nomenclature.
This sheet must respect the layout shown in Table 2.6 (examples of values are given in italics).
Details on the columns are given below:
unit_code
MANDATORY
Contains the codes of statistical units. The expected data type is a character string.
is_territorial
OPTIONAL
A boolean value that shows whether the statistical unit of this row has a territory. Some statistical units are not associated with a particular geographic area, for example the NUTS units having the codes **Z, **ZZ and **ZZZ. For the units that do not have their own geographic area, the value in this column must be false. The expected data type is a boolean.
Only false values in the |
Data constraints. Unit codes are identifiers of statistical units. They cannot be duplicated in this sheet.
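The uniqueness constraint on unit codes can be verified with a short sketch such as the one below. The function name is hypothetical. Codes are compared in upper case because this document states that unit codes are not case sensitive.

```python
from collections import Counter


def duplicated_unit_codes(codes: list[str]) -> list[str]:
    """Return the unit codes that appear more than once, compared case-insensitively."""
    counts = Counter(code.upper() for code in codes)
    return sorted(code for code, count in counts.items() if count > 1)


print(duplicated_unit_codes(["BE", "BE1", "be", "BE10"]))
```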
The names sheet contains the list of the names of all statistical units referenced in the units sheet. This sheet is necessary because each statistical unit can have more than one name (official or unofficial) in more than one language. This multiplicity is very difficult to lay out on a single sheet.
Each statistical unit must have at least one name specified. This helps users identify the unit without consulting the nomenclature documentation: the code of a unit may convey no information outside the nomenclature, while the name is more commonly used in everyday life.
The names sheet must respect the layout shown in Table 2.7 (examples of values are given in italics).
Table 2.7. names sheet layout example
Row | A | B | C | D |
---|---|---|---|---|
1 | unit_code | unit_name | is_official | language_code |
2 | BE | BELGIQUE | true | FR |
3 | BE | BELGIE | true | NL |
4 | BE | BELGIUM | false | EN |
5 | BE1 | RÉGION DE BRUXELLES-CAPITALE | true | FR |
6 | BE1 | BRUSSELS HOOFDSTEDELIJK GEWEST | true | NL |
The purpose of each column is detailed below:
unit_code
MANDATORY
References the code of a statistical unit present in the units sheet (see Section 2.3). All statistical units must have at least one name specified, that is to say, every code from the units sheet must be associated with at least one name. The expected value in this column is a character string that is also present in the units sheet.
unit_name
MANDATORY
Contains the names of statistical units.
is_official
OPTIONAL
Shows whether the name in the previous column is an official name of the statistical unit. Empty values are considered as true. There may be more than one official name per unit, for example when the unit is officially named in several languages. Each of the official names of the unit must be presented on a separate row. The expected value is a boolean: true or an empty cell for official names, false for non-official ones. For example, the official names of the regions of the European Union for the NUTS 2006-EU27 nomenclature are available on the Eurostat Web Site [3].
language_code
MANDATORY
Specifies the language in which the name is given. The expected value is a character string of two letters, in upper or lower case, corresponding to the ISO-639-1 code of the respective language. The official publication of the ISO-639-1 standard can be found on the ISO site. The full updated list of codes and language names is publicly available on Wikipedia.
Data constraints:
According to this layout, no more than one official name and no more than one unofficial name can be specified for a statistical unit in the same language. Any combination of unit_code + is_official + language_code is unique in the table.
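The constraints on the names sheet can be sketched as a small Python check. The helper names are hypothetical and the messages are illustrative. The sketch treats an empty is_official cell as true, compares codes and language codes in upper case, and reports units without any name.

```python
def parse_is_official(cell: str) -> bool:
    """An empty cell or 'true' means official; 'false' means not official."""
    literal = cell.strip().lower()
    if literal in ("", "true"):
        return True
    if literal == "false":
        return False
    raise ValueError(f"not a boolean literal: {cell!r}")


def check_names(unit_codes, name_rows):
    """Return a list of problems found in rows of (unit_code, unit_name, is_official, language_code)."""
    known = {code.upper() for code in unit_codes}
    seen, named, problems = set(), set(), []
    for unit_code, unit_name, is_official, language_code in name_rows:
        code = unit_code.upper()
        key = (code, parse_is_official(is_official), language_code.upper())
        if code not in known:
            problems.append(f"{code}: not present in the units sheet")
        if key in seen:
            problems.append(f"{code}: duplicated unit_code + is_official + language_code")
        seen.add(key)
        named.add(code)
    problems.extend(f"{code}: no name specified" for code in sorted(known - named))
    return problems


rows = [
    ("BE", "BELGIQUE", "true", "FR"),
    ("BE", "BELGIE", "true", "NL"),
    ("BE", "BELGIUM", "false", "EN"),
    ("BE1", "RÉGION DE BRUXELLES-CAPITALE", "true", "FR"),
    ("BE1", "BRUSSELS HOOFDSTEDELIJK GEWEST", "true", "NL"),
]
print(check_names(["BE", "BE1"], rows))
```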
The hierarchy sheet describes the hierarchy of statistical units inside the nomenclature. If the nomenclature does not have a hierarchy of units, this sheet is not relevant.
Hierarchies of units depend on nomenclatures. The general structure of the hierarchy is defined in the version sheet. The hierarchy sheet must refer to the values present in the version sheet.
Generally, a statistical unit has one super-unit. For example, in the NUTS nomenclature, any unit of the NUTS3 level has a super-unit of the NUTS2 level; NUTS2 units have parents at the NUTS1 level, etc.
In some particular cases, a statistical unit can have more than one super-unit. This happens when the nomenclature defines several dimensions which cannot be merged. For example, the UNEP nomenclature [4] defines three levels in the geographical dimension (Regional, Sub-regional and National) and two levels in the political dimension (Sovereign and National). The lowest level (National) is shared between the two dimensions, and one unit code can have one super-unit code for the geographical dimension and another super-unit code for the political dimension.
The units of the top level of the hierarchy do not have super-units and must reference themselves in the table.
The hierarchy sheet must respect the layout detailed in Table 2.8 (examples of values are given in italics).
For this example, we suppose that:
the version sheet specifies a single default dimension for the mono-dimensional hierarchy used;
the levels 0, 1, 2 and 3 are specified (from top to bottom) in the version sheet for the default dimension.
Table 2.8. hierarchy sheet layout example
Row | A | B | C | D |
---|---|---|---|---|
1 | unit_code | dimension | level | super_unit_code |
2 | BE | | 0 | BE |
3 | BE1 | | 1 | BE |
4 | BE10 | | 2 | BE1 |
5 | BE100 | | 3 | BE10 |
The purpose of each column is detailed below:
unit_code
MANDATORY
References the code of a statistical unit present in the units sheet (see Section 2.3). In a hierarchical nomenclature, all statistical units must have at least one super-unit specified, that is to say, every code from the units sheet must be associated with at least one super-unit. Top-level units reference themselves. The expected value in this column is a character string that is also present in the units sheet.
dimension
OPTIONAL
If not blank, references a dimension defined in the version sheet (see Section 2.2.2). If the nomenclature is mono-dimensional, the cells in this column can be left blank. Otherwise, the expected value must match one of the values of the dimension fields present in the version sheet.
level
MANDATORY
Specifies the level to which the unit belongs. The expected value must match one of the values of the level fields defined in the version sheet for the dimension given in the previous column.
super_unit_code
MANDATORY
Specifies the code of the statistical unit that is immediately superior to the one specified in the unit_code column. The expected value in this column is a character string that is also present in the units sheet.
The dimension and level labels have replaced the previous level_dimension and unit_level labels since revision 0.10 of these specifications. Please consult Appendix A for further information.
Data constraints. Any combination of unit_code + dimension is unique in the table.
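The rules on the hierarchy sheet can be sketched as follows. The function `check_hierarchy` is hypothetical and covers only the constraints stated in this section: codes must exist in the units sheet, every unit must have at least one super-unit row, and each unit_code + dimension pair must be unique. It does not verify levels against the version sheet.

```python
def check_hierarchy(unit_codes, hierarchy_rows):
    """Return a list of problems found in rows of (unit_code, dimension, level, super_unit_code)."""
    known = {code.upper() for code in unit_codes}
    seen, placed, problems = set(), set(), []
    for unit_code, dimension, level, super_unit_code in hierarchy_rows:
        code, parent = unit_code.upper(), super_unit_code.upper()
        if code not in known:
            problems.append(f"{code}: unit_code not present in the units sheet")
        if parent not in known:
            problems.append(f"{code}: super_unit_code {parent} not present in the units sheet")
        if (code, dimension) in seen:
            problems.append(f"{code}: duplicated unit_code + dimension")
        seen.add((code, dimension))
        placed.add(code)
    problems.extend(f"{code}: no super-unit specified" for code in sorted(known - placed))
    return problems


hierarchy = [
    ("BE", "", "0", "BE"),      # top-level unit references itself
    ("BE1", "", "1", "BE"),
    ("BE10", "", "2", "BE1"),
    ("BE100", "", "3", "BE10"),
]
print(check_hierarchy(["BE", "BE1", "BE10", "BE100"], hierarchy))
```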
The changes sheet is mandatory only if the nomenclature to integrate represents a new version of a nomenclature already supported by the Database. This sheet tracks the changes that characterize the new version as compared with the previous one.
Before introducing the template of the sheet, the typology of the evolution of statistical units is presented here in order to explain the conventions to use.
The typology of statistical unit changes is exhaustively described in [2]. The present section gives a brief introduction to it, with some adaptation to the ESPON database implementation.
Generally, three types of changes or events may occur to a statistical unit during its evolution in a nomenclature: an existential change, a territorial change or a non-territorial one.
For each significant type of unit change, the following sections specify the label to use in the changes sheet.
Existential changes concern life events in a unit's history. They occur when a statistical unit is created or terminated.
A statistical unit is considered to be created when it appears for the first time in the list of statistical units of the nomenclature. It is considered to be terminated when it disappears from this list.
The creation of a unit may be absolute. This is the case when a new territory is added to the nomenclature, where no units existed before, or when it is impossible to determine which units of the previous version served as ancestors for the new unit. This type of the unit's creation must be labelled "new unit".
It can also be relative. It occurs when the new unit results from the modifications applied to the units that already existed in the previous version of the nomenclature.
The termination of a unit may be absolute. This is the case when a part of territory has been excluded in the new version of the nomenclature, or when it is not possible to determine which units appear in the new version in the area formerly occupied by the unit that was terminated.
It can also be relative. It occurs when new units are created in the area of the previous one, that is to say that the disappearing unit is an ancestor for one or more unit being created in the new version.
Territorial changes occur when the territory of a unit is modified, but the unit still exists in both versions of the nomenclature. It continues to be considered the same entity.
Like existential changes, territorial ones can be absolute or relative.
An absolute territorial change occurs when the area covered by the unit is completely shifted, without intersecting the previous territory. This case is very rare, but can occur theoretically.
A relative territorial change occurs when modifications are made to the unit and its neighbors, so that its bounds are no longer the same in the new version.
Non-territorial changes occur when the unit is still considered the same entity as in the previous version, but its name or code changes in the new version. These changes are not accompanied by modifications of the unit's area.
A code change usually happens when the nomenclature undergoes a harmonization after a series of modifications in previous versions.
A name change can happen between any versions and may be caused by different factors of historical, political, economic or other contexts.
Code and name changes may occur separately or simultaneously. In the first case, they must be labelled "code change" and "name change" respectively. In the second case, they must be labelled "code and name change".
Code and name changes are tracked only if they represent independent events, happening without existential changes of the unit. In fact, an existential change already implies a code and a name creation and termination.
Following the nomenclature of territorial units events ([2], page 148), the changes of statistical units may be represented as follows.
Merge changes happen when two or more units of the previous version are merged into another one. They must be labelled with the "merge" keyword.
The resulting unit may be a new unit:
GU1 merges into GU3
GU2 merges into GU3
The resulting unit may be one of the units that participated in the merge:
GU2 merges into GU1
GU1 is changed by merge
Split changes happen when a unit is divided into two or more units in the new version. They must be labelled with the "split" keyword.
Several cases of split change are described below:
A split change can cause the original unit's termination:
GU3 is split into GU1
GU3 is split into GU2
If the original unit is not terminated in the new version, this is a case of extraction:
GU3 is changed by split
GU3 is split into GU4
Redistribution changes happen when at least one original unit continues its existence or disappears and partial territorial changes are made to all the units concerned. These cases are mixed situations of merges and splits, in which it is difficult or impossible to define the main characteristics of the change. These changes must be labelled as "redistributed".
If a territorial change happens to two or more units and at least one of them disappears or at least one new unit appears in the new version, this is a reallocation change:
GU1 terminates by reallocation
GU2 changes by reallocation
GU3 is created from reallocation
If the territory of two or more units has been revised in the new version, but no unit has disappeared or appeared, this is a rectification change:
GU1 changes by rectification
GU2 changes by rectification
The case-by-case analysis made in the previous section makes it possible to define a pattern for tracing the evolution of a nomenclature. The example below cites changes observed between the NUTS 2006 and 2010 versions.
The changes sheet must respect the layout shown in Table 2.9 (examples of values are given in italics).
To be parsed and taken into account, the |
Table 2.9. changes sheet layout example
 | A | B | C | D | E |
1 | unit_code_previous | unit_level_previous | unit_code_this | unit_level_this | change |
2 | DE41 | 2 | DE40 | 2 | merge |
3 | DE42 | 2 | DE40 | 2 | merge |
4 | GR | 0 | EL | 0 | code change |
5 | IE024 | 3 | IE024 | 3 | name change |
6 | GRZ | 1 | ELZ | 1 | code change |
7 | GRZ | 1 | ELZ | 1 | name change |
8 | ITC45 | 3 | ITC4C | 3 | split |
9 | ITC45 | 3 | ITC4D | 3 | split |
Details on the columns are given below:
unit_code_previous
MANDATORY (except for a new unit, e.g. when the change cell value is one of created, created from merge or created from redistribution).
References the code of the changing unit of the previous nomenclature version. It must be one of the unit codes already present in the version of the nomenclature supported by the Database.
unit_level_previous
MANDATORY
References the level label of the changing unit of the previous nomenclature version. It must be one of the unit levels already present in the version of the nomenclature supported by the Database. An empty value references the default level of a non-hierarchical nomenclature.
unit_code_this
MANDATORY (except for a termination, e.g. when the change cell value is one of termination, split, or redistribution).
Contains the code of the unit of the described version of nomenclature. It must also be present in the units sheet (see Section 2.3).
unit_level_this
MANDATORY
References the level label of the changing unit in this nomenclature version. It must be one of the unit levels declared in the version sheet. An empty value references the default level of a non-hierarchical nomenclature.
change
MANDATORY
Contains the literal (case-insensitive) corresponding to the type of the change produced, previously described in the subsections on the typology of changes. Two simultaneous changes (example: the code change and the name change) must be mentioned on two lines (see lines 6 and 7 in Table 2.9). The expected literals are:
created
New unit creation event.
created from merge
New unit creation from merge event.
created from split
New unit creation from split event.
created from redistribution
New unit creation from territorial redistribution event.
name change
Name change event.
code change
Code change event.
territory change
Territorial change event.
territory merge
Territorial change event caused by a merge.
territory split
Territorial change event caused by a split.
territory redistribution
Territorial change event caused by a redistribution.
termination
Unit termination event.
merge
Unit termination by merge event.
split
Unit termination by split event.
redistribution
Unit termination by redistribution event.
Data constraints. Any combination of unit_code_previous + unit_code_this + change is unique in this table.
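The rules above (a closed list of change literals, case-insensitive matching, and uniqueness of the code/code/change combination) can be checked mechanically. The following is a minimal sketch, not the analyzer actually used by the Database: the function name `validate_changes` and the representation of each row as a dictionary keyed by the column headers of Table 2.9 are assumptions made for illustration.

```python
# Literals listed in the description of the "change" column.
ALLOWED_CHANGES = {
    "created", "created from merge", "created from split",
    "created from redistribution", "name change", "code change",
    "territory change", "territory merge", "territory split",
    "territory redistribution", "termination", "merge", "split",
    "redistribution",
}


def validate_changes(rows):
    """Return a list of error messages for rows of the changes sheet.

    Each row is a dict keyed by the column headers of Table 2.9
    (hypothetical in-memory representation of the sheet).
    """
    errors = []
    seen = set()
    # Sheet row 1 holds the headers, so data rows are numbered from 2.
    for number, row in enumerate(rows, start=2):
        change = (row.get("change") or "").strip().lower()
        if change not in ALLOWED_CHANGES:
            errors.append(
                f"row {number}: unknown change literal {row.get('change')!r}"
            )
            continue
        key = (row.get("unit_code_previous"), row.get("unit_code_this"), change)
        if key in seen:
            errors.append(f"row {number}: duplicate combination {key}")
        seen.add(key)
    return errors
```

Because the literal is part of the uniqueness key, a code change and a name change affecting the same pair of codes (lines 6 and 7 of Table 2.9) are accepted as two distinct rows.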
The equivalence sheet is optional and specifies the units of the nomenclature that have equivalences in other nomenclatures supported by the Database. These links with other nomenclatures are used to make automatic conversions of statistical data between nomenclatures.
An equivalence is a particular case of a derivation, whose relationship value is |
This sheet is valid only if the referenced nomenclature and its version are supported by the Database.
It is not necessary to establish equivalence links with the units of other versions of the same nomenclature. This will be automatically done during the integration of the nomenclature into the library.
Two statistical units are considered to be equivalent if they are equal semantically (a country referenced by a nomenclature may be equivalent to a country in another one, but not to a group of countries) and geographically (both the units have the same shape and boundaries).
The equivalence sheet must respect the layout shown in Table 2.10 (examples of values are given in italics).
Table 2.10. Equivalence sheet layout example
 | A | B | C | D | E | F |
1 | unit_code | unit_level | equivalent_unit_code | equivalent_unit_level | equivalent_nomenclature | equivalent_version |
2 | BE | 0 | BE | 1 | ISO3166 | 2006 |
3 | BE | 0 | W11122 | 5 | WUTS | 2007 |
4 | BE | 0 | 88 | 3 | UNEP | 2006 |
Details on the columns are given below:
unit_code
MANDATORY
References the code of the statistical unit present in the units sheet. Theoretically, a statistical unit can be equivalent to any number of statistical units in other nomenclatures. The expected value in this column is a character string that is also present in the units sheet (see Section 2.3).
unit_level
MANDATORY
References the level of the statistical unit, as present in the version sheet.
An empty value references the default level of a non-hierarchical nomenclature.
equivalent_unit_code
MANDATORY
References the code of the equivalent unit valid in another nomenclature. The expected value in this column must be a statistical unit code already registered by the Database.
equivalent_unit_level
MANDATORY
References the level of the equivalent unit. An empty value references the default level of a non-hierarchical nomenclature.
equivalent_nomenclature
MANDATORY
References the nomenclature to which the unit code in the previous column belongs. The expected value in this column must be an acronym of a nomenclature supported by the Database [1].
equivalent_version
MANDATORY
References the version of the nomenclature mentioned in the previous column, where the equivalence between the units is established. The expected value must be an identifier of a nomenclature version supported by the Database [1].
Data constraints. Any combination of unit_code + equivalent_nomenclature + equivalent_version is unique in this table.
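To illustrate how equivalence links can support the conversion of statistical data between nomenclatures, the sketch below re-keys a set of values from the described nomenclature to the equivalent codes of a target nomenclature version. It is only an illustration under stated assumptions: the function name `convert_values`, the dictionary representation of the rows of Table 2.10 and the sample indicator values are hypothetical, and unit levels are ignored for brevity.

```python
def convert_values(values, equivalences, nomenclature, version):
    """Re-key values to the equivalent unit codes of a target nomenclature.

    values       -- dict mapping unit codes of the described nomenclature
                    to indicator values (hypothetical sample data)
    equivalences -- rows of the equivalence sheet, as dicts keyed by the
                    column headers of Table 2.10
    Units without an equivalence in the target version are left out.
    """
    # The data constraint guarantees at most one row per unit code for a
    # given target nomenclature and version, so a plain dict is sufficient.
    mapping = {
        row["unit_code"]: row["equivalent_unit_code"]
        for row in equivalences
        if row["equivalent_nomenclature"] == nomenclature
        and row["equivalent_version"] == version
    }
    return {
        mapping[code]: value
        for code, value in values.items()
        if code in mapping
    }
```

With the rows of Table 2.10, a value attached to BE would be re-keyed to 88 when the target is UNEP 2006, while a unit with no declared equivalence would simply be dropped from the result.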
The derivations sheet contains information about the units of other nomenclatures that have been used to create a new nomenclature. This sheet can exist only for derived nomenclatures (produced using statistical units of other nomenclatures) and is optional. Establishing links with the original units allows the application to convert data between related nomenclatures.
Three types of links can be established between the original and the derived units in the current version of the specification. These types must be indicated in the derivations sheet. They are the following:
The derived unit can be a union of two or more original units, representing an aggregation or a generalization of the original units. In this case (parent relationship), the label "INCLUDES" must be used.
The derived unit can be equal to the original. The label to use is "EQUALS".
Two or more derived units can represent parts of the original unit, such that the union of the derived units is equal to the original one. In this case (child relationship), the label to use is "INCLUDED".
The derivations sheet must respect the layout shown in the Table 2.11 example. In this example, the original units (BE, NL and LU) are used by a derived nomenclature to form an aggregated unit with the code BENELUX in this custom nomenclature. The aggregated unit can be characterized as a super-structure, a parent of the original units, existing inside the described nomenclature.
Table 2.11. Derivations sheet layout example
 | A | B | C | D | E | F | G |
1 | unit_original_code | unit_original_level | unit_original_nomenclature | unit_original_version | unit_derived_code | unit_derived_level | relationship |
2 | BE | 0 | NUTS | 2003 | BENELUX | | includes |
3 | NL | 0 | NUTS | 2003 | BENELUX | | includes |
4 | LU | 0 | NUTS | 2003 | BENELUX | | includes |
The lines of the tables must be read and understood as follows:
the unit encoded |
Details of columns are given below:
unit_original_code
MANDATORY
Contains the code of the original unit used to produce the unit of the described nomenclature. The original unit must be already registered in the Database and its code must be valid.
unit_original_level
MANDATORY
Contains the level label of the original unit used to produce the unit of the described nomenclature. The original level must be already registered in the Database and its code must be valid. An empty value references the default level of a non-hierarchical nomenclature.
unit_original_nomenclature
MANDATORY
Contains the literal of the nomenclature where the original unit exists. This column is added to the file layout because a custom nomenclature can derive from more than one other nomenclature.
unit_original_version
MANDATORY
Contains the title of the version of the nomenclature where the original unit exists. This column is added to the file layout because a custom nomenclature can derive from more than one other nomenclature.
unit_derived_code
MANDATORY
Must contain the code of the unit from the described nomenclature version (it must be present in the units sheet, see Section 2.3).
unit_derived_level
MANDATORY
Contains the level label of the present unit. An empty value references the default level of a non-hierarchical nomenclature.
relationship
MANDATORY
Characterizes the link between the units of the currently described nomenclature (unit_derived_code column) and the original units identified by columns A to D (original unit code, level, nomenclature and version).
The possible values in this relationship column are:
INCLUDES - if the current derived unit fully includes (is a parent of) the original unit.
EQUALS - if the current derived unit and the original unit are equal.
INCLUDED - if the current derived unit is included in (is a child of) the original unit.
INTERSECTS - if the derived unit and the original one intersect without full inclusion or equality.
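As an illustration of how the relationship column can be exploited, the sketch below groups the original units by the derived unit that includes them, which is the information needed to rebuild an aggregate such as BENELUX. The function name `components_by_derived_unit` and the dictionary representation of the rows are hypothetical. Matching the relationship literal case-insensitively is also an assumption, made here only because Table 2.11 shows the value in lower case while the list above uses upper case.

```python
# Values listed for the "relationship" column.
RELATIONSHIPS = {"INCLUDES", "EQUALS", "INCLUDED", "INTERSECTS"}


def components_by_derived_unit(rows):
    """Map each derived unit code to the original unit codes it includes.

    rows -- rows of the derivations sheet, as dicts keyed by the column
            headers of Table 2.11 (hypothetical in-memory representation).
    Raises ValueError if a row carries an unknown relationship value.
    """
    components = {}
    for row in rows:
        # Assumption: the literal is compared without regard to case.
        relationship = row["relationship"].strip().upper()
        if relationship not in RELATIONSHIPS:
            raise ValueError(f"unknown relationship: {row['relationship']!r}")
        if relationship == "INCLUDES":
            derived = row["unit_derived_code"]
            components.setdefault(derived, []).append(row["unit_original_code"])
    return components
```

Applied to the three rows of Table 2.11, this sketch would associate BENELUX with the original units BE, NL and LU, in the order in which they appear in the sheet.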
This sheet is optional. It has been integrated to these specifications for the particular case of the UMZ nomenclature.
The "where" filter of the Search Query in the ESPON Database Portal allows the user to select a study area based on the names of the countries (example: "FR", "UK") or on a set of countries (example: EU 28).
In the NUTS nomenclatures, the prefix of a unit code gives the code of the country the unit belongs to (example: "FR123" is in France). In the UMZ nomenclature, the unit code is an integer: the study_area sheet therefore associates a country code with each unit code.
Consequently, each unit code of the nomenclature can be associated with a study area.
This sheet must respect the layout shown in Table 2.12.
Details on the columns are given below:
Data constraints. Unit codes are identifiers of statistical units. They cannot be duplicated in this sheet. The country codes must be given with 2 characters.
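The difference between the two situations can be sketched as follows: for NUTS-like codes the country is read from the code prefix, whereas for integer codes such as those of UMZ it has to be looked up in the content of the study_area sheet. This is a minimal sketch only. The function names, the representation of the sheet as (unit code, country code) pairs and the sample UMZ codes used in the usage note are all hypothetical, since the column headers of the sheet are not reproduced here.

```python
def build_study_area(pairs):
    """Build a unit code -> country code lookup from (unit, country) pairs.

    Enforces the data constraints of the sheet: a unit code appears only
    once and a country code has exactly 2 characters.
    """
    mapping = {}
    for unit_code, country_code in pairs:
        if unit_code in mapping:
            raise ValueError(f"duplicated unit code: {unit_code!r}")
        if len(country_code) != 2:
            raise ValueError(f"country code must have 2 characters: {country_code!r}")
        mapping[unit_code] = country_code
    return mapping


def country_of(unit_code, study_area=None):
    """Return the country code of a unit.

    Without a study_area lookup, the code is assumed to be NUTS-like and
    the country is taken from its two-character prefix.
    """
    if study_area is not None:
        return study_area[unit_code]
    return unit_code[:2]
```

For instance, `country_of("FR123")` relies on the prefix only, while a call such as `country_of("1001", build_study_area([("1001", "FR")]))` relies on the lookup, with "1001" standing in for an arbitrary integer unit code.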
The geometries.zip file must contain the geometries of all the units of the nomenclature that have a territorial representation. This file is mandatory. It is an archive wrapping the several files required by the ESRI Shapefile format (see below). This format is mandatory for nomenclature geometries in the context of the present specification.
Each geometry present in the shapefile must be associated with an attribute corresponding to the code of the statistical unit. The name of the attribute is unit_code.
The geometries_scale property in the about sheet (nomenclature.xls file) must detail the level of generalization of the geometries defined in the present file.
Data constraints. Each geometry must correspond to a tuple unit_code + is_territorial in the units.xls file, where the value of the is_territorial property is true.
According to the Wikipedia Shapefile page (last visit: 2011-10-27), the description of a geometry in the ESRI shapefile format must be composed of the following set of files:
Mandatory files:
.shp: shape format; the feature geometry itself.
.shx: shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly.
.dbf: attribute format; columnar attributes for each shape, in dBase IV format.
Optional files:
.prj: projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format.
Other optional files, with their specific extensions, also exist.
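Before submitting a package, the presence of the mandatory shapefile components in the archive can be verified with a few lines of code. The sketch below uses only the Python standard library; the function name `missing_shapefile_parts` is hypothetical and the check is limited to file extensions (it does not verify that the files share the same base name, nor that the unit_code attribute exists).

```python
import zipfile
from pathlib import PurePosixPath

# Mandatory components of an ESRI shapefile, as listed above.
MANDATORY_EXTENSIONS = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(archive):
    """Return the sorted list of mandatory extensions absent from the archive.

    archive -- path to the zip file, or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Member names inside a zip archive always use "/" as separator.
        present = {PurePosixPath(name).suffix.lower() for name in zf.namelist()}
    return sorted(MANDATORY_EXTENSIONS - present)
```

An empty list means that the three mandatory files are present; otherwise the returned extensions indicate which files still have to be added to geometries.zip.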
This appendix describes the main changes and evolutions of these specifications, regarding mainly the expected content of the nomenclature description xls file.
Revision 0.16 (2014-07-23)
The ESPON NUTS 2010 extended Nomenclature has been updated to take into account the changes in the official Eurostat publication for NUTS 2010-2013 [7]. In this document, the spatial units in Croatia identified by the codes HR01 and HR02 have been merged into a single spatial unit whose code is now HR04.
As listed in Table A.1, the codes of the spatial units at the sub-level of HR01 and HR02 are now prefixed with HR04.
Previous version | | Updated version (July 2014) | |
---|---|---|---|
unit code | unit name | unit code | unit name |
HR01 | Sjeverozapadna Hrvatska | | |
HR02 | Središnja i Istočna (Panonska) Hrvatska | | |
 | | HR04 | Kontinentalna Hrvatska |
HR011 | Grad Zagreb | HR041 | Grad Zagreb |
HR012 | Zagrebačka županija | HR042 | Zagrebačka županija |
HR013 | Krapinsko-zagorska županija | HR043 | Krapinsko-zagorska županija |
HR014 | Varaždinska županija | HR044 | Varaždinska županija |
HR015 | Koprivničko-križevačka županija | HR045 | Koprivničko-križevačka županija |
HR016 | Međimurska županija | HR046 | Međimurska županija |
HR021 | Bjelovarsko-bilogorska županija | HR047 | Bjelovarsko-bilogorska županija |
HR022 | Virovitičko-podravska županija | HR048 | Virovitičko-podravska županija |
HR023 | Požeško-slavonska županija | HR049 | Požeško-slavonska županija |
HR024 | Brodsko-posavska županija | HR04A | Brodsko-posavska županija |
HR025 | Osječko-baranjska županija | HR04B | Osječko-baranjska županija |
HR026 | Vukovarsko-srijemska županija | HR04C | Vukovarsko-srijemska županija |
HR027 | Karlovačka županija | HR04D | Karlovačka županija |
HR028 | Sisačko-moslavačka županija | HR04E | Sisačko-moslavačka županija |
Revision 0.15 (2013-05-31)
The Appendix B has been updated to take into account the reference to the new geographical objects that have been available in the ESPON Database since the June 2013 delivery: UMZ, FUA and MUA nomenclatures.
Revision 0.14 (2013-04-15)
The NUTS and EFTACC nomenclatures are now deprecated; they have been replaced by the extended NUTS nomenclature. The Appendix B has been updated to be consistent with the changes regarding the integration of these "extended NUTS" nomenclatures.
For the integration of the UMZ nomenclature into the database, it has been necessary to add a new sheet entitled study_area to these specifications. This sheet is described in Section 2.9.
Revision 0.13 (2013-01-18)
The Appendix B has been refactored to clearly identify the nomenclatures integrated in the ESPON Database, i.e. the nomenclatures that can be referenced from ESPON TPGs Key Indicators Datasets.
Revision 0.12
Substantial modifications in the sheets entitled Changes, Derivations and Equivalences: in order to avoid ambiguity about the referenced spatial units, the levels must now be mentioned systematically. Indeed, in the NUTS nomenclature versions 1995 and 1999 for example, unit codes may be duplicated across several levels.
Revision 0.11
In the sheet entitled Changes, the list of possible spatial event types has been updated and completed. Moreover, combinations of simultaneous atomic changes (example: code and name change) must be mentioned on several lines, i.e. each line of the table must mention only one atomic change. See Table 2.9 and the possible values for the change column.
In the sheet entitled Derivations, the list of possible values for the relationship column has changed. See Table 2.11 and the possible values for the relationship column.
Revision 0.10
Important changes have affected the layout and content of the nomenclature description xls file since the revision 0.9 of the document:
In the sheet entitled Version, the definition of levels and dimensions has been reviewed. The levels_count and dimensions_count have been removed. Please consult Section 2.2 for the new expected layout.
In the sheet entitled Hierarchy, the column header label level_dimension has been renamed dimension and the column header label unit_level has been renamed level. Please consult Section 2.5 for the new expected layout.
Revision 0.9
In the sheet entitled About, the data_format field has been removed. Please consult Section 2.1 for the new expected layout.
This appendix proposes a set of examples of Nomenclatures Input Packages as attached resources.
This example is an empty template showing the expected input nomenclature description package. It provides an empty nomenclature.xlt description file that is valid in terms of layout and expected labels, but contains no values. It can be used to build a new nomenclature description from scratch.
Available nomenclatures in the ESPON Database:
"Extended" NUTS revisions (including EU 28 and EFTA countries):
Nomenclature_NUTS_extended_1999.zip
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 1999.
Nomenclature_NUTS_extended_2003.zip
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 2003.
Nomenclature_NUTS_extended_2006.zip
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 2006.
Nomenclature_NUTS_extended_2010.zip
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 2010.
Particular NUTS 2/3 2006 Nomenclature.
The following "urban" nomenclatures have been integrated in the ESPON Database since the June 2013 delivery:
Geometries of Urban Morphological Zones (UMZ) for cities over 10 000 inhabitants (4300 statistical units in Europe). This nomenclature covers the ESPON area (EU 28 + EFTA) and Western Balkans.
For the needs of the study area filters of the ESPON Database Portal Search Query interface, an additional
sheet has been added to the UMZ nomenclature definition xls file:
|
Nomenclature for the Functional Urban Area (FUA) geographical object.
Nomenclature for the Morphological Urban Area (MUA) geographical object.
Other nomenclatures given as examples (not intended to be available in the ESPON Database):
The geometries are excluded from these publicly available packages: the expected |
NUTS (EU 27 only) revisions
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 1995.
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 1999.
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 2003.
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 2006.
NUTS (Nomenclature of Territorial Units for Statistics) nomenclature for the delineation revision 2010.
EFTACC (European Free Trade Association and Candidate Countries) revisions:
EFTA&CC (European Free Trade Association And Candidate Countries) nomenclature for the delineation revision 1999.
EFTA&CC (European Free Trade Association And Candidate Countries) nomenclature for the delineation revision 2003. This version is intermediate between 1999 and 2003: it contains all the units of EFTA1999, but none of the Candidate Countries units that were included into the NUTS 2003 nomenclature.
EFTA&CC (European Free Trade Association And Candidate Countries) nomenclature for the delineation revision 2008.
European Union evolutions nomenclatures examples
The following archives propose nomenclature description examples based on the evolution of the European Union from 6 to 27 countries: Nomenclature_EU_6.zip, Nomenclature_EU_10.zip, Nomenclature_EU_11.zip, Nomenclature_EU_13.zip, Nomenclature_EU_15.zip, Nomenclature_EU_25.zip and Nomenclature_EU_27.zip.
Nomenclature_ISO3166-1_2006-VI-10.zip
Based on the ISO 3166 codes for the representation of names of countries and their subdivisions [6], this archive proposes a valid description of this nomenclature according to the present specifications.
UNEP (United Nations Environment Program) nomenclature 2006 example. Based on info available on the UNEP Web Site [4]
[1] ESPON Data and Metadata Specifications. ESPON Database Portal (last visit: 2013-01-18) .
[2] Modèles et méthodes pour l’information spatio-temporelle évolutive. Thèse soutenue le 22 septembre 2011 à l'Université de Grenoble. Full text in PDF (last visit: 2011-09-30) .
[3] Regions in the European Union. Nomenclature of territorial units for statistics NUTS 2006 / EU-27. Edition 2007. ISSN 1977-0375. Full text in PDF (last visit: 2012-03-13) .
[4] UNEP Environmental Data Explorer. Search - Map - Graph - Download. http://geodata.grid.unep.ch/extras/geosubregions.php (last visit: 2012-03-13) .
[5] ESRI Shape File Technical Description. An ESRI White Paper - July 1998. Full text in PDF (last visit: 2012-03-23) .
[6] ISO 3166 Maintenance agency (ISO 3166/MA) ISO's focal point for country codes. http://www.iso.org/iso/country_codes.htm (last visit: 2012-03-30) .
[7] NUTS 2010 - NUTS 2013 (Excel file). Publication (last visit: 2014-07-18) .
This document is part of the ESPON 2013 Database Phase 2 project, also known as M4D (Multi Dimensional Database Design and Development).
It was generated on 2014-12-19 17:32:40 from the sources of the m4d forge imag project at svn rev 2420.
The main authors of this document are Anton Telechev and Benoit Le Rubrus (LIG STeamer), with the collaboration of UMS RIATE and LIG STeamer M4D Partners.
For any comment, question or suggestion, please contact <[email protected]>.
Colophon
Based on DocBook technology [1], this document is written in XML format; the sources are validated with DocBook DTD 4.5CR3, then transformed to HTML and PDF formats using the DocBook xslt 1.73.2 stylesheets.
The generation of the documents is automated thanks to the docbench LIG STeamer project, which is based on Ant [2], Java [3], and the Xalan [4] and FOP [5] processors.
Note that the standard XSLT stylesheets are customized in order to get a better image resolution for admonition icons in the generated PDF output: the generated sizes of these icons were reduced from 30 to 12 pt.
[1] [on line] DocBook.org (last visit: July 2011)
[2] [on line] Apache Ant - Welcome. Version 1.7.1 (last visit: July 2011)
[3] [on line] Developer Resources For Java Technology (last visit: July 2011). Version 1.6.0_03-b05.
[4] [on line] Xalan-Java Version 2.7.1 (last visit: 18 november 2009). Version 2.7.1.
[5] [on line] Apache FOP (last visit: July 2011). Version 0.94.