ESPON 2020 Core Data Updating Strategy

Introduction

The Updating module is another application in the ESPON Data Portal that allows a user to insert the detected virtual changes into the final database. That module is fully integrated in the ESPON Database and it takes profit of other applications that have been developed in another working packages. All the source code can be found in the official git repository in a module called Updating[1] . Two user expert roles can be derived from Updating module and Updating strategy: one role to run server scripts and another role to update indicators in a local machine with Excel Updating Suite.

Required python packages:

  • python == 3.5.2
  • django == 1.11.22
  • psycopg2 == 2.7.1
  • pandas == 0.22
  • xlwings == 0.15.1
  • xlsxwriter == 1.0.5
  • xlrd == 1.1.0.
  • jinja2 == 2.10.1

[1] https://git.unepgrid.ch/espon-egtc/edp/src/branch/update_base_indicator/django-edp/Updating

Updating tables

To make the Updating module work properly, five tables under the updating namespace in the ESPON database are needed: tdta_metadata, tdta_eurostat, tdta_missing_pattern, tdta_nulls y tdta_report. The three first tables are the most important because they store data that is relevant for a correct working of the tool; last two tables are optional

  • tdta_metadata
    • This table contains information that is relevant to a given ESPON indicator, basically metadata and a unique identifier to group multi indicators
  • tdta_eurostat
    • This table contains information about Eurostat table names as well as parameters that are needed to extract the counterpart ESPON indicators from those datasets
  • tdta_missing_pattern
    • This table contains information related to missing patterns, i.e. a missing matrix as well several parameters to automatize the updating procedure in a local workstation
  • tdta_report
    • This table contains a detailed report about the updates for a given indicator, providing the number of rows and affected tables in the ESPON database
  • tdta_nulls
    • This table contains information about detected nulls for a given indicator

Updating command scripts

The Updating module also contains seven Django commands, namely python scripts, to be launched by an expert user in the ESPON server via line commands. The name of each script reflects the order of execution. These scripts are located in a folder named management and the name of each script reflects the order of execution. Moreover, there are other scripts useful in different stages of the updating procedure. The location folder and the main purpose of different scripts are detailed in Table 1.

Table 1: Command scripts (in blue) and auxiliary functions (in green) employed in Updating module

File name

Folder

Description

1st_insert_metadata.py

management

This script will populate the tdta_metadata table

2nd_download_eurostat.py

management

This script will download the Eurostat tables from Eurostat services and store those tsv files in a private folder in the server

3rd_insert_eurostat.py

management

This script will populate the tdta_eurostat table

4th_insert_nulls.py

management

This script will populate the tdta_nulls table

5th_insert_mp.py

management

This script will populate the tdta_missing_pattern table

6th_create_EUS.py

management

This script will build the Excel Updating Suite as a zip package

7th_importer2ESPONDB.py

management

This script will update several tables: tdta_data_item, trel_data_item_source, trel_data_item_method and tdta_metadata

BaseIndicator.py

 

This file contains a class to get data from ESPON database in a homogeneous way, irrespective from their nature indicator (single, multi).

eurostat_equivalences.py

data

This file contains a dictionary of useful parameters to get the counterpart values from Eurostat tables

utils_fn.py

utils

A collection of utilities in a module for several purposes: transform both ESPON and Eurostat data to pandas-like structures, helpers functions useful to another functions in different modules

nuts_related_fn.py

utils

A collection of functions to transform NUTS units in a hierarchical way, specially to transverse ESPON time series and generate relationships between rows

Table 2: Parametrized templates to generate at runtime the macros for the Excel Updating Suite

File name

Folder

Description

approval_and_launcher.tpl.py

macros

Parametrized script to launch detected virtual changes into ESPON Excel sheets

diff_dataframes_man_chg.tpl.py

macros

Parametrized script to start a new manual updating session and to make differences between ESPON time series

outliers.tpl.py

macros

Parametrized script to launch an Outlier detection method on initial/modified ESPON data

save_local_db.py

macros

Parametrized script to save changes into a local database file

spatial_matrix.tpl.py

macros

Parametrized script to compute spatial matrix

thematic_matrix.tpl.py

macros

Parametrized script to compute thematic matrix

Updating folders

To plug this module into the Database Portal an accessible folder in the server with writing/reading permissions is needed. The files generated by this application will be stored in this folder, in a nested structure with paths relative to root directory,  to be further retrieved on demand by the application.

  • eurostat
    • The location of the folder in which datasets from Eurostat web services are downloaded. Not empty
  • macros
    • The location of the folder in which macros are generated for creating the Excel Updating Suite. A nested subfolder structure is generated. Initially empty.
  • eus
    • An initially empty folder but EUS suite is generated in a subfolder called eus, as well as xlsx files are stored in another subfolder called xslx to make the zip automatically. A nested subfolder structure is generated to get zip files per country basis or per all countries
  • db_updates
    • The location of the folder in which user uploads database dumps in sqlite3 format
  • db_reports
    • The location of the folder in which a report is generated in text format is generated when user launch updates files against the ESPON database. Initially empty

How to use the Updating module

After connecting to ESPON server, an expert user will be able to launch the command scripts. A list of all ESPON Base indicators with their unique identifiers (ind_id) as stored in he ESPON database can be found in Table 4.

The execution procedure will be as follow:

Run 1st_insert_metadata.py script

python manage.py 1st_insert_metadata –indicator_id <ind_id>

This script will read metadata from ESPON 2020 Database and will populate the majority of all fields in the tdta_metadata table. This is achieved through a BaseIndicator class (in Updating/BaseIndicator.py) that populates all fields except for number_of_null

Run 2nd_download_eurostat.py script

python manage.py 2nd_download_eurostat –indicator_id <ind_id>

This script will start the downloading process of tsv files from Eurostat downloading services[1]:

  1. gets the name of the Eurostat dataset as read from a EUROSTAT_EQUIVALENCES dictionary key (eurostat_file) in eurostat_equivalences.py script
  2. checks the lastest version of downloaded tsv file (by reading the metadata.txt file with latest date). If current date (Y-M-Dformat), tsv file is not downloaded; otherwise, is downloaded
  3. if downloaded, moves those tsv files to eurostat directory with a created directory according to some categories (DEMOGRAPHY, ECONOMY, EDUCATION, EMPLOYMENT, ENERGY, ITS, R&D, SOCIETY)

[1] https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&downfile=data/

Run 3rd_insert_eurostat.py script

python manage.py 3rd_insert_eurostat –indicator_id <ind_id>

This script will populate the tdta_eurostat table that will be used in the Updating application to get data from Eurostat and compare with pair-wise ESPON indicator so as to get the so-called virtual changes (later with 6th_create_EUS.py script)

Run 4th_insert_nulls.py script (optional)

python manage.py 4th_insert_nulls --indicator_id <ind_id> --country_code <ctr_code>

This script will populate the tdta_nulls table that will be useful to get an idea of existing nulls in Base Indicators. Moreover, a field in tdta_metadata called number_nulls is populated after this scripts is run.

 Run 5th_insert_mp script

python manage.py 5th_insert_mp --indicator_id <ind_id> --country_code <ctr_code> This script will populate the tdta_missing_pattern table that will be useful to get the missing pattern for a given ESPON indicator and a given country. This information will be very valuable later, when 6th_create_EUS.py script is run. One completed, there is no need to re-run until a new bunch update of Base Indicator is injected in ESPON database.  These five first scripts have only to be launched once and this is the only requirement to start with the updating procedure.

 Run 6th_create_EUS.py script

 python manage.py 6th_create_EUS --indicator_id <ind_id> --country_code <ctr_code> --create_zip True

This script will create the so-called Excel Updating Suite that consists of a zip file with an Excel Dashboard and a bunch of files/macros, gathering information from Updating database (previous steps). The location of created zips is eus/zips folder, with either subpath to country-basis folder (for instance, inds/145/By_Country/) if country_code parameter, or subpath to all-country-basis folder (for instance, inds/145/All_countries/).

These macros are generated from parametrised python templates that are located in the macros/templates folder of the  Updating module. See Table 2

Once expert user has built the EUS zip files for all countries for a given indicator in the server, an accessible point to be downloaded will be ready. With all Excels files and macros in his local machine, the updating procedure controlled by an expert user can be start in a local workstation following the workflow as specified in the Table 3.

When this updating procedure is finished, a local sqlite3 file with all virtual changes for all the countries should be generated (see step 12, Table 3) and the expert user should upload that file to a folder (db_updates) in the ESPON server.

Run 7th_importer2ESPONDB.py script

python manage.py 7th_importer2ESPONDB --indicator_id <ind_id>

This script will read the uploaded sqlite3 file and all virtual changes will be integrated in the final ESPON database. A txt report is generated and stored in database containing relevant information about data to be inserted in ESPON database.


Table 3. Sequential steps in the Updating workflow to updating a base indicator

#Step

Action

Description

1

Preparation of the data for the update (creation of the Excel Updating Suite)

After selecting the indicator to be updated, a process is run on the server to generate an Excel Updating Suite for each country, containing, in different sheets, the current indicator figures, the virtual changes derived from the comparison with Eurostat data, and a specific sheet with macro buttons to launch the following steps. Those Excel sheets are downloaded from the server within a ZIP file.

2

Detection of potential changes from Eurostat data, which are compliant with the spatial and thematic constraints of the specific indicator, when estimated data are involved.

After unzipping the Excel sheets, the process starts with one of the countries. In order to detect the potential and compliant changes, the corresponding macro should be run from the Excel Updating Suite, i.e. by pressing the Launch button

3

Creation of spatial and thematic validation matrices for the whole time-series.

In order to create the matrices, the corresponding buttons (Spatial matrix and Thematic matrix) in the Excel Updating Suite should be pressed.

4

Creation of supportive charts (line plots) of the indicator time-series.

Optional. In case of need, the expert in charge of the update can generate automatic line plots of the data, to have a graphical overview of the indicator time-series by pressing the Charts button.

5

Review of the resulting matrices to decide eventually if potential changes should be integrated or rejected.

The matrices provide the deviation of potential changes with regard to the spatial and thematic rules. It is up to the expert in charge of the update, to decide whether a specific deviation is acceptable or not.

6

Manual editing of potential changes to be rejected or modified.

All the potential changes not edited (or removed) in the corresponding Excel sheet will be eventually actual changes in the database. If some changes are considered as unacceptable, should be removed from the corresponding Excel sheet at this stage.

7

Detection of potential outliers in ESPON time series.

In case of need, the expert in charge of the update can launch two algorithms to detect potential outliers in the ESPON time series by pressing the Outliersbutton.

8

Save potential changes to the local DB.


By means of the Save button, the potential changes that remain in the Excel sheets are stored in a local database.

9

Introduction of additional figures from other sources and save the changes to the local DB.


There is the possibility to add manually potential changes coming from other sources (including estimated values). This is done in this step by pressing the corresponding buttons (Start manual session and Detect manual changes). If manual changes have been introduced, the potential changes need to be saved to the local DB again.

10

Repeat processes 1 to 9 for all the countries covered by the specific indicator.


11

Prepare a single file for the specific indicator including all the countries.


12

Upload the file to the ESPON DB Server and actually update the indicator.



Table 4. ESPON Base Indicators and unique indicator identifier as stored in ESPON database

Category

Indicator

ind_id

Demography 

Population, total, by broad age (0-14, 14-65, >65) groups and gender   

145

Life expectancy, by age, gender   

115

Population, total, by broad age groups (by 5s) and gender

157

Old age dependence ratio

223

Gender imbalances         

218

Ageing index      

215

Deaths by age groups and gender (NUTS2)

1

Deaths by age groups and gender (NUTS3) 

58

Live Births         

313

Total Population

314

Total Deaths

315

Natural Change of population

311

Total Net Migration          

312

Employment

Employment rates by sex & broad age groups (%)

244

Employment by sex & broad age groups (1000)

256

Employment by age, economic activity (Econ. sections C, F, I)

733,738,763

Unemployment by sex & age (1000)

226

Unemployment by sex & age (%)   

340

Employment by age, economic activity (Econ. sections A, F and K)

277

Long term unemployment

316

Education

Early leavers from education and training (18-24 years) by sex

308

Participation rate in education and training (age group 25 to 64 years)

304

Population aged 25-64, 30-34 by educational attainment level, sex (%)          

282

Economy

Disposable income of private households

317

Gross value added at basic prices, million EURO

319

Households with broandband access (percentage of households)

318

Gross domestic product (GDP) at current market prices by Million PPS

321

Gross domestic product (GDP) at current market prices by Million Euro

322

Gross domestic product (GDP) at current market prices by Purchasing Power Standard per inhabitant

320

Energy

Share of energy from renewable sources (in gross final energy consumption)

324

Greenhouse gas emissions by source sector (EEA)   

327

Primary energy consumption

325

Energy intensity of the economy

326

Society

People at risk of poverty or social exclusion (% of total population)

328

Severe material deprivation rate by NUTS2 regions (% of population)

331

People living in households with very low work intensity (population aged 0 to 59 years)         

330

At-risk-of-poverty rate by NUTS2 regions (% of population)

329

R&D

Human resources in science and technology

334

Patent applications to the EPO by priority year by NUTS 3 regions per million inhabitants

333

Total intramural R & D expenditure (GERD) by sectors of performance (Percentage of GDP)

332

Total R&D personnel and researchers by sectors of performance, sex (Percentage of active population)

335

ITS

Individuals who used the internet for interaction with public authorities           

337

Individuals who have never used a computer (% of population)

338

Individuals who ordered goods or services over the Internet for private use (% of population)

336

In this Page

On this section


|