Data checking and insertion

Checking process

Introduction

During Data upload, various checking are launched, here is the description of the procedural steps.

If the process is launched by the upload interface displaying result in a dedicated window, the process can be launched manually for testing purpose:

cd /var/www/edp/.../services/dataimport/
../../../../django-edp/manage.py shell < test_analys.py

#indic_id variable allow to select the indicator for which the checking should be launched.

It calls Data class of file.py script as well as in the usual process launched once data upload is called!

Process description

First steps are the read_excel function to input the uploaded file.

  • Variables and XLS table reading

Then several variables are setup. Two are particularly important:

  • The various information and messages are stored in a nested dictionary variable:
self.checkinfo = { 'structure': { 'errors': { }, 'warnings': { }, 'data': { 'Number of columns':'', }, }, 'log': '' }

This information will be stored in json database field (tdta_file.analysis) and then displayed in data upload interface.

Notice that this is the result of this json which allow or not to launch the next process (insertion) only if error is empty!

  • A specific array is reserved to store data ready to insert in database at the end of the checking process:
ForInsert[]

I
f no basic structural errors are detected in the sheet the process can continue. Various functions extract data from XLS file and metadata contained in PostgreSQL database and compare them. This results to warnings, errors or simple information stored in a dedicated nested dictionary variable before to be stored in a dedicated PostgreSQL table (tdta_datafile.analysis) and display in the upload interface to show the various checking messages and information.
A detailed information is available in Github at: https://git.unepgrid.ch/espon-egtc/edp/src/branch/master/django-edp/edp_damf/services/dataimport/README.md

Insert data process

Introduction

Once the complete checking process is done and no errors remain, the described insertion process is called.

It can be tested or launched manually through test_analys.py script (this launch first the complete checking process). (declare indic_id)

cd /var/www/edp/&/services/dataimport/ 
../../../../django-edp/manage.py shell < test_analys.py

Process description

Various functions are called by Insert class of db.py script.

During checking the variable ForInsert[] is storing the various data to be inserted. During the normal application use (not manual launch), this information is stored in database in a json field (tdta_datafile.analys).

At the end of the insertion process a complementary function check and complete eventual missing data.

Nulls and Missing data particular treatment

During upload, checking is ensuring that no data can be entered for territorial units not existing in the spatial extensions declared in metadata. But user can provide line without data or no line for a territorial unit, both mean null value.

In order to be relevent in database and avoid also no geometry in OGC services or no line in CSV export, it was decided to complete data table with all expected row. For example, if an indicator is declared as EU28 spatial extension for NUTS0. But the user upload an XLS file without France data. Checking will give a warning of completude, but the user will be able to accept. So after the insertion, the system would check and insert missing this missing row in tdta_dataitem table but without anvalue both in txt_value and decimal_value.

To test or launch as batch manually it can be done through:

cd /var/www/edp/&/services/dataimport/ 
../../../../django-edp/manage.py shell < test_missing.py

A detailed information is available in Github at: https://git.unepgrid.ch/espon-egtc/edp/src/branch/master/django-edp/edp_damf/services/dataimport/README.md