
mini-kep/parsers



Concept

Parsers extract data from static files or external APIs and upload it to a database.

Output data structure

The parsing result is a list of dictionaries. Each dictionary (a datapoint) represents one observation of a variable at a point in time and has date, freq, name and value keys. The same data structure is used to upload data to the database.

Example:

 {'date': '2017-09-26',
  'freq': 'd',
  'name': 'USDRUR_CB',
  'value': Decimal('57.566')}
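
As a minimal sketch of the structure above, the snippet below builds one datapoint and checks that it has the four expected keys. The is_datapoint helper is hypothetical and not part of this repository. The frequency codes are the ones listed in the parser table in this README, and the value is built from a string so the decimal stays exact.

```python
from datetime import datetime
from decimal import Decimal

REQUIRED_KEYS = {'date', 'freq', 'name', 'value'}
# Frequency codes as they appear in the parser table of this README
FREQUENCIES = {'a', 'q', 'm', 'd'}


def is_datapoint(d):
    """Hypothetical helper: check that a dict looks like a datapoint."""
    if set(d) != REQUIRED_KEYS:
        return False
    if d['freq'] not in FREQUENCIES:
        return False
    try:
        # Dates are expected as ISO strings, e.g. '2017-09-26'
        datetime.strptime(d['date'], '%Y-%m-%d')
    except (TypeError, ValueError):
        return False
    return isinstance(d['value'], Decimal)


datapoint = {'date': '2017-09-26',
             'freq': 'd',
             'name': 'USDRUR_CB',
             'value': Decimal('57.566')}
print(is_datapoint(datapoint))
```

A check like this could run before upload to catch malformed dictionaries early.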

Individual parsers

| Class       | Description                                     | Frequency | Start date |
|-------------|-------------------------------------------------|-----------|------------|
| KEP_Annual  | Annual data from KEP publication (Rosstat)      | a         | 1999-01-01 |
| KEP_Qtr     | Quarterly data from KEP publication (Rosstat)   | q         | 1999-01-01 |
| KEP_Monthly | Monthly data from KEP publication (Rosstat)     | m         | 1999-01-01 |
| Brent       | Brent oil price (EIA)                           | d         | 1987-05-20 |
| USDRUR      | Official USD/RUR exchange rate (Bank of Russia) | d         | 1992-07-01 |
| UST         | US Treasuries interest rates (UST)              | d         | 1990-01-01 |

Use dataset.ReadmeTable() to update this table.

Parser construction

Each parser is a subclass of parsers.getter.base.ParserBase.

A parser has its own:

  • observation start date (class attribute)
  • frequency (class attribute)
  • url constructor (property)
  • response parsing function (staticmethod)
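
The four parts listed above might fit together roughly as follows. This is only a sketch: the ParserBase below is a hypothetical stand-in, not the real parsers.getter.base.ParserBase, and the DemoParser class, its URL, the attribute names and the 'date,value' response layout are all invented for illustration.

```python
from datetime import date
from decimal import Decimal


class ParserBase:
    """Hypothetical stand-in for parsers.getter.base.ParserBase."""
    observation_start_date = None  # class attribute, set by subclasses
    freq = None                    # class attribute, set by subclasses

    def __init__(self, start_date=None, end_date=None):
        # Fall back to the full observation window when no dates are given
        self.start_date = start_date or self.observation_start_date
        self.end_date = end_date or date.today().isoformat()


class DemoParser(ParserBase):
    """Illustrative parser; the URL and response layout are made up."""
    observation_start_date = '1992-07-01'
    freq = 'd'

    @property
    def url(self):
        # URL constructor: depends on the requested date window
        return ('https://example.com/rates'
                f'?start={self.start_date}&end={self.end_date}')

    @staticmethod
    def parse_response(text):
        """Turn 'date,value' lines into datapoint dictionaries."""
        for line in text.strip().splitlines():
            day, value = line.split(',')
            yield {'date': day,
                   'freq': DemoParser.freq,
                   'name': 'DEMO_RATE',
                   'value': Decimal(value)}


parser = DemoParser(start_date='2017-09-01', end_date='2017-09-30')
datapoints = list(DemoParser.parse_response('2017-09-26,57.566\n2017-09-27,57.9'))
```

Keeping the response parser a staticmethod means it can be tested on a saved response string without touching the network.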

Parsers are stored in the parsers.getter folder.

Arguments

Parsers can return datapoints from a specific date to present:

from parsers.getter.cbr_fx import USDRUR
parser = USDRUR(start_date='2017-09-01')

or for a fixed period in time:

from parsers.getter.brent import Brent
parser = Brent('2017-09-15', '2017-10-17')

A parser created without arguments scans the full dataset, which puts a heavy load on the original sources.

Running

Run an individual parser:

from parsers.getter import KEP_Annual

parser = KEP_Annual()
parser.extract()
parser.upload()

The Dataset class is used to manipulate a group of parsers:

from parsers import Dataset

d = Dataset(start_date='2016-12-31') 
d.extract()
d.upload()
d.save_json(filename='dump.txt')

The scheduler invokes the dataset.update() function to upload the latest values.
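
The group workflow described above could look roughly like the sketch below. StubParser and DatasetSketch are hypothetical classes written for this example and do not reflect the actual Dataset implementation; the stub returns fixed values instead of querying a source. The sketch also shows one way to handle the fact that Decimal values are not JSON-serialisable by default.

```python
import json
from decimal import Decimal


class StubParser:
    """Hypothetical parser returning fixed datapoints (no network access)."""

    def __init__(self, name, start_date=None):
        self.name = name
        self.start_date = start_date
        self.items = []

    def extract(self):
        self.items = [{'date': '2017-09-26', 'freq': 'd',
                       'name': self.name, 'value': Decimal('57.566')}]


class DatasetSketch:
    """Hypothetical group of parsers sharing one start date."""

    def __init__(self, names, start_date=None):
        self.parsers = [StubParser(n, start_date) for n in names]

    def extract(self):
        for parser in self.parsers:
            parser.extract()

    @property
    def items(self):
        # Flatten the datapoints of all parsers into one list
        return [item for p in self.parsers for item in p.items]

    def to_json(self):
        # json cannot encode Decimal itself; default=str writes it as an exact string
        return json.dumps(self.items, default=str)


d = DatasetSketch(['USDRUR_CB', 'BRENT'], start_date='2016-12-31')
d.extract()
dump = d.to_json()
```

Serialising values as strings avoids float rounding; a reader of the dump can restore them with Decimal(value).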

TODO

repo: rosstat-806-regional

https://github.com/epogrebnyak/data-rosstat-806-regional

repo: data-rosstat-isep

https://github.com/mini-kep/data-rosstat-isep

Glossary

Types of parsers:

heavy - parsers that download the data, transform it and write the output to a local folder or URL. These usually work on poorly structured source formats, e.g. Word, and need a lot of work to extract the data.

thin ('clean') - parsers that do the job on query: they yield datapoints and exit quickly because the source data is clean. These parsers usually do not need disk space to store intermediate parsing results.
