Ulf Kronman edited this page Apr 21, 2017 · 2 revisions

Technical documentation for Open APC Sweden

Ulf Kronman, 2017-04-03

Swedish pre-processing

Script: /python/se/clean_and_merge_apc_files.py -l sv_SE.UTF-8

Uses: the list of files to process, read from ../data/apc_file_list.txt

Function:

  • Merges the APC files into a single table
  • Changes the Swedish Excel boolean values SANT/FALSKT to TRUE/FALSE
  • Changes the comma (,) decimal delimiter to a period (.)
  • Removes the whitespace that Excel inserts as a thousands separator in large numbers
  • Checks for duplicate DOIs
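The three value conversions in the list above can be sketched in Python as follows, assuming every cell arrives as a string. This is an illustration, not code from clean_and_merge_apc_files.py: the names clean_value and clean_row are hypothetical, and it is an assumption that Excel's thousands separator shows up as a regular space, a no-break space (U+00A0) or a narrow no-break space (U+202F).

```python
import re

# Swedish Excel writes boolean cells as SANT (true) and FALSKT (false).
BOOL_MAP = {"SANT": "TRUE", "FALSKT": "FALSE"}

# Assumed thousands separators: regular space, no-break space (U+00A0)
# and narrow no-break space (U+202F).
GROUP_SPACES = re.compile(r"[ \u00a0\u202f]")

# A whole cell that looks like a Swedish number: digits grouped in threes
# by one of the spaces above, or plain digits, optionally followed by a
# decimal comma and more digits, e.g. "12 500,50" or "1500,5".
SWEDISH_NUMBER = re.compile(
    r"^-?(?:\d{1,3}(?:[ \u00a0\u202f]\d{3})+|\d+)(?:,\d+)?$"
)


def clean_value(value):
    """Normalise one cell value; anything else is returned unchanged."""
    if not isinstance(value, str):
        # e.g. None for a missing cell in a short csv row
        return value
    stripped = value.strip()
    if stripped.upper() in BOOL_MAP:
        return BOOL_MAP[stripped.upper()]
    if SWEDISH_NUMBER.match(stripped):
        # Drop the grouping spaces, then switch the decimal comma to a period.
        return GROUP_SPACES.sub("", stripped).replace(",", ".")
    return value


def clean_row(row):
    """Apply clean_value to every cell of one row (column name -> value)."""
    return {column: clean_value(cell) for column, cell in row.items()}
```

Only cells that consist entirely of a Swedish-formatted number are rewritten, so free-text fields such as titles, or DOIs containing periods and slashes, pass through untouched.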

Result: a TAB-delimited file, /data/apc_se_merged.tsv
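The merge, the duplicate-DOI check and the TAB-delimited output of this pre-processing step could look roughly like the Python sketch below. It rests on assumptions not stated in this documentation: the input files are semicolon-separated (common for Excel exports in locales that use the decimal comma) and share an identical header row, the DOI column is named doi, and DOIs are compared case-insensitively. All function names are hypothetical.

```python
import csv
from collections import Counter


def read_file_list(list_path):
    """Return the input paths listed one per line, skipping blank lines."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def merge_files(paths, delimiter=";"):
    """Concatenate the rows of several delimited files with identical headers."""
    fieldnames = None
    rows = []
    for path in paths:
        # utf-8-sig also accepts files that start with an Excel byte-order mark.
        with open(path, newline="", encoding="utf-8-sig") as handle:
            reader = csv.DictReader(handle, delimiter=delimiter)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"{path}: header differs from the first file")
            rows.extend(reader)
    return fieldnames, rows


def duplicate_dois(rows, column="doi"):
    """Return the DOIs (lower-cased) that occur in more than one row."""
    counts = Counter((row.get(column) or "").strip().lower() for row in rows)
    counts.pop("", None)  # rows without a DOI are not counted as duplicates
    return sorted(doi for doi, n in counts.items() if n > 1)


def write_tsv(path, fieldnames, rows):
    """Write the merged rows as a TAB-delimited file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

In a full run, the SANT/FALSKT and number-format conversions listed above would be applied to each row between merging and writing. How the real script reacts to duplicate DOIs (stopping or only reporting) is not described here.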

Main enrichment process

Script: /python/se/apc_csv_processing.py -l sv_SE.UTF-8 ../data/apc_se_merged.tsv

Uses: /data/apc_se_merged.tsv

Result: /python/out.csv

Swedish post-processing

Script: /python/se/normalise_and_copy.py

Uses: /python/out.csv

Result: /data/apc_se.csv

Analysis

Script: statistics.Rmd

Uses: /data/apc_se.csv

Result: statistics.md