A few comments where things are slightly unclear, but good to merge.
from sqlalchemy.dialects.mysql import TEXT as _TEXT
from functools import partial

TEXT = _TEXT(collation='utf8mb4_unicode_ci')
In the ORMs I missed that the TEXT import was from Nesta and not SQLAlchemy. Perhaps it should be named something other than TEXT, or by convention be imported as something other than TEXT?
collation=utf8mb4_[...] is the default in MySQL 8+, so this module is effectively deprecated on porting to daps2. So probably not such a big deal!
# NiH run conditions
nih_pk = NihProject.application_id
nih_core = NihProject.base_core_project_num
nih_is_null = nih_core == None

# Iterate over run params
# Crunchbase
params = (('companies', CrunchbaseOrg.id, None, {}),
          # NiH Core IDs != Null
          ('nih', nih_core, ~nih_is_null,
           {'using_core_ids': True}),
          # NiH Core IDs == Null
          ('nih', nih_pk, nih_is_null,
           {'using_core_ids': False}))
Flagging that this section has the potential to become large and messy as more datasets are added.
Agreed, I'll switch to a config setup. One sec and I'll recommit.
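As a rough illustration of what such a config setup could look like (a sketch only, not the code that was recommitted): each dataset's run parameters sit in a plain data structure, and the params tuple is built from it. The names RUN_CONFIG and build_params are hypothetical, and the strings are placeholders standing in for the ORM columns and filters so that the sketch runs on its own.

```python
# Hypothetical config; in practice this could live in a YAML or JSON file.
# Strings stand in for ORM columns and filters (assumption for illustration).
RUN_CONFIG = [
    {"dataset": "companies", "id_field": "CrunchbaseOrg.id",
     "filter": None, "kwargs": {}},
    {"dataset": "nih", "id_field": "NihProject.base_core_project_num",
     "filter": "core_id_not_null", "kwargs": {"using_core_ids": True}},
    {"dataset": "nih", "id_field": "NihProject.application_id",
     "filter": "core_id_is_null", "kwargs": {"using_core_ids": False}},
]


def build_params(config, datasets=None):
    """Yield (dataset, id_field, filter, kwargs) for the chosen datasets."""
    for entry in config:
        if datasets is not None and entry["dataset"] not in datasets:
            continue  # skip datasets that were not requested
        yield (entry["dataset"], entry["id_field"],
               entry["filter"], dict(entry["kwargs"]))


# All run params, and only those for the NiH dataset.
params = tuple(build_params(RUN_CONFIG))
nih_only = tuple(build_params(RUN_CONFIG, datasets={"nih"}))
```

Adding a dataset then means appending one entry to the config rather than editing the task body.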
    result = pycountry.countries.get(**query)
    if result is not None:
        return result
except KeyError:
Under what conditions is a KeyError raised here?
If pycountry.countries.get doesn't find an exact match. I'll add a comment to that effect!
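To make the pattern under discussion concrete, here is a minimal sketch of a lookup that tolerates both ways a miss can be signalled: a getter that raises KeyError and one that returns None. The helper first_match and the toy getter strict_get are hypothetical stand-ins, not pycountry's API; which behaviour a given pycountry release has is an assumption to check against the installed version.

```python
def first_match(getter, queries):
    """Return the first successful lookup among queries, else None.

    Works with getters that raise KeyError on a miss and with getters
    that return None on a miss.
    """
    for query in queries:
        try:
            result = getter(**query)
        except KeyError:
            continue  # no exact match for this query; try the next one
        if result is not None:
            return result
    return None


# Toy data and getter standing in for a country database (illustration only).
_COUNTRIES = {"GB": "United Kingdom", "FR": "France"}


def strict_get(alpha_2=None, name=None):
    """Raise KeyError on a miss, like a plain dict lookup."""
    if alpha_2 is not None:
        return _COUNTRIES[alpha_2]
    for country in _COUNTRIES.values():
        if country == name:
            return country
    raise KeyError(name)


found = first_match(strict_get, [{"alpha_2": "XX"}, {"name": "France"}])
missing = first_match(strict_get, [{"alpha_2": "XX"}, {"name": "Atlantis"}])
# A getter that returns None on a miss is handled by the same helper.
lenient = first_match(lambda alpha_2: _COUNTRIES.get(alpha_2),
                      [{"alpha_2": "XX"}, {"alpha_2": "GB"}])
```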
Closes #326
Curation and aggregation of NiH data, ready for ingestion into Elasticsearch.
To run, do:
luigi --module general_curate CurateTask --dataset nih
from nesta/core/routines/projects/general
Data in curated form looks like: