Add CONTRIBUTING.md #10

henrydavidge · 2019-10-14T20:51:52Z

Signed-off-by: Henry D <[email protected]>

What changes are proposed in this pull request?

Add file with contributing instructions

How is this patch tested?

Unit tests
Integration tests
Manual tests

(Details)

Signed-off-by: Henry D <[email protected]>

# This is the 1st commit message:
WIP
# This is the commit message projectglow#2:
Get jar working Don't use Kryo serializer Don't parallelize un-serializable Hadoop FileStatus Change descrip WIP Whoops bintray Not local Quiet logs Remove tmp file Actually rename bintray Setting version to 0.1.0 WIP WIP License fixup Resolver WIP Change version Setting version to 0.1.1 WIP Setting version to 0.1.2 Setting version to 0.1.3-SNAPSHOT WIP Setting version to 0.1.2 Setting version to 0.1.3-SNAPSHOT Exclude many GATK deps Setting version to 0.1.3 Setting version to 0.1.4-SNAPSHOT Setting version to 0.1.4 Setting version to 0.1.5-SNAPSHOT Whoops Setting version to 0.1.3 Setting version to 0.1.4-SNAPSHOT Setting version to 0.1.4 Setting version to 0.1.5-SNAPSHOT Setting version to 0.1.6 Setting version to 0.1.7-SNAPSHOT Yay deps Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.1 Setting version to 0.1.2-SNAPSHOT Setting version to 0.1.10 Setting version to 0.1.11-SNAPSHOT Setting version to 0.1.15 Setting version to 0.1.16-SNAPSHOT Setting version to 0.1.9 Setting version to 0.1.10-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Add tests back Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.13 Setting version to 0.1.14-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT WIP Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.11 Setting version to 0.1.12-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Exclude findbugs Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT WIP Cleanup
# This is the commit message projectglow#3:
Rename org
# This is the commit message projectglow#4:
Rename env
# This is the commit message projectglow#5:
Setting version to 0.1.0
# This is the commit message projectglow#6:
Setting version to 0.1.1-SNAPSHOT
# This is the commit message projectglow#7:
Rename
# This is the commit message projectglow#8:
Work on test.pypi
# This is the commit message projectglow#9:
Fix VCFFileWriterSuite (projectglow#63)
# This is the commit message projectglow#10:
Remove SpecificInternalRow buffer in RowConverter (projectglow#65)
* Remove SpecificInternalRow buffer in RowConverter
* comment

# This is the 1st commit message:
WIP
# This is the commit message projectglow#2:
Get jar working Don't use Kryo serializer Don't parallelize un-serializable Hadoop FileStatus Change descrip WIP Whoops bintray Not local Quiet logs Remove tmp file Actually rename bintray Setting version to 0.1.0 WIP WIP License fixup Resolver WIP Change version Setting version to 0.1.1 WIP Setting version to 0.1.2 Setting version to 0.1.3-SNAPSHOT WIP Setting version to 0.1.2 Setting version to 0.1.3-SNAPSHOT Exclude many GATK deps Setting version to 0.1.3 Setting version to 0.1.4-SNAPSHOT Setting version to 0.1.4 Setting version to 0.1.5-SNAPSHOT Whoops Setting version to 0.1.3 Setting version to 0.1.4-SNAPSHOT Setting version to 0.1.4 Setting version to 0.1.5-SNAPSHOT Setting version to 0.1.6 Setting version to 0.1.7-SNAPSHOT Yay deps Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.1 Setting version to 0.1.2-SNAPSHOT Setting version to 0.1.10 Setting version to 0.1.11-SNAPSHOT Setting version to 0.1.15 Setting version to 0.1.16-SNAPSHOT Setting version to 0.1.9 Setting version to 0.1.10-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Add tests back Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.13 Setting version to 0.1.14-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT WIP Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.11 Setting version to 0.1.12-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Exclude findbugs Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT WIP Cleanup
# This is the commit message projectglow#3:
Rename org
# This is the commit message projectglow#4:
Rename env
# This is the commit message projectglow#5:
Setting version to 0.1.0
# This is the commit message projectglow#6:
Setting version to 0.1.1-SNAPSHOT
# This is the commit message projectglow#7:
Rename
# This is the commit message projectglow#8:
Work on test.pypi
# This is the commit message projectglow#9:
Fix VCFFileWriterSuite (projectglow#63)
# This is the commit message projectglow#10:
Remove SpecificInternalRow buffer in RowConverter (projectglow#65)
* Remove SpecificInternalRow buffer in RowConverter
* comment
# This is the commit message projectglow#11:
Update CircleCI badge
# This is the commit message projectglow#12:
Move build/test from README to wiki
# This is the commit message projectglow#13:
More cleanup
# This is the commit message projectglow#14:
Newline
# This is the commit message projectglow#15:
address comments
# This is the commit message projectglow#16:
Circleci fixups
# This is the commit message projectglow#17:
Un-exclude netlib from gatk
# This is the commit message projectglow#18:
CircleCI indents
# This is the commit message projectglow#19:
Change bintray org
# This is the commit message projectglow#20:
Setting version to 0.1.0
# This is the commit message projectglow#21:
Bintray repo
# This is the commit message projectglow#22:
Move bintrayrepo
# This is the commit message projectglow#23:
Setting version to 0.1.1-SNAPSHOT

Get jar working Don't use Kryo serializer Don't parallelize un-serializable Hadoop FileStatus Change descrip WIP Whoops bintray Not local Quiet logs Remove tmp file Actually rename bintray Setting version to 0.1.0 WIP WIP License fixup Resolver WIP Change version Setting version to 0.1.1 WIP Setting version to 0.1.2 Setting version to 0.1.3-SNAPSHOT WIP Setting version to 0.1.2 Setting version to 0.1.3-SNAPSHOT Exclude many GATK deps Setting version to 0.1.3 Setting version to 0.1.4-SNAPSHOT Setting version to 0.1.4 Setting version to 0.1.5-SNAPSHOT Whoops Setting version to 0.1.3 Setting version to 0.1.4-SNAPSHOT Setting version to 0.1.4 Setting version to 0.1.5-SNAPSHOT Setting version to 0.1.6 Setting version to 0.1.7-SNAPSHOT Yay deps Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.1 Setting version to 0.1.2-SNAPSHOT Setting version to 0.1.10 Setting version to 0.1.11-SNAPSHOT Setting version to 0.1.15 Setting version to 0.1.16-SNAPSHOT Setting version to 0.1.9 Setting version to 0.1.10-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Add tests back Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Setting version to 0.1.13 Setting version to 0.1.14-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT WIP Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT WIP Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT Setting version to 0.1.11 Setting version to 0.1.12-SNAPSHOT Setting version to 0.1.7 Setting version to 0.1.8-SNAPSHOT Exclude findbugs Setting version to 0.1.8 Setting version to 0.1.9-SNAPSHOT WIP Cleanup Rename org Rename env Setting version to 0.1.0 Setting version to 0.1.1-SNAPSHOT Rename Work on test.pypi Fix VCFFileWriterSuite (projectglow#63) Remove SpecificInternalRow buffer in RowConverter (projectglow#65) * Remove SpecificInternalRow buffer in RowConverter * comment Update CircleCI badge Move build/test from README to wiki More cleanup Newline address comments Circleci fixups Un-exclude netlib from gatk CircleCI indents Change bintray org Setting version to 0.1.0 Bintray repo Move bintrayrepo Setting version to 0.1.1-SNAPSHOT

Rename everything to glow (projectglow#2)
* do the big rename
* Make tests pass
* imports
* sg -> glow
* Trigger CircleCI tests
* Trigger CircleCI again
* Fix CircleCI config
* Fix Python dir
* Rename datasources
* CircleCI wip
* More CircleCI wip
* Continue Circleci wip
* WIP
* Revert last change
* Un package-private
* More un package-private
* Continue un package-private
* More un-package-private
* Try again
* no core
* rename
* compile
* fix test
* fix tests
* test file
* no tabs
* less logging'
* update
* io
* ignore unit tests

Whoops

[HLS-360] Get readthedocs to work (projectglow#3)
* Notebooks in docs subfolder
* WIP
* WIP
* WIP
* WIP
* WIP
* Un-raw
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* More WIP
* Try again
* Remove unneeded imports

Make Glow a class with extending object (projectglow#5)

Fix Glow tests (projectglow#6)
* oops
* fix pipe transformer cleanup

Update log4j.properties

Add license header (projectglow#9)

add CONTIBURING.md (projectglow#10)

Signed-off-by: Henry D <[email protected]>

Signed-off-by: Henry D <[email protected]>
Signed-off-by: Karen Feng <[email protected]>

* Fix Glow tests (#6)
  * oops
  * fix pipe transformer cleanup
  Signed-off-by: kianfar77 <[email protected]>
* Update log4j.properties
  Signed-off-by: kianfar77 <[email protected]>
* livehtml
  Signed-off-by: kianfar77 <[email protected]>
* fix
  Signed-off-by: kianfar77 <[email protected]>
* more docs
  Signed-off-by: kianfar77 <[email protected]>
* more docs
  Signed-off-by: kianfar77 <[email protected]>
* Add license header (#9)
  Signed-off-by: kianfar77 <[email protected]>
* add CONTIBURING.md (#10)
  Signed-off-by: Henry D <[email protected]>
  Signed-off-by: kianfar77 <[email protected]>
* notebook addresses
  Signed-off-by: kianfar77 <[email protected]>
* glow to Glow
  Signed-off-by: kianfar77 <[email protected]>
* more docs
  Signed-off-by: kianfar77 <[email protected]>
* sidebar width
  Signed-off-by: kianfar77 <[email protected]>
* utility
  Signed-off-by: kianfar77 <[email protected]>
* [HLS-353] Add utility function docs (#12)
  * Add glue fns
    Signed-off-by: Karen Feng <[email protected]>
  * Address comments
    Signed-off-by: Karen Feng <[email protected]>
  Signed-off-by: kianfar77 <[email protected]>
* comments addressed
  Signed-off-by: kianfar77 <[email protected]>

* cleanup
  Signed-off-by: Karen Feng <[email protected]>
* whoops
  Signed-off-by: Karen Feng <[email protected]>
* cleanup
  Signed-off-by: Karen Feng <[email protected]>

Implemented Filter Parsing and Tabix Pushdown

Signed-off-by: Henry Davidge <[email protected]>

Signed-off-by: Henry D <[email protected]>
Signed-off-by: Henry Davidge <[email protected]>

* Fix Glow tests (projectglow#6)
  * oops
  * fix pipe transformer cleanup
  Signed-off-by: kianfar77 <[email protected]>
* Update log4j.properties
  Signed-off-by: kianfar77 <[email protected]>
* livehtml
  Signed-off-by: kianfar77 <[email protected]>
* fix
  Signed-off-by: kianfar77 <[email protected]>
* more docs
  Signed-off-by: kianfar77 <[email protected]>
* more docs
  Signed-off-by: kianfar77 <[email protected]>
* Add license header (projectglow#9)
  Signed-off-by: kianfar77 <[email protected]>
* add CONTIBURING.md (projectglow#10)
  Signed-off-by: Henry D <[email protected]>
  Signed-off-by: kianfar77 <[email protected]>
* notebook addresses
  Signed-off-by: kianfar77 <[email protected]>
* glow to Glow
  Signed-off-by: kianfar77 <[email protected]>
* more docs
  Signed-off-by: kianfar77 <[email protected]>
* sidebar width
  Signed-off-by: kianfar77 <[email protected]>
* utility
  Signed-off-by: kianfar77 <[email protected]>
* [HLS-353] Add utility function docs (projectglow#12)
  * Add glue fns
    Signed-off-by: Karen Feng <[email protected]>
  * Address comments
    Signed-off-by: Karen Feng <[email protected]>
  Signed-off-by: kianfar77 <[email protected]>
* comments addressed
  Signed-off-by: kianfar77 <[email protected]>

Signed-off-by: Henry Davidge <[email protected]>

* cleanup
  Signed-off-by: Karen Feng <[email protected]>
* whoops
  Signed-off-by: Karen Feng <[email protected]>
* cleanup
  Signed-off-by: Karen Feng <[email protected]>

Signed-off-by: Henry Davidge <[email protected]>

* Add Leland's demo notebook
* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)
  * blocks
    Signed-off-by: kianfar77 <[email protected]>
  * test vcf
    Signed-off-by: kianfar77 <[email protected]>
  * transformer
    Signed-off-by: kianfar77 <[email protected]>
  * remove extra
    Signed-off-by: kianfar77 <[email protected]>
  * refactor and conform with ridge namings
    Signed-off-by: kianfar77 <[email protected]>
  * test
    Signed-off-by: kianfar77 <[email protected]>
  * test files
    Signed-off-by: kianfar77 <[email protected]>
  * remove extra file
    Signed-off-by: kianfar77 <[email protected]>
  * sort_key
    Signed-off-by: kianfar77 <[email protected]>
* feat: ridge models for wgr added (#1)
  * feat: ridge models for wgr added
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Doc strings added for levels/functions.py Some typos fixed in ridge_model.py
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * ridge_model and RidgeReducer unit tests added
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * RidgeRegression unit tests added test data README added ridge_udfs.py docstrings added
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Changes made to accessing the sample ID map and more docstrings
    The map_normal_eqn and score_models functions previously expected the sample IDs for a given sample block to be found in the Pandas DataFrame, which mean we had to join them on before the .groupBy().apply(). These functions now expect the sample block to sample IDs mapping to be provided separately as a dict, so that the join is no longer required. RidgeReducer and RidgeRegression APIs remain unchanged. docstrings have been added for RidgeReducer and RidgeRegression classes.
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Refactored object names and comments to reflect new terminology
    Where 'block' was previously used to refer to the set of columns in a block, we now use 'header_block'
    Where 'group' was previously used to refer to the set of samples in a block, we now use 'sample_block'
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * existing tests pass
    Signed-off-by: Karen Feng <[email protected]>
  * rename file
    Signed-off-by: Karen Feng <[email protected]>
  * Add compat test
    Signed-off-by: Karen Feng <[email protected]>
  * scalafmt
    Signed-off-by: Karen Feng <[email protected]>
  * collect minimal columns
    Signed-off-by: Karen Feng <[email protected]>
  * address comments
    Signed-off-by: Karen Feng <[email protected]>
  * Test fixup
    Signed-off-by: Karen Feng <[email protected]>
  * Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching
    Signed-off-by: Karen Feng <[email protected]>
  * PyArrow 0.15.1 only with PySpark 3
    Signed-off-by: Karen Feng <[email protected]>
  * Don't use toPandas()
    Signed-off-by: Karen Feng <[email protected]>
  * Upgrade pyarrow
    Signed-off-by: Karen Feng <[email protected]>
  * Only register once
    Signed-off-by: Karen Feng <[email protected]>
  * Minimize memory usage
    Signed-off-by: Karen Feng <[email protected]>
  * Select before head
    Signed-off-by: Karen Feng <[email protected]>
  * set up/tear down
    Signed-off-by: Karen Feng <[email protected]>
  * Try limiting pyspark memory
    Signed-off-by: Karen Feng <[email protected]>
  * No teardown
    Signed-off-by: Karen Feng <[email protected]>
  * Extend timeout
    Signed-off-by: Karen Feng <[email protected]>
* Simplify ordering logic in levels code (#7)
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * existing tests pass
    Signed-off-by: Karen Feng <[email protected]>
  * rename file
    Signed-off-by: Karen Feng <[email protected]>
  * Add compat test
    Signed-off-by: Karen Feng <[email protected]>
  * scalafmt
    Signed-off-by: Karen Feng <[email protected]>
  * collect minimal columns
    Signed-off-by: Karen Feng <[email protected]>
  * start changing for readability
  * use input label ordering
  * rename create_row_indexer
  * undo column sort
  * change reduce
    Signed-off-by: Henry D <[email protected]>
  * further simplify reduce
  * sorted alpha names
  * remove ordering
  * comments
    Signed-off-by: Henry D <[email protected]>
  * Set arrow env var in build
    Signed-off-by: Henry D <[email protected]>
  * faster sort
  * add test file
  * undo test data change
  * >=
  * formatting
  * empty
  Co-authored-by: Karen Feng <[email protected]>
* Limit Spark memory conf in tests (#9)
  * yapf
    Signed-off-by: Karen Feng <[email protected]>
  * yapf transform
    Signed-off-by: Karen Feng <[email protected]>
  * Set driver memory
    Signed-off-by: Karen Feng <[email protected]>
  * Try changing spark mem
    Signed-off-by: Karen Feng <[email protected]>
  * match java tests
    Signed-off-by: Karen Feng <[email protected]>
  * whoops
    Signed-off-by: Karen Feng <[email protected]>
  * remove driver memory flag
    Signed-off-by: Karen Feng <[email protected]>
* Improve partitioning in block_variants_and_samples transformer (#11)
  Signed-off-by: kianfar77 <[email protected]>
* Remove unnecessary header_block grouping (#10)
  * cleanup
    Signed-off-by: Karen Feng <[email protected]>
  * whoops
    Signed-off-by: Karen Feng <[email protected]>
  * cleanup
    Signed-off-by: Karen Feng <[email protected]>
* Create sample ID blocking helper functions (#12)
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * whoops
    Signed-off-by: Karen Feng <[email protected]>
  * tests
    Signed-off-by: Karen Feng <[email protected]>
  * simplify tests
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * yapf
    Signed-off-by: Karen Feng <[email protected]>
  * index map compat
    Signed-off-by: Karen Feng <[email protected]>
  * Add docs
    Signed-off-by: Karen Feng <[email protected]>
  * Add more tests
    Signed-off-by: Karen Feng <[email protected]>
  * pass args as ints
    Signed-off-by: Karen Feng <[email protected]>
  * Don't roll our own splitter
    Signed-off-by: Karen Feng <[email protected]>
  * rename sample_index to sample_blocks
    Signed-off-by: Karen Feng <[email protected]>
* Add type-checking to WGR APIs (#14)
  * Add type-checking to APIs
    Signed-off-by: Karen Feng <[email protected]>
  * Check valid alphas
    Signed-off-by: Karen Feng <[email protected]>
  * check 0 sig
    Signed-off-by: Karen Feng <[email protected]>
  * Add to install_requires list
    Signed-off-by: Karen Feng <[email protected]>
  * cleanup comments
    Signed-off-by: Karen Feng <[email protected]>
* Add covariate support (#13)
  * Added necessary modifications to accomodate covariates in model fitting.
    The initial formulation of the WGR model assumed a form y ~ Xb, however in general we would like to use a model of the form y ~ Ca + Xb, where C is some matrix of covariates that are separate from the genomic features X. This PR makes numerous changes to accomodate covariate matrix C.
    Adding covariates required the following breaking changes to the APIs:
    * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
      * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
      * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)
    Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accomodate an optional covariate DataFrame as the final argument.
    Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
    * test_ridge_reducer_transform_with_cov
    * test_two_level_regression_with_cov
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Cleaned up one unnecessary Pandas import
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Small changes for clarity and consistence with the rest of the code.
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Forgot one usage of coalesce
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Added a couple of comments to explain logic and replaced usages of .values with .array
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Fixed one instance of the change .values -> .array where it was made in error.
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Typo in test_ridge_regression.py.
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Style auto-updates with yapfAll
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  Co-authored-by: Leland Barnard <[email protected]>
  Co-authored-by: Karen Feng <[email protected]>
* Flatten estimated phenotypes (#15)
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * Clean up tests
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * Order to match labeldf
    Signed-off-by: Karen Feng <[email protected]>
  * Check we tie-break
    Signed-off-by: Karen Feng <[email protected]>
  * cleanup
    Signed-off-by: Karen Feng <[email protected]>
  * tests
    Signed-off-by: Karen Feng <[email protected]>
  * test var name
    Signed-off-by: Karen Feng <[email protected]>
  * clean up tests
    Signed-off-by: Karen Feng <[email protected]>
  * Clean up docs
    Signed-off-by: Karen Feng <[email protected]>
* Add fit_transform function to models (#17)
  Signed-off-by: Karen Feng <[email protected]>
* Rename levels (#20)
  * Rename levels to wgr
    Signed-off-by: Karen Feng <[email protected]>
  * rename test files
    Signed-off-by: Karen Feng <[email protected]>
* Add license headers (#21)
  * headers
  * executable
  * fix template rendering
  * yapf
  * add header to template
  * add header to template

Signed-off-by: Henry D <[email protected]>
Co-authored-by: Kiavash Kianfar <[email protected]>
Co-authored-by: Karen Feng <[email protected]>
Co-authored-by: Leland <[email protected]>
Co-authored-by: Leland Barnard <[email protected]>
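The "Add covariate support (#13)" commit above moves the WGR model from the form y ~ Xb to y ~ Ca + Xb, with covariates C kept separate from the genomic features X. As a rough illustration only, the NumPy sketch below fits that form as a single ridge problem in which, by assumption, only the genomic coefficients b are penalized and the covariate coefficients a are not. It is not Glow's RidgeReducer/RidgeRegression code; the helper name `ridge_with_covariates`, the single alpha, and the in-memory matrices are all hypothetical.

```python
import numpy as np


def ridge_with_covariates(X: np.ndarray, C: np.ndarray, y: np.ndarray, alpha: float):
    """Fit y ~ C a + X b by ridge regression, penalizing only b.

    Hypothetical helper for illustration; assumes covariates are unpenalized.
    Returns the tuple (a, b).
    """
    n_cov = C.shape[1]
    n_feat = X.shape[1]
    # Stack covariates and genomic features into one design matrix [C | X].
    Z = np.hstack([C, X])
    # Diagonal penalty: 0 for covariate columns, alpha for genomic columns.
    penalty = np.diag(np.concatenate([np.zeros(n_cov), np.full(n_feat, float(alpha))]))
    # Solve the penalized normal equations (Z^T Z + P) w = Z^T y.
    w = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    return w[:n_cov], w[n_cov:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100
    # Intercept column plus one covariate (hypothetical data).
    C = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 1))])
    # Stand-in genomic features.
    X = rng.standard_normal((n, 5))
    y = C @ np.array([1.0, -0.5]) + X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(n)
    for alpha in (0.1, 10.0, 1000.0):
        a, b = ridge_with_covariates(X, C, y, alpha)
        print(alpha, np.round(a, 3), np.round(np.linalg.norm(b), 3))
```

As alpha grows, the norm of b shrinks toward zero while a stays unpenalized, which is one way to see why it can help to keep C apart from X rather than folding covariates into the penalized feature blocks.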

* Add Leland's demo notebook
* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)
  * blocks
    Signed-off-by: kianfar77 <[email protected]>
  * test vcf
    Signed-off-by: kianfar77 <[email protected]>
  * transformer
    Signed-off-by: kianfar77 <[email protected]>
  * remove extra
    Signed-off-by: kianfar77 <[email protected]>
  * refactor and conform with ridge namings
    Signed-off-by: kianfar77 <[email protected]>
  * test
    Signed-off-by: kianfar77 <[email protected]>
  * test files
    Signed-off-by: kianfar77 <[email protected]>
  * remove extra file
    Signed-off-by: kianfar77 <[email protected]>
  * sort_key
    Signed-off-by: kianfar77 <[email protected]>
* feat: ridge models for wgr added (#1)
  * feat: ridge models for wgr added
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Doc strings added for levels/functions.py Some typos fixed in ridge_model.py
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * ridge_model and RidgeReducer unit tests added
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * RidgeRegression unit tests added test data README added ridge_udfs.py docstrings added
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Changes made to accessing the sample ID map and more docstrings
    The map_normal_eqn and score_models functions previously expected the sample IDs for a given sample block to be found in the Pandas DataFrame, which mean we had to join them on before the .groupBy().apply(). These functions now expect the sample block to sample IDs mapping to be provided separately as a dict, so that the join is no longer required. RidgeReducer and RidgeRegression APIs remain unchanged. docstrings have been added for RidgeReducer and RidgeRegression classes.
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Refactored object names and comments to reflect new terminology
    Where 'block' was previously used to refer to the set of columns in a block, we now use 'header_block'
    Where 'group' was previously used to refer to the set of samples in a block, we now use 'sample_block'
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * existing tests pass
    Signed-off-by: Karen Feng <[email protected]>
  * rename file
    Signed-off-by: Karen Feng <[email protected]>
  * Add compat test
    Signed-off-by: Karen Feng <[email protected]>
  * scalafmt
    Signed-off-by: Karen Feng <[email protected]>
  * collect minimal columns
    Signed-off-by: Karen Feng <[email protected]>
  * address comments
    Signed-off-by: Karen Feng <[email protected]>
  * Test fixup
    Signed-off-by: Karen Feng <[email protected]>
  * Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching
    Signed-off-by: Karen Feng <[email protected]>
  * PyArrow 0.15.1 only with PySpark 3
    Signed-off-by: Karen Feng <[email protected]>
  * Don't use toPandas()
    Signed-off-by: Karen Feng <[email protected]>
  * Upgrade pyarrow
    Signed-off-by: Karen Feng <[email protected]>
  * Only register once
    Signed-off-by: Karen Feng <[email protected]>
  * Minimize memory usage
    Signed-off-by: Karen Feng <[email protected]>
  * Select before head
    Signed-off-by: Karen Feng <[email protected]>
  * set up/tear down
    Signed-off-by: Karen Feng <[email protected]>
  * Try limiting pyspark memory
    Signed-off-by: Karen Feng <[email protected]>
  * No teardown
    Signed-off-by: Karen Feng <[email protected]>
  * Extend timeout
    Signed-off-by: Karen Feng <[email protected]>
* Simplify ordering logic in levels code (#7)
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * existing tests pass
    Signed-off-by: Karen Feng <[email protected]>
  * rename file
    Signed-off-by: Karen Feng <[email protected]>
  * Add compat test
    Signed-off-by: Karen Feng <[email protected]>
  * scalafmt
    Signed-off-by: Karen Feng <[email protected]>
  * collect minimal columns
    Signed-off-by: Karen Feng <[email protected]>
  * start changing for readability
  * use input label ordering
  * rename create_row_indexer
  * undo column sort
  * change reduce
    Signed-off-by: Henry D <[email protected]>
  * further simplify reduce
  * sorted alpha names
  * remove ordering
  * comments
    Signed-off-by: Henry D <[email protected]>
  * Set arrow env var in build
    Signed-off-by: Henry D <[email protected]>
  * faster sort
  * add test file
  * undo test data change
  * >=
  * formatting
  * empty
  Co-authored-by: Karen Feng <[email protected]>
* Limit Spark memory conf in tests (#9)
  * yapf
    Signed-off-by: Karen Feng <[email protected]>
  * yapf transform
    Signed-off-by: Karen Feng <[email protected]>
  * Set driver memory
    Signed-off-by: Karen Feng <[email protected]>
  * Try changing spark mem
    Signed-off-by: Karen Feng <[email protected]>
  * match java tests
    Signed-off-by: Karen Feng <[email protected]>
  * whoops
    Signed-off-by: Karen Feng <[email protected]>
  * remove driver memory flag
    Signed-off-by: Karen Feng <[email protected]>
* Improve partitioning in block_variants_and_samples transformer (#11)
  Signed-off-by: kianfar77 <[email protected]>
* Remove unnecessary header_block grouping (#10)
  * cleanup
    Signed-off-by: Karen Feng <[email protected]>
  * whoops
    Signed-off-by: Karen Feng <[email protected]>
  * cleanup
    Signed-off-by: Karen Feng <[email protected]>
* Create sample ID blocking helper functions (#12)
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * whoops
    Signed-off-by: Karen Feng <[email protected]>
  * tests
    Signed-off-by: Karen Feng <[email protected]>
  * simplify tests
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * yapf
    Signed-off-by: Karen Feng <[email protected]>
  * index map compat
    Signed-off-by: Karen Feng <[email protected]>
  * Add docs
    Signed-off-by: Karen Feng <[email protected]>
  * Add more tests
    Signed-off-by: Karen Feng <[email protected]>
  * pass args as ints
    Signed-off-by: Karen Feng <[email protected]>
  * Don't roll our own splitter
    Signed-off-by: Karen Feng <[email protected]>
  * rename sample_index to sample_blocks
    Signed-off-by: Karen Feng <[email protected]>
* Add type-checking to WGR APIs (#14)
  * Add type-checking to APIs
    Signed-off-by: Karen Feng <[email protected]>
  * Check valid alphas
    Signed-off-by: Karen Feng <[email protected]>
  * check 0 sig
    Signed-off-by: Karen Feng <[email protected]>
  * Add to install_requires list
    Signed-off-by: Karen Feng <[email protected]>
  * cleanup comments
    Signed-off-by: Karen Feng <[email protected]>
* Add covariate support (#13)
  * Added necessary modifications to accomodate covariates in model fitting.
    The initial formulation of the WGR model assumed a form y ~ Xb, however in general we would like to use a model of the form y ~ Ca + Xb, where C is some matrix of covariates that are separate from the genomic features X. This PR makes numerous changes to accomodate covariate matrix C.
    Adding covariates required the following breaking changes to the APIs:
    * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
      * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
      * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)
    Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accomodate an optional covariate DataFrame as the final argument.
    Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
    * test_ridge_reducer_transform_with_cov
    * test_two_level_regression_with_cov
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Cleaned up one unnecessary Pandas import
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Small changes for clarity and consistence with the rest of the code.
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Forgot one usage of coalesce
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Added a couple of comments to explain logic and replaced usages of .values with .array
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Fixed one instance of the change .values -> .array where it was made in error.
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Typo in test_ridge_regression.py.
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  * Style auto-updates with yapfAll
    Signed-off-by: Leland Barnard ([email protected])
    Signed-off-by: Leland Barnard <[email protected]>
  Co-authored-by: Leland Barnard <[email protected]>
  Co-authored-by: Karen Feng <[email protected]>
* Flatten estimated phenotypes (#15)
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * Clean up tests
    Signed-off-by: Karen Feng <[email protected]>
  * WIP
    Signed-off-by: Karen Feng <[email protected]>
  * Order to match labeldf
    Signed-off-by: Karen Feng <[email protected]>
  * Check we tie-break
    Signed-off-by: Karen Feng <[email protected]>
  * cleanup
    Signed-off-by: Karen Feng <[email protected]>
  * tests
    Signed-off-by: Karen Feng <[email protected]>
  * test var name
    Signed-off-by: Karen Feng <[email protected]>
  * clean up tests
    Signed-off-by: Karen Feng <[email protected]>
  * Clean up docs
    Signed-off-by: Karen Feng <[email protected]>
* Add fit_transform function to models (#17)
  Signed-off-by: Karen Feng <[email protected]>
* support alpha inference
  Signed-off-by: Karen Feng <[email protected]>
* test fixup
  Signed-off-by: Karen Feng <[email protected]>
* more test fixup
  Signed-off-by: Karen Feng <[email protected]>
* test fixups
  Signed-off-by: Karen Feng <[email protected]>
* sub-sample
  Signed-off-by: Karen Feng <[email protected]>
* test fixup
  Signed-off-by: Karen Feng <[email protected]>
* address comments - only infer alphas during fit
  Signed-off-by: Karen Feng <[email protected]>
* exception varies
  Signed-off-by: Karen Feng <[email protected]>
* Rename levels (#20)
  * Rename levels to wgr
    Signed-off-by: Karen Feng <[email protected]>
  * rename test files
    Signed-off-by: Karen Feng <[email protected]>
* Errors vary by Spark version
  Signed-off-by: Karen Feng <[email protected]>
* Add license headers (#21)
  * headers
  * executable
  * fix template rendering
  * yapf

Co-authored-by: Kiavash Kianfar <[email protected]>
Co-authored-by: Karen Feng <[email
protected]> Co-authored-by: Leland <[email protected]> Co-authored-by: Leland Barnard <[email protected]>
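The covariate change in #13 moves the WGR model from y ~ Xb to y ~ Ca + Xb. A minimal NumPy sketch of that formulation on a single in-memory block is below. It assumes (the commits above do not say this) that only the genomic coefficients b receive the ridge penalty alpha while the covariate coefficients a are left unpenalized; `ridge_with_covariates` is a hypothetical helper for illustration, not Glow's RidgeReducer/RidgeRegression API.

```python
import numpy as np


def ridge_with_covariates(C, X, y, alpha):
    """Hypothetical sketch: fit y ~ C a + X b, penalizing only b.

    Assumption: covariate coefficients a are unpenalized; the ridge
    penalty alpha applies to the genomic coefficients b only.
    """
    n_cov = C.shape[1]
    # Combined design matrix [C | X].
    A = np.hstack([C, X])
    # Diagonal penalty: zeros for covariates, alpha for genomic features.
    penalty = np.concatenate([np.zeros(n_cov), np.full(X.shape[1], alpha)])
    # Solve the penalized normal equations (A^T A + P) coef = A^T y.
    coef = np.linalg.solve(A.T @ A + np.diag(penalty), A.T @ y)
    return coef[:n_cov], coef[n_cov:]  # (a, b)


# Usage on synthetic data: intercept + one covariate, five genomic features.
rng = np.random.default_rng(0)
n = 100
C = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 1))])
X = rng.normal(size=(n, 5))
a_true = np.array([2.0, -1.0])
b_true = rng.normal(size=5)
y = C @ a_true + X @ b_true
a, b = ridge_with_covariates(C, X, y, alpha=0.0)
```

With alpha = 0 this reduces to ordinary least squares on [C | X]; as alpha grows, b shrinks toward zero and a approaches the least-squares fit of y on C alone, which is the behavior that separating C from X is meant to preserve.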

* Add Leland's demo notebook
* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)
* blocks Signed-off-by: kianfar77 <[email protected]>
* test vcf Signed-off-by: kianfar77 <[email protected]>
* transformer Signed-off-by: kianfar77 <[email protected]>
* remove extra Signed-off-by: kianfar77 <[email protected]>
* refactor and conform with ridge namings Signed-off-by: kianfar77 <[email protected]>
* test Signed-off-by: kianfar77 <[email protected]>
* test files Signed-off-by: kianfar77 <[email protected]>
* remove extra file Signed-off-by: kianfar77 <[email protected]>
* sort_key Signed-off-by: kianfar77 <[email protected]>
* feat: ridge models for wgr added (#1)
* feat: ridge models for wgr added Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Doc strings added for levels/functions.py. Some typos fixed in ridge_model.py. Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* ridge_model and RidgeReducer unit tests added Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* RidgeRegression unit tests added. Test data README added. ridge_udfs.py docstrings added. Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Changes made to accessing the sample ID map and more docstrings
  The map_normal_eqn and score_models functions previously expected the sample IDs for a given sample block to be found in the Pandas DataFrame, which meant we had to join them on before the .groupBy().apply(). These functions now expect the sample block to sample IDs mapping to be provided separately as a dict, so that the join is no longer required. RidgeReducer and RidgeRegression APIs remain unchanged. Docstrings have been added for RidgeReducer and RidgeRegression classes.
  Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Refactored object names and comments to reflect new terminology
  * Where 'block' was previously used to refer to the set of columns in a block, we now use 'header_block'.
  * Where 'group' was previously used to refer to the set of samples in a block, we now use 'sample_block'.
  Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)
* WIP Signed-off-by: Karen Feng <[email protected]>
* existing tests pass Signed-off-by: Karen Feng <[email protected]>
* rename file Signed-off-by: Karen Feng <[email protected]>
* Add compat test Signed-off-by: Karen Feng <[email protected]>
* scalafmt Signed-off-by: Karen Feng <[email protected]>
* collect minimal columns Signed-off-by: Karen Feng <[email protected]>
* address comments Signed-off-by: Karen Feng <[email protected]>
* Test fixup Signed-off-by: Karen Feng <[email protected]>
* Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching Signed-off-by: Karen Feng <[email protected]>
* PyArrow 0.15.1 only with PySpark 3 Signed-off-by: Karen Feng <[email protected]>
* Don't use toPandas() Signed-off-by: Karen Feng <[email protected]>
* Upgrade pyarrow Signed-off-by: Karen Feng <[email protected]>
* Only register once Signed-off-by: Karen Feng <[email protected]>
* Minimize memory usage Signed-off-by: Karen Feng <[email protected]>
* Select before head Signed-off-by: Karen Feng <[email protected]>
* set up/tear down Signed-off-by: Karen Feng <[email protected]>
* Try limiting pyspark memory Signed-off-by: Karen Feng <[email protected]>
* No teardown Signed-off-by: Karen Feng <[email protected]>
* Extend timeout Signed-off-by: Karen Feng <[email protected]>
* Simplify ordering logic in levels code (#7)
* WIP Signed-off-by: Karen Feng <[email protected]>
* existing tests pass Signed-off-by: Karen Feng <[email protected]>
* rename file Signed-off-by: Karen Feng <[email protected]>
* Add compat test Signed-off-by: Karen Feng <[email protected]>
* scalafmt Signed-off-by: Karen Feng <[email protected]>
* collect minimal columns Signed-off-by: Karen Feng <[email protected]>
* start changing for readability
* use input label ordering
* rename create_row_indexer
* undo column sort
* change reduce Signed-off-by: Henry D <[email protected]>
* further simplify reduce
* sorted alpha names
* remove ordering
* comments Signed-off-by: Henry D <[email protected]>
* Set arrow env var in build Signed-off-by: Henry D <[email protected]>
* faster sort
* add test file
* undo test data change
* >=
* formatting
* empty
  Co-authored-by: Karen Feng <[email protected]>
* Limit Spark memory conf in tests (#9)
* yapf Signed-off-by: Karen Feng <[email protected]>
* yapf transform Signed-off-by: Karen Feng <[email protected]>
* Set driver memory Signed-off-by: Karen Feng <[email protected]>
* Try changing spark mem Signed-off-by: Karen Feng <[email protected]>
* match java tests Signed-off-by: Karen Feng <[email protected]>
* whoops Signed-off-by: Karen Feng <[email protected]>
* remove driver memory flag Signed-off-by: Karen Feng <[email protected]>
* Improve partitioning in block_variants_and_samples transformer (#11) Signed-off-by: kianfar77 <[email protected]>
* Remove unnecessary header_block grouping (#10)
* cleanup Signed-off-by: Karen Feng <[email protected]>
* whoops Signed-off-by: Karen Feng <[email protected]>
* cleanup Signed-off-by: Karen Feng <[email protected]>
* Create sample ID blocking helper functions (#12)
* WIP Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* whoops Signed-off-by: Karen Feng <[email protected]>
* tests Signed-off-by: Karen Feng <[email protected]>
* simplify tests Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* yapf Signed-off-by: Karen Feng <[email protected]>
* index map compat Signed-off-by: Karen Feng <[email protected]>
* Add docs Signed-off-by: Karen Feng <[email protected]>
* Add more tests Signed-off-by: Karen Feng <[email protected]>
* pass args as ints Signed-off-by: Karen Feng <[email protected]>
* Don't roll our own splitter Signed-off-by: Karen Feng <[email protected]>
* rename sample_index to sample_blocks Signed-off-by: Karen Feng <[email protected]>
* Add type-checking to WGR APIs (#14)
* Add type-checking to APIs Signed-off-by: Karen Feng <[email protected]>
* Check valid alphas Signed-off-by: Karen Feng <[email protected]>
* check 0 sig Signed-off-by: Karen Feng <[email protected]>
* Add to install_requires list Signed-off-by: Karen Feng <[email protected]>
* cleanup comments Signed-off-by: Karen Feng <[email protected]>
* Add covariate support (#13)
* Added necessary modifications to accommodate covariates in model fitting. The initial formulation of the WGR model assumed a form y ~ Xb; however, in general we would like to use a model of the form y ~ Ca + Xb, where C is some matrix of covariates that are separate from the genomic features X. This PR makes numerous changes to accommodate covariate matrix C. Adding covariates required the following breaking changes to the APIs:
  * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
    * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
    * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)
  Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accommodate an optional covariate DataFrame as the final argument.
  Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
    * test_ridge_reducer_transform_with_cov
    * test_two_level_regression_with_cov
  Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Cleaned up one unnecessary Pandas import Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Small changes for clarity and consistency with the rest of the code. Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Forgot one usage of coalesce Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Added a couple of comments to explain logic and replaced usages of .values with .array Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Fixed one instance of the change .values -> .array where it was made in error. Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Typo in test_ridge_regression.py. Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
* Style auto-updates with yapfAll Signed-off-by: Leland Barnard ([email protected]) Signed-off-by: Leland Barnard <[email protected]>
  Co-authored-by: Leland Barnard <[email protected]>
  Co-authored-by: Karen Feng <[email protected]>
* Flatten estimated phenotypes (#15)
* WIP Signed-off-by: Karen Feng <[email protected]>
* Clean up tests Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* Order to match labeldf Signed-off-by: Karen Feng <[email protected]>
* Check we tie-break Signed-off-by: Karen Feng <[email protected]>
* cleanup Signed-off-by: Karen Feng <[email protected]>
* tests Signed-off-by: Karen Feng <[email protected]>
* test var name Signed-off-by: Karen Feng <[email protected]>
* clean up tests Signed-off-by: Karen Feng <[email protected]>
* Clean up docs Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* tests Signed-off-by: Karen Feng <[email protected]>
* remove accidental files Signed-off-by: Karen Feng <[email protected]>
* Add fit_transform function to models (#17) Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* Rename levels (#20)
* Rename levels to wgr Signed-off-by: Karen Feng <[email protected]>
* rename test files Signed-off-by: Karen Feng <[email protected]>
* Add license headers (#21)
* headers
* executable
* fix template rendering
* yapf
* WIP Signed-off-by: Karen Feng <[email protected]>
* WIP Signed-off-by: Karen Feng <[email protected]>
* More work Signed-off-by: Karen Feng <[email protected]>
* More cleanup Signed-off-by: Karen Feng <[email protected]>
* Fix docs tests Signed-off-by: Karen Feng <[email protected]>
* address comments Signed-off-by: Karen Feng <[email protected]>
* fix regression fit description Signed-off-by: Karen Feng <[email protected]>
* fix capitalization Signed-off-by: Karen Feng <[email protected]>
* address some comments Signed-off-by: Karen Feng <[email protected]>
* more cleanup Signed-off-by: Karen Feng <[email protected]>
* More cleanup Signed-off-by: Karen Feng <[email protected]>
* add notebook Signed-off-by: Karen Feng <[email protected]>
* update notebook Signed-off-by: Karen Feng <[email protected]>

Co-authored-by: Henry D <[email protected]>
Co-authored-by: Kiavash Kianfar <[email protected]>
Co-authored-by: Leland <[email protected]>
Co-authored-by: Leland Barnard <[email protected]>

add CONTIBURING.md

ef63368

Signed-off-by: Henry D <[email protected]>

henrydavidge merged commit f59948a into projectglow:master Oct 14, 2019

karenfeng pushed a commit to karenfeng/glow that referenced this pull request Oct 15, 2019

add CONTIBURING.md (projectglow#10)

10cbcca

Signed-off-by: Henry D <[email protected]> Signed-off-by: Karen Feng <[email protected]>

henrydavidge pushed a commit to henrydavidge/glow that referenced this pull request Jun 22, 2020

Filter Parsing and Tabix Index Push Down and Test Data (projectglow#10)

3dde45d

Implemented Filter Parsing and Tabix Pushdown Signed-off-by: Henry Davidge <[email protected]>

henrydavidge added a commit to henrydavidge/glow that referenced this pull request Jun 22, 2020

add CONTIBURING.md (projectglow#10)

e10287f

Signed-off-by: Henry D <[email protected]> Signed-off-by: Henry Davidge <[email protected]>



henrydavidge commented Oct 14, 2019

What changes are proposed in this pull request?

How is this patch tested?