Documentation for WGR #235

Merged on Jun 23, 2020
49 commits
513b8be
Add Leland's demo notebook
henrydavidge May 15, 2020
1955d38
Merge pull request #3 from henrydavidge/add-nb
henrydavidge May 19, 2020
41d8fba
block_variants_and_samples Transformer to create genotype DataFrame f…
kianfar77 May 19, 2020
27e400e
Merge branch 'master' of https://github.com/projectglow/glow
karenfeng May 20, 2020
dfa6c08
Merge branch 'master' of https://github.com/projectglow/glow
karenfeng May 21, 2020
f5424ee
feat: ridge models for wgr added (#1)
LelandBarnard May 22, 2020
b065560
[HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)
karenfeng May 29, 2020
9778381
Merge branch 'master' of github.com:projectglow/glow
henrydavidge May 29, 2020
35a2383
Merge branch 'master' of https://github.com/projectglow/glow
karenfeng Jun 1, 2020
86fab65
Merge branch 'master' of https://github.com/projectglow/glow
karenfeng Jun 2, 2020
265370f
Simplify ordering logic in levels code (#7)
henrydavidge Jun 2, 2020
1f32506
Limit Spark memory conf in tests (#9)
karenfeng Jun 2, 2020
f6f00d4
Merge branch 'master' of https://github.com/projectglow/glow
karenfeng Jun 3, 2020
cfc08e6
Improve partitioning in block_variants_and_samples transformer (#11)
kianfar77 Jun 5, 2020
f2f30c0
Remove unnecessary header_block grouping (#10)
karenfeng Jun 5, 2020
bcbadd6
Merge branch 'master' of https://github.com/projectglow/glow
karenfeng Jun 5, 2020
5bbad57
Merge branch 'master' of https://github.com/projectglow/glow
karenfeng Jun 8, 2020
1686138
Create sample ID blocking helper functions (#12)
karenfeng Jun 10, 2020
6bfad34
Add type-checking to WGR APIs (#14)
karenfeng Jun 12, 2020
afaa6df
Add covariate support (#13)
LelandBarnard Jun 12, 2020
cd6c6a1
Flatten estimated phenotypes (#15)
karenfeng Jun 15, 2020
5944b84
WIP
karenfeng Jun 15, 2020
e29ebfe
tests
karenfeng Jun 17, 2020
aeb91d8
remove accidental files
karenfeng Jun 17, 2020
d558115
Add fit_transform function to models (#17)
karenfeng Jun 17, 2020
79e0eea
Merge branch 'master' of https://github.com/projectglow/glow
karenfeng Jun 19, 2020
6ffd77a
WIP
karenfeng Jun 19, 2020
a14b27b
Merge branch 'master' of https://github.com/databricks/glow-wgr into …
karenfeng Jun 19, 2020
5ccc005
WIP
karenfeng Jun 19, 2020
0b2f5c6
WIP
karenfeng Jun 22, 2020
e920d06
Rename levels (#20)
karenfeng Jun 22, 2020
939e9bb
Add license headers (#21)
henrydavidge Jun 22, 2020
db50584
WIP
karenfeng Jun 22, 2020
49f7e65
Merge branch 'master' of https://github.com/databricks/glow-wgr into …
karenfeng Jun 22, 2020
d3a882e
WIP
karenfeng Jun 22, 2020
f9212b0
More work
karenfeng Jun 22, 2020
2a50994
More cleanup
karenfeng Jun 22, 2020
bf0963a
Fix docs tests
karenfeng Jun 22, 2020
52c7b3b
Merge branch 'master' of https://github.com/projectglow/glow into wgr…
karenfeng Jun 22, 2020
86e12a6
address comments
karenfeng Jun 22, 2020
418d714
fix regression fit description
karenfeng Jun 22, 2020
648e06e
Merge branch 'master' of https://github.com/projectglow/glow into wgr…
karenfeng Jun 22, 2020
48943cf
fix capitalization
karenfeng Jun 22, 2020
ee87aef
Merge branch 'master' of https://github.com/projectglow/glow into wgr…
karenfeng Jun 22, 2020
39601e9
address some comments
karenfeng Jun 22, 2020
4c8aac1
more cleanup
karenfeng Jun 22, 2020
2d9409c
More cleanup
karenfeng Jun 22, 2020
e899e55
add notebook
karenfeng Jun 23, 2020
75ffd4c
update notebook
karenfeng Jun 23, 2020
Binary file added docs/source/_static/images/wgr_runtime.png
42 changes: 42 additions & 0 deletions docs/source/_static/notebooks/tertiary/glowgr.html

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions docs/source/etl/vcf2delta.rst
@@ -1,3 +1,5 @@
+.. _vcf2delta:
+
 ============================
 Create a Genomics Delta Lake
 ============================
1 change: 1 addition & 0 deletions docs/source/tertiary/index.rst
@@ -13,3 +13,4 @@ Perform population-scale statistical analyses of genetic variants.
 pipe-transformer
 pandas-udf
 regression-tests
+whole-genome-regression
6 changes: 3 additions & 3 deletions docs/source/tertiary/regression-tests.rst
@@ -81,7 +81,7 @@ Example
 standardError=0.1783963733160434,
 pValue=0.44349953631952943
 )
-assert_rows_equal(lin_reg_df.head(), expected_lin_reg_row)
+assert_rows_equal(lin_reg_df.filter('contigName = 22 and start = 16050114').head(), expected_lin_reg_row)

 Parameters
 ----------
@@ -191,7 +191,7 @@ Example
 waldConfidenceInterval=[0.7813704896767115, 3.247273366082802],
 pValue=0.19572327843236637
 )
-assert_rows_equal(lrt_log_reg_df.head(), expected_lrt_log_reg_row)
+assert_rows_equal(lrt_log_reg_df.filter('contigName = 22 and start = 16050114').head(), expected_lrt_log_reg_row)

 expected_firth_log_reg_row = Row(
 contigName='22',
@@ -202,7 +202,7 @@ Example
 waldConfidenceInterval=[0.7719062301156017, 3.2026291934794795],
 pValue=0.20086839802280376
 )
-assert_rows_equal(firth_log_reg_df.head(), expected_firth_log_reg_row)
+assert_rows_equal(firth_log_reg_df.filter('contigName = 22 and start = 16050114').head(), expected_firth_log_reg_row)

Parameters
----------
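
Note on the three assertion changes in regression-tests.rst: each replaces `.head()` with `.filter('contigName = 22 and start = 16050114').head()`, so the expected row is selected by its key rather than by its position in the DataFrame. A plausible reading (an assumption, not stated in the PR) is that the first row is no longer guaranteed to be that variant once the test data also covers chr21. The sketch below is a pure-Python analogue, not Glow or Spark code; the helper names `first_row` and `row_at`, the chr21 position, and all `pValue` numbers are hypothetical.

```python
# Pure-Python analogue (not Glow or Spark code): rows are plain dicts, and the
# pValue numbers and the chr21 position are made up for illustration.
rows = [
    {"contigName": "21", "start": 1000, "pValue": 0.1},
    {"contigName": "22", "start": 16050114, "pValue": 0.2},
    {"contigName": "22", "start": 16050677, "pValue": 0.3},
]


def first_row(rows):
    """Analogue of df.head(): return whichever row happens to come first."""
    return rows[0]


def row_at(rows, contig_name, start):
    """Analogue of df.filter(...).head(): pick the row by its key."""
    for row in rows:
        if row["contigName"] == contig_name and row["start"] == start:
            return row
    return None


# Same rows, different order (as can happen when data is repartitioned).
reordered = list(reversed(rows))

# Selecting by position depends on row order...
print(first_row(rows)["start"], first_row(reordered)["start"])
# ...while selecting by key does not.
print(row_at(rows, "22", 16050114) == row_at(reordered, "22", 16050114))
```

Selecting by key keeps a documentation test stable when rows are added or reordered, at the cost of hard-coding one variant's coordinates in the example.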
388 changes: 388 additions & 0 deletions docs/source/tertiary/whole-genome-regression.rst

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion test-data/gwas/README
@@ -1,4 +1,4 @@
-The genotypes are sampled from the Thousand Genomes Project Phase 3 release chr22 VCF
+The genotypes are sampled from the Thousand Genomes Project Phase 3 release chr21 and chr22 VCFs
 (ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf).

 The covariates and continuous phenotypes are simulated with PhenotypeSimulator
Binary file modified test-data/gwas/binary-phenotypes.csv.gz
Binary file not shown.
Binary file modified test-data/gwas/continuous-phenotypes.csv.gz
Binary file not shown.
Binary file modified test-data/gwas/covariates.csv.gz
Binary file not shown.
Binary file modified test-data/gwas/genotypes.vcf.gz
Binary file not shown.