Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add several functions to work with refGene exon sequences #13

Merged
merged 10 commits into from
Aug 29, 2018

Conversation

alumi
Copy link
Member

@alumi alumi commented Aug 23, 2018

Summary

This PR adds several utility functions to varity.ref-gene module.

  • refGene record => region
  • refGene record => sequence of exons
  • refGene record => base sequence

Tests

  • lein check 🆗
  • lein test :all 🆗

Notes

  • Upgraded dependencies
  • Replaced varity.util/revcomp-bases with more efficient implementation in cljam 0.6.0

@codecov
Copy link

codecov bot commented Aug 23, 2018

Codecov Report

Merging #13 into master will increase coverage by 1.53%.
The diff coverage is 56.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
+ Coverage   34.89%   36.43%   +1.53%     
==========================================
  Files          12       12              
  Lines        1192     1227      +35     
  Branches       21       25       +4     
==========================================
+ Hits          416      447      +31     
  Misses        755      755              
- Partials       21       25       +4
Impacted Files Coverage Δ
src/varity/util.clj 85.71% <ø> (-7.15%) ⬇️
src/varity/hgvs_to_vcf/cdna.clj 7.81% <0%> (ø) ⬆️
src/varity/vcf_to_hgvs/protein.clj 23.61% <0%> (ø) ⬆️
src/varity/hgvs_to_vcf/protein.clj 10.14% <0%> (ø) ⬆️
src/varity/vcf_to_hgvs/cdna.clj 9.27% <0%> (ø) ⬆️
src/varity/ref_gene.clj 66.35% <90.47%> (+5.89%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80c62a4...ab0ae45. Read the comment docs.

Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I added a few comments.

(->> (reverse s)
(map (partial get comp-base-map))
(apply str)))
(util-seq/revcomp s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you leave varity.util/revcomp-bases? I feel you should replace all revcomp-bases in varity codebase with cljam.util.sequence/revcomp.

:end e,
:reverse? reverse?})))))

(defn cds-exon-seq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cds-seq is correct name. cds-exon- is wordy because CDS does not include introns.

@alumi
Copy link
Member Author

alumi commented Aug 24, 2018

@totakke Done, thanks for your review! 😸

Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing. It seems good.

Copy link
Member

@federkasten federkasten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. This features looks very useful.

I added a comment about function naming.

(cond-> (cseq/read-sequence seq-rdr exon)
reverse? util-seq/revcomp))

(defn ^String read-gene-sequence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gene word contains more widely meanings. Thus I think you should use transcript here.

Additionally, could you add docstrings to clarify the function, such as the sequence contains 5'-UTR, CDS, and 3'-UTR.

@alumi
Copy link
Member Author

alumi commented Aug 28, 2018

@federkasten I have replaced 'gene' with 'transcript' or 'ref-gene-record'.

Also I've added a commit to fix :strand values (please refer to chrovis/cljam#143 (comment))
I'll open another PR for fixing existing functions like chrovis/cljam#144

Thank you.

Copy link
Member

@federkasten federkasten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick response. LGTM 👍

@totakke totakke merged commit 02630b8 into master Aug 29, 2018
@totakke totakke deleted the feature/rg-exon-seqs branch August 29, 2018 03:27
@totakke
Copy link
Member

totakke commented Aug 29, 2018

Thank you! The additional commits also seem good, so I've just merged.

@alumi
Copy link
Member Author

alumi commented Aug 29, 2018

@totakke @federkasten Thank you for your helpful comments! 😹

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants