Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reference protein seq in fs-ter-substitution #40

Merged
merged 2 commits into from
Jun 16, 2021
Merged

Fix reference protein seq in fs-ter-substitution #40

merged 2 commits into from
Jun 16, 2021

Conversation

itoooo
Copy link
Contributor

@itoooo itoooo commented Jun 14, 2021

This PR fixes the assert error that is occured if returning :ref contains stop symbol, '*', at the end.

VCV000965170.1 is a one of example case to cause this.

You can reproduce this problem like below steps:

(require ['varity.hgvs-to-vcf :as 'h2v]
         ['varity.vcf-to-hgvs :as 'v2h]
         ['clj-hgvs.core :as 'clj-hgvs])

(h2v/hgvs->vcf-variants (clj-hgvs/parse "NM_199003.1:c.72-53_*28del") "/path/to/hg38.fa" "/path/to/hg38/20200818015652/refGene.txt.gz")
;; =>
;; ({:chr "chr8",
;;   :pos 42838217,
;;   :ref "GAGATTAACAGGGGTCTGAAGAGGCGGCATTAGTAATCCAATAGCAGCATCAACCTGGGAAACAGGAGGCGGTAAAGGAGGTGGGGGAAGCTGTTCCTGTGGCTCCAGAAGATCTTCTTTCTAAAACAAAAATACAAAGTATGTTTGAATTTAGTAACTAAAAACAGTTTAAA",
;;   :alt "G"})

(v2h/vcf-variant->hgvs (first *1) "/path/to/hg38.fa" "/path/to/hg38/20200818015652/refGene.txt.gz" {:verbose? true})
;;  variant: {:chr "chr8", :pos 42838217, :ref "GAGATTAACAGGGGTCT...", :alt "G"}
;; ref-gene: {:name "NM_199003", :name2 "THAP1", :strand :reverse}
;; 
;;              42838201  42838211  42838221  42838231  42838241  42838251  42838261  42838271  42838281  42838291  42838301  42838311  42838321  42838331  42838341  42838351  42838361  42838371  42838381  42838391  42838401
;;     ref: TTGTGGTCACAGAAAACTGAGAGATTAACAGGGGTCTGAAGAGGCGGCATTAGTAATCCAATAGCAGCATCAACCTGGGAAACAGGAGGCGGTAAAGGAGGTGGGGGAAGCTGTTCCTGTGGCTCCAGAAGATCTTCTTTCTAAAACAAAAATACAAAGTATGTTTGAATTTAGTAACTAAAAACAGTTTAAAAGAATCTGTGGACTGACCAG
;;     alt: TTGTGGTCACAGAAAACTGAG                                                                                                                                                                            AGAATCTGTGGACTGACCAG
;;                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
;; 
;;                   72-65     72-55     72-45     72-35     72-25     72-15     72-5      77        87        97        107       117       127       137       147       157       *5        *15       *25       *35       *45
;;   c.ref: cctggtcagtccacagattcttttaaactgtttttagttactaaattcaaacatactttgtatttttgttttagAAAGAAGATCTTCTGGAGCCACAGGAACAGCTTCCCCCACCTCCTTTACCGCCTCCTGTTTCCCAGGTTGATGCTGCTATTGGATTACTAATGCCGCCTCTTCAGACCCCTGTTAATCTCTCAGTTTTCTGTGACCACAA
;;   c.alt: cctggtcagtccacagattct                                                                                                                                                                            CTCAGTTTTCTGTGACCACAA
;;                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
;; 
;;                 11        21        31        41
;;   p.ref: SCSAYGCKNRYDKDKPVSFHKKKIFWSHRNSFPHLLYRLLFPRLMLLLDY*
;;   p.alt: SCSAYGCKNRYDKDKPVSFHK*GPCAPRGAVPPLQTPVNLS
;;                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
;; Execution error (AssertionError) at clj-hgvs.mutation/protein-substitution (mutation.cljc:1584).
;; Assert failed: (intl/valid? :clj-hgvs.mutation/protein-substitution %)

So I fixed this issue by trimming :ref, that is returned caller, with returning :alt's length. In the above case, "KK" for :ref, "K*" for :alt.

Now, vcf-variant-hgvs should returns following HGVS instances:

({:coding-dna #clj-hgvs/hgvs"NM_199003:c.72-53_*28delTTTAAACTGTTTTTAGTTACTAAATTCAAACATACTTTGTATTTTTGTTTTAGAAAGAAGATCTTCTGGAGCCACAGGAACAGCTTCCCCCACCTCCTTTACCGCCTCCTGTTTCCCAGGTTGATGCTGCTATTGGATTACTAATGCCGCCTCTTCAGACCCCTGTTAATCT",
  :protein #clj-hgvs/hgvs"p.K25*"}
 {:coding-dna #clj-hgvs/hgvs"NM_018105:c.268-53_386delTTTAAACTGTTTTTAGTTACTAAATTCAAACATACTTTGTATTTTTGTTTTAGAAAGAAGATCTTCTGGAGCCACAGGAACAGCTTCCCCCACCTCCTTTACCGCCTCCTGTTTCCCAGGTTGATGCTGCTATTGGATTACTAATGCCGCCTCTTCAGACCCCTGTTAATCT",
  :protein #clj-hgvs/hgvs"p.K90Lfs*5"})

I confirmed lein test :all passed.

@codecov
Copy link

codecov bot commented Jun 14, 2021

Codecov Report

Merging #40 (02c31cc) into master (7f5eb02) will decrease coverage by 0.11%.
The diff coverage is 0.00%.

❗ Current head 02c31cc differs from pull request most recent head 9aaad15. Consider uploading reports for the commit 9aaad15 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master      #40      +/-   ##
==========================================
- Coverage   39.06%   38.94%   -0.12%     
==========================================
  Files          15       15              
  Lines        1687     1692       +5     
  Branches       33       33              
==========================================
  Hits          659      659              
- Misses        995     1000       +5     
  Partials       33       33              
Impacted Files Coverage Δ
src/varity/vcf_to_hgvs/protein.clj 27.94% <0.00%> (-0.42%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f5eb02...9aaad15. Read the comment docs.

@totakke totakke self-assigned this Jun 15, 2021
@totakke totakke self-requested a review June 15, 2021 06:12
Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Your fix seems to work fine. I have left the only comment at the nonsense code.

palt-len (count palt)
palt-ter-len (inc palt-len)]
(if (<= pref-len palt-ter-len)
(str pref (subs ref-prot-rest 0 (max 0 (inc (- palt-len pref-len)))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(max 0 ...) is nonsense because (inc (- palt-len pref-len)) is at least 0 under (<= pref-len palt-ter-len).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, exactly, I'm kind of too focused on to keep the original code. I'll fix it!

@itoooo
Copy link
Contributor Author

itoooo commented Jun 16, 2021

Thank you for the review! I fixed the code you pointed to. Could you review that again?

Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@totakke totakke merged commit f85c31e into chrovis:master Jun 16, 2021
@itoooo
Copy link
Contributor Author

itoooo commented Jun 16, 2021

Thanks for the merge! 🙏

@itoooo itoooo deleted the fix/long-fs-ter-pref branch June 16, 2021 04:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants