Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] anvi-run-hmms does not accurately detect 5S rRNA #2344

Open
philipwoods opened this issue Sep 20, 2024 · 3 comments
Open

[BUG] anvi-run-hmms does not accurately detect 5S rRNA #2344

philipwoods opened this issue Sep 20, 2024 · 3 comments
Assignees

Comments

@philipwoods
Copy link

philipwoods commented Sep 20, 2024

Short description of the problem

Even in single-contig circularized genomes, anvi-run-hmms does not reliably detect the 5S rRNA gene.

anvi'o version

Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.13

Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Our operating system is Red Hat enterprise Linux. Anvio was installed in a conda environment.

Detailed description of the issue

Looking at the ribosomal RNA HMM readme file, it appears that the ribosomal RNA HMMs all use a GA cutoff of 750 bitscore. However, the 5S rRNA gene is short, and when I run the 5S HMM independently on circularized genomes I see that alignment along the full length of the HMM (about 115nt) only reaches an e-value of about 1e-14 or 1e-15 and a bitscore of 52-55. This suggests that the current bitscore cutoff makes it impossible to detect the 5S rRNA gene with anvi-run-hmms. After more testing on a few other genomes to refine the cutoff, I would suggest changing the GA bitscore cutoff for the 5S rRNA HMM to 45.

Files / commands to reproduce the issue

To test Anvio HMMs on a circularized genome found here:

anvi-get-sequences-for-hmm-hits -c Methanosarcina_acetivorans_GCF_000007345.1.db --hmm-sources Ribosomal_RNA_5S -o 5S-test.fasta

To run the HMM independently of Anvio:

nhmmer 5S-rRNA.hmm Methanosarcina_acetivorans_GCF_000007345.1.fna

To test the HMM independently with the GA threshold:

nhmmer --cut_ga 5S-rRNA.hmm Methanosarcina_acetivorans_GCF_000007345.1.fna

@xvazquezc
Copy link
Contributor

If I'm not wrong, the rRNA HMM models come from Barrnap. Tbh, I've had similar issues recently with other pipelines that rely on Barrnap for rRNA detection, also with a collection of archaeal circularised genomes. Mainly missing the 5S but occasionally also some copies of 16S. For single genome annotations I ended using Infernal with the covariance models from Rfam.

@philipwoods
Copy link
Author

The HMMs do come from Barrnap, but the Anvio readme file about how they imported them says they arbitrarily added a GA cutoff of 750 bitscore that wasn't present in the original HMMs.

@meren
Copy link
Member

meren commented Sep 21, 2024

Thanks for bringing this up, @philipwoods. Do you happen to have any insights whether changing that GA cutoff changes the rate of the recognition of 5S rRNA?

We can also remove the GA cutoff and put a significance threshold there, as well. I'm happy to take a look and see what works well, but it would be much better if it was done by someone who is genuinely interested in identifying 5S rRNAs more effectively, and we always appreciate any help :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants