Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illumina Complete Long Read presets #1069

Merged
merged 2 commits into from
Jun 4, 2023
Merged

Conversation

koadman
Copy link
Contributor

@koadman koadman commented Jun 1, 2023

Implements general purpose transition-aware alignment scoring, and configuration presets for alignment of ICLR to reference sequence contigs

@lh3
Copy link
Owner

lh3 commented Jun 1, 2023

Thanks! However, there is a pitfall in ksw2 that may not work with the current PR. See

minimap2/ksw2_extd2_sse.c

Lines 165 to 184 in e28a55b

if (!(flag & KSW_EZ_GENERIC_SC)) {
for (t = st0; t <= en0; t += 16) {
__m128i sq, st, tmp, mask;
sq = _mm_loadu_si128((__m128i*)&sf[t]);
st = _mm_loadu_si128((__m128i*)&qrr[t]);
mask = _mm_or_si128(_mm_cmpeq_epi8(sq, m1_), _mm_cmpeq_epi8(st, m1_));
tmp = _mm_cmpeq_epi8(sq, st);
#ifdef __SSE4_1__
tmp = _mm_blendv_epi8(sc_mis_, sc_mch_, tmp);
tmp = _mm_blendv_epi8(tmp, sc_N_, mask);
#else
tmp = _mm_or_si128(_mm_andnot_si128(tmp, sc_mis_), _mm_and_si128(tmp, sc_mch_));
tmp = _mm_or_si128(_mm_andnot_si128(mask, tmp), _mm_and_si128(mask, sc_N_));
#endif
_mm_storeu_si128((__m128i*)((int8_t*)s + t), tmp);
}
} else {
for (t = st0; t <= en0; ++t)
((uint8_t*)s)[t] = mat[sf[t] * m + qrr[t]];
}

Unless flag KSW_EZ_GENERIC_SC is applied, ksw2 will not work with a generic scoring matrix. Ksw2 does not automatically test if scoring matrix uses the same mismatching penalty because matching to ambiguous bases "N" is often scored differently.

You need to apply KSW_EZ_GENERIC_SC in mm_align_pair to use transition scores:

minimap2/align.c

Lines 316 to 344 in e28a55b

static void mm_align_pair(void *km, const mm_mapopt_t *opt, int qlen, const uint8_t *qseq, int tlen, const uint8_t *tseq, const uint8_t *junc, const int8_t *mat, int w, int end_bonus, int zdrop, int flag, ksw_extz_t *ez)
{
if (mm_dbg_flag & MM_DBG_PRINT_ALN_SEQ) {
int i;
fprintf(stderr, "===> q=(%d,%d), e=(%d,%d), bw=%d, flag=%d, zdrop=%d <===\n", opt->q, opt->q2, opt->e, opt->e2, w, flag, opt->zdrop);
for (i = 0; i < tlen; ++i) fputc("ACGTN"[tseq[i]], stderr);
fputc('\n', stderr);
for (i = 0; i < qlen; ++i) fputc("ACGTN"[qseq[i]], stderr);
fputc('\n', stderr);
}
if (opt->max_sw_mat > 0 && (int64_t)tlen * qlen > opt->max_sw_mat) {
ksw_reset_extz(ez);
ez->zdropped = 1;
} else if (opt->flag & MM_F_SPLICE) {
int flag_tmp = flag;
if (!(opt->flag & MM_F_SPLICE_OLD)) flag_tmp |= KSW_EZ_SPLICE_CMPLX;
ksw_exts2_sse(km, qlen, qseq, tlen, tseq, 5, mat, opt->q, opt->e, opt->q2, opt->noncan, zdrop, opt->junc_bonus, flag_tmp, junc, ez);
} else if (opt->q == opt->q2 && opt->e == opt->e2)
ksw_extz2_sse(km, qlen, qseq, tlen, tseq, 5, mat, opt->q, opt->e, w, zdrop, end_bonus, flag, ez);
else
ksw_extd2_sse(km, qlen, qseq, tlen, tseq, 5, mat, opt->q, opt->e, opt->q2, opt->e2, w, zdrop, end_bonus, flag, ez);
if (mm_dbg_flag & MM_DBG_PRINT_ALN_SEQ) {
int i;
fprintf(stderr, "score=%d, cigar=", ez->score);
for (i = 0; i < ez->n_cigar; ++i)
fprintf(stderr, "%d%c", ez->cigar[i]>>4, MM_CIGAR_STR[ez->cigar[i]&0xf]);
fprintf(stderr, "\n");
}
}

Would be good to test if transition score really works. I could have missed something.

@koadman
Copy link
Contributor Author

koadman commented Jun 3, 2023

Thanks Heng, indeed the transition scoring does not get enabled without the flag you suggested. I've cleaned up & fixed the matrix construction as well, hopefully it is more readable now. The AS:i tags now contain the expected alignment scores on my test datasets. I don't see any obvious way to include automated tests or test data in the source code repo, but I'm happy to add some test cases or send test data separately if you have a preferred approach?

@lh3 lh3 merged commit ace990c into lh3:master Jun 4, 2023
@lh3 lh3 added the enhancement label Jun 4, 2023
@lh3
Copy link
Owner

lh3 commented Jun 4, 2023

Thanks!

kojix2 added a commit to kojix2/minimap2 that referenced this pull request Mar 14, 2024
Add transition. Related to lh3#1069
@kojix2 kojix2 mentioned this pull request Mar 14, 2024
lh3 pushed a commit that referenced this pull request Mar 14, 2024
Add transition. Related to #1069
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants