
CPU inefficiency and ulimit / disk error #1236

Closed
tracychew opened this issue May 13, 2021 · 1 comment

Comments

@tracychew

Hi,

I am using STAR (v2.7.3a) to align >250 FASTQ pairs (human, ~80M read pairs) on a large HPC system. The code below worked (albeit inefficiently) for most of the samples but failed for 8.

For the 8 that didn't work, I get exit codes 102 (3 samples) and 1 (5 samples), both disk-related.

The stdout from one of the samples with exit 102 is below. It suggests the failure occurred while sorting the BAM, and advises checking disk space and ulimit -n. There is definitely enough disk space, and enough inodes are available. The ulimit -n on the system I am using is 16384, and this thread suggests that should be sufficient with --runThreadN 3 and the default --outBAMsortingBinsN 50.

Stdout for sample with exit 102

[tc6463@gadi-login-09 Logs]$ cat ./star_align_trimmed/AGRF_CAGRF20104118-1_HYNHMDSXY/nCIT098_2.oe
Mon May 10 13:10:58 AEST 2021 : Mapping with STAR 2.7.3a. Sample:nCIT098 R1:../AGRF_CAGRF20104118-1_HYNHMDSXY_trimmed/nCIT098_HYNHMDSXY_CCTCTACATG-AGGAGGTATC_L002_R1_trimmed.fastq.gz R2: ../AGRF_CAGRF20104118-1_HYNHMDSXY_trimmed/nCIT098_HYNHMDSXY_CCTCTACATG-AGGAGGTATC_L002_R2_trimmed.fastq.gz centre:AGRF platform:ILLUMINA library:1 lane:2 flowcell:HYNHMDSXY logfile:./Logs/star_align_trimmed/AGRF_CAGRF20104118-1_HYNHMDSXY/nCIT098_2.oe NCPUS:3
May 10 13:10:58 ..... started STAR run
May 10 13:10:59 ..... loading genome
May 10 13:12:29 ..... started mapping
May 10 14:14:56 ..... finished mapping
May 10 14:56:47 ..... started sorting BAM

EXITING because of fatal ERROR: could not open temporary bam file: ../AGRF_CAGRF20104118-1_HYNHMDSXY_STAR/nCIT098_2__STARtmp//BAMsort//b13
SOLUTION: check that the disk is not full, increase the max number of open files with Linux command ulimit -n before running STAR
May 10 15:16:38 ...... FATAL ERROR, exiting

Here is an example stdout from one of the samples that failed with exit code 1. STAR isn't able to read one of the temporary files it creates:

Mon May 10 13:10:58 AEST 2021 : Mapping with STAR 2.7.3a. Sample:nCIT092 R1:../AGRF_CAGRF20104118-1_HYNHMDSXY_trimmed/nCIT092_HYNHMDSXY_TGTCGCTGGT-AGTCAGACGA_L002_R1_trimmed.fastq.gz R2: ../AGRF_CAGRF20104118-1_HYNHMDSXY_trimmed/nCIT092_HYNHMDSXY_TGTCGCTGGT-AGTCAGACGA_L002_R2_trimmed.fastq.gz centre:AGRF platform:ILLUMINA library:1 lane:2 flowcell:HYNHMDSXY logfile:./Logs/star_align_trimmed/AGRF_CAGRF20104118-1_HYNHMDSXY/nCIT092_2.oe NCPUS:3
May 10 13:10:58 ..... started STAR run
May 10 13:10:59 ..... loading genome
May 10 13:12:29 ..... started mapping
May 10 14:07:16 ..... finished mapping
May 10 14:56:47 ..... started sorting BAM

EXITING because of FATAL ERROR: failed reading from temporary file: ../AGRF_CAGRF20104118-1_HYNHMDSXY_STAR/nCIT092_2__STARtmp//BAMsort/2/11
May 10 15:16:46 ...... FATAL ERROR, exiting
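As a quick sanity check on the open-file budget before digging into the script: a common rule of thumb (an assumption here, not STAR's documented formula) is that the number of temporary sort files open at once scales roughly with runThreadN times outBAMsortingBinsN, so the soft limit should comfortably exceed that product.

```shell
# Heuristic check (assumption: open temp files ~ threads * bins).
threads=3        # --runThreadN used in the failing jobs
bins=50          # --outBAMsortingBinsN (STAR default)
needed=$((threads * bins))
limit=$(ulimit -n)
echo "approx temp files needed: ${needed}; soft open-file limit: ${limit}"
```

With 3 threads and 50 bins this estimates ~150 files, well under 16384, which is why the ulimit explanation alone seemed unsatisfying here.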

Here is the code in star_align_paired_unmapped.sh, inputs are read as arguments:

#!/bin/bash

# Align to the reference genome using STAR

filename=$(echo "$1" | cut -d ',' -f 1)
dataset=$(echo "$1" | cut -d ',' -f 2)
sampleid=$(echo "$1" | cut -d ',' -f 3)
fastq1=$(echo "$1" | cut -d ',' -f 4)
fastq2=$(echo "$1" | cut -d ',' -f 5)
ref=$(echo "$1" | cut -d ',' -f 6)
seqcentre=$(echo "$1" | cut -d ',' -f 7)
platform=$(echo "$1" | cut -d ',' -f 8)
library=$(echo "$1" | cut -d ',' -f 9)
lane=$(echo "$1" | cut -d ',' -f 10)
flowcell=$(echo "$1" | cut -d ',' -f 11)
outdir=$(echo "$1" | cut -d ',' -f 12)
logfile=$(echo "$1" | cut -d ',' -f 13)
NCPUS=$(echo "$1" | cut -d ',' -f 14)

echo `date` ": Mapping with STAR 2.7.3a. Sample:$sampleid R1:$fastq1 R2: $fastq2 centre:$seqcentre platform:$platform library:$library lane:$lane flowcell:$flowcell logfile:$logfile NCPUS:$NCPUS" > ${logfile} 2>&1

# Mapping
STAR \
        --runThreadN ${NCPUS} \
        --genomeDir ${ref} \
        --quantMode GeneCounts \
        --readFilesCommand zcat \
        --readFilesIn ${fastq1} ${fastq2} \
        --outSAMattrRGline ID:${flowcell}:${lane} PU:${flowcell}.${lane}.${sampleid} SM:${sampleid} PL:${platform} CN:${seqcentre} LB:${library} \
        --outSAMtype BAM SortedByCoordinate \
        --outReadsUnmapped Fastx \
        --outSAMunmapped Within KeepPairs \
        --outFileNamePrefix ${outdir}/${sampleid}_${lane}_ >> ${logfile} 2>&1
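As an aside, the fourteen cut invocations above can be collapsed into a single read. A sketch with placeholder field values (not the author's code; the variable names match the script):

```shell
#!/bin/bash
# Parse all 14 comma-separated fields in one read instead of
# fourteen separate `cut` calls. The row below uses placeholder values.
row="run1.csv,ds1,sampleA,r1.fq.gz,r2.fq.gz,/ref/GRCh38,AGRF,ILLUMINA,1,2,FCID,./out,./log.oe,3"
IFS=',' read -r filename dataset sampleid fastq1 fastq2 ref seqcentre \
    platform library lane flowcell outdir logfile NCPUS <<EOF
$row
EOF
echo "sample=${sampleid} lane=${lane} threads=${NCPUS}"
```

This avoids re-spawning echo and cut per field and fails more obviously if a field is missing.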

My last question relates to the inefficiency: for the samples that did work, CPU efficiency was ~0.06 with --runThreadN 24, 0.16 with --runThreadN 6, and 0.5 with --runThreadN 3. The HPC system I am using has nodes with 48 CPUs and 190 GB of RAM. Could you recommend how to utilise the hardware more efficiently?

Thanks so much!

@tracychew
Author

Explicitly setting --outBAMsortingThreadN ${NCPUS} and --outBAMsortingBinsN 100 both improved efficiency and resolved the ulimit issue :)
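For reference, the sort-related flags in the fixed command would look something like this (a sketch only; the remaining options are unchanged from the script earlier in the thread):

```shell
# Sorting threads set explicitly, and the bin count doubled from
# the default of 50. Smaller, more numerous bins reduce per-bin memory
# pressure during the coordinate sort.
STAR \
        --runThreadN ${NCPUS} \
        --outBAMsortingThreadN ${NCPUS} \
        --outBAMsortingBinsN 100 \
        ...
```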
