yonda310/Spatial_epigenome-transcriptome_co-sequencing

 
 


Spatial_epigenome-transcriptome_co-sequencing

Introduction

This repository shares the raw data processing and visualization code used in the Spatial_epigenome-transcriptome_co-sequencing project.

Data processing

Next Generation Sequencing (NGS) was performed on an Illumina NovaSeq 6000 sequencer (paired-end 150 bp mode).

Spatial_epigenome-transcriptome_co-sequencing chemistry (Zhang & Deng et al. 2023)

1. Raw Fastq data
  • Read 1: contains the genome sequences
  • Read 2: contains the spatial Barcode A, Barcode B and genome sequences
2. Reformat raw Fastq file to Cell Ranger ATAC format (10x Genomics)

Raw read 2 -> New Read 1 + New Read 2

  • New Read 1: contains the genome sequences
  • New Read 2: contains the spatial Barcode A and Barcode B

Raw read 1 -> New Read 3

The reformatting of the raw data is implemented in BC_process.py in the Data_preprocessing folder.
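The split of raw Read 2 into a genome read and a barcode read can be sketched as below. This is a minimal illustration, not the code in BC_process.py: the offsets `BARCODE_A`, `BARCODE_B` and `GENOME_START`, the order in which the two barcodes are concatenated, and the helper names are all hypothetical placeholders, so check them against BC_process.py before reuse.

```python
from typing import NamedTuple, Tuple


class FastqRecord(NamedTuple):
    name: str  # read name without the leading '@'
    seq: str   # base calls
    qual: str  # per-base quality string, same length as seq


# Hypothetical 0-based, end-exclusive coordinates inside raw Read 2.
# Placeholders only: the real positions are defined in BC_process.py.
BARCODE_B = (0, 8)
BARCODE_A = (38, 46)
GENOME_START = 100


def split_raw_read2(
    rec: FastqRecord,
    barcode_a: Tuple[int, int] = BARCODE_A,
    barcode_b: Tuple[int, int] = BARCODE_B,
    genome_start: int = GENOME_START,
) -> Tuple[FastqRecord, FastqRecord]:
    """Return (new_read1, new_read2): the genome part and the barcode part."""
    if len(rec.seq) != len(rec.qual):
        raise ValueError(f"{rec.name}: sequence and quality lengths differ")
    if len(rec.seq) <= genome_start:
        raise ValueError(f"{rec.name}: read too short to contain genome sequence")

    # New Read 1: everything from genome_start onwards.
    genome = FastqRecord(rec.name, rec.seq[genome_start:], rec.qual[genome_start:])

    # New Read 2: Barcode A followed by Barcode B (the order is an assumption).
    a0, a1 = barcode_a
    b0, b1 = barcode_b
    barcode = FastqRecord(
        rec.name,
        rec.seq[a0:a1] + rec.seq[b0:b1],
        rec.qual[a0:a1] + rec.qual[b0:b1],
    )
    return genome, barcode


def format_fastq(rec: FastqRecord) -> str:
    """Render one record as four FASTQ lines."""
    return f"@{rec.name}\n{rec.seq}\n+\n{rec.qual}\n"
```

Slicing the quality string with the same coordinates as the sequence keeps every output record internally consistent, which downstream aligners require. Raw Read 1 needs no slicing in this scheme; it is carried over unchanged as New Read 3.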

3. Sequence alignment and generation of fragments file

The reformatted data was processed using Cell Ranger ATAC v1.2 with the following references.

Mouse reference (mm10):

curl -O https://cf.10xgenomics.com/supp/cell-atac/refdata-cellranger-atac-mm10-1.2.0.tar.gz

Human reference (GRCh38):

curl -O https://cf.10xgenomics.com/supp/cell-atac/refdata-cellranger-atac-GRCh38-1.2.0.tar.gz

A preprocessing pipeline we developed with the Snakemake workflow management system is in the Data_preprocessing folder. To run the pipeline, use the command:

sbatch Snakemake.sh

Spatial_RNA-seq

1. Raw Fastq data processing using ST pipeline to generate the expression matrix

The DBiT-seq raw Fastq files:
  • Read 1: contains the cDNA sequence
  • Read 2: contains the spatial Barcode A, Barcode B and UMIs

2. Reformat Fastq Read 2 file

To reformat the raw data, run fastq_process.py in the Rawdata_processing folder and gzip the resulting Fastq file to save space:

python fastq_process.py
gzip sample_R2_processed.fastq

The reformatted data was then processed with the ST pipeline.
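The layout of the reformatted Read 2 is not spelled out here, but the ST pipeline call in step 3 passes `--umi-start-position 16` and `--umi-end-position 26`. Read as a Python-style slice, that would put a 10-bp UMI at positions 16 to 25, directly after a 16-bp spatial barcode. The sketch below checks a processed Fastq against that layout. The barcode occupying the first 16 bases is an assumption, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

BARCODE_LEN = 16  # assumed: spatial barcode fills positions 0..15
UMI_START = 16    # from --umi-start-position in st_pipeline.sh
UMI_END = 26      # from --umi-end-position in st_pipeline.sh


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    # Pulling four lines at a time from the same iterator walks record by record.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")


def split_barcode_umi(seq: str) -> Tuple[str, str]:
    """Return (spatial_barcode, umi) under the assumed layout."""
    if len(seq) < UMI_END:
        raise ValueError(f"read of length {len(seq)} is shorter than {UMI_END}")
    return seq[:BARCODE_LEN], seq[UMI_START:UMI_END]


def count_short_reads(lines: Iterable[str]) -> int:
    """Count reads too short to hold both the barcode and the UMI."""
    return sum(1 for _name, seq, _qual in iter_fastq(lines) if len(seq) < UMI_END)
```

For a gzipped file, the same functions can be fed with `gzip.open(path, "rt")`, since an open text handle is an iterable of lines.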

3. Run ST pipeline

Run st_pipeline.sh to start the ST pipeline. The inputs are processed_R2.fastq.gz and the raw R1.fastq.gz. It also requires a "spatial_barcodes_index.txt" file to decode the spatial location information. Genome references and annotation files are also needed.

#!/bin/bash
# $sample (sample name) and $tmp (working directory) must be set before running
# FASTQ reads
FW=$tmp/${sample}_R2_processed.fastq
RV=$tmp/${sample}_raw_qc_R1.fastq.gz
# References for mapping, annotation and nonRNA-filtering
MAP=/Dropseq_Alignment_References/mm10/
ANN=/Dropseq_Alignment_References/mm10/mm10.gtf 
CONT=/Spatial_omics_references/mouse/GRCm38_86/ncRNA/StarIndex/

# Barcodes settings
ID=/useful/spatial_barcodes.txt

# Output folder and experiment name
OUTPUT=../output/
mkdir -p $OUTPUT

TMP_ST=$OUTPUT/tmp
mkdir -p $TMP_ST

# Running the pipeline
st_pipeline_run.py \
  --output-folder $OUTPUT \
  --ids $ID \
  --ref-map $MAP \
  --ref-annotation $ANN \
  --expName $sample \
  --htseq-no-ambiguous \
  --verbose \
  --log-file $OUTPUT/${sample}_log.txt \
  --demultiplexing-kmer 5 \
  --threads 20 \
  --temp-folder $TMP_ST \
  --no-clean-up \
  --umi-start-position 16 \
  --umi-end-position 26 \
  --demultiplexing-overhang 0 \
  --min-length-qual-trimming 20 \
  $FW $RV
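The file passed to `--ids` maps each spatial barcode to array coordinates. ST pipeline reads it as tab-separated lines of the form BARCODE, X, Y. A minimal sketch of building such a table from two barcode lists follows. The barcode sequences in the usage note, the A-then-B concatenation order, and the choice of Barcode A for the x axis are assumptions made for illustration; the real file shipped with the project should be used for actual runs.

```python
from itertools import product
from typing import List, Sequence, Tuple


def build_barcode_index(
    barcodes_a: Sequence[str], barcodes_b: Sequence[str]
) -> List[Tuple[str, int, int]]:
    """Pair every Barcode A with every Barcode B and assign 1-based (x, y)."""
    rows = []
    for (x, a), (y, b) in product(
        enumerate(barcodes_a, start=1), enumerate(barcodes_b, start=1)
    ):
        # Assumed layout: Barcode A first, then Barcode B; A indexes x, B indexes y.
        rows.append((a + b, x, y))
    return rows


def format_barcode_index(rows: Sequence[Tuple[str, int, int]]) -> str:
    """Render rows as tab-separated 'BARCODE<TAB>X<TAB>Y' lines."""
    return "".join(f"{barcode}\t{x}\t{y}\n" for barcode, x, y in rows)
```

For example, `format_barcode_index(build_barcode_index(["AAAAAAAA"], ["GGGGGGGG"]))` gives a single line mapping the made-up barcode `AAAAAAAAGGGGGGGG` to position 1, 1; the resulting string can be written to the path given in `ID`.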

4. Convert Ensembl IDs to gene names

Then run converttoname.sh to annotate the resulting sample_stdata.tsv.

#!/bin/bash
tsv_E=$OUTPUT/${sample}_stdata.tsv
path_to_annotation_file=/Dropseq_Alignment_References/mm10/mm10.gtf

convertEnsemblToNames.py --annotation $path_to_annotation_file --output $OUTPUT/${sample}_stdata_names.tsv $tsv_E
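What this conversion amounts to can be sketched in a few lines: read the `gene_id` to `gene_name` pairs from the GTF, then rewrite the gene columns of the matrix header. This is an illustrative stand-in rather than the implementation of convertEnsemblToNames.py, and it assumes the matrix has spots as rows and Ensembl IDs as tab-separated column names. Function names are hypothetical.

```python
import re
from typing import Dict, Iterable

GENE_ID = re.compile(r'gene_id "([^"]+)"')
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gtf_id_to_name(lines: Iterable[str]) -> Dict[str, str]:
    """Collect gene_id -> gene_name pairs from the attribute column of a GTF."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        gene_id = GENE_ID.search(fields[8])
        gene_name = GENE_NAME.search(fields[8])
        if gene_id and gene_name:
            # Keep the first name seen for each ID.
            mapping.setdefault(gene_id.group(1), gene_name.group(1))
    return mapping


def rename_header(header_line: str, mapping: Dict[str, str]) -> str:
    """Replace Ensembl IDs in a tab-separated header; unknown IDs stay as-is."""
    columns = header_line.rstrip("\n").split("\t")
    return "\t".join(mapping.get(col, col) for col in columns)
```

Leaving unmapped IDs unchanged keeps the column count stable, so the rest of the matrix can be copied through line by line after the header is rewritten.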

Identify useful pixels (pixels on tissue) from the microscope image using MATLAB

link: https://github.com/edicliuyang/DBiT-seq_FFPE/tree/master/Figure_Processing

Data visualization

Data visualization was done in R, making extensive use of functions from Signac v1.6, Seurat v4.0 and ArchR v1.0.2.
