Gene Isoform Analysis Tools on Single Cell Data (Scasa Summary)

Shahzaib Ali
Dec 4, 2023

Measuring gene expression at the single-cell level has provided numerous intriguing insights. However, exploring RNA expression at the isoform level could reveal rare cells or subpopulations within a cluster. This offers a distinct perspective on the biology of these cells as shaped by factors such as mutations, cancer, or aging. This type of analysis could uncover previously overlooked biomarkers and support drug discovery efforts.

There is a tool called MISO that counts the reads assigned to each isoform. Unfortunately, it mainly gives insight into how effective a particular sequencing protocol is and offers little practical help for research on single-cell data. While searching for alternatives, I found a great tool called Scasa. Here is the link to the paper: https://academic.oup.com/bioinformatics/article/38/5/1287/6448218

Please note that, at the moment, this tool is only useful for human data.

"ScASA enables the utilization of single-cell RNA sequencing (scRNA-Seq) data generated from high-throughput protocols, such as the Chromium Single Cell 3ʹ 10× Genomics protocol. The platform encompasses read mapping to a reference transcriptome and the tallying of supporting reads for equivalence classes (eqClasses) from each cell. Notably, read mapping and counting for eqClasses leverage existing external tools, such as Alevin."

Read Simulation:

Each isoform was simulated with the number of reads set to twice the transcript length but with a minimum of 1000 reads.
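As a quick illustration of this read-count rule, here is a minimal Python sketch. The function name and the example transcript lengths are hypothetical and are only there to show how the floor of 1000 reads interacts with the "twice the length" rule.

```python
def simulated_read_count(transcript_length: int, min_reads: int = 1000) -> int:
    """Reads simulated for one isoform: twice its length, but never fewer than min_reads."""
    return max(2 * transcript_length, min_reads)


# Hypothetical transcript lengths (nucleotides), for illustration only.
lengths = {"isoform_A": 350, "isoform_B": 1200, "isoform_C": 500}
for name, length in lengths.items():
    print(name, simulated_read_count(length))
```

Short isoforms (here A and C) fall back to the 1000-read minimum, while longer ones such as B get twice their length.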

Data Processing Steps:

Mapped each read to the transcriptome reference (hg38).
Identified equivalence classes (eqClasses) associated with each isoform.
Grouped isoforms into Transcript Clusters (TCs) based on overlapping eqClasses.
Summarized the read counts for each eqClass and isoform within a TC as a matrix (Supplementary Table S1a).
Normalized each column of the matrix to sum to 1, giving the initial X matrix associated with a TC.
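The grouping and normalization steps above can be sketched in plain Python. This is my own illustration, not the scASA code: the eqClass IDs, isoform IDs and counts are made up, and in the real pipeline the per-eqClass tallies would come from mapping the simulated reads with an external tool such as Alevin. TCs are formed here as connected components of isoforms that share at least one eqClass (via union-find), and each isoform's column is divided by its total so that it sums to 1.

```python
from collections import defaultdict

# Hypothetical simulated tallies: for each eqClass, how many reads simulated
# from each isoform landed in it (made-up IDs and counts).
EQ_COUNTS = {
    "ec1": {"iso1": 80, "iso2": 20},
    "ec2": {"iso2": 60},
    "ec3": {"iso1": 20, "iso2": 20},
    "ec4": {"iso3": 50},
}


def group_transcript_clusters(eq_counts):
    """Group isoforms that share any eqClass (connected components via union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for isoforms in eq_counts.values():
        isos = list(isoforms)
        find(isos[0])  # register isoforms that never share an eqClass
        for other in isos[1:]:
            union(isos[0], other)

    groups = defaultdict(list)
    for iso in list(parent):
        groups[find(iso)].append(iso)
    return sorted(sorted(members) for members in groups.values())


def build_x_matrix(eq_counts, tc_isoforms):
    """Rows: eqClasses touching the TC; columns: its isoforms, each scaled to sum to 1."""
    members = set(tc_isoforms)
    eq_ids = sorted(ec for ec, isos in eq_counts.items() if members & set(isos))
    totals = {iso: sum(eq_counts[ec].get(iso, 0) for ec in eq_ids) for iso in tc_isoforms}
    x = [
        [eq_counts[ec].get(iso, 0) / totals[iso] if totals[iso] else 0.0 for iso in tc_isoforms]
        for ec in eq_ids
    ]
    return eq_ids, x


tcs = group_transcript_clusters(EQ_COUNTS)
eq_ids, x = build_x_matrix(EQ_COUNTS, tcs[0])
print(tcs)
print(eq_ids)
for row in x:
    print(row)
```

In this toy input, iso1 and iso2 share ec1 and ec3 and therefore form one TC, while iso3 forms its own. The first TC's X matrix has rows ec1, ec2 and ec3 and two columns, each summing to 1.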

Paralog Identification and Merging:

Identified paralogs using the approach of Deng et al. (2020).
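The number of non-zero singular values of X determined the number of estimable paralogs; singular values below a small threshold (1/30) were treated as zero.
Paralogs were then constructed by k-means clustering of the isoforms, with k set to the number of estimable paralogs.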
For instance, the authors illustrated this process using the TC associated with the RPL13A gene, which contains five isoforms.
Singular value decomposition of the original X matrix indicated that its five parameters had to be reduced to three estimable paralogs.
k-means clustering merged three of the isoforms into one paralog and kept the other two isoforms unchanged.
The merged X matrix was generated.
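The paralog step can be sketched with NumPy (assumed to be installed). This is not the scASA or Deng et al. (2020) code. Several choices here are my own assumptions: treating 1/30 as an absolute cutoff on the singular values, running plain Lloyd's k-means with deterministic farthest-point initialization, and building the merged X by averaging the columns within each cluster. The toy X is also made up and is not the RPL13A data.

```python
import numpy as np


def estimable_paralogs(x, threshold=1 / 30):
    """Number of singular values of X above the threshold (assumed absolute cutoff)."""
    singular_values = np.linalg.svd(x, compute_uv=False)
    return int(np.sum(singular_values > threshold))


def kmeans_columns(x, k, n_iter=100):
    """Lloyd's k-means on the columns of X (one point per isoform), farthest-point init."""
    points = x.T
    centers = [points[0]]
    while len(centers) < k:
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        # Keep the old center if a cluster ever ends up empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def merge_paralogs(x, labels, k):
    """Average the columns in each cluster; averages of sum-to-1 columns still sum to 1."""
    return np.column_stack([x[:, labels == j].mean(axis=1) for j in range(k)])


# Toy X (4 eqClasses x 5 isoforms); the first three columns are nearly identical.
X = np.array([
    [0.50, 0.50, 0.49, 0.0, 0.0],
    [0.50, 0.50, 0.51, 0.0, 0.0],
    [0.00, 0.00, 0.00, 1.0, 0.3],
    [0.00, 0.00, 0.00, 0.0, 0.7],
])
k = estimable_paralogs(X)
labels = kmeans_columns(X, k)
X_merged = merge_paralogs(X, labels, k)
print(k, labels.tolist())
print(X_merged.round(3))
```

On this toy matrix, only three singular values exceed 1/30. k-means then merges the first three isoforms into one paralog and keeps the other two, mirroring the shape (not the data) of the RPL13A example.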
