High-throughput next-generation sequencing (NGS) systems have enabled large-scale collection of transcriptomic data at single-cell resolution. The variability within this data allows researchers to characterize or infer characteristics of interest, such as cell type, cell state, cell growth trajectories, and inter-cellular gene regulatory networks. These characteristics are important for understanding how cells interact with one another, both for building better cellular models in vitro and for understanding biological processes in vivo. While the size of single-cell datasets has increased massively, the techniques used for key analysis steps have not kept pace and still rely on slow, manual pipelines in which domain experts perform the initial clustering. Attempts to improve classification performance have fallen short as the numbers of cell types (often unevenly represented) and cell subtypes have increased while the number of samples per label has become small. Technical variability between NGS experiments can also make robust classification across multiple tissue samples difficult. Moreover, the high-dimensional nature of NGS transcriptomic data makes this type of analysis statistically and computationally intractable.
To help address these challenges in NGS, investigators at UC Santa Cruz (UCSC) have developed a new deep learning approach built on a transformer-based neural network. UCSC’s Scalable, Interpretable Machine Learning for Single-Cell (SIMS) targets the limitations of traditional NGS classification methods through an end-to-end modeling pipeline for discrete morphological prediction from single-cell data, with minimal overhead and high classification accuracy. SIMS takes an expression matrix with associated labels and learns a mapping between transcriptome and cell type. This mapping can then be used to automatically infer cell types in new single-cell data. SIMS handles both small and large transcriptomic datasets effectively and offers direct interpretability at the level of individual samples.
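The workflow described above (a labeled expression matrix goes in, a learned mapping comes out, and that mapping is applied to new cells) can be illustrated with a minimal Python sketch. This is not the SIMS implementation or its API: a simple nearest-centroid classifier stands in for the transformer-based network, and the function names, the normalization step, and the toy counts and labels are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def log_normalize(cell):
    """Scale a cell's counts to a common total, then log-transform (assumed preprocessing)."""
    total = sum(cell) or 1.0
    return [math.log1p(1e4 * count / total) for count in cell]


def fit_centroids(expression_matrix, labels):
    """Learn one mean expression profile (centroid) per cell-type label."""
    grouped = defaultdict(list)
    for cell, label in zip(expression_matrix, labels):
        grouped[label].append(log_normalize(cell))
    return {
        label: [sum(gene_values) / len(cells) for gene_values in zip(*cells)]
        for label, cells in grouped.items()
    }


def predict(centroids, expression_matrix):
    """Assign each new cell the label of its nearest centroid (Euclidean distance)."""
    predictions = []
    for cell in expression_matrix:
        profile = log_normalize(cell)
        nearest = min(centroids, key=lambda label: math.dist(profile, centroids[label]))
        predictions.append(nearest)
    return predictions


# Hypothetical toy data: rows are cells, columns are genes.
train_matrix = [
    [90, 5, 5],
    [80, 10, 10],
    [5, 90, 5],
    [10, 85, 5],
]
train_labels = ["neuron", "neuron", "astrocyte", "astrocyte"]

# "Training" learns the transcriptome-to-cell-type mapping from labeled cells.
centroids = fit_centroids(train_matrix, train_labels)

# The learned mapping is then applied to unlabeled cells.
new_cells = [[70, 20, 10], [8, 80, 12]]
predicted = predict(centroids, new_cells)
print(predicted)
```

The stand-in classifier shares only the input and output contract with the approach described here. It does not reproduce the model's accuracy, scalability, or per-sample interpretability.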
Patent Pending
transcriptomics, cell-type, cell type, classification, cell atlas, sequencing, RNA-seq, single-cell, transcriptome, expression matrix, clusters, clustering, classifier, class distribution, next-generation sequencing, NGS