Single Cell Linkage Using Distance Elimination (SLIDE) to analyze large-scale data sets without reducing their dimensionality.
About
Abstract: Stanford researchers have developed a statistical algorithm, Single Cell Linkage Using Distance Elimination (SLIDE), to analyze large-scale data sets without reducing their dimensionality, including those generated by single-cell mass cytometry. Single-cell mass cytometry permits deep proteomic profiling of cells through the simultaneous detection of multiple parameters (up to 100), including surface and intracellular proteins, and so generates large volumes of high-dimensional data. Current methods for analyzing such data are unsatisfactory because they reduce data complexity to keep computation and run time feasible. That reduction defeats the purpose of collecting high-dimensional data in the first place. To solve this problem, the inventors developed the SLIDE method. It is based on the principles of nearest-neighbor analysis and allows analysis of high-dimensional data sets without averaging sample points or reducing the dimensionality of the parameters measured per sample. SLIDE offers a rigorous statistical method to compare individual samples in multi-dimensional parameter spaces and allows identification of subpopulations within heterogeneous single-cell populations. What distinguishes this algorithm from the other available analysis tools for mass cytometry data is its ability to quantify the degree of change in protein expression in single cells following stimulation. The stimulation could be exogenous (cytokines, virus infection) or induced internally during the progression of cancer or any other progressive disease. Such weighted quantifications of several marker proteins can lead to accurate predictions in disease progression and diagnosis. The SLIDE algorithm can be applied to accurately characterize and relate individual members of any large-scale, high-dimensional dataset.
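The nearest-neighbor idea described in the abstract can be illustrated with a minimal sketch. This is not the SLIDE algorithm itself, whose details are not given here. It only shows, under stated assumptions, how cells from a stimulated sample might be compared with an unstimulated sample in the full marker space, with no averaging and no dimension reduction. The function names, the 95% quantile threshold, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b,
    using all marker dimensions (no projection or averaging)."""
    sq = (a * a).sum(axis=1)[:, None] - 2.0 * (a @ b.T) + (b * b).sum(axis=1)[None, :]
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from round-off

def nearest_neighbor_distance(query, reference, exclude_self=False):
    """Distance from each query cell to its nearest reference cell."""
    d = pairwise_distances(query, reference)
    if exclude_self:
        # query and reference are the same sample: ignore each cell's match to itself
        np.fill_diagonal(d, np.inf)
    return d.min(axis=1)

def flag_shifted_cells(unstimulated, stimulated, quantile=0.95):
    """Hypothetical scoring rule (an assumption, not the SLIDE statistic):
    a stimulated cell is flagged when its nearest unstimulated neighbor is
    farther away than the chosen quantile of nearest-neighbor distances
    observed within the unstimulated sample."""
    background = nearest_neighbor_distance(unstimulated, unstimulated, exclude_self=True)
    threshold = np.quantile(background, quantile)
    shift = nearest_neighbor_distance(stimulated, unstimulated)
    return shift, shift > threshold

# Synthetic data (illustrative only): 30 markers per cell.
rng = np.random.default_rng(0)
unstimulated = rng.normal(size=(200, 30))
stimulated = rng.normal(size=(100, 30))
stimulated[:20, :5] += 8.0  # the first 20 cells "respond" on 5 markers

shift, flagged = flag_shifted_cells(unstimulated, stimulated)
print("responders flagged:    ", int(flagged[:20].sum()), "of 20")
print("non-responders flagged:", int(flagged[20:].sum()), "of 80")
```

Because each cell keeps its own distance score, the per-cell values in `shift` give a graded measure of change rather than a single sample-level average. The all-pairs distance matrix used here is only practical for small samples; a real analysis of large data sets would need a more scalable neighbor search.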
Applications:
- Research tool: interpret multi-parameter single-cell analysis data for use with:
  - mass cytometry
  - multi-color flow cytometry
  - microfluidics-based RT-PCR
  - single cell RNA sequencing
- As an alternate analytical algorithm to interpret high-dimensional data including:
  - Clinical trial data that contains multiple parameters measured from large numbers of samples
  - Discovery of more accurate profiles for user-defined states (for example, a cancerous cell, stem cell, aging cell, well-defined baselines for healthy cell, etc.) that contain many more defining parameters than possible using averaging or dimension-reducing approaches

Advantages:
- Does not reduce dimensionality of parameters that are measured per sample
- Does not require averaging sample points
- Algorithm scales well for big data sets
- Method is data-adaptive
- Statistical consistency on algorithm’s operational analytics
- Can be used with a variety of analysis methods that generate large-scale data sets
- Can be used to gain insight from existing large-scale data without reducing their complexity
- Uses commonly available computing resources