Alpha SC, the most efficient GPU-accelerated single-cell data analysis pipeline from BioTuring Alpha, is designed to address the challenges of analyzing large-scale biological data.
Alpha SC is expected to boost the efficiency of single-cell data analysis and to lay the foundation for a shift toward analyzing large single-cell datasets in real time.
Reading sparse matrices is an essential step in single-cell analysis workflows, but existing implementations are often inefficient. We offer a highly optimized reader that significantly accelerates this step: it reads a sparse matrix up to 150 times faster than scipy in Python and the Matrix package in R.
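Alpha SC's on-disk format and reader are not described here. As a point of reference only, the sketch below shows the baseline work any sparse-matrix reader has to do. It parses a Matrix Market coordinate file, the format used for 10x Genomics `matrix.mtx` files, and converts the (row, column, value) triplets into CSR arrays with a counting sort. The function name `read_mtx_to_csr` is hypothetical. The sketch assumes numeric (real/integer) entries rather than the `pattern` field type, and it is not Alpha SC's implementation.

```python
from typing import Iterable, List, Tuple


def read_mtx_to_csr(
    lines: Iterable[str],
) -> Tuple[List[int], List[int], List[float], Tuple[int, int]]:
    """Parse a Matrix Market coordinate file into CSR arrays.

    Hypothetical helper for illustration; assumes numeric (real/integer)
    entries, i.e. not the 'pattern' field type.
    """
    it = iter(lines)
    if not next(it).startswith("%%MatrixMarket"):
        raise ValueError("missing Matrix Market header")
    # Skip '%' comment lines; the first other line is "n_rows n_cols nnz".
    for line in it:
        if not line.startswith("%"):
            n_rows, n_cols, nnz = map(int, line.split())
            break
    else:
        raise ValueError("missing size line")

    rows, cols, vals = [], [], []
    for line in it:
        if not line.strip():
            continue
        i, j, v = line.split()
        rows.append(int(i) - 1)  # Matrix Market indices are 1-based
        cols.append(int(j) - 1)
        vals.append(float(v))
    if len(vals) != nnz:
        raise ValueError(f"expected {nnz} entries, found {len(vals)}")

    # Counting sort by row: count entries per row, then prefix-sum into indptr.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]

    # Scatter each entry into the next free slot of its row (file order kept).
    next_slot = indptr[:-1]  # slicing copies the list
    indices = [0] * nnz
    data = [0.0] * nnz
    for r, c, v in zip(rows, cols, vals):
        pos = next_slot[r]
        indices[pos] = c
        data[pos] = v
        next_slot[r] += 1
    return indptr, indices, data, (n_rows, n_cols)


example = """%%MatrixMarket matrix coordinate integer general
% toy 3 x 4 count matrix
3 4 4
1 2 5
3 1 2
1 4 1
2 3 7
"""
indptr, indices, data, shape = read_mtx_to_csr(example.splitlines())
print(indptr, indices, data, shape)
# -> [0, 2, 3, 4] [1, 3, 2, 0] [5.0, 1.0, 7.0, 2.0] (3, 4)
```

The counting sort places every entry in O(nnz) time without a general sort. Within a row, columns stay in file order rather than being sorted.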
Geometric sketching reduces the workload of your analyses by constructing a representative subset of your dataset. With Alpha SC’s implementation, sketching completes in under half a second, even for a dataset of 1.7 million cells.
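Geometric sketching covers the data, typically in PCA space, with equal-sized hypercubes. It then samples across boxes rather than across cells, so sparsely populated regions (rare cell types) are not crowded out by dense clusters. The sketch below is a simplified fixed-grid variant in plain Python. The box side length `side` is passed in directly, whereas the published method tunes the box size to the desired sketch size. `grid_sketch` is a hypothetical name, not Alpha SC's API.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def grid_sketch(
    points: Sequence[Sequence[float]], side: float, n_samples: int, seed: int = 0
) -> List[int]:
    """Return indices of a sketch drawn box by box from a grid of hypercubes.

    Hypothetical helper: a simplified, fixed-grid take on geometric sketching.
    """
    rng = random.Random(seed)
    boxes: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        # Each cell falls into the hypercube indexed by floor(coordinate / side).
        boxes[tuple(math.floor(x / side) for x in p)].append(idx)

    buckets = list(boxes.values())
    for bucket in buckets:
        rng.shuffle(bucket)
    rng.shuffle(buckets)

    chosen: List[int] = []
    # Round-robin over boxes: one cell per non-empty box per pass, so a dense
    # cluster cannot crowd out boxes that hold only a few cells.
    while buckets and len(chosen) < n_samples:
        remaining = []
        for bucket in buckets:
            if len(chosen) == n_samples:
                break
            chosen.append(bucket.pop())
            if bucket:
                remaining.append(bucket)
        buckets = remaining
    return chosen


# 100 cells packed into one box, plus two isolated cells far away.
cells = [(0.01 * i, 0.005 * i) for i in range(100)] + [(10.5, 10.5), (20.5, 20.5)]
sketch = grid_sketch(cells, side=1.0, n_samples=3)
# Both isolated cells (indices 100 and 101) are kept in the 3-cell sketch.
```

Uniform sampling of 3 cells from this dataset would almost always miss both isolated cells. Sampling per box keeps them.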
PCA is a widely used dimensionality reduction technique in single-cell analysis. Without special optimizations, running it on thousands of genes and up to millions of cells is very memory-intensive. Alpha SC provides a highly optimized GPU-accelerated implementation of PCA that delivers significant performance gains while consuming little GPU memory, allowing researchers to perform PCA up to 100 times faster.
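The text does not say how Alpha SC keeps GPU memory low. One widely used idea for PCA on sparse count matrices is implicit mean-centering. Instead of materializing the dense centered matrix X − 1μᵀ, its products are computed as Xv − (μ·v)1 and Xᵀu − μ(1ᵀu), which touch only the non-zeros of X. The plain-Python sketch below applies this trick inside power iteration to recover the first principal axis. `top_pc_implicit` is a hypothetical name. Practical implementations compute many components with randomized SVD or Lanczos-type solvers rather than a single power iteration.

```python
import math
import random
from typing import Dict, List

SparseRow = Dict[int, float]  # gene index -> expression value (non-zeros only)


def top_pc_implicit(
    rows: List[SparseRow], n_genes: int, n_iter: int = 100, seed: int = 0
) -> List[float]:
    """First principal axis of sparse cells x genes data, centering implicitly."""
    n_cells = len(rows)
    mean = [0.0] * n_genes
    for row in rows:
        for j, x in row.items():
            mean[j] += x / n_cells

    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    for _ in range(n_iter):
        # u = (X - 1 mean^T) v = X v - (mean . v) * 1, using only non-zeros of X
        mean_dot_v = sum(m * vj for m, vj in zip(mean, v))
        u = [sum(x * v[j] for j, x in row.items()) - mean_dot_v for row in rows]
        # w = (X - 1 mean^T)^T u = X^T u - mean * sum(u)
        total_u = sum(u)
        w = [-m * total_u for m in mean]
        for row, ui in zip(rows, u):
            for j, x in row.items():
                w[j] += x * ui
        # Normalize so v converges to the unit-length leading eigenvector.
        norm = math.sqrt(sum(wj * wj for wj in w))
        v = [wj / norm for wj in w]
    return v


# Gene 0 varies a lot across cells; gene 1 barely varies.
rows = [{0: float(i)} if i % 2 == 0 else {0: float(i), 1: 0.5} for i in range(10)]
axis = top_pc_implicit(rows, n_genes=2)
```

The returned axis points almost entirely along gene 0, up to sign. The dense centered matrix is never formed, so memory stays proportional to the number of non-zeros.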
Harmony, a batch-effect correction algorithm for scRNA-seq data, helps ensure that cells cluster by biological similarity rather than by technical variation. In addition to GPU acceleration, Alpha SC incorporates several algorithmic improvements that eliminate computationally intensive matrix operations. As a result, Alpha SC runs up to 400x faster than both the original Harmony implementation and harmonypy.
Finding the approximate nearest neighbors of each cell is a prerequisite for many subsequent steps in the pipeline. Alpha SC provides a highly optimized GPU implementation of NN-descent, and our pipeline completes this step 300 times faster than Scanpy.
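NN-descent rests on the observation that a neighbor of a neighbor is likely to be a neighbor. It starts from a random k-NN graph and repeatedly tests each point's neighbors-of-neighbors, including reverse neighbors, as candidates until no neighbor list improves. The sketch below is a minimal single-threaded version in plain Python. It omits the sampling and new/old-flag bookkeeping that practical implementations use, and it says nothing about how Alpha SC maps the algorithm onto the GPU. `nn_descent` and `sqdist` are hypothetical names.

```python
import random
from typing import Dict, List, Sequence


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_descent(
    points: Sequence[Sequence[float]], k: int, max_iter: int = 20, seed: int = 0
) -> List[List[int]]:
    """Approximate k-NN lists (nearest first) via NN-descent's local join."""
    rng = random.Random(seed)
    n = len(points)
    # graph[i] maps neighbor index -> squared distance; start from random lists.
    graph: List[Dict[int, float]] = []
    for i in range(n):
        init = rng.sample([j for j in range(n) if j != i], k)
        graph.append({j: sqdist(points[i], points[j]) for j in init})

    def try_insert(i: int, j: int) -> bool:
        """Replace i's farthest neighbor with j if j is closer and not yet listed."""
        if i == j or j in graph[i]:
            return False
        d = sqdist(points[i], points[j])
        worst = max(graph[i], key=graph[i].__getitem__)
        if d >= graph[i][worst]:
            return False
        del graph[i][worst]
        graph[i][j] = d
        return True

    for _ in range(max_iter):
        # Neighborhood of i = its current neighbors plus its reverse neighbors.
        hood = [set(g) for g in graph]
        for i in range(n):
            for j in graph[i]:
                hood[j].add(i)
        updates = 0
        # Local join: two points that share a neighbor are tried as each other's neighbors.
        for neighborhood in hood:
            members = list(neighborhood)
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    p, q = members[a], members[b]
                    updates += try_insert(p, q) + try_insert(q, p)
        if updates == 0:  # converged: no list improved in this pass
            break
    return [sorted(g, key=g.__getitem__) for g in graph]


rng = random.Random(1)
cells = [(rng.random(), rng.random()) for _ in range(80)]
knn = nn_descent(cells, k=10)
```

Each pass only compares points that already share a neighbor, which avoids the all-pairs comparison of brute-force k-NN. That locality is what makes the method attractive at millions of cells.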
Louvain clustering is a common choice for identifying distinct cell populations within single-cell datasets. However, it can be computationally intensive and time-consuming for large-scale analyses. Using GPU acceleration, Alpha SC achieves a 1000x speed-up on some datasets while maintaining similar clustering quality.
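Louvain alternates two phases. In the local-moving phase, each node joins the neighboring community that most increases modularity. In the aggregation phase, communities are collapsed into single nodes and the process repeats. The plain-Python sketch below implements only the first phase. It uses the standard modularity-gain expression k_i,in − Σ_tot·k_i/(2m), with the common factor 1/m dropped. It is an illustration, not Alpha SC's GPU implementation, and `louvain_local_moving` is a hypothetical name.

```python
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: weight}


def louvain_local_moving(adj: Graph, max_passes: int = 10) -> Dict[Hashable, Hashable]:
    """Louvain phase 1 on an undirected, symmetric graph without self-loops."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())  # 2m
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    comm = {u: u for u in adj}  # start with every node in its own community
    tot = dict(degree)          # Sigma_tot: summed degree of each community

    for _ in range(max_passes):
        moved = False
        for u in adj:
            # k_i,in for every community adjacent to u.
            links: Dict[Hashable, float] = {}
            for v, w in adj[u].items():
                links[comm[v]] = links.get(comm[v], 0.0) + w
            # Take u out of its community, then pick the best one to (re)join.
            old = comm[u]
            tot[old] -= degree[u]
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * degree[u] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[u] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            comm[u] = best
            tot[best] += degree[u]
            moved |= best != old
        if not moved:
            break
    return comm


# Two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj: Graph = {u: {} for u in range(6)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
communities = louvain_local_moving(adj)
```

On this graph the local-moving phase alone separates the two triangles. A full Louvain run would then aggregate each community into a super-node and repeat on the smaller graph.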
t-SNE (t-distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection) are the two most popular visualization algorithms for single-cell data. Both benefit from the improvements in our k-NN routine, and with our GPU-accelerated implementation, Alpha SC produces 2D t-SNE embeddings 700 times faster and UMAP embeddings 100 times faster than Scanpy.
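UMAP consumes the k-NN output directly. For each cell, it turns the distances to the cell's k nearest neighbors into edge weights exp(−max(0, d − ρ)/σ). Here ρ is the distance to the nearest neighbor, and σ is found by binary search so that the weights sum to log2(k). The sketch below follows that description from the UMAP paper in plain Python. `smooth_knn_weights` is a hypothetical name, and the tolerance and iteration cap are assumptions rather than the exact settings of umap-learn or Alpha SC. t-SNE performs an analogous per-cell calibration, searching for a Gaussian bandwidth that matches a target perplexity.

```python
import math
from typing import List, Tuple


def smooth_knn_weights(
    dists: List[float], tol: float = 1e-6, max_iter: int = 64
) -> Tuple[float, float, List[float]]:
    """Calibrate one cell's k-NN distances into UMAP-style edge weights.

    dists: distances to the cell's k nearest neighbors (the cell itself excluded).
    Returns (rho, sigma, weights).
    """
    target = math.log2(len(dists))
    positive = [d for d in dists if d > 0]
    rho = min(positive) if positive else 0.0  # distance to the nearest neighbor

    def mass(sigma: float) -> float:
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in dists)

    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        total = mass(sigma)  # increases monotonically with sigma
        if abs(total - target) < tol:
            break
        if total > target:
            # Too much mass: the solution lies below sigma, so bisect downward.
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            # Too little mass: grow sigma until bracketed, then bisect upward.
            lo = sigma
            sigma = sigma * 2.0 if math.isinf(hi) else (lo + hi) / 2.0
    weights = [math.exp(-max(0.0, d - rho) / sigma) for d in dists]
    return rho, sigma, weights


rho, sigma, w = smooth_knn_weights([1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0])
```

Subtracting ρ guarantees that every cell's nearest neighbor gets weight 1. This keeps each cell connected even in sparse regions of the data.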
AUCell helps identify enriched gene sets in each cell. Alpha SC’s implementation is up to 1000x faster than the AUCell package for R and up to 500x faster than the pySCENIC package for Python.
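AUCell scores a gene set in each cell. It ranks that cell's genes from highest to lowest expression and measures the area under the recovery curve (how many gene-set members have been seen by each rank) within the top `max_rank` genes, normalized by the best achievable area. The plain-Python sketch below follows that idea for a single cell. `aucell_score` is a hypothetical name, and ties are broken by gene name for determinism. The normalization is a simplified assumption that may differ in detail from the R package, pySCENIC, or Alpha SC.

```python
from typing import Dict, Iterable


def aucell_score(expression: Dict[str, float], gene_set: Iterable[str], max_rank: int) -> float:
    """AUC of the gene-set recovery curve over the top `max_rank` genes of one cell."""
    # Rank genes by decreasing expression; ties broken by name (a simplification).
    ranked = sorted(expression, key=lambda g: (-expression[g], g))
    members = set(gene_set) & set(expression)  # ignore genes absent from the cell's ranking
    if not members:
        return 0.0
    top = min(max_rank, len(ranked))
    hits, auc = 0, 0
    for rank in range(top):
        if ranked[rank] in members:
            hits += 1
        auc += hits  # height of the recovery curve at this rank
    # Best case: every gene-set member occupies the top ranks.
    best = sum(min(rank + 1, len(members)) for rank in range(top))
    return auc / best


cell = {"A": 10.0, "B": 9.0, "C": 8.0, "D": 1.0, "E": 0.0, "F": 0.0}
print(aucell_score(cell, {"A", "B"}, max_rank=3))  # -> 1.0
print(aucell_score(cell, {"A", "E"}, max_rank=3))  # -> 0.6
```

Restricting the curve to the top-ranked genes makes the score depend only on each cell's highly expressed genes. The score is therefore insensitive to library-size scaling, because ranks do not change when a cell's counts are multiplied by a constant.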
Venice is a fast non-parametric test designed to find differentially expressed genes between heterogeneous populations. With GPU acceleration, Venice runs even faster while maintaining the same accuracy.