SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Transcriptomics Atlas Pipeline: Cloud vs HPC


Workshop: The 18th Workshop on Workflows in Support of Large-Scale Science (WORKS23) - Part 1 of 2

Authors: Piotr Kica (Sano Centre for Computational Medicine, Krakow, Poland; AGH University of Science and Technology, Krakow, Poland); Sabina Lichołai (Sano Centre for Computational Medicine, Krakow, Poland); and Maciej Malawski (Sano Centre for Computational Medicine, Krakow, Poland; AGH University of Science and Technology, Krakow, Poland)


Abstract: Transcriptomics studies the RNA present in a specific cell or tissue at a given time or under a given condition. This dependence on time makes the problem computationally challenging, as the data generated by transcriptomics experiments is larger than that generated by genomics studies of DNA sequences. The goal of the Transcriptomics Atlas project is to create a database of analyzed RNA sequences corresponding to given tissue and organ types, based on data from public repositories, and to make it available to researchers. We describe our Transcriptomics Atlas pipeline as an example of a new data- and compute-intensive scientific workflow. After analyzing the requirements of the tasks in the pipeline, we describe our proposed cloud architecture. We present preliminary results of the experimental evaluation of the pipeline in the AWS cloud and compare its performance with traditional execution on an HPC cluster.




