SC23 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Birds of a Feather

DAOS Storage Community BoF


Authors: Michael Hennecke (Intel Corporation), Kevin Harms (Argonne National Laboratory (ANL)), Dean Hildebrand (Google Cloud), László Szűcs (Leibniz Supercomputing Centre)

Abstract: DAOS (https://docs.daos.io/) is an open-source scale-out object store that delivers extremely high performance to the most data-intensive HPC/AI workloads. With growing adoption, DAOS has seen significant community contributions such as domain-specific container types, hardware support beyond x86_64 (e.g. ARM64), and enablement of DAOS in the cloud.

This BoF brings together the DAOS community to discuss, share experiences, and brainstorm on future enhancements of DAOS. Topics include practical experiences with on-prem and cloud deployments, application use cases, and the software roadmap. This session targets end users, middleware developers, system administrators, DAOS core software developers, and vendors of DAOS-based hardware/software/cloud offerings.


Long Description: The primary goal of this BoF is to gather DAOS community members to share their experiences with DAOS and brainstorm on future developments. Socializing with other community members is an important secondary goal.

The session leaders will present short lightning talks, which will be used to spark the discussion among the participants:

Intel: The community roadmap will be presented, including support for new HPC fabrics, running DAOS on servers without persistent memory, and new functionality in upcoming releases. We will also share some exciting news about broadening the governance of the DAOS open-source project.

ANL: ANL will provide an update on running DAOS in the semi-production environment of Sunspot and/or Aurora, as well as on current and future plans for DAOS deployment and operation.

Google Cloud: A discussion of DAOS usage in the cloud is planned, building on the lessons learned from customers on GCP as well as from the collaboration between Intel and Google to automate DAOS deployments on GCP.

LRZ: First experiences with DAOS in SuperMUC-NG will be shared, with a focus on operational network and storage performance validation as well as pilot user feedback from running applications on DAOS.

These topics are particularly relevant in 2023: several large DAOS production deployments have been completed, including Aurora and LRZ's SNG-Phase2, and the technology has demonstrated its performance capabilities with both synthetic workloads like IO500 and full HPC/AI applications. DAOS is also being evaluated for the next round of Exascale and post-Exascale system procurements at major HPC institutions across the USA, Europe, and Asia.

Expected outcome: Users (and prospective users) of DAOS will get an update on the features, performance, and usability aspects of DAOS from practitioners in the field, and will have the opportunity to share their own work and experiences. The feedback from these conversations will also help the development teams in prioritizing future development efforts.

Background: The DAOS project started in 2012 through the DOE Fast Forward Storage and I/O program. It aims to support next-generation HPC workflows that combine simulation, big data, and AI in a single storage tier. DAOS presents a rich and scalable storage interface for both structured and unstructured data. DAOS v1.0 was released in 2020, and v2.4 in July 2023. DAOS supports multiple application interfaces, including a parallel filesystem, MPI-IO and HDF5 backends, native key/value APIs with C and Python bindings, a Hadoop/Spark connector, TensorFlow-IO, and several domain-specific data models.

DAOS is completely open source (see https://docs.daos.io/ and https://github.com/daos-stack/daos/). It is organized around a vibrant and active community composed of partners (e.g. HPE, Google, Croit, Lenovo) and end users (e.g. ANL, LRZ, ZIB, CERN, ECMWF, DKRZ) collaborating on GitHub, Slack, and the public mailing list. In-person DAOS BoF sessions were held at ISC'22 and SC'22 with around 45 and 60 attendees, respectively. The community also meets once a year for the DAOS User Group (https://dug.daos.io/).



