
AD/AE Appendices Process & Badges

The goal of this committee is to encourage and promote reproducible research within the SC community. To that end, we aim to assist SC authors in providing the necessary documentation that describes their artifacts, and to help evaluate those artifacts so that badges can be assigned to them.

Reproducibility Co-Chair
Rafael Tolosana-Calasanz, University of Zaragoza, Spain

Reproducibility Co-Chair
Jay Lofstead, Sandia National Laboratories

Getting Started

Why Should You Participate?

You will make it easy for other researchers to compare against, adopt, and extend your work. This means more recognition, made directly visible through badges, and higher impact. As described in this SC20 survey, thirty-five percent (35%) of respondents have used the appendix information from papers in the SC conference proceedings in their own research. There will also be a general announcement for all accepted papers that are reproduced and badged.

What Is An Artifact/Research Object?

We use the terms artifact and research object as synonyms. A paper is typically accompanied by several computational artifacts that extend beyond the submitted article itself: software, datasets, environment configuration, mechanized proofs, benchmarks, test suites with scripts, etc. A complete artifact package must contain (1) the computational artifacts and (2) instructions/documentation describing the contents and how to use them. Further details and guidance about artifacts and research objects are provided throughout this page.

AD/AE Process

Stages & Timeline

The review process will take place in two stages:

Phase 1

In Phase 1, the Artifact Description (AD) is checked for completeness and accessibility; in Phase 2, Artifact Evaluation takes place for accepted papers that applied for badges. Artifact Descriptions are mandatory and are submitted in conjunction with the paper. The artifact description provides information about a paper’s artifacts. All SC papers must either (1) provide an artifact description or (2) provide a reason why an artifact description cannot be provided (see below for the Artifact Description criteria).

APR 20, 2023: AD (mandatory) (Two weeks after the paper submission deadline)

Phase 2

Based on the reviews of the AD, an updated version of the AD must be submitted. At this phase, it is optional to submit the Artifact Evaluation (AE) together with all the computational artifacts required to reproduce the experiments (e.g., any code or data artifacts used). When the AE is submitted, authors must apply for one or more of the available reproducibility badges. The AD/AE Committee then starts to evaluate the artifact. This step relies on cooperation between the paper authors and the committee: we will use the SC Conference Submission System for single-blind messaging between the two parties. Through this communication, the committee may ask for access to special hardware, ask for missing components, report failure/error messages, and generally keep the authors posted on their evaluation status. Phase 2 finishes with the artifact freeze, at which point authors must assign a DOI (What is a DOI? Check this out) to their artifacts to guarantee that no further modifications are possible.

JUN 20, 2023: Revised AD submission (mandatory) and AE (optional) submission deadline

JUN 23, 2023: Artifact badge evaluation starts for accepted papers

AUG 10, 2023: Artifact freeze; authors assign a DOI to their artifact

AUG 19, 2023: Artifact badge decision

The Artifact Description/Artifact Evaluation (AD/AE) process is single-blind, unlike paper submissions which are double-blind reviewed. Authors do not need to remove identifying information from artifacts or papers.

The Committee may provide feedback to authors in a single-blind arrangement. The AD/AE Committee will not share any information with the Program Committee other than to confirm whether artifacts meet the criteria.


How to Submit

Artifacts are submitted via the Artifact Description submission form (available Winter 2022). The submission includes the application for badging in the second stage. After the artifact freeze, the artifact must not be changed; alternatively, a tagged version must be provided.

AD/AE Organization & Evaluation

AD/Artifact Description (mandatory)

The Artifact Description is a mandatory step for all submitted papers. All authors must provide descriptions of their artifacts. The Artifact Description (AD) Appendix will be auto-generated from author responses to a standard form embedded in the online submission system. It must include the following aspects:

Artifact Identification

Including:

  • the article’s title
  • the authors’ names and affiliations
  • an abstract describing the main contributions of the article and the role of the computational artifact(s) in these contributions

The abstract may include a software architecture or data models, together with their description, to help readers understand the computational artifact(s), as well as a clear statement of the extent to which the computational artifact(s) contribute(s) to the reproducibility of the experiments in the article.

Reproducibility of Experiments

Including:

  • a complete description of the experiment workflow that the code can execute
  • an estimate of the time needed to execute the experiment workflow
  • a complete description of the expected results and an evaluation of them
  • how the expected results from the experiment workflow relate to the results found in the article

Best practices indicate that, to make the scope of the reproducibility easy to understand, the expected results from the artifact should be in the same format as those in the article. For instance, when the results in the article are depicted in a graph, the execution of the code should ideally produce a (similar) figure (open-source tools such as gnuplot can be used for this purpose). It is critical that authors devote their efforts to these aspects of the reproducibility of experiments to minimize the time needed for their understanding and verification.
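
As a purely illustrative, hedged sketch (the file name results.csv, the column names, and the figure number are hypothetical placeholders, not part of any specific artifact), a short script of the following kind can regenerate a paper-style figure directly from the raw output of an experiment run:

```python
# Minimal sketch: regenerate a paper-style figure from experiment output.
# Assumes a hypothetical results.csv with columns "threads" and "runtime_s".
import csv
import matplotlib.pyplot as plt

threads, runtimes = [], []
with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):
        threads.append(int(row["threads"]))
        runtimes.append(float(row["runtime_s"]))

plt.plot(threads, runtimes, marker="o")
plt.xlabel("Threads")
plt.ylabel("Runtime (s)")
plt.title("Figure 3 (reproduced)")  # match the layout of the corresponding figure in the article
plt.savefig("figure3_reproduced.pdf")
```

Shipping such a plotting script alongside the raw data lets reviewers compare the regenerated figure against the article at a glance.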

If you are unable to provide an artifact description (e.g., for proprietary reasons), please provide detailed reasons why you are unable to do so. Failure to provide a detailed explanation will lead to further questions from the AD/AE Committee. The AD/AE Committee will provide its feedback to the SC Technical Program Committee, and an inadequate explanation will count against the overall paper review.

AD Evaluation

The AD/AE Committee will evaluate the completeness of the form and the accessibility of the artifacts from any links included in the form, for instance, links to the computational artifacts.

AE/Artifact Evaluation (optional)

The Artifact Evaluation is an optional step for all accepted papers; it allows authors to apply for a reproducibility badge. If you wish to acquire a badge for your artifact, you must choose the appropriate badges in the AD/AE form. The Artifact Evaluation (AE) Appendix extends the contents of the AD, and both will be included as an appendix. The AE will also be auto-generated from author responses to a standard form embedded in the online submission system. In addition to the Artifact Identification and Reproducibility of Experiments sections from the AD, the AE must include the following aspects:

Artifact Dependencies & Requirements

Including (i) a description of the hardware resources required, (ii) a description of the operating systems required, (iii) the software libraries needed, (iv) the input dataset needed to execute the code, or an indication of when the input data is generated, and (v) optionally, any other dependencies or requirements. To keep these descriptions easy to follow, best practices indicate that unnecessary dependencies and requirements should be removed from the artifact.
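
As a hedged illustration (the dependency list below is a hypothetical placeholder, not a recommendation), a small script like the following can record the software environment that was actually used, which makes the dependency description in the AE easier for reviewers to verify:

```python
# Minimal sketch: record the software environment used for the experiments.
# The DEPENDENCIES list is a hypothetical placeholder for the artifact's real dependencies.
import json
import platform
from importlib import metadata

DEPENDENCIES = ["numpy", "scipy"]

env = {
    "python": platform.python_version(),
    "os": platform.platform(),
    "machine": platform.machine(),
    "packages": {},
}
for name in DEPENDENCIES:
    try:
        env["packages"][name] = metadata.version(name)
    except metadata.PackageNotFoundError:
        env["packages"][name] = "not installed"

# Write the captured environment next to the artifact so it can be cited in the AE.
with open("environment.json", "w") as f:
    json.dump(env, f, indent=2)
print(json.dumps(env, indent=2))
```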

Artifact Installation & Deployment Process

Including (i) a description of the process to install and compile the libraries and the code, and (ii) a description of the process to deploy the code on the resources. These descriptions should include estimates of the installation, compilation, and deployment times. When any of these times exceeds what is reasonable, authors should provide some way to reduce the effort required of the potential recipients of the artifacts. For instance, capsules with the compiled code can be provided, or a simplified input dataset that reduces the overall experimental execution time. On the other hand, best practices indicate that, whenever possible, the actual code of software dependencies (libraries) should not be included in the artifact; instead, scripts should be provided to download them from a repository and perform the installation.
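
To illustrate the last point, here is a minimal, hedged sketch of an install script that downloads dependencies instead of bundling their source; the package versions and the repository URL are hypothetical placeholders:

```python
# Minimal sketch: fetch and install dependencies rather than shipping their source code.
# The pinned versions and the repository URL are hypothetical placeholders.
import subprocess
import sys

PIP_PACKAGES = ["numpy==1.26.4", "scipy==1.11.4"]     # pinning versions aids reproducibility
GIT_DEPENDENCY = "https://github.com/example/solver"  # hypothetical third-party library

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run([sys.executable, "-m", "pip", "install", *PIP_PACKAGES])
    run(["git", "clone", "--depth", "1", GIT_DEPENDENCY, "deps/solver"])
```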

Other Notes

Including any other related aspects that may be important and were not addressed in the previous points.

Please note that deployments on public cloud infrastructures, or on any other platform that requires reviewers to pay to reproduce the experiments, are not allowed. Such a practice creates barriers for potential reviewers and for future third-party researchers interested in the artifacts. If you submit artifacts that do not respect this requirement, the only badge you can apply for is Artifacts Available.

In addition to the AD/AE, when applying for a reproducibility badge, the computational artifacts themselves must also be submitted so that the experiments can be reproduced.

Computational Artifacts

The computational artifacts of a paper include all the elements that support the reproducibility of its experiments, such as software, datasets, environment configuration, mechanized proofs, benchmarks, test suites with scripts, etc. Authors can choose any version-controlled software and data repositories to share their artifacts, such as Zenodo, FigShare, Dryad, Software Heritage, GitHub, or GitLab.

The AD/AE, in addition to documenting the computational artifacts, will also include links to the required repositories. If needed, README files can also be attached to the computational artifacts, either containing the same information as the AD/AE or complementing it, for instance, by providing further and more detailed instructions and documentation. As a general rule, authors should do their best to simplify the reproducibility process and save committee members the burden of reverse-engineering the authors’ intentions. For example, a tool without a quick tutorial is generally very difficult to use. Similarly, a dataset is useless without some explanation of how to browse the data. For software artifacts, the AD/AE and the README should, at a minimum, provide instructions for installing and running the software on relevant inputs. For other types of artifacts, describe your artifact and detail how to “use” it in a meaningful way.
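
One lightweight way to make the "install and run" instructions verifiable (purely a sketch; the module name and the input file are hypothetical placeholders) is to ship a small smoke test that the README can point to:

```python
# Minimal sketch of a smoke test a README could point to.
# "mytool" and "data/small_input.txt" are hypothetical placeholders for the artifact's
# entry point and a small bundled input.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-m", "mytool", "--input", "data/small_input.txt"],
    capture_output=True,
    text=True,
)
if result.returncode == 0:
    print("Smoke test passed.")
else:
    print("Smoke test failed:")
    print(result.stdout)
    print(result.stderr)
    sys.exit(1)
```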

Importantly, make your claims about your artifacts concrete. This is especially important if you think that these claims differ from the expectations set up by your paper. The AD/AE Committee is still going to evaluate your artifacts relative to your paper, but your explanation can help to set expectations up front, especially in cases that might frustrate the evaluators without prior notice. For example, tell the AD/AE Committee about difficulties they might encounter in using the artifact, or its maturity relative to the content of the paper.

Packaging Methods

Configuring the software dependencies of an artifact can take a significant amount of time for reproducibility purposes. To alleviate this, authors should consider one of the following methods to package the software components of their artifacts (the AD/AE Committee is open to other reasonable formats as well):

  • Source Code: If your artifact has few dependencies and can be installed easily on several operating systems, you may submit source code and build scripts. However, if your artifact has a long list of dependencies, please use one of the other formats below.
  • Virtual Machine/Container: A virtual machine or Docker image containing the software application already set up with the right toolchain and intended runtime environment. We recommend using a format that is easy for AD/AE Committee members to work with, such as OVF or Docker images (see the sketch after this list). For example:
    • For raw data, the VM would contain the data and the scripts used to analyze it.
    • For a mobile phone application, the VM would have a phone emulator installed.
    • For mechanized proofs, the VM would contain the right version of the relevant theorem prover.
  • Binary Installer: Indicate exactly which platform and other run-time dependencies your artifact requires.
  • Live Instance on the Web: Ensure that it is available for the duration of the artifact evaluation process.
  • Internet-accessible Hardware: If your artifact requires special hardware (e.g., GPUs or clusters), or if your artifact is actually a piece of hardware, please make sure that AD/AE Committee members can somehow access the device. VPN-based access to the device might be an option.
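
As a hedged illustration of the Virtual Machine/Container option above, the sketch below checks that a container image builds and that a small-scale run works inside it; the image name, the Dockerfile location, and the run command are hypothetical placeholders, and Docker is assumed to be installed:

```python
# Minimal sketch: verify that the artifact's container image builds and runs.
# "sc-artifact" and "python /artifact/run.py --small" are hypothetical placeholders.
import subprocess

IMAGE = "sc-artifact"

# Build the image from a Dockerfile assumed to sit in the artifact root.
subprocess.run(["docker", "build", "-t", IMAGE, "."], check=True)

# Execute a small-scale experiment inside the freshly built container.
subprocess.run(
    ["docker", "run", "--rm", IMAGE, "python", "/artifact/run.py", "--small"],
    check=True,
)
```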

Preparation Sources

There are several sources of good advice about preparing artifacts for evaluation.

AE & Computational Artifacts Evaluation

During the Artifact Evaluation stage, all the computational artifacts associated with the paper, such as software, datasets, or the environment configuration required to reproduce the experiments, are assessed. The goal of Artifact Evaluation is to award badges to the artifacts of accepted papers. All badges are based on the NISO Reproducibility Badging and Definitions Standard. In 2023, badges will be assigned per the ACM Reproducibility Standard.

Authors must choose in advance which badges to apply for during the AE phase. Authors can apply for one or more of the three badges that we offer: Artifacts Available, Artifacts Evaluated-Functional, and Results Replicated. Please note that they are incremental: applying for Artifacts Evaluated-Functional also includes Artifacts Available, and applying for Results Replicated includes the other two badges. The type of badge and the criteria for each badge are explained next. To start the reproducibility evaluation process, authors must provide links to their computational artifacts.

After the evaluation process, artifacts must be frozen to guarantee their persistence and immutability. An artifact must be accessible via a persistent and publicly shareable DOI (What is a DOI? Check this out) on a hosting platform that supports persistent DOIs and versioning (for example, DataPort, Dryad, FigShare, Harvard Dataverse, or Zenodo). Authors should not provide links or zipped files hosted through personal webpages or shared collaboration platforms, such as Nextcloud, Google Drive, or Dropbox.

Zenodo and FigShare provide integrations with GitHub that automatically generate DOIs from Git tags. It is therefore possible to host code under version control on GitHub and describe the artifact using Zenodo or FigShare. Please note that Git itself (or any other version control software) does not generate a DOI; it needs to be paired with Zenodo or FigShare.

Artifacts Available

The following are necessary to receive this badge:

  • A DOI assigned to your research object by the Artifact Freeze deadline (see the AD/AE Process timeline above). DOIs can be acquired via Zenodo, FigShare, Dryad, or Software Heritage. Zenodo provides an integration with GitHub to automatically generate DOIs from Git tags.
  • Links to code and data repositories on a hosting platform that supports versioning, such as GitHub or GitLab. In other words, please do NOT provide Dropbox links or gzipped files hosted through personal webpages.

Note that, for physical objects relevant to the research, the metadata about the object should be made available.

What do we mean by accessible? Artifacts used in the research (including data and code) are permanently archived in a public repository that assigns a global identifier and guarantees persistence, and are made available via standard open licenses that maximize artifact availability.

Artifacts Evaluated-Functional

The criteria for the Artifacts Evaluated-Functional badge require an AD/AE Committee member to agree that the artifact provides enough detail to exercise the components described in the paper. For example, is it possible to compile the artifact, use a Makefile, or perform a small run? If the artifact runs on a large cluster, can it be compiled on a single machine? Can the analysis be run at a small scale? Does the artifact describe its components well enough to nurture future use of the artifact?

The reviewer will assess the details of the research artifact based on the following criteria:

  • Documentation: Are the artifacts sufficiently documented to enable them to be exercised by readers of the paper?
  • Completeness: Do the submitted artifacts include all of the key components described in the paper?
  • Exercisability: Do the submitted artifacts include the scripts and data needed to run the experiments described in the paper, and can the software be successfully executed?

We encourage authors to describe (i) the workflow underlying the paper; (ii) some of the black boxes, or a white box (e.g., source, configuration files, build environment); (iii) the input data: either the process to generate the input data should be made available or, when the data is not generated, the actual data itself or a link to it should be provided; (iv) the environment (system configuration and initialization, scripts, workload, measurement protocol) used to produce the raw experimental data; and (v) the scripts needed to transform the raw data into the graphs included in the paper.

Results Replicated

The evaluators successfully reproduced the key computational results using the author-created research objects, methods, code, and conditions of analysis. Note that we do not aim to recreate exact or identical results, especially hardware-dependent results. However, we do aim to:

  • Reproduce Behavior: This is of specific importance where results are hardware-dependent. Bit-wise reproducibility is not our goal. If we get access to the same hardware as was used in the experiments, we will aim to reproduce the results on that hardware. If not, we aim to work with the authors to determine the equivalent or approximate behavior on available hardware. For example, if the results concern response time, our objective will be to check whether a given algorithm is significantly faster than another, or whether a given parameter affects the behavior of a system positively or negatively.
  • Reproduce the Central Results and Claims of the Paper: We do not aim to reproduce all the results and claims of the paper. The AD/AE Committee will determine the central results of the accepted paper and will work with the authors to confirm them. Once confirmed, the badge will be assigned based on the committee being able to reproduce the behavior of these central results.

Badging Process FAQ

Is participation in the badging process mandatory?

No. Participation in the badging process is voluntary. Please choose which badges you wish to apply for in the AD/AE Appendices Form. Artifact Evaluation will only occur for accepted papers whose authors have applied for an appropriate badge. The badge will be assigned after the artifact evaluation process is over.

What is the set of badge labels that a paper can apply for?

  • Artifacts Available
  • Artifacts Evaluated-Functional
  • Results Replicated
  • No badge

Will SC host the artifacts of reproduced papers?

Authors are responsible for hosting their artifacts, whether reproduced or not. We suggest using one of the following platforms for sharing artifacts with the AD/AE Committee: Zenodo, FigShare, Dryad, or Software Heritage. The SC Reproducibility Initiative does not have any place to permanently host what we reproduce and/or review, so any work done to badge artifacts will not be hosted in the longer term.

Reproducibility Initiative

SC has been a leader in making tangible progress toward scientific rigor through its pioneering practice of enhancing the reproducibility of accepted papers.
