
Awesome distributed system testing papers

  • Anvil: Verifying Liveness of Cluster Management Controllers

    • Tianyin Xu, OSDI 24 Best Paper
    • A formal verification framework for proving the correctness of Kubernetes controllers, covering not only safety but also liveness
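
Anvil proves properties like this mechanically over real controller code. The toy below only shows what a liveness property of the form "eventually reconcile to the desired state and stay there" asks for. It assumes a hypothetical deterministic controller model over replica counts 0..5, and all names in it are illustrative. The check is brute force over every start state:

```python
from typing import Callable, Iterable

def eventually_stable(step: Callable[[int], int], starts: Iterable[int],
                      desired: int, num_states: int) -> bool:
    """Check 'eventually always desired' for a deterministic, finite-state controller model."""
    # Stability: once the desired state is reached, the controller must not leave it.
    if step(desired) != desired:
        return False
    for state in starts:
        # A deterministic run over num_states states either hits `desired` within
        # num_states steps or is trapped in a cycle that never contains it.
        for _ in range(num_states):
            if state == desired:
                break
            state = step(state)
        if state != desired:
            return False
    return True

DESIRED = 3  # hypothetical desired replica count

def good_controller(replicas: int) -> int:
    # Move one replica at a time toward the desired count.
    if replicas < DESIRED:
        return replicas + 1
    if replicas > DESIRED:
        return replicas - 1
    return replicas

def buggy_controller(replicas: int) -> int:
    # Scales in steps of two, so from even counts it oscillates 2 -> 4 -> 2 and never settles on 3.
    if replicas < DESIRED:
        return replicas + 2
    if replicas > DESIRED:
        return replicas - 2
    return replicas
```

Both controllers keep the replica count within 0..5, which is a safety property. Only the good controller is live: `eventually_stable(good_controller, range(6), DESIRED, 6)` is True, while the buggy controller fails the check.
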
  • Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management

    • Tianyin Xu, Owolabi Legunsen, SOSP 23
    • An end-to-end tester for Kubernetes operators: Acto continuously instructs an operator to reconcile a system to different desired states and checks whether the system successfully reaches them.
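
Here is a minimal sketch of the end-to-end check. It drives an operator through a sequence of desired states and compares each one with the state the operator actually reaches. Acto generates the state sequences automatically and inspects the real managed system. In this sketch, `ToyOperator`, its planted scale-down bug, and the hand-written sequence are all hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

State = Dict[str, Any]

@dataclass
class ToyOperator:
    """Hypothetical operator managing a replicated system (stand-in for a real one)."""
    replicas: int = 1
    version: str = "1.0"

    def reconcile(self, desired: State) -> None:
        # Deliberate bug for illustration: scale-down requests are silently ignored.
        if desired["replicas"] > self.replicas:
            self.replicas = desired["replicas"]
        self.version = desired["version"]

    def observe(self) -> State:
        return {"replicas": self.replicas, "version": self.version}

def end_to_end_test(operator: ToyOperator,
                    desired_states: List[State]) -> List[Tuple[int, State, State]]:
    """Drive the operator through each desired state; report (step, desired, actual) mismatches."""
    failures = []
    for step, desired in enumerate(desired_states):
        operator.reconcile(desired)
        actual = operator.observe()
        if actual != desired:
            failures.append((step, desired, actual))
    return failures

states = [
    {"replicas": 3, "version": "1.0"},
    {"replicas": 5, "version": "1.1"},
    {"replicas": 2, "version": "1.1"},  # scale-down: exposes the planted bug
]
```

Calling `end_to_end_test(ToyOperator(), states)` reports one failure, at step 2, where the system stays at 5 replicas instead of 2.
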
  • Push-Button Reliability Testing for Cloud-Backed Applications with Rainmaker

    • Tianyin Xu, NSDI 23
    • Automatically intercepts (hijacks) REST API calls to cloud services and injects faults into them
    • Related work: fault injection techniques and error-handling analysis for testing cloud applications
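
Here is a minimal sketch of the push-button idea. It assumes an application that already has a test, reruns that test once per REST call, and injects a transient error into that one call. `FaultInjectingClient`, the retry helper, and the test are hypothetical. Rainmaker itself works on real REST traffic rather than a toy in-memory client:

```python
from typing import Callable, List, Optional

class TransientError(Exception):
    """Stand-in for a transient cloud-service error (e.g., HTTP 503 or throttling)."""

class FaultInjectingClient:
    """Hypothetical in-memory REST client that fails exactly the `fail_at`-th request."""

    def __init__(self, fail_at: Optional[int] = None):
        self.fail_at = fail_at
        self.calls = 0
        self.store = {}

    def request(self, method: str, key: str, value=None):
        index, self.calls = self.calls, self.calls + 1
        if index == self.fail_at:
            raise TransientError(f"injected fault on {method} {key}")
        if method == "PUT":
            self.store[key] = value
            return None
        return self.store.get(key)

def put_with_retry(client: FaultInjectingClient, key: str, value, attempts: int = 3) -> None:
    for attempt in range(attempts):
        try:
            client.request("PUT", key, value)
            return
        except TransientError:
            if attempt == attempts - 1:
                raise

def existing_test(client: FaultInjectingClient) -> bool:
    """An application test: the PUT is retried, but the GET is not (a latent bug)."""
    try:
        put_with_retry(client, "a", 1)
        return client.request("GET", "a") == 1
    except TransientError:
        return False

def push_button(test: Callable[[FaultInjectingClient], bool]) -> List[int]:
    """Rerun `test` once per REST call, injecting a fault into that call; return failing call indices."""
    baseline = FaultInjectingClient()
    if not test(baseline):
        raise RuntimeError("test fails even without injected faults")
    return [i for i in range(baseline.calls) if not test(FaultInjectingClient(fail_at=i))]
```

`push_button(existing_test)` flags call 1, the unretried GET. A fault in the PUT (call 0) is absorbed by the retry.
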
  • Automatic Reliability Testing for Cluster Management Controllers (Sieve)

    • Tianyin Xu, OSDI 22
    • Automated reliability testing for Kubernetes controllers via systematic perturbation testing and differential test oracles
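
The sketch below illustrates the differential-oracle idea. It runs a hypothetical controller once without perturbation as the reference, then once per perturbation, and flags perturbed runs whose end state differs. It models only one kind of perturbation: a missed intermediate update. Sieve perturbs real controllers' views of cluster state in more ways, and both controllers here are illustrative:

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[int, int]  # (previous desired replicas, new desired replicas)

def level_triggered(state: int, event: Event) -> int:
    # Reconciles straight to the latest desired value, ignoring history.
    return event[1]

def edge_triggered(state: int, event: Event) -> int:
    # Applies the delta carried by the event; correct only if no event is ever missed.
    old, new = event
    return state + (new - old)

def run(controller: Callable[[int, Event], int], specs: List[int],
        drop: Optional[int] = None) -> int:
    """Deliver spec changes as watch events; the perturbation hides event `drop` from the controller."""
    state, prev = 0, 0
    for i, desired in enumerate(specs):
        event, prev = (prev, desired), desired
        if i != drop:
            state = controller(state, event)
    return state

def differential_oracle(controller: Callable[[int, Event], int], specs: List[int]) -> List[int]:
    """Return the dropped intermediate events whose end state differs from the unperturbed run."""
    reference = run(controller, specs)
    # Only intermediate events are dropped; the final spec is always delivered.
    return [i for i in range(len(specs) - 1) if run(controller, specs, drop=i) != reference]
```

With specs `[3, 1, 2]`, the oracle flags nothing for `level_triggered` and flags both intermediate drops for `edge_triggered`. Comparing against a reference run avoids having to write a correctness specification for each controller.
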
  • Testing Configuration Changes in Context to Prevent Production Failures (CTest)

    • Tianyin Xu, Owolabi Legunsen, OSDI 20
    • CTest tests a configuration change by running the configuration tests (ctests) that exercise the changed parameters, with the new values and in the context of the deployed code; ctests can be generated from existing tests.
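
Here is a minimal sketch of the workflow, under three assumptions. Tests take configuration values as input instead of hard-coding them. The parameters each test exercises are known in advance. A change triggers only the tests that touch a changed parameter, and those tests run with the new values. The parameter names and tests are hypothetical:

```python
from typing import Any, Callable, Dict, Set

Config = Dict[str, Any]

# Hypothetical ctests: each takes the configuration under test instead of hard-coding values.
def ctest_buffer(conf: Config) -> None:
    assert conf["buffer.size"] > 0 and conf["buffer.size"] % 4096 == 0

def ctest_replication(conf: Config) -> None:
    assert 1 <= conf["replication"] <= conf["nodes"]

# Which parameters each ctest exercises (assumed known, e.g. from instrumentation).
CTESTS: Dict[Callable[[Config], None], Set[str]] = {
    ctest_buffer: {"buffer.size"},
    ctest_replication: {"replication", "nodes"},
}

def run_ctests_for_change(old: Config, new: Config) -> Dict[str, bool]:
    """Run only the ctests that exercise a changed parameter, using the new values."""
    changed = {key for key in new if old.get(key) != new[key]}
    results = {}
    for ctest, params in CTESTS.items():
        if params & changed:
            try:
                ctest(new)
                results[ctest.__name__] = True
            except AssertionError:
                results[ctest.__name__] = False
    return results
```

Suppose `replication` is raised from 3 to 7 on a 5-node setup. Only `ctest_replication` runs, and it fails, catching the bad change before deployment.
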
  • Do Not Blame Users for Misconfigurations

    • Tianyin Xu, SOSP 13
    • Uses static analysis to infer configuration constraints from source code
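
Here is a toy version of constraint inference in Python. It finds configuration values read through a getter and records the comparisons that send those values to an error path. For example, it infers that `server.port` must not be below 1 or above 65535. The paper's analysis handles more constraint kinds and real codebases. The `conf.get_*` convention and the sample code are assumptions:

```python
import ast
from typing import Dict, List

SOURCE = '''
def load(conf):
    port = conf.get_int("server.port")
    if port < 1 or port > 65535:
        raise ValueError("bad port")
    threads = conf.get_int("worker.threads")
    if threads <= 0:
        raise ValueError("bad thread count")
'''

def infer_constraints(source: str) -> Dict[str, List[str]]:
    """Toy inference: map config keys to the comparisons that guard an error path."""
    tree = ast.parse(source)
    var_to_key: Dict[str, str] = {}
    # Pass 1: find `var = conf.get_*("key")` and remember which variable holds which key.
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Attribute)
                and node.value.func.attr.startswith("get")
                and node.value.args and isinstance(node.value.args[0], ast.Constant)):
            var_to_key[node.targets[0].id] = node.value.args[0].value
    constraints: Dict[str, List[str]] = {}
    # Pass 2: a comparison in an `if` whose body raises marks values the code rejects.
    for node in ast.walk(tree):
        if isinstance(node, ast.If) and any(isinstance(s, ast.Raise) for s in node.body):
            for cmp in ast.walk(node.test):
                if (isinstance(cmp, ast.Compare) and isinstance(cmp.left, ast.Name)
                        and cmp.left.id in var_to_key):
                    constraints.setdefault(var_to_key[cmp.left.id], []).append(
                        f"not ({ast.unparse(cmp)})")
    return constraints
```

Inferred constraints like these can be checked against a user's configuration before startup, so the user is not left to guess which value caused the failure.
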
  • Efficient Exposure of Partial Failure Bugs in Distributed Systems with Inferred Abstract States

    • Peng (Ryan) Huang, NSDI 2024
    • Uses static analysis to automatically infer abstract states from distributed-system code to guide fault injection; a Budgeted-State-Round-Robin (BSRR) algorithm makes the fault-injection decisions.
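
This is a rough illustration only. The algorithm's name suggests rotating fault-injection trials across inferred abstract states under a per-state budget, so that no single state monopolizes the trials. The sketch implements that generic budgeted round-robin. It is our simplification, not Legolas's BSRR algorithm, and the state names are hypothetical:

```python
from collections import deque
from typing import Dict, Iterator, List

def budgeted_round_robin(states: List[str], budget: int) -> Iterator[str]:
    """Yield abstract states to target with faults: rotate across states, at most `budget` times each."""
    if budget <= 0:
        return
    remaining: Dict[str, int] = {s: budget for s in states}
    queue = deque(states)
    while queue:
        state = queue.popleft()
        yield state
        remaining[state] -= 1
        if remaining[state] > 0:
            queue.append(state)  # still has budget: revisit after the other states
```

With states `["A", "B", "C"]` and a budget of 2, the schedule is A, B, C, A, B, C.
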
  • Run-Time Prevention of Software Integration Failures of Machine Learning APIs

    • Shan Lu, OOPSLA 2023
    • Prevents integration failures in software applications that use machine learning (ML) APIs
      • exceptions, mismatching ....
  • Automated Verification of Idempotence for Stateful Serverless Applications

    • Haibo Chen, OSDI 2023
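    • Automatically verifies that stateful serverless functions are idempotent, i.e., that re-executing a function (e.g., on retry) has the same effect as executing it once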
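
The paper verifies idempotence automatically. The sketch below swaps in a much cheaper, testing-based approximation, which is a different technique. It runs a handler once, then runs it twice on a fresh copy of the state as a retry would, and compares the resulting states and outputs. Both handlers and the dedup-by-event-id fix are hypothetical. The check ignores crashes partway through a handler, which is where real retries get subtle:

```python
import copy
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any], Dict[str, Any]], Any]

def survives_retry(handler: Handler, state: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Compare state and output after one execution vs. an execution that is retried once."""
    once = copy.deepcopy(state)
    out_once = handler(once, event)
    twice = copy.deepcopy(state)
    handler(twice, event)              # first attempt (its result is lost in a retry)
    out_twice = handler(twice, event)  # the platform re-executes the same event
    return once == twice and out_once == out_twice

def deposit_naive(db: Dict[str, Any], event: Dict[str, Any]) -> int:
    db[event["acct"]] = db.get(event["acct"], 0) + event["amount"]
    return db[event["acct"]]

def deposit_dedup(db: Dict[str, Any], event: Dict[str, Any]) -> int:
    # Hypothetical fix: remember processed event ids and skip duplicates.
    seen = db.setdefault("_seen", set())
    if event["id"] not in seen:
        seen.add(event["id"])
        db[event["acct"]] = db.get(event["acct"], 0) + event["amount"]
    return db[event["acct"]]
```

The naive deposit credits the account twice on retry. The deduplicating version passes the check.
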
  • Fail through the Cracks: Cross-System Interaction Failures in Modern Cloud Systems

    • Tianyin Xu, Yongle Zhang, EuroSys 23
    • Cloud system reliability is affected not only by the reliability of each individual system but also by the interactions between these systems.
    • An open-source dataset of cross-system interaction (CSI) failures
    • Cross-system testing: checks whether Spark and Hive process data consistently by writing data and then reading it back through the various interfaces of both systems.
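
Here is a minimal sketch of such a cross-system test. It assumes two hypothetical systems that share a table but disagree on CHAR padding. Each value is written through one system and read back through every interface, and any disagreement between the readers is flagged. In a real test, Spark and Hive interfaces (SQL, DataFrame, and so on) would replace the toy reader and writer functions:

```python
from typing import Dict, List, Tuple

Table = Dict[str, str]

# Hypothetical interfaces of two systems sharing one table.
def write_via_a(table: Table, key: str, value: str) -> None:
    table[key] = value               # system A stores the raw string

def write_via_b(table: Table, key: str, value: str) -> None:
    table[key] = value[:5].ljust(5)  # system B treats the column as CHAR(5)

def read_via_a(table: Table, key: str) -> str:
    return table[key]

def read_via_b(table: Table, key: str) -> str:
    return table[key].rstrip()       # system B strips CHAR padding on read

WRITERS = [write_via_a, write_via_b]
READERS = [read_via_a, read_via_b]

def cross_system_test(values: List[str]) -> List[Tuple[str, str]]:
    """Write each value through each writer, read it back through every reader, flag disagreements."""
    inconsistencies = []
    for write in WRITERS:
        for value in values:
            table: Table = {}
            write(table, "k", value)
            reads = {read(table, "k") for read in READERS}
            if len(reads) > 1:
                inconsistencies.append((write.__name__, value))
    return inconsistencies
```

Writing `"abc"` through system B and reading it through A returns the padded `"abc  "`, so the test reports `("write_via_b", "abc")`.
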
  • Understanding and Detecting Software Upgrade Failures in Distributed Systems

    • Shan Lu, Yongle Zhang, Ding Yuan, SOSP 21
    • A static checker, DUPChecker, that searches for two types of data-syntax incompatibility across versions:
      • (type-1) on data defined by serialization libraries, and
      • (type-2) on data of enum types.
    • Related work: automated testing
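
For the enum case (type-2), here is a minimal sketch that assumes enum values are persisted by ordinal. It compares the old and new declarations and flags constants whose stored ordinal would decode differently after the upgrade. The function and the job-state enum are hypothetical; DUPChecker analyzes real source code across versions:

```python
from typing import List, Tuple

def enum_incompatibilities(old: List[str], new: List[str]) -> List[Tuple[str, str]]:
    """Flag old enum constants whose persisted ordinal decodes differently in the new version."""
    issues = []
    for ordinal, name in enumerate(old):
        if ordinal >= len(new):
            issues.append((name, f"ordinal {ordinal} no longer exists"))
        elif new[ordinal] != name:
            issues.append((name, f"ordinal {ordinal} now decodes as {new[ordinal]}"))
    return issues
```

Inserting `PAUSED` into the middle of `NEW, RUNNING, DONE` shifts `RUNNING` and `DONE`, so data written by the old version is misread after upgrade. Appending the new constant at the end avoids the problem.
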
  • Simple Testing Can Prevent Most Critical Failures

    • Ding Yuan, Yongle Zhang, OSDI 14
    • We extracted three simple rules from the bugs that led to some of the catastrophic failures and developed a static checker, Aspirator, capable of locating these bugs
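
Here is a toy version of the three rules, paraphrased as we understand them:

- an error handler that is empty or only logs;
- a handler that still contains TODO/FIXME;
- a handler that aborts on an overly general exception.

The sketch applies these rules to Python source with the standard `ast` module. Aspirator targets Java; the sample code, the logger names, and the set of abort calls are assumptions:

```python
import ast
from typing import List, Tuple

SAMPLE = '''
import sys, logging
log = logging.getLogger(__name__)

def flush(f):
    try:
        f.flush()
    except OSError as e:
        log.warning("flush failed: %s", e)

def connect(c):
    try:
        c.open()
    except TimeoutError:
        # TODO: retry
        pass

def serve(s):
    try:
        s.run()
    except Exception:
        sys.exit(1)
'''

LOGGERS = {"log", "logger", "logging"}
GENERAL = {"Exception", "BaseException"}
ABORTS = {"sys.exit", "os._exit", "os.abort"}

def _only_logs_or_passes(stmt: ast.stmt) -> bool:
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
        func = stmt.value.func
        return (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)
                and func.value.id in LOGGERS)
    return False

def check_handlers(source: str) -> List[Tuple[int, str]]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        # Rule 1: the handler is empty (pass) or only logs.
        if all(_only_logs_or_passes(s) for s in node.body):
            findings.append((node.lineno, "handler is empty or only logs"))
        # Rule 2: TODO/FIXME left in the handler (comments are only visible in the raw source).
        text = ast.get_source_segment(source, node) or ""
        if "TODO" in text or "FIXME" in text:
            findings.append((node.lineno, "handler contains TODO/FIXME"))
        # Rule 3: a bare or overly general `except` that aborts the process.
        general = node.type is None or (isinstance(node.type, ast.Name) and node.type.id in GENERAL)
        if general and any(isinstance(n, ast.Call) and ast.unparse(n.func) in ABORTS
                           for n in ast.walk(node)):
            findings.append((node.lineno, "aborts on an overly general exception"))
    return findings
```

On `SAMPLE`, each of the three handlers triggers at least one rule, and the `TimeoutError` handler triggers two.
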
