Rx: Treating Bugs As Allergies— A Safe Method to Survive Software Failures

Posted on Jan 19, 2024

Paper: Rx: Treating Bugs As Allergies— A Safe Method to Survive Software Failures

  • Recovers from software failures to keep applications available
  • Makes use of Checkpointing and Rollback to revert to an older state
  • Then changes the execution environment and re-executes the application from that checkpoint.
    • If none of the changes work, it goes back one more checkpoint and retries
  • Components
    • Proxy: Sits between clients and the server, and saves requests so they can be replayed during re-execution.
    • Sensors: Detect errors in the application using exceptions, interrupts, etc.
    • Checkpointing and Rollback
      • Based on: Flashback
      • Deletes the oldest checkpoint based on strategies.
    • Environmental wrappers: For modifying environment during re-execution
      • Memory allocation wrappers: e.g., zero-filling buffers, adding padding
      • Scheduling wrapper: changes the length of the scheduling time slice
      • User request dropping
    • Control unit: Coordinates all the other components
      • Also provides useful information for the programmer to diagnose and fix errors.
  • Tested on Squid, Apache, CVS, MySQL
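
The rollback-and-retry loop from the notes above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the paper's implementation: the names `serve`, `recover`, `replay` and `toy_handler` are hypothetical, the environment is a plain dict, the "padding" key is a stand-in for a memory allocation wrapper, and checkpoints are deep copies of an in-memory state rather than process snapshots. The sketch also assumes that normal execution resumes in the unmodified environment once the failing request has been survived.

```python
import copy


def replay(snapshot, requests, handler, env):
    """Re-execute requests on a copy of the snapshot; return the new state, or None on failure."""
    state = copy.deepcopy(snapshot)
    try:
        for req in requests:
            handler(state, req, env)
    except Exception:
        return None
    return state


def recover(checkpoints, requests, failed_at, handler, env_changes):
    """Try every environment change, starting at the newest checkpoint and moving to older ones."""
    for start, snapshot in reversed(checkpoints):
        for env in env_changes:
            # Replay the saved requests from the checkpoint up to and including the failing one.
            state = replay(snapshot, requests[start:failed_at + 1], handler, env)
            if state is not None:
                return state, start, env
    raise RuntimeError("no checkpoint/environment combination survived the failure")


def serve(requests, handler, env_changes, checkpoint_every=2):
    """Process requests, taking periodic checkpoints and recovering when the handler fails."""
    state, checkpoints, log = {}, [], []
    for i, req in enumerate(requests):
        if i % checkpoint_every == 0:
            checkpoints.append((i, copy.deepcopy(state)))
        try:
            handler(state, req, {})  # normal run: unmodified environment
        except Exception:
            state, start, env = recover(checkpoints, requests, i, handler, env_changes)
            # Record what worked; this plays the role of the diagnostic information.
            log.append((i, start, env))
    return state, log


def toy_handler(state, req, env):
    """Hypothetical server: fails on requests longer than its buffer unless padding is added."""
    capacity = 4 + env.get("padding", 0)
    if len(req) > capacity:
        raise ValueError("buffer overflow")
    state.setdefault("handled", []).append(req)


if __name__ == "__main__":
    final_state, recovery_log = serve(
        ["a", "bb", "ccc", "toolong", "d"],
        toy_handler,
        [{"zero_fill": True}, {"padding": 8}],
    )
    print(final_state["handled"])
    print(recovery_log)  # [(3, 2, {'padding': 8})]
```

In this run the fourth request fails, zero-filling does not help, and replaying from the checkpoint taken at request index 2 with extra padding succeeds. The recovery log records which change worked, which hints at the kind of bug (here, an overflow).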