Early Detection of Configuration Errors to Reduce Failure Damage
Paper: Early Detection of Configuration Errors to Reduce Failure Damage
- Defines Latent Configuration (LC) errors: errors that stay hidden because the configuration value is not sufficiently validated until it is actually used. There can be a long gap between loading the configuration (generally in the initialization phase) and actually using it, hence "latent".
- When such configurations are related to reliability, availability or serviceability (RAS), LC errors can lead to downtime.
- Two main issues with such configs
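- The values are not checked at all, e.g. nothing checks that a configured file exists.
- The values are not checked according to their usage, e.g. the value is later used in open(config_value, WRITE), so the check should confirm the file can be opened for writing, not just that it exists.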
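A minimal sketch of how such an error stays latent, assuming a hypothetical `backup_dir` setting that is only used during failure handling. The function names and the config key are illustrative and do not come from the paper.

```python
import os
import tempfile

def load_config(raw):
    # Initialization phase: the value is stored but never validated.
    return {"backup_dir": raw["backup_dir"]}

def write_backup(config, data):
    # Much later (for example during failure handling) the value is finally used.
    path = os.path.join(config["backup_dir"], "state.bak")
    with open(path, "w") as f:
        f.write(data)

def demo():
    base = tempfile.mkdtemp()
    missing = os.path.join(base, "no-such-dir")     # directory was never created
    config = load_config({"backup_dir": missing})   # loading succeeds silently
    try:
        write_backup(config, "snapshot")            # the error only surfaces here
    except OSError as exc:
        return type(exc).__name__
    return "ok"
```

The bad value passes through initialization untouched and fails only when the backup is attempted, which is exactly the moment the system can least afford it.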
- The paper implements a checker based on static analysis and instrumentation
- Static analysis:
- Taint analysis follows each configuration value along its data-flow path to the instructions that actually use it. Control flow is ignored in most cases to avoid over-tainting.
- Along with these instructions, the dependent values are also extracted, e.g. in open(config_value, permission), permission is a dependent value.
- Any value that cannot be determined statically is skipped, e.g. a dependent value read from the network.
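The static-analysis step can be illustrated with a toy data-flow pass over a made-up three-address program. This is only a sketch of the idea, not the paper's analysis: the `Stmt` representation, the `get_config` source name and the `SINKS` set are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Stmt:
    target: str   # variable being assigned
    func: str     # operation or call name
    args: tuple   # variable names (or a literal for "const" / "get_config")

SOURCE = "get_config"        # hypothetical config-getter interface
SINKS = {"open", "connect"}  # hypothetical calls whose arguments we want to check

def find_checkable_uses(program):
    tainted = set()  # variables derived from configuration values
    consts = {}      # variables whose values are known statically
    uses = []
    for st in program:
        if st.func == SOURCE:
            tainted.add(st.target)
        elif st.func == "const":
            consts[st.target] = st.args[0]
        elif st.func in SINKS:
            if any(a in tainted for a in st.args):
                deps = [a for a in st.args if a not in tainted]
                # Skip uses whose dependent values cannot be determined.
                if all(d in consts for d in deps):
                    uses.append((st.func, st.args, {d: consts[d] for d in deps}))
        elif any(a in tainted for a in st.args):
            # Plain data dependence: the result is tainted if any input is.
            tainted.add(st.target)
    return uses

PROGRAM = [
    Stmt("raw", "get_config", ("log.path",)),
    Stmt("path", "normalize", ("raw",)),
    Stmt("mode", "const", ("w",)),
    Stmt("fd", "open", ("path", "mode")),      # checkable: mode is a constant
    Stmt("port", "recv", ("sock",)),           # only known at run time
    Stmt("conn", "connect", ("raw", "port")),  # skipped: port is undetermined
]
```

Only data dependences propagate taint here; branches are not modelled at all, which mirrors the note about ignoring control flow.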
- Instrumentation:
- Code is generated to perform the same checks as the actual usage, but in a “sandboxed” manner
- Any side effect on the program is avoided, e.g. a local copy of a global value is used instead of the actual global value.
- Utilities are written to check the actions performed by some library and system calls.
- This generated code is run right after the initialization phase of the program
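As a rough sketch of what a side-effect-free check might look like, the function below predicts whether `open(path, "w")` would succeed without creating or truncating anything. The helper names and the `checks` mapping are hypothetical, and `os.access` is only an approximation (it can disagree with the real call under ACLs or when running as root).

```python
import os

def check_open_for_write(path):
    """Return None if open(path, "w") is expected to succeed, else a reason."""
    if os.path.isdir(path):
        return f"{path} is a directory"
    if os.path.exists(path):
        if not os.access(path, os.W_OK):
            return f"{path} is not writable"
        return None
    # The file does not exist yet, so the parent must allow creating it.
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        return f"parent directory {parent} does not exist"
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"cannot create files in {parent}"
    return None

def run_checks_after_init(config, checks):
    # Intended to be called once, right after the initialization phase.
    # `checks` maps a config key to the checker matching its later usage.
    errors = {}
    for key, checker in checks.items():
        reason = checker(config[key])
        if reason is not None:
            errors[key] = reason
    return errors
```

For example, `run_checks_after_init(config, {"log_path": check_open_for_write})` reports a bad `log_path` at startup instead of at the first write.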
- Developers need to annotate two things
- The interface through which configuration values are fetched
- The point where the program moves from the initialization phase to execution
- TOCTOU (time-of-check to time-of-use) issues are addressed by adding support for running these checkers periodically in a separate thread
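A minimal sketch of re-running a check periodically in a background thread, assuming a checker that returns `None` on success or a reason string on failure. The `PeriodicChecker` class, its parameters and the callback are hypothetical and only illustrate the idea.

```python
import threading

class PeriodicChecker:
    def __init__(self, check, interval_s, on_error):
        self._check = check          # callable returning None or a reason string
        self._interval = interval_s  # seconds between two checks
        self._on_error = on_error    # callback invoked with the reason
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout, so the loop runs until stop().
        while not self._stop.wait(self._interval):
            reason = self._check()
            if reason is not None:
                self._on_error(reason)
```

Periodic checking narrows the window between a check and the eventual use but does not eliminate it: the environment can still change between the last check and the use.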