Verifying Sanitizer Correctness through Black-Box Learning: A Symbolic Finite Transducer Approach


String sanitizers are widely used functions for preventing injection attacks such as SQL injections and cross-site scripting (XSS). It is therefore crucial that the implementations of such string sanitizers are correct. We present a novel approach to reason about a sanitizer’s correctness by automatically generating a model of the implementation and comparing it to a model of the expected behaviour. To automatically derive a model of the implementation of the sanitizer, this paper introduces a black-box learning algorithm that derives a Symbolic Finite Transducer (SFT). This black-box algorithm uses membership and equivalence oracles to derive such a model. In contrast to earlier research, SFTs not only describe the input or output language of a sanitizer but also how a sanitizer transforms the input into the output. As a result, we can reason about the transformations from input into output that are performed by the sanitizer. We have implemented this algorithm in an open-source tool of which we show that it can reason about the correctness of non-trivial sanitizers within a couple of minutes without any adjustments to the existing sanitizers.