Summary
The paper describes a system that automatically generates test inputs for PHP web applications and checks their output. Inputs aimed at maximizing coverage are generated via concolic (concrete and symbolic) execution of the PHP code: user input is tracked as it travels through the program, and a path constraint is accumulated whenever variables tagged as input are involved in conditionals. New inputs are generated by successively negating elements of each path constraint and using a constraint solver in conjunction with a test input generator to produce inputs that satisfy the mutated path constraints.
This process is repeated on successive sets of path constraints. PHP errors as well as HTML validity errors are considered bugs and are tracked in a database along with the path constraints and particular choice of inputs that triggered them. The architecture of the system is described as well as each individual component: executor (shadow interpreter, database manager), bug finder (oracle, bug report repository, input minimizer) and input generator (symbolic driver, constraint solver, value generator).
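The negate-and-solve loop summarized above can be sketched in a few lines. This is only a toy illustration, not the paper's implementation: `toy_page`, `solve`, and `explore` are hypothetical names, the "page" is a Python function standing in for a PHP script, and the "constraint solver" only handles equality and inequality tests on string inputs.

```python
def toy_page(inputs, trace):
    """Hypothetical page under test; records every branch that depends on input."""
    def branch(name, expected):
        taken = inputs.get(name) == expected
        trace.append((name, expected, taken))  # one element of the path constraint
        return taken

    if branch("mode", "admin"):
        if branch("action", "delete"):
            return "<b>deleted"  # malformed HTML: the tag is never closed
        return "<p>admin</p>"
    return "<p>guest</p>"


def solve(constraints):
    """Toy solver: return inputs satisfying all (name, value, must_equal)
    constraints, or None if they contradict each other."""
    equal, differ = {}, {}
    for name, value, must_equal in constraints:
        if must_equal:
            if equal.setdefault(name, value) != value:
                return None  # two different required values for one input
        else:
            differ.setdefault(name, set()).add(value)
    for name, value in equal.items():
        if value in differ.get(name, ()):
            return None  # required and forbidden at the same time
    # Inputs with only "must differ" constraints are left unset (None != any string).
    return dict(equal)


def explore(program, seed):
    """Run the program, then negate each element of the observed path
    constraint (keeping the prefix before it) to derive new inputs."""
    worklist, seen_paths, results = [seed], set(), []
    while worklist:
        inputs = worklist.pop()
        trace = []
        output = program(inputs, trace)
        path = tuple(trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        results.append((inputs, output))
        for i, (name, value, taken) in enumerate(path):
            mutated = list(path[:i]) + [(name, value, not taken)]
            new_inputs = solve(mutated)
            if new_inputs is not None:
                worklist.append(new_inputs)
    return results


for inputs, output in explore(toy_page, {}):
    print(inputs, "->", output)
```

Starting from the empty input, the loop reaches all three paths through `toy_page`, including the one that emits the unclosed tag. A real tool would hand the mutated constraints to a proper solver and pass each output to an oracle such as an HTML validator.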
The results of running the tool on four open source PHP projects are provided, along with comparisons against random test input generation and against a static analysis tool that uses a substantially different approach.
Comments
The process of tracking which variables are associated with inputs and constructing a set of path constraints that identifies a unique execution path through the program seems to be a powerful technique, though it apparently did not originate in this paper. Their main contribution seems to be that they were crazy enough to do this for PHP and ironed out some of the resulting wrinkles. They also claim that using an HTML validator to identify a larger class of program “failures” is a novel contribution. Fair enough, but I don’t think it’s as big a deal as they make it out to be.
They also had to punt somewhat on simulating user input and handling session state. For scripts that output form buttons or other means by which a user would provide input and cause a new script to be executed, they manually added some code to make that input appear as if it were an input to the original page and then added a switch statement that executed the subsequent page inline, tacking its output onto the output of the first page. That’s clever and expedient, but it also side-steps one of the hard and possibly interesting challenges of testing web applications. To be fair, they clearly point out this limitation and claim to be working on it.
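To make the workaround concrete, here is a rough sketch of what such a hand-patched page might look like. It is written in Python rather than PHP, and every name in it (`login_form`, `welcome`, `combined_page`, the `submit` and `user` inputs) is made up for illustration; the paper's actual patches are not reproduced here.

```python
import html


def login_form(inputs):
    """Hypothetical first page: emits a form whose button leads to another script."""
    return ('<form><input name="user">'
            '<button name="submit" value="login">Log in</button></form>')


def welcome(inputs):
    """Hypothetical second page, normally reached only after a button press."""
    return "<p>Welcome, " + html.escape(inputs.get("user", "")) + "</p>"


def combined_page(inputs):
    """Patched first page: the simulated button press is just one more input,
    and the follow-up page runs inline with its output appended."""
    output = login_form(inputs)
    choice = inputs.get("submit")  # stands in for the user's click
    if choice == "login":          # the added "switch" on the simulated input
        output += welcome(inputs)
    return output


print(combined_page({"submit": "login", "user": "ann"}))
```

Because the button choice is now an ordinary input, the test generator can negate the condition on it like any other branch, which is what lets the second page be reached without modeling a real browser session.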
I applaud the extra effort put into making the results actually useful to a developer trying to find and fix bugs in their application. They go to the trouble of tracking the particular line of PHP code that triggers the script failure or emits the malformed HTML, and they include that information in their bug reports. They also merge reports that describe essentially the same failure and try to minimize the path constraints needed to trigger it, so that a report usually contains the minimum set of inputs needed to reproduce the bug.
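One plausible way to do that kind of minimization is a greedy pass that drops one condition at a time and keeps the drop only if the failure still reproduces. The sketch below assumes exactly that; it is not necessarily the paper's algorithm, and `buggy_page`, `still_fails`, and `minimize` are hypothetical names.

```python
def buggy_page(inputs):
    """Hypothetical page: only the admin delete path emits malformed HTML."""
    if inputs.get("mode") == "admin" and inputs.get("action") == "delete":
        return "<b>deleted"  # unclosed tag, reported as a failure
    return "<p>ok</p>"


def still_fails(conditions):
    """Rebuild inputs from (name, value) equality conditions and re-run the page."""
    return buggy_page(dict(conditions)) == "<b>deleted"


def minimize(conditions, fails):
    """Greedily remove conditions that are not needed to reproduce the failure."""
    needed = list(conditions)
    for cond in list(conditions):
        trial = [c for c in needed if c != cond]
        if fails(trial):
            needed = trial  # the failure survives without this condition
    return needed


failing = [("lang", "en"), ("mode", "admin"), ("action", "delete")]
print(minimize(failing, still_fails))
# → [('mode', 'admin'), ('action', 'delete')]
```

The irrelevant `lang` condition is discarded, leaving only the two inputs a developer actually needs to set to see the bug.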