Making changes to computer code may have unintended consequences for program performance. For instance, modifying loops or changing data structures in a specific program could cause an increase in execution time or in memory or disk usage. We refer to changes in such performance measures as the differential cost of modifying the code.
The ability to reason about the cost of code changes is called differential cost analysis. Being able to perform such an analysis before deploying a new version of program code is of particular interest to Amazon, as it not only enables a better customer experience but can also reduce resource usage and carbon footprint.
But bounding the cost of a code change is an undecidable problem, meaning there’s no algorithm guaranteed to give you an answer. Previous approaches have focused on estimating the cost of a single version of the code, or they assumed the ability to align code changes in a syntactic way.
At this year’s ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2022), we presented a paper on differential cost analysis that overcomes some of these challenges. Our approach is based on the idea of jointly computing a potential function and an anti-potential function that provide, respectively, the upper and lower bounds for changes in cost.
Unlike previous approaches, our implementation can compute tight bounds on the costs of code changes between pairs of program versions collected from the literature. In particular, we are able to provide tight bounds on 14 out of 19 examples, which include both variations that have an impact on cost and variations that do not impact the cost but require complex analysis to establish as much.
Thresholds
The ability to reason about cost changes in code is of fundamental importance in most software applications, but particularly for Prime Video: the Prime Video app runs on a range of devices, some of them with very limited memory and processing power. As our colleague Alex Ene has described, efficiency is a key concern for Prime Video: not only do we need to provide code that runs fast, with very tight bounds on startup time, but we also need to address the memory limitations of USB-powered streaming devices.
While new architectures can help with achieving these goals, the approach we propose in this paper is finer-grained. It is a form of automated reasoning that allows us to provide feedback to developers on every code change, in a workflow similar to the one we presented in an earlier blog post.
In particular, we address the following code analysis problem: given two versions of a program, old and new, and given a cost we are interested in (e.g., run time, memory, number of threads, disk space), we want to compute a numerical bound, the threshold t, such that
cost_new - cost_old ≤ t.
We focus on imperative programs — the most familiar type of program, with explicit specification of every computational step — with integer variables and polynomial arithmetic. The programs may also have nondeterministic elements, meaning that the same inputs may yield different outputs.
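As a toy illustration (our own, not one of the paper's benchmarks), consider two versions of a loop whose cost of interest is the number of calls to a function `process`; the names and the edit are hypothetical:

```c
#include <stdio.h>

/* Instrumented cost model: one cost unit per call to process(). */
static long cost = 0;
static void process(int x) { (void)x; cost++; }

/* Old version: processes each of the n elements once, so cost_old = n. */
static void run_old(int n) {
    for (int i = 0; i < n; i++)
        process(i);
}

/* New version: additionally reprocesses a fixed prefix of up to 10
   elements, so cost_new = n + 10 whenever n >= 10. */
static void run_new(int n) {
    for (int i = 0; i < n; i++)
        process(i);
    for (int i = 0; i < 10 && i < n; i++)
        process(i);
}

int main(void) {
    int n = 100;
    cost = 0; run_old(n); long cost_old = cost;
    cost = 0; run_new(n); long cost_new = cost;
    /* cost_new - cost_old = 10 here, so t = 10 is a tight threshold. */
    printf("cost_new - cost_old = %ld\n", cost_new - cost_old);
    return 0;
}
```

A differential analysis should report the constant threshold t = 10 even though both absolute costs grow without bound as n does.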
The potential function, which yields the upper bound, captures the maximum cost a program can incur; the anti-potential function, which yields the lower bound, captures the minimum cost to be “paid” for a program to run. If we indicate with ϕ the potential function of a program and with χ its anti-potential function, then the threshold in cost variation between an old and a new version of a program can be approximated by ϕ_new - χ_old.
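Spelled out, the soundness argument is a short chain of inequalities: ϕ_new over-approximates the cost of the new version, χ_old under-approximates the cost of the old version, and any t at least as large as their difference is a valid threshold:

\[
\mathit{cost}_{\mathit{new}} \le \phi_{\mathit{new}},
\qquad
\chi_{\mathit{old}} \le \mathit{cost}_{\mathit{old}}
\quad\Longrightarrow\quad
\mathit{cost}_{\mathit{new}} - \mathit{cost}_{\mathit{old}}
\;\le\;
\phi_{\mathit{new}} - \chi_{\mathit{old}}
\;\le\; t .
\]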
As a concrete example, consider the two versions of the program below, where lines in yellow model the cost, text in red is the code that is removed, and text in green is the code that is added. This program encodes a common join operation over two sequences, with an operator f having some cost per pair of elements.
The formulas in boxes are, respectively, the values of the anti-potential and the potential functions. For instance, the anti-potential value in the box labeled ℓ_0 encodes the fact that the cost to terminate the program is at least equal to the product of the sizes of the two sequences. As a result, the difference between ϕ_new and χ_old is len_B ⋅ len_A, which is the desired threshold.
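The figure itself is not reproduced here, but a minimal C sketch conveys its shape; the function names, the exact edit, and the cost instrumentation are our reconstruction, not the paper's code:

```c
/* Cost model: one cost unit per call to the join operator f(). */
static long cost = 0;
static void f(int a, int b) { (void)a; (void)b; cost++; }

/* Old version: one call to f per pair of elements,
   so cost_old is exactly lenA * lenB. */
void join_old(const int *A, int lenA, const int *B, int lenB) {
    for (int i = 0; i < lenA; i++)
        for (int j = 0; j < lenB; j++)
            f(A[i], B[j]);
}

/* New version (hypothetical edit): one extra call to f per pair,
   e.g., to also emit the symmetric pair, so cost_new is at most
   2 * lenA * lenB. */
void join_new(const int *A, int lenA, const int *B, int lenB) {
    for (int i = 0; i < lenA; i++)
        for (int j = 0; j < lenB; j++) {
            f(A[i], B[j]);
            f(B[j], A[i]);
        }
}
```

Under this reading, ϕ_new = 2 ⋅ len_A ⋅ len_B bounds the new cost from above, χ_old = len_A ⋅ len_B bounds the old cost from below, and their difference is the threshold len_B ⋅ len_A reported above.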
In the paper, we show that it is possible to compute both ϕ and χ by working with polynomial expressions over program variables at each program location, as shown in the example above. We represent the program versions as transition systems, a model of computation that consists of a set of program states and a set of valid transitions between states. We assume that the transition systems terminate — i.e., there’s no input that will cause them to run forever.
We fix a symbolic variable for the threshold t, and by traversing the two transition systems, we obtain a system of constraints that can be solved to obtain concrete values for the threshold and for the potential functions.
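Schematically, and in the style of classic potential-function arguments (the exact side conditions in the paper may differ), the constraints require ϕ to decrease by at least the cost of every transition, χ to decrease by at most that cost, and the threshold to link the two programs at their initial location ℓ_0:

\[
\begin{array}{ll}
\text{for every transition } (\ell,\vec{x}) \to (\ell',\vec{x}') \text{ of cost } c: &
\phi(\ell,\vec{x}) \;\ge\; c + \phi(\ell',\vec{x}'), \\
& \chi(\ell,\vec{x}) \;\le\; c + \chi(\ell',\vec{x}'), \\
\text{at every final location } \ell_f: &
\phi(\ell_f,\vec{x}) \ge 0, \qquad \chi(\ell_f,\vec{x}) \le 0, \\
\text{at the initial location } \ell_0: &
\phi_{\mathit{new}}(\ell_0,\vec{x}) - \chi_{\mathit{old}}(\ell_0,\vec{x}) \;\le\; t .
\end{array}
\]

Because ϕ and χ are unknown polynomials and these inequalities must hold for all values of the program variables, the resulting constraints are universally quantified, which is the difficulty addressed next.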
Constraint satisfaction
A key aspect of our approach is obtaining a simultaneous system of constraints for both the potential function of the new program version and the anti-potential function of the old program.
Unfortunately, the resulting system of constraints is hard to solve, as it involves universal quantifiers and polynomial constraints over variables. We solve it by employing the results of Handelman’s theorem to convert these constraints into a system of purely existentially quantified linear constraints. That is, we convert constraints of the form “for all X’s, P(X)” (a universal quantifier) to constraints of the form “there exist X’s such that Q(X)” (an existential quantifier), where Q is linear, meaning its variables are not squared, cubed, etc.
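For reference, Handelman's theorem states (roughly) that a polynomial p that is strictly positive over a compact polytope defined by linear inequalities g_1 ≥ 0, …, g_m ≥ 0 can be written as a nonnegative combination of products of those inequalities:

\[
p \;=\; \sum_{\alpha \in \mathbb{N}^m} \lambda_\alpha \,
g_1^{\alpha_1} \cdots g_m^{\alpha_m},
\qquad \lambda_\alpha \ge 0,
\]

with only finitely many λ_α nonzero. Matching the coefficients of each monomial on the two sides of this identity turns the universally quantified positivity requirement into linear constraints over the unknown multipliers λ_α.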
Such systems of constraints can be solved efficiently via an off-the-shelf linear-programming solver. This constraint representation has the additional benefit of enabling either the verification of a symbolic threshold or the optimization of a concrete one, which results in a threshold t that is as tight as possible.
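In linear-programming form, the search is then, schematically (the paper's encoding may differ in detail):

\[
\text{minimize } t
\quad \text{subject to} \quad
A \, (\vec{c}, \vec{\lambda}, t)^{\mathsf{T}} \le \vec{b},
\qquad \vec{\lambda} \ge 0,
\]

where c collects the unknown coefficients of the potential and anti-potential templates and λ the Handelman multipliers; fixing t to a given value instead of minimizing it turns the same system into a verification query for that threshold.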
We have validated this approach using 19 benchmarks in C from the current literature. We convert these programs to transition systems, and for 17 of them, we are able to compute a value for the threshold. The threshold is optimal in 14 cases, and more importantly, we can provide a threshold value in less than five seconds in all cases.
Acknowledgements: Djordje Zikelic, Pauline Bolignano, Daniel Schoepe, Ilina Stoilkovska.