Low Degree Extension Polynomials
In the context of interactive proofs, and consequently arguments and ZKPs, it is often useful to represent a dataset or a function as a polynomial called its low degree extension. Extending a dataset and extending a function work in the same way, since we can represent any dataset as a function mapping indices to values, and any function as a dataset containing all input-output pairs.
Given a dataset of length where each symbol in the data can be mapped to an element of a finite field , we can represent the dataset as an -dimensional vector of elements in .
We can also represent the dataset with a function that maps indices to the corresponding entries in the vector .
The entries for all of the vector , and the evaluations of the function for all are the same. The low degree extensions polynomials we define can be thought of as extending a function with a domain of size or extending an -dimensional vector.
Different approaches exist for defining polynomial extensions of data and functions, but with each scheme we will see, the resulting polynomial extension is
- Low degree - the degree is where is the size of the domain of the function being extended or equivalently, the length of the vector being extended
- Unique - the extension is the only polynomial of degree which extends the function or vector according to the rules of the scheme
Distance Amplification and Probabilistic Comparison of Data
Low degree extensions enable efficient probabilistic comparison of data by allowing us to compare the evaluations of two low degree extension polynomials given the same input, rather than comparing the full underlying datasets. As long as the low degree extensions are defined over a sufficiently large field, they'll be distance amplifying, meaning that for two datasets that are identical except at one entry, or two functions which agree on all but one point, their extensions will be different at almost all points.
Specifically, given two -dimensional vectors and with low degree extensions and , if the vectors are the same, the extensions will be the same.
If the vectors differ, the probability that the extensions evaluated at a random will be the same is proportional to the length of the vector divided by the size of the field over which the extensions are defined.
Since and differ in at least one place, their extensions must be different polynomials.
The extensions are defined as or lower degree polynomials which guarantees that they intersect at no more than points. This is because if is a non-zero polynomial, it would be an degree polynomial with more than roots, contradicting the fundamental theorem of algebra.
The extension polynomials are defined over the domain containing elements, so they will agree at no more than a fraction of the points in their domains. By choosing a sufficiently large field over which to define the extensions, the probability of two distinct polynomial extensions agreeing at a random point in their domains can be made very low. We can thus check the equality of the vectors and probabilistically by evaluating both of their extensions at the same random input and checking .
Coefficient Based Approach
We can construct a unique polynomial by interpreting each entry of the vector , or equivalently, each evaluation of for all ), as a coefficient for increasing powers of . The th entry in the vector or will thus be the coefficient of the th standard monomial basis vector , and the polynomial will be a linear combination over the standard monomial basis. The polynomial will be of degree and is a low-degree extension of . The polynomial looks like this:
Or as a summation:
The polynomial is clearly unique to the dataset under this extension scheme since each of 's entries scales one of the terms in . The degree of the polynomial is guaranteed to be lower than the length of the encoded vector because the polynomial is a sum of monomials up to .
Point Evaluation Approach
Another approach to creating a unique low-degree extension of a function or dataset is to interpolate a set of point evaluations. In terms of the vector , our interpolating polynomial, , will pass through the points . In terms of the function representation of ,
, the interpolating polynomial will agree with for all inputs in its domain . In both cases, a set of points are being interpolated by a polynomial defined over a much larger domain, .
Lagrange interpolation gives us a straightforward technique to construct the unique interpolating polynomial with degree . We'll apply the entries of , or equivalently, each evaluation of for all ), as weights in a linear combination of Lagrange basis polynomials.
We defined the set of Lagrange basis polynomials for a set of nodes as the polynomials mapping their corresponding node to 1 and all other nodes to 0. Thus, for all and for all if .
The formula is covered in more detail here, but concisely, the th Lagrange basis polynomial with interpolating set can be written as
. The numerator has the effect of creating a zero at all where and the denominator scales the evaluation to 1 in the case where .
We can construct the interpolating polynomial by as a weighted sum of the basis polynomials. In terms of the function we define a low degree extension via Lagrange interpolation as agreeing with for all inputs .
In terms of the -dimensional vector , the low degree extension maps an index to the corresponding entry .
The Lagrange basis polynomials are each of degree , since they're each the product of terms each with degree 1. The interpolating polynomial is a sum of scaled basis polynomials, so its degree is also or lower. Since the Lagrange interpolating polynomial has degree , and interpolates points, it must be unique. If there were another polynomial of degree which interpolated the same points, then would be a polynomial of degree with zeroes. Then and .