Univariate Lagrange Interpolation

August 08, 2024

Background

Unique Low Degree Extensions

  • Lagrange interpolation and Reed-Solomon both give us techniques to create unique low-degree extension polynomials for a message. The unique low degree extensions will always have degree n1n-1 or lower where nn is the length of the message. The extension polynomials are then evaluated at pp points where p>>np>>n to create a low degree extension encoding which is much longer than the original message. The algorithms only differ in how the message is represented as a polynomial.
  • Low degree extension encodings are valuable because of their distance amplifying properties - if two messages differ even at one entry, almost every entry in their extension encodings will be different, allowing for fast comparison of messages via random sampling of entries in the encoding.
  • In both Lagrange interpolation and Reed-Solomon, the symbols of the message mm are mapped to elements of a finite field Fp\mathbb{F}_p and the entries of the message (m1,mn)Fpn(m_1,\dots m_n) \in \mathbb{F}_p^n are applied as the weights in a linear combination over a polynomial basis.
    • With Reed-Solomon, we use the standard monomial basis {x0,x1,...xn1}\{x^0, x^1,...x^{n-1}\} resulting in the polynomial
      Pm(x)=i=0nmixi1P_m(x)=\sum^{n}_{i=0} m_i \cdot x^{i-1}
    • In univariate Lagrange interpolation, the Lagrange basis is used {L0(x),L1(x),...Ln1(x)}\{L_0(x), L_1(x),...L_{n-1}(x)\}, resulting in a polynomial with the form
      Pq(x)=i=0n1mi+1Li(x)P_q(x)=\sum^{n-1}_{i=0} m_{i+1} \cdot L_{i}(x)
  • With Reed-Solomon, the symbols of the message uniquely determine a polynomial by specifying the coefficients. The degree is clearly n1n−1 because the sum spans nn standard monomial basis polynomials {x0,x1,...xn1}\{x^0, x^1,...x^{n-1}\}.
  • Univariate Lagrange interpolation obtains an n1n-1 or lower degree polynomial for the message by interpreting its entries as a set of nn points {(0,m1),,(n1,mn)}\{(0,m_1),\dots,(n-1,m_n)\} and defining the polynomial that passes through them. Because two distinct polynomials of degree n1n-1 or lower can intersect at no more than n1n-1 points, we know that there is no other polynomial of degree n1n-1 or lower that interpolates the nn points. Thus, the Lagrange interpolating polynomial is unique.

Univariate Lagrange Interpolation

High Level

Specifying the low degree extension for a message with Lagrange interpolation is less intuitive than with Reed-Solomon, so we'll begin with a high level summary of steps.

  • Our goal is to uniquely represent an nn-length message mm with a univariate polynomial qmq_m of degree n1\leq n-1 such that evaluating qm(i)q_m(i) for i{0,,n1}i\in\{0,\dots,n-1\} produces the i+1thi+1th character of the message. We define qmq_m over a finite field Fp\mathbb{F}_p where p>np>n.
  • We'll create the interpolating polynomial by pairing the nn characters of the message m=(m1,,mn)m = (m_1, \dots, m_n) with their corresponding zero-based indices i{0,,n1}i\in\{0,\dots,n-1\} to obtain a set of nn coordinate pairs {(0,m1),(1,m2)(n1,mn)}\{(0, m_1), (1, m_2)\dots (n-1, m_n)\}.
  • The x-values in the set of coordinates being interpolated are the nodes over which we can define a set of Lagrange basis polynomials.
  • Lagrange basis polynomials are polynomials that evaluate to 1 for their corresponding node, and 0 for all other nodes in the set.
    • For example, the second Lagrange basis polynomial will return 0 for all nodes other than the node at index two:
      • L2(0)=0L_2(0) = 0
      • L2(1)=0L_2(1) = 0
      • L2(2)=1L_2(2) = 1
      • L2(3)=0L_2(3) = 0
      • L2(4)=0L_2(4) = 0
  • We create the interpolating polynomial qmq_m by multiplying the entries in the message by Lagrange basis polynomials.
    qm(i)=m1L0(i)+m2L1(i)++mnLn1(i)q_m(i) = m_1 \cdot L_0(i) + m_2 \cdot L_1(i) + \dots + m_n \cdot L_{n-1}(i)
  • For an input ii to qmq_m, each Lagrange basis polynomial LjL_j will evaluate to 0, except Li(i)L_i(i) which will evaluate to 1.
  • As a result, the evaluation of of qm(i)q_m(i) will be
    qa(i)=m10++mi+11++mn0=mi+1q_a(i) = m_1 \cdot 0 + \dots + m_{i+1} \cdot 1 + \dots + m_n \cdot 0 = m_{i+1}
    • For example, we can see that qm(2)q_m(2) evaluates to the third character in the message qm(2)=m1L0(2)+m2L1(2)+m3L2(2)++mnLn1(2)q_m(2) = m_1 \cdot L_0(2) + m_2 \cdot L_1(2) + m_3 \cdot L_2(2) + \dots + m_n \cdot L_{n-1}(2)
      =m10+m20+m31++mn0= m_1 \cdot 0 + m_2 \cdot 0 + m_3 \cdot 1 + \dots + m_n \cdot 0
      =m3= m_3
  • We define qaq_a, the Lagrange interpolating polynomial, as mapping qa:imi+1q_a: i \mapsto m_{i+1}.
  • The x-values in the coordinate pairs {0,,n1}\{0,\dots,n-1\} are the nodes over which we define our Lagrange basis polynomials, or the interpolating set.
  • We create the interpolating polynomial by using entries in the message as the weights in a linear combination of the Lagrange basis polynomials
    qa(i)=m1L0(i)+m2L1(i)++mnLn1(i)q_a(i) = m_1 \cdot L_0(i) + m_2 \cdot L_1(i) + \dots + m_n \cdot L_{n-1}(i)

The Details

Lagrange Basis Polynomials

  • Let mm be an nn-dimensional vector m=(m1,m2,,mn)Fpnm = (m_1, m_2, \dots,m_n) \in \mathbb{F}_p^n where pp is a prime number larger than nn. There exists a unique univariate Lagrange interpolating polynomial qmq_m over Fp\mathbb{F}_p with degree n1\leq n-1 such that
    i{0,,n1}(qm(i)=mi+1)\forall i\in\{0,\dots,n-1\}(q_m(i) = m_{i+1})
  • We'll interpolate the set of points {(0,m1),(n1,mn)}\{(0, m_1), \dots (n-1, m_n)\}, so our nodes are {0n1}\{0\dots n-1\}.
  • Given a set of nn distinct nodes, we define the Lagrange basis for polynomials of degree n1\leq n-1 to be the set of n1n-1 degree polynomials {L0,L1,Ln1}\{L_0, L_1,\dots L_{n-1}\} over Fp\mathbb{F}_p where Li(xi)=1L_i(x_i) = 1 and Li(xj)=0L_i(x_j) = 0 if j{0n1}j \in \{0\dots n-1\} and jij \neq i.
    Li(x)=j=0,jin1xxjxixjL_i(x)= \prod_{j=0, j\neq i}^{n-1} \frac{x-x_j}{x_i-x_j}
  • In other words, the iith Lagrange basis polynomial evaluates to 11 for the iith node and 0 for all other nodes in the set.
    • Note that this is only true for inputs in the domain {0,,n1}\{0,\dots,n-1\} over which the set of Lagrange basis polynomials was defined. Evaluating a Lagrange basis polynomial at a value outside of that domain will not return 00 even if the value does not equal ii.
    • Another way you may see this written written is Li(xj)=δijL_i(x_j) = \delta_{ij}. The δij\delta_{ij} refers to the Kronecker delta function which returns 1 if i=ji=j and 0 if iji \neq j, for example δ22=1\delta_{22}=1 and δ21=0\delta_{21}=0.
  • We're intentionally creating a polynomial for every node index ii such that there will be a zero at every element in the domain {0,n1}\{0,\dots n-1\} that isn't ii, and that the evaluation of the polynomial at the node xix_i is 1.
    • We set the numerator so that for every j{0,n1}j\in \{0,\dots n-1\} except j=ij=i, there will be a zero at x=xjx=x_j.
    • We set the denominator so that in the one case where j=ij=i, terms in the numerator and denominator will all cancel, leaving us with 1.
  • The set {L0,L1,Ln1}\{L_0, L_1,\dots L_{n-1}\} is a Lagrange basis for polynomials of degree n1\leq n-1 because
    • The Lagrange basis polynomials are linearly independent so no polynomial in the set can be written as a linear combination of the others.
    • Any other polynomial of degree n1\leq n-1 can be expressed as a linear combination of the Lagrange basis polynomials.

Lagrange Interpolating Polynomial

  • Using the Lagrange basis polynomials that we defined for the set of coordinates being interpolated, we can create the Lagrange interpolating polynomial, qmq_m. Our goal is that for the vector m=(m1,,mn)Fpnm = (m_1,\dots,m_n) \in \mathbb{F}_p^n
    i{0,,n1}(qm(i)=mi+1)\forall i \in \{0,\dots,n-1\}(q_m(i) = m_{i+1})
  • We must create a sum where all terms evaluate to 0 except the one multiplied by the mx+1m_{x+1}th entry of the message. $$
    q_m(x) = \sum_{i=0}^{n-1}m_{i+1}\cdot L_i(x)
  • Expanded, we have
    qa(x)=m1L0(x)+m2L1(x)++mnLn1(x)q_a(x) = m_{1}\cdot L_0(x) + m_{2}\cdot L_1(x) + \dots + m_n\cdot L_{n-1}(x)
  • When evaluating qa(i)q_a(i) for i{0,n1}i \in \{0,n-1\} only the basis polynomial LiL_i evaluates to 1 and that all other basis polynomials in {L0,,Ln1}\{L_0,\dots,L_{n-1}\} evaluate to zero. Since we use the entries (m1,,mn)(m_1,\dots,m_n) as coefficients of the Lagrange basis polynomials, this has the effect of causing the interpolating polynomial to return mi+1Li(i)=mi+11m_{i+1} \cdot L_i(i) = m_{i+1} \cdot 1.

Lagrange Interpolating Polynomials are Unique Low Degree Extensions

  • The Lagrange interpolating polynomial is unique - it's the only polynomial of degree n1\leq n-1 that represents the nn-length message mm.
  • Each Lagrange basis polynomial has degree n1n-1.
    Li(x)=j=0,jin1xxjxixjL_i(x)= \prod_{j=0, j\neq i}^{n-1} \frac{x-x_j}{x_i-x_j}
    In the iith Lagrange basis polynomial, xx has degree 1 and xjx_j and xix_i are constant values such that xj,xi{0,,n1}x_j, x_i \in \{0,\dots,n-1\}. Thus, the Lagrange basis polynomial Li(x)L_i(x) is a product of n1n-1 linear terms and the interpolating polynomial is a linear combination of Lagrange basis polynomials, so its degree is n1\leq n-1.
  • Since the Lagrange interpolating polynomial qmq_m has degree n1\leq n-1, and interpolates nn points, it must be unique.
  • Proof: Suppose there were another polynomial qmq_m' of degree n1\leq n-1 which interpolated the same nn points. Then qmqmq_m - q_m' is a polynomial of degree n1\leq n-1 with roots at all nn nodes, which would mean that qmqmq_m - q_m' is the constant zero function, so qm=qmq_m = q_m'.
  • We call the Lagrange interpolating polynomial the unique low degree extension of mm, and the set of all evaluations of qm(x)q_m(x) for xFpx\in \mathbb{F}_p a low degree extension encoding of mm.

Low Degree Extensions and Distance Amplification

  • For an nn-dimensional vector m=(m1,,mn)Fpnm=(m_1,\dots,m_n) \in \mathbb{F}_p^n, the low degree extension encoding is a pp-dimensional vector of the evaluations c=(qm(0),,qm(n1))c = (q_m(0),\dots,q_m(n-1)).
  • A message mm' that differs from mm at even one entry will have a Lagrange polynomial that interpolates a different set of values for the same set of nodes x{0,,n1}x\in \{0,\dots,n-1\}, so mm' will have a different low degree extension, qmq_m' still of degree n1\leq n-1.
  • Since qmq_m and qmq_m' are distinct polynomials of degree n1\leq n-1, they agree at n1\leq n-1 points. The low degree extension encodings of mm and mm' are of length pp, so an entry at a random rFpr \in \mathbb{F}_p has
    PrrFp[qm(r)=qm(r)]n1p\Pr_{r\in\mathbb{F}_p}[q_m(r) = q_m'(r)] \leq \frac{n-1}{p}
  • As long as p>np>n, the extension is distance amplifying. If an encoding is distance amplifying, the distance between the codewords (number of entries at which they are different) is greater than the distance between the original messages.
  • As long as the order of the field is sufficiently large, the probability of two random entries in an extension code being the same for different messages is near zero.
  • Both Reed-Solomon and Lagrange interpolation result in a unique n1n-1 or lower degree extension polynomial with distance amplification properties.
  • In the case of Lagrange interpolation, the extension encoding contains the original vector. For an nn-dimensional vector m=(m1,,mn)Fpnm=(m_1,\dots,m_n) \in \mathbb{F}_p^n, the first nn evaluations of the Lagrange interpolating polynomial on elements of Fp\mathbb{F}_p will result in the entries (m1,,mn)(m_1,\dots,m_n). We say that this encoding is systemic.

Lagrange Interpolation Worked Example

  • Given a message m=(m1,m2,m3,m4,m5)Fp5m = (m_1, m_2, m_3, m_4, m_5) \in \mathbb{F}_p^5, the length of the message is 5 so the Lagrange polynomial will be of degree 4\leq 4
  • The points being interpolated are ((0,m1),(1,m2),(2,m3),(3,m4),(4,m5))((0,m_1),(1,m_2),(2,m_3),(3,m_4),(4,m_5)) and our nodes are {0,1,2,3,4}\{0,1,2,3,4\}.
  • As an example, we'll evaluate the fourth Lagrange basis polynomial, L3L_3, so i=3i=3, x3=3x_3 = 3 and for
    • L3(x3)=L3(3)=1L_3(x_3) = L_3(3) = 1
    • L3(xj)=0L_3(x_j) = 0 if j0,n1j \in{0,\dots n-1} and j3j \neq 3
  • The expanded Lagrange basis polynomial looks like this:
    L3(x)=xx0x3x0xx1x3x1xx2x3x2xx4x3x4=(xx0)(xx1)(xx2)(xx4)(x3x0)(x3x1)(x3x2)(x3x4)L_3(x) = \frac{x-x_0}{x_3-x_0} * \frac{x-x_1}{x_3-x_1} * \frac{x-x_2}{x_3-x_2} * \frac{x-x_4}{x_3-x_4} = \frac{(x-x_0)(x-x_1)(x-x_2)(x-x_4)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)(x_3-x_4)}
  • Plugging in node values we have
    L3(x)=(x0)(x1)(x2)(x4)(30)(31)(32)(34)=(x5)(x1)(x2)(x4)6L_3(x) = \frac{(x-0)(x-1)(x-2)(x-4)}{(3-0)(3-1)(3-2)(3-4)} =\frac{(x-5)(x-1)(x-2)(x-4)}{-6}
  • Evaluated at x3x_3, the terms in the numerator and the terms in the denominator all cancel out, leaving us with 1
    L3(x3)=(x3x0)(x3x1)(x3x2)(x3x4)(x3x0)(x3x1)(x3x2)(x3x4)=1L_3(x_3) = \frac{(x_3-x_0)(x_3-x_1)(x_3-x_2)(x_3-x_4)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)(x_3-x_4)} = 1
    L3(3)=(30)(31)(32)(34)(30)(31)(32)(34)=(3)(2)(1)(1)(3)(2)(1)(1)=1L_3(3) = \frac{(3-0)(3-1)(3-2)(3-4)}{(3-0)(3-1)(3-2)(3-4)} = \frac{(3)(2)(1)(-1)}{(3)(2)(1)(-1)} = 1
  • Evaluated at any other node, one of the numerator terms evaluates to zero, leaving us with 0
    L3(x2)=(x2x0)(x2x1)(x2x2)(x2x4)(x3x0)(x3x1)(x3x2)(x3x4)=(x2x0)(x2x1)(0)(x2x4)(x3x0)(x3x1)(x3x2)(x3x4)=0L_3(x_2) = \frac{(x_2-x_0)(x_2-x_1)\textcolor{red}{(x_2-x_2)}(x_2-x_4)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)(x_3-x_4)} = \frac{(x_2-x_0)(x_2-x_1)\textcolor{red}{(0)}(x_2-x_4)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)(x_3-x_4)} = 0
    L3(2)=(21)(22)(24)(25)(31)(32)(34)(35)=(21)(0)(24)(25)(31)(32)(34)(35)=0L_3(2) = \frac{(2-1)\textcolor{red}{(2-2)}(2-4)(2-5)}{(3-1)(3-2)(3-4)(3-5)}= \frac{(2-1)\textcolor{red}{(0)}(2-4)(2-5)}{(3-1)(3-2)(3-4)(3-5)} = 0
  • The full Lagrange interpolating polynomial for m=(m0,m1,m2,m3,m4)m = (m_0,m_1,m_2,m_3,m_4) can be calculated as follows
    L0(x)=j=1,j0n1xxjx0xj=(xx1)(xx2)(xx3)(xx4)(x0x1)(x0x2)(x0x3)(x0x4)=(x1)(x2)(x3)(x4)(01)(02)(03)(04)=(x1)(x2)(x3)(x4)24L_0(x) = \prod_{j=1,j\neq 0}^{n-1} \frac{x-x_j}{x_0-x_j} = \frac{(x-x_1)(x-x_2)(x-x_3)(x-x_4)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)(x_0-x_4)} = \frac{(x-1)(x-2)(x-3)(x-4)}{(0-1)(0-2)(0-3)(0-4)} = \frac{(x-1)(x-2)(x-3)(x-4)}{-24}
    L1(x)=j=0,j1n1xxjx1xj=(xx0)(xx2)(xx3)(xx4)(x1x0)(x1x2)(x1x3)(x1x4)=(x0)(x2)(x3)(x4)(10)(12)(13)(14)=(x0)(x2)(x3)(x4)6L_1(x) = \prod_{j=0,j\neq 1}^{n-1} \frac{x-x_j}{x_1-x_j} = \frac{(x-x_0)(x-x_2)(x-x_3)(x-x_4)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)(x_1-x_4)} = \frac{(x-0)(x-2)(x-3)(x-4)}{(1-0)(1-2)(1-3)(1-4)} = \frac{(x-0)(x-2)(x-3)(x-4)}{-6}
    L2(x)=j=0,j2n1xxjx2xj=(xx0)(xx1)(xx3)(xx4)(x2x0)(x2x1)(x2x3)(x2x4)=(x0)(x1)(x3)(x4)(20)(21)(23)(24)=(x0)(x1)(x3)(x4)4L_2(x) = \prod_{j=0,j\neq 2}^{n-1} \frac{x-x_j}{x_2-x_j} = \frac{(x-x_0)(x-x_1)(x-x_3)(x-x_4)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)(x_2-x_4)} = \frac{(x-0)(x-1)(x-3)(x-4)}{(2-0)(2-1)(2-3)(2-4)} = \frac{(x-0)(x-1)(x-3)(x-4)}{4}
    L3(x)=j=0,j3n1xxjx3xj=(xx0)(xx1)(xx2)(xx4)(x3x0)(x3x1)(x3x2)(x3x4)=(x0)(x1)(x2)(x4)(30)(31)(32)(34)=(x0)(x1)(x2)(x4)6L_3(x) = \prod_{j=0,j\neq 3}^{n-1} \frac{x-x_j}{x_3-x_j} = \frac{(x-x_0)(x-x_1)(x-x_2)(x-x_4)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)(x_3-x_4)} = \frac{(x-0)(x-1)(x-2)(x-4)}{(3-0)(3-1)(3-2)(3-4)} = \frac{(x-0)(x-1)(x-2)(x-4)}{-6}
    L4(x)=j=0,j4n1xxjx4xj=(xx0)(xx1)(xx2)(xx3)(x4x0)(x4x1)(x4x2)(x4x3)=(x0)(x1)(x2)(x3)(40)(41)(42)(43)=(x0)(x1)(x2)(x3)24L_4(x) = \prod_{j=0,j\neq 4}^{n-1} \frac{x-x_j}{x_4-x_j} = \frac{(x-x_0)(x-x_1)(x-x_2)(x-x_3)}{(x_4-x_0)(x_4-x_1)(x_4-x_2)(x_4-x_3)} = \frac{(x-0)(x-1)(x-2)(x-3)}{(4-0)(4-1)(4-2)(4-3)} = \frac{(x-0)(x-1)(x-2)(x-3)}{24}
    qa(x)=i=0n1mi+1Li(x)=m1L0(x)+m2L1(x)+m3L2(x)+m4L3(x)+m5L4(x)q_a(x) = \sum^{n-1}_{i=0} m_{i+1} \cdot L_i(x) = m_{1}\cdot L_0(x) + m_{2}\cdot L_1(x) + m_{3}\cdot L_2(x) + m_{4}\cdot L_3(x) + m_{5}\cdot L_4(x)

Implementation

 function getUnivariateLagrange(message: string, r: Field): Field {
  const n = message.length;
  if (n === 0) {
    throw Error("Message cannot be empty.");
  }
  // r value is an element in the interpolating set, return the corresponding message entry  
  if (r.lessThan(n).toBoolean()) {
    const index = Number(r.toBigInt());
    return Field(asciiToNumber(message[index]));
  }

  // accumulator for summation over interpolation points  
  let result = Field(0);
  // interpolation points ({0...n-1})  
  const interpolatingSet = Array.from({ length: n }, (_, index) =>
          Field(index),
  );

  // sum over interpolation points  
  for (let i = 0; i < interpolatingSet.length; i++) {
    // coefficient  
    const coeff = Field(asciiToNumber(message[i]));
    // lagrange basis evaluated at r  
    const lagrangeBasis = getLagrangeBasis(i, interpolatingSet, r);
    result = result.add(coeff.mul(lagrangeBasis));
  }

  return result;
}

// returns the index-th lagrange basis polynomial for the interpolating set evaluated at x  
function getLagrangeBasis(
        index: number,
        interpolatingSet: Field[],
        x: Field,
): Field {
  // accumulator for product over interpolating set indices  
  let accumulator = Field(1);
  for (let j = 0; j < interpolatingSet.length; j++) {
    // skip term where j = index  
    if (j === index) continue;
    // jth term = (x-interpolatingSet[j]) / (interpolatingSet[index] - interpolatingSet[j])  
    const term = (x.sub(interpolatingSet[j]))
            .div(
                    (interpolatingSet[index].sub(interpolatingSet[j]))
            );
    accumulator = accumulator.mul(term);
  }

  return accumulator;
}

Profile picture

Written by Breads