Linear Least Squares Fitting

Introduction

We are likely all familiar with the concept of curve-fitting: as humans, we can often recognize patterns in data about which the data itself is agnostic. Looking at a chart of points scattered roughly along a straight line, we think immediately of a linear relationship. But what tools can we use to determine that line precisely from the individual data points? Intuitively, we can all figure this out pretty easily: we want to find the line that is minimally far from every single one of these points. In practice this is implemented through a method known as least squares regression. This form of regression is very powerful, and is widely used in applications from signal processing to economics research. Least squares fitting finds the best curve to fit a set of points by minimizing the sum of the squares of the offsets of each point from the curve.

Linear Regression

Suppose that we have measured three data points $P=\{(1,3),(1,1),(2,2)\}$, and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?

The general equation for a (non-vertical) line is $y = a_0 + a_1 x$. If our three data points were to lie on this line, then the following equations would be satisfied:
$$\begin{cases} a_0 + a_1 = 3 \\ a_0 + a_1 = 1 \\ a_0 + 2a_1 = 2 \end{cases}$$
In order to find the best-fit line, we try to solve the above equations in the unknowns $a_0$ and $a_1$. Putting our linear equations into matrix form, we are trying to solve $Ax = b$ where
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad x = \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}, \qquad b = \begin{pmatrix} 3 \\ 1 \\ 2 \end{pmatrix}.$$
As the three points do not actually lie on a line, there is no actual solution. So instead we have to compute an approximate solution.

Consequently, we have to answer the following important question:
Suppose that Ax=b does not have a solution. What is the best approximate solution?

For our purposes, the best approximate solution is called the least-squares solution. Recall that least squares fitting finds the best curve to fit a set of points by minimizing the sum of the squares of the offsets of each point from the curve. We use the squares of the offsets to ensure differentiability, which allows us to use tools from calculus when analyzing the data. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems. We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation $Ax = b$.

Let $A$ be an $m \times n$ (with $m > n$) matrix of $M_{m,n}(\mathbb{R})$ and $b$ be a vector of $\mathbb{R}^m$. A least-squares solution of the matrix equation $Ax = b$ is a vector $\hat{x}$ in $\mathbb{R}^n$ such that for all $x \in \mathbb{R}^n$,
$$\|A\hat{x} - b\|_2 \le \|Ax - b\|_2.$$

The term “least squares” comes from the fact that $\|A\hat{x} - b\|_2$ is the square root of the sum of the squares of the entries of the vector $A\hat{x} - b$. So a least-squares solution minimizes the sum of the squares of the differences between the entries of $A\hat{x}$ and $b$. In other words, a least-squares solution solves the equation $Ax = b$ as closely as possible, in the sense that the sum of the squares of the entries of the difference $A\hat{x} - b$ is minimized.

Let $A$ be an $m \times n$ (with $m > n$) matrix of $M_{m,n}(\mathbb{R})$ and $b$ be a vector of $\mathbb{R}^m$. The least-squares solutions of $Ax = b$ are the solutions of the matrix equation
$$A^T A\, \hat{x} = A^T b,$$
where $A^T$ is the transpose of $A$. Moreover, if $\mathrm{rank}(A) = n$ then $Ax = b$ has a unique least-squares solution $\hat{x}$, with
$$\hat{x} = A^+ b \quad\text{where}\quad A^+ = (A^T A)^{-1} A^T$$
is called the pseudo-inverse of the matrix $A$.

Example: Let $y = a_0 + a_1 x$ be a linear model of the data $P = \{(0,1), (2,1.9), (4,3.2)\}$. Then we have to solve the overdetermined linear system $Ax = b$ in the least-squares sense, where
$$A^T = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 4 \end{pmatrix}, \qquad b^T = [1,\ 1.9,\ 3.2].$$
As the rank of the matrix is equal to 2, using the previous theorem it yields
$$\hat{x} = A^+ b = (A^T A)^{-1} A^T b = [0.93,\ 0.55]^T.$$
Consequently, $y = 0.93 + 0.55x$ is the unique least-squares solution to this problem.
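As a quick numerical check of this example, the normal equations can be solved directly with numpy. This is a minimal sketch of ours (the variable names are not part of the exercise), with np.linalg.lstsq used only as an independent cross-check:

    import numpy as np

    # data from the example above
    A = np.array([[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]])
    b = np.array([1.0, 1.9, 3.2])

    # solve the normal equations A^T A x = A^T b
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)
    print("normal equations:", x_hat)          # approximately [0.9333, 0.55]

    # independent cross-check with the built-in least-squares solver
    print("lstsq          :", np.linalg.lstsq(A, b, rcond=None)[0])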

  • The least squares method is based on the residual, the difference between the value predicted by the model, $Ax$, and the actual value $b$ given in the dataset: $r = Ax - b$. To find the optimal function parameters in $x$, we have to minimize the squared error
$$F(x) = \|r\|_2^2 = r^T r = (Ax - b)^T (Ax - b) = x^T A^T A\, x - 2\, x^T A^T b + b^T b.$$
A necessary condition for a minimum of this quadratic function is
$$\nabla F(x) = 2\,(x^T A^T A - b^T A) = 0.$$
  • If $\mathrm{rank}(A) = n$ then the matrix $A^T A$ is positive definite, and in particular invertible: for any vector $x \in \mathbb{R}^n$,
$$x^T A^T A\, x = \|Ax\|^2 \ge 0.$$
As $\mathrm{rank}(A) = n$, it yields that $x^T A^T A\, x = 0$ implies that $Ax = 0$ and hence $x = 0$.
    Consequently, $A^T A$ is invertible and
$$\hat{x} = (A^T A)^{-1} A^T b = A^+ b.$$
  • Sufficient condition:
    if $\hat{x}$ satisfies $A^T A\, \hat{x} = A^T b$, then for any $x \in \mathbb{R}^n$ we have
$$F(x) - F(\hat{x}) = (Ax - b)^T (Ax - b) - (A\hat{x} - b)^T (A\hat{x} - b).$$
Expanding and using $A^T A\, \hat{x} = A^T b$, it yields
$$F(x) - F(\hat{x}) = (x - \hat{x})^T A^T A\, (x - \hat{x}) = \|A(x - \hat{x})\|_2^2 \ge 0.$$
Consequently, for any $x \in \mathbb{R}^n$ we have $F(x) \ge F(\hat{x})$. Moreover, if $\mathrm{rank}(A) = n$ then $F(x) = F(\hat{x})$ implies that $\|A(x - \hat{x})\|_2 = 0$ and $x = \hat{x}$. Hence $\hat{x}$ is the unique minimizer of $F$.
  • You may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of $A$ are linearly dependent, then the equation $Ax = b$ has infinitely many least-squares solutions (see the short numerical illustration below). The previous theorem gives us a criterion for uniqueness.
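    The following small numpy illustration (a sketch with made-up numbers, not part of the original exercises) shows two different vectors achieving exactly the same residual norm when the columns of $A$ are linearly dependent:

    import numpy as np

    # columns of A are linearly dependent: second column = 2 * first column
    A = np.array([[1.0, 2.0], [1.0, 2.0], [2.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])

    # minimum-norm least-squares solution returned by numpy
    x1 = np.linalg.lstsq(A, b, rcond=None)[0]
    # adding a null-space vector of A gives another least-squares solution
    x2 = x1 + np.array([2.0, -1.0])        # A @ [2, -1] = 0

    print(np.linalg.norm(A @ x1 - b), np.linalg.norm(A @ x2 - b))   # equal residuals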

    Exercise [PY-1]

    • We wish to calculate the least-squares approximation of the previous data $P = \{(1,3), (1,1), (2,2)\}$ with a polynomial function of degree $n = 1$.
      Firstly, write this problem as an interpolation problem of the form $Ax = b$. We then obtain an overdetermined system. We propose to solve it by minimizing the following function:
$$\min_{(a_0,a_1) \in \mathbb{R}^2} F(a_0,a_1) = \|Ax - b\|_2^2.$$
Show that
$$\|Ax - b\|_2^2 = \sum_{j=1}^{3} (a_0 + a_1 x_j - y_j)^2.$$

    • By writing all the partial derivatives of $F$ and by solving $\frac{\partial F}{\partial a_0} = 0$ and $\frac{\partial F}{\partial a_1} = 0$, show that we have to solve $A^T A\, x = A^T b$.

    • Draw the data points and the linear function solution to this system.



    • Let $f(x) = \cos(x)$ be a real function defined on $[0, 2\pi]$.
      We select $p = 10$ or $p = 20$ uniform data points on this interval:
$$P = \left\{ \left( x_j = \frac{(j-1)}{p}\,\pi,\ \ y_j = f(x_j) \right) \ \text{for } j = 1, \dots, p \right\}.$$
    • Calculate the best least-squares polynomial approximation $P_4(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$, of degree less than or equal to $n = 4$, of these data points $P$.
    • Print the maximum error.
    • Draw the data points and the polynomial solution $P_4(x)$.

    • In this part we want to minimize the sum of the squares of the relative errors
$$\varepsilon_j = \frac{P_4(x_j) - y_j}{y_j}.$$
For this we want to minimize the following function:
$$F(a_0, a_1, \dots, a_4) = \sum_{j=1}^{p} \varepsilon_j^2 = \sum_{j=1}^{p} w_j\,(P_4(x_j) - y_j)^2,$$
where the values $w_j = \frac{1}{y_j^2}$ are called the weights of the data points. We assume that $y_j \ne 0$. Consequently, data points with large values of $|y_j|$ carry smaller weights and contribute less to the minimization.
      Show that, in this case, we have to solve the following linear system:
$$A^T W A\, x = A^T W\, b,$$
where $W$ is the $p \times p$ diagonal matrix with the $w_j$ as diagonal values.

  • Firstly, we calculate the partial derivatives:
    $$\frac{\partial F}{\partial a_0} = 2 \sum_{j=1}^{3} (a_0 + a_1 x_j - y_j) = 0
\quad\text{and}\quad
\frac{\partial F}{\partial a_1} = 2 \sum_{j=1}^{3} x_j\,(a_0 + a_1 x_j - y_j) = 0.$$
We then obtain the following linear system:
$$\begin{cases} a_0 \sum_{j=1}^{3} 1 + a_1 \sum_{j=1}^{3} x_j = \sum_{j=1}^{3} y_j \\[4pt] a_0 \sum_{j=1}^{3} x_j + a_1 \sum_{j=1}^{3} x_j^2 = \sum_{j=1}^{3} x_j y_j \end{cases}$$
Consequently we have to solve $Bx = c$ where
$$B = \begin{pmatrix} 3 & \sum_{j=1}^{3} x_j \\ \sum_{j=1}^{3} x_j & \sum_{j=1}^{3} x_j^2 \end{pmatrix}, \qquad
c = \begin{pmatrix} \sum_{j=1}^{3} y_j \\ \sum_{j=1}^{3} x_j y_j \end{pmatrix},$$
and $x^T = [a_0, a_1]$.
    By direct calculation we can show that $B = A^T A$ and $c = A^T b$.
  • For $n = 4$ we have to solve:
$$\begin{cases}
a_0\, p + a_1 \sum_{j=1}^{p} x_j + a_2 \sum_{j=1}^{p} x_j^2 + a_3 \sum_{j=1}^{p} x_j^3 + a_4 \sum_{j=1}^{p} x_j^4 = \sum_{j=1}^{p} y_j \\[4pt]
a_0 \sum_{j=1}^{p} x_j + a_1 \sum_{j=1}^{p} x_j^2 + a_2 \sum_{j=1}^{p} x_j^3 + a_3 \sum_{j=1}^{p} x_j^4 + a_4 \sum_{j=1}^{p} x_j^5 = \sum_{j=1}^{p} x_j y_j \\[4pt]
a_0 \sum_{j=1}^{p} x_j^2 + a_1 \sum_{j=1}^{p} x_j^3 + a_2 \sum_{j=1}^{p} x_j^4 + a_3 \sum_{j=1}^{p} x_j^5 + a_4 \sum_{j=1}^{p} x_j^6 = \sum_{j=1}^{p} x_j^2 y_j \\[4pt]
a_0 \sum_{j=1}^{p} x_j^3 + a_1 \sum_{j=1}^{p} x_j^4 + a_2 \sum_{j=1}^{p} x_j^5 + a_3 \sum_{j=1}^{p} x_j^6 + a_4 \sum_{j=1}^{p} x_j^7 = \sum_{j=1}^{p} x_j^3 y_j \\[4pt]
a_0 \sum_{j=1}^{p} x_j^4 + a_1 \sum_{j=1}^{p} x_j^5 + a_2 \sum_{j=1}^{p} x_j^6 + a_3 \sum_{j=1}^{p} x_j^7 + a_4 \sum_{j=1}^{p} x_j^8 = \sum_{j=1}^{p} x_j^4 y_j
\end{cases}$$
  • As $F(a_0, a_1, \dots, a_4) = \sum_{j=1}^{p} w_j\,(P_4(x_j) - y_j)^2$, then for any $k \in \{0,1,2,3,4\}$:
$$\frac{\partial F}{\partial a_k} = 2 \sum_{j=1}^{p} x_j^k\, w_j\,(a_0 + a_1 x_j + \dots + a_4 x_j^4 - y_j) = 0.$$
It yields the following equations for any $k \in \{0,1,2,3,4\}$:
$$a_0 \sum_{j=1}^{p} w_j x_j^{k} + a_1 \sum_{j=1}^{p} w_j x_j^{k+1} + \dots + a_4 \sum_{j=1}^{p} w_j x_j^{k+4} = \sum_{j=1}^{p} w_j x_j^{k} y_j,$$
which can be written
$$A^T W A\, x = A^T W\, b \quad\text{with}\quad W = \mathrm{diag}(w_1, w_2, \dots, w_p).$$
  • import numpy as np
    import matplotlib.pyplot as plt
    import numpy.linalg as LA
    
    print("*******************") 
    print("PY-1")
    print("*******************") 
    
    
    
    # create the list of data points to fit
    def f(x):
        return np.abs(x)+1
    
    # Horner Algorithm
    def Horner(x,coeff):
        p=len(coeff)
        s=coeff[p-1]
        for i in range(p-1):
            s=x*s+coeff[p-2-i]
        return s
    
    # number of data points
    p=7
    # degree of the polynomial
    n=4
    # interval
    a=-np.pi
    b= np.pi
    
    X=np.linspace(a,b,p)
    Y=f(X)
    
    # Build the interpolation (Vandermonde) matrix
    A=np.zeros((p,n+1))
    
    for i in range(p):
        for j in range(n+1):
            A[i,j]=X[i]**j
            
    #print("matrice interpolation: ")
    #print(np.round(A,1))
    # solve the normal equations A^T A x = A^T Y
    coeff=LA.solve(np.dot(np.transpose(A),A),np.dot(np.transpose(A),Y))
    #print("mat= ",np.dot(np.transpose(A),A)," b= ",np.dot(np.transpose(A),Y))
    print("solution= ",coeff)
    
    err1 = [np.abs(Horner(X[i],coeff)-Y[i]) for i in range(p)]
    print("\nmax error data points = ",np.max(err1))
    
    t=np.linspace(a,b,200)
    err2 = [np.abs(Horner(t[i],coeff)-f(t[i])) for i in range(len(t))]
    print("\nmax error data |polynom -function| = ",np.max(err2))
    
    print("\nRelative error minimization")
    
    w = [1/Y[i]**2 for i in range(p)]
    W = np.diag(w)
    
    # solve the weighted normal equations A^T W A x = A^T W Y
    coeffr=LA.solve(np.dot(np.transpose(A),np.dot(W,A)),np.dot(np.transpose(A),np.dot(W,Y)))
    print("solution= ",coeffr)
    err1 = [np.abs(Horner(X[i],coeffr)-Y[i]) for i in range(p)]
    print("\nmax error data points = ",np.max(err1))
    
    t=np.linspace(a,b,200)
    err2 = [np.abs(Horner(t[i],coeffr)-f(t[i])) for i in range(len(t))]
    print("\nmax error data |polynom -function| = ",np.max(err2))
    
    plt.clf()
    p1=plt.plot(X,Y,'bo')
    p2=plt.plot(t,f(t),'k',label='function')
    p3=plt.plot(t,Horner(t,coeff),'r',label="l.s. polynomial")
    p4=plt.plot(t,Horner(t,coeffr),'g',label="relative l.s. polynomial")
    plt.legend(loc='lower right')
    plt.savefig('PY1.png')
    plt.show()
    

    Exercise [PY-2]

    Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.

    • Find the best least-squares ellipse through the points $P = \{(0,2), (2,1), (1,-1), (-1,-2), (-3,1), (-1,1)\}$. What quantity is being minimized?

    • Solve this problem with the numpy python library and draw the best least-squares ellipse solution. For this, you can use the contour function of matplotlib.pyplot

  • The general equation for an ellipse (actually, for a nondegenerate conic section) is $f(x,y) = x^2 + B y^2 + C x y + D x + E y + F = 0$. This is an implicit equation: the ellipse is the set of all solutions of the equation. To say that our data points lie on the ellipse means that the above equation is satisfied for the given values of $x$ and $y$. To put this in matrix form, we move the term $x^2$, which does not involve the unknown coefficients $B, C, D, E, F$, to the right-hand side of the equals sign; then we can write this as $Ax = b$ and solve it using the normal equations. The quantity being minimized is therefore $\sum_j f(x_j, y_j)^2$, the sum of the squares of the values of $f$ at the data points (an algebraic, rather than geometric, distance to the conic).
  • print("\n***************")
    print("Exercice PY-2")
    print("***************\n")
    
    def ellipse(x,y,c):
        res = x*x+c[0]*y*y+c[1]*x*y+c[2]*x+c[3]*y+c[4]
        return res
    
    P=[[0,2],[2,1],[1,-1],[-1,-2],[-3,1],[-1,1]]
    xp = np.transpose(P)[0]
    yp = np.transpose(P)[1]  
    
    # Least-squares approximation of the conic
    A=[]
    b=[]
    for i in range(len(P)):
        x=P[i][0]
        y=P[i][1]
        A.append([y*y,x*y,x,y,1])
        b.append(-x*x)
    coeff=LA.solve(np.dot(np.transpose(A),A),np.dot(np.transpose(A),b))
    
    # build the grid of values and evaluate the conic
    delta = 0.025
    x = np.arange(-5.0, 5.0, delta)
    y = np.arange(-3.0, 3.0, delta)
    X, Y = np.meshgrid(x, y)
    Z=ellipse(X,Y,coeff)
    
    
    plt.clf()
    # plot the data points
    p1=plt.plot(xp,yp,'bo')
    CS=plt.contour(X, Y, Z, levels=[0.0])   # draw the zero level set f(x,y)=0
    plt.clabel(CS,inline=1,fontsize=10)
    plt.title("Least Squares Ellipse")
    plt.savefig('ellipse.png')
    

  • Segmented Linear Regression

    Segmented linear regression (SLR) addresses the limitations of a single-line fit by offering a piecewise linear approximation of a given dataset [2]. It splits the dataset into a list of subsets with adjacent ranges and then finds, for each range, a linear regression that normally has much better accuracy than one regression line for the whole dataset. It thus approximates the dataset by a sequence of line segments. In this context, segmented linear regression can be viewed as a decomposition of a given large dataset into a relatively small set of simple objects that provide a compact but approximate representation of the dataset within a specified accuracy.

    Segmented linear regression provides quite a robust approximation. This strength comes from the fact that linear regression does not impose constraints on the ends of a line segment. For this reason, in segmented linear regression, two consecutive line segments normally do not share a common end point. This property is important for practical applications, since it enables us to analyse discontinuities in temporal data: sharp and sudden changes in the magnitude of a signal, or switching between quasi-stable states. The method thus helps us understand the dynamics of a system or an effect. For comparison, high-degree polynomial regression provides a continuous approximation [3], but its interpolation and extrapolation suffer from overfitting oscillations [4]. Polynomial regression is therefore not a reliable predictor in the applications mentioned earlier.

    Despite the simple idea, segmented linear regression is not a trivial algorithm when it comes to its implementation. Unlike linear regression, the number of segments in an acceptable approximation is unknown. Specifying the number of segments in advance is not a practical option, since a blind choice usually produces a result that badly represents the dataset. Similarly, uniformly splitting the dataset into equal-sized subsets is not effective for achieving an optimal approximation. Instead, we place a restriction on the allowed deviation and attempt to construct an approximation that guarantees the required accuracy with as few line segments as possible. The line segments detected by the algorithm reveal essential features of the dataset.

    The deviation at a specified x value is defined as the absolute difference between the original and approximated y values. The accuracy of linear regression in a range, and the approximation error, are defined as the maximum of all the deviations in the range. The maximum allowed deviation (tolerance) is an important user-interface parameter of the model: it controls the total number and the lengths of the segments in a computed segmented linear regression. If the input value of this parameter is very large, there is no splitting of the dataset and the result is just one line segment of linear regression. If the input value is very small, the result may have many short segments. A greedy splitting strategy based on this tolerance is sketched after this section.

    It is interesting to note that, within this formulation, the problem of segmented linear regression is similar to the problem of polyline simplification, which is solved by the algorithm of Ramer, Douglas and Peucker [5]. Nevertheless, there are important reasons why polyline simplification is not the best choice for noisy data. That algorithm gives first priority to the data points most distant from the already constructed approximation segments; in its context, such data points are the most significant. The issue is that, in the presence of noise, these significant points normally arise from high-frequency fluctuations, and the biggest damage is caused by outliers. For this reason, noise can have a substantial effect on the accuracy of polyline simplification. In segmented linear regression, each line segment is balanced against the effect of noise: the short-term fluctuations are smoothed out to reveal significant long-term features and trends in the original data.
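    The sketch below illustrates one simple greedy splitting strategy under the tolerance formulation described above. It is only an illustration with made-up data and function names of our own choosing, not the specific algorithm of the cited references: the current segment is grown point by point as long as its least-squares line stays within the allowed maximum deviation, and a new segment is started otherwise.

    import numpy as np

    def fit_segment(x, y):
        # least-squares line y = a0 + a1*x on one segment, with its maximum deviation
        A = np.vstack([np.ones_like(x), x]).T
        coeffs = np.linalg.lstsq(A, y, rcond=None)[0]
        return coeffs, np.max(np.abs(A @ coeffs - y))

    def segmented_regression(x, y, tol):
        # greedy left-to-right splitting controlled by the allowed deviation `tol`
        segments, start, n = [], 0, len(x)
        while start < n:
            end = min(start + 2, n)              # start each segment with (at most) two points
            coeffs, _ = fit_segment(x[start:end], y[start:end])
            while end < n:
                new_coeffs, dev = fit_segment(x[start:end + 1], y[start:end + 1])
                if dev > tol:
                    break
                coeffs, end = new_coeffs, end + 1
            segments.append((start, end, coeffs))
            start = end
        return segments

    # noisy piecewise-linear signal with a sharp change of trend at x = 5
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 200)
    y = np.where(x < 5.0, 1.0 + 0.5 * x, 8.0 - 0.8 * x) + 0.05 * rng.standard_normal(x.size)

    for s, e, (a0, a1) in segmented_regression(x, y, tol=0.3):
        print(f"x in [{x[s]:.2f}, {x[e - 1]:.2f}] : y = {a0:.2f} + {a1:.2f} x")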

    Linear Approximation Theory: an introduction

    Approximation theory is concerned with the ability to approximate functions by simpler and more easily calculated functions. The first question we ask in approximation theory concerns the possibility of approximation. Is the given family of functions from which we plan to approximate dense in the set of functions we wish to approximate? In this section we survey some of the main density results and density methods.

    Let us consider the following function $f(x) = |x|$; we want to find a polynomial function $P_n(x)$ of degree $n$ which closely approximates $f$. One way to choose this polynomial is to minimize the "area", or the "distance", between $f$ and $P_n$.

    Let $E = C^0([a,b])$ be the vector space of continuous functions on $[a,b]$ (with $a < b$) into $\mathbb{R}$. We can easily show that the vector space $\mathcal{P}_n$ of polynomials of degree at most $n$ is a subspace of $C^0([a,b])$.

    We denote by $\langle \cdot, \cdot \rangle : E \times E \to \mathbb{R}$ the scalar product of any functions $f$ and $g$ belonging to $E$. It satisfies:
    • $\langle f + h, g \rangle = \langle f, g \rangle + \langle h, g \rangle$
    • for any $\lambda \in \mathbb{R}$, $\langle \lambda f, g \rangle = \lambda\, \langle f, g \rangle$
    • $\langle f, g \rangle = \langle g, f \rangle$
    • $\langle f, f \rangle \ge 0$
    • $\langle f, f \rangle = 0 \Leftrightarrow f = 0$

    We say that such a scalar product on $E$ is a symmetric bilinear form which is positive definite.

    The scalar product can be defined by
$$\langle f, g \rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx,$$
where $w(x)$ is a weight function. For example we can choose:
    • $w(x) = 1$, or $w(x) = \frac{1}{\sqrt{1 - x^2}}$ on $]-1,1[$
    • $w(x) = e^{-x^2}$ on $]-\infty, +\infty[$
    In order to define a polynomial approximation $P_n$ of a continuous function $f$, we need a metric, or distance, between $f$ and $P_n$. For this, we define a norm on $E$.

    Let $E$ be the vector space of continuous functions on $[a,b]$ and $\langle \cdot, \cdot \rangle$ be a scalar product on $E$; then $\|f\|_2 = \sqrt{\langle f, f \rangle}$ is a norm on $E$.

    In our previous example, we have $\|f\|_2 = \left( \int_a^b w(x)\, |f(x)|^2\, dx \right)^{1/2}$. Consequently, the distance between two functions is given by $d(f,g) = \|f - g\|_2$.
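    For instance, with $w(x) = 1$ on $[0,1]$ (a small worked example of our own, chosen only to illustrate the definition), the distance between $f(x) = x$ and $g(x) = x^2$ is
$$d(f,g) = \left( \int_0^1 (x - x^2)^2\, dx \right)^{1/2} = \left( \tfrac{1}{3} - \tfrac{1}{2} + \tfrac{1}{5} \right)^{1/2} = \tfrac{1}{\sqrt{30}}.$$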
  • Let us prove the Cauchy-Schwarz inequality: for any $f, g \in E$ we have $|\langle f, g \rangle| \le \|f\|_2\, \|g\|_2$. For any $\lambda \in \mathbb{R}$, by using the scalar product properties, it yields
$$\langle f + \lambda g, f + \lambda g \rangle = \langle f, f \rangle + 2\lambda \langle f, g \rangle + \lambda^2 \langle g, g \rangle \ge 0.$$
If $\langle g, g \rangle \ne 0$, we set $\lambda = -\frac{\langle f, g \rangle}{\langle g, g \rangle}$; then we have
$$\langle f, f \rangle - \frac{\langle f, g \rangle^2}{\langle g, g \rangle} \ge 0$$
and the inequality follows.
    If $\langle g, g \rangle = 0$, then for any $\lambda \in \mathbb{R}$ we have $\langle f, f \rangle + 2\lambda \langle f, g \rangle \ge 0$. Since $\langle f, f \rangle$ is a fixed non-negative number while $\lambda$ is arbitrary, this forces $\langle f, g \rangle = 0$. The inequality is then also satisfied.
  • We can then show that $\|f\|_2$ is a norm.
  • $\|f+g\|_2^2 = \langle f, f \rangle + 2\langle f, g \rangle + \langle g, g \rangle = \|f\|_2^2 + 2\langle f, g \rangle + \|g\|_2^2$. By using the Cauchy-Schwarz inequality it yields $\|f+g\|_2^2 \le \|f\|_2^2 + 2\|f\|_2\, \|g\|_2 + \|g\|_2^2 = (\|f\|_2 + \|g\|_2)^2$. Consequently $\|f+g\|_2 \le \|f\|_2 + \|g\|_2$.

  • The other norm properties follow directly from the scalar product properties:
  • $\|f\|_2 = \langle f, f \rangle^{1/2} \ge 0$,
  • $\|f\|_2 = 0 \Leftrightarrow \langle f, f \rangle = 0 \Leftrightarrow f = 0$,
  • and $\|\lambda f\|_2 = (\lambda^2 \langle f, f \rangle)^{1/2} = |\lambda|\, \|f\|_2$.

  • Exercise [M-1]

    • Show that for any $f \in E$, $\|f\|_2 = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2}$ is a norm.

    • Show that $\|f\|_1 = \int_a^b |f(x)|\, dx$ is also a norm. Show that we cannot associate a scalar product to this norm.

    • We set $a = 0$ and $b = 1$. We consider the following functions $f(x) = x^2$ and $P_\lambda(x) = \lambda x$ where $\lambda \in [0,1]$. Calculate the optimal values of $\lambda$ in the following minimization problems: $\min_{\lambda \in [0,1]} \|f - P_\lambda\|_1$ and $\min_{\lambda \in [0,1]} \|f - P_\lambda\|_2$.

  • We start with the proof for the 1-norm:

    - We have $\|f\|_1 = \int_a^b |f(x)|\, dx \ge 0$, as the integral of a non-negative function is non-negative. The fact that $a < b$ is very important here.
    - Moreover we have $\|0\|_1 = 0$ and $\|\lambda f\|_1 = |\lambda| \int_a^b |f(x)|\, dx = |\lambda|\, \|f\|_1$ by the properties of the integral.
    - The triangle inequality is straightforward: for any $x \in [a,b]$ we have $|f(x) + g(x)| \le |f(x)| + |g(x)|$. Consequently, by monotonicity of the integral, $\|f+g\|_1 \le \int_a^b (|f(x)| + |g(x)|)\, dx = \int_a^b |f(x)|\, dx + \int_a^b |g(x)|\, dx$, which yields $\|f+g\|_1 \le \|f\|_1 + \|g\|_1$.
    - To conclude, we have to show that for any non-negative and continuous function $g$ on $[a,b]$,
$$\int_a^b g(x)\, dx = 0 \;\Longrightarrow\; g(x) = 0 \ \text{for any } x \in [a,b].$$
We prove it by contraposition (we assume non(Q) and show that this implies non(P)). We set $g(x) = |f(x)|$ and assume that there exists $x_0 \in [a,b]$ such that $g(x_0) = \gamma > 0$. As the function $g$ is continuous, by setting $\epsilon = \frac{\gamma}{2} > 0$ we know that there exists $\eta > 0$ such that for any $x \in [x_0 - \eta, x_0 + \eta]$, $|g(x) - \gamma| \le \frac{\gamma}{2}$. Consequently, for any $x \in [x_0 - \eta, x_0 + \eta]$, $\frac{\gamma}{2} \le g(x) \le \frac{3\gamma}{2}$. As $g \ge 0$ on $[a,b]$ it yields
$$\int_a^b g(x)\, dx \ge \int_{x_0 - \eta}^{x_0 + \eta} g(x)\, dx \ge \int_{x_0 - \eta}^{x_0 + \eta} \frac{\gamma}{2}\, dx = \gamma\,\eta > 0.$$
This completes the proof.
    Remark. We can adapt the interval to $[a, a+\eta]$ (resp. $[b-\eta, b]$) if $x_0 = a$ (resp. $x_0 = b$).

  • The 2-norm case:

    It is easy to show that $\|f\|_2 = \phi(f,f)^{1/2}$ where $\phi(f,g) = \int_a^b f(x)\, g(x)\, dx$. If $\phi$ is a scalar product, it then follows directly from the previous theorem that $\|f\|_2$ is a norm. Let us show the more difficult property: $\phi(f,f) = 0$ implies that $f = 0$. For this, we set $g(x) = |f(x)|^2$. As $\phi(f,f) = \int_a^b g(x)\, dx$, by using the previous proof we obtain that $\phi(f,f) = 0$ implies $g(x) = |f(x)|^2 = 0$ and consequently $f = 0$.

  • We set $\varphi(x) = |f(x) - \lambda x| = x\,|x - \lambda|$ on $[0,1]$. Studying this function, we find $\|f - P_\lambda\|_1 = \int_0^1 x\,|x - \lambda|\, dx = \frac{\lambda^3}{3} - \frac{\lambda}{2} + \frac{1}{3}$, which is minimal for $\lambda_1 = \frac{1}{\sqrt{2}}$, and $\|f - P_\lambda\|_2^2 = \int_0^1 (x^2 - \lambda x)^2\, dx = \frac{1}{5} - \frac{\lambda}{2} + \frac{\lambda^2}{3}$, which is minimal for $\lambda_2 = \frac{3}{4}$. A short numerical check is sketched below.
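    A quick numerical confirmation (a minimal sketch of ours, using a crude midpoint quadrature and a grid search over $\lambda$, not part of the original exercise):

    import numpy as np

    # midpoints of 4000 equal subintervals of [0, 1]
    m = 4000
    x = (np.arange(m) + 0.5) / m
    dx = 1.0 / m
    lambdas = np.linspace(0.0, 1.0, 4001)

    # discretized 1-norm and 2-norm of f - P_lambda for f(x) = x^2, P_lambda(x) = lambda*x
    norm1 = [np.sum(np.abs(x**2 - lam * x)) * dx for lam in lambdas]
    norm2 = [np.sqrt(np.sum((x**2 - lam * x)**2) * dx) for lam in lambdas]

    print("lambda_1 ~", lambdas[np.argmin(norm1)])   # close to 1/sqrt(2) ~ 0.7071
    print("lambda_2 ~", lambdas[np.argmin(norm2)])   # close to 3/4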
  • Let $P_n$ be a polynomial of $\mathcal{P}_n$. The polynomial $P_n$ is the best least-squares approximation of $f$ on $[a,b]$ if
$$\|f - P_n\|_2 = \min_{Q_n \in \mathcal{P}_n} \|f - Q_n\|_2.$$
We also denote this by $P_n = \arg\min_{Q_n \in \mathcal{P}_n} \|f - Q_n\|_2$.

    In order to calculate such a solution we need the following important result.

    A necessary and sufficient condition for $P_n \in \mathcal{P}_n$ to be a best least-squares approximation of $f \in E$ is:
$$\forall Q_n \in \mathcal{P}_n, \quad \langle f - P_n, Q_n \rangle = 0.$$

    This condition can be interpreted geometrically: among all the polynomials $Q_n$ of $\mathcal{P}_n$, the solution $P_n$ is obtained as the orthogonal projection of $f$ onto the vector space $\mathcal{P}_n$, which is included in $E$.

    ($\Rightarrow$) Let $P_n$ be the best least-squares approximation of $f$. We use a reductio ad absurdum argument (proof by contradiction). We assume there exists a polynomial $Q \in \mathcal{P}_n$ such that $\langle f - P_n, Q \rangle = \alpha \ne 0$; without loss of generality $\alpha > 0$ (otherwise replace $Q$ by $-Q$). We set $R = P_n + \lambda Q$. If $\lambda \in \left]0, \frac{2\alpha}{\|Q\|_2^2}\right[$ then we have
$$\|f - R\|_2^2 = \|(f - P_n) - \lambda Q\|_2^2 = \|f - P_n\|_2^2 - 2\lambda\alpha + \lambda^2 \|Q\|_2^2 < \|f - P_n\|_2^2.$$
By choosing $\lambda = \frac{\alpha}{\|Q\|_2^2}$ we obtain a contradiction with the optimality of $P_n$.
    ($\Leftarrow$) Let $P_n \in \mathcal{P}_n$ be such that for any $Q_n \in \mathcal{P}_n$, $\langle f - P_n, Q_n \rangle = 0$. For any $Q \in \mathcal{P}_n$ we have $\langle f - P_n, Q - P_n \rangle = 0$ because $Q - P_n \in \mathcal{P}_n$, and we then have
$$\|f - Q\|_2^2 = \langle (f - P_n) - (Q - P_n),\; (f - P_n) - (Q - P_n) \rangle = \|f - P_n\|_2^2 + \|Q - P_n\|_2^2.$$
Consequently $\|f - P_n\|_2 \le \|f - Q\|_2$ and $P_n$ is the best least-squares approximation of $f$.


    Moreover, this projection is unique.

    The best least-squares approximation $P_n \in \mathcal{P}_n$ of $f \in E$ is unique.

    Let $g_1$ and $g_2$ be two best least-squares approximations of $f$ belonging to $\mathcal{P}_n$. It yields
$$\|g_1 - g_2\|_2^2 = \langle g_1 - f + f - g_2,\; g_1 - g_2 \rangle = \langle g_1 - f,\; g_1 - g_2 \rangle + \langle f - g_2,\; g_1 - g_2 \rangle.$$
As $g_1 - g_2$ belongs to $\mathcal{P}_n$, by the previous theorem both scalar products are equal to zero. Consequently $g_1 = g_2$.

    Exercise [PY-3]

    We want to calculate $P_2$, the best least-squares polynomial approximation of $f(x) = |x|$ on $[-1,1]$. For this, we consider the following scalar product: $\langle f, g \rangle = \int_{-1}^{1} f(x)\, g(x)\, dx$.

    • By using the necessary and sufficient condition, show that the solution $P_2(x) = a_0 + a_1 x + a_2 x^2$ can be obtained by solving the following linear system:
$$\begin{pmatrix} \langle 1, 1 \rangle & \langle 1, x \rangle & \langle 1, x^2 \rangle \\ \langle x, 1 \rangle & \langle x, x \rangle & \langle x, x^2 \rangle \\ \langle x^2, 1 \rangle & \langle x^2, x \rangle & \langle x^2, x^2 \rangle \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \langle f, 1 \rangle \\ \langle f, x \rangle \\ \langle f, x^2 \rangle \end{pmatrix}$$

    • Calculate all the coefficients of the previous system and solve it with python.

    • Draw the function and the polynomial $P_2$.

    • If the scalar product becomes $\langle f, g \rangle = \int_a^b w(x)\, f(x)\, g(x)\, dx$, then we have to solve the following system, for each $k = 0, \dots, n$:
$$\sum_{j=0}^{n} \left( \int_{-1}^{1} w(x)\, x^{j+k}\, dx \right) a_j = \int_{-1}^{1} w(x)\, f(x)\, x^k\, dx.$$
Calculate the new solution when $w(x) = \frac{1}{\sqrt{1 - x^2}}$ on $]-1,1[$.
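    As a hint for setting up the matrix (these evaluations are ours, following directly from the definitions above): with $w(x) = 1$ the entries are elementary integrals,
$$\langle x^i, x^j \rangle = \int_{-1}^{1} x^{i+j}\, dx = \begin{cases} \dfrac{2}{i+j+1} & \text{if } i+j \text{ is even} \\[4pt] 0 & \text{if } i+j \text{ is odd} \end{cases}
\qquad
\langle f, x^k \rangle = \int_{-1}^{1} |x|\, x^{k}\, dx = \begin{cases} \dfrac{2}{k+2} & \text{if } k \text{ is even} \\[4pt] 0 & \text{if } k \text{ is odd.} \end{cases}$$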

            
    import numpy as np
    import time
    from numpy import linalg as LA
    import matplotlib.pyplot as plt
    
    
    def f(x):
        return np.abs(x)
    
    def Pol(x,coeff):
        s = 0.0
        # TO DO
        return s
    
    # Define the matrix
    def Mat(X):
        n=len(X)-1
        A=np.zeros((n+1,n+1))
        # TO DO
        return A
    
    # degree of the polynomial
    n=2
    
    # interval [a,b] (here [-1,1] as in the exercise)
    a=-1.0
    b= 1.0
    
    #Plot function f
    t=np.linspace(a,b,100)
    p2=plt.plot(t,f(t),'k')
    
    #TO DO: solve the linear system
    
    # Plot the polynom
    p3=plt.plot(t,Pol(t,coeff),'r')
            
        

                            
    import numpy as np
    from numpy import linalg as LA
    import matplotlib.pyplot as plt
    
    # Polynomial interpolation
    print("Polynomial interpolation")
    def f(x):
        return np.abs(x)
        #return (x**2-4*x+4)*(x**2+4*x+2) 
    
    # number of interpolation points
    p=5
    # degree of the polynomial
    n=p-1
    # create the list of data to interpolate
    a=-1.0
    b= 1.0
    
    # Horner Algorithm
    def Pol(x,coeff):
        p=len(coeff)
        s=coeff[p-1]
        for i in range(p-1):
            s=x*s+coeff[p-2-i]
        return s
    
    
    X=np.linspace(a,b,p)
    Y=f(X)
    
    t=np.linspace(a,b,1000)
    plt.ylim(-1,2)
    p1=plt.plot(X,Y,'bo')
    
    
    # Build the interpolation (Vandermonde) matrix
    A=np.eye(n+1)
    for i in range(n+1):
        for j in range(n+1):
            A[i,j]=X[i]**j    
    
    # Solve A X = Y
    coeff=LA.solve(A,Y)
    print("coefficients = ",coeff)
    
    p2=plt.plot(t,f(t),'k')
    p3=plt.plot(t,Pol(t,coeff),'r')
    plt.show()
                            
                        

    References

    1. C. Eckart, G. Young (1936) The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218. doi:10.1007/BF02288367
    2. C. de Boor (1978) A Practical Guide to Splines. Springer-Verlag.
    3. G.D. Knott (2000) Interpolating Cubic Splines. Birkhäuser.
    4. H.J. Nussbaumer (1981) Fast Fourier Transform and Convolution Algorithms. Springer-Verlag.
    5. H. Späth (1995) One Dimensional Spline Interpolation. AK Peters.