Polynomial regression

From Rosetta Code

Jump to: navigation, search
Task
Polynomial regression
You are encouraged to solve this task according to the task description, using any language you may know.

Find an approximating polynom of known degree for a given data.

Example: For input data:

x = {0,  1,  2,  3,  4,  5,  6,   7,   8,   9,   10};
y = {1,  6,  17, 34, 57, 86, 121, 162, 209, 262, 321};

The approximating polynom is:

3 x2 + 2 x + 1

Here, the polynom's coefficients are (3, 2, 1).

This task is intended as a subtask for Measure relative performance of sorting algorithms implementations.


Contents

[edit] Ada

with Ada.Numerics.Real_Arrays;  use Ada.Numerics.Real_Arrays;
 
function Fit (X, Y : Real_Vector; N : Positive) return Real_Vector is
A : Real_Matrix (0..N, X'Range); -- The plane
begin
for I in A'Range (2) loop
for J in A'Range (1) loop
A (J, I) := X (I)**J;
end loop;
end loop;
return Solve (A * Transpose (A), A * Y);
end Fit;

The function Fit implements least squares approximation of a function defined in the points as specified by the arrays xi and yi. The basis φj is xj, j=0,1,..,N. The implementation is straightforward. First the plane matrix A is created. Ajij(xi). Then the linear problem AATc=Ay is solved. The result cj are the coefficients. Constraint_Error is propagated when dimensions of X and Y differ or else when the problem is ill-defined.

[edit] Example

with Fit;
with Ada.Float_Text_IO; use Ada.Float_Text_IO;
 
procedure Fitting is
C : constant Real_Vector :=
Fit
( (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0),
(1.0, 6.0, 17.0, 34.0, 57.0, 86.0, 121.0, 162.0, 209.0, 262.0, 321.0),
2
);
begin
Put (C (0), Aft => 3, Exp => 0);
Put (C (1), Aft => 3, Exp => 0);
Put (C (2), Aft => 3, Exp => 0);
end Fitting;

Sample output:

 1.000 2.000 3.000

[edit] ALGOL 68

Translation of: Ada

Works with: ALGOL 68 version Standard - lu decomp and lu solve are from the GSL library
Works with: ALGOL 68G version Any - tested with release mk15-0.8b.fc9.i386

MODE FIELD = REAL;
 
MODE
VEC = [0]FIELD,
MAT = [0,0]FIELD;
 
PROC VOID raise index error := VOID: (
print(("stop", new line));
stop
);
 
COMMENT from http://rosettacode.org/wiki/Matrix_Transpose#ALGOL_68 END COMMENT
OP ZIP = ([,]FIELD in)[,]FIELD:(
[2 LWB in:2 UPB in,1 LWB in:1UPB in]FIELD out;
FOR i FROM LWB in TO UPB in DO
out[,i]:=in[i,]
OD;
out
);
 
COMMENT from http://rosettacode.org/wiki/Matrix_multiplication#ALGOL_68 END COMMENT
OP * = (VEC a,b)FIELD: ( # basically the dot product #
FIELD result:=0;
IF LWB a/=LWB b OR UPB a/=UPB b THEN raise index error FI;
FOR i FROM LWB a TO UPB a DO result+:= a[i]*b[i] OD;
result
);
 
OP * = (VEC a, MAT b)VEC: ( # overload vector times matrix #
[2 LWB b:2 UPB b]FIELD result;
IF LWB a/=LWB b OR UPB a/=UPB b THEN raise index error FI;
FOR j FROM 2 LWB b TO 2 UPB b DO result[j]:=a*b[,j] OD;
result
);
 
OP * = (MAT a, b)MAT: ( # overload matrix times matrix #
[LWB a:UPB a, 2 LWB b:2 UPB b]FIELD result;
IF 2 LWB a/=LWB b OR 2 UPB a/=UPB b THEN raise index error FI;
FOR k FROM LWB result TO UPB result DO result[k,]:=a[k,]*b OD;
result
);
 
COMMENT from http://rosettacode.org/wiki/Pyramid_of_numbers#ALGOL_68 END COMMENT
OP / = (VEC a, MAT b)VEC: ( # vector division #
[LWB a:UPB a,1]FIELD transpose a;
transpose a[,1]:=a;
(transpose a/b)[,1]
);
 
OP / = (MAT a, MAT b)MAT:( # matrix division #
[LWB b:UPB b]INT p ;
INT sign;
[,]FIELD lu = lu decomp(b, p, sign);
[LWB a:UPB a, 2 LWB a:2 UPB a]FIELD out;
FOR col FROM 2 LWB a TO 2 UPB a DO
out[,col] := lu solve(b, lu, p, a[,col]) [@LWB out[,col]]
OD;
out
);
 
FORMAT int repr = $g(0)$,
real repr = $g(-7,4)$;
 
PROC fit = (VEC x, y, INT order)VEC:
BEGIN
[0:order, LWB x:UPB x]FIELD a; # the plane #
FOR i FROM 2 LWB a TO 2 UPB a DO
FOR j FROM LWB a TO UPB a DO
a [j, i] := x [i]**j
OD
OD;
( y * ZIP a ) / ( a * ZIP a )
END # fit #;
 
PROC print polynomial = (VEC x)VOID: (
BOOL empty := TRUE;
FOR i FROM UPB x BY -1 TO LWB x DO
IF x[i] NE 0 THEN
IF x[i] > 0 AND NOT empty THEN print ("+") FI;
empty := FALSE;
IF x[i] NE 1 OR i=0 THEN
IF ENTIER x[i] = x[i] THEN
printf((int repr, x[i]))
ELSE
printf((real repr, x[i]))
FI
FI;
CASE i+1 IN
SKIP,print(("x"))
OUT
printf(($"x**"g(0)$,i))
ESAC
FI
OD;
IF empty THEN print("0") FI;
print(new line)
);
 
fitting: BEGIN
VEC c =
fit
( (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0),
(1.0, 6.0, 17.0, 34.0, 57.0, 86.0, 121.0, 162.0, 209.0, 262.0, 321.0),
2
);
print polynomial(c);
VEC d =
fit
( (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
(2.7, 2.8, 31.4, 38.1, 58.0, 76.2, 100.5, 130.0, 149.3, 180.0),
2
);
print polynomial(d)
END # fitting #

Output:

3x**2+2x+1
 1.0848x**2+10.3552x-0.6164

[edit] C

Library: GNU Scientific Library

Include file (to make the code reusable easily) named polifitgsl.h

#ifndef _POLIFITGSL_H
#define _POLIFITGSL_H
#include <gsl/gsl_multifit.h>
#include <stdbool.h>
#include <math.h>
bool polynomialfit(int obs, int degree,
double *dx, double *dy, double *store); /* n, p */
#endif

Implementation (the examples here helped alot to code this quickly):

#include "polifitgsl.h"
 
bool polynomialfit(int obs, int degree,
double *dx, double *dy, double *store) /* n, p */
{
gsl_multifit_linear_workspace *ws;
gsl_matrix *cov, *X;
gsl_vector *y, *c;
double chisq;
 
int i, j;
 
X = gsl_matrix_alloc(obs, degree);
y = gsl_vector_alloc(obs);
c = gsl_vector_alloc(degree);
cov = gsl_matrix_alloc(degree, degree);
 
for(i=0; i < obs; i++) {
gsl_matrix_set(X, i, 0, 1.0);
for(j=0; j < degree; j++) {
gsl_matrix_set(X, i, j, pow(dx[i], j));
}
gsl_vector_set(y, i, dy[i]);
}
 
ws = gsl_multifit_linear_alloc(obs, degree);
gsl_multifit_linear(X, y, c, cov, &chisq, ws);
 
/* store result ... */
for(i=0; i < degree; i++)
{
store[i] = gsl_vector_get(c, i);
}
 
gsl_multifit_linear_free(ws);
gsl_matrix_free(X);
gsl_matrix_free(cov);
gsl_vector_free(y);
gsl_vector_free(c);
return true; /* we do not "analyse" the result (cov matrix mainly)
to know if the fit is "good" */

}

Testing:

#include <stdio.h>
 
#include "polifitgsl.h"
 
#define NP 11
double x[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
double y[] = {1, 6, 17, 34, 57, 86, 121, 162, 209, 262, 321};
 
#define DEGREE 3
double coeff[DEGREE];
 
int main()
{
int i;
 
polynomialfit(NP, DEGREE, x, y, coeff);
for(i=0; i < DEGREE; i++) {
printf("%lf\n", coeff[i]);
}
return 0;
}

Output of the test:

1.000000
2.000000
3.000000

[edit] Fortran

Library: LAPACK

module fitting
contains
 
function polyfit(vx, vy, d)
implicit none
integer, intent(in) :: d
integer, parameter :: dp = selected_real_kind(15, 307)
real(dp), dimension(d+1) :: polyfit
real(dp), dimension(:), intent(in) :: vx, vy
 
real(dp), dimension(:,:), allocatable :: X
real(dp), dimension(:,:), allocatable :: XT
real(dp), dimension(:,:), allocatable :: XTX
 
integer :: i, j
 
integer :: n, lda, lwork
integer :: info
integer, dimension(:), allocatable :: ipiv
real(dp), dimension(:), allocatable :: work
 
n = d+1
lda = n
lwork = n
 
allocate(ipiv(n))
allocate(work(lwork))
allocate(XT(n, size(vx)))
allocate(X(size(vx), n))
allocate(XTX(n, n))
 
! prepare the matrix
do i = 0, d
do j = 1, size(vx)
X(j, i+1) = vx(j)**i
end do
end do
 
XT = transpose(X)
XTX = matmul(XT, X)
 
! calls to LAPACK subs DGETRF and DGETRI
call DGETRF(n, n, XTX, lda, ipiv, info)
if ( info /= 0 ) then
print *, "problem"
return
end if
call DGETRI(n, XTX, lda, ipiv, work, lwork, info)
if ( info /= 0 ) then
print *, "problem"
return
end if
 
polyfit = matmul( matmul(XTX, XT), vy)
 
deallocate(ipiv)
deallocate(work)
deallocate(X)
deallocate(XT)
deallocate(XTX)
 
end function
 
end module

[edit] Example

program PolynomalFitting
use fitting
implicit none
 
! let us test it
integer, parameter :: degree = 2
integer, parameter :: dp = selected_real_kind(15, 307)
integer :: i
real(dp), dimension(11) :: x = (/ (i,i=0,10) /)
real(dp), dimension(11) :: y = (/ 1, 6, 17, 34, &
57, 86, 121, 162, &
209, 262, 321 /)
real(dp), dimension(degree+1) :: a
 
a = polyfit(x, y, degree)
 
write (*, '(F9.4)') a
 
end program

Output (lower powers first, so this seems the opposite of the Python output):

   1.0000
   2.0000
   3.0000

[edit] gnuplot

# The polynomial approximation
f(x) = a*x**2 + b*x + c
 
# Initial values for parameters
a = 0.1
b = 0.1
c = 0.1
 
# Fit f to the following data by modifying the variables a, b, c
fit f(x) '-' via a, b, c
0 1
1 6
2 17
3 34
4 57
5 86
6 121
7 162
8 209
9 262
10 321
e
 
print sprintf("\n --- \n Polynomial fit: %.4f x^2 + %.4f x + %.4f\n", a, b, c)

[edit] Haskell

Uses module Matrix.LU from hackageDB DSP

import Data.List
import Data.Array
import Control.Monad
import Matrix.LU
 
ppoly p x = map (x**) p
 
polyfit d ry = elems $ solve mat vec where
mat = listArray ((1,1), (d,d)) $ liftM2 concatMap ppoly id [0..fromIntegral $ pred d]
vec = listArray (1,d) $ take d ry

Output in GHCi:

*Main> polyfit 3 [1,6,17,34,57,86,121,162,209,262,321]
[1.0,2.0,3.0]


[edit] HicEst

REAL :: n=10, x(n), y(n), m=3, p(m)
 
x = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y = (1, 6, 17, 34, 57, 86, 121, 162, 209, 262, 321)
 
p = 2 ! initial guess for the polynom's coefficients
 
SOLVE(NUL=Theory()-y(nr), Unknown=p, DataIdx=nr, Iters=iterations)
 
WRITE(ClipBoard, Name) p, iterations
 
FUNCTION Theory()
! called by the solver of the SOLVE function. All variables are global
Theory = p(1)*x(nr)^2 + p(2)*x(nr) + p(3)
END
SOLVE performs a (nonlinear) least-square fit (Levenberg-Marquardt): 
p(1)=2.997135145; p(2)=2.011348347; p(3)=0.9906627242; iterations=19;

[edit] J

   X=:i.# Y=:1 6 17 34 57 86 121 162 209 262 321
Y (%. (^/ x:@i.@#)) X
1 2 3 0 0 0 0 0 0 0 0

Note that this implementation does not use floating point numbers, so we do not introduce floating point errors. Using exact arithmetic has a speed penalty, but for small problems like this it is inconsequential.

[edit] Mathematica

Using the built-in "Fit" function.

data = Transpose@{Range[0, 10], {1, 6, 17, 34, 57, 86, 121, 162, 209, 262, 321}};
Fit[data, {1, x, x^2}, x]

Result:

1 + 2x + 3x^2

[edit] MATLAB

Matlab has a built-in function "polyfit(x,y,n)" which performs this task. The arguments x and y are vectors which are parametrized by the index suck that pointi = (xi,yi) and the argument n is the order of the polynomial you want to fit. The output of this function is the coefficients of the polynomial which best fit these x,y value pairs.

>> x = [0,  1,  2,  3,  4,  5,  6,   7,   8,   9,   10];
>> y = [1, 6, 17, 34, 57, 86, 121, 162, 209, 262, 321];
>> polyfit(x,y,2)
 
ans =
 
2.999999999999998 2.000000000000019 0.999999999999956

[edit] Octave

x = [0:10];
y = [1, 6, 17, 34, 57, 86, 121, 162, 209, 262, 321];
coeffs = polyfit(x, y, 2)

[edit] Python

Library: numpy

>>> x = [0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]
>>> y = [1, 6, 17, 34, 57, 86, 121, 162, 209, 262, 321]
>>> coeffs = numpy.polyfit(x,y,deg=2)
>>> coeffs
array([ 3., 2., 1.])

Substitute back received coefficients.

>>> yf = numpy.polyval(numpy.poly1d(coeffs), x)
>>> yf
array([ 1., 6., 17., 34., 57., 86., 121., 162., 209., 262., 321.])

Find max absolute error.

>>> '%.1g' % max(y-yf)
'1e-013'

[edit] Example

For input arrays `x' and `y':

>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [2.7, 2.8, 31.4, 38.1, 58.0, 76.2, 100.5, 130.0, 149.3, 180.0]
>>> p = numpy.poly1d(numpy.polyfit(x, y, deg=2), variable='N')
>>> print p
2
1.085 N + 10.36 N - 0.6164

Thus we confirm once more that for already sorted sequences the considered quick sort implementation has quadratic dependence on sequence length (see Example section for Python language on Query Performance page).

[edit] R

The easiest (and most robust) approach to solve this in R is to use the base package's lm function which will find the least squares solution via a QR decomposition:

 
x <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1, 6, 17, 34, 57, 86, 121, 162, 209, 262, 321)
coef(lm(y ~ x + I(x^2)))

Output

 
(Intercept) x I(x^2)
1 2 3
 


[edit] Ruby

require 'matrix'
 
def regress x, y, degree
x_data = x.map { |xi| (0..degree).map { |pow| (xi**pow).to_f } }
 
mx = Matrix[*x_data]
my = Matrix.column_vector(y)
 
((mx.t * mx).inv * mx.t * my).transpose.to_a[0]
end

Testing:

betas = regress [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[1, 6, 17, 34, 57, 86, 121, 162, 209, 262, 321],
2
 
p betas

Output:

[1.00000000000018, 2.00000000000011, 3.00000000000001]

[edit] Tcl

Library: tcllib(which includes a package for performing linear algebra operations)

package require math::linearalgebra
 
proc build.matrix {xvec degree} {
set sums [llength $xvec]
for {set i 1} {$i <= 2*$degree} {incr i} {
set sum 0
foreach x $xvec {
set sum [expr {$sum + pow($x,$i)}]
}
lappend sums $sum
}
 
set order [expr {$degree + 1}]
set A [math::linearalgebra::mkMatrix $order $order 0]
for {set i 0} {$i <= $degree} {incr i} {
set A [math::linearalgebra::setrow A $i [lrange $sums $i $i+$degree]]
}
return $A
}
 
proc build.vector {xvec yvec degree} {
set sums [list]
for {set i 0} {$i <= $degree} {incr i} {
set sum 0
foreach x $xvec y $yvec {
set sum [expr {$sum + $y * pow($x,$i)}]
}
lappend sums $sum
}
 
set x [math::linearalgebra::mkVector [expr {$degree + 1}] 0]
for {set i 0} {$i <= $degree} {incr i} {
set x [math::linearalgebra::setelem x $i [lindex $sums $i]]
}
return $x
}
 
# Now, to solve the example from the top of this page
set x {0 1 2 3 4 5 6 7 8 9 10}
set y {1 6 17 34 57 86 121 162 209 262 321}
 
# build the system A.x=b
set degree 2
set A [build.matrix $x $degree]
set b [build.vector $x $y $degree]
# solve it
set coeffs [math::linearalgebra::solveGauss $A $b]
# show results
puts $coeffs

This will print:

1.0000000000000207 1.9999999999999958 3.0

which is a close approximation to the correct solution.

[edit] TI-89 BASIC

DelVar x
seq(x,x,0,10) → xs
{1,6,17,34,57,86,121,162,209,262,321} → ys
QuadReg xs,ys
Disp regeq(x)

seq(expr,var,low,high) evaluates expr with var bound to integers from low to high and returns a list of the results. is the assignment operator. QuadReg, "quadratic regression", does the fit and stores the details in a number of standard variables, including regeq, which receives the fitted quadratic (polynomial) function itself. We then apply that function to the (undefined as ensured by DelVar) variable x to obtain the expression in terms of x, and display it.

Output: 3.·x2 + 2.·x + 1.

[edit] Ursala

Library: LAPACK The fit function defined below returns the coefficients of an nth-degree polynomial in order of descending degree fitting the lists of inputs x and outputs y. The real work is done by the dgelsd function from the lapack library. Ursala provides a simplified interface to this library whereby the data can be passed as lists rather than arrays, and all memory management is handled automatically.

#import std
#import nat
#import flo
 
(fit "n") ("x","y") = ..dgelsd\"y" (gang \/*pow float*x iota successor "n")* "x"

test program:

x = <0.,1.,2.,3.,4.,5.,6.,7.,8.,9.,10.>
y = <1.,6.,17.,34.,57.,86.,121.,162.,209.,262.,321.>
 
#cast %eL
 
example = fit2(x,y)

output:

<3.000000e+00,2.000000e+00,1.000000e+00>
Personal tools
Support