Multicollinearity is caused by the presence of linear relationships among the regressors. When the regressors are not orthogonal and become almost perfectly related, the estimates of the individual regression coefficients may become unstable. Moreover, the inferences based on the model may be misleading [1]. The effects, diagnostics and handling of multicollinearity in linear models have already been discussed.
1. Nonlinear model
Nonlinear regression is characterized by the fact that the prediction equation depends nonlinearly on one or more unknown parameters [2]. The basic nonlinear model has the form:
1) y = f(X, b)+e
where f(·) is a nonlinear (in the parameters b) differentiable function, f: Rn → Rm, y is the dependent variable (y ∈ Rm), X is a set of exogenous variables (X ∈ Rn), b represents the nonlinear parameters to be estimated (e.g. they can enter through a power, trigonometric, exponential or any other nonlinear function) and e = y - f(X, b) represents the vector of identically and independently distributed error terms, uncorrelated with the conditional mean function for all the observations: E[ei | f(Xi, b)] = 0.
The objective function may be the usual sum-of-squares error in the form:
2) SSE = (1/2)·e′·e = (1/2)·‖e‖²
to be minimized with respect to the vector of parameters b.
An analytical solution is not achievable because of the nonlinearity of the error function; therefore, iterative methods such as the Gauss-Newton, gradient descent and Levenberg-Marquardt algorithms are adopted for the parameter estimation.
Basically, at each step "i" the parameter adaptation process updates the estimate as: b(i) = b(i-1) + Δb(i-1)
Unlike the linear case, the error surface defined by the error function is not always convex. The convergence of the algorithm may be slowed down by transitions across saddle points and flat regions of the surface. Moreover, the algorithm may fall into a local minimum and remain trapped there [3].
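As a concrete illustration of such an iterative fit, the sketch below uses a hypothetical exponential model with simulated data (not an example taken from the cited references) and the Levenberg-Marquardt routine available in SciPy, which iterates the update above from a rough initial guess b(0):

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical model: y = b1 * exp(b2 * x) + e, with simulated data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
b_true = np.array([2.0, 1.5])
y = b_true[0] * np.exp(b_true[1] * x) + rng.normal(0.0, 0.1, x.size)

def residuals(b, x, y):
    # e = y - f(X, b)
    return y - b[0] * np.exp(b[1] * x)

# Levenberg-Marquardt iterations, starting from a rough initial guess b(0)
fit = least_squares(residuals, x0=[1.0, 1.0], args=(x, y), method="lm")
print(fit.x)     # estimated parameters, close to b_true when the iterations converge
print(fit.cost)  # (1/2) * e'e at the solution, i.e. the SSE of equation 2)
```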
2. Does multicollinearity have a hold on nonlinear models too?
If the increment Δb(i-1) is small, the solution for these algorithms can be obtained [4] from the first-order Taylor series approximation of the error vector e:
3) e(b(i)) ≈ e(b(i-1)) + Z·Δb(i-1)
where the elements of the matrix Z are the derivatives
4) Zij = {∂ei/∂bj}
Therefore, at step "i" the algorithm seeks the new value b(i) that minimizes the sum of squares of the linearized error 3):
5) b(i) = b(i-1) - (Z′·Z)⁻¹·Z′·e(b(i-1))
In this form, the adaptive search process recalls pseudolinear regression (PLR), also called the approximate maximum likelihood or extended least squares method [5-7], which is computationally less cumbersome than the prediction error method [8].
This is not surprising, since the first-order Taylor expansion produces a linear approximation.
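For the same hypothetical exponential model, a bare-bones version of the update in equation 5) can be sketched as follows; the analytic Jacobian, the starting values and the fixed number of iterations are illustrative assumptions, and the initial guess is taken fairly close to the solution because plain Gauss-Newton is only locally convergent:

```python
import numpy as np

# Gauss-Newton update of equation 5) for the hypothetical model y = b1 * exp(b2 * x) + e
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 2.0 * np.exp(1.5 * x) + rng.normal(0.0, 0.1, x.size)

def residuals(b):
    # e(b) = y - f(X, b)
    return y - b[0] * np.exp(b[1] * x)

def jacobian_Z(b):
    # Z_ij = de_i / db_j, as in equation 4)
    de_db1 = -np.exp(b[1] * x)
    de_db2 = -b[0] * x * np.exp(b[1] * x)
    return np.column_stack([de_db1, de_db2])

b = np.array([1.8, 1.4])   # b(0): plain Gauss-Newton needs a reasonable starting point
for i in range(10):
    e = residuals(b)
    Z = jacobian_Z(b)
    # b(i) = b(i-1) - (Z'Z)^(-1) Z' e(b(i-1)), computed without forming the inverse
    delta = np.linalg.solve(Z.T @ Z, Z.T @ e)
    b = b - delta
print(b)   # approaches (2.0, 1.5)
```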
In general, the elements of the Hessian matrix (H) for a sum-of-squares error function as in equation 2) are
6) Hij = {∂²SSE/∂bi∂bj} = {(∂e/∂bi)·(∂e/∂bj) + e·(∂²e/∂bi∂bj)}
H can then be approximated by the matrix Z′·Z:
7) (Z′·Z)ij = {(∂e/∂bi)·(∂e/∂bj)}
by discarding the term e·(∂²e/∂bi∂bj) of equation 6).
This means that the minimization procedure of equation 5) amounts to a Newton step (or Newton direction), in which the inverse of the Hessian matrix multiplies the gradient of the error function (∇SSE = Z′·e):
8) b(i) = b(i-1) - H⁻¹·∇SSE
For finite data sets, the solution given by equation 8) is exact only for linear problems. For nonlinear structures, instead, the Newton direction approaches the global minimum only asymptotically (i.e. for infinite data sets). Therefore, for finite samples the Hessian has to be estimated and updated at each step.
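The quality of the approximation in equation 7) can be checked numerically. The sketch below (again on the hypothetical exponential model, with a finite-difference Hessian used purely for illustration) compares the full Hessian of SSE with Z′·Z near the solution, where the residuals are small and the discarded term is negligible:

```python
import numpy as np

# Compare the full Hessian of SSE (equation 6) with the approximation Z'Z (equation 7)
# for the hypothetical model y = b1 * exp(b2 * x) + e, near the solution.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 2.0 * np.exp(1.5 * x) + rng.normal(0.0, 0.1, x.size)

def sse(b):
    e = y - b[0] * np.exp(b[1] * x)
    return 0.5 * e @ e

def jacobian_Z(b):
    return np.column_stack([-np.exp(b[1] * x), -b[0] * x * np.exp(b[1] * x)])

b = np.array([2.0, 1.5])   # evaluate near the solution, where the residuals e are small
Z = jacobian_Z(b)

# Central finite differences for H_ij = d2 SSE / db_i db_j
h = 1e-5
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        bpp = b.copy(); bpp[i] += h; bpp[j] += h
        bpm = b.copy(); bpm[i] += h; bpm[j] -= h
        bmp = b.copy(); bmp[i] -= h; bmp[j] += h
        bmm = b.copy(); bmm[i] -= h; bmm[j] -= h
        H[i, j] = (sse(bpp) - sse(bpm) - sse(bmp) + sse(bmm)) / (4.0 * h * h)

print(H)         # full Hessian, equation 6)
print(Z.T @ Z)   # Gauss-Newton approximation, equation 7): close to H because e is small
```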
The issue now is that, independently of the sample size, the pseudolinear regression algorithm requires the inversion of the Hessian matrix (i.e. of its approximated form Z′·Z).
Thus, the columns of Z′·Z (equivalently, the columns of Z) must be linearly independent for Z′·Z to be invertible.
As a consequence, the issue of multicollinearity also comes up in nonlinear models, during the iterative optimization procedures.
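A minimal sketch of how this shows up in practice (the narrow observation window and the damping constant are illustrative assumptions): when the columns of Z become nearly proportional, Z′·Z is ill-conditioned, and a Levenberg-Marquardt style damping term, i.e. a ridge penalty added to Z′·Z, restores a workable conditioning:

```python
import numpy as np

# Near-collinear columns of Z: the model y = b1 * exp(b2 * x) observed only on a
# narrow range of x, where de/db2 = -b1 * x * exp(b2 * x) is almost proportional
# to de/db1 = -exp(b2 * x).  (Entirely hypothetical numbers.)
x = np.linspace(10.0, 10.2, 30)
b = np.array([2.0, 0.3])

Z = np.column_stack([-np.exp(b[1] * x), -b[0] * x * np.exp(b[1] * x)])
ZtZ = Z.T @ Z
print(np.linalg.cond(ZtZ))   # very large condition number: Z'Z is nearly singular

# Levenberg-Marquardt style damping (a ridge term) as a common workaround:
lam = 1e-3 * np.trace(ZtZ) / ZtZ.shape[0]
print(np.linalg.cond(ZtZ + lam * np.eye(2)))   # conditioning improves by orders of magnitude
```

In such a situation the undamped update of equation 5) is dominated by numerical noise, which is exactly the instability of the individual coefficient estimates described for the linear case.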