### R Regression: Comparing Speed Using lm(), lm.fit(), and Rcpp

Speed and memory use are two of R's recurring problems. Below I compare three methods to perform multiple linear regression.

The built-in R function is lm(). It is the slowest of the three. The bare-bones lm.fit(), which lm() itself calls to do the fitting once it has built the model frame, is substantially faster but still slow. The fastest method uses Rcpp with RcppArmadillo, which gives R access to the C++ Armadillo linear algebra library.

A simulation using a 31400 x 4 design matrix is run to compare the three methods:

A simulation of 1000 multiple linear regressions using the R function lm() gives the following average system time:

> mean(s_lm)  0.067614

A simulation of 1000 multiple linear regressions using the R function lm.fit() gives the following average system time:

> mean(s_lmfit)  0.006888

This is an improvement of almost 10 times over lm() (0.067614 / 0.006888 ≈ 9.8).
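A timing comparison like the one above could be set up roughly as follows. This is only a sketch: the simulated data, the seed, the `time_one()` helper, and the use of the `elapsed` component of `system.time()` are all assumptions, since the original simulation script is not shown. The replication count is kept small so the example runs quickly.

```r
# Hypothetical benchmark sketch; data, seed, and helper names are assumptions.
set.seed(1)
n <- 31400
X <- cbind(1, matrix(rnorm(n * 3), n, 3))   # intercept + 3 predictors: 31400 x 4
y <- drop(X %*% c(1, 2, -1, 0.5) + rnorm(n))

reps <- 20                                  # the post uses 1000 replications

# Time a single call and keep only the elapsed seconds
time_one <- function(f) unname(system.time(f())["elapsed"])

s_lm    <- replicate(reps, time_one(function() lm(y ~ X - 1)))
s_lmfit <- replicate(reps, time_one(function() lm.fit(X, y)))

mean(s_lm)      # average time per lm() fit
mean(s_lmfit)   # average time per lm.fit() fit
```

Both calls fit the same model, so the difference in timing comes from the extra work lm() does around the fit (formula handling, model frame construction, building the returned object).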

A simulation of 1000 multiple linear regressions using the C++ implementation below, built with Rcpp and RcppArmadillo, gives the following average system time:

```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;

// OLS coefficients from the normal equations: b = (X'X)^{-1} X'y
// [[Rcpp::export]]
arma::vec lm_rcpp(const arma::mat& X, const arma::vec& y) {
  arma::vec b_hat = (X.t() * X).i() * X.t() * y;
  return b_hat;
}
```

> mean(s_rcpp)  0.002169

The Rcpp code is about 31 times faster than the basic R lm() implementation, and about 3 times faster than lm.fit()!
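The C++ function computes the coefficients from the normal equations, b = (X'X)^{-1} X'y, whereas lm.fit() uses a QR decomposition. The sketch below, on made-up data, checks in plain R that the two approaches agree on a well-conditioned problem. It uses `solve()` on the cross-products rather than forming the inverse explicitly, which is mathematically equivalent to the C++ expression; the variable names are illustrative only.

```r
# Illustrative check on simulated data (all names and values are assumptions).
set.seed(42)
n <- 1000
X <- cbind(1, matrix(rnorm(n * 3), n, 3))
y <- drop(X %*% c(1, 2, -1, 0.5) + rnorm(n))

# Normal equations: solve (X'X) b = X'y
b_normal <- drop(solve(crossprod(X), crossprod(X, y)))

# QR-based least squares, as used by lm.fit()
b_qr <- unname(lm.fit(X, y)$coefficients)

all.equal(b_normal, b_qr)
```

The normal equations are cheap to compute, which is part of why this approach is fast, but they can lose accuracy when the columns of X are nearly collinear. The QR route is the more numerically cautious choice in that case.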

### Robust Regression Package for R

I wrote this package in 2006, when the major statistical software companies did not have a robust regression package available. It has been downloaded over 100k times. Using iteratively reweighted least squares (IRLS), the function calculates the optimal weights to perform M-estimator or bounded-influence regression. It returns robust beta estimates and prints a robust ANOVA table, using either a Huber or bisquare function.
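To make the IRLS idea concrete, here is a minimal sketch of Huber M-estimation in base R. It is not the package's code or interface: `irls_huber()` is a hypothetical helper, and the tuning constant 1.345, the MAD-based scale estimate, and the convergence settings are conventional choices assumed for illustration.

```r
# Hypothetical helper, not the package's API: Huber M-estimation via IRLS.
irls_huber <- function(X, y, k = 1.345, tol = 1e-8, max_iter = 50) {
  b <- lm.fit(X, y)$coefficients                  # start from the OLS fit
  for (iter in seq_len(max_iter)) {
    r <- drop(y - X %*% b)                        # current residuals
    s <- median(abs(r - median(r))) / 0.6745      # robust scale (MAD)
    u <- abs(r / s)
    w <- ifelse(u <= k, 1, k / u)                 # Huber weights
    b_new <- lm.wfit(X, y, w)$coefficients        # weighted least squares step
    done <- max(abs(b_new - b)) < tol
    b <- b_new
    if (done) break
  }
  b
}

# Simulated data with a few gross outliers (values are assumptions)
set.seed(7)
n <- 200
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(2, 3) + rnorm(n))
y[1:10] <- y[1:10] + 50

b_ols <- unname(lm.fit(X, y)$coefficients)
b_rob <- unname(irls_huber(X, y))
```

Each pass refits a weighted least squares problem in which observations with large standardized residuals receive weights below 1, so the contaminated points pull the robust estimate far less than they pull the OLS estimate. A bisquare version would only change the line that computes `w`.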

Recent changes make the structure of the arguments similar to that of glm() or lm(), and speed has increased dramatically.