The built-in R function is lm(). It is the slowest of the three methods compared here. A bare-bones alternative is lm.fit(), which is substantially faster but still slow. The fastest method uses Rcpp with RcppArmadillo, an R interface to the C++ Armadillo linear algebra library.
Using a 31400 x 4 design matrix, a simulation is run to compare the three methods:
A simulation of 1000 multiple linear regressions using the R function lm() gives the following average system time:
> mean(s_lm)
[1] 0.067614
A simulation of 1000 multiple linear regressions using the R function lm.fit() gives the following average system time:
> mean(s_lmfit)
[1] 0.006888
This is an improvement of almost 10 times over lm() (0.067614 / 0.006888 is about 9.8).
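The original timing script is not shown, so the following is only a minimal sketch of how timings like s_lm and s_lmfit could be collected. The function name time_fits, its arguments, the simulated data, and the use of elapsed time from system.time() are all assumptions made for illustration. The text does not say whether elapsed or system CPU time was recorded.

```r
# Hypothetical benchmark harness; not the original script.
# time_fits, n_obs and n_reps are illustrative names.
set.seed(1)

time_fits <- function(n_obs = 31400, n_reps = 1000) {
  # Simulated design matrix: intercept column plus three predictors (n_obs x 4)
  X <- cbind(1, matrix(rnorm(n_obs * 3), nrow = n_obs))
  y <- as.vector(X %*% c(1, 2, -1, 0.5) + rnorm(n_obs))

  s_lm    <- numeric(n_reps)
  s_lmfit <- numeric(n_reps)
  for (i in seq_len(n_reps)) {
    # Elapsed (wall-clock) seconds for one fit; an assumption, see lead-in
    s_lm[i]    <- system.time(lm(y ~ X - 1))[["elapsed"]]
    s_lmfit[i] <- system.time(lm.fit(X, y))[["elapsed"]]
  }
  c(lm = mean(s_lm), lm.fit = mean(s_lmfit))
}

# Fewer repetitions than the 1000 used in the text, to keep the run short
res <- time_fits(n_reps = 20)
print(res)
```

Timing each fit separately and averaging mirrors the mean(s_lm) and mean(s_lmfit) calls above. Absolute numbers will differ by machine, so only the ratio between the two methods is worth comparing.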
The third simulation runs 1000 multiple linear regressions using the following C++ implementation, built with Rcpp and RcppArmadillo:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
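arma::vec lm_rcpp(const arma::mat& X, const arma::vec& y) {
  // OLS coefficients from the normal equations: (X'X)^{-1} X'y.
  // arma::solve(X, y) is a numerically more stable alternative.
  arma::vec b_hat = (X.t() * X).i() * X.t() * y;
  return b_hat;
}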
This gives the following average system time:
> mean(s_rcpp)
[1] 0.002169
The Rcpp code is roughly 31 times faster than the basic R lm() implementation!
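Part of the speed-up comes from what the C++ function leaves out: it returns only the coefficient vector from the normal equations, while lm.fit() uses a QR decomposition and also returns residuals, fitted values and more. As a sanity check, the same algebra can be written in plain R and compared with lm.fit(). This is a sketch on small simulated data (an assumption, not the 31400 x 4 matrix from the text), and the variable names are illustrative.

```r
# Illustrative check on simulated data; not the data used in the text.
set.seed(42)
n <- 200
X <- cbind(1, matrix(rnorm(n * 3), nrow = n))
y <- as.vector(X %*% c(1, 2, -1, 0.5) + rnorm(n))

# Normal equations: the same algebra as (X.t()*X).i()*X.t()*y in the C++ code
b_normal <- solve(t(X) %*% X) %*% t(X) %*% y

# Reference coefficients from lm.fit(), which uses a QR decomposition
b_qr <- lm.fit(X, y)$coefficients

# Compare the two coefficient vectors up to a small numerical tolerance
agree <- isTRUE(all.equal(as.vector(b_normal), unname(b_qr), tolerance = 1e-8))
print(agree)
```

For a well-conditioned design matrix the two approaches should agree closely. With nearly collinear columns, explicitly inverting X'X can lose accuracy, which is one reason the QR route is usually preferred despite being slower.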