I am going to demonstrate how to improve speed in R when performing multiple linear regression. Below I compare three methods:
The standard built in R function for regression is lm(). It is the slowest. A bare bones R 
implementation is lm.fit() which is substantially faster than lm() but still slow.
 The fastest method to perform multiple linear regression is to use Rcpp and RcppArmadillo which is the C++ Armadillo linear algebra library.  
A 1253 x 26 design matrix (X) is built from the cars_19 dataset and a simulation is run to compare the three methods:
The cars_19 dataset from previous posts: 
str(cars_19)
'data.frame':    1253 obs. of  12 variables:
 $ fuel_economy_combined: int  21 28 21 26 28 11 15 18 17 15 ...
 $ eng_disp             : num  3.5 1.8 4 2 2 8 6.2 6.2 6.2 6.2 ...
 $ num_cyl              : int  6 4 8 4 4 16 8 8 8 8 ...
 $ transmission         : Factor w/ 7 levels "A","AM","AMS",..: 3 2 6 3 6 3 6 6 6 5 ...
 $ num_gears            : int  9 6 8 7 8 7 8 8 8 7 ...
 $ air_aspired_method   : Factor w/ 5 levels "Naturally Aspirated",..: 4 4 4 4 4 4 3 1 3 3 ...
 $ regen_brake          : Factor w/ 3 levels "","Electrical Regen Brake",..: 2 1 1 1 1 1 1 1 1 1 ...
 $ batt_capacity_ah     : num  4.25 0 0 0 0 0 0 0 0 0 ...
 $ drive                : Factor w/ 5 levels "2-Wheel Drive, Front",..: 4 2 2 4 2 4 2 2 2 2 ...
 $ fuel_type            : Factor w/ 5 levels "Diesel, ultra low sulfur (15 ppm, maximum)",..: 4 3 3 5 3 4 4 4 4 4 ...
 $ cyl_deactivate       : Factor w/ 2 levels "N","Y": 1 1 1 1 1 2 1 2 2 1 ...
 $ variable_valve       : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
Each function lm(), lm.fit(), and lm_rcpp() is run 5000 times and the average system time is measured.   
The code for the C++ implementation of multiple linear regression using Rcpp and RcppArmadillo is below:  
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::mat lm_rcpp(arma::mat X, arma::vec y)
{
    arma::vec b_hat;
    b_hat = (X.t() * X).i() * X.t() * y;
    return (b_hat);
}
Multiple linear regression using Rcpp and RcppArmadillo is multiples times faster than the standard R functions!
 
 