### R: Improving Regression Speed with Rcpp and RcppArmadillo

I am going to demonstrate how to improve speed in R when performing multiple linear regression. Below I compare three methods:

The standard built in R function for regression is lm(). It is the slowest. A bare bones R implementation is lm.fit() which is substantially faster than lm() but still slow. The fastest method to perform multiple linear regression is to use Rcpp and RcppArmadillo which is the C++ Armadillo linear algebra library.

A 1253 x 26 design matrix (X) is built from the cars_19 dataset and a simulation is run to compare the three methods:

The cars_19 dataset from previous posts:

``````str(cars_19)
'data.frame':    1253 obs. of  12 variables:
\$ fuel_economy_combined: int  21 28 21 26 28 11 15 18 17 15 ...
\$ eng_disp             : num  3.5 1.8 4 2 2 8 6.2 6.2 6.2 6.2 ...
\$ num_cyl              : int  6 4 8 4 4 16 8 8 8 8 ...
\$ transmission         : Factor w/ 7 levels "A","AM","AMS",..: 3 2 6 3 6 3 6 6 6 5 ...
\$ num_gears            : int  9 6 8 7 8 7 8 8 8 7 ...
\$ air_aspired_method   : Factor w/ 5 levels "Naturally Aspirated",..: 4 4 4 4 4 4 3 1 3 3 ...
\$ regen_brake          : Factor w/ 3 levels "","Electrical Regen Brake",..: 2 1 1 1 1 1 1 1 1 1 ...
\$ batt_capacity_ah     : num  4.25 0 0 0 0 0 0 0 0 0 ...
\$ drive                : Factor w/ 5 levels "2-Wheel Drive, Front",..: 4 2 2 4 2 4 2 2 2 2 ...
\$ fuel_type            : Factor w/ 5 levels "Diesel, ultra low sulfur (15 ppm, maximum)",..: 4 3 3 5 3 4 4 4 4 4 ...
\$ cyl_deactivate       : Factor w/ 2 levels "N","Y": 1 1 1 1 1 2 1 2 2 1 ...
\$ variable_valve       : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
``````

Each function `lm(), lm.fit(), and lm_rcpp()` is run 5000 times and the average system time is measured.

The code for the C++ implementation of multiple linear regression using Rcpp and RcppArmadillo is below:

``````// [[Rcpp::depends(RcppArmadillo)]]#include <RcppArmadillo.h>using namespace Rcpp;using namespace arma;// [[Rcpp::export]]arma::mat lm_rcpp(arma::mat X, arma::vec y){    arma::vec b_hat;    b_hat = (X.t() * X).i() * X.t() * y;    return (b_hat);}
``````

Multiple linear regression using Rcpp and RcppArmadillo is multiples times faster than the standard R functions!