I am going to demonstrate how to speed up multiple linear regression in R. Below I compare three methods:
lm(), the standard built-in R function for regression. It is the slowest of the three.
lm.fit(), a bare-bones R implementation. It is substantially faster than lm() but still slow.
lm_rcpp(), a C++ implementation written with Rcpp and RcppArmadillo, the Rcpp interface to the Armadillo C++ linear algebra library. It is the fastest of the three.
A 1253 x 26 design matrix (X) is built from the cars_19 dataset, and a simulation is run to compare the three methods.
The cars_19 dataset is the same one used in previous posts:
str(cars_19)
'data.frame': 1253 obs. of 12 variables:
$ fuel_economy_combined: int 21 28 21 26 28 11 15 18 17 15 ...
$ eng_disp : num 3.5 1.8 4 2 2 8 6.2 6.2 6.2 6.2 ...
$ num_cyl : int 6 4 8 4 4 16 8 8 8 8 ...
$ transmission : Factor w/ 7 levels "A","AM","AMS",..: 3 2 6 3 6 3 6 6 6 5 ...
$ num_gears : int 9 6 8 7 8 7 8 8 8 7 ...
$ air_aspired_method : Factor w/ 5 levels "Naturally Aspirated",..: 4 4 4 4 4 4 3 1 3 3 ...
$ regen_brake : Factor w/ 3 levels "","Electrical Regen Brake",..: 2 1 1 1 1 1 1 1 1 1 ...
$ batt_capacity_ah : num 4.25 0 0 0 0 0 0 0 0 0 ...
$ drive : Factor w/ 5 levels "2-Wheel Drive, Front",..: 4 2 2 4 2 4 2 2 2 2 ...
$ fuel_type : Factor w/ 5 levels "Diesel, ultra low sulfur (15 ppm, maximum)",..: 4 3 3 5 3 4 4 4 4 4 ...
$ cyl_deactivate : Factor w/ 2 levels "N","Y": 1 1 1 1 1 2 1 2 2 1 ...
$ variable_valve : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
Each of the three functions, lm(), lm.fit(), and lm_rcpp(), is run 5000 times and the average system time is measured.
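The benchmark in this post was run from R, and the snippet below is not that code. It is only a minimal sketch of the same repeat-and-average idea in standard C++, in case you want to time a routine outside R. The helper name mean_seconds is hypothetical, and it measures wall-clock time with std::chrono::steady_clock, which may not match what the R timings recorded.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: call `f` `reps` times in a row and return the
// mean wall-clock time per call, in seconds.
double mean_seconds(const std::function<void()>& f, std::size_t reps) {
    assert(reps > 0 && "need at least one repetition");
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (std::size_t i = 0; i < reps; ++i) {
        f();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(reps);
}
```

Timing the whole loop once and dividing by the repetition count keeps the clock overhead out of each individual call, which matters when a single fit takes well under a millisecond.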
The code for the C++ implementation of multiple linear regression using Rcpp and RcppArmadillo is below:
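// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// Ordinary least squares: returns the coefficient estimates for y ~ X.
// X is the n x p design matrix; y is the response vector of length n.
// [[Rcpp::export]]
arma::vec lm_rcpp(const arma::mat& X, const arma::vec& y)
{
  // Normal equations: b_hat = (X'X)^{-1} X'y
  arma::vec b_hat = (X.t() * X).i() * X.t() * y;
  return b_hat;
}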
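The function above forms the inverse of X'X explicitly. A common alternative is to solve the normal equations (X'X) b = X'y directly. The sketch below does that with a Cholesky factorization in standard C++ only, so it can be compiled without Armadillo. It is an illustration rather than the code used in this post: the Matrix alias and the name ols_cholesky are hypothetical, and it assumes X has full column rank.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical row-major matrix type: X[i][j] is row i, column j.
using Matrix = std::vector<std::vector<double>>;

// Least-squares coefficients from the normal equations (X'X) b = X'y,
// solved by Cholesky factorization instead of an explicit inverse.
std::vector<double> ols_cholesky(const Matrix& X, const std::vector<double>& y) {
    const std::size_t n = X.size();
    assert(n > 0 && y.size() == n);
    const std::size_t p = X[0].size();

    // Build the lower triangle of A = X'X and the vector c = X'y.
    Matrix A(p, std::vector<double>(p, 0.0));
    std::vector<double> c(p, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < p; ++j) {
            c[j] += X[i][j] * y[i];
            for (std::size_t k = 0; k <= j; ++k) A[j][k] += X[i][j] * X[i][k];
        }
    }

    // In-place Cholesky: the lower triangle of A becomes L with A = L L'.
    for (std::size_t j = 0; j < p; ++j) {
        double d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= A[j][k] * A[j][k];
        if (d <= 0.0) throw std::runtime_error("X'X is not positive definite");
        A[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < p; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= A[i][k] * A[j][k];
            A[i][j] = s / A[j][j];
        }
    }

    // Forward substitution: solve L z = c (z overwrites c).
    for (std::size_t i = 0; i < p; ++i) {
        for (std::size_t k = 0; k < i; ++k) c[i] -= A[i][k] * c[k];
        c[i] /= A[i][i];
    }
    // Back substitution: solve L' b = z (b overwrites c).
    for (std::size_t i = p; i-- > 0;) {
        for (std::size_t k = i + 1; k < p; ++k) c[i] -= A[k][i] * c[k];
        c[i] /= A[i][i];
    }
    return c;
}
```

For example, with an intercept column and x = 0, 1, 2, 3, the responses y = 1, 3, 5, 7 should give coefficients close to 1 and 2. Within RcppArmadillo itself, arma::solve(X, y) is the usual way to get least-squares coefficients without forming an inverse. A design matrix built from dummy-coded factors can be rank deficient, in which case this sketch throws rather than returning estimates.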
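Multiple linear regression using Rcpp and RcppArmadillo is several times faster than the standard R functions! Part of the gap comes from doing less work: lm_rcpp() returns only the coefficient estimates, while lm() also processes the formula and builds a full model object.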