Saturday, May 29, 2010

Best Statistical Software Package

The best statistical software package is the one that fits the user's situation: the analysis that needs to be done, the budget, and compatibility with existing software. As with any tool, each package has advantages and disadvantages relative to the others. The major packages I use are R, SPSS, and SAS.

R
R is a free, open-source statistical programming language that is best suited to writing custom statistical programs. It has the steepest learning curve of the three, but once you learn the language it is the most customizable and powerful of the major packages. R has a very large community of statisticians who have written custom add-on packages.

As an example of the power and flexibility of R, I wrote a custom package that performs robust regression using iteratively reweighted least squares (IRLS).
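The package itself is not reproduced here. To give a feel for what IRLS does, below is a minimal sketch written in Python rather than R. It is not the package's code. It assumes a single predictor, Huber weights with the conventional tuning constant 1.345, and a residual scale estimated as the median absolute residual divided by 0.6745. The function names weighted_line_fit and irls_huber are made up for this illustration.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def irls_huber(x, y, k=1.345, max_iter=50, tol=1e-8):
    """Robust line fit by IRLS with Huber weights (illustrative sketch)."""
    # Start from the ordinary least-squares fit (all weights equal to 1).
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Assumed robust scale: median absolute residual / 0.6745.
        scale = median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break  # at least half the points are fit exactly
        cutoff = k * scale
        # Huber weights: 1 inside the cutoff, shrinking as cutoff/|r| outside.
        w = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope


# Made-up data: roughly y = 2 + 3x, with one gross outlier at the end.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, 0.0]
y = [2.0 + 3.0 * xi + ni for xi, ni in zip(x, noise)]
y[-1] = 100.0

ols = weighted_line_fit(x, y, [1.0] * len(x))
robust = irls_huber(x, y)
print("OLS   (intercept, slope):", ols)
print("Huber (intercept, slope):", robust)
```

On data like this, the ordinary least-squares slope is pulled well away from 3 by the single outlier, while the reweighted fit stays close to it. In R, a comparable analysis is available through the rlm function in the MASS package.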

The best way to run R is with Emacs, a text editor, together with ESS (Emacs Speaks Statistics).

Advantages
Free
Fully customizable programming language
Powerful graphics
Interacts well with databases such as MySQL
Can call C++ code to speed up computation
Runs on Windows, Linux, and Macintosh operating systems

Disadvantages
Steepest learning curve

SPSS
SPSS is a GUI-based statistical software package that is popular in the social sciences. Because it is GUI-based, it is good for running quick analyses. SPSS was also recently acquired by IBM, which should help the software in the long run.

Advantages
GUI-based program
Quick descriptive statistics capability
Most popular package in the social sciences
Good for cluster analysis
Runs on Windows, Linux, and Macintosh operating systems

Disadvantages
Requires annual license
Fewer statistical procedures than R or SAS
Some procedures require purchase of add-on modules

PSPP is a free, open-source alternative to SPSS that can read and manipulate SPSS data files and perform many of the same procedures.

SAS
SAS is the standard software for the statistical analysis of clinical pharmaceutical trials submitted to the FDA. SAS Institute was founded in 1976, and the software was originally developed for mainframe computers. As personal computers became popular and faster, SAS was developed for PCs as well. SAS is very popular for experimental design and ANOVA.

SAS is also a programming language, and it allows users to write macros, which package data steps and procedures into reusable custom code.

Advantages
Standard package of the pharmaceutical industry
Programming language is flexible, although not as flexible or powerful as R
Good for experimental design and ANOVA

Disadvantages
Poor graphics capabilities, although they have improved
Requires annual license
Fewer statistical procedures than R, although more than SPSS
Some procedures require purchase of add-on components