Statistics Books

This is a list of some of my favorite statistics books. Each entry gives full bibliographic information along with a brief summary or review, plus a link to purchase the book directly from Amazon.com. If you see a book you like, you can click on the title to order it and have a copy delivered to your door.

If you have any comments or questions about the books listed here, or if you'd like to suggest other books for the list, please send me a message.

If you don't see the book you're looking for here, you can do a search at Amazon.com. See the search form at the end of this page.

The books are categorized as follows:

Mass-market statistics books (intended for the general public)
Linear models and extensions (including regression, ANOVA, logistic and loglinear models, etc.)
Design of experiments (DOE)
Multivariate statistics
Categorical data analysis
Structural equation modeling (AKA LISREL, econometric models, latent variable models)
Visualization and graphics
Exploratory data analysis (EDA), data mining, and knowledge discovery in databases (KDD)
Statistical reference books

Mass-market statistics books

Against the Gods: The Remarkable Story of Risk by Peter Bernstein: This well-crafted book traces the history of probabilistic thought and its application to measuring and controlling risk. If you are interested in statistics, you'll love this book; if you have only a passing interest in statistics or probability, you'll still find it an enjoyable read.
The Cartoon Guide to Statistics by Gonick and Smith: This book gives a gentle introduction to statistics in cartoon format. It's actually surprisingly effective--the visual approach makes statistical concepts easy to grasp. Recommended for anyone who needs to get a feel for what statistics is all about and the basic reasoning behind statistical methods, but doesn't want to wade through a traditional textbook.
How to Lie with Statistics by Darrell Huff: A classic, written in 1954 but every bit as relevant today as it was then. This book describes how, if misused, statistics can lead people far astray. Many pitfalls are described in humorous fashion, along with suggestions for how to avoid problems.
Innumeracy: Mathematical Illiteracy and Its Consequences by John Allen Paulos: An insightful look into the phenomenon of mathematical illiteracy, some of its causes, its detrimental effects, and ways to improve the situation. One example: people who refuse to travel due to fear of terrorism, when the risk of being killed in an auto accident is roughly 300 times greater than the risk of dying at the hands of terrorists. By encouraging readers to develop a feel for numbers (especially large ones) and an appreciation for the workings of probability, Paulos does much to combat innumeracy (at least in his readers).

Linear models (and extensions)

Generalized Linear Models by McCullagh and Nelder: This book gives a nice overview of how linear models can be generalized to derive things like loglinear models and survival models (Cox's proportional hazards model). A bit dense in places, but a worthwhile read, and a nice reference to have handy.
Applied Linear Statistical Models by Neter, Kutner, Nachtsheim, and Wasserman: This book gives comprehensive coverage of linear models, including simple regression, multiple regression, analysis of variance (ANOVA), and some coverage of experimental designs. Very thorough, and an invaluable reference book.
Statistical Methods: the Geometric Approach by Saville and Wood: This book covers linear models (ANOVA and regression) from a geometric point of view, in terms of projections of the data onto the model and error spaces. A very unusual approach, but quite effective, especially for those who think in visual terms more than mathematical terms.

Design of Experiments

A note to social science researchers: most of the books listed here discuss experimental design from an engineering perspective, which is somewhat different from the social science approach. In industrial settings, experimental designs generally focus on efficiency, emphasizing fractional factorial designs and polynomial response surface models; books written from this perspective usually give little or no coverage of the repeated measures (AKA within-subjects) designs so familiar to social scientists. In contrast, most books aimed at social scientists tend not to do much with fractional designs (coverage is usually restricted to discussion of Latin square designs), but give in-depth information on repeated measures designs. Books with a social science emphasis are clearly identified in the list.

Statistics for Experimenters by Box, Hunter, and Hunter: This is a classic in design of experiments. A lot of in-depth information about design of experiments, including fractional factorial designs. Also includes some coverage of response surface designs. The emphasis is on getting usable results, combining the traditional statistical tools (e.g. ANOVA) with plotting of results to show important effects at a glance. Highly recommended!
Experiments with Mixtures by John Cornell: Experiments involving mixtures, where the sum of the mixture components must equal some constant (usually 1.0), require specialized methods. This book is the definitive reference for such experiments, covering a variety of design approaches, analysis problems, and practical advice.
Statistical Design and Analysis of Experiments by Peter W. M. John: This old text has recently been reprinted by SIAM in their Classics in Applied Mathematics series. I found this to be a very good book, with plenty of examples to illustrate the concepts. The book was originally published in 1971 and has not been updated, so some more recent developments in experimental design are not covered.
John, P. W. M. (1998). Statistical Design and Analysis of Experiments. Philadelphia: SIAM.
Design and Analysis: A Researcher's Handbook by Geoffrey Keppel: This is a social-science-oriented treatment of experimental design, aimed at a first or second graduate-level course in statistics. Emphasis is on ANOVA, including full-factorial designs, with a whole section (four chapters) devoted to within-subjects (repeated measures) designs. No coverage of response surface designs (though some discussion of "trend analysis" can be found), and coverage of fractional designs is limited to Latin square designs. A nice reference for the beginning social science researcher.

Multivariate statistics

Multivariate Analysis: Methods and Applications by Dillon and Goldstein: This book gives good coverage of several multivariate techniques, including cluster analysis and multidimensional scaling in addition to the more common techniques like factor analysis, multiple regression, and discriminant analysis. Everything is explained in clear language, so you don't have to be a mathematician to follow the text.
Using Multivariate Statistics by Tabachnick and Fidell: A very nice book that covers a wide variety of multivariate statistical applications. Focuses on practical aspects of multivariate data analysis, and is very accessible. Also included are examples of output from various statistical software packages. This book is a nice reference to have handy. My only complaint: no coverage of cluster analysis.

Categorical data analysis

Categorical Data Analysis by Alan Agresti: This book is a must-have for anyone doing serious data analysis with categorical variables. Covers techniques from simple chi-square methods through loglinear models, models for ordinal data, models for matched pairs, repeated categorical response data, and parametric models for categorical data.

Structural equation modeling

Structural Equations with Latent Variables by Kenneth Bollen: The reference book for structural equation modeling. Goes into more depth on the subject than any other book I've seen. Covers everything from the basics up to diagnosing problematic models and the subtleties of different estimation methods and fit indices. The book was written over 10 years ago, so some of the material is a bit out of date, but the foundations are still solid, and this book should still be required reading for serious structural equation modelers.
Structural Equation Modeling: Concepts, Issues, and Applications edited by Rick Hoyle: This is a nice collection of chapters by some of the top researchers in SEM. Gives a good grounding in the basics, as well as introducing some of the controversies surrounding SEMs (e.g. when to use which fit index). Also includes chapters of a more practical nature, such as writing about SEM models and a comparison of two software packages, and some examples of real-life SEM research.

Visualization and graphics

The Visual Display of Quantitative Information; Envisioning Information; and Visual Explanations: Images and Quantities, Evidence and Narrative, all by Edward Tufte: These books by a master of data visualization are must reading for anyone who generates charts and graphs. Tufte doesn't just dwell on specifics of graphing techniques--he presents a philosophy of data visualization, one that harnesses creativity as well as rigor to effectively and accurately convey information (as opposed to many examples seen in the popular media, where "creativity" often obscures and distorts the informational message). Even the books themselves are things of beauty--the layout is very elegant, making the books a pleasure to read.
The Grammar of Graphics by Leland Wilkinson: If Tufte represents the right-brain approach to data visualization, Wilkinson represents the left-brain approach. Lee is one of the smartest people I've ever met, and he's done an amazing job of codifying the theory that underlies most visualization techniques. The theory leads to some new, innovative graph types.

Exploratory data analysis, data mining, and KDD

Data Mining Techniques for Marketing, Sales, and Customer Support by Berry and Linoff: This is a nice introduction to data mining in the context of real business problems. The book is written at a very accessible level, and includes many examples of real data-mining situations to illustrate the points in the book.
Data Warehousing, Data Mining, and OLAP by Berson and Smith: This comprehensive (and large!) book covers a wide range of topics, with soup-to-nuts coverage of assembling and maintaining a data warehouse and using various techniques (statistical and other) to get useful information out of it. Puts data mining in the larger context of decision support. Lots of good practical information, including sections on "ten mistakes for data warehousing managers to avoid" and "big data--better returns: leveraging your hidden data assets to improve ROI."
Exploratory Data Analysis by John Tukey: Decades before "data mining" came into vogue, Tukey wrote this classic book that defined an entire subdiscipline of statistics. Some of the material is based on outdated technology--written in 1977, it assumes the reader doesn't have access to computers with graphical capabilities. However, the fundamental ideas are even more relevant today than they were when the book was written. With high-speed PCs at everyone's beck and call and tons of archived data available, everyone is trying to sift through their data to find something "interesting"--it's more important than ever to make sure that exploratory data analysis is done well.
Learning from Data: Concepts, Theory, and Methods by Vladimir Cherkassky and Filip Mulier: This book gives a good, in-depth introduction to statistical learning theory, and shows how modern techniques such as neural networks and statistical models fit into the theory. It has especially good coverage of support vector machines.

Statistical reference books

Finding Statistics Online by Paula Berinstein: This is a nice reference that helps you find statistics in electronic resources, including (but not limited to) the Internet. A nice book to have around if you often need to dig up specific statistics. Note that this will not help you find information on statistical methods--it will help you find specific numbers, e.g. the population of Lithuania or the number of Roman Catholics in the U.S. One caveat: avoid the chapter on "Statistics Basics". There are some errors in this chapter that may mislead novices (and irk statisticians). If you skip this chapter, the rest of the book is very useful.
The Cambridge Dictionary of Statistics by B. S. Everitt: This is a valuable reference to have on the shelf. Gives concise but thorough definitions of most of the important terminology used in statistics. An essential tool for people who write about (or read about) statistics.
Encyclopedia of Statistical Sciences edited by Samuel Kotz: This 9-volume set (plus updates) is a tremendous asset to any statistics library. Gives brief but complete background information on most topics in statistical theory and practice. This reference is too pricey for most individuals (almost $2000 for the 9-volume set), but it's definitely a worthwhile investment for institutional libraries--or for those independently wealthy statisticians among us.
Encyclopedia of Measurement and Statistics 3-Volume Set edited by Neil Salkind: This 3-volume set covers most of the statistical methodology that will be of interest to social science researchers. The explanations given are generally appropriate for an educated person who is not necessarily a professional statistician. A very useful reference to have handy when reading journal articles that throw around unfamiliar methodological terms. Full disclosure: I contributed two entries to this encyclopedia.

Search for books

This page maintained by Clay Helberg. Last updated November 18, 2007
