Statistics Books

This is a list of some of my favorite statistics books. For each book, full bibliographic information is given, along with a brief summary and/or review of the book. Each book's title also links to a page where you can purchase it, so if you see a book you like, you can click on the title to order it and have a copy delivered to your door.

If you have any comments or questions about the books listed here, or if you'd like to give me suggestions about other books you'd like to see on the list, please send me a message.

The books are categorized as follows:

Mass-market statistics books

Against the Gods: the Remarkable Story of Risk by Peter Bernstein
This well-crafted book traces the history of probabilistic thought and its application to measuring and controlling risk. If you are interested in statistics, you'll love this book; if you have only a passing interest in statistics or probability, you'll still find it an enjoyable read.

Bernstein, P. (1996). Against the Gods: the Remarkable Story of Risk. New York: Wiley.

The Cartoon Guide to Statistics by Gonick and Smith
This book gives a gentle introduction to statistics in cartoon format. It's actually surprisingly effective--the visual approach makes statistical concepts easy to grasp. Recommended for anyone who needs to get a feel for what statistics are all about and the basic reasoning behind statistical methods, but doesn't want to wade through a traditional textbook.

Gonick, L. and Smith, W. (1993). The Cartoon Guide to Statistics. New York: Harper Collins.

How to Lie with Statistics by Darrell Huff
A classic, written in 1954 but every bit as relevant today as it was then. This book describes how, if misused, statistics can lead people far astray. Many pitfalls are described in humorous fashion, along with suggestions for how to avoid problems.

Huff, D. (1993). How to Lie with Statistics. New York: W. W. Norton & Co.

Innumeracy: Mathematical Illiteracy and Its Consequences by John Allen Paulos
An insightful look into the phenomenon of mathematical illiteracy: some of its causes, its detrimental effects, and ways to improve the situation. One example: people who refuse to travel due to fear of terrorism, when the risk of being killed in an auto accident is roughly 300 times greater than the risk of dying at the hands of terrorists. By encouraging readers to develop a feel for numbers (especially large ones) and an appreciation for the workings of probability, Paulos does much to combat innumeracy (at least in his readers).

Paulos, J. A. (1988). Innumeracy: Mathematical Illiteracy and Its Consequences. New York: Vintage Books.

Linear models (and extensions)

Generalized Linear Models by McCullagh and Nelder
This book gives a nice overview of how linear models can be generalized to derive things like loglinear models and survival models (Cox's proportional hazards model). A bit dense in places, but a worthwhile read, and a nice reference to have handy.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. New York: Chapman & Hall.

Applied Linear Statistical Models by Neter, Kutner, Nachtsheim, and Wasserman
This book gives comprehensive coverage of linear models, including simple regression, multiple regression, analysis of variance (ANOVA), and some coverage of experimental designs. Very thorough, an invaluable reference book.

Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996). Applied Linear Statistical Models. Chicago: Irwin.

Statistical Methods: the Geometric Approach by Saville and Wood
This book covers linear models (ANOVA and Regression) from a geometric point of view, in terms of projections of the data on model and error spaces. A very unusual approach, but quite effective, especially for those who think in visual terms more than mathematical terms.

Saville, D. J. and Wood, G. R. (1991). Statistical Methods: the Geometric Approach. New York: Springer-Verlag.

Design of Experiments

A note to social science researchers: most of the books listed here discuss experimental design from an engineering perspective, which is somewhat different from the social science approach. In industrial settings, experimental designs generally focus on efficiency, emphasizing fractional factorial designs and polynomial response surface models, and books written from this perspective usually give little or no coverage of the repeated measures (AKA within-subjects) designs so familiar to social scientists. In contrast, most books aimed at social scientists tend not to do much with fractional designs (coverage is usually restricted to discussion of Latin square designs), but give in-depth information on repeated measures designs. Books with a social science emphasis are clearly identified in the list.

Statistics for Experimenters by Box, Hunter, and Hunter
This is a classic in design of experiments. A lot of in-depth information about design of experiments, including fractional factorial designs. Also includes some coverage of response surface designs. The emphasis is on getting usable results, combining the traditional statistical tools (e.g. ANOVA) with plotting of results to show important effects at a glance. Highly recommended!

Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978). Statistics for Experimenters. New York: Wiley.

Experiments with Mixtures by John Cornell
Experiments involving mixtures, where the sum of the mixture components must equal some constant (usually 1.0), require specialized methods. This book is the definitive reference for such experiments, covering a variety of design approaches, analysis problems, and practical advice.

Cornell, J. A. (1990). Experiments with Mixtures. New York: Wiley.

Statistical Design and Analysis of Experiments by Peter W. M. John
This old text has recently been reprinted by SIAM in their Classics in Applied Mathematics series. I found this to be a very good book, with plenty of examples to illustrate the concepts. The book was originally published in 1971 and has not been updated, so some more recent developments in experimental design are not covered.

John, P. W. M. (1998). Statistical Design and Analysis of Experiments. Philadelphia: SIAM.

Design and Analysis: A Researcher's Handbook by Geoffrey Keppel
This is a social-science oriented treatment of experimental design, aimed at a first or second graduate-level course in statistics. Emphasis is on ANOVA, including full-factorial designs, and a whole section (four chapters) on within-subjects (repeated measures) designs. No coverage of response surface designs (though some discussion of "trend analysis" can be found), and coverage of fractional designs is limited to Latin square designs. A nice reference for the beginning social science researcher.

Keppel, G. (1991). Design and Analysis: A Researcher's Handbook. Englewood Cliffs, NJ: Prentice-Hall.

Multivariate statistics

Multivariate Analysis: Methods and Applications by Dillon and Goldstein
This book gives good coverage of several multivariate techniques, including cluster analysis and multidimensional scaling in addition to the more common techniques like factor analysis, multiple regression, and discriminant analysis. Everything is explained in clear language, so you don't have to be a mathematician to follow the text.

Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis: Methods and Applications. New York: Wiley.

Using Multivariate Statistics by Tabachnick and Fidell
A very nice book that covers a wide variety of multivariate statistical applications. Focuses on practical aspects of multivariate data analysis, and is very accessible. Also included are examples of output from various statistical software packages. This book is a nice reference to have handy. My only complaint: no coverage of cluster analysis.

Tabachnick, B. G., and Fidell, L. S. (1996). Using Multivariate Statistics. New York: HarperCollins College Publishers.

Categorical data analysis

Categorical Data Analysis by Alan Agresti
This book is a must-have for anyone doing serious data analysis with categorical variables. Covers techniques from simple chi-square methods through loglinear models, models for ordinal data, models for matched pairs, repeated categorical response data, and parametric models for categorical data.

Agresti, A. (1990). Categorical Data Analysis. New York: Wiley.

Structural equation modeling

Structural Equations with Latent Variables by Kenneth Bollen
The reference book for structural equation modeling. Goes into more depth on the subject than any other book I've seen. Covers everything from the basics up to diagnosing problematic models and the subtleties of different estimation methods and fit indices. The book was written over 10 years ago, so some of the material is a bit out of date, but the foundations are still solid, and this book should still be required reading for serious structural equation modelers.

Bollen, K. (1989). Structural Equations with Latent Variables. New York: Wiley.

Structural Equation Modeling: Concepts, Issues, and Applications edited by Rick Hoyle
This is a nice collection of chapters by some of the top researchers in SEM. Gives a good grounding in the basics, as well as introducing some of the controversies surrounding SEMs (e.g. when to use which fit index). Also includes chapters of a more practical nature, such as writing about SEM models and a comparison of two software packages, and some examples of real-life SEM research.

Hoyle, R. H. (Ed.). (1995). Structural Equation Modeling: Concepts, Issues, and Applications. Thousand Oaks, CA: Sage.

Visualization and graphics

The Visual Display of Quantitative Information, Envisioning Information, and Visual Explanations: Images and Quantities, Evidence and Narrative, all by Edward Tufte
These books by a master of data visualization are must reading for anyone who generates charts and graphs. Tufte doesn't just dwell on specifics of graphing techniques--he presents a philosophy of data visualization, one that harnesses creativity as well as rigor to effectively and accurately convey information (as opposed to many examples seen in the popular media, where "creativity" often obscures and distorts the informational message). Even the books themselves are things of beauty--the layout is very elegant, making the books a pleasure to read.

Tufte, E. R. (1983). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Tufte, E. R. (1990). Envisioning Information. Cheshire, CT: Graphics Press.
Tufte, E. R. (1997). Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, CT: Graphics Press.

The Grammar of Graphics by Leland Wilkinson
If Tufte represents the right-brain approach to data visualization, Wilkinson represents the left-brain approach. Lee is one of the smartest people I've ever met, and he's done an amazing job of codifying the theory that underlies most visualization techniques. The theory leads to some new, innovative graph types.

Wilkinson, L. (2005). The Grammar of Graphics, 2nd Ed. New York: Springer-Verlag.

Exploratory data analysis, data mining, and KDD

Data Mining Techniques for Marketing, Sales, and Customer Support by Berry and Linoff
This is a nice introduction to data mining in the context of real business problems. The book is written at a very accessible level, and includes many examples of real data-mining situations to illustrate the points in the book.

Berry, M. J. A. and Linoff, G. (1997). Data Mining Techniques for Marketing, Sales, and Customer Support. New York: Wiley.

Data Warehousing, Data Mining, and OLAP by Berson and Smith
This comprehensive (and large!) book covers a wide range of topics, with soup-to-nuts coverage of assembling and maintaining a data warehouse and using various techniques (statistical and other) to get useful information out of it. Puts data mining in the larger context of decision support. Lots of good practical information, including sections on "ten mistakes for data warehousing managers to avoid" and "big data--better returns: leveraging your hidden data assets to improve ROI."

Berson, A. and Smith, S. J. (1997). Data Warehousing, Data Mining, and OLAP. New York: McGraw-Hill.

Exploratory Data Analysis by John Tukey
Decades before "data mining" came into vogue, Tukey wrote this classic book that defined an entire subdiscipline of statistics. Some of the material is based on outdated technology--written in 1977, it assumes the reader doesn't have access to computers with graphical capabilities. However, the fundamental ideas are even more relevant today than they were when the book was written. With high-speed PCs at everyone's beck and call and tons of archived data available, everyone is trying to sift through their data to find something "interesting"--it's more important than ever to make sure that exploratory data analysis is done well.

Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.

Learning from Data: Concepts, Theory, and Methods by Vladimir Cherkassky and Filip Mulier
This book gives a good, in-depth introduction to statistical learning theory, and how modern techniques such as neural networks and statistical models fit into the theory. It has especially good coverage of support vector machines.

Cherkassky, V. and Mulier, F. (2007). Learning from Data: Concepts, Theory, and Methods. New York: Wiley.

Statistical reference books

Finding Statistics Online by Paula Berinstein
This is a nice reference that helps you find statistics in electronic resources, including (but not limited to) the Internet. A nice book to have around if you often need to dig up specific statistics. Note that this will not help you find information on statistical methods--it will help you find specific numbers, e.g. the population of Lithuania or the number of Roman Catholics in the U.S. One caveat: avoid the chapter on "Statistics Basics". There are some errors in this chapter that may mislead novices (and irk statisticians). If you skip this chapter, the rest of the book is very useful.

Berinstein, P. (1998). Finding Statistics Online: How to Locate the Elusive Numbers You Need. Medford, NJ: Information Today.

The Cambridge Dictionary of Statistics by B. S. Everitt
This is a valuable reference to have on the shelf. Gives concise but thorough definitions of most of the important terminology used in statistics. An essential tool for people who write about (or read about) statistics.

Everitt, B. S. (2006). The Cambridge Dictionary of Statistics, 3rd Ed. New York: Cambridge University Press.

Encyclopedia of Statistical Sciences edited by Samuel Kotz
This 9-volume set (plus updates) is a tremendous asset to any statistics library. Gives brief but complete background information on most topics in statistical theory and practice. This reference is too pricey for most individuals (almost $2000 for the 9-volume set), but it's definitely a worthwhile investment for institutional libraries--or for those independently wealthy statisticians among us.

Kotz, S. (1988). Encyclopedia of Statistical Sciences, vol. 1-9 plus supplements. New York: Wiley.

Encyclopedia of Measurement and Statistics 3-Volume Set edited by Neil Salkind
This 3-volume set covers most of the statistical methodology that will be of interest to social science researchers. The explanations given are generally appropriate for an educated person who is not necessarily a professional statistician. A very useful reference to have handy when reading journal articles that throw around unfamiliar methodological terms. Full disclosure: I contributed two entries to this encyclopedia.

Salkind, N. (Ed.). (2007). Encyclopedia of Measurement and Statistics, vol. 1-3. Thousand Oaks, CA: Sage Publications.

This page maintained by Clay Helberg. Last updated November 18, 2007