This is a list of some of my favorite statistics books. For each book, full bibliographic
information is given, along with a brief summary and/or review. Each entry
also links to Amazon.com, where you can purchase the book directly.
If you see a book you like, you can click on the title to order it and have a copy delivered
to your door.
If you have any comments or questions about the books listed here, or if you'd like to give me
suggestions about other books you'd like to see on the list, please
send me a message.
If you don't see the book you're looking for here, you can do a search at Amazon.com. See the
search form at the end of this page.
This well-crafted book traces the history of probabilistic thought and its application to measuring
and controlling risk. If you are interested in statistics, you'll love this book; if you have only
a passing interest in statistics or probability, you'll still find this an enjoyable read.
Bernstein, P. (1996). Against the Gods: The Remarkable Story of Risk. New York: Wiley.
This book gives a gentle introduction to statistics in cartoon format. It's actually
surprisingly effective--the visual approach makes statistical concepts easy to grasp.
Recommended for anyone who needs to get a feel for what statistics is all about and
the basic reasoning behind statistical methods, but doesn't want to wade through a traditional
textbook.
Gonick, L. and Smith, W. (1993). The Cartoon Guide to Statistics. New York: HarperCollins.
A classic, written in 1954 but every bit as relevant today as it was then. This book describes
how, if misused, statistics can lead people far astray. Many pitfalls are described in humorous
fashion, along with suggestions for how to avoid problems.
Huff, D. (1993). How to Lie with Statistics. New York: W. W. Norton & Co.
An insightful look into the phenomenon of mathematical illiteracy, some of the causes of it, the
detrimental effects of it, and ways to improve the situation. One example: people
who refuse to travel due to fear of terrorism, when the risk of being killed in an auto accident is
roughly 300 times greater than the risk of dying at the hands of terrorists. By encouraging readers
to develop a feel for numbers (especially large ones) and an appreciation for the workings of
probability, Paulos does much to combat innumeracy (at least in his readers).
Paulos, J. A. (1988). Innumeracy: Mathematical Illiteracy and Its
Consequences. New York: Vintage Books.
This book gives a nice overview of how linear models can be generalized to derive things like
loglinear models and survival models (Cox's proportional hazards model). A bit dense in places, but
a worthwhile read, and a nice reference to have handy.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. New York: Chapman & Hall.
This book gives comprehensive coverage of linear models, including simple regression, multiple
regression, and analysis of variance (ANOVA), along with some coverage of experimental designs.
Very thorough and an invaluable reference book.
Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). Applied Linear Statistical
Models. Chicago: Irwin.
This book covers linear models (ANOVA and regression) from a geometric point of view, in terms of
projections of the data on model and error spaces. A very unusual approach, but quite effective,
especially for those who think in visual terms more than mathematical terms.
Saville, D. J. and Wood, G. R. (1991). Statistical Methods: the Geometric Approach. New York:
Springer-Verlag.
This is a classic in design of experiments. It treats the subject in depth, including
fractional factorial designs, and gives some coverage of response surface designs.
The emphasis is on getting usable results, combining the
traditional statistical tools (e.g. ANOVA) with plotting of results to show important effects at
a glance. Highly recommended!
Box, G. E. P., Hunter, W. G. and Hunter, J. S. (1978). Statistics for Experimenters.
New York: Wiley.
Experiments involving mixtures, where the sum of the mixture components must equal some constant
(usually 1.0), require specialized methods. This book is the definitive reference for such
experiments, covering a variety of design approaches, analysis problems, and practical advice.
Cornell, J. A. (1990). Experiments with Mixtures. New York: Wiley.
This old text has recently been reprinted by SIAM in their Classics in Applied Mathematics
series. I found this to be a very good book, with plenty of examples to illustrate the concepts. The
book was originally published in 1971 and has not been updated, so some more recent developments in
experimental design are not covered.
John, P. W. M. (1998). Statistical Design and Analysis of Experiments. Philadelphia: SIAM.
This is a social-science oriented treatment of experimental design, aimed at a first or second
graduate-level course in statistics. Emphasis is on ANOVA, including full-factorial designs, and
a whole section (four chapters) on within-subjects (repeated measures) designs. No coverage of
response surface designs (though some discussion of "trend analysis" can be found), and coverage
of fractional designs is limited to Latin square designs. A nice reference for the beginning social
science researcher.
Keppel, G. (1991). Design and Analysis: A Researcher's Handbook. Englewood Cliffs, NJ: Prentice-Hall.
This book gives good coverage of several multivariate techniques, including cluster analysis and
multidimensional scaling in addition to the more common techniques like factor analysis,
multiple regression, and discriminant analysis. Everything is explained in clear language, so
you don't have to be a mathematician to follow the text.
Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis: Methods and Applications.
New York: Wiley.
A very nice book that covers a wide variety of multivariate statistical applications. Focuses on
practical aspects of multivariate data analysis, and is very accessible. Also included are examples
of output from various statistical software packages. This book is a nice
reference to have handy. My only complaint: no coverage of cluster analysis.
Tabachnick, B. G., and Fidell, L. S. (1996). Using Multivariate Statistics. New York:
HarperCollins College Publishers.
This book is a must-have for anyone doing serious data analysis with categorical variables.
Covers techniques from simple chi-square methods through loglinear models, models for ordinal data,
models for matched pairs, repeated categorical response data, and parametric models for
categorical data.
Agresti, A. (1990). Categorical Data Analysis. New York: Wiley.
The reference book for structural equation modeling. Goes into more depth on the subject
than any other book I've seen. Covers everything from the basics up to diagnosing problematic
models and the subtleties of different estimation methods and fit indices. The book was written over
10 years ago, so some of the material is a bit out of date, but the foundations are still solid, and
this book should still
be required reading for serious structural equation modelers.
Bollen, K. A. (1989). Structural Equations with Latent Variables. New York: Wiley.
This is a nice collection of chapters by some of the top researchers in SEM. Gives a good grounding
in the basics, as well as introducing some of the controversies surrounding SEMs (e.g. when to use
which fit index). Also includes chapters of a more practical nature, such as writing about SEM models
and a comparison of two software packages, and some examples of real-life SEM research.
Hoyle, R. H. (1995). Structural Equation Modeling: Concepts, Issues, and Applications.
Thousand Oaks, CA: Sage.
These books by a master of data visualization are must reading for anyone who
generates charts and graphs. Tufte doesn't just dwell on specifics of graphing techniques--he
presents a philosophy of data visualization, one that harnesses creativity as well as rigor
to effectively and accurately convey information (as opposed to many examples seen in the popular
media, where "creativity" often obscures and distorts the informational message). Even the books themselves
are things of beauty--the layout is very elegant, making the books a pleasure to read.
Tufte, E. R. (1983). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Tufte, E. R. (1990). Envisioning Information. Cheshire, CT: Graphics Press.
Tufte, E. R. (1997). Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, CT: Graphics Press.
If Tufte represents the right-brain approach to data visualization, Wilkinson represents the left-brain
approach. Lee is one of the smartest people I've ever met, and he's done an amazing job of codifying
the theory that underlies most visualization techniques. The theory leads to some new, innovative graph
types.
Wilkinson, L. (2005). The Grammar of Graphics, 2nd Ed. New York: Springer-Verlag.
This is a nice introduction to data mining in the context of real business problems. The book
is written at a very accessible level, and includes many examples of real data-mining situations
to illustrate the points in the book.
Berry, M. J. A. and Linoff, G. (1997). Data Mining Techniques for Marketing, Sales, and Customer
Support. New York: Wiley.
This comprehensive (and large!) book covers a wide range of topics, with soup-to-nuts coverage of assembling and maintaining a
data warehouse and using various techniques (statistical and other) to get useful information out of it. Puts data mining
in the larger context of decision support. Lots of good practical information, including sections on "ten
mistakes for data warehousing managers to avoid" and "big data--better returns: leveraging your hidden
data assets to improve ROI."
Berson, A. and Smith, S. J. (1997). Data Warehousing, Data Mining, and OLAP. New York:
McGraw-Hill.
Decades before "data mining" came into vogue, Tukey wrote this classic book that defined an entire
subdiscipline of statistics. Some of the material is based on outdated technology--written in 1977,
it assumes the reader doesn't have access to computers with graphical capabilities. However, the
fundamental ideas are even more relevant today than they were when the book was written. With
high-speed PCs at everyone's beck and call and tons of archived data available, everyone is trying
to sift through their data to find something "interesting"--it's more important than ever to make
sure that exploratory data analysis is done well.
Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.
This book gives a good, in-depth introduction to statistical learning theory, and how
modern techniques such as neural networks and statistical models fit into the theory. It has
especially good coverage of support vector machines.
Cherkassky, V. and Mulier, F. (2007). Learning from Data: Concepts, Theory, and Methods. New York: Wiley.
This is a nice reference that helps you find statistics in electronic resources,
including (but not limited to) the Internet. A nice book to have around if you
often need to dig up specific statistics. Note that this will not help you find
information on statistical methods--it will help you find specific numbers,
e.g. the population of Lithuania or the number of Roman Catholics in the U.S.
One caveat: avoid the chapter on "Statistics Basics". There are some errors in
this chapter that may mislead novices (and irk statisticians). If
you skip this chapter, the rest of the book is very useful.
Berinstein, P. (1998). Finding Statistics Online: How to Locate the
Elusive Numbers You Need. Medford, NJ: Information Today.
This is a valuable reference to have on the shelf. Gives concise but thorough definitions of
most of the important terminology used in statistics. An essential tool for people who write about
(or read about) statistics.
Everitt, B. S. (2006). The Cambridge Dictionary of Statistics, 3rd Ed. New York: Cambridge University
Press.
This 9-volume set (plus updates) is a tremendous asset to any statistics library. Gives brief but
complete background information on most topics in statistical theory and practice. This reference is
too pricey for most individuals (almost $2000 for the 9-volume set), but it's definitely a
worthwhile investment for institutional libraries--or for those independently wealthy statisticians
among us.
Kotz, S. (1988). Encyclopedia of Statistical Sciences, vol. 1-9 plus supplements. New York:
Wiley.
This 3-volume set covers most of the statistical methodology that will be of interest to social science
researchers. The explanations given are generally appropriate for an educated person who is not necessarily
a professional statistician. A very useful reference to have handy when reading journal articles that throw around
unfamiliar methodological terms. Full disclosure: I contributed two entries to this encyclopedia.
Salkind, N. (Ed.). (2007). Encyclopedia of Measurement and Statistics, vol. 1-3. Thousand Oaks, CA: Sage Publications.