Data analysis for politics and policy

Abstract: Introduction to data analysis; Predictions and projections: some issues of research design; Two-variable linear regression; Multiple regression.

Review: "Journal of the American Statistical Association, September 1976"

Data Analysis for Politics and Policy, Edward R. Tufte. Englewood Cliffs, New Jersey: Prentice-Hall, 1974. x + 179 pp.

"Edward R. Tufte's book Data Analysis for Politics and Policy is, quite simply, excellent. The aim of the author in writing this book "is to present fundamental material not found in statistics books, and in particular, to show techniques of quantitative analysis in action on problems of politics and policy" (p. ix). To achieve this end, Tufte considers a narrow range of important topics in statistical analysis, primarily dealing with problems of prediction (including a good discussion of the concept of causation) and the relationships among variables through simple and multiple regression.

Most of the ideas discussed are presented in several detailed examples. For example, much of the first chapter explores the relationship, causal or otherwise, between mandatory motor vehicle inspection and deaths due to automobile accidents. This example begins with an interesting problem and then suggests a collection of data to study it (i.e., data on 49 states for the years 1966-68). Problems, such as units of measurement, causation vs. association, and the types of inference possible from such data, naturally arise. Tufte leads the reader through a systematic analysis and, by presenting the raw data in the text, leaves the reader to pursue the problem.

The bulk of the book concerns the use and interpretation of simple and multiple regression. Here, the discussion centers on issues that, as Tufte claims, do not usually find a place in standard statistics texts. For example, in simple regression, the book stresses the central role of residuals and residual analysis, and describes many of the measures familiar to social scientists, r^2, S^2_{Y/X}, etc., as functions of the residuals, "…since reasonable measures of the quality of a line's fit to the data could hardly be anything but a function of the magnitudes of the errors" (page 70).

Tufte puts residual plots to good use to gain understanding of a data set, and he shows how finding outliers gives the analyst hints about the inadequacy of a statistical model. This attitude is clearly passed along to the reader. The discussion of graphical techniques in general is quite good and includes the reproduction of graphs of several scatter plots with the same regression line from [1].

Other topics in simple regression are also considered. A brief but compelling discussion of the "value of data as evidence," with regard to the interpretation of nonrandom samples, is presented. An important discussion of the usefulness of computing slopes instead of correlation coefficients is given, complete with a good quote from John Tukey.

Several examples requiring transformations of one or both variables to the logarithmic scale are given, along with an interpretation of transformed variables.

The section on transformations is difficult for many students, but it contains information that is not usually available to the beginning nontechnical student. The presentation of multiple regression is rather brief. There is sufficient content for the reader to appreciate multiple regression, but not really enough to actually do it. The discussion concentrates on the meaning of several predictors for a single response variable and on ways to understand complicated relationships.

There is also a fine discussion of multicollinearity. The examples of the use of multiple regression are rather small, but I have found them useful in classes since the reader can reproduce the analysis with a minimum of effort. The book was probably intended to be used in quantitative-methods courses in political science, public affairs or similar fields. For the last two years, I have used it as a supplemental text in a demanding statistics service course for first-year social science graduate students. The book has received almost uniform praise from the students involved."

SANFORD WEISBERG, University of Minnesota

REFERENCE [1] Anscombe, F.J. "Graphs in Statistical Analysis," The American Statistician, 27 (February 1973), 17-21.
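The review notes that familiar measures of fit such as r^2 and the residual variance are functions of the residuals. A minimal Python sketch of that idea follows. It is not taken from the book: the function names and the small data set are made up for illustration, and the residual variance is assumed to use n - 2 degrees of freedom, the usual convention for a two-variable regression.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residual_summary(x, y):
    """Compute fit measures purely from the residuals of the fitted line."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum(e ** 2 for e in residuals)        # sum of squared residuals
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)   # total variation in y
    return {
        "intercept": a,
        "slope": b,
        "residuals": residuals,
        "r_squared": 1 - ss_res / ss_tot,
        "residual_variance": ss_res / (n - 2),     # assumes n - 2 degrees of freedom
    }


# Hypothetical data, for illustration only (not from the book).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = residual_summary(x, y)
print(summary["slope"], summary["r_squared"], summary["residual_variance"])
```

Both summary measures depend on the data only through the sum of squared residuals (and, for r^2, the total variation in y), so a plot of the `residuals` list against `x` carries the same information in more detail.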

Edward Rolf Tufte (born 1942 in Kansas City, Missouri, to Virginia and Edward E. Tufte), a professor emeritus of statistics, graphic design, and political economy at Yale University, has been described by The New York Times as "the Leonardo da Vinci of Data". He is an expert in the presentation of informational graphics such as charts and diagrams, and is a fellow of the American Statistical Association. Tufte has held fellowships from the Guggenheim Foundation and the Center for Advanced Study in the Behavioral Sciences.