Exploratory data analysis (EDA) is an approach to analyzing data in order to formulate hypotheses worth testing, complementing the tools of conventional statistics for testing hypotheses[1]. The approach was so named by John Tukey.
Arthur Bowley used precursors of the stemplot and the five-number summary. Bowley actually used a "seven-figure summary" comprising the extremes, deciles and quartiles, along with the median. In his Elementary Manual of Statistics (3rd edn., 1920), p. 62, he defines "the maximum and minimum, median, quartiles and two deciles" as the "seven positions".
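As a rough illustration of how the two summaries relate, the following Python sketch computes a seven-figure summary and derives the five-number summary from it. It rests on assumptions not stated in the source: the "two deciles" are taken to be the first and ninth, the quantiles are linearly interpolated using the standard library's "inclusive" method (Bowley's own positional conventions may differ), and the function names are hypothetical.

```python
from statistics import quantiles


def seven_figure_summary(values):
    """Return (min, 1st decile, lower quartile, median,
    upper quartile, 9th decile, max) for a collection of numbers.

    Assumption: the "two deciles" are the first and the ninth.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # method="inclusive" interpolates between order statistics and
    # keeps every cut point inside [min, max]; this is one common
    # convention, not necessarily the one Bowley used.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    deciles = quantiles(data, n=10, method="inclusive")
    return (data[0], deciles[0], q1, median, q3, deciles[-1], data[-1])


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    s = seven_figure_summary(values)
    # Drop the two deciles to get the five-number summary.
    return (s[0], s[2], s[3], s[4], s[6])
```

Under these assumptions, the integers 1 to 101 give the seven figures 1, 11, 26, 51, 76, 91 and 101; dropping the two deciles leaves the five-number summary 1, 26, 51, 76 and 101.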
Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1985). Exploring Data Tables, Trends and Shapes. ISBN 0-471-09776-4.
Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1983). Understanding Robust and Exploratory Data Analysis. ISBN 0-471-09777-2.
Tukey, John Wilder (1977). Exploratory Data Analysis, Addison-Wesley. ISBN 0-201-07616-0.
Velleman, P F & Hoaglin, D C (1981). Applications, Basics and Computing of Exploratory Data Analysis. ISBN 0-87150-409-X.
Notes
^ "And roughly the only mechanism for suggesting questions is exploratory. And once they’re suggested, the only appropriate question would be how strongly supported are they and particularly how strongly supported are they by new data. And that’s confirmatory.", A conversation with John W. Tukey and Elizabeth Tukey, Luisa T. Fernholz and Stephan Morgenthaler, Statistical Science Volume 15, Number 1 (2000), 79-94.
^ Konold, C. (1999). Statistics goes to school. Contemporary Psychology, 44(1), 81-82.
^ "Exploratory data analysis is an attitude, a flexibility, and a reliance on display, NOT a bundle of techniques, and should be so taught.", John W. Tukey, We need both exploratory and confirmatory, The American Statistician, 34(1), (Feb., 1980), pp. 23-25.
References
Leinhardt, G., Leinhardt, S., Exploratory Data Analysis: New Tools for the Analysis of Empirical Data, Review of Research in Education, Vol. 8 (1980), pp. 85-157.
External links
KNIME, the Konstanz Information Miner (open-source data exploration platform that also integrates other popular frameworks such as Weka and R)
Visalix (free interactive web application for EDA)
DataDesk (free-to-try commercial EDA software for Mac and PC)
GGobi (free interactive multivariate visualization software linked to R)