May 27 2005

Stats, continued

Published by matt at 17:12 under

I think there’s a bit of context required here.

The statistical part of my dissertation is not the most important part. The gross statistical investigations I carried out earlier, while I was collecting this data, were inexpert, but they were descriptive in nature and sufficiently clear to guide further investigation.

As I begin assembling this data, I’ve become… concerned with how it is presented. This concern stems from a number of things. First and foremost, I want to do a thorough job of presenting the data. Second, I think the data needs to be seen and understood by more teachers of programming, as I think it presents part of a story of how students learn to program that hasn’t been told in this way before. Third, but unrelated, is the fact that I don’t like not understanding things: in this case, I am not a statistician, and it is frustrating knowing there are tools out there that I don’t know how to use.

I’ve come to appreciate that statistics involves as much process (design and implementation) as architecting software. Exploratory data analysis might be likened to refactoring a system that already exists: something is there for you to touch and explore, but the goal is to make sense of it and restructure it so that you and others can understand it.

Or perhaps not. Regardless, I’m going to focus on doing the kinds of explorations I understand, and I’ll leave the inferential statistics and modeling until later. For now, I want to get a sense for the breadth and depth of my full data set, and I’ll circle back around to fill in holes later.
