Jan 31 2003
A little lesson
Actually, two.
What I’ve been doing, in English
First, a bit about what I’ve been working on: trees. No, I haven’t become an arborist, but I have become interested in determining how different two programs are. One way to get at this is to look at the source code for the program, and treat it as a string. You then ask “how different are the two strings?”
Or, put in plain English, I’m asking (essentially) how different two words are.
Consider the words “god” and “good”. These words are separated by a distance of 1. One insertion needs to be made to “god” to turn it into the word “good.” For another example, consider the words “bird” and “fish”. These words are separated by a distance of three: the ‘b’->’f', ‘r’->’s’, and ‘d’->’h’. The ‘i’ is identical in both words.
Using this notion of ‘different,’ I can take the source code for a program and see how much editing has been done. But, this ignores the structure of the program. For example, I could take two essays, and look for how much has changed, but if I ignore the fact that the essay has structure (paragraphs, sentences, etc.), then I’m really ignoring an important part of the document. Programs are the same way–they have sentences and paragraphs as well, and those differences have a lot of meaning in terms of the way a program executes.
To get at the problem of structure, you have to consider the distance between words again, but it needs to be more complex. Instead of two words (like “bird” and “fish”), consider two crossword puzzles. How can you tell how different two (completed) crossword puzzles are?
For example, how different is the puzzle
f bird s h | and |
t bard r p |
Ahh. That’s hard.
A lesson (re)learned about research
The algorithms (recipes, directions) employed in comparing two crossword puzzles are not simple. In researching these algorithms, I asked some friends at Indiana where to start (not being familiar with the field), and then immediately began tracing the evolution of ideas in this area to the present. I looked for the most recent papers on the subject.
Typically, you go back to seminal papers to understand a subject; because I wanted to write fast, working code now, I went for the most recent work in the field. As a result, I spent a week frustrated, reading breadth-first across the field in the present, as opposed to a depth-first dig through the past.
This week, I went back to ‘83 and ‘93. Those were the good years–the algorithms weren’t as fast, but they weren’t poor either. And, they were easier to understand; there are fewer tricks at work in ‘93.
So, here, at the end of the week, I think I have in place all the tools to make comparisons between student code at two different levels–the word level, and the structural level.
More on why later.
| Do you care? No! Currently, I’m listening to Heart of Rock and Roll by Huey Lewis And The News |
Comments Off
