Wednesday, October 10, 2007

Measuring science

This post was inspired by an excellent one by GrrlScientist, linked from the title above. She starts off discussing journal impact factors, which are a measure of the average number of times a paper in a journal is cited by others. Then there's what is essentially a personal impact factor, which is the number of times a particular researcher's papers are cited. These have problems, which the H-index is meant to address. Briefly, a person has an H-index of h if h is the largest number for which he or she has at least h papers each cited at least h times. So, if I have 100 papers, each cited once, then I have an H-index of 1. If 99 are cited once and one is cited 43,000 times, my H-index is still 1. If 95 are cited once and the remaining 5 are each cited at least 5 times, then I have an H-index of 5. And so on.
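For anyone who would rather see that definition as code, here is a minimal sketch in Python. The function name `h_index` and the list-of-citation-counts input are my own choices for illustration, not anything taken from Google Scholar or another citation service; the three calls at the bottom are just the examples from the paragraph above.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each.

    `citations` is a list with one citation count per paper (an assumed
    input format, chosen only for this illustration).
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only go down from here, so h cannot grow further
    return h


# The three examples from the text:
print(h_index([1] * 100))            # 100 papers, each cited once -> 1
print(h_index([1] * 99 + [43000]))   # one blockbuster paper, 99 cited once -> 1
print(h_index([1] * 95 + [5] * 5))   # five papers cited five times each -> 5
```

Note how the 43,000 citations in the second example contribute nothing beyond the first one: the index only counts papers that clear the bar, not by how much they clear it.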

So, first of all, there is the question of gaming the system. It's unlikely that I can convince 43,000 of my colleagues to cite one of my papers (but, if you'd like, pick one from my CV on my UW home page and cite away). But if I'm only shooting for, say, an H-index of 20 or so, then that might be doable. Supposedly, people do try to game the system by doing things like citation swapping, though this seems to me to represent time better spent being a more productive researcher (rather than just trying to look more productive or impactful).

Though I may be unconvinced about the effects of such gaming, I see this as a fatal flaw of any attempt to extract a simple metric from the interrelationships among such publications. Just look at how much effort Google has expended on providing good search results. Since these results are presented in a sequence, presumably from most relevant (or "best") to least, they have been implicitly assigned a single measure. And there's a cottage industry surrounding pushing sites' ratings up that has nothing to do with their content. I'll come back to this idea of creating a one-dimensional ordering later.

To me, there's another problem with metrics such as this. Let's say that my H-index is 11, as computed using Google Scholar. Furthermore, let's assume that issues such as self-cites (citing one's own work) and co-cites (citing of one's work by collaborators; I'll revisit this topic, too) don't affect rankings (these may be invalid assumptions). There's still one problem: is an H-index of 11 good? Bad? Middling? If we read Wikipedia, we learn, "In physics, a moderately productive scientist should have an h equal to the number of years of service while biomedical scientists tend to have higher values."

But what about computer scientists? We could consult a listing like the CS Meta H index. We would then have to compare my H-index with other faculty at similar stages in their careers who are working at similar institutions and who have had roughly similar career paths. Unfortunately, that information isn't in the index. We need to know a lot about different universities, different CS departments, and individual faculty. Maybe it would just be easier to read one or two of my papers and judge for yourself.

Coming back to the subject of co-cites, this could be considered a sign of an attempt to game the system. On the other hand, it would make more sense for me to make gaming arrangements with colleagues with whom I have no direct professional connection. (Hmm. Three more strategically placed citations will get me to an H-index of 12; five more in just the right spots will get me to 13.) But what about people who collaborate widely? Their papers will have lots of co-cites, but their work will also be more broadly influential because of all that collaboration. So, when I've prepared materials for external review, I always separate out the co-cites. Make of them what you will.
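The parenthetical arithmetic above (how few well-placed citations it takes to bump an H-index) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `citations_to_reach` is hypothetical, and the citation counts below are made up for the example; they are not my actual record.

```python
def citations_to_reach(citations, target_h):
    """Minimum number of additional citations needed for an H-index >= target_h.

    Returns None if there are fewer than target_h papers, since no number
    of extra citations can help in that case.
    """
    if target_h > len(citations):
        return None
    # The cheapest route is to top up the target_h most-cited papers
    # until each of them has at least target_h citations.
    top = sorted(citations, reverse=True)[:target_h]
    return sum(max(0, target_h - count) for count in top)


# Hypothetical citation counts (invented for illustration); H-index is 3.
record = [10, 8, 5, 3, 3, 1]
print(citations_to_reach(record, 4))  # 1: one more cite for a 3-cite paper
print(citations_to_reach(record, 5))  # 4: two more for each 3-cite paper
print(citations_to_reach(record, 7))  # None: only six papers exist
```

The small numbers are the point: near the threshold, a handful of citations in "just the right spots" moves the index, which is what makes this kind of gaming at least conceivable.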

The desire to create this scalar (one-dimensional) metric of scholarship is a natural one. When I look at the complex dynamical behavior of a neural network, one of the first things I want to do is extract a single measure to characterize that behavior, so that I can then more easily examine how behavior depends on various parameters. But I have a very carefully defined question in mind when I do this. When we measure science, what is our question? Are we asking if a particular scientist is "good"? What is good? Does it mean that the scientist's work has impact in the field? How can we really ascertain this without understanding the field and the scientist's contributions in that context?

Einstein had four papers that changed the field of physics forever. But that's just an H-index of 4. I was discussing this with one of my colleagues, however, and his opinion was that 4 was a reasonable assessment of Einstein, and that we should want to hire and promote scientists who are consistently productive, not ones who have one brilliant flash of insight and then nothing approaching that for the rest of their lives. But how can we tell the difference between consistent, quality productivity and a laser-like focus on getting out each least publishable unit? To me, the only solution is knowing the person; we can't reduce the behavior of that large a neural network to a single useful measure.

1 comment:

  1. Hi Professor,

    Biases aside, how do you think the UWB CSS program holds up to the rest of the nation or UW Seattle CS? Also, why does UWB call its program CSS rather than CS?

    UW Bothell campus is known, somewhat, locally; however, outside of Washington state I don't think the UWB name means much to anyone. How will it affect those who graduate from there and want to attend graduate school or work for corporations outside of the state?