Monday, March 21, 2005

Remote device fingerprinting -- a new privacy concern

The linked paper describes a method for uniquely identifying computers from remote locations. This device "fingerprinting" relies on identification of clock skews -- the difference in the rate of "ticking" of a computer's internal clock versus some reference clock -- based on time stamps incorporated into the low-level packets of information that make up internet communications. It turns out that this skew is, to a great extent, unique to each combination of machine and operating system and is reasonably constant despite geographic location and network connectivity. This approach would allow a web site, for instance, to identify client computers without the use of cookies. It would also allow anyone who can attach a computer to a network backbone to scan for a set of computers that are under surveillance. Shades of Carnivore.
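To make the idea of a clock skew concrete, here is a minimal sketch of how an observer might estimate one. It assumes the observer has already collected pairs of (local receive time, remote timestamp), both in seconds. The function name and the plain least-squares line fit are my own simplifications for illustration; the paper's actual fitting procedure may differ, and real measurements would also have to cope with network delay jitter.

```python
def estimate_skew_ppm(local_times, remote_timestamps):
    """Estimate a remote clock's skew in parts per million (ppm).

    Fits a straight line (ordinary least squares) to the offset between the
    remote clock and the local clock as a function of local time. The slope
    of that line is the skew. Hypothetical helper, not the paper's method.
    """
    n = len(local_times)
    if n < 2 or n != len(remote_timestamps):
        raise ValueError("need at least two paired observations")

    # Work relative to the first observation so only the drift remains.
    t0, r0 = local_times[0], remote_timestamps[0]
    xs = [t - t0 for t in local_times]
    ys = [(r - r0) - x for r, x in zip(remote_timestamps, xs)]

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("local times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Slope is seconds of drift per second; scale to ppm.
    return (sxy / sxx) * 1e6


if __name__ == "__main__":
    # Simulated remote clock that runs 50 ppm fast, sampled for an hour.
    local = [float(t) for t in range(0, 3600, 10)]
    remote = [1000.0 + t * (1 + 50e-6) for t in local]
    print(estimate_skew_ppm(local, remote))
```

Two machines whose estimated skews differ by more than the measurement noise would be told apart by this kind of fit, which is the essence of the fingerprint.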

The article ends with the following: "Our results compellingly illustrate a fundamental reason why securing real-world systems is so genuinely difficult: it is possible to extract security-relevant signals from data canonically considered to be noise. This aspect renders perfect security elusive, and even more ominously suggests that there remain fundamental properties of networks that we have yet to integrate into our security models." Those of us concerned with our privacy take great care to ensure, for example, that we only allow certain sites to store cookies in our web browsers and that each cookie can only be retrieved by the storing site. In that respect, cookies are fairly privacy/security benign. Methods like those in this paper will require explicit counter-measures. For this particular approach, it appears (judging from the results in Table 5 of the paper) that software can be used to alter the skew, randomizing it to foil fingerprinting. But I wonder about other measurable patterns of computer network activity. It might even be possible to use network activity patterns to identify users, even if they switch computers.

Saturday, March 19, 2005

The end of my portable computing problems?

I'm in a bit of a bind regarding portable computing devices. My Powerbook is too big to lug around everywhere. I have a Sharp Zaurus, which is pretty much a full-featured Linux machine in a PDA package. It's still a bit big; I have to carry it around in a belt holster. I have a Sony Ericsson T637 cell phone. It's very small; fits in my pocket. It has alarms and a phone book, but is only about 75% of the way there as far as PDA functions. Part of the reason the cell phone isn't a sufficient PDA platform is software, but part is hardware: a tiny display and painful input method. Given the hardware limitations, I don't blame Sony for not including more robust software.

Well, one part of the hardware problem appears to be solved in the near future. The linked New York Times article describes flexible display technology that Philips Polymer Vision will be bringing to the market within a couple of years (they apparently will be showing a prototype device in a couple of months). Basically, these (initially 5" diagonal) displays can be rolled up inside the body of a device, where a small portion could be exposed to view. You can then pull the full display out when you need it. Finally, a decent-sized display that fits in a (normal) pocket.

Thursday, March 10, 2005

Maser developer has difficulty understanding conditional probability

I heard the interview linked from this post's title on Morning Edition on National Public Radio this morning, and for a bit, thought I was still semi-conscious. The story was about Dr. Charles Townes who, along with Dr. Arthur Schawlow, developed the maser (microwave amplification by stimulated emission of radiation) -- the longer-wavelength precursor to the laser. Dr. Townes shared the 1964 Nobel Prize in Physics for this work.

Now, what stood out as unusual was that the story was about his receiving the "Templeton Prize" for his work in the field of religion.

This prize is awarded by the John Templeton Foundation, an organization devoted to the intersection between science and religion. I've looked through their web site, and they don't seem on the surface to be religious right crazies (if someone ever reads this and knows more, please drop me a line or add a comment). According to this June 2004 article, Townes is on the Templeton Foundation Board of Advisors.

What stuck out in my mind was Townes' comments on what one might call "cosmological intelligent design" (see the last paragraph of this Cornell news story). Basically, like aficionados of "intelligent design" as a stealth way of inserting religion into the biology curriculum, he reasons that the universe must have been designed with life in mind, because it is so improbable that the universe's physical laws would turn out "just right" for life to arise. In fact, it is true that physicists don't currently know why various underlying physical constants have the values they do, though only fairly narrow ranges would produce universes suitable for life.

This is an old argument against science in general and evolution in particular: that since we can't explain everything right now, that must mean that God set things up. Besides the logical fallacy, there is a fundamental probabilistic fallacy in this. Let's imagine that many, many universes could be created, and only a small fraction would have physical laws suitable for life. To make things concrete, I'll use 0.0001 as a small probability, but any number would work. So, the probability of a suitable universe is P(S) = 0.0001. Let's also assume that, even if the universe is suitable, the probability of life arising is small, too (again, the number actually doesn't matter). This is a conditional probability, the probability of life given we know the universe is suitable, P(L|S) = 0.0001. We also know that life cannot arise if the universe is not suitable, P(L|~S) = 0. So, the a priori probability of life occurring in a universe, before we checked to see if the universe is suitable, is P(L) = P(S) P(L|S) = 10^-8.

What we want to know is the probability that the universe is suitable, given that there is life in it. This is exactly what Townes is having a problem with -- we've already seen that P(S) and P(L) are very small, so there's no way that this suitable universe could have happened "by chance". (Of course, we don't know that it was by chance, but that's beside my point here.) Actually, however, that is not true -- given that we exist (otherwise, we wouldn't be around to be amazed by this improbable-seeming universe), the probability that the universe is suitable is 100%, P(S|L)=1. There are a variety of ways of arriving at this; I'll use Bayes' rule (not the simplest way to do it), but it's really a tautology of the math.

Bayes' rule allows us to reason from evidence back to cause, given that we understand ahead of time how the cause relates to the evidence. For our suitable universe, the rule is:
P(S|L) = P(L|S) P(S)/P(L) = (0.0001)(0.0001)/(10^-8) = 1

As we can see now, the numbers don't matter because P(L) = P(L|S) P(S), which is true because P(L|~S) = 0. In other words, of course we see that the universe is suitable for life; if it weren't suitable, we wouldn't be here! The same reasoning can be applied to evolution. It doesn't matter how improbable the evolution of intelligence is; the fact is that only intelligent life can ponder such matters, and so all such conditional probabilities are actually certainties.
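For anyone who would rather check the arithmetic than take my word for it, here is a small sketch of the same Bayes' rule calculation. The function name and its arguments are mine, chosen just for this illustration; the third argument defaults to P(L|~S) = 0, the assumption made above.

```python
def posterior_suitable_given_life(p_s, p_l_given_s, p_l_given_not_s=0.0):
    """Return P(S|L) via Bayes' rule.

    p_s              -- P(S), prior probability that a universe is suitable
    p_l_given_s      -- P(L|S), probability of life in a suitable universe
    p_l_given_not_s  -- P(L|~S), assumed 0 in the argument above
    """
    # Total probability of life: P(L) = P(L|S) P(S) + P(L|~S) P(~S)
    p_l = p_l_given_s * p_s + p_l_given_not_s * (1.0 - p_s)
    if p_l == 0:
        raise ValueError("P(L) is zero, so P(S|L) is undefined")
    return p_l_given_s * p_s / p_l


if __name__ == "__main__":
    # The numbers used in the post, then a very different pair.
    print(posterior_suitable_given_life(0.0001, 0.0001))
    print(posterior_suitable_given_life(1e-9, 0.3))
```

Whatever small values are plugged in for P(S) and P(L|S), the result stays at 1 as long as P(L|~S) is 0. Only if life could somehow arise in an unsuitable universe would the posterior drop below certainty.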

Note added 3/11/2005: There's a great FAQ on the "Anthropic Principle" that you should read, if you're interested in this sort of thing.

Monday, March 07, 2005

More cool software: Doxygen

Since I teach my students to document their code well, and since I've had long experience of not being able to make heads or tails of my own code after more than a couple days, I've been working on making my own code's documentation more useful. I had been using Apple's HeaderDoc software, which converts JavaDoc-like comments in code to HTML. It's a bit clunky, but OK. Then, I tried Doxygen, a truly incredible package. First off, it will do almost everything without special markup: UML class diagrams, class hierarchies, call graphs, header include dependency graphs, cross references to called functions, etc. So you can use it to extract code structure to reverse engineer and reuse undocumented source files. Using the Graphviz package, it generates beautiful graphs. If you have comments before each class, method, and variable, all that will automatically get included in the generated documentation. If the comments include, for example, XML, it will detect that XML's structure and format it with appropriate indentation. You can also include a wide range of Doxygen-specific markup to create explicit links, file dates and versions, numbered or bulleted lists, mathematical formulas, etc. And, while I'm using it for C++, it works for many different input languages.
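As a rough illustration of the "comments before each class, method, and variable" idea, here is a small sketch in Python rather than the C++ I actually use. It assumes a Doxygen version with Python support, where comment blocks starting with "##" are treated as documentation and commands such as @brief, @param, and @return are recognized. The module and class are made up for the example.

```python
## @file shapes.py
#  @brief Hypothetical module used only to illustrate Doxygen comment blocks.


## Axis-aligned rectangle.
#
#  Doxygen attaches the comment block that immediately precedes a class,
#  method, or variable to that item in the generated documentation.
class Rectangle:
    ## Create a rectangle.
    #  @param width  Width in arbitrary units.
    #  @param height Height in arbitrary units.
    def __init__(self, width, height):
        ## Width of the rectangle.
        self.width = width
        ## Height of the rectangle.
        self.height = height

    ## Compute the area.
    #  @return The width multiplied by the height.
    def area(self):
        return self.width * self.height
```

In C++ the same structure would typically use `/** ... */` or `///` comment blocks in front of each declaration.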

Note that I haven't mentioned output format. That's because Doxygen will output just about anything you'd want: HTML (with and without frames), LaTeX, RTF (for MS Word), PostScript, PDF (with hyperlinks), compressed HTML (for Windows Help), and Unix man pages. The original source code can be included with the documentation, which means the HTML output is really an absolutely complete documentation web site for the project. The output can be completely customized to "brand" the documentation as you like. The only downside I see is that you must rebuild all of the documentation, rather than just the documentation for a changed file (probably unavoidable, given that the software automatically determines and documents interdependencies). If you write code, I think you'll agree that Doxygen will make your work look good.

Thursday, March 03, 2005

Cool software: SnipSnap

I've been contemplating the use of blogs and their ilk for something other than feeding my egotism. More specifically, one of the weaknesses of the blog format is that it is only organized chronologically. Yes, you can create indices in some blog software, and since every post has a permanent link, you can freely interlink them. But, generally speaking, there's a fair amount of effort involved in maintaining those additional links, plus the time needed to administer your own blog software (which is why I'm using Blogger).

Well, I ran across some software this week that fixes most of these issues: SnipSnap.

SnipSnap is a combination of blog and wiki software. If you're not familiar with wiki software, you can think of it as very flexible, simple knowledge/content management software. Articles placed in the wiki can be interconnected based on their titles (more or less), hierarchically, and by assigned labels (at the least). Articles placed in the blog are organized chronologically and also by title and label. Wiki software emphasizes customizability, with users typically empowered to edit most any document and even to change the site's organization. Everything can be stored in a backend database. Anyway, if you already knew about wikis then the foregoing sounded pretty lame; if you didn't, it may have piqued your interest.

The great things about SnipSnap are that it is cross platform (written in Java) and incredibly easy to install and get up and running. If you don't want to customize it much, it literally takes about five minutes (at most) from binary installation to up and running. I installed it on my home file server and spent more time playing with it to have it place its data where I wanted it, run under a non-privileged user, and set up an ssh tunnel through the firewall. As I evaluate how it fits into our workflow, my wife and I will use it to store recipes, notes about the kids' education (homework, tests, etc.), various records (for example, notes from doctors' visits), software development notes (UML diagrams, debugging and design notes), research lab book entries (what was done, results), etc. I'd eventually be interested in incorporating it into my research lab. There are only two things missing that I might want: real calendaring (schedule future events, set notifications, later edit them to note results -- all with the usual organization and linking ability) and more comprehensive security options (option to log in before seeing anything, per-item or hierarchy-level access control). Some aspects of these may already exist as contributed plugins; I haven't had time to look into everything about it yet (it's easy enough to get going that there has been no need to). And it's open source.