Monday, December 10, 2007

Googleplot and recommendation systems

Daniel Lemire finds recommendation systems fascinating (I can't bring myself to call them "recommender systems", since they seem to me to be targeted more at recommendees), though he is less than impressed by their implementations, such as Google's PageRank. Personally, I don't need more sources of information; I need lower volume and a greater signal-to-noise ratio. A system that recommends that I not read something -- now that would be valuable (perhaps you're thinking the same thing right about now).

But he's right about one thing: Google's Chart API is the rebirth of gnuplot. Except that it's not really open source. Anyone want to write a MATLAB function to export a graph as a Google chart?
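For the curious, here is a rough sketch of what such an export could look like, written in Python rather than MATLAB. It assumes the image-chart endpoint and the `cht`, `chs`, and `chd` URL parameters as Google documented them at the time; the function name `google_chart_url` and its defaults are made up for illustration. The service has since been deprecated, so treat the resulting URL as illustrative rather than something guaranteed to render.

```python
from urllib.parse import urlencode

# Assumed endpoint of the image-chart service as it was documented in 2007.
CHART_BASE_URL = "http://chart.apis.google.com/chart"


def google_chart_url(values, width=250, height=100):
    """Build a line-chart URL from a sequence of numbers (hypothetical helper)."""
    if not values:
        raise ValueError("need at least one value")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant data
    # The text encoding ("t:") expects numbers between 0 and 100,
    # so rescale the data to that range.
    scaled = [round(100.0 * (v - lo) / span, 1) for v in values]
    params = {
        "cht": "lc",                 # chart type: line chart
        "chs": f"{width}x{height}",  # image size in pixels
        "chd": "t:" + ",".join(str(s) for s in scaled),
    }
    # Leave ':' and ',' readable instead of percent-encoding them.
    return CHART_BASE_URL + "?" + urlencode(params, safe=":,")
```

Calling `google_chart_url([0, 5, 10])` yields a URL whose `chd` parameter is `t:0.0,50.0,100.0`. Note that the rescaling throws away the original axis range, so a fuller version would also need to pass axis labels.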


  1. Recommending that you *not* look at something is called "blackballing" and we discussed it here:

    Collaborative Filtering and Inference Rules for Context-Aware Learning Object Recommendation

    I guess that a related topic is spam detection.

    As for not needing more data sources... who does? But what you can hope to achieve is to replace existing sources with better sources. Also, maybe just changing your data feed might be helpful. For example, you may decide that you have learned everything you needed to learn from D. Lemire and you now want to try your luck with another dummy.

  2. I'll have to read your paper. Not having read it, I think that there's a difference between what I'll call "negative filtering"/"positive filtering" and spam filtering. Almost everyone would agree on what spam is, but there's a sea of information that is legitimately useful and interesting, just not for me. And that changes from person to person. And it's often based on the information itself, not necessarily the source.

    But in terms of recommendation, I'm skeptical that an observation like, "oh, you've been interested in that stuff in the past, and this other person has been interested in similar things, so maybe you'll be interested in other stuff that person liked/read/whatever," will be useful. Too context-dependent, too time-dependent, and the source data itself, in terms of past behaviors, contents of collections, etc., seems too noisy to me.

    But, here I am commenting on something I know little about, while others have actually spent time thinking about it. Doing my part to lower the net's SNR.