LSI, true or lie?

I’ve mentioned a number of times that five years ago, in the second edition of my book, I wrote about latent semantic indexing (LSI) and referred to some of the papers available at the time. Susan Dumais of Microsoft (just search for her) was the inspiration for my research into the subject. However, the most important paper I read was by another well-known researcher (now attached to Google), about LSI as applied to link anchor text.

Of course, if you take the whole notion of LSI and apply it on a document-by-document basis, it’s going to take a lot of processing time. But if you apply it to a separate corpus built entirely from link anchor text and a connectivity server, then perhaps you reduce the overhead. I’ll defer to Dr Edel Garcia to correct me here if I’m wrong. But local ranking using link anchor text is something I believe all search engines rely on to reduce overhead anyway.
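To make the "separate corpus" idea concrete, here is a minimal sketch of what building one from anchor text might look like. The link records, the URLs and the `build_anchor_corpus` helper are all made up for illustration; this is not how any particular search engine or connectivity server stores its data.

```python
from collections import defaultdict

# Hypothetical (source_url, anchor_text, target_url) records, the kind of
# data a link graph could supply. All values are invented for illustration.
links = [
    ("a.example/1", "cheap car insurance", "insurer.example"),
    ("b.example/2", "automobile cover", "insurer.example"),
    ("c.example/3", "banana bread recipe", "baker.example"),
]


def build_anchor_corpus(link_records):
    """Group the anchor-text words of all inbound links by target page."""
    corpus = defaultdict(list)
    for _source, anchor, target in link_records:
        corpus[target].extend(anchor.lower().split())
    return dict(corpus)


anchor_corpus = build_anchor_corpus(links)
for target, words in anchor_corpus.items():
    print(target, words)
```

Each target page ends up represented by a handful of words other people used to describe it, which is a far smaller thing to analyze than the full text of every page.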

LSI has its place. PageRank has its place. On a page-by-page basis they would work. But in my opinion, neither has developed to the point of delivering a sub-second response. Does that mean search engines can’t use them? Not at all. There are elements of every known discipline in science that you can adapt and apply in different ways. I believe the co-occurrence theories of Dr Edel Garcia are much more applicable to a heterogeneous corpus such as the web and link anchor text. In a homogeneous environment such as a digital library or a CD-ROM, on the other hand, where response time isn’t critical, I believe LSI is an ideal method of not missing relevant documents simply because they didn’t have all of the query keywords on the page.
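For anyone who wants to see the "not missing relevant documents" point in action, here is a toy sketch in plain Python. The five documents and the query are invented, and the power iteration with deflation is only a stand-in for a proper SVD routine so that the example needs no libraries. It keeps two "concept" dimensions (the top two left singular vectors of the term-by-document matrix) and compares the query with each document in that reduced space. Treat it as an illustration under those assumptions, not as anyone’s production method.

```python
import math

# Toy corpus (hypothetical). Document 2 is about cars but never says "car".
docs = [
    "car engine",
    "car automobile engine",
    "automobile engine",
    "banana fruit",
    "fruit banana smoothie",
]
query = "car"

vocab = sorted({t for d in docs for t in d.split()})


def term_vector(text):
    """Raw term counts over the shared vocabulary."""
    words = text.split()
    return [float(words.count(t)) for t in vocab]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def top_eigenvectors(m, k, iters=300):
    """Top-k eigenvectors of a symmetric matrix: power iteration + deflation."""
    m = [row[:] for row in m]  # work on a copy
    vecs = []
    for _ in range(k):
        v = normalize([1.0] * len(m))
        for _ in range(iters):
            v = normalize([dot(row, v) for row in m])
        lam = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient
        vecs.append(v)
        for i in range(len(m)):  # remove the component just found
            for j in range(len(m)):
                m[i][j] -= lam * v[i] * v[j]
    return vecs


doc_vectors = [term_vector(d) for d in docs]
# Term-term matrix A A^T, where A is the term-by-document count matrix.
# Its eigenvectors are the left singular vectors of A.
gram = [[sum(dv[i] * dv[j] for dv in doc_vectors) for j in range(len(vocab))]
        for i in range(len(vocab))]
concepts = top_eigenvectors(gram, k=2)


def to_concept_space(vec):
    return [dot(vec, c) for c in concepts]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


q = to_concept_space(term_vector(query))
lsi_scores = [cosine(q, to_concept_space(dv)) for dv in doc_vectors]
keyword_hits = [i for i, d in enumerate(docs) if query in d.split()]

print("keyword match:", keyword_hits)
for i, score in enumerate(lsi_scores):
    print(i, round(score, 3), docs[i])
```

A plain keyword match for "car" finds only the first two documents. In the reduced space, "automobile engine" should score close to 1 against the query as well, because "automobile" keeps the same company as "car", while the fruit documents should score close to 0. Notice, too, how much arithmetic even this tiny corpus takes, which is the processing overhead I mentioned earlier.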

Can you optimize, in the SEO sense, around LSI? Well, first you’d need to know what it is. And I have no idea why SEOs who don’t know what it is resort to dropping it into their promotional blurb or vanity posts, giving the industry such a bad name at times.

If you know what it is and have a method of optimizing around it, step forward now. Now is the time to prove you’re not just doing the usual thesaurus lookup. And if you don’t know what it is and you’re just talking about it… then shut up!

My ClickZ column invites anyone to take part in a little survey about what is, and what isn’t, LSI. What do you think?

