Microsoft’s Powerset Acquisition

There has been a lot of speculation surrounding the value of Powerset’s technology, and the Microsoft acquisition of the company.

Last week’s TechCrunch interview with Barney Pell and Ramez Naam contains a lot of insight about the technology and the acquisition.

It is true that building a search index by parsing and understanding web pages is more expensive than building a keyword index. Scaling up from Wikipedia to the whole web will require a lot of resources, which the Microsoft acquisition provides.

I was compelled to reply to a recent blog post on ZDNet claiming that Microsoft acquired Powerset solely to enhance its advertising offerings, not search. This is pure speculation. As far as I know, and as far as the TechCrunch article indicates, Powerset under Microsoft will still be doing search.

Keeping up with Technology

Often when I get involved in some new piece of technology, I am doing it not just for some specific reason, but also to learn about the technology. I don’t want to be left behind.

For example, setting up this website, I had to learn about hosting, setting up a blog (I already have several but WordPress is more… configurable), and, it turns out, style sheets.

The style sheet aspect was interesting: I wanted to use a “glow seed” photo I took as a header on each page to give the site a consistent look and feel (read – to look cool) and so I started digging in some online HTML documentation to find out what the right IMG tag was. It is probably 10 years since I did any serious HTML editing. I found the description of the IMG tag in the documentation and noticed “this feature is deprecated – use style sheets”. Wonderful – I have to learn about style sheets.

Luckily, my MacHeist bundle had, amongst its goodies, CSSEdit. This made it very easy to get started and roll a very simple first style sheet.

OpenCalais, Tagaroo WordPress Success

One of the reasons I got a blog on WordPress (actually, I like the feel better than Blogger) was to try Tagaroo to tag entities in my posts, which I think is an important stepping stone to more semantic search.

Unfortunately, installing Tagaroo seems to require that you have your own WordPress 2.3 or 2.5 installation, and I don’t have that. I’m considering getting a hosting solution, but then I might lose the awesome name of this blog, and moving is always a pain. Still, better sooner than later.

And so… now I have a new, shiny, proper blog, and I could even import all the old musings. I even managed to set up Tagaroo!

 

Open search

John Battelle’s post on open search stimulated me to distill some of my thoughts about open search. Here’s what I wrote in a comment to his post:

The later in the process of serving search results that a third-party component can make changes, the less leverage it has, and therefore the smaller the business opportunity.

What will really change the game is to allow third parties to add features to the index, and access those features during search. Ensuring that different contributions to the search engine did not fight each other would be a challenge which would need to be addressed upfront, but the opportunities would be amazing.
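To make that idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `OpenIndex` class, its method names, and the contributor names are invented for illustration and do not describe any real search engine. It assumes one simple way to stop contributions from fighting each other, namely giving each contributor its own namespace for feature names.

```python
from collections import defaultdict


class OpenIndex:
    """Toy index that third parties can annotate (hypothetical design)."""

    def __init__(self):
        # doc_id -> {(contributor, feature_name): value}
        self.features = defaultdict(dict)

    def add_feature(self, contributor, doc_id, name, value):
        # Keying on (contributor, name) means two contributors can both
        # define a feature called "topic" without overwriting each other.
        self.features[doc_id][(contributor, name)] = value

    def search(self, contributor, name, value):
        # Return the ids of documents where this contributor's feature
        # has the requested value, in sorted order.
        key = (contributor, name)
        return sorted(
            doc_id
            for doc_id, feats in self.features.items()
            if feats.get(key) == value
        )


index = OpenIndex()
index.add_feature("acme", "doc1", "topic", "search")
index.add_feature("globex", "doc1", "topic", "acquisitions")
index.add_feature("acme", "doc2", "topic", "search")
acme_hits = index.search("acme", "topic", "search")
globex_hits = index.search("globex", "topic", "acquisitions")
```

A real system would also have to decide how contributed features affect ranking, which this sketch does not attempt.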

Search is opening up in a fascinating way. Powerset (now part of Microsoft) is part of that, moving beyond the five blue links to richer forms of presentation, and beyond keywords to a much richer index.

Powerset acquired by Microsoft

It was announced yesterday that Microsoft is acquiring Powerset.

This is a great deal for Microsoft – Powerset’s natural language technology really can give Microsoft a significant bump in the quality of its search results. Parsing the web is computationally expensive, but Microsoft has the resources to do it. These computational resources, and all the other significant resources we now have access to at Microsoft, will allow Powerset to advance its goals, so this is a great deal for Powerset too.

Big changes like this introduce

Things Missing From (SICStus) Prolog

Prolog is a great language. I love it.

Recently I have been writing code which uses a very large number of facts: hundreds of thousands, or even millions. This code often turns out to be slow, and the algorithms need to be obfuscated somewhat to speed them up.

A few additional features for SICStus could help this situation significantly:

– an interface to MySQL
– facility to allow predicates to index on any or all arguments
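To show what indexing on any or all arguments would buy, here is a minimal sketch, written in Python rather than Prolog so the bookkeeping is explicit. It is not how SICStus or any other Prolog system works internally; the `FactStore` class and its methods are hypothetical names made up for this example. The assumption is that each argument position gets its own hash index, so a query with any argument bound avoids scanning every fact.

```python
from collections import defaultdict


class FactStore:
    """Facts for one predicate, hash-indexed on every argument position."""

    def __init__(self, arity):
        self.arity = arity
        self.facts = []
        # One index per argument position: value -> list of fact ids.
        self.indexes = [defaultdict(list) for _ in range(arity)]

    def add(self, *args):
        if len(args) != self.arity:
            raise ValueError("expected %d arguments" % self.arity)
        fact_id = len(self.facts)
        self.facts.append(args)
        for pos, value in enumerate(args):
            self.indexes[pos][value].append(fact_id)

    def query(self, *pattern):
        """Return matching facts; None plays the role of an unbound variable."""
        if len(pattern) != self.arity:
            raise ValueError("expected %d arguments" % self.arity)
        bound = [(pos, v) for pos, v in enumerate(pattern) if v is not None]
        if not bound:
            return list(self.facts)
        # Start from the bound argument with the fewest candidate facts,
        # then check the remaining bound arguments against each candidate.
        pos, value = min(
            bound, key=lambda pv: len(self.indexes[pv[0]].get(pv[1], ()))
        )
        candidates = self.indexes[pos].get(value, ())
        return [
            self.facts[i]
            for i in candidates
            if all(self.facts[i][p] == v for p, v in bound)
        ]


# Example facts, equivalent to parent(tom, bob). parent(tom, liz). parent(bob, ann).
parent = FactStore(2)
parent.add("tom", "bob")
parent.add("tom", "liz")
parent.add("bob", "ann")
children_of_tom = parent.query("tom", None)
parents_of_ann = parent.query(None, "ann")
```

The second query is the interesting one: with indexing on the first argument only, which is the common default in Prolog systems, finding the parents of `ann` would mean trying every `parent` fact. The cost of the extra indexes is memory and slower insertion, which matters with millions of facts.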

Wikipedia readers

I have noticed a lot of Wikipedia reading and searching applications cropping up on the web recently.

ReadWriteWeb recently published a list of top 10 ways to search Wikipedia, but some things are missing, so here is my list:

The mental wanderings of Julian Richardson