There has been a lot of speculation surrounding the value of Powerset’s technology, and the Microsoft acquisition of the company.
Last week’s TechCrunch interview with Barney Pell and Ramez Naam contains a lot of insight about the technology and the acquisition.
It is true that building a search index by parsing and understanding web pages is more expensive than building a keyword index. Scaling up from Wikipedia to the whole web will require a lot of resources, which the Microsoft acquisition provides.
I was compelled to reply to a recent blog post on ZDNet speculating that Microsoft acquired Powerset solely to enhance its advertising offerings, not search. This is pure speculation. As far as I know, and as far as the TechCrunch article indicates, Powerset under Microsoft will still be doing search.
Often when I get involved in some new piece of technology, I am doing it not just for some specific reason, but also to learn about the technology. I don’t want to be left behind.
For example, setting up this website, I had to learn about hosting, setting up a blog (I already have several but WordPress is more… configurable), and, it turns out, style sheets.
The style sheet aspect was interesting: I wanted to use a “glow seed” photo I took as a header on each page to give the site a consistent look and feel (read – to look cool) and so I started digging in some online HTML documentation to find out what the right IMG tag was. It is probably 10 years since I did any serious HTML editing. I found the description of the IMG tag in the documentation and noticed “this feature is deprecated – use style sheets”. Wonderful – I have to learn about style sheets.
Luckily, my MacHeist bundle had, amongst its goodies, CSSEdit. This made it very easy to get started and roll a very simple first style sheet.
One of the reasons I got a blog on WordPress (actually, I like the feel better than Blogger) was to try Tagaroo to tag entities in my posts, which I think is an important stepping stone to more semantic search.
Unfortunately, installing Tagaroo seems to require that you have your own WordPress 2.3 or 2.5 installation, and I don’t have that. I’m considering getting a hosting solution but then I might lose the awesome name of this blog, and moving is always a pain. Still, better sooner than later.
And so… now I have a new, shiny, proper blog, and I could even import all the old musings. I even managed to set up Tagaroo!
John Battelle’s post on open search stimulated me to distill some of my thoughts about open search. Here’s what I wrote in a comment to his post:
The later in the process of serving search results that a third-party component can make changes, the less leverage it has, and therefore the smaller the business opportunity.
What will really change the game is to allow third parties to add features to the index, and access those features during search. Ensuring that different contributions to the search engine did not fight each other would be a challenge which would need to be addressed upfront, but the opportunities would be amazing.
Search is opening up in a fascinating way. Powerset (now part of Microsoft) is part of that shift, moving beyond the five blue links to richer forms of presentation, and beyond keywords to a much richer index.
It was announced yesterday that Microsoft is acquiring Powerset.
This is a great deal for Microsoft – Powerset’s natural language technology really can give Microsoft a significant bump in the quality of its search results. Parsing the web is computationally expensive, but Microsoft have the resources to do it. These computational resources, and all the other significant resources we now have access to at Microsoft, will allow Powerset to advance its goals, so this is a great deal for Powerset too.
Big changes like this introduce
Prolog is a great language. I love it.
Recently I have been writing code which uses a very large number of facts. Hundreds of thousands or millions. This code often turns out to be slow and the algorithms need to be obfuscated somewhat to speed it up.
A few additional features for SICStus could help this situation significantly:
– an interface to MySQL
– facility to allow predicates to index on any or all arguments
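To make the second wish concrete, here is a minimal sketch in Python (not Prolog, and not anything SICStus itself provides) of what indexing on any or all arguments could look like for a large table of facts. The `FactTable` class and its `add` and `query` methods are hypothetical names invented for this illustration. The sketch assumes the slowdown comes from scanning many facts when the bound argument is not the one that is indexed.

```python
from collections import defaultdict


class FactTable:
    """Hypothetical store for the facts of one predicate, indexed on every argument."""

    def __init__(self, arity):
        self.arity = arity
        self.facts = []
        # One hash index per argument position: value -> list of row numbers.
        self.indexes = [defaultdict(list) for _ in range(arity)]

    def add(self, *args):
        """Assert a new fact and record it in every per-argument index."""
        if len(args) != self.arity:
            raise ValueError(f"expected {self.arity} arguments, got {len(args)}")
        row = len(self.facts)
        self.facts.append(args)
        for pos, value in enumerate(args):
            self.indexes[pos][value].append(row)

    def query(self, *pattern):
        """Return matching facts; None in the pattern plays the role of an unbound variable."""
        if len(pattern) != self.arity:
            raise ValueError(f"expected {self.arity} arguments, got {len(pattern)}")
        bound = [(pos, v) for pos, v in enumerate(pattern) if v is not None]
        if not bound:
            return list(self.facts)  # nothing bound: every fact matches
        # Start from the bound argument with the fewest candidate rows,
        # then check the remaining bound arguments on those rows only.
        candidates = min(
            (self.indexes[pos].get(v, []) for pos, v in bound), key=len
        )
        return [
            self.facts[r]
            for r in candidates
            if all(self.facts[r][pos] == v for pos, v in bound)
        ]


facts = FactTable(3)
facts.add("alice", "likes", "prolog")
facts.add("bob", "likes", "python")
facts.add("alice", "writes", "python")
print(facts.query(None, None, "python"))  # lookup driven by the third argument
```

The trade-off is memory: every fact is referenced once per argument position, which matters with millions of facts, so a real system would probably let you choose which arguments to index.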
I have noticed a lot of Wikipedia reading and searching applications cropping up on the web recently.
ReadWriteWeb recently published a list of top 10 ways to search Wikipedia, but some things are missing, so here is my list:
We seem to have gone live a few minutes early! Awesome!!
This caught my eye a while back and took some work to track down… a list of illegal foods, including: Mellified Man
Old Man’s War was an incredible book. I was struck by the relentless logic of the plot, and the tight writing.
In contrast, the sequel, “The Ghost Brigades”, limps along. The writing is nowhere near as precise, and the plot ambles from front to back cover.