19 July 2007

Freebase: Ideology vs. Practice

I have been on the alpha version of Freebase for about a week now and I'm very impressed. It is an interesting experiment in how to find a reasonable middle ground between the vast openness of wikis and the narrow-mindedness of the Semantic Web. Through the user-expandable types and properties, it looks to be a very exciting development. The community-definable domains will prove even more exciting, I believe, as the folks at Metaweb realise just how powerful these are for different domains of expertise and knowledge.

I was somewhat dismayed, therefore, when I read Robert Cook's latest entry in his Freebasics blog. The entry is a comparison between Freebase and Google Base, most of which I agree with. However, he goes on to say that Google Base has many different records for each object, whereas ...

"Metaweb, by contrast, has just a single record for the Canon EOS 20D with redundancy and discrepancies resolved. Metaweb contains only ‘reconciled’ data, and maps a single object to a single thing in the world."

and that ...

"This idea of reconciliation is core to the idea of a Metaweb Topic. From my earlier posting:

  1. A topic represents a person, place, thing or idea.
  2. No two topics should have the same meaning.
  3. A topic should be important enough that a group of sane people would have something to say about it."

The fact that a topic on Freebase represents a person, place, thing or idea is just fine, as is the point that a topic should be important to a significant group of people. I am not sure why he requires the group to be sane, since in my experience different groups of people have different, often vastly different, interests, sane or otherwise.

The major problem, I think, arises from point number 2. Cook goes on to underline this point ...

"All distinguish Metaweb from other online data sources, but the second one is the most important. A key value of Metaweb is to squeeze out redundancy so that people (and machines) have definitive information."

But what is "definitive information"? Knowledge is not a definitive set of attributes or properties, but a rich history of, and contemporary discussion about, the object. From these emerge attributes and properties, but they are constantly under dispute. Could we imagine a scientific discipline where only one account of any process or object was allowed? It would be disastrous. Could we imagine any industrial process where an account, or topic, could have only one meaning? Culture, industry and science as we know them would cease to be.

Knowledge is promoted, grows, evolves and develops through disagreement, challenge and critique. It is just those unresolved differences which make knowledge possible. Remove them, and you remove the possibility of knowledge. Try to remove them from Freebase, or to severely restrict them, and Freebase will fail.

It is not needed, so why have such a requirement?
