19 July 2007

Freebase: Ideology vs. Practice

I have been on the alpha version of Freebase for about a week now and I'm very impressed. It is an interesting experiment in how to find a reasonable middle ground between the vast openness of wikis and the narrow-mindedness of the Semantic Web. Through its user-expandable types and properties, it looks to be a very exciting development. The community-definable domains will prove even more exciting, I believe, as the folks at Metaweb realise just how powerful these are for different domains of expertise and knowledge.

I was somewhat dismayed, therefore, when I read Robert Cook's latest post on his Freebasics blog. The post is a comparison between Freebase and Google Base, most of which I agree with. However, he goes on to say that Google Base has many different records for each object, whereas ...

"Metaweb, by contrast, has just a single record for the Canon EOS 20D with redundancy and discrepancies resolved. Metaweb contains only ‘reconciled’ data, and maps a single object to a single thing in the world."

and that ...

"This idea of reconciliation is core to the idea of a Metaweb Topic. From my earlier posting:

  1. A topic represents a person, place, thing or idea.
  2. No two topics should have the same meaning.
  3. A topic should be important enough that a group of sane people would have something to say about it."

The fact that a topic on Freebase represents a person, place, thing or idea is just fine, as is the point that a topic should be important to a significant group of people. I am not sure why he defines them as being necessarily sane, as it is my experience that different groups of people have different interests, often vastly different, sane or otherwise.

The major problem, I think, arises from point number 2. Cook goes on to underline this point ...

"All distinguish Metaweb from other online data sources, but the second one is the most important. A key value of Metaweb is to squeeze out redundancy so that people (and machines) have definitive information."

But what is "definitive information"? Knowledge is not a definitive set of attributes or properties, but a rich history and contemporary discussion about the object. From this emerge attributes and properties, but these are constantly under dispute. Could we imagine a scientific discipline where only one account of any process or object was allowed? It would be disastrous. Could we imagine any industrial process for which only one account was allowed, or where a topic could have only one meaning? Culture, industry and science as we know them would cease to be.

Knowledge is promoted, grows, evolves and develops through disagreement, challenge and critique. It is just those unresolved differences which make knowledge possible. Remove them, and you remove the possibility of knowledge. Try to remove them from Freebase, or to severely restrict them, and Freebase will fail.

It is not needed, so why have such a requirement?

Semantic Nodix: Freebase


  1. Robin -- Unfortunately, I think the post on my blog was a little ambiguous.

    Definitive for Freebase means "definitive entities". In other words, there shouldn't be two George W. Bush topics, but the properties (the assertions, facts and otherwise attached to George W. Bush) are infinitely extensible.

    In other words, if we disagree, we need to be sure first that we're disagreeing on the same topic.

    One goal of Freebase is to define some of the typical properties a community would expect (date of birth, political party, etc.), but let the community come up with other properties that express what they want.

    It is then up to the application to filter which of these many facets to show.

  2. "No two topics should have the same meaning"

    I don't think this assertion necessarily precludes diversity of opinion. For example, the topic "Evolution" could be typed/tagged "Scientific Theory" and "Scientific Hoax" at the same time. Depending which perspective I subscribe to, I could ask for the relevant structured data.

    The power is that I can ask for "Evolution" by a unique id and not get, for example, the musical group "Evolution."

    All that said, I agree with your point generally. I think the contention will be around the relationships between "things", not necessarily the "things" themselves. Those relationships make up the bulk of what is asserted to be "true" about a thing. I don't think this is impossible to get around, though. The more expansive in outlook Freebase is, the better it will be.

  3. Many thanks for your reply Robert, and point taken. I realise that it is the properties that are extensible, and agree that this is the real power of Freebase. Also, I would like to reiterate that I am very excited about Freebase generally, so you are discussing with a fan.

    Your second point, that Freebase defines a set of expected properties, needs some discussion though. I agree that you should define some expected properties, but different communities 'expect' different properties as basic. Here I think there is a power to Freebase, perhaps not yet realised, in that different communities may define different domains with different 'expected' base properties. This would vastly extend Freebase's power to work for any number of knowledge communities. For me, this is a really exciting potential.

    Back to my original point, though, I can see why you have decided to keep topics singular. I imagine that you do not want, for very good reasons, to get the kind of proliferation of topics that you get on most 'Web 2.0' information sites. However, my point was that there is a big difference between having a good reason to make topics singular and asserting, which you may not be doing, that topics -- nouns -- are singular. I would still argue that these representations of entities are just representations, and that this will cause you problems. I do not think that this is fatal to Freebase, far from it, but it will need to be recognised at some point. You may want to look at my comments on the Domains and Topics discussion on Freebase (Domain suggestion: History) where this problem is beginning to surface.
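
The distinction running through this exchange, one topic per entity but many competing assertions about it, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class names, the ids, the "kind" property and the sources "community_a" and "community_b" are all hypothetical, and none of it is Freebase's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Assertion:
    """One claim about a topic, attributed to whoever made it."""
    prop: str
    value: str
    source: str  # hypothetical: the community or user making the claim


@dataclass
class Topic:
    """A single entity with a unique id and any number of competing claims."""
    topic_id: str
    name: str
    assertions: List[Assertion] = field(default_factory=list)

    def values(self, prop: str, source: Optional[str] = None) -> List[str]:
        """Return every value asserted for prop, optionally from one source."""
        return [
            a.value
            for a in self.assertions
            if a.prop == prop and (source is None or a.source == source)
        ]


class TopicStore:
    """Keeps exactly one Topic per id, so an entity never gets two records."""

    def __init__(self) -> None:
        self._topics: Dict[str, Topic] = {}

    def get_or_create(self, topic_id: str, name: str) -> Topic:
        # Reuse the existing topic instead of creating a duplicate.
        if topic_id not in self._topics:
            self._topics[topic_id] = Topic(topic_id, name)
        return self._topics[topic_id]


store = TopicStore()
# Two different entities that happen to share a name get different ids.
theory = store.get_or_create("topic/evolution_theory", "Evolution")
band = store.get_or_create("topic/evolution_band", "Evolution")

# Disagreement lives in the assertions, not in duplicate topics.
theory.assertions.append(Assertion("kind", "Scientific Theory", "community_a"))
theory.assertions.append(Assertion("kind", "Scientific Hoax", "community_b"))

# An application can show every view, or filter to one perspective.
all_views = theory.values("kind")
one_view = theory.values("kind", source="community_a")
```

The point of the sketch is that the singularity sits in the store (one record per id), while the disagreement is carried by the attributed assertions. Whether the id itself can ever be a neutral representation of the entity is the question this sketch leaves open.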