14 August 2007

Wikipedia, Freebase and the Semantic Web

There is a lot of discussion about how to organize information on the Web. For that matter, there is, and always has been, a lot of discussion about how to organize information generally. I have been on Freebase for about the last month and have found the differences between its approach and those of Wikipedia, at one extreme, and the Semantic Web, at the other, very enlightening. I do not think that Freebase is the ultimate answer. However, I do think that it offers a very interesting middle ground between these two extremes of information organization. It offers the ability to add as much information as possible, but makes only one requirement -- that each 'topic' has only one instance. In recognition of the Semantic Web, a series of high-level 'types' are being created, but, unlike the Semantic Web, anyone can extend and create new types as they wish. This may not seem like much of a change, but it is, in fact, quite profound.

The key differences between Freebase and both the Semantic Web and Wikipedia are very telling. If we first look at the difference with the Semantic Web, we see that Freebase has abandoned the central tenet of the Semantic Web (SM). This is that there is a universal logical structure to all knowledge, and that this can be defined (by a select few at the top of the W3C). The Types of Freebase may seem to be very similar to the high-level types of the SM, but they are much more like a hybrid of RDFs and OWLs. Rather than creating a pyramid of truth, as the SM is trying to do, Freebase's Types are a more traditional, and more pragmatic, categorisation of things. The high-level Types of Freebase do not, yet, claim to be higher-order concepts, but generalised conventions such as 'people', 'places', 'times', etc. I will not go into the discussion as to why these Kantianesque categories are problematic, as it doesn't really matter here. We can happily use these categories within Freebase even though in most contexts they are problematic and uncertain. The key point here is that the pragmatic categorisations -- Types -- of Freebase are infinitely extendible, whereas those of the SM are ultimately reducible. Freebase may seem very Semantic Webish, but it is not, as it inverts the logical structure of its categorisation. Whereas the SM starts from the messy diversity of the information world and, it hopes, progressively refines it to basic principles, Freebase starts from some pragmatic general categories and allows us all to extend them.

The differences with Wikipedia are even more interesting. Whereas Types are a kind of inversion of the SM's hierarchy, Freebase takes on Wikipedia's uncontrolled extension through its definition of 'topics'. By insisting that each "thing" in the world has only one instance -- one topic entry -- Freebase hopes to overcome the multiple accounts that proliferate on Wikipedia. The hope is that it will be the categorisations that proliferate, not the instances.
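
To make the two rules concrete (one instance per topic, freely extensible types), here is a minimal sketch in Python. It is not Freebase's actual data model or API: the `Topic` and `TopicStore` classes, their method names, and the idea of identifying a topic by its normalised name are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One entry per 'thing'; a topic may carry any number of types."""
    name: str
    types: set[str] = field(default_factory=set)


class TopicStore:
    """Hypothetical store illustrating the two rules; not Freebase's API."""

    def __init__(self) -> None:
        # A few high-level types to start from; anyone may add more.
        self.types: set[str] = {"people", "places", "times"}
        self._topics: dict[str, Topic] = {}

    def add_type(self, type_name: str) -> None:
        """Types are open-ended: adding one needs no central approval."""
        self.types.add(type_name)

    def get_or_create(self, name: str) -> Topic:
        """Return the single topic for this name, creating it if needed.

        Assumption: identity is decided by a normalised name, which is
        far cruder than any real reconciliation process.
        """
        key = name.strip().lower()
        if key not in self._topics:
            self._topics[key] = Topic(name=name.strip())
        return self._topics[key]

    def tag(self, name: str, type_name: str) -> Topic:
        """Attach a known type to the (single) topic for this name."""
        if type_name not in self.types:
            raise KeyError(f"unknown type: {type_name}")
        topic = self.get_or_create(name)
        topic.types.add(type_name)
        return topic

    def topic_count(self) -> int:
        return len(self._topics)


store = TopicStore()
store.add_type("philosophers of science")  # a user-defined type
store.tag("Thomas Kuhn", "people")
store.tag("thomas kuhn", "philosophers of science")  # same topic, new type
```

After the two `tag` calls the store still holds a single topic for Kuhn, now carrying two types: the categorisations have multiplied while the instance has not.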

This is not a bad idea, though it is fraught with its own problems. Robert Cook and I had a few discussions about this problem here and on Freebase, though I don't think I expressed my concerns very well. Perhaps I can clarify a bit here.

What is very interesting about Wikipedia, and all wikis, is that when a topic is begun it takes a bit of time to stabilise. The process of stabilisation usually occurs as a certain group of wiki-editors appropriates the topic and keeps others from complicating their version. As a result, others, who may disagree with the now 'authoritative' account, create other entries with different accounts. We might call this process "budding". Other accounts "bud" off the original stable account to create a constellation of accounts around any topic. It is this budding that Freebase is attempting to avoid.

As I have stated below, I do not think that this is a problem, as I see it as a sensible pragmatic decision. Not least because this problem, how to link up all the different accounts which surround a topic, is one of the most difficult in the history of philosophy. Thomas Kuhn, for one, demonstrated in the 1960s that this kind of budding around a stable topic is the key mechanism of paradigm shifts in science. Others have argued since that it is a key mechanism in all knowledge production. As such, to legislate against this budding around topics could have serious implications for the future of Freebase.

The problem, though, is that going down the route of Wikipedia won't work either. There is no way of accounting for the discursive connections between the stable topic and the buds. By keeping one topic instance, and one topic instance only, Freebase overcomes the problem of multiple instances, but at the cost of denying any mechanism for accounting for the new and diverse opinions that create paradigm shifts in knowledge. What I am arguing here is not very different from what Marvin Minsky proposed in his Society of Mind theory or, for that matter, from what Danny Hillis, the founder of Metaweb, has proposed.

I am afraid that I too have no real solution to this problem, but I do ask the people at Metaweb to not ignore this problem by claiming that the single instance topic is philosophically real.


  1. Robin -- I agree with you that the budding is real. Already, Freebase topics have begun to drift from Wikipedia. Typically, this is because Freebase requires finer granularity.

    Depending on the nature of the 'budding', there are mechanisms for mapping the earlier topics and newer ones. Where additional granularity is needed, a containment relationship can be defined: the original, more abstract topic links to the more concrete ones.

    We see this often in genres applied to creative works: the film genre "Western", for example, has a relationship to "Spaghetti Western" that indicates containment.

    More abstract topics are more likely to drift over time just as they are less likely to map closely among different languages. Finer granularity is one way to establish more semantically stable points.

    I don't claim that Freebase has answered the questions you are posing. Indeed, I believe that we will need to eventually introduce additional semantics that will formally encode drift, blurring or narrowing of a topic's meaning.

    For now, however, since there is so much to do, we have made some simplifying assumptions that allow us to understand what a community really wants before adding features that could keep average users from participating.

  2. Robert -- I hope that my ramblings are not seen as belittling the immense job that Metaweb has undertaken. That was not my intention. I realise that all of you at Metaweb have far more important things to get on with. I use Freebase here because I see in it the potential to at least partially resolve these problems that I have been working on for so many years.

    However, my point was not that one topic is more abstract and the other more specific. My point about budding was that two quite different, and perhaps incommensurable, definitions of a topic arise. Take as an example the topic "species". There will be one take by the pheneticists and a completely different, and incommensurable, one by the cladists. Each will have completely different properties and even different designations for "the same" types of animals. Both, however, refer to the same abstract topic. This is where I see problems arising.
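
The containment relationship mentioned in the first comment (the genre "Western" containing "Spaghetti Western") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ContainmentGraph` class and its method names are hypothetical and do not reflect how Freebase stores or queries such links.

```python
from collections import defaultdict


class ContainmentGraph:
    """Hypothetical model: a broader topic links to the narrower ones."""

    def __init__(self) -> None:
        self._children: defaultdict[str, set[str]] = defaultdict(set)

    def add_containment(self, broader: str, narrower: str) -> None:
        """Record that `broader` contains `narrower`."""
        self._children[broader].add(narrower)

    def contained_in(self, broader: str) -> set[str]:
        """Every topic reachable from `broader` by containment links."""
        found: set[str] = set()
        stack = [broader]
        while stack:
            current = stack.pop()
            # .get avoids creating empty entries for leaf topics.
            for child in self._children.get(current, ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


genres = ContainmentGraph()
genres.add_containment("Western", "Spaghetti Western")
narrower_genres = genres.contained_in("Western")
```

A link of this kind only relates a more abstract topic to more concrete ones. It does not, by itself, represent two incommensurable definitions of the same topic, which is the case raised in the second comment.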