Wanted: "Joined-Up" News Search For Grown-Ups Via Linked Data

All the information in the world and beyond falls into one of three categories.

  • What we know
  • What we know we don't know
  • What we don't know we don't know

    Sounds very much like the poetry of Donald Rumsfeld, but the idea of separated knowledge predates him all the way back to the time of Plato's Dialogues. When it comes to the news - the information updates that inform our knowledge of the world and our specific interests - there are two other conditions to take into consideration: the expected and the unexpected.

    Expected news is the regular run of features and updates that we get from feature writers and from blogs that share our views. It is largely a comforting place to be: something new to consider on a regular basis but in a familiar continuum of form and style. Updates aren't really more news, but just more information on a topic. Our knowledge is expanded in a staged manner, a bit like taking a walk along an unfamiliar path in a familiar piece of countryside.

    We all subscribe to blogs, tweets, Facebook pages, etc., because they give us more of what we want to know. Very often, the information is new but rarely is it a break from the past. We are mildly stimulated and feel comforted because we feel abreast of the times.

    This is the stuff we know about.

    The unexpected news item, by definition, comes as a huge surprise. In the traditional media, huge surprises tend to be huge, bad surprises (the Haiti earthquake being the most recent). They tend to arrive at irregular and unwelcome moments and usually have a moral, physical or spiritual effect on us: for those of us who care, anyway.

    We can never predict these big stories and the most that journalists around the world can do to prepare for them is to keep a 'crash bag' by the door full of travel essentials.

    This is the stuff we don't know about.

    What these two categories have in common is the presence of an awful lot of information: either a large database of information accrued over a period of time or large amounts of information generated in a very short time. What both content providers and content consumers really want is a well-told and relevant story. Turning information into stories makes things easier to comprehend, easier to remember and perhaps easier to learn from. Making information meaningful is what a journalist does.

    Relevance is the true promise of the Semantic Web. Profium and the Open Calais project from Thomson Reuters are just two of the increasing number of Semantic Web efforts that promise better and more relevant information. Profium applies Semantic Web technology to the massive ever-filling databases of news agencies, and links that information in a meaningful way to produce wire-copy that can then be sold on to their news publishing clients. This timely relevance means that conflicting reports can be avoided and accuracy enhanced.

    Open Calais (nice overview here from Lullabot) can be used to convert great lumps of text into something meaningful and relevant through predesigned taxonomies, classifications and preassigned tags. Slate magazine has News Dots, which is a very interesting application of Semantic Web technology to news reports.

    This powerful pre-filtering and tagging means that a journalist on the wrong end of a lot of text information (which is most of the time) now has a significantly useful tool for making sense of the incoming data. On the Lullabot page there is a link to an Open Calais viewer. Drop a lump of text in there, this article for example, and see what comes up. It becomes immediately obvious how useful this technology can be.
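    To give a rough feel for what tagging text against a predesigned taxonomy involves, here is a toy sketch in Python. It is not the Open Calais API and makes no claim about how that service works internally: the TAXONOMY table, the tag_text function and the sample sentence are all made up for illustration, and the matching is plain phrase lookup rather than real language analysis.

```python
import re
from collections import defaultdict

# Hypothetical hand-made taxonomy: category -> known terms.
# A real tagging service uses far richer vocabularies than this.
TAXONOMY = {
    "Organization": ["New York Times", "Thomson Reuters", "Slate"],
    "Place": ["Haiti"],
    "Technology": ["Semantic Web", "linked data"],
}


def tag_text(text, taxonomy=TAXONOMY):
    """Return {category: sorted list of taxonomy terms found in text}."""
    found = defaultdict(set)
    for category, terms in taxonomy.items():
        for term in terms:
            # Whole-phrase, case-insensitive match.
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                found[category].add(term)
    return {category: sorted(terms) for category, terms in found.items()}


sample = (
    "Slate applies Semantic Web technology to reports "
    "such as the Haiti earthquake."
)
tags = tag_text(sample)
print(tags)
```

    Even this crude version turns a lump of text into a handful of labelled entities, which is the starting point for linking one story to another.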

    Through their Open blog, the New York Times, in their usual wonderfully innovative way, have given us the opportunity to build our own API to retrieve and assemble information from the NYT archives that we think might be relevant to us. The demos here use the info boxes from Wikipedia that were converted to linked data by DBpedia. But the awesomeness is there for all to see. (Tip: The code looks even easier if you squint your eyes.)

    This is joined-up search for grown-ups where we can bring ideas together by means of linked data. Contrast this with the block-letter searches of Google where many items are returned but few of them are relevant and none are related to each other in a meaningful way.

    The holdup for semantic technologies is in implementation. Things will really speed up once data is stored in ways that make searching and finding quicker and more accurate. The momentum is already there and will only increase over time.

    So that leaves us with the third part of the statement to resolve: what about the stuff we don't know we don't know? Well, by definition it's unknowable, but linked data gives the possibility of throwing up a surprise or two.

    Perhaps separate pieces of data that we thought up until now had no relation to each other are in fact related by connections hidden away from our awareness. Through the magic of the Semantic Web we have the possibility of increased serendipity and the making of vital life-enhancing connections, in the same way that talking to a complete stranger can reveal a mutual acquaintance.
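    The mutual-acquaintance idea can be sketched in a few lines of Python. Linked data is commonly expressed as subject-predicate-object triples, and once facts are stored that way, asking what two things have in common becomes a simple set intersection. The names, the "knows" and "reads" predicates and the helper functions below are all invented for illustration; a real system would query an actual linked data store rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object), invented for illustration.
TRIPLES = [
    ("You", "knows", "Mary"),
    ("You", "reads", "New Tech Post"),
    ("Stranger", "knows", "Mary"),
    ("Stranger", "knows", "Pat"),
    ("Neighbour", "reads", "Slate"),
]


def build_index(triples):
    """Map each subject to the set of objects it is linked to."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        index[subject].add(obj)
    return index


def shared_links(index, a, b):
    """Return the sorted objects that both a and b are linked to."""
    return sorted(index.get(a, set()) & index.get(b, set()))


index = build_index(TRIPLES)
print(shared_links(index, "You", "Stranger"))
print(shared_links(index, "You", "Neighbour"))
```

    The first query turns up the mutual acquaintance; the second comes back empty. The more data that is linked, the more chances there are for a query like the first to surprise us.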

    We may never be able to get rid of the big nasty surprises, but having lots more fun small ones will always be welcome.



About author

Tom Murphy is a writer for the New Tech Post. He worked as a video journalist filming in many different environments throughout the world, specializing in current affairs and documentaries.