Fact heaps, searching and the rolling encyclopaedia

Conditioned by the rhythm of daily newspapers and nightly television bulletins, we think of news as a rolling thing, constantly renewed, refreshed and updated. Twenty-four-hour news channels speed up the cycle, but don’t change that idea of news as the latest version of the story. The newspaper you hold and read is only that latest slice of information; so is a broadcast news bulletin. And anything that isn’t the latest version is dead and gone, waste material.

When I worked at a newspaper with a large website I began to wonder if this idea of news would change. A website is a “rolling” platform in the sense that it can be updated with news quickly, and many millions of users go to news sites for just that. But such a site is also something else: a vast store of data that isn’t news any more, a giant heap of facts and judgements. If you want to go deeper into a subject or backwards in the sequence of events, in theory you can. Newspapers and broadcast are news in two dimensions; digital adds a third with its ability to drill downwards and sideways into the information. So a major news website is truly something more like a rolling news encyclopaedia: topped up all the time, but with added depth and uses which newspapers and broadcast don’t have.

Potential depth. The leading news sites have hardly begun to exploit this asset, which grows every hour with the addition of more news. The New York Times chooses to do this by literally organising its material in reference-book form in its “topic pages”. But the material is confined to what’s been published in the Times. Various software companies offer programs which automate the business of cutting archive material into topical strands; Daylife is, I guess, one of the best known. The consensus, floating on a tide of Google-style optimism, is that software will crack the problem. I began to wonder about this while reading two reflective pieces around this subject by Jonathan Stray and Felix Salmon.

Most of the automated versions I’ve seen just aggregate material: they tack together in one strand all the previously published material on a subject. This is fine but often unsatisfying. There is a vast amount of repetition, which becomes time-wasting and aggravating very quickly. If you’re lucky, the site you’re looking at may have done a “new readers start here” Q&A or an “explainer”; if you’re trying to catch up, that should help. But the problem with moving stories is that they move, and those movements often undo the attempt to explain what’s happening. Even explainers go out of date. Here’s an example of a bouncy explainer in Mother Jones on Libya (I like “why can’t anyone agree on how to spell Qadaffi’s name?”) which goes awry when it tries to add “updates” below. The reason I’m reading an explainer in the first place is that I don’t want a daisy chain of disconnected update fragments. Integrated information makes better sense.
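The kind of naive aggregation described above can be sketched in a few lines. This is a hypothetical illustration, not any real aggregator’s code: the archive entries and topic labels are invented, and the point is simply that stringing everything on a subject into one strand preserves every repetition.

```python
from collections import defaultdict

# Hypothetical archive entries: (topic, headline) pairs — illustrative only.
archive = [
    ("libya", "Rebels advance on Tripoli"),
    ("libya", "Q&A: what is happening in Libya?"),
    ("budget", "Chancellor unveils spending plans"),
    ("libya", "Rebels advance on Tripoli"),  # duplicate wire copy, common in practice
]

def aggregate(entries):
    """Tack everything on a subject into one strand, repetition and all."""
    strands = defaultdict(list)
    for topic, headline in entries:
        strands[topic].append(headline)
    return dict(strands)

strands = aggregate(archive)
# The duplicate headline survives, so a reader catching up sees it twice.
print(strands["libya"])
```

Deduplication, ordering by importance rather than date, and merging updates into one coherent account are exactly the steps this sketch omits — and they are the steps that, as argued below, seem to need editorial judgement rather than more software.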

Most of the attempts to square this circle and present illuminating background material in a useful form have been tech-led. The attempt to automate is understandable and may give better results over the long run. But there’s a false premise there which is starting to make these projects look like experiments in making moonbeams out of cucumbers.

Nobody has yet devised an algorithm to replace judgement. Explaining context and cause requires judgement because it involves risk: risk that an assessment of importance might be wrong, or that cause and effect are wrongly linked. I think we may find that only humans can do that. With the power of search, which gets more sophisticated all the time, humans can do vastly better than any editor or writer could do 20 years ago. The key will lie in the most effective mix of human judgement with useful software. If the new fashion is to train people to “curate” news sources, how about extending that to cover curating the news archive so that it responds to what people need?


