


While there has been much breathless speculation about the threat that Twitter poses to Google’s iron grip on the search market, I haven’t seen many specific ideas about how that would work. By now we all know that real-time search is an increasingly valuable source of information, but its utility seems to be limited to certain kinds of information, like breaking news and consumer product reviews. It certainly doesn’t appear to be as flexible a mechanism for locating a wide range of content as traditional search has been since the inception of the World Wide Web.

The most intriguing speculation that I’ve read about the future of search has been this post about the notion of ‘PageRank for People’. In essence, the idea is that the current algorithms that govern the ranking of search results are inadequate because they rely too heavily on the location of the content being listed. Since Google relies on the volume of inbound links to judge the value of content, it favors content posted in popular locations. The thing is, a piece of good content is just as valuable whether it is posted to ‘Bob’s Blog’ or the ‘New York Times’. The proposed solution was a system that factors in the ‘reputation’ and ‘authority’ of the content’s author when ranking search results. Just how to calculate these numbers, though, is the tricky part. One answer might be Twitter.

There are two aspects of Twitter that need to be changed. Both the ‘Retweeting’ and ‘Hashtag’ behaviors need to be provided as official features of the service. That means the mechanism for these actions needs to be separated from the 140 character text string of each tweet. That is to say, we should lose the ‘RT’ syntax and make the identity of the original poster some form of metadata that exists outside the post itself. Likewise, tags like those currently labeled with hash-signs ‘#’ should be saved as metadata separate from the actual tweet. With those two changes, searching the web could become a lot more useful. Here’s how it would work:

When I publish a post to Twitter, I should have the ability to tag that post with semantically accessible identifying labels. Each post associated with a given subject (as described by the label) potentially contributes to my ‘authority’ about that subject. Now, when someone retweets my post, they are in essence endorsing what I have said and contributing to their own ‘authority’ about that subject. If there were a scoring mechanism that assigned, say, one point for each endorsement, you would have a system that establishes quantitative values for ‘authority’. Imagine it this way:

[Figure: tweets]

For each retweet, additional points are cascaded up the tree. That way the original poster is always given the most credit for contributing the idea, but those who help propagate it are given credit as well. Authority is defined by the community, not the individual. It should be pointed out that with each subsequent retweet, the poster will have the opportunity to revise the tagged metadata, either adding more detail or removing labels they believe are not appropriate. That way, the system guards against abuse by so-called trend-squatting.
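To make the cascade concrete, here is a minimal sketch in Python. It is only one way to read the idea, not a definitive scheme. The `Post` record and the `authority_scores` function are hypothetical names, and the sketch rests on assumptions of my own: each retweet is worth one point per tag to every author above it in the chain, the tags that count are the ones on the retweet itself (so revised labels decide where credit goes), and a retweeter only earns credit once their own retweet is passed along.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    """Hypothetical record: tags and the retweeted post live outside the text."""
    author: str
    tags: frozenset
    parent: Optional["Post"] = None  # the post being retweeted, if any


def authority_scores(posts):
    """Return {author: {tag: points}}, cascading one point up the tree per retweet."""
    scores = defaultdict(lambda: defaultdict(int))
    for post in posts:
        ancestor = post.parent
        # Walk from the retweeted post up to the original poster.
        while ancestor is not None:
            # Assumption: credit is given for the tags on the retweet itself.
            for tag in post.tags:
                scores[ancestor.author][tag] += 1
            ancestor = ancestor.parent
    return {author: dict(by_tag) for author, by_tag in scores.items()}


# Example tree: alice posts, bob retweets alice, carol and dave retweet bob.
alice_post = Post("alice", frozenset({"search"}))
bob_rt = Post("bob", frozenset({"search"}), parent=alice_post)
carol_rt = Post("carol", frozenset({"search"}), parent=bob_rt)
# dave adds a more detailed label when he retweets.
dave_rt = Post("dave", frozenset({"search", "twitter"}), parent=bob_rt)

scores = authority_scores([alice_post, bob_rt, carol_rt, dave_rt])
```

In this example alice ends up with three points for ‘search’ and bob with two, so the original poster always collects at least as much as anyone who passed the idea along.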

Now, once we start getting values assigned for Twitter users’ authority on specific topics, search engines can start factoring this into their rankings. So, content authored by an individual with higher authority on the subject of that content is favored over other content. Content authored by organizations might be scored, in part, by the collective authority of that organization’s members. It would create a tremendous upward pressure to contribute value to the community. For some industries, I imagine that one’s scores in this respect would become a factor in employment decisions or compensation levels.
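As a rough illustration of how a search engine might fold these numbers in, here is a small Python sketch. Everything in it is an assumption made for the example: the `rank` and `author_authority` functions, the `relevance` field, the `alpha` weight, and the choice to score an organization by the mean authority of its members. A real ranking algorithm would be far more involved.

```python
from statistics import mean


def author_authority(author, topic, authority, members=None):
    """Look up topic authority; organizations use the mean of their members (assumption)."""
    members = members or {}
    if members.get(author):
        return mean(authority.get((m, topic), 0.0) for m in members[author])
    return authority.get((author, topic), 0.0)


def rank(results, topic, authority, members=None, alpha=0.1):
    """Sort results by relevance plus a weighted authority bonus (hypothetical formula)."""
    def score(result):
        bonus = author_authority(result["author"], topic, authority, members)
        return result["relevance"] + alpha * bonus
    return sorted(results, key=score, reverse=True)


# Made-up authority values keyed by (user, topic).
authority = {("bob", "search"): 8.0, ("ann", "search"): 2.0, ("raj", "search"): 4.0}
# A made-up organization and its members.
members = {"Example News": ["ann", "raj"]}

results = [
    {"author": "anon", "relevance": 1.2},
    {"author": "Example News", "relevance": 1.0},
    {"author": "bob", "relevance": 0.9},
]

ranked = rank(results, "search", authority, members)
ranked_authors = [r["author"] for r in ranked]
```

Here bob’s post on his own blog outranks both the organization’s piece and the anonymous one, even though its raw relevance is the lowest of the three.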

Now, what I’ve described would need to be only one part of the search ranking algorithm. As described by the original post, there are many other factors that should be considered. Additionally, the scoring mechanism described above is probably far too simplistic and vulnerable to abuse. For example, one refinement that could make the system more reliable would be to consider the reputation of the endorsing party when assigning a value to the score that their retweet provides the original poster. That way, it is more valuable to get retweeted by individuals with more authority. Note that ‘authority’ as I’ve described it is an entirely separate metric from ‘popularity’, which is defined by the number of followers that a user has.
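A sketch of that refinement, again in Python and again under assumptions of my own: each retweet is worth a base point plus whatever authority the endorser already holds on that topic from an earlier scoring round. The `weighted_authority` function, the `prior` table, and the `base` parameter are all hypothetical.

```python
from collections import defaultdict


def weighted_authority(retweets, prior, base=1.0):
    """Score authors from (endorser, author, tag) retweets, weighted by endorser authority.

    prior maps (user, tag) to authority from a previous scoring round (assumption).
    """
    scores = defaultdict(float)
    for endorser, author, tag in retweets:
        # A retweet from a high-authority endorser is worth more than the base point.
        scores[(author, tag)] += base + prior.get((endorser, tag), 0.0)
    return dict(scores)


# Made-up prior authority: only "expert" has any standing on the topic.
prior = {("expert", "search"): 4.0}
retweets = [
    ("expert", "alice", "search"),
    ("newbie", "bob", "search"),
]

weighted = weighted_authority(retweets, prior)
```

With these inputs alice scores 5.0 and bob scores 1.0 from a single retweet each. Follower counts never enter the calculation, which keeps ‘authority’ separate from ‘popularity’.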

There would be many additional side-benefits of such a system. For example, much of what is posted on Twitter consists of links to content elsewhere on the web. A robust labeling function would turn Twitter into a tagging system for the entire semantic web.

Here is a related discussion.
