About The Authors

Suw Charman-Anderson

Suw Charman-Anderson is a social software consultant and writer who specialises in the use of blogs and wikis behind the firewall. With a background in journalism, publishing and web design, Suw is now one of the UK’s best known bloggers, frequently speaking at conferences and seminars.

Her personal blog is Chocolate and Vodka, and yes, she’s married to Kevin.

Kevin Anderson

Kevin Anderson is a freelance journalist and digital strategist with more than a decade of experience with the BBC and the Guardian. He has been a digital journalist since 1996 with experience in radio, television, print and the web. As a journalist, he uses blogs, social networks, Web 2.0 tools and mobile technology to break news, to engage with audiences and tell the story behind the headlines in multiple media and on multiple platforms.

From 2009 to 2010, he was the digital research editor at The Guardian, where he focused on evaluating and adapting digital innovations to support The Guardian’s world-class journalism. He joined The Guardian in September 2006 as their first blogs editor after 8 years with the BBC working across the web, television and radio. He joined the BBC in 1998 to become their first online journalist outside of the UK, working as the Washington correspondent for BBCNews.com.

And, yes, he’s married to Suw.

All content © Kevin Anderson and/or Suw Charman

Corante Blog

Sunday, January 30th, 2005

More on Technorati tags

Posted by Suw Charman-Anderson

Over on Burningbird, Shelley has written a great summary/analysis of the current thinking on Technorati’s tags. It is beautifully written, sports some wonderful photographs, and is well worth reading. I’m not even going to attempt to summarise it here, because to do so would be like reinventing the wheel in a triangular shape - pointless and nowhere near as good as the original.

The thoughts that follow are an elaboration of the comment I left on Shelley’s post, so if you read that then some of this may seem eerily familiar.

As I said in my previous post about Technorati tags, I can’t help feeling that we’re really only at the very beginning of the creation of meaningful tagsonomies and tagsonomical tools. Technorati’s implementation of tags is one step on a long road, but until we can sort by what Technorati calls ‘authority’ (but which is really a sort of popularity), pull the search results into our aggregators by RSS, search using Boolean operators on multiple tags and do all sorts of complicated bespoke filtering, tags will remain a bit of a kludge.

Tags are, at the moment, at the ‘sledgehammer to crack a walnut’ stage, and there’s a lot of work to be done before we get it refined down to the toffee hammer stage.

A big issue is obviously implementation. People are lazy - I certainly am and I am sure I am not alone. Until we have a way to automatically tag or create tag suggestions that can be approved or disapproved by the user, we are going to have to rely on people bothering to tag their posts, and we’re going to have to put up with the way that the variable quality of their metadata affects this metadata-reliant system.

Of course, we have movement in that direction in terms of the various tagging tools which have sprung up with impressive rapidity. Ecto supports tags using the Custom Tag facility - just create a custom HTML tag with the code below and it will automatically create a tag from the selected text.

<a href="http://technorati.com/tag/%*" rel="tag"></a>
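As a rough illustration of what that template does, here is a minimal Python sketch. It assumes that `%*` stands for the selected text and that the link text should be the tag itself (the snippet above shows an empty anchor, so that part is a guess). The function name `tag_link` is hypothetical, and encoding spaces as `%20` is also an assumption rather than anything Technorati documents here.

```python
from html import escape
from urllib.parse import quote

# Base URL taken from the custom tag snippet above.
TAG_BASE = "http://technorati.com/tag/"


def tag_link(selected_text: str) -> str:
    """Build a rel="tag" anchor for the selected text (hypothetical helper)."""
    tag = selected_text.strip()
    # Percent-encode the tag so spaces and punctuation are safe in a URL.
    href = TAG_BASE + quote(tag, safe="")
    return f'<a href="{escape(href, quote=True)}" rel="tag">{escape(tag)}</a>'


print(tag_link("Cymraeg"))
```

The point is only that the tag ends up in two places: in the URL, where Technorati can read it, and in the visible link text, where the reader can.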

Stephanie Booth has created a plug-in for Wordpress, and there is of course the Oddiophile bookmarklet I have mentioned previously. All good starts, but they still require the blogger to bother using them and think clearly about which tags are relevant. As Shelley and others have noted, people are not necessarily very good at creating accurate tags - even people knowledgeable in the area of taxonomy and metadata don’t always create good tags for their own work.

That said, I think there are a few uses for which tags, even as they stand, beat every other system hands down, and one of those is classifying posts by language. At the moment, there really isn’t a consistent way to mark blogs or blog posts by language and that makes it very difficult if one is interested in finding blogs in a given tongue.

If I wanted to find blogs written in Welsh, then I have a bit of a challenge ahead of me. I can search in Google for ‘blog cymraeg’ but all that gives me are blog posts which use the word ‘cymraeg’, so if the post is in Welsh but doesn’t mention the word ‘cymraeg’ it’s not going to show up. For more popular languages, I can choose which language Google should search in, but that still means I need to pick some keywords to search on.

There is a similar problem even with specialised blog search engines, including the keyword search on Technorati - they all search content. I’m no metadata expert, but I see a clear difference between metadata that describes the contents of a post, i.e. what it is about, and metadata that describes the format of the post, such as what language it is in.

By allowing people to add format metadata, tags give bloggers the power to describe aspects of their posts that would not be accurately reflected by keywords selected from the content. Tagging all Welsh posts with ‘Cymraeg’, for example, allows anyone interested in Welsh blogging to locate the most recent posts in that language, regardless of what those posts might be about.
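The difference between searching content and searching format metadata can be shown with a toy example. This is only a sketch: the `Post` class, the sample posts and both search functions are invented for illustration and say nothing about how Technorati or Google actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    body: str
    tags: set[str] = field(default_factory=set)


# Invented sample data: one post written in Welsh, one written about Welsh.
posts = [
    Post("Bore da", "Mae hi'n braf heddiw.", {"cymraeg"}),
    Post("Learning Welsh", "I started a Cymraeg course.", {"languages"}),
    Post("Tags", "More on Technorati tags.", {"technorati"}),
]


def keyword_search(posts, word):
    """Match on the words in the post body, as a content search does."""
    word = word.lower()
    return [p.title for p in posts if word in p.body.lower()]


def tag_search(posts, tag):
    """Match on the tags the author attached, ignoring the body."""
    tag = tag.lower()
    return [p.title for p in posts if tag in {t.lower() for t in p.tags}]


print(keyword_search(posts, "cymraeg"))  # ['Learning Welsh']
print(tag_search(posts, "cymraeg"))      # ['Bore da']
```

The keyword search finds the English post that mentions the word, and misses the post that is actually in Welsh. The tag search does the opposite, which is the behaviour we want when looking for posts in a given language.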

Using tags to make up for this shortfall in existing blog metadata, we can then use Technorati as an engine for discovery (as opposed to search) within a set of given criteria. At the moment there is just no other way to do this.

Tags may be a bit kludgy at the moment, but because they are capable of filling a gap in the way we locate blog posts that may be of interest, I think they are going to be with us for the long haul.


7 Responses to “More on Technorati tags”

  1. Alex Barnett Says:

    Great thought on the language potential of Tags. I can see this being really useful on the Tags site (I can’t read Welsh).

    Alex.

  2. pjm Says:

    But, if one were to invent a triangular wheel, it wouldn’t be pointless… it would be quite pointy, in fact. (Just not terribly useful.)

    Heh.

  3. Suw Says:

    Aaah I set ’em up… ;-)

  4. Gahlord Dewald Says:

    I’m a bit slow to all the RSS stuff, so forgive me if this is just too newbie…

    But there is a meta-tag in the header for you to identify your document’s language. Does Welsh not have a code for this?

    Perhaps a bit of background here: http://community.roxen.com/developers/idocs/rfc/rfc1766.html

    —-
    OK, I couldn’t help but find the solution to your Welsh searching woes… Welsh is an option for the language meta-tag when you code your page. You can see a full list of languages at:

    http://www.oasis-open.org/cover/iso639a.html

    —-

    So including the tag

    in the header section should identify your page as Welsh. Obviously the usefulness of the search will depend on who uses the tag. And, moreover, on Google adding two or three lines of code to its advanced search so that it supports the language…

    I guess, getting to my point (apologies for taking a while to get there): there already is a method of tagging each page for a wide list of languages (and it includes codes for private-use if you don’t want to mess with IANA or ISO or W3C/WTF). Is there perhaps a useful way to take advantage of this?

    Cook up a search engine that is focused on a full implementation of understanding the language codes that are already in use and develop a truly multi-lingual/global-reach version of Google? Especially for obscure languages? Someone out there must be in search of a graduate thesis.

  5. Suw Charman Says:

    For an HTML document, yes, you can use metatags to describe the language of that document using the language code - cy in the case of Welsh.

    With some blog software that uses templates, like Movable Type or Blogger, it would be possible to insert the metatag into the template, and some hosted blogs allow you to add custom metatags to the header. But in both cases that indicates that the whole blog is written in the given language(s), and doesn’t indicate which post is in which language.

    There would be no point inserting the metatag by hand into the HTML of every post you wrote, because it would then be in the body of the page, instead of the header, and not readable by search engines as a metatag.

    One could use divs in the way that Stephanie Booth’s ClimbToTheStars (http://climbtothestars.org/) does, e.g.:

    <div class="post" lang="fr">
    <div class="post" lang="en">
    <div class="other-excerpt" lang="fr">

    But that is not an answer for everyone - these sorts of fixes just don’t have legs for two reasons:

    1. You need to be capable of coding your own fix for your own blog software, or using software that someone else has created a plug-in for. That rules out using hosted blog platforms such as Typepad or Blogware that are not amenable to third-party plug-ins.

    2. Once you have sorted out some way to insert the relevant language metadata, there is no way for people to get any serious use from it. I don’t know of any search engine that has a comprehensive list of languages within which one can search, and specialised blog search tools have yet to address the issues of multilingual (or, in many cases, even non-English) blogging.

    What we need is for the main blog software and hosting providers to use existing metadata standards to add in the ability to metatag individual posts (and excerpts) by language, as well as being able to pick multiple languages for metatagging the blog as a whole. (Mine would need En and Cy, for example, with the occasional post in Pl.)

    We then need the search engines and blog search tools to pick up on this and provide the ability to search in any language.

    Trouble is, I just can’t see that happening any time soon. Why? Well, most blogging software is created by monoglot English developers for a monoglot English audience and therefore the incentives to develop multilingual support are few. They already have other features to develop that are more important to a larger number of people. I’m not excusing them, but that’s just how I suspect it works.

    This is why tags are powerful - they are here, now, and they are easy to use.

    Of course, if someone wants to cook up a decent search engine that can make full use of the language metatags, then I’m all for it. I could use a search engine like that. But for the moment, our choices remain limited.
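
    To give a sense of how little is needed on the reading side, here is a minimal Python sketch of a tool that collects the lang attributes from markup like the divs shown earlier. The sample page and the LangCollector class are invented for illustration; it uses only the standard library's html.parser and is not how any existing search engine works.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Record (tag, class, lang) for every element carrying a lang attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "lang" in attr:
            self.found.append((tag, attr.get("class", ""), attr["lang"]))


# Invented sample page, modelled on the div examples above.
page = """
<div class="post" lang="fr"><p>Bonjour</p></div>
<div class="post" lang="en"><p>Hello</p></div>
<div class="other-excerpt" lang="fr"><p>Extrait</p></div>
"""

collector = LangCollector()
collector.feed(page)

# Pick out the blocks marked as French, whatever their content.
french = [cls for tag, cls, lang in collector.found if lang == "fr"]
print(collector.found)
print(french)
```

    Reading the attribute is the easy part. The hard part, as above, is getting blog software to write it and search tools to do something with it.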

  6. Richard Soderberg Says:

    First, the restrictions placed upon me by my TypePad account prevent me from modifying the HEAD section of the HTML of my web pages, and the META tag that indicates the content type of the page MUST be placed inside the HEAD.

    Second, how do I indicate that a BLOCKQUOTE contains Welsh, while the article contains English?

  7. weaverluke Says:

    Of course, the emergent properties of the blogosphere will help even in the absence of well-formed meta-data: people who blog with particular language combinations will get known in the cultural contexts they write for simply by dint of their interactions with other bloggers and readers.