About The Authors

Suw Charman-Anderson

Suw Charman-Anderson is a social software consultant and writer who specialises in the use of blogs and wikis behind the firewall. With a background in journalism, publishing and web design, Suw is now one of the UK’s best known bloggers, frequently speaking at conferences and seminars.

Her personal blog is Chocolate and Vodka, and yes, she’s married to Kevin.

Kevin Anderson

Kevin Anderson is a freelance journalist and digital strategist with more than a decade of experience with the BBC and the Guardian. He has been a digital journalist since 1996 with experience in radio, television, print and the web. As a journalist, he uses blogs, social networks, Web 2.0 tools and mobile technology to break news, to engage with audiences and tell the story behind the headlines in multiple media and on multiple platforms.

From 2009-2010, he was the digital research editor at The Guardian where he focused on evaluating and adapting digital innovations to support The Guardian’s world-class journalism. He joined The Guardian in September 2006 as their first blogs editor after 8 years with the BBC working across the web, television and radio. He joined the BBC in 1998 to become their first online journalist outside of the UK, working as the Washington correspondent for

And, yes, he’s married to Suw.

All content © Kevin Anderson and/or Suw Charman

Corante Blog

Thursday, May 18th, 2006

Xtech 2006: Ben Lund - Social Bookmarking For Scientists

Posted by Suw Charman-Anderson

Connotea, social bookmarking for scientists.

Why for scientists? Obviously, scientists and clinicians are a core market. Doesn’t exclude others, but by concentrating on users with a common interest they could increase discovery benefits. Hooks into academic publishing technologies.

Connotea is an open tool, is social so connects to other users, and has tags. But what it also does is identify articles solely from the bookmark URLs. So it can pull up the citation from the URL - title, author, journal, issue no., page, publication date. This is important for scientists.

Way it does it is by ‘URL scanning’. So user is on a page, e.g. PubMed, which is a huge database of abstracts from biomed publications. When the user clicks ‘Add to Connotea’, this opens a window, recognises that this is a scholarly article, and imports the data.

Uses ‘citation source plug-ins’ - Perl modules for each API. It asks each plug-in in turn whether it recognises the URL; when one does, it goes and gets the information and associates it with the bookmark in the database.
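The plug-in dispatch described here could be sketched roughly as follows. This is an illustration in Python rather than Connotea's actual Perl code; the class names, the `understands` and `citation` methods, the host check and the stubbed return value are all assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlparse


class CitationPlugin:
    """Hypothetical base class: one subclass per citation source."""

    def understands(self, url: str) -> bool:
        raise NotImplementedError

    def citation(self, url: str) -> dict:
        raise NotImplementedError


class PubMedPlugin(CitationPlugin):
    """Illustrative plug-in; the host check is an assumption, not Connotea's rule."""

    def understands(self, url: str) -> bool:
        host = urlparse(url).netloc.lower()
        return host.endswith("ncbi.nlm.nih.gov")

    def citation(self, url: str) -> dict:
        # A real plug-in would query the source's API here; this stub only
        # records which source claimed the URL.
        return {"source": "PubMed", "url": url}


def find_citation(url: str, plugins: list) -> Optional[dict]:
    """Ask each plug-in in turn; the first one that recognises the URL wins."""
    for plugin in plugins:
        if plugin.understands(url):
            return plugin.citation(url)
    return None  # no plug-in recognised it, so it stays a plain bookmark
```

Adding support for another publisher would then mean writing one more subclass and appending it to the list passed to `find_citation`.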

[Now runs through some programming stuff.]

Bookmarks on a lot of these scientific resources are far from clean or permanent and have a lot of session data in them. So this needs cleaning off.
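As a minimal sketch of that cleaning step, assuming the session data lives in query-string parameters: the parameter names in `SESSION_PARAMS` are guesses for illustration, and a real list would have to be built up site by site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of session-style parameter names, not a definitive list.
SESSION_PARAMS = {"sessionid", "sid", "jsessionid", "phpsessid"}


def clean_url(url: str) -> str:
    """Return the URL with session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL, leaving scheme, host, path and fragment untouched.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

For example, `clean_url("http://example.org/article?id=42&sid=abc123")` keeps the `id` parameter and drops `sid`.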

So what’s important? Retrieval and discovery. Already has tagging for navigation. Also has search in case there are some articles that haven’t been accurately tagged.

Provides extra link options for bookmarks. Main title links to the article, say in PubMed; but there are links to other sources for this article, e.g. to the original Nature article; plus other databases, and cross-referencing services.

System also produces a long open URL with all the bibliographic information in it.
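A rough idea of what building such a URL involves, assuming OpenURL 0.1-style key names (`genre`, `atitle`, `title`, `volume` and so on). The resolver address and the citation values used in the usage note are placeholders, and this is not Connotea's own code.

```python
from urllib.parse import urlencode

# Key names in the OpenURL 0.1 style, in the order they will be emitted.
OPENURL_FIELDS = ["genre", "aulast", "atitle", "title", "volume", "issue", "spage", "date"]


def build_openurl(resolver: str, citation: dict) -> str:
    """Encode the bibliographic fields that are present as a query string."""
    params = [(key, citation[key]) for key in OPENURL_FIELDS if citation.get(key)]
    return resolver + "?" + urlencode(params)
```

Called with a hypothetical resolver such as `http://resolver.example.org/openurl` and a citation dict holding whatever fields were retrieved, this gives one long link carrying the citation with it.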

Now … the hate.

First hate:

- poorly documented and poorly implemented data formats. Variety of different XML schemas. Liberal interpretations of standards.

Second hate:

- have to do lots of unnecessary hoop-jumping to get this data. Lots of pinging different URLs to get cookies, POSTs, etc.

Third hate:

- have to do everything on a case-by-case basis. Have to reverse engineer each publisher’s site. Have to write ad hoc rules and custom procedures for each case.

A wish

Nature released a proposal called OTMI, the Open Text Mining Interface - wants to make Nature’s text open for data mining, but not the articles themselves. Researchers are looking for raw XML for doing data mining research, but every time someone asks, Nature has to make ad hoc arrangements for each case. So OTMI does some pre-processing to make the data more usable.

Publishers could choose to be supported by Connotea and remove the need for Connotea to reverse engineer their sites. Publisher just puts a link through to an Atom doc with the relevant data in so that the citation can be easily retrieved.

Blogs already do autodiscovery of Atom feeds, so can test the idea using a citation source plug-in for a blog. It works, so can treat any source as a citation, but only whilst the post is still in the feed.
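Feed autodiscovery here means finding a `<link rel="alternate" type="application/atom+xml">` element in the page head. A small sketch of that lookup using Python's standard `html.parser` follows; the class and function names are made up for the example, and fetching the page and reading citation data out of the feed are left out.

```python
from html.parser import HTMLParser
from typing import Optional


class FeedLinkFinder(HTMLParser):
    """Remembers the href of the first Atom autodiscovery link seen."""

    def __init__(self):
        super().__init__()
        self.feed_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.feed_href is not None:
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        if "alternate" in rel and mime == "application/atom+xml":
            self.feed_href = attributes.get("href")


def discover_atom_feed(html: str) -> Optional[str]:
    """Return the Atom feed URL advertised by the page, or None."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feed_href
```

A citation plug-in built on this would follow the returned link and look for the bookmarked post among the feed's entries, which is why it stops working once the post has dropped out of the feed.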

Another wish

Citation microformat. Connotea would work really well with a citation microformat, so is going to look into that.


How to do URL to metadata

- manual entry

- scraping the page

- recognise and extract some ID; Connotea does that, but it doesn’t scale to the whole web.

- follow a metadata link from the page; this is the blog plug-in

- parse the page directly; not possible yet.

Useful not just for Nature as publishers of data, but also anyone else who wants to be discoverable and bookmarkable.

Nature blog about this, Nascent.

One Response to “Xtech 2006: Ben Lund - Social Bookmarking For Scientists”

  1. Kevin Marks Says:

    There is already a lively discussion of a citation microformat going on, and Connotea and indeed others are very welcome to join in and help it to converge.