Thomas, M., Caudle, D. M., & Schmitz, C. M. (2009). To tag or not to tag? Library Hi Tech, 27(3), 411–434. doi:10.1108/07378830910988540

The purpose of this article is to provide a quantitative analysis of the extent to which folksonomies replicate the Library of Congress Subject Headings (LCSH) to see if folksonomies would successfully complement cataloger-supplied subject headings in library catalogs (411).

The authors studied a small sample of very popular books. Their conclusions may not apply to scholarly collections or to the long tail of less popular books in browsing collections. On the other hand, their work provides very clear evidence of the advantages tagging can offer for items that receive many tags.

This ability to use tags to bring out different aspects of a resource is a major advantage of tagging over formal systems of classification and taxonomies (controlled vocabularies). According to Shirky (2005), traditional classification systems and taxonomies attempt to systematically organize knowledge by providing a single classification for a resource. Shirky (2005) argues that the free associations made by taggers and the folksonomies that spring from them are the only appropriate way to organize resources on systems as large and chaotic as the web for three reasons: classification fails to allow more than one place for an item; it is impossible to keep a classification system stable over time; and it is also impossible for an expert to truly predict how a user will search for something. (412)

Mathes (2004) contrasts folksonomies, which are flat namespaces lacking explicit relationships between terms, to more formal systems of classification and taxonomies, which have strict vocabulary control and multiple explicit relationships between terms which can be broader, narrower, or related to each other. Golder and Huberman (2006, p. 200), Mathes (2004), Spiteri (2007, p. 14), Macgregor and McCulloch (2006, pp. 292-3), and Steele (2009) cite problems with folksonomies that stem from their nature as uncontrolled vocabularies. All uncontrolled vocabularies have the following problems: ambiguity and polysemy; synonymy or synonym control; basic level variation; and variations or lexical anomalies in the form of tags. … According to Macgregor and McCulloch (2006, pp. 292-3), controlled vocabularies, on the other hand, take care of all these problems. They also preserve syntactic relationships between terms that are non-hierarchical (Macgregor and McCulloch, 2006, p. 293).

Mathes (2004) and Spiteri (2007, p. 14) also outline the weaknesses of controlled vocabularies which are strengths of folksonomies. Mathes (2004) distinguishes between browsing and finding in information retrieval. Browsing tags is great for serendipitous discoveries of related resources. That is a much different task than searching for every resource in a specific area using a specific term. Folksonomies directly reflect the vocabulary of users instead of using sometimes arcane terms supplied by experts (Mathes, 2004) (414).

There’s more of interest in the excellent literature review, but I don’t want to reprint the whole thing here. It’s not usable for my current paper anyway: I’d need to track down all the secondary sources, and I’ve already reached my limit.

The percentage of tags that add value to the catalog can also be determined by adding up the numbers of tags in categories 3 to 6, since these categories represent tags that do not overlap the assigned subject terms (419).

This is the opposite of the relationship Rolla explores: where tags depart from subject headings, versus where the two overlap.
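
The arithmetic in the quoted passage (summing the tags in categories 3 to 6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the counts are hypothetical and are not taken from the article, and I am assuming the percentage is computed against the total number of tags across all six categories.

```python
def added_value_percentage(counts):
    """Return the share of tags (as a percentage) in categories 3 to 6.

    `counts` maps a category number (1-6) to the number of tags placed in
    that category. Categories 3-6 are treated as the tags that do not
    overlap the assigned subject terms, following the quoted passage.
    Assumption: the denominator is the total across all categories.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no tags at all, so nothing to report
    non_overlapping = sum(counts.get(category, 0) for category in range(3, 7))
    return 100.0 * non_overlapping / total


# Hypothetical counts for one book, invented purely for illustration.
hypothetical_counts = {1: 40, 2: 10, 3: 20, 4: 15, 5: 10, 6: 5}
print(added_value_percentage(hypothetical_counts))
```

With these made-up numbers, half of the tags fall in categories 3 to 6, so half would count as adding vocabulary beyond the assigned subject terms.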

A common criticism of LCSH in library catalogs and controlled vocabularies in general is that assigned terms attempt to put the resource under just the main topic or topics and secondary topics are not covered. It is this either/or classification which is at the heart of Shirky’s (2005) argument in favor of folksonomies and against traditional classification. By retaining controlled vocabularies in catalogs, users still get the precision of a controlled vocabulary, and the addition of a folksonomy provides the ability to place a resource in more than one or two categories and to index the resource in depth. Users get the best of both worlds in a hybrid system (425).

Martha Stewart’s Wedding Cakes, a cookbook, had the fewest number of subject tags (14 tags); followed by I Am a Soldier, Too, a war biography (38 tags); and Heather Has Two Mommies, a children’s book about the child of a lesbian couple (87 tags). These books also had the highest percentages of tag matches with assigned subject terms in category 1. As stated before, these three books also had the highest percentage of subject tags in relation to other types of tags, all of which provides further evidence that users tend to reach consensus quickly on subjects. It is interesting that the users tagging these books used almost the same vocabulary to describe the books as the assigned subjects. LCSH is based on literary warrant, so perhaps it is not so surprising that the most common tags match the LC subject headings. This high degree of overlap between the tags and the subject headings indicates the controlled vocabulary, LCSH, and the rules for applying it provide sufficient access in these cases and it would take many more tags to achieve any additional benefit (429).

Conversely, it would appear that even for less frequently tagged books, the tags can identify the subject of a book in a manner that agrees with LCSH.

These books and The God Delusion all had a large number of subject tags, numbering in the hundreds in each case. The consistency of these results may indicate that our findings show the benefit of tagging as a supplement to LCSH is more likely to occur when there are a large number of tags in the system (430).

The authors are focused on tags as a supplement to LCSH, not a replacement, and suggest that a large number of tags are required for a given item for those tags to add significant value beyond what LCSH provide.

Other tagging systems being used in library catalogs aggregate their tags from all libraries using that software (431).

Clearly there is a need for a mechanism for sharing tags across platforms.

As the above results show, our hypothesis was proven correct – social tagging does indeed augment the LCSH providing additional access to resources. Tags do supply additional vocabulary that could be incorporated into LCSH. A hybrid catalog combining both LCSH and a folksonomy would result in richer metadata and be stronger than the sum of its parts, giving users the best of both worlds. … The results also indicate that these benefits are best achieved with large numbers of tags (431).

