Google is great at finding things. But when it comes to putting things away in their right place, it’s worse than your toddler.
That’s the opinion of Geoffrey Nunberg, adjunct full professor at the School of Information at the University of California at Berkeley. His article presents numerous examples of the horrific job Google has done classifying and assigning proper metadata—the descriptive information attached to content that makes it searchable and usable—to books scanned into the Google Book Search project. One example: Google assigned a publication date of 1905 to a book on Peter F. Drucker—four years before the management consultant was born. Another: a search for books about Barack Obama written before 1950 yields 29 results. Barack Obama was born in 1961. These are not isolated “howlers,” Nunberg argues, but symptomatic of broad-based metadata errors.
Why is Google, the giant in intelligent, algorithmic search, such an unmitigated disaster at classifying text-based content? Shouldn’t this be an arena where it excels? The following example illustrates why the answer might be, “No.” When Google scanned an 1890 guidebook called London of To-day from the Harvard University Library catalog, it assigned a publication date of 1774, a 116-year error. It picked up the faulty year from a front-matter ad for an apparel manufacturer containing the line “Est. 1774.” Because Google’s classification process is a product of math and machine, not library and scholarly expertise, mistakes are all but assured.
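To make that failure mode concrete, here is a minimal Python sketch of how a naive date extractor could go wrong on a book like this one. It is not Google's actual method, which the article does not describe. The function names, the five-year tolerance, and the sample front matter (apart from the “Est. 1774” line) are all invented for illustration.

```python
import re
from typing import Optional

# Matches four-digit years from 1500 through 2099.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def naive_publication_year(front_matter: str) -> Optional[int]:
    """Return the first year-like number in the text, wherever it appears."""
    match = YEAR.search(front_matter)
    return int(match.group(1)) if match else None


def is_suspect(extracted_year: int, catalog_year: int, tolerance: int = 5) -> bool:
    """Flag an extracted year that strays too far from the library catalog record."""
    return abs(extracted_year - catalog_year) > tolerance


# Hypothetical front matter: the ad comes before the title page.
front_matter = (
    "Fine apparel for gentlemen. Est. 1774\n"
    "LONDON OF TO-DAY\n"
    "Published 1890\n"
)

guess = naive_publication_year(front_matter)
print(guess, is_suspect(guess, catalog_year=1890))
```

The extractor takes the advertiser's founding year because it appears first on the page. The second function shows the kind of cross-check a cataloger does by instinct: compare the machine's guess against the library record and send large disagreements to a person for review.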
This is a pretty big deal. Incorrect metadata is annoying for the casual web user. But for educators and scholars, improper metadata can render large swaths of data undiscoverable and unusable. And because Google is becoming the world’s de facto library, it is incumbent on producers and consumers of content to hold Google accountable for proper content classification, or to take a more active role in tagging their own content.
For our clients—producers and publishers who market their content to the academic community—the lesson is clear: organizing and tagging your content with accurate, relevant metadata is the most vital step you can take to boost online presence, discoverability, and, ultimately, sales. Our metadata creation services involve a human component, not only because we work largely with video content (even more challenging to classify than text), but also because we understand the limits of math and machines.