Our recent work at the Labs has focused on semantic listening: systems that derive meaning from the streams of data surrounding them. Chronicle and Curriculum are recent examples of tools designed to extract semantic information (from our corpus of news coverage and our group web browsing history, respectively). However, not every data source is suitable for algorithmic analysis; in fact, it is often easier for humans to extract meaning from a stream. Our new projects, Madison and Hive, are explorations of how to best design crowdsourcing projects for gathering data on cultural artifacts, as well as provocations for the design of broader, more modular kinds of crowdsourcing tools.

[Image: RCA ad]

Madison is a crowdsourcing project designed to engage the public with an under-viewed but rich portion of The New York Times’s archives: the historical ads neighboring the articles. News events and reporting give us one perspective on our past, but the advertisements running alongside these articles provide a different view, giving us a sense of the culture surrounding these events. Alternately fascinating, funny and poignant, they serve as commentary on the technology, economics, gender relations and more of their era. However, the digitization of our archives has primarily focused on news, leaving the ads with no metadata and making them very hard to find and impossible to search. Complicating the process further, these ads often have complex layouts and elaborate typefaces, making them difficult to distinguish algorithmically from photographic content and even more difficult to scan for text. This combination of fascinating cultural information and little structured data seemed like the perfect opportunity to explore how crowdsourcing could serve as a source of semantic signals.

There were endless data points we were interested in collecting, but the challenge was how to do so while keeping our audience engaged. A very complex task might garner us a wealth of data in theory, but not if, in reality, no one wants to do it. Further, we could try to punch up an extremely dry task with external incentives for the most active contributors, but we would risk having our system gamed by those who simply wanted the rewards, potentially setting ourselves up for a bank of incorrect data. To avoid these problems, we took an approach that centered on reducing friction as much as possible, and that limited gamifying elements in favor of highlighting the interesting parts of the task at hand.

We settled on a set of design principles to minimize that friction, inspired by previous cultural and science-oriented crowdsourcing projects (such as the Zooniverse projects, the NYPL’s Building Inspector and The Guardian’s MPs’ Expenses):

  • Add more tasks rather than making one task very complex. Keeping tasks clear and streamlined meant that the user didn’t have to constantly switch contexts, and could knock out a few assignments in a row without having to think too much about it.
  • Make the tasks self-explanatory. Building on the first principle, a task whose question is obvious is much easier to answer than one that requires reading and interpreting instructions.
  • Design for a variety of use cases. If the tasks are simple and modular, chances are they can be done on a variety of devices in a variety of situations. Our Find task works nicely on mobile, and can be done in a few seconds while waiting for the bus; our Transcribe task is much more oriented to someone at home on a desktop computer, looking for a way to spend 10 minutes. (A sketch of how such modular tasks might be represented follows this list.)
  • Permit anonymous contributions. Asking users to sign up first thing would give them an excuse to leave the site. By letting users contribute without logging in, we allow them to try it out and get into it before they commit to creating an account.
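
To make that modularity concrete, here is a minimal sketch in Python of how small, self-contained tasks like Find and Transcribe might be represented so a client can serve whichever task suits a contributor’s device and available time. The names and fields are entirely hypothetical; this is not Madison’s actual data model.

```python
# Hypothetical sketch (not Madison's actual code): one way to model small,
# self-contained crowdsourcing tasks so clients can serve whichever type
# fits the user's device and available time.
from dataclasses import dataclass, field
from typing import Optional
import uuid


@dataclass
class Task:
    """A single, self-explanatory unit of work tied to one asset (e.g. a scanned page)."""
    asset_id: str                          # the archive image this task refers to
    kind: str                              # "find" or "transcribe" in this sketch
    prompt: str                            # the one question the contributor answers
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    contributor_id: Optional[str] = None   # stays None until someone (possibly anonymous) claims it


# A quick, mobile-friendly task: spot the ads on a page.
find_task = Task(
    asset_id="page-1912-06-03-p14",
    kind="find",
    prompt="Tap any advertisements you see on this page.",
)

# A longer, desktop-oriented task: type out the text of an ad someone already found.
transcribe_task = Task(
    asset_id="ad-1912-06-03-p14-02",
    kind="transcribe",
    prompt="Type the headline and body text of this advertisement.",
)
```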

We also purposefully chose an approach that downplayed gamification in order to put the fun part of a potentially dry process at the forefront: discovering and sharing interesting cultural artifacts with your friends. In their paper on the Menus project, the NYPL points out that, for cultural institutions, “the incentives reside in the materials themselves and in the proposition of working in partnership with a public trust.” Rather than trying to tempt users into participation with external, material rewards, we aimed to design a system whose biggest rewards come from engaging with it: namely, discovering and sharing a piece of culture that probably only a handful of people have seen since its original publication. This has a double benefit: because the rewards of Madison lie largely in the delight of finding new things (not in earning points, climbing a leaderboard, completing missions or collecting badges), there is little incentive to game the system with bad information, which in turn bolsters our confidence in opening the project to anonymous users. Madison also has built-in validation criteria that require agreement from a number of contributors to ensure that the ads are annotated with correct metadata.
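
To illustrate what agreement-based validation might look like, here is a minimal sketch in Python. It is a hypothetical illustration rather than Madison’s actual validation code, and the threshold of three matching submissions is an assumption, not a documented figure.

```python
# Hypothetical sketch (not Madison's actual validation logic): accept a piece of
# metadata for an ad only once enough independent contributors have submitted
# the same value.
from collections import Counter
from typing import List, Optional

REQUIRED_AGREEMENT = 3  # assumed threshold; the real criteria are not described in detail


def validated_value(submissions: List[str], required: int = REQUIRED_AGREEMENT) -> Optional[str]:
    """Return the submitted value once `required` contributors agree on it, else None."""
    normalized = [s.strip().lower() for s in submissions if s.strip()]
    if not normalized:
        return None
    value, count = Counter(normalized).most_common(1)[0]
    return value if count >= required else None


# Example: three of four contributors typed the same advertiser name, so that
# field is accepted; the second field is still short of the required agreement.
print(validated_value(["RCA", "rca ", "RCA", "Victor"]))              # -> "rca"
print(validated_value(["His Master's Voice", "His Masters Voice"]))   # -> None
```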

These choices formed the basis of Madison, and also shaped the platform underneath it: Hive. Hive is a modular, open-source framework for building crowdsourcing projects like Madison with any set of assets. We will be sharing more information about Hive in an upcoming blog post!