The New York Times, like many other organizations, has an extensive archive of physical documents that don't yet have sufficient digital representation. Our photo archives in particular hold beautiful high-resolution imagery from over a century of journalism. The digital versions we do have of these images are scans of the low-resolution halftone reproductions printed in the newspaper, and they carry no metadata of their own.
Lazarus is a system for enriching high-resolution archival photo scans with extensive metadata so that images can be treated as first-class digital citizens. Computer vision processes match each scan of an original photo with its halftone version from the archive, thereby linking the scan to the topic tags of the associated article, as well as to additional metadata about publication dates and compositional information. By creating these connections, we can develop a rich, digitized record of visual journalism that enhances our existing archive and supports new experiences based on our extensive photographic history.
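To make the matching step concrete, here is a minimal sketch of how a high-resolution scan might be paired with its halftone counterpart. The text above does not say which computer vision technique Lazarus uses, so this substitutes a simple average hash: each image is averaged down to an 8x8 grid (which also smooths away the halftone dots), thresholded into a 64-bit fingerprint, and compared by Hamming distance. Every name here (`simulate_halftone`, `average_hash`, `best_match`, the `page-a` and `page-b` identifiers, the `max_distance` cutoff) is hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale rows, pixel values 0..255

# 2x2 ordered-dither matrix, used only to fake a halftone for the demo.
BAYER_2X2 = [[0, 2], [3, 1]]


def simulate_halftone(img: Image) -> List[List[float]]:
    """Reduce a grayscale image to pure black/white dots (stand-in for a print scan)."""
    return [
        [255.0 if v > (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4 else 0.0
         for x, v in enumerate(row)]
        for y, row in enumerate(img)
    ]


def block_means(img: Image, size: int = 8) -> List[float]:
    """Average the image over a size x size grid; this blurs out halftone dots."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(img: Image, size: int = 8) -> int:
    """Fingerprint: one bit per grid cell, set when the cell is brighter than the mean."""
    cells = block_means(img, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def best_match(scan: Image, halftones: Dict[str, Image],
               max_distance: int = 10) -> Optional[Tuple[str, int]]:
    """Return (halftone id, distance) for the closest halftone, or None if none is close."""
    scan_hash = average_hash(scan)
    best: Optional[Tuple[str, int]] = None
    for halftone_id, img in halftones.items():
        d = hamming(scan_hash, average_hash(img))
        if d <= max_distance and (best is None or d < best[1]):
            best = (halftone_id, d)
    return best


# Synthetic demo data: a left/right two-tone "photo" and a top/bottom decoy.
scan = [[40.0 if x < 16 else 210.0 for x in range(32)] for _ in range(32)]
decoy = [[40.0 if y < 16 else 210.0 for _ in range(32)] for y in range(32)]
archive = {"page-a": simulate_halftone(scan), "page-b": simulate_halftone(decoy)}
match = best_match(scan, archive)  # ("page-a", 0) for this synthetic data
```

The identifier returned by `best_match` is what would key into the article record, so the scan can inherit that article's topic tags and publication date. A fingerprint like this only tolerates differences in resolution and dot pattern; real archival prints are often cropped, rotated, or retouched before publication, so a production matcher would likely need something more robust, such as local feature matching.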