Associated Press Boosts Data Mining Power of Its Archives
MarkLogic to provide data tools across more than 500 million pieces of content.
Big content publishers have been discovering more ways to leverage their products beyond the per-article level. With millions of pieces of content stored away, access to the collective data across that content can add a whole new layer of value for the publisher. Accordingly, the Associated Press has partnered with database technology provider MarkLogic to add a more powerful and efficient content analysis tool to its archives.
"With this new tool, we are able to run complex, Boolean searches across millions of articles in our content archive and get back precise returns in seconds or minutes instead of days or weeks," says Amy Sweigert, AP’s vice president of information management, in a statement. "The application, which helps create customized datasets for AP customers, will also allow us to enrich our archives to make the content more valuable to the business."
In addition to making its stories available to publishers around the globe, the AP also has a b-to-b group that sells data access to its archive of over 500 million pieces of content. The company, however, needed a solution that offered more targeted search and data packaging capabilities on a much faster platform.
"In the past they had to piece together a relational database with a bunch of metadata that they had to query manually," says David Gorbet, vice president of product strategy at MarkLogic. "They couldn’t do targeted, specific queries in an efficient way."
The ability to package and sell data extracted from archives is one way to squeeze more value from content, but Gorbet adds that publishers are also becoming more interested in analytics and want to understand usage and query patterns against their content. Those analytics go a long way toward helping the publisher create new content products (something the AP wants to do with its new system, too) and serve up content in real time that fits certain customer profiles.