One of the promises always made by consultants and system providers offering integrated audience solutions is “the single view of the customer.”
“You’ll tie together all of the ways you reach your audience—publications, newsletters, web traffic, webinars and events—and be able to see how each unique member of the audience engages with you.”
It’s a promise that puts stars in the eyes of b-to-b media executives. But beyond a few mysterious references to “complex algorithms,” very little gets said about the nuts and bolts involved in creating that single view. Namely, how do you match up all those engagement records from dozens of files and registration systems to make sure you correctly identify each unique individual?
Identifying the single customer used to have a much less glamorous name: de-duping. One of the most basic steps in the controlled circulation audit process was to check for duplication, and catching duplicate subscribers was (and still is) an obsession for managers and fulfillment companies. We even had inside jokes about it: One Halloween, the circulation staff at a big b-to-b publisher dressed the same, wore name tags with their director’s name slightly misspelled, and called themselves collectively the “Dupes of Earl.”
We would de-dupe lists for new-subscriber promotions against our existing database, and de-dupe the responses again as we added them to the file. Then we would run a suspect dupe match and do a clerical check to identify duplicates the computer hadn’t caught. After all that, the auditor would find yet more duplicates, although we hoped few enough to keep within BPA’s (supposedly top secret) auditing tolerance.
What makes de-duping such a challenge? Well, to start with, we have all the normal variations in names and addresses that slip past a computer match. But in the b-to-b audience, it isn’t enough to pin down the same name at the same address. What about individuals who had offices at more than one of their company’s plants? Those who switched jobs mid-year and showed up at two different companies? People who got magazines at their home address as well as the office?
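To make those “normal variations” concrete, here is a minimal Python sketch of the kind of match key a computer de-dupe might build from a name and address. The abbreviation and suffix tables and the function names (normalize_tokens, match_key) are illustrative assumptions, not any fulfillment company’s actual rules, and a production table would be far longer.

```python
import re
import unicodedata

# Hypothetical lookup tables for illustration only.
ADDRESS_ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "road": "rd",
    "suite": "ste", "building": "bldg", "floor": "fl",
}
NAME_SUFFIXES = {"jr", "sr", "ii", "iii"}


def normalize_tokens(text):
    """Lowercase, strip accents and punctuation, and split into tokens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_text.lower())
    return cleaned.split()


def match_key(first_name, last_name, address):
    """Build an exact-match key that tolerates case, punctuation,
    name suffixes, and common address abbreviations."""
    name_tokens = [
        t for t in normalize_tokens(first_name + " " + last_name)
        if t not in NAME_SUFFIXES
    ]
    address_tokens = [
        ADDRESS_ABBREVIATIONS.get(t, t) for t in normalize_tokens(address)
    ]
    return (" ".join(name_tokens), " ".join(address_tokens))
```

With a key like this, “Earl Smith, Jr.” at “100 Main Street, Suite 4” and “EARL Smith” at “100 Main St. Ste 4” collapse into one record. A misspelled “Earle Smith” at the same address still slips past, and so does anyone listed at a second plant, a new employer, or a home address.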
Knowing how much work it takes to identify duplicates in a controlled circulation list of 50,000 or so, I’m always floored when database companies brush past the question of how they will match up unique individuals across several million records drawn from multiple sources. And I’m even more dismayed when some admit that they simply rely on the new “unique identifier”: an email address.
At first it makes perfect sense: While everyone at the same business location shares one mailing address, each has a separate and unique email address. You cannot even create identical email addresses that point to two separate, unrelated inboxes. If you have one email, you have one individual. Right?
Well, half right. Yes, one email address equals one inbox. But a surprising number of businesses still have inboxes shared by multiple individuals. And a much larger number of individuals use multiple addresses. At my last company, I analyzed the 1.5 million records in our corporate database and discovered that nearly 20 percent of our audience members had more than one email address on file with us. And 4 percent of our email addresses were connected to more than one individual at the same company.
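If we had assumed each email represented a unique individual, we would have been hugely misled about the true size of our audience: people with several addresses would have been counted more than once, and people sharing an inbox would have been counted as one.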
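For anyone who wants to run the same kind of check on their own file, the two percentages can be computed along these lines. This is only a sketch: it assumes each record already carries a person_id assigned by some earlier identity-matching step, and the field layout, function name, and sample addresses are invented for illustration.

```python
from collections import defaultdict


def email_overlap_stats(records):
    """records: iterable of (person_id, email) pairs.

    Returns counts plus the share of individuals with more than one
    address and the share of addresses tied to more than one individual.
    """
    emails_by_person = defaultdict(set)
    people_by_email = defaultdict(set)
    for person_id, email in records:
        email = email.strip().lower()  # treat case variants as one address
        emails_by_person[person_id].add(email)
        people_by_email[email].add(person_id)

    individuals = len(emails_by_person)
    addresses = len(people_by_email)
    multi_email = sum(1 for e in emails_by_person.values() if len(e) > 1)
    shared = sum(1 for p in people_by_email.values() if len(p) > 1)

    return {
        "individuals": individuals,
        "distinct_emails": addresses,
        "pct_individuals_with_multiple_emails":
            100.0 * multi_email / individuals if individuals else 0.0,
        "pct_emails_shared":
            100.0 * shared / addresses if addresses else 0.0,
    }


# Made-up sample: p1 has two addresses, p2 and p3 share one inbox.
sample_records = [
    ("p1", "earl@acme.example"),
    ("p1", "Earl.Home@mail.example"),
    ("p2", "info@acme.example"),
    ("p3", "info@acme.example"),
    ("p4", "dana@widgets.example"),
]
stats = email_overlap_stats(sample_records)
```

In this tiny sample there are four individuals and four distinct addresses, so a count by email happens to land on the right total while still getting two of the four people wrong. On a real file the two errors rarely cancel so neatly.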
For the purpose of sending out an email blast, it may be enough to de-dupe by address only. But to accurately identify one individual across many points of engagement, and deliver a single view of each customer, we can’t avoid the tough, old-fashioned work we learned with controlled circulation. Accurate identification takes multiple levels of matching, multiple points of contact information, and usually some clerical clean-up after those “complex algorithms” have done their best. It’s not glamorous, but it’s a challenge we have to face to make our databases accurate and reliable.
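One way to picture “multiple levels of matching” with a clerical step at the end is a rule that sorts each pair of records into match, review, or distinct. The sketch below is an assumption-laden illustration, not a description of any vendor’s algorithm: the record fields, the 0.85 name-similarity cutoff, and the rule order are all hypothetical, and it uses Python’s standard difflib for the fuzzy name comparison.

```python
from difflib import SequenceMatcher

NAME_THRESHOLD = 0.85  # assumed cutoff; would need tuning on real data


def name_similarity(a, b):
    """Return a 0..1 similarity score for two names, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def classify_pair(rec_a, rec_b, threshold=NAME_THRESHOLD):
    """Classify two records as 'match', 'review', or 'distinct'.

    Each record is a dict with 'name', 'email', and 'company' keys.
    """
    email_a = rec_a["email"].strip().lower()
    email_b = rec_b["email"].strip().lower()
    same_email = email_a != "" and email_a == email_b
    same_company = (
        rec_a["company"].strip().lower() == rec_b["company"].strip().lower()
    )
    similar_name = name_similarity(rec_a["name"], rec_b["name"]) >= threshold

    # Level 1: email and name agree, so treat as the same individual.
    if same_email and similar_name:
        return "match"
    # Level 2: same inbox but a different name, possibly a shared address.
    if same_email:
        return "review"
    # Level 3: same name and company but another address, possibly one
    # person with two emails. Send to the clerical queue.
    if similar_name and same_company:
        return "review"
    # Job-switchers and home addresses would need further evidence
    # (phone, postal address) that this sketch does not model.
    return "distinct"
```

Under these rules, “Earl Jones” and “Earle Jones” on the same address are merged automatically, while “Dana Smith” on Earl’s address, or Earl on a second address at the same company, goes to a person for the clerical check.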
Who knows? Maybe someday, at a Halloween party, we’ll see an integrated audience database team dressed up as the Dupes of Earl.