There is a dark secret in the world of identity resolution that most data platforms don’t like to talk about. It’s one of those things that has just enough truth in it to be legitimate, but only technically, and this secret revolves around one simple word: “Deterministic.” Vendors use this word to throw you off — to convince you that the quality of their matching capability is somehow greater than it actually is.
They sometimes go so far as to cast aspersions on probabilistic matches as “fuzzy” or “unreliable.” The truth: there is no such thing as a purely deterministic match.
A person can be identified by any number of characteristics. The problem is that none are perfectly persistent or consistent. Maybe someone wants to use your name to identify you? Sure, it identifies you, but probably others as well. A physical address? Yeah, much more specific but you move between those quite often. An email address? It depends. Personal email addresses don’t lose relevance as fast as business addresses on average, but you probably have more than one personal email address and you go through a new work email address every time you change jobs.
What’s more, even when someone HAS a deterministic match, they still have to ask the fundamental question: does the email or cookie or phone number actually represent the person I’m looking for? Answering that question requires careful consideration of relevancy, accuracy, and persistence, all of which are as qualitative as they are quantitative in the real world.
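One way to make that question concrete is to treat an exact match as evidence rather than proof. The sketch below applies Bayes’ rule in odds form, in the style of classic Fellegi-Sunter record linkage. The function name `match_posterior` and every number in it are illustrative assumptions, not measured values from any vendor or dataset.

```python
def match_posterior(prior: float, m: float, u: float) -> float:
    """Probability that two records are the same person, given that an identifier agrees.

    prior: assumed probability the records are the same person before comparing.
    m:     assumed probability the identifier agrees when they ARE the same person.
    u:     assumed probability the identifier agrees when they are NOT the same person.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * (m / u)  # the likelihood ratio m/u scales the odds
    return posterior_odds / (1.0 + posterior_odds)


# Made-up inputs: a common name agrees by coincidence far more often than an email does.
shared_name = match_posterior(prior=0.001, m=0.95, u=0.01)
personal_email = match_posterior(prior=0.001, m=0.90, u=0.00001)

print(f"exact name match:  {shared_name:.3f}")
print(f"exact email match: {personal_email:.3f}")
```

Both comparisons are exact string matches, yet under these assumed inputs they land at very different confidence levels, and neither reaches 1.0. The "deterministic" label only describes how the comparison was made, not how sure you can be of the result.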
“Identity resolution is really hard for two reasons primarily: 1) Scale is really hard with big data. When you have a relational database model, you can never recalibrate it. It’s too intensive to do in linear time. Your only option is to go in observationally and apply individual changes. 2) Quality suffers at scale because change is both rapid and undefined. You can’t stop new data from coming in, and you can’t know how fast, from where, or at what quality before it arrives.”
~Scott Brave, CTO at FullContact
In a relational model, a veritable operating system of business logic and heuristics grows like cancer through the system over time until it seizes up under the weight of its own processes. The only way to truly combat this inevitability is to build the system in a completely different way.
A New Light
Data must be compiled as discrete fields into interdependent nodes and edges in a probabilistic matrix where the strength of those relationships and the quality of those characteristics are determined through observable test results. Then, as new data enters the system, it must be assessed in real-time using a layer of intelligence that gets smarter and more accurate as the matrix increases in size.
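As a rough illustration of relationship strength being "determined through observable test results," the sketch below keeps a Beta-Bernoulli belief for a single edge and updates it as each new observation arrives. The class name `EdgeBelief`, the uniform prior, and the sequence of outcomes are all assumptions made for the example. It is not a description of any particular production system.

```python
from dataclasses import dataclass


@dataclass
class EdgeBelief:
    """Beta(alpha, beta) belief that an edge (say, email -> person) is correct."""
    alpha: float = 1.0  # pseudo-count of confirmations (uniform prior assumed)
    beta: float = 1.0   # pseudo-count of contradictions (uniform prior assumed)

    def observe(self, confirmed: bool) -> None:
        """Fold one test result into the belief as soon as it arrives."""
        if confirmed:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def strength(self) -> float:
        """Posterior mean: the current estimate that the edge is valid."""
        return self.alpha / (self.alpha + self.beta)


edge = EdgeBelief()
for outcome in [True, True, True, False, True]:  # hypothetical test results
    edge.observe(outcome)

print(f"edge strength after 5 observations: {edge.strength:.3f}")
```

Each update touches only the edge the new data concerns, which is one reason a design like this can absorb a steady stream of observations without re-scoring everything else.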
FullContact’s patented identity graph system is built on this vision. From the beginning, FullContact recognized the futility of growing a relational scheme and instead approached the identity problem through a graph-centric data architecture that never forces records into independent rows. Characteristics become nodes, relationships become edges, and the probabilities of accuracy and relevancy to an individual are assessed through a Bayesian logic layer. That approach guarantees that over time FullContact will never have to trade either of those measures, accuracy or relevancy, for speed.
Case in point: Scott Brave, CTO at FullContact, states, “In our model, we can completely ‘re-compute’ the entire graph in a matter of hours while maintaining the utmost accuracy.” That’s saying something when the graph is updated over 30 million times per day and consists of over 3.5 billion independent pieces of identifiable information representing literally hundreds of millions of human beings.
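To picture the nodes-and-edges idea, here is a toy graph in which identifier nodes link to person nodes with an assumed probability on each edge. The scoring rule is a simple noisy-OR that treats edges as independent. It is a stand-in chosen for brevity, not FullContact’s patented method, and the identifiers, person labels, probabilities, and the `resolve` function are all hypothetical.

```python
from collections import defaultdict

# (identifier node, person node) -> assumed probability that the link is valid
edges = {
    ("email:jo@example.com", "person:A"): 0.92,
    ("phone:555-0100", "person:A"): 0.70,
    ("phone:555-0100", "person:B"): 0.25,
    ("name:Jo Smith", "person:A"): 0.05,
    ("name:Jo Smith", "person:B"): 0.05,
}


def resolve(observed: set, edges: dict) -> dict:
    """Score each person with a noisy-OR over edges from the observed identifiers."""
    miss = defaultdict(lambda: 1.0)  # probability that none of the observed links is valid
    for (identifier, person), p in edges.items():
        if identifier in observed:
            miss[person] *= 1.0 - p
    return {person: 1.0 - q for person, q in miss.items()}


scores = resolve({"phone:555-0100", "name:Jo Smith"}, edges)
best = max(scores, key=scores.get)

for person, score in sorted(scores.items()):
    print(f"{person}: {score:.4f}")
print("best candidate:", best)
```

Because nothing is flattened into a single row per person, adding a new identifier or a new candidate is just another edge, and the answer comes back as a ranked set of probabilities rather than a yes or no.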
The reason this all qualifies as a “dark secret” and not just another academic or highfalutin rant from a big data analyst is that the stakes for businesses are higher than ever. We live in a relationship-centric world with our customers. We are all inundated by thousands of brand messages and communications and advertisements every day, and we’ve all come to expect more from the brands and businesses we associate with as a result. The noise is louder than ever, and breaking through that noise requires knowledge, even insight. Identity resolution is the precursor to all of it.
Whether you are sending a list of hashes to a DMP for activation, social usernames to an agency for segmentation, or emails to a CRM for inclusion in account records, whatever value those channels generate depends first on the depth and accuracy of the identification performed on the way in.
So the next time a data vendor attempts to assuage your fears by saying, “Don’t worry, we match your data deterministically,” you should ask, “Then how sure are you that it’s right?”