Your contacts are aging. No, I’m not referring to people getting older, but maybe I should be. As people age they make changes in their lives. New jobs, new phones, new spouses (and thus new names) and new addresses are simply a fact of life. If you want to build a contact management system, you have to embrace this fact at the deepest level.
I like to say that data is always old. It’s just a question of how old. Let’s say that your contact management system stores contacts in some kind of database. You want to make sure that this database has the most recent version of your contacts, but changes to contact information can come from anywhere. It’s an entirely separate challenge just to digitally capture the changes that are happening to your contacts out in the real world. (This is why we build products like FullContact Card Reader.) One way to start is to integrate with the digital address books that people already have. There are tons of these: email systems like Gmail, Yahoo, and Outlook.com, and social networks like Facebook, LinkedIn, Twitter, AngelList, and Instagram. The list goes on.
Keep a copy
To keep up to date with each of these, you need to keep a copy of what your system thinks the 3rd party system looks like. For example, let’s look at Gmail. When your users first connect their Gmail account, your system needs to make a copy of their entire Gmail contact list. At first, this is a simple fetch operation, but it’s usually not realistic to fetch every single contact from Gmail every time you want to check for updates. Some people have 20,000+ contacts in Gmail and don’t want to wait around while your contact management system laboriously checks every one for changes.
Thankfully, Gmail’s API provides a method to check for changes that occurred after a given time. This reduces the amount of data that you have to transfer by an enormous amount, because only the actual changes are transferred. You’ll have to deal with annoyances like rate limits and expired keys, though.
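To make the "only fetch what changed" idea concrete, here is a minimal sketch in Python. It does not use Gmail's real API: FakeRemote, changes_since, and Mirror are hypothetical names invented for illustration, and timestamps are plain integers. It also ignores deletions, rate limits, and expired keys.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRemote:
    """Hypothetical stand-in for a third-party address book (not Gmail's API)."""
    contacts: dict = field(default_factory=dict)  # id -> (updated_at, data)

    def changes_since(self, ts):
        # Return only the contacts modified after the given timestamp.
        return {cid: (t, d) for cid, (t, d) in self.contacts.items() if t > ts}


@dataclass
class Mirror:
    """Our local copy of what we think the remote address book looks like."""
    contacts: dict = field(default_factory=dict)  # id -> data
    last_sync: int = 0  # newest remote timestamp we have seen so far

    def sync(self, remote):
        changes = remote.changes_since(self.last_sync)
        for cid, (updated_at, data) in changes.items():
            self.contacts[cid] = data
            self.last_sync = max(self.last_sync, updated_at)
        return len(changes)  # number of contacts transferred
```

The first call to sync copies everything, because last_sync starts at zero. Every later call transfers only the contacts whose timestamp is newer than the last one seen.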
The first problem with syncing data stores of any kind is managing concurrent modifications. Imagine this timeline:
- I create a contact in FullContact.
- FullContact copies that contact to Gmail.
- My phone gets a copy of that contact from Gmail.
- I change the contact on my phone.
- Before the change from my phone can make it all the way back to FullContact, I change the contact in FullContact.
- FullContact tries to send the change to Gmail but discovers that the Gmail version has changed!
What should FullContact — er… your contact management system — do? First you want to isolate changes to contacts so you can detect these conflicts. You’ve already built a great contact versioning system, so let’s take advantage of it.
You should keep a separate version history of an address book for each source of changes. This isolates the problem of merging contacts from multiple sources, which deserves (and will get) a blog post of its own.
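Here is one way a per-source version history could look, as a rough Python sketch. VersionStore and its methods are hypothetical names, contacts are plain dicts, and everything lives in memory. A real system would persist this and would need a richer notion of "version" than a list index.

```python
from collections import defaultdict


class VersionStore:
    """Keeps an independent version history per (source, contact) pair."""

    def __init__(self):
        # (source, contact_id) -> list of snapshots, oldest first
        self._history = defaultdict(list)

    def record(self, source, contact_id, snapshot):
        versions = self._history[(source, contact_id)]
        versions.append(dict(snapshot))  # copy so later edits don't leak in
        return len(versions)  # 1-based version number

    def latest(self, source, contact_id):
        versions = self._history[(source, contact_id)]
        return versions[-1] if versions else None

    def changed_since(self, source, contact_id, synced_version):
        # True if this source has recorded versions newer than the one
        # we last synchronized against.
        return len(self._history[(source, contact_id)]) > synced_version
```

With this in place, a conflict is simply the case where changed_since is true for both sources relative to the versions they last agreed on.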
Having separate version histories simplifies things, because it’s now much simpler to determine the content of a given change. You can simply compare the two versions of a contact with a known common ancestor and resolve any conflicting changes using some simple heuristics. This is called a 3-way merge, and it is a staple of any version control system.
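A field-by-field 3-way merge can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not FullContact's implementation: contacts are flat dicts, a missing field and a field set to None are treated the same, and the only conflict heuristic is "prefer one side". The function name and the prefer parameter are made up for this example.

```python
def three_way_merge(base, ours, theirs, prefer="ours"):
    """Merge two edited versions of a contact against their common ancestor.

    Returns (merged, conflicts), where conflicts lists the fields that
    both sides changed to different values.
    """
    merged = {}
    conflicts = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o  # both sides agree (or both deleted the field)
        elif o == b:
            value = t  # only their side changed this field
        elif t == b:
            value = o  # only our side changed this field
        else:
            conflicts.append(key)  # both changed it, differently
            value = o if prefer == "ours" else t
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"name": "Ann Smith", "phone": "555-0100", "email": "ann@old.example"}
ours = {"name": "Ann Jones", "phone": "555-0100", "email": "ann@new.example"}
theirs = {"name": "Ann Smith", "phone": "555-0199", "email": "ann@work.example"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

In this example the name change from one side and the phone change from the other both survive, and only email is reported as a conflict because both sides edited it.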
There are many more approaches you could consider, the most exciting of which is based on Operational Transformation, a technique with a long history of successfully managing concurrent edits to data systems.
When your system asks Gmail, “have there been any changes since XXXX?”, that’s a poll. The longer you wait between polls the less current your user’s contacts will be, so you want to poll as fast as possible, like 300 times per second. Except that’s not realistic. Polling wastes server and network resources whenever the answer to your question is “no changes”. Our friends at Zapier found that 98.5% of all polling seen through their system amounts to wasted traffic.
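A toy simulation shows how quickly wasted polls pile up. The function below is a hypothetical helper, with times as plain numbers. It counts how many polls found at least one change since the previous poll and how many came back empty.

```python
def run_polls(change_times, poll_times):
    """Return (useful, wasted) counts for a series of polls.

    A poll is useful if at least one change happened after the previous
    poll and at or before this one; otherwise it is wasted.
    """
    useful = wasted = 0
    last = float("-inf")
    for now in sorted(poll_times):
        if any(last < t <= now for t in change_times):
            useful += 1
        else:
            wasted += 1
        last = now
    return useful, wasted


# Two contact changes in a minute, polled every 10 seconds.
useful, wasted = run_polls(change_times=[5, 47], poll_times=range(0, 60, 10))
```

Even in this tiny example only 2 of the 6 polls find anything. Polling faster improves freshness but only adds more empty responses.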
There’s a better way, and it’s called a “webhook”. A webhook is simply a subscription for updates to a given contact. Hypothetically, if Gmail supported webhooks, you could register a webhook with them, saying, “whenever a contact changes, send it to me at this URL”. Very few third-party systems support webhooks yet (Facebook is a notable exception), but they are an incredibly powerful way to keep systems up to date.
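On the receiving end, a webhook is just an HTTP endpoint that accepts change notifications. The sketch below uses only Python's standard library. The JSON payload shape (id, contact, deleted) is an assumption made up for illustration, not any provider's actual format, and a real receiver would also need to authenticate the sender.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MIRROR = {}  # contact id -> latest contact data (our local copy)


def apply_event(mirror, body):
    """Apply one change notification (hypothetical payload) to the mirror."""
    event = json.loads(body)
    if event.get("deleted"):
        mirror.pop(event["id"], None)
    else:
        mirror[event["id"]] = event["contact"]
    return event["id"]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            apply_event(MIRROR, self.rfile.read(length))
        except (ValueError, KeyError, AttributeError):
            self.send_response(400)  # malformed or unexpected payload
        else:
            self.send_response(204)  # accepted, nothing to return
        self.end_headers()


# To listen for notifications (blocks forever), something like:
#   HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

The update logic lives in apply_event, separate from the HTTP plumbing, so the same function could also process changes that arrive by polling.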
Next up: Merging
Keeping a separate copy of each user’s 3rd party address books is great, but it’s a pretty lame user experience. Users don’t want to see 4 separate copies of the same contact, all from different sources. They want to see just one unified contact. In order to pull this off, you need to be an expert at merging contacts. But before you can merge these separate copies, you need to somehow detect that they are the same person. Merging two different people is a great way to piss off your users, so you need a pretty robust duplicate detection system. I’ll dive into much more detail on duplicate detection and merging in an upcoming post.