By Loren Christiansen

How to Fix Duplicate Disease in Salesforce - Part 1: Cleaning the Database

Fix Your Data Headaches Forever

Duplicates are one of the worst diseases a CRM can have. Analytics show incorrect numbers, related records get logged on the wrong parent record, Users don't get a full picture of a Contact's history, and so on. Left unchecked, poor duplicate management can bring your Org to a grinding halt.


If this sounds like you, your Org needs help - and fast. Salesforce offers a great suite of tools to prevent duplicates, but you need to clean things up first. Below are a few tips to help you remove existing duplicates.


Identifying Duplicate Records

First things first - get clear about what objects you'll be checking for duplicates, and understand the scope of your duplicate problem. If the number of duplicates is fairly low or your data model is uncommonly complex, a dedicated admin (or two) can take care of manually merging or deleting duplicate records in a few days. If the number is unmanageably large, consider using 3rd party tools that can do the work in bulk based on rules you define.


Write down the objects that you're concerned about, and record the field(s) that can be used to determine uniqueness. For example, the 'Email' field is typically a good place to start for Leads and Contacts. Unless you have a very good reason to store multiple contacts with the same email address, each record should have a unique email address. For other objects like Accounts or custom objects, you may need to use a combination of fields to check for uniqueness. For example, you may use the domain name of the website combined with the Account name to look for duplicates. Deciding on these rules is more of an art than a science and depends wholly on your business logic. A good place to start making these rules is asking yourself, "When I encounter duplicate records when I'm not looking for them, what data initially made me suspicious?"


Once you have your duplicate rules, choose your method for determining scope. You need to figure out how many duplicates you have, and what kind of resolution they'll require. Do you only have groups of 2 duplicates, or are there more in a group? Does each duplicate group require manual review and merge, or can you safely delete one of the records?


There's an important concept to be aware of when managing duplicates. Salesforce makes a distinction between duplicate 'sets' and duplicate 'records.' A duplicate set is a group of 2 or more records that have been flagged as duplicates. Duplicate records are the individual records that are duplicative. To make this clearer, imagine you have 6 contacts - three share an email address, two share another email address, and the last contact has a unique email address. In this case, you have 2 duplicate sets (groups of duplicates) and 5 total duplicate records. This distinction is important for understanding scope if you use Salesforce's built-in duplicate reporting.


To get a preliminary idea of duplicate scope, open up SF Inspector's Data Export. Use a grouped SOQL query with a HAVING clause to figure out how many duplicate sets you have, grouping by the field(s) that determine uniqueness. For example, if you're using the email field to find duplicate contacts, you can run the following query:

SELECT Email, COUNT(Id)
FROM Contact
WHERE Email != null
GROUP BY Email
HAVING COUNT(Id) > 1
ORDER BY COUNT(Id) DESC
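
The same grouped-query pattern extends to multi-field uniqueness rules, like the Account example from earlier. Here's a sketch that groups on the combination of Account Name and Website. One caveat: SOQL can't extract a bare domain from a URL, so this groups on the raw Website value; normalizing out protocols and 'www.' prefixes would have to happen in a spreadsheet or script afterwards:

SELECT Name, Website, COUNT(Id)
FROM Account
WHERE Website != null
GROUP BY Name, Website
HAVING COUNT(Id) > 1
ORDER BY COUNT(Id) DESC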

Tangent alert! Never heard of SF Inspector? It's the best Chrome extension you're not using. Pull reports almost as fast as you can speak, and import data in a heartbeat. Go to our home page and enter your email at the top to have an SF Inspector ebook sent right to you.


The output of this report can give you a lot of good information on next steps. If you get an error about the query timing out, or anything about queryMore(), you have your work cut out for you (this typically means you have 2,000+ duplicate groups). Otherwise, you'll have a (hopefully short) list of emails that show up more than once in your org. The number of rows returned is your number of duplicate 'sets,' and the sum of the count column is the number of total duplicate records.
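
To make that concrete, here's what a hypothetical result might look like (the addresses are made up) for the 6-contact example from earlier:

Email               COUNT(Id)
jane@example.com    3
bob@example.com     2

Two rows returned means 2 duplicate sets, and the counts sum to 5 total duplicate records.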


If you already have matching rules and duplicate rules in Salesforce, you can also create reports to get all the detected duplicates in one place. Take a look at this article for instructions on creating the report type.
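
As a shortcut, that report type is built on the standard DuplicateRecordSet and DuplicateRecordItem objects, so you can also pull the same data straight through SF Inspector. A sketch, assuming your duplicate rules are active and set to report when they fire:

SELECT Id, RecordCount
FROM DuplicateRecordSet
ORDER BY RecordCount DESC

Each row is one duplicate set. To see the member records in each set, query the items:

SELECT DuplicateRecordSetId, RecordId
FROM DuplicateRecordItem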


Cleaning the Duplicates

Now you have some decisions to make! The first is whether you'll use the free, built-in tools in Salesforce (mostly manual merging) or a 3rd party tool (automated merging in bulk). The primary driver of this decision is the complexity of duplicate resolution. Think through each of these:


Number of Duplicates

This one should be obvious. The more duplicates you have, the more work you'll have to do if you stick with Salesforce tools. Merge a few records in Salesforce, and then see if you're willing to repeat that process for as many duplicate groups as you have. You'll trade time for money if you have a lot of duplicates.


Delete vs. Merge

Does each set of duplicates require merging records, or can you safely delete the duplicates instead? If deletes are a valid option, you should be able to get the work done with Salesforce and Excel. Export your duplicates, determine which ones you should keep, and delete the rest with the Data Loader (or, if you want to be faster, SF Inspector!). If the duplicates require merging, a 3rd party tool may be a better fit. You can do it in Salesforce, but you or your team will have to review every duplicate manually to determine which field values to keep between the duplicate records. Alternatively, you can give end users the ability to merge duplicates and let the deduplication happen naturally as they encounter duplicates. It's slower and not always comprehensive, but there's no additional charge!
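
As a sketch of that export step (the addresses below are placeholders for whatever your duplicate query surfaced): pull every record in your duplicate groups, sorted so each group's records sit together and the oldest record, a common 'keeper' candidate, comes first:

SELECT Id, Name, Email, CreatedDate, LastActivityDate
FROM Contact
WHERE Email IN ('jane@example.com', 'bob@example.com')
ORDER BY Email, CreatedDate

From there, mark your keepers in Excel and feed the remaining Ids to whichever delete tool you chose.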


Complexity

On average, how long does it take to manually merge a duplicate? What are the business rules that drive the complexity? If they're straightforward if-then statements, chances are they'll fit well into a 3rd party tool that can do automated, rule-based merging. If the logic is more complicated, a human will need to review each set, pushing you towards manual merging in Salesforce.


Backup/Archiving Needs

It's very possible to lose data during deduplication. Will it be a huge problem if data is irreversibly lost? If so, you'll need at least a backup solution. Even better, some 3rd party tools can archive data from duplicate records into a text field during the merge. That way, even if the merge was done incorrectly, you can refer to the text field with the archive and correct the record.
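
A hypothetical example: if your tool writes the pre-merge values into a custom text field (call it Merge_Archive__c - the field name is made up), you can always pull the archived data back up later to spot-check merges:

SELECT Id, Name, Merge_Archive__c
FROM Contact
WHERE Merge_Archive__c != null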


Unfortunately, it's not an obvious decision. Review all of the factors and choose whether to stay with Salesforce or start looking at 3rd party tools. Alternatively, use a blended approach and resolve duplicates in some objects with Salesforce and others with 3rd party tools.



3rd Party Tools

An entire series could be dedicated to the tools that are available for data processing in Salesforce. Some tools are made purely for managing duplicates, and others are much more robust with useful features such as automatic backups, analytics, integrations, etc. Here are a few (in no particular order) that I've had success with:

  • OperationsOS by ZoomInfo (previously RingLead): Strong suite of data-related tools. OperationsOS has bulk processing, robust rules for duplicate detection (including fuzzy matching on fields like Account Name), and field archiving for merged records.

  • DemandTools by Validity: Similar to the above, Validity gives you granular control over complicated duplicate situations. Check their website for the other features that come with DemandTools.

  • DBAmp by CData: If you have SQL experience and would be comfortable writing your own deduplication scripts, DBAmp is a reasonably priced option that makes pulling Salesforce data directly into SQL Server a breeze. I've used DBAmp extensively over the last 5 years and found it to be fast and reliable compared to other tools. Once again, only for those with SQL experience!


Conclusion

Cleaning a database full of duplicates isn't an easy task. You'll be successful by carefully determining scope, making a plan, and choosing the right tools. Good luck, and happy Admin-ing!


Did I miss anything? What has made your deduplication projects successful?
