Salesforce Dataloading — Common pitfalls
If you are a Salesforce Administrator in a large company, you inevitably get to do a lot of data loading. But existing tools like dataloader.io or Data Import Wizard have their quirks, that can result in data corruption if you’re not careful. In this article I will share some experiences on common pitfalls during dataloading, and how to solve them.
Disclaimer: I am the founder of an alternative called SmartUpload, that can help you overcome these pitfalls. In this article, I will point to solutions within SmartUpload that are built to overcome these common data loading issues. And there is also a free version of SmartUpload.
Overview
In this article, I will discuss pitfalls that can occur when using free tools commonly used for data loading into Salesforce:
- Dataloader (sometimes called the Apex Dataloader), command-line utility, can import millions of rows.
- Dataloader.io (free up to 10.000 records per month, with quite steep licensing if you require more records)
- Data Import Wizard (limited to 50.000 rows per import)
These three tools all are free to use (to some extent) and have their own pro’s and con’s, but since they all work in a similar fashion they are all prone to the same issues. So let’s dive in …
Pitfall 1: Excel to CSV conversions
All existing Salesforce dataloading tools work based on CSV files. These are plain text files, where data is stored on lines separated with comma’s. But the CSV only contains values, it doesn’t have metadata about the values.
Why is this important? Let’s illustrate with an example. Consider the following Excel sheet:
Seems pretty straightforward, but when we export this to CSV, the following text file is generated:
This file has three issues when we try to import it into Salesforce:
- Due to my computer’s locale settings, it is generating CSV files with the semicolon (;) instead of the comma (,) as separator. Most tools will not accept this; or they make it very hard for you to define which character to use as a separator. So you might end up needing to manually change the semicolon to a comma everywhere.
- The field ‘birthdate’ was formatted as a real date, which means that in Excel you can even do calculations with it. It might display as ‘10-May-20’, but internally it is saved as a number. That means that within Excel, it will display differently based on your Locale settings, for instance ‘10–05–2020’ or ‘5/10/2020’. But the resulting value in the CSV is just text. And in my case, this is now text that will not be accepted by most tools.
- On a related note, if the date were formatted in a ‘5/10/2020’ format, how does your tool know if this is the 10th of May, or the 5th of October? I’ve seen this go wrong where data was exported in a European format (dd/mm/yyyy) but imported using an American format (mm/dd/yyyy). And that is actually worse than an error message, because data will be imported without fail, but you end up with corrupt data in your Org.
- The ‘description’ was a multi-line entry, which in the CSV file is now adding a new line. Some tools won’t accept this and expect a single line per entry.
So when you are converting data from Excel files into CSV files, you might end up with a file that you manually have to correct before loading, to make sure that Dataloader, dataloader.io or Data Import Wizard won’t fail parsing your CSV.
NB: SmartUpload supports XLS and XLSX files natively, without need for conversion. And it will also validate the Excel file before processing, to make sure all data is correctly formed.
Pitfall 2: Manual Mapping
Let’s dive into the next phase of imports: mapping columns from your CSV file to Salesforce fields. If you have a very simple object model, this usually won’t be too much of a problem. But there are some things to consider:
- Usually the mapping of fields from Excel columns to Salesforce Object fields can be daunting for business users. Especially if you have ambiguous field names, or complex object types.
- This means that often this is a repetitive task for Salesforce Administrators. Which puts the work of importing at the wrong desk: a business user should be managing his/her own data, you shouldn’t have to do it for them.
NB: SmartUpload allows administrators to create mappings once and assign them to users, where business users can only upload data and not modify the mapping.
Pitfall 3: ID lookups and relationships
Within Salesforce, objects can be connected using lookup or master-detail relationships. That’s a great way to keep your data clean, but can be quite a headache if you have a lot of interconnected data to import.
Let’s consider the following example; one of the most complex I have encountered during a consultancy gig:
This it the source Excel that needs to be imported into the ‘SMC__Supplier_Catalogue__c’ object: a supplier catalogue as part of the SCM package of Financial Force. That object has a lot of master-detail relations to other objects, like:
- Catalogue Name: this is the link to the parent Supplier Catalogue record
- Vendor: this is actually a lookup to an Account
- Item Code: this is actually a lookup to an Item Master record
- Currency: this is actually a lookup to the Currency Master object
- Stocking UOM: which points to a unique Stocking Unit Of Measure object
- Incoterm: this is a lookup to an Incoterms object
So what do we do in this case? We create exports for all these 5 objects, put the results into separate sheets and use VLOOKUP formula’s to query the data. For instance, for the vendors, a new tab would look like this:
The VLOOKUP formula for this: =VLOOKUP(C2;Vendors!A:C;3;FALSE)
- We are lookup up the value for whatever is in cell C2 (‘BX481’)
- We lookup in the Vendors sheet, columns A:C. The first column is used for finding the correct entry.
- We want to return the 3rd column (which is C), that contains the ID
- The last parameter is set to FALSE, to force an exact match
Doing this for all 6 lookups will end up in the following quite horrible Excel, I have marked all VLOOKUP formulas in yellow:
Now you can finally export this Excel to CSV and run the import. But I think you can agree that doing this on a daily basis when new prices from suppliers come in … it’s a nightmare.
NB: SmartUpload can perform automated lookups, without need for manual export-vlookup work.
Pitfall 4: Updating records without an ID
Similar to the previous point: if you want to update a record in Salesforce, all existing tools limit you on the field to use as a primary key:
- ID
- Any field with the ‘External ID’ set.
Now sometimes these two don’t cut it. For instance if you have an Excel file with account names: you cannot update Accounts based on their name. So you would have to:
- Either do an export of all accounts, use VLOOKUPs again in Excel to find the matching IDs
- Or: add an extra custom field to the Account object, with the ExternalID flag set.
Neither of these two solutions is very elegant.
NB: SmartUpload can perform updates using any field as unique ID, without any changes to your Salesforce Org.
Pitfall 5: Upserts
The next challenge is how to perform updates and inserts in one go, the so-called ‘upsert’. Building on top of the previous pitfalls, when you have an Excel file where some rows contain new records but others map to existing records, stuff gets really interesting:
- Depending on your object, some fields may be mandatory for inserting new records, but not mandatory when updating specific fields. This means you probably need to export a lot of data first.
- How does Salesforce know which records are existing and which ones are new? This goes back to the previous point: you cannot match on other fields than ‘ExternalID’ and ‘ID’ fields.
Lets re-consider the Excel file we discussed at the 3rd pitfall: the huge Supplier Catalogue import. It’s a very common business scenario that a new pricelist from a vendor contains both updated prices (which should be updated in the existing entries) as well as new products that should generate new records.
But it might be as simple as a list of accounts, where some of them already exist in Salesforce and others do not, and you want to sync it.
NB: SmartUpload can process upserts using a combination of fields to determine whether an entry is new or existing.
Conclusion
Data loading in Salesforce can be a very tricky business, but with these pitfalls in mind you can get a lot done using free tools.
However, if you need more control, remove tedious manual steps, or want to delegate work to business end-users, consider using a more intelligent tool like SmartUpload. You can use it for free on https://free.smartupload.net/.
Want to receive updates? Follow us on our LinkedIn page.