Using Pandas and Python to Import Data into Salesforce

Anna Hazel Crotty
3 min readSep 25, 2015

I’ve done a decent amount of data migration into Salesforce. The normal thing to do seems to be to take whatever .csv file you’ve been given and beat it into submission in excel — fixing data, separating columns, de-duping, etc. Excel wasn’t build for this, and it can be really frustrating. And then sometimes the files are so large they can’t be open in excel, so that’s always fun.

The problem is even worse if Salesforce already has data in it — then you’ve got to make sure that you’re not adding duplicate records. That involves vlookup, which has some frustrating limitations, and I swear has failures that I can’t explain. You can use a tool like Apsona for that, but it always seemed to be a very line-by-line approach to me, which got tedious fast. (I’ll admit to having little experience with Apsona, and I know it is a powerful tool for some things.)

A person or two had suggested R to me, but I knew R was for scientific analysis, which is pretty different from what I do. I was definitely afraid that I could spend a lot of time learning how to use R only to find that it isn’t much better suited to the problem than Excel. Last week a colleague suggested Pandas / Python, and confirmed that he’d used it to replace vlookup. I decided it was time to jump in.

This is the disclaimer part, where I point out that this is day 6 of python and pandas for me. Day 6. So the code I’m putting up is probably riddled with errors and profound misunderstandings of both python and pandas. I’m definitely planning to keep working on it and expanding it to make it better and solve more Salesforce problems, so let me know if you have suggestions.

The problem I attacked first was getting a file of new contact data. I need to look at it and have it separated into 3 .csv files. The first is of contacts who have emails that currently appear nowhere in my Salesforce instance. The second is of contacts who have emails that appear exactly once in my Salesforce instance — the updates rather than creates. Then I want a third file that I’ve named beware. That’s the ones that match more than one record in Salesforce. Those are probably trouble for a variety of reasons — maybe there are duplicates in Salesforce, or maybe that one email address gets used by several people. (Married couples certainly do this, but admins who handle things for multiple people do, also.)

So far, this code is working for me in that way. It is much, much easier than what I used to have to do in excel, and I suspect more accurate.

an update: a recent batch of data that came my way was structured so that every donor had a row. Each time the donor gave a gift, 5 new columns were created. Ugh! In order to import into Salesforce obviously I needed to make one row for each gift. Here’s a bit of pandas that does that. I am sure there’s a more efficient way, but this does the trick.

Related: I’m way past day 6 of pandas now, and can’t recommend it highly enough for someone who has to work with a lot of data but isn’t a data scientist. It is tremendous for figuring out what is in some giant dump of data someone swears is high quality (spoiler: it probably isn’t!) and doing various transformations to make it easier to use.

--

--