Published in Analytics Vidhya

Extracting information from Catalog files in Salesforce Commerce Cloud using Python

If you’ve used Salesforce Commerce Cloud (SFCC), you’ll know the pain of managing the catalog through files. Of course, the system is designed to reduce manual intervention, and any information you might need about the catalog should ideally be available in your Product Information Management (PIM) system. However, there are times when you feel adventurous enough to work directly with the SFCC catalogs (Master & Navigation), either to push them into your reporting tool or just to build an understanding of the catalog.

Note: I wrote this as a quick way of knowing which products had images assigned to them and which didn’t; there is no direct way of finding that out in SFCC. The script takes 3–4 minutes to run, depending on the size of your catalog. Also, the approach I’m sharing can be adapted to work with any XML file. Have fun!

The XML File

The XML file format goes something like this:

[Figure: Catalog format of SFCC]

It also ends with the category definitions and assignments:

You could write more complicated code to strip these sections out, but it’s simpler to open the file in Sublime Text (which can breeze through a 200 MB file in 4–5 seconds) and trim it by hand: remove the category assignments at the end of the file and the entries at the top, as shown.
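If you’d rather skip the manual trimming, a minimal sketch like the one below streams the file with ElementTree’s iterparse and only looks at product elements, ignoring the header, category and category-assignment entries. It assumes, as the script in this article does, that products carry a product-id attribute and images a path attribute. The helper names local_name and stream_product_rows are hypothetical; local_name strips any XML namespace prefix from tag names so the sketch works whether or not the file declares one. Clearing each product once it has been read keeps memory use low on large catalogs.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def stream_product_rows(source):
    """Yield (product_id, [image paths]) for every <product> element.

    source can be a filename or a binary file-like object. Header, category
    and category-assignment elements are skipped, so no trimming is needed.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "product":
            continue
        paths = [
            img.get("path")
            for img in elem.iter()
            if local_name(img.tag) == "image" and img.get("path")
        ]
        yield elem.get("product-id"), paths
        elem.clear()  # free the finished product subtree to save memory


# Hypothetical sample standing in for an untrimmed export.
sample = b"""<catalog catalog-id="demo">
  <header/>
  <product product-id="P1">
    <images><image-group view-type="large">
      <image path="p1/front.jpg"/><image path="p1/back.jpg"/>
    </image-group></images>
  </product>
  <product product-id="P2"/>
  <category-assignment category-id="c1" product-id="P1"/>
</catalog>"""

for product_id, paths in stream_product_rows(io.BytesIO(sample)):
    print(",".join([product_id] + paths))
# P1,p1/front.jpg,p1/back.jpg
# P2
```

On a real export you would pass the filename instead, e.g. stream_product_rows("file.xml").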

The Code

Very simply, here is the code for extracting product-ids and the associated image names/locations as a CSV.

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()
# print(root.attrib)  # establish where the root is...
catalog = ""  # accumulates the CSV output as one string
for child in root:  # can be replaced with root.iter("node"),
                    # where node is the XML node you want to
                    # iterate through
    catalog += child.attrib["product-id"]
    try:  # in case some nodes do not have the thing you're looking for
        for node in child.iter("image"):
            catalog += "," + node.attrib["path"]
        catalog += "\n"
    except KeyError:  # e.g. an image node without a "path" attribute
        catalog += "\n"
print(catalog)

Save this file as parse.py

Open Terminal and type:

python parse.py | tee file.csv (where file.csv is the output file)

Hopefully, you have Python installed. If not, you can download it from python.org.

The Explanation

Note: I’m by no means an expert Python coder. I only know enough to leverage it for my work. There are 100 different ways to do this better.

import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()

Simple explanation: parse the file and get its root element. Since the file has a single root element (catalog), we can work from that one element from here on.

# print(root.attrib)  # establish where the root is...
catalog = ""  # accumulates the CSV output as one string

The # character starts a comment in Python. If you uncomment the print line, it prints the root’s attributes as a dictionary, something like:
{'catalog-id': '<name of your catalog>'}
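Printing root.tag alongside root.attrib is also worth doing, because it reveals whether the file declares an XML namespace. A small sketch on an inline snippet: the namespace URI below is the one SFCC catalog exports typically declare on the catalog element (an assumption here; check the top of your own file), and my-master-catalog is a made-up catalog-id. When a default namespace is declared, ElementTree reports every tag as {namespace}name, and the xmlns declaration itself does not appear in attrib. This is also why child.iter("image") would match nothing if the declaration is left in the file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the <catalog> line at the top of your own export.
NS_URI = "http://www.demandware.com/xml/impex/catalog/2006-10-31"

snippet = (
    f'<catalog xmlns="{NS_URI}" catalog-id="my-master-catalog">'
    '<product product-id="P1"/>'
    '</catalog>'
)
root = ET.fromstring(snippet)

print(root.attrib)  # {'catalog-id': 'my-master-catalog'}
print(root.tag)     # {http://www.demandware.com/xml/impex/catalog/2006-10-31}catalog

# Because of the namespace, a plain tag name finds nothing...
print(root.findall("product"))  # []

# ...while a prefix-to-URI map (or the full '{uri}product' tag) does.
ns = {"c": NS_URI}
print([p.get("product-id") for p in root.findall("c:product", ns)])  # ['P1']
```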

Since we’re producing CSV output, we’ll build it up in a string variable, catalog, and print it at the end.

for child in root:  # can be replaced with root.iter("node"),
                    # where node is the XML node you want to
                    # iterate through
    catalog += child.attrib["product-id"]

So we’re iterating over the root’s children, which (after the trimming) are the product nodes.

[Figure: Parts of the XML file]

We’ve also set the first column of each CSV row to child.attrib["product-id"], which returns 20126936 for the product in the example above.

    try:  # in case some nodes do not have the thing you're looking for
        for node in child.iter("image"):
            catalog += "," + node.attrib["path"]
        catalog += "\n"
    except KeyError:  # e.g. an image node without a "path" attribute
        catalog += "\n"

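The try/except block guards against missing data. Remember, we’re trying to find products without images. Python runs the code inside try, and if that code raises an error, it jumps to the except block instead. Note that iter() does not raise an error when a product has no image nodes; it simply yields nothing, so such a product ends up as a row containing only its product-id. The except block catches cases such as an image node without a path attribute (a KeyError) and still closes the row with a newline.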
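Since the goal is to find products without images, you can also ask Python for that list directly instead of scanning the CSV. A minimal sketch, assuming the same trimmed, un-namespaced file; products_without_images is a hypothetical helper name, and it counts an image node without a path attribute as no image at all.

```python
import xml.etree.ElementTree as ET


def products_without_images(root):
    """Return the product-ids of children that have no <image path="..."> node."""
    missing = []
    for child in root:
        # .get() returns None for a missing attribute, so no try/except is needed.
        has_image = any(img.get("path") for img in child.iter("image"))
        if not has_image:
            missing.append(child.get("product-id"))
    return missing


# Hypothetical sample; with the real file use ET.parse("file.xml").getroot().
root = ET.fromstring(
    '<catalog>'
    '<product product-id="P1"><images><image path="p1.jpg"/></images></product>'
    '<product product-id="P2"/>'
    '<product product-id="P3"><images><image/></images></product>'
    '</catalog>'
)
print(products_without_images(root))  # ['P2', 'P3']
```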

Within the block, we’re iterating with the iter() method, which takes a tag name and walks every matching descendant of the element it’s called on. We can go as deep into the XML tree as we want, but it’s best to start as close to the nodes you need as possible, both for a performance gain and to keep each image paired with its product: instead of iterating over the entire root node for image nodes, we’re iterating inside each product node, which usually has only 3–7 image nodes.
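To illustrate how far iter() reaches: it searches the whole subtree at any depth, while findall() with a path follows only the nesting you spell out, and a bare tag name in findall() checks direct children only. A small sketch on a hypothetical product snippet, assuming the images/image-group/image nesting used in SFCC catalogs (check your own export).

```python
import xml.etree.ElementTree as ET

product = ET.fromstring(
    '<product product-id="P1">'
    '<images>'
    '<image-group view-type="large"><image path="large/p1.jpg"/></image-group>'
    '<image-group view-type="small"><image path="small/p1.jpg"/></image-group>'
    '</images>'
    '</product>'
)

# iter() searches the whole subtree, at any depth.
print([img.get("path") for img in product.iter("image")])
# ['large/p1.jpg', 'small/p1.jpg']

# findall() with a path follows the exact nesting you spell out.
print([img.get("path") for img in product.findall("images/image-group/image")])
# ['large/p1.jpg', 'small/p1.jpg']

# A bare tag name in findall() only checks direct children, so this finds nothing.
print(product.findall("image"))  # []
```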

If you need to go deeper (for example, get each image-group and then iterate over its images), you can nest for loops.
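A nested-loop version might look like the sketch below: loop over each image-group, then over its image nodes, so every row also records the group’s view-type. It assumes the images/image-group/image nesting and a view-type attribute on each group, as in typical SFCC exports; the sample data is made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<catalog>'
    '<product product-id="P1"><images>'
    '<image-group view-type="large"><image path="l/p1.jpg"/><image path="l/p1b.jpg"/></image-group>'
    '<image-group view-type="swatch"><image path="s/p1.jpg"/></image-group>'
    '</images></product>'
    '</catalog>'
)

rows = []
for product in root:                               # each <product>
    for group in product.iter("image-group"):      # each <image-group> inside it
        for image in group.iter("image"):          # each <image> in that group
            rows.append((
                product.get("product-id", ""),
                group.get("view-type", ""),
                image.get("path", ""),
            ))

for row in rows:
    print(",".join(row))
# P1,large,l/p1.jpg
# P1,large,l/p1b.jpg
# P1,swatch,s/p1.jpg
```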

And that’s it. Finally, we print the variable:

print(catalog)

Note that in the terminal, the pipe (|) sends the script’s output to tee, which does two things: it writes the output to stdout (what you see in the terminal) and also saves it to the file you’ve specified.
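If you’d rather have Python write the file itself, the standard library’s csv module can do it and also takes care of quoting, for instance if an image path ever contains a comma. A minimal sketch, assuming the same trimmed, un-namespaced file; write_image_csv and images.csv are hypothetical names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def write_image_csv(root, out):
    """Write one CSV row per product: its product-id followed by its image paths."""
    writer = csv.writer(out)
    for child in root:
        paths = [img.get("path") for img in child.iter("image") if img.get("path")]
        writer.writerow([child.get("product-id")] + paths)


# Usage with the trimmed export (file names are placeholders):
# root = ET.parse("file.xml").getroot()
# with open("images.csv", "w", newline="") as out:
#     write_image_csv(root, out)

# In-memory demo: the comma inside the path gets quoted automatically.
demo = ET.fromstring(
    '<catalog>'
    '<product product-id="P1"><image path="a,b.jpg"/></product>'
    '<product product-id="P2"/>'
    '</catalog>'
)
buffer = io.StringIO()
write_image_csv(demo, buffer)
print(buffer.getvalue())
```

Opening the output file with newline="" is what the csv module documentation recommends, so line endings are not doubled on Windows.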

Key Considerations

Python is the simplest way for you to manage the complexities of SFCC. Over the last 3 years I’ve written node.js, C and bash file generators for converting CSVs into SFCC XMLs for pricing, inventory and catalogs, and I’ve come to find that Python offers unparalleled simplicity for managing that complexity. However, keep the following considerations in mind:

  1. Python is a strongly typed language (read more on the comparisons), and indentation is part of its syntax.
    For a layman: if you’re prone to missing an indent or a colon, keep your code small. It’s easier to spot a missing indent in 30 lines of code than in 200.
  2. Some of the syntax and concepts are a little crazy, especially if you’re coming from a loosely typed language like JS. But if you can master them, you’ll be a wizard, Harry.
  [Image caption: You after mastering basic Python. Python cannot really be mastered.]
  3. Your code will run smoothly the first time you run it. That is the only concession you can expect from Python. What happens after that is a function of your luck and your lineage. Python doesn’t forgive or forget.
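To make the "strongly typed" point concrete: Python refuses to mix a string and a number, where JavaScript would quietly coerce them ("1" + 1 gives "11" in JS). A tiny illustrative sketch; concat is a hypothetical helper that returns the error message instead of crashing.

```python
def concat(a, b):
    """Return a + b, or the TypeError message if Python refuses to mix the types."""
    try:
        return a + b
    except TypeError as err:
        return f"TypeError: {err}"


print(concat("1", 1))       # a TypeError message: Python won't coerce int to str
print(concat("1", str(1)))  # 11, once you convert explicitly
print(concat(int("1"), 1))  # 2
```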

I believe basic Python or scripting skills are a must-have for product managers. They make life a lot easier and can lead to interesting data discoveries. Besides, they give you an interesting view of a developer’s workload: if you’ve not spent 2 hours struggling with async/await and callback hell for 30 lines of Node.js code, you’ll never know the struggles of a programmer.

Thank you for reading. If you’d like to read more Salesforce-related articles and shortcuts, leave a comment! I’ll be happy to write more on Salesforce and related technologies!

Rameez Kakodker

100+ Articles on Product, Design & Tech | Top Writer in Design | Simplifying complexities at Majid Al Futtaim | rkakodker.com
