Using Excel to manage scenarios in Decision Optimization for DSX

Victor Terpstra
5 min readApr 5, 2018

--

Excel can be used to manage scenario data consisting of multiple tabular data sets in one file. This allows for an easier creation of scenarios for what-if analysis using the Decision Optimization add-on for DSX.

The Decision Optimization add-on for IBM Data Science Experience (DO for DSX) includes an ability to evaluate a decision optimization model against one or more scenarios. Managing multiple scenarios, doing what-if analysis and compare the results is a crucial capability for developing and deploying decision optimization applications.

Intro

Normally, scenarios are created based on a set of comma-separated data files that have been uploaded into DSX as datasets. You create a new optimization Model, then in Select Data, create a new scenario, open the Data Panel and then one-by-one add the .csv files to the scenario.

Next, we can create multiple what-if scenarios, run each of them with our optimization model and compare the results.

Challenges

Using individual comma-separated data files is a simple and straightforward way to to manage scenarios and can be used to get started quickly when creating a few scenarios. However, it has some drawbacks once you need to manage more complex projects that require a larger set of input tables:

  1. Some real-life optimization projects may involve dozens of input tables. Uploading each separately and adding them to the scenario manually can become a little tedious. But more importantly,
  2. Uploading data for alternative scenarios requires renaming .csv files to keep them separate from the other scenarios. The DSX datasets feature doesn’t have the concept of a sub-directory, so unless you load a data file with a different name, it will overwrite the original.
  3. Adding renamed .csv files from multiple scenarios to the same flat set of datasets results in a larger set of files that becomes more difficult to manage and keep separate.
  4. The import of a .cvs file in the scenario automatically sets the name of the dataset based on the name of file. This name is used as a key in the inputs dictionary with DataFrames in the optimization model. Importing a renamed .csv file will by default set the dataset name to a name the code may not expect. Therefore, you would have to rename the table back to the original name.

Managing scenario data with Excel

Excel spreadsheets offer an easy-to-use way to aggregate multiple data tables in one file and can be used as an alternative way to manage scenario data.

The approach is based, on the one hand, on the Pandas feature of reading and writing Excel files and, on the other hand, on the DO for DSX feature of manipulating scenarios using a Python API:

  1. On your workstation, collect the set of .csv files related to a scenario in a single sub-directory.
  2. Using a local Jupyter notebook on the workstation, run a cell that aggregates the .csv files in the sub-directory into a single Excel file.
  3. Upload the Excel file to the DSX datasets of your project.
  4. Run a cell in a DSX Jupyter notebook that reads the Excel file and creates the scenario via the dd_scenario APIs.

Aggregation code

The function csv_directory_to_excel loops over all .csv files in a given directory, reads them as pandas DataFrames and adds them to an Excel spreadsheet as named sheets.

One challenge is that Excel sheet names can be at most 31 characters. If you have .csv file names that are longer, storing the name as the sheet name is not possible. Simply truncating the file name to 31 characters will loose the original file name and may conflict with another truncated table name. Therefore, this loader has the ability to generate a unique, truncated name for a sheet and to retain the original file name. It adds a ‘table_index’ sheet with a lookup table between the data file name and the sheet name.

First, we set values for the directory and files names:

# Set the parameters:
excel_directory = 'C:/temp'
excel_file_name = 'scenario1'
excel_file = os.path.join(excel_directory, excel_file_name + '.xlsx')
csv_directory = 'C:/temp/input_csv'

Then we create a pandas.ExcelWriter and pass it into the core function csv_directory_to_excel that loops over all .csv files in the csv_directory reads the data as DataFrames and stores the DataFrames in Excel. The other named input arguments are passed into the pandas.read_csv() function, e.g. the skiprows.

# Aggregate the data:
writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')
cvs_directory_to_excel(writer, csv_directory, skipinitialspace=True, skiprows=0)
writer.save()

DSX Scenario loading code

On DSX, we need to run 2 functions. The first loads the DataFrames from the Excel file into a dictionary of DataFrames, keyed by the table name. This dictionary matches the type DO for DSX generates for optimization model notebooks.

For step 1, we first set the link to the Excel file. The expression os.environ['DSX_PROJECT_DIR'] returns the home directory on DSX.

excel_file_name = 'scenario3'
excel_file_path = os.path.join(os.environ['DSX_PROJECT_DIR'], 'datasets', excel_file_name + '.xlsx')

Then we load the DataFrames into an inputs dictionary:

# Open Excel:
xl = pd.ExcelFile(excel_file_path)
# Read data from Excel
inputs = read_dataframes_from_excel(xl)

For step 2, the DO for DSX scenario loading, we specify the framework name (i.e. the name of the DO model) and the name of the source and target scenario:

framework_name = 'PredictiveMaintenance'
source_scenario_name = 'Scenario 1'
target_scenario_name = 'Scenario_1_from_Excel'

A DO for DSX scenario also includes the optimization model (Python) code. By copying an existing scenario, it includes the model. So we assume the user has at least created one first scenario that includes the optimization code. (It doesn’t necessarily need to have input datasets. These will be allocated by function load_scenario_from_data.)

Then we create a new Client, get a handle to the model (framework) and copy a template scenario. The function load_scenario_from_data will store the inputs dictionary in the scenario.

import dd_scenario
client = Client()
fw = client.get_decision_framework(name=framework_name)
scenario = copy_scenario(fw, source_scenario_name, target_scenario_name)
load_scenario_from_data(scenario, inputs)

DO for DSX documentation: http://content-dsxlocal.mybluemix.net/docs/content/DODS/DODS_home.html

dd-scenario package documentation: https://pages.github.ibm.com/IBMDecisionOptimization/dd-scenario-api-doc/index.html

--

--

Victor Terpstra

Senior Data Scientist — Prescriptive Analytics, IBM Data Science Elite Team. The opinions expressed are my own and don’t necessarily represent those of IBM.