“Data is everywhere”, are you sure?

David Gravot
Decision Optimization Center
6 min read · May 26, 2020

“Data is everywhere” is a common saying nowadays. If only that were true… In this post, I want to stress two points:

- Early access to relevant data, at an early stage of an Operations Research project or application, is mandatory.

- Therefore, one should not postpone the effort to collect, analyze, and visualize the data, by any means available, in the hope that a clean and realistic dataset will become available later.

Although customers agree at the early stage of the discussion that they will, of course, provide relevant data from the start, this is rarely the case. Yet this is absolutely critical for the success of OR-oriented software.
Here are some reasons why:

  1. Missing data hides potential errors and misunderstandings. The earlier we can have a look at the data, the earlier we can validate it. The devil is in the details, so a tiny change in a column description can reveal a missing or inaccurate specification. Discovering such a specification too late may jeopardize the project plan, since adapting a mathematical model to a change request is much more delicate than other IT re-engineering.
  2. Writing checkers: data in the application UI is waiting to enter the tunnel of optimization, which is somewhat of a black box for the customer. If this black box crashes and ends with exceptions, no solution, or other error messages, it is quite frustrating for the end-user and shifts the burden of explaining data errors onto the optimization engine support team. This creates frustration and wasted time.

Advice #1:
Delegate as much responsibility as possible to data checking: it should provide a clear and user-friendly explanation of data errors, rather than forcing users to dig into awkward optimization logs or mathematical conflict files!
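As a minimal sketch of such a checking layer, the snippet below validates a hypothetical `tasks` table before it reaches the engine; the table name, columns, and rules are illustrative assumptions, not part of any specific product.

```python
# Hypothetical input-checking layer: validate raw planning data before it
# reaches the optimization engine, and report issues in plain language.
# Table name, columns, and rules are illustrative assumptions.

def check_tasks(tasks):
    """Return a list of user-friendly issue messages for the 'tasks' table."""
    issues = []
    seen_ids = set()
    for row_num, task in enumerate(tasks, start=1):
        task_id = task.get("id")
        if task_id is None:
            issues.append(f"Row {row_num}: task has no id.")
            continue
        if task_id in seen_ids:
            issues.append(f"Row {row_num}: duplicate task id '{task_id}'.")
        seen_ids.add(task_id)
        duration = task.get("duration")
        if not isinstance(duration, (int, float)) or duration <= 0:
            issues.append(
                f"Row {row_num}: task '{task_id}' has an invalid duration "
                f"({duration!r}); expected a positive number of hours."
            )
    return issues

tasks = [
    {"id": "T1", "duration": 4},
    {"id": "T1", "duration": -2},   # duplicate id AND bad duration
    {"duration": 3},                # missing id
]
for msg in check_tasks(tasks):
    print(msg)
```

Each message points at a row and says what is wrong in the user's vocabulary, which is exactly what an optimization log or a conflict file does not do.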

DOC Plan - Issues Dashboard

3. Showing data to the users at an early stage builds trust between project stakeholders. It opens the door to UI design, which may require a revamp of the data workflow. This also applies to the output data and Key Performance Indicators: understand what information the planners need in order to review and validate the plan.

4. One dataset is not enough. A given type of constraint may exist in only one country, for example. We want to gather a variety of use cases as soon as possible. Allowing the customer to provide data through any available toolkit helps: an Excel or flat-file importer, a mock-data tool.
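A flat-file importer can stay very small at this stage; the sketch below parses a hypothetical orders CSV (column names are made up) and normalizes it into typed records, so the same downstream code works whether the rows come from Excel exports, flat files, or a mock-data generator.

```python
# Minimal flat-file importer sketch. The CSV layout ("Order ID", "Quantity",
# "Country") is an illustrative assumption about a customer export.
import csv
import io

def import_orders(text):
    """Parse an orders CSV into dicts with normalized keys and typed values."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        records.append({
            "order_id": row["Order ID"].strip(),
            "quantity": int(row["Quantity"]),       # fails loudly on bad input
            "country": row["Country"].strip().upper(),
        })
    return records

sample = """Order ID,Quantity,Country
O-001,120,fr
O-002,40, de
"""
orders = import_orders(sample)
print(orders[1])  # {'order_id': 'O-002', 'quantity': 40, 'country': 'DE'}
```

The normalization step (trimming, upper-casing country codes, typing quantities) is where many early data surprises show up, well before the optimizer runs.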

Advice #2:
Split the project into two phases. The first phase merges functional requirements and data collection; it is validated with a basic UI that displays all the data (to be improved later on) and with the ability to collect all relevant data through basic importers. The second phase targets the full delivery, but it is then much less risky data-wise, since the data will have been fully validated in the previous stage.

From the OR perspective, brainstorming and early modeling are possible during the first phase, but implementation is better postponed to the second stage, giving higher visibility on the OR developers' workload. Today's projects, by contrast, sometimes freeze the availability of OR developers over a much larger time frame than necessary. Going back and forth on the functional requirements may cause delays and increase project risks!

Data analysis requires visualization

Getting accurate data is key, but what if you are unable to see it clearly? Data models may be complex, and expectations for data navigation depend on the profile of the users.

DOC KPI’s Overview and Gantt chart
  • Strategic planners may be interested in a straight dashboard that would highlight key performance indicators
  • Operations planners may be interested in much more detailed information, which requires customization, coloring, filters, interactive diagrams, or Gantt charts…
  • OR practitioners need to debug and query cross-referenced information across several tables
  • Attach online documentation to any visualization, clearly explaining the dimensions of the objects you are looking at
  • What-if analysis may require comparison views between two or more scenarios
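For the what-if case, a comparison view can start as a simple per-KPI diff between two scenarios. The KPI names and values below are made up for illustration.

```python
# Sketch of a scenario-comparison view: diff the KPIs of two plans side by
# side. KPI names and values are illustrative assumptions.

def compare_scenarios(base, candidate):
    """Return per-KPI deltas between two scenarios (candidate minus base)."""
    return {kpi: candidate[kpi] - base[kpi] for kpi in base}

base = {"total_cost": 10500.0, "late_orders": 12, "machine_utilization": 0.78}
candidate = {"total_cost": 9800.0, "late_orders": 9, "machine_utilization": 0.83}

for kpi, delta in compare_scenarios(base, candidate).items():
    print(f"{kpi}: {delta:+g}")
```

Even this trivial table answers the planner's first question ("is the new plan better, and by how much?") before any fancy visualization exists.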

Advice #3:
Start each project with a basic data visualisation (including output data as well) that is enough to validate each table, then iterate at each delivery with a set of new UI components. Enhance the UI with planners' feedback.

DOC Planners Feedback Dashboard

Discuss the results

It is critical to show customers their own data reflected in the application UI.

KPI’s Dashboard

Looking at the optimization results is just as critical and should again involve as many users as possible, to foster a rich discussion on the outputs of the ‘black box’. Note that even for OR people, the output is not always deterministic and may lead to surprising or unexpected results.

The OR practitioner will use these views to detect misbehaviors, bugs, and data errors, and fix them at once. Once the necessary bug fixes are in, results may still look awkward to the OR practitioner, in which case the investigation falls on their side. And when the OR practitioner is finally convinced that the results are in line with the definition of the problem, the job is far from complete, since the discussions with the planners are only beginning!

Such discussions need to be prepared, and reporting tools need to be designed to provide an easy way to convince the planners that the results make sense. For instance, prove to the planner that a bottleneck on a resource prevents increasing production.
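A bottleneck argument can often be made with a one-table utilization report. The sketch below ranks resources by load over capacity; resource names and hours are invented for illustration.

```python
# Illustrative bottleneck report: compute utilization per resource so the
# planner can see why production cannot be increased. Data is made up.

def utilization_report(load_hours, capacity_hours):
    """Return (resource, utilization) pairs, highest utilization first."""
    report = {r: load_hours[r] / capacity_hours[r] for r in load_hours}
    return sorted(report.items(), key=lambda kv: kv[1], reverse=True)

load = {"oven": 159.0, "mixer": 95.0, "packer": 120.0}
capacity = {"oven": 160.0, "mixer": 160.0, "packer": 160.0}

for resource, util in utilization_report(load, capacity):
    print(f"{resource}: {util:.0%}")
# The oven runs at ~99% of capacity: it is the bottleneck that caps output.
```

Showing this ranking next to the plan makes the "why can't we produce more?" conversation concrete instead of a debate about the black box.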

At some point, OR practitioners need to put themselves in the planners’ shoes. Planners are often very experienced, and the way they build manual plans should somehow be reflected in the search heuristics. This should not come at the cost of losing global improvements, which is, after all, what an optimizer hopefully provides and what motivates the customer’s investment.

The planners should agree with the results and not have to complain about an obvious quality improvement that the optimizer fails to notice and apply. For instance, if switching two activities in a scheduling Gantt chart improves operational efficiency, it is too bad if the optimizer does not evaluate this local change and replace the previous sub-optimal local solution.

However, this may require a polishing post-process, since such a move might be a tiny improvement that the solver misses (think of a gap threshold that stops an optimization once it is within a certain distance of the optimum).
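One minimal form of such a polishing step is an adjacent-swap local search: try swapping neighboring activities and keep any swap that lowers the cost. The single-machine weighted-completion-time objective below is an illustrative assumption, not a claim about any particular product's engine.

```python
# Sketch of a polishing post-process: keep applying improving adjacent swaps
# to a sequence until none remains. The cost function (weighted completion
# time on a single machine) is an illustrative assumption.

def total_cost(sequence, durations, weights):
    """Sum of weighted completion times for a single-machine sequence."""
    t, cost = 0, 0
    for job in sequence:
        t += durations[job]
        cost += weights[job] * t
    return cost

def polish(sequence, durations, weights):
    """Repeatedly apply improving adjacent swaps until a local optimum."""
    seq = list(sequence)
    improved = True
    while improved:
        improved = False
        for i in range(len(seq) - 1):
            candidate = seq[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            if total_cost(candidate, durations, weights) < total_cost(seq, durations, weights):
                seq = candidate
                improved = True
    return seq

durations = {"A": 3, "B": 1, "C": 2}
weights = {"A": 1, "B": 3, "C": 2}
print(polish(["A", "B", "C"], durations, weights))  # ['B', 'C', 'A']
```

Running such a cheap pass after the solver returns guarantees that no single obvious swap remains for the planner to find, which is exactly the kind of credibility issue the paragraph above describes.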

Advice #4:
Write clear test plans for the end-users at each delivery. That’s the first step towards user acceptance, when the application reaches preliminary targets with repeatable behavior. It does not necessarily satisfy all user expectations, but it draws the path to success with clear milestones. Each of these deliveries should hopefully involve an on-site visit by OR practitioners and other IT staff (front-end in particular). That’s a natural setting to discuss the application’s behavior and get rid of potential misunderstandings. Having a good time over lunch or other activities is always nice for getting to know each other.

If you liked this article you could also be interested in:
Tips & Tricks in OR Practice
“The Application is done. Now we just need Data”


Senior Optim Dev @DecisionBrain. Enthusiast at modeling and solving with Math & Constraint Programming