Embrace Diversity & the Iterative Process Associated with Open-source

Keertan Menon
Published in DataSeries
4 min read · Dec 18, 2018

2 Key Lessons from Patricia Florissi, VP and Global CTO for Sales at Dell EMC

Patricia Florissi at Dell EMC World 2017

Patricia Florissi is Vice President and Global Chief Technology Officer (CTO) for Sales. As Global CTO for Sales, Patricia helps define mid- and long-term technology strategy, representing the needs of the broader Dell EMC ecosystem in strategic initiatives. Patricia also acts as the liaison between Dell EMC and its customers and partners to foster stronger alliances and deliver higher value. She held the same role at EMC prior to the merger between Dell and EMC. If you thought her experience could not be any more impressive, think again: she also holds the honorary title of EMC Distinguished Engineer, having been nominated in October 2007. We had the incredible honor of interviewing Patricia recently at World Summit AI on behalf of our DataSeries network.

She is the creator, author, narrator, and graphical influencer of the educational video series Dell EMC Big Ideas on emerging technologies and trends. The Big Ideas animated videos accelerate and expand technical thought leadership at EMC for both internal and external audiences, using innovative learning methodologies in a fun, easy way without talking about products. There are over 20 videos in the series, with some localized into 10 languages and a combined total of over half a million views. Patricia also writes articles on the impact of Big Data on accelerating innovation.

The Brazilian native got involved with the AI space almost 10 years ago. At Dell, she started off with big data, focusing on the challenges of extracting value from large volumes of data. That early work got Dell especially involved with Hadoop, Spark, and the like, and it directly shaped Patricia's and Dell EMC's enthusiasm for deep learning today.

Patricia’s Takeaways:

#1 Respect and strive to achieve diversity of data. You can be happy, but you should never be satisfied with the level of diversity in data.

We asked Patricia for her views on how important data is to her and the firm, and she responded that it is vital in this day and age. Why? As we move further towards machine learning and deep learning, where the algorithms learn from the data, it is no longer just about the quantity of data used for training but, more importantly, about the diversity of data that is brought to the table. Yes, one could discuss the diversity of data in terms of "structured" and "semi-structured" formats, i.e. including video, text, and so on. That is not what Patricia is referring to, however. When she says "diversity" she means the quality of the content: the content of the data itself has to be diverse. For example, if you are training a model to do image recognition, then you absolutely need samples of faces from around the world, across cultures, ethnicities, and ages. It is when models train on data like this that the data becomes invaluable.
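To make that point a little more tangible, here is a minimal, hypothetical Python sketch of how one might sanity-check a training set for this kind of diversity. The sample records and the `region` and `age_band` fields are illustrative assumptions only, not anything from Dell EMC.

```python
from collections import Counter

# Hypothetical sketch: measure how diverse an image-recognition training
# set is along a few attributes. Skewed shares signal a lack of diversity.
training_samples = [
    {"image": "face_001.jpg", "region": "South America", "age_band": "18-30"},
    {"image": "face_002.jpg", "region": "South America", "age_band": "18-30"},
    {"image": "face_003.jpg", "region": "East Asia",     "age_band": "31-50"},
    {"image": "face_004.jpg", "region": "Europe",        "age_band": "51+"},
]

for attribute in ("region", "age_band"):
    counts = Counter(sample[attribute] for sample in training_samples)
    share = {k: round(v / len(training_samples), 2) for k, v in counts.items()}
    print(attribute, share)
```

In Patricia's framing, a result like this is never "done": you can be happy with today's shares, but never satisfied.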

The other question to ask yourself is: do you really want to achieve diversity? One of the biggest challenges for AI, according to Patricia, will not be the design or sophistication of the models being used, but AI's ability to scale, allowing more sophisticated models, trained on larger datasets with greater computing capacity, to flourish. That said, you then need to do all of the above in a "federated" way: not simply distributed, but close to the data source, with locally trained models collaborating to derive a higher-order learning. Furthermore, in the near future no single organization will have all of the data it needs to train a model on a sufficiently diverse dataset. Interestingly, Patricia believes we can expect the creation of a vibrant digital marketplace in which companies, potentially on a widespread level, share not necessarily the data itself but the results of the analysis derived from that data. Specifically, Patricia foresees a potential avenue for trading analytics algorithms that can execute on other people's data.
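As a rough illustration of that "trade the algorithm, share the result, keep the data" idea, here is a minimal Python sketch. The `DataOwner` class, the `average_purchase` algorithm, and the sample records are all hypothetical, invented only to show the pattern of an algorithm executing against data it does not own and returning only the derived result.

```python
from statistics import mean

class DataOwner:
    """Holds private records; never exposes them directly."""
    def __init__(self, records):
        self._records = records          # the raw data stays on the owner's side

    def run(self, algorithm):
        # Execute someone else's algorithm against local data and
        # hand back only the derived result, never the records themselves.
        return algorithm(self._records)

# An analytics vendor supplies the algorithm...
average_purchase = lambda records: mean(r["amount"] for r in records)

# ...and each company runs it over data it owns.
company_a = DataOwner([{"amount": 12.0}, {"amount": 30.0}])
company_b = DataOwner([{"amount": 99.0}])

results = [company_a.run(average_purchase), company_b.run(average_purchase)]
print(results)   # only aggregate results cross organizational boundaries
```

The design choice mirrors the marketplace Patricia describes: the algorithm travels to the data, and only its output is traded.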

#2 Embrace the iterative process associated with “Open-source” today in order to better understand and reap the financial benefits of the models being trained.

Open-source traditionally means that the code is shared, but Patricia firmly believes that the IP of the data itself will not be. Each company will own the intellectual property of the data it actually collects, and as the owner it is (as it should be) naturally entitled to the financial benefits of that data. That said, companies will share algorithms coming from Open-source, and might even collaborate in sharing the results of those very algorithms, especially when run against their own data. For example, suppose you want to train a model for speech recognition, and you have thousands (potentially hundreds of thousands) of hours of people across the world talking about a specific subject, e.g. technology. In this case Dell EMC would train its model using its own data, and other companies would do the same with theirs. Here is where the process kicks in, and it really is a process: different companies share the parameters of the models they trained, combine those parameters to achieve a higher-order model, and then share it once again. This highly iterative, cross-company process will enable all those involved to better understand, and reap, the financial benefits of the models that have been trained.
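To make the iterative loop concrete, here is a rough Python sketch of one way such parameter sharing could work, in the spirit of federated averaging: each organization trains locally on its own private data, only the model parameters are pooled, and the combined model is shared back for the next round. The logistic-regression model, the synthetic data, and every function name are assumptions for illustration, not Dell EMC's actual method.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=50):
    """Plain logistic-regression updates on one organization's private data."""
    for _ in range(epochs):
        preds = 1 / (1 + np.exp(-X @ weights))
        grad = X.T @ (preds - y) / len(y)
        weights = weights - lr * grad
    return weights

def federated_average(weight_list):
    """Combine locally trained parameters; raw data never leaves its owner."""
    return np.mean(weight_list, axis=0)

rng = np.random.default_rng(0)
global_w = np.zeros(5)                         # the shared, higher-order model
for round_ in range(10):                       # iterative rounds of sharing
    local_updates = []
    for org in range(3):                       # three companies, three private datasets
        X = rng.normal(size=(100, 5))
        y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(float)
        local_updates.append(local_train(global_w.copy(), X, y))
    global_w = federated_average(local_updates)  # only parameters are exchanged
print(np.round(global_w, 3))
```

Each pass through the outer loop is one turn of the iteration Patricia describes: train locally, share parameters, combine, and share the result once again.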


Keertan Menon
Partner @ Sansa Advisors 🌍 Ex @cerberus @openocean @dataseries