An Error in a Python Script May Have Invalidated 150+ Research Projects

And three other Python stories you may have missed from the past month

SeattleDataGuy
Nov 4 · 5 min read

A coding error in a set of Python scripts used for computational analysis may have invalidated more than 150 published research studies in chemistry.

A recently published research article from the University of Hawaii describes a programming error in the Willoughby-Hoye scripts.

While examining results from a cyanobacteria experiment, the researchers observed notable variations in the outcomes obtained from similar Nuclear Magnetic Resonance (NMR) spectroscopy data.

The size of the error depended on the operating system the scripts were run on. The scripts gave accurate results on Windows 10 and macOS Mavericks but were less accurate by almost a full percent on macOS Mojave and Ubuntu.

These variations stem from the scripts’ use of Python’s glob module.

The glob module finds files whose names match a given pattern, and the scripts build their list of input files from the glob results.

However, the order in which glob returns these files depends on the operating system, and the order in which the files are processed affects the results of the scripts’ calculations.

This small detail may invalidate many previous research papers due to inaccurate outputs.

Phillip Williams and Rui Sun wrote code that fixes the problem by sorting the files explicitly, which now guarantees consistent outcomes. While the variations did not affect the results obtained by the University of Hawaii team, they may have a substantial impact on other published research projects.
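To make the sorting fix concrete, here is a minimal sketch of the general idea. It is not the actual Willoughby-Hoye code or the published patch: the helper name list_input_files, the "*.out" pattern, and the file names are all hypothetical. It only shows that wrapping glob.glob() in sorted() gives the same order on every operating system.

```python
import glob
import os
import tempfile


def list_input_files(directory, pattern="*.out"):
    """Return matching files in a deterministic, name-sorted order.

    Hypothetical helper for illustration only.
    """
    # glob.glob() makes no promise about the order of its results; the order
    # comes from the underlying filesystem. Sort before relying on position.
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demo with made-up file names created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["conf_3.out", "conf_1.out", "conf_2.out"]:
        open(os.path.join(tmp, name), "w").close()
    ordered_names = [os.path.basename(p) for p in list_input_files(tmp)]

print(ordered_names)  # ['conf_1.out', 'conf_2.out', 'conf_3.out']
```

Note that sorted() orders names lexicographically, so "conf_10.out" would come before "conf_2.out". Scripts that depend on numeric order would need a custom sort key.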

The Willoughby-Hoye scripts are named after their authors, Patrick Willoughby and Thomas Hoye, of the University of Minnesota.

Patrick Willoughby, now an assistant professor of chemistry at Ripon College, has acknowledged the new findings and the corrections to the scripts in a post on Twitter:

“Great find by Rui and Prof. Williams. When I wrote the scripts six years ago, the OS was able to handle the sorting. Rui and Williams added the necessary sort code and added a function to ensure the calcs were properly aligned. Kudos!” Patrick Willoughby (@pat_willoughby)

Sometimes, trusting external scripts and libraries can lead to unexpected results.

And now, three other updates you may have missed.


Story 1. Salesforce ‘Einstein Analytics’ Changes From Python to Google’s Go Language


Salesforce has adopted Google’s Go programming language in place of C and Python for Einstein Analytics.

Having spent over $15.7bn on analytics firm Tableau to enhance the Einstein Analytics platform, Salesforce clearly considers analytics a significant part of its future.

In 2017, before launching Einstein Analytics, the company overhauled and rebuilt its back end in Go.

Salesforce principal architect Guillaume Le Stum noted that the dataset creation tools and query engine behind Einstein Analytics were originally written in C “for performance,” alongside a Python wrapper that provided other functionality such as a REST API server and query parsing.

In a post on Stack Overflow, Le Stum says:

“In essence, the product was built to have the best of both worlds.

Python is great for quickly writing higher-level applications but doesn’t always deliver the high performance needed at an enterprise level. C creates highly performant executables, but adding features takes a lot more time.”

But before the launch, Le Stum says, the platform began to slow down as new features were added that were not originally part of its core query engine. So even though the team could still develop and deploy these features, Salesforce began reconsidering its long-term plan.

Le Stum added: “Python does not do multi-threading very well, so the more the wrapper was being asked to do, the worse it performed.”
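The multi-threading limitation Le Stum mentions can be illustrated with a small experiment. This sketch is not Salesforce code, and it assumes the standard CPython interpreter with its global interpreter lock (GIL), which lets only one thread execute Python bytecode at a time. The function names and the workload size are made up for illustration. On such an interpreter, the threaded run of a CPU-bound task typically takes about as long as the sequential run, although exact timings vary by machine.

```python
import threading
import time


def count_down(n):
    """CPU-bound busy loop; returns the number of iterations performed."""
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps


def run_sequential(n, jobs):
    """Run the same job several times, one after another."""
    return [count_down(n) for _ in range(jobs)]


def run_threaded(n, jobs):
    """Run the same jobs in separate threads and collect their results."""
    results = [None] * jobs

    def worker(index):
        results[index] = count_down(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


N, JOBS = 200_000, 4  # arbitrary workload chosen for the demo

start = time.perf_counter()
sequential = run_sequential(N, JOBS)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
threaded = run_threaded(N, JOBS)
threaded_time = time.perf_counter() - start

# Both approaches compute the same results; compare the timings yourself.
print(f"sequential: {sequential_time:.3f}s, threaded: {threaded_time:.3f}s")
```

Threads in Python remain useful for I/O-bound work such as waiting on network requests; the limitation shown here concerns CPU-bound code.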

In contrast, Go is suited to large applications like those in Google’s production systems, which may also account for the shift from a hybrid C-Python platform to Go.

Le Stum further highlighted the benefits of Go, which include its built-in tooling, easy troubleshooting, quick compilation and deployment, and its emphasis on code that is easy to understand.

Le Stum notes that in enterprise software, engineers spend more time reading code than writing it.

A proof of concept written in Go allowed Salesforce to move forward, and the Go version of the platform reached general availability in 2018. Among its significant benefits are Go’s cross-platform features, which make code easy to port.

“If we ever need any of this code in a mobile app, we can cross-compile it to iOS or Android, and it would work,” Le Stum notes.

The only portion of the Einstein Analytics platform not written in Go is its cluster manager, which is in Java.


Story 2. Microsoft Introduces a Free Programming Course on Python

Microsoft has recently unveiled a YouTube video series called Python for Beginners. The goal of the new series is to teach aspiring programmers the basics of Python.

Susan Ibach and Christopher Harrison host the series. Susan is a business development manager in Microsoft’s AI gaming unit, while Christopher is a senior program manager at Microsoft.

Microsoft said:

“Even though we won’t cover everything there is to know about Python in the course, we want to make sure we give you the foundation on programming in Python, starting from common everyday code and scenarios.”

The course, divided into 44 parts, focuses on Python 3.x and is aimed at an audience with some basic knowledge of JavaScript or some experience with a visual programming language.

The tutorials range from three to ten minutes in length and cover topics such as configuring Visual Studio Code, handling loops, and performing error handling.

Microsoft is also posting a collection of add-on resources alongside the videos, including slideshows and code samples.

Reports from ZDNet highlight that Microsoft may benefit from a bigger pool of accomplished Python developers who could apply Python to building applications in its Azure Machine Learning Studio.


Story 3. Python 3.8 With Walrus Operator Is Now Available, Alongside Positional-Only Parameters Support

What are the new additions to Python 3.8?

  • Positional-only parameters (PEP 570).
  • Assignment expressions, also known as the walrus operator := (PEP 572).
  • Pickle protocol 5 with support for out-of-band data buffers (PEP 574).
  • Runtime audit hooks and a verified open hook (PEP 578).
  • New C API for configuring Python initialization (PEP 587).
  • Vectorcall, a fast calling protocol for CPython (PEP 590).
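The first two items in the list above are the ones most visible in everyday code, so here is a brief sketch of both. It requires Python 3.8 or later, and the function clamp and the sample data are invented for illustration.

```python
def clamp(value, low, high, /):
    """Limit value to the range [low, high].

    The "/" marks every parameter before it as positional-only (PEP 570).
    """
    return max(low, min(value, high))


print(clamp(15, 0, 10))  # 10

# Passing positional-only parameters by keyword raises a TypeError.
try:
    clamp(value=15, low=0, high=10)
    keyword_rejected = False
except TypeError:
    keyword_rejected = True
print(keyword_rejected)  # True

# The walrus operator (PEP 572) assigns and tests in a single expression.
readings = [3, 8, 12, 5]
if (count := len(readings)) > 3:
    print(f"{count} readings")  # 4 readings

# It also lets a comprehension compute a value once and reuse it.
squares_over_50 = [sq for r in readings if (sq := r * r) > 50]
print(squares_over_50)  # [64, 144]
```

Positional-only parameters let a library author rename parameters later without breaking callers, since no caller can refer to them by name.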

Other new additions worth mentioning include:

  • Formatted strings (f-strings) now support a = specifier for self-documenting expressions.
  • A new metadata module, importlib.metadata.
  • A parallel filesystem cache for compiled bytecode files.
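As a quick illustration of the f-string = specifier and importlib.metadata, consider the sketch below. It requires Python 3.8 or later. The helper installed_version and the package names are assumptions made for the example, and the version printed for a real package depends on your environment.

```python
from importlib import metadata

width = 4
height = 3

# The = specifier prints the expression text followed by its value.
print(f"{width=}, {height=}")  # width=4, height=3
print(f"{width * height=}")    # width * height=12


def installed_version(package):
    """Return the installed version of a package, or None if it is absent.

    Hypothetical helper wrapping importlib.metadata.version().
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The result for "pip" depends on what is installed locally.
print(installed_version("pip"))
print(installed_version("surely-not-an-installed-package-xyz"))  # None
```

The = specifier is mainly a debugging convenience, replacing the common pattern of writing the variable name twice in a print call.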

You can find the other improved modules, additions, and removals in Python 3.8 in the Python docs. For complete details, check the changelog.

Better Programming

Advice for programmers.
