This Week in Data #1
Some folks have asked me to post the first installment of my weekly newsletter.
It collects my thoughts on what’s timely, interesting or quirky in the world of data. If you’d like to receive this in your inbox every week, subscribe here.
The week started off with a pair of insightful New York Times stories (Part 1 and Part 2) on AT&T, a lumbering technology giant where change is in the wind. Its C.E.O. is telling workers to “retool” themselves. Buried in the first story is a great explanation of cloud computing and virtualization for the layman. Check this out: “In some ways, cloud computing is not as radical a technology shift as all the puffy language suggests. Big banks of computers still run software, as they have been doing in many industries since the 1960s. They have more power, because their chips have more transistors that enable them to do more, and they connect to more things thanks to fiber-optic cable and wireless.”
“The big difference,” the story continues, “is something called virtualization, which amounts to software that allows many machines to operate like one piece of computer hardware. This made it possible to run software that in effect interacted with other software instead of hardware. This, in turn, means the possibility of changing functions around rapidly by typing a few lines of code.
…Now what once took a year of analysis and deployment can instead happen in days, even minutes.”
In the News
A consortium of prominent developers of data projects announced the Apache Arrow initiative, which is setting new standards for “columnar in-memory analytics.” These standards offer a single way for different big-data technologies to share data with each other in memory. Sharing data has been a big problem for folks who want to use Python with more conventional big-data technologies, as Wes McKinney (creator of pandas, a very popular part of the open-source data-science toolset) pointed out in his meetup talk this week.
If you’re technical, here is the release about the Apache Arrow initiative.
If you’re a bit less technical, here’s a good story about it from InfoWorld.
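If you’d like a feel for what “columnar” and “sharing in memory” mean, here is a minimal sketch in plain Python. It does not use Arrow itself: the sample records and the `column_mean` helper are made up for illustration, and the standard library’s `array` and `memoryview` merely stand in for the kind of shared buffers Arrow standardizes.

```python
from array import array

# Hypothetical records, stored row by row (one dict per record).
rows = [
    {"trade_id": 1, "price": 10.0},
    {"trade_id": 2, "price": 12.5},
    {"trade_id": 3, "price": 11.0},
]

# The same data stored column by column: each field is one contiguous buffer.
columns = {
    "trade_id": array("q", (r["trade_id"] for r in rows)),
    "price": array("d", (r["price"] for r in rows)),
}


def column_mean(buffer):
    """Average a column through a view of its memory, without copying it."""
    view = memoryview(buffer)
    return sum(view) / len(view)


# A second "tool" looks at the price column in place: no conversion, no copy.
shared_view = memoryview(columns["price"])
mean_price = column_mean(columns["price"])

# A write through the original buffer is visible through the view,
# which shows that both sides are looking at the same bytes.
columns["price"][0] = 13.0
first_price_seen_by_view = shared_view[0]
```

The point of the toy is the layout: an analytic query such as an average touches one tightly packed column instead of walking every record, and a second consumer can read that column without a serialization step in between.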
In other news, the LIGO Observatories announced that “For the first time, scientists have observed ripples in the fabric of spacetime called gravitational waves, arriving at the earth from a cataclysmic event in the distant universe. This confirms a major prediction of Albert Einstein’s 1915 general theory of relativity and opens an unprecedented new window onto the cosmos.”
Not only is this fascinating science, but LIGO, which is funded by the National Science Foundation (NSF) and operated by Caltech and MIT, released the source data for its paper, along with a cool tutorial showing how to replicate the analysis. This is a great opportunity to learn something about time-series analytics and see ground-breaking science at the same time.
Hmm…I know what I’ll be doing this weekend…
Story on big data being used to fight money laundering here.
A Forbes story on big data in medicine here.
Check out this news — a hospital paying ransom in Bitcoin to restore access to their e-mail system. Seems like we are in a new era and, as more mission-critical infrastructure gets connected to the internet (think hospitals, power plants, etc.), better security systems will become all the more important.
What’s happening at Ufora
We released the next version of Pyfora this week with features supporting our new open-source customers, including better support for inheritance in Python, and a slew of improvements to the compiler.
Finally, I have a working prototype of Ufora running on a GPU. We can now harness the power of 1,500 cores (yes, that’s a LOT of cores) for 40 cents an hour. Now all I need to do is teach it how to do something other than logarithms and we’re in business! If you want to use Ufora to do deep learning, stay tuned; I have a bunch more coming.
Thanks for reading!