The Data of 2030 | The Juice

Zumo Labs presents The Juice, a weekly newsletter focused on computer vision problems (and sometimes just regular problems). Get it while it’s fresh.

Maisie Sheidlower
Zumo Labs
5 min read · Jul 13, 2021


Week of July 5–9, 2021

____

There is a natural synergy (yes, we’re using that word) among the many subcategories that make up the AI world. It would be impossible to talk about synthetic data without also talking about machine learning, computer vision, software, ethics, privacy (and neural rendering, and GANs, and our Marketing and Sales director Michael’s daughter’s book Neural Networks for Babies), so we don’t try to. But synthetic data remains the apple of our eye.

So we were thrilled to discover that Gartner Inc.’s June report predicts that by 2030, the most used type of data in AI will be synthetic. Modernization can be a tricky thing, especially when it requires industry-wide adjustments. Decision-making for humans and AIs only gets more complex, and the unexpected lives up to its name. In fact, Gartner cited the pandemic as an example of how easily data that reflect past conditions become “obsolete.”

We remain confident that synthetic data is the most efficient, secure, adaptable resource in the industry. And the research backs it up. So if your project is unique, complex, or demanding (like ours!), synthetic data could be the fix. Get a demo to prove it.

____

#Environment

The accuracy with which scientists can predict the onset of the Indian Monsoon season hasn’t improved in forty years. Until now. German climate scientists from PIK Potsdam used machine learning to provide a more precise forecast three months ahead of the season, indicating the potential to blunt the immense impact of a monsoon made more erratic by global warming. Based on data from 1948 onward, the predictions compare tropospheric temperatures across the Indian Ocean and the subcontinent, and they are the combined effort of eighteen interdisciplinary institutions in more than ten countries.

Researchers achieve improved prediction of Indian Monsoon onset using machine learning, via Phys.org.
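
The forecasting model itself is machine learning we haven’t seen, but the signal the blurb describes (comparing tropospheric temperatures over the ocean and the land) is easy to picture. Below is a toy sketch with made-up numbers, not the study’s method: monsoon onset is associated with the subcontinent warming past the Indian Ocean, so a naive detector simply flags the first day the land–ocean temperature difference flips sign.

```python
import numpy as np

# Toy illustration only: synthetic temperature series, not real reanalysis data.
rng = np.random.default_rng(0)
days = np.arange(120)                                        # a pre-monsoon window
t_land = 20 + 0.08 * days + rng.normal(0, 0.3, days.size)    # subcontinent warms quickly
t_ocean = 26 + 0.02 * days + rng.normal(0, 0.3, days.size)   # ocean warms slowly

# Onset proxy: the first day the land-ocean tropospheric temperature
# gradient reverses sign (land becomes warmer than the ocean).
gradient = t_land - t_ocean
onset_day = int(np.argmax(gradient > 0))
print(f"Naive onset estimate: day {onset_day} of the window")
```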

#ComputerVision

Brickit is our new favorite computer-vision-con-Lego time suck. The app, created by Lego enthusiasts with no ties to the company, has a simple, robust concept: dump your bricks into a single layer on the floor, scan the pile, and decide what to build from the app’s suggestions. Its attention to detail is impressive: when you select a project, for example, the app tells you where in your pile the necessary pieces are. The usability is the cherry on top, with swift processing and a crisp UI. Maybe in a few updates it’ll clean up after us too.

Lego should snap up this rapid-fire brick-finding iOS app, via TechCrunch.

#E-Security

Ever gotten a “we think you’re a robot” warning and worried you can’t pass the Turing test? (“At what point does a hill become a mountain?”) Thanks to Trust Swiftly, an identity verification company, those days may be behind you. The team’s new platform, powered by machine learning, features 15 distinct verification methods, including phone SMS ownership and geolocation, and it lets clients store their data in more than 22 regions worldwide, so customer information can stay local rather than being pooled unnecessarily. The initiative is the first of its kind, with global, privacy-compliant coverage, quick (codeless) setup, and low-commitment pricing. And it feels timely after this year’s whopping 20 percent spike in e-commerce shopping.

Trust Swiftly Launches 15 Verification Method Platform with Machine Learning to Increase E-commerce Fraud Prevention, via Associated Press.

#MedicalAI

Researchers have trained an AI to detect prostate cancer before patients have any symptoms, and to do so better than radiologists. Prostate cancer is the most commonly diagnosed form of the disease and a leading cause of cancer death among men in several countries. Dr. Ruwan Tennakoon, a researcher on the study, says the software aims to “spot…prostate cancer through incidental detection,” since it can be used whenever men have their abdomen or pelvis scanned. Published in Nature’s Scientific Reports, the study found that the AI could detect cancerous growths in mere seconds and improved with each scan, “learning and adapting to read images from different machines to spot even the smallest irregularities.”

New AI tech for early detection of prostate cancer, via RMIT University.

#NewTech

IBM on Wednesday announced CodeFlare, an open-source, serverless framework meant to simplify and speed up the preparation of AI models for hybrid cloud environments. CodeFlare’s Python-based interface makes it “data scientist friendly,” according to the director of the hybrid cloud platform at IBM Research, who added that the framework’s motivation “is the emergence of converged workflows, combining AI and machine learning, data analytics and modeling, and the increasing complexity in integrating modalities beyond individual steps.” According to IBM, the tool can take a lot of weight off developers’ shoulders: one user analyzing and optimizing 100,000 pipelines for training machine learning models cut execution time from four hours to 15 minutes.

IBM’s CodeFlare automates AI model development, via VentureBeat.
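
CodeFlare is built on the open-source Ray framework, and the speedup IBM describes comes from fanning many pipeline runs out across a cluster instead of looping through them one at a time. Here is a minimal sketch of that pattern in plain Ray with a small, hypothetical scikit-learn pipeline; it illustrates the idea, not CodeFlare’s own API.

```python
import ray
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ray.init()  # local cluster here; in practice this would point at remote nodes

# Put the dataset in the object store once so every task can share it.
X, y = load_iris(return_X_y=True)
X_ref, y_ref = ray.put(X), ray.put(y)

@ray.remote
def evaluate_pipeline(X, y, c):
    """Train and score one pipeline variant (hypothetical example)."""
    pipe = make_pipeline(StandardScaler(), LogisticRegression(C=c, max_iter=500))
    return cross_val_score(pipe, X, y, cv=3).mean()

# Fan out many variants in parallel instead of looping serially.
cs = [0.01, 0.1, 1.0, 10.0]
scores = ray.get([evaluate_pipeline.remote(X_ref, y_ref, c) for c in cs])
print(dict(zip(cs, scores)))
```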

#AlgorithmicBias

No one expected YouTube to be exclusively Double Rainbow Guys and butterflies. But no one expected it to politically radicalize thirteen-year-olds or promote eating disorders to ten-year-olds, either. The Google-owned company has long faced accusations of amplifying inappropriate or dangerous content, including content that violates its own Community Guidelines, but it has been able to sweep them under the rug with occasional apologies and increasing secrecy about its algorithms. A new study from Mozilla may change that. Based on crowdsourced reports from real users, the study identified “disparate and disturbing” content across the platform, 71% of which came from YouTube’s own recommendations.

YouTube’s recommender AI still a horror show, finds major crowdsourced study, via TechCrunch.

____

📄 Paper of the Week

R-Drop: Regularized Dropout for Neural Networks

Is dropout cool again? This work certainly seems to suggest so. Dropout is a regularization technique in which some neurons in a neural network are randomly muted during training. It usually leads to better performance but has quietly gone out of style in recent years. This team from Microsoft Research Asia proposes R-Drop, which pushes together the output distributions of the sub-models generated by dropout (the same model, but with different muted, “dropped out,” neurons) by penalizing the KL divergence between them. Kudos to the authors for showing that the technique works across various models and tasks, from language transformers to image classification CNNs. They even reach state of the art on a translation task, beating out models with much more data and fancier architectures.
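
If you want the gist in code, here is a minimal sketch of the loss (our own paraphrase in PyTorch, not the authors’ implementation): run the same batch through the model twice with dropout active, then add a symmetric KL term that pulls the two predictive distributions together, on top of the usual cross-entropy.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, labels, alpha=1.0):
    """Sketch of an R-Drop-style objective; alpha weights the KL term."""
    # Two forward passes while the model is in train() mode, so the
    # dropout masks (and therefore the logits) differ between passes.
    logits1 = model(x)
    logits2 = model(x)

    # Standard cross-entropy, averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))

    # Symmetric KL divergence between the two output distributions.
    logp1 = F.log_softmax(logits1, dim=-1)
    logp2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (
        F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
        + F.kl_div(logp2, logp1, log_target=True, reduction="batchmean")
    )
    return ce + alpha * kl
```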

____

Think The Juice was worth the squeeze? Sign up here to receive it weekly.
