Python is great for tasks like training machine learning models, performing numerical simulations, and quickly developing proof-of-concept solutions without setting up development tools and installing several dependencies. When performing these tasks, you also want to make full use of the underlying hardware for quick results. Parallelizing Python code enables this. However, the standard CPython implementation cannot fully utilize the hardware because of the global interpreter lock (GIL), which prevents bytecode from running in multiple threads simultaneously.
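The effect of the GIL can be seen with a small stdlib-only sketch: a CPU-bound function run on two threads takes roughly as long as running it twice sequentially, while two processes (each with its own interpreter and its own GIL) can truly run in parallel. The function name and workload size here are illustrative, not from any of the posts below.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count(n: int) -> int:
    """CPU-bound work: count down from n to 0."""
    while n > 0:
        n -= 1
    return n


if __name__ == "__main__":
    N = 10_000_000

    # Two threads: the GIL serializes the bytecode, so this takes
    # roughly as long as running both counts back to back.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(count, [N, N]))
    threads = time.perf_counter() - start

    # Two processes: each has its own interpreter (and its own GIL),
    # so the counts can run in parallel on separate cores.
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=2) as pool:
        list(pool.map(count, [N, N]))
    processes = time.perf_counter() - start

    print(f"threads:   {threads:.2f}s")
    print(f"processes: {processes:.2f}s")
```

On a multi-core machine the process-based version typically finishes noticeably faster, which is exactly the limitation the options reviewed below work around.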
This article reviews some common options for parallelizing Python code including:
One possible definition of reinforcement learning (RL) is a computational approach to learning how to maximize the total sum of rewards when interacting with an environment. While a definition is useful, this tutorial aims to illustrate what reinforcement learning is through images, code, and video examples, and along the way to introduce reinforcement learning terms like agents and environments.
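That definition can be made concrete with a toy agent–environment loop. The corridor environment, the reward values, and the policies below are illustrative inventions, not taken from the tutorial: an agent picks actions, the environment returns rewards, and we compare the total sum of rewards under a random policy versus the optimal one.

```python
import random


class CorridorEnv:
    """Toy environment: the agent starts at position 0; reaching
    position `length` ends the episode with reward +1, and every
    other step costs -0.1."""

    def __init__(self, length: int = 4):
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int):
        # action: 0 = move left, 1 = move right (walls clamp the position)
        self.pos = max(0, min(self.length, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length
        reward = 1.0 if done else -0.1
        return self.pos, reward, done


def run_episode(env, policy) -> float:
    """Interact with the environment and return the total sum of rewards."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


random.seed(0)
env = CorridorEnv()
# A random agent wanders; always moving right is optimal here.
random_return = run_episode(env, lambda s: random.choice([0, 1]))
optimal_return = run_episode(env, lambda s: 1)
print(random_return, optimal_return)
```

The optimal policy collects three -0.1 step penalties plus the final +1, for a return of 0.7, which is the quantity an RL agent learns to maximize.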
In particular, this tutorial explores:
FLAML is a lightweight Python library from Microsoft Research that finds accurate machine learning models efficiently and economically, using cutting-edge algorithms designed to be resource-efficient and easily parallelizable. FLAML can also use Ray Tune for distributed hyperparameter tuning to scale these AutoML methods across a cluster.
This blog highlights:
AutoML is known to be a resource- and time-consuming operation as it involves trials…
Ray is a fast, simple distributed execution framework that makes it easy to scale your applications and to leverage state-of-the-art machine learning libraries. Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes.
The goal of this tutorial is to explore the following:
As a previous post pointed out, parallel and distributed computing are a staple of modern applications. The problem is that…
We are excited to launch our first annual Ray Community Pulse Survey!
The Ray project began several years ago at UC Berkeley. Over time, the project has seen tremendous growth, and now has over 450 contributors from 100+ companies. We’ve seen thousands of users adopt Ray to scale up their applications. Since the release of Ray 1.0 last year, Ray has grown with recent developments including integrations with libraries like Horovod and XGBoost, as well as feature enhancements to existing libraries like Ray Serve and RLlib.
As our developer community grows and evolves, we want to constantly keep a pulse…
The pandas library provides easy-to-use data structures like the pandas DataFrame, as well as tools for data analysis. One issue with pandas is that it can be slow with large amounts of data; it wasn't designed for analyzing 100 GB or 1 TB datasets. Fortunately, the Modin library can scale your pandas workflows by changing a single line of code, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.
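The one-line change can be sketched like this. The snippet runs with stock pandas; the commented-out import shows the single swap that, per Modin's documentation, distributes the same workflow. The tiny DataFrame is illustrative only:

```python
import pandas as pd          # stock pandas
# import modin.pandas as pd  # with Modin installed, this one-line swap
#                            # distributes the same workflow across cores

df = pd.DataFrame({"group": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("group")["value"].sum()
print(totals.to_dict())
```

Everything after the import line is unchanged pandas code, which is what makes Modin's migration story so light.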
Machine learning today requires distributed computing. Whether you’re training networks, tuning hyperparameters, serving models, or processing data, machine learning is computationally intensive and can be prohibitively slow without access to a cluster. Ray is a popular framework for distributed Python that can be paired with PyTorch to rapidly scale machine learning applications.
This post covers various elements of the Ray ecosystem and how it can be used with PyTorch!
I am happy to help! I am now back making content again (after a year or two break), so hopefully people in the future won't need to spend 3 days looking for solutions (if my content or other good content ranks up, it will be easier to find). Btw, if you use scikit-learn, I made a new post on it: https://medium.com/distributed-computing-with-ray/how-to-speed-up-scikit-learn-model-training-aaf17e2d1e1
Scikit-learn is an easy-to-use Python library for machine learning. However, scikit-learn models can sometimes take a long time to train. The question becomes: how do you create the best scikit-learn model in the least amount of time? There are quite a few approaches to this problem, like:
This post gives an overview of each approach, discusses some limitations, and offers resources to speed up your machine learning workflow!
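One common approach among those, parallelizing training across cores with scikit-learn's `n_jobs` parameter, can be sketched as follows (the estimator and dataset are illustrative choices, assuming scikit-learn is installed):

```python
# A minimal sketch of core-level parallelism in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_jobs=-1 trains the ensemble's trees on all available cores.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)

print(clf.score(X, y))
```

For ensembles and cross-validation, `n_jobs=-1` is often the cheapest speedup available, since it requires changing only one parameter.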
Decision trees are a popular supervised learning method for a variety of reasons. Their benefits include that they can be used for both regression and classification, they don't require feature scaling, and they are relatively easy to interpret because you can visualize them. Visualizing a tree is a powerful way not only to understand your model but also to communicate how it works. Consequently, it helps to know how to create a visualization from your model.
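As a minimal sketch of such a visualization, scikit-learn's `export_text` renders a fitted tree's splits as plain text (its `plot_tree` function produces the graphical version); the dataset and depth limit below are illustrative choices, assuming scikit-learn is installed:

```python
# Fit a small tree and render its decision rules as text.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

The printed rules show each split threshold and leaf class, which is exactly the interpretability that makes decision trees easy to communicate.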
This tutorial covers: