Weekly Digest for Data Science and AI: Python and R (Volume 17)

Hello everyone! Happy to have you back, and welcome to Volume 17. This week we have two great Python packages that have been trending recently, raster-vision and trfl, and two great packages for the R world, precisely and DataExplorer. To receive this digest directly in your inbox each week, sign up here.

Favio Vázquez

Published in Ciencia y Datos, Oct 26, 2018

raster-vision — An open source framework for deep learning on satellite and aerial imagery.

This framework blew me away.

It’s an amazing tool for building computer vision models on satellite, aerial, and other large imagery sets (including oblique drone imagery).

As the creators state:

[Rastervision] … allows for engineers to quickly and repeatably configure experiments that go through core components of a machine learning workflow: analyzing training data, creating training chips, training models, creating predictions, evaluating models, and bundling the model files and configuration for easy deployment.

Raster Vision workflows begin when you have a set of images and training data, optionally with Areas of Interest (AOIs) that describe where the images are labeled. Raster Vision workflows end with a packaged model and configuration that allows you to easily utilize models in various deployment situations. Inside the Raster Vision workflow, there’s the process of running multiple experiments to find the best model or models to deploy.

The process of running experiments includes executing workflows that perform commands matching those components: analyze, chip, train, predict, evaluate, and bundle.

The package supports several tasks, including chip classification, object detection, and semantic segmentation.
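To make "creating training chips" a bit more concrete, here is a minimal pure-Python sketch of the idea: cut a large scene into fixed-size windows and, optionally, keep only the ones inside a labeled Area of Interest. This is not Raster Vision's API. The names `Window`, `chip_windows`, `inside_aoi` and `training_chips` are hypothetical, the chips are assumed to be square and tiled on a fixed stride, and an AOI is simplified to a single axis-aligned rectangle in pixel coordinates (real AOIs are typically polygons).

```python
from typing import List, NamedTuple, Optional


class Window(NamedTuple):
    """A rectangular window in pixel coordinates (hypothetical helper type)."""
    row: int
    col: int
    height: int
    width: int


def chip_windows(img_height: int, img_width: int, chip_size: int, stride: int) -> List[Window]:
    """Slide a square window over the image, keeping only chips that fit entirely."""
    windows = []
    for row in range(0, img_height - chip_size + 1, stride):
        for col in range(0, img_width - chip_size + 1, stride):
            windows.append(Window(row, col, chip_size, chip_size))
    return windows


def inside_aoi(window: Window, aoi: Window) -> bool:
    """True if the chip lies completely within a rectangular Area of Interest."""
    return (
        window.row >= aoi.row
        and window.col >= aoi.col
        and window.row + window.height <= aoi.row + aoi.height
        and window.col + window.width <= aoi.col + aoi.width
    )


def training_chips(
    img_height: int, img_width: int, chip_size: int, stride: int, aoi: Optional[Window] = None
) -> List[Window]:
    """All full chips, or only those inside the AOI when one is given."""
    windows = chip_windows(img_height, img_width, chip_size, stride)
    if aoi is None:
        return windows
    return [w for w in windows if inside_aoi(w, aoi)]


# A 1000 x 1000 pixel scene cut into 300-pixel chips with a 300-pixel stride.
all_chips = training_chips(1000, 1000, chip_size=300, stride=300)
print(len(all_chips))  # 9 (3 x 3 full chips; the 100-pixel border is skipped)

# Keep only chips inside a labeled region covering the top-left 600 x 600 pixels.
aoi_chips = training_chips(1000, 1000, 300, 300, aoi=Window(0, 0, 600, 600))
print(len(aoi_chips))  # 4
```

Skipping partial chips at the border keeps every chip the same size, which is convenient for batching; padding the edges instead is an equally reasonable choice.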

Installation

pip install rastervision

Usage

Here you can find examples and more information about the package:

Raster Vision Documentation (0.8): docs.rastervision.io

Check out the original repo too:

azavea/raster-vision (github.com)

trfl — TensorFlow Reinforcement Learning

TRFL (pronounced “truffle”) is a library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Learning agents.

If you want to learn more about Reinforcement Learning, Mohammad Ashraf has an amazing series on the topic:

Reinforcement Learning Demystified: A Gentle Introduction (towardsdatascience.com)
Episode 1, demystifying agent/environment interaction, and the components of a reinforcement learning agent.

Reinforcement Learning Demystified: Markov Decision Processes (Part 1) (towardsdatascience.com)
Episode 2, demystifying Markov Processes, Markov Reward Processes, Bellman Equation, and Markov Decision Processes.

Reinforcement Learning Demystified: Markov Decision Processes (Part 2) (towardsdatascience.com)
Episode 3, demystifying Bellman Expectation Equation, Bellman Optimality Equation, Optimal Policy, and Optimal Value…

And the great Siraj Raval is starting a course about it on YouTube.

Installation

TRFL can be installed with pip directly from GitHub:

pip install git+https://github.com/deepmind/trfl.git

TRFL works with both the CPU and GPU versions of TensorFlow, but to allow for that it does not list TensorFlow as a requirement, so you need to install TensorFlow and TensorFlow Probability separately if you haven’t already done so.

Usage

# Uses the TensorFlow 1.x graph API (tf.get_variable).
import tensorflow as tf
import trfl

# Q-values for the previous and next timesteps, shape [batch_size, num_actions].
q_tm1 = tf.get_variable(
    "q_tm1", initializer=[[1., 1., 0.], [1., 2., 0.]], dtype=tf.float32)
q_t = tf.get_variable(
    "q_t", initializer=[[0., 1., 0.], [1., 2., 0.]], dtype=tf.float32)

# Action indices, discounts and rewards, shape [batch_size].
a_tm1 = tf.constant([0, 1], dtype=tf.int32)
r_t = tf.constant([1, 1], dtype=tf.float32)
pcont_t = tf.constant([0, 1], dtype=tf.float32)  # the discount factor

# Q-learning loss, and auxiliary data.
loss, q_learning = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)

loss is the tensor representing the loss. For Q-learning, it is half the squared difference between the predicted Q-values and the TD targets, with shape [batch_size]. Extra information is in the q_learning namedtuple, including q_learning.td_error and q_learning.target.
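To see what that loss works out to for the inputs in the TRFL example, here is a plain-Python sketch of standard one-step Q-learning following the description above: the target is r_t + pcont_t * max_a q_t[a], the TD error is the target minus the Q-value of the action taken, and the loss is half the squared TD error. It is not TRFL's code: `qlearning_loss` is a hypothetical name, there is no TensorFlow or autodiff here, and in a real agent the target is treated as a constant when computing gradients.

```python
from typing import List, Tuple


def qlearning_loss(
    q_tm1: List[List[float]],
    a_tm1: List[int],
    r_t: List[float],
    pcont_t: List[float],
    q_t: List[List[float]],
) -> Tuple[List[float], List[float], List[float]]:
    """Per-example Q-learning loss, TD error and target (plain-Python sketch)."""
    losses, td_errors, targets = [], [], []
    for q_prev, action, reward, discount, q_next in zip(q_tm1, a_tm1, r_t, pcont_t, q_t):
        # Bootstrapped TD target: reward plus discounted greedy value of the next state.
        target = reward + discount * max(q_next)
        # Compare against the Q-value of the action actually taken at t-1.
        td_error = target - q_prev[action]
        losses.append(0.5 * td_error ** 2)
        td_errors.append(td_error)
        targets.append(target)
    return losses, td_errors, targets


# Same inputs as the TRFL example above.
loss, td_error, target = qlearning_loss(
    q_tm1=[[1., 1., 0.], [1., 2., 0.]],
    a_tm1=[0, 1],
    r_t=[1., 1.],
    pcont_t=[0., 1.],
    q_t=[[0., 1., 0.], [1., 2., 0.]],
)
print(target)    # [1.0, 3.0]
print(td_error)  # [0.0, 1.0]
print(loss)      # [0.0, 0.5]
```

The first example has pcont_t = 0 (a terminal transition), so its target is just the reward; the second bootstraps from the best next-state value of 2.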

You can find much more information about the package here:

deepmind/trfl: TensorFlow Reinforcement Learning (github.com)


precisely — An R package to estimate sample size based on precision rather than power


precisely is a study planning tool to calculate sample size based on precision rather than power. Power calculations are focused on whether or not an estimate will be statistically significant; calculations of precision are based on the same principles as power calculation but turn the focus to the width of the confidence interval.

precisely has functions for studies using risk differences, risk ratios, rate differences, rate ratios, and odds ratios. The heart of these calculations is the desired precision.

Installation

You can install the development version of precisely with:

# install.packages("devtools")
devtools::install_github("malcolmbarrett/precisely")

Usage

Let’s say we want to calculate the sample size needed to estimate a 90% CI for a risk difference of .1 with an absolute width of .08. Here, the risk among the exposed is .4, the risk among the unexposed is .3, and there are three times as many unexposed participants.

library(tidyr)
library(dplyr)
library(purrr)
library(ggplot2)
library(precisely)

n_risk_difference(
  precision = .08,
  exposed = .4,
  unexposed = .3,
  group_ratio = 3,
  ci = .90
)
#> # A tibble: 1 x 8
#>   n_exposed n_unexposed n_total risk_difference precision exposed unexposed
#>       <dbl>       <dbl>   <dbl>           <dbl>     <dbl>   <dbl>     <dbl>
#> 1      524.       1573.   2097.             0.1      0.08     0.4       0.3
#> # ... with 1 more variable: group_ratio <dbl>

Rounding up to whole participants, we need 525 exposed participants and 1,573 unexposed participants, for a total sample size of 2,098.
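Where do those numbers come from? A minimal Python sketch, assuming the usual normal-approximation (Wald) interval for a risk difference: the CI width is 2·z·SE, with SE² = p1(1 − p1)/n_exposed + p0(1 − p0)/n_unexposed and n_unexposed = group_ratio · n_exposed. Solving for n_exposed gives the formula below. This is an assumed formula rather than precisely's own code, and `n_risk_difference_sketch` is a hypothetical name, but it does reproduce the tibble output above.

```python
import math
from statistics import NormalDist


def n_risk_difference_sketch(precision, exposed, unexposed, group_ratio, ci=0.95):
    """Sample sizes so a Wald CI for a risk difference has total width `precision`.

    group_ratio is n_unexposed / n_exposed.
    """
    # Two-sided critical value, about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf((1 + ci) / 2)
    # Half the CI width must equal z * SE; solve SE^2 = variance_term / n_exposed.
    variance_term = exposed * (1 - exposed) + unexposed * (1 - unexposed) / group_ratio
    n_exposed = z ** 2 * variance_term / (precision / 2) ** 2
    n_unexposed = n_exposed * group_ratio
    return n_exposed, n_unexposed, n_exposed + n_unexposed


n_exp, n_unexp, n_tot = n_risk_difference_sketch(
    precision=0.08, exposed=0.4, unexposed=0.3, group_ratio=3, ci=0.90
)
print(round(n_exp, 1), round(n_unexp, 1), round(n_tot, 1))  # 524.2 1572.6 2096.8
# Round each group up to whole participants.
print(math.ceil(n_exp), math.ceil(n_unexp))  # 525 1573
```

Note that the precision (0.08) is the full width of the interval, so the half-width that enters the formula is 0.04. That squared half-width in the denominator is why sample sizes grow quickly as you ask for narrower intervals.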

This package includes a Shiny app to help with calculations, which you can start with launch_precisely_app(). You can also find a live version at malcolmbarrett.shinyapps.io/precisely.

You can find much more information about the package in this vignette:

Introduction to Precisely (precisely.malco.io)

DataExplorer — Automate data exploration and treatment

https://github.com/boxuancui/DataExplorer

Exploratory Data Analysis (EDA) is an initial and important phase of data analysis and predictive modeling. During this process, analysts and modelers take a first look at the data, generate relevant hypotheses, and decide on next steps. However, EDA can be a hassle at times. This R package aims to automate most of the data handling and visualization, so that users can focus on studying the data and extracting insights.

Installation

The package can be installed directly from CRAN.

install.packages("DataExplorer")

Usage

With the package you can create reports, plots and tables like this:

library(DataExplorer)
library(ggplot2)  # provides the diamonds dataset; airquality ships with base R

## Plot basic description for airquality data
plot_intro(airquality)

## View missing value distribution for airquality data
plot_missing(airquality)

## Frequency distribution of all discrete variables
plot_bar(diamonds)
## Sum of `price` for each category of all discrete variables
plot_bar(diamonds, with = "price")

## View histogram of all continuous variables
plot_histogram(diamonds)
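A lot of what these functions automate is simple bookkeeping: counting column types, missing values and complete rows. As a rough illustration in Python (not DataExplorer's implementation or its exact definitions), the sketch below computes a basic description similar to what plot_intro() visualizes and the per-column missing percentages that plot_missing() plots. The function names, the rule for what counts as "continuous", and the toy data with made-up values are all assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical table layout: column name -> list of values, None marks a missing value.
Table = Dict[str, List[Optional[object]]]


def is_continuous(values) -> bool:
    """Treat a column as continuous if every non-missing value is an int or float."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )


def introduce_sketch(table: Table) -> dict:
    """Basic description of a table, in the spirit of DataExplorer's plot_intro()."""
    columns = list(table.values())
    rows = list(zip(*columns))
    return {
        "rows": len(columns[0]) if columns else 0,
        "columns": len(columns),
        "continuous_columns": sum(is_continuous(col) for col in columns),
        "discrete_columns": sum(not is_continuous(col) for col in columns),
        "all_missing_columns": sum(all(v is None for v in col) for col in columns),
        "total_missing_values": sum(v is None for col in columns for v in col),
        "complete_rows": sum(all(v is not None for v in row) for row in rows),
    }


def missing_profile(table: Table) -> Dict[str, float]:
    """Percentage of missing values per column, like the bars in plot_missing()."""
    return {name: 100.0 * sum(v is None for v in col) / len(col) for name, col in table.items()}


# Toy data with made-up values.
toy = {
    "temperature": [21.5, 19.0, None, 23.1],
    "wind": [7.4, 8.0, 12.6, 11.5],
    "station": ["A", "A", None, "B"],
}
print(introduce_sketch(toy)["complete_rows"])  # 3
print(missing_profile(toy))  # {'temperature': 25.0, 'wind': 0.0, 'station': 25.0}
```

DataExplorer wraps summaries like these in ready-made plots and reports, which is exactly the part that saves time during EDA.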

You can find much more like this on the package's official website:

Automate data exploration and treatment (boxuancui.github.io)

And in this vignette:

Introduction to DataExplorer (boxuancui.github.io)

Thanks for reading this. I hope you found something interesting here :) If you have questions just follow me on Twitter:

Favio Vázquez (@FavioVaz) on Twitter (twitter.com)

and LinkedIn:

Favio Vázquez, Founder of Ciencia y Datos, on LinkedIn (www.linkedin.com)

See you there :)
