Open Sourcing the Civis Data Science API Client for R
by Keith Ingersoll
It’s frustrating to reinvent the wheel just to do basic data science. We’ve experienced that here at Civis, which is why we’re always automating these tasks by adding them to the Civis Data Science API. By coding with the Data Science API, you can take advantage of everything Civis Platform can do, including our favorite data science workflows, like CivisML for building models at scale with scikit-learn.
Today, we are pleased to announce we’ve open sourced civis-r, the official R interface to the Civis Data Science API. With civis-r you can use the cloud compute, machine learning, and reporting capabilities of Civis with your favorite R tools. If Python is your language of choice, see our previous post on open sourcing the Python client and this post for examples of how the Civis Data Science API can make your life easier.
So what can you do with civis-r? For starters, you can move your data to and from the cloud with a single line of code. Your data can be stored as rows in a table, CSVs on a remote file system, or even as serialized R objects like nested lists. For example,
library(civis)
# First we'll load a data frame of the famous iris dataset
data(iris)
# Next we'll push it to a database table
write_civis(iris, "schema.tablename", "database")
# Great, now let's read it back
df <- read_civis("schema.tablename", "database")
# Hmmm, I'm more partial to setosa myself. Let's write a custom sql query.
query <- sql("SELECT * FROM schema.tablename WHERE Species = 'setosa'")
iris_setosa <- read_civis(query, "database")
# Now let's store this data along with a note as a serialized R object
# on a remote file system. We could store any object remotely this way.
data <- list(data = iris_setosa, special_note = "The best iris species")
file_id <- write_civis_file(data)
# Finally, let's read back our data from the remote file system.
data2 <- read_civis(file_id)
print(data2[["special_note"]])
## [1] "The best iris species"
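If "serialized R object" sounds abstract, here is a small local sketch using base R's saveRDS() and readRDS() on a temporary file. It is only an analogy: it makes no Civis call and assumes nothing about how write_civis_file() stores objects behind the scenes. The names note_obj, tmp, and restored are made up for illustration.

```r
# Local analogy only: round-trip a nested list through base R serialization.
# Nothing here touches the Civis API.
note_obj <- list(
  data = head(iris[iris$Species == "setosa", ]),
  special_note = "The best iris species"
)

# Write the object to a temporary .rds file, then read it back.
tmp <- tempfile(fileext = ".rds")
saveRDS(note_obj, tmp)
restored <- readRDS(tmp)

# The nested structure survives the trip: a data frame plus a character note.
str(restored, max.level = 1)
print(restored[["special_note"]])
```

The point is that a whole list, data frame and note together, comes back as one object without being flattened to a CSV first.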
Civis-r also includes functionality for working with CivisML, Civis’s machine learning service. With the combined power of CivisML and civis-r, you can build models in the cloud where the models can use as much memory as they need and there’s no chance of your laptop crashing.
library(civis)
# It really is a great dataset
data(iris)
# Gradient boosting or random forest, who will win?
gb_model <- civis_ml_gradient_boosting_classifier(iris, "Species")
rf_model <- civis_ml_random_forest_classifier(iris, "Species")
aucs <- list(gb_model = get_metric(gb_model, "roc_auc_macroavg"),
             rf_model = get_metric(rf_model, "roc_auc_macroavg"))
print(aucs)
## $gb_model
## [1] 0.9897333
##
## $rf_model
## [1] 0.9954667
Huh, I guess we’ll call that a tie.
I could keep going on about the features in civis-r, including publishing reports or predicting on huge datasets in parallel, but I don’t want to spoil the joy you’ll get from discovering them on your own. You can start the journey by reading the full documentation, creating an API key and, of course, installing the package:
install.packages("civis")
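Once you have an API key, R needs to be able to see it. The sketch below assumes the client reads the key from an environment variable named CIVIS_API_KEY (check the documentation to confirm the name), and has_civis_key() is a hypothetical helper written for this post, not a civis-r function. It uses only base R, so it is safe to run before the package is installed.

```r
# Hypothetical helper (not part of civis-r): TRUE if an API key is visible to R.
# Assumes the key lives in the CIVIS_API_KEY environment variable.
has_civis_key <- function() {
  nzchar(Sys.getenv("CIVIS_API_KEY"))
}

if (has_civis_key()) {
  message("CIVIS_API_KEY found; civis-r should be able to authenticate.")
} else {
  message("CIVIS_API_KEY is not set; add it to ~/.Renviron and restart R.")
}
```

Keeping the key in ~/.Renviron rather than in a script means it never ends up in version control by accident.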
Hope you enjoy civis-r,
Patrick and Keith
library(ggplot2)
library(purrr)
ts <- seq(0, 2*pi, by=.1)
df <- map_dfr(ts, function(t) {
  x <- 16 * sin(t)^3
  y <- 13 * cos(t) - 5 * cos(2 * t) - 2 * cos(3 * t) - cos(4 * t)
  list(x=x, y=y)
})
ggplot(df, aes(x=x, y=y)) +
  geom_polygon(fill="#FFC525") +
  theme_void()
This post was co-authored by Patrick M.