sparkavro: Manipulate Apache Avro files with sparklyr

Published in

Democratizing Data

1 min read · Mar 26, 2017

I created a simple sparklyr extension for handling Apache Avro files. It is a thin wrapper around Databricks' spark-avro, and it is listed in the official sparklyr documentation on extensions.

chezou/sparkavro

sparkavro - Load Avro data into Spark with sparklyr

github.com

Installation

Use {devtools} to install sparkavro.

devtools::install_github("chezou/sparkavro")

Simple usage

You can read and write Avro files as follows:

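library(sparklyr)
library(sparkavro)
# Connect to a Spark cluster; replace HOST:PORT with your master's address
sc <- spark_connect(master = "spark://HOST:PORT")
# Read an Avro file into Spark and register it as the table "test_table"
df <- spark_read_avro(sc, "test_table", "/user/foo/test.avro")
# Write the Spark DataFrame back out in Avro format
spark_write_avro(df, "/tmp/output")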
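
If you do not have a cluster at hand, the same calls can be tried against a local Spark instance. The following is only a minimal sketch: it assumes a local Spark installation that sparklyr can find and a Spark version that sparkavro supports, it reuses the example path from above as a stand-in for a real Avro file on your machine, and the table name "roundtrip_table" and the output directory are hypothetical names chosen for illustration.

```r
library(sparklyr)
library(sparkavro)

# Assumption: Spark is installed locally (e.g. via sparklyr::spark_install())
sc <- spark_connect(master = "local")

# Assumption: point this at an Avro file that exists on your machine
avro_path <- "/user/foo/test.avro"
df <- spark_read_avro(sc, "test_table", avro_path)

# Look at the first few rows to confirm the file was loaded
print(head(df))

# Write to a fresh directory (hypothetical name) and read it back
out_path <- file.path(tempdir(), "avro_output")
spark_write_avro(df, out_path)
df_back <- spark_read_avro(sc, "roundtrip_table", out_path)

# The round trip should preserve the number of rows
stopifnot(sdf_nrow(df) == sdf_nrow(df_back))

spark_disconnect(sc)
```

Writing to a new directory under tempdir() avoids clashing with an existing output path, since a Spark write fails by default when the target already exists.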

This is the very first version, so there may be bugs, especially around options. If you find one, please open an issue on GitHub.

Written by Aki Ariga