Avro example 1

Prateek
3 min readJun 21, 2023

Apache Avro is a language-neutral data serialization system, developed by Doug Cutting, the father of Hadoop. This is a brief tutorial that provides an overview of how to set up Avro and how to serialize and deserialize data using Avro.

Data is serialized for two objectives −

  • For persistent storage
  • To transport the data over network

What is Serialization?

Serialization is the process of translating data structures or objects state into binary or textual form to transport the data over network or to store on some persisten storage. Once the data is transported over network or retrieved from the persistent storage, it needs to be deserialized again. Serialization is termed as marshalling and deserialization is termed as unmarshalling.

Serialization in Java

Java provides a mechanism, called object serialization where an object can be represented as a sequence of bytes that includes the object’s data as well as information about the object’s type and the types of data stored in the object.

After a serialized object is written into a file, it can be read from the file and deserialized. That is, the type information and bytes that represent the object and its data can be used to recreate the object in memory.

ObjectInputStream and ObjectOutputStream classes are used to serialize and deserialize an object respectively in Java.

Please refer: https://avro.apache.org/docs/1.10.2/gettingstartedjava.html

sample customer.avsc file

{
"type": "record",
"namespace": "com.example",
"name": "Customer",
"fields": [
{ "name": "first_name", "type": "string", "doc": "First Name of Customer" },
{ "name": "last_name", "type": "string", "doc": "Last Name of Customer" },
{ "name": "age", "type": "int", "doc": "Age at the time of registration" },
{ "name": "height", "type": "float", "doc": "Height at the time of registration in cm" },
{ "name": "weight", "type": "float", "doc": "Weight at the time of registration in kg" },
{ "name": "automated_email", "type": "boolean", "default": true, "doc": "Field indicating if the user is enrolled in marketing emails" }
]
}

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>com.example</groupId>
<artifactId>avro-demo-1</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>avro-demo-1</name>
<url>http://maven.apache.org</url>

<properties>
<avro.version>1.11.1</avro.version>
</properties>

<dependencies>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>${avro.version}</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.7.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>


<!--for specific record-->
<plugin>
<groupId>org.apache.avro</groupId>
<artifactId>avro-maven-plugin</artifactId>
<version>1.8.2</version>
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>schema</goal>
<goal>protocol</goal>
<goal>idl-protocol</goal>
</goals>
<configuration>
<sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
<stringType>String</stringType>
<createSetters>false</createSetters>
<enableDecimalLogicalType>true</enableDecimalLogicalType>
<fieldVisibility>private</fieldVisibility>
</configuration>
</execution>
</executions>
</plugin>
<!--force discovery of generated classes-->
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<id>add-source</id>
<phase>generate-sources</phase>
<goals>
<goal>add-source</goal>
</goals>
<configuration>
<sources>
<source>target/generated-sources/avro</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>

SpecificRecordAvsc.java

package com.example.specific;

import com.example.Customer;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import java.io.File;
import java.io.IOException;

public class SpecificRecordExamples {

public static void main(String[] args) {
Customer.Builder customerBuilder = Customer.newBuilder();
customerBuilder.setAge(30);
customerBuilder.setFirstName("Harshita");
customerBuilder.setLastName("Dekate");
customerBuilder.setAutomatedEmail(true);
customerBuilder.setHeight(180f);
customerBuilder.setWeight(70f);

Customer customer = customerBuilder.build();
System.out.println(customer.toString());

// write it out to a file
final DatumWriter<Customer> datumWriter = new SpecificDatumWriter<>(Customer.class);

try (DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(datumWriter)) {
dataFileWriter.create(customer.getSchema(), new File("customer-specific.avro"));
dataFileWriter.append(customer);
System.out.println("successfully wrote customer-specific.avro");
} catch (IOException e){
e.printStackTrace();
}

// read it from a file
final File file = new File("customer-specific.avro");
final DatumReader<Customer> datumReader = new SpecificDatumReader<>(Customer.class);
final DataFileReader<Customer> dataFileReader;
try {
System.out.println("\nReading our specific record");
dataFileReader = new DataFileReader<>(file, datumReader);
while (dataFileReader.hasNext()) {
Customer readCustomer = dataFileReader.next();
System.out.println(readCustomer.toString());
System.out.println("First name: " + readCustomer.getFirstName());
}
} catch (IOException e) {
e.printStackTrace();
}
}
}

Response:

{“first_name”: “Harshita”, “last_name”: “Dekate”, “age”: 30, “height”: 180.0, “weight”: 70.0, “automated_email”: true}
successfully wrote customer-specific.avro

Reading our specific record
{“first_name”: “Harshita”, “last_name”: “Dekate”, “age”: 30, “height”: 180.0, “weight”: 70.0, “automated_email”: true}
First name: Harshita

--

--