Getting Started With Data Analysis in Java: Statistical Features

The Java 8 Stream API already offers easy-to-use filter, sort, and aggregate functions.

Dr. Chris
Javarevisited

Photo by Choong Deng Xiang on Unsplash

Data processing and analysis in Java, and increasingly in web environments built with Spring Boot (a popular Java framework), is common.

For instance, you can run an initial statistical analysis to gain insight into a data set, or pre-process the data and extract features for machine learning (ML) use cases.

In addition, many well-known frameworks in the areas of data science, data processing (e.g., Apache Spark), data analysis, data visualization, NLP (e.g., Stanford CoreNLP), or ML (e.g., MOA, WEKA) are written for Java or are at least compatible with it.

In this article, however, we will look at Java’s built-in statistics capabilities and advanced (but still lightweight) statistics libraries.
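As a first taste of the built-in capabilities, the following minimal sketch filters, sorts, and aggregates a small list with the Stream API and `DoubleSummaryStatistics`. The class name `StreamStatsSketch`, the helper methods, and the sample values are hypothetical and chosen only for illustration.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: class name, helpers, and sample data are assumptions.
class StreamStatsSketch {

    // Filter: keep values strictly above the threshold.
    // Aggregate: collect count, sum, min, max, and average in one pass.
    static DoubleSummaryStatistics summarizeAbove(List<Double> values, double threshold) {
        return values.stream()
                .filter(v -> v > threshold)
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();
    }

    // Sort: return a new list in ascending order; the input list is not modified.
    static List<Double> sortedCopy(List<Double> values) {
        return values.stream()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> samples = List.of(4.0, 1.5, 3.0, 2.5, 9.0);

        DoubleSummaryStatistics stats = summarizeAbove(samples, 2.0);
        System.out.println("count   = " + stats.getCount());
        System.out.println("min     = " + stats.getMin());
        System.out.println("max     = " + stats.getMax());
        System.out.println("average = " + stats.getAverage());
        System.out.println("sorted  = " + sortedCopy(samples));
    }
}
```

`summaryStatistics()` covers only count, sum, minimum, maximum, and average. Measures such as the median or standard deviation need extra code or one of the lightweight libraries.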

Prerequisites

We assume that you have a basic understanding of statistics and the Java programming language.

Test environment

The test environment runs on macOS (Mac Studio M2 Max, 64 GB) with Java 21. The IDE is IntelliJ IDEA with Gradle as the build system.
