Learn how to develop increasingly sophisticated databases from scratch


I’m excited to share DB From Zero (dbfromzero.com) with you today! This is a new project that aims to explore different aspects and components of databases by developing increasingly sophisticated prototypes. Additionally, benchmarking is performed to quantify the impact of different design parameters and workloads, and thereby improve our intuition about databases. Today I’d like to share with you one of the recent projects.

Log-Structured Merge-Tree for Persistent Reads and Writes

The most recent project explores the Log-Structured Merge-Tree for persistent reads and writes. The LSMTree is an interesting data structure that is commonly used in developing high-performance key/value stores. …
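To make the idea concrete, here is a minimal in-memory sketch of the general LSM-tree pattern: writes land in a small mutable table, full tables are flushed as sorted immutable segments, and reads check the newest data first. This is not the DB From Zero implementation. The class name `TinyLSM`, the `memtable_limit` parameter, and the use of Python lists in place of on-disk files are all assumptions made for illustration.

```python
import bisect


class TinyLSM:
    """Toy LSM tree; sorted lists stand in for on-disk segment files (assumption)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}      # mutable buffer that absorbs recent writes
        self.segments = []      # sorted (key, value) lists, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable as a sorted, immutable segment.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then segments newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            idx = bisect.bisect_left(segment, (key,))
            if idx < len(segment) and segment[idx][0] == key:
                return segment[idx][1]
        return None

    def compact(self):
        # Merge all segments into one; later (newer) segments overwrite older values.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

A real store would also need a write-ahead log, deletion markers, and file-backed segments, all of which this sketch leaves out.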


Millennials represent an interesting social and technological experiment in new ways of being and it’s questionable whether we’ve gotten the experiment right. What needs to be changed for the future?

Photo by Benjamin Davies on Unsplash

It seems every day we’re inundated with stories about the woes of millennials. They’re stressed by their jobs and personal lives and living in economically precarious situations with large debt loads. Blame has been attributed to causes such as social media and the modern economic system. We’re constantly asked to question what in our society needs to change to alleviate the problems of millennials.

In my opinion, speaking as a millennial, we as a generation were simply the first to experience the unrealistic expectations created by our culture and amplified by social media. We live in a society that glorifies hedonistic…


Snapshot of polymer chain configuration in simulation

As a small programming project, I’ve implemented the Bond Fluctuation Model (BFM) for polymer physics in 2 dimensions using Java. You can find the code at https://github.com/matthagy/mc_bfm_2d.

Initially, I’ve experimented with how polymer chains diffuse through a small pore within a contained system.

A single trajectory gives the following quantitative results.



Summary

Novel computational methods are developed to allow very long simulations of the two-dimensional Ising model, with 10 billion Monte Carlo updates in each simulation. Using these methods, the time-dependent behavior of systems quenched from random initial states is analyzed. Simulations are run across a range of parameters, including the lateral size of the grid, l, and the pair interaction strength, J.
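For readers unfamiliar with this kind of simulation, here is a minimal sketch of a quench using the textbook single-spin-flip Metropolis algorithm. It is not the novel method described in this summary and is far too slow for 10 billion updates. The function names, the periodic boundary conditions, and the convention that J is expressed in units of the thermal energy are assumptions chosen for illustration.

```python
import math
import random


def quench(l, J, n_updates, seed=0):
    """Evolve an l x l Ising grid from a random initial state (standard Metropolis)."""
    rng = random.Random(seed)
    # Random initial state: each spin is independently up (+1) or down (-1).
    spins = [[rng.choice((-1, 1)) for _ in range(l)] for _ in range(l)]
    for _ in range(n_updates):
        i, j = rng.randrange(l), rng.randrange(l)
        s = spins[i][j]
        # Sum of the four nearest neighbors, with periodic boundaries (assumption).
        neighbors = (spins[(i + 1) % l][j] + spins[(i - 1) % l][j]
                     + spins[i][(j + 1) % l] + spins[i][(j - 1) % l])
        # Energy change for flipping s, with E = -J * sum over pairs of s_i * s_j
        # and J given in units of the thermal energy (assumption).
        delta_e = 2.0 * J * s * neighbors
        if delta_e <= 0 or rng.random() < math.exp(-delta_e):
            spins[i][j] = -s
    return spins


def magnetization(spins):
    """Mean spin: near +1 or -1 when one orientation dominates."""
    n = len(spins) * len(spins)
    return sum(sum(row) for row in spins) / n
```

Calling `magnetization(quench(l, J, n_updates))` for different values of l and J gives a single number per trajectory that indicates whether the grid ended up predominantly up or predominantly down.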

In some cases, the simulation trajectory converges to a configuration with a predominantly up spin or a predominantly down spin. There does not appear to be a simple relationship between the parameters of…


I’ve been experimenting with using Scala.js to create educational simulations and would like to share the results and what I’ve learned.

Simulation snapshots

I’m a software engineer with a background in chemical physics and I’m excited to experiment with developing free online educational resources that teach chemical physics, particularly simulation. To that end, I’ve been exploring Scala.js and assorted JavaScript libraries. Here are some of the preliminary resources I’ve developed.

All the initial rough code is available at github.com/matthagy/chem_prog_exp. I apologize for the current lack of documentation.

Chiefly, I’m using Scala.js, which is a framework for writing Scala code that compiles to JavaScript so that…


IMO, doesn’t matter if we’re all spending more time reading and learning… All I care about is helping Chemists get better at developing and applying software.

Photo by Gaelle Marcel on Unsplash

I used to wonder why it was so frustrating to use previous generation tools to start solving new problems. E.g., using Perl/HTML to implement a website in the late 90s. All I learned from that was Python 2.3 and it took several years before I finally got back to coding in a lame undergrad research job. Cool that all of us budding Engineers got some real management experience in our minimum wage jobs while we wasted time in high school. Otherwise, we might all be writing Scala code like this.


Sure is an exciting time to be in data science! Particularly so with the magic of machine learning. Why, it’s so exciting that it’s attracting millions of highly intelligent and deeply trained future professionals. Hence, I think we should reflect on the future of data science. Here are my predictions…


It’s 2025 and one of the most coveted career pinnacles in large corporations is Principal Data Scientist (PDS). It should be clear from the title that this is an elite role. Some innovative companies have even started using the title “Senior Data Manager” to attract the right candidates.

PDSs are charged with the mission of leveraging their strengths in ML to solve the hardest artificial intelligence problems facing tech companies. For example, social media companies are still searching for the perfect autonomous content moderating technologies. And only PDSs can help them.

PDSs are uniquely suited to solve such…


A functional, object-oriented approach for working with sequences and collections. Also similar to Spark RDDs and Java Streams. Hope you find they simplify your code by providing a plethora of common algorithms for working with sequences and collections.

An example of using scalaps to analyze Reddit posts

I’ve found that working on collections of elements by applying functions through well-defined algorithms (e.g., map, filter, and reduce) greatly simplifies my code and removes many sources of errors. Therefore I was delighted to discover that Scala really pushes this to the next level by introducing a plethora of built-in algorithms on data structures. These concepts share some similarities with Spark RDDs and Java Streams, but I find the Scala approach simpler and more elegant.
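As a small illustration of the map, filter, and reduce style, here is a sketch that uses only Python built-ins and `functools.reduce`, not the library introduced in this article. The `posts` records and their field names are made-up sample data, not actual Reddit posts.

```python
from functools import reduce

# Hypothetical sample records (assumption), standing in for real posts.
posts = [
    {"subreddit": "scala", "score": 12},
    {"subreddit": "python", "score": 7},
    {"subreddit": "scala", "score": 30},
    {"subreddit": "java", "score": 3},
]

# filter: keep only the posts from one subreddit.
scala_posts = filter(lambda post: post["subreddit"] == "scala", posts)

# map: project each remaining post down to its score.
scala_scores = list(map(lambda post: post["score"], scala_posts))

# reduce: fold the scores into a single total, starting from 0.
total_score = reduce(lambda acc, score: acc + score, scala_scores, 0)

print(scala_scores)  # [12, 30]
print(total_score)   # 42
```

Each step is a small function applied through a standard algorithm, so there are no index variables or manual accumulators to get wrong.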

As I return to data analysis and machine learning with Python, I’ve found it helpful to port these concepts to Python in a new…


This article continues our journey through Scala. In these exercises, you’ll get to analyze actual Reddit posts using snippets of basic Scala code that you write and run in your browser. You may even discover some surprising behavior of Redditors.


This is part three of our tour through Scala. If you’re just arriving and would like to start at the beginning, check out Quickly learning the basics of Scala through Structure and Interpretation of Computer Programs examples.

Today we’re going to do some novel programming exercises based around actual Reddit data. I’ve prepared a random sample of roughly ten thousand posts from the month of October 2018 for us to interactively explore by writing basic Scala in widgets within this article.

Here’s a preview of some of the Scala we’ll be writing to analyze Reddit posts.

Let’s dive right in…


Let’s continue our journey through Scala using SICP exercises to demonstrate the elegance and simplicity of this powerful programming language in processing list data structures.


This article builds on part 1, Quickly learning the basics of Scala through Structure and Interpretation of Computer Programs examples. That article also covers reasons why you may want to learn Scala.

We continue our exploration of Scala using examples that solve exercises from the classic book, “Structure and Interpretation of Computer Programs” (SICP). Small exercises for the reader are also included.

Today we’re going to start working with some data: we’ll learn how to create and process lists of elements. A list is a simple data structure that consists of a sequence of…

Matt Hagy

Software Engineer and fmr. Data Scientist and Manager. Ph.D. in Computational Statistical Chemistry. (matthagy.com)
