I’m excited to share DB From Zero (dbfromzero.com) with you today! This is a new project that aims to explore different aspects and components of databases by developing increasingly sophisticated prototypes. Additionally, benchmarking is performed to quantify the impact of different design parameters and workloads, and thereby improve our intuition about databases. Today I’d like to share with you one of the recent projects.
The most recent project explores Log-Structured Merge-Tree for Persistent Reads and Writes. The LSMTree is an interesting data structure that is commonly used in developing high-performance key/value stores. …
It seems every day we’re inundated with stories about the woes of millennials. They’re stressed by their jobs and personal lives, and they live in economically precarious situations with large debt loads. Blame has been attributed to causes such as social media and the modern economic system. We’re constantly asked to question what in our society needs to change to alleviate the problems of millennials.
In my opinion, speaking as a millennial, we as a generation were simply the first to experience the unrealistic expectations created by our culture and amplified by social media. We live in a society that glorifies hedonistic…
As a small programming project, I’ve implemented the Bond Fluctuation Model (BFM) for polymer physics in 2 dimensions using Java. You can find the code at https://github.com/matthagy/mc_bfm_2d.
Initially, I’ve experimented with how polymer chains diffuse through a small pore within a contained system.
A single trajectory gives the following quantitative results.
Novel computational methods are developed to allow for very long time simulations of the two-dimensional Ising model with 10 billion Monte Carlo updates in each simulation. Using these methods, the time-dependent behavior of systems quenched from random initial states is analyzed. Simulations are run across a range of parameters, including the lateral size of the grid, l, and the pair interaction strength, J.
In some cases, the simulation trajectory converges to a configuration with a predominantly up spin or a predominantly down spin. There does not appear to be a simple relationship between the parameters of…
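As a rough illustration of what a quench from a random initial state looks like in code, here is a minimal sketch using standard single-spin-flip Metropolis dynamics. This is not the novel method described above, which is not shown in this excerpt. The function name `quench_ising`, the periodic boundaries, and the convention that J is expressed in units of k_B T are all assumptions made for the example.

```python
import math
import random


def quench_ising(l, J, sweeps, seed=0):
    """Metropolis dynamics on an l x l grid with periodic boundaries.

    J is the pair interaction strength in units of k_B*T (an assumption).
    Returns the magnetization per spin recorded after each sweep.
    """
    rng = random.Random(seed)
    # Random initial state: each spin is independently +1 or -1.
    spins = [[rng.choice((-1, 1)) for _ in range(l)] for _ in range(l)]
    total = sum(map(sum, spins))
    history = []
    for _ in range(sweeps):
        for _ in range(l * l):  # one sweep = l*l attempted flips
            i, j = rng.randrange(l), rng.randrange(l)
            s = spins[i][j]
            neighbors = (spins[(i + 1) % l][j] + spins[(i - 1) % l][j]
                         + spins[i][(j + 1) % l] + spins[i][(j - 1) % l])
            # Energy change (in units of k_B*T) if spin s is flipped.
            delta_e = 2.0 * J * s * neighbors
            if delta_e <= 0 or rng.random() < math.exp(-delta_e):
                spins[i][j] = -s
                total -= 2 * s
        history.append(total / (l * l))
    return history
```

Calling `quench_ising(32, 1.0, 100)` returns one magnetization value per sweep. A value drifting toward +1 or -1 corresponds to a trajectory converging on a predominantly up or predominantly down configuration.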
I’m a software engineer with a background in chemical physics and I’m excited to experiment with developing free online educational resources that teach chemical physics, particularly simulation. To that end, I’ve been exploring Scala.js and assorted JavaScript libraries. Here are some of the preliminary resources I’ve developed.
All the initial rough code is available at github.com/matthagy/chem_prog_exp. I apologize for the current lack of documentation.
Chiefly, I’m using Scala.js, which is a framework for writing Scala code that compiles to JavaScript so that…
I used to wonder why it was so frustrating to use previous generation tools to start solving new problems. E.g., using Perl/HTML to implement a website in the late 90s. All I learned from that was Python 2.3 and it took several years before I finally got back to coding in a lame undergrad research job. Cool that all of us budding Engineers got some real management experience in our minimum wage jobs while we wasted time in high school. Otherwise, we might all be writing Scala code like this.
It’s 2025 and one of the most coveted career pinnacles in large corporations is Principal Data Scientist (PDS). It should be clear from the title that this is an elite role. Some innovative companies have even started using the title “Senior Data Manager” to attract the right candidates in 2025.
PDSs are charged with the mission of leveraging their strengths in ML to solve the hardest artificial intelligence problems facing tech companies. For example, social media companies are still searching for the perfect autonomous content moderating technologies. And only PDSs can help them.
PDSs are uniquely suited to solve such…
I’ve found that working on collections of elements by applying functions through well-defined algorithms (e.g., map, filter, and reduce) greatly simplifies my code and removes many sources of errors. Therefore I was delighted to discover that Scala really pushes this to the next level by introducing a plethora of built-in algorithms on data structures. These concepts share some similarities with Spark RDDs and Java Streams, but I find the Scala approach simpler and more elegant.
As I return to data analysis and machine learning with Python, I’ve found it helpful to port these concepts to Python in a new…
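For readers who haven’t met these three algorithms, here is a minimal sketch using only Python’s built-ins and `functools.reduce`. It does not show the ported library mentioned above, and the sample data and variable names are made up for illustration.

```python
from functools import reduce

# Hypothetical sample data for the illustration.
words = ["scala", "python", "java", "go"]

# map: apply a function to every element.
lengths = list(map(len, words))

# filter: keep only the elements that satisfy a predicate.
long_words = list(filter(lambda w: len(w) > 4, words))

# reduce: fold the elements into a single value, starting from 0.
total_length = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths)       # [5, 6, 4, 2]
print(long_words)    # ['scala', 'python']
print(total_length)  # 17
```

Each step takes a collection and a function and returns a new value without mutating the input, which is where much of the error reduction comes from.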
This is part three of our tour through Scala. If you’re just arriving and would like to start at the beginning, check out Quickly learning the basics of Scala through Structure and Interpretation of Computer Programs examples.
Today we’re going to do some novel programming exercises based around actual Reddit data. I’ve prepared a random sample of roughly ten thousand posts from the month of October 2018 for us to interactively explore by writing basic Scala in widgets within this article.
Here’s a preview of some of the Scala we’ll be writing to analyze Reddit posts.
Let’s dive right in…
This article builds on part 1, Quickly learning the basics of Scala through Structure and Interpretation of Computer Programs examples. That article also covers reasons why you may want to learn Scala.
We continue our exploration of Scala using examples that solve exercises from the classic book, “Structure and Interpretation of Computer Programs” (SICP). Small exercises for the reader are also included.
Today we’re going to start working with some data: we’ll learn how to create and process lists of elements. A list is a simple data structure that consists of a sequence of…
Software Engineer and fmr. Data Scientist and Manager. Ph.D. in Computational Statistical Chemistry. (matthagy.com)