Stories by Gabriel Briones on Medium

Unbeatable Tic Tac Toe?

Gabriel Briones — Fri, 17 May 2024 18:13:47 GMT

In the middle of preparing for technical interviews, I came across this project prompt and decided to see if I could give it a try!

Photo by Solstice Hannan on Unsplash

What Was I Excited to Learn About?

As someone who has now dabbled in quite a few data science topics, I was excited by the opportunity this project gave me to explore game development with Python and use the minimax algorithm. One of the coolest things about AI is that we can create bots that compete against a human player in increasingly simple ways. This project not only tested my Python programming skills outside the context of business analysis but also helped me understand the underlying principles of game theory and AI decision-making.

Unique Tips and Tricks I Learned

One unique tip I learned through this project is the importance of breaking down a complex problem into smaller, manageable modules. Initially, I tried to build the entire game in one file. I quickly became overwhelmed by this. However, by organizing the code into separate modules for constants, game logic, and game board functions, the development process became much more structured and easier to manage. This modular approach not only made the code more readable and maintainable but also facilitated easier debugging and testing.

Here’s a glimpse into how I structured the project:

├── constants.py — Contains game constants like colors and dimensions
├── drawing.py — Functions for drawing the game board and figures
├── game_logic.py — Game logic including move validation and the minimax algorithm
├── main.py — Main game loop and event handling
└── README.md — Project documentation

This approach reinforced the importance of organizing code so that it is not only readable but also gives each module a single responsibility. I’ve used this approach in previous projects, and it’s a project structure that I’ll carry forward into all my future projects.

Where Did I Struggle and What Do I Need to Spend More Time Reviewing?

One of the major challenges I faced during this project was implementing the minimax algorithm for the AI. Understanding how the algorithm evaluates possible moves by recursively simulating every potential outcome in order to make optimal decisions was initially tough to wrap my head around. The recursive nature of the algorithm, coupled with the need to handle different game states and outcomes, meant reviewing the logic several times and going through lots and lots of rounds of debugging.
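To make the recursion concrete, here is a minimal minimax sketch for Tic Tac Toe. It is not the code from the author’s repo. It assumes a hypothetical board representation: a flat list of nine cells, each holding "X", "O", or " ". The function names are also hypothetical. Positions are scored from the AI’s point of view: +1 for a win, -1 for a loss, and 0 for a draw.

```python
from typing import List, Optional

# Hypothetical board: a flat list of 9 cells, each "X", "O", or " ".
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board: List[str], ai_turn: bool, ai: str = "O", human: str = "X") -> int:
    """Score a position for the AI: +1 win, -1 loss, 0 draw."""
    w = winner(board)
    if w == ai:
        return 1
    if w == human:
        return -1
    if " " not in board:
        return 0  # full board with no winner: draw
    scores = []
    for i in range(9):
        if board[i] == " ":
            board[i] = ai if ai_turn else human  # simulate the move
            scores.append(minimax(board, not ai_turn, ai, human))
            board[i] = " "  # undo it before trying the next cell
    # The AI maximizes the score; the human is assumed to play to minimize it.
    return max(scores) if ai_turn else min(scores)


def best_move(board: List[str], ai: str = "O", human: str = "X") -> int:
    """Return the index of the empty cell with the highest minimax score."""
    best_score, move = -2, -1
    for i in range(9):
        if board[i] == " ":
            board[i] = ai
            score = minimax(board, False, ai, human)
            board[i] = " "
            if score > best_score:
                best_score, move = score, i
    return move
```

Each call tries every empty cell, recurses with the other player to move, and then undoes the move, which is why the board can safely be mutated in place. For example, `best_move(["O", "O", " ", "X", "X", " ", " ", " ", " "])` returns 2, the cell that wins immediately.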

You can take a look at the code in my GitHub repo here!

The biggest takeaway from this was that I need to spend more time reviewing and practicing recursion and algorithm design. These concepts are relevant not only in game development but also in many other areas of data science and software engineering.

Conclusion

Building the Tic Tac Toe game with an AI opponent has been an awesome way to sharpen my programming skills while taking a break from interview prep. It combined elements of game development, algorithm design, and software engineering best practices. The project taught me the value of modular coding, provided a practical application of the minimax algorithm, and highlighted areas that I can study further.

For anyone looking to enhance their programming skills, I highly recommend taking on a similar project. It’s a fun and challenging way to apply theoretical knowledge in a practical context, and it certainly helps that it feels pretty damn cool to watch your AI make smart moves.

Building a Scalable Data Engineering Pipeline for YouTube Data Analysis

Gabriel Briones — Sat, 11 May 2024 20:14:58 GMT

Before taking my Data Analytics in the Cloud course, which focuses on data engineering in AWS, I found myself wanting to explore more projects related to data engineering. I had been learning fundamentals such as data modelling and data warehousing in my master’s program, but this was the first project I tried building with AWS.

Let’s explore the key highlights of this end-to-end data engineering project.

Photo by Jaanus Jagomägi on Unsplash

In this blog post, we will discuss the process of building an end-to-end data engineering pipeline tailored for YouTube data analysis.

Project Overview: Working Backwards from Customer Needs

Our journey begins with understanding our customer’s needs. They are looking to launch a data-driven advertising campaign on YouTube. The customer has two fundamental questions:

How can we categorize videos based on their comments and statistics?
What factors contribute to a YouTube video’s popularity and engagement?

Goals and Success Criteria

To meet the customer’s requirements, I outlined the following goals and success criteria:

1. Data Ingestion: Efficiently ingest data from various sources, both as one-off loads and incrementally.
2. Data Lake Architecture: Design and implement a scalable Data Lake architecture on AWS cloud infrastructure.
3. ETL Design: Develop Extract, Transform, Load (ETL) processes to process data efficiently.
4. Reporting: Build a robust business intelligence tier, including interactive dashboards for insights.

What I Learned

Thanks to Darshil Parmar’s guidance throughout this project, I gained hands-on experience in several key areas:

- Building a Data Lake from scratch on Amazon S3.
- Handling semi-structured and structured data using Lake House architecture.
- Implementing best practices for cost optimization and performance.
- Utilizing AWS services such as Glue and Athena for data processing.

Challenges and Solutions

The architecture addresses challenges such as data scalability and real-time decision-making by adopting a cloud-native approach. Leveraging AWS services makes it possible to handle large volumes of data efficiently while ensuring that timely insights are available for decision-making.

Building the Pipeline: Step-by-Step Guide

1. Data Acquisition: Download the YouTube dataset from Kaggle and upload it to an Amazon S3 bucket.
2. AWS Glue Catalog: Catalog the dataset using AWS Glue for metadata management and ETL.
3. ETL Processing: Use AWS Glue jobs to run the ETL and transform the JSON data into Apache Parquet format.
4. Data Cleansing: Implement data cleansing techniques to ensure data quality and consistency.
5. Partitioning and Optimization: Partition the data for efficient querying and optimize data types for performance.
6. Automated Processing: Configure S3 event notifications to trigger an AWS Lambda function that processes new data whenever it is uploaded to S3.
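Here is a rough, local-only sketch of steps 4 to 6 in plain Python. It makes a few assumptions:

- Lambda receives the standard S3 event notification format.
- The category JSON file nests its records under an `items` array, with each record carrying a `snippet` object.
- The helper names and the `region=` partition layout are hypothetical.

The S3 read and the Parquet write are deliberately left out so the logic can run locally. Those steps would need something like boto3 and pyarrow.

```python
import json
from urllib.parse import unquote_plus


def parse_s3_event(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (e.g. spaces as '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def flatten_items(raw_json: str) -> list:
    """Flatten an assumed nested 'items' array into flat row dictionaries."""
    doc = json.loads(raw_json)
    rows = []
    for item in doc.get("items", []):
        row = {"id": item.get("id")}
        # Lift nested snippet fields (e.g. title) to top-level columns.
        for field, value in item.get("snippet", {}).items():
            row[f"snippet_{field}"] = value
        rows.append(row)
    return rows


def partitioned_key(prefix: str, region: str, filename: str) -> str:
    """Build a Hive-style partition path, e.g. 'cleaned/region=ca/x.parquet'."""
    return f"{prefix}/region={region}/{filename}"


def lambda_handler(event, context):
    # Hypothetical handler: a real one would read each object from S3,
    # call flatten_items, and write Parquet to partitioned_key(...).
    return {"objects": parse_s3_event(event)}
```

Hive-style `key=value` folders are what let Glue and Athena treat `region` as a partition column. A query filtered on region then only needs to scan the matching folder.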

Data Architecture

Conclusion

This project equipped me with the skills and knowledge to design and implement a scalable data engineering pipeline tailored for YouTube data analysis. By learning AWS best practices, you can build a data pipeline to help an analyst unlock valuable insights for data-driven decision-making. I’m excited to see how much I can learn and synthesize in my Data Analytics in the Cloud course.

Stay tuned as I continue to work through projects and develop an understanding of data engineering so that I can design more architectures on my own!

Software Engineering Principles in Python

Gabriel Briones — Thu, 18 Jan 2024 20:19:45 GMT

As someone with a non-technical background, I find the idea of embracing software engineering principles daunting but relevant and necessary. As my network on LinkedIn continues to expand, I frequently see posts by software engineers who are frustrated with data scientists who lack a basic understanding of software engineering principles. Thankfully, this course presented the important ideas of modularity, documentation, and automated testing. It focused on how these principles can improve data science workflows and lead to the creation of Python packages that solve problems more efficiently.

Photo by Alex Chumak on Unsplash

What was I excited to learn about today?

I was excited to learn about software engineering concepts as a whole and let myself be guided by the mantra of this course: why should data scientists care about software engineering concepts? The course begins by exploring the synergy between Python, data science, and software engineering. By the end of the first chapter, it became clear that these principles streamline data science workflows and can produce code that stands the test of time.

The chapter on maintainability underscored the importance of documentation and unit testing for project longevity. Identifying good comments, writing proper docstrings, and documenting classes demanded a meticulous approach. Mastering these aspects helps ensure that Python packages are not only functional but also maintainable.
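This is a small illustration of the documentation and testing ideas, built around a hypothetical `count_words` function. It shows a Google-style docstring whose example doubles as a doctest, plus a `unittest` test case. The function and test names are made up for this example.

```python
import unittest


def count_words(text):
    """Count the words in a string.

    Args:
        text (str): The text to count words in.

    Returns:
        int: The number of whitespace-separated words.

    Example:
        >>> count_words("constant dripping wears away the stone")
        6
    """
    return len(text.split())


class TestCountWords(unittest.TestCase):
    """Unit tests pin down the behavior the docstring promises."""

    def test_sentence(self):
        self.assertEqual(count_words("constant dripping wears away the stone"), 6)

    def test_empty_string(self):
        self.assertEqual(count_words(""), 0)
```

Saved as a module, `python -m unittest <module_name>` runs the test class, and `python -m doctest -v <file>.py` checks the example embedded in the docstring.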

What’s at least one unique tip or trick that I learned to be a better data professional?

In 2019, I completed Udacity’s Intro to Programming nanodegree. The final project for this course involved leveraging classes in Python development but I had not seen or used these concepts since then.

It was great to update these skills through hands-on coding practice around adding classes to a package and harnessing the power of Object-Oriented Programming. Writing non-public methods, using inheritance, and creating multilevel inheritance provided a deeper understanding of how Python packages can become powerful tools for data scientists. This knowledge certainly enriches my skill set and should allow me to build more sophisticated and maintainable solutions moving forward.
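Here is a sketch of the three ideas in one place, loosely modeled on a text-analysis package. The classes `Document`, `SocialMedia`, and `Tweet`, along with their attributes, are hypothetical. Non-public methods carry a leading underscore, `SocialMedia` inherits from `Document`, and `Tweet` inherits from `SocialMedia` to form a multilevel chain.

```python
from collections import Counter


class Document:
    """A piece of text with simple word statistics."""

    def __init__(self, text):
        self.text = text
        self.tokens = self._tokenize()  # non-public helper, called internally
        self.word_counts = Counter(self.tokens)

    def _tokenize(self):
        # The leading underscore signals "internal use"; Python does not enforce it.
        return self.text.lower().split()


class SocialMedia(Document):
    """Single inheritance: reuses Document and adds hashtag counts."""

    def __init__(self, text):
        super().__init__(text)
        self.hashtag_counts = self._count_prefixed("#")

    def _count_prefixed(self, prefix):
        return Counter(t for t in self.tokens if t.startswith(prefix))


class Tweet(SocialMedia):
    """Multilevel inheritance: Tweet -> SocialMedia -> Document."""

    def __init__(self, text):
        super().__init__(text)
        self.mention_counts = self._count_prefixed("@")
```

Because each class calls `super().__init__`, constructing a `Tweet` runs all three initializers in order. After lowercasing, `Tweet("Loving #Python and #python with @datacamp").hashtag_counts["#python"]` is 2.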

Where did I struggle and what do I need to spend more time reviewing?

Writing a Python package and leveraging Object-Oriented Programming definitely presented its challenges. I referred to the chapter slides more frequently during these two sections of the course than in virtually any other chapter or course. As a student, I spend most of my time mastering pandas rather than adding classes to a package. Practice makes perfect, and this is no exception.

My journey towards mastering software engineering principles has just begun. With continued exposure and practice, I look forward to a future where I am able to seamlessly integrate software engineering best practices into data engineering and data science projects.

Constant dripping wears away the stone

Writing Efficient Python Code

Gabriel Briones — Thu, 18 Jan 2024 19:41:39 GMT

As a business professional transitioning to a data-related role, the pursuit of solving business problems through actionable insights is at the core of my work. At first, successfully executing a piece of Python code was more than enough for me. However, as the workload in my MSc in Business Analytics and Big Data program picks up, it’s clear that analysis can be cleaner and clearer when Python code runs efficiently. While I’m still no expert in writing efficient Python code, this course definitely pointed me in the right direction.

Photo by Amanda Jones on Unsplash

What was I excited to learn about today?

I was excited to have a concrete definition of efficient code, particularly in the context of Python programming. Efficient Python code not only runs quickly but also minimizes resource consumption and has a small memory footprint. It also reduces latency and overhead while remaining readable. Lastly, to be Pythonic, code should adhere to Python’s intended style, which optimizes for both speed and resource efficiency.

My weekly goal is always to write more lines of code than the previous week, and this course provided plenty of hands-on coding exercises to meet this criterion. The course builds a strong foundation by exploring Python’s Standard Library, introducing NumPy arrays, and honing skills with built-in tools. The Zen of Python was not new to me, but it’s difficult to put into practice when you’re a true beginner struggling to master for and while loops. The DataCamp team clearly put careful thought into structuring the course so that these principles become part of my coding practice.

A great deal of my master’s coursework involves pandas, so the introduction to working efficiently with pandas DataFrames, including the options for iterating over rows and applying functions, added yet another practical dimension to the course.

What’s at least one unique tip or trick that I learned to be a better data professional?

Profiling code using line_profiler and memory_profiler packages was a brand new concept. Learning how to gather and compare runtimes between different coding approaches was really cool to put into practice. The practical application of replacing bottlenecks with efficient Python code was also useful. Profiling code not only gives me the ability to identify performance bottlenecks but also equips me to eliminate them, thus making the code more efficient.
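line_profiler and memory_profiler are third-party packages, usually driven from IPython with the `%lprun` and `%mprun` magics. As a dependency-free stand-in, this sketch compares two hypothetical approaches to the same task. It times them with the standard library’s `timeit` and measures peak allocation with `tracemalloc`. Absolute numbers will vary by machine.

```python
import timeit
import tracemalloc


def squares_loop(n):
    """Build the list with an explicit loop and append."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


def compare_runtimes(n=10_000, number=200):
    """Total seconds for `number` runs of each approach on the same input."""
    return {
        "loop": timeit.timeit(lambda: squares_loop(n), number=number),
        "comprehension": timeit.timeit(lambda: squares_comprehension(n), number=number),
    }


def peak_memory_bytes(func, *args):
    """Peak memory traced by tracemalloc while func(*args) runs."""
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    return peak
```

On CPython the comprehension usually comes out ahead in `compare_runtimes()`. Timing both versions against identical inputs, rather than trusting intuition, is the habit that profiling builds.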

Where did I struggle and what do I need to spend more time reviewing?

I should absolutely spend more time reviewing and coding in a more Pythonic way, such as using list comprehensions. It still feels more natural to write a for loop, even with built-in functions such as enumerate(), than to express the same logic as a list comprehension. I believe I’ll have more opportunities this semester, both in school and as I prepare for technical interviews.
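The same logic is shown below written both ways: as a for loop with `enumerate()` and as a list comprehension. The `names` list is made up for illustration.

```python
names = ["Ana", "Ben", "Cara"]

# Loop version: familiar, but needs an empty list and an explicit append.
labelled_loop = []
for i, name in enumerate(names):
    labelled_loop.append(f"{i}: {name}")

# Comprehension version: the same result as a single expression.
labelled_comp = [f"{i}: {name}" for i, name in enumerate(names)]

# A condition can be folded in too: keep names at even positions only.
even_positions = [name for i, name in enumerate(names) if i % 2 == 0]
```

Both versions produce `["0: Ana", "1: Ben", "2: Cara"]`. The comprehension reads as "build this list from that iterable", which is the shape most such loops already have.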

This course provided insights into profiling, advanced efficiency techniques, and pandas optimization, and I am better equipped to elevate my coding skills as a data professional. To my Python II professor, if you’re reading this: I’ll be sure to tackle our homework exercises with efficient Python code so that it becomes second nature when extracting meaningful insights from data.

Understanding Cloud Computing

Gabriel Briones — Thu, 18 Jan 2024 19:10:40 GMT

Understanding Cloud Computing is a beginner course to better understand the cloud, its advantages, and the major players like AWS, Microsoft Azure, and Google Cloud. This course did not include hands-on coding exercises but still provided exercises to practice the key concepts. As always, I’m guided by three pivotal questions, so let’s explore the key takeaways of this course on cloud computing.

Photo by C Dustin on Unsplash

What was I excited to learn about today?

I discovered the drawbacks of on-premise servers and the advantages of cloud-based solutions through DataCamp’s real-world use case examples. Scalability, fast setup, and flexible billing emerged as key differentiators between the solutions. It was great to review terms like scalability, latency, and high-availability. As someone who is guided by risk management principles, the idea of businesses removing hardware limitations and the potential for disaster recovery solutions without the headache of physical infrastructure was particularly interesting.

What’s at least one unique tip or trick that I learned to be a better data professional?

The course’s emphasis on the three service models — IaaS, PaaS, and SaaS — and the importance of choosing the right service model to match specific business requirements stood out. As data professionals, it’s important to be able to communicate these ideas with other, potentially non-technical, stakeholders. Understanding the different levels of abstraction and outsourcing IT services appropriately can significantly enhance productivity and resource optimization.

Where did I struggle and what do I need to spend more time reviewing?

As a business professional transitioning into the tech space, I believe I could spend more time reviewing cloud deployment models. Choosing between private, public, or hybrid deployments demands a deep understanding of not just individual business needs but the impact of data protection regulations and various roles within an organization.

As a big data student, I see cloud computing as an exciting exploration of a technology shaping the future of data management. This course left me in a better position to appreciate the power of the cloud, make informed decisions regarding service models, and address deployment challenges. I’m looking forward to bringing this knowledge to the table as I start my AWS Cloud Practitioner certification exam preparation through IE University this weekend.

Constant dripping wears away the stone

Database Design

Gabriel Briones — Thu, 18 Jan 2024 12:17:47 GMT

Today, I’ll be sharing my experience with DataCamp’s Database Design course that is part of the Data Engineering career track. The course delved into the world of processing, storing, and organizing data, mastering database schemas and normalization, working with database views, and finally, understanding database management.

Photo by Campaign Creators on Unsplash

What was I excited to learn about today?

Processing, Storing, and Organizing Data

As I started the course, I was curious to see how much it would overlap with my master’s SQL II course, where we focused on the fundamentals of processing, storing, and organizing data efficiently. This course began by introducing the two approaches to data processing, OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing), and when to choose one over the other. The hands-on exercises allowed me to explore different forms of data storage and practice the essentials of data modeling.

OLTP’s primary purpose is to support daily transactions. Its design is application-oriented. The data is operational and up to date, capturing the current state of the business. The queries are simple transactions with frequent updates, and the end users are company employees. Conversely, OLAP’s primary purpose is to report on and analyze data. Its design is subject-oriented, and the data is consolidated and historical. The queries are complex aggregations with limited updates. Typically, OLAP systems are used only by analysts and data scientists at a company.
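Here is a toy contrast of the two workloads using Python’s built-in `sqlite3` module. The `orders` table and its rows are hypothetical. In practice, OLAP queries would usually run against a separate warehouse rather than against the transactional database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        region TEXT NOT NULL,
        amount REAL NOT NULL,
        status TEXT NOT NULL
    )
""")

# OLTP-style work: small, frequent writes, each wrapped in a transaction.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders VALUES (1, 'ana', 'EU', 40.0, 'new')")
    conn.execute("INSERT INTO orders VALUES (2, 'ben', 'US', 25.0, 'new')")
    conn.execute("INSERT INTO orders VALUES (3, 'cara', 'EU', 35.0, 'new')")
with conn:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (1,))

# OLAP-style work: a read-only aggregate over many rows for reporting.
revenue_by_region = conn.execute("""
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY region
""").fetchall()
```

The OLTP statements touch one row at a time and change data. The OLAP query reads everything and changes nothing, which is why the two workloads are typically tuned (and often stored) separately.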

Database Schemas and Normalization

The second chapter explored database schemas and normalization. The hands-on coding exercises allowed me to practice implementing star and snowflake schemas. Converting databases to various normal forms and extending schemas provided a solid review of effective data modeling.
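Here is a minimal star schema in SQLite. The table names (a `fact_plays` table referencing `dim_song` and `dim_date`) and the rows are made up. The fact table holds measurements and foreign keys, while the dimension tables hold the descriptive attributes that queries join to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes
    CREATE TABLE dim_song (song_id INTEGER PRIMARY KEY, title TEXT, genre TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: measurements plus a foreign key to each dimension
    CREATE TABLE fact_plays (
        song_id INTEGER REFERENCES dim_song(song_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        play_count INTEGER
    );

    INSERT INTO dim_song VALUES (1, 'Song A', 'pop'), (2, 'Song B', 'jazz');
    INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_plays VALUES (1, 1, 100), (1, 2, 150), (2, 1, 40);
""")

# A typical star-schema query: join the fact table to its dimensions, then aggregate.
plays_by_genre = conn.execute("""
    SELECT s.genre, SUM(f.play_count) AS total_plays
    FROM fact_plays AS f
    JOIN dim_song AS s ON f.song_id = s.song_id
    JOIN dim_date AS d ON f.date_id = d.date_id
    WHERE d.year = 2024
    GROUP BY s.genre
    ORDER BY s.genre
""").fetchall()
```

Suppose `genre` were moved out of `dim_song` into its own `dim_genre` table, referenced by a key. That would turn this into a snowflake schema, which is more normalized and less redundant but costs an extra join per query.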

Database Views

Learning to create and query views, differentiating between materialized and non-materialized views, and managing advanced capabilities in this chapter expanded my skills and understanding of how data can be presented and accessed.
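Here is a sketch of the materialized versus non-materialized distinction using SQLite. SQLite supports ordinary views but not materialized ones, so the materialized view is emulated with a table built from the same query. PostgreSQL, by contrast, offers `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW`. The table names, the data, and the `refresh_materialized` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('EU', 10), ('EU', 20), ('US', 5);

    -- Non-materialized view: a stored query, re-run every time it is read.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    -- Emulated materialized view: the results are stored once and go stale.
    CREATE TABLE sales_by_region_mat AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

conn.execute("INSERT INTO sales VALUES ('US', 15)")

view_rows = conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()
stale_rows = conn.execute("SELECT * FROM sales_by_region_mat ORDER BY region").fetchall()


def refresh_materialized(conn):
    """Recompute the stored results, much like REFRESH MATERIALIZED VIEW."""
    with conn:
        conn.execute("DELETE FROM sales_by_region_mat")
        conn.execute(
            "INSERT INTO sales_by_region_mat "
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
```

The view already reflects the new US sale, while the stored copy still shows the old total until it is refreshed. This is the trade-off between always-current results and fast reads of precomputed ones.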

Database Management

The final chapter ended with database management topics. Understanding roles and access control and the critical decision of choosing a Database Management System (DBMS) were chapter highlights. The course concluded with a quick review of SQL vs NoSQL and how to choose the right DBMS for specific business needs.

What’s at least one unique tip or trick that I learned to be a better data professional?

One unique tip that enhanced my skills as a data professional was table partitioning. Splitting large tables into smaller pieces is a powerful tool for optimizing query performance and managing large datasets more efficiently. This not only improves data retrieval speed but also contributes to better overall database management.

Where did I struggle and what do I need to spend more time reviewing?

I completed this course while I was learning about normalization and denormalization in my SQL II course. The challenge lies in striking the right balance between the two. Knowing when to apply each technique requires a deeper understanding of how the database in question will be used in practice. To address this, I plan to spend more time reviewing case studies and working on hands-on projects, like the database modeling projects completed in SQL II, to solidify my understanding of these crucial concepts.

In conclusion, the importance of a well-crafted database design cannot be overstated when aiming for a high-performing database. Similar to building a house with a meticulously planned blueprint, thoughtful consideration of data storage is paramount. The amount of time spent in designing a database should be viewed as a strategic choice, saving time and avoiding frustration in the long run.

Constant dripping wears away the stone

Understanding Data Engineering

Gabriel Briones — Tue, 16 Jan 2024 18:47:14 GMT

DataCamp’s Data Engineering career track began as a playful competition among my classmates. During the first trimester of my MSc in Business Analytics and Big Data program at IE University, we received access to DataCamp. As we began to explore the DataCamp courses to practice our Python and SQL skills, we quickly discovered the Leaderboard.

At the same time, I continued to network with data professionals across LinkedIn and I became curious about the roles and responsibilities of a data engineer. All journeys start with one small step and DataCamp’s course titled Understanding Data Engineering was my first step.

Photo by Victor on Unsplash

What was I excited to learn about?

Embarking on a journey into the field of data engineering can feel overwhelming at times, especially without a computer science background or degree. However, I was excited to understand the core responsibilities of data engineers and their pivotal role in facilitating the flow of data within an organization. I was also keen to explore the complexity of building complete data pipelines.

Chapter 1: What is data engineering?

In this chapter, the course laid the groundwork by introducing the fundamental concepts of data engineering. I had read posts across my LinkedIn network about the increasing demand for data engineers and their crucial position in the data science lifecycle. The exercises, especially the exploration of Spotflix (a fictional music streaming company), provided practical insights into how data engineers collect, clean, and catalog data through processes such as ETL (extract, transform, load). As I progressed through the course, the differences that set data engineers apart from data scientists became clear.
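Here is a toy extract-transform-load pass over a made-up listening log, in the spirit of the course’s Spotflix example. The CSV contents, column names, and function names are all assumptions. Extract reads raw rows, transform cleans them, and load writes them to a SQLite table that analysts could query.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray spaces, one missing value.
RAW_CSV = """user_id,song,seconds_played
1,Song A,180
2, song b ,
3,song a,95
4, SONG B ,60
"""


def extract(raw_text):
    """Extract: read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalize song titles and drop rows missing a duration."""
    clean = []
    for row in rows:
        if not row["seconds_played"].strip():
            continue  # incomplete record
        clean.append((
            int(row["user_id"]),
            row["song"].strip().title(),
            int(row["seconds_played"]),
        ))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a table for analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS plays (user_id INTEGER, song TEXT, seconds INTEGER)"
    )
    with conn:
        conn.executemany("INSERT INTO plays VALUES (?, ?, ?)", records)
```

Running `load(transform(extract(RAW_CSV)), sqlite3.connect(":memory:"))` stores three cleaned rows. "song a" and "Song A" now count as the same song, which is exactly the kind of consistency analysts rely on downstream.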

Chapter 2: Storing data

At the time of completion, I had not begun my master’s course on Modern Data Architectures, and this DataCamp course served as an introduction to the different stages of the data pipeline. Understanding how data engineers manage different data structures, work with SQL to query and store data, and implement storage solutions with data lakes and warehouses was particularly helpful. It feels great to connect querying data as a data analyst with the vital role data engineers play in shaping the foundations that support analyses.

Chapter 3: Moving and processing data

Learning about the techniques data engineers employ to prepare raw data for analysis, the importance of creating pipelines, and the role of automation in streamlining processes left me eager for hands-on experience. Thankfully, this itch was scratched in my master’s course, Modern Data Architectures, where we touched on data ingestion with Apache NiFi. Integrating parallel and cloud computing into the mix adds a layer of complexity that I am excited to explore further.

What’s at least one unique tip or trick that I learned to be a better data professional?

DataCamp excels at allowing the user to immediately practice theoretical concepts. While this course did not include any hands-on coding exercises, I still found the exercises useful to retain the information.

One section of the course that stood out was scheduling. Scheduling should be viewed as a strategic approach that runs tasks in a specific order, resolving dependencies so that data flows smoothly. Manual scheduling is possible, but automating repetitive tasks and orchestrating complex processes lets data engineers streamline their work. Data can then move from collection to analysis either at a specific time or when specific conditions are met. It’s also essential to understand the distinction between batch and streaming processing and when to use each.
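Resolving dependencies so that tasks run in a valid order amounts to a topological sort over a directed acyclic graph, which is how orchestrators such as Apache Airflow model pipelines. Below is a minimal sketch using the standard library’s `graphlib` module (Python 3.9+). The task names are hypothetical, and time- or condition-based triggering is not shown.

```python
from graphlib import CycleError, TopologicalSorter

# Each hypothetical task maps to the set of tasks that must finish before it runs.
pipeline = {
    "extract_songs": set(),
    "extract_users": set(),
    "clean_songs": {"extract_songs"},
    "join_plays": {"clean_songs", "extract_users"},
    "load_warehouse": {"join_plays"},
}


def run_order(dag):
    """Return one valid execution order; raises CycleError on circular dependencies."""
    return list(TopologicalSorter(dag).static_order())
```

`TopologicalSorter` also offers `get_ready()` and `done()` for dispatching independent tasks (here, the two extracts) in parallel. If the dependencies form a loop, it raises `CycleError` instead of producing an order.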

Where did I struggle and what do I need to spend more time reviewing?

As mentioned before, this course did not include hands-on coding exercises. However, I’m looking forward to mastering the nuances of cloud computing through focused study and practical application this semester. During the second half of the last trimester, we spent a great deal of time understanding different approaches to data modeling in SQL databases. To master this, I could spend some additional time building more data models in MySQL.

In conclusion, DataCamp did an excellent job presenting the basics of data engineering and kept me engaged enough to complete the career track. While each course had its benefits, I’ll be writing about the ones that I found most engaging, keeping in mind that the bulk of my master’s program is dedicated to business analytics and data science.

Constant dripping wears away a stone

Subject: Happy New Year!

Gabriel Briones — Mon, 01 Jan 2024 20:49:28 GMT

I slammed my laptop shut until 2024 and now it’s time to kick off the year!

My first trimester at IE University was a wonderful challenge and the spread of courses helped me identify some weaknesses. A reality of having a non-technical background means that I get to put in some extra hours to keep what technical skills I have sharp. Classes don’t start for a few more days but that doesn’t mean I can’t floss my brain with some Python until then.

Cue the “You’ve Got Mail!” alert from the ’90s

My first beginner Python project of the year is an email sender script, inspired by CodeWithTomi. Rather than walk you through the steps of the code, which can be found here, I’d like to discuss the project through the following three points:

What was I excited to learn about today?
What’s at least one unique tip or trick that I learned to be a better data professional?
Where did I struggle and what do I need to spend more time reviewing?

I was excited to apply Python outside of a data analysis context. I’ve spent a fair amount of time, both at IE and Correlation One, leveraging Python in a business context, and I’m getting better and better at using packages like pandas and numpy to perform data analysis and provide solutions to business problems. For this project, though, I imported the ssl and smtplib libraries, which I had never used before but found intuitive.

To send an email, I used an app password, which Google provides once two-factor authentication (2FA) is enabled. I stored it in a separate Python file and imported the variable into the main file. I had done this before when working with my OpenAI key. It’s a great tip for keeping credentials private and secure while still sharing code on GitHub, as long as the file holding the password is kept out of the repository (for example, by listing it in .gitignore).
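Here is a minimal sketch of such a sender using `smtplib`, `ssl`, and `email.message.EmailMessage`, assuming Gmail’s SMTP server on port 465. It is not the CodeWithTomi code. The function names are hypothetical, and the app password is passed in as an argument rather than hard-coded.

```python
import smtplib
import ssl
from email.message import EmailMessage

SMTP_HOST = "smtp.gmail.com"  # Gmail's SMTP server (assumed provider)
SMTP_PORT = 465               # port for an SSL/TLS connection from the start


def build_message(sender, receiver, subject, body):
    """Assemble a plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, sender, app_password):
    """Log in over an encrypted connection and send the message.

    app_password should come from outside this script, e.g. a git-ignored
    file or an environment variable, never from a committed source file.
    """
    context = ssl.create_default_context()  # verifies the server certificate
    with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT, context=context) as server:
        server.login(sender, app_password)
        server.send_message(msg)
```

In the setup described above, the password would be imported from a separate file, for example `from email_credentials import APP_PASSWORD` (a hypothetical module name). That file should be listed in `.gitignore` so it never reaches GitHub.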

While the libraries were intuitive to use, most of my time on this project went into reading the ssl and smtplib documentation. Reviewing documentation is really important, especially as a beginner, but it can often feel very verbose. This is more of a personal point, but it bears repeating that I could benefit from slowing down during these projects. In addition to reading the documentation, my experience importing files in Python helped me piece together what was needed to get the email sender working.