Databases Explained

Objectives

  • Explain the concept of a database and the types of database models that exist
  • Discuss the practical uses of structuring data into various databases
  • Evaluate the advantages and disadvantages of different types of databases and how it applies to a particular data set example

Defining Databases

Database Models: What, How, and Why

Primary Key vs. Foreign Keys
Primary Key vs. Foreign Keys. As seen in Mukesh Singh, “Primary key Vs Foreign Key in SQL”
An example of three documents in JSON with club_id, role, and student_id field-value pairs
Graph Databases: Living on the Edge | High Alpha
2 nodes (‘data objects’) connected by an edge (‘relationship’). Courtesy of High Alpha.
Circles represent nodes/data objects. Lines connecting circles represent edges/relationships.
Example query done in Neo4J, showing students (blue nodes) who have taken classes (pink nodes) in their major.

Evaluating Databases

Evaluating Databases: Through The Lens of Our Data

Data dictionary for our mock university data set. Note: some of our variables were contained in several data sources.
Data dictionary for our mock university data set. Note: some of our variables were contained in several data sources.
  • Using tabular information to populate the database was a piece of cake
  • Querying is easy to do and the syntax is relatively intuitive
  • When joins are needed, keys allow for easy merges
  • Aggregating data can be efficiently done
  • Rigid data structure and schema so leaving out values would require additional data cleaning
  • Joining tables are necessary to see all information pertaining to ‘primary objects/entities’ (in our case: students)
  • All information regarding one entity can be found within a single document. Example: a student document contained their personal information, as well as the courses they’ve taken.
  • Embedding documents within other documents eliminates the need for traditional joins
  • Nesting multiple tables into one collection can result in redundant or unrelated information in objects and make querying more complex
  • Documents formatted similar to JSON objects can make query outputs more difficult to decipher as compared to a tabular output
  • Easily see commonalities between data objects in our dataset (ex. students that shared a feature/characteristic, sections held in the same building, etc.)
  • Gave a clear, student-centric view: all variables as relating to the student (i.e. what department they’re in, what class they’ve taken/are taking, what clubs they’re in and their role)
  • Storing information inside relationships simplified the data whereby additional information or matching keys became properties
  • Aggregation queries are not as easily achieved and become tabular, so might as well use a more intuitive SQL relational database
  • No support for joins, so harder to represent complex data; hard to deal with composite keys which result in data duplication
  • Visually messy on a large-scale with more than a handful of relationships so must resort to filtering

Wrapping Up

--

--

--

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Apache Spark | RDD Internals

Easy way to install Neo4j on Docker

How to Run a Quick Accessibility Audit

told me his oxygen level had gone down to and they were pushing epi every

Extending the AWS EBS volume

Volume Modify image

Google Certified Android Developer

Most get build little.

Build an infinitely scalable* Slack app in 5 minutes

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Katie Zhang

Katie Zhang

More from Medium

Is software immortal? And can we hide columns in Postgres?

Cinchoo ETL — Split a large JSON file based on deeply nested array property

Big Data In Hadoop