…ile SQL queries running in Redshift work well for us, we need to get data into and out of Redshift. We’ve increasingly turned to Apache Spark for ETL because of its flexibility and ability to scale with our growth. Over time Spark will likely become the tool of choice for our data pipelines.
… our data warehouse, providing the scalable storage and processing system our other tools build on. We continuously import our core data set (e.g. users, posts) from Dynamo into Redshift, and event logs (e.g. post viewed, post scrolled) from S3 to Redshift.
…lations between the entities that represent the Medium network, running a master with two replicas. People, posts, tags, and collections are nodes in the graphs. Edges are created on entity creation and when people perform actions such as follow, recommend, and highlight. We walk the graph to filter and recommend posts.