Data Engineering Digest #13 (Jun 2020)
Maycon Viana Bordin · Published in data.plumbers · 18 min read · Jul 28, 2020

Photo by Samuel Wölfl from Pexels

New Tools

- Delta Engine Introduction and Overview of How it Works (databricks.com): Today, we announced Delta Engine, which ties together a 100% Apache Spark-compatible vectorized query engine to take…
- questdb/questdb - Release 5.0 (github.com): Migrated to Java 11 (#272). Fixed #154 SQL: vectorized group by hour(timestamp) (#398). SQL: cancel active read-only…
- Recent database technology that should be on your radar (part 1) (lucperkins.dev): I'm a huge fan of databases, so much so that I've…
- Spark Delight — We're building a better Spark UI (towardsdatascience.com): "The Spark UI is my favorite monitoring tool" — said no one ever.
- Hyperspace, an indexing subsystem for Apache Spark™, is now open source (cloudblogs.microsoft.com): For Microsoft's internal teams and external customers, we store datasets that span from a few GBs to 100s of PBs in our…

Spark 3.0

- Introducing Spark 3.0 - Now Available in Databricks Runtime 7.0 (databricks.com): We're excited to announce that the Apache Spark 3.0.0 release is available on Databricks as part of our new…
- Spark 3.0 — New Functions in a Nutshell (medium.com): Recently the Apache Spark community released the preview of Spark 3.0, which holds many significant new features that will…
- Spark & AI Summit and a glimpse of Spark 3.0 (towardsdatascience.com): If there is a framework that super excites me, it's Apache Spark. If there is a conference that excites me, it's the…
- Five highlights on the Spark 3.0 Release (itnext.io): Spark 3.0.0 was officially released yesterday (18/Jun/2020), and it is a major change (no pun intended) in the most…
- About Joins in Spark 3.0 (towardsdatascience.com): Tips for efficient joins in Spark SQL.
- Apache Spark 3.0: Remarkable Improvements in Custom Aggregation (medium.com): In the context of the recent official announcement on Spark 3.0, Aggregator would now become the default mechanism to…
- Spark 2.x to Spark 3.0 — Adaptive Query Execution — Part 1 (medium.com): Switching Join Strategy, by Radhwane Chebaane and Wassim Almaaoui.

Data Engineering Role

- Dream of Becoming a Big Data Engineer? Discover What Sets Us Apart From Software Engineers (towardsdatascience.com): We ain't doing the same thing.

Podcasts & Presentations

- Data Collection And Management To Power Sound Recognition At Audio Analytic (www.dataengineeringpodcast.com): An interview about how Audio Analytic is building a data set of high quality audio samples from scratch to power their…
- Bringing Business Analytics To End Users With GoodData (www.dataengineeringpodcast.com): An interview about how the GoodData platform lets you bring business analytics to your customers and end users. The…
- Data Management Trends From An Investor Perspective (www.dataengineeringpodcast.com): An interview with Astasia Myers of Redpoint Ventures on the data management industry trends that she is paying…
- Accelerate Your Machine Learning With The StreamSQL Feature Store (www.dataengineeringpodcast.com): An interview with the creator of StreamSQL on the complexities of building a feature store and the benefits that they…
- Building A Data Lake For The Database Administrator At Upsolver (www.dataengineeringpodcast.com): An interview about Upsolver's mission to build a data lake that empowers the database administrator to step into the…

Real Data Architectures & Platforms

- A Brief History of Liv Up Data Platform (medium.com): This is the journey of building a data platform, from 2017 to 2020. During these three years, our business changed a…
- How TripleLift Built an Adtech Data Pipeline Processing Billions of Events Per Day (highscalability.com): This is a guest post by Eunice Do, Data Engineer at TripleLift, a technology company…
- How Scribd Ditched the Data Center and Accelerated Its Development Velocity (databricks.com): Guest blog by R Tyler Croy, Director of Platform Engineering at Scribd. People don't tend to get excited about the data…
- How to build an Analytics & Reporting Solution on the GCP (medium.com): In our previous story we looked at how to build a modern Cloud Data Platform and which capabilities it should offer…
- Building University of Indonesia's Realtime Analytics Pipeline (medium.com): How we designed and implemented University of Indonesia's big data streaming architecture as part of my bachelor…
- Optimized Real-time Analytics using Spark Streaming and Apache Druid (medium.com): Our advertising data engineering team at GumGum uses Spark Streaming and Apache Druid to provide real-time analytics…
- How To Visualize Public Transport Using Kibana, Elasticsearch, Logstash (Elastic Stack) and Kafka (medium.com): Do you think about analyzing and visualizing geo data? Why not try Elasticsearch? The so-called ELK (Elasticsearch +…
- How to build a real-time analytical platform using Kafka, ksqlDB and ClickHouse? (medium.com): Recently at StreamThoughts, we have looked at different open-source OLAP database solutions that we could quickly…
- Four building blocks for scaling insights — Part 2: The evolution of our insight infrastructure (medium.com): A modular approach helped us scale our insight infrastructure as we grew rapidly and went from start-up to scale-up…

Data Culture

- Data Monetization (medium.com): Every organization today has access to vast amounts of data. Data about operations, finances, customers, supply chains…
- Everything you need to know about data culture (medium.com): As data has the potential to fuel innovation and produce more value, many enterprises invest in numerous technologies…
- How to empower data-driven culture on construction (1/4) (medium.com): The data management routine must be guided by the better use of the information companies already have and aim to reach…
- Do you really have a data strategy? (towardsdatascience.com): Many companies claim to have a data strategy. Let's see what makes this real.

Data Lake

- Do you really need a data lake? (towardsdatascience.com): Let me help you decide.
- Data Lake vs Data Warehouse (luminousmen.com): For a long time, I didn't understand the concepts of Data Lake and Data Warehouse. I thought it was the same thing - a…
- Data Engineer, Patterns & Architecture: The Future (towardsdatascience.com): Deep-dive into Microservices Patterns with Stream Processing.
- Data mesh (not a service mesh) (towardsdatascience.com): The speed of business today calls for data architecture evolution - from the warehouse to data lakes to data mesh.
- Business Intelligence meets Data Engineering with Emerging Technologies (towardsdatascience.com): How to make BI better with new rising technologies and twelve data engineering approaches.
- Modernize Analytics Infrastructure with a Modern Data Unification approach (towardsdatascience.com): How a collaborative Modern Data Unification approach can enable the business to scale alongside growing amounts of…

Data Architecture

- Guerrilla Data Architecture (medium.com)
- A Modern Data Architecture (medium.com): When people think of AI driven products they…

Data Governance

- Facilitating Data discovery with Apache Atlas and Amundsen (medium.com): The story of enabling a modern data discovery service within a big data democratization platform.
- Govern your data: it's a tough job, but someone has to do it! (medium.com): Data Governance: why it matters and how Quantyca approaches it with an iterative process.
- What We Got Wrong About Data Governance (towardsdatascience.com): And how we can make it right.
- How to find and organize your data from the command-line (towardsdatascience.com): Introducing metaframe: a markdown-based, git-versionable documentation tool and data catalog for data scientists.
- Why data catalogs are data governance rock stars? (medium.com): Data catalogs smell like teen spirit!
- Speed up Data Catalog Implementation with Automation and AI (medium.com): It's no doubt true that crowdsourcing is a great data catalog capability. After all, it enables teams and departments…
- Dissecting the need of Data Catalog: Top 5 reasons (medium.com): Data Catalog Solution to Manage Your Data Lake | Boost Data Access & Discovery. End-to-end data lineage. Unified Data…
- Why metadata is crucial in your data management strategy (medium.com): When data is created, so is metadata. However, this type of information is not enough to properly manage data in this…
- Definition, Benefits, and Practical Use of Master Data Management (medium.com): Data not only defined the last decade but will have a critical impact in the years to come. From serving as the…
- The Six Dimensions of Data Quality — and how to deal with them (towardsdatascience.com): Building your models and analysis on solid foundations.
- Data Quality — You're Measuring It Wrong (towardsdatascience.com): Introducing a better way: data downtime.
- Data Quality, DataOps, and the Trust Blast Radius (towardsdatascience.com): A lack of trust will dramatically impact your efforts to become data driven unless you proactively limit the Blast…
- How to ensure productivity and data quality for a phone survey at scale (medium.com): Part two in our series of lessons learned based on a 6,000 person survey in India and a 600 person survey in Kenya.
- How can AI help to make Enterprise Data Quality smarter? (towardsdatascience.com): Hardly anyone relying on data can say their data is perfect. There is always that difference between the data you have…
- Earn a Bigger Gig and Help your Business go Real-Time (medium.com): The most valuable resource driving your company today is its data. But usually there are many barriers between your…
- DataOps solves the data challenges of businesses (medium.com): DataOps started from a desire to deal with data silos and enable non-tech-savvy users to answer their questions with…
- The Importance of Testing Your Data (medium.com): In the software development process, testing plays an important role. Testing ensures the software's ultimate quality…

Data Formats

- A Data Lake new era (medium.com): Data Lake and Data Warehouse in real time and at low cost.

Delta Lake

- Delta Lake Year in Review and Overview (databricks.com): Try out Delta Lake 0.7.0 with Spark 3.0 today! It has been a little more than a year since Delta Lake became an…
- What is and Why Delta Lake? How Change Data Capture (CDC) gets benefits from Delta Lake (medium.com)

Apache Parquet

- CRUD operation on Parquet files with Azure Data Factory (medium.com): If you ever need to do a CRUD operation on a Parquet file in ADF then you can review this article for a few hints.

Data Pipelines

ML Pipelines

- Industrialization of a ML model using Airflow and Apache Beam (medium.com)

Data Processing

Apache Spark

- How to access S3 data from Spark (blog.insightdatascience.com): Getting data from an AWS S3 bucket is as easy as configuring your Spark cluster.
- Stop using Pandas and start using Spark with Scala (towardsdatascience.com): Why Data Scientists and Engineers should think about using Spark with Scala as an alternative to Pandas and how to get…
- A modern guide to Spark RDDs (medium.com): Everyday opportunities to reach the full potential of PySpark.
- Understand Spark As If You Had Designed It (towardsdatascience.com): Among the current frameworks available in the data space, just a few have achieved the status that Spark has in terms…
- Should I repartition? (towardsdatascience.com): About Data Distribution in Spark SQL.
- Extract and load of ETL jobs in Apache Spark (medium.com): If you have been working in Apache Spark and had a look at the Spark UI or Spark history server, you would know the fact…
- Extracting ZIP codes from longitude and latitude in PySpark (medium.com): Given the pair of (longitude, latitude), how could one find the corresponding US ZIP code?
- Be in charge of Query Execution in Spark SQL (towardsdatascience.com): Querying data in Spark became a luxury since Spark 2.x because of SQL and the declarative DataFrame API. Using just a few…
- Deep dive into Apache Spark Window Functions (medium.com): Window functions operate on groups of data and return values for each record or group.
- PySpark EDA Basics: Practical Parallel Processing (medium.com): don't calculate, delegate.
- Faster extract and load of ETL jobs in Apache Spark (medium.com): If you have been working in Apache Spark and had a look at the Spark UI or Spark history server, you would know the fact…
- How to pre-process large datasets for machine learning using Spark (medium.com)
- Apache Spark: Window function vs Struct function (medium.com): The goal of this article is to compare the performance of two ways of processing data: window functions and…
- Apache Spark Optimization Techniques (medium.com): Before discussing various optimization techniques, take a quick review of how Spark runs.
- Dash is an ideal front-end for your Databricks Spark Backend (medium.com): Learn how to deliver AI for Big Data using Dash & Databricks in a live webinar on July 28th at 1pm EDT.

Presto

- Presto On Azure (medium.com): Learn how Presto runs with Azure and how to set up a Presto Cluster on Azure Cloud.
- An Alternative Reporting Solution for Microservices with Presto (medium.com): During the past few years, old monolith applications started to evolve into distributed ones and many of these…

Apache Pig

- The charm of Apache Pig (towardsdatascience.com): A big data tool not to miss.

Akka Actors

- How to Write a Simple Data Processing Application With Akka Actors (levelup.gitconnected.com): When we talk about Data Processing or doing Big Data ETL, the first thing that comes to mind is using Hadoop (or Spark)…

Stream Processing

- The case for Realtime Stream Processing (medium.com): Ever since I got interested in providing insights based on data already available in the database, I've been looking…
- Handling Dead Letters in a Streaming System (blog.gojekengineering.com): How we solved the critical problem of invalid records that broke our streaming pipeline.
- Overview of the DataFlow Model (medium.com): How the Dataflow model deals with streaming data.
- Comparison between different streaming engines (medium.com)
- Zero to Streaming Application — Backend (medium.com): Note: This is part of "Zero to Streaming Application", learn about streaming applications by building a POC. Full code…

Apache Flink

- Run a Stateful Streaming Service with Apache Flink and RocksDB (medium.com): Build a stateful streaming service.
- Reading Avro files using Apache Flink (medium.com)
- Demo: How to Build Streaming Applications Based on Flink SQL (medium.com): Shows how to use Flink SQL to integrate Kafka, MySQL, Elasticsearch, and Kibana to quickly build a real-time…
- A simple way to build your Real-Time dashboard (medium.com): Building a live dashboard can be a headache due to the complex architecture and hard maintenance. Nowadays, the data…
- The Flink Ecosystem: A Quick Start to PyFlink (medium.com): This article will introduce PyFlink's architecture and provide a quick demo in which PyFlink is used to analyze CDN…
- Read from specific partitions with Flink Kafka Consumer in a Docker swarm cluster (medium.com): Apache Flink offers a powerful integration with Kafka, with a high-level wrapper for the consumer. The main…
- Flink Checkpointing and Recovery (medium.com): Apache Flink is a popular real-time data processing framework. It's gaining more and more popularity thanks to its…

Apache Beam

- Running an Apache Beam Data Pipeline on Azure Databricks (towardsdatascience.com): A brief walk-through on how to execute an Apache Beam Pipeline on Databricks.
- Reading NUMERIC fields with BigQueryIO in Apache Beam (medium.com): To read a NUMERIC from BigQuery in Beam you need to extract the scale of the number from the schema, and use that to…

Apache Spark Streaming

- Optimized Real-time Analytics using Spark Streaming and Apache Druid (medium.com): Our advertising data engineering team at GumGum uses Spark Streaming and Apache Druid to provide real-time analytics…
- Complete Stream Environment (Go + Kafka + Spark + Deltalake) (medium.com): Due to the increasing number of Microservices and IoT scenarios, we often come across the need to have a complete…
- Streaming Data from Apache Kafka Topic using Apache Spark 2.4.5 and Python (medium.com): Creating a CDC data pipeline: Part 2.
- Apache Structured Streaming for end-to-end real-time application (medium.com): Many applications today require processing data in real time and making decisions based on real-time data, such as fraud…
- Peering through the 'Window' of Structured Spark Streaming (medium.com): Time is critical in streaming applications as compared to batch. The choice to go ahead with batch or streaming always…

Change Data Capture

- Streaming Data from Microsoft SQL Server into Apache Kafka (medium.com): Creating a CDC data pipeline: Part 1.
- Change data capture: Install Debezium on K8s (medium.com): Create a namespace for the resources we're going to create…

Messaging

Apache Kafka

- Using Kafka as a Temporary Data Store and Data-loss Prevention Tool in The Data Lake (medium.com)
- Using Kafka for Collecting Web Application Metrics in Your Cloud Data Lake (towardsdatascience.com): This article demonstrates how Kafka can be used to collect metrics on data lake storage like Amazon S3 from a web…
- Why should anyone use Apache Kafka? (medium.com): This is a story told time and again.
- Connecting Kafka to a MinIO S3 bucket using Kafka Connect (medium.com): How to connect data being distributed via a web-socket to Kafka and then on to an S3 bucket.
- Welcome to Kafkaland! (medium.com): Kafka is blazing fast, you have enormous freedom, and it's easy to go wrong with it when coming from another message…
- Apache Kafka Startup Guide: System Design Architectures: Notification System, Web Activity Tracker… (medium.com): What is Kafka?
- Streaming Data Pipelines using Kafka Connect (medium.com): Streaming data pipelines are the backbone of an effective data pipeline in modern systems.
- Kafka-S3 Sink in < 3 mins (medium.com): Kafka Avro record to S3 bucket, locally.
- Kafka, for your data pipeline? Why not? (towardsdatascience.com): Create a streaming pipeline using Docker, Kafka, and Kafka Connect.
- Apache Kafka — Understanding how to produce and consume messages (medium.com): I lack motivation. Every tool I try to explore requires at least three attempts to concentrate. After a while, I lose my…
- Should I backup my Kafka cluster? And how? (blog.softwaremill.com): One of our clients recently had an interesting request: can you backup our Kafka cluster?

Apache Pulsar

- Apache Pulsar Key Shared Mode - Sticky Consistent Hashing (medium.com): Apache Pulsar is a pub-sub message streaming service created by Yahoo.
- Announcing AMQP-on-Pulsar: bring native AMQP protocol support to Apache Pulsar (medium.com): We are excited to announce that StreamNative and ChinaMobile are open-sourcing AoP, which brings the native AMQP…
- Announcing: The Apache Pulsar 2020 User Survey Report (medium.com): This survey report reveals Pulsar's accelerating rate of global adoption and highlights key features on Pulsar's…
- Understanding Pulsar Message TTL, Backlog, and Retention (medium.com): I've noticed some confusion out there around message deletion and retention in Pulsar (for example, in order to keep…

Workflow Management

- Orchestrating ETL Pipelines (medium.com): In a previous post we talked about our Data Platform, the tech choices we made throughout its implementation…

Apache Airflow

- Integrating Docker Airflow with Slack to get Daily Reporting (towardsdatascience.com): Use Airflow to deliver daily weather forecasts to Slack.
- Deploying Scalable, Production-Ready Airflow in 10 Easy Steps Using Kubernetes (levelup.gitconnected.com): Airflow, Airbnb's brainchild, is an open-source data orchestration tool that allows you to programmatically schedule…
- How Apache Airflow is helping us to evolve our data pipeline at QuintoAndar (medium.com): How we are scaling our data workflows to support the growing business analytical demands with Apache Airflow.
- Omada and Apache Airflow (medium.com): Learn how Omada leverages Apache Airflow for orchestrating various pipelines.
- Setting up Airflow on a local Kubernetes cluster using Helm (medium.com): In this post, I will cover the steps to set up a production-like Airflow scheduler, worker and webserver on a local…
- Use Airflow to project confidence in your data (medium.com): A key tenet of Raybeam's mission whenever we start at a new client is to deliver value quickly. This value often takes…
- Airflow — sharing data between tasks (towardsdatascience.com): If you look online for Airflow tutorials, most of them will give you a great introduction to what Airflow is. They will talk…
- Migration to Airflow: One year feedback (medium.com): At Maisons du Monde, we used to create and schedule our data pipelines with Rundeck, which is an automation server like…
- Airflow: How To Refresh Stock Data While You Sleep — Part 1 (towardsdatascience.com): In this first tutorial on Apache Airflow, learn how to build a data pipeline to automatically extract, transform and…
- Apache Airflow At Palo Alto Networks (medium.com): As part of the Cortex Data Lake platform team at Palo Alto Networks, we are building a batch processing pipeline to…

Prefect

- Seamless move from Local to AWS Kubernetes Cluster with Prefect (medium.com)
- Map faster! Major mapping improvements in Prefect 0.12.0 (medium.com): The second generation of Prefect's unique approach to dynamic parallel pipelines is here.
- Workflow Automation: Empowering teams through our in-house, self-service framework (medium.com): A Workflow consists of steps, configured to respect a predefined order and accomplish a specific business objective…

Cloud Providers

AWS

- Ditch the Database (towardsdatascience.com): How to use AWS S3 Select to query smarter and maybe cheaper.
- Moving Faster With AWS by Creating an Event Stream Database (medium.com): As engineers at Nike, we are constantly being asked to deliver features faster, more reliably and at scale. With these…
- Advanced monitoring of AWS Glue jobs by enabling Spark UI (towardsdatascience.com): Docker container to enable the Spark history server for monitoring Glue jobs using the Spark UI.
- Data Lake Vs Lake Formation (medium.com): Why do we actually need a Data Lake?
- My Top 10 Tips for Working with AWS Glue (medium.com): I have spent a significant amount of time over the last few months working with AWS Glue for a customer engagement. For…
- Run a Spark/Scala/Python Jar/Script using AWS Glue Job (Serverless) and Scheduling it using a… (medium.com): Easy step-by-step guide to create a Glue Job and schedule it using a Glue Trigger.
- AWS Data Lake: Build Your Business Intelligence System (medium.com)
- Some quick notes for AWS Data Analyst tools (Athena, Glue etc) (medium.com): Lessons learned doing real AWS Data Analyst projects at work.
- Extract and transform data from AWS Athena's views and load into AWS S3 as a CSV file using AWS… (medium.com): We have AWS Athena reading some data in S3, so that we can perform SQL querying for analytics purposes. I created a…

Google Cloud

- Loading and transforming data into BigQuery using dbt (medium.com): A data engineering tool to build Data Lakes, Data Warehouses, Data Marts, and Business Intelligence semantic layers in…
- BigQuery Dataset Metadata Queries (medium.com): This is a quick bit to share queries you can use to pull metadata on BigQuery datasets and tables.
- Decoupling Dataflow with Cloud Tasks and Cloud Functions (medium.com): Are you developing data pipelines on Google Cloud and do you sometimes struggle to choose the right product? Do you feel…
- Designing Data Processing Pipeline on Google Cloud Platform (GCP) — Part I (medium.com): An architectural overview of processing big data using GCP services.
- Easy pivot() in BigQuery, finally (towardsdatascience.com): Introducing the easiest way to get a pivot done in BigQuery. Did you know this is one of the most requested features…
- Load files faster into BigQuery (towardsdatascience.com): Benchmarking CSV, GZIP, AVRO and PARQUET file types for ingestion.
- Processing AVRO data using Google Cloud DataProc (medium.com): In this story, we will see how Google Cloud Platform's managed service Cloud DataProc can be leveraged to read and…
- Reading NULLABLE fields with BigQueryIO in Apache Beam (medium.com): How to read NULLABLE fields from BigQuery with Apache Beam, using GenericRecord values (that is, encoded in Avro).
- BigQuery: Creating Nested Data with SQL (towardsdatascience.com): Working with SQL on nested data in BigQuery can be very performant. But what if your data comes in flat tables like…
- Extract Nested Structs without Cross Joining Data in BigQuery (towardsdatascience.com): A short post sharing an example of less common but highly useful BigQuery Standard SQL syntax.
- How to use BigQuery API with your own dataset? (towardsdatascience.com): Using Flask and BigQuery APIs to extract data from BigQuery datasets based on user query parameters.
- Graph data analysis with Cypher and Spark SQL on Cloud Dataproc (levelup.gitconnected.com): How to read in BigQuery data and use Spark SQL and the Morpheus library to carry out graph data analysis.
- Sqoop Data Ingestion on GCP (medium.com): RDBMSes (Relational Database Management Systems) have been around for decades; many people use them to store structured…

Azure

- Azure Data Factory, a powerful Cloud ETL tool (medium.com): There is no way you can implement an Analytics project without a powerful ETL tool.
- Consuming a SOAP service using Azure Data Factory Copy Data Activity (medium.com): How to configure an Azure Data Factory Copy Data Activity to consume a SOAP service.
- Improve your Data Lifecycle with Metadata-Driven Pipelines (medium.com): No digital transformation program is complete without a data-based initiative. With some speculating that artificial…
- Azure Data Factory Pipeline (medium.com): Brief introduction on Data Pipelines in Azure Data Factory.

Databases

- The Many Flavours Of SQL (towardsdatascience.com): What the SQL landscape looks like in 2020, and what its future is.
- Recent database technology that should be on your radar (part 1) (lucperkins.dev): I'm a huge fan of databases, so much so that I've…
- Exploring OLAP on Kubernetes with Apache Pinot (medium.com): It was April 2020 when I first heard about Apache Pinot through a tweet from Kenny Bastani. Online Analytics Processing…
- Time-series data: Why (and how) to use a relational database instead of NoSQL (medium.com): Contrary to the belief of most developers, we show that relational databases can be made to scale for time-series data.

NoSQL

- NoSQL Databases: a Survey and Decision Guidance (medium.baqend.com): (At the bottom of this page, you will find a BibTeX reference to cite this article.)
- The best SQL vs NoSQL mindset I've ever heard (codarium.substack.com): TL;DR - SQL RDBMS is optimizing for storage. NoSQL is optimizing for computing power. Nowadays, computing power is…
- ElasticSearch On Steroids With Avro Schemas (towardsdatascience.com): How to tackle the interface version explosion in a large enterprise setup.
- Does Elasticsearch lie? How does Elasticsearch work? (medium.com): Elasticsearch surprises us with its capabilities and speed of action, but does it return the correct results? In this…
- MongoDB queries don't always return all matching documents! (blog.meteor.com): When I query a database, I generally expect that it will return all the results that match my query. Recently, I was…
- Choose SQL (stateofprogress.blog): Let's get straight to the point; choose an SQL database for your web application. I think I can't make myself clearer…
- A Real-Time Database Survey: The Architecture of Meteor, RethinkDB, Parse & Firebase (medium.baqend.com): Real-time databases make it easy to implement reactive applications, because they keep your critical information…
- Migrating Cassandra from one Kubernetes cluster to another without data loss (medium.com): Our experience with changing the K8s operator for Cassandra.
- Leveraging Shenandoah to cut Cassandra's tail latency (medium.com): At Outbrain, we use Cassandra extensively. Up until recently our Cassandra clusters were configured with G1 and…
- Anti-patterns which all Cassandra users must know (medium.com): No amount of performance tuning can mitigate a known anti-pattern. When you google 'antipatterns in Cassandra' you will…
- Backup and restore Cassandra cluster (medium.com): Compatible with Elassandra.
- Loading CSV Into HBase Table In Kerberized Hadoop Cluster (medium.com): Looking for a quick step-by-step method to load bulk data into an HBase table in a Kerberos-enabled Hadoop cluster…

In-Memory & Data Grid

- What an in-memory database is and how it persists data efficiently (medium.com)

Relational

- How Does PostgreSQL Implement Batch Update, Deletion, and Insertion? (medium.com): This article addresses all the frequently asked questions pertaining to batch update, insertion, and deletion in…
- How to Optimize SQL Queries (towardsdatascience.com): This article sorts out some special techniques for optimizing SQL queries.

Modern Data Warehouses

- Business Intelligence meets Data Engineering with Emerging Technologies (towardsdatascience.com): How to make BI better with new rising technologies and twelve data engineering approaches.
- Evolution of the DWH — What is a Data Lake House? (medium.com): Sometimes I miss the old days when you got a big monolith deployment of tech; a new version full of features and bug…
- How we migrated our data warehouse from Redshift to BigQuery (medium.com): We recently migrated our data warehouse at Omio from AWS Redshift to Google BigQuery.
- Data Lake vs Data Warehouse in Modern Data Management (medium.com): Distinguish data lake vs data warehouse; modernize your data management and analytics with data platforms.
- Redshift vs BigQuery vs Snowflake: A comparison of the most popular data warehouses for data-driven… (medium.com): Digital transformation is the new norm within the modern organisation where they continually challenge the status quo…
- The Death of Data Warehouse? (medium.com): With the price of compute getting cheaper, massive parallel processing advertised everywhere and "big data"…