Big data and the architecture difference

Sajjad Hussain
Oct 21 · 3 min read

As big data receives more and more attention, building enterprise-level data platforms has become a common requirement. To build a data platform that meets your needs and provides stable support, the choice of infrastructure is very important. This article looks at how big data infrastructure is selected and at the main architectures to choose from.

In an enterprise data team, the data platform infrastructure is usually selected by a senior development engineer or architect. That person must choose a technical architecture suited to the specific scenarios and requirements, while weighing factors such as cost and investment.

The current mainstream choices for big data infrastructure in the industry are largely concentrated in the Hadoop ecosystem. One reason is the maturity and stability of Hadoop; the other is historical, since many enterprises built their early frameworks on Hadoop.

Traditional data architecture

When a traditional data architecture enters the era of big data, the existing system can no longer cope with the data volume and performance demands and has to be upgraded. The usual approach is to keep the ETL process and load data into storage through it. The analysis requirements this type of architecture can meet are still dominated by BI scenarios.
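The ETL flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data store.

```python
import sqlite3


def extract():
    # Hypothetical raw records standing in for an operational source system.
    return [
        {"order_id": 1, "region": " north ", "amount": "120.50"},
        {"order_id": 2, "region": "South", "amount": "80.00"},
        {"order_id": 3, "region": "north", "amount": "19.50"},
    ]


def transform(rows):
    # Normalize region names and convert amounts from text to numbers.
    return [
        (r["order_id"], r["region"].strip().lower(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    # Write the cleaned rows into the data store.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def revenue_by_region(conn):
    # A BI-style aggregate query over the loaded data.
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the warehouse
load(conn, transform(extract()))
report = revenue_by_region(conn)
```

Here `report` holds the total amount per region, the kind of aggregate a BI dashboard would display.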

Streaming architecture

Building on the traditional big data architecture, the streaming architecture processes data as streams throughout the entire pipeline, and ETL is replaced with data channels at the data access end. The results of stream processing are pushed directly to consumers in the form of messages. Storage is limited: data is kept in the form of windows in peripheral systems. This architecture suits scenarios that require early warning, monitoring, and timely data.
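A minimal sketch of this idea follows, assuming fixed 60-second windows and a simple threshold alert. The window size, the threshold, and all names are hypothetical choices made for illustration, and a plain Python list stands in for the incoming stream.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window size
THRESHOLD = 90.0     # assumed alert threshold


def window_start(ts):
    # Map a timestamp (in seconds) to the start of its fixed-size window.
    return ts - ts % WINDOW_SECONDS


def process_stream(events, consumers):
    """Handle events one by one, push alerts to consumers as messages,
    and keep only a per-window summary instead of the raw data."""
    windows = defaultdict(lambda: {"count": 0, "total": 0.0})
    for ts, value in events:
        summary = windows[window_start(ts)]
        summary["count"] += 1
        summary["total"] += value
        if value > THRESHOLD:
            message = {"kind": "alert", "ts": ts, "value": value}
            for consume in consumers:
                consume(message)  # pushed immediately, not stored for later ETL
    return dict(windows)


alerts = []  # a trivial consumer: collect the pushed messages in a list
events = [(0, 42.0), (30, 95.0), (61, 50.0), (119, 99.0), (120, 10.0)]
windows = process_stream(events, [alerts.append])
```

With this input, two alert messages reach the consumer (for the readings at 30 and 119 seconds), and `windows` keeps one summary for each of the three windows that received data.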

Lambda architecture

The Lambda architecture is a pivotal architecture in big data systems. The data channel is divided into two branches: real-time streaming and offline. The real-time branch is based on the streaming architecture to ensure timeliness, while the offline branch relies mainly on batch processing to ensure eventual consistency. It suits scenarios where both real-time and offline requirements exist.
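The two branches can be illustrated with a small page-view counter. This is a sketch under assumptions rather than a reference implementation: the event shape, the `SpeedLayer` class, and the `query` function are hypothetical names, and in-memory lists stand in for the stored history and the live stream.

```python
from collections import Counter


def batch_view(master_dataset):
    # Offline branch: recompute the counts from the full history.
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    # Real-time branch: count events that arrived after the last batch run.
    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        self.realtime_view[event["page"]] += 1


def query(page, batch, speed):
    # Answer a query by merging the batch view with the real-time view.
    return batch[page] + speed.realtime_view[page]


history = [{"page": "home"}, {"page": "home"}, {"page": "pricing"}]
recent = [{"page": "home"}, {"page": "docs"}]

batch = batch_view(history)
speed = SpeedLayer()
for event in recent:
    speed.on_event(event)

home_views = query("home", batch, speed)
```

The batch view is recomputed periodically from all stored data, so any inaccuracy in the real-time counts is corrected at the next batch run. The cost of this design is that the same logic is maintained in two code paths.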

Kappa architecture

The Kappa architecture is an optimization of Lambda: it merges the real-time and batch parts into a single stream-processing path and replaces the data channel with a message queue. Stream processing remains the mainstay, but the data is stored at the data lake level. When offline analysis or recalculation is required, the data in the data lake can be replayed through the message queue.
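The replay idea can be sketched as follows. The `EventLog` class, the reducer functions, and the sample events are all hypothetical; a single append-only list stands in for both the data lake and the message queue, which a real deployment would keep as separate systems.

```python
class EventLog:
    """Append-only log, standing in for stored data that can be replayed."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset=0):
        # Replay events in their original order, starting at the given offset.
        return iter(self._events[offset:])


def build_view(log, reducer, offset=0):
    # One stream-processing path serves both live processing and recalculation.
    view = {}
    for event in log.read_from(offset):
        reducer(view, event)
    return view


def total_by_user(view, event):
    # First version of the processing logic: sum of amounts per user.
    view[event["user"]] = view.get(event["user"], 0) + event["amount"]


def max_by_user(view, event):
    # Revised logic: largest single amount per user.
    view[event["user"]] = max(view.get(event["user"], 0), event["amount"])


log = EventLog()
for user, amount in [("ann", 5), ("bob", 7), ("ann", 9)]:
    log.append({"user": user, "amount": amount})

totals = build_view(log, total_by_user)
maxima = build_view(log, max_by_user)  # recalculation: replay from offset 0
```

When the processing logic changes, the new result is obtained by replaying the same events through the same pipeline from the beginning, so no separate batch code path is needed.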

That concludes this overview of big data infrastructure selection for platform construction. As big data continues to develop, enterprise demand for data platforms will become more and more common. Whether the goal is to transform an existing platform or to build a new architecture, more professional talent is needed.

