Published in The Startup

Lego vs SoC, Apple M1 + MT8195, Microservices and Big Data Model

This week (2020-11-10) was a really big one for System on a Chip (SoC): first the Apple M1, then the MediaTek MT8195/MT8192. But what on earth do these have to do with Lego, microservices, and even data models? It is a topic I have put off for a few years.

Probably all of us would unanimously nominate Lego as the most powerful tool or toy ever invented. In any survey or multiple-choice quiz, as long as “Lego” appears, its flexibility and limitless creativity always stand out. Whenever I joined a discussion about the “next generation of data processing platform” or “what is the best big data architecture”, Lego was often mentioned as the perfect analogy for the ideal design. However, I have always favored SoC over Lego bricks for the critical big data problems.

Before any explanation, let’s joke a bit about the dark side of Lego and microservices first. Lego and microservices actually share some key traits: Agility, Flexible Scaling, Easy Deployment, and Reusable Building Blocks. But over-indexing on these traits might also lead to:

  • oversimplified / rigid interfaces: this can explain the rise of GraphQL over REST
(Figure: service mesh communication infrastructure, from Microsoft)
  • lack of end-to-end optimization: teams might use deployment or modular agility/isolation as an excuse to draw a clear-yet-small perimeter of responsibility, after which end-to-end optimization becomes almost impossible.
(Figure: breaking a monolithic application into microservices, from the AWS website)

While Lego bricks and microservices sound awesome, it is interesting to see how Apple and MediaTek are doing the opposite to bring speed, efficiency, and what customers/partners want to the market. Instead of further breaking functions into more micro chips and outsourcing each module to a different vendor, both competitors consolidate more services into their SoC, putting the GPU, AI (Neural Engine), and Image Signal Processor onto a single piece of silicon. This makes the tapeout and testing more complicated, but once the SoC is delivered (to make phones, tablets, laptops, and, in the future, servers), a lot more can be quickly built and evolved on top of it. In addition to putting 6 modules into the M1, Apple also tightly integrates its macOS software with the M1 to achieve the amazing CPU + GPU performance, a fan-less MacBook Air, super long battery life, and a thinner/lighter body. Yet all such tight integration & coupling might be considered anti-patterns by microservices and Lego fans.

Old-school monolithic architecture is obviously bad, but there are also drawbacks in the chaos of thousands of microservices and the associated cost black hole caused by low efficiency and over-provisioning. A more balanced way is to develop and unit-test software the microservices way, but then package multiple coherent & related microservices into a single macroservice for integration testing and final deployment. More importantly, once a set of microservices is repackaged together:

  • a good portion of the intercommunication can be switched from HTTP/RPC to IPC/shared-memory
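To make the HTTP/RPC versus in-process distinction concrete, here is a minimal sketch. The service names (`pricing_service`, `tax_service`), the registry, and both transport functions are hypothetical and only for illustration; the "remote" transport merely simulates the serialization round trip with JSON instead of opening a real network connection.

```python
import json
from typing import Callable, Dict

# Hypothetical services, developed and unit-tested independently.
def pricing_service(request: dict) -> dict:
    return {"sku": request["sku"], "price": 10.0 * request["quantity"]}

def tax_service(request: dict) -> dict:
    return {"tax": round(request["price"] * 0.1, 2)}

REGISTRY: Dict[str, Callable[[dict], dict]] = {
    "pricing": pricing_service,
    "tax": tax_service,
}

Transport = Callable[[str, dict], dict]

def in_process_transport(service: str, request: dict) -> dict:
    """Macroservice packaging: a plain function call in shared memory."""
    return REGISTRY[service](request)

def simulated_remote_transport(service: str, request: dict) -> dict:
    """Stand-in for HTTP/RPC: serialize and deserialize on both hops."""
    wire_request = json.dumps(request)
    response = REGISTRY[service](json.loads(wire_request))
    return json.loads(json.dumps(response))

def checkout(call: Transport, sku: str, quantity: int) -> dict:
    """Business logic is identical; only the transport is swapped."""
    priced = call("pricing", {"sku": sku, "quantity": quantity})
    taxed = call("tax", {"price": priced["price"]})
    return {"sku": sku, "price": priced["price"], "tax": taxed["tax"]}
```

Calling `checkout(in_process_transport, "A1", 3)` and `checkout(simulated_remote_transport, "A1", 3)` returns the same result; the point of the sketch is that the services keep their own interfaces while the packaging decides how much serialization and network overhead sits between them.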

We can still scale the system by dialing the number of instances of such a macroservice up or down, even without super fine-grained control (e.g. having one microservice with more/fewer instances than another microservice). The re-packaging should be totally worth it.
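What is given up by losing fine-grained control can be estimated with a little arithmetic. The sketch below is a simplification under loud assumptions: the three service names, their loads, and their per-instance capacities are made-up numbers, and every macroservice instance is assumed to carry exactly one copy of each service. It deliberately ignores the IPC/shared-memory savings, which is the other side of the trade-off.

```python
import math
from typing import Dict

def instances_needed(load_rps: float, capacity_rps: float) -> int:
    """Smallest instance count whose combined capacity covers the load."""
    return math.ceil(load_rps / capacity_rps)

def fine_grained_plan(loads: Dict[str, float], capacities: Dict[str, float]) -> Dict[str, int]:
    """Each microservice is scaled on its own."""
    return {name: instances_needed(loads[name], capacities[name]) for name in loads}

def macroservice_plan(loads: Dict[str, float], capacities: Dict[str, float]) -> int:
    """All services scale together, so the busiest one sets the count."""
    return max(instances_needed(loads[name], capacities[name]) for name in loads)

# Hypothetical numbers, for illustration only.
loads = {"pricing": 900, "tax": 300, "inventory": 500}
capacities = {"pricing": 100, "tax": 100, "inventory": 100}

per_service = fine_grained_plan(loads, capacities)   # {'pricing': 9, 'tax': 3, 'inventory': 5}
macro_instances = macroservice_plan(loads, capacities)  # 9
```

With these made-up numbers, the fine-grained plan runs 17 service copies in total, while the macroservice plan runs 9 instances carrying 27 copies. Whether that extra headroom is cheaper than the network hops it removes depends on the workload.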

“Data Middle Platform” or “Data Middle Office” is a concept that Alibaba has advocated and practiced since 2017. It was inspired by the amazing data-driven efficiency of Supercell, which Alibaba executives visited in 2015. Alibaba then rearchitected its data infrastructure and organizations to push the so-called “big data middle platform with small front platform” strategy; consolidating scattered & repetitive microservices & data models is the core action behind the fancy name. Though this movement is not well known to or understood by the internet giants in the US, it actually resembles a lot of the focus and practice in SoC.

In layman’s terms, it is important to spend more engineering + business effort to model and scale one or a very small number of key tables for each business core/line. The ETL, storage, and query/serving platforms are optimized end-to-end to support such wide/big tables. All the experiment performance, analytics, decision making, and prediction are derived from such wide/big tables. The popular-yet-chaotic data democratization fashion advertised by many Hadoop vendors has fallen apart in recent years, and Databricks, Snowflake, and even Microsoft/Google are bringing back data warehouse modeling with better underlying infrastructure support. A wide/big fact table with a dozen dimension tables is developed, tested, evolved, and packaged together like a well-organized SoC to provide both higher quality & efficiency and lower integration & support cost.
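As a toy illustration of the fact-plus-dimensions idea, the sketch below joins one fact table to two dimension tables to form a wide table, then derives two different analytics from that single table. All table names, columns, and rows are hypothetical, and plain Python dictionaries stand in for a real warehouse.

```python
from typing import Dict, List

# Hypothetical dimension tables, keyed by their id.
dim_customer = {
    1: {"customer_name": "Ada", "customer_region": "EU"},
    2: {"customer_name": "Bob", "customer_region": "US"},
}
dim_product = {
    10: {"product_name": "Widget", "product_category": "Tools"},
    11: {"product_name": "Robot", "product_category": "Toys"},
}

# Hypothetical fact table: one row per order.
fact_orders = [
    {"order_id": 100, "customer_id": 1, "product_id": 10, "amount": 25.0},
    {"order_id": 101, "customer_id": 2, "product_id": 11, "amount": 10.0},
    {"order_id": 102, "customer_id": 1, "product_id": 11, "amount": 15.0},
]

def build_wide_table(facts: List[dict], dimensions: Dict[str, Dict[int, dict]]) -> List[dict]:
    """Left-join every fact row to each dimension via its key column."""
    wide_rows = []
    for fact in facts:
        row = dict(fact)
        for key_column, dim_table in dimensions.items():
            row.update(dim_table.get(fact[key_column], {}))
        wide_rows.append(row)
    return wide_rows

def revenue_by(wide_rows: List[dict], column: str) -> Dict[str, float]:
    """Any downstream metric reads the same wide table."""
    totals: Dict[str, float] = {}
    for row in wide_rows:
        key = row.get(column, "unknown")
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals

wide_orders = build_wide_table(
    fact_orders, {"customer_id": dim_customer, "product_id": dim_product}
)
```

Here `revenue_by(wide_orders, "customer_region")` and `revenue_by(wide_orders, "product_category")` both read the same curated table, which is the property the paragraph above argues for: one well-modeled table serving many consumers instead of each consumer rebuilding its own joins.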

While the world is enjoying the mobility of phones and tablets, SoC and its ongoing consolidation play a crucial role. People love to have faster devices with longer battery life and better software efficiency. This should not be a hardware-only trait. I believe that software and data architecture should also rethink the balance between microservices and macroservices, and give efficiency & quality higher priority from the consumer & business angle (instead of the producer or engineering angle), because after all, only a successful business can fuel true innovation in technology, especially in big data and AI.

(* Spark Whole-stage Java Code Generation is one example: the logic is written as multiple operators/snippets, but the system combines them into a single execution stage to achieve higher efficiency.)
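The general idea of fusing separately written operators into one stage can be sketched in a few lines of Python. This is only a toy analogy and not how Spark is implemented: the operator format (a kind plus a Python expression over `x`) and both function names are assumptions made up for this example.

```python
from typing import Callable, Iterable, List, Tuple

Operator = Tuple[str, str]  # (kind, Python expression over the variable `x`)

def run_operator_at_a_time(rows: Iterable[int], operators: List[Operator]) -> List[int]:
    """Unfused: every operator materializes a full intermediate list."""
    current = list(rows)
    for kind, expr in operators:
        fn = eval(f"lambda x: {expr}")
        if kind == "filter":
            current = [x for x in current if fn(x)]
        elif kind == "map":
            current = [fn(x) for x in current]
        else:
            raise ValueError(f"unknown operator kind: {kind}")
    return current

def generate_fused_stage(operators: List[Operator]) -> Tuple[Callable, str]:
    """Emit source for a single loop covering all operators, then compile it."""
    lines = ["def fused_stage(rows):", "    out = []", "    for x in rows:"]
    for kind, expr in operators:
        if kind == "filter":
            lines.append(f"        if not ({expr}):")
            lines.append("            continue")
        elif kind == "map":
            lines.append(f"        x = {expr}")
        else:
            raise ValueError(f"unknown operator kind: {kind}")
    lines.append("        out.append(x)")
    lines.append("    return out")
    source = "\n".join(lines)
    namespace: dict = {}
    exec(source, namespace)  # compile the generated stage
    return namespace["fused_stage"], source

operators = [("filter", "x % 2 == 0"), ("map", "x * 10"), ("filter", "x > 20")]
fused_stage, generated_source = generate_fused_stage(operators)
```

Both `run_operator_at_a_time(range(8), operators)` and `fused_stage(range(8))` return `[40, 60]`, but the fused version makes one pass with no intermediate lists. Printing `generated_source` shows the single loop that the three operators were combined into.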

(* Disclaimer: The views expressed in this article are those of the author and do not reflect any policy or position of the employers of the author.)



Eric Sun

Advocate best practice of big data technologies. Challenge the conventional wisdom. Peel off the flashy promise in architecture and scalability.