TikTok Tech Separation
Technically Separating from the Mother Ship
There’s been a lot of talk these days about the challenges and risks of TikTok, its business and management practices, and if, when, and perhaps how it might be separated from its roots in a way that builds trust & transparency.
This got me to pondering the technical parts of this challenge, i.e. how one might migrate and mitigate an entire large operational code and infrastructure base, from presumably tainted to provably (and sustainably) trustworthy.
And how to do this across a physical, cultural, and metaphorical ocean. In the current political environment, and in the times of COVID-19, no less.
Thus below are some quick thoughts and suggestions on how this might be done, keeping in mind nearly nothing is known about the existing infrastructure, codebase, tooling, or much else about this billion-user system. Nor about how much is managed by ByteDance vs. the Musical.ly teams.
Additionally, this article is about technical issues, not commercial, political, nor national security challenges, each of which is more than sufficient to torpedo any such effort before, during, or after it’s started (or completed).
The fundamental concern is that the system can be utilized or compromised in any number of unacceptable ways, principally via unexpected data sharing and/or leveraging a very widely-installed app to compromise mobile devices that sadly are not as secure as we’d like.
Realistically, these are risks for every single app and data-oriented system we use, but TikTok raises special concerns via its possible relationship with, or regulatory burden from, a nation state. The feeling is there are potential elevated risks, and therefore reduced trust; enough for the app to be banned in various jurisdictions.
The goal is thus simple: to reduce both the perceived and actual risk to a level at or below that of peer social media platforms. And since ownership changes are insufficient on their own, it must be done by mostly technical means, including very strict controls, audits, and continued involvement of well-known experts in key areas.
And while this goal is simple enough, its execution on a large ever-evolving billion-user global platform won’t be so simple, especially while simultaneously building out whole new technical and operations teams in the US, EU, etc. as part of ongoing business, oversight, and policy separations.
This is similar to what Zoom has done to improve its core security, by involving experts, best practices, and transparency, though Zoom had (and has) a much, much simpler challenge.
So our challenge, should anyone accept it, is to audit and migrate the entire app, backend, and infrastructure codebases plus data from one that’s presumptively tainted to one that’s transparently trustworthy.
And to do this without significant user impact and using all of the existing code and applications, along with leveraging the existing cross-ocean technical teams while simultaneously replacing them.
All in the midst of hiring and building a whole new American and EU team to build and manage large-scale systems from a running start, with little knowledge of the system, an inability to travel, and a variety of language / cultural barriers.
All while continuing to be one of the most innovative and meaningful platforms for a significant portion of the world’s population, in the midst of various political transitions and elections, public health challenges, and economic difficulties. A walk in the park, really.
Broadly, the process is simple in concept, and darn complex in execution. Essentially, build a team that can audit the whole system, migrate it to new, trusted, and very carefully controlled infrastructure, controls, and processes, and then manage it there. Now and on an ongoing basis.
There are three layers this effort has to execute on: Apps, Backend, and Infrastructure. All of which are large and complex in themselves, and supporting a huge user base, with enormous data storage and nearly global coverage.
Fundamentally, each and every part of the system has to be understood, very carefully audited, and migrated to very carefully controlled infrastructure, both for the app/data itself and all the development and security tooling, third party libraries and services, certificates, and more.
Given that nation-state actors and perhaps others are anticipating this effort and are already involved in tainting the code, tooling, processes, and/or infrastructure, well-known world-class experts will be required at every stage and for every area. This will be both costly and quite challenging to coordinate, but must be done to reach any level of trust.
And all that has to continue on an on-going basis, as the teams cannot rest on their laurels once the migration is complete. This is especially true given there will be actors big and small actively working to compromise the system, some of them with prior detailed knowledge of its code & inner workings.
The entire process needs to be transparent and team-oriented, with multiple teams handling key parts, as the security risks are not merely technical, but social engineering as well (as seen in the recent Twitter debacle).
There are likely access and change control policy & process components to be borrowed from communities that are used to heightened risk & compromise management, e.g. intelligence, financial, and regulated gambling (the code change controls & QA practices required in Las Vegas might surprise you).
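As a rough illustration of the kind of change control such communities rely on, here is a minimal sketch of a multi-team approval rule. Everything in it is an assumption made for illustration: the `Approval` record, the `change_is_approved` function, and the two-team threshold are hypothetical and not drawn from any specific regulator or company.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Approval:
    reviewer: str  # hypothetical reviewer identity
    team: str      # hypothetical team name


def change_is_approved(author: str, approvals: List[Approval], min_teams: int = 2) -> bool:
    """Approve a change only if reviewers from enough distinct teams signed off.

    The author's own approval never counts, so one person (or one
    compromised account) cannot push a change through alone.
    """
    independent_teams = {a.team for a in approvals if a.reviewer != author}
    return len(independent_teams) >= min_teams
```

The point of counting teams rather than people is that a single socially engineered group should not be able to approve its own change.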
At the same time, there will be accusations of western intelligence and other groups installing backdoors or other compromises, which will take skillful efforts to thwart while simultaneously getting their blessing and perhaps assistance. The goal is to align the system with its global peers, not trade one nation state influencer for another.
No one wants the system beholden to ANY nation state, be it American, European, Eurasian, Asian, etc. It helps no one to swap one master for another.
The people picture of all this is quite unclear, as TikTok talks about massive ongoing hiring in content, management, and ‘product’ though next-to-nothing is said about engineering teams, especially at the scale and talent-levels necessary to execute this full separation.
There is talk of ‘reducing’ the dependency on ByteDance, i.e. not separating nor doing the things in this article, but if true, this completely thwarts the entire effort.
An entire engineering team needs to be built, perhaps from scratch, and borrowing heavily from Silicon Valley companies like Google or AWS with experience in these security & large-scale operational areas.
This presumably includes a CTO, development, operations, tooling, QA, and every other modern element required to run a big Internet property. Plus all the best-practice procedures, DevOps focus, agility, etc.
Plus a team tasked to interface with ByteDance and the existing team to cross the physical and cultural gulfs to coordinate the myriad of technical, procedural, and tooling tasks involved in this effort. Having spent much time in this type of effort, I can assure you it’s likely the largest challenge.
There are innumerable technical challenges in such a separation, and several won’t be known until more is known about how, and how well, the platform is actually engineered, hosted, and managed.
But what follows is a quick list of some of the most obvious elements to be managed, again assuming the system is entirely tainted from day one.
User-facing mobile apps are especially dangerous as they allow all sorts of bad behavior, including direct data pilfering (usage, photos, contacts), direct user surveillance (location), and perhaps most seriously, mobile device attacking via zero-day & other vulnerabilities (especially attractive to nation states).
Obviously the user-facing applications installed on mobile devices (and the web) need to be completely audited by world-class folks looking for back doors, vulnerabilities, data exfiltration, inappropriate syscalls & race conditions, and especially methods that could allow the apps to be future attack vectors against the devices they are installed on. This is a tall order.
This includes a complete rotation of all related certs, signing keys, etc., to avoid any man-in-the-middle attacks and/or future ‘fake’ code releases. And likely a whole new world-class key management (KMS) and signing / cert process, as these literally are the keys to the kingdom.
Third party libraries and services are opportune attack targets, and are often under-managed, under-audited, and under-controlled in most applications. Thus, all third-party libraries and SDKs need to be audited and signed very early in the process with strict change controls (the ones developers hate).
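One common way to enforce this kind of control is to pin every audited library to a cryptographic digest and reject anything that does not match. The sketch below assumes a simple in-memory approved list; the function names and the example library names are hypothetical, and a real pipeline would more likely rely on its package manager's lockfile and signature checks.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_dependencies(artifacts: Dict[str, bytes], approved: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every artifact is approved.

    `artifacts` maps a library name to its file contents.
    `approved` maps a library name to the digest recorded at audit time.
    """
    problems = []
    for name, blob in sorted(artifacts.items()):
        expected = approved.get(name)
        if expected is None:
            problems.append(f"{name}: not in approved list")
        elif sha256_hex(blob) != expected:
            problems.append(f"{name}: digest mismatch")
    return problems


# Hypothetical usage: one audited SDK and one that was never reviewed.
audited = b"bytes of the audited library"
approved_list = {"video-sdk-1.2.0": sha256_hex(audited)}
build_inputs = {"video-sdk-1.2.0": audited, "ads-sdk-3.1.0": b"never audited"}
for problem in verify_dependencies(build_inputs, approved_list):
    print(problem)
```

A build that fails on any reported problem turns the audit into an enforced gate rather than a one-time review.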
Finally, development tooling has to be enhanced to help limit vulnerabilities in the codebase (since bad actors may have that code, a nice advantage) via best practice tools / processes, and protect the integrity of the pipeline from developer through production. Google & others have good practices here.
Backend & Internal Code
All the front-end apps interface to what are likely dozens or hundreds of backend services and systems, some user-facing and some not. As Twitter has recently seen, many of these systems and their controls are easily subverted.
Like the mobile apps, 100% of the backend apps, configurations, tools, libraries, and third party services must be understood, inspected, and put under strict controls for security, development pipelines, etc. This includes all certs and other secret elements that must be considered compromised.
Unlike relatively simple front-end apps, modern back-end systems can be astonishingly complex, particularly in their security posture (or lack thereof), and especially in fast-moving consumer applications. Microservices, Docker, etc. make it even harder, so real work is needed to understand, map, and secure all these continually-evolving moving parts.
It’s likely the system cannot be completely internally protected from rogue components, so strict access controls and data flow management may well be needed to draw boundaries around various elements to protect them from each other.
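A minimal sketch of what such a boundary could look like is a deny-by-default allowlist of data flows between services. The service names, data classes, and the `Flow` type here are all hypothetical; in practice this would be enforced in the network or service mesh layer rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str       # hypothetical calling service
    destination: str  # hypothetical receiving service
    data_class: str   # hypothetical label for the kind of data moved


# Hypothetical allowlist: only flows listed here are permitted.
ALLOWED_FLOWS = frozenset({
    Flow("upload-api", "video-store", "media"),
    Flow("feed-api", "recommender", "engagement"),
})


def is_allowed(flow: Flow, allowed: frozenset = ALLOWED_FLOWS) -> bool:
    """Deny by default: a flow is permitted only if explicitly listed."""
    return flow in allowed
```

With this shape, a rogue component that tries to read a data class it was never granted is blocked even if it sits inside the same network.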
Further, QA will be challenging to lift and shift, and will also have to deal with all the sudden changes introduced by audits, process and tool changes, bug fixes, etc., so serious effort will be needed on behalf of end users to minimize impact.
It’s easy to forget the infrastructure on which all this runs, but it’s an important part of the security picture, as we assume all the servers, serverless, and access controls are compromised. They all have to be replaced.
Little is known about the infrastructure, which appears to be AWS-based, though with Akamai and perhaps Alicloud components (any Alicloud elements would have to be removed).
It’s unlikely AWS accounts can be changed, nor is migrating the complete system elsewhere realistic, so it’ll have to be done in place. Thus, as with code, this means a full audit of the accounts, especially IAM elements and other configurations related to access, security, and communications controls.
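As one small piece of such an audit, the sketch below scans an IAM policy document, in the standard JSON shape with `Statement`, `Effect`, `Action`, and `Resource` elements, for statements that allow broad actions on all resources. The function names are hypothetical, and this is only a first-pass filter: it ignores `NotAction`, conditions, and many other ways a policy can be too permissive.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements that grant '*' or 'service:*' actions on all resources."""
    findings = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = "*" in resources
        if broad_action and broad_resource:
            findings.append(stmt)
    return findings
```

Anything this flags is a candidate for tightening; anything it does not flag still needs human review.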
All existing VMs will have to be replaced from fresh images, and all other code elements such as Lambda audited and put under strict control like the rest of the code. All certs, keys, etc. also have to be rotated or replaced, and their management controls strengthened.
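The "assume everything is compromised" rule can be expressed as a simple inventory check, sketched below. The `Asset` record, the cutover date, and the approved image set are hypothetical; the idea is only that anything created before the trusted cutover, or any VM not built from an approved fresh image, gets flagged for replacement.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str                       # e.g. "vm", "cert", "key" (hypothetical labels)
    created: date
    image_id: Optional[str] = None  # only meaningful for VMs


def needs_replacement(asset: Asset, cutover: date, approved_images: Set[str]) -> bool:
    """Flag assets that predate the cutover, or VMs built from unapproved images."""
    if asset.created < cutover:
        return True
    if asset.kind == "vm" and asset.image_id not in approved_images:
        return True
    return False
```

Run against a full inventory export, a check like this gives a concrete, shrinking list of what is still untrusted.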
Of course, updating the infrastructure also provides the opportunity to move to full infrastructure-as-code if it’s not there already, as this will help ensure its security and controls as part of the larger code control efforts.
All code, build, test, and deployment systems will also have to be scrubbed and/or migrated to clearly-controlled best practice systems. This may be especially challenging since it’s likely much of the code and systems originate from the mother and peer companies; it’s not clear TikTok has much of its own dev/build infrastructure, thus this may have to be built from scratch.
AI Algorithms and More
There are many other key parts of the TikTok system to look at, including and especially the AI models and methods used to recommend content. As has been noted elsewhere, this is one of the keys behind the system’s success, but also a societal vulnerability, as corrupting this can impact society, elections, and other things in the ‘real world.’
Thus this has to be migrated to a set of experts who can understand and document how it works, with as much transparency as possible, especially given the challenges other social networks have regarding how information is promoted (or not) on their platforms.
While not strictly a technical issue, the technology, code, and data around these critical systems need as much protection as the core application and infrastructure, or more.
IT Infrastructure & Security
It should go without saying that the entire company needs world-class IT security, which has never been a strong point of Internet companies in general and companies like ByteDance in particular.
None of the above efforts will mean anything if hackers can penetrate the company’s laptops, networks, and systems. And while layers of protections and controls help, there are large motivated actors involved so protections need to be very strong from day one. Start at Google or AWS’s level and move up, not down.
Those are a few quick thoughts on how one might approach technically separating TikTok from its mother ship, based on very little knowledge about how it’s built and managed.
Some folks seem to think changing owners will solve the current political and trust crisis, but without something resembling the above efforts, it’s hard to see how societies, governments, intelligence communities, etc. will trust this app so desired by the populace.
I’m Steve Mushero, an American technology entrepreneur, split between Silicon Valley & Shanghai the last 15 years, where I started both China’s first cloud computing company & its first Internet managed service provider. More about me on LinkedIn.