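Amazon's 2010 Letter to Shareholders: Exploring the Frontier of Technology and Putting It to Work for Customers

This is article 58 in the 見識之旅 series.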

-1-

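This is Amazon's 2010 letter to shareholders, the 14th in the series.

As Amazon grew, many of its technical problems no longer had ready-made answers to learn from, so the company had to work out its own solutions as the problems arose.

That means Amazon has to explore the frontier of technology and invest a great deal of time and effort there.

What does Amazon get in return? Strong technology produces an excellent customer experience.

For example, building a personalized product detail page draws on 200 to 300 services behind the scenes.

Without that foundation there would be no "magical" personalization.

As the British science fiction writer Arthur C. Clarke put it: "Any sufficiently advanced technology is indistinguishable from magic."

Personalized recommendations are not the only example. Data synchronization is another technology that feels like magic.

As a user, if your Kindle has been offline for a long time, how do you resolve the conflicts that come up when its data syncs again?

You do nothing. The problem belongs to Amazon's engineers, whose technology reconciles your data as if by magic.

Technology is the foundation of all invention.

Amazon takes pride in its culture of invention, and it explores the frontier of technology in order to deliver on its customer-first principle: technology in the service of customers.

That concludes the introduction. The text of the letter follows.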

-2-

To our shareowners:

To our shareholders:

Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks … walk into certain Amazon meetings, and you may momentarily think you’ve stumbled into a computer science lecture.

Random forests, naïve Bayesian estimators, RESTful services, gossip protocols, eventual consistency, data sharding, anti-entropy, Byzantine quorum, erasure coding, vector clocks: walk into certain Amazon meetings and you might think you had wandered into a computer science lecture.

Look inside a current textbook on software architecture, and you’ll find few patterns that we don’t apply at Amazon. We use high-performance transactions systems, complex rendering and object caching, workflow and queuing systems, business intelligence and data analytics, machine learning and pattern recognition, neural networks and probabilistic decision making, and a wide variety of other techniques. And while many of our systems are based on the latest in computer science research, this often hasn’t been sufficient: our architects and engineers have had to advance research in directions that no academic had yet taken. Many of the problems we face have no textbook solutions, and so we — happily — invent new approaches.

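Open a current textbook on software architecture and you will find few patterns that we do not use. We use high-performance transaction systems, complex rendering and object caching, workflow and queuing systems, business intelligence and data analytics, machine learning and pattern recognition, neural networks and probabilistic decision making, and many other techniques. Although many of our systems build on the latest computer science research, that has often not been enough: our architects and engineers have had to push research in directions academia had not yet explored. Many of the problems we face have no textbook answers, so we gladly invent new approaches.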

Our technologies are almost exclusively implemented as services: bits of logic that encapsulate the data they operate on and provide hardened interfaces as the only way to access their functionality. This approach reduces side effects and allows services to evolve at their own pace without impacting the other components of the overall system. Service-oriented architecture — or SOA — is the fundamental building abstraction for Amazon technologies. Thanks to a thoughtful and far-sighted team of engineers and architects, this approach was applied at Amazon long before SOA became a buzzword in the industry. Our e-commerce platform is composed of a federation of hundreds of software services that work in concert to deliver functionality ranging from recommendations to order fulfillment to inventory tracking. For example, to construct a product detail page for a customer visiting Amazon.com, our software calls on between 200 and 300 services to present a highly personalized experience for that customer.

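Almost all of our technology is implemented as services: pieces of logic that encapsulate the data they operate on and expose hardened interfaces as the only way to reach their functionality. This reduces side effects and lets each service evolve at its own pace without affecting the rest of the system. Service-oriented architecture (SOA) is the basic building abstraction for Amazon's technology. Thanks to a thoughtful and far-sighted team of engineers and architects, we were working this way long before SOA became an industry buzzword. Our e-commerce platform is a federation of hundreds of software services that work together to deliver everything from recommendations to order fulfillment to inventory tracking. For example, to build a product detail page for a customer visiting Amazon.com, our software calls between 200 and 300 services to present a highly personalized experience.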
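The idea of services that own their data and expose only a hardened interface can be sketched in a few lines of Python. This is a minimal illustration, not Amazon's actual design: `InventoryService`, `RecommendationService`, and `build_detail_page` are hypothetical names, and a real page would call hundreds of services over the network rather than two in-process objects.

```python
class InventoryService:
    """Hypothetical service: owns its stock data, reachable only via methods."""

    def __init__(self):
        self._stock = {}  # private state; callers never touch it directly

    def receive(self, sku, units):
        if units <= 0:
            raise ValueError("units must be positive")
        self._stock[sku] = self._stock.get(sku, 0) + units

    def units_available(self, sku):
        return self._stock.get(sku, 0)


class RecommendationService:
    """Hypothetical service: owns its own 'bought together' data."""

    def __init__(self):
        self._related = {}

    def record_pair(self, sku, other_sku):
        self._related.setdefault(sku, []).append(other_sku)

    def related(self, sku, limit=3):
        return self._related.get(sku, [])[:limit]


def build_detail_page(sku, inventory, recommendations):
    """Compose a page purely from service interfaces (two here, not 200)."""
    return {
        "sku": sku,
        "in_stock": inventory.units_available(sku) > 0,
        "related": recommendations.related(sku),
    }
```

Because the page builder depends only on the method signatures, either service could change how it stores data without the page code changing, which is the "evolve at its own pace" property described above.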

State management is the heart of any system that needs to grow to very large size. Many years ago, Amazon’s requirements reached a point where many of our systems could no longer be served by any commercial solution: our key data services store many petabytes of data and handle millions of requests per second. To meet these demanding and unusual requirements, we’ve developed several alternative, purpose-built persistence solutions, including our own key-value store and single table store. To do so, we’ve leaned heavily on the core principles from the distributed systems and database research communities and invented from there. The storage systems we’ve pioneered demonstrate extreme scalability while maintaining tight control over performance, availability, and cost. To achieve their ultra-scale properties these systems take a novel approach to data update management: by relaxing the synchronization requirements of updates that need to be disseminated to large numbers of replicas, these systems are able to survive under the harshest performance and availability conditions. These implementations are based on the concept of eventual consistency. The advances in data management developed by Amazon engineers have been the starting point for the architectures underneath the cloud storage and data management services offered by Amazon Web Services (AWS). For example, our Simple Storage Service, Elastic Block Store, and SimpleDB all derive their basic architecture from unique Amazon technologies.

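State management is at the heart of any system that has to grow very large. Many years ago Amazon's requirements reached the point where no commercial solution could serve many of our systems: our key data services store many petabytes of data and handle millions of requests per second. To meet these demanding and unusual requirements we developed several purpose-built persistence solutions, including our own key-value store and single table store. In doing so we relied heavily on core principles from distributed systems and database research and invented from there. The storage systems we pioneered scale extremely well while keeping tight control over performance, availability, and cost. To reach that scale they manage data updates in a novel way: by relaxing the synchronization requirements for updates that must be propagated to large numbers of replicas, they can survive the harshest performance and availability conditions. These implementations are based on the concept of eventual consistency. The data management advances made by Amazon engineers became the starting point for the architectures beneath the cloud storage and data management services offered by Amazon Web Services (AWS). For example, Simple Storage Service, Elastic Block Store, and SimpleDB all derive their basic architecture from unique Amazon technologies.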
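To make "relaxing the synchronization requirements" concrete, here is a toy key-value store in Python where each replica accepts writes on its own and replicas reconcile later. It is a sketch under stated assumptions, not the design of any Amazon system: the `Replica` class, the `anti_entropy` function, and the last-writer-wins rule based on a `(counter, replica name)` stamp are all choices made for illustration. Production systems often use vector clocks instead, so that concurrent writes can be detected rather than silently overwritten.

```python
class Replica:
    """Toy replica: accepts writes locally, reconciles with peers later."""

    def __init__(self, name):
        self.name = name
        self.clock = 0   # logical counter, advanced on every local write
        self.data = {}   # key -> (stamp, value); stamp = (counter, name)

    def put(self, key, value):
        # No coordination with other replicas: the write succeeds at once.
        self.clock += 1
        self.data[key] = ((self.clock, self.name), value)

    def get(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Keep whichever version carries the larger stamp (last writer wins).
        for key, (stamp, value) in other.data.items():
            if key not in self.data or self.data[key][0] < stamp:
                self.data[key] = (stamp, value)
        self.clock = max(self.clock, other.clock)


def anti_entropy(a, b):
    """One reconciliation round: afterwards both replicas hold the same data."""
    a.sync_from(b)
    b.sync_from(a)
```

Between reconciliation rounds two replicas can return different values for the same key. That temporary disagreement is the price paid for staying writable when replicas cannot reach each other.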

Other areas of Amazon’s business face similarly complex data processing and decision problems, such as product data ingestion and categorization, demand forecasting, inventory allocation, and fraud detection. Rule-based systems can be used successfully, but they can be hard to maintain and can become brittle over time. In many cases, advanced machine learning techniques provide more accurate classification and can self-heal to adapt to changing conditions. For example, our search engine employs data mining and machine learning algorithms that run in the background to build topic models, and we apply information extraction algorithms to identify attributes and extract entities from unstructured descriptions, allowing customers to narrow their searches and quickly find the desired product. We consider a large number of factors in search relevance to predict the probability of a customer’s interest and optimize the ranking of results. The diversity of products demands that we employ modern regression techniques like trained random forests of decision trees to flexibly incorporate thousands of product attributes at rank time. The end result of all this behind-the-scenes software? Fast, accurate search results that help you find what you want.

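Other parts of Amazon's business face similarly complex data processing and decision problems, such as product data ingestion and categorization, demand forecasting, inventory allocation, and fraud detection. Rule-based systems can work, but they are hard to maintain and grow brittle over time. In many cases advanced machine learning gives more accurate classification and can self-heal as conditions change. For example, our search engine runs data mining and machine learning algorithms in the background to build topic models, and applies information extraction algorithms to identify attributes and extract entities from unstructured descriptions, so customers can narrow a search and quickly find the product they want. Search relevance takes a large number of factors into account to predict the probability that a customer is interested and to optimize the ranking of results. Because our products are so diverse, we use modern regression techniques such as trained random forests of decision trees to incorporate thousands of product attributes flexibly at rank time. What does all this behind-the-scenes software produce? Fast, accurate search results that help you find what you want.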
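As a rough illustration of ranking with a random forest regressor, the following Python sketch trains a tiny forest and sorts candidate products by predicted interest. Everything in it is assumed for the example: the two features (text match and rating), the training numbers, and the function names are hypothetical, and each tree is limited to a single split (a "stump") to keep the code short. A real system would use deeper trees, far more attributes, and a tested library.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Depth-1 regression tree: best threshold on one feature by squared error."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # the feature is constant in this bootstrap sample
        overall = mean(targets)
        return (feature, float("inf"), overall, overall)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, targets, n_trees=50, seed=0):
    """Bagging plus a random feature per tree, the two core random-forest ideas."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))             # random feature
        forest.append(fit_stump([rows[i] for i in picks],
                                [targets[i] for i in picks], feature))
    return forest


def predict(forest, row):
    """Average the predictions of all trees."""
    return mean(lo if row[f] <= thr else hi for f, thr, lo, hi in forest)


def rank(forest, candidates):
    """Sort (name, features) pairs by predicted interest, highest first."""
    return sorted(candidates, key=lambda c: predict(forest, c[1]), reverse=True)
```

Averaging many trees, each trained on a different resample and a different attribute, is what lets the model absorb a large and uneven set of product attributes without hand-written rules for each one.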

All the effort we put into technology might not matter that much if we kept technology off to the side in some sort of R&D department, but we don’t take that approach. Technology infuses all of our teams, all of our processes, our decision-making, and our approach to innovation in each of our businesses. It is deeply integrated into everything we do.

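All this effort might not matter much if we kept technology off to the side in an R&D department, but that is not what we do. Technology runs through all of our teams, our processes, our decision-making, and our approach to innovation in every business. It is deeply integrated into everything we do.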

One example is Whispersync, our Kindle service designed to ensure that everywhere you go, no matter what devices you have with you, you can access your reading library and all of your highlights, notes, and bookmarks, all in sync across your Kindle devices and mobile apps. The technical challenge is making this a reality for millions of Kindle owners, with hundreds of millions of books, and hundreds of device types, living in over 100 countries around the world — at 24x7 reliability. At the heart of Whispersync is an eventually consistent replicated data store, with application defined conflict resolution that must and can deal with device isolation lasting weeks or longer. As a Kindle customer, of course, we hide all this technology from you. So when you open your Kindle, it’s in sync and on the right page. To paraphrase Arthur C. Clarke, like any sufficiently advanced technology, it’s indistinguishable from magic.

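Whispersync for Kindle is one example. It is designed so that wherever you go and whatever devices you carry, you can reach your reading library and all of your highlights, notes, and bookmarks, kept in sync across Kindle devices and mobile apps. The technical challenge is doing this for millions of Kindle owners, hundreds of millions of books, and hundreds of device types in more than 100 countries, with around-the-clock reliability. At the core of Whispersync is an eventually consistent replicated data store with application-defined conflict resolution, which has to cope with devices that stay offline for weeks or longer. As a Kindle customer you never see any of this. When you open your Kindle, it is in sync and on the right page. To paraphrase Arthur C. Clarke, like any sufficiently advanced technology, it is indistinguishable from magic.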
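"Application-defined conflict resolution" means the application, not the database, decides how two diverged copies are combined. The Python sketch below shows one plausible set of rules for reading state. It is not Whispersync's actual algorithm: `ReadingState`, its fields, and the merge rules (furthest location wins, bookmarks are unioned, the note with the higher edit counter wins) are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReadingState:
    """Hypothetical per-book state held on each device."""
    furthest_location: int = 0
    bookmarks: set = field(default_factory=set)
    notes: dict = field(default_factory=dict)  # location -> (edit counter, text)


def merge(a, b):
    """Combine two diverged copies without asking the reader anything."""
    notes = dict(a.notes)
    for location, version in b.notes.items():
        # The higher edit counter wins; tuple comparison breaks ties by text.
        if location not in notes or notes[location] < version:
            notes[location] = version
    return ReadingState(
        furthest_location=max(a.furthest_location, b.furthest_location),
        bookmarks=a.bookmarks | b.bookmarks,
        notes=notes,
    )
```

The rules are chosen so that merging is order-independent and safe to repeat: a device that was offline for weeks can sync with any other copy, in any order, and all copies end up identical. Deletions are not handled here; supporting them would need extra bookkeeping such as tombstones.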

Now, if the eyes of some shareowners dutifully reading this letter are by this point glazing over, I will awaken you by pointing out that, in my opinion, these techniques are not idly pursued — they lead directly to free cash flow.

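If some shareholders dutifully reading this letter find their eyes glazing over at this point, let me wake you up: in my opinion these techniques are not pursued idly. They lead directly to free cash flow.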

We live in an era of extraordinary increases in available bandwidth, disk space, and processing power, all of which continue to get cheap fast. We have on our team some of the most sophisticated technologists in the world — helping to solve challenges that are right on the edge of what’s possible today. As I’ve discussed many times before, we have unshakeable conviction that the long-term interests of shareowners are perfectly aligned with the interests of customers.

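We live in an era of extraordinary growth in available bandwidth, disk space, and processing power, all of which keep getting cheaper fast. Our team includes some of the most sophisticated technologists in the world, who help solve challenges at the very edge of what is possible today. As I have said many times, we are firmly convinced that the long-term interests of shareholders are perfectly aligned with the interests of customers.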

And we like it that way. Invention is in our DNA and technology is the fundamental tool we wield to evolve and improve every aspect of the experience we provide our customers. We still have a lot to learn, and I expect and hope we’ll continue to have so much fun learning it. I take great pride in being part of this team.

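We like it that way. Invention is in our DNA, and technology is the fundamental tool we use to evolve and improve every aspect of the experience we give customers. We still have a lot to learn, and I expect and hope we will keep having fun learning it. I am very proud to be part of this team.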

As always, I attach a copy of our original 1997 letter. Our approach remains the same, and it’s still Day 1.

As always, I attach a copy of our original 1997 letter. Our approach has not changed, and it is still Day 1.

Jeffrey P. Bezos

Founder and Chief Executive Officer

Amazon.com, Inc.

Jeff Bezos

Founder and CEO of Amazon
