Architecture and Design of a Peer-to-Peer Streaming Service

Garrett

--

Abstract — Online streaming services have proven to be a disruptive technology within the entertainment industry. The gap between the growth of online streaming services and traditional box office sales widened during the COVID-19 pandemic of 2020. Ryan Faughnder of the Los Angeles Times reports that streaming service subscriptions grew by over 26%, surpassing 1.1 billion subscriptions worldwide, while box office sales declined by an astonishing 72%. It is evident that entertainment delivered by online streaming will continue to grow. This article discusses a peer-to-peer (P2P) content delivery system. It endeavors to identify the structure of a network, with supporting protocols, to deliver content to end users. The service will allow users to join the network and stake a certain amount of disk space and network bandwidth to serve the network, in return for the ability to stream any content hosted by another peer on the network.

1 PROBLEM SPACE

The prevailing business model in online streaming is a single company pursuing as much vertical integration as possible. In other words, the company aims to own and manage as many aspects of the service as it can. This includes the rights to the entertainment content; the servers that store the content; the servers that deliver the stream; the applications end users interact with to request content; and, in some cases, even the streaming hardware embedded in consumer televisions.

The first mainstream streaming platform was YouTube, and its popularity exploded. Some, however, were skeptical that the business model of streaming content over the internet was sustainable. In 2006, the average internet connection speed was only around 3–5 Mbps according to researcher Petroc Taylor of Statista (2023), which made streaming even 360p content bandwidth intensive for the average consumer. Meanwhile, YouTube's monthly infrastructure costs for delivering concurrent streams ballooned to over $1 million, and the company burned through its $11 million in venture capital funding from Sequoia Capital. The company seemed destined to fail until it was acquired by Google in 2006 for a staggering $1.65 billion. Google looked past the infrastructure costs and saw YouTube as a revolutionary technology. After the acquisition, Google's market value rose by roughly $2 billion; the investment had already paid for itself. YouTube soon became the second largest search engine in the world, behind only Google itself (Lamare, 2018).

Had Google not acquired YouTube in 2006, the streaming platform would not be the household name it is today, with over one billion hours of video watched daily as of 2019 (Mohsin, 2022). The problem that almost bankrupted the company was its insurmountable infrastructure cost. Google, as one of the largest tech conglomerates, was able to absorb that cost and apply its world-renowned engineering to develop more cost-effective and scalable solutions. A streaming service that leverages a P2P network could deliver the same amount of content at a significantly lower infrastructure cost. P2P networks benefit from decentralization: peers act as numerous small, distributed servers on the network. A traditional streaming platform has to provision, maintain, and scale its infrastructure manually. A P2P network, by contrast, scales as the service becomes more popular. Content inherently becomes more distributed, the infrastructure and network become more efficient, and the data becomes more disaggregated. All of these facets allow P2P systems to be highly available while scaling with ease.

Napster employed this content delivery model with music in the late 90s. The company filed for bankruptcy after legitimate claims of rampant piracy within the network: massive amounts of the content on Napster could not be legally distributed. The P2P system discussed here will instead be developed and maintained by a single corporation; let's refer to this company as the P2P service provider. The P2P service provider will purchase the rights to distribute media, allowing the peers on the network to consume the content legally. It will also maintain a dynamic number of distributed "super nodes" to ensure that the only content distributed on the network is legal. To fund the acquisition of new media, participants on the network will pay a small monetary fee. Since the network infrastructure is so cost-efficient, a large portion of the client fees can be directed toward purchasing media. This may allow the service to compete with larger platforms like Netflix, Hulu, and Prime Video. As the service scales, the infrastructure will become more robust, and the revenue the network generates will only increase the amount of media available on the service. This business model could offer more content than a collection of larger streaming platforms at a fraction of the cost.

2 ARCHITECTURE

Engineering a P2P service to deliver online entertainment is an unorthodox approach, but if executed well, it may prove to be a disruptive business model for existing online streaming services. For a P2P streaming service to be practical, it should leverage centralized coordination, which gives the P2P service provider maintaining the service some level of control over the network. This control is important for optimizing the network to provide high availability, high quality of service, and a scalable infrastructure, as well as for mitigating bad actors through strong security protocols.

2.1 Optimizing the Network

Streaming on demand content is computationally intensive and requires a well architected and robust infrastructure. There are multiple design decisions the P2P streaming service can implement to ensure a high quality of service as well as an optimized network.

2.1.1 Super Node Design

The super nodes will cache content for distribution, coordinate available resources on the network, maintain and store routing tables, and manage joining and departing nodes. Researchers Guo Jun and Chen Chen discuss a hybrid P2P architecture in which the network consists of super nodes and ordinary nodes. The whole network is divided into n smaller regions called clusters. Each cluster contains a super node which provides "general network topology and information delivery services to ordinary nodes in the cluster" (Jun, Chen, 2012). Since the super nodes inherently have increased processing power, they are also leveraged for maintenance operations on the network, which increases the load on the system. This is a risk in the architecture: certain super nodes may be susceptible to an overload, which can lead to a disruption of service within that cluster and/or neighboring clusters.

Figure 1 — P2P network model of the double super node design

Jun and Chen’s research argues that a double super node approach is an important countermeasure for ensuring high quality of service and availability in each cluster: “Two super nodes in each cluster being set, respectively, are a management supernode and of a query super node, those who can achieve separation of query operation and management operation” (Jun, Chen, 2012). With two distinct super nodes per cluster to delegate individualized tasks to, the systems can be designed and optimized for more specific use cases. Perhaps the query super node requires additional memory so that it can maintain a larger data structure in memory for searching, whereas the management node might need more CPU cores to handle the computationally intensive task of communicating with the ordinary nodes in the cluster.
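
As a minimal sketch of this division of labor, the following Python data model gives each cluster a query super node and a management super node. The class and field names here are illustrative assumptions, not Jun and Chen's implementation.

from dataclasses import dataclass, field

@dataclass
class QuerySuperNode:
    # Memory-heavy role: holds the cluster's content index for fast lookups.
    content_index: dict[str, set[str]] = field(default_factory=dict)

    def lookup(self, content_id: str) -> set[str]:
        # Return the addresses of ordinary nodes holding this content.
        return self.content_index.get(content_id, set())

@dataclass
class ManagementSuperNode:
    # CPU-heavy role: tracks cluster membership and node churn.
    members: set[str] = field(default_factory=set)

    def join(self, peer_address: str) -> None:
        self.members.add(peer_address)

    def depart(self, peer_address: str) -> None:
        self.members.discard(peer_address)

@dataclass
class Cluster:
    # Each cluster separates query traffic from management traffic.
    query_node: QuerySuperNode = field(default_factory=QuerySuperNode)
    management_node: ManagementSuperNode = field(default_factory=ManagementSuperNode)

Separating the two roles lets the provider provision each machine for its dominant workload, as described above.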

2.1.2 Centralized Coordination

The P2P network will utilize central coordination, with multiple super nodes managed and maintained by the service provider. A centralized coordination design is what gives the P2P service provider some level of control over the network.

Figure 2 — The relationship between PPG and Peers

As shown in Figure 2, a PPG, or Peer Private Group, is outlined by researchers Xiao-xu Ouyang et al. (2011). Their work argues that a completely centralized directory indexing the location of data on the network is susceptible to paralysis (a self-inflicted denial of service, where the system is overloaded by too many requests). The research states, “Each Peer has a PPG which obtains information of other peers through a centralized directory server. [This is a component of the super nodes of the network.] The other Peers are added to PPG as a PPG member for local Peer retrieval. It effectively reduces the load of the directory server and prevents the paralysis of the directory server which leads to the collapse of the entire network.” The centralized coordination model also offers efficient content discovery and search optimization; it combines “directory retrieval of central server, management services and the standard point to point communications. Therefore, it has the features of efficient retrieval” (Ouyang et al., 2011). The PPG model maintains distributed centralized indexes, which eliminates the need for complex and less efficient algorithms for finding data on the network, leading to lower latency in search queries.
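
A minimal Python sketch of this idea follows, simplifying the PPG to a per-peer cache of directory results. The DirectoryServer and Peer names and their methods are illustrative assumptions, not the API from Ouyang et al.

class DirectoryServer:
    # Centralized index (part of a super node): content ID -> peer addresses.
    def __init__(self) -> None:
        self.index: dict[str, set[str]] = {}

    def register(self, peer_address: str, content_id: str) -> None:
        self.index.setdefault(content_id, set()).add(peer_address)

    def lookup(self, content_id: str) -> set[str]:
        return self.index.get(content_id, set())

class Peer:
    def __init__(self, address: str, directory: DirectoryServer) -> None:
        self.address = address
        self.directory = directory
        self.ppg: dict[str, set[str]] = {}  # Peer Private Group as a local cache

    def find_peers(self, content_id: str) -> set[str]:
        # Local PPG retrieval first; only on a miss does the peer hit the
        # central directory, then caches the result, reducing directory load.
        if content_id not in self.ppg:
            self.ppg[content_id] = self.directory.lookup(content_id)
        return self.ppg[content_id]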

2.1.3 File Storage

High resolution video content requires a substantial amount of storage capacity, and distributing that capacity among peers within the network is an integral design decision. The academic article Optimizing File Availability in Peer-to-Peer Content Distribution outlines approaches to optimize data storage on the network. Its authors present an “adaptive algorithm, called Top-K Most Frequently Requested algorithm for optimizing file availability in a P2P community” (Kangasharju et al., 2007), which assumes a content distribution community built on a Distributed Hash Table (DHT); they claim that any particular DHT structure can apply.
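
The following Python sketch illustrates the core bookkeeping such an algorithm needs: counting requests and surfacing the K most frequently requested files as replication candidates. The class and method names are illustrative assumptions, not the paper's implementation.

from collections import Counter

class TopKTracker:
    def __init__(self, k: int) -> None:
        self.k = k
        self.request_counts: Counter[str] = Counter()

    def record_request(self, file_id: str) -> None:
        # Count every request routed through this super node.
        self.request_counts[file_id] += 1

    def replication_candidates(self) -> list[str]:
        # The K most frequently requested files are the ones worth
        # replicating more widely across the DHT community.
        return [file_id for file_id, _ in self.request_counts.most_common(self.k)]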

2.2 Strong Security Protocols

Bad actors are individuals with malicious intent, and P2P networks are inherently susceptible to them. In a decentralized network, each participating node can act as a server for a neighbor, which means a network connection between the two must occur. Without proper protocols, a bad actor can dox other peers on the network, exposing personally identifiable information such as an IP address. Hackers may also be able to spread malware that other users download, since there is an enhanced level of trust between nodes on the network. There are many attack vectors within a P2P network, which is why it is important to design the service with strong security. Strong security employs principles like defense in depth and cryptographic algorithms.

2.2.1 Defense in Depth

Defense in depth is not a specific cybersecurity technology but rather an approach to security. It argues that good security is accomplished with multiple layers, each of which presents a formidable obstacle to a threat actor. Network segmentation is a common example: the network is split into small, isolated pieces so that even if one component is compromised, the attacker is confined to that segment and forced to perform further exploits, giving the network a higher probability of identifying the attacker with an IDS. An IDS (Intrusion Detection System) is another layer of protection. Snort is a popular IDS that logs packets and analyzes traffic on the network. Snort performs signature-based detection, comparing packet contents against a database of known malicious patterns. If a packet matches a known signature, the system can raise an alert to network administrators and even add the source IP address to a block list, disabling its access.
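
A toy Python sketch of that signature-matching idea follows; it hashes payloads and compares them against known-malicious digests. This is only an illustration of the concept; real IDS rules, including Snort's, match on far richer features (headers, ports, byte patterns) than a single hash, and the example payload here is a placeholder.

import hashlib

# Placeholder signature database; a real deployment ships a curated rule set.
known_malicious = {hashlib.sha256(b"EXAMPLE-MALWARE-PAYLOAD").hexdigest()}
blocklist: set[str] = set()

def inspect_packet(src_ip: str, payload: bytes) -> bool:
    # Return True if the packet is allowed; alert and block otherwise.
    if hashlib.sha256(payload).hexdigest() in known_malicious:
        print(f"ALERT: known-malicious payload from {src_ip}")
        blocklist.add(src_ip)  # disable further access from this source
        return False
    return src_ip not in blocklist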

2.2.2 Cryptographic Algorithms

Encryption is arguably the most important component of building online applications. It protects traffic on the network from eavesdropping and tampering and helps ensure that the content clients receive is legitimate. To ensure that the content delivered to each client on the network is legitimate, the P2P service will need to leverage TLS encryption. “The Transport Layer Security (TLS) protocol is among the most-used secure channel protocols. [The most recent TLS standard uses] …elliptic curve Diffie–Hellman key exchange to establish an ephemeral shared secret with forward secrecy” (Schwabe et al., 2021). Without TLS encryption, it becomes trivial for attackers to exploit the network. A common exploit on an unsecured network is the “man in the middle” attack, in which an attacker intercepts communication between a server and a client. The attacker relays information between the two, spoofing identities by impersonating the server when interfacing with the client and vice versa. This allows the attacker to view every packet, steal data, or even inject malicious data like malware.
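
As a minimal sketch, a peer could establish a TLS-protected connection using Python's standard library ssl module, as shown below. The host name peer.example.net, port 8443, and the request payload are placeholders for this illustration.

import socket
import ssl

context = ssl.create_default_context()  # verifies certificates by default
context.minimum_version = ssl.TLSVersion.TLSv1_3  # ephemeral key exchange, forward secrecy

with socket.create_connection(("peer.example.net", 8443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="peer.example.net") as tls_sock:
        # Handshake complete: the server's certificate chain is verified and
        # all further traffic on this socket is encrypted.
        print(tls_sock.version(), tls_sock.cipher())
        tls_sock.sendall(b"REQUEST chunk-0001\n")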

2.2.3 Data Validation

As discussed before, the only content distributed to peers on the network will be legal. This will be accomplished by storing a distributed database of digital signatures on the management super nodes. When new content is uploaded or replicated to another node, it must first be verified against the digital signature database. This requires signature verification, a cryptographic operation that ensures the legitimacy of the data. When the P2P service provider acquires novel content to add to the network, it must upload the initial copy of the data and generate a digital signature for consumers to check the data's integrity. Forging digital signatures is nearly impossible: signatures combine public key algorithms with strong hash functions like SHA-512, which are currently computationally infeasible to brute force (Microsoft, 2022).
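
The sketch below shows this sign-then-verify flow using the third-party Python cryptography package, with RSA-PSS signatures over SHA-512 digests. The key size and placeholder content bytes are illustrative choices, not requirements of the design.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# The service provider signs newly acquired content once.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
content = b"<video file bytes>"
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA512()), salt_length=padding.PSS.MAX_LENGTH)
signature = private_key.sign(content, pss, hashes.SHA512())

# Any peer holding the provider's public key can verify a copy before
# accepting or replicating it.
public_key = private_key.public_key()
try:
    public_key.verify(signature, content, pss, hashes.SHA512())
    print("content verified")
except InvalidSignature:
    print("reject: content does not match the provider's signature")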

Figure 3 — The PPG and double super node architecture

3 DESIGN REFINEMENTS AND ACKNOWLEDGEMENTS

The double super node model has evident drawbacks: increased infrastructure cost and increased maintenance. The more nodes the P2P service provider maintains, the higher the costs of the network will be. Additionally, the more complex the systems on the network are, the more expensive it will be to hire engineers who can adequately support the network and maintain high quality of service and availability for each cluster.

The research on the Top-K Most Frequently Requested algorithm suggests a possible bottleneck in the design: a small number of files may be requested so frequently that they overload the peers storing them. The research suggests storing additional copies of each popular item that risks overloading the system, distributing and replicating the data even further. This is a form of load balancing executed via a “fragmentation approach”: “we break each file into several fragments or chunks which are given unique names and stored individually in the DHT… [Since the fragments of popular files are] stored on different peers and the fragments are much smaller than the original file, the file transfer load on the nodes becomes more balanced” (Kangasharju et al., 2007). This approach is represented in Figure 4.

Network segmentation and intrusion detection systems alone cannot create a formidable barrier against threat actors. Security is an endless cat-and-mouse game between defenders and attackers, so it is important to implement strong security while ensuring it remains practical and does not impede the performance of the service. It would be more secure to have every packet on the network verified by another peer, but this would cause exponential load and drastically increase network latency. In the model discussed here, it will be paramount for packets to use TLS encryption and for the identities of each peer to be hidden from other ordinary peers. Super nodes may know personal information about ordinary peers in order to increase performance and optimize the network. Other security measures can be applied to the managed super nodes: logging all packets as well as SSH sessions to the servers, closing unneeded ports, and removing public network connections from the super nodes.

A potential issue with the data validation model discussed above is that the digital signature database itself becomes an attack vector. If an attacker can recover the private keys behind the digital signatures, they will be able to forge any data, including malware, to emulate legitimate content. Without additional protocols in the network, such as performing a checksum to double-check content received from another node, the network could be infected.

An inherent drawback of the architecture is the lack of anonymity for peers on the network. Although personally identifiable information will not be shared among ordinary nodes, the P2P service provider will hold a great deal of information about participants: financial records, IP addresses, geographic locations, and more are logged to ensure a high quality of service on the network. The cost of this service is anonymity. It is not feasible to replicate the anonymity of a service like Tor while also ensuring high availability and high quality of service.

import random

def fragment_file(data: bytes, chunk_size: int) -> list[bytes]:
    # Break the file into chunk_size-byte fragments.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def select_replicas(nodes: list[dict], num_replicas: int) -> list[dict]:
    # Select a subset of nodes to hold replicas. A real system would weigh
    # network conditions, load balancing, and locality instead of sampling.
    return random.sample(nodes, min(num_replicas, len(nodes)))

def store_fragments(replica: dict, file_id: str, fragments: list[bytes]) -> None:
    # Store each uniquely named fragment on the replica node; the dict
    # stands in for the node's slice of the Distributed Hash Table (DHT).
    for index, fragment in enumerate(fragments):
        replica[f"{file_id}:{index}"] = fragment

def replicate_popular_files(popular_files: dict[str, bytes], nodes: list[dict],
                            num_replicas: int, chunk_size: int) -> None:
    # Fragment each popular file and push its fragments to several replicas.
    for file_id, data in popular_files.items():
        fragments = fragment_file(data, chunk_size)
        for replica in select_replicas(nodes, num_replicas):
            store_fragments(replica, file_id, fragments)

Figure 4 — Fragmentation and replication of popular files on the network (Python sketch)

4 ADDITIONAL DESIGN CONSIDERATIONS

In order to establish a service that not only has the potential to scale but can also sustain exponential growth, the onboarding and usability of the service must be optimal. This service will initially target technically advanced individuals who are able to install software on a capable laptop or desktop to join the network; even so, the more difficult the service is to set up, the fewer participants the network will have, and the overall quality of service and availability will diminish. The performance indicators by which the P2P service is measured are quality of service and availability, and nobody wants to wrestle with a service that has poor quality, regardless of how impressive the engineering behind it is. The P2P service provider will also need to produce comprehensive, in-depth tutorials that demonstrate the onboarding process for a new node. Additionally, the more control the user has over their node, the better: if they wish to run a node with high computational abilities, it will only benefit the network more. Likewise, someone with modest needs might simply port-forward a laptop on the weekends to stream some content. This functionality must also be supported, since the network is only as strong as its participants.

5 REFERENCES

Doxing. (n.d.). Wikipedia. Retrieved June 12, 2023, from https://en.wikipedia.org/wiki/Doxing

Faughnder, R. (2021, March 18). Streaming Milestone: Global Subscriptions Passed 1 Billion Last Year. Los Angeles Times. https://www.latimes.com/entertainment-arts/business/story/2021-03-18/streaming-milestone-global-subscriptions-passed-1-billion-last-year-mpa-theme-report

He, S., Tang, Q., Wang, G., Wu, C.Q. (2021). Fair Peer-to-Peer Content Delivery via Blockchain. Springer Nature Switzerland. 5(17), 348–369. https://doi.org/10.1007/978-3-030-88418-5_17

IEvangelist, et al. (2022, August 10). Cryptographic Signatures. Microsoft. https://learn.microsoft.com/en-us/dotnet/standard/security/cryptographic-signatures

Jun, G., Chen, C. (2012). A Hybrid P2P Model of a Super-Node Functional Division. Advances in Intelligent and Soft Computing, 1(14), 675–682. https://doi.org/10.1007/978-3-642-03718-4_83

Kangasharju, J., Ross, K.W., Turner, D.A. (2007) Optimizing File Availability in Peer-to-Peer Content Distribution. IEEE Communications Society, 26(1), 1973–1981. https://doi.org/10.1109/INFCOM.2007.229

Kuipers, D., Fabro, M. (2006) Control Systems Cyber Security: Defense in Depth Strategies. Idaho National Lab. (INL), 3(15) 1–30. https://doi.org/10.2172/911553

Lamare, A. (2018, July 18). How Streaming Started: YouTube, Netflix, and Hulu’s Quick Ascent. B2: The Business of Business. https://www.businessofbusiness.com/articles/a-brief-history-of-video-streaming-by-the-numbers/

Mohsin, M. (2022, May 17). 10 YouTube Stats Every Marketer Should Know in 2022. Oberlo. https://www.oberlo.com/blog/youtube-statistics

Ouyang, X., Ye, J., Wang, C., Deng, F. (2011). An Improved Centralized Directory-Based P2P Network Model. Chinese Control and Decision Conference (CCDC), 11(9), 3522–3527. https://doi.org/10.1109/CCDC.2011.5968728

Schwabe, P., Stebila, D., Wiggers, T. (2021). More Efficient Post-quantum KEMTLS with Pre-distributed Public Keys. Springer, Cham, 1(1), 31–51. https://doi.org/10.1007/978-3-030-88418-5_1

Taylor, P. (2023, January 18). Average Internet Connection Speed in the United States from 2007 to 2017 (in Mbps), by Quarter. Statista. https://www.statista.com/statistics/616210/average-internet-connection-speed-in-the-us/

APPENDIX A

Figure 5 — The TLS protocol: the handshake between client and server

TLS Encryption: Server (and optionally client) authentication is provided by digital signatures. Long-term signature public keys are exchanged in certificates during the handshake. The most commonly used signature algorithm is RSA, although elliptic curve signatures are also supported (Schwabe et al., 2021).

APPENDIX B

Figure 6 — The signing and verification protocols of digital signatures
