Demystifying the Great Firewall of China.

Daniel Chepenko
Jul 21, 2020 · 24 min read

Two years ago I came to study in Hong Kong, and since then I have been traveling to China from time to time. Later I decided to stay in Shanghai for a couple of months to study Chinese and learn more about Chinese tech. I was fascinated by the rapid technological advances that play a leading role in the country's economic growth. However, I wanted to eliminate the information asymmetry and find an unbiased view on some problems. The first subject of my research was the mysterious and fully opaque Great Firewall of China (防火长城).
Rather than simply purchasing a commercial VPN service, I decided to figure out how the most sophisticated content filtering system works under the hood. This post is an overview of the technicalities of the censorship machine. You may find some content overlapping with other sources that I used while working on this post.

Disclaimer #1: I’m not a native English speaker, so if you find any typos or grammatical mistakes, I would appreciate your corrections.
Disclaimer #2: I don’t consider myself a system administrator or a network engineer, although I do have some knowledge of these topics. This post was written out of pure curiosity, and I had to learn some things from scratch. Again, if you find any technical inaccuracies, I would appreciate your comments.

Historical overview

In 1998 the Chinese Ministry of Public Security (MPS) started working on the Golden Shield Project (GSP). The first phase of the project lasted eight years and was completed in 2006. The second phase began in 2006 and ended in 2008. Building on phase I, phase II expanded the types of information applications used in public security work and further digitized public security information. The key points of this project included application system construction, system integration, the expansion of information centers, and information construction in central and western provinces. The project made its first public appearance only in 2000, during a trade show held in Beijing.

The Golden Shield Project should not be confused with the Great Firewall of China (GFW). The Golden Shield Project originally aimed to build an intranet for the police and had nothing to do with content censorship. Wikipedia claims that the GFW is considered a part of the Golden Shield Project, which is not accurate.

The Great Firewall is a surveillance and censorship project that filters incoming data from abroad that the Ministry of Public Security restricts. The GSP, by contrast, includes a security management information system, a criminal information system, an exit and entry administration information system, a supervisor information system, and a traffic management information system, among others.

The Internet arrived in China in 1994. In 1995 Internet availability began to rise gradually with the first internet cafes. Soon after, the Internet became a common communication platform.

1998, US President Bill Clinton visits an Internet bar in Shanghai on Shansilu Street. Photo: news.163.com

By 2009 the internet penetration rate had soared to 28.8%. Such an unexpected pace necessitated various adjustments to the initial vision of the Golden Shield Project.

The Chinese government has described censorship as the method to prevent and eliminate “risks in the ideological field from the Internet”.

Selected restricted mobile apps. Sorry for making these screenshots in Russian.

Over the last decade, the percentage of Chinese using the internet has more than doubled, rising from 22.6 percent in 2008 to 59.6 percent in 2018.

What makes the Great Firewall of China so effective and controversial is not only its complex technology but also the culture that the system engenders — a culture of self-censorship. Moreover, everyone self-censors in order not to get censored by moderators (e.g. tech companies). The Chinese government mandates that companies be responsible for their public content. In other words, it is the job of these companies to make sure that their online portals do not contain any prohibited topics or obscenities.

Technical implementation

The Internet backbone consists of many networks owned by numerous companies. The backbone providers sell their services to Internet Service Providers (ISPs), which enable worldwide connectivity at different levels of scope. The Tier-1 providers exchange traffic directly with each other via very high-speed fiber-optic cables, governed by peering agreements. The Tier-2 providers buy Internet transit from Tier-1 providers and enable access to at least some parts of the global Internet.

Fiber optic channel between America and Europe

An important characteristic of the Chinese internet is that online access routes are owned by the government, and private enterprises and individuals can only rent bandwidth from the state. The first four major national networks, namely CSTNET, ChinaNet, CERNET, and CHINAGBN, are the “backbone” of the mainland Chinese internet. Later, six major providers emerged in China:

  1. China Academy of Information and Communications Technology;
  2. China Telecommunications Corporation (aka China Telecom);
  3. China Mobile Communications Corporation (aka China Mobile);
  4. China United Network Communications Group Co., Ltd. (aka China Unicom);
  5. China Radio and Television Network Co., Ltd.;
  6. CITIC Networks Co., Ltd.

Before the rise of the mobile internet, the two largest were China Telecom and China Unicom. Since China Telecom and China Unicom acted as the sole Internet service providers in China for some time, smaller companies cannot compete with them in negotiating the interconnection settlement prices that keep the Internet market profitable in China. The conditions for all the backbone networks to adopt bill-and-keep peering are not yet mature.

The existing interconnection settlement method has caused a major dispute in China. The access system and the interconnection architecture have given the dominant operators (China Telecom and China Unicom) greater market power. Internet service providers without a nationwide network could not compete with their own bandwidth providers, the telecom companies, and often went out of business. The dominant operators can set their own prices below or equal to the standard, which provides the conditions for discriminatory pricing in interconnection. However, from July 1, 2020, the regulator, the Ministry of Industry and Information Technology (MIIT), will scrap the longstanding fee-charging model for traffic between the big networks. This legacy practice has favored China Unicom and China Telecom, which are said to impose hefty fees on China Mobile for delivering traffic.

China Mobile is now the biggest one with over 175 million users. They have managed to acquire tens of millions of new users in the past 1–2 years.

China and the rest of the world communicate via a small number of fiber-optic cables that enter the country at one of ten backbone access points. Before 2015, the first level of exchange points, the “National Level”, comprised the only points connected to the global Internet:

  1. Beijing,
  2. Shanghai,
  3. Guangzhou

The second, “Core Level” of exchange points included five additional nodes:

  1. Shenyang,
  2. Xi’an,
  3. Chengdu,
  4. Wuhan,
  5. Nanjing.

These core nodes bridged the Internet connection between the three main hub cities (Beijing, Shanghai, and Guangzhou) and the third level of “Metropolitan Area Network”.

In 2015 China added seven new hubs, in the cities of Chengdu, Wuhan, Xi’an, Shenyang, Nanjing, Chongqing, and Zhengzhou. These formerly “Core Level” points were promoted to “National Level,” easing the burden on the original three and giving a tremendous boost to the country’s Internet capabilities.

https://www.iozoom.com/client/announcements/18/China-Telecom-CN2-Added-to-LA-Network.html

Internet filtering takes place at two levels: first at the national ISPs, and second at local internet providers. Local ISPs censor content under the supervision of the local telecom authorities. Since local ISPs don’t have access to the global network, they don’t filter cross-border traffic themselves but pass it on via “National Level” routes.

In the next section, I’ll dive into the particular methods behind the major technical elements that comprise the GFW.

Packet dropping scheme

IP blocking is a particularly lightweight method. The Chinese authorities maintain a “blacklist” of IP addresses of restricted websites and foreign DNS servers. However, some IPv6 public DNS servers have not been banned yet and are accessible from IPv6 networks. GFWlist is a list of domain names that have been blocked by the GFW. It is meant for users of rule-based censorship circumvention tools, such as the Proxy SwitchyOmega extension (not a circumvention tool by definition, but that is how people in China use this and similar extensions). There was also a list of IPs, called the GFW hosts file, but no one uses it anymore.

The IP blocking scheme is quite standard and relies on null routing. All packets sent to IP addresses on the “blacklist” are simply dropped. The GFW injects routing information into BGP (Border Gateway Protocol) and hijacks all traffic to blocked websites.

https://queue.acm.org/detail.cfm?id=2405036

Null routing adds only a tiny load to the gateway routers of ISPs. However, this type of blocking can be bypassed simply by setting up a proxy outside of China or by moving the website to another IP address. China also runs the risk of accidentally leaking these null routes to neighbouring ISPs outside the country, and the scheme suffers from overblocking, since many websites can share the same IP address. The GFW hijacks all traffic to blocked websites by announcing the routing prefixes (networks) to Chinese ISPs via BGP. If a Chinese ISP (for example, China Telecom) re-announces these prefixes to its neighbour ISPs outside China, and its neighbours accept these prefixes, then the neighbour ISPs could redirect the traffic meant for these blacklisted websites to the GFW.
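To make the null-routing idea concrete, here is a minimal sketch of the forwarding decision a border router makes. The blacklist entries are made-up documentation prefixes (not real GFW data), and the names `NULL_ROUTES` and `next_hop` are hypothetical, chosen only for this illustration.

```python
import ipaddress

# Hypothetical blacklist. These are RFC 5737 documentation ranges, not real GFW entries.
NULL_ROUTES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def next_hop(dst: str, default_gateway: str = "upstream") -> str:
    """Return a null0 marker (packet dropped) if dst is blacklisted, else the gateway."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in NULL_ROUTES if addr in net]
    if matches:
        # Longest-prefix match: the most specific route wins, as in a routing table.
        best = max(matches, key=lambda net: net.prefixlen)
        return f"null0 (dropped, matched {best})"
    return default_gateway
```

Because the decision is a plain prefix lookup on the destination address, it is cheap for the router, and it also shows why overblocking happens: every site hosted on a blacklisted address is dropped together.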

DNS Injection

The GFW can monitor each DNS query that traverses it, from DNS resolvers around the globe, and in the case of a sensitive query inject a fake DNS reply with an invalid IP address. The server can spoof the associated IP address, any CNAMEs related to the domain, and the existence of the domain itself. Some researchers found that the DNS injection deployment occurs only at the edge, but others indicated that it also occurs within the domestic network.

https://blog.thousandeyes.com/deconstructing-great-firewall-china/

As you know, DNS primarily uses the User Datagram Protocol (UDP) on port 53 to serve requests, so there is no way to make sure the packet successfully reached its destination. DNS was designed in the 1980s when the Internet was much smaller, and security was not a primary consideration in its design. As a result, when a recursive resolver sends a query to an authoritative name server, the resolver has no way to verify the authenticity of the response. The resolver can only check that the response appears to come from the same IP address to which it sent the original query.
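The weakness can be illustrated with a simplified model of a plain (non-DNSSEC) resolver. This is only a sketch: the class and function names are hypothetical, the addresses are documentation ranges, and a real resolver also checks details such as the source port. The point is that every field it compares is visible to, and can be copied by, an on-path injector, so the first matching reply wins.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Query:
    server_ip: str  # where the query was sent
    txid: int       # 16-bit DNS transaction ID
    name: str


@dataclass(frozen=True)
class Response:
    src_ip: str     # trivially spoofable over UDP
    txid: int       # readable by anyone who saw the query
    name: str
    answer: str


def accept(query: Query, arrivals: Iterable[Response]) -> Optional[Response]:
    """Return the first response (in arrival order) that matches the query."""
    for resp in arrivals:
        if (resp.src_ip, resp.txid, resp.name) == (query.server_ip, query.txid, query.name):
            return resp  # later responses, including the genuine one, are ignored
    return None
```

If a forged response arrives before the genuine one, `accept` returns the forged answer, which is exactly the race an injector positioned closer to the client tends to win.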

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6814824&tag=1

Apart from DNS spoofing, GFW routers are able to block communication by hijacking DNS queries and injecting forged DNS replies. Injected fake DNS A record responses will successfully block sites even when users use third-party DNS resolvers outside the country, since the Great Firewall will still answer queries sent to those resolvers.

Although such techniques can be powerful, unintended consequences may occur. Because the GFW doesn’t distinguish between incoming and outgoing traffic, it may cause collateral damage, affecting communication beyond the censored networks when outside DNS traffic traverses a censored link. This can happen for two reasons:

  • Some Chinese ISPs are transit autonomous systems that provide connections for other ISPs and relay traffic among them, especially those in East Asia and Europe.
  • Several root servers (F, I, J) are hosted in China. (The list of current DNS root servers is available at http://www.root-servers.org.) The ISPs hosting the mirrors of root servers announce their prefixes to neighbor ISPs, such as those in KR (South Korea) or DE (Germany), so the DNS resolvers in these ISPs will direct their DNS queries to the root servers in China.

For example, if a user in South Korea (KR) wants to access the website www.sensitive.de, where sensitive is a domain name blocked by the GFW, then the user’s DNS server (recursive resolver) will send out a series of queries to the root server (“.”), the TLD server (“.de”), and the authoritative name server (“sensitive.de”), each with the full domain name (“www.sensitive.de”). If the user’s ISP selects one of the root servers in China or routes the query to a TLD (top-level domain) server or to an authoritative name server through China, then the GFW will censor this access.

https://queue.acm.org/detail.cfm?id=2405036

One of the ways to cope with polluted DNS answers is to use DNSSEC validation. Today, almost all major domain name registrars and name servers support DNSSEC. DNSSEC adds two important features to the DNS protocol:

  • Data origin authentication allows a resolver to cryptographically verify that the data it received actually came from the zone where it believes the data originated.
  • Data integrity protection allows the resolver to know that the data hasn’t been modified in transit since it was originally signed by the zone owner with the zone’s private key.

Content Inspection Schemes and TCP Reset

Compared to UDP, TCP is a connection-oriented protocol, which means that the communicating devices must establish a connection before transmitting data and close the connection afterwards. TCP guarantees the delivery of data to the destination and provides an extensive error-checking mechanism. To signal the state of a transfer, a TCP connection relies on TCP flags. I will use some of these flags later, so now is a good time to review them:

SYN - The synchronization flag is used as a first step in establishing a three-way handshake between two hosts. Only the first packet from both the sender and receiver should have this flag set. The following diagram illustrates a three-way handshake process.

https://www.keycdn.com/support/tcp-flags

ACK - The acknowledgment flag is used to acknowledge the successful receipt of a packet. As we can see from the diagram above, the receiver sends an ACK as well as a SYN in the second step of the three-way handshake process to tell the sender that it received its initial packet.

RST - The reset flag gets sent from the receiver to the sender when a packet is sent to a particular host that was not expecting it. In most packets, this bit is set to 0 and has no effect; however, if this bit is set to 1, it indicates to the receiving computer that the computer should immediately stop using the TCP connection; it should not send any more packets using the connection’s identifying numbers, called ports, and discard any further packets it receives with headers indicating they belong to that connection. A TCP reset basically kills a TCP connection instantly.
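For readers who prefer to see the bits, the following sketch packs and parses a minimal 20-byte TCP header using only the standard library. The helper names `build_header` and `parse_flags` are hypothetical; the checksum is left at zero, so these headers are for illustration and would not be accepted on a real network.

```python
import struct

# Flag bit masks in the 14th byte (index 13) of the TCP header.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def build_header(src_port, dst_port, seq, ack, flags):
    """Pack a minimal 20-byte TCP header (no options, checksum left as zero)."""
    bits = 0
    for name in flags:
        bits |= TCP_FLAGS[name]
    data_offset = 5 << 4  # header length of five 32-bit words, stored in the high nibble
    return struct.pack("!HHIIBBHHH", src_port, dst_port, seq, ack, data_offset, bits, 65535, 0, 0)


def parse_flags(header: bytes):
    """Return the set of flag names that are set in a raw TCP header."""
    bits = header[13]
    return {name for name, mask in TCP_FLAGS.items() if bits & mask}
```

For example, `parse_flags(build_header(80, 53382, 0, 1, ["SYN", "ACK"]))` yields the two flags of the second handshake step, and a header built with `["RST"]` is the kind of packet that tears a connection down.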

Most content inspection schemes route all traffic through a set of proxies that filter the restricted material. However, deploying such a system at country scale would be extremely expensive. Nevertheless, such schemes have previously been employed in Saudi Arabia, Burma, and by specific network providers such as Telenor in Norway.

An alternative approach relies on an IDS (intrusion detection system) that inspects all traffic for restricted content. The most well-known variants are signature-based detection (recognizing bad patterns, such as malware) and anomaly-based detection (detecting deviations from a model of “good” traffic, which often relies on machine learning).

If the IDS detects undesirable content and determines that a connection from a client to a web server is to be blocked, the router injects forged TCP resets (with the RST flag bit set) into the data streams so that the endpoints abandon the connection.

Researchers from Cambridge set up an experiment to access a website located in China. The GFW is known to work symmetrically, detecting content to be filtered as it passes in either direction. Here is the packet trace recorded by the researchers in Cambridge.

    cam(53382) -> china(HTTP) [SYN]
    china(HTTP) -> cam(53382) [SYN, ACK]
    cam(53382) -> china(HTTP) [ACK]
    cam(53382) -> china(HTTP) GET / HTTP/1.0<cr><lf><cr><lf>
    china(HTTP) -> cam(53382) HTTP/1.1 200 OK (text/html)<cr><lf> etc...
    china(HTTP) -> cam(53382) ... more of the web page
    cam(53382) -> china(HTTP) [ACK]
    ... and so on until the page was complete

    // We then issued a request which included a small fragment of text that we expected to cause the connection to be blocked, and this promptly occurred:

    cam(54190) -> china(HTTP) [SYN]
    china(HTTP) -> cam(54190) [SYN, ACK] TTL=39
    cam(54190) -> china(HTTP) [ACK]
    cam(54190) -> china(HTTP) GET /?falun HTTP/1.0<cr><lf><cr><lf>
    china(HTTP) -> cam(54190) [RST] TTL=47, seq=1, ack=1
    china(HTTP) -> cam(54190) [RST] TTL=47, seq=1461, ack=1
    china(HTTP) -> cam(54190) [RST] TTL=47, seq=4381, ack=1
    china(HTTP) -> cam(54190) HTTP/1.1 200 OK (text/html)<cr><lf> etc...
    cam(54190) -> china(HTTP) [RST] TTL=64, seq=25, ack zeroed
    china(HTTP) -> cam(54190) ... more of the web page
    cam(54190) -> china(HTTP) [RST] TTL=64, seq=25, ack zeroed
    china(HTTP) -> cam(54190) [RST] TTL=47, seq=2921, ack=25

As you can see, the first three reset packets carry sequence numbers that correspond to the sequence number at the start of the GET packet, that value plus 1460, and that value plus 4380 (3 × 1460). The fourth spoofed reset packet arrives without a corresponding ACK number, which would suppress the connection in cases where non-standard packet lengths are received on systems that will accept a reset without an ACK number. It seems that the firewall sends several different values to ensure that the reset is accepted even if the sender has already received ACKs for full-size (1460-byte) packets.
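One way to read the blocked trace above is to compare TTL values: the genuine SYN/ACK arrives with TTL=39, while the resets arrive with TTL=47, which suggests they were sent from a different point on the path. The sketch below applies that check to the values from the trace. The `Packet` type and the `suspicious_resets` helper are hypothetical names, and a TTL mismatch is only a hint, not proof of injection.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    flags: str
    ttl: int
    seq: int


def suspicious_resets(handshake_ttl: int, packets: List[Packet], tolerance: int = 0) -> List[Packet]:
    """Return RST packets whose TTL differs from the TTL seen on the server's SYN/ACK."""
    return [
        p for p in packets
        if "RST" in p.flags and abs(p.ttl - handshake_ttl) > tolerance
    ]


# RST packets received by the Cambridge client, copied from the blocked trace.
trace = [
    Packet("RST", 47, 1),
    Packet("RST", 47, 1461),
    Packet("RST", 47, 4381),
    Packet("RST", 47, 2921),
]
```

Calling `suspicious_resets(39, trace)` flags all four resets. Note also that the sequence numbers are spaced by multiples of 1460 bytes, the usual TCP payload size on an Ethernet path, which fits the idea that the sender is guessing how much data is already in flight.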

The same exchange was also recorded at the Chinese web server:

    cam(54190) -> china(HTTP) [SYN] TTL=42
    china(HTTP) -> cam(54190) [SYN, ACK]
    cam(54190) -> china(HTTP) [ACK] TTL=42
    cam(54190) -> china(HTTP) GET /?falun HTTP/1.0<cr><lf><cr><lf>
    china(HTTP) -> cam(54190) HTTP/1.1 200 OK (text/html)<cr><lf> etc...
    china(HTTP) -> cam(54190) ... more of the web page
    cam(54190) -> china(HTTP) [RST] TTL=61, seq=25, ack=1
    cam(54190) -> china(HTTP) [RST] TTL=61, seq=1485, ack=1
    cam(54190) -> china(HTTP) [RST] TTL=61, seq=4405, ack=1
    cam(54190) -> china(HTTP) [RST] TTL=61, seq=25, ack=1
    cam(54190) -> china(HTTP) [RST] TTL=61, seq=25, ack=2921
    cam(54190) -> china(HTTP) [RST] TTL=42, seq=25, ack zeroed
    cam(54190) -> china(HTTP) [RST] TTL=42, seq=25, ack zeroed

Once the “bad” content is detected, the firewall also sends resets to the Chinese machine, but these resets arrive after the GET request.

https://blog.thousandeyes.com/deconstructing-great-firewall-china/

As I’ve shown above China filters only the first HTTP GET request in a TCP stream, likely for the sake of efficiency. It does so by maintaining state. In addition, after filtering an HTTP request, it maintains flow state about the source and destination IP addresses, port number, and protocol of the denied request to deny further communication between the same pair of machines even when such communication would not previously have been blocked. Because the Great Firewall doesn’t stop packets from traveling to their destinations, it’s very possible that one or multiple legitimate responses from the destination web server make their way back to the client before the TCP reset arrives. As a result, blocking takes the form of multiple spoofed TCP reset packets, each slightly different in an attempt to ensure that the client terminates the TCP connection in all possible cases. In the majority of connections, four spoofed packets are returned, each with a different sequence and acknowledgment number.
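The stateful behaviour described above can be sketched as a small table keyed by the flow. This is a toy model, not the GFW's implementation: the class name `FlowBlocker`, the keyword list, and the block duration are assumptions made for illustration.

```python
import time
from typing import Dict, Optional, Tuple

# (source IP, destination IP, destination port, protocol)
FlowKey = Tuple[str, str, int, str]


class FlowBlocker:
    def __init__(self, keywords, block_seconds: float = 90.0):
        # block_seconds is an assumed value, used for illustration only.
        self.keywords = [k.lower() for k in keywords]
        self.block_seconds = block_seconds
        self.blocked_until: Dict[FlowKey, float] = {}

    def should_reset(self, key: FlowKey, first_request: bytes, now: Optional[float] = None) -> bool:
        """Decide whether forged resets would be sent for this request."""
        now = time.monotonic() if now is None else now
        if self.blocked_until.get(key, 0.0) > now:
            return True  # the pair is still penalised, whatever the content
        text = first_request.decode("latin-1").lower()
        if any(k in text for k in self.keywords):
            self.blocked_until[key] = now + self.block_seconds
            return True
        return False
```

With this model, a harmless request between the same two machines is also reset while the entry is live, which matches the observation that communication is denied “even when such communication would not previously have been blocked”.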

https://blog.thousandeyes.com/deconstructing-great-firewall-china/

The Tor network

Before going into details about the blocking of the Tor project, I’d like to give a short note about active probing.

The probing works by passively monitoring the network for suspicious traffic, then actively probing the corresponding servers, and blocking any that are determined to run circumvention servers such as Tor.

https://ensa.fi/active-probing/imc2015.pdf

A large number of so-called entry guards and bridge relays serve as the entry points to the network. If these entry points are not reachable, a user finds herself unable to connect to the Tor network. While the relays anonymize the network traffic of Tor clients, the directory authorities’ task is to keep track of all relays and to vote on and publish the network consensus, which Tor clients need in order to bootstrap. It is trivial for censors to download the hourly published network consensus and block all IP address/TCP port pairs found in it.
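To show how little effort this takes, here is a sketch that extracts IP address and port pairs from consensus-style text. The sample is made up, and the field layout of the “r” lines (nickname, identity, digest, date, time, IP, ORPort, DirPort) is my understanding of the full consensus format, so treat it as an assumption; the function name `relay_endpoints` is hypothetical.

```python
from typing import List, Tuple

# Made-up sample in the style of consensus router status entries:
# r <nickname> <identity> <digest> <date> <time> <IP> <ORPort> <DirPort>
SAMPLE_CONSENSUS = """\
r relayone AAAAAAAAAAAAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBBBBBBBBBBBB 2020-07-21 10:00:00 192.0.2.10 9001 9030
s Fast Running Stable Valid
r relaytwo CCCCCCCCCCCCCCCCCCCCCCCCCCC DDDDDDDDDDDDDDDDDDDDDDDDDDD 2020-07-21 10:05:00 198.51.100.20 443 0
s Guard Running Valid
"""


def relay_endpoints(consensus_text: str) -> List[Tuple[str, int]]:
    """Collect (IP, ORPort) pairs from every 'r' line in the text."""
    endpoints = []
    for line in consensus_text.splitlines():
        fields = line.split()
        if len(fields) >= 9 and fields[0] == "r":
            endpoints.append((fields[6], int(fields[7])))
    return endpoints
```

The resulting list could be fed straight into an IP/port blacklist, which is why public relays are easy to block and why unlisted bridges were introduced.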

The GFW has a long and difficult relationship with the Tor project. For example, using DNS hijacking, all traffic to the Tor Project website was redirected to a pet grooming website in Florida.

In response to the blocking of its relays, the operators of the Tor network began to reserve a portion of new relays as secret, non-public “bridges.” Unlike ordinary relays, bridges are not easily enumerable. The GFW uses IP spoofing to scan for Tor bridges. The basic functionality of the Chinese Tor blocking infrastructure is the following.

https://arxiv.org/pdf/1204.0447v1.pdf

When a Tor user in China establishes a connection to a bridge or relay, the IDS recognizes the Tor TLS handshake.

Fingerprinting the Tor TLS Handshake:

● TLS handshake is unencrypted and leaks information

● Tor’s use of TLS has some peculiarities:

● GFW looks (at least) for cipher suites in the TLS client hello

GFW is probing v2/v3 bridges based on the Tor cipher list. Tor is using 15 static ciphers (src/common/ciphers.inc) for the SSL ClientHello of the v2/v3 link handshakes and GFW seems to get agitated by them.

Shortly after a Tor connection is detected, active scanning is initiated. The scanning is done by seemingly random Chinese IP addresses. The scanners connect to the respective bridge and try to establish a Tor connection. If this succeeds, the bridge is blocked. Researchers noticed that active scanning is done at multiples of 15 minutes. Tor filtering is probably only done at Chinese border ASes and only on traffic going from inside China to the outside world. Interestingly, once a bridge is detected and blocked by the GFW, it remains blocked but is not completely unreachable. The researchers found that every 25 hours, for a short period of time, their Tor clients in China were able to connect to their bridges.

After early efforts to make their use of TLS less conspicuous, the developers of Tor settled on a more sustainable strategy: wrapping the entire Tor TLS stream in another layer, a “pluggable transport”, that assumes responsibility for protocol-level obfuscation. The Tor Project is developing a tool called obfsproxy. The tool runs independently of Tor and obfuscates the network traffic it receives from the Tor process. Due to the obfuscation, the GFW is not able to identify the TLS cipher suites. Obfsproxy implements so-called pluggable transports, meaning that the precise method of obfuscation is determined by pluggable transport modules. As a result, one could implement a transport module for HTTP whose purpose is to make Tor traffic look similar to HTTP.

http://netseminar.stanford.edu/seminars/04_28_16.pdf

Another option is to use Tor with meek. Meek uses a technique called “domain fronting” to send a message to a Tor relay in a way that is hard to block. Domain fronting is the use of different domain names at different communication layers. The meek-client program builds a special HTTPS request and sends it to an intermediate web service with many domains behind it, such as a CDN. Meek uses domain fronting to evade scrutiny by the GFW.
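The “different names at different layers” idea can be sketched without any networking. In the toy model below, the front domain goes into the TLS SNI field, which is visible on the wire, while the real destination goes into the HTTP Host header, which travels inside the encrypted tunnel. All names (`FrontedRequest`, `build_fronted_request`, the example domains) are hypothetical, and this is not meek's actual code.

```python
from typing import NamedTuple


class FrontedRequest(NamedTuple):
    sni: str             # sent in cleartext in the TLS ClientHello
    http_request: bytes  # travels inside the encrypted TLS tunnel


def build_fronted_request(front_domain: str, hidden_host: str, path: str = "/") -> FrontedRequest:
    """Pair an innocuous SNI with an HTTP request addressed to the hidden host."""
    request = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {hidden_host}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("ascii")
    return FrontedRequest(sni=front_domain, http_request=request)


def visible_to_censor(req: FrontedRequest) -> str:
    """An on-path observer without the TLS keys sees the SNI, not the Host header."""
    return req.sni
```

For `build_fronted_request("allowed-cdn.example", "bridge.example")`, the observer sees only `allowed-cdn.example`; blocking it would mean blocking every site behind that front, which is the collateral damage domain fronting relies on.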

VPN

Although the GFW has no way of interpreting encrypted content between the user and the VPN server, it understands popular VPN protocols well enough. OpenVPN won’t even connect, because the GFW can detect the OpenVPN TLS handshake and block it. OpenVPN over SSL/UDP doesn’t work either. The GFW applies heuristics to detect TCP/UDP connections used for VPNs and simply drops them. The simplest heuristic: 99% of the traffic goes to a single address, it is encrypted, and it contains the same headers. You can even apply statistical learning techniques to detect such traffic. The difficult part is labeling the VPN connections. But I believe that this is not a huge problem, because Internet censorship in China doesn’t rely on the GFW alone but also on a huge army of content moderators.
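The “simplest heuristic” can be written down in a few lines. The sketch below only covers the traffic-concentration part (share of bytes going to one address); the 0.99 threshold comes from the text, while the function names and the flow representation are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def top_destination_share(flows: Iterable[Tuple[str, int]]) -> Tuple[str, float]:
    """flows: (destination IP, bytes). Return the busiest destination and its byte share."""
    totals = Counter()
    for dst, nbytes in flows:
        totals[dst] += nbytes
    total = sum(totals.values())
    if total == 0:
        raise ValueError("no traffic observed")
    dst, nbytes = totals.most_common(1)[0]
    return dst, nbytes / total


def looks_like_tunnel(flows: Iterable[Tuple[str, int]], threshold: float = 0.99) -> bool:
    """Flag a client whose traffic is almost entirely sent to a single address."""
    _, share = top_destination_share(flows)
    return share >= threshold
```

A real classifier would combine this with other signals such as packet sizes and timing, since a single feature like this one would also flag, for example, a long video stream from one server.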

Moreover, Apple has removed VPN clients from its app store in China; affected users must follow a convoluted process to access the U.S. app store.

Shadowsocks

Shadowsocks is one of the most popular circumvention tools in China. Typically, the client software opens a SOCKS5 proxy on the machine it runs on, to which internet traffic can then be directed, similarly to an SSH tunnel. Unlike an SSH tunnel, Shadowsocks can also proxy UDP traffic. The GFW passively monitors the network for suspicious connections that may be Shadowsocks, then actively probes the corresponding servers to test whether its guess is correct. The active probing system sends a variety of probe types. Some are based on replays of previously recorded, genuine Shadowsocks connections, while others bear no apparent relation to previous connections. A detailed report can be found here.

Anyone familiar with the history of Shadowsocks should know that it started as software that clowwindy developed for personal use. In 2015 the CCP forced him to remove the Shadowsocks code from GitHub.

V2Ray

Project V is a set of tools to help you build your own private network over the internet. The core of Project V, named V2Ray, is responsible for network protocols and communications. It can work alone, as well as combine with other tools.

V2Ray supports multiple protocols, including Socks, HTTP, Shadowsocks, VMess, etc. Each protocol may have its own transport, such as TCP, mKCP, WebSocket, etc. Also, V2Ray has built-in obfuscation to hide traffic in TLS and can run in parallel with web servers.

The difference is that Shadowsocks is just a simple proxy tool built around a single encrypted protocol, whereas V2Ray is designed as a platform, and any developer can use the modules provided by V2Ray to develop new proxy software. Simply put, Shadowsocks is a single proxy protocol, and V2Ray is more complex than a single-protocol proxy.

HTTPS and TLS

When you connect to an HTTPS website, the hostname of the website you are connecting to is transmitted over the network in cleartext as part of the TLS handshake. The server’s certificate always contains the hostname, because that’s how the server authenticates itself to the client.

China has a fairly odd relationship with SSL/TLS. Many of the websites there do have SSL certificates installed, but the browsers in China don’t require users to actually use them. The Chinese government is able to request and use the root certificate of any Chinese certificate authority.

While TLS is securing the content of the traffic to and from the site, the TCP/IP transport layers are still open to inspection. GFW only knows the meta-data supplied by TCP/IP protocol.

An SSL handshake can add 300 ms to 1000 ms to a page load. This additional time can make or break a site’s usability in an outlying province. So it makes sense that sites served over unstable connections would prefer not to add SSL.

I made a small script to access ipconfig.com from Seoul and Shanghai via HTTP and HTTPS. You can find detailed data on my GitHub. Here, I plotted the distributions for HTTPS and HTTP requests in Shanghai, and for HTTPS requests in Shanghai and Seoul.

It is worth noting that some people rely on an HTTPS proxy, which is not a good idea. An HTTPS proxy uses a special TLS-in-TLS scheme that is quite different from a normal HTTPS connection and is so uncommon that most browsers do not support it.

Encrypted DNS

I mentioned DNSSEC earlier; here I want to describe DNS security in more depth.

DNSSEC

Protocol stack
--------
DNSSEC
--------
UDP
--------
IP
--------

DNSSEC strengthens authentication in DNS using digital signatures based on public key cryptography. With DNSSEC, it’s not DNS queries and responses themselves that are cryptographically signed; rather, the DNS data itself is signed by the owner of the data. Every DNS zone has a public/private key pair. The zone owner uses the zone’s private key to sign DNS data in the zone and generate digital signatures over that data. As the name “private key” implies, this key material is kept secret by the zone owner.
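The sign-then-verify idea can be illustrated with a toy example. To stay within the standard library, the sketch uses textbook RSA over a key built from two well-known Mersenne primes. That is far too weak for real use, and it is not the record format or algorithm set that DNSSEC actually uses; the names `sign` and `verify` and the sample record are hypothetical.

```python
import hashlib

# Toy key pair built from two Mersenne primes. Illustration only, not secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def _digest(record: bytes) -> int:
    """Hash the record data and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(record).digest(), "big") % N


def sign(record: bytes) -> int:
    """Zone owner: sign the record data with the private key."""
    return pow(_digest(record), D, N)


def verify(record: bytes, signature: int) -> bool:
    """Resolver: check the signature using only the public key (N, E)."""
    return pow(signature, E, N) == _digest(record)
```

A resolver holding the zone's public key accepts `verify(record, sign(record))`, but an injected answer with a different address fails verification, because the injector cannot produce a matching signature without the private key.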

DNSSEC validation in the six largest Asian economies.

The picture in China is somewhat surprising, with 7% of Chinese users having their DNS queries resolved through Google’s Public DNS service, a slightly higher figure than the share of users behind DNSSEC-validating resolvers.

DNS over TLS (DoT) and DNS over HTTPS (DoH) are two standards developed for encrypting plaintext DNS traffic in order to prevent malicious parties, advertisers, ISPs, and others from being able to interpret the data. DoH encrypts DNS queries, which are disguised as regular HTTPS traffic, hence the DNS-over-HTTPS name. These DoH queries are sent to special DoH-capable DNS servers (called DoH resolvers), which resolve the DNS query inside a DoH request and reply to the user, also in an encrypted manner. Each standard was developed separately and has its own RFC documentation, but the most important difference between DoT and DoH is the port they use. DoT only uses port 853, while DoH uses port 443, which is the port that all other HTTPS traffic uses as well.
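As a rough illustration of what a DoH request carries, the sketch below encodes a DNS query in wire format and places it in the `dns` parameter of a GET URL, base64url-encoded without padding, as described in RFC 8484. Nothing is sent over the network. The resolver URL used in the example is a placeholder, and the function names are hypothetical.

```python
import base64
import struct

DOT_PORT, DOH_PORT = 853, 443  # DoT has its own port; DoH shares 443 with HTTPS


def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query in wire format (qtype 1 = A record)."""
    # ID 0, flags 0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def doh_get_url(resolver_url: str, name: str) -> str:
    """Build a DoH GET URL with the query base64url-encoded and unpadded."""
    encoded = base64.urlsafe_b64encode(build_dns_query(name)).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"
```

For example, `doh_get_url("https://doh.example/dns-query", "example.com")` produces a URL that, to an on-path observer, is just another HTTPS request to port 443 once it is sent over TLS.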

DoT

Protocol stack
--------
DoT
--------
TLS
--------
TCP
--------
IP
--------

DoT is specified in RFC 7858 and RFC 8310. RFC 7858 was released in 2016 and RFC 8310 in 2018, so this protocol appeared relatively late. DoT’s chain of trust relies on TLS, and TLS’s chain of trust relies on the CA certificate system. As of 2020, Cloudflare, Quad9, Google, Quadrant Information Security, CleanBrowsing, LibreOps, DNSlify and Telsy are providing public DNS resolver services via DNS over TLS.

DoH

Protocol stack
--------
DoH
--------
HTTP
--------
TLS
--------
TCP
--------
IP
--------

As the name suggests, DNS over HTTPS is a domain name protocol based on the HTTPS tunnel. On February 25, 2020, Firefox started enabling DNS over HTTPS for all US-based users, relying on the Cloudflare resolver. However, I didn’t find evidence that clients in China support these DNS security standards.

IPv6

IPv6 is the next-generation Internet Protocol (IP) address standard intended to supplement and eventually replace IPv4, the protocol many Internet services still use today. The reason behind its development is simple: the original IP addressing scheme, IPv4, is running out of available addresses due to widespread usage. The key difference is that IPv6 uses 128-bit addresses, while IPv4 relies on 32-bit addresses.

Given China’s significant internet user population, the deployment of the new standard there would easily increase global adoption. There are no visible plans for spreading IPv6 across the country yet. Still, we are able to estimate IPv6 adoption in China using measurement services. For example, Google has a measurement page. By January 2020, 29% of users worldwide accessed Google through IPv6.
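The difference in scale between 32-bit and 128-bit addresses is easy to check with the standard `ipaddress` module. The variable and function names below are just for this illustration.

```python
import ipaddress

# Size of the whole address space for each protocol version.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128


def addresses_per_ipv4_address() -> int:
    """How many IPv6 addresses exist for every single IPv4 address."""
    return ipv6_total // ipv4_total
```

The ratio is 2**96, so address exhaustion of the kind IPv4 is experiencing is not a practical concern for IPv6.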

https://www.google.com/intl/en/ipv6/statistics.html#tab=per-country-ipv6-adoption

Google reports the current level of IPv6 use per country. At the end of December, it reported that some 0.3% of users in China used IPv6 to access Google. In June 2019, IPv6 adoption reached its peak at 7.11%, followed by a significant drop to 1.5%.

https://www.vyncke.org/ipv6status/compare.php?metric=k&countries=cn

Google is not the only one to chart IPv6 adoption; so does the CDN provider Akamai. The company publishes the “State of the Internet” report, which includes its calculation of the proportion of Chinese users accessing Akamai services over IPv6.

The time-series plot differs from Google’s. We see a drop of 30%, followed by steady growth back to the initial adoption rate.

https://www.vyncke.org/ipv6status/compare.php?metric=k&countries=cn

A similar trend is captured by the Internet address registry APNIC, which uses a measurement script embedded in an online advertising campaign. When the ad is delivered to the browser, the embedded script is activated. It is worth noting that ad-campaign measurement has a large number of variable factors, such as ad placement, concurrent campaigns, and the popularity of the apps that embed the online ads.
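As a rough illustration of how such ad-based samples could be turned into an adoption figure, here is a minimal sketch. It assumes each ad impression records whether the client managed to fetch a test URL that is reachable only over IPv6. The `AdSample` class, its fields, and the sample data are all hypothetical and are not APNIC's actual format or numbers.

```python
from dataclasses import dataclass


@dataclass
class AdSample:
    """One ad impression; the field names are hypothetical."""
    country: str
    fetched_v6_only: bool  # True if the client fetched the IPv6-only test URL


def ipv6_capable_share(samples: list[AdSample], country: str) -> float:
    """Return the fraction of samples from `country` that completed the IPv6-only fetch."""
    in_country = [s for s in samples if s.country == country]
    if not in_country:
        raise ValueError(f"no samples for {country}")
    capable = sum(1 for s in in_country if s.fetched_v6_only)
    return capable / len(in_country)


# Made-up samples, purely to exercise the function.
samples = [
    AdSample("CN", True),
    AdSample("CN", False),
    AdSample("CN", False),
    AdSample("CN", False),
    AdSample("HK", True),
]
print(ipv6_capable_share(samples, "CN"))  # 0.25
```

The result depends entirely on which clients happen to see the ad, which is why factors like ad placement and app popularity matter so much for this kind of measurement.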

You can notice the traffic surge starting in 2018 in both the Akamai and APNIC measurements. Many regional networks within the China Mobile group have been undertaking IPv6 deployment. A good example is a network that Cloudflare observed in July 2018: AS9808, Guangdong Mobile. Later, a similar picture was observed in Beijing (AS56048) and Hunan (AS56047).

There is also some visible movement on IPv6 at China Telecom, China’s largest backbone provider. Its network is the largest in China in terms of customers, with an estimated base of 300 million users. IPv6 deployment in this network appears to have commenced in November 2018.

The reason behind the drop in Google's figures is unclear. I failed to find any solid news that could explain such behavior. I have two hypotheses in mind.

  1. Since it is a percentage metric, Google may have adjusted its calculation, for example against the total internet penetration rate in China.
  2. Previously, open discussions between netizens took place in Google Plus groups. In April 2019, Google shut down Google Plus. Technical discussions continue on Chinese-language blogs, forums, and groups. For obvious reasons, discussions must be hosted outside China, and posters must register under pseudonyms. This probably caused some shift away from Google services, but I hardly believe it could cause such a plummet.
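The first hypothesis can be illustrated with simple arithmetic: a percentage can fall even when the number of IPv6 users stays the same, if the denominator changes. The counts below are invented for illustration and are not real measurements.

```python
def ipv6_share(ipv6_users: int, total_users: int) -> float:
    """Percentage of measured users reaching the service over IPv6."""
    return 100 * ipv6_users / total_users


# Hypothetical counts: the same number of IPv6 users, but a larger measured population.
before = ipv6_share(ipv6_users=70, total_users=1_000)
after = ipv6_share(ipv6_users=70, total_users=5_000)
print(before, after)  # 7.0 1.4
```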

China showcased CNGI’s IPv6 infrastructure during the 2008 Summer Olympics, and deployment later became widespread in related applications: security cameras, data networking, and taxis.

CERNET also set up a native IPv6 network (CERNET2), and since then many academic institutions in China have joined CERNET2 for IPv6 connectivity. CERNET2 is probably the widest deployment of IPv6 in China. It is managed and operated jointly by 25 universities.

Other

Censorship is notoriously tight during so-called “sensitive” periods. One example is the two national political meetings (全国两会) in March 2019. Another is June 4, 2019, which marked the thirtieth anniversary of the CCP ordering troops to fire on civilians in Tiananmen Square (六四事件).

I would like to thank my friend Yu Ding for reviewing this text. He is the founder of two great products:

Izumo: a product that helps you set up, in one click, a private VPN that works even under the most restricted Internet access.

GFWaaS: a tool that helps you test your website’s performance in China and gives you a detailed insights report.

Feel free to ask any sort of question in the comments. I want to keep updating this post, so if you read it and find any inaccuracies or you want to add something, please let me know.

Apart from Medium, you can reach me through Twitter or WeChat (@zkid18).

Mobile Asia

Non-regular research blog about tech in Asia

Mobile Asia

Reverse-engineering Asian market

Written by Daniel Chepenko

HKUST BDT, Software engineer intern at AQUMON
