A Request for Comments (RFC) is a type of publication from the Internet Engineering Task Force (IETF) and the Internet Society (ISOC), the principal technical development and standards-setting bodies for the Internet.
An RFC is authored by engineers and computer scientists in the form of a memorandum describing methods, behaviors, research, or innovations applicable to the working of the Internet and Internet-connected systems. It is submitted either for peer review or simply to convey new concepts, information, or (occasionally) engineering humor. The IETF adopts some of the proposals published as RFCs as Internet Standards. — Wikipedia
For a while I’ve been thinking about the corpus of the RFC documents. It’s a compelling set of text files that document the development of the internet from the late 1960s to today. I liked looking at their ASCII diagrams and tables, and noticed they often cited other RFC documents as references or footnotes. I thought about how the documents could be interlinked, possibly showing how one RFC led to another. I envisioned a Lego block-esque sequence of RFCs stacking on one another, but it turns out that, like most systems, the relationships between the documents are much more complex.
I started by downloading the text files and parsing them into a nice dataset. In each of the ~8000 text documents I looked for references citing other RFC documents. Even when I counted a link only if the target RFC was mentioned at least twice in the source RFC, the number of relationships quickly grew to over 45,000 connections between RFC documents. The mean for the corpus was 11 references to other documents, with a median of 4 links.
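Here is a minimal sketch of that counting step. It is not the original script: the regular expression, the function names, and the choice to count distinct linked documents per RFC for the mean and median are all assumptions made for illustration.

```python
import re
from collections import Counter
from statistics import mean, median

# Assumed citation pattern: matches "RFC 2119", "RFC2119", "RFC-2119", "rfc 0791".
RFC_REF = re.compile(r"\bRFC[\s-]?0*(\d{1,5})\b", re.IGNORECASE)


def extract_links(source_number, text, min_mentions=2):
    """Return {target: mentions} for RFCs mentioned at least min_mentions times."""
    counts = Counter(int(m.group(1)) for m in RFC_REF.finditer(text))
    counts.pop(source_number, None)  # a document naming itself is not a link
    return {target: n for target, n in counts.items() if n >= min_mentions}


def build_graph(corpus, min_mentions=2):
    """corpus maps RFC number -> full text; result maps source -> {target: mentions}."""
    return {
        number: extract_links(number, text, min_mentions)
        for number, text in corpus.items()
    }


def summarize(graph, top=10):
    """Mean and median links per document, plus the most-linked targets."""
    links_per_doc = [len(targets) for targets in graph.values()]
    incoming = Counter(t for targets in graph.values() for t in targets)
    return mean(links_per_doc), median(links_per_doc), incoming.most_common(top)
```

Calling `summarize(build_graph(corpus))` on a dictionary of RFC texts returns the two averages and a ranked list of targets. A looser or stricter pattern would change the totals, so treat any numbers it produces as dependent on that choice.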
The most referenced RFCs were procedural texts. For example, the most linked RFC, with almost 4,000 references, was #2119, “Key words for use in RFCs to Indicate Requirement Levels” (1997), which usefully defines the words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” within the context of RFC documents.
The top 11 were:
3966 ‘#2119 — Key words for use in RFCs to Indicate Requirement Levels (1997)’
2129 ‘#5741 — RFC Streams, Headers, and Boilerplates (2009)’
535 ‘#5226 — Guidelines for Writing an IANA Considerations Section in RFCs (2008)’
467 ‘#3261 — SIP: Session Initiation Protocol (2002)’
430 ‘#0791 — Internet Protocol (1981)’
403 ‘#0822 — STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES (1982)’
362 ‘#0793 — Transmission Control Protocol (1981)’
356 ‘#1035 — Domain names — implementation and specification (1987)’
340 ‘#5234 — Augmented BNF for Syntax Specifications: ABNF (2008)’
324 ‘#2434 — Guidelines for Writing an IANA Considerations Section in RFCs (1998)’
323 ‘#1034 — Domain names — concepts and facilities (1987)’
I wanted to visualize these connections. A network seemed obvious, but I wanted something more ordered that maintains the sequence of publication. I decided to try an arc diagram, but D3.js was not up to the challenge of rendering 8K nodes and 45K links. There is a tension between wanting to show all data points and reducing complexity. I wanted to show every single RFC, so I created an enormous (200K x 10K pixel) static arc diagram in Python and cut it into tiles to allow you to explore:
I colored the connections by decade and labeled each RFC node with number, title and year. The result is a bit of a mess, but there are some interesting patterns. As time progresses you see more interconnections. There are some large nodes around important RFCs for FTP, SMTP, and other fundamental protocols, and there are local areas of clustering around specific topics like LDAP, certificates, or internationalization.
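The geometry behind a diagram like this can be sketched in a few functions. This is an illustration under stated assumptions, not the code that produced the image: the 25 pixel spacing (8,000 nodes times 25 gives roughly 200K pixels), the vertical squash factor that keeps the longest arc within about 10K pixels, the palette, the tile size, and the choice to color by the citing RFC's decade are all hypothetical.

```python
import math

SPACING = 25   # assumed horizontal pixels between consecutive RFCs
Y_SCALE = 0.1  # assumed vertical squash so long arcs stay within the image height
TILE = 256     # assumed tile edge in pixels

# Hypothetical palette keyed by decade; the post does not list its colors.
DECADE_COLORS = {
    1960: "#8c564b", 1970: "#1f77b4", 1980: "#2ca02c",
    1990: "#ff7f0e", 2000: "#d62728", 2010: "#9467bd",
}


def decade_color(year):
    """Color for the decade containing year, grey if the decade is unknown."""
    return DECADE_COLORS.get(year // 10 * 10, "#7f7f7f")


def arc_points(i, j, baseline, steps=32):
    """Half-ellipse above the baseline joining the nodes at positions i and j.

    Positions are indexes in publication order, which preserves the sequence.
    """
    x1, x2 = i * SPACING, j * SPACING
    cx = (x1 + x2) / 2
    rx = abs(x2 - x1) / 2
    ry = rx * Y_SCALE
    return [
        (cx - rx * math.cos(math.pi * k / steps),
         baseline - ry * math.sin(math.pi * k / steps))
        for k in range(steps + 1)
    ]


def tile_boxes(width, height, tile=TILE):
    """Yield (left, top, right, bottom) crop boxes covering the whole image."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield (left, top, min(left + tile, width), min(top + tile, height))
```

Each list of points can be handed to whatever drawing library is in use as a polyline, stroked with `decade_color(year)`, and each box from `tile_boxes` describes one crop of the finished image. Placing nodes by index rather than by date keeps the spacing even, at the cost of hiding gaps between publication years.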
It’s a colorful, if confusing, view into the history of the Internet.
Tomorrow I’ll try it as a network.