How I Couldn’t Stop Poking at Mysterious CompuServe Server Hard Disk Images
Preface
This is about digital archeology. I hope people interested in the legacy of early online services will find it useful. And I hope other digital archeologists more knowledgeable than me will find it and provide additional information. Maybe someone even feels compelled to pick up where I left off?
Please bear in mind that this is the work of just a couple of long winter evenings. My knowledge of traditional mainframes and minicomputers was close to zero before I started this project. I might have misconceived things or overlooked others.
Whatever you have to say about it, don’t forget to tell me! You can leave a comment at the bottom or reach me via email at compuserve (a-t) cmund (d-o-t) de.
Now let’s start at the beginning.
The Digital Antiquarian
I’m a big fan of Jimmy Maher’s “The Digital Antiquarian”, although I never was a passionate gamer. Yes, I did play many games in my early teens. But only very few really caught my interest. It was all about the magical machines running them and the community developing around them. Games just happened to be what pushed things to the limit.
But then The Digital Antiquarian isn’t your typical retro gaming blog. In many articles the games quickly fade into the background making way for stories about the people and the technology that made them actually possible. And that’s where it gets really interesting.
In November 2017 The Digital Antiquarian ran a series of articles called “A Net Before the Web”. Essentially, they were about how online gaming, real-time chats, file downloads, online banking, message forums and online shopping were mostly pioneered by commercial online services like CompuServe long before the web existed.
That’s a huge part of today’s (online) culture. Yet the historical significance of these services is hardly acknowledged today. In fact, much of CompuServe was literally thrown into the dumpster when the web took over[1,2]. Wouldn’t it be interesting to do research on how people communicated online 30+ years ago? Even more so in the light of today’s debate on “hate speech” and social media’s impact on society? Unfortunately, those conversations are gone for good. Or are they not?
A mysterious hard disk image archive appears
When this was discussed in the comments section, a link to a folder labeled “Miscellaneous CompuServe Drive Images” on archive.org was brought up by Jimmy himself. He had been unable to extract any data from it and encouraged others to give it a try too. This was just too good to pass up. I downloaded the whole 33 GB archive myself and the journey began.
As we learned from the articles, CompuServe used DEC PDP-10 mainframe computers for their service. Hence the first thing I tried was booting up a PDP-10 emulator with the disk images attached. While trying them one by one, two things quickly became clear: These weren’t ordinary PDP-10 disks and I needed to learn a lot to get anywhere with this. I’m a Unix guy who started out in the late 90s. This wasn’t even close to anything I had worked on before.
But where to start? Looking for hints, I spun up a hex editor and some reverse engineering tools like GNU Binutils and binwalk. Apart from a few system messages on the hdq disk possibly being part of microcode[13], the raw data wasn't giving away anything meaningful. It was time to do some research.
The SC-40
The PDP-10 family of mainframe computers was available from DEC until 1983, well over 10 years before the number of CompuServe users peaked in the mid-90s[4]. These machines used disk pack cabinets with a storage capacity of up to 929 MB[3] attached via an interface called “Massbus”. By contrast, the disk images found in the mysterious archive have a size of up to 8.7 GB. And if the filenames are to be believed, they were taken from SCSI drives that did not exist until 15 years later. How could this be?
When the PDP-10 platform was cancelled, it was still used in production by many companies and institutions around the world. Other manufacturers saw an opportunity and started offering PDP-10 compatible hardware, one of them being Systems Concepts. Their top-of-the-line product was the “SC-40” released as late as 1993. It featured a much faster CPU, more memory, a SCSI interface for attaching modern peripherals like 3.5" hard disks, fit into a 20" rack and consumed much less power. CompuServe loved the Systems Concepts machines. They even licensed the design and built them themselves[9].
Now that I knew what to look for, the pieces of the puzzle started falling into place. In 2009, the original CompuServe service (now re-branded as “CompuServe Classic”) running on these machines finally ceased operation[4]. In 2015 the remains of CompuServe fell into the hands of Verizon with its acquisition of AOL. The SC-40s were now destined for the dumpster. Luckily an engineer called Gerry Moersdorf saved 9 of them. And best of all, while doing so he shot two videos packed with information and made them available on YouTube[1,2].
Tracing the archive back to its source
The mysterious disk images must have been taken from one of the SC-40s Gerry saved, I thought. According to the videos, he kept one to himself, sent another to the Living Computers Museum in Seattle and auctioned the rest off on eBay.
While scouring the net for information on the destiny of these machines, especially the eBay ones, the same person talking about their CompuServe SC-40 kept coming up. The trail left behind fit well into the timeline I had so far. My gut told me I was on to something. So I just emailed them to ask for help and if they had anything to do with the disk images. It turned out I was dead on.
In the following email exchange I learned that their particular SC-40 was among the last ones used in production. Parts of the operating system have a build date of 2009 (!) and the new owner believes it was used for accounting. This correlates with what can be found on Wikipedia and the Usenet. Yet I still wanted to see it for myself. So we came up with possible ways for me to access the data in an emulator.
Dive into TOPS-10
First I had to take a dive into the PDP-10 architecture and its operating system called TOPS-10. Thankfully, a lot of the original software, technical specifications and manuals are available from the Bitsavers project. And I have to say, they are a joy to read — the manuals in particular; well structured, concise and comprehensible. With their help, I learned how to boot up a PDP-10, explore the system, handle hard disks (“structures”) and perform administrative tasks.
TOPS-10 feels like a mixture of Unix and DOS. And having never dealt with systems of that era before, I was quite surprised by all the advanced features. Some of them were very much ahead of their time.
Manipulating the disk images
The CompuServe system used non-standard PDP-10-based hardware along with a custom version of TOPS-10. As a result, booting it in one of the available emulators is next to impossible. But I hoped the disk images could be attached to a standard TOPS-10 system as secondary disks somehow.
Now the original DEC disks have a sector size of 128 36-bit “words”, which translates to 576 bytes. The SCSI disks, however, which usually came pre-formatted with the now standard 512-byte sectors, were apparently low-level formatted to a sector size of 584 bytes (again, if the file names are to be believed). How do these different sector sizes align?
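The sector-size arithmetic is easy to double-check. The following is a minimal sketch, not part of the original investigation: the function and constant names are made up for illustration, and the 584-byte figure rests solely on the image file names.

```python
WORD_BITS = 36          # PDP-10 word size in bits
WORDS_PER_SECTOR = 128  # words per DEC disk sector


def sector_bytes(words: int = WORDS_PER_SECTOR, bits: int = WORD_BITS) -> int:
    """Return the size in bytes of a sector holding `words` tightly packed words."""
    total_bits = words * bits
    if total_bits % 8:
        raise ValueError("sector does not end on a byte boundary")
    return total_bits // 8


dec_sector = sector_bytes()       # 128 * 36 / 8
scsi_sector = 584                 # assumption: taken from the image file names
extra = scsi_sector - dec_sector  # bytes per sector not explained by the 128 words

print(dec_sector, scsi_sector, extra)
```

If the file names are right, every SCSI sector carries 8 bytes more than the 128 words require. What those bytes hold is not known at this point.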
Mounting invalid disks in TOPS-10 results in the system complaining about missing or broken “HOM” and/or “BAT” blocks. When system tools are used to repair or refresh these, changes are written to the disk at 0x480, 0x18c0 and 0x6b9. The data at the first two locations share the same signature (8A 1D 00 00 0F FE 18 00 04 00 00 0E). Is this the HOM block and its backup? And is 0x6b9 where the BAT block lives?
On the SCSI disks they can be found at 0x490, 0x1918 and 0x6c9. Knowing the offsets, I thought it might be worth spending time on chopping and moving the data around. Maybe I was lucky and some of it was just extraneous bits and pieces only used by the SCSI controller on a lower level.
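To see how the two sets of offsets relate, each one can be split into a sector index and a position inside that sector. This is only a sketch under the assumption that the sector sizes really are 576 and 584 bytes; the helper name `locate` is hypothetical.

```python
DEC_SECTOR = 576   # 128 words of 36 bits
SCSI_SECTOR = 584  # assumption: per the image file names


def locate(offset: int, sector_size: int) -> tuple[int, int]:
    """Split a byte offset into (sector index, byte position inside the sector)."""
    return divmod(offset, sector_size)


dec_offsets = [0x480, 0x18C0, 0x6B9]   # offsets seen on the emulated DEC disk
scsi_offsets = [0x490, 0x1918, 0x6C9]  # offsets seen on the SCSI images

for dec, scsi in zip(dec_offsets, scsi_offsets):
    print(hex(dec), locate(dec, DEC_SECTOR), hex(scsi), locate(scsi, SCSI_SECTOR))
```

Under these assumptions, both sets of offsets land on the start of sector 2, the start of sector 11, and byte 569 of sector 2. That would be consistent with the same on-disk layout plus 8 additional bytes per sector, although it does not reveal what those bytes contain.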
Whatever I tried only provoked different error messages though. Mounting the disks just wouldn’t work. The differences in disk layout between a PDP-10 and an SC-40, presumably caused by the SCSI controller and/or the patches to the operating system, seemed impossible to overcome by trial and error.
Restoring the data using a tape backup
Time for Plan B: The CompuServe SC-40 owner was kind enough to offer the data in the form of backup tape images as an alternative. To restore it, I needed two things: an emulator supporting arbitrary disk sizes and an accordingly patched version of TOPS-10.
Because it’s well documented and available as a binary package for many Linux distributions, I had used the SIMH multi-system simulator to emulate a PDP-10 thus far. Now I switched over to KLH10, a commercial emulator meant to virtualize PDP-10 systems used in production[5]. It’s now available under a free license[6] and supports setting up disk images with custom geometry[7].
Next I looked into the feasibility of patching TOPS-10. TOPS-10 Monitor Sources can be obtained from the Trailing-Edge.com Software Archives. But the modifications needed for big disks apparently require more than just changing a few values[7]. Going through the time-consuming process of learning an ancient programming language along with its compiler toolchain, all with hardly any documentation at hand, felt like too much of a challenge.
The Unix tape tools
At one point during my research on restoring from tape, I thought I had found a nice shortcut. I discovered tape image tools hidden in the KLH10 repository on GitHub written for… *drumroll*… Unix! But on a closer look they weren’t really of much help. Two of the tools were written for tape images of different operating systems (TOPS-20, ITS), while the third one only creates tape images. Additionally these tools were written for Unixes of the same era (4.2BSD). Trying to compile and use them would likely open a whole new can of worms.
Back to analyzing the raw data
I finally gave up on accessing the CompuServe disks in an emulator or extracting the original files of the disks’ file systems in some other way. It proved to be way too time-consuming for a hobby project like this. Also the skills acquired along the way most likely wouldn’t be of any use in future projects. But before I’d shelve this project, I wanted to work with the raw data one last time.
When I first poked at the CompuServe disk images, strings from GNU Binutils was one of the tools I used. It "prints the printable character sequences that are at least 4 characters long (or the number given with the options) and are followed by an unprintable character"[8]. In other words, it will show you human-readable plain text found in binary data. But it didn't provide insight into the data hidden in the disk images.
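For readers unfamiliar with it, the core idea behind strings can be approximated in a few lines. This is a rough sketch, not the actual Binutils implementation: it only looks for runs of printable 7-bit ASCII, and the function name is made up.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return all runs of at least `min_len` printable ASCII bytes (0x20 to 0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)


# Made-up sample: two readable messages surrounded by binary noise.
blob = b"\x00\x01SC-40 Supervisor Program V1.00\x00ab\x00MSP.BIN\xff"
for run in ascii_strings(blob):
    print(run.decode("ascii"))
```

The short run "ab" is dropped because it is below the minimum length. Text stored in any encoding other than ASCII stays invisible to this kind of scan.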
This time around though, I whipped up my own tool (drawing on standard tools in the spirit of the Unix philosophy). It basically does the same, but with a twist: It considers the SIXBIT character coding system (not ASCII) of the PDP-10 architecture and groups and sorts the results. When I fed the kdk disk into it, my terminal was suddenly flooded with output like this:
JACQ%TU4cONTED
JACQ%TU4sAUTHD
JACQ%TU4sERBRD
JACQ%TU4sREEN@
JACQ%TU5CAKNAD
JACQ%TU5cESIN@
JAIM$Ss#3VILLD
JAL-#3#3s-270C
JAME%33ssLANCD
JAME%4eU#LONG@
JAME%4%U#NHAM@
JAME%5tT3KMAN@
JAME%5%U3HING@
JAN-DE$UUNING`
JAN-%dU%sOERD@
JANE%E4T#ASTID
JANE#SsuCYRRED
JANI$4SS34LATD
JANS%DUdSNSON@
JAN-$TttSBEEN@
JAN-$tUdSRINK@
JAN-$$U$sHUIS@
JAN-$$UU#SKEN@
JAT-##CC3-270C
JAVI$U%5SAREZ@
JAVI$U$DSLPOZD
JAY-$%U$CETTE@
JB5-##CcC-270C
JB8-##Cc#-270C
JBAL$DU%3TONE@
JBAT%DU%3HELL@
JBC-##S3S-270C
JBD-##CS#-270C
JBLO$U4U#1576@
JBOR$DU%34534@
JBOT%DU$#USCH@
JBRI$DtU3-USX@
JBUT%DU$cIELD@
JBUT%DU%sORTH@
-JC-5$TDTGELDP
JC-B$TDtSDJIAD
j?DAPA%EDU#SON
JDUE%5DU#HAUS@
JEAN$UEDS989CD
JEAN$UEDS-BEED
JEAN$UEDSBELLD
JEAN$UEDS-BRUD
JEAN$UEDS-EICD
JEAN$UEDSJUNK@
JEAN$UEDS-MCKD
JEAN$UEDS-REND
JEAN$UEE3CHOED
JEFF#33ECEEHAD
JEFF$%$TCESON@
JEFF%tT#c1100@
JENS%UtUcOIGT@
JESS$S#s#RICHD
JESS$T$U#RERA@
JFAI%%4U#VICE@
Lo and behold! See the first four characters of every line? Forenames! Followed by bits of surnames. Further down, huge sections of Peters (5,724), Pauls (4,357) and Roberts (4,283) appeared. Finally I had produced some results!
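The general idea of a SIXBIT-aware string scan can be sketched as follows. This is not the tool used above, which was built from standard Unix utilities; it is a hypothetical Python approximation. It assumes that 36-bit words are stored back to back with the most significant bit first, so that every 9 bytes hold twelve 6-bit characters, and that the data starts on such a boundary. Real images may well be packed or aligned differently. All function names are invented for illustration.

```python
import re
from collections import Counter


def sixbit_decode(data: bytes) -> str:
    """Decode tightly packed 6-bit codes as DEC SIXBIT (code + 0x20 = ASCII).

    Assumption: 9 bytes = 72 bits = two 36-bit words = twelve characters.
    """
    chars = []
    usable = len(data) - len(data) % 9  # ignore a trailing partial group
    for i in range(0, usable, 9):
        block = int.from_bytes(data[i:i + 9], "big")
        for shift in range(66, -1, -6):  # twelve 6-bit fields, leftmost first
            chars.append(chr(((block >> shift) & 0o77) + 0x20))
    return "".join(chars)


def sixbit_encode(text: str) -> bytes:
    """Inverse of sixbit_decode; only used here to build sample data."""
    if len(text) % 12:
        raise ValueError("length must be a multiple of 12 characters")
    out = bytearray()
    for i in range(0, len(text), 12):
        block = 0
        for ch in text[i:i + 12]:
            block = (block << 6) | (ord(ch) - 0x20)
        out += block.to_bytes(9, "big")
    return bytes(out)


def grouped_strings(text: str, min_len: int = 4) -> list[tuple[str, int]]:
    """Collect runs of letters, digits and hyphens, count duplicates, sort."""
    runs = re.findall(r"[A-Z0-9-]{%d,}" % min_len, text)
    return sorted(Counter(runs).items())


# Made-up sample data, not taken from the disk images.
sample = sixbit_encode("PETER  PAUL PETER  ABC  ")
print(grouped_strings(sixbit_decode(sample)))
```

Grouping and counting is what makes patterns such as frequent forenames stand out. For multi-gigabyte images the data would have to be read in chunks rather than loaded at once.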
kdk is one of the two disks labeled as "bootable", the other being hdq. While hdq obviously contains microcode and firmware[13], kdk might be the operating system boot disk. Is this coming from log files produced by periodically run jobs to gather accounting data? Or is it from a database? I’ve been told TOPS-10 maintains its users in a database called “REACT”. Also CompuServe is said to have been reliant on a database system called "System 1022", originally published by Software House in 1976. CompuServe even acquired the company 10 years later and made it the CompuServe Data Technologies division[10,11,12].
Either way, this was a strong indication that this system really was used for accounting and/or tracking system usage of CompuServe’s very last customers. On the one hand I was a bit disappointed that the mysterious disk images most likely don’t contain anything of much historical interest. On the other hand I was happy that all of this work had led to something tangible after all. I thought a bit about refining the tool to produce clearer output. But ultimately this was satisfactory enough to put an end to the project.
References & remarks
1. CompuServe PDP-10 SC-40 Boot Up (YouTube)
2. Systems Concepts SC40 PDP10: Operation Demonstration (YouTube)
3. Although drives with a capacity of 176 MB (RP06) and 504 MB (RP07) seem to have been much more common. In fact, details on the 929 MB drive (RP20) are very scarce and none of the emulators I found emulates it.
4. CompuServe (Wikipedia)
5. KLH10 Background Commentary
6. KLH10 License Terms
7. KLH10 Installation & Configuration, line 736
8. man page for “strings”
9. Systems Concepts (Wikipedia)
10. Where did System 1022 go (alt.sys.pdp10)
11. Tymnet: Trouble tracking (Wikipedia)
12. Nathan Gregory, The Tym Before…
13. Among the interesting strings found on the hdq disk are "SC-40 Supervisor Program V1.00" and "[PRF: Found file MSP.BIN on %s drive %02X]". MSP is short for Mars Supervisor Program and there is a funny story behind it. DEC was well into developing a new high-end PDP-10 called "Jupiter Project" when the whole product line was cancelled in 1983[14,15]. This prompted Systems Concepts to call the SC-40 "Mars" internally[1]. Because… well, Mars is closer than Jupiter!
14. PDP-10: Cancellation and influence (Wikipedia)
15. Jupiter project (Wikipedia)