An Unhealthy Obsession with Trains

Sam
15 min read · Apr 14, 2022


I hear the train a comin’ / It’s rollin’ ‘round the bend / And I ain’t seen the sunshine / Since I don’t know when…

It was 3:21am, Monday morning. I had just left campus and was heading home after spending nearly 18 hours in the trains lab. I chose to do this. Why? What poor decisions did I make to get here? What childhood trauma shaped those choices?

the train set and lab computers

CS452: Real-time Programming, better known as trains, is unlike any other elective one can take at the University of Waterloo. There is no hand-holding and little help offered, and it is ultimately an abstract “you get what you put in” course rather than one that teaches any concrete concepts. It is a course about sacrifice and a test of willpower. The lectures consist of concepts that can be generally applied to any real-time operating system, as well as some high-level guidance for the assignments. The gritty implementation details are ultimately left to the students to figure out. I learned how much sleep I was willing to forgo in the name of the project and how much time I could spend in the lab in one day. As it turns out, 18 hours was as long as I was willing to stay in a lab for one day, but I know of a few brave souls that pulled a few all-nighters to get the job done.

I’ve had this strange yearning to take this course since I was a young, naive, and ̶h̶a̶p̶p̶y̶ innocent first year. I had heard stories of fabled upper year students that took trains and lived to tell the tale. So of course, I jumped at the first opportunity to take it after the pandemic: Winter 2022. This particular offering started while classes were still online, but it was thankfully granted an exception as it would be quite difficult to do trains… without the trains.

the happiest day of my life. how silly I was.

Bill Cowan has since retired, so the course is currently taught by Martin Karsten. While the lectures did not seem to be mandatory, he made them quite enjoyable and I looked forward to showing up to class. I would recommend showing up just to take whatever guidance is offered.

The majority (~70%) of the course is the pair project: a real-time-capable operating system that you build from the ground up to control the trains on the track. It is important to choose a partner that you enjoy working with, as you’ll be spending a lot of time with them! The remaining 30% of the course is a written exam that tests some concepts you will have (hopefully) picked up just by taking the course, such as the details of context switching and memory management.

The Lab

MC3018 is perhaps better known as the trains lab to its residents. It’s where students go to post memes on the only window in the room either in an attempt to block the view of free students outside, or to cope with the crushing sorrow of the course.

the dead flower is appropriate symbolism

A sign taped to the door warns all those who dare come close.

The students seem to have really absorbed the culture that the course has carefully curated over the decades. My only regret is not taping something of my own to those lab walls.

I found the one standing desk in the lab by the main door to be the most useful thing in the lab. Good luck fighting for it.

The Hardware

There are four key pieces of hardware present in this course:

  • TS7200 single board computer
  • Track
  • Trains
  • Märklin Digital Interface

The desktop computers connect to the TS7200 via serial cable which allows you to enter rudimentary commands at a Redboot prompt — you will only use Redboot to load your kernel. There is a little red button helpfully labelled RESET that resets the box to a baseline state. You will find yourself pressing these buttons often. Some of the buttons are well-used.

The track consists of sensors and switches (perhaps more accurately known as “turnouts”), which can be a little finicky on occasion. The switches can burn out if the solenoids are left running for too long (>500ms), so be very careful to turn them off shortly after turning them on (but not too fast, otherwise the solenoids won’t actually do anything)! Occasionally the track set is repaired by course staff, but don’t rely on this happening regularly.
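The solenoid timing constraint can be written down as two small checks. This is only a sketch: the 500ms upper bound comes from the figure above, the 150ms lower bound is a placeholder I made up for illustration, and the function names are hypothetical. Timestamps are assumed to be milliseconds from some free-running counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* MAX comes from the ~500ms burn-out figure; MIN is an assumed placeholder. */
enum { SOLENOID_MIN_ON_MS = 150, SOLENOID_MAX_ON_MS = 500 };

/* True once the solenoid has been on long enough to have thrown the switch. */
static bool solenoid_may_turn_off(uint32_t on_since_ms, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct when the counter wraps around. */
    return now_ms - on_since_ms >= SOLENOID_MIN_ON_MS;
}

/* True if the solenoid has been left on past the burn-out limit. */
static bool solenoid_overdue(uint32_t on_since_ms, uint32_t now_ms)
{
    return now_ms - on_since_ms > SOLENOID_MAX_ON_MS;
}
```

A switch task could send the "solenoid off" command as soon as the first check passes, and an assert on the second check would catch any code path that forgets to.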

track A undergoing intensive surgery

At some point in the course each group seems to find a “favourite” train that performs the most consistently. They all have their own quirks and personalities, so be sure to find one that matches yours!

Legends tell of a secret command that causes some trains to honk instead of changing their speed. They speak of a forbidden number between 0x40 and 0x60.

For us, our golden goose turned out to be the Swiss Electric Locomotive BR 486 BLS Cargo (model Roco 73651), coded as train 01 on the track set. It has two axles with power connections to the track instead of the usual one, which made it more reliable over dead spots. Unfortunately, the train seems to have sustained some damage as time went on: we saw its performance begin to degrade over the course of the train control milestones.

The digital interface itself is directly wired into the track set and is connected to the TS7200 box via serial cable. There’s nothing too special about the interface other than it does not seem to be produced anymore. Note that if a train number is selected on the box, attempting to control that same train from your operating system will not work!

Assignment 0

Assignment 0 seems intentionally designed to be… brutal, to say the least. Unlike the rest of the course, this assignment must be completed independently. I was told to expect around 40 hours of work on this portion of the course, and I think it ended up taking me about 50 hours. Word on the street is that A0 is intended to filter out those that enjoy their free time. There is just under one week to complete it.

The only reason it did not take longer than 50 hours is the TS7200 emulator. This emulator saved me countless hours in the lab, as we were able to complete a good chunk of the kernel portion remotely. Most importantly, it supports GDB, which proved to be the most useful tool in this course. Memory sanitization is also included, which caught the numerous times we managed memory incorrectly. Of course, it’s still an emulator, and while it works (really well, mind you) the grading is ultimately done on the real boxes, so don’t forget to test on those!

The professor arranged for key fobs to be distributed to the entire class, as MC is locked after 11pm on weekdays and 5pm on weekends. These fobs grant access to the building at all hours, and you will likely have to use them at some point during the course. I avoided needing mine by simply getting to the lab when the building opened and staying until the clock read AM.

The assignment page for A0 is deceptively simple: All you need to do is make a simple UI that reports some basic information and send a few commands to the track to make the trains run. Easy enough.

my very basic ui

This ignores the fact that:

  • Just setting up the clock alone requires slogging through a lengthy piece of documentation in order to understand the various registers available to you on the board. The professor is helpful enough to point you to the correct chapter at least!
  • The track connection is half-duplex, meaning you can either send or receive at any given moment, but not both. If you try to do both, bytes will just get dropped, almost randomly. This was a great source of pain.
  • The track connection transmission rate is 2400 baud, with each byte being encapsulated in an 11 bit frame (8 data bits, 1 start bit, 2 stop bits). Theoretically this means you get about a 218 byte/second connection to the track, but in practice the processing speed of the digital controller means you get much less. For reference, such a connection speed would have been considered cutting-edge in the 1980s for dial-up modems (those adhering to the V.22bis standard), although at least the dial-up modem was full duplex…
  • The only starter code you get are some print functions that you will need to greatly modify anyways.
  • There is a little bug that is deliberately introduced in what little starter code is provided, and it’s your job to find it. I found this one to be quite mean. When you find it you’ll know why.
  • Nothing is provided to you. No standard libraries. Not even a heap with which to call malloc or new. Nothing at all, except that…
  • Everything is provided to you. About 2741 pages of documentation — everything you could possibly need is in there… somewhere…
  • I just suck at writing C. Memory management is hard.
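The 218 byte/second figure in the list above falls straight out of the frame size. Here is the arithmetic as a small sketch; the helper names are mine and not part of any course code.

```c
#include <stdint.h>

/* Bits on the wire per byte: 1 start + 8 data + 2 stop. */
enum { FRAME_BITS = 1 + 8 + 2 };

/* Upper bound on whole bytes per second at a given baud rate. */
static uint32_t max_bytes_per_second(uint32_t baud)
{
    return baud / FRAME_BITS;
}

/* Time to put one framed byte on the wire, in microseconds (rounded down). */
static uint32_t byte_time_us(uint32_t baud)
{
    return (uint32_t)((uint64_t)FRAME_BITS * 1000000u / baud);
}
```

At 2400 baud this gives 218 bytes per second, or roughly 4.6ms per byte, before the controller's own processing time is taken into account.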

At the beginning of the term the class was a very full 45, which still fit into the lecture hall even at half capacity. By the end of the week we were down to 14 (including myself), a number that the professor quipped was “just about right”. Those 13 other brave souls stuck around to the end of the term. I still question whether it would have been smarter to follow the herd and drop the course.

choo choo

Kernel

Looking back, the kernel was perhaps the easier portion of the course, mostly thanks to the emulator and GDB. In previous offerings, such technological advances didn’t exist, and I can imagine this would have been incredibly painful to debug. As a result, I suspect that future offerings may be slightly more difficult to compensate for these advances made in the course.

The kernel is split into 4 parts (K1 to K4) and is allotted about 6 weeks worth of time. You will only have to implement:

  • task scheduling
  • context switching
  • message passing
  • interrupt handling
  • mini device “drivers” for interacting with the terminal and track serial devices
  • a host of small features that will make life more convenient as a developer

Of all of these, I found K4 to be the most difficult, as it is the assignment where you’re actually asked to (properly) implement interacting with the serial devices — especially the track. It also happily says “redo the functionality of Assignment 0”. So that’s cool, I guess.

The context switch itself was also quite challenging and bug-prone. While assembly knowledge is not required to take the course, you will have to write a couple dozen lines of ARM assembly for the context switch. I would recommend writing a generous number of tests to ensure that this works flawlessly. You do not want to be debugging the context switch during the train control assignments.

The first thing we did in the kernel was write an assert function to ensure we were doing things correctly. This proved incredibly helpful in train control, where it becomes quite difficult to debug in real time while the trains are running. As it turns out, it’s infeasible (and maybe even slightly impossible) to pause a moving train like you can with an executing program, so the next best thing is being able to clearly print where your assumptions went wrong. We ended up with about 300 asserts in our code.

ascii art is basically required

When an assert fails it is recommended to stop all the trains currently running on the track — we lost a lot of time trying to manually chase down and stop the trains after a kernel panic.
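An assert along these lines might look like the sketch below. It is not our actual code: `KASSERT`, `kassert_fail`, and `emergency_stop_trains` are hypothetical names, and `fprintf` stands in for whatever print routine your kernel provides, since the real environment has no standard library.

```c
#include <stdio.h>

/* Hypothetical hook: a real kernel would send the track's stop command here. */
static int trains_stopped = 0;
static void emergency_stop_trains(void) { trains_stopped = 1; }

/* Stop the trains first, then report the failed condition and its location. */
static void kassert_fail(const char *expr, const char *file, int line)
{
    emergency_stop_trains();
    fprintf(stderr, "ASSERT FAILED: %s (%s:%d)\n", expr, file, line);
    /* A real kernel would halt here instead of returning. */
}

/* Evaluates cond once; on failure, reports the source text of the condition. */
#define KASSERT(cond) \
    do { if (!(cond)) kassert_fail(#cond, __FILE__, __LINE__); } while (0)
```

Stopping the trains before printing matters because output over a slow serial link can take a while, and the trains keep moving in the meantime.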

Train Control

It was about this time that our professor told us that “most people that make it this far go on to pass the course”, helpfully followed by an emphasis on “most”. I found this amusing. Then I got worried.

Quite honestly, the amount of work that train control ended up being blindsided us, and the general consensus of the class was that train control was much harder than the kernel. Again, this was likely because the emulator and debugger support that had made the kernel manageable were much more difficult to use during this portion. Do not take this portion of the course lightly: if you finish the kernel early it might be worth getting a head start on train control.

The TS7200 emulator is still helpful for train control, but the actual train sets will be of much more use. There is a track set emulator that I cannot vouch for (as I did not use it), but some other classmates have found it to be very useful. I would recommend bringing some kind of cable to the lab to attach your laptop to the monitors if you are planning on spending a lot of time in the lab. The monitors have USB-C connectors, but my Mac charger was a charging-only cable so I ended up working on my tiny XPS 13 for all the time I spent in the lab.

The first thing we ended up doing for this portion of the course was building a nice UI that would reflect the state of the track (which required painstakingly putting together ascii art of the track), making it easier to visualize what was going on. This proved to be incredibly useful in debugging reservations and sensor input as we could easily see the coloured indicators show up on the screen.

I could spend hours watching our ui do a boot test

The true difficulty of train control, in my opinion, is the union of (perfect) software with real hardware. Creating a simulation of the trains is easy enough, but integrating where we think the train is with where the sensors tell us the train actually is results in some complex and often messy implementations as we attempt to reconcile our inaccurate modeling with reality. The age and nature of the hardware also present a number of issues, as the last hardware update was well over a decade ago:

  • Some switches don’t work (but at least you know which ones these are)
  • Some sensors don’t work (can also be hardcoded)
  • There are a few dead spots on the track that receive no power (these will change and are, in my opinion, difficult to hardcode)
  • Train speeds vary (wildly!) from straightaways to curves
  • Train speeds vary (sometimes wildly!) from train to train
  • Train speeds vary (sometimes!) from day to day, as the tracks are used or cleaned

Your system will be expected to handle these issues, as well as recover from any introduced ones during the demos. Of course, it’s not expected that you handle these perfectly — real hardware, of course, is quite finicky — but the more you can demonstrate the better the demos (and your grade) will be. During the first demo, for example, the professor triggered random sensors around the track to see if it would mess up our train positioning. In the second demo, we showed how our system could resolve differences in predicted train positions and true train positions resulting from flipping switches manually. To make hardware possible to reason about, your system is only expected to recover from at most one error case in a row — two failing sensors one after the other or both a failing sensor and switch, for example, can be assumed as impossible scenarios.
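The single-fault assumption keeps sensor attribution tractable: an unexpected trigger only needs to be checked against a couple of alternatives. The sketch below illustrates the idea and is not our implementation; sensor ids are plain ints, and every name in it is hypothetical.

```c
/* NO_SENSOR marks a slot that does not apply (e.g. no switch ahead). */
#define NO_SENSOR (-1)

typedef enum {
    HIT_EXPECTED,      /* the sensor we predicted */
    HIT_MISSED_ONE,    /* the predicted sensor failed; we hit the one after */
    HIT_WRONG_BRANCH,  /* a switch was not in the position we believed */
    HIT_SPURIOUS       /* unrelated trigger: ignore it for this train */
} hit_kind;

/* Classify a trigger against one train's predictions, assuming one fault. */
static hit_kind classify_hit(int triggered, int expected,
                             int after_expected, int other_branch)
{
    if (triggered == NO_SENSOR)      return HIT_SPURIOUS;
    if (triggered == expected)       return HIT_EXPECTED;
    if (triggered == after_expected) return HIT_MISSED_ONE;
    if (triggered == other_branch)   return HIT_WRONG_BRANCH;
    return HIT_SPURIOUS;
}
```

Anything classified as spurious is left out of that train's position estimate, which is roughly the case the professor exercised by triggering random sensors during the first demo.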

The first milestone is routing the train to any destination on the track and demonstrating that you can stop accurately when you get there. The most difficult part of this milestone, in my opinion, is that the trains don’t stop immediately when the stop command is issued: they “roll out” for a short amount of time (up to about 4 seconds) depending on how fast they were going. Various forms of overhead can introduce additional sources of error. For example, sensor data overhead can (in the worst case) introduce ~60–70ms of latency. This means that by the time the stop command is actually issued, the train might have traveled an extra ~2–3cm. For example, according to our calibrations train 1 can travel at around 35cm/s at top speed, meaning an error of about 2cm. Thankfully, our calibration suffers from these same effects and we can “cancel out” some errors.
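The latency arithmetic, and the timing decision it feeds into, can be sketched with integer math only. The function names are hypothetical, and the stopping distance is assumed to be a calibrated input measured per train and speed; neither is taken from our code.

```c
#include <stdint.h>

/* Distance covered during a latency window: (mm/s) * ms = micrometres. */
static int32_t latency_error_um(int32_t speed_mm_s, int32_t latency_ms)
{
    return speed_mm_s * latency_ms;
}

/*
 * Milliseconds to wait after passing a sensor before issuing the stop
 * command, so the train rolls out onto the target. Returns -1 when it is
 * already too late to stop in time (or the train is not moving).
 */
static int32_t stop_delay_ms(int32_t dist_to_target_mm, int32_t stop_dist_mm,
                             int32_t speed_mm_s, int32_t latency_ms)
{
    int32_t coast_um = (dist_to_target_mm - stop_dist_mm) * 1000
                       - latency_error_um(speed_mm_s, latency_ms);
    if (speed_mm_s <= 0 || coast_um < 0)
        return -1;
    return coast_um / speed_mm_s;   /* um / (mm/s) = ms */
}
```

At 350mm/s with 60ms of latency the error term is 21000µm, or 2.1cm, which matches the estimate above.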

We had considered slowing the trains down to the slowest possible speeds to eliminate the roll out time, but dead spots on the track made it infeasible to run at a constant slow speed to the destination, and our modelling wasn’t good enough to predict how the train would behave when decelerating. Combined with the fact that trains go slower around curves and faster on straightaways, as well as changing track conditions, this made stopping accurately a challenge! We were only able to consistently get within 15cm of where we wanted; looking back, there were a lot of easy improvements that we could have made to tighten this distance, but we had foolishly underestimated the amount of work this component would take.

the only time our train stopped perfectly

The second milestone introduces a second train to the track, and brings with it a variety of new challenges. Of all of these, collision avoidance was perhaps the most obvious and most complex challenge to solve, as it involved (in our implementation) slowing or stopping trains while still maintaining good location predictions throughout. It is recommended to build a track segment reservation system — when a train fails to secure a track reservation it can stop (easiest) or slow down to a speed where by the time it reaches that segment it will be clear (harder). Like with the rest of the course, you’re pretty free to implement anything you want — any solution is valid so long as it works!
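At its simplest, a segment reservation system is a table of owners. This is a minimal sketch of that idea, not what we built: the segment count, the train ids, and all of the names are placeholders.

```c
#include <stdbool.h>

#define NUM_SEGMENTS 16    /* hypothetical number of track segments */
#define NO_OWNER     (-1)

static int owner[NUM_SEGMENTS];

/* Mark every segment as free. */
static void reservations_init(void)
{
    for (int i = 0; i < NUM_SEGMENTS; i++)
        owner[i] = NO_OWNER;
}

/* Try to reserve a segment; succeeds if it is free or already ours. */
static bool reserve_segment(int segment, int train)
{
    if (segment < 0 || segment >= NUM_SEGMENTS)
        return false;
    if (owner[segment] != NO_OWNER && owner[segment] != train)
        return false;
    owner[segment] = train;
    return true;
}

/* Release a segment, but only if this train actually holds it. */
static void release_segment(int segment, int train)
{
    if (segment >= 0 && segment < NUM_SEGMENTS && owner[segment] == train)
        owner[segment] = NO_OWNER;
}
```

A train whose `reserve_segment` call fails for the segment ahead would then stop or slow down, as described above. Keeping the table inside a single server task would avoid races between trains, since all requests arrive one message at a time.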

If all this wasn’t enough work, the last week of the course is allocated to the final project. The description for the final project is the shortest in the course: pick something hard, explain why it’s hard, and describe how you will solve it. The professor declined to give examples of previous projects to ensure that we came up with something original, so I will also avoid sharing the various projects in the class. Suffice it to say that they were all really neat, and I’m sure future offerings will have similarly cool projects. Note that the final project must involve the train set in some non-trivial way, so simply using some part of the board such as the USB, network interface, or graphics driver will not suffice.

Closing Thoughts

Despite all the late nights spent in the lab, we were unable to satisfactorily complete the second milestone and were thus unable to complete our proposed final project. Still, I had a great time trying to solve the ever spiraling array of problems that come up in this course — there is always more work that can be done and never enough time.

CS452 was the perfect union of all the lower year CS courses taken at Waterloo. I do not regret taking this course at all; in fact, I would highly recommend it to anyone that is interested in writing an entire operating system from scratch — I found it to be incredibly rewarding. Take a trip to the window of the trains lab and see for yourself :)

Resources that I found really useful in this course:

choo choo

As an aside, I almost dragged 2 friends with me to visit Switzerland for the sole purpose of taking the famed Glacier Express halfway across the country. The itinerary would have been:

  • Day 1: Land in Zürich, take the IC8 to Visp, then the RE41 to Zermatt (a car-free city!)
  • Day 2: Take the Glacier Express 8 hours to St. Moritz
  • Day 3: Take the IR15 to Chur, then the IC3 back to Zürich

It seemed too much of a hassle to keep transferring trains, so the itinerary was then revised to:

  • Day 1: Land in Zürich, take the IC8 to Brig
  • Day 2: Take the Glacier Express 4 hours to Chur
  • Day 3: IC3 back to Zürich

Then I thought about it some more and we ultimately agreed that visiting such a beautiful country just to ride trains for 3 days was stupid.

