How Important is High Availability in a School Network?

School networks are becoming more complex as requirements grow. A school server, school-wide WLAN, tablets and laptops in the classroom, a school cloud and a single login for all services place many different demands on a school's network administrator or IT service provider. As long as everything works, all is well. But what happens if the server, the firewall or a switch fails? The consequences can vary widely. How quickly can normal operation be restored? How important is high availability in a school network?

High Availability

According to Wikipedia, high availability is defined as follows:

High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.

The goal is for a system (here, “the school network”) to remain operational even if one or more components fail. Failures can still cause interruptions, and high availability is divided into classes according to how much downtime is tolerated. A school network must be operational above all on school days (approx. 180–200 days per year). Very few schools truly need 99.999% availability, but smooth operation is still very important for everyday teaching.
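To make these availability classes more concrete, the short sketch below converts an availability percentage into the maximum downtime it allows. It assumes availability is measured over a full calendar year (365 days × 24 hours); the school-time period of 190 days × 8 hours is a hypothetical figure picked from the 180–200 day range above, not an official definition.

```python
# Minimal sketch: how much downtime a given availability target permits.
# Assumption: a calendar year of 365 days with 24 hours each.
HOURS_PER_YEAR = 365 * 24

# Hypothetical "school time": 190 school days with 8 hours of use each.
SCHOOL_HOURS_PER_YEAR = 190 * 8


def max_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Return the downtime (in hours) allowed by an availability target over a period."""
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99, 99.999):
        year = max_downtime_hours(level, HOURS_PER_YEAR)
        school = max_downtime_hours(level, SCHOOL_HOURS_PER_YEAR)
        print(f"{level}%: {year:.2f} h/year, {school * 60:.1f} min of school time")
```

Over a full year, 99% still allows about 87.6 hours of downtime, 99.9% about 8.8 hours, and 99.999% only about 5 minutes, which is why the highest classes are rarely worth the cost for a school.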

Single Points of Failure

To achieve high availability, you have to reduce so-called “single points of failure”. These are components whose failure would bring the entire school network to a standstill. What can such single points of failure be?

  • Firewall → if it fails, there is no Internet access; depending on the configuration, the internal network may stop working as well.
  • Switches (especially the core switch) → as with the firewall, a complete outage
  • Server → if it fails, many applications are no longer accessible, e.g. school logins, web applications, school cloud, …
  • Internet connection → if the only uplink fails, you are offline.
     …
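A little arithmetic shows why single points of failure weigh so heavily. If the network only works when firewall, core switch, server and Internet uplink all work, their availabilities multiply, assuming failures are independent. The per-component figures below are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical availability figures for illustration only; real values depend
# on the hardware, its age and how quickly failures are repaired.
components = {
    "firewall": 0.999,
    "core_switch": 0.9995,
    "server": 0.998,
    "internet_uplink": 0.995,
}


def series_availability(availabilities) -> float:
    """Availability of a chain in which every component must work (no redundancy).

    Assumes the components fail independently of each other.
    """
    return prod(availabilities)


total = series_availability(components.values())
print(f"Overall availability: {total:.4%}")
```

With these numbers the whole chain reaches only about 99.15%, which is lower than even its weakest link (99.5%): every additional single point of failure pulls the overall figure down.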

Quick story:

Last week our firewall bricked (due to the Atom C2000 bug). The network went down and we were offline. A first attempt to virtualize the firewall failed, so we switched to a small mini PC with two network cards. This required some additional configuration on our main switch to split all WANs and VLANs across the two network cards. After a few hours, the network was up and running again (we were able to restore the configuration backup with only a few changes).

How can you increase reliability?

There are several ways to increase reliability and better protect the “school network” against failures. In general, the aim is to have as few “single points of failure” as possible (ideally none) and to make critical components fault-tolerant. As mentioned above, the requirements for a highly available school network depend heavily on the circumstances and wishes of the school authorities. On the one hand, it is a question of money; on the other hand, not every network has to be back up within a few minutes.
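Fault tolerance pays off because redundant components only cause an outage if all of them fail at the same time. The sketch below computes this for n identical units, one of which is enough to keep things running. It assumes independent failures and a failover that always works, which is optimistic in practice (two uplinks on the same cable route, or two servers on the same power circuit, can fail together). The 99.5% figure for a single Internet uplink is hypothetical.

```python
def parallel_availability(a: float, n: int = 2) -> float:
    """Availability of n identical redundant units where one working unit suffices.

    Assumes failures are independent and the failover itself never fails.
    """
    return 1 - (1 - a) ** n


single = 0.995  # hypothetical availability of one Internet uplink
dual = parallel_availability(single, 2)
print(f"one uplink: {single:.3%}, two uplinks: {dual:.4%}")
```

Under these assumptions, a second uplink raises availability from 99.5% to about 99.9975%, i.e. the expected downtime shrinks by a factor of 200.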

Here are some ideas on how to increase reliability:

  • Quality hardware → good hardware costs more, but it often runs more reliably
  • Backups, backups → configurations, data, virtual machines, containers: there is no way around backups (and backups must also be tested!)
  • Monitoring → good monitoring can sometimes detect errors early and gives an overview of where problems exist in the network. This lets you react faster instead of depending on reports from users (“The Internet doesn’t work anymore”, “The printer is broken”, …).
  • Increase fault tolerance → redundant components that take over when one fails (“failover”), e.g. two power supplies in the server, several Internet connections (“Multi-WAN”), two firewalls, RAID, two servers, …
  • Keep spare parts available → hard disks, a spare switch, …
  • UPS → protects hardware against power fluctuations and keeps it running during a power failure (for a limited time)
  • “Personnel redundancy” → ideally two or more IT administrators or service providers (to cover illness, vacation, …)
  • Preventive maintenance
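To illustrate the monitoring idea, here is a minimal reachability check in Python that tries to open a TCP connection to each critical component. The device names, IP addresses and ports are hypothetical placeholders. A successful connection only shows that a port is open, not that the service behind it works correctly, so this is a sketch rather than a replacement for a dedicated monitoring system such as Zabbix or Icinga.

```python
import socket

# Hypothetical inventory: replace names, addresses and ports with your own devices.
CHECKS = {
    "firewall (web UI)": ("192.168.1.1", 443),
    "core switch (SSH)": ("192.168.1.2", 22),
    "school server (HTTPS)": ("192.168.1.10", 443),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def run_checks(checks: dict) -> dict:
    """Probe every entry and return a mapping of name -> reachable or not."""
    return {name: is_reachable(host, port) for name, (host, port) in checks.items()}


if __name__ == "__main__":
    for name, ok in run_checks(CHECKS).items():
        print(f"{'OK  ' if ok else 'DOWN'} {name}")
```

Run periodically (for example every few minutes via cron), such a script notices a failed firewall or server before the first user reports that “the Internet doesn’t work anymore”.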

Bottom Line

A school network is certainly not a highly critical system, but with the ongoing digitalization of schools it is becoming more and more important that the IT infrastructure remains available. At some schools, an Internet outage is so disruptive that work can hardly continue (online systems for administration, school clouds, online learning systems, student information systems). Increasing reliability doesn’t always require a lot of money. What matters much more is being prepared for a failure, especially of the single points of failure.


Originally published at openschoolsolutions.org.