HEPiX conference

From the 8th to the 12th of October I attended my favorite conference, HEPiX. This conference brings together system administrators, system engineers, and managers from the High Energy Physics and Nuclear Physics laboratories and institutes, and a few others like myself. What marks HEPiX out from other conferences is that it is practical rather than theoretical, and is about sharing experience. This makes HEPiX low on sales pitches and high on learning and sharing experiences; we also get stories of data centers being flooded, tornado mitigation solutions, UPS systems catching fire, and other IT problems that are not normally talked about. Big-name research institutes like BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, JLAB, Nikhef, RAL, SLAC, TRIUMF and many others present what has worked well for them in the past year along with what has not worked so well. HEPiX is always held at a different location, and this time it was in Barcelona, Spain.

Site reports don’t just give you details of the trouble of upgrading hardware and software; they also reveal clear data center trends.

Suppliers of hardware.

Dell/HP continue to dominate computer purchases.

Dell/Force 10 and Brocade dominate networking, but white-box hardware and software-defined networking are increasingly being deployed.

Containers and clouds.

OpenStack is still the dominant deployment framework, and other cloud solutions such as OpenNebula, Azure Stack, and CloudStack are being phased out. Increasingly, compute nodes are run on OpenStack to allow easy switching between different scientific workloads, both within HEP and when serving multiple user bases.
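As a rough illustration of what this looks like in practice, here is a minimal sketch using the openstacksdk Python client to boot a batch worker node on an OpenStack cloud. The cloud name, image, flavor and network names are illustrative assumptions, not the configuration of any site mentioned above.

```python
# Hedged sketch: boot a batch worker VM with openstacksdk. All names below are
# placeholders for whatever a site's clouds.yaml and image catalogue define.
import openstack

conn = openstack.connect(cloud="site-cloud")        # credentials from clouds.yaml

image = conn.image.find_image("worker-node-image")  # hypothetical base image
flavor = conn.compute.find_flavor("m1.large")
network = conn.network.find_network("batch-net")

server = conn.compute.create_server(
    name="batch-worker-01",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)       # block until ACTIVE
print(server.name, server.status)
```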

Containerization is steadily growing as a way to provide service platforms. Kubernetes is almost completely dominant, though many sites are using OpenShift as a way to manage their Kubernetes service.
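For anyone unfamiliar with how these platforms are driven programmatically, here is a minimal sketch using the official `kubernetes` Python client to inspect a running cluster; the namespace name is an assumption for illustration only.

```python
# Hedged sketch: list the pods of an assumed namespace on an existing cluster.
from kubernetes import client, config

config.load_kube_config()          # credentials from ~/.kube/config
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="web-services").items:
    print(pod.metadata.name, pod.status.phase)
```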

CERN reports success with OpenShift and GitLab

OpenShift is a Red Hat product providing a PaaS based on Kubernetes and Docker containers. It has been deployed at CERN and usage is growing steadily.

GitLab seems to integrate very well with OpenShift, automating the path from a Git URL to a running Docker container, including staging and dev environments as well as production. CERN boasts that they are using GitLab for their k8s pipeline/DevOps work. CERN is known for reinventing the wheel, so reporting that a bought-in CI/CD workflow is now helping them is interesting and very probably worth investigating.
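To give a flavor of how such a pipeline can be driven from outside GitLab, here is a hedged sketch that kicks off a CI/CD pipeline through GitLab's documented pipeline trigger API; the instance URL, project ID and token are placeholders, not CERN's actual setup.

```python
# Hedged sketch: start a GitLab pipeline via the trigger API
# (POST /projects/:id/trigger/pipeline). All values are placeholders.
import requests

GITLAB = "https://gitlab.example.org"   # placeholder GitLab instance
PROJECT_ID = 1234                       # placeholder numeric project ID
TRIGGER_TOKEN = "xxxx"                  # placeholder pipeline trigger token

resp = requests.post(
    f"{GITLAB}/api/v4/projects/{PROJECT_ID}/trigger/pipeline",
    data={"token": TRIGGER_TOKEN, "ref": "master"},
)
resp.raise_for_status()
print("Started pipeline", resp.json()["id"])
```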

HEPiX on hardware

HEPiX is my most trusted source of benchmarking, and we had some very interesting analysis of the current state of server CPUs, comparing costs for Intel's Xeon and AMD's EPYC processors.

* Particularly the EPYC 7351P @ 2.0 GHz [Uniprocessor — 48 threads]
 * Some protection against Spectre and other side-channel vulnerabilities.

For more detail, see https://indico.cern.ch/event/730908/contributions/3153163/

Storage in HEPiX

HEP uses tremendous amounts of storage; though not on the scale of the major cloud providers and users, it sits in the second tier of data storage demand. This makes the cost and long-term storage of data very important.

Most of the HEP products such as XRootD/ROOT, EOS, DPM and dCache are of little interest outside the science community, but not all HEP data usage is specific to the HEP community.

HEPiX on Ceph

Ceph is a very scalable storage service which builds on top of an object store and provides a file system, an S3 interface, NFS and many other data access protocols. Ceph replicates data to make it fault-tolerant, using commodity hardware and requiring no specific hardware support. As a result of its design, the system is both self-healing and self-managing, aiming to minimize administration time and other costs.
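The S3 interface in particular means Ceph looks like any other object store to applications. Here is a hedged sketch of talking to a Ceph cluster's S3 interface (the RADOS Gateway) with boto3; the endpoint URL, credentials, bucket and file names are placeholders.

```python
# Hedged sketch: use Ceph's S3-compatible RADOS Gateway endpoint via boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.org:7480",   # placeholder RGW endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="experiment-data")
s3.upload_file("run0001.dat", "experiment-data", "raw/run0001.dat")

for obj in s3.list_objects_v2(Bucket="experiment-data").get("Contents", []):
    print(obj["Key"], obj["Size"])
```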

Ceph deployments have been growing dramatically in the HEP community. Of all the petabyte-scale deployments, my highlight was Michigan State University, the University of Michigan, Wayne State University, and the Van Andel Institute building a multi-site Ceph cluster. Because Ceph only acknowledges that data is stored once it has been replicated to the required level of redundancy, all replicas must be distributed before a write completes, so multi-site Ceph clusters always push the boundaries of Ceph usage, and in this case of network functionality.

Tape and HEP.

Tape is always evolving, both in capacity and in the scale of tape robots.

  • A roughly steady 25% per year improvement in capacity.
  • Bandwidth is not increasing significantly, and this is causing real issues for tape usage, not just in HEP but particularly for the major cloud providers.
  • LTO-8 (and IBM’s equivalent) provides a major improvement in capacity: 12 TB per tape, but with only 360 MB/s of bandwidth, and with this increased capacity seek times are going up. Average file-to-file positioning time has risen from ~30 seconds to ~45 seconds (see the quick calculation after this list).
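A quick back-of-the-envelope calculation from the figures above shows why the bandwidth gap matters: at 360 MB/s, reading or writing a full 12 TB cartridge takes on the order of nine hours.

```python
# Simple arithmetic using the LTO-8 figures quoted above.
capacity_bytes = 12e12      # 12 TB per cartridge
bandwidth_bps = 360e6       # 360 MB/s sustained transfer rate

seconds = capacity_bytes / bandwidth_bps
print(f"Full tape pass: {seconds:.0f} s = {seconds / 3600:.1f} hours")
# Full tape pass: 33333 s = 9.3 hours
```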

Disk and HEP.

It seems that 6 TB and 12 TB disks are now the choice for worker node deployment. SSD storage is being actively used and is showing significantly better mean time to failure than the manufacturers specified. New form factors such as NVMe are also increasingly being deployed. This leads to greater difficulties with replacing failed media, but also to higher potential storage densities, making a petabyte of storage in a single rack a realistic prospect (a rough illustration follows).
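As a rough illustration only (the chassis size and drive count are my own assumptions, not figures from any talk), dense 4U JBOD chassis commonly take around 60 drives, so with 12 TB disks a couple of chassis already pass the petabyte mark.

```python
# Illustrative density arithmetic; drives_per_chassis is an assumed figure.
drives_per_chassis = 60     # assumed dense 4U JBOD
drive_tb = 12               # 12 TB disks, as above

chassis_tb = drives_per_chassis * drive_tb
print(f"{chassis_tb} TB per chassis -> {1000 / chassis_tb:.1f} chassis per PB")
# 720 TB per chassis -> 1.4 chassis per PB
```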

Networking in HEP

The migration to IPv6 continues: most services are dual-stack IPv6/IPv4, with IPv6 as the default, but the process is still not complete. The community is investigating why monitoring shows some groups’ traffic is still mostly IPv4, even though other groups have migrated at least 70% of their traffic to IPv6.
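A quick way to see dual-stacking in action from any client is the standard library resolver: `getaddrinfo` returns both IPv6 and IPv4 addresses for a dual-stacked host. The hostname below is just an example.

```python
# Print the IPv6 (AF_INET6) and IPv4 (AF_INET) addresses a host resolves to.
import socket

for family, _, _, _, sockaddr in socket.getaddrinfo("www.cern.ch", 443,
                                                    proto=socket.IPPROTO_TCP):
    label = "IPv6" if family == socket.AF_INET6 else "IPv4"
    print(label, sockaddr[0])
```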

More exciting is SDN/NFV (software-defined networking and network function virtualization), which is a long-term interest of mine. It is a cross-disciplinary subject that brings together network experts and programmers, who are not used to working in this hybrid way, and I think this partly explains why the topic has not become mainstream. That said, HEPiX has decided it is an area of strategic interest and has consequently set up a working group to explore the area further. The initial scope to be explored:

  • Bandwidth scheduling
    * Multi-tenancy support
    * Supporting multi-cloud systems
    * Mostly focusing on OpenContrail for the current evaluation

SDN in practice

Flange and OpenFlow were again discussed by the University of Michigan as a way to reconfigure network flows dynamically to maximize bandwidth and reduce congestion.

ElastiFlow (https://github.com/robcowart/elastiflow) seems an interesting way to visualize network flows, providing data collection and visualization using the Elastic Stack (Elasticsearch, Logstash and Kibana).
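Because the flow records end up in Elasticsearch, they can also be queried directly rather than only viewed in Kibana. Here is a hedged sketch using a recent elasticsearch Python client to aggregate bytes per source address over the last hour; the node URL, index pattern and field names are assumptions that vary between ElastiFlow versions.

```python
# Hedged sketch: top talkers from assumed ElastiFlow indices in Elasticsearch.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")        # placeholder Elasticsearch node

resp = es.search(
    index="elastiflow-*",                          # assumed index pattern
    size=0,
    query={"range": {"@timestamp": {"gte": "now-1h"}}},
    aggs={"top_talkers": {"terms": {"field": "flow.src_addr", "size": 10},
                          "aggs": {"bytes": {"sum": {"field": "flow.bytes"}}}}},
)
for bucket in resp["aggregations"]["top_talkers"]["buckets"]:
    print(bucket["key"], int(bucket["bytes"]["value"]))
```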

Batch Queues

HEP achieves very high utilization of its compute clusters, often averaging above 90% over the course of a year. While the choices of batch queue system are still many and varied, it seems the majority of new deployments are going with SLURM, even though it comes from the high-performance computing world rather than HEP's usual high-throughput systems. A minimal job script sketch follows.
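For readers who have not used SLURM, here is a minimal sketch of a batch job written as a Python script; `sbatch` reads the `#SBATCH` directives regardless of the interpreter, and the resource values and file name are illustrative only.

```python
#!/usr/bin/env python3
# Minimal SLURM job sketch; submit with:  sbatch hello_slurm.py
#SBATCH --job-name=hello-hep
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

import os

# SLURM exports job metadata into the environment of the running task.
print("Job", os.environ.get("SLURM_JOB_ID"),
      "running on", os.environ.get("SLURMD_NODENAME"))
```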