Homespace, the missing as a service?

Theo McCaie
Met Office Informatics Lab
Dec 9, 2019

Credit where credit’s due: I stole this idea from Mike Kiernan who came up with it during a collaborative hack I was at between Microsoft and the Met Office Informatics Lab.

Amazon has just released SageMaker Studio, which looks like a great place for my machine learning experiments. Visual Studio Online might be just the thing I need for my serious dev work in Python. I have Notebooks scattered across our various Pangeo environments and, of course, I have all sorts of important digital detritus here on my MacBook. I want to use all these tools but I don’t want to scatter my work all over the various clouds. I don’t want to copy and paste SSH keys or environment variables from one place to another. And I do want to start work on one platform and pick it up seamlessly on another. What I want is homespace as a service.

Prior art

When you type “Homespace as a service” into Google you get results like this:

Results for “homespace as a service” from Google

Whilst these are undoubtedly services I could do with, they are not what I had in mind. If you put a little more effort in, you’ll start coming across things like Dropbox, OneDrive or Google Drive. These services are much closer to what I’m after but have a number of issues that prevent the utopia I envision:

  • Poor ‘mount-ability’ — There are various libraries that will ‘mount’ OneDrive or Google Drive on your local laptop, but try to mount them on a container running on a managed Kubernetes platform like AKS and I think you are going to run into issues.
  • Not a POSIX filesystem — In other posts I’ve encouraged embracing non-POSIX systems (and I stand by this for data), but so many of the tools I rely on work best, or only work, in a POSIX environment. Symlinks, permissions and directory hierarchies are features I rely on day in, day out.
  • Latency — I’ve not tested this but consider the blurb from the OneDrive landing page, “Save your files and photos to OneDrive and get them from any device, anywhere.” The place designed to efficiently store terabytes of photos just doesn’t feel to me like the low latency storage solution I’m looking for.

I do use these tools but more for documents and (non-code) sharing and collaboration. This is not quite what I’ve got in mind when I say homespace as a service.

The closest thing

I think the closest thing I use to “homespace as a service” is my Pangeo homespace. This is an Azure NetApp Files storage system, NFS mounted into any and all pods that I wish inside our Kubernetes cluster. It’s a “proper” file system, so links, permissions and random read-write perform as I expect. It’s low latency, high throughput and supports many-read many-write access. Perfect, no? Well, there are of course issues:

  • There are technical limitations on network access to Azure NetApp Files, some of which are outlined in this document about supported network topologies. I won’t dig into this, but it’s sufficient to say that your network needs to be ‘joined’ to the network hosting the NetApp Files volume. If you could do this from, for example, AWS SageMaker that would be great, but I’m not sure it’s possible and I’m confident it’s not easy.
  • You’re paying for performance you may not use. Azure NetApp Files is always-hot storage and the minimum allocation is 4 terabytes. You can host many users on that but unless you do (and they are active frequently) Azure NetApp Files is probably not cost-efficient for this use case.
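To put a rough number on “many users”, here is a back-of-envelope sketch. It assumes decimal units (1 TB = 1000 GB) and borrows the 10GB-per-homespace figure suggested later in this post; the function names and the 20-user team are hypothetical, chosen only for illustration.

```python
def homespaces_per_pool(pool_tb: float, homespace_gb: float) -> int:
    """How many homespaces of a given size fit in one pool (decimal units)."""
    return int(pool_tb * 1000 // homespace_gb)


def pool_share_used(users: int, pool_tb: float, homespace_gb: float) -> float:
    """Fraction of the pool actually occupied by `users` homespaces."""
    return users * homespace_gb / (pool_tb * 1000)


# The 4 TB minimum allocation and 10GB homespace come from this post;
# the 20-user team is an assumed example, not a real figure.
print(homespaces_per_pool(4, 10))             # 400
print(f"{pool_share_used(20, 4, 10):.0%}")    # 5%
```

Under these assumptions a small team would occupy only a few percent of the minimum allocation, which is the sense in which you pay for capacity you may not use.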

What I want from homespace as a service

The thing I want most from homespace as a service is for it to “just work”. I’ve seen several ingenious ideas and workarounds but they all feel like hacks. For me “just working” means available, accessible, affordable.

  • Available — The first step is for cloud providers to embrace the paradigm. Until the cloud tools I want to use have a way of mounting in or interacting with a homespace as a service, the rest is academic.
  • Accessible — A secure, globally accessible, read-many write-once (or many), low latency POSIX file system that will easily mount on to virtual machines, containers, etc. Easy, right?
  • Affordable — I think this boils down to re-heatable storage. I don’t work 24/7: I have weekends off, tend not to work at night and, once in a blue moon, take a holiday. I want storage that is cool until I need it. I’d be willing to have a short (30 second?) warm-up time, but once hot it needs to be low latency. I don’t think this is as challenging as it sounds, as I envision homespaces being small, maybe only 10GB. In a decent data centre you should be able to move 10GB from cool to hot in 30 seconds (I feel).
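The warm-up claim can be sanity-checked with some quick arithmetic. The sketch below works out the sustained transfer rate needed to move a 10GB homespace in 30 seconds, and the fraction of a week the storage would need to be hot. It assumes decimal units, and the 40-hour working week is an illustrative assumption rather than a figure from this post; the function names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def required_throughput(size_gb: float, warmup_s: float) -> tuple[float, float]:
    """Return (MB/s, Gbit/s) needed to move size_gb in warmup_s seconds."""
    mb_per_s = size_gb * 1000 / warmup_s
    gbit_per_s = size_gb * 8 / warmup_s
    return mb_per_s, gbit_per_s


def hot_fraction(active_hours_per_week: float) -> float:
    """Fraction of the week the homespace needs to be on hot storage."""
    return active_hours_per_week / HOURS_PER_WEEK


mb_s, gbit_s = required_throughput(10, 30)
print(f"{mb_s:.0f} MB/s, {gbit_s:.2f} Gbit/s")  # 333 MB/s, 2.67 Gbit/s

# Assumed 40-hour working week, for illustration only.
print(f"{hot_fraction(40):.0%} of the week hot")  # 24% of the week hot
```

Roughly 333 MB/s sustained is the rate a 30-second warm-up implies for 10GB, and on the assumed working pattern the storage would sit cool for about three quarters of the week.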

10GB, really?

Really. This isn’t supposed to be Dropbox. It’s not for data, it’s not for photos and videos. It’s for a few Notebooks, perhaps a few repositories you’re actively developing, your cloud credentials, and perhaps a conda environment or two (conda environments as a service is another idea that needs exploring). How big can your .bashrc file grow?

So what now?

I’m keen to see if this idea has wider appeal and to hear what solutions and workarounds the community currently have. There are some experiments I’d like to try if I get the time, such as making my Pangeo homespace accessible to other systems. Finally, I’ll be airing this idea with our contacts at the various cloud providers so, who knows, maybe homespaces will be the next “as a service”.

Homespace icon with a heart

Theo McCaie
Met Office Informatics Lab

Head of the Met Office Informatics Lab. Leading a team exploring new ideas in and between Science, Design, and Technology creating beautiful, useful solutions.