A Data Biosphere for Biomedical Research

Introduction

Data Biosphere: Principles

Modular Components

Figure 1. The diagram illustrates a proposed architecture for a Data Biosphere, consisting of (from the bottom up): (i) Data Assets, such as large datasets of genome sequences or images, stored on Clouds that provide low-level services, such as storage, databases, and access control (grey); (ii) Data Access Services, which control access to data services and expose them via standardized APIs to multiple services created by many groups (blue); (iii) Indexing and Search capabilities that make it easy for researchers to find data and build cohorts (pink); (iv) Workspaces, which are analytical sandboxes where researchers can perform analyses on cuts of data and share them with collaborators (red); (v) Analytical engines, which allow users to deploy workflows and perform exploratory analyses (green); (vi) Repositories for sharing workflows and notebooks (orange); and (vii) Specialized Portals and user interfaces that support ad-hoc use cases and leverage the underlying services (yellow and peach).

Data Environments

Figure 2. The diagram illustrates the role of Data Environments: (i) data assets are stored on one or more clouds; (ii) Data Environments are stood up and operated to enable researchers to access and analyze these data assets; (iii) each Data Environment assembles Components that meet the needs of its community of researchers. A given data asset can be accessed by multiple Environments (as represented by data asset A in this Figure), and a given Environment can access multiple assets (as represented by Data Environment #3 in this Figure).
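
The many-to-many relationship described in the Figure 2 caption can be made concrete with a small Python sketch. It is only an illustration: the class names, environment names, cloud labels, component lists, and the exact asset-to-environment wiring are all hypothetical and are not taken from the figure or from any real Data Biosphere API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """A dataset stored on some cloud (names are illustrative assumptions)."""
    name: str
    cloud: str


@dataclass
class DataEnvironment:
    """An environment that assembles components and can reach one or more assets."""
    name: str
    components: list[str] = field(default_factory=list)
    assets: list[DataAsset] = field(default_factory=list)


def environments_for(asset: DataAsset, environments: list[DataEnvironment]) -> list[str]:
    """Return the names of every environment that can access the given asset."""
    return [env.name for env in environments if asset in env.assets]


# Hypothetical wiring: asset A is shared, and environment 3 reads two assets.
asset_a = DataAsset("A", cloud="cloud-1")
asset_b = DataAsset("B", cloud="cloud-2")

environments = [
    DataEnvironment("env-1", ["workspaces", "search"], [asset_a]),
    DataEnvironment("env-2", ["portal"], [asset_a]),
    DataEnvironment("env-3", ["workspaces", "analytical engine"], [asset_a, asset_b]),
]

shared_by = environments_for(asset_a, environments)
```

In this sketch the assets are not copied into each environment; every environment holds references to the same asset objects, which mirrors the idea that data stays in place on a cloud while several environments access it.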

Creating a Data Biosphere

Conclusion


UC Santa Cruz Genomics Institute

Benedict Paten
