Introducing BatteryArchive.org — A Public Battery Data Repository
- Few battery data sets are public and even fewer are in a common format, making it difficult to compare data across studies.
- This article describes the features of Battery Archive, the first public repository for visualization, analysis, and comparison of battery data across institutions.
- Battery Archive is built on open-source tools with the goal of making it interoperable with existing software resources in the battery community.
- The software behind Battery Archive is also available as an open-source project on GitHub.
Batteries are central to modern society, powering consumer electronics and electric vehicles, and playing an integral role in energy storage. Significant research has been conducted to understand their performance and degradation under a wide range of conditions. If made public, the data are typically reported in articles as plots, such as capacity fade versus cycle count. While such plots provide a useful summary of the key results, they are not very amenable to further analysis. The data points must be extracted to compare results from different studies and many kinds of modeling and analysis are not possible without the actual charge-discharge curves. The raw data from battery cycling studies are typically not shared: previous articles have reported on just a few well-known data sets, some limited to a single cell. Even when raw data are uploaded to an individual research group’s website or a repository like Zenodo, they are not standardized. Different file types, column structures, and calculated values are used in each dataset. Thus, a lot of post-processing is still required to compare the studies of even those groups that have graciously uploaded their raw data. Motivated by the difficulty we encountered while evaluating cycling results at Sandia National Laboratories (SNL) against existing studies, we decided to create Battery Archive, a repository for easy visualization, analysis, and comparison of battery data across institutions.
At its core, Battery Archive is an open access repository of battery data based on open-source software. The interface is meant to be simple enough for casual users to compare battery performance, while still offering more advanced modeling and analysis capabilities for experts in the field.
Upon entering the site, a user can select the cells of interest (Figure 1). The database allows filtering for specific studies and metadata related to cell features (such as cathode and capacity) and cycling conditions (such as temperature and C-rate). We developed a simple set of metadata for tagging each cell in the database, and we plan to update this as standards in the battery community evolve.
The site automatically generates several basic plots, including capacity and energy decay, coulombic and energy efficiency, and full charge and discharge curves at specified intervals (Figure 2). This information is presented as both time series and cycle aggregates. Immediate visualization is a key aspect of the site. Our users have described it as the ‘try before you buy’ approach. In other words, users can immediately see what the data look like, rather than taking the time to download and process a zip file only to find that the data set lacks the features they are interested in. The data in the plots can be downloaded for offline analysis, and scripts in Battery Archive’s associated GitHub repository are available for larger batch downloads.
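To make the relationship between time series and cycle aggregates concrete, the following is a minimal Python sketch of how per-cycle capacity, coulombic efficiency, and energy efficiency could be derived from time-series rows. It is not the code used by Battery Archive. The row keys (cycle, dt_h, current_a, voltage_v), the sign convention, and the simple rectangle-rule integration are all assumptions made for illustration, and the data are synthetic.

```python
from collections import defaultdict


def cycle_aggregates(rows):
    """Collapse time-series rows into one summary per cycle.

    Each row is a dict with hypothetical keys: cycle, dt_h (step length in
    hours), current_a (positive on charge, negative on discharge), voltage_v.
    """
    totals = defaultdict(lambda: {"chg_ah": 0.0, "dis_ah": 0.0,
                                  "chg_wh": 0.0, "dis_wh": 0.0})
    for row in rows:
        # Rectangle-rule integration of charge and energy over the step.
        amp_hours = abs(row["current_a"]) * row["dt_h"]
        watt_hours = amp_hours * row["voltage_v"]
        prefix = "chg" if row["current_a"] > 0 else "dis"
        if row["current_a"] != 0:  # rest steps carry no charge
            totals[row["cycle"]][prefix + "_ah"] += amp_hours
            totals[row["cycle"]][prefix + "_wh"] += watt_hours

    summary = {}
    for cycle, t in sorted(totals.items()):
        summary[cycle] = {
            "discharge_capacity_ah": t["dis_ah"],
            "discharge_energy_wh": t["dis_wh"],
            # Efficiencies are undefined for a cycle with no charge step.
            "coulombic_efficiency": t["dis_ah"] / t["chg_ah"] if t["chg_ah"] else None,
            "energy_efficiency": t["dis_wh"] / t["chg_wh"] if t["chg_wh"] else None,
        }
    return summary


# Synthetic example: one constant-current charge step and one discharge step.
series = [
    {"cycle": 1, "dt_h": 1.0, "current_a": 1.0, "voltage_v": 4.0},
    {"cycle": 1, "dt_h": 0.98, "current_a": -1.0, "voltage_v": 3.6},
]
stats = cycle_aggregates(series)
print(stats[1])
```

In this synthetic case the discharge returns 0.98 Ah of the 1 Ah charged, so the coulombic efficiency is about 0.98, while the lower discharge voltage brings the energy efficiency down to about 0.88.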
The complete details of each study are presented on the Studies page. The entry for each institution links to the corresponding publication and gives the appropriate attribution for using that data.
We began developing the site by cross-posting datasets that were already publicly available, with the permission of the relevant groups. Battery Archive is not intended to replace existing data repositories that issue a permanent digital object identifier (DOI) such as Data Dryad, Zenodo, or institutional repositories. Instead, our goal is to provide an additional repository where researchers can put copies of their battery datasets in a standard format to enable an immediate visual comparison between studies. All the data can be downloaded as CSV time-series and cycle files for each cell. While this file type was chosen for the sake of simplicity, the data can be made available in other formats (e.g., HDF5) in the future, based on users’ needs.
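As an illustration of what putting data "in a standard format" can involve, here is a small Python sketch that maps two differently structured CSV exports onto one common set of columns and units. The header names, the unit scaling, and the common schema (cycle, current_a, voltage_v) are hypothetical. They are not the actual Battery Archive schema or any cycler's real export format.

```python
import csv
import io

# Hypothetical header mappings for two labs; real exporter headers differ.
LAB_A_COLUMNS = {"cycle_number": "cycle", "I_A": "current_a", "V": "voltage_v"}
LAB_B_COLUMNS = {"Cyc": "cycle", "Current (mA)": "current_a", "Volts": "voltage_v"}
# Lab B reports current in milliamps, so it is scaled to amps.
LAB_B_SCALE = {"current_a": 1e-3}


def to_common_schema(csv_text, column_map, scale=None):
    """Parse CSV text and return rows keyed by the common column names."""
    scale = scale or {}
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for source_name, value in raw.items():
            target = column_map.get(source_name)
            if target is None:
                continue  # drop columns outside the common schema
            if target == "cycle":
                row[target] = int(value)
            else:
                row[target] = float(value) * scale.get(target, 1.0)
        rows.append(row)
    return rows


# Synthetic inputs describing the same measurement in two layouts.
lab_a_csv = "cycle_number,I_A,V\n1,1.5,3.7\n"
lab_b_csv = "Cyc,Current (mA),Volts,Notes\n1,1500,3.7,ok\n"

rows_a = to_common_schema(lab_a_csv, LAB_A_COLUMNS)
rows_b = to_common_schema(lab_b_csv, LAB_B_COLUMNS, LAB_B_SCALE)
print(rows_a)
print(rows_b)
```

Once both files share column names and units, the same plotting or analysis code can run on either one without per-study special cases.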
Beyond cross-posting existing data sets, we are also reaching out to groups that have not previously published their data, especially when we see a paper with a significant amount of battery performance and degradation data. Although some groups are understandably hesitant to share data, especially if they are still doing analysis on that data for a future publication, we have been gratified by the positive response from the community and have many new data sets in the pipeline. This site would not exist without the generosity of the groups sharing their cycling data.
When we started Battery Archive, our guiding principle was to maximize the use of existing open-source packages. We decided to architect the site around Redash, an open-source Extract, Transform, and Load (ETL) software system. The Redash project started in 2013, is now on version 8, and has almost 400 contributors. Redash contains data source connections for the most popular relational (PostgreSQL, MySQL) and non-relational (InfluxDB, MongoDB) database systems. It also connects to online tools like Google Sheets and JSON data sources. Once a data-source connection is established, it can be queried using standard SQL from the Redash web interface, as shown in Figure 3. The data returned from the query can then be displayed in tables or graphs. These graphs can then be arranged on public and private dashboards using the Redash web-based dashboard editor, allowing, for example, battery cycle and time series data to be displayed on the same dashboard.
We developed data ingestion tools for both completed tests (Arbin, MACCOR, MATLAB, generic Excel/CSV) and ongoing tests (Arbin Access files, and PEC Oracle Database). We then designed SQL queries to extract and present the data in formats used by the battery community. We will continue to add more plots based on the feedback we receive.
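The kind of SQL query described here can be sketched as follows. This is only an illustration: Python's built-in sqlite3 module stands in for the production database, and the table name (cycle_stats), its columns, and the sample values are hypothetical rather than taken from the Battery Archive schema.

```python
import sqlite3

# In-memory SQLite stands in for the production database; the table and
# column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cycle_stats ("
    "cell_id TEXT, cycle_index INTEGER, charge_ah REAL, discharge_ah REAL)"
)
conn.executemany(
    "INSERT INTO cycle_stats VALUES (?, ?, ?, ?)",
    [
        ("cell_a", 1, 2.00, 1.98),
        ("cell_a", 2, 1.99, 1.97),
        ("cell_b", 1, 3.00, 2.94),
    ],
)

# Per-cycle discharge capacity and coulombic efficiency for one cell,
# ordered so the result can be plotted directly against cycle number.
QUERY = """
SELECT cycle_index,
       discharge_ah,
       discharge_ah / charge_ah AS coulombic_efficiency
FROM cycle_stats
WHERE cell_id = ?
ORDER BY cycle_index
"""

result = conn.execute(QUERY, ("cell_a",)).fetchall()
for cycle_index, discharge_ah, efficiency in result:
    print(cycle_index, discharge_ah, round(efficiency, 4))
```

A query of this shape returns one row per cycle, which is the form a dashboard tool needs to draw a capacity-versus-cycle or efficiency-versus-cycle plot.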
Finally, Battery Archive includes functionality for more complex data manipulation in Jupyter Notebook. For example, Figure 4 compares the capacity fade based on reference performance tests (RPTs) of cells from two datasets, one from the Hawai’i Natural Energy Institute (HNEI) and one from SNL. Although the studies used different cycling protocols and RPT frequencies, it was still possible to extract the appropriate cycles in Jupyter. Although this example is relatively simple, it demonstrates the importance of and the potential for advanced manipulation of the datasets in Battery Archive.
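A comparison of this kind can be sketched in a few lines of Python. The example below is not the notebook behind Figure 4. It uses synthetic data, hypothetical field names (cycle, discharge_ah, tag), and two assumed ways of identifying RPT cycles (a fixed interval in one study and an explicit tag in the other), simply to show how studies with different RPT frequencies can be brought onto a common normalized scale.

```python
def rpt_capacity_retention(cycles, is_rpt):
    """Return (cycle, capacity / first RPT capacity) for each RPT cycle."""
    rpts = [(c["cycle"], c["discharge_ah"]) for c in cycles if is_rpt(c)]
    if not rpts:
        return []
    baseline = rpts[0][1]
    return [(cycle, round(capacity / baseline, 3)) for cycle, capacity in rpts]


# Synthetic study A: an RPT every 100 cycles, starting at cycle 1.
study_a = [
    {"cycle": 1, "discharge_ah": 3.0},
    {"cycle": 2, "discharge_ah": 2.4},
    {"cycle": 101, "discharge_ah": 2.7},
    {"cycle": 102, "discharge_ah": 2.2},
]
# Synthetic study B: RPT cycles carry an explicit tag.
study_b = [
    {"cycle": 1, "discharge_ah": 2.0, "tag": "rpt"},
    {"cycle": 2, "discharge_ah": 1.6, "tag": "aging"},
    {"cycle": 51, "discharge_ah": 1.9, "tag": "rpt"},
]

fade_a = rpt_capacity_retention(study_a, lambda c: (c["cycle"] - 1) % 100 == 0)
fade_b = rpt_capacity_retention(study_b, lambda c: c.get("tag") == "rpt")
print(fade_a)
print(fade_b)
```

Passing the RPT rule in as a function keeps the study-specific logic separate from the normalization step, so a new dataset only needs its own rule for recognizing RPT cycles.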
Open-Source Project Release
In response to requests to install private versions of Battery Archive, we released the software framework behind it as an open-source project. We anticipate that many groups will want to share some data streams while keeping other data private. We hope that more groups will adopt the architecture, take advantage of the free importers, queries, and export tools that we are making available, and contribute to the project. The first version of our public application programming interface (API) is available on GitHub and will be expanded in the next few months. A description of the open-source software and how it is being used by battery companies is available here.
Since launch, the site has been used by thousands of individuals from academia, industry, and utilities. Examples of users include: developers of non-battery energy storage technologies wanting to understand how their products compare to batteries under different conditions, representatives of utilities installing energy storage systems who are trying to get a better sense of what conditions exacerbate battery degradation, and academics who are trying to validate their battery degradation models with data from more studies. There are even users from the battery industry, including software-oriented companies which need data to test their ideas and products, and companies that already have a lot of battery data, but want to run some quick tests to try out a new idea or access data for different batteries. In short, free and easy access to diverse battery data is critical for a variety of users.
The site is in its early stages and many features are still under development. There are many battery cycling datasets in the pipeline and, as we get a better sense of the range and forms of data, we will embed a standard submission tool in the website and refine the minimal set of metadata required. Another key goal is to integrate with existing resources in the battery community, for example, using Battery Evaluation and Early Prediction (BEEP) to facilitate data import and key feature tagging, and PyBaMM to enable comparisons with models.
As we continue to expand the site, we welcome critical feedback from users. Please fill out a quick survey if you would like to provide some feedback right now or email us at email@example.com if you would like to share data or help with site development.
This work was supported by the Department of Energy, Office of Electricity Energy Storage Program. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2021–11112 O
About the Authors
Dr. Valerio De Angelis is a Distinguished Member of Technical Staff at Sandia National Laboratories. His research interests include battery modeling, system integration, and advanced manufacturing. Before joining Sandia, he was the Executive Director of the City University of New York Energy Institute. He expanded the scope of battery research to grid-scale systems and co-founded Urban Electric Power. Before joining the Institute, he was the CEO and founder of Mindflash, a leading provider of online training software acquired by Applied Training Systems.
Dr. Yuliya Preger is a Senior Member of Technical Staff in the Energy Storage Technology and Systems Group at Sandia National Laboratories. Her current work is centered on the safety and reliability of lithium ion and aqueous batteries for grid-level energy storage applications.