Cascade by TinkRNG

TinkRNG Team
Mar 21, 2022

TinkRNG Team | tinkRNG.eth

Abstract. Cascade by TinkRNG is a novel, biologically-inspired generative art project portraying patterns in the human proteome. Each piece represents a unique protein, with properties derived from information including the protein’s amino acid sequence, the cellular compartment to which the protein localizes, and the primary biological process in which the protein is involved. In this paper, we describe a framework for representing proteins using generative art, present the distributions of protein traits across the collection (artwork rarity), describe the minting of the artwork as NFTs, and address the carbon footprint associated with the project. Overall, Cascade represents a key step in integrating biological data and patterns with generative art.

Contents

I. Introduction

II. Protein Library

III. Representation of Proteins and Art Traits
A. Biological Process Trait
B. Cellular Compartment Trait
C. Amino Acid Sequence and Protein Size
D. Overview of Generative Art Algorithm

IV. Distribution of Protein Traits
A. Definition of Available Protein List for Collection
B. Distributions of Available Protein List
C. Analysis of Trait Combinations for Available Protein List (“Rarity Matrix”)

V. Mint Mechanics
A. Mint List Roles, Price, and Allocations
B. Protein Assignment and Artwork Construction
C. Artwork Reveal

VI. Carbon Footprint
A. Estimation of Carbon Footprint
B. Addressing Estimated Carbon Footprint
C. Transition of Ethereum to Proof-of-Stake and Future Considerations

VII. Conclusions and Future Outlook

VIII. Citations

IX. Attribution, Terms of Use, and Disclaimer

I. Introduction

TinkRNG is the passion project of three engineers with a love for using math and molecules to understand patterns in life and a desire to tinker and build beautiful art. In our first project, Cascade, we aim to integrate biological data and patterns with generative art. In doing so, we hope to share with others our passion for biology and to inspire excitement around our continued effort to share the beauty of nature.

While some of nature’s patterns can be seen at the macroscopic scale in our everyday life, such as geometric patterns in shells and plant structures, intricate molecular patterns invisible to the naked eye underlie every aspect of life. This collection features some of these molecular patterns, specifically those associated with proteins, which are the biomolecules that carry out the essential functions of life. Each piece of artwork represents a unique protein, with properties derived from information including the protein’s amino acid sequence, the cellular compartment to which the protein localizes, and the primary biological process in which the protein is involved. Taking artistic inspiration from the Fidenza collection by Tyler Hobbs [1], we create a framework for the integration of biological data with generative art in an attempt to create artwork that is as meaningful as it is appealing.

In this paper, we describe our framework to represent proteins using generative art. First, we describe the freely available protein database used to obtain information on proteins from the human proteome (Section II). Then, we present our framework for representing proteins using generative art (Section III). Next, we provide the distribution of protein properties across the list of proteins available in the collection (Section IV). As this generative art project is distributed as NFTs built on the Ethereum blockchain, we next describe the minting mechanics (Section V) and, importantly, assess the potential carbon footprint (Section VI). While we are optimistic about the future energy efficiency of Ethereum given its impending transition to a fully proof-of-stake blockchain, we provide an estimate of the project’s carbon footprint in the current state of Ethereum and present a plan to responsibly address it. Lastly, we conclude with an outlook on biologically-inspired generative art as an emerging space and comment on its potential to impact public engagement in science (Section VII).

II. Protein Library

Artwork in the Cascade collection is generated by an algorithm that makes artistic decisions based on a limited set of freely accessible information about naturally occurring proteins in the Homo sapiens proteome. Each piece of artwork in the Cascade collection represents a unique protein from the human proteome. Proteins were selected from the reviewed section of the Homo sapiens proteome* accessed online via the freely accessible UniProtKB [2] by the UniProt Consortium (see Section IX. Attribution, Terms of Use, and Disclaimer for more information). Specifically, protein sequences and two Gene Ontology (GO) annotation terms were obtained through UniProtKB. The Gene Ontology annotation terms correspond to the Gene Ontology project version 2022-01-13 (DOI: 10.5281/zenodo.5874355) [3,4] by The Gene Ontology Consortium (see Section IX. Attribution, Terms of Use, and Disclaimer for more information). This biological information was used to inform artistic traits in the Cascade collection.

[*] Accessed January 30, 2022, with the following search term: reviewed:yes AND organism:"Homo sapiens (Human) [9606]"

Note: Please refer to Section IX. Attribution, Terms of Use, and Disclaimer and the Terms of Use Agreement for Cascade by TinkRNG and Cascade Genesis by TinkRNG (https://www.tinkrng.io/termsofuse.pdf) for attributions of the aforementioned information used under Creative Commons CC BY 4.0 from The UniProt Consortium and The Gene Ontology Consortium. The UniProt Consortium and The Gene Ontology Consortium do not endorse or sponsor projects by TinkRNG, including Cascade by TinkRNG or Cascade Genesis by TinkRNG. TinkRNG is not granted any official status by The UniProt Consortium and The Gene Ontology Consortium.

III. Representation of Proteins and Art Traits

As previously stated, each piece of artwork represents a unique protein from the human proteome. Properties are derived from the primary biological process in which the protein is involved, the cellular compartment to which the protein localizes, and the protein’s amino acid sequence. The representation of these attributes is defined in this section.

A. Biological Process Trait

Proteins are designed to perform specific functions, like catalyzing a reaction or transporting ions across a membrane. Groups of proteins come together to achieve more complex cellular processes, like movement, growth, division, and programmed death, which are represented in the artwork by the biological process trait. For each biological process, we created a unique vector field template that is inspired by the biological process (Figure 1).

Figure 1: Vector fields represent the primary biological process that the protein is involved in.

In this collection, we have chosen to focus on 10 biological processes that are critical for cellular function and lend themselves to artistic depiction. These are specified by the Biological Process trait in the Cascade collection. This artwork trait is informed by the Gene Ontology (GO) [2–4].

B. Cellular Compartment Trait

In order to execute its function, a protein must move to a specific location in the cell, such as an organelle or the plasma membrane. The cellular compartment trait represents protein localization through the appearance of the bands, which are inspired by the appearance of these cellular compartments as detailed in Figure 2.

Figure 2: Band appearance represents the cellular compartment where the protein is located. Image of cell obtained from [5].

In this collection, we have chosen to focus on 9 cellular locations that can describe the localization of the majority of proteins in human cells. These are specified by the Cellular Compartment trait in the Cascade collection. This artwork trait is also informed by the Gene Ontology (GO) [2–4]. In the case that a protein localizes to multiple locations in the cell, the Cellular Compartment trait was assigned according to the following search order: Cell Adhesion, Cell Projection, Cytoskeleton, Mitochondrion, Golgi Apparatus, Endoplasmic Reticulum, Nucleus, Plasma Membrane, Cytosol. As such, proteins that localize to specialized subcellular structures and organelles are assigned to those cell compartments before they are assigned to more general compartments like the nucleus, plasma membrane, and cytosol.
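The search order above amounts to a first-match lookup over a priority list. The following is a minimal sketch, not the project's actual code: the function name `assign_compartment` and the assumption that a protein's GO terms have already been mapped to the nine compartment names are both hypothetical.

```python
# Search order as quoted in the text: specialized structures and organelles
# come first, general compartments come last.
SEARCH_ORDER = [
    "Cell Adhesion",
    "Cell Projection",
    "Cytoskeleton",
    "Mitochondrion",
    "Golgi Apparatus",
    "Endoplasmic Reticulum",
    "Nucleus",
    "Plasma Membrane",
    "Cytosol",
]


def assign_compartment(annotated_locations):
    """Return the first compartment in SEARCH_ORDER the protein is annotated with.

    `annotated_locations` is a hypothetical collection of compartment names
    already derived from the protein's GO terms. Returns None when none of
    the nine compartments match.
    """
    locations = set(annotated_locations)
    for compartment in SEARCH_ORDER:
        if compartment in locations:
            return compartment
    return None


# A protein annotated to three compartments gets the most specialized one.
print(assign_compartment({"Nucleus", "Cytosol", "Mitochondrion"}))  # Mitochondrion
```

Because the list is scanned from the top, a protein found in both the mitochondrion and the cytosol is always labeled Mitochondrion, which is the behavior the search order is meant to produce.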

C. Amino Acid Sequence and Protein Size

A protein consists of a long, linear sequence of amino acids, which folds up to form its three-dimensional structure (Figure 3a). In the artwork, the amino acid sequence is represented by bands. Each band contains a segment of the amino acid sequence and originates from a spatial location on the art that corresponds to its relative position in the sequence (Figure 3b). As the structure and function of a protein depend on the sequence and chemical properties of its amino acids, colors are used to represent the amino acid properties. Specifically, each segment representing an amino acid is colored based on the chemical class of that amino acid (Figure 3c). Each color palette therefore contains one background color and five amino acid colors, one per chemical class.
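The class-based coloring can be illustrated with a simple lookup table. This sketch makes two loud assumptions: the five classes follow one common textbook grouping of the 20 standard amino acids, which may not match the classes in Figure 3c, and the hex colors are placeholders rather than any palette from the collection.

```python
# ASSUMPTION: a common textbook grouping into five chemical classes.
# The classes actually used in Figure 3c may differ.
CLASS_MEMBERS = {
    "nonpolar_aliphatic": "GAVLIMP",
    "aromatic": "FWY",
    "polar_uncharged": "STCNQ",
    "positively_charged": "KRH",
    "negatively_charged": "DE",
}

# ASSUMPTION: placeholder palette with one background and five class colors.
PALETTE = {
    "background": "#f4efe6",
    "nonpolar_aliphatic": "#d9a441",
    "aromatic": "#7a4e9c",
    "polar_uncharged": "#3f8f6b",
    "positively_charged": "#2f6db5",
    "negatively_charged": "#c2453d",
}

# Invert the class table: one-letter amino acid code -> class name.
AA_TO_CLASS = {
    aa: cls for cls, members in CLASS_MEMBERS.items() for aa in members
}


def segment_colors(sequence, palette=PALETTE):
    """Return one palette color per residue, chosen by chemical class.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    return [palette[AA_TO_CLASS[aa]] for aa in sequence.upper()]


print(segment_colors("MKDE"))
```

Residues in the same class share a color, so stretches of chemically similar amino acids show up as runs of one color within a band.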

Figure 3: Amino Acid Sequence and Placement

Lastly, the lengths of protein sequences in the human proteome vary. To include each amino acid sequence in the artwork, the sizes of bands are scaled to ensure a minimal coverage fraction of the total area in the artwork. As such, the size of bands in the artwork is inversely related to the size of the protein. Large proteins have many small bands, and small proteins have few large bands.
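The inverse relationship between protein size and band size can be shown with a closed-form calculation. This is only an illustration: the text describes an iterative scaling approach, and the parameter names and default values here (`coverage`, `aspect`, `residues_per_band`) are hypothetical.

```python
import math


def band_dimensions(n_residues, residues_per_band, canvas_area,
                    coverage=0.5, aspect=4.0):
    """Return (n_bands, width, length) for equally sized rectangular bands.

    The bands together cover `coverage` of `canvas_area`; `aspect` is the
    length-to-width ratio. All defaults are illustrative assumptions.
    """
    n_bands = math.ceil(n_residues / residues_per_band)
    band_area = coverage * canvas_area / n_bands  # more bands -> smaller bands
    width = math.sqrt(band_area / aspect)
    length = aspect * width
    return n_bands, width, length


small = band_dimensions(100, 10, canvas_area=1000.0)
large = band_dimensions(1000, 10, canvas_area=1000.0)
print(small)  # few, large bands
print(large)  # many, small bands
```

With the covered area held fixed, a protein ten times longer gets ten times as many bands, each with one tenth of the area.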

D. Overview of Generative Art Algorithm

The generative art algorithm is designed to balance protein representation with intrinsic randomness, such that traits of the protein remain identifiable in the art while still achieving a significant level of uniqueness. There are two major sources of variability in this collection. First, the “trait variability” corresponds to the variability in the amino acid sequence, length, biological process, and cell compartment that exists naturally in the human proteome. In this dimension, uniqueness is driven by the attributes of the protein (see Section IV). Second, the “intrinsic variability” corresponds to random noise introduced into each piece of artwork on top of the rules specified by the protein traits. Here, uniqueness is achieved by pseudo-random number generation. Overall, all aspects of the art are either derived from the protein attributes or randomly generated.

The algorithm used to generate art is described briefly here. First, a vector field for the protein is created using the vector field template corresponding to the biological process trait (Figure 1). The vector field templates specify aspects of the vector field including the background field direction; the location and strength of sources, sinks, and vortices; the amount and type of noise; and the existence of a spatially distinct second vector field. Using these rules, a unique vector field for the protein is generated. Then, properties of the bands are set from the cell compartment (Figure 2). These include the shape, distribution for size (width and length), distribution for the number of segments per band, and other aspects of the band appearance. Next, bands with segments representing short amino acid sequences in the protein sequence are iteratively placed along streamlines in the vector field. (Note: for the cytoskeleton and plasma membrane shapes, each band contains a single segment corresponding to a single amino acid.) To determine the origin of each band on its streamline, the position of the band’s first amino acid in the sequence is mapped onto the rectangular art space using a raster approach (Figure 3b). The color of each segment in the band is derived from the amino acid type (Figure 3c) and set according to the specified color palette for the piece. The algorithm uses an iterative scaling approach to ensure that all amino acids in the protein sequence are represented in the artwork. To do this, the sizes of bands are scaled to cover a fixed fraction of the total area in the artwork. The algorithm was implemented in Python [6]. Together, the biological process, cellular compartment, and amino acid sequence drive the appearance of the artwork representing a protein, as shown in the example construction for the protein “Ankyrin repeat and SOCS box protein 14” in Figure 4.
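Two steps of the description above lend themselves to a short sketch: mapping a sequence position onto the canvas in raster order, and following a streamline through a field built from a background direction plus sources, sinks, and vortices. This is a simplified illustration, not the project's code. The function names, the `n_rows` parameter, the inverse-distance falloff, and the fixed-step Euler integration are all assumptions, and noise and the second vector field are omitted.

```python
import math


def raster_origin(index, n_residues, width, height, n_rows=20):
    """Map a 0-based residue index to (x, y), left to right, top to bottom."""
    t = index / n_residues                # relative position in [0, 1)
    row = int(t * n_rows)                 # raster row containing the residue
    frac = t * n_rows - row               # progress along that row
    return frac * width, (row + 0.5) * height / n_rows


def field(x, y, background=(1.0, 0.0), sources=(), vortices=()):
    """Sum a uniform background with point sources/sinks and vortices.

    sources:  (cx, cy, s); s > 0 pushes outward (source), s < 0 pulls in (sink).
    vortices: (cx, cy, s); s > 0 circulates counterclockwise when y points up.
    """
    vx, vy = background
    for cx, cy, s in sources:
        dx, dy = x - cx, y - cy
        r2 = dx * dx + dy * dy + 1e-9     # avoid division by zero at the center
        vx += s * dx / r2
        vy += s * dy / r2
    for cx, cy, s in vortices:
        dx, dy = x - cx, y - cy
        r2 = dx * dx + dy * dy + 1e-9
        vx += -s * dy / r2
        vy += s * dx / r2
    return vx, vy


def trace_streamline(x, y, n_steps, step, **field_kwargs):
    """Follow the field direction with fixed-length Euler steps."""
    points = [(x, y)]
    for _ in range(n_steps):
        vx, vy = field(x, y, **field_kwargs)
        norm = math.hypot(vx, vy)
        if norm == 0.0:                   # stagnation point: stop tracing
            break
        x += step * vx / norm
        y += step * vy / norm
        points.append((x, y))
    return points


# Start a band at the raster position of residue 250 of a 647-residue protein
# and follow the field past one sink and one vortex (positions are made up).
x0, y0 = raster_origin(250, 647, width=800, height=600)
path = trace_streamline(x0, y0, n_steps=50, step=4.0,
                        sources=[(400, 300, -30.0)],
                        vortices=[(600, 150, 40.0)])
print(len(path), path[0], path[-1])
```

Normalizing the velocity before each step keeps the spacing between points constant, which is convenient when equally sized segments are laid along the path.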

Figure 4: Example Protein Construction

IV. Distribution of Protein Traits

Here, we address the distribution of protein lengths and traits across the entire list of proteins available in the collection. These distributions directly portray the rarity of artwork traits (see rarity matrix in Figure 8). Additionally, these distributions portray emergent biological patterns in the collection.

A. Definition of Available Protein List for Collection

From the proteins in the protein database (Section II), proteins shorter than 50 amino acids were excluded. To avoid misrepresentation or ambiguous representation of proteins, only proteins categorized by exactly one of the Biological Process traits and at least one of the Cellular Compartment traits were used, yielding a list of 3,198 proteins.
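The inclusion criteria can be expressed as a single predicate. This is a minimal sketch with a hypothetical record layout (a dict with `sequence`, `processes`, and `compartments` keys). The trait sets in the usage example are an illustrative subset, not the full lists of 10 processes and 9 compartments.

```python
def is_eligible(protein, process_traits, compartment_traits, min_length=50):
    """Apply the three criteria: length, exactly one process, >= 1 compartment.

    `protein` is a hypothetical dict with keys 'sequence', 'processes',
    and 'compartments'; the two trait arguments are sets of trait names.
    """
    if len(protein["sequence"]) < min_length:
        return False
    n_processes = len(set(protein["processes"]) & process_traits)
    n_compartments = len(set(protein["compartments"]) & compartment_traits)
    return n_processes == 1 and n_compartments >= 1


# Illustrative subset of trait names taken from the text.
PROCESS_TRAITS = {"Signal Transduction", "Cell Adhesion"}
COMPARTMENT_TRAITS = {"Nucleus", "Plasma Membrane", "Cytosol"}

example = {
    "sequence": "M" * 120,                 # placeholder sequence
    "processes": ["Signal Transduction"],
    "compartments": ["Nucleus", "Cytosol"],
}
print(is_eligible(example, PROCESS_TRAITS, COMPARTMENT_TRAITS))  # True
```

Requiring exactly one process keeps the vector field template unambiguous, while multiple compartments are allowed because the search order in Section III-B resolves them to one.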

B. Distributions of Available Protein List

Here, we present data on the 3,198 proteins available in the collection (Figures 5–7).

Figure 5: Distribution of Protein Length in the Available Protein List. The average length is 647 amino acids.
Figure 6: Distribution of Biological Process Trait in the Available Protein List.
Figure 7: Distribution of Cellular Compartment in the Available Protein List.

C. Analysis of Trait Combinations for Available Protein List (“Rarity Matrix”)

Next, we assessed the joint distribution for cell compartment trait and biological process trait, which is depicted with a 2D histogram of relative probability for each possible trait combination (Figure 8). The relationships depicted here represent biological patterns embodied by the collection as a whole. Additionally, the probability of these trait combinations relates directly to the rarity of artwork.

Here, we highlight some of these biological trends. Proteins involved in signal transduction are most frequently located at the plasma membrane (Prob = 0.070), cytosol (Prob = 0.048), or nucleus (Prob = 0.057), consistent with the cell signaling paradigm of integrating signals at the cell surface, transducing them with messengers in the cytosol, and affecting gene expression in the nucleus. Similarly, proteins involved in cell adhesion are located at the plasma membrane (Prob = 0.057) or specialized adhesion structures (Prob = 0.025), consistent with this process requiring the direct interaction of cells with their surroundings. These are just a few examples of the many trends across the collection, which together highlight interesting biological patterns and support the validity of the protein trait assignment protocol used here.
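A joint distribution of this kind can be computed by counting trait pairs and dividing by the list size. The sketch below assumes each protein has been reduced to a hypothetical `(process, compartment)` tuple and uses four made-up entries, not data from the collection.

```python
from collections import Counter


def rarity_matrix(trait_pairs):
    """Return {(process, compartment): relative probability} over the list."""
    counts = Counter(trait_pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Made-up example list of four proteins.
pairs = [
    ("Signal Transduction", "Plasma Membrane"),
    ("Signal Transduction", "Plasma Membrane"),
    ("Signal Transduction", "Nucleus"),
    ("Cell Adhesion", "Plasma Membrane"),
]
for pair, prob in sorted(rarity_matrix(pairs).items()):
    print(pair, prob)
```

Read this way, a value such as Prob = 0.070 for one cell of the matrix over 3,198 proteins corresponds to roughly 224 proteins with that trait combination.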

Figure 8: 2D Histogram for Frequency of Biological Process (x) by Cell Compartment (y).

V. Mint Mechanics

This section describes the processes of minting, generating, assigning, and revealing artwork.

A. Mint List Roles, Price, and Allocations

Cascade by TinkRNG is a 1000-piece collection. The distribution of the 1000 pieces is intended to be 85% access-list mint allocation + 10% public mint allocation + 5% team mint allocation. Of the 50 mints available to the team, 30 pieces will be minted by the 3 core team members, and 5 mints will be allocated to the team wallet for future promotional purposes. The remaining 15 mints will be allocated to moderators and contributors who were fundamentally important to the launch of this project. If any of the 850 access-list spots are not minted, they will be rolled over to the public mint. Both access-list minters and public minters will be able to mint 1 piece for 0.1 ETH (not including gas fees). (Note: the public mint is limited to 1 mint per transaction, not 1 mint per wallet.) Winners of the Cascade Genesis pieces released on Foundation will receive 1 free mint, as well as the opportunity to mint 5 additional pieces. These mints are counted under the 850 available access-list spots for the community.
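The allocation arithmetic and the rollover rule can be checked with a few lines. The constant and function names below are hypothetical; the numbers come from the paragraph above.

```python
COLLECTION_SIZE = 1000
ACCESS_LIST = 850   # 85% of the collection
PUBLIC = 100        # 10% of the collection
TEAM = 50           # 5% of the collection
TEAM_SPLIT = {
    "core_team": 30,
    "team_wallet": 5,
    "moderators_and_contributors": 15,
}


def public_supply(access_list_minted):
    """Public mint supply after unminted access-list spots roll over."""
    if not 0 <= access_list_minted <= ACCESS_LIST:
        raise ValueError("access-list mints must be between 0 and 850")
    return PUBLIC + (ACCESS_LIST - access_list_minted)


print(ACCESS_LIST + PUBLIC + TEAM)   # 1000
print(sum(TEAM_SPLIT.values()))      # 50
print(public_supply(800))            # 150
```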

B. Protein Assignment and Artwork Construction

1000 unique proteins will be randomly selected (without replacement) from the total 3,198 proteins in the Protein List (defined in Section IV-A), excluding proteins used in the Cascade Genesis by TinkRNG collection. Each piece in the Cascade collection is derived from one of these unique proteins, along with a unique seed number used for pseudo-random number generation in the artwork construction process. As detailed in Section III, all features of the art are derived either from properties of the protein (its amino acid sequence, size, and gene ontology) or from the random number generator seed.
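Selection without replacement, paired with a unique seed per piece, can be sketched with the standard library. The function name, the `master_seed` parameter, the 32-bit seed range, and the three placeholder Genesis IDs are assumptions for illustration; the text does not say how seeds are chosen or how many Genesis proteins are excluded.

```python
import random


def select_proteins(available_ids, genesis_ids, n_pieces=1000, master_seed=0):
    """Pick `n_pieces` distinct proteins and give each a distinct art seed."""
    rng = random.Random(master_seed)
    # Sort so the result depends only on master_seed, not on set ordering.
    pool = sorted(set(available_ids) - set(genesis_ids))
    chosen = rng.sample(pool, n_pieces)            # sampling without replacement
    seeds = rng.sample(range(2**32), n_pieces)     # distinct seeds (assumed range)
    return list(zip(chosen, seeds))


# Placeholder integer IDs standing in for the 3,198 available proteins.
pieces = select_proteins(range(3198), genesis_ids={0, 1, 2}, master_seed=42)
print(len(pieces), pieces[0])
```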

C. Artwork Reveal

The artwork and metadata of Cascade by TinkRNG will be revealed within 72 hours after the completion of the public mint. At this time, the 1000 unique proteins/artworks will be assigned randomly to the 1000 token IDs.
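Random assignment of artworks to token IDs is a shuffle followed by enumeration. In this sketch the function name is hypothetical, and the choice of 0 as the first token ID is an assumption, since the text does not state the numbering.

```python
import random


def assign_token_ids(artworks, seed=None, first_token_id=0):
    """Shuffle the artworks and map consecutive token IDs onto them."""
    rng = random.Random(seed)
    shuffled = list(artworks)       # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return {first_token_id + i: art for i, art in enumerate(shuffled)}


# Placeholder artwork labels.
mapping = assign_token_ids([f"artwork_{i}" for i in range(1000)], seed=7)
print(mapping[0])
```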

VI. Carbon Footprint

As a group, we fully recognize the carbon footprint currently associated with proof-of-work blockchains like Ethereum. Currently, the estimated energy consumption of Ethereum per annum is around 9.62 TWh (September 2019–2020) [7]. However, we remain optimistic about the future of Ethereum and its impending transition to a fully proof-of-stake blockchain [8], which will massively reduce energy consumption by a reported 99% [9]. At the same time, we remain determined to do our best accounting of this project’s carbon footprint and discuss how we can offset it. Here, we present estimates of the carbon footprint for the project under the current state of Ethereum (Section VI-A) and present a plan to offset this carbon footprint, both at time of mint and through its lifetime (Section VI-B).

A. Estimation of Carbon Footprint

To estimate the carbon footprint of our project, we utilized the historical data of similar existing NFT generative art projects using a tool developed by Offsetra Knowledge Base, a carbon offset company familiar with the impact of blockchain technologies. This company, led by artist and engineer Memo Akten, has created a set of tools to assess the carbon footprint of individual wallets as well as Ethereum contracts: http://cryptoart.wtf. This tool has essentially become an industry standard and is referenced in many articles that discuss the environmental impact of NFTs and Ethereum. Although the analysis behind this calculator primarily involves transactions on SuperRare, we believe that its estimates of carbon footprint per unit of Gas are applicable for our purposes.

Offsetra details the methodology for their calculations in [10], but at its most basic level the method for calculating the carbon footprint of a contract involves summing the Ethereum Gas units consumed over all transactions that interact with the contract and multiplying it by an estimated emissions factor. This factor is derived from the energy mix and geographic location of Ethereum mining pools. The key figures used in these estimations are Energy footprint per unit of ETH Gas (0.0005454743 kWh) and Carbon footprint per unit of ETH Gas (0.0003182308 kg CO2), which correspond to averages across the period from December 2020 through January 2021 [11]. Fundamentally, these calculations should be considered as accurate to the “order of magnitude,” which we believe is appropriate for the purposes here. When applicable, we will consider the higher end of the estimates in an effort to overestimate our impact rather than underestimate it.
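The basic calculation described above is a sum followed by a multiplication. The sketch below uses the two per-Gas factors quoted in the text; the function name and the gas values in the example are made up for illustration.

```python
# Factors quoted in the text (averages for December 2020 through January 2021).
KWH_PER_GAS = 0.0005454743
KG_CO2_PER_GAS = 0.0003182308


def contract_footprint(gas_per_transaction):
    """Sum the Gas over all transactions and convert to energy and CO2."""
    total_gas = sum(gas_per_transaction)
    return {
        "gas": total_gas,
        "kwh": total_gas * KWH_PER_GAS,
        "kg_co2": total_gas * KG_CO2_PER_GAS,
    }


# Hypothetical contract with two transactions totaling 1,000,000 Gas.
print(contract_footprint([600_000, 400_000]))
```

One million units of Gas therefore corresponds to roughly 545 kWh and 318 kg of CO2 under these factors, which should be read as order-of-magnitude figures.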

Because we have not yet launched our project, we assess estimates of four generative art projects (Table 3), for which we have a lot of respect. These projects were chosen because they have similar collection sizes (around 1000 mints) and similar Ethereum contracts. Because the mechanics of each project differ, the total ETH Gas consumption, and thus the kg of CO2 emitted, varies across projects.

For some context, worldwide per capita CO2 emissions are currently 4.79 tons per annum [12], and in the United States of America, they are 15.52 tons per annum. As such, these and similar NFT projects currently have carbon footprints equivalent to that of a US citizen over about 2 to 5 years.
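The comparison to per capita emissions is a unit conversion followed by a division. The sketch below assumes the quoted figures are metric tons (1 ton = 1000 kg); the function name is hypothetical and the input is back-calculated from the stated range rather than taken from any project's data.

```python
WORLD_TONS_PER_CAPITA = 4.79   # tons CO2 per person per year, from the text
US_TONS_PER_CAPITA = 15.52     # tons CO2 per person per year, from the text


def person_years(kg_co2, tons_per_capita):
    """Years of one person's emissions equal to `kg_co2` (assumes metric tons)."""
    return kg_co2 / 1000.0 / tons_per_capita


# The stated range of 2 to 5 US person-years corresponds to about
# 31 to 78 tons of CO2 (2 * 15.52 = 31.04 and 5 * 15.52 = 77.6).
print(person_years(31_040, US_TONS_PER_CAPITA))
print(person_years(77_600, US_TONS_PER_CAPITA))
```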

B. Addressing Estimated Carbon Footprint

Oftentimes when discussing the concept of offsetting one’s carbon footprint, we imagine a situation in which we are “canceling” our CO2 impact. Unfortunately, there is no such solution. The reality is that burning fossil fuels for energy is irreversible. As long as we are using proof-of-work blockchains like Ethereum, we are directly contributing to humanity’s greenhouse gas emissions.

That being said, we are still determined to use our project funds to support pro-climate projects aimed at building renewable energy sources, contributing to the reforestation/preservation of native ecosystems, and supporting overall conservationism. Towards this end, we have decided to partner with Offsetra. Offsetra has a solid track record and history of transparency that we appreciate. In 2020, 84% of every dollar they spent went to high-impact carbon reduction projects [13]. Additionally, the projects Offsetra finds and contributes to are certified carbon reduction projects that meet rigorous international standards [13].

To err on the side of overestimating our carbon footprint and to expand our contribution toward these causes, our team has decided to donate 3x the amount of funds required to offset the carbon footprint of our project after mint. Additionally, we will continue to offset the carbon footprint every quarter to keep up with the impact of our project as it grows due to secondary trades, airdrops, additional mints, etc. All of these donations will be made fully public and verified by Offsetra.

C. Transition of Ethereum to Proof-of-Stake and Future Considerations

Above, we have presented estimates of the carbon footprint for the project under the current state of Ethereum (Section VI-A) and presented a plan to offset this carbon footprint, both at time of mint and through its lifetime (Section VI-B). Of equal importance, we are optimistic about the future of Ethereum and its impending transition to a fully proof-of-stake blockchain [8], which will massively reduce energy consumption by a reported 99% [9]. We emphasize that improvements like these are imperative. Overall, we believe that the climate change crisis is a major world issue that demands systemic changes in nearly every aspect of human life.

VII. Conclusions and Future Outlook

In this paper, we introduce Cascade, a novel generative art project portraying patterns in the human proteome, and describe our approach to represent proteins using generative art. Broadly, we expect Cascade will serve as a first-generation framework for integrating biological data and patterns with generative art. In future projects, we plan to further differentiate our artistic style while continuing to facilitate and develop algorithms for biologically-inspired generative art projects. As a unique lens through which to view the building blocks and processes of life, we anticipate that biologically-inspired generative art will become a tool for enhancing public interest and engagement in science. Conversely, we also speculate that in the future artistic approaches like those employed here could serve as tools for the documentation, communication, and presentation of large, complex bioinformatics datasets. Overall, we have developed a framework for integrating biological data with generative art and are excited about the future of this space.

VIII. Citations

[1] Hobbs, Tyler. “Fidenza.” https://tylerxhobbs.com/fidenza

[2] The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49:D1 (2021).

[3] Ashburner et al. Gene ontology: tool for the unification of biology. Nat Genet. May 2000;25(1):25–9.

[4] The Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. Jan 2021;49(D1):D325-D334.

[5] Cornell, B. 2016. “Eukaryotic Cells.” http://ib.bioninja.com.au

[6] Python Software Foundation. Python Language Reference, version 3.9. Available at http://www.python.org

[7] Digiconomist. 2022. “Ethereum Energy Consumption Index.” digiconomist.net/ethereum-energy-consumption.

[8] ethereum.org. 2022. “Proof-of-Stake” https://ethereum.org/en/developers/docs/consensus-mechanisms/pos/

[9] Castor, Amy. 2022. “Why Ethereum is switching to proof of stake and how it will work.” MIT Technology Review. https://www.technologyreview.com/2022/03/04/1046636/ethereum-blockchain-proof-of-stake/

[10] Offsetra Knowledge Base. 2022. “Carbon.FYI Methodology.” https://www.notion.so/Carbon-FYI-Methodology-51e2d8c41d1c4963970a143b8629f5f9

[11] Akten, Memo. Dec 30, 2020. “The Unreasonable Ecological Cost of #CryptoArt (Part 2).” https://memoakten.medium.com/analytics-the-unreasonable-ecological-cost-of-cryptoart-72f9066b90d

[12] “CO2 Emissions per Capita.” 2022. https://www.worldometers.info/co2-emissions/co2-emissions-per-capita/

[13] Offsetra Knowledge Base. 2022. “Offset Projects.” https://www.notion.so/Offset-Projects-9e95cb0be1794a53aa37233678a54a14

IX. Attribution, Terms of Use, and Disclaimer

The Digital Art of the Cascade project is generated by an algorithm that makes artistic decisions based on a limited set of freely accessible information about naturally occurring proteins in the Homo sapiens proteome. The information was obtained from The UniProt Consortium and The Gene Ontology Consortium. Specifically, only the protein name, amino acid sequence, and two Gene Ontology (GO) annotation terms were used. No additional information was used, and no software or analysis tools from either source were used. The citations provided in Section VIII reflect the requested citations for these two databases. Attributions for the aforementioned information used under Creative Commons CC BY 4.0 from The UniProt Consortium and The Gene Ontology Consortium are provided in the Terms of Use Agreement for Cascade by TinkRNG and Cascade Genesis by TinkRNG located here: https://www.tinkrng.io/termsofuse.pdf.

Disclaimer: The UniProt Consortium and The Gene Ontology Consortium do not endorse or sponsor projects by TinkRNG, including Cascade by TinkRNG or Cascade Genesis by TinkRNG. TinkRNG is not granted any official status by The UniProt Consortium and The Gene Ontology Consortium. Furthermore, Digital Art from TinkRNG is intended solely to be artwork. TinkRNG does not modify or distribute any scientific data, and Cascade by TinkRNG and Cascade Genesis by TinkRNG are not intended for any scientific or other use. Ownership of Digital Art from TinkRNG does not include any of the biological information associated with artistic decisions used to create the art.

Terms of Use Agreement for Cascade by TinkRNG and Cascade Genesis by TinkRNG: https://www.tinkrng.io/termsofuse.pdf.
