Cloud Scale Spreadsheet Computing — A New Paradigm

Bediako George
Advanced Spreadsheet Computing
8 min readSep 6, 2022

Producing Scalable Financial Applications at Speed

Abstract

Business application development is difficult. Even more challenging is building complex financial applications. Successful implementation of the business logic defined in large financial applications requires the participation of Subject Matter Experts (SMEs), the insight of mathematicians, and the grit of experienced software engineers. Engineering a correctly functioning financial application is daunting, even with the right talent available.

SMEs’ reliance on spreadsheet applications to describe the models they need to process financial data suggests that taking advantage of the preponderance of this practice will lead to faster development of financial applications.

Applying Pebble Stream technology to convert spreadsheet models to cloud-ready code automatically represents a paradigm shift in financial application development. It provides for predictable performance and scalability.

Testing spreadsheet models that are automatically converted to code is straightforward. Consequently, software bugs are easier to identify and correct.

The benefits afforded by the spreadsheet computing paradigm result in faster application deployment, focused roles for developers and SMEs, and lower operational costs.

What are Financial Applications?

Financial applications are software applications that apply mathematical, statistical, and logical operations to financial data. Financial data may describe financial instruments such as mortgage-backed securities, bonds, real estate, stock, or derivatives. Computations applied to financial data include, but are not limited to, tax impact analysis, cash flow analysis, scenario analysis, income allocation, risk analysis, arbitrage, and price prediction.

How is Financial Data Modeled?

Traditionally, financial SMEs use spreadsheet models to convey an application’s calculation requirements to developers. Spreadsheets are typically composed of one or more worksheets consisting of one or more cells. Cells contain formulas that perform calculations. The cell’s display shows both the cell’s formula alongside the cell formula’s calculated value. Coupling the cell’s computation formula to the cell’s value provides traceability of the cell’s input to output. This direct traceability of a cell’s formula to its value is a crucial feature of spreadsheet computing.

Spreadsheets make the calculation of a spreadsheet value transparent by visually tying the cell’s result to its computation.

The spreadsheet data representation model uniquely suits the processing of financial data. Financial data processing involves the production of new data by applying calculations to a starting set of input data to achieve a fresh collection of financial data. That fresh collection of financial data is provided to further computations to produce yet another set of data, with this process repeating until the culmination of the final result.

As developers and SMEs both understand spreadsheet models, spreadsheets provide a common ground for discussing financial models, allowing tighter collaboration amongst developers and SMEs.

There are other technical advantages of representing financial data as spreadsheets. Worksheets are tabular in structure. As such, they lend themselves to easy import and export to external data stores. Worksheets can be stored by name in key-value stores or loaded into tables in relational databases. This low impedance in data representation lends itself to superior performance during data persistence. Developers can export worksheet data in portable formats like CSV, JSON, XML, and fixed-length files, enhancing legacy system interoperability.

Spreadsheets can seamlessly represent workflows. Worksheets map to both input and output tables, as well as providing scaffolding for intermediate computations.

Despite these advantages of representing financial models as spreadsheets, the current state of the art in spreadsheet computing imposes unnecessary constraints. In particular, spreadsheet implementations like Microsoft Excel constrain calculation performance and scalability. Although SMEs create spreadsheet models to clearly illustrate how a single financial asset is to be processed, SMEs cannot use Microsoft Excel to analyze large numbers of financial instruments.

Processing Spreadsheet Models at Scale

To overcome this problem, SMEs turn to software engineers to translate spreadsheet models into back-end processes that can analyze large quantities of financial data. SMEs often use Excel to illustrate the expected calculations performed on the financial data to the developer. Developers will perform these calculations by effectively handwriting them in computing languages like Java, C, C#, or even Python. Let’s collectively refer to these computing languages as code for brevity’s sake.

Manual Model Conversion

Manually converting spreadsheet logic to code is fraught with problems. Depending on the complexity of the financial model, it can take significant amounts of time to translate a spreadsheet into a functioning back-end process. Developers must first take time to get the gist of the spreadsheet model, then write code that performs precisely as the spreadsheet model dictates.

What if the logic in the original spreadsheet model could be automatically translated into a scalable back-end process?

As these processes are deployed on computing platforms that do not behave exactly like the spreadsheet models they were derived from, proof of correctness is limited to comparing the spreadsheet model’s final outputs to the code’s final outputs. Only the developer can review the code when deviations from the expected output are detected, as SMEs typically do not understand coding languages outside of spreadsheet models. Consequently, collaboration is restricted to where common understanding exists, in the original spreadsheet model, instead of the code where the logic deviation occurred. This consequence is unavoidable and directly responsible for the impedance of mutual understanding of the model-code deviation by the SMEs and developers.

There are even more drawbacks to manually translating spreadsheet models to code. The principal advantage of the spreadsheet computing model is the explicit tying of a calculation’s input to its output. As we previously mentioned, the spreadsheet computing model inherently contains this feature. When developers manually translate a spreadsheet model to code, the loss of the ability to transparently tie a computation step to its output is the result.

A Better Way to Scale Spreadsheet Models

What if the logic in the original spreadsheet model could be automatically translated into a scalable back-end process? What if that scalable back-end process could produce outputs that align perfectly with every worksheet in the original spreadsheet model?

Automate the Conversion of Spreadsheets to Code!

As it happens, Georgetown Software House has developed technology to automatically convert spreadsheet models to scalable code that runs in both cloud and on-premise environments. This technology, called Pebble Stream, automatically translates a spreadsheet model into code. Pebble Stream’s spreadsheet translation logic is battle tested; Georgetown Software House implemented a financial calculator application using Pebble Stream’s automated code translation software to compute tax impact for complex corporate relationships. Tax analysts produced several spreadsheets dictating the complicated financial calculations to be run by the financial calculator. Each spreadsheet contains multiple worksheets, and each worksheet contains numerous columns of cells. Business analysts used the spreadsheet’s output analysis to create internal reports and prepare detailed IRS forms.

Pebble Stream technology automatically converts spreadsheet logic to libraries that run in Java, JavaScript, C#, C, C++, Scala, Groovy, and Clojure.

Pebble Stream technology automatically converted the computations in the spreadsheets into code, thus negating the need for software developers to understand the business logic implemented by the spreadsheet model. The automated translations of the spreadsheet models occurred in seconds. Consequently, Georgetown Software House completed the first application deployment in a little over a month, with subsequent releases steadily adding additional functionality over six months.

Improved Performance!

Automated translation provides an additional benefit by isolating developer contributions to this application to system infrastructure. Application developers can now focus on the performance of data transport to and from persistent stores and system memory. As a result, the resulting system processed many hedge fund partnerships in a short time.

The application stored its computation results in a cloud-based hierarchical key-value store. Using a key-value store sidestepped problematic object-relational impedance issues, resulting in simpler data persistence. The nature of the stored data was tabular, directly mimicking the table structure of spreadsheet worksheets. Worksheet names mapped directly to table names, facilitating the direct export of results to downstream relational data-stores. Ontological concepts codified worksheet-to-table mappings, ensuring precise and meaningful data transformation.

The project’s SMEs properly constructed spreadsheets to behave as mathematical functions. An essential property of a mathematical function is that it will always produce the same output given a specific input. This functional property supports the application of automated caching strategies to spreadsheets.

Caching spreadsheet outputs can result in further performance improvements. Georgetown Software House executes static analysis of spreadsheet formulas to ensure the spreadsheet’s functional behavior. Automating caching means that it is no longer a developer consideration.

Capturing spreadsheet logic as a library allows developers to simultaneously execute this logic in cloud environments. Pebble Stream’s runtime runs seamlessly within containers and is thread safe, and can be used in parallel processing environments and clustered applications.

Developers deployed these converted spreadsheet processes to cloud-based Kubernetes clusters. The Kubernetes cluster contained dozens of application containers simultaneously executing millions of spreadsheet computations. The performance improvement over the legacy platform was quite noticeable. In one execution, the new system processed a year’s worth of financial computations in hours.

Comprehensive Testing!

Business analysts confirm the correct execution of spreadsheet calculations by requesting the cloud-based spreadsheet computation results as Microsoft Excel spreadsheets. The system automatically compares the output of the cloud-based computation to the Microsoft Excel model.

Comparison testing revealed computational differences at the cell level. Spreadsheet model errors are confirmed when both the cloud-based and Excel computation provides the wrong answer. Business analysts can correct these errors without developer intervention and re-incorporate the updated model in the cloud-based application.

Re-incorporation of spreadsheet models into the cloud based application is accomplished via traditional Dev-Ops techniques. Because converted spreadsheet models are files, they are version controllable using tools like Git, Mercury, or Visual Source Safe. Developers can easily incorporate these converted spreadsheet files into the cloud application’s continuous integration and deployment (CI/CD) pipelines.

Developers can also unit test spreadsheet models. Pebble Stream affords a powerful automated test generator. Pebble Stream’s test generator can produce hundreds of test input scenarios of increasing complexity for a given spreadsheet model. The test generator automatically stress tests a spreadsheet model using these input scenarios.

Regression Analysis

Regression testing a spreadsheet computation is straightforward; recall spreadsheet computations are mathematical functions. This function property makes spreadsheet computations highly amenable to regression testing. A regression is detected whenever the current spreadsheet’s result differs from a past spreadsheet’s result. Georgetown Software House executes static analysis of spreadsheet formulas to ensure the spreadsheet will behave like a function.

Auditability

Since the values computed for each cell in the associated worksheets of the spreadsheet are retained, SMEs can trace the origin of each calculated value to its individual cell inputs. Code implementations do not attain the comprehensive traceability naturally produced in spreadsheets. These intermediary results are easily persisted to back-end data-stores and recalled as proof of computation to effortlessly meet the most stringent levels of enterprise audit requirements.

Conclusion

Spreadsheet computation at scale represents a significant paradigm shift. It combines the clarity of spreadsheet computing with the scalability and performance of cloud-ready code.

Pebble Stream technology enables the automatic conversion of spreadsheet templates to scalable cloud-ready code. Developers focus on ensuring optimal cloud application performance, while SMEs and business analysts focus on guaranteeing the computational correctness of the spreadsheet model.

To learn more about how we see the role of spreadsheets in the enterprise, please contact us at Georgetown Software House, Inc.

--

--

Bediako George
Advanced Spreadsheet Computing

Bediako is the founder of Georgetown Software House. When not developing software, he runs and plays a lot of soccer.