Book 6: Endurance
From the Diaries of John Henry
It is hard to look back on the essays of Book 6 without considering their context. The pandemic was gradually receding from the mainstream dialogue as vaccine-sourced immunity started to contribute, although it would be hard to draw any boundary of success, since even with this progress viral variants continued to claw back, partly supported by mask wearing becoming less fashionable again. Such is the cyclic nature of crowd dynamics. The US finally had a stable executive office in power, one that was still trying to grapple with how to manage the crisis of transfer of the preceding election, a struggle that endures as this is being written. We entered this period with trends of isolationism on the international front, with such calm only breached by a shocking aggression targeting the people of Ukraine. Swans of all shades flew amongst us.
For the first time, the specter of a fully realized artificial general intelligence appeared at least possible, if still a distant goal. Large language models were now being trained on increasingly massive neural architectures and even more massive scales of training data, beginning to approach the cumulative output of humanity’s written words, in some cases even across languages and dialects. The biggest bottleneck was a global chip shortage arising from competition for foundry output among consumer hardware, data centers, and the siren song of cryptocurrency mining. Other modalities like computer vision began to scale as well, leveraging an internet’s worth of camera uploads. The boundaries in applications between language, vision, audio, and other modalities started to blur. Everything was converging towards attention-based learning trained in a self-supervised manner. And yet somehow this author continued to plug away at his quaint little Python library targeting dataframe preprocessing for supervised learning. A flea practically unnoticeable to the herds of elephants staking their claims.
There was not much in the way of user traction fueling the intensity of focus demonstrated by these essays. The author instead found that rapid progress is its own form of validation. The collection starts with the paper Tabular Engineering with Automunge, which went on to be published as a workshop paper at NeurIPS, arguably the most prestigious artificial intelligence research conference. Granted, no one noticed it, but that doesn’t detract from getting accepted in the first place. October then became one of the most productive months of the entire project, with every week rolling out new features and value propositions like privacy, modularity, stochasticity, and transparency, effectively trying to put ourselves in the minds of our users and what they might find of value.
In November we took a tangent into the realm of quantum technologies, with an essay inspired by attendance at the Quantum Technologies in Machine Learning conference. This work was partly the inspiration for a more focused benchmarking regime targeting the practice of noise injections. At the time these were experiments of an exploratory nature, although they went on to serve as fuel for one of the formal write-ups of December.
The two formal academic write-ups in December were the culmination of a month’s worth of full-time focus. Stochastic Perturbations of Tabular Features for Non-Deterministic Inference introduced the practice of lifting determinism from inference in supervised learning. We suspected that such practices could potentially be of material benefit in mission-critical applications, including those involving healthcare measures, and although we don’t consider the conjecture fully validated, we have yet to see evidence against the premise. In parallel to this work was a completely different line of inquiry associated with aspects of deep learning theory, in which we tried to demonstrate that the previously unexplained benefit of scaling large language models might originate from a very simple geometric property that had previously been established for hyperspheres, which we refer to as the geometric regularization conjecture. We suspect the implications could potentially be of import in more than just deep learning, extending into domains like the foundations of mathematical theory. This may sound like a somewhat grandiose declaration. We use such language intentionally, although it is admittedly based on a narrow scope of research into prior work in the field of mathematics.
We continued to build on the practices of non-deterministic inference in January. The published 30-page appendix could practically serve as a fully realized alternate formal documentation for the library, with a focus on noise injections. We tried to solicit funding on the model of a pitch deck. Unfortunately this author was not successful on that front, we suspect partly owing to inexperience and difficulty demonstrating commercial traction. Fortunately, we always had our musical recordings to fall back on if the startup didn’t work out.
February began a trend of drift in themes that progressed in later months. We tried to build on the geometric regularization conjecture by exploring descriptions of high-dimensional geometry in mainstream science journalism accounts. We did roll out an important feature update to benefit troubleshooting, and suspect some other Python libraries may have something to learn from our demonstration of recovering message logs even in halt scenarios. We started to ponder whether we should consider a halt scenario of our own in the context of various medical difficulties noted in the essay Entrepreneur Introspection.
We toyed with the idea of driving an electric toy in the amusing Driving the Grid. So far resource constraints have kept these forecasts from coming to fruition. The paper Feature Encodings for Gradient Boosting with Automunge was yet another formal academic write-up, this time considering preprocessing conventions available in the library and their associated benefit to model performance. You know, what practitioners actually care about.
It was in April that we started to step back from a breakneck pace of software updates. Without any traction to show for the years of work it was hard to justify maintaining course. The essay Covid Considerations from Santa Fe was basically a book review of a series of talks by researchers from the Santa Fe Institute. Even though it was published after the state-of-emergency mindset from earlier in the pandemic had faded, we expect there could still be lessons drawn from the minds of the institute, who in our experience tend to approach problems from new directions while breaking down boundaries between domains of knowledge.
The focus of this collection really started to transition to lessons from artificial intelligence research conferences in May. ICLR was another top-tier conference that we attended virtually. We summarized our takeaways from an important workshop on automated software development in the essay Deep Coding with Deep Learning, and then surveyed several takeaways from our review of papers and posters in the essay Abstract Abstractions. In a similar vein, we tried to aggregate the discussions from the inaugural meeting of the US National Artificial Intelligence Advisory Committee into a cohesive dialogue. This month also included some diversions like a short collection of aphorisms. Perhaps more importantly, we drafted what we hope might become an influential essay on Covid mitigation measures centered on air ventilation for residential or commercial facilities, Air Changes Per Hour.
The Computer Vision and Pattern Recognition conference was our first face-to-face gathering in several years (dating back to NeurIPS 2019). We tried to devote a commensurate amount of contribution for what, practically speaking, was an expensive outlay for our budget. We started with a survey of vendor expo attendees, then progressed to some workshop “meeting minutes” associated with Denoising Diffusion-Based Generative Modeling (our sincere thanks to presenters for permission to include excerpts from their slides) and an additional workshop on the theme of self-driving cars. Within that context we also got a little creative with a new recipe. We then found an opportunity to further build on the geometric regularization conjecture by way of the essay Dimensional Framings of Overparameterization.
We closed our CVPR coverage in July with a series of “unofficial” poster awards based on our surveys at the conference. We may have different tastes and judging criteria than what went into the best paper awards, and we hope that this type of collection may bring some exposure to researchers who might otherwise not have received as much visibility at the show.
August just felt like a good time to wrap up the year’s collection. The ICML conference served as some inspiration, including a review paper by way of unofficial meeting minutes of the Conformal Predictions workshop, as well as some extensions to the geometric regularization conjecture related to data types, inspired by one of their invited talks. We released one of our last internal validation resources for the Automunge library as a contribution to any future user base. We were overdue for a collection of terminology definitions from these essays and so assembled a glossary, although, fair warning, it is not alphabetized. Finally, there could be no better way to close Book 6 than a new set of aphorisms.
We thank all of the virtual research conferences that have tolerated this author and his essays over the last few years. We’re not quite sure how to proceed going forward. At some point some form of commercialization will be needed. Right now we are bereft of funding and continue to bootstrap this project with the hope that someday these essays may be published, the book shop may sell a few recommendations, or some other channel may be found. If you would like to show support, a simple measure could be to just share one of the essays with a colleague or friend. Thank you for your attention, and sorry if these writings can be a little overwhelming at times. We recommend taking the essays in small doses over prolonged periods. Cheers.
Book 6: Endurance (September 2021 to August 2022)