Creating Applied AI Experiments — May 2024

Rob Tyrie
Published in Grey Swan Guild
7 min read · May 23, 2024

Created By Rob Tyrie (augmented)

Think in Systems 📸: Rob Tyrie (MJ)

Thinking is hard

It’s true. Sometimes reacting is a lot easier, but thinking literally takes energy... well beyond the 28 watts that we regularly run at.

All this year I have been using and experimenting with artificial intelligence, and the experience is prompting me to encourage all the people around me, especially knowledge-working professionals, to do their own experiments.

Sometimes that experiment could be just hacking around and following other people’s instructions. That’s a kind of experiment. But what I mean is something more rigorous, and people should really consider the differences between hacking around, proofs of concept (POCs), and more scientific experiments.

So, first of all, some definitions.

Experiment: a series of steps, organized so that humans and computers use tools to prove or disprove a hypothesis, which is something a human or a computer can propose. Experiments are based on measurement and observation, in the natural world or in simulations. All simulations have error, and all observations have errors, like bias and noise. Those factors should be accounted for in the experiment and the experimental approach. Tests are subcomponents of experiments. Mathematics is a very good tool for tests, simulations and experiments.

Mathematics: a language and philosophy that has been used to explain and codify the natural world in a clear and rigorous fashion. There are many branches of mathematics, including geometry, algebra, calculus, set theory, probability theory, graph theory, complexity theory and others.

Applied AI: as with physics and engineering, artificial intelligence has a range of considerations as it is developed: from the highest-order theories in mathematics and software, down to the engineering of things like large language models and generative pre-trained transformers, and finally applied AI, which is the use of existing tools like large language models and generative pre-trained transformers, along with data and pipeline integration techniques, to build systems that humans or other computers use.

In this article, we’re going to focus on Applied AI experimentation.

I’ve been trying to explain this to people, and I seem to be having a hard time, so I want to see if this will work. Often I train people using examples that are fairly general... so the example is important here, but the structure of the experiment is more important. I took the time to build a prompt structure with some references to experimental methods, and then I shifted the content to experimentation with an idea I have around creating better systems with artificial intelligence, a technique I’m calling Meta-Rag. It seems to be evolving as a method as people consider details and how to connect databases and knowledge graphs to large language models. There is obviously a connection there. I’ll leave that for later articles and more research for you.

Here’s an example of an experiment that could be undertaken by any small lab in North America right now, at low cost and, I think, with straightforward effort and some interesting results. If you’re doing an experiment in applied AI, you’re probably working with computer scientists and subject matter experts. It is my expectation that those people should be able to develop an experimental design in clear language and one-page drawings.

It is my expectation that those people should be able to sketch out an experiment in clear language like this, so that the experiment can be completed and deliver results back to your organization in a professional fashion. The instructions and design should be so clear that the experiment can also be replicated, either with other data or with data that you openly provide so other experimenters can repeat your work. Experiments that are not repeatable are closer to fiction and are best described as failed experiments that need design improvement.

All right, let’s go. I combined science documents, experiment examples and OpenAI ChatGPT, as well as my background and skills, to design the experiment. I did this in about an hour, and I’m still verifying that it is the correct experiment, but it is a pretty close thing. This experiment is designed to run in less than five sprints. Each sprint is two weeks long. The greatest expense of this experiment is the experts. The human cost for an experiment like this would probably be between $50k and $80k a sprint.

My Request

Please review the approach and please comment if I missed anything or if it can be improved in structure. Please comment on the content separately because there are some interesting things there too, but they’re very specific about AI Foundation models and computer science technology and its applications. On to the experiment:

It's just a lab 📸: Rob Tyrie (MJ)

### Experimental Design for Meta-Rag, Theory of Mind, and Error Reduction in GPTs

#### Objective
To explore the integration of Meta-Rag, theory of mind, and error reduction related to common sense in GPT models.

#### Hypotheses
1. Meta-Rag (a knowledge graph semantically and philosophically tuned) will improve the coherence and ethical alignment of GPT outputs.
2. Incorporating theory of mind concepts will enhance GPT's ability to generate contextually appropriate and empathetic responses.
3. Reweighting inputs to reduce the influence of high-frequency, low-quality data will decrease errors and improve common sense reasoning.
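Hypothesis 3 can be made concrete with a small sketch. The source names and quality scores below are hypothetical stand-ins, not part of the experimental design; the point is only the reweighting mechanic: each sample’s weight is its source’s quality score divided by the source’s frequency, so a high-volume noisy source contributes less per example.

```python
# Minimal sketch of hypothesis 3: down-weight high-frequency,
# low-quality sources. Source names and scores are illustrative only.
from collections import Counter

def compute_sample_weights(samples, quality):
    """Weight each (source, text) sample by source quality
    divided by how many samples that source contributes."""
    counts = Counter(src for src, _ in samples)
    return [quality[src] / counts[src] for src, _ in samples]

samples = [
    ("ethical_treatise", "Act only according to that maxim..."),
    ("social_media", "hot take #1"),
    ("social_media", "hot take #2"),
    ("social_media", "hot take #3"),
]
quality = {"ethical_treatise": 1.0, "social_media": 0.3}
weights = compute_sample_weights(samples, quality)
```

The single treatise keeps its full weight, while each of the three social-media samples gets 0.3 / 3, so the noisy source’s total influence stays below the curated one’s.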

#### Experimental Setup

1. **Data Collection**
- **Curate Meta-Rag Data**: Gather high-quality, philosophically aligned texts (classics, ethical treatises, seminal works in philosophy).
- **Common Sense Data**: Utilize datasets like the Common Sense Knowledge Base (ConceptNet) and curated real-world scenarios.

2. **Model Training**
- **Baseline Model**: Train a standard GPT model with existing diverse datasets (Wikipedia, books, Reddit, Twitter).
- **Meta-Rag Model**: Train a GPT model with the addition of Meta-Rag data and adjusted weights.
- **Theory of Mind Integration**: Train a GPT model incorporating datasets specifically annotated for theory of mind attributes (empathy, perspective-taking).

3. **Evaluation Metrics**
- **Ethical Alignment**: Evaluate responses based on adherence to ethical guidelines and philosophical principles.
- **Contextual Appropriateness**: Measure the relevance and appropriateness of responses in various contexts.
- **Common Sense Reasoning**: Assess the model's ability to generate plausible, logical responses in everyday scenarios.

4. **Testing and Validation**
- **Human Evaluation**: Conduct blind tests where human evaluators rate the quality, ethical alignment, and common sense of the model outputs.
- **Automated Evaluation**: Use standard NLP metrics (BLEU, ROUGE) and specialized common sense reasoning benchmarks.

5. **Error Analysis**
- **Qualitative Analysis**: Review and categorize errors to identify common failure points (e.g., logical inconsistencies, ethical lapses).
- **Quantitative Analysis**: Track error rates and improvements across different models and datasets.

6. **Iterative Refinement**
- **Feedback Loops**: Incorporate human feedback to iteratively refine models and improve alignment with philosophical ideals and common sense.
- **Continuous Learning**: Regularly update models with new data and insights from ongoing experiments.
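To make the automated-evaluation step concrete: BLEU and ROUGE are n-gram overlap metrics, normally computed with libraries such as nltk or sacrebleu. As an illustration of the underlying idea only, here is a toy clipped unigram precision, the basic building block of BLEU:

```python
# Toy illustration of the idea behind BLEU: clipped unigram
# precision. Real evaluations should use nltk or sacrebleu.
from collections import Counter

def unigram_precision(hypothesis: str, reference: str) -> float:
    """Fraction of hypothesis words that appear in the reference,
    clipping each word's credit at its count in the reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    ref_counts = Counter(ref)
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(hyp).items())
    return clipped / len(hyp)

score = unigram_precision("the cat sat on the mat",
                          "the cat is on the mat")
```

Real BLEU also combines higher-order n-grams and a brevity penalty; the specialized common sense benchmarks mentioned above need task-specific scorers rather than overlap metrics.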

### Implementation Steps

1. **Develop Meta-Rag**: Construct the initial knowledge graph with philosophical texts and ethical guidelines.
2. **Curate Datasets**: Assemble datasets annotated for theory of mind and common sense reasoning.
3. **Train Models**: Utilize existing frameworks (e.g., Hugging Face Transformers) to train baseline, Meta-Rag, and theory of mind-enhanced models.
4. **Conduct Evaluations**: Deploy both human and automated evaluation methods to assess model performance.
5. **Analyze Results**: Perform detailed error analysis and use findings to refine training methodologies.
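The Meta-Rag retrieval step itself can be sketched in a few lines. The graph below is a hand-built toy with hypothetical concept and principle names; the real version would be curated from the philosophical sources listed later. The mechanic is the point: retrieve principles connected to concepts mentioned in the user’s question and prepend them to the prompt before it reaches the model.

```python
# Hedged sketch of the Meta-Rag retrieval step: a tiny adjacency-map
# knowledge graph, plus a function that injects retrieved principles
# into the prompt. All node and edge names are illustrative.
GRAPH = {
    "honesty": ["categorical_imperative", "virtue_ethics"],
    "harm": ["utilitarianism", "non_maleficence"],
    "fairness": ["veil_of_ignorance", "original_position"],
}

def retrieve_principles(question: str, graph: dict) -> list:
    """Return principles linked to every concept named in the question."""
    found = []
    for concept, principles in graph.items():
        if concept in question.lower():
            found.extend(principles)
    return found

def build_prompt(question: str, graph: dict) -> str:
    """Prepend retrieved principles to the question as context."""
    principles = retrieve_principles(question, graph)
    context = "; ".join(principles) if principles else "none found"
    return f"Relevant principles: {context}\nQuestion: {question}"
```

In the full experiment the lookup would be semantic (embeddings or graph traversal) rather than substring matching, and the graph would carry weighted, typed edges rather than plain lists.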

### Expected Outcomes
- Enhanced ethical alignment and philosophical grounding in GPT outputs.
- Improved contextual understanding and empathy in responses.
- Reduction in common sense errors and logical inconsistencies.

This experimental design aims to bridge the gap between abstract philosophical ideals and practical AI implementations, creating more robust, ethical, and intelligent language models.

Possible Reference Metadata Sources

### Sources for the Knowledge Graph Representing Philosophy and Common Sense

#### Philosophy Sources
1. **Classical Texts**:
- Works by Plato (e.g., "The Republic", "Phaedrus")
- Works by Aristotle (e.g., "Nicomachean Ethics", "Metaphysics")
- Online resources like [Perseus Digital Library](http://www.perseus.tufts.edu/hopper/)

2. **Modern Philosophical Texts**:
- Immanuel Kant ("Critique of Pure Reason")
- John Stuart Mill ("Utilitarianism")
- Friedrich Nietzsche ("Thus Spoke Zarathustra")

3. **Ethics and Moral Philosophy**:
- Peter Singer ("Practical Ethics")
- John Rawls ("A Theory of Justice")

4. **Philosophical Encyclopedias**:
- Stanford Encyclopedia of Philosophy
- Internet Encyclopedia of Philosophy

#### Common Sense Sources
1. **ConceptNet**: A large-scale common sense knowledge base that provides structured information about everyday concepts and relationships.
2. **Open Mind Common Sense Project**: A collection of common sense knowledge contributed by the public.
3. **ATOMIC**: An atlas of machine commonsense, containing a large dataset of if-then reasoning for common sense knowledge.
4. **AI2 Commonsense Reasoning Dataset (CSR)**: Datasets focusing on tasks requiring common sense reasoning.

#### Existing Knowledge Graphs and Datasets
1. **ConceptNet**: As mentioned, ConceptNet is one of the most comprehensive knowledge graphs for common sense knowledge, linking concepts through various relationships.
2. **DBpedia**: Structured content from the information in Wikipedia, providing a broad range of knowledge including philosophical concepts.
3. **WordNet**: A lexical database of English, useful for building semantic relationships.

These sources can be integrated into a Meta-Rag knowledge graph, providing a rich base of philosophical principles and common sense reasoning to enhance GPT models.

Rob Tyrie is the CEO of Ironstone Advisory, and the Founder and CTO of GreySwanGuild.org, a virtual think tank. He has a portfolio of companies that he advises and consults for, helping them bring their products to market or plan and de-risk major programs and other investments. He has been a computing professional since 1986, and bought his first PC, an Amiga 500, that year. The main reason for the purchase was that it had a multitasking environment and did graphical representations as well as any video game machine at the time. One of the first games on that PC that he fell in love with was SimCity... and he still plays once in a while.

The Play is the Thing 📸: Rob Tyrie (meta.ai)
