My Time with the Awesome Version 1 AI Labs Team

Alana Barrett-Frew · Version 1 · Aug 5, 2024

As an Associate Consultant on the Early Careers Pathway with Version 1, I am constantly given incredible opportunities to learn and grow. As they say, the best way to learn is to jump in at the deep end and either sink or swim. I love my workplace's ethos: certification is valuable, but there is truly no better way to learn than through hands-on experience. None of my experiences so far has been more impactful than my time with the AI Labs team.

As the innovators within the company, AI Labs are at the forefront of researching and trialling new advancements in AI. During my time with them, I had the chance to work on both the frontend and backend development of a virtual assistant app, a four-week project split into two two-week sprints to develop a proof of value (POV). This journey not only strengthened my front-end development skills but also pushed me out of my comfort zone into the realms of data science and AI. I delved into Python coding, used Azure Services and AI Services, explored open-source AI models, learned cognitive search approaches, worked with text embeddings, deployed models, automated processes, secured sensitive information in a key vault, evaluated models, and contributed to creating a product from start to finish.

Part of this journey included evaluating and comparing models and outputs using cosine similarity scores, testing, and making iterative improvements. The amount I have learnt in four weeks is staggering, and I am incredibly grateful to the team for their support and patience as I navigated a multitude of new concepts.

Here are some of my contributions:

Cognitive Search and Pre-processing

One of the key areas I worked on was improving the pre-processing of data for cognitive search. This involved:

  • OCR Improvements: Enhancing the optical character recognition (OCR) step to keep English-only content and discard any empty chunks.
  • Sentence Splitting: Implementing better sentence splitting techniques, including fixed-size chunks (e.g., 200 words) and variable-sized chunks based on punctuation and line endings.
  • Content Overlap: Introducing a 10% content overlap (e.g., a 256-token chunk size with a 25-token overlap) so context is retained across chunks (see the sketch after this list).
  • Text Split Skill: Developing skills to split text into pages (chunks of multiple sentences) or single sentences.
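
To make the chunk-overlap idea above concrete, here is a minimal sketch of a fixed-size splitter with roughly 10% overlap. It is illustrative only: the whitespace word split and the input file name are assumptions, and the actual pipeline used the cognitive search text split skill rather than hand-rolled code.

```python
def chunk_text(text: str, chunk_size: int = 256, overlap: int = 25) -> list[str]:
    """Split text into fixed-size chunks with a small overlap so that
    context carries over from one chunk to the next."""
    words = text.split()  # naive whitespace split, standing in for real tokenisation
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:  # skip empty chunks, mirroring the OCR filtering above
            chunks.append(" ".join(chunk))
    return chunks

# Example: ~256-word chunks, each sharing 25 words with its neighbour
with open("document.txt", encoding="utf-8") as f:  # hypothetical input file
    pages = chunk_text(f.read())
```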

Initial Testing of Backend Responses in Postman

Before diving into frontend development, I performed initial testing of backend responses:

  • Q&A Spreadsheet: Creating a spreadsheet of questions and ground-truth answers, with Postman queries ready to run and columns for recording results (a simplified version of this flow is sketched after this list).
  • Setup: Cloning notebooks, creating compute resources, and attaching clusters in Databricks.
  • Testing: Running serving endpoints, ensuring the Chroma DB container was running, and using Postman to test the question-and-answer functionality.
  • Recording Results: Logging the test results in a spreadsheet for further analysis.
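
For a flavour of what those tests looked like outside of Postman, here is a simplified Python sketch using the requests library. The endpoint URL, headers, payload shape, and response field are all placeholders assumed for illustration; the real queries ran against our Databricks serving endpoint, and the results went into the spreadsheet mentioned above.

```python
import csv
import requests

# Placeholder endpoint and token, for illustration only
ENDPOINT = "https://<workspace>.azuredatabricks.net/serving-endpoints/<name>/invocations"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

questions = [
    "What services does the company provide?",  # example questions only
    "How do I raise a support ticket?",
]

with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "answer"])
    for q in questions:
        resp = requests.post(ENDPOINT, headers=HEADERS, json={"question": q}, timeout=60)
        resp.raise_for_status()
        answer = resp.json().get("answer", "")  # assumed response field
        writer.writerow([q, answer])
```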

Frontend Development

Although I was more familiar with frontend development, I still faced a significant learning curve:

  • Technologies Used: React, TypeScript, and Tailwind CSS.
  • Learning Curve: Transitioning from JavaScript to TypeScript, learning Tailwind CSS, and understanding interfaces and context in React.
  • Components Developed: I created two components for the virtual assistant and integrated them with the backend API.

Evaluating Models

Following the initial deployment of the POV, we needed to evaluate the best approach moving forward. This meant assessing models to see whether any performed better than others, whether we could run the solution more cost-effectively, and what other performance tweaks we could make.

As a team, we evaluated a standard model, a quantised model, and a hybrid model alongside other new models that are constantly appearing on Hugging Face. I was responsible for evaluating the quantised model. Here’s a step-by-step overview of our evaluation process:

  • Creating the Evaluation Script: We developed an evaluation.py script to assess our models against a JSON file containing 2000 rows of ground truth answers. Initially, we tested with 20 rows to ensure accuracy, then scaled up to the full dataset. The results were saved to another JSON file.
  • Cost Reduction Strategy: To minimise costs, we ran the serving endpoints in Databricks as a job, the Chroma container in the Azure Portal, and the app locally in VS Code. Once the evaluation was completed and data saved, we shut down all cloud resources.
  • Local Data Analysis: The saved JSON file was analysed on local machines using Jupyter Notebooks. This allowed us to extract the quantised responses for Azure and Chroma, which we compared to the ground truth answers.
  • Embedding Extraction: We extracted embeddings for the answers using our evaluation script.
  • Cosine Similarity Calculation: We ran a cosine similarity function to compare the ground truth answers with the Azure and Chroma quantised, standard, and hybrid answers, collecting similarity scores.
  • Comparison of Models: Finally, we averaged the cosine similarity scores to compare the performance of the standard, quantised, and hybrid models (a minimal version of this calculation is sketched after this list).
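
Here is a minimal sketch of that similarity step, assuming a sentence-transformers embedding model, scikit-learn's cosine_similarity, and made-up field names in the results file; our actual evaluation.py used the project's own embedding pipeline over the full 2000-row dataset.

```python
import json
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical file and field names, for illustration only
with open("evaluation_results.json", encoding="utf-8") as f:
    rows = json.load(f)  # e.g. [{"ground_truth": "...", "model_answer": "..."}, ...]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

truth_embeddings = model.encode([row["ground_truth"] for row in rows])
answer_embeddings = model.encode([row["model_answer"] for row in rows])

# Cosine similarity between each ground-truth answer and the model's answer,
# then averaged to give a single score per model for comparison
scores = np.diag(cosine_similarity(truth_embeddings, answer_embeddings))
print(f"Mean cosine similarity: {scores.mean():.3f}")
```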

Using Key Vault for Secure Data Management

A crucial part of the development process involved handling sensitive information securely. I moved prompts that had previously been hard-coded in the application into the Key Vault and referenced them through variables instead. I also managed changes to other Key Vault values as and when they needed to be updated or added. This approach:

  • Enhanced Security: Ensured that sensitive data like API keys and passwords were stored securely and accessed only when needed.
  • Simplified Code Maintenance: Made it easier to manage and update sensitive information without altering the codebase.
  • Improved Collaboration: Allowed team members to work with the same set of secure data without exposing it directly in the code.
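
For context, reading a secret (such as a prompt) out of Azure Key Vault in Python looks roughly like this; the vault URL and secret name are placeholders, and the real app accessed its values through its own configuration layer.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL and secret name, for illustration only
VAULT_URL = "https://<your-key-vault>.vault.azure.net"

client = SecretClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())
system_prompt = client.get_secret("virtual-assistant-system-prompt").value

# The prompt is now a variable pulled from Key Vault rather than a hard-coded string
print(system_prompt[:80])
```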

System Prompt Improvements

My task was to enhance the system prompt, ensuring it was concise and specific enough to yield better responses. This involved setting up a detailed spreadsheet where I meticulously created various prompts to experiment with, then hard-coded these prompts for testing purposes. Running the small language model (SLM) in Databricks, running the app locally in VS Code, and managing the Chroma container in the cloud were all steps in this process. Once everything was up and running, I collated and recorded the results in the spreadsheet, which provided insights for further improvements.

One of the most effective prompts we tested was:

“You are an intelligent assistant specialising in [your specific domain]. Provide accurate, concise answers based on the context given. If unsure, respond with ‘Sorry, I don’t know.’ Ensure clarity and avoid repetition.”

This prompt consistently delivered clear, well-informed, and reliable answers, making it a valuable tool in enhancing the performance of our virtual assistant.
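
To show how a candidate prompt like this might be slotted into a test run, here is a small sketch. The chat-style messages structure and the helper function are assumptions for illustration; the real testing went through the deployed app, with the prompts tracked in the spreadsheet.

```python
# Hypothetical helper showing how a candidate system prompt could be exercised in a test.
def build_messages(system_prompt: str, context: str, question: str) -> list[dict]:
    """Assemble a chat-style payload from a system prompt, retrieved context and a question."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

candidate_prompt = (
    "You are an intelligent assistant specialising in [your specific domain]. "
    "Provide accurate, concise answers based on the context given. "
    "If unsure, respond with 'Sorry, I don't know.' "
    "Ensure clarity and avoid repetition."
)

messages = build_messages(
    candidate_prompt,
    context="<retrieved chunks would go here>",
    question="<test question from the spreadsheet>",
)
```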

Final Improvements and Minor Adjustments

As we neared the end of the project, our focus shifted to making smaller tweaks to refine and improve the virtual assistant, as well as addressing any remaining bug fixes. We used the collected data to select the best model and implement all final adjustments and tweaks to enhance its performance. After these improvements, we conducted final tests, repeating the evaluation with the indexing and prompt enhancements.

This comprehensive test included evaluating 2000 rows of ground truth data and manual testing using 13 single-turn questions and 10 multi-turn questions. Following the final changes, I tested the deployed app, container, and serving endpoint to make sure everything was ready for the client demo.

The last tasks were to create the documentation, README files, and compile the data, research, trials, and final product into a detailed report and presentation.

Customer Meetings

Throughout the project, I also had the opportunity to observe and learn from customer meetings:

  • End of Sprint 1: I attended the meeting, taking note of customer feedback and the next steps outlined by the team.
  • End of Sprint 2: I participated in the final meeting where we presented the Virtual Assistant to the customer. This meeting marked the culmination of our four-week effort to produce a proof of value for them to take away and evaluate.

Personal Reflections and Key Takeaways

  • Adaptability is Key: Being pushed out of my comfort zone into the world of AI required a willingness to embrace new challenges and adapt quickly with an open mindset.
  • Collaboration Enhances Learning: Working with a supportive team accelerated the learning process and made complex concepts more approachable.
  • Hands-On Experience is Invaluable: The practical experience I gained in developing, evaluating, and improving models provided me with a deeper understanding of AI that cannot be achieved through certification alone.

Check out more from Version 1’s Data x AI Team

Conclusion

This project has been my first significant endeavour as an Associate AI Engineer, one in which I have made positive contributions and experienced substantial personal and professional growth. It has been a huge learning experience, marked by valuable mistakes that created opportunities for improvement. I now have a solid grounding in seeing a proof-of-value project through from start to finish and everything that happens in between. I have focused on refining my code to make it more efficient, exploring new topics to practise, and absorbing new content to enhance my skills. Collaborating within a team has exposed me to diverse working methods and problem-solving approaches. Additionally, working with different AI models, including Mistral and Llama in their standard, quantised, instruct, and hybrid forms, has expanded my technical knowledge and given me an even greater desire to learn. This project has not only honed my collaborative abilities but also prepared me to tackle future challenges in the field with greater confidence and competence.

My time with the Version 1 AI Labs team has been incredibly enriching. The experience not only expanded my technical skill set but also gave me a deeper appreciation for the vast potential of AI. I am excited to apply these new skills in future projects and to continue my journey in the exciting field of artificial intelligence.

At Version 1, we are at the forefront of creating these solutions, and this is what excites me the most about the future of technology.

Ready to change the way IT works for your business? Want to find out more about how Version 1 can help?

Get in touch, we would love to help you.

About the Author:

Alana Barrett-Frew is an Associate Consultant at Version 1. Follow her as she navigates and blogs her journey into and through tech.
