Databricks Generative AI Engineer Associate Certification: Study Guide Part 1
Because Databricks is the best platform to do end-to-end Generative AI!
In a world where AI is rapidly transforming industries, mastering generative AI on Databricks is probably high on your list. This guide is your first step toward acing the Databricks Generative AI Engineer Associate Certification, setting you apart in the ever-evolving tech landscape.
Why should you trust my guidance? Great question! You can verify my various Databricks credentials here (link).
Oh yeah, and for the Databricks ML Certification guide, refer to this Medium blog…
Enough small talk, it’s time to dive straight into the certification.
A more detailed YouTube walkthrough: https://www.youtube.com/watch?v=6ukQKcFOow8
Certification Overview:
Official Definition:
(Focus on Bold and Italic Content): The Databricks Certified Generative AI Engineer Associate certification exam assesses an individual’s ability to design and implement LLM-enabled solutions using Databricks. This includes problem decomposition to break down complex requirements into manageable tasks as well as choosing appropriate models, tools, and approaches from the current generative AI landscape for developing comprehensive solutions. It also assesses Databricks-specific tools such as Vector Search for semantic similarity searches, Model Serving for deploying models and solutions, MLflow for managing solution lifecycle, and Unity Catalog for data governance. Individuals who pass this exam can be expected to build and deploy performant RAG applications and LLM chains that take full advantage of Databricks and its toolset.
These questions are segmented into six Pillars and 45 sub-segments:
- Design Applications — 14%
- Data Preparation — 14%
- Application Development — 30%
- Assembling and Deploying Apps — 22%
- Governance — 8%
- Evaluation and Monitoring — 12%
Source: databricks.com
Mock Exams:
LINK — This is the only one I know of. Thank you, Matt, for sharing it with us.
Let’s now go through each of the 45 learning items, and I’ll share my insights on them.
Section 1: Design Applications
1. Design a prompt that elicits a specifically formatted response
Tips for Effective Prompt Engineering
- Model-Specific Prompts
- Iterative Development
- Avoiding Bias and Hallucinations: Include instructions to avoid generating false information. Example: “Do not make things up if you do not know. Say ‘I do not have that information’.”
- Use Delimiters: Use delimiters to separate instructions from context, such as `###`, triple backticks, `{}`, `[]`, or `---`.
- Structured Output: “Return the movie name mentioned in the form of a JSON object. The output should look like {‘Title’: ‘In and Out’}.”
Examples of Advanced Prompting Techniques
- Zero-shot Prompting — Without examples
- Few-shot Prompting — A fancy term for including a handful of worked examples in the prompt.
- Prompt Chaining
Key Elements of a Good Prompt:
- Clear Instruction: Provide a clear directive specifying what the model should do.
- Contextual Information: Include background or additional information that helps the model understand the task.
- Input / Question: Specify the query or data the model needs to process.
- Output Type / Format: Define the desired structure or style of the response.
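Putting these four elements together, here is a minimal, hypothetical sketch in Python; the instruction, context, question, and output format below are illustrative placeholders, not exam content:

# A hypothetical prompt assembled from the four elements above
instruction = "You are a support assistant. Answer the customer's question."
context = "Context: Our return policy allows returns within 30 days of purchase."
question = "Question: Can I return a laptop I bought three weeks ago?"
output_format = (
    'Output format: Return a JSON object, e.g. {"answer": "...", "confidence": "high"}. '
    "If you do not know, say 'I do not have that information'."
)

prompt = "\n\n".join([instruction, context, question, output_format])
print(prompt)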
2. Select model tasks to accomplish a given business requirement
Key Steps in Selecting Model Tasks
- Identify Business Objectives
- Decompose the Problem: Break down the overall objective into smaller, manageable tasks. Example: For improving customer service, tasks might include sentiment analysis, FAQ retrieval, and automated response generation.
- Task Mapping: Match the business objectives to specific AI model tasks; you could use multiple LLMs for specialized tasks.
- Select Appropriate Models: MPT vs. ChatGPT vs. Llama vs. BERT/RoBERTa vs. other models. Which of these are open source? Do you need a text-generation model or another kind of model?
- Consider the Interaction Between Tasks: Sentiment analysis might be performed first, followed by retrieval of FAQs, and finally, response generation.
- Utilize Tools and Frameworks: Using LangChain for creating multi-stage reasoning chains that handle complex workflows.
- Evaluate, Optimize and Iterate: Continuously evaluate and optimize the models and tasks to ensure they meet the business requirements based on performance, accuracy, and context applicability. Example: Regularly updating the training data and fine-tuning models to improve performance.
3. Select chain components for a desired model input and output
Frameworks and Libraries for Building Chains
- LangChain
- LlamaIndex
- OpenAI Agents
Components:
- Chain — A chain might involve retrieving relevant documents, summarizing them, and then generating a response.
- Prompt
- Retriever
- Tool or ChatGPT Function Calling
- LLM
Integration and Implementation
1. Framework Integration: Use LangChain to build and manage the chain components. Integrate with databases and external APIs for dynamic data retrieval and interaction.
2. Logging and Monitoring: Use MLflow to log the performance of each component in the chain. Monitor the entire workflow to ensure high performance and accuracy.
3. Optimization: Continuously evaluate the chain’s performance and make necessary adjustments. Update models and retrain as needed to maintain accuracy and relevance.
4. Translate business use case goals into a description of the desired inputs and outputs for the AI pipeline
Implementation Considerations
- Data Quality
- Model Selection: Choose models that are well suited for each task, weighing accuracy vs. speed vs. cost vs. open source vs. regulations vs. hosting cost vs. performance, and their relevance to the business goals.
- Integration of all components
- Testing and Optimization
5. Define and order tools that gather knowledge or take actions for multi-stage reasoning
Patterns for Agent Reasoning:
1. ReAct (Reason + Act):
- Thought or Reason: Reflect on the problem given and previous actions taken.
- Act: Choose the correct tool and input format to use.
- Observe (Continues to Reason): Evaluate the result of the action and generate the next thought.
2. Tool Use / Function Calling: Agents interact with external tools and APIs to perform specific tasks.
Example Tools:
- Research / Search Tools: Web browsing, search engines, Wikipedia.
- Document Retrieval: Database retriever, vector DB retriever, document loader.
- Image Processing: Image generation, object detection, image classification.
- Coding: Code execution, documentation generator, debugging/testing.
3. Planning: Agents must dynamically adjust their goals and plans based on changing conditions.
Tasks:
- Single Task: A straightforward task with a single goal.
- Sequential Task: Tasks that need to be performed in a specific order.
- Graph Task: Complex tasks that involve multiple interdependent actions.
4. Multi-Agent Collaboration: Multiple agents work collaboratively, each handling different aspects of a complex task.
- Benefits: Allows modularization and specialization.
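To make the tool-definition step concrete, here is a hedged sketch using LangChain's classic agent API for a ReAct-style agent; the tool functions, tool names, and OpenAI key are hypothetical placeholders, and newer LangChain releases may expose a different interface:

from langchain.agents import AgentType, Tool, initialize_agent
from langchain.llms import OpenAI

# Hypothetical helper functions standing in for real search / retrieval implementations
def search_web(query: str) -> str:
    return f"Top web results for: {query}"

def retrieve_docs(query: str) -> str:
    return f"Relevant internal documents for: {query}"

# The agent's prompt lists the tools in this order; the ReAct loop then decides
# at each step which tool to call based on its reasoning.
tools = [
    Tool(name="WebSearch", func=search_web, description="Search the web for recent facts."),
    Tool(name="DocRetriever", func=retrieve_docs, description="Retrieve internal knowledge-base documents."),
]

llm = OpenAI(openai_api_key="your_openai_api_key")  # placeholder key
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
# agent.run("What is our refund policy, and has it changed recently?")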
Section 2: Data Preparation
6. Apply a chunking strategy for a given document structure and model constraints
Applying a chunking strategy involves dividing documents into manageable pieces that fit within the model’s context window and constraints.
Key Considerations
1. Context Window
2. Chunking Strategy:
- Context-aware Chunking: Divide text by sentences, paragraphs, or sections using special punctuation such as periods or newlines.
- Fixed-size Chunking: Divide text into chunks of a specific number of tokens.
3. Advanced Chunking Strategies: For example, windowed summarization, where each chunk includes a summary of the previous chunks to maintain context across the document.
4. Implementation Steps:
- Data Extraction: Extract raw text from documents, ensuring it is clean and ready for processing.
- Chunking Process: Apply the chosen chunking strategy (context-aware, fixed-size, windowed summarization).
- Embedding and Storage: Embed each chunk using a model and store the embeddings in a vector store for efficient retrieval.
5. Challenges and Solutions:
- Maintaining Context: Ensure that each chunk preserves enough context to be meaningful on its own.
- Handling Different Document Types: Use appropriate tools and methods for different formats (e.g., .doc, .pdf, .dat, .html). Learn basic Python packages like doctr or pypdf.
6. Use Case: Experiment with different chunk sizes and methods to find the best fit for the specific use case.
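As one possible starting point, here is a minimal sketch of fixed-size chunking with overlap using LangChain's text splitter; the chunk size, overlap, and separators are arbitrary values to tune against your embedding model's context window:

from langchain.text_splitter import RecursiveCharacterTextSplitter

raw_text = "..."  # assume this holds the cleaned text extracted from a document

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # target characters per chunk (tune to the model's context window)
    chunk_overlap=100,  # overlap preserves context across chunk boundaries
    separators=["\n\n", "\n", ". ", " "],  # prefer paragraph/sentence boundaries first
)
chunks = splitter.split_text(raw_text)
print(f"Produced {len(chunks)} chunks")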
7. Filter extraneous content in source documents that degrades quality of a RAG application
This includes:
- Cleaning Data: Ensuring the text is free from irrelevant content such as advertisements, navigation bars, and footers.
- Preprocessing Steps: Applying preprocessing techniques like removing stop words, correcting misspellings, and normalizing text to enhance the quality of the data fed into the model
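A hedged sketch of this kind of cleanup, assuming HTML source pages and using BeautifulSoup plus a simple regex; the tags and patterns to strip depend entirely on your actual documents:

import re
from bs4 import BeautifulSoup

def clean_html(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Drop boilerplate elements (ads, navigation, footers) that degrade retrieval quality
    for tag in soup(["nav", "footer", "header", "aside", "script", "style"]):
        tag.decompose()
    text = soup.get_text(separator=" ")
    # Normalize whitespace and strip leftover artifacts
    return re.sub(r"\s+", " ", text).strip()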
8. Choose the appropriate Python package to extract document content from provided source data and format
Read about current packages and use:
PyPDF; Hugging Face's options; Doctr, where OCR (Optical Character Recognition) is required to extract text. Also learn about bigger models: OpenAI's models, Alphabet's Gemini 1.5, Meta's Llama.
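For example, a minimal PyPDF sketch for digital (non-scanned) PDFs; scanned documents would instead need an OCR package such as Doctr, and "sample.pdf" is just a placeholder path:

from pypdf import PdfReader

reader = PdfReader("sample.pdf")  # placeholder path
pages_text = [page.extract_text() or "" for page in reader.pages]
document_text = "\n".join(pages_text)
print(document_text[:500])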
9. Define operations and sequence to write given chunked text into Delta Lake tables in Unity Catalog
Defining operations and sequence involves several steps:
- Data Ingestion: Extract text content from documents and load it into a dataframe.
- Chunking and Embedding: Apply chunking strategies and compute embeddings for each chunk.
- Writing to Delta Lake: Store the chunked text and embeddings in Delta Lake tables. This process ensures the data is easily accessible for retrieval operations in Unity Catalog.
- Governance and Metadata Management: Ensure the tables are registered in Unity Catalog for proper governance and metadata management.
- Continuous Integration and Data Refresh:
- Automate Updates: Set up workflows to continuously update the Delta tables as new data arrives or existing data is modified.
- Delta Live Tables: Use Delta Live Tables to automate and orchestrate these data workflows.
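A minimal PySpark sketch of the write step, assuming chunks_df already holds the chunked text and embeddings, and that the catalog, schema, and table names below are placeholders:

# chunks_df is assumed to contain columns such as doc_id, chunk_id, chunk_text, embedding
target_table = "main.rag_demo.document_chunks"  # placeholder catalog.schema.table

(
    chunks_df.write
    .format("delta")
    .mode("append")              # or "overwrite" for full refreshes
    .saveAsTable(target_table)   # registers the table in Unity Catalog
)

# Change Data Feed lets a Delta Sync Vector Search index pick up updates incrementally
spark.sql(f"ALTER TABLE {target_table} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")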
10. Identify needed source documents that provide the necessary knowledge and quality for a given RAG application
- Relevance: Selecting documents that are highly relevant to the domain and task at hand.
- Quality Assessment: Evaluating the accuracy, reliability, and completeness of the documents.
- Diversity: Ensuring a diverse set of documents to cover various aspects of the knowledge required.
11. Identify prompt/response pairs that align with a given model task
Identifying suitable prompt/response pairs involves:
- Task Alignment: Ensuring the pairs are relevant to the specific task the model is designed to perform.
- Contextual Relevance: Selecting pairs that provide sufficient context for the model to generate accurate responses.
- Quality Control: Verifying that the prompt/response pairs are free from errors and biases.
- Tagging Examples: For sentiment analysis, tag responses as positive, negative, or neutral; for question-answering, tag responses as factual, opinion-based, or advisory.
12. Use tools and metrics to evaluate retrieval performance
1. Evaluation Metrics:
- Context Precision
- Context Recall
- Faithfulness
- Answer Relevancy
- Answer Correctness
2. Evaluation Tools and Methods:
- MLflow: Facilitates the evaluation of retrievers and LLMs, supporting batch comparisons and scalable experimentation. MLflow can evaluate unstructured outputs automatically and at low cost.
- LLM-as-a-Judge: An approach where an LLM is used to evaluate the performance of another LLM by scoring responses based on predefined criteria. This method can be integrated with MLflow for automated and scalable evaluations.
- Task-specific Metrics: Metrics like BLEU for translation and ROUGE for summarization are used to evaluate LLM performance on specific tasks.
3. Offline vs. Online Evaluation:
- Offline Evaluation: Conducted before deployment using curated benchmark datasets and task-specific metrics to evaluate LLM performance.
- Online Evaluation: Conducted post-deployment, collecting real-time user behavior data to evaluate how well users respond to the LLM system. This approach includes metrics from A/B testing and user feedback.
4. Custom Metrics: Custom metrics can be defined using MLflow’s capabilities.
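As a hedged sketch of what such an evaluation can look like with mlflow.evaluate on a static dataset (the evaluation rows are illustrative, and exact metric names and required extras vary by MLflow version):

import mlflow
import pandas as pd

# Illustrative evaluation set: questions, model predictions, and reference answers
eval_df = pd.DataFrame({
    "inputs": ["What is Unity Catalog?"],
    "predictions": ["Unity Catalog is Databricks' governance layer for data and AI assets."],
    "ground_truth": ["Unity Catalog provides centralized governance for data and AI on Databricks."],
})

results = mlflow.evaluate(
    data=eval_df,
    predictions="predictions",
    targets="ground_truth",
    model_type="question-answering",  # enables QA-oriented default metrics
)
print(results.metrics)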
Section 3: Application Development
13. Create tools needed to extract data for a given data retrieval need
Key Points:
- Data Extraction and Chunking: The process involves splitting documents into manageable chunks, embedding the chunks with a model, and storing them in a vector store.
- Chunking Strategies: Use case-specific strategies include context-aware chunking by sentence or paragraph and fixed-size chunking by token count. Advanced strategies involve windowed summarization and metadata injection.
- Challenges: Addressing complex documents, maintaining logical sections, and dealing with multi-modal data (text mixed with images).
- Tools and Libraries: Tools like PyPDF for low-level extraction, and advanced models from Hugging Face and OpenAI for high-level contextual extraction.
14. Select Langchain/similar tools for use in a Generative AI application.
Key Points:
- LangChain: A tool designed to manage interactions with language models, facilitating complex applications by linking together various components such as prompt templates, memory, and chains.
- Vector Databases: Essential for storing high-dimensional vectors for efficient retrieval. Databricks supports integration with Mosaic AI Vector Search, which uses embedding models and vector search endpoints.
- Integration with Databricks: The Databricks platform provides tools like Delta Live Tables and Unity Catalog for structured and unstructured data management, ensuring seamless integration with AI applications.
15. Identify how prompt formats can change model outputs and results
Key Points:
- Prompt Engineering: The format and structure of prompts significantly impact the quality and accuracy of model outputs. Properly formatted prompts reduce hallucinations and improve response relevance.
- Context Augmentation: Enhancing prompts with additional context from external sources (retrieved via vector databases) helps models generate more accurate and contextually relevant responses.
- Prompt Formatting Impact: Augmenting prompts with supplementary information improves the detail and accuracy of the responses.
16. Qualitatively assess responses to identify common issues such as quality and safety
Key Points:
- Evaluation Metrics: Use metrics like context precision, relevancy, recall, and answer correctness to assess the performance of RAG applications.
- Quality and Safety: Identifying and addressing issues such as hallucinations, bias, and incomplete information is crucial. Techniques include constructing better prompts and using augmented context.
- Continuous Learning and Feedback: Implementing a feedback loop in RAG systems allows for iterative improvements based on user interactions and system performance evaluation.
17. Select chunking strategy based on model & retrieval evaluation
Key Points:
- Chunking Strategies: Choose between context-aware chunking (by sentence, paragraph, section) and fixed-size chunking (by tokens). Learn about the use case of each.
- Impact on Retrieval: The chunking strategy affects the quality of retrieved context and model performance. Smaller chunks are useful for precision, whereas larger chunks capture broader themes.
- Iterative Approach: Experiment with different chunk sizes and strategies. Evaluate based on the specific requirements of the application, considering the maximum context window of the LLM.
18. Augment a prompt with additional context from a user’s input based on key fields, terms, and intents
Key Points:
- Prompt Augmentation: Enhance prompts by injecting relevant context retrieved from external sources like vector databases. This improves the relevance and accuracy of the generated responses.
- Retrieval Augmented Generation (RAG): Use RAG to combine LLMs with external knowledge bases to provide more contextually accurate outputs.
- Techniques: Use specific fields, terms, and user intents to tailor the additional context provided to the model, ensuring that the response is aligned with user needs.
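A minimal sketch of this augmentation step; the retrieved chunks would come from a vector search retriever in a real RAG pipeline, and the wording here is illustrative:

def build_augmented_prompt(question, retrieved_chunks):
    # Inject retrieved context ahead of the user's question
    context_block = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say 'I do not have that information'.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )

prompt = build_augmented_prompt("What is our refund window?", ["Refunds are accepted within 30 days."])
print(prompt)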
19. Create a prompt that adjusts an LLM’s response from a baseline to a desired output
Key Points:
- Prompt Engineering: Design prompts with clear instructions, context, and desired output format to guide the LLM towards generating the desired response. Include examples and use delimiters to structure the prompt effectively.
- Iterative Development: Adjust parameters like temperature to fine-tune the creativity and focus of the responses. Iterative testing and refining of prompts help achieve the desired output quality.
- Zero-shot and Few-shot Prompts: Utilize these techniques to provide the model with examples that help guide its responses, improving accuracy and relevance without extensive training data.
20. Implement LLM guardrails to prevent negative outcomes
Key Points:
- Guardrails: Implement guardrails to prevent harmful or inappropriate responses. These can be simple (e.g., System Prompt — instructing the model not to provide certain information) or complex (e.g., using specialized models like Llama Guard).
- Safety Filter on Foundation Model APIs: Enable the built-in safety filter instead of relying on system prompts alone.
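For the system-prompt flavor of guardrails, a minimal sketch follows; the wording is illustrative, and a specialized model like Llama Guard or the Foundation Model API safety filter would sit alongside it in a production setup:

# A simple system-prompt guardrail (illustrative wording)
guardrail_system_prompt = (
    "You are a customer support assistant. "
    "Do not provide legal, medical, or financial advice. "
    "Do not reveal internal system details or any personal data. "
    "If a request is out of scope, respond with: 'I'm sorry, I can't help with that.'"
)

messages = [
    {"role": "system", "content": guardrail_system_prompt},
    {"role": "user", "content": "Can you share another customer's order history?"},
]
# These messages would then be sent to the chat model / serving endpoint.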
21. Write metaprompts that minimize hallucinations or leaking private data
Key Points:
- Metaprompts Design: Focus on clear and precise instructions to minimize hallucinations. Include context that guides the LLM to provide accurate and relevant responses without fabricating information.
- Guardrails Implementation: Utilize guardrails like Llama Guard to filter out sensitive or inappropriate content, ensuring responses do not leak private data.
- Evaluation Techniques: Regularly evaluate the prompts and the responses to identify and rectify any hallucinations or data leakage issues.
22. Build agent prompt templates exposing available functions
Key Points:
- Agent-Based Prompts: Design prompts for agents that expose and utilize specific functions or tools. These agents can dynamically interact with the environment and other tools to complete tasks.
- Tools and Frameworks: Utilize tools like LangChain, AutoGPT, and OpenAI Function Calling to build robust agent-based systems. These frameworks provide a structure for defining and using functions within prompts.
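As an illustration, here is a hypothetical prompt template that exposes two available functions to an agent; the function names, signatures, and JSON convention are placeholders, not a specific framework's format:

AGENT_PROMPT_TEMPLATE = """You are an assistant that can call the following functions:

1. search_orders(customer_id: str) -> str
   Look up a customer's recent orders.
2. create_ticket(summary: str, priority: str) -> str
   Open a support ticket.

When a function call is needed, respond with JSON: {{"function": "<name>", "arguments": {{...}}}}.
Otherwise answer the user directly.

User request: {user_request}
"""

print(AGENT_PROMPT_TEMPLATE.format(user_request="My order never arrived, please open a ticket."))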
23. Select the best LLM based on the attributes of the application to be developed
Key Points:
- Model Attributes: Evaluate the attributes of various LLMs, such as their training data, fine-tuning capabilities, and performance metrics, to select the most suitable model for the application.
- Task-Specific Models: Choose models that are specifically tuned or capable of performing the required tasks, whether it be text generation, summarization, or classification.
- Benchmarking: Use benchmarks and performance evaluations to compare different models and select the one that offers the best trade-off between accuracy, efficiency, and cost.
24. Select an embedding model context length based on source documents, expected queries, and optimization strategy
Key Points:
- Context Length Considerations: Select embedding models with appropriate context lengths to handle the expected length of source documents and queries. Ensure the chosen model can effectively capture and represent the necessary context and can accommodate the chosen chunk size.
- Optimization Strategies: Balance between shorter and longer context lengths based on the specific needs of the application. Longer context lengths might capture more information but can be computationally expensive.
- Embedding Model Selection: Use models that provide the best embeddings for both queries and documents, ensuring they operate within the context window limits and provide accurate results.
25. Select a model from a model hub or marketplace for a task based on model metadata/model cards
Key Points:
- Model Metadata: Review model cards and metadata on Databricks Marketplace to understand the capabilities, limitations, and intended use cases of different models. This information is crucial for selecting the right model for the specific use case — for example, "Is it open for public use?"
- Transparency and Accountability: Model cards provide transparency about the training data, performance metrics, and ethical considerations, aiding in making informed decisions about model selection.
26. Select the best model for a given task based on common metrics generated in experiments
Key Points:
- Metrics: Utilize common performance metrics, as well as task-specific metrics like BLEU or ROUGE for text tasks, to evaluate models.
- Experimentation and Benchmarking: Conduct experiments to generate performance data, comparing models based on these metrics to identify the best-performing model for the given task.
- Iterative Improvement: Use the experimental results to iteratively improve model selection and fine-tuning, ensuring the chosen model consistently meets or exceeds performance expectations.
Section 4: Assembling and Deploying Applications
System Lifecycle:
- Code reference for AI_QUERY():
CREATE FUNCTION correct_grammar(text STRING)
  RETURNS STRING
  RETURN ai_query(
    'databricks-llama-2-70b-chat',
    CONCAT('Correct this to standard English:\n', text));
GRANT EXECUTE ON correct_grammar TO ds;
- The data scientist (ds) then fixes grammar issues in a batch:
SELECT
  * EXCEPT (text),
  correct_grammar(text) AS text
FROM articles;
- Real Time — Model Endpoints
from mlflow.deployments import get_deploy_client

# serving_endpoint_name, endpoint_config, and question are assumed to be defined earlier
deploy_client = get_deploy_client("databricks")

# Create the model serving endpoint
endpoint = deploy_client.create_endpoint(
    name=serving_endpoint_name,
    config=endpoint_config,
)

# Query the endpoint in real time
response = deploy_client.predict(
    endpoint=serving_endpoint_name,
    inputs={"inputs": [{"query": question}]},
)
print(response.predictions)
27. Code a chain using a pyfunc model with pre- and post-processing
Key Points:
- Pyfunc Model: MLflow’s pyfunc flavor is a versatile model interface for MLflow Python models. It allows models to be loaded as Python functions for deployment.
- Pre- and Post-Processing: These are critical for preparing input data before it is fed into the model (pre-processing) and for handling the model’s output before it is presented to the end-user or downstream applications (post-processing). Techniques can include data normalization, feature extraction, and output formatting.
- Implementation: Utilize mlflow.pyfunc to log, save, and load models with the necessary pre- and post-processing steps, as sketched below. This ensures the model can handle real-world data inputs and outputs effectively.
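Here is a hedged, minimal sketch of such a wrapper; the pre- and post-processing logic and the placeholder inference step are illustrative stand-ins for a real chain:

import mlflow
import mlflow.pyfunc
import pandas as pd

class RAGChainWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # A real model would rebuild or load the chain from logged artifacts here
        self.chain = None  # placeholder

    def _preprocess(self, model_input: pd.DataFrame):
        # Pre-processing: normalize and extract the query column
        return [str(q).strip().lower() for q in model_input["query"]]

    def _postprocess(self, raw_outputs):
        # Post-processing: trim whitespace before returning to the caller
        return [out.strip() for out in raw_outputs]

    def predict(self, context, model_input: pd.DataFrame):
        queries = self._preprocess(model_input)
        # Placeholder inference; a real implementation would call self.chain on each query
        raw_outputs = [f"Answer to: {q}" for q in queries]
        return self._postprocess(raw_outputs)

# Log the wrapped model so it can be registered and served
with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="rag_chain", python_model=RAGChainWrapper())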
28. Control access to resources from model serving endpoints
Key Points:
- Access Control for endpoints
- Databricks Features: Leverage Databricks’ built-in security features such as role-based access control (RBAC), Unity Catalog for data governance, and secure API endpoints to manage access to models and data securely.
- Best Practices: Regularly review and update access permissions, monitor access logs for suspicious activities, and implement least privilege principles to minimize the risk of unauthorized access.
29. Code a simple chain according to requirements
Key Points:
- Define the steps required for your application, configure each step with the necessary parameters and models, and use LangChain to manage the execution flow. Ensure that each step performs the required transformations and passes data correctly to the next step.
- Chains: Chains are sequences of automated steps that process input data and generate output. They can be used to orchestrate complex workflows involving multiple models and data transformations.
- LangChain: A popular framework for building chains in generative AI applications. It allows developers to define and link multiple steps, including data retrieval, transformation, and model inference, in a structured manner.
30. Code a simple chain using LangChain
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# Initialize the OpenAI LLM (language model)
llm = OpenAI(openai_api_key="your_openai_api_key")

# Define the prompt template
prompt_template = PromptTemplate(
    input_variables=["input_text"],
    template="Summarize the following text: {input_text}",
)

# Create the LLMChain
llm_chain = LLMChain(llm=llm, prompt=prompt_template)

# Example input
input_text = (
    "LangChain is a framework for building applications powered by language models. "
    "It allows developers to create advanced chains of prompts, use different language "
    "models, and deploy them efficiently."
)

# Run the chain with the input
output = llm_chain.run(input_text=input_text)

# Display the output
print("Summary:", output)
Key Points:
- LangChain Framework: LangChain simplifies the creation of chains by providing a structured approach to link different components and steps in a generative AI pipeline.
- Components: The main components of a LangChain chain include prompts, models, retrievers, and tools. Each component can be customized and combined to create complex workflows.
- Example Workflow: A typical LangChain workflow might involve retrieving relevant documents using a retriever, processing the text with a language model, and post-processing the output to generate the final response. This structured approach ensures that each step is well-defined and can be easily modified or extended.
31. Choose the basic elements needed to create a RAG application: model flavor, embedding model, retriever, dependencies, input examples, model signature
Key Points:
- Basic Elements: Key components of a Retrieval-Augmented Generation (RAG) application include the embedding model, retriever, and vector store.
- Model Flavor: Choose the appropriate model flavor based on the task. Common flavors include transformer models for text generation and embedding models for semantic search.
- Dependencies and Model Signature: Ensure all dependencies are listed and included in the deployment environment. The model signature defines the input and output formats, ensuring compatibility with downstream applications.
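A short, hypothetical sketch of capturing the dependencies, input example, and model signature at logging time (EchoModel is a trivial stand-in for the real RAG chain):

import mlflow
import mlflow.pyfunc
import pandas as pd
from mlflow.models import infer_signature

class EchoModel(mlflow.pyfunc.PythonModel):
    # Trivial stand-in for the real chain
    def predict(self, context, model_input: pd.DataFrame):
        return ["placeholder answer"] * len(model_input)

# Illustrative input example and inferred signature
input_example = pd.DataFrame({"query": ["What is Databricks Vector Search?"]})
signature = infer_signature(
    model_input=input_example,
    model_output=["Vector Search is a similarity search service."],
)

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="rag_app",
        python_model=EchoModel(),
        signature=signature,                       # declares input/output schema
        input_example=input_example,               # sample payload for serving
        pip_requirements=["mlflow", "langchain"],  # declare dependencies explicitly
    )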
32. Register the model to Unity Catalog using MLflow
Key Points:
- catalog.schema.model_name: Register and load models
import mlflow

# catalog_name, schema_name, model_uri, current_model_version, and prod_data_df are assumed to be defined earlier
# Define the model name in the Unity Catalog Model Registry
model_name = f"{catalog_name}.{schema_name}.summarizer"

# Point MLflow at the Unity Catalog registry and register the logged artifact
mlflow.set_registry_uri("databricks-uc")
mlflow.register_model(
    model_uri=model_uri,
    name=model_name,
)

# Load the latest model version and score a small pandas sample
latest_model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{current_model_version}"
)
prod_data_sample_pdf = prod_data_df.limit(2).toPandas()
summaries_sample = latest_model.predict(prod_data_sample_pdf["document"])

# Parallelize inference for speed using a Spark UDF
prod_model_udf = mlflow.pyfunc.spark_udf(
    spark,
    model_uri=f"models:/{model_name}@champion",
    env_manager="local",
    result_type="string",
)
batch_inference_results_df = prod_data_df.withColumn("generated_summary", prod_model_udf("document"))
On a multi-node cluster, Spark sends the document partitions to the corresponding worker nodes.
# Materialize the batch results as soon as the job is done
batch_inference_results_df.write.mode("append").saveAsTable("TABLE_NAME")
- MLflow Integration: MLflow’s Model Registry, integrated with Unity Catalog, offers a centralized model store for managing the lifecycle of machine learning models. It supports versioning, staging, and deploying models while maintaining full lineage and metadata tracking.
- Model Registration Process: To register a model, log the model using mlflow.log_model() and then register it to Unity Catalog. This process ensures that all versions of the model are tracked and can be managed via the MLflow UI or API.
- Security and Access Control using Unity Catalog
33. Sequence the steps needed to deploy an endpoint for a basic RAG application
Key Points:
- RAG Application Components: A typical RAG application involves setting up a retriever, an embedding model, a vector store, and a generator. Each component plays a crucial role in the end-to-end workflow.
- Learn the overall RAG deployment steps on Databricks, including the retrieval component, embedding model, Vector Search index, foundation model, and creating and deploying a Model Serving endpoint for real-time querying.
34. Create and query a Vector Search index
Key Points:
- Vector Search Setup: Create a vector search index by syncing it with a Delta table that stores embeddings. This index allows for real-time approximate nearest neighbor searches.
- Use the provided REST API or Python SDK to query the vector search index. Queries can be made using vector representations to find similar documents or data points.
- Mosaic AI Vector Search supports automatic syncing, self-managed embeddings, and CRUD operations. It integrates with Unity Catalog for governance and access control.
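A hedged sketch using the databricks-vectorsearch Python SDK; the endpoint, table, index, and embedding-endpoint names are placeholders, so check the current SDK documentation for exact arguments:

from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Create a Delta Sync index on top of a source Delta table (names are placeholders)
index = client.create_delta_sync_index(
    endpoint_name="vs_endpoint",
    index_name="main.rag_demo.document_chunks_index",
    source_table_name="main.rag_demo.document_chunks",
    pipeline_type="TRIGGERED",
    primary_key="chunk_id",
    embedding_source_column="chunk_text",
    embedding_model_endpoint_name="databricks-bge-large-en",
)

# Query the index with natural-language text; Vector Search embeds the query for you
results = index.similarity_search(
    query_text="How do I reset my password?",
    columns=["chunk_id", "chunk_text"],
    num_results=3,
)
print(results)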
35. Identify how to serve an LLM application that leverages Foundation Model APIs
Key Points:
- Foundation Model APIs: Foundation models (and external models such as OpenAI's GPT) are served via Databricks Model Serving. These APIs provide a standardized way to deploy and query large language models with minimal effort from the user.
- Serving Process: Covers model deployment, query handling, integration with MLflow, and resource management. Ensure appropriate compute resources (CPU/GPU) are allocated for serving the models, and use Databricks' auto-scaling features to handle variable loads efficiently.
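For example, a pay-per-token Foundation Model endpoint can be queried through the same MLflow Deployments client shown earlier; the endpoint name reuses the one from the AI_QUERY example above, and the exact payload shape should be verified against the current Foundation Model API docs:

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-llama-2-70b-chat",  # a pay-per-token Foundation Model chat endpoint
    inputs={
        "messages": [{"role": "user", "content": "Summarize what Unity Catalog does in one sentence."}],
        "max_tokens": 100,
    },
)
print(response)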
36. Identify resources needed to serve features for a RAG application
Key Points:
- Compute Resources: Like above. Use Databricks’ scalable compute options to allocate necessary CPU/GPU resources based on the application load and performance requirements.
- Storage and Indexing: Utilize Delta tables for storing raw and processed text, embeddings, and vector indexes. Ensure these are properly managed and synced for efficient retrieval.
- Monitoring and Logging: Implement inference logging to track model performance and diagnose issues.