Evaluation, test coverage, and fine-tuning

In Part 1, we discussed open-source models, proprietary models, and model evaluation. This part focuses on further evaluation, “test coverage”, and fine-tuning.

LLM Metrics

It’s challenging to determine which metrics to measure for generative models, and there’s a focus on task-specific performance for traditional machine learning compared…