In case you missed it, we just finalized and publicized our evaluation and certification of Clarum (YC W24).
Clarum is a platform that uses AI to enhance and expedite the due diligence process for private equity firms and other principal investors.
It fits squarely in the investment research space, where LLMs, with their ability to digest and draw insights from vast quantities of information, seem readily applicable.
Many generative AI products are coming to market for specific verticals with promises to appreciably improve productivity, deepen analysis, and handle tasks that humans are either limited in or would be better off delegating.
Few of these products give their prospective customers any credible backing for these promises beyond testimonials and self-reporting.
Moreover, many see large language models as risky given their potential to hallucinate. And who wants a product that can’t reliably deliver quality (or even accurate) responses?
For these reasons, Clarum decided to participate in our quality evaluation and certification.
When we evaluate a generative AI product, we do two things: we have people use it for tasks that realistically capture how the product would be used, and we repeat that many times to capture the full range of the product’s operation, edge cases and all.
For Clarum, we assembled a pipeline of human evaluators with domain expertise in private equity, banking, and transaction services. We don’t want just anyone evaluating a product; we want people who could conceivably become the end user. This lens provides the richest insight into the domain-specific merits of the product.
After we assembled our pipeline, we designed a case study that mirrors work someone would realistically use Clarum for. We designed a bank of questions that map to different categories of investment research and due diligence, such as accounting fluency and competitive intuition. We then devised corresponding indicators.
Accuracy, coherence, and similar measures are powerful indicators of how a generative AI product functions, but they tell us nothing about how well it performs (or doesn’t) in its intended use case. Our indicators were tailored to private equity, and specifically to the due diligence process.
Rigorous human evaluation gave us quite a few conclusions about Clarum. First, it’s an overall state-of-the-art product for investment research. We don’t use that term lightly; it’s one we arrive at only through this kind of rigorous evaluation and numerical quantification. Clarum can reliably provide human-substitution-level performance on a wide array of representative due diligence tasks.
At Mutable, we categorize products that meet this specification according to our levels, with Level 5 being complete human replacement (think AGI) and Level 1 being out-of-the-box GPT, hallucinations, inaccuracies, and all. Clarum fell into Level 3/5, which captures the range of products that are professional grade and provide major value-add and time savings.
We also noticed that Clarum spikes in two areas: measuring, qualifying, and synthesizing risks, and accounting/financial fluency. Whether it’s anticipating regulatory disruptions and how they impact the consolidated financial statements or simply using sensitivities to evaluate different cases, Clarum excels in both.
After reviewing the results, we gave Clarum our Mutable certification. Much like SOC 2 Type II is a functional non-negotiable for the security of enterprise software, we think the Mutable certification can and should offer the same credible guarantee for the quality of a suite of applications.
If you’re in the market for generative AI products, be on the lookout for that Mutable “stamp” and report.
If you’re building in the generative AI space, you probably wouldn’t neglect a SOC 2 certification. Unless you want to be at a disadvantage to competitors or lose out on winnable customers because your quality is in doubt, you shouldn’t neglect a Mutable evaluation and certification either.
Book time with us today and we’ll show you how we plan to evaluate and certify your product, and what measurable impact it can have on the number of enterprise deals you win.