Updating Salesforce with Zero-Shot Prompting and (Almost) No Configuration

Andrew Nguonly
10 min read · Oct 25, 2023


This article describes the technical details and implementation used to update Salesforce with zero-shot prompting and (almost) no configuration. It details the following key points:

  • A generalized approach for prompting GPT to update Salesforce
  • How to avoid GPT token limits by dynamically computing max_tokens
  • How to avoid GPT token limits by splitting context and merging results

The admin AI Story

Six months ago, my co-founder Ayan and I set out to build admin AI, an LLM-powered co-pilot for AEs and SDRs. Our goal was to eliminate the repetitive and mundane admin work that sales reps had to do on a daily basis. We were driven by the constant feedback we heard from reps about how annoying it was to update Salesforce after every call they had with a customer. The disdain for updating Salesforce was genuine.

We had an ambitious goal with admin AI. We wanted to build a product that not only updated Salesforce automatically but also performed every single task that a rep had to do to close a deal. Writing emails, scheduling meetings, and updating pitch decks — it would all be done automagically.

In the beginning, we were adamant about not having any configuration in the product. It should simply work for any user, any call, and any CRM. This approach was against the grain compared to existing tools that were heavily integrated with Salesforce. We believed a configuration-less experience would stand out against competitors and drive bottom-up adoption. With this principle in mind, we started to build.

This is how we did it…

Prompting GPT for “Next Steps”

admin AI leveraged OpenAI’s GPT-4 as the core model for determining the “Next Steps” (and other fields) after an initial discovery call. The idea was to build a system that could seamlessly update every field in Salesforce. At a high level, the system ingested a call transcript, retrieved the Opportunity fields from Salesforce, and merged the two contexts to form a prompt to pass to the ChatCompletion API.
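To make that flow concrete, here is a minimal sketch of merging the two contexts into a single prompt and calling the ChatCompletion API. This is illustrative only: the prompt wording is an assumption, and transcript ingestion and Salesforce field retrieval are assumed to have happened upstream.

import openai


async def draft_opportunity_update(transcript: str, field_labels: list[str]) -> str:
    """Sketch: merge a call transcript and Opportunity field labels into one GPT-4 prompt."""
    # the prompt wording here is illustrative, not the actual admin AI prompt
    prompt = (
        "Using the call transcript below, suggest updated values for these "
        f"Opportunity fields: {', '.join(field_labels)}\n\n"
        f"Call transcript:\n{transcript}"
    )

    response = await openai.ChatCompletion.acreate(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]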

Over-optimizing Early 😈 😩

Salesforce provides a plethora of APIs for interfacing with its core platform. There’s an API to describe an Opportunity and there are APIs to retrieve page layouts. Separately, there are APIs specifically for retrieving FlexiPages, which define the layouts for Salesforce’s Lightning UI experience. The sheer number of APIs is overwhelming. Despite help from the developer community and assistance from ChatGPT, figuring out how to use them together was still immensely challenging.

ChatGPT (GPT-3.5) was not strong at producing valid or working SOQL.

The first attempt at integrating with Salesforce required retrieving Opportunity fields that were visible to the user in the Salesforce UI. The motivation was to minimize the number of fields to update and therefore reduce the prompt size. This implementation required querying for 1) the user’s Lightning UI setting, 2) their Profile, 3) the FlexiPage associated with the default RecordType of the Opportunity, and 4) the FlexiPages associated with the Opportunity.

-- 1. SOQL for retrieving a User's Lightning UI setting and Profile ID
SELECT UserPreferencesLightningExperiencePreferred, ProfileId
FROM User WHERE Email = '<user@domain.com>'

-- 2. SOQL for retrieving a User's Profile by ID
SELECT FullName FROM Profile WHERE Id = '<profile_id>'

-- 3. SOQL for retrieving the FlexiPage associated with the default
-- RecordType of the Opportunity (via CustomApplication Metadata)
SELECT Metadata FROM CustomApplication WHERE Label = 'Sales'

-- 4. SOQL for retrieving FlexiPages associated with the Opportunity object
SELECT Id, DeveloperName
FROM FlexiPage WHERE EntityDefinitionId = 'Opportunity'

After parsing results from the preceding SOQL queries, the visible Opportunity fields were retrieved from the FlexiPage associated with the default RecordType of the Opportunity. The following Python code is a simplification of the original code. It excludes logic for various edge cases, error handling, and the workflow for retrieving Opportunity fields from the Classic UI experience.

default_record_type = "record_type"  # parsed from Describe Opportunity API
is_lightning_exp = True  # parsed from SOQL 1
profile_id = "profile_id"  # parsed from SOQL 1, used in SOQL 2

fields = []  # visible Opportunity fields

if is_lightning_exp:
    # Lightning UI Experience
    profile_full_name = "profile_full_name"  # parsed from SOQL 2

    # parsed from SOQL 3
    profile_action_overrides = [
        {
            "actionName": "View",
            "content": "Opportunity_Record_Page",
            "pageOrSobjectType": "Opportunity",
            "profile": "System Administrator",
            "recordType": "Opportunity.AdminAI",
            "type": "Flexipage"
        },
    ]

    # parsed from SOQL 4
    flexi_pages = {
        "developer_name_1": "id_1",
        "developer_name_2": "id_2",
    }

    # retrieve FlexiPage associated with default RecordType of Opportunity
    flexi_page_name = ""
    if profile_action_overrides:
        for override in profile_action_overrides:
            if (
                override["recordType"] == f"Opportunity.{default_record_type}"
                and override["profile"] == profile_full_name
            ):
                flexi_page_name = override["content"]
                break

    flexi_page_id = flexi_pages[flexi_page_name]

    # get fields from FlexiPage
    # aget_flexipage() asynchronously calls Salesforce Tooling API
    flexi_page = await aget_flexipage(instance_url, access_token, flexi_page_id)
    flexi_page_regions = flexi_page["Metadata"]["flexiPageRegions"]

    for flexi_page_region in flexi_page_regions:
        for item_instance in flexi_page_region["itemInstances"]:
            if item_instance["fieldInstance"]:
                field_item = item_instance["fieldInstance"]["fieldItem"]
                # field items have "Record." prefixed before the field API name
                field_item_parts = field_item.split(".")
                fields.append(field_item_parts[1])
else:
    # Classic UI Experience
    ...

In retrospect, the implementation was a severe over-optimization. In practice, it was brittle, inconsistent, and failed to cover all possible edge cases that could exist in a Salesforce instance’s setup. On top of that, the Opportunity object would often have hundreds of fields from external third-party apps and integrations. After realizing that both issues, the brittle layout parsing and the sheer number of fields, posed serious risks to the product functioning consistently, we conceded and exposed a setting so that users could specify which fields they wanted to update. For most users, this ended up being only a handful of fields (e.g. Next Steps, Metrics, MEDDPICC fields, etc.).

Screenshot of admin AI’s Settings page

Question: [What|Who] [is|are] the <field_label>?

For each Opportunity field to update, a question was generated. To achieve successful zero-shot prompting, the entire list of questions was included in a single prompt along with the call transcript, with the expectation that the LLM (GPT-4) would answer each question.

GPT-4 was exceptional at answering questions given a call transcript using the “zero-shot prompting” technique. However, GPT-3.5 was nowhere near as performant for this specific use case.
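For a concrete sense of what a single prompt might look like, here is one plausible structure. It is illustrative only; these are not the actual admin AI prompts, and the wording and example questions are assumptions.

transcript = "..."  # one call transcript (or part of one)

questions = [
    "What are the Next Steps?",
    "Who is the Champion?",
    "What is the Close Date?",
]

messages = [
    {
        "role": "system",
        "content": "Answer each question using only the call transcript provided.",
    },
    {
        "role": "user",
        "content": "Call transcript:\n" + transcript + "\n\nQuestions:\n" + "\n".join(questions),
    },
]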

At first, we weren’t sure if using correct grammar mattered when constructing the questions. We decided to use the nltk Python library to determine the part of speech of the field label and construct a grammatically correct question.

import nltk

field_label = "Next Steps"

# tag the last word of the label to determine singular vs. plural
# (requires nltk's "averaged_perceptron_tagger" data to be available)
pos_tags = nltk.pos_tag(field_label.split())
part_of_speech = pos_tags[-1][1]

question = ""

if part_of_speech == "NNS":
    # plural noun
    question = f"What are the {field_label}?"
elif part_of_speech == "NN":
    # singular noun
    question = f"What is the {field_label}?"
elif ...

For common use cases where a field refers to a known sales persona, the format of the question was changed to ask Who instead of What. For example, nltk determines that “Champion” is a noun (NN), but in the context of sales, a champion refers to a key individual on the customer side who is supporting the deal. In this case, “Who is the Champion?” was the desired question, not “What is the Champion?”.

personas = [
    "champion",
    "buying committee",
    "decision maker",
    "economic buyer",
]

if field_label.lower() in personas:
    question = f"Who is the {field_label}?"

For picklist and multi-select picklist fields, the pre-defined options in Salesforce were appended to the question.

field_label = "Cloud Provider"
field_type = "picklist"
options = ["AWS", "GCP", "Azure"]

# construct picklist context
picklist_context = ""

if field_type == "picklist":
    picklist_context = f"Pick 1 option from the following list: {options}"
elif field_type == "multipicklist":
    picklist_context = f"Pick options from the following list: {options}"

# construct question for picklist/multipicklist field
if part_of_speech == "NN":
    # picklist_context is appended to question
    question = f"What is the {field_label}? {picklist_context}"

More context was added to the final prompt to increase the probability that GPT would adhere to the validation rules defined in Salesforce. For example, date fields must be JSON strings in the format “YYYY-MM-DD”. Including this and many other pieces of context was necessary to ensure that the values produced could be used to update an Opportunity successfully; if a value failed a validation rule, the entire update operation would fail. Substantial post-processing logic was added later to fix forward any remaining errors.
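As a rough illustration of that kind of context, a format hint could be appended to the question based on the field type. The exact wording and the set of hints here are assumptions, not the actual admin AI prompts.

# illustrative format hints keyed by Salesforce field type (wording is assumed)
format_hints = {
    "date": 'Answer with a JSON string in the format "YYYY-MM-DD".',
    "datetime": 'Answer with an ISO 8601 JSON string, e.g. "2023-10-25T17:00:00Z".',
    "currency": "Answer with a number only, no currency symbol.",
}

field_type = "date"
question = "What is the Close Date?"

if field_type in format_hints:
    question = f"{question} {format_hints[field_type]}"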

Dynamic Pydantic 👅 🥨

Finally, to guarantee that the final response was returned in the desired JSON format, function calling was implemented. Pydantic classes were generated on the fly based on the fields of an Opportunity and the corresponding JSON schema configuration was supplied to the ChatCompletion API.

from typing import Any

import pydantic
from pydantic import BaseModel


def create_crm_obj_model(model_name: str, attrs: list[tuple[str, type]]) -> BaseModel:
    """Return a Pydantic model based on the provided attributes."""
    attributes = {}

    for name, attr_type in attrs:
        attributes[name] = (attr_type, pydantic.Field())

    return pydantic.create_model(model_name, **attributes)


def get_python_type(sfdc_field_type: str) -> Any:
    """Get Python type for Salesforce field type."""
    sfdc_field_type = sfdc_field_type.lower()

    if sfdc_field_type == "boolean":
        return bool
    elif sfdc_field_type == "int":
        return int
    elif sfdc_field_type == "long":
        return int
    elif sfdc_field_type == "string":
        return str
    elif sfdc_field_type == "textarea":
        return str
    elif ...


def generate_pydantic_opp_model(fields: list[dict[str, Any]]) -> BaseModel:
    """Return Pydantic model for Salesforce Opportunity."""
    return create_crm_obj_model(
        model_name="OpportunityDataResponse",
        attrs=[(field["name"], get_python_type(field["type"])) for field in fields],
    )


fields = [...]
OppDataResponse = generate_pydantic_opp_model(fields)
prompt_functions = [
    {
        "name": "get_opp_data",
        "description": "get_opp_data",
        "parameters": OppDataResponse.schema(),
    }
]
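With the schema in hand, the function definition was supplied to the ChatCompletion API and the model was forced to “call” get_opp_data. The following sketch shows one way to wire this up and validate the result against the generated Pydantic model; prompt_msgs is assumed to hold the transcript-plus-questions messages, and this is not the exact admin AI code.

import openai

# force GPT-4 to respond via the get_opp_data function so the output is structured JSON
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=prompt_msgs,  # transcript + questions (assumed variable)
    functions=prompt_functions,
    function_call={"name": "get_opp_data"},
)

# validate the returned arguments against the dynamically generated model (Pydantic v1 API)
arguments = response["choices"][0]["message"]["function_call"]["arguments"]
opp_data = OppDataResponse.parse_raw(arguments)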

A Scaled Approach

The generalized implementation produced acceptable results for almost all scenarios. Although the questions in the prompt were fairly generic, the prompt itself worked remarkably well. Of course, there were exceptions, but for the majority of use cases, the Opportunity fields were updated with the desired information from the call. It’s important to reemphasize that the implementation would scale to different Opportunity setups without code changes or user configuration. In later iterations, most of the same code would be used to integrate with HubSpot.

Bypassing Token Limits

Early on, we realized that the length of call transcripts (not including the prompt) would typically exceed GPT-4’s token limit (8192 tokens). To get a rough idea of how many tokens were required to process a call, we estimated the number of tokens given the following assumptions:

  1. An average sales call is 1 hour or less.
  2. In English, people speak about 140 words per minute or 8400 words per hour.
  3. One token roughly translates to 3/4 of a word (tokenizer).

A transcript of a 1-hour call therefore translates to about 11,200 tokens (8,400 words ÷ 0.75 words per token), well above GPT-4’s token limit. Given that calls longer than 1 hour are infrequent, we felt confident that simply splitting the transcript into two parts was sufficient for most use cases. Naturally, the number of tokens varies with factors such as talking speed and the number of speakers, but we decided to move forward with the brute-force method, at least to start (no vector database or embeddings involved). Each part of the transcript would be processed in parallel and the results from GPT would be merged afterward.

Counting tokens, computing max_tokens 🧮

Even after splitting the transcript, the token limit was still sometimes breached. To mitigate the error, the total number of tokens in the final prompt and prompt functions was counted ahead of time. Then, the value of max_tokens was computed based on the token count before calling the ChatCompletion API.

from typing import Any

import tiktoken


def count_tokens_msgs(messages: list[dict[str, str]]) -> int:
    """Return number of tokens in a list of messages."""
    encoding = tiktoken.encoding_for_model("gpt-4")
    all_content = ""

    for message in messages:
        all_content += message["content"]

    tokens = encoding.encode(all_content)
    return len(tokens)


def count_tokens_functions(functions: list[dict[str, Any]]) -> int:
    """Return the number of tokens in a list of functions.

    Temporary solution found here:
    https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573
    """
    ...


# count tokens for prompt messages
prompt_msgs = [...]
tkn_cnt_for_msgs = count_tokens_msgs(prompt_msgs)

# count tokens for prompt functions
prompt_functions = [...]
tkn_cnt_for_functions = count_tokens_functions(prompt_functions)

# compute max_tokens
GPT_4_MAX_TOKENS = 8192
max_tokens = GPT_4_MAX_TOKENS - tkn_cnt_for_msgs - tkn_cnt_for_functions

The workaround was super effective. To ensure that max_tokens never became negative, a minimum value was also enforced (not shown in the example above; see the sketch below). It also drove us to consider whether dynamically computing max_tokens was a technique that could be more broadly useful.

As a last resort, the implementation fell back to using OpenAI’s GPT-3.5-turbo-16K model (16,385 tokens). This wasn’t ideal as the performance was significantly worse than GPT-4. Regardless, we decided it was preferable to have some response instead of a failed attempt.
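For completeness, here is a minimal sketch of how the minimum-value check and the model fallback could fit together. The threshold is an assumption; the actual values and logic aren’t shown in this article.

GPT_4_MAX_TOKENS = 8192
GPT_35_16K_MAX_TOKENS = 16385
MIN_COMPLETION_TOKENS = 500  # assumed floor, not the actual admin AI value

model = "gpt-4"
max_tokens = GPT_4_MAX_TOKENS - tkn_cnt_for_msgs - tkn_cnt_for_functions

if max_tokens < MIN_COMPLETION_TOKENS:
    # last resort: fall back to the larger-context model
    model = "gpt-3.5-turbo-16k"
    max_tokens = GPT_35_16K_MAX_TOKENS - tkn_cnt_for_msgs - tkn_cnt_for_functions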

“The Merge” ⛰️

To reiterate, each part of the transcript would be included in a prompt with the list of questions generated from the Opportunity fields and fed to GPT. API requests to OpenAI were executed in parallel (watch out for rate limiting) and the results were merged afterward with a subsequent prompt to the LLM. The merge process was unexpectedly helpful for condensing content and deduplicating information that appeared across both responses.
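A simplified sketch of the split-process-merge flow is shown below. The character-based split, prompt wording, and function names are assumptions; the real implementation also handled rate limits, max_tokens, and function calling as described earlier.

import asyncio

import openai


async def answer_questions(transcript_part: str, questions_block: str) -> str:
    """Answer the generated questions over one part of the transcript."""
    response = await openai.ChatCompletion.acreate(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": f"{questions_block}\n\nCall transcript:\n{transcript_part}",
            }
        ],
    )
    return response["choices"][0]["message"]["content"]


async def process_call(transcript: str, questions_block: str) -> str:
    """Split the transcript in two, process the parts in parallel, then merge."""
    midpoint = len(transcript) // 2
    parts = [transcript[:midpoint], transcript[midpoint:]]
    answers = await asyncio.gather(*(answer_questions(p, questions_block) for p in parts))

    # merge the partial answers with a follow-up prompt to condense and deduplicate
    merge_prompt = (
        "Merge the following two sets of answers into one, removing duplicate "
        "information:\n\n" + "\n\n".join(answers)
    )
    merged = await openai.ChatCompletion.acreate(
        model="gpt-4",
        messages=[{"role": "user", "content": merge_prompt}],
    )
    return merged["choices"][0]["message"]["content"]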

The process was slow but the results were highly accurate. Looking forward, a RAG implementation would surely outperform this approach in terms of latency, but the effort required to produce equivalent or better results is still unknown.

The total processing time for updating Salesforce rarely exceeded 90 seconds. A RAG implementation may reduce the total latency to the order of seconds.

End of the Road

Using admin AI was effortless. In one click, it drafted follow-up emails that were a few edits away from sending and produced detailed call notes that would supplement the content in the CRM. One user praised our emails as “90% ready to go”. Even though we’ve sunsetted admin AI, we achieved our goal: update Salesforce with zero-shot prompting and (almost) no configuration.

Screenshot of admin AI Chrome Extension

Thank you to my co-founder Ayan for leading us through the design and build of the app. If you’re interested in hearing more about how we built admin AI (want to see the actual prompts?) or if you’re interested in keeping up with what we’ll do next, please don’t hesitate to reach out!
