A/B Testing: A Comprehensive Guide for Scoring Models

Published in Sinch Blog · 5 min read · Apr 24, 2024


Hi! I’m Arpit Rawat, and I’m a Data Scientist at Sinch.

A/B testing is a crucial technique for evaluating the effectiveness of different models or strategies. In this blog post, we will walk through an A/B test comparing a rule-based lead scoring model (the control group) with a predictive lead scoring model (the treatment group). Our goal is to determine which model performs better in identifying high-quality leads. Let us dive in!

An AI-generated image featuring two nearly identical parrots with subtle visual differences. A/B testing can efficiently identify and analyze these distinctions, demonstrating how small variations can be detected and compared.

What is A/B Testing?

A/B testing involves comparing two versions (A and B) to measure their impact on a specific metric. In our case, we will compare lead scoring models based on conversion rates and lead conversion time.

Why Test Scoring Models?

  • Data-Driven Decision-Making: A/B testing helps us make informed choices by directly comparing model performance.
  • Optimization: We want to improve lead identification and conversion rates.
  • Sales Team Involvement: The sales team will play a crucial role in performing the experiment.

A/B Test Framework for Scoring Models

Key Steps:

  1. Hypothesis: Formulate a clear hypothesis. For example:

  • Null Hypothesis (H0): There is no difference in lead conversion rates between the rule-based and predictive models.
  • Alternative Hypothesis (Ha): There is a difference in lead conversion rates between the rule-based and predictive models.

  2. Random Assignment: Randomly assign leads to either the control (rule-based) or treatment (predictive) group. Stratified sampling can be used on relevant attributes to ensure similar distributions across groups.

  3. Implement Variants:

  • Control Group (A): Leads chosen randomly from the current rule-based model.
  • Treatment Group (B): Leads chosen randomly from those with high lead scores in the predictive model.

  4. Collect Data: Monitor conversion rates and time spent over the test period.

  5. Analyze Results: Use statistical methods to compare the two groups.
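The stratified assignment mentioned in step 2 can be sketched as follows. This is a minimal illustration, not the code used in the experiment: the `lead_id` and `industry` fields, the `stratified_assign` helper, and the sample leads are all hypothetical. It shuffles leads within each stratum and splits each stratum in half, so both groups end up with a similar mix of the chosen attribute.

```python
import random
from collections import defaultdict


def stratified_assign(leads, stratum_key, seed=42):
    """Randomly split leads into control/treatment within each stratum.

    `leads` is a list of dicts with a "lead_id" field (hypothetical schema).
    When a stratum has an odd number of leads, the extra one goes to treatment.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group leads by the attribute we want balanced across the two groups.
    by_stratum = defaultdict(list)
    for lead in leads:
        by_stratum[lead[stratum_key]].append(lead)

    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, lead in enumerate(members):
            assignment[lead["lead_id"]] = "control" if i < half else "treatment"
    return assignment


# Hypothetical leads: 6 in retail, 4 in finance.
industries = ["retail"] * 6 + ["finance"] * 4
leads = [{"lead_id": i, "industry": ind} for i, ind in enumerate(industries)]

assignment = stratified_assign(leads, "industry")
print(assignment)
```

Splitting inside each stratum, rather than across the whole list, is what keeps the attribute distribution similar in both groups even when the sample is small.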

Once the hypotheses are defined, we can work out the sample size needed for the hypothesis test.

Sample Size Calculation

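We will calculate the required sample size for our A/B test. This ensures the test has enough statistical power to detect the effect we care about at the chosen significance level. The calculation uses the following parameters: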

baseline_conversion_rate: This is the conversion rate of the control group, often based on historical data or an educated estimate.

desired_effect_size: This is the smallest difference in conversion rates between the control and treatment groups that you want to be able to detect. It is typically expressed as a percentage of the baseline conversion rate.

alpha: This is the significance level, the probability of rejecting the null hypothesis when it is actually true (a false positive). A common choice is 0.05.

power: This is the probability of correctly rejecting the null hypothesis when it is false. 0.8 indicates an 80% chance of detecting a true effect if it exists.
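As a rough illustration of how these four parameters combine, here is a minimal sketch using only the Python standard library. It applies the classic normal-approximation formula for a two-sided test of two proportions, which is an assumption on my part: the original sample code may use a different formula or library, and calculators such as the one in the references can return somewhat different numbers. The baseline rate of 10% and the 20% relative lift are made-up inputs.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_conversion_rate, desired_effect_size,
                          alpha=0.05, power=0.8):
    """Leads needed per group for a two-sided, two-proportion test.

    desired_effect_size is relative to the baseline, e.g. 0.20 means we want
    to detect a lift from 10% to 12%.
    """
    p1 = baseline_conversion_rate
    p2 = p1 * (1 + desired_effect_size)

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power

    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Illustrative inputs only, not figures from the experiment.
number_of_leads_per_group = sample_size_per_group(
    baseline_conversion_rate=0.10, desired_effect_size=0.20)
print(number_of_leads_per_group)
```

Note how sensitive the result is to the effect size: halving the lift you want to detect roughly quadruples the number of leads required per group.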

Create Control and Treatment groups and calculate conversion rates for each group:

We will randomly select leads from the rule-based and predictive models to fill the Control and Treatment groups, using the sample size (number_of_leads_per_group) calculated in the previous step. We will then combine the two groups into one list, shuffle it, and hand it to our Sales team. We will keep the metadata recording whether each lead came from the rule-based or the predictive model so that we can perform the significance test later.

Once the Sales team reports which leads converted, we will calculate the conversion rates for the rule-based and predictive groups and compare them. In practice, you can keep the lead records in two separate datasets, one for the rule-based model and one for the predictive model, and randomly select the Control and Treatment leads from them.
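The workflow described in this section can be sketched roughly as below. The helper names (`build_test_list`, `conversion_rates`), the lead IDs, and the outcomes are hypothetical and chosen only for illustration; the sketch also assumes that no lead appears in both pools.

```python
import random


def build_test_list(rule_based_leads, predictive_leads,
                    number_of_leads_per_group, seed=7):
    """Sample both groups, then return a shuffled hand-off list plus metadata.

    Assumes lead IDs are unique across the two pools.
    """
    rng = random.Random(seed)
    control = rng.sample(rule_based_leads, number_of_leads_per_group)
    treatment = rng.sample(predictive_leads, number_of_leads_per_group)

    # Metadata kept by the data team; not included in the hand-off list.
    group_of = {lead_id: "control" for lead_id in control}
    group_of.update({lead_id: "treatment" for lead_id in treatment})

    handoff = control + treatment
    rng.shuffle(handoff)  # the list order no longer reveals the group
    return handoff, group_of


def conversion_rates(outcomes, group_of):
    """outcomes maps lead_id -> True/False (converted or not)."""
    totals = {"control": 0, "treatment": 0}
    converted = {"control": 0, "treatment": 0}
    for lead_id, did_convert in outcomes.items():
        group = group_of[lead_id]
        totals[group] += 1
        converted[group] += int(did_convert)
    return {g: converted[g] / totals[g] for g in totals if totals[g]}


# Hypothetical lead pools.
rule_based_leads = [f"R{i}" for i in range(20)]
predictive_leads = [f"P{i}" for i in range(20)]
handoff, group_of = build_test_list(rule_based_leads, predictive_leads, 5)

# Made-up outcomes, standing in for what the Sales team would report.
outcomes = {lead_id: (i % 3 == 0) for i, lead_id in enumerate(handoff)}
print(conversion_rates(outcomes, group_of))
```

Keeping `group_of` separate from the shuffled list means the Sales team works every lead the same way, while the group labels remain available for the significance test.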

Significance Test to evaluate conversion rate for both groups:

We will run a significance test on the conversion results of the Control (rule-based) and Treatment (predictive) groups to check whether the difference in their conversion rates is statistically significant.

Each lead has a binary outcome (converted or not converted), so the two-sample proportion z-test is an appropriate way to compare conversion rates between the two groups (in this case, rule_based vs ml_based). A two-sample t-test on the binary outcomes is a common alternative and gives similar results for large samples.

If the test rejects the null hypothesis, there is a statistically significant difference in conversion rates between the rule-based and predictive models. We can then compare the conversion rates calculated in the previous step to check whether the predictive model was more effective than the rule-based one, investigate the experiment results further, and make the final decision.
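A two-sample proportion z-test can be sketched with the standard library alone, as below. This is an illustrative implementation using the pooled standard error, not necessarily the code from the original post, and the conversion counts are invented for the example.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_* are converted-lead counts, n_* are group sizes.
    Returns (z statistic, p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in outcomes; the test is undefined.")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented counts: A = control (rule-based), B = treatment (predictive).
alpha = 0.05
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)

if p_value < alpha:
    print(f"z={z:.3f}, p={p_value:.4f}: reject H0, the rates differ")
else:
    print(f"z={z:.3f}, p={p_value:.4f}: fail to reject H0")
```

The sign of `z` shows the direction of the difference: a positive value here means the treatment group converted at a higher rate than the control group.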

Full Sample Code: see the GitHub gist listed as reference 5 at the end of this post.

Evaluating Metrics

Metrics to Consider:

  • Conversion Rate: Proportion of leads converting to desired actions (e.g., sign-ups to enterprise plans).
  • Time Spent: How long leads engage with our platform. We add this metric to our hypothesis as well, because evaluating an experiment on multiple metrics reduces the risk of bias from relying on a single one.

Conclusion and Next Steps

After 45 days, we will evaluate the A/B test results and share them with the Sales Manager. 45 days were chosen based on the current lead conversion cycle. Remember to communicate findings effectively and consider any additional insights from the sales team. In Part 2 of this series, I will go through the results in detail. Happy testing!

Interested to learn more about Sinch and perhaps become a part of our team? Check out our Careers page!


  1. Evan Miller. “Evan’s Awesome A/B Tools”. Personal Website. Available at https://www.evanmiller.org/ab-testing/sample-size.html. Last accessed on March 14th, 2024.
  2. Lauren Thomas. “Control Groups and Treatment Groups | Uses & Examples”. Scribbr. Available at https://www.scribbr.com/methodology/control-group/. Last accessed on March 10th, 2024.
  3. Angie Teaches. “2 Proportion Hypothesis Test | Two Sample Proportion Z-Test with TI-84”. YouTube. Available at 2 Proportion Hypothesis Test | Two Sample Proportion Z-Test with TI-84. Last accessed on April 23rd, 2024.
  4. Adam Hayes. “How Stratified Random Sampling Works, With Examples”. Investopedia. Available at https://www.investopedia.com/terms/stratified_random_sampling.asp. Last accessed on March 10th, 2024.
  5. Arpit Rawat. “Sample Codes for Medium blog post — A/B Testing: A Comprehensive Guide for Scoring Models”. Github. Available at https://gist.github.com/arpitmailgun. Last accessed on April 23rd, 2024.


