Dealing With Logged-Out Users In A/B Testing

Sep 29, 2017

Intro

Recently, I have been designing an A/B testing strategy and architecture at my current company. It has been challenging, but also interesting and inspiring. Dealing with logged-out users has been the hardest problem.

On an e-commerce website, customers usually browse products and add them to their shopping cart while logged out. If you have ever tried to track events from logged-out users and analyze user behaviour based on those events, you have probably searched desperately for a plausible solution. I did, at least.

Who’s the logged-out user?

A customer may browse some products and add a few to their shopping cart without logging in. After a while, they decide to check out, and at that point they have to log in. Only then can they complete the checkout.

This scenario is common, and the customer triggers two types of events in it: add_to_cart (ATC) and Checkout. Before login, ATC events carry no user information, so we don’t know who the actual user is; after login, ATC and Checkout events can be tagged with the logged-in user’s information, so we do.

For events that are missing user information, it is still possible to find out who triggered them. ATC events that happened before login can be assigned the user’s information once the user logs in. This rests on the assumption that the same person is behind both the logged-out and the logged-in state. In other words, if a user never logs in, there is no way to figure out who they are.

Currently, Google Analytics supports this through its User-ID feature and custom dimensions. The User-ID feature provides session unification, which assigns the user-id to earlier events in the same session; a custom dimension enables session stitching, which assigns the user-id to events from previous sessions.

Compared to session stitching, session unification is much safer, since it is unlikely that different people are behind the logged-out and logged-in state within a single session. But if the device is a phone or a personal laptop, session stitching might be a good choice, since the device is most likely used by the same person all the time.

The main idea is that, under some assumptions, we can link user-id and client-id together, which makes it possible to find the user-id for events that contain a client-id but no user-id.
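To make the linking idea concrete, here is a minimal sketch of session unification over a list of tracked events. It is not how Google Analytics implements it; the Event fields (client_id, session_id, user_id) and the function name unify_sessions are assumptions made up for this illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, not a Google Analytics schema.
    name: str
    client_id: str
    session_id: str
    user_id: Optional[str] = None


def unify_sessions(events):
    """Backfill user_id onto logged-out events of the same session."""
    # Collect every user_id seen per (client_id, session_id).
    seen = {}
    for e in events:
        if e.user_id is not None:
            seen.setdefault((e.client_id, e.session_id), set()).add(e.user_id)

    unified = []
    for e in events:
        users = seen.get((e.client_id, e.session_id), set())
        # Backfill only when exactly one user logged in during the session;
        # with zero or several users the "same person" assumption fails.
        if e.user_id is None and len(users) == 1:
            e = replace(e, user_id=next(iter(users)))
        unified.append(e)
    return unified
```

Session stitching would follow the same pattern with the grouping key reduced to client_id alone, which is exactly why it needs the stronger assumption that one device means one person.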

Which variant did our customer see?

With session unification and session stitching, we may know both the client-id (a cookie, for example) and the user-id of a tracking event in the later analysis phase. In an A/B testing experiment, we would like to assign a variant based on user-id before the user sees the website. But the user-id is unknown before login, so we have to assign the variant based on client-id instead.
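One common way to make such an assignment repeatable is to hash the identifier into a bucket. The sketch below assumes that approach; it is not necessarily what we or Google Analytics use, and the names VARIANTS, assign_variant and variant_for_request are hypothetical.

```python
import hashlib

VARIANTS = ("A", "B")  # assumed two-variant experiment


def assign_variant(experiment, unit_id):
    """Deterministically map an id (client-id or user-id) to a variant."""
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


def variant_for_request(experiment, client_id, user_id=None):
    """Use the user-id when the visitor is logged in, else the client-id."""
    unit_id = user_id if user_id is not None else client_id
    return assign_variant(experiment, unit_id)
```

Because the client-id and the user-id hash independently, the same person can land in different variants before and after login.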

If a user has been assigned to variant A, she may see variant B when she visits the website logged out from another device. Once she logs in, we can switch her to variant A, since we know her user-id was assigned to variant A before.

Now the question is: which variant should each event be attributed to? There are a couple of ways to deal with this situation:

Naive Strategy: events before login are attributed to the variant assigned to the client-id, while events after login are attributed to the variant assigned to the user-id.
Passive Strategy: if the variants assigned to the client-id and the user-id differ, we discard the whole session in the later analysis (Google Analytics uses this strategy).
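The two strategies can be sketched as follows. This is only an illustration: events are represented as hypothetical (session_id, event_name, variant_shown) tuples, and both function names are made up for this example.

```python
from collections import defaultdict


def naive_attribution(events):
    """Credit each event to whichever variant was shown when it fired."""
    return [(name, variant) for _, name, variant in events]


def passive_attribution(events):
    """Drop every session in which more than one variant was shown."""
    variants_by_session = defaultdict(set)
    for session_id, _, variant in events:
        variants_by_session[session_id].add(variant)
    return [
        (name, variant)
        for session_id, name, variant in events
        if len(variants_by_session[session_id]) == 1
    ]


# Hypothetical data: session s1 switches variant at login, s2 does not.
events = [
    ("s1", "ATC", "B"),
    ("s1", "Checkout", "A"),
    ("s2", "ATC", "A"),
    ("s2", "Checkout", "A"),
]
```

On this data the naive version keeps all four events but splits session s1 across both variants, while the passive version keeps only the two events of session s2.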

With the naive strategy, one user session is counted twice, once for each variant, which may give us an incorrect experiment result. With the passive strategy, we lose a lot of useful user data, though the result is more precise. Neither strategy suited us, so we wondered whether there was a better solution.

The causal order events

A customer adds a product to her shopping cart while exposed to variant A, and she might be exposed to variant B after login. If she decides to check out that product, can we say her checkout event happened because of variant A?

This case brings us to the concept of causal order events: checkout events depend on ATC events. For such causally ordered events, we can attribute the whole series of events to one variant, namely the variant of the first event. This strategy does not throw away important information, and it still provides a correct result. It is based on the causal relationship among events. A classic example: ATC events cause Checkout events, and Checkout events cause Return Order events.
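A minimal sketch of this attribution, assuming each event records which earlier event caused it. The TrackedEvent class, its caused_by field and the function name are hypothetical; in practice the link would have to be derived from something like a cart or order id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrackedEvent:
    event_id: str
    name: str
    variant: str                     # variant shown when the event fired
    caused_by: Optional[str] = None  # event_id of the causing event, if any


def attribute_by_first_event(events):
    """Map each event_id to the variant of the first event in its chain."""
    by_id = {e.event_id: e for e in events}
    attributed = {}
    for e in events:
        root = e
        visited = {root.event_id}
        # Walk back along the causal chain until its first event.
        while root.caused_by is not None and root.caused_by in by_id:
            parent = by_id[root.caused_by]
            if parent.event_id in visited:  # guard against malformed cycles
                break
            visited.add(parent.event_id)
            root = parent
        attributed[e.event_id] = root.variant
    return attributed
```

With an ATC event seen under variant A, followed by a Checkout and a Return Order seen under variant B and linked back to it, all three events end up attributed to variant A.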

Conclusion

You may have noticed that the proposed solutions all rest on assumptions, and that they are really small tricks for manipulating tracking data. That’s true, and we should always be careful when drawing conclusions from such tracking data.

Thanks to my talented colleague Jason, who works with me and always inspires me!