Social Network Analysis Web App

A project with Orca Active

Published in SMUBIA, Aug 5, 2020

With the rising popularity and efficacy of influencer marketing, affiliate and referral sales make up a large portion of business revenue. Usually, data analysts or data scientists process the data and identify the important nodes and edges within a business's customer network. But what if there were a quick and simple way for anyone to identify these important customers from sales transactions?

Introduction

Orca Active, a female athleisure brand (due to launch at the end of 2020), worked with SMU BIA on a project to help it identify customers of major influence via Social Network Analysis (SNA). The objective is to build something simple enough that everyone in the team can make sense of the data, even without any data analytics or data science knowledge. Users can identify customers with major influence, allowing Orca Active to engage them for potential referral sales and partnership opportunities as part of its marketing strategy.

Methodology

SNA is based on network and graph models, and there are numerous well-known methods, such as the Louvain Method (cluster optimization) and the PageRank algorithm (node ranking based on the number, strength and source of links). Among them, the model that we found most fitting for Orca Active is the “Kite Network” example by David Krackhardt.

Although the Louvain Method for optimizing clusters can also be used in conjunction with the Kite Network, this project focuses only on the Kite Network. In a network model, the entities (people, organizations) are represented by nodes, and the links or edges represent the relationships between nodes. In the Kite Network example, Diane has the most connections and is a very important node in the network. Heather is also important, because Ike and Jane would not be connected to the rest of the network if not for her. Therefore, Heather is known as a ‘bridge’ and Diane is known as a ‘leader’. These roles will be identified via the metrics in the following section.
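The ‘leader’ and ‘bridge’ roles can be checked with a few lines of NetworkX, which ships the kite as a built-in graph. This is only a minimal sketch: NetworkX labels the nodes 0 to 9, and the mapping to names below assumes the usual alphabetical labelling of Krackhardt's example.

```python
import networkx as nx

# Assumption: the usual alphabetical labelling of the kite, mapped onto
# NetworkX's integer nodes 0..9.
NAMES = ["Andre", "Beverley", "Carol", "Diane", "Ed",
         "Fernando", "Garth", "Heather", "Ike", "Jane"]

kite = nx.relabel_nodes(nx.krackhardt_kite_graph(), dict(enumerate(NAMES)))

# The 'leader': the node with the most direct connections.
leader = max(kite.nodes, key=kite.degree)

# The 'bridge': remove Heather and check whether Ike and Jane
# can still reach the rest of the network (here, Diane).
without_heather = kite.copy()
without_heather.remove_node("Heather")
ike_cut_off = not nx.has_path(without_heather, "Ike", "Diane")
jane_cut_off = not nx.has_path(without_heather, "Jane", "Diane")

print(leader, kite.degree(leader))
print(ike_cut_off, jane_cut_off)
```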

Metrics

Analyzing the overall influence of a customer can be split into Quantitative and Qualitative portions. Quantitative assessment involves numerical representations via metrics such as Degree Centrality, Eigenvector Centrality and Betweenness Centrality. Qualitative assessment covers things such as personality, style, and genuine communication with others without the intention to oversell, i.e. trustworthiness and likeability.

In this project, the focus is on the Quantitative assessment, using 3 main metrics: Degree Centrality (the number of direct connections to a node), Eigenvector Centrality (the importance of a node, which takes into account how well connected its neighbours are and not just how many connections it has) and Betweenness Centrality (which identifies nodes that are strong connectors or bridges).
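As a rough illustration of the three metrics, the sketch below computes them on the kite graph with NetworkX's built-in centrality functions. The name mapping is again an assumption (alphabetical labelling of nodes 0 to 9), and the `top` helper is a hypothetical convenience, not part of NetworkX.

```python
import networkx as nx

# Assumption: alphabetical names mapped onto NetworkX's integer nodes 0..9.
NAMES = ["Andre", "Beverley", "Carol", "Diane", "Ed",
         "Fernando", "Garth", "Heather", "Ike", "Jane"]
kite = nx.relabel_nodes(nx.krackhardt_kite_graph(), dict(enumerate(NAMES)))

# Each function returns a dict of {node: score}.
degree = nx.degree_centrality(kite)            # share of other nodes directly connected
eigenvector = nx.eigenvector_centrality(kite)  # rewards being connected to well-connected nodes
betweenness = nx.betweenness_centrality(kite)  # share of shortest paths passing through a node


def top(scores):
    """Hypothetical helper: return the node with the highest score."""
    return max(scores, key=scores.get)


print("degree:", top(degree))
print("eigenvector:", top(eigenvector))
print("betweenness:", top(betweenness))
```

On this graph, Degree Centrality picks out the ‘leader’ while Betweenness Centrality picks out the ‘bridge’, which is why the metrics are read together rather than in isolation.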

Understanding the data

Orca Active uses smile.io to track referrals and their orders can be exported in CSV (Comma-separated values) format. The following snapshot shows the respective columns and mock data:

Mock data to illustrate data for CSV export from smile.io

The referrer's name fills the first and last name columns, and the customer who uses the referrer’s code fills the friend’s first and last name columns. Order numbers are unique, and the dates are in the format shown. The state column holds the state of the order, which can be ‘pending’, ‘fulfilled’ or ‘cancelled’. Total amount spent is the total amount spent on that transaction.
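To make the mapping from rows to a network concrete, here is a minimal sketch that turns such an export into a NetworkX graph. The column names, the ISO date format and the mock rows are all invented for illustration and may not match the real smile.io export. Skipping cancelled orders and summing repeat orders into an edge weight are also assumptions, not necessarily what the application does.

```python
import csv
import io

import networkx as nx

# Invented mock data; the real export's column names may differ.
MOCK_CSV = """first_name,last_name,friend_first_name,friend_last_name,order_number,date,state,total_amount_spent
Amy,Tan,Ben,Lim,1001,2020-06-01,fulfilled,80.00
Amy,Tan,Cara,Ng,1002,2020-06-03,pending,45.50
Ben,Lim,Dee,Koh,1003,2020-06-10,cancelled,30.00
Amy,Tan,Ben,Lim,1004,2020-07-02,fulfilled,20.00
"""


def build_referral_graph(csv_file, skip_states=("cancelled",)):
    """Build an undirected graph with one edge per referrer/friend pair."""
    graph = nx.Graph()
    for row in csv.DictReader(csv_file):
        if row["state"] in skip_states:
            continue  # assumption: cancelled orders do not count
        referrer = f'{row["first_name"]} {row["last_name"]}'
        friend = f'{row["friend_first_name"]} {row["friend_last_name"]}'
        amount = float(row["total_amount_spent"])
        if graph.has_edge(referrer, friend):
            # Repeat orders between the same pair add to the edge weight.
            graph[referrer][friend]["weight"] += amount
        else:
            graph.add_edge(referrer, friend, weight=amount)
    return graph


graph = build_referral_graph(io.StringIO(MOCK_CSV))
print(graph.edges(data=True))
```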

Mockup using NetworkX

After a bit of research, I found that NetworkX is a great Python package that allows us to calculate centrality metrics easily. Before working on the application, I created a mockup to test the functionality and usability of the package.

Creating the Web Application

The idea is to create a web application where the user can directly upload the CSV export from smile.io and see the network graph displayed. To render graphs on the client side, I used AnyChart, as it is an easy-to-use and well-documented framework. On the server side, I used Django to handle the webpages and NetworkX to calculate the centrality metrics. The following shows the first draft that I came up with, displaying the graph and metrics from mock data.
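One way to hand a NetworkX graph from the server to the client is to serialize it as a dictionary of nodes and edges. The sketch below assumes the `nodes`/`edges` shape with `id`, `from` and `to` keys that AnyChart's network graph accepts, so it is worth checking against the AnyChart documentation. `to_anychart_data` is a hypothetical helper, not the project's actual code.

```python
import json

import networkx as nx


def to_anychart_data(graph):
    """Convert a NetworkX graph into a nodes/edges dictionary.

    Assumption: the client expects {"nodes": [{"id": ...}],
    "edges": [{"from": ..., "to": ...}]}.
    """
    return {
        "nodes": [{"id": str(node)} for node in graph.nodes],
        "edges": [{"from": str(u), "to": str(v)} for u, v in graph.edges],
    }


example = nx.Graph()
example.add_edge("Amy", "Ben")
data = to_anychart_data(example)
print(json.dumps(data))
```

In a Django view, a dictionary like this could be returned with `JsonResponse` or embedded in the rendered template.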

Although I was quite happy with it, it still lacked functionality for the user, as it could only display metrics. I figured that usability could be improved by adding date filters, a Top N selector (top 1 to 20 results) for centrality, and buttons to toggle between the centrality metrics. I also wanted to make the graph more interactive, so that users can filter data by selecting important nodes and edges and then export the selection to a CSV file to explore further. The following shows the second iteration of the application, without the date and Top N filters.
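The date and Top N filters can be sketched as two small functions. Both are hypothetical helpers: the `date` key and the ISO date format are assumptions about the export, and the clamp to 1 to 20 simply mirrors the range mentioned above.

```python
from datetime import date


def filter_by_date(rows, start, end):
    """Keep rows whose order date falls within [start, end], inclusive.

    Assumption: each row is a dict with an ISO-formatted "date" value.
    """
    return [row for row in rows
            if start <= date.fromisoformat(row["date"]) <= end]


def top_n(scores, n):
    """Return the n highest-scoring (node, score) pairs, with n clamped to 1..20."""
    n = max(1, min(n, 20))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


rows = [{"date": "2020-06-01"}, {"date": "2020-07-02"}]
june_rows = filter_by_date(rows, date(2020, 6, 1), date(2020, 6, 30))

scores = {"Amy": 0.9, "Ben": 0.5, "Cara": 0.1}
best_two = top_n(scores, 2)
print(june_rows, best_two)
```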

Once the user clicks the ‘Start’ button, each selection made on the graph is recorded in the table below it, and the accumulated selection can be exported. This way, users can select the nodes and edges that they deem important, which gives them flexibility in how their data is exported. Without clicking the ‘Start’ button, selecting a node or edge only shows the data that belongs to it, and the cumulative selection or filters are not recorded.

Even though I was quite happy with the progress at this stage, the graph still did not render the nodes according to Top N or the edges according to basket size (total order amount). I also added a CSV export for the centrality data.
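A centrality export of this kind could look roughly like the sketch below, which writes one row per node with its three scores. The column headers, the rounding and the `centrality_csv` name are assumptions for illustration. The kite graph stands in for real customer data.

```python
import csv
import io

import networkx as nx


def centrality_csv(graph):
    """Return CSV text with one row per node and its three centrality scores."""
    degree = nx.degree_centrality(graph)
    eigenvector = nx.eigenvector_centrality(graph, max_iter=1000)
    betweenness = nx.betweenness_centrality(graph)

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Hypothetical column headers.
    writer.writerow(["customer", "degree", "eigenvector", "betweenness"])
    for node in graph.nodes:
        writer.writerow([
            node,
            round(degree[node], 4),
            round(eigenvector[node], 4),
            round(betweenness[node], 4),
        ])
    return buffer.getvalue()


csv_text = centrality_csv(nx.krackhardt_kite_graph())
print(csv_text)
```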

Final Product

Hosted on Heroku, the final product of the application can be seen here. You can also clone my project on GitHub and run it on localhost.

Conclusion

Even though this project mainly focuses on the Quantitative assessment of a customer’s influence, continuous monitoring of a customer's or influencer’s Qualitative aspects is also important in referral sales and marketing. On top of that, huge datasets that go beyond what a CSV file can reasonably hold would need alternative solutions built specifically for them. As Orca Active is an up-and-coming startup, the CSV upload solution works well for now and will have to be adjusted as the business grows.

Challenges Faced and Learning Pointers

The whole project was a challenge for me, as it was my first time developing a web application. Along the way, a lot of unexpected bugs and problems surfaced because of my initial lack of knowledge and understanding of how the various components work.

A major obstacle for me was deploying the application on Heroku, as I was very unfamiliar with deployment. After much googling and experimenting with solutions from StackOverflow and the Heroku documentation for Django deployment, the web application was successfully (although not perfectly) deployed.

I’ve learnt that even though something might seem undoable at first glance (as the filter functions did for me initially), with enough time, effort and perseverance, the end product will ultimately be rewarding and satisfying.

Acknowledgements

Special thanks to Tammi Chng, Wa Thone from BIA, LiXuan Chansan and Xinyi Tan from Orca Active for overseeing this project, and allowing me to be a part of it.

Thank you for reading.