<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Abdullah Reza on Medium]]></title>
        <description><![CDATA[Stories by Abdullah Reza on Medium]]></description>
        <link>https://medium.com/@abdullahreza?source=rss-ba2f532e5566------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*fFzR-M4c5WkolrVe3KqCGA.png</url>
            <title>Stories by Abdullah Reza on Medium</title>
            <link>https://medium.com/@abdullahreza?source=rss-ba2f532e5566------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 29 May 2026 17:55:05 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@abdullahreza/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Find your Core Customers and Determine Customer Segments]]></title>
            <link>https://medium.com/@abdullahreza/find-your-core-customers-and-determine-customer-segments-e5a49180d95c?source=rss-ba2f532e5566------2</link>
            <guid isPermaLink="false">https://medium.com/p/e5a49180d95c</guid>
            <category><![CDATA[marketing-technology]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[customer-segmentation]]></category>
            <category><![CDATA[supervised-learning]]></category>
            <category><![CDATA[unsupervised-learning]]></category>
            <dc:creator><![CDATA[Abdullah Reza]]></dc:creator>
            <pubDate>Sun, 23 Aug 2020 18:06:52 GMT</pubDate>
            <atom:updated>2020-08-23T18:06:52.809Z</atom:updated>
            <content:encoded><![CDATA[<h4>Data-Driven Approach for Customer Segmentation</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/809/1*BbjLEagmAEuViTyyvj2-LQ.png" /></figure><p>Customer segmentation can be defined as the process of dividing customers into groups based on their needs, interests, habits, and preferences. In business-to-consumer (B2C) marketing, customers are often grouped by demographics such as age, gender, marital status, income level, and location.</p><p>In this post, we will segment customers using data provided by Arvato Financial Solutions, a subsidiary of Bertelsmann. The data consists of demographic information on the general population as well as on current Arvato customers.</p><h4>Problem Statement</h4><ol><li>Given the demographics of the current customers, determine the segments of the general population that are most likely to be converted into customers</li><li>Identify the groups that are most likely to respond to the marketing campaign and turn into customers</li></ol><h4>Objective</h4><p>Since the datasets are large, each problem calls for a different technique.</p><ol><li>To identify the demographics of the core customer base within the general population, an unsupervised machine learning algorithm will be used.</li><li>To identify the target audience for the marketing campaign, a supervised machine learning algorithm will be used.</li></ol><p>There are four datasets provided by Arvato Financial Solutions.</p><ol><li>Demographics of the general population for unsupervised learning</li><li>Demographics of the customers for unsupervised learning</li><li>The training dataset for supervised learning</li><li>The test dataset for supervised learning</li></ol><p>The analysis can be divided into two parts: (i) unsupervised learning and (ii) supervised learning. 
However, the dataset for unsupervised learning consists of more than 350 features. So we will first explore the general population dataset to get familiar with the features and develop a cleaning framework that can be reused for the other datasets.</p><h4>Exploration and Data Wrangling</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*rucUjWRJz1xwm_wN" /><figcaption>Photo by <a href="https://unsplash.com/@markusspiske?utm_source=medium&amp;utm_medium=referral">Markus Spiske</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>Two main datasets were provided by Arvato Financial Solutions as csv files:</p><ul><li>Udacity_AZDIAS_052018: Demographic data of the general population of Germany; 891,211 rows and 366 features</li><li>Udacity_CUSTOMERS_052018: Demographic data for customers of a mail-order company; 191,652 rows and 369 features</li></ul><p>Each row represents one individual, and the <strong>CUSTOMERS</strong> dataset has three extra features: CUSTOMER_GROUP, ONLINE_PURCHASE, and PRODUCT_GROUP. These features are absent from AZDIAS and can be omitted from further analysis.</p><p>In addition, two more files describing each feature and its mapped values were provided.</p><ul><li>DIAS Attributes - Values 2017: Features and mapped values associated with each feature</li><li>DIAS Information Levels - Attributes 2017: Description of each feature and its type</li></ul><p>After comparing the descriptive files, especially <strong>DIAS Attributes - Values 2017</strong>, with AZDIAS, it was apparent that not all features were described in the attributes file. 
In fact, 94 features were unique to AZDIAS with no descriptions available.</p><p>The following steps were taken to clean the dataset:</p><ol><li>Drop features that are not described in the <strong>Attributes</strong> file</li><li>Replace unknown values with NaN</li><li>Remove features where more than 20% of the values are NaN</li><li>Remove rows where more than 20% of the values are NaN</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/934/1*z5DH2xZbeexLuSxHowmRLw.png" /><figcaption>Distribution of Missing Value Count on Each Column</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/956/1*JcDkkINdnabkMCyOerz9LQ.png" /><figcaption>Distribution of Missing Value Count across Each Row</figcaption></figure><p>The same steps were applied to the <strong>CUSTOMERS</strong> dataset, which left 188,439 rows and only 37 columns. For further analysis, only the features common to the <strong>CUSTOMERS</strong> and <strong>AZDIAS</strong> datasets were kept, and the features unique to <strong>AZDIAS</strong> were dropped. Eventually, there were 737,288 rows and 37 columns in <strong>AZDIAS</strong> and 188,439 rows and 37 columns in the <strong>CUSTOMERS</strong> dataset.</p><h4>Feature Encoding and Engineering</h4><p>Four more features (LP_LEBENSPHASE_GROB, LP_STATUS_GROB, LP_FAMILIE_GROB and GEBURTSJAHR) were dropped from both datasets. The remaining features are of two types, numeric and ordinal, so they could be left without encoding.</p><p>However, there were still missing values (NaN). Therefore, the missing values were imputed using the <strong>median</strong>. The median was chosen over the <strong>mean</strong> since most of the data were ordinal in nature. Next, the features were standardized.</p><p>Once the features were standardized, the datasets were ready for unsupervised learning. 
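</p><p>The cleaning, imputation and standardization steps described above could be sketched as follows. This is a minimal sketch under assumptions, not the code used for the analysis: it assumes pandas and scikit-learn, and the function name clean_and_scale and the unknown_codes mapping (column name to the codes that mean unknown) are hypothetical.</p><pre>
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def clean_and_scale(df, unknown_codes, max_missing=0.20):
    """Clean, impute and standardize a demographics DataFrame.

    unknown_codes is a hypothetical dict mapping a column name to the list
    of codes that mean "unknown" for that column.
    """
    df = df.copy()
    # Replace the codes that mean "unknown" with NaN, column by column.
    for column, codes in unknown_codes.items():
        if column in df.columns:
            df[column] = df[column].replace(codes, np.nan)
    # Keep columns, then rows, whose share of missing values is at most 20%.
    df = df.loc[:, df.isna().mean(axis=0).le(max_missing)]
    df = df.loc[df.isna().mean(axis=1).le(max_missing)]
    # Median imputation followed by standardization.
    imputed = SimpleImputer(strategy="median").fit_transform(df)
    scaled = StandardScaler().fit_transform(imputed)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)
</pre><p>In this sketch the imputer and scaler are fitted on whichever dataset is passed in; fitting them once on the general population and reusing them for the customers would be an equally reasonable choice.</p><p>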
Since the datasets are high-dimensional (37 features), Principal Component Analysis (PCA) was applied to reduce dimensionality.</p><h4>Unsupervised Learning</h4><p>By applying PCA, it was determined that 15 principal components explain more than 90% of the variance. The following figure shows the scree plot for the PCA with all components.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/915/1*YUR9QnZaoRuL5-qsu-TEfA.png" /><figcaption>Scree Plot for PCA Analysis</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/920/1*OLHsDUDFBvd92w_CB7UQ2w.png" /><figcaption>Explained Variance by Components</figcaption></figure><p>Equipped with this knowledge, the first 15 principal components were selected as input for KMeans clustering. To determine the optimum number of clusters, an elbow plot was generated.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/909/1*NKrRfgaBai-FGfGpNeg3ig.png" /><figcaption>Elbow Plot</figcaption></figure><p>The plot above shows no clear elbow. For the KMeans clustering, the number of clusters was set to 6. KMeans clustering with 6 clusters was applied to the general population dataset as well as the customers dataset.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/939/1*Hh585pXeFUQ0Bk2PVIbchA.png" /><figcaption>Distribution of Clusters</figcaption></figure><p>Clusters 0, 2 and 3 from the general population are well represented in customers, while clusters 1, 4 and 5 are underrepresented. 
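</p><p>The PCA, clustering and cluster comparison described above could be sketched as follows. This is a minimal sketch assuming scikit-learn and already imputed and standardized inputs; the function name cluster_shares is hypothetical, and fitting PCA and KMeans on the general population only and then reusing them for the customers is an assumption, not necessarily what was done in the analysis.</p><pre>
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_shares(population, customers, n_components=15, n_clusters=6, seed=0):
    """Return the share of each cluster in the population and in the customers.

    Both inputs are assumed to be imputed, standardized arrays with the
    same columns.
    """
    # Fit PCA and KMeans on the general population only (an assumption).
    pca = PCA(n_components=n_components, random_state=seed)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    population_labels = kmeans.fit_predict(pca.fit_transform(population))
    # Project the customers with the same PCA and assign the same clusters.
    customer_labels = kmeans.predict(pca.transform(customers))
    population_share = np.bincount(population_labels, minlength=n_clusters) / len(population_labels)
    customer_share = np.bincount(customer_labels, minlength=n_clusters) / len(customer_labels)
    return population_share, customer_share
</pre><p>Comparing the two returned arrays cluster by cluster shows which clusters hold a larger share of customers than of the general population and which hold a smaller one.</p><p>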
Looking at the features with positive correlation, it was determined that clusters 1, 4 and 5 represent a population that is culturally minded, socially active and aware of the product.</p><h4>Supervised Learning</h4><p>For supervised learning, two more datasets were provided.</p><ol><li>MAILOUT_TRAIN: demographic data for individuals who were targets of a marketing campaign; 42,982 persons, 367 features</li><li>MAILOUT_TEST: demographic data for individuals who were targets of a marketing campaign; 42,833 persons, 366 features</li></ol><p>The two datasets are similar, except that MAILOUT_TRAIN includes a <strong>RESPONSE</strong> column, which is highly unbalanced: only about 1.2% responded.</p><p>Since these datasets have the same features as <strong>AZDIAS</strong>, the same data wrangling steps were applied to the <strong>MAILOUT_TRAIN</strong> and <strong>MAILOUT_TEST</strong> datasets. The one exception is that <strong>RESPONSE</strong> was extracted from <strong>MAILOUT_TRAIN</strong> for training the model. In addition, no rows were dropped, since that would make the data more unbalanced.</p><p>Five classification models were applied: Logistic Regression, Bagging Classifier, Random Forest Classifier, Ada Boost Classifier and Gradient Boosting Classifier. Of the five, Logistic Regression yielded the best result, a score of <strong>0.55</strong>.</p><h4>Result and Conclusion</h4><p>The goal of this project was to apply unsupervised learning techniques to identify segments of the population that form the core customer base and to determine population segments of potential customers. The CUSTOMERS data has many missing values. Therefore, after cleaning the dataset, the number of features was reduced significantly, from 369 to 37. Furthermore, redundant features were dropped to reduce the number of features to 33.</p><p>Training the models was particularly difficult on the provided workspace as well as on the local computer. Therefore, the feature reduction helped to shorten the execution time. 
Regardless, the following actions could improve the performance of the models:</p><ol><li>Drop fewer columns: explore each feature and determine whether the feature should be dropped.</li><li>Impute features with a different strategy based on feature type, i.e. numerical, categorical and ordinal.</li><li>Apply Multiple Factor Analysis instead of PCA.</li><li>Try different classification models with hyperparameter tuning.</li></ol><p>Thank you for reading the article and feel free to leave a comment below or connect with me on <a href="http://www.linkedin.com/in/airreza">LinkedIn</a> 🙂</p><p><strong>Acknowledgement: </strong>I would like to thank <a href="https://medium.com/@tobias.gorgs">Tobias Gorgs</a> for his <a href="https://medium.com/@tobias.gorgs/how-to-use-machine-learning-for-customer-acquisition-bcd52f42042d">article</a>. The article was helpful for the analysis.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How can you Leverage Data in OOH Marketing & Advertising]]></title>
            <link>https://medium.com/@abdullahreza/how-can-you-leverage-data-in-ooh-marketing-advertising-9220f350ce61?source=rss-ba2f532e5566------2</link>
            <guid isPermaLink="false">https://medium.com/p/9220f350ce61</guid>
            <category><![CDATA[marketing]]></category>
            <category><![CDATA[location-intelligence]]></category>
            <category><![CDATA[out-of-home-advertising]]></category>
            <category><![CDATA[ooh-advertising]]></category>
            <category><![CDATA[advertising]]></category>
            <dc:creator><![CDATA[Abdullah Reza]]></dc:creator>
            <pubDate>Sun, 26 Jul 2020 09:58:23 GMT</pubDate>
            <atom:updated>2020-07-27T13:38:10.168Z</atom:updated>
            <content:encoded><![CDATA[<h4>Location intelligence in OOH</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*C3gdu2ySsCPMSoC_" /><figcaption>Photo by <a href="https://unsplash.com/@johnelfes?utm_source=medium&amp;utm_medium=referral">john elfes</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>If you are in the field of marketing, be it a brand, an agency, or a media owner, chances are you are not an enthusiast of Out-of-Home (OOH) marketing and advertising. Measuring the Key Performance Indicators (KPIs) well enough to justify the investment is painstakingly hard.</p><p>Yet time and again OOH proves to reduce the cost of advertising significantly. So the question is: how do you maximize your reach without spending a fortune, and ultimately, how do you justify it?</p><p>In traditional OOH, the number of audiences is measured by traffic counts. While traffic count was the de facto standard for decades, it did not tell how many people saw the billboard ad. So, to measure the KPIs of OOH marketing, the OOH media introduced DEC (Daily Effective Circulation), which is essentially the traffic count excluding traffic from the opposite direction.</p><p>However, since the underlying measurement is still based on traffic count, DEC inherits all the baggage that comes with traffic count, such as the difficulty of determining the accuracy of the data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*dkZ7l3nCymGnVIR0" /><figcaption>Photo by <a href="https://unsplash.com/@enginakyurt?utm_source=medium&amp;utm_medium=referral">engin akyurt</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><h4>How can you Increase the Accuracy of Traffic Count</h4><p>Instead of the traditional traffic count, you can venture into location intelligence and reap benefits from it. 
Even when your customers are not looking at their phones or using an app, they still leave digital footprints that can be utilized for OOH marketing and advertising.</p><p>Of the many digital trails, i.e. data left by users, the most relevant for OOH is Geodata. Geodata can simply be point coordinates (latitude &amp; longitude), or it can be associated with time and other useful information. You can think of it as a snapshot of consumers at different locations and times. When you stitch together billions of these snapshots, i.e. Geodata points, you can essentially create a movie about consumer behavior, movement patterns, etc.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Vzg6jX8sNT6zqMVIFHVADw.jpeg" /><figcaption>Photo by <a href="https://geomarketing.com/geomarketing-101-what-is-geo-targeting">Geomarketing</a></figcaption></figure><p>By tapping into the Geodata, you can create a Geopath and identify the locations where you should display your ad and how many people have the potential to see it. This approach is more transparent and empirical than traffic count or DEC.</p><h4>What can you learn from Geodata</h4><p>A few of the insights that you can gain from Geodata are:</p><ul><li>Traffic volume, number of pedestrians, and vehicle occupancy</li><li>Traffic speed and walking speed in relation to the display</li><li>Attribution, i.e. user touchpoints to design a map of consumer behavior</li><li>A heatmap of consumers over a region</li></ul><p>Let’s take a look at the economic capital of Malaysia, Kuala Lumpur, and see how Geopath can help you to make better OOH marketing decisions. 
Let’s answer the following three questions:</p><ul><li>Where should a brand owner or media buyer display their content to maximize the number of audiences?</li><li>What is the correlation between seemingly independent factors that affect OOH, such as the time of the day, points of interest, and the number of roads?</li><li>Where should media owners build their next billboard?</li></ul><h4>Determine the Ideal Billboards</h4><p>Kuala Lumpur has thousands of billboards. Using coordinates from mobile devices, we can determine the number of audiences in the vicinity of a billboard.</p><p>In the image below you can see the locations of the billboards (points) and the number of audiences (color gradient). Billboards toward the yellow end of the spectrum have the highest number of audiences, while billboards in blue have the lowest.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/1*TcLmzNTxn5ATVilcKxM04w.png" /></figure><p>From the image, it is clear that only a handful of billboards can boast a significant number of audiences. Fewer than 1% of all the billboards in Kuala Lumpur capture the attention of more than 70,000 audiences.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/399/1*f3P_5lDiiczRx4eZozEPWQ.png" /><figcaption>Number of Billboards Distribution based on Audience Number</figcaption></figure><p>If you dive deeper and zoom in, you can determine how a billboard is performing compared to its neighboring billboards. The size of the point/circle indicates the tier the billboard belongs to based on its location, and the color corresponds to the number of audiences.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/1*PtHYwkJomka5aD9k8KE22A.png" /></figure><p>Generally, lower-tier billboards (smaller circles) have a lower number of audiences. 
However, billboards in the same tier also show a wide range of audience numbers.</p><p>Equipped with this knowledge, a brand or a marketer can target the high-performing billboards to maximize their ROI.</p><h4>Determine the Correlation of Various Factors</h4><ul><li>What is the correlation between the hour of the day and viewership?</li><li>Do POIs in the vicinity of the billboards improve viewership?</li><li>How about the number of roads: do they have significant effects on viewership?</li></ul><p>Common sense dictates that the hour is completely unrelated to the number of POIs and the number of roads. In contrast, locations with a higher density of POIs tend to have a higher number of roads. Both claims are supported by the following plot.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/424/1*9ORQxeHgVPQCLigRWbxDaQ.png" /></figure><p>Now to the more important question: how does the number of audiences relate to the hour, number of roads, and number of POIs? The correlation between the number of POIs and the number of audiences is significantly higher than the correlation with the hour (time of the day) or with the total number of roads.</p><p>Does this mean marketers should choose a billboard with a higher number of POIs around it? Yes, they should.</p><p>How about the number of roads: is it a deciding factor in picking the ideal billboards? Not necessarily. For example, billboards on a major highway are more likely to reach more audiences than billboards at an intersection in a residential area.</p><p>The time of the day is an interesting factor in the mix. While it has the lowest correlation of the bunch, it should not be discarded. Instead, it should be utilized for a more granular analysis of each billboard.</p><h4>Where should media owners build their next billboard?</h4><p>From Geodata, we can estimate the number of audiences within an area. Furthermore, we can find a relation among the various factors that affect viewership. 
If we combine this information with costs (OPEX, CAPEX, and overhead cost), we can determine the ROI and see which locations could turn out to be profitable.</p><h4>Conclusion</h4><p>In this article, I briefly shared how you can leverage location intelligence for your next OOH campaign, or how you can maximize your profit if you are a media owner. We tried to answer some basic questions:</p><ul><li>Firstly, how can Geodata help you to determine a more accurate number of audiences?</li><li>Secondly, what is the relationship among the various factors that affect an OOH campaign?</li><li>Lastly, how can you strategize and deploy your media assets with Geodata?</li></ul><blockquote>How will you use the Geodata for your OOH Marketing Needs?</blockquote><p>Thank you for reading the article and feel free to leave a comment below or connect with me on <a href="http://www.linkedin.com/in/airreza">LinkedIn</a> 🙂</p><p><strong>Note: </strong>The data is based on the work done by <a href="https://www.movingwalls.com/moving-walls-plan-buy-and-measure-outcome-based-outdoor-advertising">Moving Walls</a>. A big shout out to them.</p>]]></content:encoded>
        </item>
    </channel>
</rss>