<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Felipe Mezzarana on Medium]]></title>
        <description><![CDATA[Stories by Felipe Mezzarana on Medium]]></description>
        <link>https://medium.com/@felipe.mezzarana?source=rss-9827834ae97e------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*GrbiUdSdnUA8OfQN</url>
            <title>Stories by Felipe Mezzarana on Medium</title>
            <link>https://medium.com/@felipe.mezzarana?source=rss-9827834ae97e------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 01 Jun 2026 05:58:32 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@felipe.mezzarana/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Python + Excel - Generating Highly Customized Reports]]></title>
            <link>https://medium.com/@felipe.mezzarana/python-excel-generating-highly-customized-reports-edd3b2b8ae67?source=rss-9827834ae97e------2</link>
            <guid isPermaLink="false">https://medium.com/p/edd3b2b8ae67</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[data-engineering]]></category>
            <category><![CDATA[excel]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Felipe Mezzarana]]></dc:creator>
            <pubDate>Sat, 08 Apr 2023 20:43:27 GMT</pubDate>
            <atom:updated>2023-04-08T20:45:27.234Z</atom:updated>
            <content:encoded><![CDATA[<h4>Learn how to use Python to create Excel reports with proper formatting, formulas and images!</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AYUzAY21iAtTsRATZdsvsw.jpeg" /></figure><p>As a data engineer, I’ve had many opportunities to drive value by creating an Excel file at the end of an ETL pipeline.</p><p>Data isn’t used only by data analysts and scientists. Other areas often need a tool that lets them manipulate and change the data, but they probably won’t know enough SQL to query a data warehouse directly. That’s where Excel reports come in.</p><p>First of all, it is important to check whether it is really necessary to use Python and increase the complexity of your pipeline. Excel can connect to several external data sources, which is enough for many situations.</p><p>You should consider using Python to generate reports in cases where:</p><ul><li><strong>Version control is important</strong> - The user regularly needs access to historical data</li><li><strong>Formulas matter - </strong>The user needs to actively change the data</li><li><strong>You have security concerns - </strong>Files need to be sent to external clients/users who have no access to the data sources</li><li><strong>Multiple queries/data sources - </strong>Refreshing takes a long time for the user and it is difficult to control possible changes</li></ul><h3>1 º Dataset</h3><p>Before starting, let’s quickly introduce the datasets used for testing. I’m going to use two datasets about Twitch.</p><p>The first contains data about the top games on Twitch (2016–2021). 
It is publicly available on <a href="https://www.kaggle.com/datasets/rankirsh/evolution-of-top-games-on-twitch">Kaggle</a>.</p><pre>import pandas as pd<br><br>twitch_games_df = pd.read_csv(&#39;Twitch_game_data.csv&#39;)<br>twitch_games_df.head(5)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e4JGXD8Zg7mlpYM2b-Hilw.png" /></figure><p>The second contains data on the top 1000 streamers from the past year, also available on <a href="https://www.kaggle.com/datasets/aayushmishra1512/twitchdata?resource=download">Kaggle</a>.</p><pre>twitch_streamers_df = pd.read_csv(&#39;twitchdata-update.csv&#39;)<br>twitch_streamers_df.head(5)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CmH2XYz3uSkutMz2VrMymw.png" /></figure><h3>2 º Generating and Formatting</h3><p>As you probably already know, generating an Excel file from a DataFrame can be as simple as running <em>DataFrame.to_excel(&#39;file_name.xlsx&#39;)</em>. However, there is a better way that allows for a high degree of customization.</p><blockquote>Please note: although you do not need to import any library other than pandas, you will need to install xlsxwriter with “pip install xlsxwriter”.</blockquote><p>Let’s start by creating an Excel file with multiple tabs.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/778ece60ccecaa56f2caf67cd20dc930/href">https://medium.com/media/778ece60ccecaa56f2caf67cd20dc930/href</a></iframe><p>We have created the “base” file, so now it’s time for formatting! Please note that, to avoid repetition (we just want to learn!), 
let’s format only the first tab, “games_sheet”.</p><p>Also note that the following code is a continuation of the previous one.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c57146b1b5e4aa91c476b4bdf5087a99/href">https://medium.com/media/c57146b1b5e4aa91c476b4bdf5087a99/href</a></iframe><p>Let’s go a little further and format the header. To improve it, we will format the header cells and rewrite the column names, removing the snake_case style:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/78a161117d5ebe1c18a70d106e71ba02/href">https://medium.com/media/78a161117d5ebe1c18a70d106e71ba02/href</a></iframe><p>Let’s take a look at the Excel file generated after running the three blocks of code above:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fyMTnRRwUqVi7d50jJHAaQ.png" /></figure><p>Way better than just running <em>df.to_excel()</em>, right?</p><blockquote>Tip: You can improve the formatting even more by adding a dedicated format for text columns, with left alignment and a larger width.</blockquote><h3>3 º Adding Images/Charts</h3><p>Adding an image to a worksheet is a very straightforward process:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/55fbbcfb29a4bda68c407ded5ea43dd6/href">https://medium.com/media/55fbbcfb29a4bda68c407ded5ea43dd6/href</a></iframe><p>However, I would like to go one step further to show how we can use this feature to create amazing reports.</p><p>The trick here is that you can create any visualization with matplotlib, seaborn, or your preferred library, save the image temporarily, insert it into an Excel file and then delete it.</p><p>Let’s start by creating a function that will save a bar plot of the average watch time of the top 10 Twitch streamers.</p><blockquote>PS: the goal here is not to teach how to create good visualizations, but if you’re 
interested in the subject, take a look at one of my other articles: <a href="https://medium.com/@felipe.mezzarana/get-your-bar-chart-to-the-next-level-with-seaborn-b0eddfb38e1b">Get your bar chart to the next level with Python</a></blockquote><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/fcf4959aba26ad2e4adfb6392c935b4c/href">https://medium.com/media/fcf4959aba26ad2e4adfb6392c935b4c/href</a></iframe><p>This function will save the plot below as img.png in the current directory.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/954/1*QGxUVu4Jc3pP6Lz_sHGShg.png" /></figure><p>Now we can add it to our Excel report:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a7c6bd3002e7cffa603844a6d32609bc/href">https://medium.com/media/a7c6bd3002e7cffa603844a6d32609bc/href</a></iframe><p>Final result:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ezOeKf7Z0OznZhO449IYEg.png" /></figure><p>Cool, right? With this trick I’m sure you will be able to create impressive reports.</p><h3>4 º Adding Formulas</h3><p>There are two ways to insert formulas in Excel. The first and easiest is to use array formulas. Personally, I’m not a fan of this option because it creates a single formula for the entire column, and users are not used to this format.</p><p>Even so, because it is simpler, I believe it is worth demonstrating. 
To do so, let’s create a simple feature in the twitch_streamers_df:</p><ul><li><em>watch_time_rate = Watch time(Minutes)/Stream time(minutes)</em></li></ul><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6c2c0d05aa0bf14e803429fe613cde06/href">https://medium.com/media/6c2c0d05aa0bf14e803429fe613cde06/href</a></iframe><p>It will work, but note that the formula is written as an array and cannot be changed for only part of the column:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/197/1*MKz_wq2IRNJl0yHum3PqJQ.png" /></figure><p>The second option, which I prefer, is to write the formula individually in each cell. To do so, we need to loop through each row:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/fe0cf39e99425f648c717ecb0929d356/href">https://medium.com/media/fe0cf39e99425f648c717ecb0929d356/href</a></iframe><p>The result:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/999/1*oIZIz3VO0WXo3nU9tL-vMQ.png" /></figure><p>Exactly as the user expects! This makes your report easier to use and adds value to your work.</p><h3>Extra! Name your files according to the current date</h3><p>This is a quick and simple tip, but extremely useful. When generating files on a recurring basis, it’s a good idea to add the generation date to the file name:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c53797404c0daf8c80c123fb7c1849c8/href">https://medium.com/media/c53797404c0daf8c80c123fb7c1849c8/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*EWtREjVfpS9nu8a-21Cpgg.png" /></figure><p>It is a simple and useful way to avoid generating duplicate file names.</p><h3>Final Thoughts</h3><p>Data engineers often avoid working with Excel for a number of very good reasons. 
However, it doesn’t matter how good your data pipelines or your data model are if they aren’t being used.</p><p><strong>Generating good reports in Excel is an excellent way to universalize data, create value, and give visibility to your work.</strong></p><p>I hope this guide can help you create professional reports. If you have any questions or feedback, please let me know in the comments.</p><p>Thanks for reading!!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Overcoming Web Scraping Challenges]]></title>
            <link>https://medium.com/@felipe.mezzarana/overcoming-web-scraping-challenges-f8b59ae3c551?source=rss-9827834ae97e------2</link>
            <guid isPermaLink="false">https://medium.com/p/f8b59ae3c551</guid>
            <category><![CDATA[web-scraping]]></category>
            <category><![CDATA[scraping]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Felipe Mezzarana]]></dc:creator>
            <pubDate>Sun, 08 Jan 2023 13:57:36 GMT</pubDate>
            <atom:updated>2023-01-08T13:57:36.102Z</atom:updated>
            <content:encoded><![CDATA[<h4>Learn how to create an organic header, find out the best way to rotate your IP, and how to reduce the crawler runtime.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4bd09xQ-YESEEb28" /><figcaption>Photo by <a href="https://unsplash.com/ja/@mcgroom?utm_source=medium&amp;utm_medium=referral">Chase McBride</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>In this article I am going to talk about some of the biggest challenges that you may face while building a web crawler, and <strong>explore</strong> <strong>how we can overcome them, providing solutions that work at scale.</strong></p><p>I would like to address two challenges: <strong>how to avoid being blocked and how to decrease the crawler runtime. </strong>The solutions will be discussed across three topics:</p><ul><li><strong><em>How to Create an Organic Header</em></strong></li><li><strong><em>The Best Way to Rotate Your IP</em></strong></li><li><strong><em>Run Functions Concurrently</em></strong></li></ul><h3>1º How to Create an Organic Header</h3><p>Usually, the most common tip people come across when looking for “<em>how to avoid being blocked”</em> is to send a header with a real user-agent, right? It’s good advice. However, to create a more organic header, i.e. a header that mimics the behavior of organic users, a step further is needed.</p><p>The first thing to do is to <strong>send a random user-agent</strong> instead of always the same one. We can easily do this with the help of the library <a href="https://github.com/fake-useragent/fake-useragent">fake-useragent</a>. This awesome lib allows us to generate random and valid user-agents. 
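</p><p>As a rough, dependency-free illustration of the idea (this is not the fake-useragent API), the sketch below picks a random user-agent from a small hard-coded pool. The pool and the function name <em>random_headers</em> are assumptions made for this example only; the library draws from a much larger list.</p><pre>
```python
import random

# Hypothetical pool for illustration; real projects need current values.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0",
]


def random_headers(rng=random):
    # Choose a user-agent at random on every call instead of a fixed one.
    return {"User-Agent": rng.choice(USER_AGENTS)}


print(random_headers())
```
</pre><p>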
Take a look:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9d3c1c75ab6501c1728bdf47f89fb1f1/href">https://medium.com/media/9d3c1c75ab6501c1728bdf47f89fb1f1/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tDAOy-IgI9zSwfAtGD1dQQ.png" /></figure><p>Sending random user-agents will often be enough, but there is still room for improvement.</p><p>First of all, it’s important to understand which headers a user (like you) is actually sending when accessing a site. To do so, go to the site you want to scrape, open the developer tools panel (inspect button) and enter the network tab.</p><p>In the network tab you will see all the requests made to build the webpage (you may need to refresh it). Go to the first one (<em>it should be the request for the address you entered, blue icon</em>). There you can find the headers you just sent.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/692/1*EtCwzto7dCwY06f-Jys75g.jpeg" /></figure><p>What you want to do now is to<strong> add these keys and values to your header</strong>. There are no rules here, and sometimes less is more. Each site checks different points, so I encourage you to try different combinations and see what works.</p><p>A good start is to add a platform and version that match the user-agent (you can extract this information with some regex). Adding a referer can also be a good idea.</p><p>Let’s create a function that implements those concepts:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/319f2afefe23e8fabe92b1d35fd2991a/href">https://medium.com/media/319f2afefe23e8fabe92b1d35fd2991a/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*86m3LTK5QL_P8W1JW1JViQ.png" /></figure><p>That should get you off to a good start! Just avoid copying cookies and authorization keys. 
In case you really need to send cookies, it’s better to use a <a href="https://requests.readthedocs.io/en/latest/user/advanced/">session</a> or try to retrieve a “fresh cookie” with another request.</p><h3>2º The Best Way to Rotate Your IP</h3><p>A good header often isn’t enough. Many sites detect web scrapers by examining their IP. Therefore, we need a workaround. The most common solution is to rotate proxies. A proxy server assigns a new IP address from the proxy pool for every connection, so problem solved, right?</p><p>Well… not really. Free proxies are problematic for a few reasons, but most importantly they are simply not safe. There’s a risk of malware and cookie theft, and some proxy servers are even set up as a front for data mining and identity theft. <strong>You should never use free proxies!</strong></p><p>Paid proxies, on the other hand, are reliable and can normally solve the problem, although prices can escalate quickly, making medium and large projects completely unfeasible.</p><p>There is a little-known solution capable of solving all these problems. Basically, we can use the AWS API Gateway service together with the library <a href="https://github.com/Ge0rg3/requests-ip-rotator">requests-ip-rotator</a> to send requests from different IPs each time.</p><p>You will have to <a href="https://aws.amazon.com/free">create an AWS account</a> with a valid credit card, but you don’t need to spend any money since <strong>the first million requests will be free, and after that you will only be charged ~$3 per million requests.</strong></p><p>You will also need to retrieve credentials to use AWS services. To do so, just follow the steps <a href="https://docs.aws.amazon.com/powershell/latest/userguide/pstools-appendix-sign-up.html">here</a>. 
I highly recommend that you don’t put these credentials directly in your code. Instead, you can export them as environment variables:</p><pre># On Windows CMD<br>setx AWS_ACCESS_KEY_ID your_access<br>setx AWS_SECRET_ACCESS_KEY your_key<br><br># On Linux - current session<br>export AWS_ACCESS_KEY_ID=your_access<br>export AWS_SECRET_ACCESS_KEY=your_key<br><br># On Linux - permanently<br>nano ~/.bashrc<br># Opens a script file that runs whenever a new shell session starts<br># Add at the end of the file<br>export AWS_ACCESS_KEY_ID=your_access<br>export AWS_SECRET_ACCESS_KEY=your_key<br># Save and restart the terminal</pre><p>Now that you have the AWS credentials defined, the process to rotate the sending IP is simple:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a8d196c645d3c061caad1b7208438361/href">https://medium.com/media/a8d196c645d3c061caad1b7208438361/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cZjKyq9cyS_Uj-9ru-IXtw.jpeg" /></figure><p>Please note that this method won’t always work. From the docs: <em>“these requests can be easily identified and blocked, since they are sent with unique AWS headers (i.e. 
“X-Amzn-Trace-Id”).”</em> Nevertheless, in my experience this method has a good success rate, and it is such an efficient and simple solution that it is worth trying.</p><h4>Important:</h4><p>From the docs: <em>“Please remember that if gateways are not shutdown via the shutdown() method you may be charged in future.”</em> When we use the with statement (as in the example), there is no need to call the shutdown method.</p><p>However, if you have any connection issue during the requests, the gateway will remain open and you need to close it manually, either through the AWS console or by opening a gateway with the same name again and closing it.</p><h3>3º Run Functions Concurrently</h3><p>As soon as your projects start to scale, and requests go from tens to hundreds and thousands, you will inevitably realize that web scraping is a very time-consuming job. In larger projects, a long runtime can even make it unfeasible.</p><p>Luckily, there are a few solutions out there, usually involving some kind of concurrency. I don’t want to go into the technical details of each option because this might deviate too much from our goal. Instead, <strong>I will present the async solution in a practical way, </strong>since it usually gives good results in this kind of task.</p><p>Basically, we will use the packages <a href="https://docs.python.org/3/library/asyncio.html">asyncio</a> and <a href="https://docs.aiohttp.org/en/stable/">aiohttp</a> to run requests asynchronously. For comparison purposes, let’s first create a function that executes requests synchronously.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1139ab99ea545e3591c1900010751bce/href">https://medium.com/media/1139ab99ea545e3591c1900010751bce/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/988/1*-2ORtUIpziPiUpCEsHlvbA.jpeg" /></figure><p><strong>It took approx. 30s</strong>. 
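</p><p>Most of that time is spent waiting on the network, one request after another. The sketch below is only an illustration under stated assumptions, not the code behind the timings in this article: each request is simulated with <em>asyncio.sleep</em>, and the names <em>fake_request</em>, <em>run_sequentially</em> and <em>run_concurrently</em> are made up for the example. It compares awaiting the calls one by one with scheduling them together through <em>asyncio.gather</em>.</p><pre>
```python
import asyncio
import time


async def fake_request(url, delay=0.05):
    # Stand-in for an HTTP call: it only waits, it never touches the network.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def run_sequentially(urls):
    # Each await finishes before the next simulated request starts.
    return [await fake_request(url) for url in urls]


async def run_concurrently(urls):
    # gather schedules every coroutine at once and keeps the input order.
    return await asyncio.gather(*(fake_request(url) for url in urls))


def timed(coroutine):
    # Run a coroutine to completion and return its result and duration.
    start = time.perf_counter()
    result = asyncio.run(coroutine)
    return result, time.perf_counter() - start


urls = [f"https://example.com/page/{i}" for i in range(10)]
sequential_result, sequential_time = timed(run_sequentially(urls))
concurrent_result, concurrent_time = timed(run_concurrently(urls))
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```
</pre><p>With simulated waits the sequential run takes roughly the sum of the delays, while the concurrent run takes roughly the longest single delay. Real requests will not scale this cleanly.</p><p>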
Let’s see how to do the same thing asynchronously:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5444f5d58bce7f00a79193bed61144b7/href">https://medium.com/media/5444f5d58bce7f00a79193bed61144b7/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/947/1*jTDkAA-1QryI7Gyq32NE-A.jpeg" /></figure><p><strong>It took approx. 8s, 3.55 times faster! </strong>Writing functions that run concurrently may be tricky at first, but it’s definitely worth the effort.</p><p>Just keep in mind that intensive scraping can slow down the website, so try to use this tactic only when you need to scrape different websites at the same time. If you really have no other option, scrape at times when the site receives less traffic.</p><h3>Final Thoughts</h3><p>Web scraping is an amazing tool to collect data available on the web, and it comes in handy in many situations. The topics discussed here should give you a good start if you intend to develop scalable web crawlers.</p><p>I hope this guide helps you build solid solutions. If you have any questions or feedback, please let me know in the comments.</p><p>Thanks for reading!!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[5 Cool Python Tricks to Make Your Life Easier]]></title>
            <link>https://medium.com/@felipe.mezzarana/5-cool-python-tricks-to-make-your-life-easier-8086972e7435?source=rss-9827834ae97e------2</link>
            <guid isPermaLink="false">https://medium.com/p/8086972e7435</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[coding]]></category>
            <category><![CDATA[pandas]]></category>
            <dc:creator><![CDATA[Felipe Mezzarana]]></dc:creator>
            <pubDate>Sun, 13 Nov 2022 17:34:20 GMT</pubDate>
            <atom:updated>2022-12-03T23:16:00.812Z</atom:updated>
            <content:encoded><![CDATA[<h4>Improve your productivity and code quality with these quick tips.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/490/1*h_qv-uEOnzMCazKo5nHHsA.jpeg" /><figcaption>Photo by <a href="https://unsplash.com/@juliusdrost?utm_source=medium&amp;utm_medium=referral">Julius Drost</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>Who doesn’t like a quick trick to get something done faster, more efficiently or simply more elegantly?</p><p>I made this list to share five random tricks that I learned or developed while working with data and using Python every day. Let’s get to it!</p><h4>1 º Quickly renaming all columns</h4><p>In many situations, changing column names may be a bad choice. However, when your goal is simply to analyze data or generate visualizations, using a DataFrame with column names containing whitespace, accents or symbols can be quite annoying.</p><p>It’s easier to select columns with standardized names, not to mention the times when someone left an unwanted whitespace at the end of an Excel column name. <strong>So this is how we can quickly transform all column names to the snake_case pattern:</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/8bf19f36246c22fa41fc0849f8a7ca18/href">https://medium.com/media/8bf19f36246c22fa41fc0849f8a7ca18/href</a></iframe><h4>2º Creating your own modules</h4><p>In the first trick we created a nice function. Now we just need to copy and paste it into every script where we intend to use it… right?</p><p><strong>Please, don’t! </strong>How about just importing the function exactly like you usually do with any other library? 
It’s easier than it looks and it can be very useful, both to make your life easier and to share your work with others.</p><p>You just need to create a Python file (.py) containing your function, and copy that file to one of the paths where Python looks for modules and packages. To find out which paths Python checks, just run in your favorite IDE:</p><pre>import sys<br>print(sys.path)</pre><p>Once you have copied the file to one of the printed paths (let’s say you named it<em> ‘my_first_module.py</em>’), you will be able to import it like any library:</p><pre>import my_first_module<br><br># Calling the function<br>my_first_module.rename_columns(any_df)</pre><h4>3º <em>Generating fake data</em></h4><p>Whether for testing or learning purposes, we often need to create fake variables. For example, you might need fake data to write unit tests or to play with a new library.</p><p>Of course, we can always import or create these variables from scratch, but there is a much easier and more versatile solution! I’m talking about the Faker library. With it we can generate random data of different types and characteristics. Take a look:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2fa316de9f7bc3b584153b9c2aaa37e4/href">https://medium.com/media/2fa316de9f7bc3b584153b9c2aaa37e4/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/970/1*F5SBv-dWj4HnurHx8MZvKA.png" /></figure><p>I showed only a few examples. There is still much more to explore in the <a href="https://faker.readthedocs.io/en/master/index.html?highlight=install#basic-usage">documentation</a>, like selecting a specific language and other data options.</p><h4>4º Using f-strings like a pro</h4><p>If you study or work with Python you probably already use f-strings. If you don’t know them yet, no problem, it’s a simple concept. Basically, putting the letter “f” before the string allows variables to be inserted into the text, inside {}. 
Something like this:</p><pre>age = 29<br>print(f&#39;My age is {age}&#39;)</pre><p>f-strings are awesome! But we can make them even more elegant and professional with a series of formatting options, which help, for example, to improve user interfaces or generate better custom logs. Take a look:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2737d69a016b8cd91fa18903d8d882e5/href">https://medium.com/media/2737d69a016b8cd91fa18903d8d882e5/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/911/1*jQTJvgLHa3FD4Y952MrjHw.png" /></figure><h4>5º Loading bar</h4><p>Have you ever run code that goes through a long loop and been left wondering how much time was left, or whether everything was going as expected?</p><p>Your problems are over! How about developing an elegant and reusable solution? Here is how I did it:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0e35b75e6406c63e250d845733da51f5/href">https://medium.com/media/0e35b75e6406c63e250d845733da51f5/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*otHD1mTopEllu3gAkkXDDg.png" /></figure><h4>Extra!! A function to analyze any DataFrame</h4><p>There are some basic analyses that need to be done on any imported or generated DataFrame. 
How about creating a function to perform these repetitive steps and just importing it <em>(as shown in the second trick)</em>?</p><p>I’ve already developed this function for my personal use, and I thought it would be cool to make it available here for anyone who wants to use it, modify it, or take it as a basis to create their own!</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/71b4b5c6355bdee182d4e6be04d0e6cb/href">https://medium.com/media/71b4b5c6355bdee182d4e6be04d0e6cb/href</a></iframe><p>As an example, I will apply this function to a DataFrame that contains information about customer characteristics and the payment history of a credit card:</p><pre>credit_card_history_df = pd.read_csv(&#39;credit_card_history.csv&#39;)<br>analyse_df(credit_card_history_df)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*v1PjfjA1vsHTUf7N3YkhWw.png" /></figure><p>That’s it! Thanks for reading, I hope you enjoyed it and learned something new. If you know any other tricks or have any suggestions, please let me know in the comments!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Deploy a Machine Learning Model to the Cloud]]></title>
            <link>https://medium.com/@felipe.mezzarana/how-to-deploy-a-machine-learning-model-to-the-cloud-1c9ca637897c?source=rss-9827834ae97e------2</link>
            <guid isPermaLink="false">https://medium.com/p/1c9ca637897c</guid>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[cloud]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[data-engineering]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Felipe Mezzarana]]></dc:creator>
            <pubDate>Thu, 20 Oct 2022 23:55:07 GMT</pubDate>
            <atom:updated>2024-06-10T15:47:24.488Z</atom:updated>
            <content:encoded><![CDATA[<p>Deploying an ML model with FastAPI, Docker and AWS EC2 to make it available to end users in a production environment.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*IKXmk6m66px1eRMH" /><figcaption>Photo by <a href="https://unsplash.com/@ilumire?utm_source=medium&amp;utm_medium=referral">Jelleke Vanooteghem</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>After a lot of study and hours of coding you developed an ML model! That’s great, but the work doesn’t end here. It’s still necessary to ask how this model will be used, and this is where the deployment phase comes in.</p><p>When you don’t need immediate results, like in a periodic job, the best deployment solution will probably be <strong>batch predictions</strong>. This can be done by simply calling the predict function or using scheduling tools like Airflow.</p><p>However, in many cases we need an <strong>on-demand </strong>service<strong> </strong>with <strong>near real-time predictions</strong>. Examples include recommendation systems, fraud detection, search tools and medical diagnosis. In this article, we will cover how to create a web service for prediction, solving this kind of problem.</p><p>To do so, we will go through three steps:</p><ul><li>Create a REST API with the FastAPI web framework.</li><li>Build a Docker image to run the server in a container.</li><li>Host the Docker container on an AWS EC2 instance.</li></ul><h3>The Model</h3><p>The purpose of this article is not to teach you how to build an ML model. 
Therefore, we will use an already developed XGBoost model capable of predicting the shipping cost of a product.</p><p>You may find it interesting to take a look at how this model was built, since during its development I cover a series of important subjects such as data processing, feature engineering, metric definition, hyperparameter tuning, model selection, model evaluation, and much more. You can find all the steps (as well as the source code used in this article) in the repository below.</p><p><a href="https://github.com/FelipeMezzarana/shipping_price_estimate">GitHub - FelipeMezzarana/shipping_price_estimate: ML Model to estimate the shipping price of an order, based on one e-commerce dataset + Deploy with FastAPI, Docker and AWS</a></p><p>However, if you are only interested in deploying, what you really need to know is that after building your model, you will need to dump it into a file. This can be easily done with the pickle package:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2e1ef5a29fd068e60f1c1edcbf8a6350/href">https://medium.com/media/2e1ef5a29fd068e60f1c1edcbf8a6350/href</a></iframe><h3>1º Step: Create a REST API</h3><p>Now that we have a model, the first step to deploy it in the cloud is to create the REST API. At this point, many choose the Flask framework, mainly because it is an older tool and people are just used to it.</p><p>Yet, although FastAPI is a younger framework, it has a number of advantages over Flask, such as higher performance, <strong>native concurrency support, built-in data validation and an automatic documentation system</strong>. For these reasons, I prefer FastAPI over Flask.</p><p>To start, create a .py file. Mine will be named “server.py”. Then, we can create the FastAPI object and load the previously saved model. 
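</p><p>The dump-and-load round trip itself needs only the standard library. In the minimal sketch below, a plain dictionary stands in for the trained XGBoost model, and the file name <em>model_sketch.pkl</em> is an assumption made for this example, not the name used in the repository.</p><pre>
```python
import pickle
import tempfile
from pathlib import Path

# Placeholder object; in the article this would be the trained model.
model = {"name": "shipping_price_model", "version": 1}

# A temporary folder keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as folder:
    model_path = Path(folder) / "model_sketch.pkl"

    # Dump: serialize the object to a binary file.
    with open(model_path, "wb") as file:
        pickle.dump(model, file)

    # Load: what the server has to do before it can serve predictions.
    with open(model_path, "rb") as file:
        loaded_model = pickle.load(file)

print(loaded_model == model)
```
</pre><p>Only load pickle files you created yourself or fully trust, since unpickling can execute arbitrary code.</p><p>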
Be sure to have everything saved in the same directory.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/46502c501109db2acdc4c0a9b68c15ca/href">https://medium.com/media/46502c501109db2acdc4c0a9b68c15ca/href</a></iframe><blockquote>Tip: In this example the model will be loaded with each request. You may want to define a function that loads the model into a global variable only once. It’s a trade-off: you gain speed but need to keep the model in memory.</blockquote><p>As said before, FastAPI supports built-in data validation, so we need to define what kind of data will serve as input for our prediction, and what data will be returned. To do so, we will define one class using BaseModel with all the input variables and another for the output. It’s easier than it sounds:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/81cc1635c2692bc7817872ffc4effc39/href">https://medium.com/media/81cc1635c2692bc7817872ffc4effc39/href</a></iframe><p>Note that “price”, “product_weight_g”, “product_height_cm”, “delivery_distance_km” and “product_volume_cm3” are all the inputs needed to predict the shipping price.</p><p>Now we just need to define our <strong>API endpoints!</strong> In our case we will only have two endpoints: one on the main page to check that the server is working, and another to make the predictions. This is the way to do so:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/96a109c5f4e743a2b6e4552144863442/href">https://medium.com/media/96a109c5f4e743a2b6e4552144863442/href</a></iframe><p>There are some important things going on here.</p><ul><li>When defining the endpoint you need to say whether it will use a “get” or “post” method. For prediction we need to receive user input in the request body, so it is important to use a “post” method.</li><li>We need to define the input and output data type as shown. 
home_page() only returns a string, so we don’t need to declare an output type for it.</li><li>We defined the functions using “async def”, which lets the server handle requests concurrently.</li></ul><p>The final server.py file should look like this:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f67a79c9ac6f7bf5ad421c5c0486f374/href">https://medium.com/media/f67a79c9ac6f7bf5ad421c5c0486f374/href</a></iframe><p>You can test the server locally using uvicorn. After installing the package, just run on the command line:</p><pre>uvicorn server:app --host 0.0.0.0 --port 80</pre><p>You should see something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TPxLbz1vsaXySTGss7SJrQ.png" /></figure><p>Now we can make requests directly to ‘<a href="http://localhost:80">http://localhost:80</a>’, so it will be possible to perform tests in a Jupyter notebook as follows:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f23186000fed9f3007c2e20b19c44eec/href">https://medium.com/media/f23186000fed9f3007c2e20b19c44eec/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1013/1*z13HtqVX7NyFvTHlNAhWyg.png" /></figure><h3>2º Step: Build a Docker image</h3><p>Docker came to solve the oldest problem in a programmer’s life!</p><p><em>“But it works on my machine…”</em></p><p>With Docker, it is possible to create a “place” (container) with everything needed for your application to work, exactly the way you developed it, so that <strong>the application will run the same way on any machine that runs Docker!</strong></p><p>It also comes in handy when deploying an ML model to the cloud: we can put everything our model needs to work inside a Docker image, test it locally, and if everything is ok, upload the image to the cloud!</p><p>To do this, the first step is to create a Python virtual environment (it’s not exactly a mandatory step, but it makes our 
life a lot easier). In this guide, I won’t cover how to create a virtual environment; there are <a href="https://medium.com/@loginradius/using-virtual-environment-in-python-f41709cefef8">hundreds of tutorials on the internet</a> that you can use.</p><p>Having your virtual environment ready, <strong>be sure to install in it only the libraries your application needs to work.</strong> Now, we will generate a file listing all the dependencies of this virtual env. This file will be very useful when creating our Docker image.</p><p>On the command line, activate the virtual env you just created. You can check the env path with the first command and activate it with the second:</p><pre>conda info --envs<br>conda activate your_env_path</pre><p>Now, to generate the file just enter:</p><pre>cd path_for_your_file<br>pip list --format=freeze &gt; requirements.txt</pre><p>This command will create a requirements.txt file in the path you defined. The file should look like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/560/1*vi3qyx6228dhWJbCITgAMw.png" /><figcaption>requirements.txt example</figcaption></figure><p>We are ready to create our Dockerfile! <strong>A Dockerfile is simply a text file that contains the build instructions.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*Qn9_9SyTDhF-2NXM.png" /></figure><p>A Dockerfile has no extension. If you’re using Docker on Windows, use <a href="https://notepad-plus-plus.org/downloads/v8.4.6/">Notepad++</a> to write the instructions; when saving, select “All types” and name the file “Dockerfile”. 
With Linux, just use “vim Dockerfile”.</p><p>Our Dockerfile must have these commands:</p><pre><em># FROM defines the &quot;starting point&quot; of your image</em><br>FROM python:3.9.13</pre><pre><em># We need to copy all the files that will be used in our container <br># Note, /deploy/ is a created folder, it could have any name</em><br>COPY ./requirements.txt /deploy/<br>COPY ./server.py /deploy/<br>COPY ./shipping_estimate_model.pkl /deploy/</pre><pre><em># WORKDIR sets the directory in which the following instructions run</em><br>WORKDIR /deploy/</pre><pre><em># Remember the file created earlier? <br># Here we install all the libs listed in it</em><br>RUN pip install -r requirements.txt</pre><pre><em># CMD is executed only when a container is started from the image</em><br>CMD [&quot;uvicorn&quot;, &quot;server:app&quot;, &quot;--host&quot;, &quot;0.0.0.0&quot;, &quot;--port&quot;, &quot;80&quot;]</pre><p>Now we have everything we need to build our Docker image! Be sure to have the files “shipping_estimate_model.pkl”, “requirements.txt”, “server.py”, and “Dockerfile” in the same directory. Then, to finally create our Docker image (mine will be named “app-shipping”), enter on the command line:</p><pre>cd files_path<br>docker image build -t app-shipping .</pre><blockquote>Tip: The last argument, “.”, is the build context: the directory that contains the Dockerfile and the files it copies. We use the dot to indicate that we are already in the correct directory (accessed with the cd command).</blockquote><p>This will execute the build instructions defined in the Dockerfile, creating the Docker image named “app-shipping”! Now we just need to run the image to create the container and start the local server. Enter on the command line:</p><pre>docker run -p 80:80 app-shipping</pre><p>You should see this output:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/965/1*Tme-srwpxDOTaJ8IwsWQPA.png" /></figure><p>It looks and works just like the local server created in the first step of this guide. 
To confirm, you can again make requests to ‘<a href="http://localhost:80">http://localhost:80</a>’ from a Jupyter notebook or from a web page (“get” methods only).</p><p>The only (and very important) difference is that now <strong>the application is running completely isolated from the rest of your machine</strong>, which will be very useful in the third and last step of our guide.</p><h3>3º Step: Hosting the Docker container in an AWS EC2 instance</h3><p>We finally have everything ready to make our model available to the rest of the world!</p><p>First, you will need to create an AWS account. The process is very straightforward, but you will be required to enter a valid credit card. Don’t worry though, in the first year you will have access to the AWS free tier, which grants free access to a number of AWS services, including everything needed to complete this guide.</p><p>Now that you have an AWS account, <strong>we will create a virtual machine that runs on the AWS Cloud.</strong> In the search bar, type “ec2” and open the “Dashboard”. On the new page, click on “Launch instance”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/322/1*2Qal4M3BVirywgqCN_l3DA.jpeg" /></figure><p>You will need to select the virtual machine system and specifications. We will use an <strong>Amazon Linux</strong> system, and you may choose any virtual machine with the “Free tier eligible” seal.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/567/1*KY7hZVho327J5qkR11n2Cw.jpeg" /></figure><p>To access the virtual machine, <strong>you will need a key</strong>. This key is a file, and whenever you want to access the VM, you will need to pass the key file path as an argument. You will have the option to go ahead without a key pair, but don’t choose this option! Without a key pair you would not be able to connect to the virtual machine over SSH the way this guide does.</p><p>So, if you don’t have a key, you will need to generate one. 
There is no secret here: just click on <em>“Create new key pair”</em>, choose a name for the key, click on <em>“Create key pair”</em>, and a file <em>“chosen_name.pem”</em> will be downloaded. Keep it in a safe place.</p><p>The last necessary configuration is in the <em>“Network settings”</em> tab. It is very important to check all the options, as this will allow our API to receive requests from the internet.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/757/1*xx01rLny1nkqXGi1Cm1HPQ.jpeg" /></figure><blockquote>Tip: With this configuration, any IP will be allowed to make requests to your API. In a production environment you will probably want to restrict this access, which can be done at this stage.</blockquote><p>We have everything settled! Click on “Launch instance”, and after the message indicating that the instance has been created, go to the <em>“instance”</em> tab. You should be able to see that your instance is running. Take the opportunity to copy the virtual machine IP.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/627/1*igDfJwmetq_nmaiF0iIMOA.jpeg" /></figure><p>Now it’s time to connect to the virtual machine we just created using the <a href="https://www.ssh.com/academy/ssh/protocol">SSH protocol</a>. Enter on the command line:</p><pre>ssh -i pem_file_path ec2-user@virtual_machine_ip</pre><p>If everything went as expected, you are now connected to your Amazon Linux virtual machine!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/583/1*Zmvot5wo8CT2B-SNkmaBZA.png" /></figure><p>The VM is empty, so we will start by installing Docker, starting it, giving the default user permission to use Docker, and finally exiting the virtual machine (the next steps will be performed from your machine).</p><pre># Installing Docker on VM<br>sudo amazon-linux-extras install docker</pre><pre># Starting Docker<br>sudo service docker start</pre><pre># Giving permission to the default user<br>sudo usermod -a -G docker ec2-user</pre><pre># Returning to local 
machine<br>exit</pre><p>Now we are going to use the <a href="https://www.ibm.com/docs/en/flashsystem-v7000u/1.6.2?topic=system-using-scp">SCP protocol</a> to copy the required files to the VM.</p><pre># Copying 4 files to /home/ec2-user (home directory of the default user)<br># ^ is the line-continuation character of the Windows command prompt<br>scp -i pem_file_path ^<br>path\Dockerfile ^<br>path\requirements.txt ^<br>path\server.py ^<br>path\shipping_estimate_model.pkl ^<br>ec2-user@virtual_machine_ip:/home/ec2-user</pre><p>All copied! Time to reconnect to the virtual machine and finally <strong>build our Docker image</strong> and <strong>run the container</strong>. Enter:</p><pre># Connect to VM<br>ssh -i pem_file_path ec2-user@virtual_machine_ip</pre><pre># Build Docker image<br>docker image build -t app-shipping .</pre><pre># Run container<br>docker run -p 80:80 app-shipping</pre><h4><strong>Congratulations, you managed to create a web service!</strong></h4><p>Now your API is available for the whole world to make requests. To get the API address, go back to the instance panel on the AWS website and look for <strong>Public IPv4 DNS</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/908/1*lyFYpuWsSM4UGnvUIpRTFQ.png" /></figure><p>That’s it, now you can make requests to this address from anywhere! For testing purposes, let’s use a Jupyter notebook:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2bbea7835d2b401e117d3faccf1437a4/href">https://medium.com/media/2bbea7835d2b401e117d3faccf1437a4/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/985/1*cBQvJkGy0fMQefrOBI0GmA.png" /></figure><p>You can also test directly on a web page (“get” methods only). 
For example, you might want to take a look at the documentation automatically generated by FastAPI, served by default at the /docs path.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zzMKWpTMrLqVkBjDc5_y_w.png" /></figure><h4>Important!</h4><p>Now that you’ve tested your API and ensured that everything is running as it should, don’t forget to terminate the instance, <strong>otherwise there may be charges.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e8u8XkMnRC5KRXrQUaI2Bg.jpeg" /></figure><h3>Where to go from here?</h3><p>It is important to say that this guide is a basic example. Deployment is a complex subject and there is much more to it than what I have presented here.</p><p>If you want to keep learning, here are some topics that you might want to look into:</p><ul><li>How to monitor metrics over time;</li><li>How to keep the model up to date (concept drift, model retraining);</li><li>How to store results over time;</li><li>How to increase the security of your deployment.</li></ul><p>Thank you so much for getting this far! I hope I’ve made myself clear, but feel free to contact me with any questions or feedback!</p><p><a href="https://medium.com/mlearning-ai/mlearning-ai-submission-suggestions-b51e2b130bfb">Mlearning.ai Submission Suggestions</a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Get your bar chart to the next level with Python]]></title>
            <link>https://medium.com/@felipe.mezzarana/get-your-bar-chart-to-the-next-level-with-seaborn-b0eddfb38e1b?source=rss-9827834ae97e------2</link>
            <guid isPermaLink="false">https://medium.com/p/b0eddfb38e1b</guid>
            <category><![CDATA[data-visualization]]></category>
            <category><![CDATA[bar-chart]]></category>
            <category><![CDATA[storytelling]]></category>
            <category><![CDATA[seaborn]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[Felipe Mezzarana]]></dc:creator>
            <pubDate>Mon, 03 Oct 2022 23:17:43 GMT</pubDate>
            <atom:updated>2022-11-20T21:09:27.827Z</atom:updated>
            <content:encoded><![CDATA[<h4>How a few lines of code and some good practice standards can help you create beautiful and informative bar charts.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*L-XzcbuyxElMzJeP" /><figcaption>Photo by <a href="https://unsplash.com/@isaacmsmith?utm_source=medium&amp;utm_medium=referral">Isaac Smith</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>We all know that creating good visualizations is vital for anyone working with data. <strong>This article aims to teach some good practices in data viz and show how to accomplish them using Python</strong> (<em>Matplotlib &amp; Seaborn</em>).</p><p>Without further ado, let’s get started!</p><h3>Dataset</h3><p>For this article I will use a dataset containing a variety of information about Pokemon! I chose this dataset because it contains different types of features: continuous (Pokemon stats like attack, defense, etc.), categorical (types, name and generation) and boolean (legendary). Thus, we will have several visualization options to explore.</p><p>You can download this dataset directly from the <a href="https://github.com/FelipeMezzarana/data_viz_article">repository with the source code</a> used in this article. 
Let’s take a quick look at our data:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2cf59ff6acba9760e308f66ea21f3113/href">https://medium.com/media/2cf59ff6acba9760e308f66ea21f3113/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/888/1*U1BkkSkBmRYy_IBHYaYeCg.png" /></figure><h3>Defining which question will be answered</h3><p>To generate good visualizations, the first step is to define the direction of our analysis, that is, which questions we want to answer with the data we have at hand.</p><p>We can think of dozens of questions that this data can answer. However, our goal is to generate a good bar plot, so we will select a simple question involving categorical values, like the Pokemon type:</p><ul><li>Which types of Pokemon have the highest attack?</li></ul><h3>The Bar Plot</h3><p>Let’s start by preparing our data and creating the first “basic” bar plot. Using groupby we can compute the average attack by Pokemon type, and with Seaborn we can quickly plot the data.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/bde1f724f7b0410a8d3f5cde8f51d7ec/href">https://medium.com/media/bde1f724f7b0410a8d3f5cde8f51d7ec/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/410/1*UvQqGK1iPJ-rTEhKtUgG_A.png" /></figure><p>Simply by looking at this chart, is it possible to answer the question about which type of Pokemon has the highest attack? Well… maybe, but the information is confusing, and there are several elements that make it very difficult to interpret. To be honest, this chart is a mess!</p><p>Let’s try to improve it. First we need to organize our data. We can start by ordering it. Whenever the categories have no natural order, we should present the data sorted in ascending or descending order. We can also limit the number of categories shown. 
By keeping only the top 10, for example, we can declutter the chart and make it easier to interpret.</p><p>Let’s also take the opportunity to improve some basic elements of the chart. First of all, we need to select a single color! <strong>When generating a visualization, colors must be used very consciously</strong>: they can completely change the interpretation of a chart by drawing attention to a specific point. In this case, the use of different colors is distracting and does not help at all. With just a few more lines of code we can also modify the image size, add a title and change the font size.</p><blockquote>Tip: Colors can be chosen with hex codes. You can easily customize your colors with the help of a site like <a href="https://www.learnui.design/tools/data-color-picker.html#palette">this</a>.</blockquote><p>Let’s see how to write the code to generate our next view:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/781d45bdbb8387934bd8e82bd12a4a20/href">https://medium.com/media/781d45bdbb8387934bd8e82bd12a4a20/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/769/1*e4eQnDv9UhM_Jk8SDO2NSA.png" /></figure><p>Much better, right? With the new organization, it has become easy to identify the types of Pokemon with the highest attack. Also, modifying the dimensions makes the information clearer, and a good title makes people more likely to be interested in your view.</p><p>However, we still have some problems and points to improve. First of all, <strong>we have some elements that aren’t adding informative value</strong>. Quoting Knaflic:</p><blockquote>“Every single element you add to that page or screen takes up cognitive load on the part of your audience — in other words, takes them brain power to process. 
Therefore, we want to take a discerning look at the visual elements that we allow into our communications.” (Storytelling with Data, p. 71)</blockquote><p>In our chart we can identify at least three useless elements: the borders, the x-axis title and the y-axis title. Note that the chart title already explains what each axis means, so there is no need to repeat the information.</p><p>There is also a very interesting point regarding the alignment of information. There is a theory that people usually scan images from left to right and, to a lesser extent, from top to bottom; it’s called the Z-pattern. This is interesting because knowing which information will be read first can be used to our advantage: we can first show our audience how to read the graph, before they get to the data itself.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/389/1*RqczsRuIklbT4pE23xn4rA.png" /><figcaption>Z-Pattern layout</figcaption></figure><p>Thinking about this layout, we can make two more changes to our graph: move the title to the left so that it is read first, and move the x axis to the top, for the same reason.</p><p>Let’s see what our code will look like!</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/25aa76f1278b57d0848e93168a9366b7/href">https://medium.com/media/25aa76f1278b57d0848e93168a9366b7/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/755/1*dcwma5fqY1K_J6CHJJsnSA.png" /></figure><p>Did you notice the difference? Now we have a much cleaner and more pleasant graph. The information is well organized and easy to interpret. Certainly people will be more willing to read this graph!</p><p>This concludes my guide to quickly improving a bar chart! Thanks for reading, I hope you enjoyed it 😄</p>]]></content:encoded>
        </item>
    </channel>
</rss>