<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by João Renato on Medium]]></title>
        <description><![CDATA[Stories by João Renato on Medium]]></description>
        <link>https://medium.com/@falcaojoaorenato?source=rss-9e5ec55c6166------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*soihqZ7L1fEA3hAzmicoTA.png</url>
            <title>Stories by João Renato on Medium</title>
            <link>https://medium.com/@falcaojoaorenato?source=rss-9e5ec55c6166------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 23 May 2026 12:24:44 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@falcaojoaorenato/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Did AI write this article?]]></title>
            <link>https://falcaojoaorenato.medium.com/did-ai-write-this-article-b8721a222dd7?source=rss-9e5ec55c6166------2</link>
            <guid isPermaLink="false">https://medium.com/p/b8721a222dd7</guid>
            <category><![CDATA[chatgpt]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[humanity]]></category>
            <category><![CDATA[generative-ai-tools]]></category>
            <category><![CDATA[turing-test]]></category>
            <dc:creator><![CDATA[João Renato]]></dc:creator>
            <pubDate>Sun, 14 Apr 2024 23:17:06 GMT</pubDate>
            <atom:updated>2024-04-14T23:17:06.833Z</atom:updated>
            <content:encoded><![CDATA[<h4>Could an artificial intelligence have written this article here on Medium? Or was it an actual human being?</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*idz4ZleXCcxrBdTxR-A5Ow.png" /><figcaption><a href="https://designer.microsoft.com/">Microsoft Designer — Stunning designs in a flash</a></figcaption></figure><p>Well, it’s been a while since the last time I published here (over a year, to be more accurate; my last post came out a few months before ChatGPT was launched).</p><p>As you might know, English is not my first language. So for my last post I had to research a lot of words and idiomatic expressions just to make sure that what I was writing made sense (I don’t trust Google Translate enough to just input a text in Portuguese and use the English output).</p><p>Writing in English took me roughly twice as long as it would have taken in Portuguese.</p><p>In other words, a simple post demands a lot of time, so it is no surprise that I haven’t been very productive on this platform (only five posts, two of them in English).</p><p>During this time, I’ve always wanted to come back to writing. I’ve had some ideas and even created a few drafts, but they never became anything real. The time such a task demands, combined with my tendency to procrastinate, kept me from getting anything done for almost two years.</p><h3>But then… in 2023 everything changed.</h3><p>We all heard about this free tool created by <a href="https://openai.com/">OpenAI</a> that could answer questions like a real person. Even more, it could give us ideas and insights, review texts and… even write stories!</p><p>ChatGPT reached one million users in only five days. 
It was the fastest-growing platform until Meta launched Threads, which reached one million users in only one hour.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*ykX08N2EZ8ynGOCS.jpeg" /><figcaption>Source: <a href="https://www.statista.com/chart/29174/time-to-one-million-users/?utm_source=Statista+Newsletters&amp;utm_campaign=ed74b43c72-All_InfographTicker_daily_COM_PM_KW2_2023_We_COPY_&amp;utm_medium=email&amp;utm_term=0_662f7ed75e-ed74b43c72-315164141">Statista</a></figcaption></figure><p>By mid-2023, practically everyone I knew was using it, whether for writing fancy emails, reviewing texts, coming up with ideas for a marketing campaign, or even debugging code (me included).</p><p>While it appeared to be a highly useful tool and a potential game changer in the market, it also generated some controversy.</p><p><a href="https://www.cbc.ca/news/canada/hamilton/chatgpt-school-cheating-1.6734580">Students could use it to cheat on essays</a> or online tests, and influencers could use it to create digital content without any effort or creativity. And, why not, <a href="https://askyourpdf.com/blog/can-you-legally-use-chatgpt-to-write-a-book">writers could write articles or even books</a> in a couple of hours without too much thinking.</p><p>People started wondering about the possibilities. <a href="https://www.wired.com/story/chatgpt-jailbreak-generative-ai-hacking/">There were even cases of hackers using ChatGPT to break cybersecurity</a>.</p><p>With this new technology came new rules. 
Teachers started forbidding it, and <a href="https://www.techtarget.com/searchenterpriseai/tip/How-and-why-businesses-should-develop-a-ChatGPT-policy#:~:text=Best%20practices%20for%20creating%20a%20ChatGPT%20policy&amp;text=Clarify%20the%20different%20words%20and,using%20AI%20without%20human%20oversight.">companies created policies for the use of this tool</a>.</p><p>Now, one year later, every time I see a post full of fancy words on LinkedIn, I wonder who actually wrote it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*hKoJiS_-gkWzDXY_" /><figcaption>Photo by <a href="https://unsplash.com/@possessedphotography?utm_source=medium&amp;utm_medium=referral">Possessed Photography</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><h3>Turing test</h3><p>In 1950, the mathematician <a href="https://royalsocietypublishing.org/doi/epdf/10.1098/rsbm.1955.0019">Alan Mathison Turing</a> wrote an article proposing to consider the question: <a href="https://redirect.cs.umbc.edu/courses/471/papers/turing.pdf">Can machines think?</a></p><p>The approach to this question described in Turing’s article is now known as the “Turing test”, although his own name for it was the “Imitation Game” (yes, that’s where the title of the <a href="https://www.imdb.com/title/tt2084970/">movie</a> came from).</p><p>I’m not gonna annoy you with the technical details (honestly, because I didn’t fully understand them), but I can say that the essence of the game is to discover whether the “person” you are talking to is human or actually a computer. 
Since the machine may be imitating a human being, to win you need to ask specific questions whose answers will guide you towards the truth.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fgiphy.com%2Fembed%2F3o7TKWLjRnH6yH9avK%2Ftwitter%2Fiframe&amp;display_name=Giphy&amp;url=https%3A%2F%2Fmedia.giphy.com%2Fmedia%2F3o7TKWLjRnH6yH9avK%2Fgiphy.gif%3Fcid%3D790b7611nydqgvs6pw34ajuopfxazjfhx3egfu7909kuv2p5%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg&amp;image=https%3A%2F%2Fmedia2.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExeWR1em13Yng2aWRyaXlnYWlvOHVoNDZvY2FuYmJlaWRxMzFxOXNrNSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2F3o7TKWLjRnH6yH9avK%2Fgiphy.gif&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=giphy" width="435" height="218" frameborder="0" scrolling="no"><a href="https://medium.com/media/e041efd2cbc30461707e8da3e6e18dca/href">https://medium.com/media/e041efd2cbc30461707e8da3e6e18dca/href</a></iframe><p>In his article, Turing describes how digital computers operate and then theorizes about the questions we could ask and the possible outcomes of those questions. Basically, you win the game if you can tell that it is indeed a machine, and the machine wins if it tricks you into thinking that it is human.</p><p>Turing then theorizes about the concept of “learning machines” (is this expression familiar to you?). He finishes his article with the following:</p><blockquote>We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision. Many people think that a very abstract activity, like the playing of chess, would be best. It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English. This process could follow the normal teaching of a child. 
Things would be pointed out and named, etc. Again I do not know what the right answer is, but I think both approaches should be tried. We can only see a short distance ahead, but we can see plenty there that needs to be done.</blockquote><p>The Turing Test is often discussed in the context of artificial intelligence and the quest to create machines that can demonstrate human-like intelligence. While it has been a benchmark for AI development, many argue that passing the Turing Test doesn’t necessarily mean a machine truly understands or possesses consciousness. It primarily assesses the ability to mimic human conversation.</p><p>Ever since the article was published, Turing’s idea of machines tricking humans has mostly been associated with sci-fi movies. It was really hard (at least for me) to think of a scenario where we could not be sure if we were talking to another human being or to a machine.</p><p>That was before ChatGPT…</p><h3>A game changer…</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*JL3sICo0QXJ_vkHE" /><figcaption>Photo by <a href="https://unsplash.com/@jupp?utm_source=medium&amp;utm_medium=referral">Jonathan Kemper</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure><p>The rise of ChatGPT was a game changer even for a giant like <a href="https://www.forbes.com/sites/davidphelan/2023/01/23/how-chatgpt-suddenly-became-googles-code-red-prompting-return-of-page-and-brin/?sh=7b3d25935977">Google</a>, because people became more interested in finding information by asking ChatGPT than by using the search engine.</p><p>But… why?</p><h3>How do generative language models work?</h3><p>If you ask ChatGPT what ChatGPT is, it answers:</p><blockquote>ChatGPT is a conversational AI developed by OpenAI, based on the GPT (Generative Pre-trained Transformer) architecture. It’s designed to understand and generate human-like text based on the input it receives. 
Essentially, it’s a language model that can engage in dialogue, answer questions, provide information, and assist with various tasks based on the context provided. It’s trained on a vast amount of text data from the internet, allowing it to generate responses that are contextually relevant and coherent.</blockquote><p>In other words, ChatGPT is a specific implementation of a generative language model. Given an input, it generates responses, completes text, and engages in conversation by repeatedly predicting the next token (a word or a piece of a word) in a sequence given the preceding context, a process known as autoregressive generation.</p><p>There are other tools like ChatGPT, but none of them has gained as much popularity (so far).</p><p>There are a lot of possible explanations for why ChatGPT suddenly became a more popular way to find information than Google. But I personally like to think that the key is the idea of having a “human” conversation at any time about anything you want to know.</p><p>Perhaps, if we didn’t know we were talking to an artificial intelligence, we could really think that it was an actual human being.</p><h3>Being polite to ChatGPT</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/680/0*CDbIZcdb6eZp__Xc" /><figcaption>from <a href="https://www.facebook.com/groups/1468526826800061/user/100057272413300/?__cft__[0]=AZUBDHsp2a2JLTvsax3Zv6iMMpCNce64WrGpQmWl12aaXKMaDWYvt6F1m78eAwdzh2nT_z6bP0m3A7QJFYSqgkJbN868COUceHZuek0Ob5bx5cZhH9Gy6xs6yuO0RcCV2m9LABdD0lLvQgRd8KCaeUZnfBRkO1oXOLSMLV9t8MBXsDziZEN0bxsZfYloz-E7zVk&amp;__tn__=%2Cd%2CP-R"><strong>Sangwang Rai</strong></a><strong>’s post on Facebook</strong></figcaption></figure><p>I once saw a post on LinkedIn from a developer saying that every time he asked ChatGPT something he used the words “please”, “thank you” and “goodbye”. 
He argued that, since it was a model that was constantly being trained on new data (in this case, our inputs), this would help the AI to treat polite words as statistically significant in a conversation, so that they would eventually be added to its own vocabulary.</p><p>Well… I’m not gonna say it doesn’t make sense, but…</p><p>Like any robot, it is programmed, so it can be programmed to recognize bad words, impoliteness, etc. In the same way, it can be programmed to always be polite. So, from my point of view, if you ask ChatGPT</p><blockquote>Who is the president of the United States?</blockquote><p>or</p><blockquote>Hello! I hope you are fine. Could you please tell me who is the president of the United States? Thank you in advance!</blockquote><p>It’s the same!</p><p>But I understand the point. There were cases of <a href="https://www.washingtonpost.com/technology/2022/07/16/racist-robots-ai/">robots becoming racist</a> because of trolls interacting with them using racist expressions. But I still think that it all depends on how the model is pre-programmed.</p><h3>AI content detectors</h3><p>Now, in a scenario where even programmers are treating ChatGPT as a real human, how can we know if a text we read in a blog, in an Instagram post, or even in a book was written by a real human?</p><p>To help with this, OpenAI itself created a tool to help us distinguish human-written text from AI-written text. That’s because AI doesn’t create, but actually recreates texts by mixing the texts from its training data. 
It’s basically copy-pasting, but in a very efficient way.</p><p>However, this is what you find when you go to the website to use this tool:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*S3YmNF8Q4jhG_dlrEtN-9A.png" /><figcaption><a href="https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text">https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text</a></figcaption></figure><p>When you click on <strong>Try the classifier</strong>, you are redirected to a <em>page not found</em>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-nHm1t4O2mdoAKwjb485Kg.png" /><figcaption><a href="https://platform.openai.com/ai-text-classifier">https://platform.openai.com/ai-text-classifier</a></figcaption></figure><p>Thankfully, there are other AI classifier tools available, like <a href="https://quillbot.com/ai-content-detector?utm_medium=cpc&amp;utm_source=google&amp;utm_campaign=FA+-+HY+|+PERF+-+Search+|+Product+-+AI+Detector+|+PREM+|+CPA&amp;utm_term=ai%20text%20classifier&amp;utm_content=688287458551&amp;campaign_type=search-20957139800&amp;click_id=CjwKCAjw_e2wBhAEEiwAyFFFo5I2JsjbWusWDkXpFKO2ilomDmZ0zNbSgjMKnMAmtsUqK-7_-2OxcBoCibEQAvD_BwE&amp;campaign_id=20957139800&amp;adgroup_id=160707607809&amp;ad_id=688287458551&amp;keyword=ai%20text%20classifier&amp;placement=&amp;target=&amp;network=g&amp;gad_source=1&amp;gclid=CjwKCAjw_e2wBhAEEiwAyFFFo5I2JsjbWusWDkXpFKO2ilomDmZ0zNbSgjMKnMAmtsUqK-7_-2OxcBoCibEQAvD_BwE">QuillBot</a> and <a href="https://freeaitextclassifier.com/">AI text classifier</a>. If you know any other classifiers, please share them in the comments.</p><h3>So… was this article generated by AI?</h3><p>Now, with all that said, do you think this was written by a human being or generated by an AI?</p><p>Have you ever been tricked by a robot? 
Do you think that ChatGPT passes the Turing test?</p><p>Let me know your thoughts!</p><p>Thank you for reading!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b8721a222dd7" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Data Science skills: Why learn SQL?]]></title>
            <link>https://blog.devgenius.io/data-science-skills-why-learn-sql-38492acf4f68?source=rss-9e5ec55c6166------2</link>
            <guid isPermaLink="false">https://medium.com/p/38492acf4f68</guid>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[skills]]></category>
            <category><![CDATA[r]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[analytics]]></category>
            <dc:creator><![CDATA[João Renato]]></dc:creator>
            <pubDate>Sun, 21 Aug 2022 02:54:28 GMT</pubDate>
            <atom:updated>2022-11-14T02:54:38.044Z</atom:updated>
            <content:encoded><![CDATA[<p>If you are a data scientist or if you want to become one, this article will help you understand the importance of learning SQL for this field.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/614/1*srHmdQToJVhC2asqDGdntg.png" /><figcaption>Designed by author</figcaption></figure><ol><li><a href="#1356"><strong>Context</strong></a></li><li><a href="#ed9e"><strong>SELECT * FROM</strong></a></li><li><a href="#c906"><strong>ORDER BY</strong></a></li><li><a href="#bb18"><strong>GROUP BY</strong></a></li><li><a href="#89fe"><strong>JOINS</strong></a></li><li><a href="#df37"><strong>Conditions if else (CASE WHEN)</strong></a></li><li><a href="#c803"><strong>Application in Data Science</strong></a></li><li><a href="#4612"><strong>Conclusion</strong></a></li></ol><p>This article was inspired by <a href="https://medium.com/analytics-vidhya/5-reasons-every-aspiring-data-scientist-must-learn-sql-2bab007a8d76">5 Reasons Every Aspiring Data Scientist Must Learn SQL</a>, written by <a href="https://medium.com/u/5e9386358f1a">Francis Onyango</a>. Here, I will show you, in a simple, didactic and objective way, the advantages of this language and its applications in the data science field.</p><p><a href="https://medium.com/analytics-vidhya/5-reasons-every-aspiring-data-scientist-must-learn-sql-2bab007a8d76">5 Reasons Every Aspiring Data Scientist Must Learn SQL</a></p><h3>1. Context</h3><p><a href="https://pt.wikipedia.org/wiki/SQL">Structured Query Language</a>, or just SQL, is a query language for relational databases developed at IBM in the 1970s. 
It is based on <a href="https://en.wikipedia.org/wiki/Relational_algebra#:~:text=In%20database%20theory%2C%20relational%20algebra,Codd.">relational algebra</a>, which in turn derives from <a href="https://en.wikipedia.org/wiki/First-order_logic">first-order logic</a> and <a href="https://falcaojoaorenato.medium.com/math-concepts-for-sql-programming-232cb1da0d16">set theory</a>.</p><p>As the years went by, SQL became so popular that it is no longer limited to IBM’s domain. A lot of popular data analytics software, such as <a href="https://en.wikipedia.org/wiki/SAS_(software)">SAS</a>, <a href="https://en.wikipedia.org/wiki/R_(programming_language)">R</a> and others, uses it for joining and pivoting data tables.</p><h3>2. SELECT * FROM</h3><p>Its popularity can be explained by the fact that it’s an easy and intuitive language.</p><p>Suppose we have a data table like the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/633/1*ACEuC1opFG28uuHSaWv62A.png" /><figcaption>Source: Author</figcaption></figure><p>If we want to select only the column (variable) <strong>Name </strong>from the table <strong>Table</strong>, we use the following command:</p><p>select Name from Table</p><p>If we want to select the occupations (variable <strong>Occupation</strong>) of the people who are between 30 and 40 years old, we use the following command:</p><p>select Occupation from Table where Age between 30 and 40</p><p>Observe that, to specify the records (table rows) to be kept, I use the conditional clause <strong>where</strong>. The column name is written without quotes, because in standard SQL single quotes denote a text value rather than a column.</p><p>If we want to select all variables from the table with no restriction (in other words, the whole table), we have two command options:</p><pre>Option 1:<br>select ID, Name, Occupation, Age from Table</pre><pre>Option 2:<br>select * from Table</pre><p>The two command lines above return the same result. 
The main difference is that with option 1 the data analyst is free to list the variables in any order. The asterisk (* ) in option 2 is understood, in SQL, as <strong>the selection of all columns from the table</strong>.</p><blockquote><strong>Obs</strong>.: There&#39;s also a matter of performance between the two types of query. However, this is not the subject of this article. You can check some performance tips in SQL <a href="https://www.sisense.com/blog/8-ways-fine-tune-sql-queries-production-databases/">here</a>.</blockquote><p>The <strong>SELECT</strong> command is part of a set of commands called <a href="https://www.ibm.com/docs/en/i/7.2?topic=programming-data-manipulation-language"><em>Data Manipulation Language</em></a> (DML). Among them, there are:</p><blockquote>DELETE</blockquote><blockquote>UPDATE</blockquote><blockquote>INSERT</blockquote><p>As you probably imagine by now, the commands above result in, respectively, <strong>deleting</strong>, <strong>updating </strong>and <strong>inserting</strong> data in a data table.</p><h3>3. ORDER BY</h3><p>Querying databases is not limited to selecting the desired columns/rows. Often we need to visualize the data ordered by some specific feature.</p><p>Coming back to the prior example: if we wish to visualize the data in ascending order by age, we’ll use the following line of code:</p><p>select * from Table order by Age</p><p>Or, if we want to visualize the data by age in descending order, we have:</p><p>select * from Table order by Age desc</p><blockquote><strong><em>Obs</em></strong><em>.: Since SQL is used by many different analytics tools, certain commands may vary slightly between them. In standard SQL, </em><strong><em>desc </em></strong><em>comes after the variable, but some tools expect a different form, for instance with the variable in parentheses. That said, it’s important that the user (programmer, data analyst, database administrator, etc.) 
is aware of the peculiarities of the SQL dialect in the chosen tool.</em></blockquote><p>Notice how every query so far boils down to selecting (select ) some or all (* ) columns from (from ) a table, which reinforces the idea of how easy this language is.</p><h3>4. GROUP BY</h3><p>Now, let’s say the table we want to work on is the following:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/642/1*KjG1atDLcArqcKIMV4SGwg.png" /><figcaption>Source: Author</figcaption></figure><p>If we want to select the average age of the philosophers, we have the following code options:</p><pre>Option 1:<br>select <br>       sum(Age)/count(*) <br>from Data<br>where Occupation = &#39;Philosopher&#39;<br>group by Occupation</pre><pre>Option 2:<br>select <br>       avg(Age)<br>from Data<br>where Occupation = &#39;Philosopher&#39;<br>group by Occupation</pre><p>Both commands compute the same average, although in some databases dividing two integers in option 1 truncates the result. Notice that this time we had to use, besides the <strong>group by</strong>, the conditional <strong>where</strong>, because I limited my query to only the records in which the occupation is philosopher<strong> </strong>where Occupation = &#39;Philosopher&#39; . The column name Occupation is unquoted, while the text value &#39;Philosopher&#39; is in single quotes.</p><p>Both functions <strong>count() </strong>and <strong>sum() </strong>are also very important for algebraic manipulations. The first counts the non-null values of a specific column count(column) or the rows of the whole table count(*) , while the function <strong>sum()</strong> sums the values of the chosen column, in this case the column Age sum(Age) . This way, we obtain the average age using the formula sum(Age)/count(*) .</p><p>The second option uses the function <strong>avg()</strong>, from average, which is a more straightforward way to calculate the average.</p><blockquote><strong>Obs</strong>.: The function to calculate the average in SQL might change from one software to another, which can be <em>avg()</em> or <em>mean()</em>. 
Once more, the programmer must pay attention to this detail.</blockquote><h3>5. JOINS</h3><p>Suppose we have two tables, following the same style as the table presented in the previous topic. Let’s call these tables A and B.</p><p>Imagine there are records in common between these tables. In other words, there’s information inside table A that is also inside table B. This information is known as the <strong>intersection</strong>, which I already explained in my previous article.</p><p><a href="https://falcaojoaorenato.medium.com/math-concepts-for-sql-programming-232cb1da0d16">Set theory — from pure math to SQL</a></p><p>However, the advantage of “Joins” goes beyond the intersection of elements between tables. This feature is useful to combine, discard or even compare elements across data tables.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ipZTZ0wbG_YG5ITrC3xZRw.png" /><figcaption>Source: <a href="https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins">https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins</a></figcaption></figure><p>You can learn more about joins from the following article.</p><p><a href="https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins">Visual Representation of SQL Joins</a></p><h3>6. Conditions If Else (CASE WHEN)</h3><p>Suppose we want to create a new variable that classifies the age from the table Data. 
If I wish to create the categories “young”, “adult” and “senior”, according to the age, I use the <strong>when </strong>clause, as shown below:</p><pre>Option 1</pre><pre>select <br>  case<br>    when Age&lt;30 then &#39;young&#39;<br>    when Age between 30 and 40 then &#39;adult&#39;<br>    when Age &gt; 40 then &#39;senior&#39;<br>  end as Age_range<br>from Data<br></pre><pre>Option 2</pre><pre>select <br>  case<br>    when Age&lt;30 then &#39;young&#39;<br>    when Age between 30 and 40 then &#39;adult&#39;<br>    else &#39;senior&#39;<br>  end as Age_range<br>from Data</pre><p>Notice that the CASE command has a block structure.</p><p>case when then end</p><p>In other words, the CASE command requires an END to close the block: in case a condition holds, then the corresponding value goes into the new column. The conditions are checked in order and the first one that matches wins, and since <strong>between</strong> includes both limits, an age of exactly 40 is classified as adult.</p><p>Also, you can use the ELSE clause with the same logic as you would use for if else.</p><h3>7. Application in Data Science</h3><p>The advantage (or necessity) of using SQL for data science can be summed up in how easy it is to use and in how much better it lets the analyst understand the data sets.</p><p>After all, any statistician/data scientist knows that every analysis, no matter how complex it is, begins with the extraction and exploration of the data, which covers cleaning, structuring or even joining database tables.</p><p>For instance, in the software <strong>R</strong>, there is a package called “sqldf”, with which it is possible to write SQL to manipulate, join and/or restructure <a href="https://www.tutorialspoint.com/r/r_data_frames.htm">Data Frames</a>.</p><p><a href="https://rdocumentation.org/packages/sqldf/versions/0.4-11">sqldf package - RDocumentation</a></p><p>For Python, we have the sqlite3 module in the standard library to connect to an SQLite database. 
You can read more about it in the following link.</p><p><a href="https://docs.python.org/3/library/sqlite3.html">sqlite3 - DB-API 2.0 interface for SQLite databases</a></p><h4>Subqueries</h4><p>A very useful feature in SQL is the <strong>SUBQUERY</strong>, which is a query nested inside another query, typically to extract only the part of the data that the outer query needs. It can be used inside a <strong>where</strong> clause or a <strong>case when</strong>.</p><pre>Example 1:<br>select <br>       avg(Age) as avg_age<br>from Data as a<br>where (select Age from Data where Id = a.Id)&gt;30<br>group by Occupation</pre><pre>Example 2:<br>select case<br>       when Age &lt; <br>(select avg(Age) from Data where Occupation = a.Occupation) then &#39;New&#39;<br>       else &#39;In average&#39;<br>       end as compare_ages<br>from Data as a</pre><p>You can see that in both examples the same table is compared with itself, with a query nested inside the comparison.</p><p>In Example 3, we compare each individual’s age to the average of all ages.</p><pre>Example 3:<br>select <br>   case <br>      when Age &lt; (select avg(Age) from Data) then &#39;Smaller Age&#39;<br>      else &#39;Age equal or bigger&#39;<br>   end as compare_age_average<br>from Data</pre><h4>Self Join (Auto Join)</h4><p>Because SQL is relational, there&#39;s a limitation when it comes to exploring / comparing data within the same column: to compare one row with another, we have to join the table with itself.</p><pre>create table Data2 as<br>select * from Data order by Id</pre><pre>select <br>       case<br>         when  a.Age &lt; b.Age then &#39;younger&#39;<br>         else &#39;older or same age&#39;<br>       end as compare_Age<br>from Data2 a left join Data2 b<br>on a.Id &lt; b.Id</pre><p>In the above example, we compare the individuals’ ages pairwise. 
Notice that this requires a <strong>left join </strong>of the table <strong>Data2</strong> with itself.</p><blockquote><strong>Obs</strong>.: Here, first we created the table Data2 ordered by column <strong>Id</strong>. Strictly speaking, a join does not depend on the order of the rows, so this step only makes the result of the <strong>self join</strong> (also called auto join) easier to read.</blockquote><p>Another example would be comparing the philosopher’s age with that of the other individuals in the table.</p><pre>select <br>    case<br>      when a.Age = b.Age then &#39;Same Age&#39;<br>      when a.Age &lt; b.Age then &#39;Smaller Age&#39;<br>      when a.Age &gt; b.Age  then &#39;Bigger Age&#39;<br>    end as compare_age_philosopher<br>from Data a left join Data b on a.Id &lt;&gt; b.Id<br>where b.Occupation = &#39;Philosopher&#39;</pre><h4>A platform to learn/train SQL</h4><p>As you already know, the best way to understand our knowledge gaps is to test ourselves. That said, I’d like to recommend a platform that I find genuinely useful for testing my SQL skills:</p><p><a href="https://sqlzoo.net/wiki/SQL_Tutorial">SQLZoo</a></p><h4>More tips for Data Scientists</h4><p>I strongly recommend reading the following article, which has more tips and valuable information about what this language can do in data science.</p><p><a href="https://towardsdatascience.com/extra-4-sql-tricks-every-data-scientist-should-know-d3ed7cd7bc6c">Extra 4 SQL Tricks Every Data Scientist Should Know</a></p><h3>8. Conclusion</h3><p>Here you saw how easy SQL is to learn and to use. You also saw how useful it is when it comes to exploring data and joining data tables.</p><p>It’s important to highlight that the code examples we have explored so far are extremely simple and for didactic purposes. 
A <strong>self join</strong> (auto join) or <strong>subquery</strong> can become very complex, with dozens of lines of code, depending on the application (they can even be nested, with a subquery inside another subquery).</p><p>That said, we can conclude that knowing SQL, for a Data Scientist, is as important as knowing English is for any professional in the 21st century. Nowadays, there are several platforms like <a href="https://www.coursera.org/">Coursera</a>, <a href="https://www.udemy.com/">Udemy</a>, and others with accessible courses, besides, of course, good old <a href="https://stackoverflow.com/">Stack Overflow</a>.</p><blockquote>Thank you for reading!</blockquote><h3>Stay connected</h3><ul><li>Connect on <a href="https://www.linkedin.com/in/falcaojoaorenato/">LinkedIn</a>.</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=38492acf4f68" width="1" height="1" alt=""><hr><p><a href="https://blog.devgenius.io/data-science-skills-why-learn-sql-38492acf4f68">Data Science skills: Why learn SQL?</a> was originally published in <a href="https://blog.devgenius.io">Dev Genius</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Set theory — from pure math to SQL]]></title>
            <link>https://falcaojoaorenato.medium.com/math-concepts-for-sql-programming-232cb1da0d16?source=rss-9e5ec55c6166------2</link>
            <guid isPermaLink="false">https://medium.com/p/232cb1da0d16</guid>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[probability]]></category>
            <category><![CDATA[math]]></category>
            <category><![CDATA[python]]></category>
            <dc:creator><![CDATA[João Renato]]></dc:creator>
            <pubDate>Thu, 07 Jul 2022 21:33:56 GMT</pubDate>
            <atom:updated>2022-11-14T02:55:51.785Z</atom:updated>
            <content:encoded><![CDATA[<h4>From math to SQL</h4><h3>Set theory — from pure math to SQL</h3><p>This article explains the fundamental theory behind SQL, probability and statistics.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/428/1*fGN5TbUikAghlUniPHHlpA.png" /><figcaption>Designed by author</figcaption></figure><p>Set theory is one way to explain how different elements are distributed among groups, whether they belong to several groups at once or to none. In that sense, the theory describes the possible ways of grouping those elements. Its applications are practically endless: you might use it for social bubbles, statistical profile surveys, classifying books or products, or even propositional logic. It is also, of course, the foundation of the <em>Structured Query Language</em> (<a href="https://pt.wikipedia.org/wiki/SQL">SQL</a>) and of the “sets” concept in <a href="https://www.coursera.org/learn/python-for-applied-data-science-ai/lecture/7GcLY/sets">Python</a>.</p><ol><li><a href="#a0f6">Venn Diagram</a></li><li><a href="#a70f">Elements belonging to sets</a></li><li><a href="#fdbe">Intersection</a></li><li><a href="#82d0">Union</a></li><li><a href="#4ebf">When one set contains another (set operations)</a></li><li><a href="#98c7">Review</a></li><li><a href="#7e8b">References</a></li></ol><h3>1. Venn diagram</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/676/1*nrUMo3HKsF2hxyettvdKTA.png" /><figcaption>Venn Diagram design by author</figcaption></figure><p>The so-called <a href="https://www.cuemath.com/algebra/venn-diagram/">Venn Diagram</a> illustrates the relationship between two or more sets that may or may not have elements in common.</p><h3>2. 
Elements belonging to sets</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/676/1*vX9ACL-T-kxV4x6PyyximQ.png" /><figcaption>Designed by author</figcaption></figure><p>In the illustration above, the elements a1 and a2 <strong>belong </strong>to set A, while b1 and b2 <strong>belong </strong>to set B. The math notation for this relationship can be written as below:</p><blockquote>a1, a2 ∈ A</blockquote><blockquote>b1, b2 ∈ B</blockquote><h3>3. Intersection</h3><p>Notice that a2 and b2 belong to both sets A and B: these two elements sit in an area common to the two sets. As for a1 and b1, each one belongs to only one set. We might say that the area where a2 and b2 are found is a third set. This common area is called the <strong>intersection</strong>, so that:</p><blockquote>a2, b2 ∈ A <strong>∩</strong> B</blockquote><p>meaning that a2 and b2 belong to the intersection of A and B.</p><h3>4. Union</h3><p>The union of two sets consists of the elements that belong to at least one of the two sets.</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*vX9ACL-T-kxV4x6PyyximQ.png" /><figcaption>Designed by author</figcaption></figure><p>The union of sets A and B is the set with elements {a1, b1, a2, b2}. In other words, just as the intersection of sets results in a third set, the union of sets also results in a new set. The math notation for this is:</p><blockquote>A ∪ B = {a1, a2, b1, b2}</blockquote><p>Then, we can define the following sets C and D:</p><blockquote>C = (A ∩ B)</blockquote><blockquote>D = (A ∪ B)</blockquote><p>Set C consists of the elements in the <strong>intersection</strong> of A and B. As for set D, it consists of the elements that belong to the <strong>union</strong> of A and B. Formally, we say that C equals A intersection B and D equals A union B.</p><h3>5. 
When one set contains another (set operations)</h3><p>Let’s take a look at the following image:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/634/0*2XYktQEOelj0nrW3.png" /><figcaption>Designed by author</figcaption></figure><p>In the diagram above we have sets A and B, with their respective elements a1 and b1.</p><p>We say that set B is a <strong>subset </strong>of A (or A contains B):</p><blockquote>B ⊂ A</blockquote><p>All elements belonging to B also belong to A, but not every element in A belongs to B.</p><p>From this example, we can extract the following properties:</p><blockquote>A ∩ B = B</blockquote><blockquote>A ∪ B = A</blockquote><blockquote>b1 ∈ A, B</blockquote><p>Applying the same idea to the Venn Diagram from item 4, we can conclude that:</p><blockquote>C ⊂ A, B</blockquote><blockquote>A, B ⊂ D</blockquote><p>In other words, set C is contained, simultaneously, in A and B, while set D contains both A and B. As a consequence, the elements of C belong to both A and B, while every element belonging to A or B also belongs to D.</p><h3>6. Review</h3><p>Set theory is an excellent starting point for those who want to understand a little more about probability and SQL programming.</p><p>In this article, we saw that the intersection of two or more sets consists of the elements that belong to all of them at once. We also saw that the union of two sets is the set that contains all of their elements.</p><p>Although we used two sets in this article, the same ideas can be extended to 3, 4, …, n different sets.</p><p>There’s a caveat here:</p><blockquote>The union of two or more sets is not the same as the sum of their elements, even though that may seem intuitive. 
If we simply add up the elements of A and B, the elements that belong to their intersection are counted twice. In terms of counts, |A ∪ B| = |A| + |B| − |A ∩ B|.</blockquote><p>We need to remember this concept, especially when we’re talking about probability.</p><p><strong>References:</strong></p><p><a href="https://plato.stanford.edu/entries/set-theory/#:~:text=Set%20theory%20is%20the%20mathematical,whose%20members%20are%20also%20sets">https://plato.stanford.edu/entries/set-theory/#:~:text=Set%20theory%20is%20the%20mathematical,whose%20members%20are%20also%20sets</a>.</p><p><a href="https://www.cuemath.com/algebra/venn-diagram/">Venn Diagram - Examples, Definition, Formula, Symbols, Types</a></p><h3>Stay connected</h3><ul><li>Connect on <a href="https://www.linkedin.com/in/falcaojoaorenato/">LinkedIn</a>.</li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why learn SQL]]></title>
            <link>https://falcaojoaorenato.medium.com/por-que-aprender-sql-cc152823a039?source=rss-9e5ec55c6166------2</link>
            <guid isPermaLink="false">https://medium.com/p/cc152823a039</guid>
            <category><![CDATA[database]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[ciencia-de-dados]]></category>
            <category><![CDATA[dicas]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[João Renato]]></dc:creator>
            <pubDate>Tue, 16 Jun 2020 18:15:24 GMT</pubDate>
            <atom:updated>2022-08-21T02:28:23.184Z</atom:updated>
            <content:encoded><![CDATA[<h3>Data Science: Why learn SQL</h3><h4>If you are a data scientist or an aspiring one, here is why learning SQL matters.</h4><ol><li><a href="#e8b5">Context</a></li><li><a href="#97b9">SELECT * FROM</a></li><li><a href="#5eab">Order by</a></li><li><a href="#90bc">Group by</a></li><li><a href="#f4b4">Joins</a></li><li><a href="#01b8">Application in Data Science</a></li><li><a href="#9df0">Conclusion</a></li></ol><p>Building on the points made so well in the article <a href="https://medium.com/analytics-vidhya/5-reasons-every-aspiring-data-scientist-must-learn-sql-2bab007a8d76"><em>5 Reasons Every Aspiring Data Scientist Must Learn SQL</em></a>, this post aims to show, in a simple, didactic and concise way, the advantages of the language and its applications in the field of Data Science.</p><h3>Context</h3><p><a href="https://pt.wikipedia.org/wiki/SQL"><em>Structured Query Language</em></a>, better known as SQL, is a language for querying <a href="https://pt.wikipedia.org/wiki/Banco_de_dados_relacional">relational databases </a>developed by <a href="https://www.ibm.com/br-pt">IBM</a> in the 1970s.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/327/1*ZQ5O3uaUEQh1J76jl1Ov_g.png" /><figcaption><a href="http://blog.dbaacademy.com.br/t-sql-nao-suportadas-no-sql-azure/">http://blog.dbaacademy.com.br/t-sql-nao-suportadas-no-sql-azure/</a></figcaption></figure><p>Inspired by <a href="https://pt.wikipedia.org/wiki/%C3%81lgebra_relacional">Relational Algebra</a>, SQL has spread and become so popular over the years that it is no longer confined to IBM’s domain. 
Many popular data analysis tools, such as <a href="https://www.sas.com/pt_br/home.html">SAS</a> and <a href="https://www.r-project.org/">R</a>, among others, use it to handle tables and <a href="https://bookdown.org/wevsena/curso_r_tce/curso_r_tce.html#o-que-e-um-data-frame"><em>data frames</em></a>.</p><h3>SELECT * FROM</h3><p>Its popularity can be explained by the fact that it is an easy and intuitive language.</p><p>Suppose we have a relational data table like the one below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/271/1*SphluRMuftINVWEzGf2ckg.png" /><figcaption>Source: Author</figcaption></figure><p>If we want to select only the column (variable) <strong>Nome </strong>(name) from the table <strong>Dados</strong>, we use the following command:</p><p>select Nome from Dados</p><p>If instead we want to select the occupations (variable <strong>Ocupação</strong>) of the individuals aged between 30 and 40, we use the following command:</p><p>select &quot;Ocupação&quot; from Dados where Idade between 30 and 40</p><p>Notice that, to restrict the records (table rows) being queried, I use the conditional clause <strong>where.</strong></p><p>If we want to select all the variables of the table with no restriction, that is, the whole table, we have two options:</p><pre>-- Option 1:<br>select ID, Nome, &quot;Ocupação&quot;, Idade from Dados</pre><pre>-- Option 2:<br>select * from Dados</pre><p>Both lines of code above return the same result. The basic difference is that, in the first, the user is free to list the table’s variables in any order. In short, SQL interprets the asterisk (*) as <strong>selecting all columns of the table</strong>.</p><blockquote><strong>Note</strong>: There is also a performance difference between the two kinds of query. 
However, for didactic purposes, that topic is not covered in this article.</blockquote><p>The <strong>SELECT </strong>command belongs to the set of commands known as <a href="https://www.ibm.com/docs/en/i/7.2?topic=programming-data-manipulation-language"><em>Data Manipulation Language</em></a> (DML). Other DML commands include:</p><blockquote>DELETE</blockquote><blockquote>UPDATE</blockquote><blockquote>INSERT</blockquote><p>As you might guess, these commands respectively <strong>delete</strong>, <strong>update </strong>and <strong>insert </strong>data in a table.</p><h3>Order by</h3><p>Querying a database is obviously not limited to selecting the desired columns/rows. We often need to view the data sorted by some category or variable.</p><p>Let’s go back to the previous example. To view the data sorted by age (Idade) in ascending order, we use the following line of code:</p><p>select * from Dados order by Idade</p><p>To view the same data sorted by age in descending order, we use:</p><p>select * from Dados order by Idade desc</p><blockquote><strong>Note</strong>: Since SQL is used by many different tools, some commands may differ slightly between them, so the placement of <strong>desc </strong>may vary (standard SQL puts it after the variable; some environments use a different position or parentheses, for example). 
Users should therefore pay attention to the peculiarities of SQL in the environment they are working in.</blockquote><p>Notice how the core of the language boils down to selecting (select) some or all columns (*) of the table of interest (from), which reinforces how easy SQL is to use.</p><h3>Group by</h3><p>Suppose the table we want to work with is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/267/1*NPwAltEgTcXu06BOYA6c9Q.png" /><figcaption>Source: Author</figcaption></figure><p>If we want the average age of the individuals who are philosophers, we have the following code options:</p><pre>-- Option 1:<br>select <br>       sum(Idade)/count(*) <br>from Dados <br>where &quot;Ocupação&quot; = &#39;Filósofo&#39;<br>group by &quot;Ocupação&quot;</pre><pre>-- Option 2:<br>select <br>       avg(Idade)<br>from Dados <br>where &quot;Ocupação&quot; = &#39;Filósofo&#39;<br>group by &quot;Ocupação&quot;</pre><p>Both snippets return the same result. Notice that this time I had to use, besides <strong>group by</strong>, the <strong>where</strong> clause, because I restricted my query to the records in which the individual’s occupation is philosopher: where &quot;Ocupação&quot; = &#39;Filósofo&#39;.</p><p>The functions <strong>count()</strong> and <strong>sum() </strong>are also essential for algebraic manipulation. The first counts the non-null values of a given column, count(column), or the rows of the table as a whole, count(*). The <strong>sum()</strong> function adds up the values of the column of interest, in this case the column Idade: sum(Idade). The average age is thus obtained with the formula sum(Idade)/count(*).</p><p>The second snippet uses the function <strong>avg()</strong>, from <em>average</em>, 
a more direct way of computing the mean of the column of interest.</p><blockquote><strong>Note:</strong> The function that computes the mean in SQL may vary between tools and can be avg() or mean(). Once again, the programmer should watch out for this detail.</blockquote><h3>Joins</h3><p>Suppose we have two tables shaped like the one shown in the previous section. Let’s call them A and B.</p><p>Now imagine that these tables have records in common, that is, some information in table A is also in table B. That shared information is known as the Intersection, as explained in the article on Set Theory.</p><p><a href="https://falcaojoaorenato.medium.com/teoria-dos-conjuntos-642dabb5128a">Set Theory</a></p><p>But “Joins” are useful for much more than intersecting the elements of two tables. They can be used to combine, discard or even compare elements across tables.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/599/1*VRbHGVgVc42KVSw6A3SiQA.png" /><figcaption>Source: <a href="https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins">https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins</a></figcaption></figure><p>A more complete explanation of joins between relational tables can be found in this excellent article by <a href="https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins">C.L. Moffatt</a>.</p><h4>If Else conditions (CASE WHEN)</h4><p>Suppose I want to create a new variable that classifies the individuals in the table Dados by age group. 
To create the categories “jovem” (young), “adulto” (adult) and “maduro” (mature) according to age, I use the <strong>when</strong> condition, as in the example below.</p><pre>select <br>  case<br>    when Idade &lt; 30 then &#39;jovem&#39;<br>    when Idade between 30 and 40 then &#39;adulto&#39;<br>    when Idade &gt; 40 then &#39;maduro&#39;<br>  end as faixa_etaria<br>from Dados</pre><p>Notice that the CASE expression has a block structure:</p><p>case when then end</p><p>This block structure of <strong>case when </strong>provides the conditional logic I need to create the variable of interest.</p><h3>Application in Data Science</h3><p>The advantage (or necessity) of using SQL in data science comes down, as already shown, to its ease of use and to the better understanding it gives analysts of the data sets they have at hand.</p><p>After all, any Statistician/Data Scientist knows that every analysis, however complex, starts with extracting and exploring the data to be analyzed, which involves cleaning, structuring or even joining database tables.</p><p>In the aforementioned <em>software</em> <strong>R</strong>, there is a package called “<a href="https://www.rdocumentation.org/packages/sqldf/versions/0.4-11">sqldf</a>” that makes it possible to write SQL to manipulate, join and/or restructure <em>Data Frames</em>.</p><iframe src="https://drive.google.com/viewerng/viewer?url=https%3A//cran.r-project.org/web/packages/sqldf/sqldf.pdf&amp;embedded=true" width="600" height="780" frameborder="0" scrolling="no"><a href="https://medium.com/media/becafa0717a316402f77a827ee3e8b1b/href">https://medium.com/media/becafa0717a316402f77a827ee3e8b1b/href</a></iframe><h4>Subqueries</h4><p>A very useful feature of this language is the subquery. 
As the name suggests, it consists of running a query inside another query, whether in a <strong>where</strong> or a <strong>case when</strong>.</p><pre>-- Example 1:<br>select <br>       avg(Idade) as media_idade<br>from Dados a<br>where (select Idade from Dados where Id = a.Id) &gt; 30<br>group by &quot;Ocupação&quot;</pre><pre>-- Example 2:<br>select case<br>       when Idade &lt; <br>(select avg(Idade) from Dados where &quot;Ocupação&quot; = a.&quot;Ocupação&quot;) then &#39;Novo na categoria&#39;<br>       else &#39;Dentro da média na categoria&#39;<br>       end as compara_idade<br>from Dados a</pre><p>Notice that both examples use the same table to compare its own data, so there is a query inside the comparison logic.</p><p>Example 3 compares each individual’s age (Idade) with the average of all ages.</p><pre>-- Example 3:<br>select <br>   case <br>      when Idade &lt; (select avg(Idade) from Dados) then &#39;Idade Menor&#39;<br>      else &#39;Idade igual ou maior&#39;<br>   end as compara_idade_media<br>from Dados</pre><h4>Auto Join</h4><p>Given the relational nature of the language, exploring/comparing data within a single column is limited. An auto join, that is, joining a table with itself, is one way around this.</p><pre>create table Dados2 as<br>select * from Dados order by Id</pre><pre>select <br>       case<br>         when a.Idade &lt; b.Idade then &#39;mais novo&#39;<br>         else &#39;mais velho ou mesma idade&#39;<br>       end as compara_idade<br>from Dados2 a left join Dados2 b<br>on a.Id &lt; b.Id</pre><p>The example above compares the ages of the individuals with one another. Notice that this requires a <strong>left join </strong>of the table Dados2 with itself.</p><blockquote><strong>Note</strong>: Here, a table <strong>Dados2</strong> was first created with the data ordered by the column <strong>Id</strong>. 
This was done so that the <strong>auto join</strong> could be applied on that variable.</blockquote><p>Another example would be comparing the philosophers’ ages with those of the other members of the table.</p><pre>select <br>    case<br>      when a.Idade = b.Idade then &#39;Mesma Idade&#39;<br>      when a.Idade &lt; b.Idade then &#39;Idade Menor&#39;<br>      when a.Idade &gt; b.Idade then &#39;Idade Maior&#39;<br>    end as compara_idade_filosofos<br>from Dados a left join Dados b on a.Id &lt;&gt; b.Id<br>where b.&quot;Ocupação&quot; = &#39;Filósofo&#39;</pre><h4>A platform to learn/practice SQL</h4><p>As everyone knows, the best way to find our knowledge gaps is to test ourselves. With that in mind, I’d like to recommend a platform that I found particularly interesting for testing my SQL knowledge:</p><p><a href="https://sqlzoo.net/">SQLZOO</a></p><p>This platform is mentioned in the article <a href="https://towardsdatascience.com/sqlzoo-the-best-way-to-practice-sql-66b7ccb1f17a"><em>SQLZoo: The Best Way to Practice SQL</em></a>, which presents, besides the one above, several other interesting platforms for testing your skills in this language.</p><h4>More tips for Data Scientists</h4><p>I strongly recommend reading the article below, in addition to the one mentioned at the beginning of this post, for more tips and information on how this language can be applied in this field.</p><p><a href="https://towardsdatascience.com/extra-4-sql-tricks-every-data-scientist-should-know-d3ed7cd7bc6c">Extra 4 SQL Tricks Every Data Scientist Should Know</a></p><h3>Conclusion</h3><p>We saw how easy SQL is to learn and use. We also saw how useful it is for exploring relational data and joining tables.</p><p>It is important to stress that the code examples explored so far are extremely simple and purely didactic. 
An <strong>auto join</strong> or a <strong>subquery </strong>can become extremely complex, with dozens of lines of code, depending on the application (they can even be nested, that is, a subquery inside an auto join and vice versa).</p><p>That said, we can conclude that knowing SQL is as important for a Data Scientist as knowing English is for any professional in the 21st century. Nowadays there are several platforms such as <a href="https://www.coursera.org/">Coursera</a> and <a href="https://www.udemy.com/">Udemy</a>, among others, with affordable courses, besides, of course, the possibility of learning in practice with good old <a href="https://stackoverflow.com/">Stack Overflow</a>.</p><blockquote>Was this post helpful? Any remaining questions? Leave a comment in case I forgot to cover some concept, so that I can address it in Part 2.</blockquote><blockquote>Thank you!</blockquote>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Set Theory]]></title>
            <link>https://falcaojoaorenato.medium.com/teoria-dos-conjuntos-642dabb5128a?source=rss-9e5ec55c6166------2</link>
            <guid isPermaLink="false">https://medium.com/p/642dabb5128a</guid>
            <category><![CDATA[matemática]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[conjunto]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[ciencia-de-dados]]></category>
            <dc:creator><![CDATA[João Renato]]></dc:creator>
            <pubDate>Thu, 11 Jun 2020 17:37:07 GMT</pubDate>
            <atom:updated>2020-09-23T17:50:22.646Z</atom:updated>
            <content:encoded><![CDATA[<h3>Set Theory: a foundation for SQL</h3><p>This article aims to explain set theory in a didactic and concise way and thereby lay the groundwork for future articles on SQL, Probability and Statistics.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/283/1*sLqlUW-n7to7n1D1J-G3lw.png" /><figcaption><a href="https://autociencia.blogspot.com/2016/07/teoria-dos-conjuntos-exercicios.html">Source</a></figcaption></figure><p>Set Theory is a way of explaining how different elements are distributed among groups, whether they share groups or belong to none. In that sense, it tries to quantify the possible ways of grouping those elements. Its applicability is practically endless. Set theory can be used for social bubbles, statistical profile surveys, cataloguing books or products in a store, or even <a href="https://pt.wikipedia.org/wiki/L%C3%B3gica_proposicional#:~:text=Em%20l%C3%B3gica%20e%20matem%C3%A1tica%2C%20uma,certas%20f%C3%B3rmulas%20sejam%20estabelecidas%20como">propositional logic</a>, and so on. It is also, of course, the foundation of the <em>Structured Query Language</em> (<a href="https://pt.wikipedia.org/wiki/SQL">SQL</a>) and of the “sets” concept in <a href="https://www.coursera.org/learn/python-for-applied-data-science-ai/lecture/7GcLY/sets">Python</a>.</p><ol><li><a href="#8e5a">Venn Diagram</a></li><li><a href="#cb36">Elements belonging to sets</a></li><li><a href="#b511">Intersection</a></li><li><a href="#e48d">Union</a></li><li><a href="#8323">When one set contains another (set operations)</a></li><li><a href="#01d9">Review</a></li></ol><h3>1. 
Venn Diagram</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/914/1*-5sJkkVyt2SXQRfIF6K38g.png" /></figure><p>The so-called <a href="https://pt.wikipedia.org/wiki/Diagrama_de_Venn">Venn Diagram</a> illustrates the relationship between two or more sets that may or may not have elements in common.</p><h3>2. Elements belonging to sets</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/905/1*QAok_BoVlzGB-BhKc9Li0Q.png" /></figure><p>We say that the elements a1 and a2 <strong>belong</strong> to set A, while b1 and b2 <strong>belong</strong> to set B. The mathematical notation for this relation can be written as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/125/1*tkm_uWC0D4h6T0PESs3mAg.png" /></figure><h3>3. Intersection</h3><p>Notice that a2 and b2 belong to both sets A and B: these two elements lie in an area common to the two sets. a1 and b1, on the other hand, each belong to only one set. As a brief mental exercise, we could say that the area where a2 and b2 lie is a third set. We call this common area the <strong>intersection</strong>, so that:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/176/1*klhhL46pZG-oiZgXj2nCWQ.png" /></figure><p>that is, a2 and b2 belong to the intersection of A and B.</p><h3>4. Union</h3><p>The union of sets gathers all of their elements. Look at the following image:</p><figure><img alt="" src="https://cdn-images-1.medium.com/proxy/1*QAok_BoVlzGB-BhKc9Li0Q.png" /></figure><p>The union of sets A and B results in the set of elements {a1,a2,b1,b2}. That is, just as the intersection of sets results in a third set, the union of sets also results in a new set. 
The mathematical notation for the union is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/286/1*41YmvNTseA6sRwNWlnFDTA.png" /></figure><p>We can then define the following sets C and D:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/155/1*UI1deyTglayqILXkTG1CvA.png" /></figure><p>Notice that set C is formed by the elements of the <strong>intersection</strong> of A and B, while set D is formed by the elements of the <strong>union</strong> of A and B. Formally, we say that C equals A intersection B and D equals A union B.</p><h3>5. When one set contains another (set operations)</h3><p>Look at the following figure:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/634/1*xp8cLcUiFLCNOPGIaEpxug.png" /></figure><p>In the diagram above we have sets A and B, with their respective elements a1 and b1.</p><p>We say that set B <strong>is contained</strong> in A (or A contains B). We can also say that B is a subset of A.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/85/1*xW5YryfIzam9H73yYH6HFA.png" /></figure><p>Notice that every element that belongs to B also belongs to A, but not every element of A belongs to B.</p><p>We can extract the following properties from this example:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/131/1*NFxSk366_Rrn9Nsl660YgQ.png" /></figure><p>Using the concept of set operations and going back to the diagram from item 4, we can conclude that:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/119/1*FSmcmOInCVI-pGWH7q6wlg.png" /></figure><p>That is, set C is contained in both A and B at the same time, while set D contains both A and B. Consequently, the elements of C belong to sets A and B, whereas the elements of A and the elements of B all belong to set D.</p><h3>6. 
Review</h3><p>Set theory is an excellent starting point for anyone who wants to understand a little more about probability and SQL programming.</p><p>We saw that the intersection of two or more sets consists of the elements that belong to all of them at once. We also saw that the union of two sets is the set that covers all of their elements.</p><p>For didactic purposes, this post used only two sets. However, as the diagram that opens this article shows, the concepts covered here can be extended to 3, 4, …, n different sets.</p><p>One caveat is in order:</p><blockquote>The union of two or more sets is not the same as the sum of their elements, even though that may seem intuitive. If we add up the elements of A and B, the elements belonging to their intersection are counted twice.</blockquote><p>Keep this concept in mind, especially when we talk about probability theory.</p><blockquote>Was this post helpful? Any remaining questions? Leave a comment in case I forgot to cover some concept, so that I can address it in Part 2.</blockquote><blockquote>Thank you!</blockquote>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Learning Julia for Data Science]]></title>
            <link>https://falcaojoaorenato.medium.com/aprendendo-julia-para-ci%C3%AAncia-de-dados-bddc57a7a21d?source=rss-9e5ec55c6166------2</link>
            <guid isPermaLink="false">https://medium.com/p/bddc57a7a21d</guid>
            <category><![CDATA[ciencia-de-dados]]></category>
            <category><![CDATA[julia]]></category>
            <dc:creator><![CDATA[João Renato]]></dc:creator>
            <pubDate>Sun, 05 Apr 2020 19:10:10 GMT</pubDate>
            <atom:updated>2020-04-09T15:44:25.461Z</atom:updated>
            <content:encoded><![CDATA[<h3>Learning Julia for Data Science: Part 1</h3><p>In 2012, a classmate from my undergraduate program mentioned a new programming language that might come to replace <a href="https://www.r-project.org/">R</a>.</p><p>Readers already familiar with programming for data analysis might immediately assume that my esteemed classmate was talking about <a href="https://www.python.org/">Python</a>. That was not the case. He was talking about the newly created <a href="https://julialang.org/">Julia</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/811/1*b5PEy3C8AzsV3Cn0zxBJZg.png" /></figure><p>In any case, I found the comment interesting at first, but I was not motivated to dig deeper into the subject. I was very young, had not yet graduated in Statistics, and did not have much programming skill.</p><p>Time went by and, as circumstances required, I had to learn <a href="https://www.sas.com/pt_br/home.html">SAS</a> and <a href="https://www.w3schools.com/sql/default.asp">SQL</a>, then R, and finally I became interested in Python. That last one was the language I wanted to dive into, in terms of both knowledge and applications.</p><p>For a long time, I saw Python as the language that would replace R. Its code was friendly and intuitive, and all the articles comparing the two ranked Python as faster and more efficient. That made me determined to use Python alone as my computational tool.</p><h3>Then 2020 arrives</h3><p>I have been following articles on <strong>Medium</strong> for some time now, 
mainly in programming and data science, my main interests.</p><blockquote>Then one fine day I came across an article titled “<a href="https://towardsdatascience.com/why-python-is-not-the-programming-language-of-the-future-30ddc5339b66">Why Python is not the programming language of the future</a>”.</blockquote><p>In it, the author explains Python’s disadvantages compared with some slightly newer languages and demystifies the hype around it.</p><p>One passage in particular caught my attention. In it, the author lists languages that might replace Python in the future:</p><ul><li>Rust</li><li>Go</li><li>Julia</li></ul><p>When I read the name “Julia”, I immediately remembered my classmate talking about this new language.</p><p>I downloaded the <a href="https://julialang.org/downloads/">program</a> to my computer, started researching it, and soon realized that there was very little material about it in Portuguese. Here on Medium, <a href="https://medium.com/u/8eb7bc83c97a">Pizza de Dados</a> has some very interesting material, such as the article “<a href="https://medium.com/pizzadedados/precisamos-falar-sobre-a-linguagem-julia-e22eb235969e">Precisamos falar sobre a linguagem Julia</a>” (“We need to talk about the Julia language”). 
Digging deeper, I noticed that there is a large <a href="https://www.youtube.com/user/JuliaLanguage">international community</a> contributing to Julia’s development, that the language is widely applicable to scientific computing, and that it offers many visual resources.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KIieuTxptgFcTClwNRbo-w.png" /><figcaption><a href="http://julialang.org">julialang.org</a></figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-uOvp5lIvIAEi-kAzCJHYQ.png" /></figure><p>My interest and curiosity grew to the point that today I am determined to explore this niche, apparently little explored here in Brazil: Data Science with Julia.</p><p>My goal with this article, which I hope will be the first of many, is therefore to begin studying, in the open, how Julia can be applied as a Data Science tool. That way I consolidate what I learn and may gain <em>feedback</em> and collaborators, so that we can exchange ideas and even come up with new ones.</p><p>I’m counting on you.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=bddc57a7a21d" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>