<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Lorna Maria A on Medium]]></title>
        <description><![CDATA[Stories by Lorna Maria A on Medium]]></description>
        <link>https://medium.com/@lornamariak?source=rss-fcc02f105a85------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*1sBjs4bjIZZVZfKKZ9k8uQ.jpeg</url>
            <title>Stories by Lorna Maria A on Medium</title>
            <link>https://medium.com/@lornamariak?source=rss-fcc02f105a85------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 15 May 2026 16:10:38 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@lornamariak/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Easing your way into new topics with a learning plan.]]></title>
            <link>https://medium.com/@lornamariak/easing-your-way-into-new-topics-with-a-learning-plan-8e1ba45dd7d?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/8e1ba45dd7d</guid>
            <category><![CDATA[career-development]]></category>
            <category><![CDATA[life]]></category>
            <category><![CDATA[learning]]></category>
            <category><![CDATA[education]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Sun, 30 Jun 2019 13:40:08 GMT</pubDate>
            <atom:updated>2019-06-30T13:43:27.918Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vWQDCdjuK-NsF9wTbWr4kQ.png" /></figure><p>One of my goals for the second half of the year is to help as many people as possible get a grip on their learning goals for 2019. In my YouTube series Learning with Lorna, I have shared the first step of the learning process: making a learning plan.</p><p>In the video, I highlight what the learning plan looks like, why you need one and the major components of the learning plan.</p><p>This article is the blog version of the video, so if you prefer video, go ahead and watch it <a href="https://www.youtube.com/watch?v=xw5uGxuhiaM">here</a>; if you prefer reading, use this article to guide you.</p><h4>Key points about a learning plan</h4><p>A learning plan is a document that lays out learning prospects over time. When one takes an interest in a topic or skill and intends to start learning it, it is important to draft a plan that fits within one&#39;s schedule and elaborates on how to pursue this new-found interest. A learning plan is not a timetable.</p><p>The plan takes in your learning goals and helps you map out the steps, actions and resources needed to meet them, and the evidence you need to measure success.</p><h4>Section 1</h4><p>Name: Fill in your name or the name of the person you’re preparing this for.</p><p>Date: Fill in the intended learning period, e.g. 19th June 2019 to 19th July 2019.</p><h4>Section 2</h4><p>This section is a personal introspection centred on helping you discover what you want to get out of this learning process.</p><p>Vision: What do you envision yourself achieving by learning this topic/skill?</p><p>Expected outcomes: What impact do you want to make by learning this? 
(centre this on yourself)</p><h4>Section 3</h4><h4>Topics</h4><p>General topic: Fill in the major topics and break them down into manageable sub-topics; if possible, break these down further into more specific sub-topics. This will help you define precisely what you’re trying to learn.</p><h4>Learning Aids</h4><p>To get the most out of your learning goals, you need to take advantage of the resources available to help you learn better. These include tutorials, articles, podcasts, books etc. It is essential to map these out ahead of your learning journey, and to insert references to specific resources if you have already found them.</p><p>This saves a lot of time when you start your learning process and helps you focus more on learning than on trying to find resources.</p><h4>Time frame</h4><p>Fill in the days and hours you’re dedicating to each sub-topic. Being specific about the hours will help you know how many hours you cumulatively dedicated to the learning, which is a metric of accountability.</p><h4>Place</h4><p>A great learning environment contributes to the learning process. Depending on what you’re learning and what type of learner you are, you have to choose your environment carefully and prepare it in advance to avoid distractions and save time.</p><p>Find out what type of learner you are <a href="https://personalitymax.com/learning-styles-test/">here</a>.</p><h4>Expected Outcomes</h4><p>For a successful learning process, your goals should be measurable. In the outcomes, set simple actions you can carry out after each topic learned. These can include taking a quiz, presenting to fellow learners, among others. Cumulatively, these will be a way to measure the success of your learning process.</p><h4>Notes</h4><p>A space reserved for any comments about the process you have mapped out. You can use it for budgeting your learning process, estimating how much you need to set aside to buy resources, fund your movements etc. 
and compare these with any available alternatives. The goal is to keep your learning process affordable.</p><p>Use this <a href="https://docs.google.com/spreadsheets/d/1L0yf_PhRLj0HYDk-DrDryf2W36Vlm4RBgXFZd4Zh0VQ/edit#gid=1710806421">link</a> to download a learning plan template in the format described in this article. Feel free to give me feedback on your own use of the template.</p><p>Follow me on <a href="https://mobile.twitter.com/lornamariak">Twitter</a> and <a href="https://www.instagram.com/alornamaria/">Instagram</a>, and subscribe to my <a href="https://www.youtube.com/channel/UCrvp8p_ONLjR25hzCz18sWA?view_as=subscriber">YouTube</a> channel to follow the Learning with Lorna series. I am always happy to hear of more people learning something new and challenging themselves further.</p><p>Happy learning! 😻</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8e1ba45dd7d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A data science newbie’s guide through SQL.]]></title>
            <link>https://medium.com/@lornamariak/a-data-science-newbies-guide-through-sql-316de66561ec?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/316de66561ec</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[postgresql]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Thu, 30 May 2019 13:59:52 GMT</pubDate>
            <atom:updated>2019-05-30T13:59:52.858Z</atom:updated>
<content:encoded><![CDATA[<h4>Chapter Four — Select and Alter</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/300/1*sPs17JeofCKernhxJQ621Q.png" /></figure><p>In chapter 4 of the SQL series, we shall look at two SQL statements you will commonly use when writing SQL from a data science perspective.</p><h4>Importing sample data into the database</h4><p>To show how these statements work, we need sample data in our database. Using this <a href="https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/P9-ConsoleGames.csv">link</a>, download the csv file and import it into your database:</p><p>Start your server &gt; open pgAdmin &gt; go to your database &gt; create a table: console_games &gt; add the columns with the column names from the csv and the corresponding data types &gt; right-click on the table &gt; import/export &gt; toggle to import, add the csv, set the delimiter to “,” &gt; click OK.</p><p>Your imported data should look like this.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZKXKfG-QFHR15SKjLsT1YQ.png" /><figcaption>Dataset is from <a href="https://www.superdatascience.com/">https://www.superdatascience.com/</a></figcaption></figure><p>Please note that the csv is imported without the column names.</p><p>Read more about data types in PostgreSQL 👉🏾 <a href="http://www.postgresqltutorial.com/postgresql-data-types/">here</a></p><h3>SELECT statement</h3><p>This is the statement used to “return” or “select” records from one or more database tables. From a data science perspective, SELECT will be the most commonly used SQL statement because it answers questions about data sets. It returns a set of records defined by the parameters passed to it. 
In order to retrieve data, one must know the ins and outs of the SELECT statement.</p><p>Let us break it down below:</p><h4>Main clauses</h4><p>SELECT: the opening clause specifies the columns selected.</p><p>FROM: this specifies the table(s) to select from.</p><h4>Optional Clauses</h4><p>These clauses apply different constraints on the rows. There are many optional clauses, but the popular ones include:</p><p>WHERE: this specifies the conditions rows must satisfy to be selected<br>ORDER BY: this specifies the order in which to show the returned rows<br>GROUP BY: this groups rows that share a property so they can be summarised per group<br>HAVING: this filters the grouped rows by a specified condition</p><h4>SELECT syntax</h4><pre>SELECT column1, column2, …<br> FROM  table_name<br> WHERE condition<br> GROUP BY column_name<br> HAVING condition<br> ORDER BY column_name</pre><h4>SELECT examples</h4><p>From our imported dataset, let us answer some questions about the data.</p><p><em>From the pgAdmin main menu select Tools &gt; Query Tool and open a new Query Editor window to run these queries.</em></p><p>Q1: Return all columns where the Name is “Tetris”</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eb7jUS_b59uW6tlBdWYqSw.png" /></figure><p>Q2: Order the table by the highest North America sales (NA_sales)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lAGpgsECDQvCLv8oPcSJHg.png" /></figure><p>Q3: Show a list of the top 20 publishers by their total sales (show the sales).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RlujHCSJbw83w8LG5cgREw.png" /></figure><p>Q4: Show a list of games and genres published by Microsoft Game Studios in 2010</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wCdrHPRW_qiWIIBwzfFrkg.png" /></figure><h4>JOINS</h4><p>A join in SQL is a statement that creates a set of records by combining rows from two or more tables based on a common value, and the result is returned as a table.</p><p>Read more about joins 👉🏾<a href="https://www.techonthenet.com/postgresql/joins.php">here</a></p><h4>ALTER statement</h4><p>The ALTER statement makes changes to the structure of an existing table in a database, for example adding, renaming or dropping columns (changing the records themselves is done with UPDATE and DELETE). From a data science perspective, ALTER is useful during data cleaning. The keyword is <strong>ALTER</strong>.</p><p>We shall try using ALTER in the next chapter as we clean a data set.</p><h4>Conclusion</h4><p>Congrats, today you have done much more practical work, and that takes a lot of energy.👏🏾</p><p>Thank you so much for catching up with chapter 4 of the SQL series. The basic ins and outs are starting to come together, and in the next chapter we shall start to work on more industry-specific examples, doing data cleaning and analysis with SQL.</p><p>Feel free to share feedback with me by leaving a clap, commenting or tweeting me <a href="https://twitter.com/kalmpublication">@kalmpublication</a> or <a href="https://twitter.com/lornamariak">@lornamariak</a></p><p>Catch up with previous chapters here: <a href="https://medium.com/lornamariak/a-data-science-newbies-guide-through-sql-1ea5d4967444">1</a>, <a href="https://medium.com/lornamariak/a-data-science-newbies-guide-through-sql-c4ca3335c3ef">2</a>, <a href="https://medium.com/lornamariak/a-data-science-newbies-guide-through-sql-1a138aaebb7d">3</a></p><p>Happy Learning!😻</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=316de66561ec" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A data science newbie’s guide through SQL.]]></title>
            <link>https://medium.com/@lornamariak/a-data-science-newbies-guide-through-sql-1a138aaebb7d?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/1a138aaebb7d</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Tue, 21 May 2019 21:04:48 GMT</pubDate>
            <atom:updated>2019-05-30T14:03:48.698Z</atom:updated>
<content:encoded><![CDATA[<h4>Chapter Three — Creating tables in SQL</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/300/1*hkc0hlTPZSPtXeap5geblw.png" /></figure><p>It is finally time to start writing statements. In chapter 3 of the SQL series, we shall look deeper into the rules around writing SQL, and we shall write our first statements.</p><h4>Syntax and semantics of SQL</h4><p>Like any other programming language, SQL has its own rules and style of writing, and we shall need to know these as we start writing statements.</p><p>Here are resources that will guide you:</p><p><strong>SQL syntax and semantics guide </strong>👉🏾<a href="https://www.pubtracksda.com/analytics/olh/l_en/sql_syntax_semantics.htm">Here</a></p><p><strong>SQL data types</strong> 👉🏾<a href="https://www.essentialsql.com/commonly-used-sql-server-data-types/">Here</a></p><p><strong>SQL Keywords</strong> 👉🏾 <a href="https://www.w3schools.com/sql/sql_ref_keywords.asp">Here</a></p><p><strong>Keep in mind, practice makes perfect. So let&#39;s dive in!</strong></p><h3>Create a table in your database:</h3><p>This is the generic code structure to create a table.</p><pre> CREATE TABLE <em>table_name </em>(<br><em> column1 datatype[(size)][column level constraints]</em>,<br><em> column2 datatype</em>,<br><em> column3 datatype</em>,</pre><pre> CONSTRAINT table_name_constraint CONSTRAINT TYPE (column_name)<br>); </pre><p>From last week&#39;s exercise <a href="https://medium.com/lornamariak/a-data-science-newbies-guide-through-sql-c4ca3335c3ef">here</a>, let us create the table from the task. 
I envisioned my table looking like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YS1WQmKjkoV2We-O6-5Nhw.png" /></figure><p>There are two ways to create this table:</p><p>Using the terminal (psql), it can be created with this code:</p><pre>CREATE TABLE name_table<br>(<br>   name &quot;char&quot;,<br>   age integer,<br>   fav_number integer<br>)</pre><p>Using pgAdmin: Ensure that your server is connected.</p><p>pgAdmin &gt; Servers &gt; Databases &gt; database name &gt; Schemas &gt; public &gt; Tables &gt; right-click and select create, fill in the column names and constraints, and finish.</p><p><strong>Constraints</strong>: When a table is created, we are able to apply rules to either a column or the whole table; these are called <em>constraints</em>. For example, the primary key (commonly known as pk) table constraint sets a column as the primary key, and the not null column constraint ensures that a column is never left blank.</p><p>There are about 7 commonly used constraints in SQL and you can read more about them <a href="https://www.w3schools.com/sql/sql_constraints.asp">here</a>.</p><p>From the above example, let&#39;s make column <em>name</em> the <em>primary key</em> and ensure <em>none of the columns is left blank</em>.</p><pre>CREATE TABLE name_table<br>(<br>   name &quot;char&quot; NOT NULL,<br>   age integer NOT NULL,<br>   fav_number integer NOT NULL,<br>   CONSTRAINT name_table_pkey PRIMARY KEY (name)<br>)</pre><h4>Creating a table using another table</h4><p>As a data scientist, it is very important to know how to create a table from another table, because it comes in handy when you need to use another table&#39;s columns to carry out a test. 
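To make this concrete, here is a runnable sketch of creating one table from another. It uses SQLite via Python&#39;s sqlite3 module only so the example is self-contained; the CREATE TABLE … AS SELECT statement works the same way in PostgreSQL. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The name_table from the article, with the NOT NULL and primary key
# constraints (SQLite accepts these much like PostgreSQL does).
cur.execute("""
    CREATE TABLE name_table (
        name TEXT NOT NULL PRIMARY KEY,
        age INTEGER NOT NULL,
        fav_number INTEGER NOT NULL
    )
""")
cur.executemany("INSERT INTO name_table VALUES (?, ?, ?)",
                [("Ada", 36, 7), ("Grace", 45, 3)])

# Build a new table from part of the existing one, without
# touching or altering the original.
cur.execute("""
    CREATE TABLE adults AS
    SELECT name, age FROM name_table WHERE age >= 40
""")
adults = cur.execute("SELECT * FROM adults").fetchall()
original_count = cur.execute("SELECT COUNT(*) FROM name_table").fetchone()[0]
```

After this runs, `adults` holds only the selected columns and rows, while `name_table` still has both of its rows.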
This means you can use parts of an existing table without touching or altering that table.</p><pre>CREATE TABLE <em>new_table_name</em> AS<br> SELECT <em>column1, column2</em><br> FROM <em>existing_table_name</em><br> WHERE (condition);</pre><p>We shall see more of this when we start to <em>SELECT</em>.</p><h3>Inserting values into an SQL table</h3><p>If values are not already in the table, there is an SQL statement to insert them.</p><pre>INSERT INTO name_table(name,<em> age</em>,<em> fav_number</em>)<br>VALUES (<em>value1</em>,<em> value2</em>,<em> value3</em>);</pre><pre>-- Alternatively, if inserting into all columns </pre><pre>INSERT INTO name_table<br>VALUES (<em>value1</em>,<em> value2</em>,<em> value3</em>);</pre><p>Alternatively, the GUI (pgAdmin) can go as far as allowing you to import a csv file, although you have to take care that the data is clean, especially at the data type level, to ease the import.</p><p>This is a very crucial observation as a data scientist, because you need to understand which data types will make your work easier in SQL so that your data cleaning process can match them.</p><h4>Conclusion</h4><p>Congrats, you have your database ready to query!👏🏾 Go ahead, try to insert about 10 values. 
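If you want to sanity-check your inserts programmatically, here is a small sketch of both INSERT forms and of a NOT NULL constraint rejecting an incomplete row. It uses SQLite through Python&#39;s sqlite3 module purely for self-containment; the INSERT statements themselves are the same in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE name_table (
        name TEXT NOT NULL PRIMARY KEY,
        age INTEGER NOT NULL,
        fav_number INTEGER NOT NULL
    )
""")

# Insert naming the columns explicitly...
cur.execute(
    "INSERT INTO name_table (name, age, fav_number) VALUES ('Ada', 36, 7)"
)
# ...or rely on column order when filling every column.
cur.execute("INSERT INTO name_table VALUES ('Grace', 45, 3)")

# The NOT NULL constraints reject incomplete rows.
try:
    cur.execute("INSERT INTO name_table (name) VALUES ('Alan')")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True

row_count = cur.execute("SELECT COUNT(*) FROM name_table").fetchone()[0]
```

Only the two complete rows survive; the incomplete one is rejected by the database rather than silently stored.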
(Remember, you don’t have to get it right, you’re just practising.)</p><p>Thank you so much for catching up with chapter 3 of the SQL series; next week we shall start to write queries.</p><p>Feel free to share feedback with me by leaving a clap, commenting or tweeting me <a href="https://twitter.com/kalmpublication">@kalmpublication</a> or <a href="https://twitter.com/lornamariak">@lornamariak</a></p><p>Catch up with previous chapters here: <a href="https://medium.com/lornamariak/a-data-science-newbies-guide-through-sql-1ea5d4967444">1</a>, <a href="https://medium.com/lornamariak/a-data-science-newbies-guide-through-sql-c4ca3335c3ef">2</a></p><p>Happy Learning!😻</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1a138aaebb7d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A data science newbie’s guide through SQL.]]></title>
            <link>https://medium.com/@lornamariak/a-data-science-newbies-guide-through-sql-c4ca3335c3ef?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/c4ca3335c3ef</guid>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Tue, 14 May 2019 16:06:00 GMT</pubDate>
            <atom:updated>2019-05-14T16:06:00.959Z</atom:updated>
<content:encoded><![CDATA[<h4>Chapter Two — Installing database management software</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/300/1*Kf1YBiUGN5LDX2CpqaCSeg.png" /></figure><p>Last week in SQL series <a href="https://medium.com/lornamariak/a-data-science-newbies-guide-through-sql-1ea5d4967444">chapter 1</a>, I introduced SQL, and in this week’s chapter 2 we shall go through installing database management software that will enable us to practice SQL queries in the coming weeks.</p><p>In this article, I provide a list of useful links to detailed installation guides for all the software we shall need.</p><h4>Database Management software (DBMS)</h4><p>To be able to use SQL, we shall need to create databases for practice; similarly, in production, these would be the databases where data is already stored. In order to create, view and manage databases, we shall use database management software. A DBMS is a program for creating and managing databases.</p><p>The choice of DBMS differs from company to company depending on preference, available resources and infrastructure.</p><p>Check out this <a href="https://www.softwaretestinghelp.com/database-management-software/">list</a> of the 30 most popular DBMSs as of April 2019.</p><h4>PostgreSQL and pgAdmin</h4><p>In our learning series, we are going to use PostgreSQL as our DBMS and pgAdmin as an interface for our PostgreSQL server.</p><p>Feel free to install any other server interface you might be interested in discovering.</p><h4>Installation</h4><p>Choose the installation guide that applies to the operating system you’re using to follow the SQL series.</p><p><strong>PostgreSQL / pgAdmin for Mac users</strong>: <a href="https://www.codementor.io/engineerapart/getting-started-with-postgresql-on-mac-osx-are8jcopb#i-introduction">Getting Started with PostgreSQL on Mac OSX </a>by Patrick Sears</p><p><strong>PostgreSQL / pgAdmin for Windows users</strong>: <a href="https://www.youtube.com/watch?v=e1MwsT5FJRQ">How to Install PostgreSQL &amp; pgAdmin 4 on Windows 10</a> by ProgrammingKnowledge</p><p>About PostgreSQL: <a href="https://www.postgresql.org/">https://www.postgresql.org/</a></p><p>About pgAdmin: <a href="https://www.pgadmin.org/">https://www.pgadmin.org/</a></p><h4>Practice Exercise</h4><p>Congrats on following through this far!👏🏾 You are now ready to create your first database. Go ahead, try to create a table with the following columns: Name, Age, favourite number. (Remember, you don’t have to get it right, you’re just practising.)</p><p>Thank you so much for catching up with chapter 2 of the SQL series; next week we shall start writing SQL statements.</p><p>Feel free to share feedback with me by leaving a clap, commenting or tweeting me <a href="https://twitter.com/kalmpublication">@kalmpublication</a> or <a href="https://twitter.com/lornamariak">@lornamariak</a></p><p>Happy Learning!😻</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c4ca3335c3ef" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A data science newbie's guide through SQL.]]></title>
            <link>https://medium.com/@lornamariak/a-data-science-newbies-guide-through-sql-1ea5d4967444?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/1ea5d4967444</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[sql]]></category>
            <category><![CDATA[technology]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Tue, 07 May 2019 19:46:28 GMT</pubDate>
            <atom:updated>2019-05-07T19:51:36.206Z</atom:updated>
<content:encoded><![CDATA[<h4>Chapter One — Introduction to SQL</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/300/1*tOO-O7EOFSHeEZGiUSHD-A.png" /></figure><p>SQL — Structured Query Language.</p><p>Choose your pronunciation: Sequel or Ess-que-el</p><h4>What is SQL?</h4><p>SQL is a programming language designed to manage data stored in relational databases. Relational databases are a type of database that holds records in tables, with a series of keys linking each table to another. The data is structured, and SQL therefore handles altering, retrieving and sometimes manipulating structured data.</p><p>This, however, is not where the “structured” in SQL comes from. “Structured” refers to the syntax/format of the clauses in SQL.</p><h4>Where is SQL used?</h4><p>Today SQL is widely used in many web frameworks and database applications. It is a highly sought after querying language because of its ease of use and its performance on large databases.</p><p>Its ability to query many data points and return results in a short time is impressive.</p><h4>How is SQL used in data science?</h4><p>As a data scientist, the process begins with obtaining the data to be analysed; this data is stored in a database or a sheet. 
Most modern system architectures include structured databases and require the use of SQL to pull out the data you would like to analyse.</p><p>SQL is used to retrieve data from a database as specified (through queries) and can be used to store data too (through the creation of tables).</p><p>SQL is used to manipulate data, with built-in functions that can do simple overall manipulation in the querying process.</p><p>SQL enables you to run tests by allowing you to create and destroy test tables.</p><p>As a data scientist, having SQL knowledge gives you an upper hand in understanding how to store and retrieve data in relational databases.</p><p>Most companies have adopted RDBMSs, and knowledge of SQL lets you retrieve the data you are interested in from a company database before analysing it.</p><h4>What do data scientists think about SQL?</h4><p>I asked two data scientists who use SQL what they think and here is what they had to say:</p><blockquote>I love SQL because even if data is updated, I can re-run the same script with no worries, says <a href="https://mobile.twitter.com/Shel_Kariuki">Shel Kariuki</a></blockquote><blockquote><a href="https://medium.com/u/15fb73ffb472">Oreoluwa Ogundipe</a> says: Knowing the data you need to analyse is very key, but being able to query it in the best/fastest way possible that meets your needs gives you a greater edge. 
Unlike tools where you have to create computed dimensions which do not represent how your data is stored, with SQL you can query your data straight away and then filter your responses as you see fit.</blockquote><p>Thank you so much for catching up with chapter 1 of the SQL series; next week I will share how to prepare yourself and your computer, as a data science newbie, to start writing SQL.</p><p>Feel free to share feedback with me by leaving a clap, commenting or tweeting me @kalmpublication or @lornamariak</p><p>Happy Learning!😍</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1ea5d4967444" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Meet-up lessons: From starting a meet-up and envisioning a community.]]></title>
            <link>https://medium.com/@lornamariak/meet-up-lessons-from-starting-a-meet-up-and-envisioning-a-community-5601a64c5258?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/5601a64c5258</guid>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[community]]></category>
            <category><![CDATA[meetup]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Tue, 07 May 2019 15:19:55 GMT</pubDate>
            <atom:updated>2019-08-23T09:27:21.176Z</atom:updated>
<content:encoded><![CDATA[<h3>Starting a meet-up and envisioning a community.</h3><h4>An organizer’s perspective on running a meet-up group with a vision to scale into a community.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LYFvfmes3uTBaDoqV6dQdg.jpeg" /><figcaption>Photo credit:<a href="https://www.pexels.com/@divinetechygirl"> Christina Morillo</a></figcaption></figure><h4>Background</h4><p>For close to 5 years now I have been part of many tech communities like Women Techmakers, Google Developer Groups and, recently, R user groups. While at university I learnt more through community involvement than through any other extracurricular club. I loved the sense of togetherness, and I equally enjoyed organising events and facilitating workshops on technologies I was conversant with.</p><p>Fast forward to 2018: it had been almost a year since school and I had started working in a data science-related field, so I decided to revive my community days by hosting a monthly ladies’ meet-up to introduce different data science tools. My employer at the time supported the idea, which made it even more exciting. I was able to host the meet-up consistently for about 7 months.</p><h4>Running the data ladies community</h4><p>When starting a community, we usually have this long vision of seeing it evolve into a huge, impactful part of the domain it belongs to; in this case, that was the data science community, and I hoped to eventually grow it into a self-sustaining project.</p><p>Community building involves technical and non-technical elements.</p><p>When I started running data ladies I focused mainly on developing the technical skills of women interested in data science, but I soon realised that it had grown into something consistent and I needed to scale it into a full-blown data science community. Did I? 
<a href="https://medium.com/lornamariak/a-2018-review-learning-and-life-ffcc737dac52">my year review</a> can answer this!</p><h4>Here are some of the lessons I learnt.</h4><h4>Core skills and codependency</h4><p>Focusing on core/beginner skills in the data science field gave me a great starting advantage: I could nurture participants from day one, and later it became easy to introduce much harder skills like machine learning. Every community has participants with different experience levels; focus on making these groups codependent.</p><p>It is great to have the novice group learn from the intermediate and advanced groups. The intermediate group is great at facilitating learning for the novice group while working alongside the advanced and expert groups. The expert group is great at guiding and facilitating workshops for beginners and, together with the advanced group, at coming up with learning material for the novice and intermediate groups. The expert group can always take up interns/volunteers from any group.</p><p>This becomes a built-in mentorship model in your community.</p><h4>Knowing your community members</h4><p>When you run consistent meet-ups you will start to see a number of members who keep showing up. Build relationships with these people, talk to them and start to involve them in the community; sometimes the talent you would otherwise need to hire is right there among your people, so pay attention.</p><p>This will also make leadership transitions easy in the community. Most communities I have been part of never had democratically elected leaders; most leadership came through volunteering and giving back to the community. 
Knowing my community members meant I was always ready to spot who could work best where.</p><h4>All learning must be fun and safe</h4><p>No matter what topic or tool you decide to tackle, make the learning environment as fun as possible, keeping in mind that community guidelines must be clear to everyone and followed religiously regardless of who they apply to. While communities are fun and free, they may also turn hostile if not governed by a certain level of rules.</p><h4>Funding: Partnerships, sponsorships and scholarships.</h4><p>Most communities start out relying on external funding. It is very important to reach out to as many people or companies as possible and fine-tune every partnership to also benefit the targeted sponsors. Don’t make your proposals sound self-centred; be strategic.</p><p>Funding doesn&#39;t only come in the form of money: some people will give you space, a projector, snacks, photography, influencing, promo codes etc. Take advantage of these to make your events great.</p><p>Always give your sponsors and supporters feedback and a thank-you note.</p><h4>Consistency and Visibility</h4><p>It is important to keep up a pattern of events and, if possible, make a calendar and share it with the community in advance. Be sure to also share your work, especially on the meet-up social media or blog/website, and urge your participants to share their growth too.</p><p>Growth will depend highly on feedback and follow-up, and you can use a number of creative tools other than forms to collect feedback. Check out <a href="http://kahoot.com">Kahoot </a>and <a href="http://menti.com">Mentimeter</a>. These will show you areas to improve and topics to rerun, among other metrics.</p><p>Looking back at all these lessons, I wish I had had more time to spend with my data ladies group and to work harder to grow it further. 
However, I have noticed that two groups in the same niche have sprung up: <a href="https://mobile.twitter.com/Rladieskampala">R Ladies Kampala</a> and the <a href="https://mobile.twitter.com/WiMLDS_Kampala">Women in Machine Learning and Data Science Kampala</a> chapter.</p><p>I hope this article sheds light on a thing or two for anyone looking to start a learning group/community, online or offline.</p><p>Feel free to drop any feedback to <a href="https://medium.com/u/fcc02f105a85">Lorna Maria A</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5601a64c5258" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A 2018 Review: Learning and Life.]]></title>
            <link>https://medium.com/@lornamariak/a-2018-review-learning-and-life-ffcc737dac52?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/ffcc737dac52</guid>
            <category><![CDATA[life-lessons]]></category>
            <category><![CDATA[2018]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Sun, 23 Dec 2018 11:20:08 GMT</pubDate>
            <atom:updated>2018-12-23T11:20:08.248Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/945/1*acCPT8zV21e0N7f3XnskBw.png" /><figcaption>A 360 of 2018 moments</figcaption></figure><blockquote><strong>“I don’t see a utopia anywhere for me.” ~Bozoma Saint John</strong></blockquote><p>I cannot believe that 2018 has come to an end! Just like many years of my adult life, this one started off with lots of projections and goals. Today I want to take some time and share three aspects of my life that I feel can help someone reading this.</p><h3>Career and Learning</h3><p>I started off this year at the entry level of my career. Although I had past working experience building developer communities, I had picked up a keen interest in data science, and the year began with me deciding to pursue a full-time data science career. I used as many learning methods as possible to help me grasp skills that would put me on the job market real quick. Thanks to the R stats community, I realised that <strong><em>I had to learn and write</em></strong>. This made me more creative, gave me the zeal to learn more concepts, and the feedback was so encouraging. This is by far an all-around method that not only helped me learn quickly but also helped me showcase my skills, and that’s where my entry to the industry began.</p><h4>What is it like to be a data scientist in the industry?</h4><p>I get this question a lot, especially when people find out what I do versus my job title. I will mention that being a data scientist in the industry is sometimes not an explicit role; sometimes no one at your workplace even calls you a data [insert all those cool titles], but it is your responsibility to show your team that data exists and can be incorporated into decision making. 
My past 3 months at my new workplace have taught me a million lessons about being a data scientist in the field that none of the blogs on the internet ever did.</p><p>So if you ask what it is like to be in the industry: it is tough. More than Python and R and all those fancy words in the tech and data industry, it is research, asking the right questions and formulating the right comparisons. Soon you realise that models are not the hardest thing; formulating the right models is.</p><h4>Close and switch.</h4><p>Oftentimes friends ask me: why don’t you do tech communities anymore? You should do one more event. Which event/community are you organising next?</p><p>Well, after close to four years of being a community organiser, I rediscovered myself. I must appreciate that it is through community organising that I discovered my true strengths and passions. I worked on promoting the use of different technologies and on inclusion and diversity in tech, and when 2018 came, I decided to go out and practise a part of me that I had discovered about two years ago.</p><p>It is okay to quit that original thing that everyone thinks you should stick to and go after the thing that you think will help you make a difference in the world. I love to watch communities grow and mentor people into the whole culture, but not while forgetting to work on my true self. <strong><em>I think the wave is beautiful to ride but don’t forget yourself in that ride.</em></strong></p><h3>Life and relocating.</h3><p>If you had told me at the beginning of this year that I would not spend the most important holidays with my family, I would have said: well, unless I am on Mars.</p><p>I made a bold decision to move into my own apartment for numerous reasons, but autonomy was top of the list. If you have grown up in a community where personal space is not really a thing, you always wonder what it feels like. 
Well, self-reliance is as challenging as it gets, but every challenge is a lesson for years to come.</p><p>During this year, I had a chance to relocate (bucket list ticked). I enjoyed fitting my entire life into complimentary checked-in luggage and starting all over, but one big lesson that I learnt from this is that <strong><em>it is never too late to start learning again.</em></strong></p><p>Culture shock, weather changes and language barriers are all things that can take a toll on you while relocating. Be aware and protect your sanity in that phase, because things will go wrong and you will have to keep calm.</p><p>But amidst all this I have really grown into a responsible human being: I am more aware of my surroundings, I plan and budget, and I can account for everything I do. I manage my time, and all this comes from constant self-training.</p><h3>Take time off and alone time.</h3><p>I used to think a holiday/vacation was just a fancy thing, but guess what, I found out that it is instead a healthy thing to do. For your health, sanity and well-being, if not for your muscles, take time off and switch off from work. I took two holidays this year; both were a week long, and they gave me great rest, peace of mind and the best two weeks of 2018.</p><p>The activities of this year were taking a toll on me and everything was happening so fast; I was stressed, angry and feisty. One little thing would go wrong and I would lose my mind. The air around me was tense and negative, but taking time off and crying, ranting and reading helped transform this energy into something positive. <strong><em>Life is not all roses, remember.</em></strong></p><p>Not only did I enjoy the beautiful sights, but I also rediscovered myself and set my priorities; I learnt how to read a map (not Google Maps), talked to strangers, took beautiful pictures and spent time with beautiful people.</p><h3>Hello 2019</h3><p>2019 is my year to finally confront my bigger fears. 
I look forward to taking a very bold career step that will probably leave a mark on me for the next ten years. I also plan to start worrying less about things I cannot control, not to mention taking the matters of my diet and exercise routine into my own hands. I keep promising myself a better body, but this time I am more than willing to work for it.</p><p>Cheers, share with me your learning and life goals.</p><p><strong><em>Happy new 2019 to all of us!</em></strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ffcc737dac52" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Get Started With Examples of Reactivity in Shiny apps.]]></title>
            <link>https://medium.com/data-science/get-started-with-examples-of-reactivity-in-in-shiny-apps-db409079dd11?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/db409079dd11</guid>
            <category><![CDATA[shiny]]></category>
            <category><![CDATA[rstats]]></category>
            <category><![CDATA[data-science]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Tue, 06 Mar 2018 20:42:12 GMT</pubDate>
            <atom:updated>2018-03-07T16:58:52.239Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LezNz1WCj3h9yuX1XUynOA.jpeg" /><figcaption>Photo Credit : <a href="https://pixabay.com/en/background-christmas-3009949/">Pixabay</a></figcaption></figure><h3>Introduction</h3><p>One of the things that makes shiny apps interactive is reactivity. In the simplest of terms, reactivity/reactive programming is the ability of a program to compute outputs from a given set of user inputs. The ability of a shiny app to handle reactivity makes for two-way communication between the user and the existing information.</p><p>Reactivity is applied in cases such as performing calculations, manipulating data and collecting user information, among other scenarios.</p><p>As a beginner setting out to build shiny apps, having the basic knowledge to handle reactivity will go a long way towards helping you explore different use cases of shiny apps.</p><h3>Let’s get started</h3><p>The idea of reactivity often does not occur to one until the error message below appears.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VgAmq1yuakrCK2reLbuCng.png" /><figcaption>error message</figcaption></figure><p>This error occurs when a reactive component is placed inside a non-reactive function. The app will not load and will throw this error. Let us look at what a reactive function is and what it does.</p><h3>Reactive Components of a shiny app</h3><p>There are three major reactive components of a shiny app:</p><h4><strong>Reactive Inputs</strong></h4><p>A reactive input is defined as an input that a user provides through the browser interface. For example, when a user fills a form, selects an item or clicks a button. 
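</p><p>As a minimal sketch (the <strong>name</strong> input and <strong>greeting</strong> output ids here are invented for illustration), a reactive input is declared in the ui and its current value is read in the server as input$&lt;id&gt;:</p><pre>#ui side: declare a text input with id &quot;name&quot;<br>textInput(&quot;name&quot;, &quot;Your Name&quot;)<br><br>#server side: read the input inside a render function<br>output$greeting &lt;- renderText(paste(&quot;Hello&quot;, input$name))</pre><p>Whenever the user edits the text box, any output that reads input$name is recomputed automatically.</p><p>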
These actions trigger values to be set from the reactive inputs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/849/1*hT9JQQqjq__Xp6g0tIPoUQ.png" /><figcaption>Text input and Add button are reactive inputs</figcaption></figure><h4>Reactive Outputs</h4><p>A reactive output is defined as program-provided output in the browser interface. For example, a graph, a map, a plot or a table of values.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/623/1*M3XW_QSHyCtX4BXWF0ZEsw.png" /><figcaption>Table of values as a reactive output</figcaption></figure><h4>Reactive Expressions</h4><p>A reactive expression is one that transforms reactive inputs into reactive outputs. Reactive expressions perform computations before sending reactive outputs, and they can also mask slow operations like reading data from a server or making network calls. We shall see one in our example.</p><h3>Example</h3><p>Let’s start with a simple example of adding up two integers and returning their sum in a shiny app.</p><h4>ui</h4><pre>library(shiny)<br><br>ui &lt;- fluidPage(<br>  titlePanel(&quot;Sum of two integers&quot;),<br>  <br>  #number input form<br>  sidebarLayout(<br>    sidebarPanel(<br>      textInput(&quot;one&quot;, &quot;First Integer&quot;),<br>      textInput(&quot;two&quot;, &quot;Second Integer&quot;),<br>      actionButton(&quot;add&quot;, &quot;Add&quot;)<br>    ),<br>    <br>    # Show result<br>    mainPanel(<br>      textOutput(&quot;sum&quot;)<br>    )<br>  )<br>)</pre><h4>server</h4><pre>server &lt;- function(input,output,session) {<br>  #observe the add click and compute the sum<br>  observeEvent(input$add, {<br>    x &lt;- as.numeric(input$one)<br>    y &lt;- as.numeric(input$two)<br>    n &lt;- x + y<br>    output$sum &lt;- renderPrint(n)<br>  })<br>}</pre><h4>Result</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/478/1*Z2UPro-d4jPxdrBp4RkMeg.png" 
/><figcaption>Example</figcaption></figure><h3>Demo</h3><p>Now let’s build something a bit more complex while handling reactivity.</p><h4>ui</h4><pre>fields &lt;- c(&quot;name&quot;,&quot;age&quot;,&quot;height&quot;,&quot;weight&quot;)<br>ui &lt;- fluidPage(<br>   <br>   # Application title<br>   titlePanel(&quot;Health card&quot;),<br>   <br>   # Sidebar with reactive inputs<br>   sidebarLayout(<br>      sidebarPanel(<br>         textInput(&quot;name&quot;,&quot;Your Name&quot;),<br>         selectInput(&quot;age&quot;,&quot;Age bracket&quot;,c(&quot;18-25&quot;,&quot;25-45&quot;,&quot;above 45&quot;)),<br>         textInput(&quot;weight&quot;,&quot;Please enter your weight in kg&quot;),<br>         textInput(&quot;height&quot;,&quot;Please enter your height in cm&quot;),<br>         actionButton(&quot;save&quot;,&quot;Add&quot;)<br>      ),<br>      <br>      # a table of reactive outputs<br>      mainPanel(<br>         DT::dataTableOutput(&quot;responses&quot;, width = 500), tags$hr()<br>      )<br>   )<br>)</pre><h4>server</h4><pre># Define server logic <br>   server &lt;- function(input, output,session) {<br>      </pre><pre>#create a data frame called responses<br>      saveData &lt;- function(data) {<br>         data &lt;- as.data.frame(t(data))<br>         if (exists(&quot;responses&quot;)) {<br>            responses &lt;&lt;- rbind(responses, data)<br>         } else {<br>            responses &lt;&lt;- data<br>         }<br>      }<br>      <br>      loadData &lt;- function() {<br>         if (exists(&quot;responses&quot;)) {<br>            responses<br>         }<br>      }<br>      <br>      <br>      # Whenever a field is filled, aggregate all form data<br>      #formData is a reactive function</pre><pre>      formData &lt;- reactive({<br>         data &lt;- sapply(fields, function(x) input[[x]])<br>         data<br>      })<br>      <br>      # When the Save button is 
clicked, save the form data<br>      observeEvent(input$save, {<br>         saveData(formData())<br>      })<br>      <br>      # Show the previous responses<br>      # (update with current response when save is clicked)<br>      output$responses &lt;- DT::renderDataTable({<br>         input$save<br>         loadData()<br>      })     <br>   }</pre><h4>Result</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Nr_jHuuZuAdvZH3o7UfAwQ.png" /><figcaption>When the project is run.</figcaption></figure><p>Project Demo: <a href="https://rstudio.cloud/project/22236">https://rstudio.cloud/project/22236</a></p><p>There you go! Now that you can handle the basics, please go ahead and try it out. Feel free to share, ask me questions or give feedback on Twitter <a href="https://twitter.com/lornamariak">@lornamariak</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=db409079dd11" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science/get-started-with-examples-of-reactivity-in-in-shiny-apps-db409079dd11">Get Started With Examples of Reactivity in Shiny apps.</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Setting Up Twitter for Text mining in R.]]></title>
            <link>https://medium.com/data-science/setting-up-twitter-for-text-mining-in-r-bcfc5ba910f4?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/bcfc5ba910f4</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[twitter]]></category>
            <category><![CDATA[rstats]]></category>
            <category><![CDATA[data-mining]]></category>
            <category><![CDATA[twitter-data]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Mon, 05 Feb 2018 14:00:56 GMT</pubDate>
            <atom:updated>2018-02-10T13:36:33.215Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*y1SxH-0q0NpSZDIyPLwSpg.jpeg" /><figcaption>Photo Credit : <a href="https://pixabay.com/en/twitter-line-power-line-sit-2713571/">Pixabay</a></figcaption></figure><h4>Introduction</h4><p>Over the years, social media has become a hot spot for data mining. Every day there are topics trending, campaigns running and groups of people discussing different global, continental or national issues. This makes social media a prime target for harvesting data.</p><p>In this article, we focus on Twitter as a centre of opinions and sentiments across the globe. We set out to mine text from the millions of tweets that go out every day, to understand what is going on even beyond our own timelines.</p><h4>Twitter API Set Up</h4><p>To use the Twitter API we need a Twitter account.<br>Sign up via <a href="https://twitter.com">https://twitter.com</a>, then go to <a href="https://apps.twitter.com/">https://apps.twitter.com/</a> to access the Twitter developer options.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UcvlcwBTh3Ubes8lpqf8tQ.png" /><figcaption>Click create New App</figcaption></figure><p>Create a new application</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*k932jZDx08KhBdzSKMipFg.png" /></figure><p><strong>Application Name:</strong> Give your app a unique name. If it is taken, you will be asked to change it.<br><strong>Application site:</strong> This can be a link to your GitHub repository where the application code will live if your app has no site yet. 
(like mine).<br><strong>Callback URL:</strong> This is a link to which a success or failure message is relayed from one program to another. It tells the program where to go next in either case. You can direct your application to any available port; mine is port 1410.</p><h4>App Credentials</h4><p>These are very important for logging into the application.<br>There are four major credentials used in this set up.<br><strong>Consumer Key:</strong> This key identifies the client to the application.<br><strong>Consumer Secret:</strong> This is the client’s password, used with server authentication.<br><strong>Access Token:</strong> This is the consumer identification used to define their privileges.<br><strong>Access Secret:</strong> This is sent with the access token as a password.<br>They are obtained this way.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rHaEis4CDOpGzmmgfRc-Og.png" /><figcaption>App credentials</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XxhAH8uq-g-f5tOW2r4Wlw.png" /><figcaption>Generating Tokens</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ez9X57oj2Igt0oxb1Qjf7g.png" /><figcaption>App Tokens</figcaption></figure><p><strong>Note:</strong> These credentials are meant to be kept private; that is why I shaded mine out.<br>Voila! We have the API set up.</p><h4>R Studio Set Up</h4><p>R uses the <strong>twitteR</strong> library, an R-based Twitter client that handles communication with the Twitter API. 
Let us take a moment and thank <strong>Jeff Gentry</strong> for putting this library together.</p><p>Now go ahead and install the library using the code below.</p><pre>#from CRAN<br>install.packages(&quot;twitteR&quot;)</pre><pre>#alternatively from GitHub<br>library(devtools)<br>install_github(&quot;geoffjentry/twitteR&quot;)</pre><p>The difference between the two methods above is that the first downloads the package from CRAN and takes a package name as its argument, while the second installs the package from its GitHub repository and takes the repository name as its argument. Read more about packages and installation <a href="https://www.datacamp.com/community/tutorials/r-packages-guide">here</a>.</p><h4>AUTHENTICATION</h4><p>Twitter uses Open Authentication (OAuth) to grant access to the information. Open Authentication is a token-based authentication method. Let’s refer to our four credentials.</p><p><strong>Step 1</strong></p><pre>#load library<br>library(twitteR)</pre><pre>#load credentials<br>consumer_key &lt;- &quot;****************&quot;<br>consumer_secret &lt;- &quot;*******************&quot;<br>access_token &lt;- &quot;*******************&quot;<br>access_secret &lt;- &quot;************************&quot;<br></pre><p><strong>Step 2</strong></p><p>We use the <strong>setup_twitter_oauth()</strong> function to set up our authentication. It takes in the four Twitter credentials that we generated from the API set up above.</p><pre><br>#set up to authenticate<br>setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/371/1*q7-TeToUNiHVtUFSim9_BQ.png" /></figure><p>Go ahead and authorise direct authentication by pressing Y on your keyboard.</p><h4>Querying Twitter</h4><p>To query is to simply ask a question. To be able to access the data we need, we have to send meaningful queries to Twitter. 
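</p><p>As a side note (the handle and variable names here are only illustrative), searching is not the only kind of query; the twitteR package also exposes helpers such as getUser() and userTimeline():</p><pre>#assumes setup_twitter_oauth() above succeeded<br>#look up a public account<br>user &lt;- getUser(&quot;rstudio&quot;)<br>#fetch that account&#39;s 10 most recent tweets<br>tl &lt;- userTimeline(&quot;rstudio&quot;, n = 10)</pre><p>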
With Twitter, we have access to tons of information, from trends to campaigns to accounts, so we have a range of things to query.<br>Let us run a query on the hashtag #rstats</p><pre>#fetch 12 (n) tweets associated with the hashtag, in (en)glish (lang),<br>#since the indicated date yyyy-mm-dd</pre><pre>tweets &lt;- twitteR::searchTwitter(&quot;#rstats&quot;, n = 12, lang = &quot;en&quot;, since = &quot;2018-01-01&quot;)</pre><pre>#strip retweets<br>tweets &lt;- strip_retweets(tweets)</pre><p>This code returns the tweets as a list. The <strong>strip_retweets()</strong> function eliminates any retweets among the returned tweets.</p><p>To analyse these tweets further, we shall convert the returned tweets to a data frame and store them locally.</p><pre>#convert to a data frame using the twListToDF function<br>df &lt;- twListToDF(tweets)<br><br>#save the data frame locally and read it back<br>saveRDS(df, file = &quot;tweets.rds&quot;)<br>df1 &lt;- readRDS(&quot;tweets.rds&quot;)</pre><h4>Cleaning the Tweets</h4><p>From the query, we have managed to store the results in a data frame on our computers. Now let us examine this data to find out who has the most retweets and give them a shoutout straight from our script.<br>We shall use the <strong>dplyr</strong> library to traverse this data frame.</p><pre><br>library(dplyr)</pre><pre>#clean up any duplicate tweets in the data frame using<br>#dplyr::distinct</pre><pre>df1 &lt;- dplyr::distinct(df1)</pre><p>Let us make use of dplyr verbs to select the text, screen name, id and retweet count for the tweet with the most retweets and store the result in a data frame called winner.</p><pre>winner &lt;- df1 %&gt;% select(text, retweetCount, screenName, id) %&gt;% filter(retweetCount == max(retweetCount))<br>View(winner)</pre><p>Check out more on <strong>dplyr</strong> via this <a href="https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf">cheat sheet</a>.</p><h4>Sending a Direct Message</h4><p>To notify our retweet competition winner, we shall send them a direct message from this 
script by picking their handle from the winner data frame.<br>The <strong>dmSend()</strong> function takes in the message and the username, which is saved as the screenName.</p><pre>us &lt;- userFactory$new(screenName = winner$screenName)<br>dmSend(&quot;Thank you for participating in #rstats, your tweet had the highest retweets&quot;, us$screenName)</pre><h4>Conclusion</h4><p>We have just created our first mined dataset from Twitter and used it to find our retweet competition winner. That is just a small example of what we can do; there is also a column that contains the text of the mined tweets, which can be analysed using techniques from the previous articles on this <a href="https://medium.com/@lornamariak">blog</a>.</p><p>There are more possibilities with this mined data, like sentiment analysis and word-frequency visualisations.<br>I suggest you check out a live example of Twitter text mining via this GitHub <a href="https://github.com/lornamariak/TextMinningR">repository</a>.<br>For any further questions and comments about this article, ask me <a href="http://twitter.com/lornamariak">@lornamariak</a> via Twitter.<br>Happy Tweet Mining!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=bcfc5ba910f4" width="1" height="1" alt=""><hr><p><a href="https://medium.com/data-science/setting-up-twitter-for-text-mining-in-r-bcfc5ba910f4">Setting Up Twitter for Text mining in R.</a> was originally published in <a href="https://medium.com/data-science">TDS Archive</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Exploring Sentiment Analysis]]></title>
            <link>https://medium.com/swlh/exploring-sentiment-analysis-a6b53b026131?source=rss-fcc02f105a85------2</link>
            <guid isPermaLink="false">https://medium.com/p/a6b53b026131</guid>
            <category><![CDATA[r]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[rstats]]></category>
            <category><![CDATA[big-data]]></category>
            <category><![CDATA[text-mining]]></category>
            <dc:creator><![CDATA[Lorna Maria A]]></dc:creator>
            <pubDate>Fri, 19 Jan 2018 13:51:17 GMT</pubDate>
            <atom:updated>2018-01-23T17:55:46.345Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*NK1aoSh9jUgMOJz9YHqYSg.jpeg" /><figcaption>Photo credit : <a href="https://pixabay.com/en/display-dummy-board-face-technology-915135/">Pixabay</a></figcaption></figure><h4>Understanding Text mining - Part 2</h4><h4>Introduction</h4><p>This article is Part 2 of Understanding Text Mining. If you just landed here, Part 1 is available <a href="https://medium.com/@rblog/understanding-and-writing-your-first-text-mining-script-with-r-c74a7efbe30f">here</a>.</p><p>One of the applications of text mining is sentiment analysis. In order to carry out a sentiment analysis of our mined text, we are required to clean and prepare our data set as we saw in Part 1.</p><h4>Understanding Sentiment Analysis</h4><p>Sentiment analysis is the study of extracted information to identify reactions, attitudes, context and emotions. As one of the applications of text mining, sentiment analysis exposes the attitudes in the mined text.</p><p>It is based on word polarities: positive and negative words are taken into account, while neutral words are dismissed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/299/1*mZ3CGBeP2wV6UlAsybuslw.png" /><figcaption>Table showing word polarity examples</figcaption></figure><p>Sentiment analysis is done based on lexicons. 
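</p><p>As a quick illustration (the word list is made up), the same words can be scored against different lexicons; with the <strong>syuzhet</strong> package used later in this article:</p><pre>library(syuzhet)<br><br>words &lt;- c(&quot;love&quot;, &quot;hate&quot;, &quot;table&quot;)<br>#the bing lexicon scores each word as -1, 0 or 1<br>get_sentiment(words, method = &quot;bing&quot;)<br>#the afinn lexicon scores words on a -5 to 5 scale<br>get_sentiment(words, method = &quot;afinn&quot;)</pre><p>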
A lexicon, in simpler terms, is a vocabulary, say the English lexicon. In this context, a lexicon is a selection of words with the two polarities that can be used as a metric in sentiment analysis.</p><p>There are many different types of lexicons that can be used depending on the context of the data you are working with. It is also possible to create a custom lexicon, depending on how much customisation we would like for our data.</p><p>In this article, we shall make use of the <a href="https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html"><strong>syuzhet</strong></a> package. While there are a number of packages for sentiment analysis on CRAN, the syuzhet package is great to learn with because it combines the most common lexicons, like <strong>nrc</strong>, <strong>bing</strong> and <strong>afinn</strong>.</p><p>We also make use of <a href="https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf"><strong>ggplot2</strong></a> to visualise our results from the sentiment analysis.</p><h4>How does Sentiment analysis work?</h4><p>In simple terms, sentiment analysis is performed as an intersection of a term-document (built from the mined text) and a lexicon of choice.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*ZFx_0mWqK6eiWgFVLCFv1w.png" /><figcaption>The first step is to have a term-document and a lexicon of your choice.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*xdZJYulcufsdrnheJKNSuw.png" /><figcaption>Then form an intersection between the two sets.</figcaption></figure><h4>Hands-on with Sentiment analysis</h4><p><strong>Example one</strong>: This is a simple example where we extract emotions from a sentence. We load the sentence, split it into words using the strsplit() function to form a character vector and use the get_nrc_sentiment() function from the syuzhet library. This function takes in new_sentence and compares it with 
the nrc emotion lexicon to return the scores as shown below.</p><pre>library(syuzhet)</pre><pre>sentence &lt;- &quot;i love cats such a bundle of joy.&quot;<br>new_sentence &lt;- as.character(strsplit(sentence,&quot; &quot;))</pre><pre>get_nrc_sentiment(new_sentence)</pre><pre>#This is the output</pre><pre>anger anticipation disgust fear joy sadness surprise trust negative<br>   0          0       0    0   2       0        0     0        0<br>positive<br>     2</pre><p><strong>Example two:</strong> This second example makes use of a TED talks data set that was downloaded from <a href="https://www.kaggle.com/rounakbanik/ted-talks/data">Kaggle</a> under the name <strong>transcripts.csv</strong>. It underwent cleaning using the <strong>tm package</strong>, following the steps in Part 1 of this article, and was carried forward for sentiment analysis in this Part 2.</p><pre>#load the libraries<br>library(syuzhet)<br>library(tm)<br>library(ggplot2)</pre><pre>#mydataCopy is a term document, generated from cleaning transcripts.csv</pre><pre>mydataCopy &lt;- mydata</pre><pre>#carry out sentiment mining using the get_nrc_sentiment() function<br>#log the findings under a variable called result</pre><pre>result &lt;- get_nrc_sentiment(as.character(mydataCopy))</pre><pre>#change result from a list to a data frame and transpose it</pre><pre>result1 &lt;- data.frame(t(result))</pre><pre>#rowSums sums each sentiment count across all the documents</pre><pre>new_result &lt;- data.frame(rowSums(result1))</pre><pre>#name rows and columns of the data frame</pre><pre>names(new_result)[1] &lt;- &quot;count&quot;<br>new_result &lt;- cbind(&quot;sentiment&quot; = rownames(new_result), new_result)<br>rownames(new_result) &lt;- NULL</pre><pre>#plot the first 8 rows, the distinct emotions<br>qplot(sentiment, data=new_result[1:8,], weight=count, geom=&quot;bar&quot;,fill=sentiment)+ggtitle(&quot;TedTalk Sentiments&quot;)</pre><pre><br>#plot the last 2 rows
, positive and negative<br>qplot(sentiment, data=new_result[9:10,], weight=count, geom=&quot;bar&quot;,fill=sentiment)+ggtitle(&quot;TedTalk Sentiments&quot;)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/608/1*UbgLyCypbkA3vZkSJ_Y75w.png" /><figcaption>Plot 1: Shows distinct emotions</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/608/1*RFb0zNeq8VcXYtpL_Dt6Kw.png" /><figcaption>Plot 2: Shows the combination of emotions under two polarities.</figcaption></figure><p>The code from this example can be accessed from this <a href="https://github.com/lornamariak/Sentiment-Analysis">repository</a>.</p><h4>Conclusion</h4><p>We have applied our sentiment analysis tricks to mined text to come up with a clear description of the emotions attached to the text data.</p><p>This could be a whole project that helps you gain insights into how and when to talk to your audience, what they feel about a certain topic/product/service and how better to interact with them.</p><p>Now, go ahead and choose an article/dataset/campaign that you want to try sentiment analysis on and follow the steps.<br>Happy Coding, I am always here to help &lt;- <a href="http://twitter.com/lornamariak">@lornamariak</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6gfnVvkMRFtjVsWF7vkClA.png" /></figure><h4>This story is published in <a href="https://medium.com/swlh">The Startup</a>, Medium’s largest entrepreneurship publication followed by 288,884+ people.</h4><h4>Subscribe to receive <a href="http://growthsupply.com/the-startup-newsletter/">our top stories here</a>.</h4><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a6b53b026131" width="1" height="1" alt=""><hr><p><a href="https://medium.com/swlh/exploring-sentiment-analysis-a6b53b026131">Exploring Sentiment Analysis</a> was 
originally published in <a href="https://medium.com/swlh">The Startup</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>