For the final project in the Artificial Intelligence class of my Data Science Masters, I chose to compare two models: one based on Markov principles and the other a deep learning model created by OpenAI for Natural Language Generation.

Photo by James Pond on Unsplash

Natural Language Processing (NLP) has seen exciting advances over the last five years, credited to technological improvements that sped up computation and led to the rise of deep learning models in NLP. One area of NLP that has advanced is Natural Language Generation (NLG), which is currently used in tasks:

  • that involve paraphrasing or summarizations

Early methods of NLG were rule-based, combining rules that dictated the outline of the messages with statistical methods to improve paraphrasing, syntax, and semantic similarity. One of the first major advances came in the form of neural networks, including some shallow neural networks that created new ways to represent textual information. They transformed large bodies of text into a lower-dimensional space without sacrificing much of the information in the latent representations. Methods like word2vec, GloVe, and Seq2Seq generate word embeddings in the form of vectors, on which one can compute distance-based measures such as Euclidean distance or cosine similarity. Lastly, deep learning models that use the transformer architecture, which holds attention mechanisms in its layers and helps facilitate transfer learning, have been referred to as the state-of-the-art methods in NLP and NLG [1]. The main difference is that RNNs such as LSTM networks pass the data through sequentially, whereas Transformer models process the input sequence in parallel, using encoders and decoders whose outputs are then transformed into probabilities with a softmax activation. …
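To make the distance measures and the softmax step concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made-up toy values, not real word2vec or GloVe embeddings, and the function names are my own for illustration.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy "embeddings" (assumed values, for illustration only).
king = [0.8, 0.3, 0.1]
queen = [0.7, 0.4, 0.1]
banana = [0.1, 0.1, 0.9]

print(cosine_similarity(king, queen))    # close to 1: similar direction
print(cosine_similarity(king, banana))   # much lower: dissimilar direction
print(euclidean_distance(king, queen))   # small distance
print(softmax([2.0, 1.0, 0.1]))          # probabilities over three candidates
```

With real embeddings the vectors would have hundreds of dimensions, but the calculations are the same.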


How to mitigate any surprises when landing your data science job…

Pixabay License

A quick search on Medium for the keywords “Data Science Interview” returns hundreds of articles that guide the reader through everything from the concepts covered to interviews at specific companies such as Tesla, Walmart, Twitter, Apple, AWS, etc.

What is fundamentally missing are more articles on how you as the interviewee can determine the company’s data science environment, expectations, and whether or not you can grow in this company to avoid a NIGHTMARE job.

Not everyone is going to land a data science job at a major company like Amazon, and you might have to work at a company that is just starting out in data science. Even so, you should have a good idea of what to expect during your time as an employee there and avoid any nasty surprises. …



I am brown. I am white.

I am tired.

These are the thoughts that race in my head after Cinco De Mayo at 4 in the morning.

Fire in my belly, heaviness in my heart, and no that is not from the nachos. We don’t eat nachos, we eat tamales, enchiladas, sopes, pozole, tacos de canasta…

Every year I have to educate someone on the history of Cinco De Mayo. To be honest, I knew it but then I forgot it from the ages of 15–25 when I was too busy living my Americanized life. But then 2016 happened and everything changed. I realized I am very Mexican; for God’s sake I was born in Mexico, I am the first generation. My skin is white, it burns under the sun and I have green eyes. To which uneducated Americans astoundingly respond, “You don’t look Mexican”; forgetting how colonized that country was and how large the country is. …


Plastic permeates every part of our lives, and the average westerner uses about 185 pounds of plastic a year. Even if you recycle, articles report that only 10–20% of it is actually recycled. This past year, a study in Geochemical Perspectives reported that microplastics have made their way even into the deepest part of our oceans: the Mariana Trench.

Just for context, I drew this visual of how deep that point is; it is the last place I would expect plastic to end up.

Microplastics found in Mariana Trench

With that in mind, it is up to us, the consumers, to dictate the demand for the plastics that multi-billion-dollar companies churn out for us. …


Travis Oliphant speaking at PyData DC 2018

This past weekend, I was lucky to attend and speak (for my first time at a conference) at PyData DC 2018. I have been an avid user of open source languages such as Python and R, but mainly Python. According to Stack Overflow, Python was the most visited tag in 2017 and was projected to rank higher than JavaScript in 2018. Nearing the end of 2018, it seems to have done just that.



Hi! My name is Mónica, and I am a volunteer for the Resistbot app, a chatbot that helps you contact your elected officials in the United States. Since last year, 4.3 million people have used our service through SMS, Facebook, Telegram, and Twitter.

This app was created by a group of experts who thought there should be a more efficient way to contact officials. Although the need for this kind of technology was obvious, the creators were pleasantly surprised when they gathered their first million users in just 180 days. …


Top 20 phrases from Resistbot’s ⭐️⭐️⭐️⭐️⭐️ reviews

My name is Monica, and I am a volunteer for Resistbot. We’ve been on the ride of our lives with many ups and downs over the past year, picking up 4.3 million users and handling over 10 million letters so far. Here’s more on what our users have written:

Our Facebook page now has over 37,000 likes and over 1,100 reviews. Of those reviews, around 400 are 5-star. I teamed up with a fellow volunteer, Natasha, and we scraped those reviews, tokenized them into bigrams using NLTK, and visualized them in a word cloud. You can find the NLTK code here. …
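For readers who have not seen bigram counting before, here is a rough sketch of the idea. This is not our actual code: it swaps NLTK for the standard library so it runs without extra installs, the sample reviews are invented, and the function names are hypothetical. With NLTK installed, `nltk.bigrams(tokens)` yields the same adjacent word pairs that `zip` produces here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))


def top_bigrams(reviews, n=20):
    """Count bigrams within each review and return the n most common."""
    counts = Counter()
    for review in reviews:
        counts.update(bigrams(tokenize(review)))
    return counts.most_common(n)


# Invented reviews, for illustration only.
sample_reviews = [
    "Easy to use and so helpful",
    "So easy to use, thank you",
    "Thank you for making this easy to use",
]

for pair, count in top_bigrams(sample_reviews, n=5):
    print(" ".join(pair), count)
```

Counting inside each review separately avoids creating a false bigram from the last word of one review and the first word of the next. The resulting counts are what a word cloud library would then scale into font sizes.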



A few things that confused me when I was writing my own first few functions were:

User Defined Functions (UDF) versus built-in functions

The first function you come across in Python is the print function, and it's famously used as print(“Hello World”) when you learn to code. Note: that is Python 3 syntax; for Python 2.7 you drop the parentheses and type print “Hello World”. Other functions I quickly came across were also built-in functions, like range(), sorted(), str(), and min(). These are already defined in Python itself, and all I had to do to use them was call them and pass arguments to them. A user defined function starts with the keyword def. It makes sense that it is called a user defined function because I could name the function whatever I wanted: I write def followed by a name, parentheses, and then a colon. See the image above, where I define a function called clean_room with def clean_room(room):. …
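Here is a small sketch of that difference in Python 3. The body of clean_room is a hypothetical placeholder I made up for this example; only the def clean_room(room): line comes from the text above.

```python
# Built-in functions: already defined by Python, so I only call them.
numbers = [3, 1, 2]
smallest = min(numbers)      # smallest item in the list
as_text = str(smallest)      # convert the number to a string
ordered = sorted(numbers)    # new list in ascending order
print("Hello World")


# User defined function: the keyword def, a name I choose,
# parentheses holding the parameter, and then a colon.
def clean_room(room):
    """Return a status message for the given room (hypothetical body)."""
    return "The " + room + " is clean."


# Calling it looks just like calling a built-in: name plus arguments.
message = clean_room("kitchen")
print(message)
```

The argument "kitchen" is passed in through the parameter room, the same way numbers is passed to min().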


imposter syndrome

This year I found out I had imposter syndrome. I learned it's a lifelong battle if you make it that way. It can be summarized as having feelings of fraudulence instead of being happy with one's own accomplishments.

So let's talk about the last time I felt imposter syndrome (which was pretty recent):

I have been using Python more in the last 6 months and have even used it a few times at work. The last time was a few days ago, when my supervisor asked if I could do some natural language processing on data we pulled from customers' conversations to see which things were mentioned most. …



I have used Google products for years. I remember watching those early commercials introducing Google Docs, Sheets, Slides, and then Google Drive. I quickly discerned that the powerful collaboration, seamless UX design, and ability to access my documents across my devices via Google Drive were going to be a huge game changer for everyone from small businesses to freelancers. Now even large companies are using these products. According to Business Insider, 4M organizations are paying for the G Suite product.

At my current company, Upside, we have databases that store an immense amount of information about our customers, customer interactions, hotel/flight inventory, etc. However, not everything can be or needs to be stored in a database. When that is the case, most people use Google Sheets to store one-off data that needs to be quickly and easily accessible and that may need to be shared with external users. …

About

Monica P.

Data Scientist graduating with a MS in Data Science May 2021. Former Women Who Code Director. www.monicapuerto.com
