Week 6.5

We are halfway through the program with General Assembly, and we've learned a lot over the past 6 weeks. We started with Python fundamentals and worked our way through functions, iteration, and NumPy, along with command line basics.

We spent the following two weeks continuing with NumPy, picking up pandas and SciPy, and trying out different plotting techniques. Linear regression was our first machine learning model. We explored independent and dependent variables and defined the difference between bias and variance. After covering these basic concepts and getting a re-introduction to statistics, we learned how to split our dataset into training and test sets so we could begin building predictive models, and we learned what it means for those models to overfit or underfit. Regularization was the next lesson: we learned how techniques such as Ridge, Lasso, and Elastic Net penalize large coefficients to reduce overfitting, and how too strong a penalty can push a model toward underfitting.
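To make the train/test split and the effect of regularization concrete, here is a minimal sketch in plain Python. It is not the code from class: the data is made up, the helper names (`train_test_split`, `fit_ridge_slope`, `mean_squared_error`) are my own, and the model is simplified to a single feature with no intercept so that the Ridge solution fits on one line.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_ridge_slope(rows, alpha):
    """Closed-form Ridge slope for y ~ w * x (one feature, no intercept).

    alpha = 0 gives ordinary least squares; a larger alpha shrinks w toward 0.
    """
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, y in rows)
    return sxy / (sxx + alpha)


def mean_squared_error(rows, w):
    """Average squared prediction error of the slope w on the given rows."""
    return sum((y - w * x) ** 2 for x, y in rows) / len(rows)


# Made-up data (an assumption for illustration): y is roughly 2 * x plus noise.
noise = random.Random(42)
data = [(x, 2.0 * x + noise.gauss(0, 1)) for x in range(1, 21)]

train, test = train_test_split(data)

# Fit on the training rows only, then score on the held-out test rows.
for alpha in (0.0, 10.0, 1000.0):
    w = fit_ridge_slope(train, alpha)
    print(alpha, round(w, 3), round(mean_squared_error(test, w), 3))
```

The slope is fit on the training rows only and scored on the held-out rows, which is the point of the split. As `alpha` grows the slope shrinks toward zero, and a very large penalty ends up underfitting. In practice we used the ready-made versions of these tools rather than writing them by hand.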

Week 4 moved into more advanced modeling topics, such as logistic regression, evaluating model fit, and model tuning. We determined which independent variables were relevant to and correlated with the dependent variable, then continued to clean up our data and apply feature selection in our predictive models. Classification was introduced as well, and web scraping was taught as a technique for gathering data from sources across the web.

In week 5, we began learning SQL and SQL databases. We queried data from our terminal and learned how to use Python to run SQL queries by connecting remotely to Postgres. The pipeline method in sklearn was also introduced, and we were shown how quickly it can streamline the modeling process. To end week 5, a data scientist was brought in to talk about her work experience in the data world. Her data journey was very impressive: using data science to catch credit card fraud and identity theft, to locate mines and missiles more accurately for submarines, and even to locate sex, arms, and human traffickers in the deep web using NLP techniques.

Week 6 was a big week for us. We covered many advanced topics, including natural language processing, decision trees, APIs and JSON, and model comparison. Decision trees led into random forests and boosting toward the end of the week. We ended the week with a panel of data-driven professionals: two data scientists and one data engineer. We had a chance to hear about their respective careers and talk with them to gain more insight into how data is used at their companies.