Data Science Progress — 6 weeks
I have a clear goal: to become a Data Scientist specialized in Bio-Sciences. Following a colleague's advice, I decided to imagine where I want to work and take advantage of being at the beginning of my career. He once said, "Commit to something and go towards a specific goal."
It is time to invest my time in taking courses, a lot of courses, writing a shining project, creating my personal brand and brushing up my skills, combining it all with my previous experience.
On January 26th I started a bootcamp on Data Science at the school “The Bridge” in Madrid.
It was exciting, and I have to say that I had not realized how much I missed interacting with others. I guess "remote working" has its pros and cons.
THE CURRICULUM
The course started with a so-called "Ramp-up" to introduce us to the basics of Python and homogenize the level of the class.
The topics we covered are the following:
PYTHON
0. Installation of Anaconda and Jupyter notebooks
- Markdown language in Jupyter
- Variables, print, comments, the del statement, data types, data type conversion, input and the None type
- Basic arithmetic operations, comparison operations, boolean algebra, built-in functions, methods, lists
- Workflow using loops (for, while), conditionals (if/elif/else), break/continue and try/except
- Collections: lists, tuples, dictionaries and sets. Operations, methods and converting between collections
- Functions: Definition, syntax, applications
- Object-oriented programming: Classes, attributes, constructor, methods, documentation
- Libraries and modules.
- Functional programming: Map, reduce, filter, timeit
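As a taste of what the Ramp-up covered, here is a minimal sketch (the function and data are invented for illustration) combining a loop with try/except, and the functional tools map, filter and reduce:

```python
from functools import reduce

def safe_divide(values, divisor):
    """Divide each value by divisor, skipping anything non-numeric."""
    results = []
    for v in values:
        try:
            results.append(v / divisor)
        except TypeError:
            continue  # skip strings, None, etc.
    return results

numbers = [1, 2, "three", 4, None]
halved = safe_divide(numbers, 2)            # [0.5, 1.0, 2.0]
total = reduce(lambda a, b: a + b, halved)  # 3.5

# filter keeps only the elements where the predicate is True
evens = list(filter(lambda n: n % 2 == 0, [1, 2, 3, 4]))  # [2, 4]
```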
SQL
I should note that we used SQLite, which differs slightly from standard SQL. We worked with it inside Jupyter notebooks, using Python libraries and other functionalities.
0. Configure the environment
- Data model
- Queries: SELECT, LIMIT, DISTINCT, WHERE, ORDER BY, GROUP BY, JOIN
- Accessing relational Databases
- Creating relational Databases
- NoSQL databases
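The workflow above can be sketched with Python's built-in sqlite3 module, which lets you run queries from a notebook cell (the table and data here are invented for illustration):

```python
import sqlite3

# In-memory SQLite database, like the ones we queried from a notebook
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE patients (name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("Ana", 34), ("Luis", 41), ("Marta", 29)],
)

# A query combining SELECT, WHERE and ORDER BY
cur.execute("SELECT name, age FROM patients WHERE age > 30 ORDER BY age")
rows = cur.fetchall()  # [('Ana', 34), ('Luis', 41)]
conn.close()
```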
MATHEMATICS
Calculus
- Functions
- Derivatives
- Optimization: least squares
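A minimal sketch of least squares in practice: fitting a straight line y = a·x + b by minimizing the squared error. The data points are made up so that the answer is known in advance:

```python
import numpy as np

# Points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# np.polyfit solves the least-squares problem for a degree-1 polynomial
a, b = np.polyfit(x, y, deg=1)  # a ≈ 2.0, b ≈ 1.0
```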
Algebra
Working with Numpy
- Matrices
- Identity matrix, Transpose and inverse
- Vectors
- Operations with matrices & vectors
- Dot product
- Similarity measurement (cosine)
- Linear combinations
Statistics
- Descriptive statistics
- Mean, standard deviation
- Position statistics
- Histograms and frequencies
- Relations between numerical variables
- Exploratory data analysis
- Interpretation and presentation of data
- Inferential statistics
- Probability
- Random variables
- Probability distributions
- Normal distributions
- Confidence intervals
- Absolute error
- Sample size
- Hypothesis testing
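As a small sketch of the inferential part, here is a 95% confidence interval for a sample mean using the normal approximation (z = 1.96); the sample values are invented for illustration:

```python
import math
import statistics

sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]

mean = statistics.mean(sample)                          # 5.0
sem = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error

# 95% interval: mean ± z * standard error
z = 1.96
ci = (mean - z * sem, mean + z * sem)
```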
After all this we started the "Data Analysis" module, covering the following:
NUMPY
- Lists and matrices
- Numpy
- Arrays
- Array attributes
- Indexing
- Slicing, subarrays
- Reshape
- Type of data
- Concatenate
- Substitution
- Copy
- Split
- Aggregations
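Many of the items above show up in this small sketch (the array values are arbitrary), covering reshape, indexing, slicing, concatenation, splitting and aggregation:

```python
import numpy as np

a = np.arange(12)        # [0, 1, ..., 11]
m = a.reshape(3, 4)      # 3 rows x 4 columns

first_row = m[0]         # [0, 1, 2, 3]
sub = m[1:, :2]          # rows 1-2, first two columns (a slice)
col_sums = m.sum(axis=0) # aggregate down each column: [12, 15, 18, 21]

# Concatenate two copies, then split them back apart
stacked = np.concatenate([m, m], axis=0)   # shape (6, 4)
top, bottom = np.split(stacked, 2, axis=0)
```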
PANDAS
- Introduction to pandas objects
- Series
- DataFrame
- Index
- DataFrame dimensions, columns, first observations, data types, missing values and statistics
- Indexing ( loc, iloc)
- Slicing
- Masking
- Fancy indexing
- Missing values: None, np.nan, null
- Aggregation and grouping: GroupBy (aggregate, filter, transform, apply, applymap, map)
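A minimal sketch of masking and GroupBy with an invented DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog"],
    "weight": [4.0, 10.0, 5.0, 12.0],
})

# GroupBy + aggregate: mean weight per species
means = df.groupby("species")["weight"].mean()
# cat -> 4.5, dog -> 11.0

# Masking: boolean condition selects matching rows
heavy = df[df["weight"] > 6]
```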
All this content is both theoretical and practical. As the saying goes, "practice makes perfect".
Reflecting on all this content, I am quite proud of what I have been able to learn in these 6 weeks. It has been very productive, and even if I do not yet master every skill, I would say I have increased my Python skills by 500%.
I also have to say that I was very strict during the first weeks, and it did me no good to be so focused that I would not go out or meet any friends or family. This is a long-distance race, and I need to keep a good balance between effort and leisure time.
I will keep learning and posting my progress. I am looking forward to working with real-life datasets and applying all my new knowledge to everyday problems.
See you in the next post! Thanks for reading!