From ISLR
Intro: Moving Beyond Linearity
The truth is almost never linear! But often the linearity assumption is good enough.
Alternatives that add flexibility while keeping ease of use & interpretability:
Create new variables X1 = X, X2 = X², etc., and then treat the problem as multiple linear regression.
We care more about the fitted function values at any value X0 than about the coefficients themselves
fitted function values →focus more on the new variables X1…
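The "new variables" idea above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the toy data, the helper names (`polynomial_features`, `fit_polynomial`, `predict`) and the choice to solve the normal equations by Gaussian elimination are mine, not from ISLR.

```python
def polynomial_features(x, degree):
    """Return [x, x**2, ..., x**degree], i.e. the new variables X1, X2, ..."""
    return [x ** d for d in range(1, degree + 1)]


def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y on (1, X1, ..., Xd) via the normal equations."""
    rows = [[1.0] + polynomial_features(x, degree) for x in xs]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, x0):
    """Fitted function value at x0 (usually of more interest than beta itself)."""
    features = [1.0] + polynomial_features(x0, len(beta) - 1)
    return sum(b * f for b, f in zip(beta, features))


# Hypothetical noise-free quadratic toy data: y = 1 + 2x - 0.5x^2
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x - 0.5 * x ** 2 for x in xs]
beta = fit_polynomial(xs, ys, degree=2)
fitted_at_x0 = predict(beta, 1.5)
```

The model stays linear in the coefficients, so ordinary multiple regression machinery applies; only the inputs are transformed.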

How to derive it
Linear Model Subset Selection (13:44)
Forward Stepwise Selection (12:26)
Backward Stepwise Selection (5:26)
Estimating Test Error — Mallow’s Cp, AIC, BIC, Adjusted R-squared (14:06)
Estimating Test Error — Cross-Validation (8:43)
Ridge Regression (12:37)
Lasso (15:21)
Tuning Parameters (5:27)
Dimension Reduction (4:45)
Pros of Linear…
Keys: Cross-validation & Bootstrap
1. Prediction Error and Validation Set
How do we test methods such as regression and classification?
Usually a new sample, but we don’t always have new data
→ 2 resampling methods: Cross-validation & the Bootstrap
Goal: how well your prediction method works → the test set error of a model
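A rough sketch of how k-fold cross-validation estimates the test set error when no new sample is available. Assumptions: the model is a simple one-variable least-squares line, the data are simulated, and the function names (`fit_simple_linear`, `k_fold_mse`) are made up for this note.

```python
import random


def fit_simple_linear(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_mse(xs, ys, k, seed=0):
    """Estimate test MSE: fit on k-1 folds, score on the held-out fold."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    total_squared_error = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_simple_linear(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        # Score only on observations the model did not see during fitting.
        for i in held_out:
            total_squared_error += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total_squared_error / len(xs)


# Simulated data (an assumption): y = 3 + 2x plus Gaussian noise
rng = random.Random(42)
xs = [float(x) for x in range(20)]
ys = [3.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
cv_mse = k_fold_mse(xs, ys, k=5)
```

Every observation is held out exactly once, so the estimate uses all of the data without ever scoring a point the model was fitted on.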

The Validation Process
Classification is used to produce discrete results, e.g. spam / not spam
Qualitative variables: unordered
More interested in ❤️❤️: the probabilities that X belongs to each category in C
e.g. the estimate of the probability that an insurance claim is fraudulent
2. Why do we need it?

What is linear regression?
Linear regression belongs to supervised learning.
The assumption of linear regression?
Assumption: dependence of Y on X1, X2…Xp is linear
*However, the true regression functions are never linear!
Advantages & disadvantages of linear regression?
Advantage:
Disadvantage:
Questions we might ask
Notes
Agenda
2.1 What is statistical learning?
2.1.1 Why estimate f?
2.1.2 How do we estimate f?
2.1.3 The trade-off between prediction accuracy and model interpretability
2.1.4 Supervised vs. unsupervised learning
2.1.5 Regression vs. classification problems
2.1 What is statistical learning?
Statistical learning involves 1. input variables 2. an output variable
Why learn statistical learning?
e.g. in a marketing campaign, the goal is to develop an accurate model to predict sales on the basis of the three media budgets
2.1.1 Why estimate f?
2 main reasons we wish to estimate f: 1. …
At the time of this experiment, Udacity courses currently have two options on the course overview page: “start free trial”, and “access course materials”.
If the student clicks “start free trial”, they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version of the course. After 14 days, they will automatically be charged unless they cancel first.
If the student clicks “access course materials”, they will be able to view the videos and take the quizzes for free, but they will not receive coaching support or a…
-How it works
-Real-life analogy
-Feature importance
-Difference between decision trees and random forests
-Important hyperparameters (predictive power, speed)
-How it works
Random forest, a supervised learning algorithm, is an ensemble of decision trees. Put simply: random forest builds multiple decision trees and merges them together to get an accurate and stable prediction. When splitting a node, it searches for the best feature among a random subset of features.
Benefits: Can be used for both classification & regression problems.
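A toy sketch of the mechanism described above: bootstrap samples, a random subset of features per tree, and a majority vote. Assumptions: depth-one trees (stumps) stand in for full decision trees to keep it short, the data and labels are invented, and the names (`fit_stump`, `fit_forest`, `predict_forest`) are hypothetical. A real implementation grows deeper trees and re-draws the feature subset at every split.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: predict the majority class everywhere
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        feature_ids = rng.sample(range(p), max_features)
        forest.append(
            fit_stump([rows[i] for i in sample],
                      [labels[i] for i in sample],
                      feature_ids)
        )
    return forest


def predict_forest(forest, row):
    """Merge the trees by majority vote (classification case)."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Invented two-feature data with two well-separated classes
rows = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
labels = ["low"] * 4 + ["high"] * 4
forest = fit_forest(rows, labels)
prediction = predict_forest(forest, [10, 10])
```

For regression, the vote would be replaced by an average of the trees' numeric predictions.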
-Real-life analogy
I decided where to go on vacation, my friends try to create rules to guide me based on my likes…
Agenda
Running A/B tests is an iterative process
Choosing invariant metrics
The course order change may change what courses visitors are enrolled in, so the time to complete a course will vary.
The number of events is a good invariant metric because the event is the unit of diversion, which is randomly assigned between the experiment and control groups
cookies are being explicitly randomized
user Id is typically larger than cookies since one user…
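One common way to check a count-based invariant metric is a sanity check on the split: if units are assigned at random with probability 0.5, the control group's share of the total should fall inside a confidence interval around 0.5. A minimal sketch, assuming a 50/50 split, a normal approximation, and purely hypothetical counts; the function name `sanity_check` is made up.

```python
import math


def sanity_check(control_count, experiment_count, p_expected=0.5, z=1.96):
    """Check whether the control share is consistent with random assignment.

    Returns (passed, (lower, upper), observed_share). z=1.96 gives an
    approximate 95% interval under the normal approximation.
    """
    total = control_count + experiment_count
    standard_error = math.sqrt(p_expected * (1 - p_expected) / total)
    margin = z * standard_error
    lower, upper = p_expected - margin, p_expected + margin
    observed = control_count / total
    return lower <= observed <= upper, (lower, upper), observed


# Hypothetical counts, not taken from any real experiment
balanced_ok, balanced_interval, balanced_share = sanity_check(5000, 5100)
skewed_ok, skewed_interval, skewed_share = sanity_check(5000, 5500)
```

A failed check suggests a problem with the experiment setup (for example in the diversion or the logging), so it should be investigated before looking at the evaluation metrics.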
by Tableau
The table below outlines some of the distinctions between each, and when you might want to use them.

Applications — Relationship vs. Join vs. Blend
Relationship: Use when you want to combine data from different levels of detail.
Join: Use when you want to add more columns of data across the same row structure.
Blend: Use when combining data from different levels of detail (which you can accomplish more powerfully with relationships).