Function for Numeric data
min(): minimum value
max(): maximum value
avg(): average value
Variance:
- Population variance : var_pop()
- Sample variance: var_samp() / same as variance()
Standard deviation:
- Population Standard Deviation: stddev_pop()
- Sample Standard Deviation: stddev_samp() / stddev()
Round:
Truncate
This is very useful using with grouping:
Case study:
We got a data “stackoverflow” as below:
Summarize the distribution of the number of questions with the tag “dropbox” on Stack Overflow per day by binning the data.
We first create boundary: lower/upper
then count the question, combine the boundary with our data.
corr(): correlation between
Case study:
- Compute the mean (
avg()
) and median assets of Fortune 500 companies by sector. - Use the
percentile_disc()
function to compute the median
https://gist.github.com/ShawnXXycarb/7bd47b432d36cca3075f0bbc27dae5de
We can create temp table to simplify a query
While you can join the results of multiple similar queries together with UNION
, sometimes it's easier to break a query down into steps. You can do this by creating a temporary table and inserting rows into it.
Compute the correlations between each pair of profits
, profits_change
, and revenues_change
from the Fortune 500 data.
If you enjoy this article please click the thump up (Keep pressing can send multiple thump up!) and follow me, I’ll keep updating content about data science! Hope you like it.
Reference:
Christina Maimone: EXPLO RATO RY DATA ANALYSIS IN SQL, datacamp