【Python Data Science Tutorial】Environment Setup, NumPy and Pandas Basics: Data Science with Python

JiunYi Yang (JY)
Data Agent
4 min read · Feb 11, 2021


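Whether you want to sharpen your job skills or are simply curious about data, you can easily get started with data analysis in Python. At the end of the article we provide Colab templates for you to practice with.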

*What is Colab?
>> Colaboratory ("Colab" for short) lets you write and run Python in your browser, with the following advantages:

  • No setup required
  • Free access to GPUs
  • Easy sharing

Tutorial Outline

  1. Environment Setup
  2. Data Processing
  3. Exploratory / Statistical Data Analysis
  4. Feature Engineering: Feature Selection
  5. Machine Learning Model Training
    • Supervised Learning
      • Classification
        • KNN
      • Regression
    • Unsupervised Learning
      • Clustering
      • Association Rule Learning
  6. Deep Learning Model Training
    • Time Series Data
      • LSTM
      • GRU
    • Natural Language Processing (NLP)
    • Image Recognition
    • Other

1. Environment Setup

Before writing Python code for data processing, we first need to set up the environment:

Windows

Install Anaconda (which bundles Python):

  1. Download the Anaconda installer
  2. Open “Anaconda Prompt”
  3. Enter conda list to confirm that the installation succeeded
  4. Enter python to check your current Python version (enter quit() to exit the Python shell)

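Once the installation is complete, first create a virtual environment, then install the packages you need:

  1. Open “Anaconda Prompt”
  2. Create a virtual environment:
    enter conda create --name env_name python=3.7 anaconda
    • anaconda: this trailing argument installs the anaconda metapackage, so the new environment automatically includes Anaconda's default packages
  3. Activate the virtual environment:
    conda activate env_name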

When the prompt prefix changes to (env_name), the environment has been activated successfully.
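To double-check from inside Python that the activated environment is the one actually being used, a small script like the following can be run after conda activate env_name. This is a minimal sketch: the helper name environment_report is hypothetical, and it only reads sys.version, sys.executable and the __version__ attributes that NumPy and pandas expose.

```python
import sys

import numpy as np
import pandas as pd


def environment_report() -> dict:
    """Collect the interpreter and library versions visible in the active environment (hypothetical helper)."""
    return {
        "python": sys.version.split()[0],  # e.g. "3.7.x" inside env_name
        "executable": sys.executable,      # path should point inside the env_name folder
        "numpy": np.__version__,
        "pandas": pd.__version__,
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

If the executable path points at the base Anaconda installation instead of env_name, the environment was probably not activated in that prompt.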

2. Data Processing

NumPy Basics

  • Mathematical and logical operations on arrays
  • Linear algebra operations
  • Random number generation, and more
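The three capabilities listed above can be illustrated in a few lines. This is a sketch with made-up arrays; np.random.default_rng assumes NumPy 1.17 or newer (older code often uses np.random.randn instead).

```python
import numpy as np

# Mathematical and logical operations apply element-wise
a = np.array([1.0, 2.0, 3.0])
squared = a ** 2                          # array([1., 4., 9.])
is_between = np.logical_and(a > 1, a < 3)  # array([False,  True, False])

# Linear algebra: solve the system A @ x = b, then check it with a matrix product
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
product = A @ x                           # should reproduce b

# Random numbers: a seeded Generator makes results reproducible
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
```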

NumPy 1D Arrays

1. Broadcast operations: arithmetic on an ndarray is broadcast to every element, whereas a list does not support this

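import numpy as np
example_list = [45, 69, 94, 40, 694, 596, 504]
example_array = np.array(example_list)
example_array + 10  # adds 10 to every element; example_list + 10 raises TypeError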
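Broadcasting goes beyond adding a scalar. NumPy compares shapes starting from the last dimension, and two sizes are compatible when they are equal or one of them is 1. A minimal sketch of that rule follows; the column and row arrays are illustrative values, not from the original.

```python
import numpy as np

example_array = np.array([45, 69, 94, 40, 694, 596, 504])

# A scalar is broadcast to every element
plus_ten = example_array + 10

# Shapes (3, 1) and (4,) broadcast to (3, 4): the size-1 axis is stretched
column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])           # shape (4,)
grid = column + row                    # shape (3, 4)

# A plain list has no broadcasting: + means concatenation, so list + int fails
try:
    [45, 69] + 10
except TypeError as err:
    list_error = str(err)
```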

2. Conditional selection: an ndarray can be filtered with a boolean condition; a list cannot

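mask = example_array > 50  # comparing a list with an int raises TypeError
example_array[mask]  # keeps only the elements greater than 50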
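Several conditions can be combined in one filter. Python's and/or keywords do not work element-wise on arrays (they raise ValueError), so NumPy code uses &, | and ~ with each comparison in parentheses. A short sketch reusing the values from above:

```python
import numpy as np

example_array = np.array([45, 69, 94, 40, 694, 596, 504])

# Combine element-wise conditions with & (and), | (or), ~ (not);
# parentheses are required because & binds tighter than > and <
mid_range = example_array[(example_array > 50) & (example_array < 600)]
extremes = example_array[(example_array < 50) | (example_array > 600)]

# np.where keeps the array shape and picks a value per element
labels = np.where(example_array > 50, "high", "low")

# Count matches: True counts as 1 when summed
n_above_50 = int((example_array > 50).sum())
```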

NumPy 2D Arrays

3. Slicing: an ndarray can be sliced along every axis at once to select a range; a nested list cannot be sliced across both dimensions

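example_list = [[5, 6, 7, 8, 9],
                [7, 8, 9, 10, 11],
                [9, 10, 11, 12, 13]]
example_array = np.array(example_list)
# example_list[:, 1:4] raises TypeError: list indices must be integers or slices, not tuple
example_array[:, 1:4]  # every row, columns 1 to 3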
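One behavior of slicing is easy to miss: a basic slice of an ndarray is a view that shares memory with the original, so writing into it changes the source array (list slices, by contrast, are copies). A sketch using the same 2D values:

```python
import numpy as np

example_array = np.array([[5, 6, 7, 8, 9],
                          [7, 8, 9, 10, 11],
                          [9, 10, 11, 12, 13]])

# Basic slicing returns a view that shares memory with example_array
view = example_array[:, 1:4]
view[0, 0] = 100            # also changes example_array[0, 1]

# .copy() gives an independent array
independent = example_array[:, 1:4].copy()
independent[0, 0] = -1      # example_array is unchanged

# Steps and negative indices work per axis
every_other_col = example_array[:, ::2]   # columns 0, 2, 4
last_row = example_array[-1]
```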

4. Array operations: ndarrays can be multiplied element-wise (broadcasting where shapes allow); lists cannot

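multiplier = [34, 78, 90, 5, 9]
# example_list * multiplier raises TypeError: can't multiply sequence by non-int of type 'list'
example_array * np.array(multiplier)  # each row is multiplied element-wise by multiplier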
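Note that * is element-wise, not matrix multiplication. For the dot product NumPy provides the @ operator (or np.dot). A sketch contrasting the two with the same arrays:

```python
import numpy as np

example_array = np.array([[5, 6, 7, 8, 9],
                          [7, 8, 9, 10, 11],
                          [9, 10, 11, 12, 13]])   # shape (3, 5)
multiplier = np.array([34, 78, 90, 5, 9])          # shape (5,)

# * is element-wise: multiplier is broadcast across each of the 3 rows
elementwise = example_array * multiplier           # shape (3, 5)

# @ is the matrix (dot) product: (3, 5) @ (5,) -> (3,)
weighted_sums = example_array @ multiplier

# Shapes that cannot be broadcast raise ValueError
try:
    example_array * np.array([1, 2, 3])
except ValueError as err:
    shape_error = str(err)
```

Each entry of weighted_sums equals the sum of the corresponding row of elementwise.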

5. Basic statistics functions

  • Average: np.mean()
  • Median: np.median()
  • Standard deviation: np.std()
  • Pearson’s correlation: np.corrcoef()
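One detail worth knowing about these functions: np.std computes the population standard deviation by default (ddof=0), while pandas' Series.std uses the sample standard deviation (ddof=1), so the two can disagree on the same data. Also, np.corrcoef returns a correlation matrix rather than a single number. A sketch with made-up values:

```python
import numpy as np
import pandas as pd

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# np.std defaults to ddof=0: squared deviations are divided by n
population_std = np.std(values)           # 2.0
# ddof=1 divides by n - 1 (sample standard deviation)
sample_std = np.std(values, ddof=1)       # about 2.138
# pandas uses ddof=1 by default, so it matches sample_std
pandas_std = pd.Series(values).std()

# np.corrcoef returns a 2x2 matrix here; the off-diagonal entry is Pearson's r
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
r = np.corrcoef(x, y)[0, 1]
```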

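In a 1D array:

def func(x, axis):
    """Print the mean, median, standard deviation and Pearson correlation of x along axis."""
    print(np.mean(x, axis=axis))
    print(np.median(x, axis=axis))
    print(np.std(x, axis=axis))
    # np.corrcoef has no axis argument; rowvar=False treats each column as a variable
    print(np.corrcoef(x, rowvar=(axis == 1)), '\n')

func(np.array([45, 69, 94, 40, 694, 596, 504]), axis=0)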

In a 2D array:

  • axis=0: operate vertically, down each column (one result per column)
  • axis=1: operate horizontally, across each row (one result per row)

example_array = np.array(example_list)
print('vertical (axis=0): ')
func(example_array, axis=0)
print('horizontal (axis=1): ')
func(example_array, axis=1)
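A quick way to remember the axis argument is to look at the output shape: the axis you pass is the one that disappears. A sketch with the 2D array from above, where the expected values can be checked by hand:

```python
import numpy as np

example_array = np.array([[5, 6, 7, 8, 9],
                          [7, 8, 9, 10, 11],
                          [9, 10, 11, 12, 13]])   # shape (3, 5)

# axis=0 collapses the rows: one result per column -> shape (5,)
column_means = np.mean(example_array, axis=0)     # [7., 8., 9., 10., 11.]
# axis=1 collapses the columns: one result per row -> shape (3,)
row_means = np.mean(example_array, axis=1)        # [7., 9., 11.]
# No axis: reduce over every element
overall_mean = np.mean(example_array)             # 9.0
```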

Pandas Basics

Two main data structures:

  • Series
  • DataFrame

Key functionality:

  • slicing, indexing and subsetting
  • groupby
  • reshape
  • pivot_table
  • merge, join, concat, etc.
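Several items on this list can be sketched on one small table. The sales and managers DataFrames below are hypothetical data made up for illustration; the calls themselves (loc, groupby, pivot_table, merge, concat) are standard pandas methods.

```python
import pandas as pd

# Hypothetical sales records used only for illustration
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Alice", "Bob"],
})

# Subsetting: boolean filter on rows, then select columns
north = sales.loc[sales["region"] == "North", ["quarter", "revenue"]]

# groupby: total revenue per region
totals = sales.groupby("region")["revenue"].sum()

# pivot_table (a reshape): regions as rows, quarters as columns
table = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum")

# merge: SQL-style join on a shared key column
with_manager = sales.merge(managers, on="region", how="left")

# concat: stack DataFrames with the same columns vertically
more = pd.DataFrame({"region": ["East"], "quarter": ["Q1"], "revenue": [60]})
combined = pd.concat([sales, more], ignore_index=True)
```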

Series

  1. Create a Series
import numpy as np
import pandas as pd

# create from ndarray
data = np.random.randn(5)
pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
# create from dictionary: the keys become the index
data = {
    'Facebook': 'Mark Zuckerberg',
    'Apple': 'Steve Jobs',
    'Amazon': 'Jeff Bezos',
    'Netflix': 'Reed Hastings',
    'Google': 'Larry Page'
}
pd.Series(data)

2. Array-like operations: a Series supports vectorized arithmetic and boolean filtering, just like an ndarray

data = np.random.randn(5)
ser = pd.Series(data)
ser[ser > ser.mean()]
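The main difference from an ndarray shows up when two Series are combined: pandas aligns values by index label, not by position, and labels present in only one Series produce NaN. A sketch with illustrative labels:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.random.randn(5))

# Vectorized math and NumPy functions work on a Series like on an ndarray
doubled = ser * 2
absolute = np.abs(ser)

# Arithmetic between Series aligns on index labels
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
aligned = s1 + s2                    # "a" and "d" have no partner, so they become NaN
filled = s1.add(s2, fill_value=0)    # treat a missing label as 0 instead
```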

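3. Dict-like operations: values can be read by index label, like dictionary keys

ser = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
ser['a']  # look up a value by label (requires a labeled index)
'a' in ser  # membership tests check the index labels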
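A few more dict-style habits carry over to a Series. This is a sketch; the label "z" is simply one that does not exist, and "f" is an arbitrary new label.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])

# Like dict.get: return a default instead of raising KeyError for a missing label
missing = ser.get("z", default=0.0)

# Assigning to a new label adds an entry, as with a dict
ser["f"] = 1.5

# Iterate over (label, value) pairs, like dict.items()
labels = [label for label, _ in ser.items()]
```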

DataFrame

  1. Create a DataFrame
# create from dict of Series
data = {
    'one': pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']),
    'two': pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']),
}
df = pd.DataFrame(data)
df
# create from dict of ndarrays/lists
data = {
    'one': np.random.randn(5),
    'two': np.random.randn(5)
}
df = pd.DataFrame(data)
df
# create from list of dicts (keys missing from a dict become NaN)
data = [{
    'a': 1,
    'b': 2
}, {
    'a': 3,
    'b': 4,
    'c': 6
}]
df = pd.DataFrame(data)
df
# create a MultiIndexed DataFrame from a dict of tuples
data = {
    ('top-1', 'medium-1'): {('I', 'i'): 1, ('I', 'ii'): 2},
    ('top-1', 'medium-2'): {('I', 'ii'): 1, ('I', 'iii'): 2},
    ('top-2', 'medium-1'): {('I', 'i'): 1, ('I', 'ii'): 2},
    ('top-2', 'medium-2'): {('I', 'i'): 1, ('I', 'iii'): 2}
}
df = pd.DataFrame(data)
df
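Once a DataFrame has MultiIndexed rows and columns, selection works level by level. A sketch on the same data: a top-level column label returns the sub-frame beneath it, and a pair of full tuples addresses a single cell. Row/column combinations absent from the nested dicts are filled with NaN.

```python
import pandas as pd

data = {
    ('top-1', 'medium-1'): {('I', 'i'): 1, ('I', 'ii'): 2},
    ('top-1', 'medium-2'): {('I', 'ii'): 1, ('I', 'iii'): 2},
    ('top-2', 'medium-1'): {('I', 'i'): 1, ('I', 'ii'): 2},
    ('top-2', 'medium-2'): {('I', 'i'): 1, ('I', 'iii'): 2},
}
df = pd.DataFrame(data)

# Selecting a top-level column label returns the sub-frame under it
top1 = df['top-1']                                   # columns: medium-1, medium-2

# A full (row tuple, column tuple) pair addresses a single cell
value = df.loc[('I', 'ii'), ('top-1', 'medium-1')]

# Count the NaN cells created by missing combinations
n_missing = int(df.isna().sum().sum())
```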

Colab Templates

1. Environment Setup
2–1. Numpy Basics
2–2. Pandas Basics

Support This Tutorial

More Python data science tutorials are on the way. If you enjoyed this one, don't forget to like it and turn on notifications!

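- If you found this helpful, please don't hold back on the claps; they keep me motivated to keep publishing!
- 5-10 claps: just passing through
- 10-40 claps: you liked the content
- 50 claps: you really want to see more articles in this category; we use clap counts to help choose future topics
- Fan page: Marketing Analytics. Follow it for knowledge and skills in digital advertising and data analysis!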


Master @ NCCU MIS. Founder @ jiunyiyang.me. Data Scientist & Digital Ads Optimizer. Feel free to PM me at: m.me/Abao.JiunYiYang or linkedin.com/in/jiunyiyang