27 Python Data Science Libraries in Practice (with Code)

數據分析那些事 · Published in 數據分析那些事 · Nov 28, 2023 · 41 min read

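To give readers a first look at the Python libraries most commonly used in artificial intelligence, and to help them choose the ones that fit their needs, this article gives a brief but broad tour of the AI libraries in common use today.

Source: THU資料派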

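1. NumPy

NumPy (Numerical Python) is an extension library for Python that supports large multidimensional arrays and matrices and ships a large collection of mathematical functions for operating on them. Its core is written in C, and an array stores its values directly rather than pointers to Python objects, so array operations run far faster than the equivalent pure-Python code. The example below compares the time taken to compute the sine of every element of a list in pure Python and with NumPy: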

import numpy as np
import math
import time

# Pure Python: apply math.sin element by element
start = time.time()
for i in range(10):
    list_1 = list(range(1,10000))
    for j in range(len(list_1)):
        list_1[j] = math.sin(list_1[j])
print("Pure Python took {}s".format(time.time()-start))

# NumPy: one vectorized call over the whole array
start = time.time()
for i in range(10):
    list_1 = np.arange(1,10000)
    list_1 = np.sin(list_1)
print("NumPy took {}s".format(time.time()-start))

In the run reported below, the NumPy version is roughly ten times faster than the pure-Python code:

Pure Python took 0.017444372177124023s

NumPy took 0.001619577407836914s
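
The point that a NumPy array stores its values directly can be checked with a short sketch. The variable names are illustrative and not part of the original example; the sketch compares the array's single contiguous buffer with a Python list, whose slots hold references to separate int objects.

```python
import sys
import numpy as np

values = np.arange(1, 10000, dtype=np.int64)   # 9999 fixed-width integers
as_list = values.tolist()                      # the same numbers as Python int objects

# The array keeps raw 8-byte values in one contiguous buffer.
print(values.dtype, values.itemsize, values.nbytes)

# The list keeps one pointer per element; each int object lives elsewhere on the heap.
pointer_bytes = sys.getsizeof(as_list)
object_bytes = sum(sys.getsizeof(v) for v in as_list)
print(pointer_bytes, object_bytes)
```

Because every element has the same fixed size, a function such as np.sin can walk the buffer in compiled code without unpacking a Python object at each step.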

2. OpenCV

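OpenCV is a cross-platform computer vision library that runs on Linux, Windows, and Mac OS. It is lightweight and efficient: it consists of a set of C functions and a small number of C++ classes, provides a Python interface, and implements many general-purpose algorithms in image processing and computer vision. The code below tries a few simple filters, including mean smoothing, Gaussian blur, and bilateral filtering: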

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('h89817032p0.png')
kernel = np.ones((5,5),np.float32)/25
dst = cv.filter2D(img,-1,kernel)
blur_1 = cv.GaussianBlur(img,(5,5),0)
blur_2 = cv.bilateralFilter(img,9,75,75)
plt.figure(figsize=(10,10))
plt.subplot(221),plt.imshow(img[:,:,::-1]),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(222),plt.imshow(dst[:,:,::-1]),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.subplot(223),plt.imshow(blur_1[:,:,::-1]),plt.title('Gaussian')
plt.xticks([]), plt.yticks([])
plt.subplot(224),plt.imshow(blur_2[:,:,::-1]),plt.title('Bilateral')
plt.xticks([]), plt.yticks([])
plt.show()
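
To make the averaging step concrete, here is a minimal NumPy sketch of what a 5×5 mean kernel computes. It is not OpenCV's implementation: box_filter_valid is a hypothetical helper that only evaluates positions where the whole window fits inside the image, whereas cv.filter2D also produces values for the border pixels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_filter_valid(gray, k=5):
    """Mean of every k x k neighbourhood, only where the window fits entirely."""
    windows = sliding_window_view(gray.astype(np.float64), (k, k))
    return windows.mean(axis=(-2, -1))

# A small synthetic grayscale "image": a linear ramp of values 0..48.
gray = np.arange(49, dtype=np.float64).reshape(7, 7)
smoothed = box_filter_valid(gray)

print(smoothed.shape)  # (3, 3)
print(smoothed)
```

On a linear ramp the mean of a symmetric window equals its center pixel, so the result here matches gray[2:5, 2:5]; on a real photograph the same operation blurs edges and noise.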

3. Scikit-image

scikit-image is an image processing library built on SciPy that handles images as NumPy arrays. For example, it can change the scale of an image with functions such as rescale, resize, and downscale_local_mean.

from matplotlib import pyplot as plt
from skimage import data, color, io
from skimage.transform import rescale, resize, downscale_local_mean

image = color.rgb2gray(io.imread('h89817032p0.png'))

image_rescaled = rescale(image, 0.25, anti_aliasing=False)
image_resized = resize(image, (image.shape[0] // 4, image.shape[1] // 4),
                       anti_aliasing=True)
image_downscaled = downscale_local_mean(image, (4, 3))
plt.figure(figsize=(20,20))
plt.subplot(221),plt.imshow(image, cmap='gray'),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(222),plt.imshow(image_rescaled, cmap='gray'),plt.title('Rescaled')
plt.xticks([]), plt.yticks([])
plt.subplot(223),plt.imshow(image_resized, cmap='gray'),plt.title('Resized')
plt.xticks([]), plt.yticks([])
plt.subplot(224),plt.imshow(image_downscaled, cmap='gray'),plt.title('Downscaled')
plt.xticks([]), plt.yticks([])
plt.show()
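
As a rough illustration of what local-mean downscaling does, the NumPy sketch below averages non-overlapping blocks of pixels. block_mean is a hypothetical helper, not a scikit-image function; it crops the image to a multiple of the block size, which is a simplification of how the library treats image edges.

```python
import numpy as np

def block_mean(image, fy, fx):
    """Average non-overlapping fy x fx blocks of a 2-D array."""
    h, w = image.shape
    h2, w2 = h - h % fy, w - w % fx            # crop so the blocks tile exactly
    blocks = image[:h2, :w2].reshape(h2 // fy, fy, w2 // fx, fx)
    return blocks.mean(axis=(1, 3))

image = np.arange(16, dtype=np.float64).reshape(4, 4)
small = block_mean(image, 2, 2)
print(small)
# [[ 2.5  4.5]
#  [10.5 12.5]]
```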

4. PIL

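The Python Imaging Library (PIL) became the de facto standard image processing library for Python because it is powerful yet has a simple, easy-to-use API. However, PIL only supports Python up to 2.7 and has long been unmaintained, so a group of volunteers created a compatible fork called Pillow, which supports current Python 3.x releases and adds many new features. We can therefore skip PIL and install Pillow directly.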
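
Because Pillow keeps PIL's package name, code still imports from PIL after running pip install pillow. A minimal sketch, using an image created in memory so that no file on disk is assumed (the sizes and color are arbitrary):

```python
from PIL import Image

# Create a 256 x 128 solid-color RGB image in memory.
img = Image.new("RGB", (256, 128), (30, 144, 255))

# Resize it, then convert the result to 8-bit grayscale.
thumb = img.resize((64, 32))
gray = thumb.convert("L")

print(thumb.size, gray.mode)  # (64, 32) L
```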

5. Pillow

Generating a letter CAPTCHA image with Pillow:

from PIL import Image, ImageDraw, ImageFont, ImageFilter

import random

# Random uppercase letter:
def rndChar():
    return chr(random.randint(65, 90))

# Random color 1 (lighter, used for the background):
def rndColor():
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))

# Random color 2 (darker, used for the text):
def rndColor2():
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))

# 360 x 360 canvas:
width = 60 * 6
height = 60 * 6
image = Image.new('RGB', (width, height), (255, 255, 255))
# Create the Font object (the path must point to a TrueType font on your system):
font = ImageFont.truetype('/usr/share/fonts/wps-office/simhei.ttf', 60)
# Create the Draw object:
draw = ImageDraw.Draw(image)
# Fill every pixel with a random color:
for x in range(width):
    for y in range(height):
        draw.point((x, y), fill=rndColor())
# Draw the six letters:
for t in range(6):
    draw.text((60 * t + 10, 150), rndChar(), font=font, fill=rndColor2())
# Blur:
image = image.filter(ImageFilter.BLUR)
image.save('code.jpg', 'jpeg')

6. SimpleCV

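SimpleCV is an open-source framework for building computer vision applications. It gives access to high-performance computer vision libraries such as OpenCV without first having to learn about bit depths, file formats, color spaces, buffer management, eigenvalues, or matrix storage. Its Python 3 support, however, is very poor. Running the following code under Python 3.7: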

from SimpleCV import Image, Color, Display
# load an image from imgur
img = Image('http://i.imgur.com/lfAeZ4n.png')
# use a keypoint detector to find areas of interest
feats = img.findKeypoints()
# draw the list of keypoints
feats.draw(color=Color.RED)
# show the  resulting image. 
img.show()
# apply the stuff we found to the image.
output = img.applyLayers()
# save the results.
output.save('juniperfeats.png')

raises the error below, so using SimpleCV with Python 3 is not recommended:

SyntaxError: Missing parentheses in call to 'print'. Did you mean print('unit test')

7. Mahotas

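Mahotas is a fast computer vision library built on top of NumPy. It currently offers more than 100 image processing and computer vision functions, and the number keeps growing. The example below loads an image with Mahotas and operates on its pixels: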

import numpy as np
import mahotas
import mahotas.demos

from mahotas.thresholding import soft_threshold
from matplotlib import pyplot as plt
from os import path
f = mahotas.demos.load('lena', as_grey=True)
f = f[128:,128:]
plt.gray()
# Show the data:
print("Fraction of zeros in original image: {0}".format(np.mean(f==0)))
plt.imshow(f)
plt.show()

8. Ilastik

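Ilastik offers machine-learning-based bioimage analysis, using machine learning algorithms to segment, classify, track, and count cells or other experimental data with little effort. Most operations are interactive and require no machine learning expertise.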

9. Scikit-Learn

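Scikit-learn is a free machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN. Implementing KMeans clustering with Scikit-learn: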

import time

import numpy as np
import matplotlib.pyplot as plt

from sklearn.cluster import MiniBatchKMeans, KMeans
from sklearn.metrics.pairwise import pairwise_distances_argmin
from sklearn.datasets import make_blobs

# Generate sample data
np.random.seed(0)

batch_size = 45
centers = [[1, 1], [-1, -1], [1, -1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)

# Compute clustering with KMeans

k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
t0 = time.time()
k_means.fit(X)
t_batch = time.time() - t0

# Compute clustering with MiniBatchKMeans

mbk = MiniBatchKMeans(init='k-means++', n_clusters=3, batch_size=batch_size,
                      n_init=10, max_no_improvement=10, verbose=0)
t0 = time.time()
mbk.fit(X)
t_mini_batch = time.time() - t0

# Plot result
fig = plt.figure(figsize=(8, 3))
fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9)
colors = ['#4EACC5', '#FF9C34', '#4E9A06']

# We want to have the same colors for the same cluster from the
# MiniBatchKMeans and the KMeans algorithm. Let's pair the cluster centers per
# closest one.
k_means_cluster_centers = k_means.cluster_centers_
order = pairwise_distances_argmin(k_means.cluster_centers_,
                                  mbk.cluster_centers_)
mbk_means_cluster_centers = mbk.cluster_centers_[order]

k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers)
mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers)

# KMeans
for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], 'w',
            markerfacecolor=col, marker='.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
            markeredgecolor='k', markersize=6)
plt.title('KMeans')
plt.xticks(())
plt.yticks(())

plt.show()
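
The introduction to this section also mentions supervised models such as random forests. As a minimal sketch (the dataset, split ratio, and hyperparameters are choices made for this illustration and are not part of the original example), a classifier can be trained and scored like this:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and hold out 30% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Fit a random forest and report accuracy on the held-out samples.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("test accuracy: {:.3f}".format(accuracy))
```

The same fit and score calls apply to the other estimators named above, which is what makes it easy to swap models in Scikit-learn.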

10. SciPy

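SciPy provides many user-friendly and efficient numerical routines, for example for numerical integration, interpolation, optimization, and linear algebra. It also defines many special functions of mathematical physics, including elliptic functions, Bessel functions, the gamma and beta functions, hypergeometric functions, and parabolic cylinder functions. The example below uses Bessel functions to plot the shape of a vibrating drumhead: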


from scipy import special
import matplotlib.pyplot as plt
import numpy as np

def drumhead_height(n, k, distance, angle, t):
    kth_zero = special.jn_zeros(n, k)[-1]
    return np.cos(t) * np.cos(n*angle) * special.jn(n, distance*kth_zero)

theta = np.r_[0:2*np.pi:50j]
radius = np.r_[0:1:50j]
x = np.array([r * np.cos(theta) for r in radius])
y = np.array([r * np.sin(theta) for r in radius])
z = np.array([drumhead_height(1, 1, r, theta, 0.5) for r in radius])


fig = plt.figure()
ax = fig.add_axes(rect=(0, 0.05, 0.95, 0.95), projection='3d')
ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap='RdBu_r', vmin=-0.5, vmax=0.5)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_xticks(np.arange(-1, 1.1, 0.5))
ax.set_yticks(np.arange(-1, 1.1, 0.5))
ax.set_zlabel('Z')
plt.show()
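
The numerical routines listed in the introduction to this section can be sketched just as briefly. The functions integrated, interpolated, and minimized below are toy examples chosen so that the exact answers are known:

```python
import numpy as np
from scipy import integrate, interpolate, optimize

# Numerical integration: the integral of sin(x) over [0, pi] is exactly 2.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Interpolation: linear interpolation through samples of y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0])
f = interpolate.interp1d(xs, 2 * xs + 1)
mid = float(f(1.5))

# Optimization: minimize (x - 3)^2, whose minimum is at x = 3.
res = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

print(area, mid, res.x)
```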

11. NLTK

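NLTK is a library for building Python programs that work with natural language. It provides easy-to-use interfaces to more than 50 corpora and lexical resources such as WordNet, together with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength natural language processing (NLP) libraries. NLTK has been called “a wonderful tool for teaching, and working in, computational linguistics using Python”.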

import nltk
from nltk.corpus import treebank

# These resources must be downloaded before first use
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('treebank')

sentence = """At eight o'clock on Thursday morning Arthur didn't feel very good."""
# Tokenize
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)

# Identify named entities
entities = nltk.chunk.ne_chunk(tagged)

# Display a parse tree
t = treebank.parsed_sents('wsj_0001.mrg')[0]
t.draw()

12. spaCy

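spaCy is a free, open-source library for advanced NLP in Python. It can be used to build applications that process large volumes of text, to build information extraction or natural language understanding systems, or to preprocess text for deep learning.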

import spacy

texts = [
    "Net income was $9.4 million compared to the prior year of $2.7 million.",
    "Revenue exceeded twelve billion dollars, with a loss of $1b.",
]

nlp = spacy.load("en_core_web_sm")
for doc in nlp.pipe(texts, disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]):
    # Do something with the doc here
    print([(ent.text, ent.label_) for ent in doc.ents])

nlp.pipe yields Doc objects, so we can iterate over them and access the named entity predictions:

[('$9.4 million', 'MONEY'), ('the prior year', 'DATE'), ('$2.7 million', 'MONEY')]
[('twelve billion dollars', 'MONEY'), ('1b', 'MONEY')]

13. LibROSA

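librosa is a Python library for music and audio analysis. It provides the functions and building blocks needed to create music information retrieval systems.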

# Beat tracking example
import librosa

# 1. Get the file path to an included audio example
filename = librosa.example('nutcracker')

# 2. Load the audio as a waveform `y`
#    Store the sampling rate as `sr`
y, sr = librosa.load(filename)

# 3. Run the default beat tracker
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print('Estimated tempo: {:.2f} beats per minute'.format(tempo))

# 4. Convert the frame indices of beat events into timestamps
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
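
The last step is essentially an index-to-seconds conversion. A minimal NumPy sketch of that arithmetic, assuming librosa's documented defaults of a 22050 Hz sampling rate in librosa.load and a hop length of 512 samples; frames_to_seconds is a hypothetical helper and the frame indices are made up:

```python
import numpy as np

def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Convert frame indices to timestamps: each frame advances hop_length samples."""
    return np.asarray(frames) * hop_length / float(sr)

beat_frames = np.array([0, 43, 86])          # hypothetical beat positions, in frames
beat_times = frames_to_seconds(beat_frames)
print(beat_times)
```

With these defaults one frame lasts about 23 ms, so frame 43 falls just under the one-second mark.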

14. Pandas

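Pandas is a fast, powerful, flexible, and easy-to-use open-source tool for data analysis and manipulation. It can import data from many formats, such as CSV, JSON, SQL, and Microsoft Excel, and supports operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling. Pandas is widely used for data analysis in academia, finance, statistics, and other fields.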

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list("ABCD"))
df = df.cumsum()
df.plot()
plt.show()
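
The cleaning, merging, and selecting operations mentioned in the introduction to this section can be sketched on two tiny tables. The column names and values below are invented for the illustration:

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [20.0, np.nan, 35.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

# Cleaning: replace the missing amount with 0.
orders["amount"] = orders["amount"].fillna(0.0)

# Merging: attach each customer's region to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Selecting and aggregating: total amount per region.
totals = merged.groupby("region")["amount"].sum()
print(totals)
```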

15. Matplotlib

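Matplotlib is Python's plotting library. It provides a full set of command-style APIs similar to MATLAB's and can produce publication-quality figures. Matplotlib makes plotting straightforward and strikes a good balance between ease of use and performance. Plotting several curves with Matplotlib: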

# plot_multi_curve.py
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.1, 2 * np.pi, 100)
y_1 = x
y_2 = np.square(x)
y_3 = np.log(x)
y_4 = np.sin(x)
plt.plot(x,y_1)
plt.plot(x,y_2)
plt.plot(x,y_3)
plt.plot(x,y_4)
plt.show()

16. Seaborn

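Seaborn is a Python data visualization library that wraps Matplotlib in a higher-level API, which makes plotting easier. Seaborn should be seen as a complement to Matplotlib rather than a replacement.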

import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")

df = sns.load_dataset("penguins")
sns.pairplot(df, hue="species")
plt.show()

17. Orange

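Orange is open-source data mining and machine learning software that provides a range of components for data exploration, visualization, preprocessing, and modeling. Its attractive, intuitive interactive user interface is well suited to newcomers doing exploratory data analysis and visualization, while advanced users can also use it as a Python module for data manipulation and component development. Conveniently, Orange can be installed with pip: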

$ pip install orange3

Once it is installed, run the orange-canvas command at the command line to start the Orange graphical interface:

$ orange-canvas

After startup, the Orange graphical interface appears and is ready for use.

18. PyBrain

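PyBrain is a modular machine learning library for Python. Its goal is to offer flexible, easy-to-use, yet powerful algorithms for machine learning tasks, along with a variety of predefined environments for testing and comparing algorithms. PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. We will use a simple example to show how PyBrain is used by building a multilayer perceptron (MLP). First, we create a new feed-forward network object: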

from pybrain.structure import FeedForwardNetwork
n = FeedForwardNetwork()

Next, we construct the input, hidden, and output layers:

from pybrain.structure import LinearLayer, SigmoidLayer

inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)

To use these layers, we must add them to the network:

n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)

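Several input and output modules can be added. To compute forward passes and backpropagate errors, the network must know which layers are inputs and which are outputs. It also has to be told explicitly how the layers are connected. For this we use the most common connection type, the full connection, implemented by the FullConnection class: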

from pybrain.structure import FullConnection
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)

As with the layers, we must add the connections to the network explicitly:

n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)

All the elements are now in place. Finally, we call the .sortModules() method to make the MLP usable:

n.sortModules()

This call performs some internal initialization that is required before the network can be used.
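
To make the structure concrete, the NumPy sketch below computes what a 2-3-1 network of this shape does in a forward pass: a linear input layer, a sigmoid hidden layer, and a linear output layer joined by two weight matrices. It is not PyBrain code. The weights are random placeholders, the activate function here is a stand-in written for this illustration, and the sketch assumes there are no bias terms because no bias unit was added to the network.

```python
import numpy as np

rng = np.random.default_rng(0)
w_in_to_hidden = rng.standard_normal((3, 2))    # 2 inputs -> 3 hidden units
w_hidden_to_out = rng.standard_normal((1, 3))   # 3 hidden units -> 1 output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def activate(x):
    """Forward pass: linear input, sigmoid hidden layer, linear output."""
    x = np.asarray(x, dtype=float)              # the linear input layer passes x through
    hidden = sigmoid(w_in_to_hidden @ x)
    return w_hidden_to_out @ hidden             # the linear output layer adds no squashing

output = activate([1.0, 2.0])
print(output.shape)  # (1,)
```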

19. Milk

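Milk (Machine Learning Toolkit) is a machine learning toolkit for Python. It focuses on supervised classification and offers several classifiers, including SVMs, k-NN, random forests, and decision trees. It also performs feature selection, and these components can be combined to form different classification systems. For unsupervised learning, Milk supports k-means clustering and affinity propagation. Training a classifier with Milk: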

import numpy as np
import milk
features = np.random.rand(100,10)
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
learner = milk.defaultclassifier()
model = learner.train(features, labels)

# Now you can use the model on new examples:
example = np.random.rand(10)
print(model.apply(example))
example2 = np.random.rand(10)
example2 += .5
print(model.apply(example2))

20. TensorFlow

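TensorFlow is an end-to-end open-source machine learning platform with a comprehensive, flexible ecosystem. It is usually divided into TensorFlow 1.x and TensorFlow 2.x; the main difference is that TF 1.x uses static graphs, whereas TF 2.x uses dynamic graphs through eager mode. This section uses TensorFlow 2.x to show how to build a convolutional neural network (CNN).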

import tensorflow as tf

from tensorflow.keras import datasets, layers, models

# Load the data
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Preprocess: scale pixel values to the range [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0

# Build the model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

# Compile and train the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10, 
                    validation_data=(test_images, test_labels))

21. PyTorch

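PyTorch grew out of Torch. It shares Torch's underlying backend, but much of it was rewritten in Python, which makes it more flexible, adds support for dynamic graphs, and provides a Python interface.

# Import libraries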
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

# Build the model
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)  # output raw logits; nn.CrossEntropyLoss applies log-softmax itself
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)

# Loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Train the model
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

22. Theano

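Theano is a Python library, built on top of NumPy, for defining, optimizing, and efficiently evaluating mathematical expressions that involve multidimensional arrays. Computing a Jacobian matrix in Theano: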

import theano
import theano.tensor as T
x = T.dvector('x')
y = x ** 2
J, updates = theano.scan(lambda i, y,x : T.grad(y[i], x), sequences=T.arange(y.shape[0]), non_sequences=[y,x])
f = theano.function([x], J, updates=updates)
f([4, 4])
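
For y = x ** 2 the Jacobian is diagonal with entries 2 * x_i, so for the input [4, 4] the call above should return [[8, 0], [0, 8]]. The NumPy sketch below checks that value with central finite differences; numerical_jacobian is a hypothetical helper written for this illustration and is independent of Theano.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Approximate d func_i / d x_j with central differences."""
    x = np.asarray(x, dtype=float)
    y0 = func(x)
    jac = np.zeros((y0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (func(x + step) - func(x - step)) / (2 * eps)
    return jac

J = numerical_jacobian(lambda v: v ** 2, [4.0, 4.0])
print(np.round(J, 6))
```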

23. Keras

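Keras is a high-level neural network API written in Python that can run on top of TensorFlow, CNTK, or Theano. It was developed with a focus on fast experimentation, so that ideas can be turned into results with as little delay as possible.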

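import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Placeholder training data (an assumption: the original leaves x_train and y_train undefined).
# 1000 random samples with 100 features, and one-hot labels over 10 classes.
x_train = np.random.random((1000, 100))
y_train = np.eye(10)[np.random.randint(10, size=1000)]

# Build the model
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))

# Compile and train the model
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)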

24. Caffe

The official Caffe2 website states that Caffe2 is now part of PyTorch: the existing APIs will continue to work, but users are encouraged to use the PyTorch APIs instead.

25. MXNet

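MXNet is a deep learning framework designed for both efficiency and flexibility. It lets you mix symbolic and imperative programming to maximize efficiency and productivity. Building a handwritten digit recognition model with MXNet: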

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import autograd as ag
import mxnet.ndarray as F

# Load the data
mnist = mx.test_utils.get_mnist()
batch_size = 100
train_data = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_data = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)

# CNN model
class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(20, kernel_size=(5,5))
        self.pool1 = nn.MaxPool2D(pool_size=(2,2), strides = (2,2))
        self.conv2 = nn.Conv2D(50, kernel_size=(5,5))
        self.pool2 = nn.MaxPool2D(pool_size=(2,2), strides = (2,2))
        self.fc1 = nn.Dense(500)
        self.fc2 = nn.Dense(10)

    def forward(self, x):
        x = self.pool1(F.tanh(self.conv1(x)))
        x = self.pool2(F.tanh(self.conv2(x)))
        # 0 means copy over size from corresponding dimension.
        # -1 means infer size from the rest of dimensions.
        x = x.reshape((0, -1))
        x = F.tanh(self.fc1(x))
        x = F.tanh(self.fc2(x))
        return x
net = Net()
# Initialization and optimizer definition
# Use the GPU if one is available, otherwise fall back to the CPU
ctx = [mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()]
net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})

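# Train the model
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
softmax_cross_entropy_loss = gluon.loss.SoftmaxCrossEntropyLoss()

epoch = 10  # number of training epochs (undefined in the original; 10 is an assumed value)
for i in range(epoch):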
    # Reset the train data iterator.
    train_data.reset()
    for batch in train_data:
        data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        # Inside training scope
        with ag.record():
            for x, y in zip(data, label):
                z = net(x)
                # Computes softmax cross entropy loss.
                loss = softmax_cross_entropy_loss(z, y)
                # Backpropogate the error for one iteration.
                loss.backward()
                outputs.append(z)
        metric.update(label, outputs)
        trainer.step(batch.data[0].shape[0])
    # Gets the evaluation result.
    name, acc = metric.get()
    # Reset evaluation result to initial state.
    metric.reset()
    print('training acc at epoch %d: %s=%f'%(i, name, acc))

26. PaddlePaddle

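PaddlePaddle (飛槳) builds on Baidu's years of deep learning research and production use. It brings together a core deep learning framework for training and inference, a library of base models, end-to-end development kits, and a rich set of tool components, and it is described as China's first self-developed, full-featured, open-source industrial-grade deep learning platform. Implementing LeNet-5 with PaddlePaddle: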

# Import the required packages
import paddle
import numpy as np
from paddle.nn import Conv2D, MaxPool2D, Linear

## Build the network
import paddle.nn.functional as F

# Define the LeNet architecture
class LeNet(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        super(LeNet, self).__init__()
        # Create the convolution and pooling layers
        # First convolution layer
        self.conv1 = Conv2D(in_channels=1, out_channels=6, kernel_size=5)
        self.max_pool1 = MaxPool2D(kernel_size=2, stride=2)
        # Shape logic: pooling does not change the channel count, which is 6 here
        # Second convolution layer
        self.conv2 = Conv2D(in_channels=6, out_channels=16, kernel_size=5)
        self.max_pool2 = MaxPool2D(kernel_size=2, stride=2)
        # Third convolution layer
        self.conv3 = Conv2D(in_channels=16, out_channels=120, kernel_size=4)
        # Shape logic: the data is flattened from [B,C,H,W] to [B,C*H*W]
        # For a [28,28] input, C*H*W equals 120 after three convolutions and two poolings
        self.fc1 = Linear(in_features=120, out_features=64)
        # Fully connected layers: the first outputs 64 neurons, the second outputs one neuron per class label
        self.fc2 = Linear(in_features=64, out_features=num_classes)
    # Forward pass of the network
    def forward(self, x):
        x = self.conv1(x)
        # Each of the first two convolutions is followed by a sigmoid activation and then 2x2 pooling
        x = F.sigmoid(x)
        x = self.max_pool1(x)
        x = self.conv2(x)
        x = F.sigmoid(x)
        x = self.max_pool2(x)
        x = self.conv3(x)
        # Shape logic: flatten [B,C,H,W] -> [B,C*H*W]
        x = paddle.reshape(x, [x.shape[0], -1])
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        return x
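
The comment in the constructor says that a 28×28 input is reduced to 120 features before the first fully connected layer. That arithmetic can be checked in a few lines of plain Python. conv_out is a hypothetical helper, and the sketch assumes stride 1 and no padding for the convolutions, as in the layer definitions above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5)             # conv1     -> 24
size = conv_out(size, kernel=2, stride=2)   # max_pool1 -> 12
size = conv_out(size, kernel=5)             # conv2     -> 8
size = conv_out(size, kernel=2, stride=2)   # max_pool2 -> 4
size = conv_out(size, kernel=4)             # conv3     -> 1

flattened = 120 * size * size               # 120 output channels of conv3
print(size, flattened)  # 1 120
```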

27. CNTK

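CNTK (Cognitive Toolkit) is a deep learning toolkit that describes a neural network as a series of computational steps in a directed graph. In this graph, leaf nodes represent input values or network parameters, and the other nodes represent matrix operations on their inputs. CNTK makes it easy to implement and combine popular model types such as CNNs. CNTK describes a neural network with the network description language (NDL). Put simply, you describe the input features, the input labels, some parameters, the computations that relate the parameters to the inputs, and which nodes are the targets.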

NDLNetworkBuilder=[
    
    run=ndlLR
    
    ndlLR=[
      # sample and label dimensions
      SDim=$dimension$
      LDim=1
    
      features=Input(SDim, 1)
      labels=Input(LDim, 1)
    
      # parameters to learn
      B0 = Parameter(4) 
      W0 = Parameter(4, SDim)
      
      
      B = Parameter(LDim)
      W = Parameter(LDim, 4)
    
      # operations
      t0 = Times(W0, features)
      z0 = Plus(t0, B0)
      s0 = Sigmoid(z0)   
      
      t = Times(W, s0)
      z = Plus(t, B)
      s = Sigmoid(z)    
    
      LR = Logistic(labels, s)
      EP = SquareError(labels, s)
    
      # root nodes
      FeatureNodes=(features)
      LabelNodes=(labels)
      CriteriaNodes=(LR)
      EvalNodes=(EP)
      OutputNodes=(s,t,z,s0,W0)
    ]
]
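
Read as ordinary matrix algebra, the network above is one hidden layer of four sigmoid units followed by a sigmoid output. The NumPy sketch below mirrors the same node names. It assumes an input dimension of 2 in place of $dimension$, random placeholder parameters, and the usual cross-entropy form for the Logistic criterion; these are assumptions made for the illustration and are not taken from the CNTK configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
SDim, LDim = 2, 1                          # assumed sample and label dimensions

features = rng.standard_normal((SDim, 1))  # one sample as a column vector
labels = np.array([[1.0]])

# Parameters to learn (random placeholders here).
W0 = rng.standard_normal((4, SDim))
B0 = rng.standard_normal((4, 1))
W = rng.standard_normal((LDim, 4))
B = rng.standard_normal((LDim, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Operations, in the same order as the NDL block.
s0 = sigmoid(W0 @ features + B0)           # t0, z0, s0
s = sigmoid(W @ s0 + B)                    # t, z, s

# Criterion and evaluation nodes.
LR = -(labels * np.log(s) + (1 - labels) * np.log(1 - s)).sum()
EP = ((labels - s) ** 2).sum()
print(s.shape, LR, EP)
```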

