How to Run YOLOv7 Human Pose Estimation in Google Colab

gary.TsAI (Taiwan A.I.)
2 min read · Sep 1, 2022


In this post we use Google Colab to try out YOLOv7, the latest release in the YOLO series, and easily run its state-of-the-art human pose estimation model.

If you are also interested in object detection and instance segmentation, check out the links below!

Before we get hands-on, here are a few stylish shots of Ip Man!

1. Inference on an arbitrary image
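The imports in the next step come from the YOLOv7 repository itself, so we need a clone of the repo on the Colab runtime before anything else. A minimal setup sketch, assuming a fresh runtime (the repo URL is the official one; the pip step simply installs its listed requirements):

!git clone https://github.com/WongKinYiu/yolov7.git
%cd yolov7
!pip install -r requirements.txt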

Let's run inference on a sample image! Next, download the pretrained weights. Here we use the largest pose model, "yolov7-w6-pose.pt".

!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6-pose.pt

First, import the required libraries.

import torch
import cv2
import numpy as np
import tqdm
import matplotlib.pyplot as plt  # used for display below; missing in the original
from torchvision import transforms
from utils.datasets import letterbox
from utils.general import non_max_suppression_kpt
from utils.plots import output_to_keypoint, plot_skeleton_kpts
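The functions below use model and device, which the original post never defines. A minimal loading sketch, assuming the released checkpoint stores the full model under the 'model' key (which is how the official pose weights are packaged):

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# The pose checkpoint bundles the whole model, not just a state_dict
weights = torch.load('yolov7-w6-pose.pt', map_location=device)
model = weights['model'].float().eval()
if torch.cuda.is_available():
    model = model.half().to(device)  # FP16 on GPU, matching the .half() inputs below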

Define a function that performs pose estimation.

%matplotlib inline

def pose_estimation(img):
    # Read the image (BGR) and letterbox-resize so the long side is 960,
    # padded to a multiple of the model stride (64)
    image = cv2.imread(img)
    image = letterbox(image, 960, stride=64, auto=True)[0]
    image = transforms.ToTensor()(image)
    image = torch.tensor(np.array([image.numpy()]))  # add a batch dimension
    if torch.cuda.is_available():
        image = image.half().to(device)
    with torch.no_grad():  # inference only, no gradients needed
        output, _ = model(image)
        output = non_max_suppression_kpt(output, 0.25, 0.65,
                                         nc=model.yaml['nc'],
                                         nkpt=model.yaml['nkpt'],
                                         kpt_label=True)
        output = output_to_keypoint(output)
    # Back to a uint8 HWC image for drawing
    nimg = image[0].permute(1, 2, 0) * 255
    nimg = nimg.cpu().numpy().astype(np.uint8)
    nimg = cv2.cvtColor(nimg, cv2.COLOR_BGR2RGB)  # OpenCV BGR -> RGB for matplotlib
    # Each row of output is one detection; keypoint triplets start at column 7
    for idx in range(output.shape[0]):
        plot_skeleton_kpts(nimg, output[idx, 7:].T, 3)
    plt.figure(figsize=(8, 8))
    plt.axis('off')
    plt.imshow(nimg)
    plt.show()

Finally, specify an image and run the function.

img = './inference/images/image.jpg'
pose_estimation(img)

We can see the skeletons rendered precisely on the Fast & Furious cast!

2. Inference on an arbitrary video

Define a function that loads a video and runs pose estimation on every frame.

def process_keypoints(video_file, model, output_video_path):
    video = cv2.VideoCapture(video_file)
    writer = _create_vid_writer(video, output_video_path)
    num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    pbar = tqdm.tqdm(total=num_frames, desc="inf")
    while video.isOpened():
        ret, frame = video.read()
        if frame is None:
            break
        pbar.update(1)

        # Letterbox each frame so the long side is 1280, padded to a stride of 64
        frame = letterbox(frame, 1280, stride=64, auto=True)[0]
        frame = transforms.ToTensor()(frame)
        frame = torch.tensor(np.array([frame.numpy()]))  # add a batch dimension
        frame = frame.to(device)
        if torch.cuda.is_available():
            frame = frame.half()  # FP16 only on GPU; .half() on CPU would break the model
        with torch.no_grad():  # inference only
            output, _ = model(frame)
            output = non_max_suppression_kpt(output, 0.25, 0.65,
                                             nc=model.yaml['nc'],
                                             nkpt=model.yaml['nkpt'],
                                             kpt_label=True)
            output = output_to_keypoint(output)
        nimg = frame[0].permute(1, 2, 0) * 255
        nimg = nimg.cpu().numpy().astype(np.uint8)
        # The frame is already BGR (straight from OpenCV), which is what
        # VideoWriter expects, so no color conversion is needed here

        for idx in range(output.shape[0]):
            plot_skeleton_kpts(nimg, output[idx, 7:].T, 3)

        writer.write(nimg)
        torch.cuda.empty_cache()

    pbar.close()
    video.release()
    writer.release()

Next, define a helper that creates the video writer for the output.

def _create_vid_writer(vid_cap, video_path):
    fps = vid_cap.get(cv2.CAP_PROP_FPS)
    # Note: the frame size is hardcoded to (1280, 768), which matches a 16:9
    # source after letterboxing to 1280 with stride 64; other aspect ratios
    # yield different sizes, and VideoWriter silently drops mismatched frames
    writer = cv2.VideoWriter(video_path,
                             cv2.VideoWriter_fourcc('m', 'p', '4', 'v'),
                             fps, (1280, 768))
    return writer
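If your video isn't 16:9, that hardcoded size won't match the letterboxed frames. A small sketch of a hypothetical replacement (not part of the original post) that computes the letterboxed size for any aspect ratio, assuming the target size is a multiple of the stride, as it is here:

def _letterboxed_size(vid_cap, new_shape=1280, stride=64):
    # Mirror what letterbox() does: scale the long side to new_shape,
    # then pad each dimension up to the next multiple of the stride
    w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    r = new_shape / max(w, h)
    w, h = int(round(w * r)), int(round(h * r))
    w = int(np.ceil(w / stride) * stride)
    h = int(np.ceil(h / stride) * stride)
    return (w, h)

Passing its result to cv2.VideoWriter instead of (1280, 768) keeps the writer and the frames in sync.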

Finally, specify a video and run the function.

video_file = './inference/pose/pose.mp4'
video_output = 'pose.mp4'
process_keypoints(video_file, model, video_output)
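To pull the finished video out of the Colab runtime, one option is Colab's built-in files helper:

from google.colab import files
files.download('pose.mp4')  # prompts the browser to download the result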

And just like that, we have pose estimation running on video!

To wrap up, here's a clip so we can study Ip Man's Wing Chun together!
