How to Run YOLOv7 Human Pose Estimation in Google Colab

gary.TsAI (Taiwan A.I.)
2 min read · Sep 1, 2022


In this post we use Google Colab to try out YOLOv7, the latest release in the YOLO series, and easily run its state-of-the-art human pose estimation model.

If you are also interested in object detection and instance segmentation, check out the links below!

Before we get hands-on, here are a few stylish shots of Ip Man!

1. Inference on an arbitrary image
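The imports in the next step come from the YOLOv7 repository itself, so we need a clone of the repo on the Colab runtime before anything else. A minimal setup sketch, assuming a fresh runtime (the repo URL is the official one; the pip step simply installs its listed requirements):

!git clone https://github.com/WongKinYiu/yolov7.git
%cd yolov7
!pip install -r requirements.txt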

Let's run inference on a sample image! Next, download the pretrained weights. Here we use the largest pose model, "yolov7-w6-pose.pt".

!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6-pose.pt

First, import the required libraries.

import torch
import cv2
import numpy as np
import tqdm
import matplotlib.pyplot as plt  # used for display below; missing in the original
from torchvision import transforms
from utils.datasets import letterbox
from utils.general import non_max_suppression_kpt
from utils.plots import output_to_keypoint, plot_skeleton_kpts
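The functions below use model and device, which the original post never defines. A minimal loading sketch, assuming the released checkpoint stores the full model under the 'model' key (which is how the official pose weights are packaged):

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# The pose checkpoint bundles the whole model, not just a state_dict
weights = torch.load('yolov7-w6-pose.pt', map_location=device)
model = weights['model'].float().eval()
if torch.cuda.is_available():
    model = model.half().to(device)  # FP16 on GPU, matching the .half() inputs below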

Define a function that performs pose estimation.

%matplotlib inline

def pose_estimation(img):
    # Read the image (BGR) and letterbox-resize so the long side is 960,
    # padded to a multiple of the model stride (64)
    image = cv2.imread(img)
    image = letterbox(image, 960, stride=64, auto=True)[0]
    image = transforms.ToTensor()(image)
    image = torch.tensor(np.array([image.numpy()]))  # add a batch dimension
    if torch.cuda.is_available():
        image = image.half().to(device)
    with torch.no_grad():  # inference only, no gradients needed
        output, _ = model(image)
        output = non_max_suppression_kpt(output, 0.25, 0.65,
                                         nc=model.yaml['nc'],
                                         nkpt=model.yaml['nkpt'],
                                         kpt_label=True)
        output = output_to_keypoint(output)
    # Back to a uint8 HWC image for drawing
    nimg = image[0].permute(1, 2, 0) * 255
    nimg = nimg.cpu().numpy().astype(np.uint8)
    nimg = cv2.cvtColor(nimg, cv2.COLOR_BGR2RGB)  # OpenCV BGR -> RGB for matplotlib
    # Each row of output is one detection; keypoint triplets start at column 7
    for idx in range(output.shape[0]):
        plot_skeleton_kpts(nimg, output[idx, 7:].T, 3)
    plt.figure(figsize=(8, 8))
    plt.axis('off')
    plt.imshow(nimg)
    plt.show()

Finally, specify an image and run the function.

img = './inference/images/image.jpg'
pose_estimation(img)

We can see the skeletons rendered precisely on the Fast & Furious cast!

2. Inference on an arbitrary video

Define a function that loads a video and runs pose estimation on every frame.

def process_keypoints(video_file, model, output_video_path):
    video = cv2.VideoCapture(video_file)
    writer = _create_vid_writer(video, output_video_path)
    num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    pbar = tqdm.tqdm(total=num_frames, desc="inf")
    while video.isOpened():
        ret, frame = video.read()
        if frame is None:
            break
        pbar.update(1)

        # Letterbox each frame so the long side is 1280, padded to a stride of 64
        frame = letterbox(frame, 1280, stride=64, auto=True)[0]
        frame = transforms.ToTensor()(frame)
        frame = torch.tensor(np.array([frame.numpy()]))  # add a batch dimension
        frame = frame.to(device)
        if torch.cuda.is_available():
            frame = frame.half()  # FP16 only on GPU; .half() on CPU would break the model
        with torch.no_grad():  # inference only
            output, _ = model(frame)
            output = non_max_suppression_kpt(output, 0.25, 0.65,
                                             nc=model.yaml['nc'],
                                             nkpt=model.yaml['nkpt'],
                                             kpt_label=True)
            output = output_to_keypoint(output)
        nimg = frame[0].permute(1, 2, 0) * 255
        nimg = nimg.cpu().numpy().astype(np.uint8)
        # The frame is already BGR (straight from OpenCV), which is what
        # VideoWriter expects, so no color conversion is needed here

        for idx in range(output.shape[0]):
            plot_skeleton_kpts(nimg, output[idx, 7:].T, 3)

        writer.write(nimg)
        torch.cuda.empty_cache()

    pbar.close()
    video.release()
    writer.release()

Next, define a helper that creates the video writer for the output.

def _create_vid_writer(vid_cap, video_path):
    fps = vid_cap.get(cv2.CAP_PROP_FPS)
    # Note: the frame size is hardcoded to (1280, 768), which matches a 16:9
    # source after letterboxing to 1280 with stride 64; other aspect ratios
    # yield different sizes, and VideoWriter silently drops mismatched frames
    writer = cv2.VideoWriter(video_path,
                             cv2.VideoWriter_fourcc('m', 'p', '4', 'v'),
                             fps, (1280, 768))
    return writer
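If your video isn't 16:9, that hardcoded size won't match the letterboxed frames. A small sketch of a hypothetical replacement (not part of the original post) that computes the letterboxed size for any aspect ratio, assuming the target size is a multiple of the stride, as it is here:

def _letterboxed_size(vid_cap, new_shape=1280, stride=64):
    # Mirror what letterbox() does: scale the long side to new_shape,
    # then pad each dimension up to the next multiple of the stride
    w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    r = new_shape / max(w, h)
    w, h = int(round(w * r)), int(round(h * r))
    w = int(np.ceil(w / stride) * stride)
    h = int(np.ceil(h / stride) * stride)
    return (w, h)

Passing its result to cv2.VideoWriter instead of (1280, 768) keeps the writer and the frames in sync.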

Finally, specify a video and run the function.

video_file = './inference/pose/pose.mp4'
video_output = 'pose.mp4'
process_keypoints(video_file, model, video_output)
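To pull the finished video out of the Colab runtime, one option is Colab's built-in files helper:

from google.colab import files
files.download('pose.mp4')  # prompts the browser to download the result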

And just like that, we have pose estimation running on video!

To wrap up, here's a clip so we can study Ip Man's Wing Chun together!
