Facial Keypoint Detection

Author: ssz95
Date: 2021.05
Abstract: This tutorial demonstrates how to implement facial keypoint detection with PaddlePaddle.

1. Introduction

In image processing, a keypoint is essentially a feature: an abstract description of a fixed region or of spatial relationships, capturing the composition and context within a local neighborhood. It is more than a single point or position; it encodes how a location relates to its surroundings. The goal of keypoint detection is to have a computer locate the coordinates of these points in an image. As a fundamental task in computer vision, keypoint detection is essential to higher-level tasks such as recognition and classification.

Keypoint detection methods fall broadly into two categories. The first regresses the coordinates directly; the second models each keypoint as a heatmap and recovers its position by regressing the heatmap distribution, treated as a per-pixel classification task. Both are means to the same end: finding where the points lie in the image and how they relate to their surroundings.
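Of these two approaches, the heatmap formulation can be illustrated in a few lines of numpy: the network would predict one response map per keypoint, and the coordinate is read off with an argmax. This is a toy sketch with a hand-built 5x5 map, not part of this tutorial's pipeline:

```python
import numpy as np

# A toy 5x5 "heatmap": in the heatmap formulation the network predicts a
# response map per keypoint, and the keypoint sits wherever the response peaks.
heatmap = np.zeros((5, 5), dtype='float32')
heatmap[3, 1] = 1.0  # put the peak at row 3, column 1

# Recover the coordinate with an argmax over the flattened map,
# then unravel it back into (row, col) indices.
row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
print(col, row)  # x = 1, y = 3
```

The tutorial below uses the other approach, coordinate regression, where the network outputs the coordinates directly.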

Facial keypoint detection is a successful application of these methods. This tutorial briefly shows how to implement it with the open-source PaddlePaddle framework, using the first approach: coordinate regression. It relies on the Paddle 2.1 high-level training API, which makes training and prediction straightforward.

2. Environment Setup

This tutorial is written for Paddle 2.1. If your environment has a different version, please refer to the official site and install Paddle 2.1 first.

  import numpy as np
  import matplotlib.pyplot as plt
  import pandas as pd
  import os
  import paddle
  from paddle.io import Dataset
  from paddle.vision.transforms import transforms
  from paddle.vision.models import resnet18
  from paddle.nn import functional as F

  print(paddle.__version__)

  2.1.0

3. Dataset

3.1 Downloading the Dataset

This example uses the dataset from the facial keypoints detection challenge hosted on Kaggle: https://www.kaggle.com/c/facial-keypoints-detection

The official dataset packs the face images and annotations into CSV files, which are read with pandas. The dataset contains these files:
training.csv: the face images and keypoint coordinates used for training.
test.csv: the face images used for testing, without keypoint annotations.
IdLookupTable.csv: the names corresponding to the keypoint positions in the test set.

Each image is 96 pixels in both width and height, and there are 15 keypoints to detect.
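For reference, the Image column of these CSV files stores each 96x96 grayscale image as one space-separated string of pixel values (96*96 = 9216 per row); decoding it is just a split and a reshape, as the FaceDataset below does. A toy sketch with a hypothetical 2x2 "image" string:

```python
import numpy as np

# Hypothetical 2x2 stand-in for one row of the Image column; real rows in
# training.csv hold 9216 space-separated pixel values and reshape to (96, 96).
pixel_str = "0 128 255 64"
img = np.array(pixel_str.split(' '), dtype='float32').reshape(2, 2)
print(img.shape)  # (2, 2)
print(img[1, 0])  # 255.0  (row-major fill: [[0, 128], [255, 64]])
```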

  !unzip -o ./test.zip -d data/data60
  !unzip -o ./training.zip -d data/data60

  Archive:  ./test.zip
    inflating: data/data60/test.csv
  Archive:  ./training.zip
    inflating: data/data60/training.csv

3.2 Defining the Dataset

PaddlePaddle loads data through a unified scheme: Dataset (dataset definition) + DataLoader (multi-process data loading).

First define the dataset: implement a new Dataset class that inherits from the parent class paddle.io.Dataset and implements its two abstract methods, __getitem__ and __len__.
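The contract is nothing more than those two methods. A paddle-free toy sketch of the same protocol in plain Python (the real FaceDataset below subclasses paddle.io.Dataset and reads the CSV; the shapes here simply mirror its output):

```python
import numpy as np

# Minimal stand-in for the Dataset contract: __getitem__ returns one
# (sample, label) pair, __len__ returns the number of samples.
class ToyDataset:
    def __init__(self, n):
        # Fake "images" in CHW layout and normalized 30-dim keypoint labels.
        self.samples = [np.full((3, 96, 96), i, dtype='float32') for i in range(n)]
        self.labels = [np.full((30,), i / 96, dtype='float32') for i in range(n)]

    def __getitem__(self, idx):
        return self.samples[idx], self.labels[idx]

    def __len__(self):
        return len(self.samples)

ds = ToyDataset(4)
print(len(ds))                  # 4
img, label = ds[2]
print(img.shape, label.shape)   # (3, 96, 96) (30,)
```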

  Train_Dir = './data/data60/training.csv'
  Test_Dir = './data/data60/test.csv'
  lookid_dir = './data/data60/IdLookupTable.csv'

  class ImgTransforms(object):
      """
      Image preprocessing helper: expands a grayscale image from
      (96, 96) to (96, 96, 3) and transposes the layout from HWC to CHW.
      """
      def __init__(self, fmt):
          self.format = fmt

      def __call__(self, img):
          if len(img.shape) == 2:
              img = np.expand_dims(img, axis=2)
          img = img.transpose(self.format)

          if img.shape[0] == 1:
              img = np.repeat(img, 3, axis=0)
          return img

  class FaceDataset(Dataset):
      def __init__(self, data_path, mode='train', val_split=0.2):
          self.mode = mode
          assert self.mode in ['train', 'val', 'test'], \
              "mode should be 'train', 'val' or 'test', but got {}".format(self.mode)
          self.data_source = pd.read_csv(data_path)
          # Clean the data: many samples annotate only some of the keypoints.
          # Two possible strategies:
          # 1. Fill unannotated positions with the keypoints of the previous sample:
          # self.data_source.fillna(method='ffill', inplace=True)
          # 2. Drop samples with missing annotations from the dataset:
          self.data_source.dropna(how="any", inplace=True)
          self.data_label_all = self.data_source.drop('Image', axis=1)

          # Split into training and validation sets
          if self.mode in ['train', 'val']:
              np.random.seed(43)
              data_len = len(self.data_source)
              # Random split
              shuffled_indices = np.random.permutation(data_len)
              # Sequential split
              # shuffled_indices = np.arange(data_len)
              self.shuffled_indices = shuffled_indices
              val_set_size = int(data_len * val_split)
              if self.mode == 'val':
                  val_indices = shuffled_indices[:val_set_size]
                  self.data_img = self.data_source.reindex().iloc[val_indices]
                  self.data_label = self.data_label_all.reindex().iloc[val_indices]
              elif self.mode == 'train':
                  train_indices = shuffled_indices[val_set_size:]
                  self.data_img = self.data_source.reindex().iloc[train_indices]
                  self.data_label = self.data_label_all.reindex().iloc[train_indices]
          elif self.mode == 'test':
              self.data_img = self.data_source
              self.data_label = self.data_label_all

          self.transforms = transforms.Compose([
              ImgTransforms((2, 0, 1))
          ])

      # Return one sample and its label per iteration
      def __getitem__(self, idx):
          img = self.data_img['Image'].iloc[idx].split(' ')
          img = ['0' if x == '' else x for x in img]
          img = np.array(img, dtype='float32').reshape(96, 96)
          img = self.transforms(img)
          label = np.array(self.data_label.iloc[idx, :], dtype='float32') / 96
          return img, label

      # Return the total number of samples in the dataset
      def __len__(self):
          return len(self.data_img)

  # Training and validation datasets
  train_dataset = FaceDataset(Train_Dir, mode='train')
  val_dataset = FaceDataset(Train_Dir, mode='val')

  # Test dataset
  test_dataset = FaceDataset(Test_Dir, mode='test')

3.3 Sampling and Visualizing the Dataset

With the Dataset implemented, check that it behaves as expected. Since a Dataset is an iterable class, read samples from it in a for loop and display them with matplotlib. The keypoint coordinates were normalized in the dataset, so multiply them by the image size here to restore the original scale, then draw the points on the displayed image with the scatter function.

  def plot_sample(x, y, axis):
      img = x.reshape(96, 96)
      axis.imshow(img, cmap='gray')
      axis.scatter(y[0::2], y[1::2], marker='x', s=10, color='b')

  fig = plt.figure(figsize=(10, 7))
  fig.subplots_adjust(
      left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)

  # Show 16 randomly sampled images
  for i in range(16):
      axis = fig.add_subplot(4, 4, i+1, xticks=[], yticks=[])
      idx = np.random.randint(train_dataset.__len__())
      # print(idx)
      img, label = train_dataset[idx]
      label = label * 96
      plot_sample(img[0], label, axis)
  plt.show()

[Figure: 4x4 grid of randomly sampled training images with their annotated keypoints]

4. Defining the Model

Here the resnet18 network defined in paddle.vision.models is used. In the ImageNet classification task it separates images into 1000 classes, so a fully connected head is appended to the model to map the 1000-dimensional output vector to 30 dimensions, the x and y coordinates of the 15 keypoints.

  class FaceNet(paddle.nn.Layer):
      def __init__(self, num_keypoints, pretrained=False):
          super(FaceNet, self).__init__()
          self.backbone = resnet18(pretrained)
          self.outLayer1 = paddle.nn.Sequential(
              paddle.nn.Linear(1000, 512),
              paddle.nn.ReLU(),
              paddle.nn.Dropout(0.1))
          self.outLayer2 = paddle.nn.Linear(512, num_keypoints * 2)

      def forward(self, inputs):
          out = self.backbone(inputs)
          out = self.outLayer1(out)
          out = self.outLayer2(out)
          return out

4.1 Model Visualization

Call Paddle's summary interface to visualize the assembled model, which makes it easy to inspect and confirm the model structure and parameter counts.

  from paddle.static import InputSpec

  num_keypoints = 15
  model = paddle.Model(FaceNet(num_keypoints))
  model.summary((1, 3, 96, 96))
  -------------------------------------------------------------------------------
     Layer (type)        Input Shape          Output Shape         Param #
  ===============================================================================
     Conv2D-1          [[1, 3, 96, 96]]      [1, 64, 48, 48]        9,408
     BatchNorm2D-1     [[1, 64, 48, 48]]     [1, 64, 48, 48]          256
     ReLU-1            [[1, 64, 48, 48]]     [1, 64, 48, 48]            0
     MaxPool2D-1       [[1, 64, 48, 48]]     [1, 64, 24, 24]            0
     Conv2D-2          [[1, 64, 24, 24]]     [1, 64, 24, 24]       36,864
     BatchNorm2D-2     [[1, 64, 24, 24]]     [1, 64, 24, 24]          256
     ReLU-2            [[1, 64, 24, 24]]     [1, 64, 24, 24]            0
     Conv2D-3          [[1, 64, 24, 24]]     [1, 64, 24, 24]       36,864
     BatchNorm2D-3     [[1, 64, 24, 24]]     [1, 64, 24, 24]          256
     BasicBlock-1      [[1, 64, 24, 24]]     [1, 64, 24, 24]            0
     Conv2D-4          [[1, 64, 24, 24]]     [1, 64, 24, 24]       36,864
     BatchNorm2D-4     [[1, 64, 24, 24]]     [1, 64, 24, 24]          256
     ReLU-3            [[1, 64, 24, 24]]     [1, 64, 24, 24]            0
     Conv2D-5          [[1, 64, 24, 24]]     [1, 64, 24, 24]       36,864
     BatchNorm2D-5     [[1, 64, 24, 24]]     [1, 64, 24, 24]          256
     BasicBlock-2      [[1, 64, 24, 24]]     [1, 64, 24, 24]            0
     Conv2D-7          [[1, 64, 24, 24]]     [1, 128, 12, 12]      73,728
     BatchNorm2D-7     [[1, 128, 12, 12]]    [1, 128, 12, 12]         512
     ReLU-4            [[1, 128, 12, 12]]    [1, 128, 12, 12]           0
     Conv2D-8          [[1, 128, 12, 12]]    [1, 128, 12, 12]     147,456
     BatchNorm2D-8     [[1, 128, 12, 12]]    [1, 128, 12, 12]         512
     Conv2D-6          [[1, 64, 24, 24]]     [1, 128, 12, 12]       8,192
     BatchNorm2D-6     [[1, 128, 12, 12]]    [1, 128, 12, 12]         512
     BasicBlock-3      [[1, 64, 24, 24]]     [1, 128, 12, 12]           0
     Conv2D-9          [[1, 128, 12, 12]]    [1, 128, 12, 12]     147,456
     BatchNorm2D-9     [[1, 128, 12, 12]]    [1, 128, 12, 12]         512
     ReLU-5            [[1, 128, 12, 12]]    [1, 128, 12, 12]           0
     Conv2D-10         [[1, 128, 12, 12]]    [1, 128, 12, 12]     147,456
     BatchNorm2D-10    [[1, 128, 12, 12]]    [1, 128, 12, 12]         512
     BasicBlock-4      [[1, 128, 12, 12]]    [1, 128, 12, 12]           0
     Conv2D-12         [[1, 128, 12, 12]]    [1, 256, 6, 6]       294,912
     BatchNorm2D-12    [[1, 256, 6, 6]]      [1, 256, 6, 6]         1,024
     ReLU-6            [[1, 256, 6, 6]]      [1, 256, 6, 6]             0
     Conv2D-13         [[1, 256, 6, 6]]      [1, 256, 6, 6]       589,824
     BatchNorm2D-13    [[1, 256, 6, 6]]      [1, 256, 6, 6]         1,024
     Conv2D-11         [[1, 128, 12, 12]]    [1, 256, 6, 6]        32,768
     BatchNorm2D-11    [[1, 256, 6, 6]]      [1, 256, 6, 6]         1,024
     BasicBlock-5      [[1, 128, 12, 12]]    [1, 256, 6, 6]             0
     Conv2D-14         [[1, 256, 6, 6]]      [1, 256, 6, 6]       589,824
     BatchNorm2D-14    [[1, 256, 6, 6]]      [1, 256, 6, 6]         1,024
     ReLU-7            [[1, 256, 6, 6]]      [1, 256, 6, 6]             0
     Conv2D-15         [[1, 256, 6, 6]]      [1, 256, 6, 6]       589,824
     BatchNorm2D-15    [[1, 256, 6, 6]]      [1, 256, 6, 6]         1,024
     BasicBlock-6      [[1, 256, 6, 6]]      [1, 256, 6, 6]             0
     Conv2D-17         [[1, 256, 6, 6]]      [1, 512, 3, 3]     1,179,648
     BatchNorm2D-17    [[1, 512, 3, 3]]      [1, 512, 3, 3]         2,048
     ReLU-8            [[1, 512, 3, 3]]      [1, 512, 3, 3]             0
     Conv2D-18         [[1, 512, 3, 3]]      [1, 512, 3, 3]     2,359,296
     BatchNorm2D-18    [[1, 512, 3, 3]]      [1, 512, 3, 3]         2,048
     Conv2D-16         [[1, 256, 6, 6]]      [1, 512, 3, 3]       131,072
     BatchNorm2D-16    [[1, 512, 3, 3]]      [1, 512, 3, 3]         2,048
     BasicBlock-7      [[1, 256, 6, 6]]      [1, 512, 3, 3]             0
     Conv2D-19         [[1, 512, 3, 3]]      [1, 512, 3, 3]     2,359,296
     BatchNorm2D-19    [[1, 512, 3, 3]]      [1, 512, 3, 3]         2,048
     ReLU-9            [[1, 512, 3, 3]]      [1, 512, 3, 3]             0
     Conv2D-20         [[1, 512, 3, 3]]      [1, 512, 3, 3]     2,359,296
     BatchNorm2D-20    [[1, 512, 3, 3]]      [1, 512, 3, 3]         2,048
     BasicBlock-8      [[1, 512, 3, 3]]      [1, 512, 3, 3]             0
     AdaptiveAvgPool2D-1  [[1, 512, 3, 3]]   [1, 512, 1, 1]             0
     Linear-1          [[1, 512]]            [1, 1000]            513,000
     ResNet-1          [[1, 3, 96, 96]]      [1, 1000]                  0
     Linear-2          [[1, 1000]]           [1, 512]             512,512
     ReLU-10           [[1, 512]]            [1, 512]                   0
     Dropout-1         [[1, 512]]            [1, 512]                   0
     Linear-3          [[1, 512]]            [1, 30]               15,390
  ===============================================================================
  Total params: 12,227,014
  Trainable params: 12,207,814
  Non-trainable params: 19,200
  -------------------------------------------------------------------------------
  Input size (MB): 0.11
  Forward/backward pass size (MB): 10.51
  Params size (MB): 46.64
  Estimated Total Size (MB): 57.26
  -------------------------------------------------------------------------------

  {'total_params': 12227014, 'trainable_params': 12207814}

5. Training the Model

This task regresses coordinates, so the mean squared error loss paddle.nn.MSELoss() is used; in Paddle 2.1, the loss functions under nn are wrapped as callable classes. Training uses the paddle.Model high-level API directly: only the dataset, the network model, and the loss function need to be defined.

Wrap the network into a Model instance, then use the prepare interface to configure the optimizer, loss function, and evaluation metrics for training. Once this initial configuration is done, call the fit interface to start training: simply pass in the training dataset, validation dataset, number of epochs, and batch_size defined earlier.
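As a sanity check on the loss itself: MSE is just the mean of the squared coordinate differences. A toy numpy computation on hypothetical normalized keypoint values (coordinates divided by 96, as in the dataset):

```python
import numpy as np

# Hypothetical predicted and annotated keypoint vectors, already normalized
# to [0, 1] by dividing by the image size (96), as the dataset does.
pred = np.array([0.50, 0.40, 0.62], dtype='float32')
gt = np.array([0.52, 0.38, 0.60], dtype='float32')

# Mean squared error: average of squared per-coordinate differences.
mse = np.mean((pred - gt) ** 2)
print(round(float(mse), 6))  # 0.0004
```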

  model = paddle.Model(FaceNet(num_keypoints=15))
  optim = paddle.optimizer.Adam(learning_rate=1e-3,
                                parameters=model.parameters())
  model.prepare(optim, paddle.nn.MSELoss())
  model.fit(train_dataset, val_dataset, epochs=60, batch_size=256)
  The loss value printed in the log is the current step, and the metric is the average value of previous steps.
  Epoch 1/60
  step 7/7 - loss: 0.1080 - 680ms/step
  Eval begin...
  step 2/2 - loss: 0.3060 - 591ms/step
  Eval samples: 428
  Epoch 2/60
  step 7/7 - loss: 0.0448 - 736ms/step
  Eval begin...
  step 2/2 - loss: 0.0939 - 592ms/step
  Eval samples: 428
  step 7/7 - loss: 0.0056 - 789ms/step
  Eval samples: 428
  ...
  Epoch 59/60
  step 7/7 - loss: 0.0051 - 625ms/step
  Eval begin...
  step 2/2 - loss: 0.0031 - 485ms/step
  Eval samples: 428
  Epoch 60/60
  step 7/7 - loss: 0.0042 - 576ms/step
  Eval begin...
  step 2/2 - loss: 0.0011 - 487ms/step
  Eval samples: 428

6. Model Prediction

To better examine the predictions, visualize both the validation-set results compared against the annotations and the predictions on the unannotated test set.

6.1 Visualizing Validation Results

Red keypoints are the network predictions; green keypoints are the annotated ground truth.

  result = model.predict(val_dataset, batch_size=1)

  Predict begin...
  step 428/428 [==============================] - 15ms/step
  Predict samples: 428
  def plot_sample(x, y, axis, gt=[]):
      img = x.reshape(96, 96)
      axis.imshow(img, cmap='gray')
      axis.scatter(y[0::2], y[1::2], marker='x', s=10, color='r')
      if gt != []:
          axis.scatter(gt[0::2], gt[1::2], marker='x', s=10, color='lime')

  fig = plt.figure(figsize=(10, 7))
  fig.subplots_adjust(
      left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
  for i in range(16):
      axis = fig.add_subplot(4, 4, i+1, xticks=[], yticks=[])
      idx = np.random.randint(val_dataset.__len__())
      img, gt_label = val_dataset[idx]
      gt_label = gt_label * 96
      label_pred = result[0][idx].reshape(-1)
      label_pred = label_pred * 96
      plot_sample(img[0], label_pred, axis, gt_label)
  plt.show()

[Figure: validation samples with predicted keypoints in red and ground-truth keypoints in green]

6.2 Visualizing Test Results

  result = model.predict(test_dataset, batch_size=1)

  Predict begin...
  step 1783/1783 [==============================] - 15ms/step
  Predict samples: 1783
  fig = plt.figure(figsize=(10, 7))
  fig.subplots_adjust(
      left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
  for i in range(16):
      axis = fig.add_subplot(4, 4, i+1, xticks=[], yticks=[])
      idx = np.random.randint(test_dataset.__len__())
      img, _ = test_dataset[idx]
      label_pred = result[0][idx].reshape(-1)
      label_pred = label_pred * 96
      plot_sample(img[0], label_pred, axis)
  plt.show()

[Figure: test samples with predicted keypoints]