Facial Expression Recognition with a CNN

2020-03-25 sandag

1. Background

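On January 29, 2020, an official from the Ministry of Education said in an interview that preventing and controlling the novel coronavirus pneumonia was the top priority at the time. Education departments at all levels were following the unified arrangements of the Ministry of Education and local Party committees and governments, doing everything they could to keep the epidemic from spreading in schools, and postponing the start of the semester was one of the key measures. Meanwhile, local education departments did a great deal of work so that primary and secondary schools could "suspend classes without suspending teaching or learning" during the epidemic. Online teaching emerged as a result.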

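However, as online teaching became more widespread, two problems appeared: teachers cannot gauge students' learning state as promptly as they can in a classroom, and students do not take studying as seriously as they would in a classroom. Since the teaching is online anyway, we can use deep learning to help teachers observe students' facial expressions and thereby address this problem. This article starts from a CNN and implements a deep model for facial expression recognition.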

2. Dataset

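The Fer2013 facial expression dataset consists of 35,887 face images: 28,709 training images (Training), plus 3,589 public validation images (PublicTest) and 3,589 private validation images (PrivateTest). Every image is a grayscale image of fixed size 48×48. There are 7 expressions, labeled with the numbers 0-6: 0 anger; 1 disgust; 2 fear; 3 happy; 4 sad; 5 surprised; 6 neutral.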
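
To make the file layout concrete, here is a minimal sketch of how one row of the CSV could be decoded. It assumes each row has the form "emotion,pixels,usage" with space-separated pixel values, which is how the reading code later in this article treats it. The names EMOTIONS and parse_row and the synthetic example row are made up for illustration.

```python
import numpy as np

# Label names as listed above (list index = numeric label).
EMOTIONS = ["anger", "disgust", "fear", "happy", "sad", "surprised", "neutral"]

def parse_row(row):
    """Split one 'emotion,pixels,usage' row into (label, 48x48 image, usage)."""
    emotion, pixels, usage = row.strip().split(",")
    image = np.array(pixels.split(" "), dtype="float32").reshape(48, 48)
    return int(emotion), image, usage

# Synthetic row for illustration only: label 3 and 48*48 pixels of value 128.
example_row = "3," + " ".join(["128"] * (48 * 48)) + ",Training\n"
label, image, usage = parse_row(example_row)
print(EMOTIONS[label], image.shape, usage)
```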

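In addition, we need to apply flipping, rotation, scaling, cropping, translation, and added noise to the image data. Some readers may wonder: we already have data, so why perform these operations? The reason is explained next.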

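These operations are called data augmentation. In deep learning we generally want plenty of samples: with more samples, the trained model usually performs better and generalizes better. In practice, however, the samples are often too few or not of good enough quality, so we augment them to improve the training set. The benefits of data augmentation can be summarized as follows: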

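It increases the amount of training data, which improves the model's generalization ability.
It adds noisy data, which improves the model's robustness.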

Figure 1: Operations used in data augmentation
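
Three of the operations listed above (flipping, translation, and added noise) can be sketched with NumPy alone. This is an illustrative sketch, not the augmentation pipeline used in the experiments: the article does not show that code, and the function name augment and the parameters max_shift and noise_std are assumptions.

```python
import numpy as np

def augment(image, rng, max_shift=4, noise_std=0.02):
    """Return a randomly flipped, shifted and noised copy of a 2-D image in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip

    # Translate by (dy, dx) pixels, filling the uncovered border with zeros.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = out.shape
    src = out[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted = np.zeros_like(out)
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src

    # Add Gaussian noise and clip back into the valid pixel range.
    noisy = shifted + rng.normal(0.0, noise_std, size=shifted.shape)
    return np.clip(noisy, 0.0, 1.0).astype("float32")

rng = np.random.default_rng(0)
img = rng.random((48, 48)).astype("float32")  # stand-in for a normalized face image
aug = augment(img, rng)
print(aug.shape, aug.dtype)
```

Each call produces a different variant of the same image, so one original sample can yield many slightly different training samples.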

3. Model Architecture

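We first read the Fer2013 data. Because it is a CSV file made up of grayscale images, it is read a little differently from ordinary training data. First we need to read the CSV file: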

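import numpy as np

def read_data_np(path):
    # Read every line of the CSV file; line 0 is the header row
    with open(path) as f:
        content = f.readlines()
    lines = np.array(content)
    num_of_instances = lines.size
    print("number of instances: ", num_of_instances)
    # Number of pixel values in the first data row (48 * 48 = 2304)
    print("instance length: ", len(lines[1].split(",")[1].split(" ")))
    return lines, num_of_instances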

After reading the file, we need to process the data: reshape the grayscale pixels of each image to 48*48, and split the whole dataset into four parts, x_train, y_train, x_test, and y_test:

import keras  # needed for keras.utils.to_categorical

def reshape_dataset(paths, num_classes):
    x_train, y_train, x_test, y_test = [], [], [], []

    lines, num_of_instances = read_data_np(paths)

    # ------------------------------
    # split rows into train and test sets (row 0 is the CSV header)
    for i in range(1, num_of_instances):
        try:
            emotion, img, usage = lines[i].split(",")

            val = img.split(" ")

            pixels = np.array(val, 'float32')

            emotion = keras.utils.to_categorical(int(emotion), num_classes)

            if 'Training' in usage:
                y_train.append(emotion)
                x_train.append(pixels)
            elif 'PublicTest' in usage:
                y_test.append(emotion)
                x_test.append(pixels)
            # PrivateTest rows are not used here
        except ValueError:
            # skip malformed rows instead of hiding every kind of error
            continue

    # ------------------------------
    # data transformation for train and test sets
    x_train = np.array(x_train, 'float32')
    y_train = np.array(y_train, 'float32')
    x_test = np.array(x_test, 'float32')
    y_test = np.array(y_test, 'float32')

    x_train /= 255  # normalize inputs between [0, 1]
    x_test /= 255

    x_train = x_train.reshape(x_train.shape[0], 48, 48, 1)
    x_train = x_train.astype('float32')
    x_test = x_test.reshape(x_test.shape[0], 48, 48, 1)
    x_test = x_test.astype('float32')

    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # use num_classes rather than a hard-coded 7
    y_train = y_train.reshape(y_train.shape[0], num_classes)
    y_train = y_train.astype('int16')
    y_test = y_test.reshape(y_test.shape[0], num_classes)
    y_test = y_test.astype('int16')

    print('--------x_train.shape:', x_train.shape)
    print('--------y_train.shape:', y_train.shape)


    print(len(x_train), 'train x size')
    print(len(y_train), 'train y size')
    print(len(x_test), 'test x size')
    print(len(y_test), 'test y size')

    return x_train, y_train, x_test, y_test
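
The two transformations in reshape_dataset, pixel normalization and one-hot encoding of the labels, can be checked on a toy batch without Keras. The sketch below uses a hypothetical one_hot helper that mimics what keras.utils.to_categorical produces for integer labels. The pixel values are random stand-ins, not real Fer2013 data.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer labels as rows with a single 1 at the label's index."""
    labels = np.asarray(labels, dtype="int64")
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Toy batch (made up): 3 flattened images with pixel values in 0..255
pixels = np.random.default_rng(0).integers(0, 256, size=(3, 48 * 48)).astype("float32")

x = (pixels / 255).reshape(-1, 48, 48, 1)  # scale to [0, 1], add a channel axis
y = one_hot([0, 3, 6], num_classes=7)      # anger, happy, neutral

print(x.shape, y.shape)
```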

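The model uses three convolutional layers and two fully connected layers, and a final softmax outputs the probability of each class. The main model code is as follows: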

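from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, AveragePooling2D,
                          Flatten, Dense, Dropout)

def build_model(num_classes):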
  # construct CNN structure
  model = Sequential()
  # 1st convolution layer
  model.add(Conv2D(64, (5, 5), activation='relu', input_shape=(48, 48, 1)))
  model.add(MaxPooling2D(pool_size=(5, 5), strides=(2, 2)))

  # 2nd convolution layer
  model.add(Conv2D(64, (3, 3), activation='relu'))
  # model.add(Conv2D(64, (3, 3), activation='relu'))
  model.add(AveragePooling2D(pool_size=(3, 3), strides=(2, 2)))

  # 3rd convolution layer
  model.add(Conv2D(128, (3, 3), activation='relu'))
  # model.add(Conv2D(128, (3, 3), activation='relu'))
  model.add(AveragePooling2D(pool_size=(3, 3), strides=(2, 2)))

  model.add(Flatten())

  # fully connected neural networks
  model.add(Dense(1024, activation='relu'))
  model.add(Dropout(0.2))
  model.add(Dense(1024, activation='relu'))
  model.add(Dropout(0.2))

  model.add(Dense(num_classes, activation='softmax'))

  return model
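
To see how the 48×48 input shrinks on its way through build_model, the feature-map sizes can be computed by hand. The sketch below assumes the Keras default padding ('valid', i.e. no padding), which applies because the model code passes no padding argument. The helper out_size and the LAYERS list are written only for this illustration.

```python
def out_size(size, kernel, stride=1):
    """Output width of a 'valid' (unpadded) convolution or pooling layer."""
    return (size - kernel) // stride + 1

# (layer name, kernel size, stride), in the order used by build_model
LAYERS = [
    ("Conv2D 5x5", 5, 1),
    ("MaxPooling2D 5x5", 5, 2),
    ("Conv2D 3x3", 3, 1),
    ("AveragePooling2D 3x3", 3, 2),
    ("Conv2D 3x3", 3, 1),
    ("AveragePooling2D 3x3", 3, 2),
]

size = 48
sizes = []
for name, kernel, stride in LAYERS:
    size = out_size(size, kernel, stride)
    sizes.append(size)
    print(f"{name}: {size}x{size}")

flat = size * size * 128  # the last convolution has 128 filters
print("Flatten:", flat)
```

Under these assumptions the spatial size goes 48 → 44 → 20 → 18 → 8 → 6 → 2, so Flatten hands 2 × 2 × 128 = 512 values to the first Dense layer.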

4. Training Results

The experiment used steps_per_epoch=256 and epochs=5. With data augmentation applied, the results are shown in the figure below:

Figure 2: Training results with data augmentation

Without data augmentation, the results are shown in the figure below:

Figure 3: Training results without data augmentation

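We can clearly see that the model trained without augmentation does reach a higher accuracy. This is expected: augmentation strengthens the model's generalization ability, which inevitably lowers accuracy during training. We can, however, look at a few extra images to see how the two models differ. The results are shown below (left: with data augmentation; right: without):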

Figure 4: Comparison of results

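From the results above, for the happy image the model trained with data augmentation gives a more accurate judgment than the one trained without it. Likewise, when judging the Mona Lisa, the augmented model's output better matches the common impression of the Mona Lisa's air of mystery.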
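
Judgments like the ones above come from reading the model's softmax output, which is a vector of 7 probabilities. A minimal sketch of turning such a vector into a label follows. The probability values are invented for illustration and are not results from the article's models, and top_emotion is a hypothetical helper.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happy", "sad", "surprised", "neutral"]

def top_emotion(probs):
    """Return the most likely expression name and its probability."""
    probs = np.asarray(probs, dtype="float64")
    idx = int(np.argmax(probs))
    return EMOTIONS[idx], float(probs[idx])

# Made-up softmax output for one image (sums to 1)
example_probs = [0.02, 0.01, 0.03, 0.70, 0.04, 0.05, 0.15]
name, confidence = top_emotion(example_probs)
print(name, confidence)
```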

5. Summary

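In this article we used only a simple CNN, and its accuracy is not that high. Even so, existing techniques like this can support online teaching by helping teachers analyze how attentively students are following the class. And if students know that an invisible pair of eyes is watching them, they may treat online lessons with the same attitude as classroom lessons, which would make online teaching more effective.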
