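Three Ways to Write a GCN

Author | 阿澤

Source | 阿澤的學習筆記 (ID: aze_learning)

This article implements a graph convolutional network (GCN) in three different ways, all based on the DGL framework.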

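Introduction to DGL

DGL (Deep Graph Library) is an open-source framework developed jointly by New York University and AWS engineers. It aims to provide a tool for deep learning on graphs and to help users implement algorithms more efficiently.

Implementing graph neural network models directly in existing frameworks such as TensorFlow, PyTorch, or MXNet is inconvenient, and the resulting implementations are often not fast enough.

DGL's core design idea is to treat a graph neural network as a message-passing process: every node sends out its own message and receives messages from other nodes, then aggregates everything it received to compute its new representation. Existing deep learning frameworks operate on tensors, but a graph often cannot be expressed directly as one dense tensor. It has to be zero-padded by hand, which is tedious and inefficient.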
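To make the message-passing idea concrete, here is a minimal dependency-free sketch. The toy graph, the scalar features, and the variable names are all assumptions made for illustration; this is not how DGL is implemented internally. Each edge carries the source node's feature to the destination, and each destination sums what it receives.

```python
# Hypothetical toy graph given as directed edges (src, dst).
edges = [(0, 1), (0, 2), (1, 2), (3, 2)]
h = [1.0, 2.0, 3.0, 4.0]  # one scalar feature per node

# Message passing: every edge delivers h[src] to dst, and dst sums its messages.
new_h = [0.0] * len(h)
in_deg = [0] * len(h)
for src, dst in edges:
    new_h[dst] += h[src]
    in_deg[dst] += 1

print(new_h)   # [0.0, 1.0, 7.0, 0.0]
print(in_deg)  # [0, 1, 3, 0]
```

The in-degrees differ from node to node (0, 1, 3, 0), so a dense "neighbors per node" tensor would have to be padded to width 3. Working on the edge list avoids that padding.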

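DGL is built on top of existing frameworks and makes it easier to implement graph neural network models. Its core is currently the message-passing interface, and it also provides interfaces for graph sampling and for batching graphs.

I won't introduce DGL any further here. Interested readers can visit the official site (http://dgl.ai/).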

Prepare

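import torch
import torch as th  # the ndata/edata snippets below use the `th` alias
import time
import math
import dgl
import numpy as np
import torch.nn as nn
from dgl.data import citation_graph as citegrh
from dgl import DGLGraph
import dgl.function as fn
import networkx as nx
import torch.nn.functional as F
from dgl.nn import GraphConv
# from dgl.nn.pytorch import GraphConv
# from dgl.nn.pytorch.conv import GraphConv

GraphConv can be imported in three ways (the two commented-out lines are the alternatives). The first is recommended: the DGL developers built a mechanism that automatically detects which backend is in use and dispatches to that backend's API.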

print(torch.__version__)
print(dgl.__version__)
print(nx.__version__)

1.4.0
0.4.3
2.3

GCN

3.1 First version

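The first approach uses DGL's predefined graph convolution module, GraphConv.

The GCN layer is defined mathematically as:

$$h_i^{(l+1)} = \sigma\Big(b^{(l)} + \sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}} h_j^{(l)} W^{(l)}\Big)$$

where $\mathcal{N}(i)$ is the set of neighbors of node $i$, $c_{ij} = \sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}$ is the product of the square roots of the node degrees and is used for normalization, and $\sigma$ is the activation function.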

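GraphConv initializes its parameters following tkipf's original implementation: the weights use Glorot uniform initialization and the bias is initialized to zero.

A brief introduction to the Glorot uniform distribution:

The Glorot uniform distribution, also called the Xavier uniform distribution, comes from the 2010 paper "Understanding the difficulty of training deep feedforward neural networks". Its core idea is that, for information to flow well through the network, the variance of each layer's output should be as equal as possible. To achieve this, the variance of the weights $W$ must satisfy $\mathrm{Var}(W) = \frac{2}{n_{in} + n_{out}}$, where $n_{in}$ and $n_{out}$ are the layer's input and output dimensions. The variance of a uniform distribution on $[a, b]$ is $\frac{(b - a)^2}{12}$, so we can initialize $W$ from the Xavier uniform distribution $W \sim U\Big[-\sqrt{\frac{6}{n_{in} + n_{out}}},\ \sqrt{\frac{6}{n_{in} + n_{out}}}\Big]$ (see the paper for the full proof).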
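As a quick sanity check of the bound above, the following sketch samples a weight matrix with the standard library only and compares its empirical variance with $2 / (n_{in} + n_{out})$. The helper name `glorot_uniform` and the layer sizes are assumptions for illustration. In PyTorch the corresponding initializer is `torch.nn.init.xavier_uniform_`.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out matrix from U[-bound, bound] with the Glorot bound."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(0)       # fixed seed so the run is repeatable
fan_in, fan_out = 200, 100   # hypothetical layer sizes
bound = math.sqrt(6.0 / (fan_in + fan_out))
w = glorot_uniform(fan_in, fan_out, rng)

flat = [v for row in w for v in row]
mean = sum(flat) / len(flat)
var = sum((v - mean) ** 2 for v in flat) / len(flat)
target = 2.0 / (fan_in + fan_out)

print("empirical variance:", var)
print("target variance:   ", target)
```

With 20,000 samples the two variances should agree to within a few percent.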

class GCN(nn.Module):
    def __init__(self,
                 g,
                 in_feats,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout):
        super(GCN, self).__init__()
        self.g = g
        self.layers = nn.ModuleList()
        # input layer
        self.layers.append(GraphConv(in_feats, n_hidden, activation=activation))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GraphConv(n_hidden, n_hidden, activation=activation))
        # output layer
        self.layers.append(GraphConv(n_hidden, n_classes))
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, features):
        h = features
        for i, layer in enumerate(self.layers):
            # apply dropout before every layer except the first
            if i != 0:
                h = self.dropout(h)
            h = layer(self.g, h)
        return h

3.2 Second version

3.2.1 ndata

The second approach uses user-defined Message and Reduce functions.

ndata is DGL's special syntax for assigning (or reading) node features:

x = th.randn(10, 3)
g.ndata['x'] = x

To set the features of specific nodes, index into the feature tensor:

g.ndata['x'][0] = th.zeros(1, 3)
g.ndata['x'][[0, 1, 2]] = th.zeros(3, 3)
g.ndata['x'][th.tensor([0, 1, 2])] = th.randn((3, 3))

Edge features can be accessed in the same way through edata:

g.edata['w'] = th.randn(9, 2)
# Access edge set with IDs in integer, list, or integer tensor
g.edata['w'][1] = th.randn(1, 2)
g.edata['w'][[0, 1, 2]] = th.zeros(3, 2)
g.edata['w'][th.tensor([0, 1, 2])] = th.zeros(3, 2)
# You can get the edge ids by giving endpoints, which are useful for accessing the features.
g.edata['w'][g.edge_id(1, 0)] = th.ones(1, 2)  # edge 1 -> 0
g.edata['w'][g.edge_ids([1, 2, 3], [0, 0, 0])] = th.ones(3, 2)  # edges [1, 2, 3] -> 0
# Use edge broadcasting whenever applicable.
g.edata['w'][g.edge_ids([1, 2, 3], 0)] = th.ones(3, 2)  # edges [1, 2, 3] -> 0
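The endpoint-to-edge-ID lookup with broadcasting can be pictured with a small pure-Python sketch. The `edge_ids` helper below is a hypothetical stand-in, not DGL's implementation, and the graph is assumed to be a star in which nodes 1 to 9 each point to node 0, with edge IDs assigned in insertion order.

```python
def edge_ids(edges, src, dst):
    """Return the IDs of edges src[i] -> dst[i]; a scalar endpoint is broadcast."""
    index = {pair: eid for eid, pair in enumerate(edges)}
    src_list = src if isinstance(src, list) else [src]
    dst_list = dst if isinstance(dst, list) else [dst]
    n = max(len(src_list), len(dst_list))
    if len(src_list) == 1:
        src_list = src_list * n
    if len(dst_list) == 1:
        dst_list = dst_list * n
    return [index[(u, v)] for u, v in zip(src_list, dst_list)]


# Assumed star graph: edge i-1 is the edge i -> 0, giving 9 edges over 10 nodes.
edges = [(u, 0) for u in range(1, 10)]

print(edge_ids(edges, 1, 0))                  # [0]
print(edge_ids(edges, [1, 2, 3], [0, 0, 0]))  # [0, 1, 2]
print(edge_ids(edges, [1, 2, 3], 0))          # [0, 1, 2], dst broadcast
```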

3.2.2 UDFs

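In DGL, message passing and node feature transformations are implemented with user-defined functions (UDFs).

An edge UDF can be used to define a Message function, whose job is to pass messages along edges:

def gcn_msg(edge):
    # scale the source node's feature by the source node's normalization term
    msg = edge.src['h'] * edge.src['norm']
    return {'m': msg}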

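An edge UDF takes a single edge argument with three attributes, src, dst, and data, which hold the source node features, the destination node features, and the edge features respectively.

Our Message function passes information from the source node to the destination node, so it only uses the source node's features.

The 'norm' field on each node is used for normalization. How it is computed is covered later.

Each node may receive messages from many source nodes, so these messages are stored in a mailbox.

Next we define an aggregation (Reduce) function.

After the messages have been passed, every node has to process its mailbox, and that is the job of the Reduce function.

The Reduce function is a node UDF.

A node UDF takes a single node argument with two attributes, data and mailbox, which hold the node's features and the mailbox of received messages respectively.

def gcn_reduce(node):
    # Note: dimension 0 of the mailbox indexes the nodes in the batch and
    # dimension 1 indexes the messages each node received, so sum over dim=1.
    accum = torch.sum(node.mailbox['m'], dim=1) * node.data['norm']
    return {'h': accum}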
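The mailbox layout explains why the sum runs over dim=1. The sketch below mimics it with nested lists and is only an illustration. The graph, the features, and the helper `gcn_reduce_batch` are assumptions, and grouping nodes by in-degree reflects my understanding of how DGL batches nodes for a reduce UDF rather than a statement about its internals.

```python
from collections import defaultdict


def gcn_reduce_batch(mailbox):
    """mailbox has shape [num_nodes][num_messages][feat_dim]; sum over the messages."""
    return [[sum(col) for col in zip(*msgs)] for msgs in mailbox]


# Hypothetical 4-node graph, directed edges (src, dst), 2-dimensional features.
edges = [(0, 2), (1, 2), (0, 3), (1, 3), (2, 0)]
feat = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [5.0, 5.0]}

# Deliver one message per edge into the destination's inbox.
inbox = defaultdict(list)
for u, v in edges:
    inbox[v].append(feat[u])

# Group destinations by in-degree so that each group's mailbox is rectangular.
buckets = defaultdict(list)
for v, msgs in inbox.items():
    buckets[len(msgs)].append(v)

new_feat = {}
for degree, nodes in buckets.items():
    mailbox = [inbox[v] for v in nodes]  # shape: (len(nodes), degree, 2)
    for v, h in zip(nodes, gcn_reduce_batch(mailbox)):
        new_feat[v] = h

print(new_feat)  # {2: [1.0, 1.0], 3: [1.0, 1.0], 0: [2.0, 2.0]}
```

Nodes 2 and 3 both receive two messages and are reduced together, while node 0 receives one and forms its own batch. Node 1 receives nothing and is not reduced at all.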

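The Message UDF acts on edges, while the Reduce UDF acts on nodes. Their relationship is as follows:

[Figure: source nodes pass features through the Message UDF into the destination node's mailbox, which is then processed by the Reduce and Apply node UDFs]

Reading from left to right: the source node passes its features through the message function into the destination node's mailbox. When a node UDF is triggered (here, the Reduce function), the mailbox is cleared.

The figure also shows two functions that act on nodes: the Apply function and the Reduce function.

We covered the Reduce function above, so what is the Apply function?

The Apply function is the node update function. It can be used to initialize parameters and to apply a nonlinear transformation to the node features.

Parameter initialization: as noted above, the parameters are drawn from a uniform distribution, so if we add a bias to the nodes it also needs to be initialized from a uniform distribution, as in the reset_parameters function in the code below. (That code uses the bound 1/sqrt(out_feats) rather than the exact Glorot bound.)

Nonlinear transformation: after each GCN layer's message passing, the node features may need a nonlinear transformation, as in the forward function in the code below.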

class NodeApplyModule(nn.Module):
    def __init__(self, out_feats, activation=None, bias=True):
        super(NodeApplyModule, self).__init__()
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_feats))
        else:
            self.bias = None
        self.activation = activation
        self.reset_parameters()

    def reset_parameters(self):
        # uniform initialization of the bias in [-stdv, stdv]
        if self.bias is not None:
            stdv = 1. / math.sqrt(self.bias.size(0))
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, nodes):
        h = nodes.data['h']
        if self.bias is not None:
            h = h + self.bias
        if self.activation:
            h = self.activation(h)
        return {'h': h}

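With the Message function, the Reduce function, and the node update function in place, we need to chain them together with update_all, whose signature is:

g.update_all(message_func='default',
             reduce_func='default',
             apply_node_func='default')

This function sends messages along all edges and updates all nodes. It is a simple combination of the send and recv functions.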
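The send-then-recv structure can be sketched without DGL. The `update_all` function below is a hypothetical stand-in written for illustration. It takes plain callables instead of DGL UDFs and leaves nodes without incoming messages unchanged, which is a simplification.

```python
def update_all(edges, feats, message_func, reduce_func, apply_func=None):
    """Hypothetical stand-in: send on every edge, then receive on every node."""
    # send: compute one message per edge and drop it into the destination's mailbox
    mailbox = {v: [] for v in feats}
    for u, v in edges:
        mailbox[v].append(message_func(feats[u]))

    # recv: reduce each non-empty mailbox, then optionally apply a node update
    out = dict(feats)
    for v, msgs in mailbox.items():
        if msgs:
            out[v] = reduce_func(msgs)
            if apply_func is not None:
                out[v] = apply_func(out[v])
    return out


edges = [(0, 1), (2, 1), (1, 2)]   # toy directed edges (src, dst)
feats = {0: 1.0, 1: 2.0, 2: 3.0}   # one scalar feature per node

result = update_all(
    edges, feats,
    message_func=lambda h_src: h_src,   # copy the source feature
    reduce_func=sum,                    # sum the mailbox
    apply_func=lambda h: max(h, 0.0),   # ReLU-style node update
)
print(result)  # {0: 1.0, 1: 4.0, 2: 2.0}
```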

3.2.3 GCNLayer

With these pieces in place, we can define GCNLayer:

class GCNLayer(nn.Module):
    def __init__(self,
                 g,
                 in_feats,
                 out_feats,
                 activation,
                 dropout,
                 bias=True):
        super(GCNLayer, self).__init__()
        self.g = g
        self.weight = nn.Parameter(torch.Tensor(in_feats, out_feats))
        if dropout:
            self.dropout = nn.Dropout(p=dropout)
        else:
            self.dropout = 0.
        self.node_update = NodeApplyModule(out_feats, activation, bias)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)

    def forward(self, h):
        if self.dropout:
            h = self.dropout(h)
        # linear transform, then message passing with the UDFs defined above
        self.g.ndata['h'] = torch.mm(h, self.weight)
        self.g.update_all(gcn_msg, gcn_reduce, self.node_update)
        h = self.g.ndata.pop('h')
        return h

Then we stack GCNLayer modules to form the GCN network:

class GCN(nn.Module):
    def __init__(self,
                 g,
                 in_feats,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout):
        super(GCN, self).__init__()
        self.layers = nn.ModuleList()
        # input layer
        self.layers.append(GCNLayer(g, in_feats, n_hidden, activation, dropout))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GCNLayer(g, n_hidden, n_hidden, activation, dropout))
        # output layer
        self.layers.append(GCNLayer(g, n_hidden, n_classes, None, dropout))

    def forward(self, features):
        h = features
        for layer in self.layers:
            h = layer(h)
        return h

3.3 Third version

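The third approach uses DGL's built-in functions.

Because Message and Reduce functions are used so often, DGL provides built-in functions for convenience. The Message and Reduce functions above can be replaced by built-ins as follows (the multiplication by 'norm' moves into the layer's forward function):

dgl.function.copy_src(src, out): the Message function essentially copies the source node's features to the destination node, so it can be replaced by the built-in copy_src function.
dgl.function.sum(msg, out): the Reduce function essentially aggregates the messages in a node's mailbox, so it can be replaced by the built-in sum function.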

class GCNLayer(nn.Module):
    def __init__(self,
                 g,
                 in_feats,
                 out_feats,
                 activation,
                 dropout,
                 bias=True):
        super(GCNLayer, self).__init__()
        self.g = g
        self.weight = nn.Parameter(torch.Tensor(in_feats, out_feats))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_feats))
        else:
            self.bias = None
        self.activation = activation
        if dropout:
            self.dropout = nn.Dropout(p=dropout)
        else:
            self.dropout = 0.
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, h):
        if self.dropout:
            h = self.dropout(h)
        h = torch.mm(h, self.weight)
        # normalization by square root of src degree
        h = h * self.g.ndata['norm']
        self.g.ndata['h'] = h
        self.g.update_all(fn.copy_src(src='h', out='m'),
                          fn.sum(msg='m', out='h'))
        h = self.g.ndata.pop('h')
        # normalization by square root of dst degree
        h = h * self.g.ndata['norm']
        # bias
        if self.bias is not None:
            h = h + self.bias
        if self.activation:
            h = self.activation(h)
        return h

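Normalization is applied twice here, once before and once after the aggregation, which together give the factor $\frac{1}{c_{ij}} = \frac{1}{\sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}}$ in the GCN formula.
The functionality of the node Apply function has been merged into GCNLayer.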
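The claim that the two multiplications by 'norm' reproduce the per-edge factor $1 / c_{ij}$ can be checked numerically. The sketch below uses a made-up three-node graph with self-loops and scalar features, and it leaves out the weight matrix, the bias, and the activation, so it only covers the normalization step.

```python
import math

# Hypothetical symmetric toy graph with self-loops, stored as directed edges (src, dst).
edges = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 0), (1, 2), (2, 1)]
h = [1.0, 2.0, 3.0]
n = len(h)

# In-degree of every node and the corresponding norm = deg^(-1/2).
deg = [0] * n
for _, dst in edges:
    deg[dst] += 1
norm = [d ** -0.5 for d in deg]

# (a) Two-step version: scale by the source norm, sum, then scale by the destination norm.
scaled = [h[i] * norm[i] for i in range(n)]
two_step = [0.0] * n
for src, dst in edges:
    two_step[dst] += scaled[src]
two_step = [two_step[i] * norm[i] for i in range(n)]

# (b) Per-edge version: divide each message by c_ij = sqrt(deg_i) * sqrt(deg_j).
per_edge = [0.0] * n
for src, dst in edges:
    c = math.sqrt(deg[src]) * math.sqrt(deg[dst])
    per_edge[dst] += h[src] / c

print(two_step)
print(per_edge)
```

Both lists agree up to floating-point rounding, because the destination norm is constant inside each sum and can be factored out.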

class GCN(nn.Module):
    def __init__(self,
                 g,
                 in_feats,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout):
        super(GCN, self).__init__()
        self.layers = nn.ModuleList()
        # input layer (no dropout on the raw input features)
        self.layers.append(GCNLayer(g, in_feats, n_hidden, activation, 0.))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GCNLayer(g, n_hidden, n_hidden, activation, dropout))
        # output layer
        self.layers.append(GCNLayer(g, n_hidden, n_classes, None, dropout))

    def forward(self, features):
        h = features
        for layer in self.layers:
            h = layer(h)
        return h

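Training

dropout = 0.5
gpu = -1
lr = 0.01
n_epochs = 200
n_hidden = 16  # number of hidden units
n_layers = 2  # the model has an input layer, n_layers - 1 hidden layers, and an output layer
weight_decay = 5e-4  # weight decay
self_loop = True  # add self-loops

# Cora dataset
data = citegrh.load_cora()
features = torch.FloatTensor(data.features)
labels = torch.LongTensor(data.labels)
train_mask = torch.BoolTensor(data.train_mask)
val_mask = torch.BoolTensor(data.val_mask)
test_mask = torch.BoolTensor(data.test_mask)
in_feats = features.shape[1]
n_classes = data.num_labels
n_edges = data.graph.number_of_edges()

# build the DGLGraph
g = data.graph
if self_loop:
    # materialize the self-loop list first so the graph is not mutated while iterating
    g.remove_edges_from(list(nx.selfloop_edges(g)))
    g.add_edges_from(zip(g.nodes(), g.nodes()))
g = DGLGraph(g)

You may wonder why the self-loops are removed first and then added back.

This guards against datasets that already contain some self-loops. Adding self-loops without removing the existing ones would leave some nodes with two self-loops and others with only one.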
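The effect is easy to see on a plain edge list that allows duplicate edges. That multigraph-style representation, the helper names, and the toy edges are assumptions made for this sketch.

```python
def add_self_loops(edges, num_nodes, remove_existing):
    """Append one self-loop per node, optionally dropping existing self-loops first."""
    if remove_existing:
        edges = [(u, v) for u, v in edges if u != v]
    return edges + [(i, i) for i in range(num_nodes)]


def self_loop_counts(edges, num_nodes):
    """Count how many self-loops each node has."""
    counts = [0] * num_nodes
    for u, v in edges:
        if u == v:
            counts[u] += 1
    return counts


edges = [(0, 0), (0, 1), (1, 2)]  # node 0 already has a self-loop

naive = self_loop_counts(add_self_loops(edges, 3, remove_existing=False), 3)
clean = self_loop_counts(add_self_loops(edges, 3, remove_existing=True), 3)
print(naive)  # [2, 1, 1]
print(clean)  # [1, 1, 1]
```

With uneven self-loop counts the in-degrees, and therefore the 'norm' values, would be inconsistent across nodes.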

# move the data to the GPU if one is requested
if gpu < 0:
    cuda = False
else:
    cuda = True
    torch.cuda.set_device(gpu)
    features = features.cuda()
    labels = labels.cuda()
    train_mask = train_mask.cuda()
    val_mask = val_mask.cuda()
    test_mask = test_mask.cuda()

# normalization, computed from the in-degrees
degs = g.in_degrees().float()
norm = torch.pow(degs, -0.5)
norm[torch.isinf(norm)] = 0  # nodes with zero in-degree get norm 0 instead of inf
if cuda:
    norm = norm.cuda()
g.ndata['norm'] = norm.unsqueeze(1)

# create a GCN model; any of the three versions above can be used here
model = GCN(g,
            in_feats,
            n_hidden,
            n_classes,
            n_layers,
            F.relu,
            dropout)
if cuda:
    model.cuda()

# cross-entropy loss and the Adam optimizer
loss_fcn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
                             lr=lr,
                             weight_decay=weight_decay)

# define an evaluation function
def evaluate(model, features, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(features)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)

# train and evaluate
dur = []
for epoch in range(n_epochs):
    model.train()
    t0 = time.time()
    # forward
    logits = model(features)
    loss = loss_fcn(logits[train_mask], labels[train_mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    dur.append(time.time() - t0)
    if epoch % 10 == 0:
        acc = evaluate(model, features, labels, val_mask)
        print("Epoch {:05d} | Time(s) {:.4f} | Loss {:.4f} | Accuracy {:.4f} | "
              "ETputs(KTEPS) {:.2f}".format(epoch, np.mean(dur), loss.item(),
                                            acc, n_edges / np.mean(dur) / 1000))

acc = evaluate(model, features, labels, test_mask)
print("Test accuracy {:.2%}".format(acc))

Test accuracy 80.40%

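5. Conclusion

That is the whole tutorial. There are of course other ways to implement a GCN, for example iterating directly with matrix multiplication.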
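As a rough illustration of the matrix-multiplication route, the sketch below computes one GCN layer as ReLU(D^{-1/2} (A + I) D^{-1/2} H W) with plain Python lists. The function names, the toy path graph, and the weights are assumptions, the adjacency matrix is assumed symmetric and free of self-loops, and the bias is omitted. In practice one would use sparse tensors in PyTorch rather than nested lists.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer_dense(adj, h, w):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W), without bias."""
    n = len(adj)
    # add self-loops
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # symmetric normalization by the degrees of A + I
    deg = [sum(row) for row in a_hat]
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
              for i in range(n)]
    z = matmul(matmul(a_norm, h), w)
    return [[max(v, 0.0) for v in row] for row in z]


adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]                    # path graph 0 - 1 - 2
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 2-dimensional node features
w = [[1.0], [1.0]]                         # maps 2 features to 1 output

out = gcn_layer_dense(adj, h, w)
print(out)
```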

