24 May 2022 User identity alignment across heterogeneous networks based on meta-path attention
Proceedings Volume 12260, International Conference on Computer Application and Information Security (ICCAIS 2021); 122600Z (2022) https://doi.org/10.1117/12.2637544
Event: International Conference on Computer Application and Information Security (ICCAIS 2021), 2021, Wuhan, China
Abstract
To establish a unified user model across multiple networks, methods for user identity alignment in social networks have been proposed. Previous studies, which mainly address alignment on homogeneous networks with only one type of node and edge, fall into three categories: (1) studies based on network topology only, (2) studies based on user behavior only, and (3) studies based on both user-generated content and network topology. Their obvious shortcoming is that no real social platform has only one type of node and edge. A network with more than one type of node or edge is called a heterogeneous network. This paper proposes a model that performs user identity alignment on heterogeneous networks, named user alignment across heterogeneous networks based on meta-path attention (MGUIL). MGUIL fuses meta-path features by introducing a graph attention mechanism in two heterogeneous networks, obtains local and global information through a two-layer GAT network, and finally aligns the information in both networks within a unified framework. The method thus not only solves the alignment problem on heterogeneous networks but also accounts for global information propagation within a unified framework. We compare it with existing methods on real networks and confirm that MGUIL improves user identity alignment accuracy.

1. INTRODUCTION

With the development of social networks, a person may have accounts on several platforms. Identifying these accounts across platforms is useful for cross-domain recommendation, link prediction [1], network dynamics [2], cyberspace security, and other research. This gives rise to the user identity alignment problem, defined as linking users with the same identity across different social platforms. User alignment is also known as user identification, anchor link prediction (ALP), profile linkage, and user identity linkage (UIL). Its purpose is to link the accounts on different social network platforms that belong to the same natural person [3].

Early work relied on network structure [4]: users with similar network structures were assumed to belong to the same natural person. Although predicting from following and follower relationships alone is feasible [5], it discards the content that users generate on the Internet. Moreover, this approach presupposes that the networks are consistent. Since some platforms are blogs and others are video sites, network consistency cannot be fully guaranteed.

Driven by the development of network embedding techniques [6-9], more studies now use network embedding methods. Some scholars have analyzed users' writing styles on social platforms [10] and treated the users with the most similar writing styles as the same natural person. Others perform user alignment by analyzing users' timestamps and location information [11]. To mitigate the loss of content information, some scholars have combined user behavior and network structure into user node features [12], while others fuse user-generated text with network structure [13] for network alignment. Attention mechanisms have also been used for user alignment [14]. However, some of these approaches lack a unified model framework, some do not take the global structure into account, and all of them work with homogeneous networks, that is, networks with only one type of node and edge.

However, users in modern social networks generate a large amount of content. On Twitter and Facebook, for example, there are relationships not only between users but also between users and tweets, so the whole network has more than one type of node and edge. Some scholars have therefore considered the user alignment problem on heterogeneous networks, reducing the dimensionality of multiple types of node features and fusing them [15]. Although this method adapts to heterogeneous networks to some extent, it loses the global information.

To address the above problems, this paper proposes MGUIL, a new approach to the user alignment problem on heterogeneous networks. The method combines a multi-layer graph attention mechanism with the idea of meta-paths [6, 16]. The first GAT layer fuses user-generated content into each user node along the meta-paths. The second GAT layer takes these fused features, which carry both the original content and the user node's own features, and fuses all meta-path fusion vectors according to the network structure, thereby obtaining global information. The same process is applied to the other network, and finally the two sets of low-dimensional node vectors are aligned. It is worth noting that feature extraction for the second network reuses the parameters trained on the first network, which ensures that the high-dimensional nodes of both networks are mapped into the same low-dimensional space.

The contributions of this paper are summarized as follows:

  • (i) The method captures local and global features in a heterogeneous network using meta-paths and attention mechanisms, mapping high-dimensional node vectors into a low-dimensional space.

  • (ii) MGUIL is a unified framework that completes node feature extraction and user identity alignment for both networks simultaneously.

  • (iii) Experiments on real data show that MGUIL performs the user alignment task on heterogeneous networks better than existing algorithms.

2. PROBLEM DEFINITION

This section defines the heterogeneous network and introduces two new vectors used in this paper: the meta-path fusion vector and the global fusion vector. Finally, the user identity alignment problem is defined.

2.1 Heterogeneous networks

A heterogeneous network is a network with more than one type of node and edge. It is represented as $G = (V, E, T)$, where $V$ is the set of nodes, $E$ is the set of edges, and $T$ is the set of all types in the network.

As an example, the network in Figure 1 is a network $G_s = (V_s, E_s, T_s)$, where $T_s = \{t_1, t_2, \dots, t_p\}$ represents the $p$ different node types and $V_s$ represents the $n$ nodes in the network, each of which specifies the type of that node. Figure 1 is also used as an example in the following presentation.
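The definition above can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `HeteroGraph`, the toy node names, and the choice to test heterogeneity on node types only are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class HeteroGraph:
    """A heterogeneous network G = (V, E, T) with typed nodes (illustrative only)."""
    node_type: Dict[str, str] = field(default_factory=dict)    # V, with the type of each node
    edges: Set[Tuple[str, str]] = field(default_factory=set)   # E, stored in both directions

    def add_node(self, node: str, ntype: str) -> None:
        self.node_type[node] = ntype

    def add_edge(self, u: str, v: str) -> None:
        # Both endpoints must already be typed nodes.
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((u, v))
        self.edges.add((v, u))

    def types(self) -> Set[str]:
        """T restricted to node types (edge types are not modeled in this sketch)."""
        return set(self.node_type.values())

    def is_heterogeneous(self) -> bool:
        return len(self.types()) > 1


# Toy citation network in the spirit of Figure 1 (node names are made up).
g = HeteroGraph()
g.add_node("a1", "A")  # author
g.add_node("p1", "P")  # paper
g.add_node("v1", "V")  # journal
g.add_edge("a1", "p1")
g.add_edge("p1", "v1")
print(sorted(g.types()), g.is_heterogeneous())
```

Storing the type next to each node keeps the later meta-path lookups simple, since a meta-path is just a sequence of these type labels.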

Figure 1. Two citation networks and predefined meta-paths.

2.2 Meta-path fusion vector

A meta-path is a path containing a sequence of relations, such as A-P-V in Figure 1, which reads: author A publishes a paper P in journal V. The meta-path fusion vector proposed in this paper is obtained by using an attention mechanism to fuse the features of all nodes on a meta-path into the first node (which usually represents the user node in social networks), so as to capture local information.

Given a node of the first type on a meta-path, consider the set of other nodes on that meta-path, whose types are denoted $t_r \in T$. The features of each node first pass through an initial linear transformation. One layer of GAT then fuses the transformed features of the other nodes on the meta-path into the given node, and the resulting vector is the meta-path fusion vector.
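To illustrate which nodes a meta-path selects for fusion, the sketch below collects the nodes reachable from a start node by following a sequence of node types. The function name `metapath_nodes`, the adjacency dictionaries, and the toy graph are hypothetical and only meant to show the idea.

```python
from typing import Dict, List, Sequence, Set


def metapath_nodes(start: str,
                   metapath: Sequence[str],
                   node_type: Dict[str, str],
                   adj: Dict[str, Set[str]]) -> List[str]:
    """Return the other nodes reached from `start` along `metapath`.

    `metapath` is a sequence of node types such as ("A", "P", "V");
    `start` must have the first type in the sequence.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node does not match the first type of the meta-path")
    frontier = {start}
    found: List[str] = []
    for wanted in metapath[1:]:
        # Step only to neighbors that have the next required type.
        frontier = {v for u in frontier for v in adj.get(u, set())
                    if node_type[v] == wanted}
        found.extend(sorted(frontier - set(found)))
    return found


# Toy graph: author a1 wrote papers p1 and p2, both published in journal v1.
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "p3": "P", "v1": "V", "v2": "V"}
adj = {
    "a1": {"p1", "p2"}, "a2": {"p3"},
    "p1": {"a1", "v1"}, "p2": {"a1", "v1"}, "p3": {"a2", "v2"},
    "v1": {"p1", "p2"}, "v2": {"p3"},
}
print(metapath_nodes("a1", ("A", "P", "V"), node_type, adj))
```

The returned nodes are the ones whose features the first GAT layer would attend to when building the meta-path fusion vector of `a1`.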

2.3 Global fusion vector

Given a user node and the meta-path fusion vector obtained for it after the first layer, the second layer uses the following and follower relationships between users: its attention mechanism fuses the node's meta-path fusion vector with the meta-path fusion vectors of its neighboring user nodes. The result is the global fusion vector.

2.4 User identity alignment

Given two heterogeneous social networks $G_s = (V_s, E_s, T_s)$ and $G_g = (V_g, E_g, T_g)$, a set of anchor links is also known, where each anchor link indicates that a user node in $G_s$ and a user node in $G_g$ belong to the same natural person. The problem of user identity alignment is then, on the basis of the known anchor links and the vectors learned for the user nodes of $G_s$ (and likewise of $G_g$), to find other pairs of users, one in $G_s$ and one in $G_g$, that belong to the same natural person.

3. MODELS

In this paper, we propose MGUIL, a unified framework for the user identity alignment problem in heterogeneous networks. It uses a two-layer graph attention mechanism to produce the meta-path fusion vector and the global fusion vector associated with each user node, and from them obtains the final combined representation of each user node. Based on this representation, the model predicts whether two user nodes belong to the same natural person by jointly measuring the pairwise similarity of the corresponding element vectors in their combined vectors.

This section presents how the original node features are turned into the final combined representation through two layers of GAT. It is worth noting that the attention parameters learned on $G_s$ can be applied directly to the other network $G_g$: because these parameters are shared by both networks, the nodes of both networks are mapped into the same low-dimensional space, which facilitates the subsequent user identity alignment prediction.

3.1 Meta-path fusion vector

The meta-path fusion vector is generated by the first layer of GAT. Taking the network $G_s$ as an example, each node is first initialized with an $f$-dimensional feature vector. For nodes with textual content (e.g., tweets and user profiles) we use Word2Vec to initialize their $f$-dimensional feature vectors, and for non-textual nodes we use random initialization.
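The initialization step can be sketched as follows. This is an illustration under stated assumptions: `word_vectors` stands in for a pretrained Word2Vec lookup table, averaging the word vectors of a text is one common way to obtain a single vector (the paper does not say how word vectors are pooled), and the uniform range for random initialization is arbitrary.

```python
import random
from typing import Dict, List, Optional

F = 4  # feature dimension f, kept tiny here for illustration


def init_features(text: Optional[str],
                  word_vectors: Dict[str, List[float]],
                  rng: random.Random,
                  dim: int = F) -> List[float]:
    """Initial f-dimensional feature vector of one node."""
    if text is not None:
        # Textual node: average the vectors of the words found in the table.
        vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
        if vecs:
            return [sum(col) / len(vecs) for col in zip(*vecs)]
    # Non-textual node (or no known word): random initialization.
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


# Hypothetical two-word vocabulary standing in for a trained Word2Vec model.
word_vectors = {"hello": [1.0, 0.0, 0.0, 0.0], "world": [0.0, 1.0, 0.0, 0.0]}
rng = random.Random(0)
tweet_vec = init_features("Hello world", word_vectors, rng)
location_vec = init_features(None, word_vectors, rng)
print(tweet_vec, len(location_vec))
```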

As shown in Figure 2, the initial feature vector of each node is first multiplied by a weight matrix $W \in \mathbb{R}^{f \times f'}$, a linear transformation that turns it into a higher-dimensional vector. Take one node in Figure 2 as an example. Guided by the predefined meta-path, the attention mechanism of this layer attends only to the transformed feature vectors of the nodes on the meta-path and fuses these vectors into the representation of the selected node. For the predefined meta-path $(t_1, t_2, t_3)$, this determines which nodes the selected node needs to attend to, and equation (1) gives the attention coefficient with which each node vector on the meta-path is weighted when the fusion is performed.

Figure 2. MGUIL: Extraction of the final representation by two layers of GAT.

[Equation (1)]

where the input feature vectors have dimension $f$, $W$ is the $f \times f'$ weight matrix, and the two transformed feature vectors are concatenated, so the concatenated vector has dimension $2f'$.

Next we compute the final attention coefficients. Because we want the relative influence of each meta-path node on the target node (including the influence of the target node itself), the attention coefficients are normalized, as in equation (2).

[Equation (2)]

After that, feature fusion is performed: the features of the nodes on the meta-path, including the target node itself (self-attention), are fused according to their attention coefficients, as in equation (3).

[Equation (3)]
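As a sketch of what equations (1) to (3) compute, assume the first layer follows the standard graph attention formulation of [6], restricted to the nodes on the meta-path. Under that assumption the three steps would plausibly read as below. The symbols $h_i$ (input feature vector of node $i$), $\vec{a}$ (attention vector of dimension $2f'$), $\mathcal{P}_i$ (node $i$ together with the other nodes on its meta-path), $z_i$ (meta-path fusion vector), the nonlinearity $\sigma$, and the LeakyReLU are borrowed from [6] for illustration and are not taken from the paper.

```latex
\begin{align}
e_{ij} &= \mathrm{LeakyReLU}\!\left(\vec{a}^{\top}\left[\,W^{\top}h_i \,\Vert\, W^{\top}h_j\,\right]\right),
  \qquad j \in \mathcal{P}_i, \tag{1}\\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{l \in \mathcal{P}_i}\exp(e_{il})}, \tag{2}\\
z_i &= \sigma\!\Big(\sum_{j \in \mathcal{P}_i}\alpha_{ij}\,W^{\top}h_j\Big). \tag{3}
\end{align}
```

With $W$ of size $f \times f'$, each $W^{\top}h_j$ has dimension $f'$ and the concatenation in (1) has dimension $2f'$, which matches the dimensions stated in the text.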

In this paper, to strengthen the fusion of relevant features, a multi-head attention mechanism is used for the meta-path fusion vector, where $K$ is the number of attention heads. The meta-path fusion vector can thus be expressed as equation (4).

[Equation (4)]

where the attention coefficients and the weight matrix $W^k$ are those of the $k$-th attention head.
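The first layer can be sketched in plain Python as follows. This is a hedged illustration, not the authors' code: it assumes the standard GAT scoring function of [6] (a LeakyReLU over a learned vector applied to the concatenated features), omits the output nonlinearity, and averages the $K$ heads. Averaging is assumed because Section 3.2 describes the second-layer inputs as $f'$-dimensional; the paper's equation (4) may combine the heads differently. All function names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows x columns


def project(h: Vector, W: Matrix) -> Vector:
    """Linear transformation h W, with W of shape f x f'."""
    return [sum(h[r] * W[r][c] for r in range(len(h))) for c in range(len(W[0]))]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_head(target: Vector, path_nodes: List[Vector],
                   W: Matrix, a: Vector) -> Vector:
    """Fuse the target node with the nodes on its meta-path (one head)."""
    members = [target] + path_nodes  # the target itself is included (self-attention)
    z = [project(h, W) for h in members]
    # Unnormalized score from the concatenation [z_target || z_j], of length 2 f'.
    scores = [leaky_relu(sum(ai * xi for ai, xi in zip(a, z[0] + zj))) for zj in z]
    # Softmax normalization over the nodes on the meta-path.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    alphas = [e / sum(exps) for e in exps]
    # Attention-weighted sum of the transformed features.
    return [sum(al * zj[c] for al, zj in zip(alphas, z)) for c in range(len(z[0]))]


def multi_head(target: Vector, path_nodes: List[Vector],
               Ws: List[Matrix], attn: List[Vector]) -> Vector:
    """Average the outputs of K heads (assumed combination rule)."""
    heads = [attention_head(target, path_nodes, W, a) for W, a in zip(Ws, attn)]
    return [sum(col) / len(heads) for col in zip(*heads)]


rng = random.Random(0)
f, f_prime, K = 4, 3, 2
Ws = [[[rng.gauss(0, 1) for _ in range(f_prime)] for _ in range(f)] for _ in range(K)]
attn = [[rng.gauss(0, 1) for _ in range(2 * f_prime)] for _ in range(K)]
user = [rng.gauss(0, 1) for _ in range(f)]
others = [[rng.gauss(0, 1) for _ in range(f)] for _ in range(3)]
fused = multi_head(user, others, Ws, attn)
print(len(fused))
```

Because the weight matrices and attention vectors are plain arguments, the same `Ws` and `attn` can be passed when processing the second network, which mirrors the parameter sharing described at the start of this section.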

3.2 Global fusion vector

After the first layer of GAT, each user node has a meta-path fusion vector that fuses all of its own features. The second layer focuses on fusing features between user-type nodes. Since the meta-path fusion vector already carries all of a user's information, the second layer amounts to a global feature fusion. The influence between user-type nodes is given by equation (5), and the final attention coefficient by equation (6).

[Equation (5)]
[Equation (6)]

where the input feature vectors have dimension $f'$, $M$ is the $f' \times f''$ weight matrix, and the two transformed feature vectors are concatenated, so the concatenated vector has dimension $2f''$.

[Equation (7)]

The final global fusion vector is obtained by weighting the vectors of all user-type nodes with the computed attention coefficients, according to equation (7).
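As a sketch of what equations (5) to (7) compute, assume the second layer again follows the standard graph attention formulation of [6], this time over user-to-user links. Under that assumption the equations would plausibly read as below. The symbols $z_i$ (meta-path fusion vector of user $i$, of dimension $f'$), $\vec{b}$ (attention vector of dimension $2f''$), $\mathcal{N}_i$ (user $i$ together with its neighboring users), $g_i$ (global fusion vector), $\sigma$, and the LeakyReLU are assumptions for illustration and are not taken from the paper.

```latex
\begin{align}
c_{ij} &= \mathrm{LeakyReLU}\!\left(\vec{b}^{\top}\left[\,M^{\top}z_i \,\Vert\, M^{\top}z_j\,\right]\right),
  \qquad j \in \mathcal{N}_i, \tag{5}\\
\beta_{ij} &= \frac{\exp(c_{ij})}{\sum_{l \in \mathcal{N}_i}\exp(c_{il})}, \tag{6}\\
g_i &= \sigma\!\Big(\sum_{j \in \mathcal{N}_i}\beta_{ij}\,M^{\top}z_j\Big). \tag{7}
\end{align}
```

With $M$ of size $f' \times f''$, each $M^{\top}z_j$ has dimension $f''$ and the concatenation in (5) has dimension $2f''$, consistent with the dimensions stated in the text.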

So that the extracted features represent both local and global information well, the outputs of the first and second GAT layers are combined into the combined vector, as shown in Figure 2. After performing the same operation on $G_s$ and $G_g$, the nodes of both networks are mapped into a low-dimensional space, in which user identity alignment can then be performed. It is worth noting that in some cases the node types of the two networks do not coincide. In that case, one should align to the network with more node types and initialize the feature vectors of the missing node types in the other network to $f$-dimensional all-zero vectors.
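The zero-initialization rule for missing node types can be sketched as follows. The function name `pad_missing_types`, the per-type feature dictionary, and the example in which one network lacks a `location` type are hypothetical and serve only to illustrate the rule.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def pad_missing_types(features_by_type: Dict[str, Vector],
                      all_types: Sequence[str],
                      dim: int) -> Dict[str, Vector]:
    """Return a feature vector for every type in `all_types`.

    Types that are absent from `features_by_type` get an all-zero vector
    of dimension `dim` (the f-dimensional zero vector from the text).
    """
    return {t: features_by_type.get(t, [0.0] * dim) for t in all_types}


# Hypothetical case: one network has a node type that the other lacks.
types_s = ["user", "tweet", "location"]
types_g = ["user", "tweet"]
all_types = sorted(set(types_s) | set(types_g))  # align to the richer type set

padded = pad_missing_types({"user": [0.1, 0.2], "tweet": [0.3, 0.4]}, all_types, dim=2)
print(padded)
```

After padding, both networks expose the same node types, so the attention parameters learned on one network can be applied to the other without changing shapes.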

3.3 User alignment model

Based on the above two operations, the high-dimensional nodes of the two networks are mapped into the same low-dimensional space. We can then decide whether two final combined vectors represent the same natural person based on their similarity or distance. Given the existing anchor links, the task is to find further pairs of users, one in $G_s$ and one in $G_g$, that belong to the same natural person and are therefore connected by an anchor link. The distance in the low-dimensional space between two vectors belonging to the same natural person should be as small as possible, and the distance between vectors that do not belong to the same natural person should be as large as possible, so the loss function is defined as equation (8).

[Equation (8)]

where $d$ is a distance function. The Chebyshev distance is used in this paper to calculate the distance between the meta-path fusion vectors of the two users and between their global fusion vectors, respectively. The distance between two user nodes is thus computed from the combined vectors of those users. $w$ and $\lambda$ are hyperparameters that balance the effect of the meta-path fusion vector and the global fusion vector on the result.
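The distance computation can be sketched as below. The Chebyshev distance and the weights $w$ and $\lambda$ come from the text; everything else is an assumption made for illustration. In particular, the default `lam = 0.4` assumes that the two weights sum to one, and `margin_loss` is a generic margin-based objective that merely has the stated property (small distances for anchor pairs, large distances for non-matching pairs). It is not claimed to be the paper's equation (8).

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Combined = Tuple[Vector, Vector]  # (meta-path fusion vector, global fusion vector)


def chebyshev(x: Vector, y: Vector) -> float:
    """Chebyshev distance: the largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def user_distance(u: Combined, v: Combined, w: float = 0.6, lam: float = 0.4) -> float:
    """Weighted distance between the combined vectors of two users."""
    return w * chebyshev(u[0], v[0]) + lam * chebyshev(u[1], v[1])


def margin_loss(anchor_pairs: Sequence[Tuple[Combined, Combined]],
                negative_pairs: Sequence[Tuple[Combined, Combined]],
                margin: float = 1.0, w: float = 0.6, lam: float = 0.4) -> float:
    """Pull anchor pairs together and push non-matching pairs at least `margin` apart."""
    pos = sum(user_distance(u, v, w, lam) for u, v in anchor_pairs)
    neg = sum(max(0.0, margin - user_distance(u, v, w, lam)) for u, v in negative_pairs)
    return pos + neg


a = ([0.0, 0.0], [0.0, 0.0])
b = ([1.0, 2.0], [3.0, 0.0])
print(chebyshev([0.0, 0.0], [3.0, -4.0]), user_distance(a, b))
```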

4. EXPERIMENT

4.1 Datasets

Twitter-Foursquare is a pair of heterogeneous networks whose node types include users, tweets, and geographic locations [17, 18]. Foursquare is a platform that encourages mobile phone users to share information such as their current location with others. The details of this dataset are listed in Table 1.

Table 1. Twitter-Foursquare dataset.

Dataset      Nodes       Node type
Twitter      5,220       User
             9,490,707   Tweet
             297,183     Location
Foursquare   5,315       User
             48,755      Tweet
             38,921      Location

4.2 Baselines

To evaluate the performance of the proposed MGUIL, we compare our framework with the following state-of-the-art methods:

  • IONE (2016): IONE takes the following and follower relationships between users as its basis, maps the two networks into one space as a whole while preserving those relationships after the mapping, and then aligns users' identities through anchor links [12].

  • DeepLink (2018): DeepLink introduces deep learning into the traditional user alignment method: it samples the networks with random walks, learns an embedding with skip-gram, and pre-trains two preliminary mapping functions between networks A and B. User identity alignment can then be formalized as a dual learning game [19].

  • HAN (2019): HAN obtains structural and semantic information about the network in hyperbolic space. It uses meta-path-guided random walks to capture structural and semantic associations in heterogeneous networks, and uses the distance between nodes in hyperbolic space as the measure of similarity between nodes [15].

4.3 Comparison of experimental results

In the experiments, the hyperparameters of MGUIL are set to $w = 0.6$, $K = 3$, $f' = 256$, and $f'' = 128$. The baseline methods are configured as in their original papers. Table 2 shows the performance of each method under the evaluation metrics Precision@k (P@k) and MAP [19].
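The two metrics can be sketched as follows. The paper does not spell out their definitions, so this sketch assumes the forms commonly used in user identity linkage with exactly one true counterpart per user: P@k is the share of users whose true counterpart appears among the top-k ranked candidates, and MAP@k averages the reciprocal rank of the true counterpart (zero if it is not in the top k). The function names and the toy rankings are hypothetical.

```python
from typing import Dict, List


def precision_at_k(rankings: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """Share of users whose true counterpart is among the top-k candidates."""
    hits = sum(1 for u, ranked in rankings.items() if truth[u] in ranked[:k])
    return hits / len(rankings)


def mean_average_precision(rankings: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """With one true match per user, average precision reduces to 1 / rank."""
    total = 0.0
    for u, ranked in rankings.items():
        top = ranked[:k]
        if truth[u] in top:
            total += 1.0 / (top.index(truth[u]) + 1)
    return total / len(rankings)


# Toy example: candidates in the other network, ranked by increasing distance.
rankings = {"u1": ["b", "a", "c"], "u2": ["x", "y", "z"], "u3": ["p", "q", "r"]}
truth = {"u1": "a", "u2": "x", "u3": "s"}
print(precision_at_k(rankings, truth, 1),
      precision_at_k(rankings, truth, 2),
      mean_average_precision(rankings, truth, 3))
```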

Table 2. Performance comparison on the Twitter-Foursquare dataset.

Method     P@1     P@10    P@20    P@30    MAP@30
IONE       22.38   46.38   55.71   59.70   32.79
DeepLink   34.47   66.09   70.00   70.48   47.78
HAN        38.69   71.16   75.49   80.95   52.42
MGUIL*     42.67   90.23   94.47   96.22   57.43

Note: * marks the best-performing method.

From Table 2, it can be found that:

  • (i) MGUIL outperforms IONE and DeepLink on every metric. The reason is that MGUIL is designed specifically for heterogeneous networks and is therefore better suited to fusing multiple types of node features, and its attention mechanism expresses the influence between nodes more clearly than these two methods do.

  • (ii) MGUIL also improves considerably on HAN in the accuracy of heterogeneous social user identity alignment, because MGUIL evaluates the influence of different types of nodes and then performs a weighted fusion according to their influence on user nodes.

  • (iii) MGUIL combines local and global information into a new combined vector representation that captures the information in the network more comprehensively and represents each user's characteristics more completely than the baseline methods.

4.4 Hyperparameter setting experiment

After the comparison with the baselines, we vary the hyperparameters of MGUIL to evaluate how different settings affect the results and to find the best hyperparameters for the model.

$w$ and $\lambda$ are hyperparameters that balance the influence of the meta-path fusion vector and the global fusion vector on the final combined vector. Figure 3a shows that MGUIL performs best at $w = 0.6$. This indicates that there is a balance point between the two vectors, and that the meta-path fusion vector contributes somewhat more to the combined vector than the global fusion vector does.

Figure 3. Effect of the balance factor w, the number of attention heads K, and the embedding dimension on the results.

Since the multi-head attention mechanism is introduced in the first layer of GAT, the number of attention heads $K$ also affects the final result. As Figure 3b shows, the model performs best at $K = 4$. A small $K$ may lead to incomplete feature fusion that fails to extract deeper information, whereas a large $K$ may introduce too much noise during fusion and reduce the accuracy of the features.

The embedding dimension determines the complexity of the latent space. In this paper we choose 128 as the final dimension; as shown in Figure 3c, 128 dimensions give better results.

5. SUMMARY

In this paper, we propose MGUIL, a model for user identity alignment in heterogeneous networks. It uses a two-layer attention mechanism: the first layer fuses all the features of each user node itself, and the second layer fuses the global network structure through the following relationships between users. The results of the two GAT layers are then combined and fed into the supervised identity alignment model, which uses known anchor nodes to find pairs of combined vectors with minimal difference and the closest distance in the low-dimensional embedding space. We test the model on real online social platforms, and the results are ahead of those of existing methods.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (NSFC) (No. 61572445).

REFERENCES

[1] Zhang, J., Yu, P. S. and Zhou, Z., “Meta-path based multi-network collective link prediction,” KDD’14, 1286-95 (2014).

[2] Zafarani, R. and Liu, H., “Users joining multiple sites: friendship and popularity variations across sites,” Information Fusion, 28, 83-89 (2016). https://doi.org/10.1016/j.inffus.2015.07.002

[3] Chen, B. and Chen, X., “A survey on user alignment across social networks,” Journal of Xihua University, 40, 11-26 (2021).

[4] Wang, D., Cui, P. and Zhu, W., “Structural deep network embedding,” KDD’16, 1225-34 (2016).

[5] Zafarani, R. and Liu, H., “Connecting corresponding identities across communities,” International AAAI Conf. on Web and Social Media, 354-57 (2009).

[6] Velickovic, P., Cucurull, G. and Casanova, A., “Graph attention networks,” ICLR’18, 1-12 (2017).

[7] Bryan, P., Al-Rfou, R. and Steven, S., “Deepwalk: Online learning of social representations,” KDD’14, 701-10 (2014).

[8] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J. and Mei, Q., “LINE: Large-scale information network embedding,” WWW’15, 1067-77 (2015).

[9] Chu, X., Fan, X., Yao, D., Zhu, Z., Huang, J. and Bi, J., “Cross-network embedding for multi-network alignment,” WWW’19, 273-84 (2019).

[10] Oana, G., Howard, L., Gerald, F., Robin, S. and Renata, T., “Exploiting innocuous activity for correlating users across sites,” WWW’13, 447-58 (2013).

[11] Christopher, R., Yunsung, K., Augustin, C., Nitish, K. and Silvio, L., “Linking users across domains with location data: Theory and validation,” WWW’16, 707-19 (2016).

[12] Liu, L., Cheung, W. K., Li, X. and Liao, L., “Aligning users across social networks using network embedding,” IJCAI’16, 1774-80 (2016).

[13] Liu, S., Wang, S., Zhu, F., Zhang, J. and Krishnan, R., “HYDRA: Large-scale social identity linkage via heterogeneous behavior modeling,” SIGMOD’14 on Management of Data, 51-62 (2014).

[14] Li, X., Shang, Y. and Cao, Y., “Type-aware anchor link prediction across heterogeneous networks based on graph attention network,” AAAI Conf. on Artificial Intelligence, 147-55 (2020). https://doi.org/10.1609/aaai.v34i01.5345

[15] Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P. and Yu, P. S., “Heterogeneous graph attention network,” WWW’19, 2022-32 (2019).

[16] Dong, Y., Chawla, N. V. and Swami, A., “Metapath2vec: Scalable representation learning for heterogeneous networks,” KDD’17, 135-44 (2017).

[17] Velickovic, P., Cucurull, G. and Casanova, A., “Graph attention networks,” ICLR’18, 1-12 (2017).

[18] Zhang, J. and Yu, P., “Integrated anchor and social link predictions across social networks,” IJCAI’15, 2125-32 (2015).

[19] Zhou, F., Liu, L., Zhang, K., Trajcevski, G., Wu, J. and Zhong, T., “Deeplink: A deep learning approach for user identity linkage,” in IEEE Conf. on Computer Communications, 1313-21 (2018).
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yong Gan, Chenfang Zhang, and Ruisen Yang "User identity alignment across heterogeneous networks based on meta-path attention", Proc. SPIE 12260, International Conference on Computer Application and Information Security (ICCAIS 2021), 122600Z (24 May 2022); https://doi.org/10.1117/12.2637544
KEYWORDS
Social networks

Feature extraction

Internet

Network security
