|
1. INTRODUCTION
With the development of social networks, a person may have accounts on several platforms. Identifying these accounts across platforms is useful for cross-domain recommendation, link prediction [1], network dynamics [2], cyberspace security, and other research. This gives rise to the user identity alignment problem, defined as linking the accounts that belong to the same identity across different social platforms. User alignment is also known as user identification, anchor link prediction (ALP), profile linkage, and user identity linkage (UIL). Its purpose is to link the accounts on different social network platforms that belong to the same natural person [3]. Early work relied on network structure [4]: users with similar network structures were assumed to belong to the same natural person. Although predicting from following and follower relationships alone is feasible [5], it discards the content that users generate on the Internet. It also presupposes that the networks are structurally consistent; since some platforms are blogs and others are video sites, this consistency cannot be fully guaranteed. With the development of network embedding techniques [6-9], more studies now use network embedding methods. Some scholars have analyzed users’ writing styles on social platforms [10] and treated the accounts with the most similar writing styles as the same natural person. Others perform user alignment by analyzing users’ timestamps and location information [11]. To reduce the loss of content information, some scholars have combined user behavior and network structure into user node features [12], and others have fused user-generated text with the network structure [13] for network alignment. Attention mechanisms have also been used for user alignment [14].
However, some of these approaches lack a unified model framework, some do not take the global structure into account, and all of them work with homogeneous networks, that is, networks with only one type of node and edge. Users in modern social networks, however, generate a large amount of content. On Twitter and Facebook, for example, there are relationships not only between users but also between users and tweets, so the whole network has more than one type of node and edge. Some scholars have therefore considered the user alignment problem on heterogeneous networks by reducing the dimension of, and fusing, the features of multiple node types [15]. Although this method adapts to heterogeneous networks to some extent, it loses global information. To address these problems, this paper proposes MGUIL, a new approach to the user alignment problem on heterogeneous networks. The method combines a multi-layer graph attention mechanism with the idea of meta-paths [6,16]. The first GAT layer fuses user-generated content along each meta-path into the user node; the second GAT layer takes the resulting meta-path fusion vectors of the user nodes and fuses them according to the network structure, which yields the global information. The same process is applied to the other network, and finally the two sets of low-dimensional node vectors are aligned. Notably, when features are extracted for the second network, the parameters trained on the first network are reused, which ensures that the high-dimensional nodes of both networks are mapped into the same low-dimensional space. The contributions of this paper are summarized as follows:
2. PROBLEM DEFINITION
This section defines the heterogeneous network and introduces two new vectors used in the text: the meta-path fusion vector and the global fusion vector. Finally, the user identity alignment problem is defined.
2.1 Heterogeneous networks
A heterogeneous network has more than one type of node and edge. It can be represented as G = (V, E, T), where V is the set of nodes, E is the set of edges, and T is the set of all types in the network. As an example, the network in Figure 1 is a network Gs = (Vs, Es, Ts), where Ts = {t1, t2, ⋯, tp} represents the p different node types, Vs contains the n nodes of the network, and each node is assigned one of these types. Figure 1 is also used as the running example below.
2.2 Meta-path fusion vector
A meta-path is a path containing a sequence of relations, such as A-P-V in Figure 1: author A publishes a paper P in journal V. The meta-path fusion vector proposed in this paper uses an attention mechanism to fuse the features of all nodes on a meta-path into the first node (in social networks, usually the user node) in order to capture local information. Given a node, the other nodes on its meta-path form a set whose members have types tr ∈ T. Each node feature first passes through an initial linear transformation; one GAT layer then fuses into the given node the features of the other nodes on the meta-path, and the result is the meta-path fusion vector.
2.3 Global fusion vector
Given a user node and the meta-path fusion vector obtained for it in the first layer, the second layer uses the following and follower relationships between users, together with an attention mechanism, to fuse this vector with those of the user’s neighbors. The result is the global fusion vector.
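The definitions in Sections 2.1 and 2.2 can be made concrete with a small sketch. It stores a heterogeneous network as a node-to-type map plus an undirected edge list and collects the nodes that lie on a given meta-path starting from one node. The function name `metapath_neighbors`, the toy node names, and the choice of undirected edges are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict

def metapath_neighbors(node_type, edges, start, metapath):
    """Collect the nodes reached from `start` by following `metapath`.

    node_type: dict mapping node id -> type label (the type set T)
    edges:     iterable of (u, v) pairs, treated as undirected (assumption)
    metapath:  tuple of type labels, e.g. ("A", "P", "V")
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # The start node must have the first type of the meta-path.
    if node_type[start] != metapath[0]:
        return set()

    frontier, reached = {start}, set()
    for wanted_type in metapath[1:]:
        # Step to neighbors whose type matches the next type on the path.
        frontier = {v for u in frontier for v in adjacency[u]
                    if node_type[v] == wanted_type}
        reached |= frontier
    return reached

# Toy author-paper-venue network in the spirit of Figure 1 (made-up data).
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
edges = [("a1", "p1"), ("a2", "p2"), ("p1", "v1"), ("p2", "v1")]
on_path = metapath_neighbors(node_type, edges, "a1", ("A", "P", "V"))
```

Here `on_path` contains the paper and venue that author `a1` reaches along A-P-V; these are the nodes whose features the first attention layer would fuse into `a1`.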
2.4 User identity alignment
Given two heterogeneous social networks Gs = (Vs, Es, Ts) and Gg = (Vg, Eg, Tg), a set of anchor links is known, each of which indicates that a user in Gs and a user in Gg belong to the same natural person. On this basis, the user identity alignment problem is to find the other pairs of users, one in Gs and one in Gg, that belong to the same natural person.
3. MODELS
In this paper, we propose a unified framework, MGUIL, to solve the user identity alignment problem in heterogeneous networks. It uses a two-layer graph attention mechanism to obtain the meta-path fusion vector and the global fusion vector of each user node and combines them into the final combined representation of that node. Two user nodes are then predicted to belong to the same natural person or not by jointly measuring the pairwise similarity of the corresponding component vectors in their combined vectors. This section presents how the original node features are turned into the final combined representation through two GAT layers. Notably, the attention parameters learned on Gs can be applied directly to the other network Gg; because both networks share these parameters, their nodes are mapped into the same low-dimensional space, which facilitates the subsequent user identity alignment prediction.
3.1 Meta-path fusion vector
The meta-path fusion vector is generated by the first GAT layer. Taking the network Gs as an example, each node is first initialized with an f-dimensional feature vector. For nodes with textual content (e.g., tweets and user profiles), we use Word2Vec to initialize the f-dimensional feature vectors; for non-textual nodes, we use random initialization.
For each node feature vector, as shown in Figure 2, a linear transformation with weight matrix W ∈ R^(f×f’) is first applied, which turns the initial feature vector into a higher-dimensional vector. Take one node in Figure 2 as an example. Given its high-dimensional feature vector, the predefined meta-path directs the attention mechanism of this layer to attend only to the feature vectors on that meta-path and to fuse them into the node. From the predefined meta-path (t1, t2, t3), the nodes that need attention can be found, and equation (1) gives the attention coefficient with which each node vector on the meta-path enters the fusion. In equation (1), the inputs are f-dimensional feature vectors and W is the f × f’ weight matrix, so concatenating two transformed vectors gives a vector of dimension 2 × f’. The final attention coefficients are then computed from this concatenated vector; because we want to measure the impact of each meta-path node on the given node (including the node’s effect on itself), the attention coefficients are normalized. After that, the features on the meta-path, including the node’s own (self-attention), are fused according to the attention coefficients by equation (3). To strengthen the fusion of relevant features, a multi-head attention mechanism is used for the meta-path fusion vector, where K is the number of attention heads; the meta-path fusion vector can thus be expressed as equation (4), in which the attention coefficients and the weight matrix Wk belong to the k-th attention head.
3.2 Global fusion vector
After the first GAT layer, each user node has a meta-path fusion vector that fuses all of its own features.
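A minimal NumPy sketch of the first-layer fusion of Section 3.1 follows. Because equations (1) to (4) are not reproduced in this text, the sketch assumes the standard graph attention form: a score LeakyReLU(aᵀ[W h_i ‖ W h_j]) per node on the meta-path, a softmax over those scores, a weighted sum, and an average over K heads. The attention vector `a`, the LeakyReLU slope, head averaging, and all function names are assumptions rather than the paper’s exact formulation.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_fuse(h_center, h_others, W, a):
    """Fuse a node with the other nodes on its meta-path (single head).

    h_center: (f,)    features of the node receiving the fusion
    h_others: (m, f)  features of the other nodes on the meta-path
    W:        (f, f') linear transformation
    a:        (2*f',) attention vector (assumed, as in standard GAT)
    Returns the fused (f',) vector and the (m+1,) attention weights.
    """
    nodes = np.vstack([h_center[None, :], h_others])  # row 0 = self-attention
    z = nodes @ W                                      # (m+1, f')
    # Concatenate the transformed center with every transformed node: 2*f' dims.
    pairs = np.hstack([np.repeat(z[:1], len(z), axis=0), z])
    scores = leaky_relu(pairs @ a)
    scores = scores - scores.max()                     # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()      # normalized coefficients
    return alpha @ z, alpha

def multi_head_fuse(h_center, h_others, Ws, As):
    """Average the fused vectors of K independent attention heads."""
    heads = [attention_fuse(h_center, h_others, W, a)[0] for W, a in zip(Ws, As)]
    return np.mean(heads, axis=0)

# Toy dimensions: f = 4, f' = 6, K = 3 heads, two other nodes on the meta-path.
rng = np.random.default_rng(0)
f, f_out, K = 4, 6, 3
h_user = rng.normal(size=f)
h_path = rng.normal(size=(2, f))
Ws = [rng.normal(size=(f, f_out)) for _ in range(K)]
As = [rng.normal(size=2 * f_out) for _ in range(K)]

fused, alpha = attention_fuse(h_user, h_path, Ws[0], As[0])
fusion_vector = multi_head_fuse(h_user, h_path, Ws, As)
```

Restricting `h_others` to the nodes on the predefined meta-path is what limits the attention to that path; concatenating the heads instead of averaging them would be an equally plausible reading of equation (4).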
The second layer focuses on fusing features between user-type nodes. Since the meta-path fusion vector already carries all of a user’s own information, the second layer amounts to a global feature fusion. The influence between user-type nodes is given by equation (5), and the final attention coefficient by equation (6). In equation (5), the inputs are f’-dimensional feature vectors and M is the f’ × f” weight matrix, so concatenating two transformed vectors gives a vector of dimension 2 × f”. The final global fusion vector is obtained by weighting the user-type node vectors with the calculated attention coefficients according to equation (7). To ensure that the extracted features represent both local and global information well, the outputs of the first and second GAT layers are combined into the combined vector, as shown in Figure 2. After the same operations are performed on Gs and Gg, the nodes of the two networks are mapped into one low-dimensional space, in which user identity alignment can be performed. Note that in some cases the node types of the two networks do not coincide; the network with more node types is then taken as the reference, and the feature vectors of the missing node types in the other network are initialized to f-dimensional all-zero vectors.
3.3 User alignment model
With the two operations above, the high-dimensional nodes of the two networks are mapped into the same low-dimensional space. We can then decide whether two final combined vectors belong to the same natural person from their similarity or distance. Given the existing anchor links, the task is to find further pairs of users, one in Gs and one in Gg, that belong to the same natural person and are therefore connected by an anchor link.
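The distance-based decision can be sketched as follows. The sketch uses the Chebyshev distance, which the paper adopts, and ranks candidate users in Gg for one user in Gs. How the two per-vector distances are combined is an assumption here: a weighted sum with weights `w` and `lam`, with `lam = 1 - w` chosen only for illustration. The function names and the toy vectors are likewise hypothetical.

```python
import numpy as np

def chebyshev(u, v):
    """Chebyshev distance: the largest absolute coordinate difference."""
    return float(np.max(np.abs(u - v)))

def combined_distance(meta_s, glob_s, meta_g, glob_g, w=0.6, lam=0.4):
    """Weighted sum of the distances between the two users' meta-path
    fusion vectors and between their global fusion vectors (assumed form)."""
    return w * chebyshev(meta_s, meta_g) + lam * chebyshev(glob_s, glob_g)

def rank_candidates(user_s, users_g, w=0.6, lam=0.4):
    """Sort candidate ids in Gg by combined distance to a user in Gs.

    user_s:  (meta_vector, global_vector) for the user in Gs
    users_g: dict id -> (meta_vector, global_vector) for users in Gg
    """
    distances = {uid: combined_distance(user_s[0], user_s[1], m, g, w, lam)
                 for uid, (m, g) in users_g.items()}
    return sorted(distances, key=distances.get)

# Made-up two-dimensional vectors for one Gs user and two Gg candidates.
user_s = (np.array([1.0, 0.0]), np.array([0.5, 0.5]))
users_g = {
    "g1": (np.array([0.9, 0.1]), np.array([0.5, 0.4])),
    "g2": (np.array([0.0, 1.0]), np.array([0.0, 0.0])),
}
ranking = rank_candidates(user_s, users_g)
```

With these toy vectors, `g1` is ranked first because both of its vectors lie close to those of the Gs user. A ranked list of this kind is also what a Precision@k evaluation would consume.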
The distance in the low-dimensional space between two vectors that belong to the same natural person should be as small as possible, and the distance between vectors that do not belong to the same natural person should be as large as possible; the loss function is therefore given by equation (8). Here d is a distance function; the Chebyshev distance is used in this paper to compute the distance between the two users’ meta-path fusion vectors and between their global fusion vectors, respectively. The distance between two users is thus computed from their combined vectors. w and λ are hyperparameters that balance the effect of the meta-path fusion vector and the global fusion vector on the result.
4. EXPERIMENT
4.1 Datasets
Twitter-Foursquare is a pair of heterogeneous networks whose node types include users, tweets, and geographic locations [17, 18]. Foursquare is a platform that encourages mobile phone users to share information such as their current location with others. The details of this dataset are listed in Table 1.
Table 1. Twitter-Foursquare dataset.
4.2 Baselines
To evaluate the performance of the proposed MGUIL, we compare our framework with the following state-of-the-art methods:
4.3 Comparison of experimental results
In the experiments, the hyperparameters of the proposed MGUIL are set to w = 0.6, K = 3, f’ = 256, and f” = 128. The baseline methods are configured as in their original papers. Table 2 shows the performance of each method under the evaluation metrics Precision@k (P@k) and MAP [19].
Table 2. Twitter-Foursquare dataset.
Note: * marks the best-performing method.
From Table 2, it can be found that:
4.4 Hyperparameter setting experiment
After the comparison with the baselines, we vary the hyperparameters of MGUIL to evaluate their effect on the results of the model and to find the best setting. w and λ balance the influence of the meta-path fusion vector and the global fusion vector on the final combined vector. Figure 3a shows that MGUIL performs best at w = 0.6, which indicates that there is a balance point between the two vectors and that the meta-path fusion vector carries somewhat more weight in the combined vector than the global fusion vector. Since a multi-head attention mechanism is introduced in the first GAT layer, the number of attention heads K also affects the final result. As Figure 3b shows, the model performs best at K = 4. A small K may lead to incomplete feature fusion that fails to extract deeper information, whereas a large K may introduce too much noise into the fusion and reduce the accuracy of the features. The embedding dimension determines the complexity of the latent space; we choose 128 as the final dimension, and Figure 3c shows that 128 dimensions give better results.
5. SUMMARY
In this paper, we propose MGUIL, a model for user identity alignment in heterogeneous networks. It uses a two-layer attention mechanism that fuses all the features of each user node itself in the first layer and fuses the global network structure through the following relationships between users in the second layer.
Finally, the outputs of the two GAT layers are combined and fed into the supervised identity alignment model, which uses known anchor nodes to find the pairs of combined vectors that differ least, that is, that are closest in the low-dimensional embedding space. We test the model on real online social platforms, and its results are better than those of existing methods.
ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China (NSFC) (No. 61572445).
REFERENCES
[1] Zhang, J., Yu, P. S. and Zhou, Z., “Meta-path based multi-network collective link prediction,” KDD’14, 1286–1295 (2014).
[2] Zafarani, R. and Liu, H., “Users joining multiple sites: friendship and popularity variations across sites,” Information Fusion 28, 83–89 (2016). https://doi.org/10.1016/j.inffus.2015.07.002
[3] Chen, B. and Chen, X., “A survey on user alignment across social networks,” Journal of Xihua University 40, 11–26 (2021).
[4] Wang, D., Cui, P. and Zhu, W., “Structural deep network embedding,” KDD’16, 1225–1234 (2016).
[5] Zafarani, R. and Liu, H., “Connecting corresponding identities across communities,” International AAAI Conf. on Web and Social Media, 354–357 (2009).
[6] Velickovic, P., Cucurull, G. and Casanova, A., “Graph attention networks,” ICLR’18, 1–12 (2017).
[7] Perozzi, B., Al-Rfou, R. and Skiena, S., “Deepwalk: Online learning of social representations,” KDD’14, 701–710 (2014).
[8] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J. and Mei, Q., “LINE: Large-scale information network embedding,” WWW’15, 1067–1077 (2015).
[9] Chu, X., Fan, X., Yao, D., Zhu, Z., Huang, J. and Bi, J., “Cross-network embedding for multi-network alignment,” WWW’19, 273–284 (2019).
[10] Goga, O., Lei, H., Friedland, G., Sommer, R. and Teixeira, R., “Exploiting innocuous activity for correlating users across sites,” WWW’13, 447–458 (2013).
[11] Riederer, C., Kim, Y., Chaintreau, A., Korula, N. and Lattanzi, S., “Linking users across domains with location data: Theory and validation,” WWW’16, 707–719 (2016).
[12] Liu, L., Cheung, W. K., Li, X. and Liao, L., “Aligning users across social networks using network embedding,” IJCAI’16, 1774–1780 (2016).
[13] Liu, S., Wang, S., Zhu, F., Zhang, J. and Krishnan, R., “HYDRA: Large-scale social identity linkage via heterogeneous behavior modeling,” SIGMOD’14 on Management of Data, 51–62 (2014).
[14] Li, X., Shang, Y. and Cao, Y., “Type-aware anchor link prediction across heterogeneous networks based on graph attention network,” AAAI Conf. on Artificial Intelligence, 147–155 (2020). https://doi.org/10.1609/aaai.v34i01.5345
[15] Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P. and Yu, P. S., “Heterogeneous graph attention network,” WWW’19, 2022–2032 (2019).
[16] Dong, Y., Chawla, N. V. and Swami, A., “Metapath2vec: Scalable representation learning for heterogeneous networks,” KDD’17, 135–144 (2017).
[17] Velickovic, P., Cucurull, G. and Casanova, A., “Graph attention networks,” ICLR’18, 1–12 (2017).
[18] Zhang, J. and Yu, P., “Integrated anchor and social link predictions across social networks,” IJCAI’15, 2125–2132 (2015).
[19] Zhou, F., Liu, L., Zhang, K., Trajcevski, G., Wu, J. and Zhong, T., “Deeplink: A deep learning approach for user identity linkage,” in IEEE Conf. on Computer Communications, 1313–1321 (2018).
|