User identity alignment across heterogeneous networks based on meta-path attention

Yong Gan; Chenfang Zhang; Ruisen Yang

doi:10.1117/12.2637544

Published: 24 May 2022

Proceedings Volume 12260, International Conference on Computer Application and Information Security (ICCAIS 2021); 122600Z (2022) https://doi.org/10.1117/12.2637544
Event: International Conference on Computer Application and Information Security (ICCAIS 2021), 2021, Wuhan, China

Abstract

To build a unified user model across multiple networks, methods for user identity alignment in social networks have been proposed. Previous studies mainly address alignment on homogeneous networks, which have only one type of node and edge, and fall into three types: (1) studies based on network topology only, (2) studies based on user behavior only, and (3) studies based on both user-generated content and network topology. Their obvious defect is that no real social platform has only one type of node and edge; a network with multiple types of nodes and edges is called a heterogeneous network. This paper proposes a model that performs user identity alignment on heterogeneous networks, named user alignment across heterogeneous networks based on meta-path attention (MGUIL). MGUIL fuses meta-path features by introducing a graph attention mechanism in two heterogeneous networks, obtains local and global information through a two-layer GAT network, and finally aligns the information in both networks within a unified framework. The method thus solves the alignment problem on heterogeneous networks while also accounting for global information propagation. We compare it with existing methods on real networks and confirm that MGUIL improves user identity alignment accuracy.

1. INTRODUCTION

With the development of social networks, a person may have accounts on several platforms. Identifying these accounts across platforms is useful for cross-domain recommendation, link prediction¹, network dynamics², cyberspace security and other research. Thus the user identity alignment problem arises, where user alignment across social networking platforms is defined as linking users with the same identity across different social platforms. User alignment is also known as user identification, anchor link prediction (ALP), profile linkage, user identity linkage (UIL), etc. The purpose is to link the accounts on different social network platforms that belong to the same natural person³.

Early work relied on network structure⁴: users with similar network structures were considered to belong to the same natural person. Although predicting only from following and follower relationships is feasible⁵, this approach discards the content that users generate on the Internet. It also presupposes that the networks are consistent; because some platforms center on blogs and others on videos, network consistency cannot be fully guaranteed.

Driven by the development of network embedding techniques⁶⁻⁹, more studies now use network embedding methods. Some scholars have analyzed users’ writing styles on social platforms¹⁰ and considered the accounts with the most similar writing styles to be the same natural person. Some perform user alignment by analyzing users’ timestamps and location information¹¹. Others combine user behavior and network structure into user node features¹² to mitigate the loss of content, or fuse user-generated text with the network structure¹³ for network alignment. Attention mechanisms have also been used for user alignment¹⁴. However, some of these approaches lack a unified model framework, some do not take the global structure into account, and all of them work with homogeneous networks, that is, networks with only one type of node and edge.

However, users in modern social networks generate a large amount of content. On Twitter and Facebook, for example, there are relationships not only between users but also between users and tweets, so the whole network has more than one type of node and edge. Some scholars have therefore considered the user alignment problem on heterogeneous networks, reducing the dimensionality of multiple types of node features and fusing them¹⁵. Although this method adapts to heterogeneous networks to some extent, it loses the global information.

To address the above problems, this paper proposes MGUIL, a new approach to the user alignment problem on heterogeneous networks. The method uses a multi-layer graph attention mechanism and the idea of meta-paths⁶,¹⁶. The first GAT layer fuses user-generated content into the user node along each meta-path, combining the features of the original content with the features of the user node. The second GAT layer then fuses all the meta-path fusion vectors according to the network structure, which yields the global information. The same process is applied to the other network, and finally the two sets of low-dimensional node vectors are aligned. It is worth noting that feature extraction for the second network reuses the parameters trained on the first network, which ensures that the high-dimensional nodes of both networks are mapped into the same low-dimensional space.

The contributions of this paper are summarized as follows:

(i) The method can capture local and global features in a heterogeneous network using meta-paths and attention mechanisms to map a high-dimension node vector into a low-dimension space.
(ii) MGUIL is a unified framework that can complete the node feature extraction and user identity alignment for both networks simultaneously.
(iii) Tests on real data show that MGUIL performs the user alignment task on heterogeneous networks better than the existing algorithms.

2. PROBLEM DEFINITION

This section defines the heterogeneous network and introduces two new vectors used in the text: the meta-path fusion vector and the global fusion vector. Finally, the user identity alignment problem is defined.

2.1 Heterogeneous networks

A heterogeneous network is a network with more than one type of node and edge. It is represented as G = (V, E, T), where V is the set of nodes, E is the set of edges, and T is the set of all types in the network.

As an example, the network in Figure 1 is written as G^s = (V^s, E^s, T^s), where T^s = {t₁, t₂, ⋯, t_p} represents the p different node types and V^s represents the n nodes in the network, each of which carries one of these types. Figure 1 is also used as an example in the rest of the presentation.
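The definition G = (V, E, T) can be illustrated with a small data structure. This is only a sketch: the class name `HeteroNetwork`, its methods, and the toy node names are assumptions made for illustration, and only node types are tracked (edge types are left out for brevity).

```python
from dataclasses import dataclass, field


@dataclass
class HeteroNetwork:
    """G = (V, E, T): nodes, edges, and the set of node types."""
    node_type: dict = field(default_factory=dict)  # V: node id -> type in T
    edges: set = field(default_factory=set)        # E: undirected (u, v) pairs

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v):
        # Store each undirected edge once, in a canonical order.
        self.edges.add(tuple(sorted((u, v))))

    @property
    def types(self):
        # T: every node type that occurs in the network.
        return set(self.node_type.values())

    def is_heterogeneous(self):
        return len(self.types) > 1


# Toy citation network in the spirit of Figure 1 (node names are made up).
g = HeteroNetwork()
g.add_node("a1", "author")
g.add_node("p1", "paper")
g.add_node("v1", "venue")
g.add_edge("a1", "p1")
g.add_edge("p1", "v1")
```

A network with a single node type would make `is_heterogeneous()` return False, which corresponds to the homogeneous setting of earlier work.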

Figure 1. Two citation networks and predefined meta-paths.

2.2 Meta-path fusion vector

A meta-path is a path containing a sequence of relations, such as the relation A-P-V in Figure 1: author A publishes a paper P in journal V. The meta-path fusion vector proposed in this paper uses an attention mechanism to fuse the features of all nodes on a meta-path into the first node (which usually represents the user node in social networks) in order to obtain local information.

Given a node, the other nodes on its meta-path form a set in which each node has a type t_r ∈ T. Every node first passes through one layer of initial linear transformation. One layer of GAT then fuses the features of the other nodes on the meta-path into the given node, and the result is the meta-path fusion vector.
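Collecting the other nodes on a meta-path can be sketched as a typed traversal. The function name `metapath_nodes`, the dictionary-based graph representation, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def metapath_nodes(start, metapath, node_type, neighbors):
    """Return the nodes reached from `start` by following `metapath`.

    metapath  : sequence of node types, e.g. ("A", "P", "V")
    node_type : dict node -> type
    neighbors : dict node -> iterable of adjacent nodes
    The start node itself is not included in the result.
    """
    if node_type[start] != metapath[0]:
        return set()
    frontier, reached = {start}, set()
    for wanted in metapath[1:]:
        # Keep only neighbours whose type matches the next step of the path.
        frontier = {
            nxt
            for node in frontier
            for nxt in neighbors.get(node, ())
            if node_type[nxt] == wanted
        }
        reached |= frontier
    return reached


# Toy author-paper-venue graph (made-up node names).
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
neighbors = {
    "a1": ["p1", "p2"],
    "a2": ["p1"],
    "p1": ["a1", "a2", "v1"],
    "p2": ["a1"],
    "v1": ["p1"],
}
on_path = metapath_nodes("a1", ("A", "P", "V"), node_type, neighbors)
```

Here `on_path` holds the papers of author a1 and the venue of those papers; the co-author a2 is excluded because its type does not match the next step of the meta-path.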

2.3 Global fusion vector

Given a user node whose meta-path fusion vector has been obtained from the first layer, the second layer uses the following and follower relationships between users and applies the attention mechanism to fuse this vector with the meta-path fusion vectors of the user’s neighbors. The result is the global fusion vector.

2.4 User identity alignment

Given two heterogeneous social networks G^s = (V^s, E^s, T^s) and G^g = (V^g, E^g, T^g), together with a set of known anchor links, each of which indicates that a user in G^s and a user in G^g belong to the same natural person, the user identity alignment problem is to find other pairs of users, one in G^s and one in G^g, that belong to the same natural person.

3. MODELS

In this paper, we propose a unified framework, MGUIL, to solve the user identity alignment problem in heterogeneous networks. It uses a two-layer graph attention mechanism to obtain the meta-path fusion vector and the global fusion vector associated with each user node, and combines them into the final combined representation of that node. On this basis, whether two user nodes belong to the same natural person is predicted by jointly measuring the pairwise similarity of the corresponding element vectors in their combined vectors.

This section presents how the original node features are turned into the final combined representation through two layers of GAT. It is worth noting that the attention parameters learned on G^s can be applied directly to the other network G^g: the parameters are shared by both networks, so the nodes in both networks are mapped to the same low-dimensional space, which facilitates the subsequent user identity alignment prediction.

3.1 Meta-path fusion vector

The meta-path fusion vector is generated by the first layer of GAT. Taking the G^s network as an example, each node first goes through an initialization layer that gives it an f-dimensional representation. For nodes with textual content (e.g., tweets and user profiles) we use Word2Vec to initialize their f-dimensional feature vectors, and for nodes without textual content we use random initialization.

For each node feature vector, as shown in Figure 2, a linear transformation with weight matrix W ∈ R^{f×f’} is applied first, which turns the initial feature vector into a higher-dimensional vector. Consider one node in Figure 2 as an example. The predefined meta-path directs the attention mechanism of this layer to attend only to the feature vectors of the nodes on the meta-path and to fuse these vectors into the vector of the selected node. According to the predefined meta-path (t₁, t₂, t₃), the nodes that need attention can be found, and equation (1) gives the attention coefficient with which each node vector on the meta-path is weighted during fusion.

Figure 2. MGUIL: Extraction of the final representation by two layers of GAT.

where the inputs are f-dimensional feature vectors and W is the f × f’ weight matrix. After the two transformed feature vectors are concatenated, the resulting vector has dimension 2 × f’.

Then the final attention coefficients are calculated. Since we want to measure the impact of each meta-path node on the selected node (including the effect of the node on itself), a normalization operation is performed on the attention coefficients.

After that, the feature fusion operation is performed: according to equation (3), each feature on the meta-path is fused with its attention coefficient, including self-attention.
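The three steps above (scoring, normalization, weighted fusion) can be sketched with a single attention head. The sketch assumes the standard graph attention form, in which the score is a LeakyReLU of the dot product between an attention vector `a` and the concatenated transformed features; the exact form of the paper’s equations may differ, the output nonlinearity is omitted, and the function and variable names are illustrative.

```python
import math


def matvec(W, h):
    """Return h @ W for an f-vector h and an f x f' matrix W (list of rows)."""
    return [sum(h[i] * W[i][j] for i in range(len(h))) for j in range(len(W[0]))]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_fuse(h_center, h_others, W, a):
    """Fuse the meta-path nodes into the centre node with one attention head.

    h_center : f-dim feature of the centre (user) node
    h_others : list of f-dim features of the other nodes on the meta-path
    W        : f x f' weight matrix; a : attention vector of length 2 * f'
    Returns (fused f'-dim vector, attention coefficients incl. self-attention).
    """
    z_center = matvec(W, h_center)
    z_all = [z_center] + [matvec(W, h) for h in h_others]  # index 0 = self
    # Unnormalised score for each node: LeakyReLU(a . [z_center || z_j]).
    scores = [
        leaky_relu(sum(ai * x for ai, x in zip(a, z_center + z_j)))
        for z_j in z_all
    ]
    # Softmax normalisation over the centre node and its meta-path nodes.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the transformed features.
    fused = [
        sum(al * z[k] for al, z in zip(alphas, z_all))
        for k in range(len(z_center))
    ]
    return fused, alphas


# Example with f = f' = 2: an identity W and a zero attention vector
# give every node the same weight, so the result is a plain average.
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 0.0, 0.0]
fused, alphas = attention_fuse([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]], W, a)
```

With trained values of `W` and `a`, the coefficients would no longer be uniform and nodes that matter more to the user node would receive larger weights.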

In this paper, in order to enhance the fusion of relevant features, a multi-head attention mechanism is used for the meta-path fusion vector, where K is the number of attention heads. The meta-path fusion vector can thus be expressed as equation (4).

where the attention coefficient and the weight matrix W^k are those of the k-th attention head.
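A multi-head variant can be sketched by running K independent heads, each with its own weight matrix and attention vector, and joining their outputs. Concatenation is assumed here; averaging the heads is the other common choice, and the text does not state which one equation (4) uses. All names are illustrative.

```python
import math


def head(h_center, h_others, W, a):
    """One attention head: returns the fused f'-dim vector."""
    def project(h):
        return [sum(h[i] * W[i][j] for i in range(len(h))) for j in range(len(W[0]))]

    z_c = project(h_center)
    zs = [z_c] + [project(h) for h in h_others]      # index 0 = self-attention
    raw = [sum(ai * x for ai, x in zip(a, z_c + z)) for z in zs]
    raw = [s if s > 0 else 0.2 * s for s in raw]     # LeakyReLU
    top = max(raw)
    exps = [math.exp(s - top) for s in raw]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [sum(al * z[k] for al, z in zip(alphas, zs)) for k in range(len(z_c))]


def multi_head(h_center, h_others, Ws, As):
    """Run K heads (one (W^k, a^k) pair each) and concatenate their outputs."""
    out = []
    for W_k, a_k in zip(Ws, As):
        out.extend(head(h_center, h_others, W_k, a_k))
    return out


# K = 3 heads with f = f' = 2; identity weights keep the example easy to follow.
identity = [[1.0, 0.0], [0.0, 1.0]]
Ws = [identity, identity, identity]
As = [[0.0] * 4, [0.0] * 4, [0.0] * 4]
out = multi_head([1.0, 0.0], [[0.0, 1.0]], Ws, As)
```

With concatenation, the output has K × f’ entries, so the number of heads directly affects the width of the vector passed to the next layer.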

3.2 Global fusion vector

After the first layer of GAT, each user node has a meta-path fusion vector that fuses all of its own features. The second layer focuses on fusing features between user-type nodes. Since the meta-path fusion vector already carries all of a user’s information, the second layer performs a global feature fusion. The influence between user-type nodes is given by equation (5), and the final attention coefficient by equation (6).

where the inputs are f’-dimensional feature vectors and M is the f’ × f” weight matrix. After the two transformed feature vectors are concatenated, the resulting vector has dimension 2 × f”.

The final global fusion vector is obtained by weighting all the node type vectors based on the calculated attention coefficients according to equation (7).
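The second layer can be sketched in the same way, this time attending over a user and the users it is linked to by follow relationships. The sketch again assumes the standard graph attention form; the attention vector name `b`, the function name, and the toy data are assumptions made for illustration.

```python
import math


def global_fusion(user, meta_vecs, follows, M, b):
    """Second-layer attention over a user and its neighbouring users.

    meta_vecs : dict user -> f'-dim meta-path fusion vector (first-layer output)
    follows   : dict user -> list of neighbouring users (follow relations)
    M         : f' x f'' weight matrix; b : attention vector of length 2 * f''
    """
    def project(h):
        return [sum(h[i] * M[i][j] for i in range(len(h))) for j in range(len(M[0]))]

    group = [user] + list(follows.get(user, []))   # the user itself comes first
    zs = [project(meta_vecs[v]) for v in group]
    z_u = zs[0]
    raw = [sum(bi * x for bi, x in zip(b, z_u + z_v)) for z_v in zs]
    raw = [s if s > 0 else 0.2 * s for s in raw]   # LeakyReLU
    top = max(raw)
    exps = [math.exp(s - top) for s in raw]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [sum(al * z[k] for al, z in zip(alphas, zs)) for k in range(len(z_u))]


# Toy example with f' = 2 and f'' = 1 (values are made up).
meta_vecs = {"u1": [1.0, 0.0], "u2": [0.0, 3.0], "u3": [2.0, 2.0]}
follows = {"u1": ["u2", "u3"]}
M = [[1.0], [1.0]]
b = [0.0, 0.0]          # zero attention vector -> uniform weights
g_u1 = global_fusion("u1", meta_vecs, follows, M, b)
```

A user without neighbours keeps only its own projected vector, since the softmax then runs over a single entry.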

To ensure that the extracted features represent both local and global information well, the outputs of the first and second GAT layers are combined into the combined vector, as shown in Figure 2. After the same operation is performed on G^s and G^g, the nodes of the two networks are mapped into one low-dimensional space, in which user identity alignment can then be performed. It is worth noting that in some cases the node types in the two networks do not coincide. In that case, one should align to the network with more node types and initialize the feature vectors of the missing node types in the other network as f-dimensional all-zero vectors.
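The combination step and the zero initialization of missing node types can be sketched as follows. Keeping the combined vector as a pair of its two parts (rather than one concatenated list) is an assumption of this sketch, chosen so that the two parts can later be compared separately; the names `CombinedVector` and `fill_missing_types` are hypothetical.

```python
from typing import NamedTuple


class CombinedVector(NamedTuple):
    meta_path: list    # first-layer output (local information)
    global_: list      # second-layer output (global information)


def fill_missing_types(features_by_type, all_types, f):
    """Give every node type absent from a network an all-zero f-dim feature."""
    return {t: features_by_type.get(t, [0.0] * f) for t in all_types}


# One network has only user features; the other also has tweets and locations.
all_types = ["user", "tweet", "location"]
padded = fill_missing_types({"user": [0.3, 0.7]}, all_types, f=2)

# Combined representation of one user (made-up values).
combined = CombinedVector(meta_path=[0.1, 0.2], global_=[0.5])
```

Padding both networks to the same set of node types keeps the input shapes identical, which is what allows the parameters learned on one network to be reused on the other.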

3.3 User alignment model

Based on the above two operations, the high-dimensional nodes of the two networks can be mapped to the same low-dimensional space. At this point, we can determine from the similarity or distance of two final combined vectors whether they belong to the same natural person. Given the existing anchor links, the task is to find further pairs of users, one from G^s and one from G^g, that belong to the same natural person and are therefore connected by an anchor link. The distance between two vectors that belong to the same natural person should be as small as possible in the low-dimensional space, and the distance between vectors that do not belong to the same natural person should be as large as possible, so the loss function is defined as in equation (8).

where d is a distance function; the Chebyshev distance is used in this paper to calculate the distance between the two users’ meta-path fusion vectors and between their global fusion vectors, respectively. The distance between two users is thus computed from the combined vectors of those users. w and λ are hyperparameters that balance the effect of the meta-path fusion vector and the global fusion vector on the results.
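The distance and the pull-together, push-apart objective can be sketched as below. This is not the paper’s equation (8): the hinge form with a margin, the use of explicit negative pairs, and the choice λ = 1 − w are all assumptions made for illustration. Only the Chebyshev distance and the weighting of the two parts by w and λ come from the text.

```python
def chebyshev(x, y):
    """Chebyshev distance: the largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def combined_distance(u, v, w=0.6):
    """Weighted distance between two users' combined vectors.

    u and v are (meta_path_vector, global_vector) pairs. Taking the second
    weight as 1 - w is an assumption of this sketch.
    """
    lam = 1.0 - w
    return w * chebyshev(u[0], v[0]) + lam * chebyshev(u[1], v[1])


def alignment_loss(anchor_pairs, negative_pairs, w=0.6, margin=1.0):
    """Hinge-style loss: pull anchor pairs together, push other pairs apart."""
    pull = sum(combined_distance(u, v, w) for u, v in anchor_pairs)
    push = sum(
        max(0.0, margin - combined_distance(u, v, w)) for u, v in negative_pairs
    )
    return pull + push


# Made-up combined vectors: (meta-path part, global part).
u = ([0.0, 0.0], [0.0])
v = ([1.0, 3.0], [2.0])
dist = combined_distance(u, v, w=0.5)     # 0.5 * 3.0 + 0.5 * 2.0
```

Pairs known to be the same person contribute their full distance to the loss, while non-matching pairs contribute only when they are closer than the margin.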

4. EXPERIMENT

4.1 Datasets

Twitter-Foursquare is a heterogeneous pair of networks in which node types include users, tweets, and geographic locations¹⁷,¹⁸. Foursquare is a platform that encourages mobile phone users to share information such as their current location with others. The details of this dataset are listed in Table 1.

Table 1. Twitter-Foursquare dataset.

Dataset	Nodes	Node type
Twitter	5,220	User
Twitter	9,490,707	Tweet
Twitter	297,183	Location
Foursquare	5,315	User
Foursquare	48,755	Tweet
Foursquare	38,921	Location

4.2 Baselines

To evaluate the performance of the proposed MGUIL, we compare our framework with the following state-of-the-art methods:

• IONE (2016): IONE takes the following and follower relationships between users as its basis, maps the two networks into a common space as a whole while preserving these relationships after the mapping, and then aligns users’ identities through anchor links¹².
• DeepLink (2018): DeepLink introduces deep learning into the traditional user alignment method. It samples the networks by random walks, uses skip-gram to produce an embedding, and finally pre-trains two preliminary mapping functions between networks A and B. User identity alignment can then be formalized as a dual learning game¹⁹.
• HAN (2019): HAN obtains structural and semantic information about the network in hyperbolic space. It uses meta-path-guided random walks to obtain structural and semantic association information in heterogeneous networks, and uses the distance between nodes in the hyperbolic space as a measure of similarity between nodes¹⁵.

4.3 Comparison of experimental results

In the experiments, the hyperparameters of the proposed method MGUIL are set to w = 0.6, K = 3, f’ = 256 and f” = 128. The baseline methods are configured as in their original papers. Table 2 shows the performance of each method, using the evaluation metrics Precision@k (P@k) and MAP¹⁹.

Table 2. Performance comparison on the Twitter-Foursquare dataset.

Method	P@1	P@10	P@20	P@30	MAP@30
IONE	22.38	46.38	55.71	59.70	32.79
DeepLink	34.47	66.09	70.00	70.48	47.78
HAN	38.69	71.16	75.49	80.95	52.42
MGUIL*	42.67	90.23	94.47	96.22	57.43

Note: * marks the best-performing method.
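The two metrics reported in Table 2 can be sketched as follows, using a definition that is common in the user identity linkage literature: P@k is the share of users whose true counterpart appears among the top-k candidates, and MAP is the mean reciprocal rank of the true counterpart when each user has exactly one match. The function names and the toy rankings are illustrative and are not the paper’s evaluation code.

```python
def precision_at_k(ranked_lists, truth, k):
    """Share of users whose true counterpart appears in their top-k candidates."""
    hits = sum(1 for u, cands in ranked_lists.items() if truth[u] in cands[:k])
    return hits / len(ranked_lists)


def mean_average_precision(ranked_lists, truth):
    """Mean of 1/rank of the true counterpart (0 when it is not ranked)."""
    total = 0.0
    for u, cands in ranked_lists.items():
        if truth[u] in cands:
            total += 1.0 / (cands.index(truth[u]) + 1)
    return total / len(ranked_lists)


# Candidate rankings in the other network for two users (made-up data).
ranked = {"a": ["x", "y", "z"], "b": ["y", "x", "z"]}
truth = {"a": "x", "b": "x"}
p1 = precision_at_k(ranked, truth, 1)
p2 = precision_at_k(ranked, truth, 2)
m = mean_average_precision(ranked, truth)
```

In this toy case user a is matched at rank 1 and user b at rank 2, so P@1 is 0.5, P@2 is 1.0, and MAP is 0.75.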

From Table 2, it can be found that:

(i) MGUIL outperforms IONE and DeepLink. The reason is that MGUIL is specially designed for heterogeneous networks and is therefore better suited to fusing multiple types of node features, and its attention mechanism expresses the influence between nodes more clearly than these two methods.
(ii) Compared with HAN, MGUIL still achieves a large improvement in the accuracy of heterogeneous social user identity alignment, because MGUIL evaluates the influence of different types of nodes and then fuses them with weights according to their influence on user nodes.
(iii) MGUIL combines local and global information into a new combined vector representation that captures the information in the network more comprehensively and deeply, and represents each user’s characteristics more completely, than the baseline methods.

4.4 Hyperparameter setting experiment

After the comparison with the baseline methods, we vary the hyperparameters of MGUIL to evaluate the effect of different settings on the results of the model and to find the best hyperparameters for MGUIL.

w and λ are hyperparameters used to balance the influence of the meta-path fusion vector and the global fusion vector on the final combined vector. Figure 3a shows that MGUIL performs best when w = 0.6. This indicates that there is a balance point between the two vectors and that the meta-path fusion vector contributes somewhat more to the combined vector than the global fusion vector.

Figure 3. Effect of the balance factor w, the number of attention heads K, and the embedding dimension on the results.

Since the multi-head attention mechanism is introduced in the first layer of GAT, the number of attention heads K also affects the final result. As can be seen from Figure 3b, the full capability of the model is best exploited at K = 4. In short, a small K may lead to incomplete feature fusion that fails to extract deeper information, while a large K may introduce too much noise into the fusion process and reduce the accuracy of the features.

The choice of the embedding dimension determines the complexity of the latent space. In this paper, we choose 128 as the final dimension; as shown in Figure 3c, better results are obtained at 128 dimensions.

5. SUMMARY

In this paper, we propose MGUIL, a model for user identity alignment in heterogeneous networks. It uses a two-layer attention mechanism: the first layer fuses all the features of each user node itself, and the second layer fuses the global network structure through the following relationships between users. Finally, the results of the two GAT layers are combined and fed into the supervised identity alignment model, which uses known anchor nodes to find pairs of combined vectors with the smallest difference and the closest distance in the low-dimensional embedding space. We test the model on real online social platforms, and the results are ahead of existing methods.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (NSFC) (No. 61572445).

REFERENCES

[1] Zhang, J., Yu, P. S. and Zhou, Z., “Meta-path based multi-network collective link prediction,” KDD’14, 1286–95 (2014).
[2] Zafarani, R. and Liu, H., “Users joining multiple sites: friendship and popularity variations across sites,” Information Fusion, 28, 83–89 (2016). https://doi.org/10.1016/j.inffus.2015.07.002
[3] Chen, B. and Chen, X., “A survey on user alignment across social networks,” Journal of Xihua University, 40, 11–26 (2021).
[4] Wang, D., Cui, P. and Zhu, W., “Structural deep network embedding,” KDD’16, 1225–34 (2016).
[5] Zafarani, R. and Liu, H., “Connecting corresponding identities across communities,” International AAAI Conf. on Web and Social Media, 354–57 (2009).
[6] Velickovic, P., Cucurull, G. and Casanova, A., “Graph attention networks,” ICLR’18, 1–12 (2017).
[7] Bryan, P., Al-Rfou, R. and Steven, S., “Deepwalk: Online learning of social representations,” KDD’14, 701–10 (2014).
[8] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J. and Mei, Q., “LINE: Large-scale information network embedding,” WWW’15, 1067–77 (2015).
[9] Chu, X., Fan, X., Yao, D., Zhu, Z., Huang, J. and Bi, J., “Cross-network embedding for multi-network alignment,” WWW’19, 273–84 (2019).
[10] Oana, G., Howard, L., Gerald, F., Robin, S. and Renata, T., “Exploiting innocuous activity for correlating users across sites,” WWW’13, 447–58 (2013).
[11] Christopher, R., Yunsung, K., Augustin, C., Nitish, K. and Silvio, L., “Linking users across domains with location data: Theory and validation,” WWW’16, 707–19 (2016).
[12] Liu, L., Cheung, W. K., Li, X. and Liao, L., “Aligning users across social networks using network embedding,” IJCAI’16, 1774–80 (2016).
[13] Liu, S., Wang, S., Zhu, F., Zhang, J. and Krishnan, R., “HYDRA: Large-scale social identity linkage via heterogeneous behavior modeling,” SIGMOD’14 on Management of Data, 51–62 (2014).
[14] Li, X., Shang, Y. and Cao, Y., “Type-aware anchor link prediction across heterogeneous networks based on graph attention network,” AAAI Conf. on Artificial Intelligence, 147–55 (2020). https://doi.org/10.1609/aaai.v34i01.5345
[15] Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P. and Yu, P. S., “Heterogeneous graph attention network,” WWW’19, 2022–32 (2019).
[16] Dong, Y., Chawla, N. V. and Swami, A., “Metapath2vec: Scalable representation learning for heterogeneous networks,” KDD’17, 135–44 (2017).
[17] Velickovic, P., Cucurull, G. and Casanova, A., “Graph attention networks,” ICLR’18, 1–12 (2017).
[18] Zhang, J. and Yu, P., “Integrated anchor and social link predictions across social networks,” IJCAI’15, 2125–32 (2015).
[19] Zhou, F., Liu, L., Zhang, K., Trajcevski, G., Wu, J. and Zhong, T., “Deeplink: A deep learning approach for user identity linkage,” in IEEE Conf. on Computer Communications, 1313–21 (2018).


KEYWORDS

Social networks

Feature extraction

Internet

Network security
