Graph Mining for Influence Maximization in Social Networks

Abstract : Modern science of graphs has emerged the last few years as a field of interest and has been bringing significant advances to our knowledge about networks. Until recently the existing data mining algorithms were destined for structured/relational data while many datasets exist that require graph representation such as social networks, networks generated by textual data, 3D protein structures and chemical compounds. It has become therefore of crucial importance to be able to extract meaningful information from that kind of data and towards this end graph mining and analysis methods have been proven essential. The goal of this thesis is to study problems in the area of graph mining focusing especially on designing new algorithms and tools related to information spreading and specifically on how to locate influential entities in real-world networks. This task is crucial in many applications such as information diffusion, epidemic control and viral marketing. In the first part of the thesis, we have studied spreading processes in social networks focusing on finding topological characteristics that rank entities in the network based on their influential capabilities. We have specifically focused on the K-truss decomposition which is an extension of the core decomposition of the graph. Extensive experimental analysis showed that the nodes that belong to the maximal K-truss subgraph show a better spreading behavior when compared to baseline criteria. Such spreaders can influence a greater part of the network during the first steps of a spreading process but also the total fraction of the influenced nodes at the end of the epidemic is greater. We have also observed that node members of such dense subgraphs are those achieving the optimal spreading in the network.In the second part of the thesis, we focused on identifying a group of nodes that by acting all together maximize the expected number of influenced nodes at the end of the spreading process, formally called Influence Maximization (IM). The IM problem is actually NP-hard though there exist approximation guarantees for efficient algorithms that can solve the problem while obtaining a solution within the 63% of optimal classes of models. As those guarantees propose a greedy approximation which is computationally expensive especially for large graphs, we proposed the MATI algorithm which succeeds in locating the group of users that maximize the influence while also being scalable. The algorithm takes advantage the possible paths created in each node’s neighborhood to precalculate each node’s potential influence and produces competitive results in quality compared to those of baseline algorithms such as the Greedy, LDAG and SimPath. In the last part of the thesis, we study the privacy point of view of sharing such metrics that are good influential indicators in a social network. We have focused on designing an algorithm that addresses the problem of computing through an efficient, correct, secure, and privacy-preserving algorithm the k-core metric which measures the influence of each node of the network. We have specifically adopted a decentralization approach where the social network is considered as a Peer-to-peer (P2P) system. The algorithm is built based on the constraint that it should not be possible for a node to reconstruct partially or entirely the graph using the information they obtain during its execution. While a distributed algorithm that computes the nodes’ coreness is already proposed, dynamic networks are not taken into account. Our main contribution is an incremental algorithm that efficiently solves the core maintenance problem in P2P while limiting the number of messages exchanged and computations. We provide a security and privacy analysis of the solution regarding network de-anonimization and show how it relates to previously defined attacks models and discuss countermeasures.
Complete list of metadatas

Cited literature [165 references]  Display  Hide  Download

https://pastel.archives-ouvertes.fr/tel-01688005
Contributor : Abes Star <>
Submitted on : Friday, January 19, 2018 - 9:06:30 AM
Last modification on : Friday, May 3, 2019 - 5:39:08 PM
Long-term archiving on : Thursday, May 24, 2018 - 1:31:54 AM

File

65309_ROSSI_2017_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01688005, version 1

Citation

Maria Rossi. Graph Mining for Influence Maximization in Social Networks. Artificial Intelligence [cs.AI]. Université Paris-Saclay, 2017. English. ⟨NNT : 2017SACLX083⟩. ⟨tel-01688005⟩

Share

Metrics

Record views

526

Files downloads

1225