Trend Detection and Information Propagation in Dynamic Social Networks

Abstract : During the last decade, the amount of information within Dynamic Social Networks has increased dramatically. Studying the interaction and communication between users in these networks can yield valuable real-time predictions of how information evolves. The study of social networks raises several research challenges, e.g. (a) real-time search has to balance the quality, authority, relevance and timeliness of the content, (b) studying the correlation between groups of users can reveal the influential ones and help predict media consumption and the use of network and traffic resources, (c) spam and advertisements must be detected, since the growth of social networks brings a continuously growing amount of irrelevant information over the network. By extracting the relevant information from online social networks in real time, we can address these challenges. In this thesis a novel method to perform topic detection, classification and trend sensing in short texts is introduced. Instead of relying on words, as most existing methods based on bag-of-words or n-gram techniques do, we introduce Joint Complexity, which is defined as the cardinality of the set of all distinct common factors (contiguous subsequences of characters) of two given strings. Each short sequence of text is decomposed in linear time into a memory-efficient structure called a Suffix Tree, and by overlapping two trees, in linear or sublinear average time, we obtain the number of factors that are common to both trees. The method has been extensively tested on Markov sources of any order over a finite alphabet and gave a good approximation for text generation and language discrimination. The proposed method is language-agnostic, since we can detect similarities between two texts in any loosely character-based language. It does not use semantics and is not based on a specific grammar, so there is no need to build a specific dictionary or stemming technique.
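The definition of Joint Complexity given above can be illustrated with a minimal sketch. It enumerates factors by brute force in quadratic time and does not use the suffix-tree construction described in the abstract; the function names are hypothetical, and counting only non-empty factors is an assumption made here.

```python
def factors(text: str) -> set:
    """Return every distinct non-empty factor (contiguous substring) of text.

    Brute-force enumeration: quadratic in len(text), for illustration only.
    """
    return {
        text[start:end]
        for start in range(len(text))
        for end in range(start + 1, len(text) + 1)
    }


def joint_complexity(a: str, b: str) -> int:
    """Count the distinct factors shared by a and b (empty factor excluded by assumption)."""
    return len(factors(a) & factors(b))


if __name__ == "__main__":
    # Two short texts that share many character sequences score higher
    # than two texts that share few.
    print(joint_complexity("trend detection", "topic detection"))
    print(joint_complexity("trend detection", "xyz"))
```

A higher count suggests that two short texts share more character sequences, which is the similarity signal the abstract relies on. A suffix-tree implementation would obtain the same count without materialising every factor.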
The proposed method can be used to capture a change of topic within a conversation, as well as the style of a specific writer in a text. In the second part of the thesis, we take advantage of the nature of the data, which naturally motivated us to use the theory of Compressive Sensing, driven by the problem of target localization. Compressive Sensing states that signals which are sparse or compressible in a suitable transform basis can be recovered from a highly reduced number of incoherent random projections, in contrast to traditional methods dominated by the well-established Nyquist-Shannon sampling theory. Based on the spatial nature of the data, we apply the theory of Compressive Sensing to perform topic classification by recovering an indicator vector, while significantly reducing the amount of information from tweets. The method works in conjunction with a Kalman filter, which updates the states of a dynamical system as a refinement step. In this thesis we exploit datasets collected using the Twitter streaming API, gathering tweets in various languages, and we obtain very promising results compared to state-of-the-art methods.
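The idea of recovering an indicator vector from a small number of random projections can be sketched as follows. This is a toy illustration under strong assumptions, not the recovery procedure used in the thesis: the vector is taken to be exactly 1-sparse (a single class), the projection matrix is Gaussian, recovery is a single matching-pursuit step, and the Kalman filter refinement is omitted. All names are hypothetical.

```python
import random


def make_projection(m: int, n: int, seed: int = 0) -> list:
    """Build an m x n random Gaussian projection matrix (assumed sensing matrix)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def project(phi: list, x: list) -> list:
    """Compute the m compressed measurements y = phi @ x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in phi]


def recover_indicator(phi: list, y: list) -> int:
    """Return the index whose column best matches y (one matching-pursuit step).

    For an exactly 1-sparse indicator, y equals one column of phi, and the
    normalised correlation is maximal at that column by Cauchy-Schwarz.
    """
    best_index, best_score = -1, -1.0
    for j in range(len(phi[0])):
        column = [row[j] for row in phi]
        norm = sum(c * c for c in column) ** 0.5
        score = abs(sum(c * v for c, v in zip(column, y))) / norm
        if score > best_score:
            best_index, best_score = j, score
    return best_index


if __name__ == "__main__":
    n_classes, n_measurements, true_class = 50, 8, 17
    indicator = [1.0 if j == true_class else 0.0 for j in range(n_classes)]
    phi = make_projection(n_measurements, n_classes)
    y = project(phi, indicator)  # 8 numbers instead of 50
    print(recover_indicator(phi, y))
```

Here 8 measurements stand in for a 50-dimensional indicator vector. With noisy measurements or more than one active class, an iterative or convex recovery method would be needed instead of this single step.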

Cited literature : 119 references

https://pastel.archives-ouvertes.fr/tel-01152275
Contributor : Dimitrios Milioris
Submitted on : Friday, May 15, 2015 - 4:56:04 PM
Last modification on : Thursday, October 17, 2019 - 12:36:05 PM
Long-term archiving on : Thursday, April 20, 2017 - 12:22:42 AM

Identifiers

  • HAL Id : tel-01152275, version 1

Citation

Dimitrios Milioris. Trend Detection and Information Propagation in Dynamic Social Networks. Document and Text Processing. École Polytechnique, 2015. English. ⟨tel-01152275⟩

Metrics

  • Record views : 960
  • File downloads : 2273