Abstract:
In microblog networks, when a user posts a microblog, other users may forward it, and this forwarding drives the rapid dissemination and diffusion of information. In this paper, we propose a comprehensive and novel approach to predicting user forwarding behavior. First, we build the feature sets that affect microblog forwarding, such as interest topic, geographic location, user aggregation coefficient, neighborhood overlap and so on. These features are classified into four categories: user characteristics, microblog features, network structure features, and interactive behavior characteristics. Second, we establish a feature selection model based on Filtering and Wrapping for predicting the forwarding behavior of users. The model includes three aspects: (1) ANOVA (analysis of variance): the values of each feature are examined by variance analysis; a feature with small variance provides little information. (2) χ² test and point-biserial correlation analysis: these filter discrete and continuous features, respectively. (3) Wrapper analysis: to handle strong correlations among the features, we use the LVW (Las Vegas Wrapper) algorithm to analyze the above feature sets and obtain the optimal feature combination. Finally, we propose a forwarding prediction model based on the AdaBoost (Adaptive Boosting) algorithm. Experimental results demonstrate that the model achieves higher precision and F1 score than Naive Bayes, Logistic Regression, Random Forest and SVM (Support Vector Machine), with an F1 score of 0.885. The proposed AdaBoost prediction model also achieves good recall and F1 scores across different topics. In addition, comparison experiments with different feature sets show that the optimal features selected in this paper are very effective.

The forwarding of microblogs is related to three parts: idols, fans, and posts. By analyzing the relationships, we construct the initial feature set that affects the...
Due to the popularity of social networks, communication has become more and more convenient. In social networks, online users can get the information they want, publish their own views, and interact with other online users, so they can follow world affairs without leaving home. Meanwhile, a large number of social network platforms, such as Sina Microblog, Twitter, and Facebook, have a strong influence, generating a large number of topics and events that surface the most important or hottest news in time. Therefore, studying forwarding prediction for microblog users helps to grasp forwarding patterns accurately and effectively, to understand information dissemination, and to support public opinion analysis. Online public opinion products are also in great demand in the market, so effective public opinion analysis products are of great value to the development of various fields. The following part covers the research status at home and abroad and the contributions of this paper.
Fan et al. [1] find that Sina Microblog is very different from Twitter and has its own characteristics in network structure and user behavior. Ma et al. [2] extract seven in-microblog information features from hashtag strings and the sets of tweets containing a hashtag, and 11 history microblog features from the social graphs formed by users of the hashtag. Suh et al. [3] find that, among content characteristics, URLs and hashtags have a strong relationship with forwarding ability. Li et al. [4] propose five features: interest similarity, user activity, content importance, user influence and user intimacy. They use the SVM algorithm to predict the scale of microblog forwarding, and the prediction accuracy in their experiment reaches 86.63%. Zhang and Cai [5] find that there are different types of links in the microblog network, and put forward the characteristics of homogeneity, micro-network structure, geographical distance and gender. Chen et al. [6] focus on user attributes, message attributes and microblog user attributes, and establish a forwarding prediction model for hot topics by quantifying the characteristics of forwarding activity and forwarding interest. Cao et al. [7] and Wang et al. [8] put forward three kinds of characteristics based on microblog content, user attributes and social relations, study the topological structure of the interest network of microblog users, and propose a probabilistic cascade model. Tang et al. [9] use the idea of “microeconomics” to study the redistribution behavior of individual users and the relay relationship between users through the similarity between users, and finally transform the prediction problem into a multi-task learning problem. Can et al. [10] analyze image features in addition to the content and structure of tweets, and predict the number of times tweets are forwarded. Liu et al. [11] propose a method based on user activity and time-window forwarding behavior, unreceived behavior and neglected behavior, and they propose a user forwarding rate and an interaction frequency. Boyd et al. [12] analyze the psychological motivation of microblog users when forwarding posts. Lee and Sundar [13] study the credibility of information dissemination on Twitter. Zhong et al. [14] propose a method to discover user interest based on the characteristics of the microblog network. Xiao et al. [15] analyze the factors affecting user behavior and find that implicit links play a very important role in user behavior. Michelson et al. [16] use a knowledge base to disambiguate entities in tweets and classify them. Guo et al. [17] analyze Sina Microblog data and find that the factors affecting microblog forwarding fall into three categories: microblog author, microblog heat and microblog interest.
Zhang et al. [18] study how friends affect a microblog user’s forwarding behavior in a self-centered network. Galuba et al. [19] track and analyze the spread of URLs in the Twitter social network, and propose a propagation model that predicts which URLs a user may refer to. Tian et al. [20] analyze the factors affecting the dissemination of microblog information in a regular network, a random network and a microblog information dissemination network. Bagdouri and Oard [21] propose a discriminant model to predict the possibility of users replying or retweeting on Twitter. Fan et al. [22] and Yan et al. [23] quantify and analyze information diffusion and network structure in microblogs. Tang et al. [24] combine personal and global characteristics, and establish the microblog forwarding model IRBLRUS based on user similarity. Yao et al. [25] compute statistics on the large-scale network structure of Sina Microblog from a macro perspective, and find that forwarding makes the network highly linked.
Studying the dissemination of information in social networks is important for understanding microblog forwarding. Pastor-Satorras and Vespignani [26] propose a dynamic model for the spread of infection on a scale-free network, and find the average lifetime and persistence of viruses on the Internet. Xiao et al. [27] analyze the factors that affect information dissemination through a hot spread model based on users’ multidimensional attributes and evolutionary games. Liu et al. [28] propose an improved rumor propagation model, SEIR (susceptible-exposed-infected-recovered), and use a clustering algorithm to study the impact of user communities on the spread of microblog rumors. Kanavos et al. [29] propose a prediction model for the depth and width of tweet retweeting using data mining technology. Ma et al. [30] use Sina Microblog data to study microblog popularity. Tsur and Rappoport [31] propose a hybrid method based on linear regression to predict the propagation of an idea in a given time frame; combining Twitter content and topological structure features with time series can minimize the prediction error. Jenders [32] discusses important issues related to Twitter information dissemination, and analyzes the impact of tweet posts and user characteristics on the dissemination. Gao et al. [33] propose an extended enhanced Poisson process model with a time mapping process, and predict its future trends. Kupavskii et al. [34] study the number of times posts on Twitter are forwarded within a fixed time period. Bild et al. [35] study the factors affecting information dissemination on the Facebook platform. Li et al. [36] summarize information dissemination in online social networks.
In this paper, we mainly analyze the factors that affect the forwarding of microblogs, and formalize these factors to obtain the complete feature sets. These features are then filtered to obtain the optimal features. Finally, an algorithm based on the ensemble learning framework is used to predict whether a microblog is forwarded or not. Our main contributions are listed as follows:
Existing feature selection algorithms can be divided into three categories: Filter mode, Wrapper mode and Embedded mode [37].
The feature filtering model simply scores each feature dimension according to divergence or correlation, and each score represents the importance of that feature. The model then sets a threshold, on either the number of features or the score ranking, to select features. Representative methods include analysis of variance, information gain, the χ² test, and the correlation coefficient method.
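The filter idea described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the feature names, the variance threshold `min_variance`, the cut-off `top_k`, and the toy data are all hypothetical, and only two scores are shown (variance as the divergence score, and point-biserial correlation, that is, the correlation between a continuous feature and a 0/1 forwarding label).

```python
from math import sqrt
from statistics import mean, pvariance


def point_biserial(values, labels):
    """Point-biserial correlation between a continuous feature and a 0/1 label."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    n, n1, n0 = len(values), len(ones), len(zeros)
    sd = sqrt(pvariance(values))  # population standard deviation of the feature
    if sd == 0 or n1 == 0 or n0 == 0:
        return 0.0  # score is undefined here; treat the feature as uninformative
    return (mean(ones) - mean(zeros)) / sd * sqrt(n1 * n0 / (n * n))


def filter_features(columns, labels, min_variance=0.01, top_k=2):
    """Score every feature independently and keep the best ones.

    Step 1 (divergence): drop features whose variance is below min_variance.
    Step 2 (correlation): rank the rest by |point-biserial r| and keep top_k.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    kept = {name: vals for name, vals in columns.items()
            if pvariance(vals) >= min_variance}
    ranked = sorted(kept,
                    key=lambda name: abs(point_biserial(kept[name], labels)),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical toy data: 1 = the user forwarded the post, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "interaction_freq": [9, 8, 7, 2, 1, 3],   # clearly separates the two classes
    "follower_ratio": [5, 1, 4, 2, 6, 3],     # only weakly related to the label
    "constant_flag": [1, 1, 1, 1, 1, 1],      # zero variance, carries no information
}

print(filter_features(columns, labels))
```

Because each feature is scored on its own, a filter of this kind is cheap but cannot detect redundancy between correlated features, which is the gap that a wrapper stage is meant to cover.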