Introduction to the Special Issue: Mining Social Media
José Carlos Cortizo, Francisco M. Carrero, and José María Gómez, Guest Editors
International Journal of Electronic Commerce,
Volume 15 Number 3, Spring 2011, pp. 5.
Social media are technological tools that allow users to share and discuss information. Most social media are Internet-based applications that manage textual information. These include blogs (Blogger, WordPress), microblogging (Twitter, Pownce), wikis (Wikipedia), forums, and social networks (Facebook, MySpace, LinkedIn). Other social media applications let users share more than text, such as photographs (Flickr, Picasa), videos (YouTube, Vimeo), livecasts (Ustream), and audio and music tracks (last.fm, ccMixter, FreeSound). More recent social media include virtual worlds (Second Life), online gaming (World of Warcraft, Warhammer Online), game sharing (Miniclip.com), and mobile social media, such as Nomad Social Networks, through which users share their current position in the real world.
Social media have shifted the way information is generated and consumed. Traditionally, information was generated by one person and “consumed” by many; now it is generated by many people and consumed by many, which changes the requirements for information access and management. In addition, social media applications handle huge numbers of users and huge volumes of data. Facebook has more than 500 million users, and an estimated 1 million blog posts are published each day. The microblogging service Twitter generates several million messages each day, and YouTube handles more than 15 billion video views. All this makes clear that social media are an excellent application field for data miners.
In a Forrester Research report, “The Future of the Social Web” [1], the development of the social Web is divided into five eras. The first era witnessed the explosion of social relationships among users. Then, in the social functionality era, some Web sites started to add social functions to help users interact with their peers. We are now in the era of social colonization, in which technologies such as Facebook Connect and Google Friend Connect have standardized social functionalities across Web sites, and the vast majority of Web sites now include several social functionalities. Soon these federated identities will allow people to enter the era of social context, with personalized and social content; the development of tools for personalizing social content will, in turn, herald the era of social e-commerce. Social media mining is a crucial aspect of social e-commerce. The five papers in this Special Issue address techniques for mining and exploiting social media data through strong theoretical grounding and empirical testing.
In the first paper, “Automatic Moderation of Online Discussion Sites,” Jean-Yves Delort, Bavani Arunasalam, and Cecile Paris propose a learning method based on partially labeled data for the automatic moderation of online forums. Online discussion sites are plagued by various types of unwanted content, such as spam and obscene or malicious content. Although prevention techniques are widely adopted, detecting inappropriate content remains largely a manual task. The authors found that their partially labeled learning method can effectively moderate inappropriate content in online discussion sites automatically. The method is also easily adaptable to other domains.
In the second paper, “Fusing Recommendations for Social Bookmarking Web Sites,” Toine Bogers and Antal van den Bosch present an empirical comparison of a number of item recommendation approaches for social bookmarking systems. These methods exploit several sources of information, such as user tags and item metadata, and are combined using well-known score aggregation methods. The paper addresses a problem of growing importance, given the renewed interest in recommender systems, which in recent years have evolved from novelties used by a few e-commerce sites into serious business tools that are changing the world of e-commerce.
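To give a concrete sense of what score aggregation involves, the sketch below implements CombSUM and CombMNZ, two classic score-fusion methods from information retrieval. It is not a description of the authors' experimental setup: the assumption that each recommender returns a dictionary of item scores, the min-max normalization step, and all function and item names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one recommender's scores to [0, 1] so runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All items tied: give every item the same full score.
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse(runs: List[Dict[str, float]], method: str = "combsum") -> List[Tuple[str, float]]:
    """Fuse several recommenders' item scores with CombSUM or CombMNZ.

    CombSUM adds an item's normalized scores across runs; CombMNZ multiplies
    that sum by the number of runs that returned the item at all.
    """
    total: Dict[str, float] = defaultdict(float)
    hits: Dict[str, int] = defaultdict(int)
    for run in runs:
        for item, score in min_max_normalize(run).items():
            total[item] += score
            hits[item] += 1
    if method == "combsum":
        fused = dict(total)
    elif method == "combmnz":
        fused = {item: total[item] * hits[item] for item in total}
    else:
        raise ValueError(f"unknown fusion method: {method}")
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical scores from a tag-based and a metadata-based recommender.
    tag_based = {"paper_a": 0.9, "paper_b": 0.4, "paper_c": 0.1}
    metadata_based = {"paper_b": 3.0, "paper_c": 2.0, "paper_d": 1.0}
    print(fuse([tag_based, metadata_based], method="combmnz"))
```

CombMNZ rewards items that several recommenders agree on, which is why an item found by both sources can overtake one that scored highly in only a single source.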
The third paper, “Expert Stock Picker: The Wisdom of (Experts in) Crowds,” by Shawndra Hill and Noah Ready-Campbell, describes a study of stock picks generated by users of a financial voting system. The authors show that the crowd's aggregated stock picks outperform the S&P 500 index, and that the picks of a group of experts, identified by their past performance, give even better results. The main contribution of the paper is a genetic algorithm approach that identifies appropriate vote weights for users based on their prior individual voting success rankings and on the number of their most recent vote contributions to the user-generated content site.
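The paper's contribution is learning user weights with a genetic algorithm; the sketch below covers only the simpler downstream step those weights would feed, namely turning weighted user votes into a crowd pick. The vote encoding (+1 for a "buy" pick, -1 for a "sell" pick), the fixed hand-supplied weights, the zero default weight for unseen users, and all names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical vote record: (user_id, ticker, +1 for "buy" / -1 for "sell").
Vote = Tuple[str, str, int]


def weighted_scores(votes: List[Vote], weights: Dict[str, float],
                    default_weight: float = 0.0) -> Dict[str, float]:
    """Sum each ticker's votes, scaling every vote by its voter's weight.

    Users missing from `weights` get `default_weight` (0.0 ignores them).
    """
    score: Dict[str, float] = defaultdict(float)
    for user, ticker, direction in votes:
        if direction not in (1, -1):
            raise ValueError(f"vote direction must be +1 or -1, got {direction}")
        score[ticker] += weights.get(user, default_weight) * direction
    return dict(score)


def top_picks(votes: List[Vote], weights: Dict[str, float], k: int = 1) -> List[str]:
    """Return the k tickers with the highest weighted score (ties by name)."""
    scores = weighted_scores(votes, weights)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ticker for ticker, _ in ranked[:k]]


if __name__ == "__main__":
    votes = [("alice", "AAPL", 1), ("bob", "AAPL", -1),
             ("carol", "XOM", 1), ("bob", "XOM", 1)]
    # Equal weights: plain crowd majority.
    print(top_picks(votes, {"alice": 1.0, "bob": 1.0, "carol": 1.0}))
    # Expert-style weights: a historically accurate user counts for more.
    print(top_picks(votes, {"alice": 0.9, "bob": 0.1, "carol": 0.5}))
```

With equal weights the crowd favors XOM, while upweighting the historically accurate user shifts the pick to AAPL; choosing those weights well is exactly the problem the genetic algorithm addresses.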
The fourth paper, “Learning to Identify Internet Sexual Predation,” by India McGhee, Jennifer Bayzick, April Kontostathis, Lynne Edwards, Alexandra McBride, and Emma Jakubowski, addresses an important problem that has received little attention from applied researchers: detecting online predators in chat communication. The authors approach it through a mix of techniques. Identifying sexual predators is difficult because it requires processing large amounts of textual content, the interactions between users, and other attributes, such as the actual context of the conversation.
In the fifth paper, “Internet Auction Fraud Detection Using Social Network Analysis and Classification Tree Approaches,” Chaochang Chiu, Yungchang Ku, Ting Lie, and Yuchi Chen present a methodology for detecting online auction fraud. In particular, they use social network analysis and data mining techniques to classify auction transactions into three types: normal, suspicious, and fraudulent. Internet fraud detection is a problem of particular interest to e-commerce. In fact, detecting and preventing fraud would remove one of the most important barriers to online shopping.
In summary, the papers in this issue represent leading research in the field of mining social media. This Special Issue also illustrates how social media information can be mined to improve e-commerce and related applications. This will become increasingly important as social e-commerce emerges, using mined social information to improve the entire e-commerce process.
Reference
1. Owyang, J.K.; Bernoff, J.; Pflaum, C.; and Bowen, E. “The Future of the Social Web.” Forrester Research, Cambridge, MA, April 27, 2009 (available at www.forrester.com/rb/Research/future_of_social_web/q/id/46970/t/2/).
José Carlos Cortizo (josecarlos.cortizo@brainsins.com) is an associate professor at Universidad Europea de Madrid and CTO of BrainSINS, a company that develops several social information management systems, such as social recommender systems and social search engines. His research has focused on social intelligent systems, including personalization, machine learning, and recommender systems in the social Web. He has co-organized several workshops related to mining social media information, such as the 1st International Workshop on Mining Social Media, the 2nd International Workshop on Search and Mining User-Generated Content (a CIKM 2010 workshop), and the 1st International Workshop on Adaptation, Personalization and Recommendation in the Social-Semantic Web, co-located with the 7th Extended Semantic Web Conference. He has served as guest editor for several international journals and as a program committee member for several international conferences and workshops.
Francisco M. Carrero (francisco.carrero@brainsins.com) is an associate professor at Universidad Europea de Madrid and CEO of BrainSINS, a company that develops several social information management systems, such as social recommender systems and social search engines. His main research interests lie in the field of information retrieval and have evolved toward user modeling and social recommender systems. He has co-organized several workshops related to mining social media information, such as the 1st International Workshop on Mining Social Media, the 2nd International Workshop on Search and Mining User-Generated Content (a CIKM 2010 workshop), and the 1st International Workshop on Adaptation, Personalization and Recommendation in the Social-Semantic Web, co-located with the 7th Extended Semantic Web Conference. He has served as guest editor for several international journals and as a program committee member for several international conferences and workshops.
José María Gómez (jgomez@optenet.com) has been a professor at Complutense University of Madrid and Universidad Europea de Madrid for more than 10 years, and has served as head of the Department of Computer Science Engineering. He is also research director at Optenet, a company that develops information filters and security systems. His main research interests are natural language processing and machine learning, with applications in news and biomedicine, and adversarial information retrieval, with applications in spam and pornographic content detection. He has been a program committee member for CEAS 2007, the Spam Symposium 2007, and several other venues, such as JASIST and ECIR. He has also reviewed R&D projects for the European Commission.