Social Bot Field Experiments in Computational Communication: Concept, Method, and Application

WU Ye, LI Zhanghao, MIN Yong

Chinese Journal of Journalism & Communication ›› 2024, Vol. 46 ›› Issue (9) : 135-154.

Research Articles

Abstract

With advances in artificial intelligence, social bots are increasingly used in social science research. From the perspective of computational communication, we discuss the concept, methodology, experimental design, and practical applications of social bot field experiments. Drawing on big data analytics and simulation techniques, social bot field experiments have evolved into a highly controllable research method. We highlight how this method offers new ways to observe, analyze, and understand communication phenomena in digital media environments, and how it contributes to verifying, exploring, and extending communication theories. In the empirical section, we report a preliminary investigation into the causes of filter bubbles using field experiments with social bots. The study finds that, even with reading-behavior preferences controlled for, social bot accounts may still fall into filter bubbles after random reading experiments.

Key words

social bot / field experiment / digital trace / algorithm audit / filter bubble

Cite this article

WU Ye, LI Zhanghao, MIN Yong. Social Bot Field Experiments in Computational Communication: Concept, Method, and Application[J]. Chinese Journal of Journalism & Communication, 2024, 46(9): 135-154.

Footnotes

1. This study uses an open-source model published on GitHub to analyze text similarity. The source code is available at: https://github.com/mmihaltz/word2vec-GoogleNews-vectors [retrieved May 23, 2024]

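2. The Shannon entropy of news categories is computed as $H(X) = -\sum_{i=1}^{n} P(x_i)\,\log_b P(x_i)$, where news is classified as hard news, soft news, or other; $P(x_i)$ is the probability that the random variable "news category" takes the particular value $x_i$; and $b$ is the base of the logarithm, set to $b = 2$ in this study.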
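
As an illustration of the entropy measure in footnote 2, the following is a minimal Python sketch, not the authors' code. The function name `shannon_entropy` and the two sample reading histories are hypothetical and invented for this example; only the three category labels and the base-2 logarithm come from the footnote.

```python
import math
from collections import Counter


def shannon_entropy(labels, base=2):
    """Return H(X) = -sum_i P(x_i) * log_b P(x_i) for a sequence of category labels."""
    if not labels:
        raise ValueError("labels must be non-empty")
    total = len(labels)
    # Estimate P(x_i) as the relative frequency of each category.
    # Categories that never occur are absent from the Counter,
    # so log(0) is never evaluated.
    probs = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log(p, base) for p in probs)


# Hypothetical feeds (assumed data, for illustration only).
balanced = ["hard", "soft", "other"] * 4  # evenly spread over three categories
narrow = ["soft"] * 12                    # a single category only

print(shannon_entropy(balanced))  # approximately log2(3), the maximum for three categories
print(shannon_entropy(narrow))    # zero: no diversity in the feed
```

Under this reading, a lower entropy means the recommended news is concentrated in fewer categories. With three categories and $b = 2$, the value ranges from 0 to $\log_2 3 \approx 1.585$ bits.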

Funding

This paper was supported by the Beijing Social Science Foundation (21DTR040) and the Humanities and Social Sciences Research Planning Fund of the Ministry of Education (23YJA860011).