Prediction and Explanation: Towards a Communication Study of Causal Representation

LI Xuelian, LIU Dehuan

Chinese Journal of Journalism & Communication, 2023, Vol. 45, Issue 7: 157-176.

Research Article


Abstract

Empirical communication research has focused on media exposure and its effects and has rarely attempted to predict information diffusion or individual behavior. Modeling that follows deductive logic often confuses explanatory modeling with predictive modeling, which undermines the accuracy, validity, and reliability of causal inference. With the development of computational social science, researchers have begun to attend to the epistemological distinction between explanation and prediction. We argue that clearly distinguishing and effectively integrating the two will help communication studies achieve better causal identification and scientific prediction. This paper first reviews the research designs of empirical communication studies in China against the background of computational social science, clarifying how the two approaches are currently used in the discipline and what problems arise. Drawing on Pearl's causal ladder and the particular features of communication topics, we organize empirical research based on online data into four levels of goals and tasks: (1) correlational analysis; (2) intervention studies; (3) explanatory studies; and (4) counterfactual causal reasoning. We then outline the analytical obstacles researchers face in causal explanation and prediction with online data and offer specific methodological suggestions for integrating explanatory and predictive models in causal modeling. Cyclical validation across these levels of analysis offers a better way to understand the regularities of communication and human behavior.

Keywords

causal inference / prediction / explanation / big data / machine learning

Cite this article

LI Xuelian, LIU Dehuan. Prediction and Explanation: Towards a Communication Study of Causal Representation[J]. Chinese Journal of Journalism & Communication. 2023, 45(7): 157-176

References

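[1]
Maigret É. (2003/2009). A History of Communication Theory: A Sociological Perspective [传播理论史——一种社会学的视角] (Liu F., Trans.). Beijing: Communication University of China Press.
[2]
Chen Y., Wu X., Hu A., He G., & Ju G. (2020). Social prediction: A new research paradigm based on machine learning. Sociological Studies [社会学研究], (3), 94-117+244.
[3]
DataFunTalk (2021). Causal inference and experimental design at Kuaishou [快手因果推断与实验设计]. Retrieved from https://mp.weixin.qq.com/s/svVl1eiVUH6rOYG3p2YiGg
[4]
Liao S., Huang W., Yi H., & Shen Q. (2015). Fragmented media use: Rethinking the concept and measurement of media use. Journalism Research [新闻大学], (6), 61-73.
[5]
Liu D., & Li X. (2013). Risks and existing problems of big data. Advertising Panorama (Theory Edition) [广告大观(理论版)], (3), 67-73.
[6]
Liu D., & Li X. (2015). Children of July and August: Primary school entry age limits and adolescents' educational attainment and development. Sociological Studies [社会学研究], (6), 169-192+245.
[7]
Liu H. (2014). The prehistory of communication research in China. Journalism & Communication [新闻与传播研究], (1), 21-36+126.
[8]
Luo J., et al. (2021). A methodology for integrating big data and structured data: The case of research on Chinese personal network circles. Sociological Studies [社会学研究], (2), 69-91+227.
[9]
Luo J., Liu J., Yang K., & Fu X. (2018). On big data research guided by sociological theory: A three-way dialogue among big data, theory, and predictive models. Sociological Studies [社会学研究], (5), 117-138+244-245.
[10]
Merton R. K. (1942/2003). The Sociology of Science [科学社会学] (Lu X., & Lin J., Trans.). Beijing: The Commercial Press.
[11]
Elster J. (2007/2019). Explaining Social Behavior: A Mechanism Perspective on the Social Sciences [解释社会行为:社会科学的机制视角] (Liu J., He S., Xiong C., et al., Trans.). Chongqing: Chongqing University Press.
[12]
Wang G., & Wu X. (2021). From statistical inference to causal inference: The endogeneity problem in quantitative communication research. Journalism & Communication [新闻与传播研究], (4), 19-37+126.
[13]
Yang G. (2017). In-depth studies of the Chinese internet. Journalism & Communication Review [新闻与传播评论], (1), 22-42.
[14]
Zhu J., Peng T., Liang H., Wang C., Qin J., & Chen H. (2014). Applications of computational social science in journalism and communication research. E-Science Technology & Application [科研信息化技术与应用], (2), 3-13.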
[15]
Anderson A., Maystre L., Anderson I., Mehrotra R., & Lalmas M. (2020, April). Algorithmic effects on the diversity of consumption on Spotify. Paper presented at Proceedings of The Web Conference 2020. Taipei.
[16]
Angrist J. D., & Pischke J. S. (2010). The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. Journal of Economic Perspectives, 24(2), 3-30.
[17]
Baden C., Pipal C., Schoonvelde M., & van der Velden M. A. G. (2021). Three gaps in computational text analysis methods for social sciences: A research agenda. Communication Methods and Measures, 16(1), 1-18.
[18]
Bail C. A. (2016). Emotional feedback and the viral spread of social media messages about autism spectrum disorders. American Journal of Public Health, 106(7), 1173-1180.
[19]
Bail C. A., Brown T. W., & Wimmer A. (2019). Prestige, proximity, and prejudice: How Google search terms diffuse across the world. American Journal of Sociology, 124(5), 1496-1548.
[20]
Bail C. A., et al. (2018). Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences, 115(37), 9216-9221.
[21]
Bakshy E., Messing S., & Adamic L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130-1132.
[22]
Bond R. M., et al. (2012). A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415), 295-298.
[23]
Boyd D., & Crawford K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15, 662-679.
[24]
Breiman L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199-231.
[25]
Centola D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194-1197.
[26]
Chaffee S. (1977). Mass media effects: New research perspectives. In Lerner, D., & Nelson, L. (Eds.). Communication Research: A Half-Century Appraisal (pp. 210-241). Honolulu, HI: University of Hawaii Press.
[27]
Coe K., Kenski K., & Rains S. A. (2014). Online and uncivil? Patterns and determinants of incivility in newspaper website comments. Journal of Communication, 64(4), 658-679.
[28]
De Vreese C. H., Boukes M., Schuck A., Vliegenthart R., Bos L., & Lelkes Y. (2017). Linking survey and media content data: Opportunities, considerations, and pitfalls. Communication Methods and Measures, 11(4), 221-244.
[29]
De Vreese C., Azrout R., & Moeller J. (2016). Cross road elections: Change in EU performance evaluations during the European Parliament elections 2014. Politics and Governance, 4(1), 69-82.
[30]
Denny M. J., & Spirling A. (2018). Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it. Political Analysis, 26(2), 168-189.
[31]
Enni S. A., & Herrie M. B. (2021). Turning biases into hypotheses through method: A logic of scientific discovery for machine learning. Big Data & Society, 8(1).
[32]
Fosch-Villaronga E., Poulsen A., Søraa R. A., & Custers B. (2021). Gendering algorithms in social media. ACM SIGKDD Explorations Newsletter, 23(1), 24-31.
[33]
Friedman M. (1953). Essays in Positive Economics. Chicago, IL: University of Chicago Press.
[34]
Golder S. A., & Macy M. W. (2011). Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science, 333(6051), 1878-1881.
[35]
Grinberg N., Joseph K., Friedland L., Swire-Thompson B., & Lazer D. (2019). Fake news on Twitter during the 2016 US presidential election. Science, 363(6425), 374-378.
[36]
Guo L., & McCombs M. (2011, May). Network agenda setting: A third level of media effects. Paper presented at Annual Conference of the International Communication Association. Boston.
[37]
Hilbert M., et al. (2019). Computational communication science: A methodological catalyzer for a maturing discipline. International Journal of Communication, 13, 3912-3934.
[38]
Hofman J. M., Sharma A., & Watts D. J. (2017). Prediction and explanation in social systems. Science, 355(6324), 486-488.
[39]
Hofman J. M., et al. (2021). Integrating explanation and prediction in computational social science. Nature, 595(7866), 181-188.
[40]
Johnson I., McMahon C., Schöning J., & Hecht B. (2017, May). The effect of population and “structural” biases on social media-based algorithms: A case study in geolocation inference across the urban-rural spectrum. Paper presented at Proceedings of the 2017 CHI conference on Human Factors in Computing Systems. New York.
[41]
Kim H. S. (2015). Attracting views and going viral: How message features and news-sharing channels affect health news diffusion. Journal of Communication, 65(3), 512-534.
[42]
Kramer A. D., Guillory J. E., & Hancock J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788-8790.
[43]
Lazer D. M., et al. (2020). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060-1062.
[44]
Lazer D., et al. (2009). Computational social science. Science, 323(5915), 721-723.
[45]
LeCun Y., Bengio Y., & Hinton G. (2015). Deep learning. Nature, 521(7553), 436-444.
[46]
Maier D., et al. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2-3), 93-118.
[47]
Margolin D. B. (2019). Computational contributions: A symbiotic approach to integrating big, observational data studies into the communication field. Communication Methods and Measures, 13(4), 229-247.
[48]
Nelson L. K. (2020). Computational grounded theory: A methodological framework. Sociological Methods & Research, 49(1), 3-42.
[49]
Nelson L. K. (2021). Cycles of conflict, a century of continuity: The impact of persistent place-based political logics on social movement strategy. American Journal of Sociology, 127(1), 1-59.
[50]
Park P. S., Blumenstock J. E., & Macy M. W. (2018). The strength of long-range ties in population-scale social networks. Science, 362(6421), 1410-1413.
[51]
Pearl J., & Mackenzie D. (2018). The Book of Why: The New Science of Cause and Effect. New York, NY: Basic Books.
[52]
Reeves B., et al. (2021). Screenomics: A framework to capture and analyze personal life experiences and the ways that technology shapes them. Human-Computer Interaction, 36(2), 150-201.
[53]
Salganik M. J. (2019). Bit by Bit: Social Research in the Digital Age. Princeton, NJ: Princeton University Press.
[54]
Schuck A. R., Vliegenthart R., & De Vreese C. H. (2016). Matching theory and data: Why combining media content with survey data matters. British Journal of Political Science, 46(1), 205-213.
[55]
Shmueli G. (2010). To explain or to predict? Statistical Science, 25(3), 289-310.
[56]
Song H., & Cho J. (2021). Assessing (in)accuracy and biases in self-reported measures of exposure to disagreement: Evidence from linkage analysis using digital trace data. Communication Methods and Measures, 15(3), 190-210.
[57]
Song H., Eberl J. M., & Eisele O. (2020). Less fragmented than we thought? Toward clarification of a subdisciplinary linkage in communication science, 2010-2019. Journal of Communication, 70(3), 310-334.
[58]
Stier S., Breuer J., Siegers P., & Thorson K. (2020). Integrating survey data and digital trace data: Key issues in developing an emerging field. Social Science Computer Review, 38(5), 503-516.
[59]
Theocharis Y., & Jungherr A. (2021). Computational social science and the study of political communication. Political Communication, 38(1-2), 1-22.
[60]
Van Atteveldt W., & Peng T. Q. (2018). When communication meets computation: Opportunities, challenges, and pitfalls in computational communication science. Communication Methods and Measures, 12(2-3), 81-92.
[61]
Wagner C., Strohmaier M., Olteanu A., Kıcıman E., Contractor N., & Eliassi-Rad T. (2021). Measuring algorithmically infused societies. Nature, 595(7866), 197-204.
[62]
Yarkoni T., & Westfall J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100-1122.

Notes

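1. This paper conducted a content analysis, by method and by mode of data acquisition, of the empirical research papers published in the four leading Chinese communication journals, Journalism & Communication (新闻与传播研究), Chinese Journal of Journalism & Communication (国际新闻界), Modern Communication (现代传播), and Journalism Research (新闻大学), during 2019-2021, a period of rapid development in computational social science. Preliminary results show that of 1,936 papers, 778 were empirical: 364 used qualitative methods and 414 used quantitative methods (including 82 mixed-methods papers). Even in these three years of growing computational work in social research, data collection still relied mainly on questionnaire surveys and in-depth interviews. Of these papers, 305 were descriptive studies, 43 used experiments for intervention research, 39 explained communication mechanisms, and 27 attempted prediction; these four counts sum to 414, the number of quantitative papers. Research based on big data resources concentrated mainly on data description, making it possible to describe and present communication structures that had previously been invisible. Because this paper focuses on an analytical framework for causal reasoning, the specific procedures for literature selection, coding, and analysis will be presented in a separate dedicated study.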

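2. Owing to space limits, the literature reviewed here concentrates on the methods themselves, on studies that explain and predict human communication behavior, and on the analysis of selected cases. Computational communication papers on specific topics are not comprehensively surveyed, and neither qualitative research that uses computational analysis nor laboratory-based research such as network neuroscience is specifically covered.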

Funding

Major Project of the National Social Science Fund of China, "Research on Building an All-Media Communication System" (20ZDA057)
