Evaluation of the communication effect of synthetic speech news: The EEG evidence of the effect of speech speed

YU Guoming, WANG Wenxuan, FENG Fei, XIU Lichao

Chinese Journal of Journalism & Communication ›› 2021, Vol. 43 ›› Issue (2) : 6-26.

Communication Research


Abstract

This study used EEG to examine how listeners of different genders perceive, and respond neurally to, synthetic speech news played at two speech rates (1.0x and 1.5x). Female listeners liked the synthetic speech news more than male listeners did. At 1.5x speed, male listeners trusted the news more than female listeners did; at 1.0x speed, male listeners trusted it less than female listeners did. The EEG results showed that females had higher power spectral density (PSD) than males while listening to synthetic speech. In addition, listeners showed greater cognitive load and concentration at 1.5x speed than at 1.0x speed. These findings suggest that likability is related to the gender of the audience rather than to speech rate, whereas credibility is related to both speech rate and audience gender. Synthetic speech triggered greater brain activity in females, and faster speech attracted more attention from the audience.

Key words

Audience gender / liking / credibility / LC4MP / EEG

Cite this article

YU Guoming, WANG Wenxuan, FENG Fei, et al. Evaluation of the communication effect of synthetic speech news: The EEG evidence of the effect of speech speed[J]. Chinese Journal of Journalism & Communication, 2021, 43(2): 6-26.

References

[1]
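McLuhan E. & Zingrone F. (1995/2000). Essential McLuhan (He Daokuan, Trans.). Nanjing: Nanjing University Press.
[2]
McQuail D. (1997/2006). Audience Analysis (Liu Yannan, Trans.). Beijing: China Renmin University Press.
[3]
Li Weijie (2017). Exploration and innovation of promotional language on television shopping channels. Master's thesis, Jiangxi Normal University, Nanchang.
[4]
McLuhan M. (1964/2000). Understanding Media: The Extensions of Man (1st ed.) (He Daokuan, Trans.). Beijing: The Commercial Press.
[5]
Meng Wei (2006). Sound Communication: Radio Auditory Texts in the Era of Multimedia Communication. Beijing: Communication University of China Press.
[6]
Ni Aizhen (2017). The context of emergence and research paths of the auditory culture turn. Literature and Art Criticism, (6), 73-83.
[7]
Peng Danling (2004). General Psychology. Beijing: Beijing Normal University Press.
[8]
People's Daily Online Public Opinion Data Center and Sogou Zhiyin jointly release the "Intelligent Voice Big Data Analysis Report" (2018). Retrieved from http://yuqing.people.com.cn/n1/2018/0202/c209043-29802833.html.
[9]
Xie Likui & Zhou Zhenling (2002). A brief analysis of speech rate in radio news broadcasting. Xinwen Qianshao, (2), 23-24.
[10]
Yu Guoming, Qian Feifan, Chen Yao, Xiu Lichao & Yang Ya (2019). The mechanism behind "post-truth": The communication effect of emotional text, a study based on the EEG paradigm. Journal of Xi'an Jiaotong University (Social Sciences), 39(4), 73-78.
[11]
Yu Guoming, Wang Wenxuan & Feng Fei (2019). An insight paradigm for "voice" as the mainstream medium of future communication: The case of users' perceived effects of voice news and their measurement. Social Science Front, (7), 136-145.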
[12]
Ahrens M.M., Hasan B.A.S., Giordano B.L. & Belin P. (2014). Gender differences in the temporal voice areas. Frontiers in Neuroscience, 8, 228.
[13]
Angelakis E., Lubar J.F. & Stathopoulou S. (2004). Electroencephalographic peak alpha frequency correlates of cognitive traits. Neuroscience Letters, 371(1), 60-63.
EEG peak alpha frequency (PAF) has been shown to differentiate groups of adults with higher memory performance from those of lower performance, groups of children with advanced reading ability from matched controls, and to predict state-dependent working memory. The present study attempted to explore PAF as a predicting variable for verbal and attentional cognitive trait abilities in young adults. Nineteen undergraduate students had their EEG recorded during initial rest, reading, and post-reading rest, and at a different day were evaluated on reading, vocabulary, and attentional performance. Results showed significant correlations of reading vocabulary and response control with PAF during reading and post-reading recordings, but not during initial rest. PAF may reflect some general cognitive ability that is not necessarily memory or reading, possibly response control or the ability to acquire vocabulary. It is suggested that cognitive ability traits may reflect the ability to induce cognitive states.
[14]
Angelidis A., van der Does W., Schakel L. & Putman P. (2016). Frontal EEG theta/beta ratio as an electrophysiological marker for attentional control and its test-retest reliability. Biological Psychology, 121(Pt A), 49-51.
[15]
Antonenko P., Paas F., Grabner R. & van Gog T. (2010). Using electroencephalography to measure cognitive load. Educational Psychology Review, 22(4), 425-438.
[16]
Shestyuk A.Y., Kasinathan K., Karapoondinott V., Knight R.T. & Gurumoorthy R. (2019). Individual EEG measures of attention, memory, and motivation predict population level TV viewership and Twitter engagement. PLoS ONE, 14(3), 1-27.
[17]
Bartosiak M. & Piccoli G. (2016, November). Presentation Format and Online Reviews Persuasiveness: The Effect of Computer-Synthesized Speech. Paper presented at the Thirty Seventh International Conference on Information Systems, Dublin.
[18]
Bolls P., Muehling D. D. & Yoon K. (2003). The effects of television commercial pacing on viewers’ attention and memory. Journal of Marketing Communications, 9(1), 17-28.
[19]
Borghini G., Astolfi L., Vecchiato G., Mattia D. & Babiloni F. (2014). Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neuroscience and Biobehavioral Reviews, 44, 58-75.
This paper reviews published papers related to neurophysiological measurements (electroencephalography: EEG, electrooculography EOG; heart rate: HR) in pilots/drivers during their driving tasks. The aim is to summarise the main neurophysiological findings related to the measurements of pilot/driver's brain activity during drive performance and how particular aspects of this brain activity could be connected with the important concepts of "mental workload", "mental fatigue" or "situational awareness". Review of the literature suggests that exists a coherent sequence of changes for EEG, EOG and HR variables during the transition from normal drive, high mental workload and eventually mental fatigue and drowsiness. In particular, increased EEG power in theta band and a decrease in alpha band occurred in high mental workload. Successively, increased EEG power in theta as well as delta and alpha bands characterise the transition between mental workload and mental fatigue. Drowsiness is also characterised by increased blink rate and decreased HR values. The detection of such mental states is actually performed "offline" with accuracy around 90% but not online. A discussion on the possible future applications of findings provided by these neurophysiological measurements in order to improve the safety of the vehicles will be also presented. Copyright © 2012 Elsevier Ltd. All rights reserved.
[20]
Barry R.J., Clarke A.R., Johnstone S.J., Magee C. A. & Rushby J. A. (2008). EEG differences between eyes-closed and eyes-open resting conditions. Clinical Neurophysiology, 118(12), 2765-2773.
[21]
Clerwall C. (2014). Enter the Robot Journalist. Journalism Practice, 8(5), 519-531.
[22]
Feldstein S., Dohm F. A. & Crown C. L. (2001). Gender and Speech Rate in the Perception of Competence and Social Attractiveness. The Journal of Social Psychology, 141(6), 785-806.
[23]
Frey J.N., Ruhnau P. & Weisz N. (2015). Not so different after all: The same oscillatory processes support different types of attention. Brain Research, 1626, 183-197.
Scientific research from the last two decades has provided a vast amount of evidence that brain oscillations reflect physiological activity enabling diverse cognitive processes. The goal of this review is to give a broad empirical and conceptual overview of how ongoing oscillatory activity may support attention processes. Keeping in mind that definitions of cognitive constructs like attention are prone to being blurry and ambiguous, the present review focuses mainly on the neural correlates of 'top-down' attention deployment. In particular, we will discuss modulations of (ongoing) oscillatory activity during spatial, temporal, selective, and internal attention. Across these seemingly distinct attentional domains, we will summarize studies showing the involvement of two oscillatory processes observed during attention deployment: power modulations mainly in the alpha band, and phase modulations in lower frequency bands. This article is part of a Special Issue entitled SI: Prediction and Attention. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
[24]
Geske J. & Bellur S. (2008). Differences in brain information processing between print and computer screens. International Journal of Advertising, 27(3), 399-423.
[25]
Gevins A., Smith M.E., McEvoy L. & Yu D. (1997). High-resolution EEG mapping of cortical activation related to working memory: Effects of task difficulty, type of processing, and practice. Cerebral Cortex, 7(4), 374-385.
Changes in cortical activity during working memory tasks were examined with electroencephalograms (EEGs) sampled from 115 channels and spatially sharpened with magnetic resonance imaging (MRI)-based finite element deblurring. Eight subjects performed tasks requiring comparison of each stimulus to a preceding one on verbal or spatial attributes. A frontal midline theta rhythm increased in magnitude with increased memory load. Dipole models localized this signal to the region of the anterior cingulate cortex. A slow (low-frequency), parietocentral, alpha signal decreased with increased working memory load. These signals were insensitive to the type of stimulus attribute being processed. A faster (higher-frequency), occipitoparietal, alpha signal was relatively attenuated in the spatial version of the task, especially over the posterior right hemisphere. Theta and alpha signals increased, and overt performance improved, after practice on the tasks. Increases in theta with both increased task difficulty and with practice suggests that focusing attention required more effort after an extended test session. Decreased alpha in the difficult tasks indicates that this signal is inversely related to the amount of cortical resources allocated to task performance. Practice-related increases in alpha suggest that fewer cortical resources are required after skill development. These results serve: (i) to dissociate the effects of task difficulty and practice; (ii) to differentiate the involvement of posterior cortex in spatial versus verbal tasks; (iii) to localize frontal midline theta to the anteromedial cortex; and (iv) to demonstrate the feasibility of using anatomical MRIs to remove the blurring effect of the skull and scalp from the ongoing EEG. The results are discussed with respect to those obtained in a prior study of transient evoked potentials during working memory.
[26]
Graefe A., Haim M., Haarmann B. & Brosius H.B. (2018). Readers’ perception of computer-generated news: Credibility, expertise, and readability. Journalism, 19(5), 595-610.
We conducted an online experiment to study people’s perception of automated computer-written news. Using a 2 × 2 × 2 design, we varied the article topic (sports, finance; within-subjects) and both the articles’ actual and declared source (human-written, computer-written; between-subjects). Nine hundred eighty-six subjects rated two articles on credibility, readability, and journalistic expertise. Varying the declared source had small but consistent effects: subjects rated articles declared as human written always more favorably, regardless of the actual source. Varying the actual source had larger effects: subjects rated computer-written articles as more credible and higher in journalistic expertise but less readable. Across topics, subjects’ perceptions did not differ. The results provide conservative estimates for the favorability of computer-written news, which will further increase over time and endorse prior calls for establishing ethics of computer-written news.
[27]
Jones, C., Berry, L. & Stevens, C. (2007). Synthesized speech intelligibility and persuasion: Speech rate and non-native listeners. Computer Speech & Language, 21(4), 641-651.
[28]
Kenta O. & Naomi O. (2018). Verbal disaster warnings and perceived intelligibility, reliability, and urgency: The effects of voice gender, fundamental frequency, and speaking rate. Acoustical Science and Technology, 39(2), 56-65.
[29]
Kallinen K. & Ravaja N. (2005). Effects of the rate of computer-mediated speech on emotion-related subjective and physiological responses. Behaviour & Information Technology, 24(5), 365-373.
[30]
Khanna P. & Carmena J. M. (2015, April). Changes in reaching reaction times due to volitional modulation of beta oscillations. Paper presented at the 7th International IEEE/EMBS Conference on Neural Engineering (NER), IEEE, France.
[31]
Lang A., Geiger S., Strickwerda M. & Sumner J. (1993). The effects of related and unrelated cuts on viewers’ memory for television: A limited capacity theory of television viewing. Communication Research, 20(1), 4-29.
This study tested the differential effects of two different types of cuts (related and unrelated) on attention, capacity, and audio and visual memory for the information contained in television messages. Related cuts were related by either visual or audio information. Unrelated cuts occurred between two completely unrelated scenes. Unrelated cuts were always associated with a change in content. Related scenes were never associated with a change in content. Results showed that both related and unrelated cuts resulted in cardiac orienting responses. Reaction times were slower immediately following unrelated cuts than when following related cuts, indicating that processing unrelated cuts required more capacity than processing related cuts. Memory was better for information presented after related cuts than it was for information presented after unrelated cuts. This effect was greater for visual memory than for audio memory. These results add to the growing body of knowledge on how people process television.
[32]
Lang A., Dhillon K. & Dong Q. (1995). The effects of emotional arousal and valence on television viewers’ cognitive capacity and memory. Journal of Broadcasting & Electronic Media, 39(3), 313-327.
[33]
Lang A., Newhagen J. & Reeves B. (1996). Negative video as structure: Emotion, attention, capacity, and memory. Journal of Broadcasting & Electronic Media, 40(4), 460-477.
[34]
Lang A., Bolls P., Potter R. F. & Kawahara K. (1999). The effects of production pacing and arousing content on the information processing of television messages. Journal of Broadcasting & Electronic Media, 43(4), 451-475.
[35]
Lang A., Zhou S., Schwartz N., Bolls P.D. & Potter R. F. (2000). The effects of edits on arousal, attention, and memory for television messages: When an edit is an edit can an edit be too much? Journal of Broadcasting & Electronic Media, 44(1), 94-109.
[36]
Lang A., Chung Y., Lee S., & Schwartz N. (2002). Processing anti-drug public service announcements: Production pacing, arousing content, and adolescence. Psychophysiology, 39, S50-S50.
[37]
Lang A. (2006). Using the limited capacity model of motivated mediated message processing to design effective cancer communication messages. Journal of Communication, 56(1), 57-80.
[38]
Lang A., Schwartz N., Lee S. & Angelini J. (2007). Processing radio PSAs: production pacing, arousing content, and age. Journal of Health Communication, 12(6), 581-599.
This experiment uses the limited capacity model of mediated message processing (LC3MP) to investigate the effects of production pacing and arousing content in radio public service announcements (PSAs) on the emotional and cognitive responses of college-age and tween (9-12-year-olds) participants. The LC3MP predicts that both arousing content and production pacing should increase emotional arousal, physiological arousal, cognitive effort, and encoding up to the point of cognitive overload after which cognitive effort and encoding should decrease. Results showed that, as expected, arousing content did increase emotional arousal and cognitive effort for both tweens and college students, though the effect was larger for college students. For production pacing, however, the results were less clear cut. First, it was found that for radio PSAs pacing increased arousal for calm messages only. Further, the effects of production pacing on cognitive effort were larger for tweens and were experienced primarily during the first 25 seconds of the message, while college students were less affected by production pacing, and those effects appeared in the last 25 seconds of the messages. Finally, none of the messages in this experiment resulted in cognitive overload - thus both production pacing and arousing content increased memory for both groups of participants.
[39]
Lang A. (2017). Limited Capacity Model of Motivated Mediated Message Processing (LC4MP). In P. Rössler, C. A. Hoffner & L. van Zoonen (Eds.), The International Encyclopedia of Media Effects (pp. 1-9). John Wiley & Sons, Inc.
[40]
Megehee C. M., Dobie K. & Grant J. (2003). Time versus pause manipulation in communications directed to the young adult population: Does it matter? Journal of Advertising Research, 43(03), 281-292.
[41]
Mastropieri M. A., Leinart A. & Scruggs T. E. (1999). Strategies to increase reading fluency. Intervention in School and Clinic, 34(5), 278-283.
Problems in reading fluency have long been considered to be among the most common characteristics of students with mild disabilities and other special needs. In this article, we review research on reading fluency and provide recommendations for practice. Interventions that have received research attention include repeated reading, peer-mediated instruction, computer-guided practice, previewing, and combined approaches.
[42]
Murphey C., Dobie K. & Grant J. (2003). Time versus pause manipulation in communications directed to the young adult population: Does it matter? Journal of Advertising Research, 43(03), 281-292.
[43]
Meyer P. (1988). Defining and Measuring Credibility of Newspapers: Developing an Index. Journalism & Mass Communication Quarterly, 65(3), 567-574.
[44]
Nass C., Steuer J. & Tauber E. R. (1994, April). Computers are social actors. Paper presented at the Conference on Human Factors in Computing Systems, CHI. Boston, Massachusetts, USA.
[45]
Nass C., Moon Y., Fogg B.J., Reeves B. & Dryer D.C. (1995). Can computer personalities be human personalities? International Journal of Human-Computer Studies, 43(2), 223-239.
[46]
Nass C., Foehr U., Brave S. & Somoza M. (2001, January). The effects of emotion of voice in synthesized and recorded speech. In Proceedings of the AAAI Symposium: Emotional and Intelligent II: The Tangled Knot of Social Cognition (pp. 4-5). North Falmouth, MA.
[47]
Nass C., Robles E., Heenan C., Bienstock H. & Treinen M. (2003). Speech-based disclosure systems: effects of modality, gender of prompt, and gender of user. International Journal of Speech Technology, 6(2), 113-121.
[48]
Newhagen J. & Nass C. (1989). Differential criteria for evaluating credibility of newspapers and tv news. Journalism & Mass Communication Quarterly, 66(2), 277-284.
[49]
Potter R. F., Bolls P., Lang A., Zhou S., Schwartz N., Borse J., et al. (1997, August). What is it? Orienting to structural features of radio messages. Paper presented to the Theory and Methodology Division of the Association for Education in Journalism and Mass Communication(pp.1-30), Chicago, IL.
[50]
Potter R. F. & Choi J. (2006). The effects of auditory structural complexity on attitudes, attention, arousal, and memory. Media Psychology, 8(4), 395-419.
[51]
Rodero E. (2011). Intonation and emotion: Influence of pitch levels and contour type on creating emotions. Journal of Voice, 25(1), e25-e34.
[52]
Rodero E.(2015). Influence of Speech Rate and Information Density on Recognition: The Moderate Dynamic Mechanism. Media Psychology, 19(2), 224-242.
[53]
Simonds B.K., Meyer K.R., Quinlan M.M. & Hunt S.K. (2006). Effects of Instructor Speech Rate on Student Affective Learning, Recall, and Perceptions of Nonverbal Immediacy, Credibility, and Clarity. Communication Research Reports, 23(3), 187-197.
[54]
Skuk V.G. & Schweinberger S.R. (2013). Gender differences in familiar voice identification. Hearing Research, 296, 131-140.
We investigated gender differences in the identification of personally familiar voices in a gender-balanced sample of 40 listeners. From various types of utterances, listeners had to identify by name 20 speakers (10 female) among a set of 70 possible classmates who were all 12th grade pupils from the same local secondary school. Mean identification rates were 67% from sentences, and around 35% for an isolated /Hello/ or a VCV syllable. Even from non-verbal harrumphs, speakers were identified with an accuracy of 18%, i.e. highly above chance levels. Substantial individual differences were observed between listeners. Importantly, superior overall performance of female listeners was qualified by an interaction between voice gender and listener gender. Male listeners exhibited an own-gender bias (i.e. better identification for male than female voices), whereas female listeners identified voices of both genders at similar levels. Individual own-gender identification biases were correlated with differences in reported contact to a speaker's voice and voice distinctiveness. Overall, the present study establishes a number of factors that account for substantial individual differences in personal voice identification.Copyright © 2012 Elsevier B.V. All rights reserved.
[55]
Sundar S.S. (1999). Exploring Receivers’ Criteria for Perception of Print and Online News. Journalism & Mass Communication Quarterly, 76(2), 373-386.
[56]
Thorson E. & Lang A. (1992). The effects of television videographics and lecture familiarity on adult cardiac orienting responses and memory. Communication Research, 19(3), 346-369.
This study outlines a psychophysiological model of the role of orienting responses (ORs) in learning from televised lectures. ORs are involuntary responses to environmental stimuli that are novel or that signal the occurrence of something meaningful in the environment. In the present study, ORs were indexed with phasic decelerative heart-rate patterns. The experiment demonstrates that insertion of videographics in talking-head lectures produces ORs in television viewers. It also demonstrates that if lectures contain familiar and therefore easier material for viewers to remember, the ORs enhance learning, but if the lectures contain unfamiliar and therefore more difficult material to remember, the ORs interfere with learning. These results extend the idea that attention to television exhibits limited attentional capacity and suggests that there is a trade-off between people's ability to attend to structural and informational aspects of the television stimulus.
[57]
Watt J. H., & Krull R. (1974). An information theory measure for television programming. Communication Research, 1(1), 44-68.
A content-free measure of television program form is developed. This measure is created from a rigorous general theory construction viewpoint. The basic terms of the theoretical measure are created from iconic aspects of programming. These terms are mapped into variables by considering their human information-processing implications. The definitions of the variables are in information theory entropy terms. The variables are used to score 168 television programs, and the results used to factor analyze the variables, creating two independent dimensions called Dynamics and Unfamiliarity. As a validation, the viewing patterns of 149 adolescents on these two dimensions are compared to two other measures of programming content. The information theory dimensions of form are found to detect nonrandom viewing patterns as well or better than either of the other measures of content.
[58]
Wartella E., & Ettema J. S. (1974). A cognitive developmental study of children’s attention to television commercials. Communication Research, 1(1), 69-88.
From a cognitive developmental theoretical foundation, an experiment was designed to study the role of stimulus complexity in children's attention to TV commercials. One hundred twenty nursery, kindergarten, and second-grade subjects' attention to twelve commercials was observed. The commercials were controlled for content and manipulated for visual and auditory complexity with an information theory measure of entropy level. We predicted and found that the largest difference in attention to high- versus low-complexity commercials is for nursery schoolers, although in one block of commercials the difference in attention to high- and low-complexity commercials is not statistically significant across the three age groups. The nursery schoolers also are less stable in their attention than are the older children. Further analyses examine the attention profiles of the three age groups and the role of the visual and auditory channels in drawing attention.
[59]
Watt J. H. J. & Welch A. J. (1983). Effects of static and dynamic complexity on children’s attention and recall of televised instruction. In J. Bryant & D. R. Anderson (Eds.), Children’s understanding of television (pp. 69-102). New York: Academic Press.
[60]
Wildgruber D., Pihan H., Ackermann H., Erb M. & Grodd W. (2002). Dynamic brain activation during processing of emotional intonation: Influence of acoustic parameters, emotional valence and sex. NeuroImage, 15(4), 856-869.
Appreciation of the emotional tone of verbal utterances represents an important aspect of social life. It is still unsettled, however, which brain areas mediate processing of intonational information and whether the presumed right-sided superiority depends upon acoustic properties of the speech signal. Functional magnetic resonance imaging was used to disentangle brain activation associated with (i) extraction of specific acoustic cues and (ii) detection of specific emotional states. Stimulus material comprised pairs of emotionally intonated utterances, exclusively differing either in pitch range or in the length of stressed vowels. Hemodynamic responses showed a dynamic pattern of cerebral activation including sequenced bilateral responses of various cortical and subcortical structures. Activation associated with discrimination of emotional expressiveness predominantly emerged within the right inferior parietal lobule, within the bilateral mesiofrontal cortex and--with an asymmetry toward the right hemisphere--at the level of bilateral dorsolateral frontal cortex. Lateralization did not depend upon acoustic structure or emotional valence of stimuli. These findings might prove helpful in reconciling the controversial previous clinical and experimental data.(C)2002 Elsevier Science (USA).

Footnotes

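1. Skin conductance is regarded as an indicator of autonomic nervous system activation.

2. The speech rate of English.

3. The speech rate of English.

4. The speech rate used in English-language teaching.

5. The rate here refers to the speech rate of Spanish.

6. Pacing is defined as the number of occurrences of a specific, known structural feature that can elicit an orienting response and focus the audience's attention (Lang, Bolls, Potter & Kawahara, 1999).

7. Anhui iFLYTEK Co., Ltd., founded in 1999, is a well-known publicly listed intelligent speech and artificial intelligence company in the Asia-Pacific region. It has long conducted research on core technologies including speech and language, natural language understanding, machine learning and reasoning, and autonomous learning, and has maintained an internationally leading technical level.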
