Evaluation of the communication effect of synthetic speech news: The EEG evidence of the effect of speech speed
YU Guoming, WANG Wenxuan, FENG Fei, XIU Lichao
Chinese Journal of Journalism & Communication, 2021, Vol. 43, Issue 2: 6-26.
This study used electroencephalography (EEG) to examine how listeners of different genders perceive, and respond neurally to, synthetic speech news played at two speech rates (1.0x and 1.5x). Female listeners liked the synthetic speech news more than male listeners did. At 1.5x speed, male listeners trusted the news more than female listeners did; at 1.0x speed, male listeners trusted it less than female listeners did. The EEG results showed that females had a higher power spectral density (PSD) than males while listening to synthetic speech. In addition, listeners showed greater cognitive load and concentration at 1.5x speed than at 1.0x speed. These findings suggest that likability is related to the gender of the audience but not to speech rate, whereas credibility is related to both speech rate and audience gender. Synthetic speech triggered greater brain activity in females, and faster speech attracted more attention from the audience.
Audience gender / liking / credibility / LC4MP / EEG
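[1] Eric McLuhan, Frank Zingrone (1995/2000). Essential McLuhan (He Daokuan, Trans.). Nanjing: Nanjing University Press.
[2] Denis McQuail (1997/2006). Audience Analysis (Liu Yannan, Trans.). Beijing: China Renmin University Press.
[3] Li Weijie (2017). Exploration and Innovation of Promotional Language on TV Shopping Channels. Master's thesis, Jiangxi Normal University, Nanchang.
[4] Marshall McLuhan (1964/2000). Understanding Media: The Extensions of Man (1st ed.) (He Daokuan, Trans.). Beijing: The Commercial Press.
[5] Meng Wei (2006). Sound Communication: The Auditory Text of Radio in the Age of Multimedia Communication. Beijing: Communication University of China Press.
[6] Ni Aizhen (2017). The context of emergence and research approaches of the auditory cultural turn. Literature and Art Criticism, (6), 73-83.
[7] Peng Danling (2004). General Psychology. Beijing: Beijing Normal University Press.
[8] People's Daily Online Public Opinion Data Center and Sogou Zhiyin jointly release the "Intelligent Voice Big Data Analysis Report" (2018). Retrieved from http://yuqing.people.com.cn/n1/2018/0202/c209043-29802833.html.
[9] Xie Likui, Zhou Zhenling (2002). A brief analysis of speech rate in radio news broadcasting. Press Outpost, (2), 23-24.
[10] Yu Guoming, Qian Feifan, Chen Yao, Xiu Lichao, Yang Ya (2019). The mechanism behind "post-truth": The communication effect of emotional texts, a study based on the EEG paradigm. Journal of Xi'an Jiaotong University (Social Sciences), 39(4), 73-78.
[11] Yu Guoming, Wang Wenxuan, Feng Fei (2019). "Voice" as the mainstream medium of future communication, an insight paradigm: Taking users' perception of voice news and its measurement as an example. Social Science Front, (7), 136-145.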
[27] Jones, C., Berry, L. & Stevens, C. (2007). Synthesized speech intelligibility and persuasion: Speech rate and non-native listeners. Computer Speech & Language, 21(4), 641-651.
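1. Skin conductance is regarded as an indicator of autonomic nervous system activation.
2. The rate of English speech.
3. The rate of English speech.
4. The speech rate used in English teaching.
5. Here, the rate refers to the speech rate of Spanish.
6. Pacing is defined as the number of occurrences of a specific, known structural feature that can elicit an orienting response and focus the audience's attention (Lang, Bolls, Potter & Kawahara,
7. Anhui iFLYTEK Co., Ltd., founded in 1999, is a well-known publicly listed intelligent speech and artificial intelligence company in the Asia-Pacific region. It has long conducted research on core technologies including speech and language, natural language understanding, machine learning and reasoning, and autonomous learning, and has remained at the international forefront of these technologies.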