Translated by 陈明 / 6,201 reads · 16 June 2017
Source: Alex Fradera · Tags: replicability, social psychology

Was the “crisis” in social psychology really that bad? Have things improved? 
Alex Fradera
Translated by 陈明


Part One: the researchers’ perspective

The field of social psychology is reeling from a series of crises that call into question the everyday scientific practices of its researchers. The fuse was lit by statistician John Ioannidis in 2005, in a review that outlined why, thanks particularly to what are now termed “questionable research practices” (QRPs), over half of all published research in social and medical sciences might be invalid. Kaboom. This shook a large swathe of science, but the fires continue to burn especially fiercely in the fields of social and personality psychology, which marshalled its response through a 2012 special issue in Perspectives on Psychological Science that brought these concerns fully out in the open, discussing replication failure, publication biases, and how to reshape incentives to improve the field. The fire flared up again in 2015 with the publication of Brian Nosek and the Open Science Collaboration’s high-profile attempt to replicate 100 studies in these fields, which succeeded in only 36 per cent of cases. Meanwhile, and to its credit, efforts to institute better safeguards like registered reports have gathered pace.


So how bad did things get, and have they really improved? A new article in pre-print at the Journal of Personality and Social Psychology tries to tackle the issue from two angles: first by asking active researchers what they think of the past and present state of their field, and how they now go about conducting psychology experiments, and second by analysing features of published research to estimate the prevalence of broken practices more objectively.


The paper comes from a large group of authors at the University of Illinois at Chicago under the guidance of Linda Skitka, a distinguished social psychologist who participated in the creation of the journal  Social Psychological and Personality Science and who is on the editorial board of many more social psych journals, and led by Matt Motyl, a social and personality psychologist who has published with Nosek in the past, including on the issue of improving scientific practice.


Psychology research is the air that we breathe at the Digest, making it crucial that we understand its quality. So in this two-part series, we’re going to explore the issues raised in the University of Illinois at Chicago paper, to see if we can make sense of the state of social psychology, beginning in this post with the findings from Motyl et al’s survey of approximately 1,200 social and personality psychologists, from graduate students to full professors, mainly from the US, Europe and Australasia.


Motyl’s team began by asking their participants about the state of the field now as opposed to 10 years ago. On average, participants believed that older research would only replicate in 40 per cent of cases – quite close to Nosek’s figure – but they believed that research being conducted now would have a better rate, about 50 per cent, and that generally the field was improving itself in response to the crisis.


Motyl’s team also canvassed the respondents on a range of questionable research practices, sketchy behaviours like neglecting to report all the measures taken, or quietly dropping experimental conditions from your study. Thanks particularly to work by Joseph Simmons, Leif Nelson, and Uri Simonsohn, we understand just how much these practices compromise the assumptions of scientific significance testing, making it easy to produce false positive results even in the absence of fraudulent intent. In their words, QRPs are not wrong “in the way it’s wrong to jaywalk”, the way that researchers have often implicitly been encouraged to think of them, but “wrong the way it’s wrong to rob a bank.”
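The scale of that inflation is easy to see in simulation. Below is a minimal sketch (the design and numbers are illustrative, not drawn from any study discussed here): a one-sample z-test run on pure noise, where the “researcher” peeks at n = 20 and, if the result isn’t significant, collects 20 more observations and tests again.

```python
import math
import random

def p_value(xs):
    # Exact two-sided z-test p-value for mean 0 with known sigma = 1.
    z = (sum(xs) / len(xs)) * math.sqrt(len(xs))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_experiment(rng, peek=False):
    xs = [rng.gauss(0, 1) for _ in range(20)]  # the true effect is exactly zero
    if p_value(xs) < 0.05:
        return True
    if peek:  # the QRP: result wasn't significant, so collect more data and re-test
        xs += [rng.gauss(0, 1) for _ in range(20)]
        return p_value(xs) < 0.05
    return False

rng = random.Random(1)
sims = 10_000
honest = sum(run_experiment(rng) for _ in range(sims)) / sims
peeked = sum(run_experiment(rng, peek=True) for _ in range(sims)) / sims
# honest testing stays near the nominal 5%; one extra peek pushes it noticeably higher
print(f"false-positive rate, honest: {honest:.3f}, with optional stopping: {peeked:.3f}")
```

Even this single peek lifts the false-positive rate well above the nominal 5 per cent; in Simmons, Nelson and Simonsohn’s paper, stacking several such degrees of freedom pushed it above 60 per cent.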


Previous surveys of researchers’ own QRP usage have uncovered high levels of admissions, as if the field was rushing to the confession box to purge their sins. Here, Motyl’s team used finer-grained questioning to look at frequency (often a “yes” turned out to be “rarely” or “once”) and justification. In some cases, a researcher’s justification showed that they had misinterpreted the question and that they were actually expressing strong disapproval of the QRP – in fact, this seemed to be the case in virtually all “confessions” of data fabrication. In other cases, the context provided by a justification painted the particular research practice in a completely different light.


For example, consider the seemingly dodgy decision to drop conditions from your study analysis. If your rationale is that the condition didn’t turn out to do what you intended – in an emotion and memory study, your sad video didn’t produce a sad mood in participants, for instance – it’s actually more problematic to keep what is effectively a bogus condition in your analysis than it is to exclude it (ideally in a principled way according to a registered procedure). For the new survey, independent judges evaluated all the stated justifications, and felt they legitimised the “questionable” practices in 90 per cent of cases.


Discovering these misunderstandings and justifiable practices littered through the QRP data led Motyl’s team to conclude that pre-explosion psychology practices aren’t as derelict as once feared, although the fact that 70 per cent of respondents said they are now less likely to engage in many of these practices than ten years ago suggests that all was not entirely virtuous back then.


So not perfect, but getting better, is the take within the field: a cautious optimism compared to some dire pronouncements on the state of psychology. In Part Two, we’ll look at the body of psychological research itself, to see if this optimism is justified.



A new paper in the Journal of Personality and Social Psychology has taken a hard look at psychology’s crisis of replication and research quality, and we’re covering its findings in two parts.


In Part One, published yesterday, we reported the views of active research psychologists on the state of their field, as surveyed by Matt Motyl and his colleagues at the University of Illinois at Chicago. Researchers reported a cautious optimism: research practices hadn’t been as bad as feared, and are in any case improving.


But is their optimism warranted? After all, several high-profile replication projects have found that, more often than not, re-running previously successful studies produces only null results. But defenders of the state of psychology argue that replications fail for many reasons, including defects in the reproduction and differences in samples, so the implications aren’t settled.


To get closer to the truth, Motyl’s team complemented their survey findings with a forensic analysis of published data, uncovering results that seem to bolster their optimistic position. In Part Two of our coverage, we look at these findings and why they’re already proving controversial.


Motyl and his colleagues used a relatively new type of analysis to assess the quality and honesty of the data found in over 500 previously published papers in social psychology. Their approach is technical, involving weirdly-named statistics conducted upon even more statistics, so it helps to use an analogy: Just as a vegetable garden produces a variety of tomatoes, some bigger than others, some misshapen, some puny and poor for eating, an honestly-conducted body of research should bear a range of fruit in the same way. True experimental effects shouldn’t always come out exactly the same: they should vary in size from experiment to experiment, including instances when the effect is too small to be statistically significant.


These are the sorts of things you can evaluate in a body of research – in this case with the Test for Insufficient Variance, which Motyl’s study used alongside six other indices. When there were too many irregularities in the data, or bizarre regularity like identikit supermarket tomatoes, this suggested to Motyl and his colleagues that questionable research practices may have been used to make the weak results swell up to reach the desired appearance.
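The intuition behind the Test for Insufficient Variance can be sketched in a few lines (the p-values below are invented for illustration; Motyl’s team applied TIV alongside six other indices, not this toy version): each two-sided p-value is converted to an absolute z-score, and if a set of nominally independent results all cluster just under the significance threshold, the variance of those z-scores is far smaller than honest sampling variation would produce.

```python
from statistics import NormalDist, variance

def tiv_variance(p_values):
    # Convert each two-sided p-value to an absolute z-score, then measure
    # the spread. Honest, variable results give a variance of roughly 1 or
    # more; a clump of "just significant" findings gives a variance near 0.
    zs = [NormalDist().inv_cdf(1 - p / 2) for p in p_values]
    return variance(zs)

bunched = [0.049, 0.041, 0.032, 0.048, 0.037]  # identikit supermarket tomatoes
varied = [0.001, 0.43, 0.04, 0.12, 0.0004]     # a natural-looking crop
print(tiv_variance(bunched), tiv_variance(varied))
```

The real indices compare such variances against their sampling distributions rather than eyeballing them, but the flavour is the same: too little spread in the crop is itself evidence of selection.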


Crucially, however, the study found that more often than not, the indices showed low levels of anomalies, suggesting research practices are more likely to be acceptable than questionable. This was the case for studies from 2003-4, before the crisis was fully acknowledged, and the researchers found an even better picture for more recent (2013-14) papers. The fruits of the research may have been tampered with from time to time, but there was no case that the entire enterprise was “rotten to the core”.


This optimistic conclusion conflicts with similar analyses performed in the past, but this might be explained by the different approaches of collecting the data – of gathering the fruit, if you will. Past approaches automatically scraped articles for every instance of a statistic, such as every listed p-value. But this is like a bulldozer ripping out a corner of a garden and measuring everything that looks anything like a tomato, including stones and severed gnome-heads. To take just one example, articles will often list p-values for manipulation checks: confirmations that an experimental condition was set up correctly (did participants agree that the violent kung-fu clip was more violent than the video of grass growing?). But these aren’t tests to determine new scientific knowledge, rather – turning to another analogy – the equivalent of a chemist checking their equipment works before running an experiment. So Motyl’s team took a more nuanced approach, reading through every article and picking out by hand only the relevant statistics.


However, all is not rosy in the garden. At their Datacolada blog, “state of science” researchers Joseph Simmons, Leif Nelson, and Uri Simonsohn, have already responded to the new analysis and they’re sceptical. Simmons and co first note the daunting scale of the new enterprise: to correctly identify 1800 relevant test statistics from 500 papers. In an online response, Motyl’s team agreed that yes, it was time consuming, and yes, it required a lot of hands: “there are reasons this paper has many authors: It really took a village,” they said.


But Datacolada sampled some of the statistics that Motyl’s team used in their assessments and they argue that far too many of them were inappropriate, including data from manipulation checks that Motyl’s group had themselves categorised as statistica non grata. To the Datacolada team, this renders the whole enterprise suspect: “We are in no position to say whether their conclusions are right or wrong. But neither are they.” In their response, Motyl’s team make some concessions, but they argue that some of the statistic selection comes down to difference of opinion, and defend both their overall procedure and the number of coding errors they expect their study to contain. So….




So doing high-quality science isn’t straightforward. Neither is doing high-quality science on the quality of science, nor is gathering everything together to form high-quality conclusions. But if we care about the validity of the more sexy findings in psychology – the amazing powers of power poses to make you physically more confident, how you can hack your happiness simply by changing your face, and how even subtle social signals about age, race or gender can transform how we perform at tasks – we need to care about psychological science itself, how it’s working and how it isn’t. (By the way, those findings I just listed? They’ve all struggled to replicate.)


There are surely ways to improve the methods of this new study – perhaps not coincidentally, Datacolada’s Leif Nelson is running a similar project – but even if the new assessment does include some irrelevant statistics, it will likely be an advance on past analyses that included every irrelevant statistic.


So … the new insights have budged my position on the state of science a little: I’m still worried, but I can see a little more light among the dark. Motyl’s group make the case that social psychology isn’t ruined, that the garden isn’t totally contaminated. I hope so. But it’s not hope on its own that will move our field forward, but research, debate, and making sense of the evidence. After all, psychology is too good to give up on.


The State of Social and Personality Science: Rotten to the Core, Not so Bad, Getting Better, or Getting Worse?

