Use macOS to generate speech files from text

Here’s a quick, free (if you’ve got access to a Mac) method for creating spoken experimental stimuli, which I’ve implemented in this project and now use regularly in my lab. Hat tip to Richard Morey for suggesting this and writing a few lines of code that save many hours of tedious voice recording.

  1. Create a plain text file with the texts you want to create .wavs for, one per line: insert a line break after each word or phrase that should become its own sound file. Name your text file “words.txt”. It should look something like the one we recently used to generate audio files for a digit span task.
  2. Open iTerm and navigate to the folder where your “words.txt” file is located. Here’s a lesson on how to navigate via the command line. TL;DR: the “cd” (change directory) command is what you need; cd .. will move you up one directory. The “ls” command lists the contents of the current directory.
  3. Type the following into iTerm:
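A sketch of the kind of command that does the job, using macOS’s built-in say utility (the voice name and the --data-format value are choices you can vary; LEI16@22050 is one common setting for .wav output):

```shell
# Read words.txt line by line; for each line, synthesize speech with the
# "Kate" voice and save it to a .wav file named after that line.
# An illustrative words.txt for a digit span task would contain one
# digit name per line:
#   one
#   two
#   ...
#   nine
while read -r line; do
  say -v Kate -o "$line.wav" --data-format=LEI16@22050 "$line"
done < words.txt
```

Each line serves as both the spoken text and the file name, so a line reading “seven” produces seven.wav.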

In a few seconds, the directory containing “words.txt” will also include a .wav file for each entry from “words.txt”.

This code uses the macOS voice “Kate” (a female voice with a British accent). You can use any voice available on your system by replacing “Kate” in the code with the name of your preferred voice; running say -v '?' in the terminal lists the installed voices.

Text-to-speech isn’t perfect. You’ll want to check that your .wav files sound the way you intended. Different voices sometimes pronounce the same text differently, so one solution to weird-sounding speech is to try another voice. You can also try alternative spellings to get the right pronunciation.


Individual differences in experiencing visual imagery

One frustration with studying visual memory is that we have no way to directly experience what someone else sees in their mind. When trying to understand individuals’ memories about uncontrolled events, we rely mainly on verbal descriptions of visual memories and individuals’ feelings of confidence in the clarity and vividness of those memories. There is a lot of variability here. Some people believe that their visual memories are as clear and high-fidelity as a photograph (though there are very few examples of this fidelity being put to a convincing objective test; see this article for an overview). Other people report experiencing no visual imagery at all (see here and here for descriptions and discussion), even when asked to imagine something that should be extremely familiar. Most of our experiences probably fall somewhere in between these extremes.

My student Gintare Siugzdinyte is currently studying individual differences in visual imagination experiences, and hoping to attract a large and diverse sample to respond to a short questionnaire. Later in 2017, after we have analyzed these data, I will post an update about what we learn about visual imagery from this survey. You can find our survey here. Thanks very much for your help!


Measurement of visual memory is not always confounded by verbalization

If you study visual memory, you can bet that you will encounter skepticism about whether your participants really remembered visual images. Maybe they only remembered verbal labels corresponding to your stimuli. Certainly, there are many published studies of “visual” memory that merit skepticism on this point. Unlike verbal memory, where participants can convey what phonemes they are thinking of with high fidelity via speech, communication of visual memories must be mediated by speech or gesture. You cannot directly show someone else what the mental image you remember looks like. In visual memory research, it is often unclear whether we are measuring memory for what something looked like based on a representation of a visual image, or a representation of how we intend to communicate what something looked like: two completely different things.

One method that minimizes these concerns is visual change detection. Visual change detection paradigms use arbitrary and unique arrangements of colored shapes as stimuli. Participants view the stimuli very briefly and indicate whether the test item has changed or not by pressing a button. This plausibly minimizes dependence on verbal labels for several reasons. Simply remembering all the color or shape names would not reliably lead to correct responses because it is also important to know where they were situated. Exposures are too brief for all the components (plus some spatial cue) to be named. Responses are not spoken, which should further discourage verbal labeling. And if you’re still wary of the possibility of verbal labeling, you can make participants engage in articulatory suppression, where they repeat some irrelevant and easy-to-remember sequence (e.g., “super, super, super”, “the, the, the”) during the task to further discourage verbalization. 

Imposing articulatory suppression is a pain for the researchers collecting data. Participants understandably do not like continuously speaking aloud, and researchers have to actually monitor them to ensure they comply. I’ve done my time in this respect. Most of the participants I ran for my MSc and PhD research engaged in articulatory suppression. I’ve spent hundreds of hours listening to research participants chant arbitrary sequences. I’ve commiserated with them about how tiresome it is. Once, I excused a participant after he spontaneously switched to droning a novel sequence of curse words. (I wasn’t offended by the profanity, but I judged that this insubordination reflected a withdrawal of consent.) Sometimes I had a good reason, grounded in hypothesis testing, for imposing articulatory suppression. Usually though, I decided to impose suppression only on the chance that a peer reviewer might not be persuaded that I had really measured “visual” memory unless participants had been suppressing articulation. I know I’m not the only visual memory researcher who has made this decision.

In a paper in Behavior Research Methods, my colleagues and I (led by Rijksuniversiteit Groningen PhD candidate Florian Sense) have recently shown that this precaution is unnecessary. We compared performance on a typical visual change detection task with performance on a modified task designed to increase the likelihood that participants would try to generate verbal labels. In the modified task, colored items were presented one-by-one so that participants could more easily verbalize them. Participants completed half the trials silently, which should allow opportunity for labeling, and half with articulatory suppression, which should discourage labeling. If labeling occurs and boosts memory for these stimuli, then we should have observed better performance with sequential presentation and no suppression. However, this wasn’t the case. Our state-trace analyses, designed to detect whether we were measuring one mnemonic process (presumably maintenance of visual imagery) or multiple mnemonic processes (maintenance of imagery plus maintenance of verbal labels), yielded consistent evidence across all participants favoring the simpler explanation that a single mnemonic process was measured in all conditions.

I don’t doubt that there are circumstances where verbal labels influence memory for visual imagery, both for better and for worse. That is not in dispute. But just because there are instances where we know this happens doesn’t make it reasonable to claim that verbal labeling is always probable. When stimuli are unfamiliar and abstract, and exposures are brief, visual change detection paradigms appear to offer a pure measure of memory for visual imagery uncontaminated by verbal labeling. 


I, for one, welcome our statcheck overlords

My spell-checker is stupid. It recently wondered whether the URL for my website shouldn’t be “scaremongering”. It thinks proper names are misspelled words. This is one reason why we don’t let computers write our papers. It would also be pretty stupid of me not to let software help me catch the spelling mistakes that I’m unlikely to notice. In the same way, the statcheck package is a great tool. Unless something is horribly awry, I’m unlikely to notice if a p-value doesn’t match an F-value for its degrees of freedom, even though this is the hook on which the interpretation of data regularly hangs, the fact that justifies discussion of some phenomenon in the first place. If we messed that up, we’d really like to know, whether we are author, reader, meta-analyzer, or peer-reviewer.

Now that statcheck is being pre-emptively deployed – Psychological Science has announced that they’re running it on submitted manuscripts, and the group behind statcheck has deployed it on thousands of papers – there’s a hum of panic. What if we’re wrong? Won’t this stigmatize great scientists for nothing worse than typos? It’s as though there are scientists out there who think they have never committed a mistake in print. I bet we all have. I know for a fact I have. Acknowledging that everyone makes errors should really take the stigma out of this criticism and let us let statcheck do what it is good for – help us be better.

Anxious reactions to mass deployment of statcheck have all supposed that exposing mistakes in our work will make us look like bad researchers. But if we accept that we all mess up, then researchers’ reactions to criticism are what tell us whether they are careless or conscientious. If you think that your work is important at all, if you think that your data and analyses are leading up to some truth, then surely it matters whether the details you report are correct or not. Surely you would want to make it as easy as possible for readers to assess your work. The conscientious scientist who enables and invites criticism will eventually be caught in error, but then that error will be corrected, and the work will be better than it was. If no one has ever detected an error in your work, is it because there aren’t any, or because you’ve made it impossible for anyone to find them?

Researchers who are taking care of quality control deserve our respect. Mere days after we announced the winners of the Journal of Cognitive Psychology’s 2015 Best Paper Award, the corresponding author Vanessa Loaiza emailed me to say that she had recently been re-analyzing the data, and had uncovered a smattering of coding errors that occurred as they transferred scores from pencil-and-paper forms into digital data frames. The authors wanted to issue an erratum correcting the record, even though these mistakes did not change the inferential outcomes. Abashed, the team of authors also felt they may no longer be worthy of the award because they had acknowledged that they made a mistake. It would be ridiculous to think more of a hypothetical research team that never re-examined its data and so never discovered its mistakes, or of a team that discovered errors but chose not to rectify them, than of one that acted and responded as Loaiza and colleagues did. I’m proud that the Journal of Cognitive Psychology honored a group that is so honestly committed to high quality research.

I was troubled that these authors felt like acknowledging a mistake should somehow lead us to discount the quality of their work, even though I know the feeling personally. A few years ago, I issued an erratum to correct errors that occurred when I carelessly copy-pasted code from one analysis to another. These errors were discovered by a colleague after I shared data and analysis scripts with him. I can feel my face flush just recalling how embarrassing this was. If I had been unwilling to make my code available, no one, maybe not even me, would ever have known about this blunder, and I would never have felt this acute embarrassment. But though the errors didn’t affect the inferences we reported, they affected the values in a way that puzzled my colleague. Would it really have been better for my reputation to have a colleague quietly wondering what could be wrong with my analysis rather than knowing for certain that I made a dumb but benign editing mistake, and then corrected it? I think the long-term consequences for my reputation would have been worse had I not made the data and code available for inspection. I just would not have been aware that respected colleagues were suspicious of the quality of my work.

Nuijten and colleagues’ work suggests that reporting errors are more prevalent than we imagined. Statcheck only helps us find reporting errors occurring at the final step of the data processing and analysis workflow. Undoubtedly we are making even more errors than they have revealed. We should be worried when we encounter a colleague who isn’t worried about quality control, not when we hear that a colleague is correcting mistakes. We should embrace tools like statcheck, but go even further to ensure quality by also welcoming criticism of the interim steps leading to our reported analyses.


Interpreting nulls, even surprising ones, is not trivial

Sometimes I design an experiment really wondering what will happen, but that wasn’t the case when I first decided to compare proactive interference effects for verbal and visual memoranda. Proactive interference occurs when information you have previously memorized disrupts your ability to learn new information. For example, having studied Spanish in high school could impair learning Italian vocabulary now: as you try to retrieve the now-relevant Italian word, you risk retrieving the previously-learned Spanish word instead. (Learning new information also conflicts with remembering older information; that’s retroactive interference.) Proactive interference is particularly interesting to me because it helps address what happens to briefly presented and retained information, like the arbitrary word lists or color patterns we ask participants to remember in short-term memory tasks: you can only observe proactive interference if the earlier stimuli remain in mind. If new stimuli overwrite these arbitrary lists or patterns, as many short-term memory models expect, then proactive interference wouldn’t occur. Testing whether proactive interference occurs for both verbal and visual memoranda seemed like an important way to assess whether parallel short-term memory systems operate for both kinds of material.

I had recently conducted a series of studies including a verbal recognition memory task designed to be analogous to visual recognition memory tasks, and I realized that it would be great to use these materials to estimate the size of proactive interference for verbal and visual information. There were two possible outcomes, each of which would be interesting: 1) proactive interference would be observed for verbal, but not visual memoranda, or 2) proactive interference would be observed for both kinds of materials. We designed a release-from-proactive-interference experiment. Participants observed a string of trials in which stimuli were drawn from the same set of similar materials. In the verbal task, this was a closed set of phonologically similar words (for instance, words that were close phonological neighbors of the word “cat”), and in the visual task, it was a closed set of similar colors (e.g., varying from blue to yellow). After drawing stimuli from this same set several times, we switched to a different set constructed in the same way. For verbal stimuli, this was phonological neighbors of “men”, and for visual stimuli, it was colors varying between blue and red. If we hold lingering memories of the early trials that interfere with our ability to remember precisely which sequence of cat-like words we heard most recently, then performance should grow worse and worse as we keep drawing from the “cat” set. Then, when we switch to the “men” set, performance should spike because the fading memories of cat-like combinations cannot be confused with these novel men-like combinations. Lots of previous literature predicts this pattern for verbal memoranda, and our study would establish whether something similar or different occurs with comparably constructed visual memoranda.

Only we didn’t find much release from proactive interference with verbal memoranda. The graph below gives estimates of verbal task capacity averaged across 18-trial blocks. The switch to a new stimulus set occurred on Trial 13. In our first experiment, we did observe performance in the right direction: on average, performance got worse from Trial 1 to Trial 12, and then got a little better again after Trial 12. But not by much at all.


This picture of proactive interference accumulation and release with verbal memoranda, though detectable in our modeling, didn’t provide a compelling comparison for the utterly null effects we found with visual memoranda:


These figures came from our first attempt. Though the evidence for proactive interference in the verbal task was underwhelming, at first we believed that we had a small but legitimate effect that was simply being masked by something trivial about our task or design, and that with tweaking we could make it emerge more strongly. At this point, I’ve conducted several follow-ups, and it never gets clearer. Sometimes we find no release from proactive interference with verbal stimuli; sometimes we find a very small effect. We don’t see it when participants have to reconstruct the stimuli rather than recognize them (at least, not yet). We don’t see it for verbal materials that are recalled via speech (at least, not yet). We never see it with visual memoranda, but that finding is difficult to interpret without the clear, expected verbal effect as a foil. A summary of what we’ve done so far, along with the data, analysis scripts, and experimental paradigms, is available on OSF.


I’m happy to #BringOutYerNulls and let anyone examine our progress with this; in fact, these results have been available on OSF for quite a while and I’ve discussed them with a few colleagues. But interpreting a null effect, even in light of the strong expectation that an effect should have occurred, isn’t easy. There are many reasons why this might not have worked, and many of them are really boring. Possibly, these null results are really important: maybe there are boundary conditions on proactive interference that limit its generality, and if we firmly established these conditions we would know something novel and fundamental about memory.


Pre-emptive open science is fairer open science

The Peer Reviewers Openness Initiative is a grass-roots campaign to empower reviewers to demand greater transparency in scientific research. Simply put, peer reviewers promptly request that authors “complete” their manuscript by making the data and research materials publicly accessible, or provide a justification for why they have not done so. I encourage you to read the PRO initiative, and if you support its aims, sign it.

One advantage of pushing for not just open science but pre-emptive open science is that making data and materials accessible at publication will better ensure that the same opportunities to benefit from knowledge are available to everyone. Most of us offer an available-upon-request policy for letting other researchers access our data. But this available-upon-request model is inadequate: it is inconvenient for both the author and the requester, and I argue that it is also one more opportunity for our motivations and limited resources to selectively disadvantage junior researchers.

When we haven’t preemptively prepared data and materials for sharing, these impromptu requests are inconvenient. It is tempting to find a reason why you cannot spend the time to locate and curate the materials right now: too busy this week, not sure where the data are, need to check them for anonymity, etc. There are plenty of plausible reasons for delay, and as the lag between the request and compliance increases, the chance that you forget the request increases. But the enthusiasm you feel for complying with a request for data likely varies depending on who requested it. Is it a rival? Is it an unknown student from a foreign country? Is it a potential peer reviewer? Is it a prominent colleague who is likely to have influence over whether your next grant application is funded or whether you are promoted?

I bet some requests are far less likely to be forgotten than others, and compliance in certain cases is likely to be prompt and enthusiastic, not reluctant. These differences tend to place junior researchers who want data or materials to facilitate their own projects at a disadvantage. Anecdotally, I’ve requested data or experimental materials on many occasions, and I’ve sometimes directed my students to request data, naively thinking that this would give them a chance to experience a pleasant, productive interaction with another working scientist. My personal success rate in getting the requested materials is something like 50%. I estimate that I got at least some reply 90% of the time I attempted to get data or materials on request, even if the author declined to share. My students’ success rate, defined as getting any acknowledgment of their request, has been closer to 10%. Because students’ projects are usually time-sensitive, sluggish responses to requests (or waiting for weeks to get no reply at all) can make a huge difference in the amount and quality of work that student can produce.  

Complying with a request for data or materials may seem like a chore, but I think it is a mistake to assume there is nothing in it for you. Sharing data and materials increases your reputation among your colleagues. It is a display of confidence in your lab’s work. Availability of your published data or materials is likely to increase citations of your original work. In my experience, sharing or requesting data has occasionally resulted in collaborations which produced novel, jointly co-authored research. If our interest is to get important work done, helping a colleague by sharing materials and data should be rewarding for everyone because it reduces duplicated efforts in collecting data or programming experiments and analyses. These are benefits that should be available to all scientists, not just the ones we most hope to impress. 

I think sharing data and materials should be the norm, and currently it isn’t. In order for this to be convenient and fair, we need sharing to be done preemptively, not just on sporadic request. By joining the PRO initiative, you can help make this happen. You can also express your support for the PRO initiative by using one of our badges as your avatar on social media, or by placing a badge or banner on your website.



A visit from the Ghost of Research Past

A request for an old data set recently afforded me the opportunity, much like Ebenezer Scrooge, of revisiting my Past-Self when I was a brand-new post-graduate student, and allowing Past-Self and Future-Self to help me critique how my lab curates our data and materials in the present day. Both Past-Self and Future-Self are compelling agitators for a proactive approach to opening data, especially implementing a Data Partner scheme. 

Openness about our work is consistent with believing that the work is important and excellent. Being asked for access to your work is an acknowledgement that it is valuable, and sharing it is an expression of your confidence in its value. I’ve found openness to be rewarding, leading to additional citations, gracious acknowledgments, and sometimes new collaboration opportunities. 

However, requests for data or materials fluster us, arriving out of the blue. It always seems necessary to perform fresh checks: is the code understandable and functional? Data may need to be explained and possibly tidied: what do the column headings mean again? Could there be identifying details in any of the responses? I might spend hours performing these checks before complying with a request.

Waiting until the request arrives to open up data and materials can be seen as a tacit judgment on the expected impact of the data. Why, if I believe the work I do is worthwhile, am I not preparing it for public consumption before I publish it? When did I start imagining that no one was likely to be interested in re-analyzing my data or using my experimental code?  

Recently I was asked for data from the first paper I ever published, part of my master’s research project, which were collected in autumn 2002 and published in 2004. Possibly, sharing data that has been untouched for more than 10 years is asking too much. It wouldn’t have been strange if I had lost it in institutional moves and computer crashes, or if it proved impossible to adequately document. But if found, going through these data would give me an opportunity to pay a visit to my Past-Self, recall what it was like to begin a research project for the first time, and maybe learn something from her.

One thing that struck me as I examined Past-Self’s data is that Past-Self organized it expecting that other people would be looking at it. Past-Self inserted comments explaining what numeric codes meant. Past-Self wrote summaries of the purpose of experiments, and Past-Self organized files into hierarchical directories with sub-folders for data files, analyses, and experimental stimuli. I think it would have surprised Past-Self that no one would ask to look at this information until 2015. Past-Self thought this work was important and documented it accordingly.

Though Past-Self began as a data-sharing idealist, she had minimal skills for curating data and materials. Some organizational elements improved drastically in the later experiments in her project. Past-Self learned that it is better to make category codes self-explanatory (e.g., why assign “male” or “female” to arbitrary numeric codes instead of just entering the words?). Past-Self developed sensible conventions for naming files. Past-Self reduced redundancies in data recording.

But though some practices improved, it also became clear that Past-Self abandoned the expectation that anyone apart from her and her supervisor would ever see these raw data and materials. As the project drew on, the helpful comments disappeared, and the summaries for subsequent experiments were unchanged from the earliest ones. The whole directory was organized around an 8-experiment master’s project, which eventually resulted in the publication of three experiments in two separate papers. Past-Self never re-organized these materials so that it would be immediately obvious how to locate the materials pertaining to each paper specifically.

Altogether I interrogated Past-Self for about 5 hours: we located the data sets requested, established through re-analysis that they did in fact include the same data that were published, saved them in an accessible non-proprietary format, documented what the data sets contained and how these variables were coded, and published the data and guidance on Open Science Framework. On the one hand, that isn’t terrible. My Future-Self, who checked in throughout this process, insists that 5 hours of work accomplished now is a sound investment. It enables a colleague on the other side of the planet to do a meaningful new analysis, from which we might all learn something novel. Furthermore, those data are now available to anyone else who might have other ideas for how our data can be useful. Future-Self insists that this will lead to glory. On the other hand, this 5 hours of work entirely replicated work that Past-Self did more than 10 years ago in her haphazard manner. If Past-Self had carried on carefully documenting her data, if she had considered that materials should be available in commonly accessible formats, and if she had updated her personal repository to reflect the published record, then these materials would have been ready for sharing upon request in minutes, not hours. Future-Self is anxious to know how I am going to prevent this waste of time. Past-Self wonders whether I can do more to help my trainees learn good habits.

What, if any, are the constraints to proactively curating lab work? Proactive curation is obviously desirable for Future-Self: it saves her time and effort and it increases the impact and utility of the work. It is arguably good for trainees and PIs alike. Because I work with many short-term trainees, I have handled most data curation myself, but this is a valuable skill that Past-Self needed to learn better, and that Future-Self wants delegated. The Data Partner scheme is ideal for this: my trainees can be paired with trainees from a colleague’s lab, and these two students will help each other curate data by seeing whether their partner’s work is clear, self-explanatory, and reproducible. They do this independently of me. When the data are shown to me, they have already been vetted by one other person, providing an additional chance to catch mistakes. My trainees get the practice that Past-Self lacked, and Future-Self will never wonder whether data and materials are ready to be shared.

Are you at Psychonomics 2015? Come to our talk, Open Science: Practical Guidance for Psychological Scientists, Friday at 10:40 am, in the Statistics and Methodology II session.

Update: Check out Lorne Campbell’s thoughts on this too.



Bayesian methods and experimental design

At our symposium on Bayesian Methods in Cognitive Science at the meeting of the European Society for Cognitive Psychology, we talked about the advantages of using Bayes factors for inference. I talked about a logical hypothesis test about the role of attention in binding that has frequently resulted in non-significant interactions (e.g., Morey & Bieler, 2013), and Evie Vergauwe described the freedom that comes from realizing that you don’t need to let the possibility of a statistically significant effect restrict experimental designs. This struck me; I remember feeling it too. After I first learned to compute Bayes factors (BF), and on my first attempt observed very large BFs favoring a null hypothesis, I thought exultantly of every experiment I had ever run that turned up non-significant results: now I could return to those data and learn something from them, maybe even publish them. Trying this confirmed for me that sometimes the design of an experiment was not sufficient to tell me much of anything at all, no matter what statistic I applied to the data.

An example: The second data set I analyzed using the BayesFactor package was part of a collaboration with Katherine Guérard and Sébastien Tremblay. We were testing hypotheses about maintaining bindings between visual and verbal features. We asked participants to remember a display of colored letters. At test, participants sometimes made judgments about a colored letter, indicating whether that particular combination had been shown or not. Other times participants made judgments about an isolated feature. In separate blocks, these kinds of tests were either isolated or blended, so that we could test whether expectations about whether it was necessary to remember binding affected feature memory.

We wanted to compare the possibilities that 1) encoding binding enhances performance, or 2) that only retrieving binding matters. If binding helps feature memory because multiple copies of the feature are stored in different modules (e.g., Baddeley et al., 2011), then feature memory should be better when feature-tests are mixed with binding tests, creating the expectation that one must try to remember binding, than in blocks with only feature tests. However, if retrieving bindings matters, then feature memory should improve when tested with a two-feature probe, rather than with a single-feature probe. If differences appear only at test, then one can’t argue that any advantage conveyed by binding occurs because of a filter on what features are encoded. We manipulated feature similarity (in this case, color similarity) so that we could potentially find an interaction between the cost of similarity and binding context, such that the expectation to remember bindings mitigated the decreases in accuracy expected in the high-similarity conditions.

In a preliminary experiment, this interaction between color similarity and binding context on color feature tests was non-significant. This was where I expected a large BF favoring the null might change our fortunes and help us decide what to believe. But the BF against this interaction was underwhelming, about 4: not enough to lead us to a confident conclusion.

Rather than collecting more subjects on the same design (which we could legitimately do, now that we had converted to Bayesian analysis), we decided we would have a better chance of strengthening our findings if we increased the amount of data per participant and tightened our experimental design. We tested two memory set sizes (3 or 5 items) rather than three (3, 4, or 5 items). We asked each participant to undertake two experimental sessions, so that we could run the binding context blocks in two different orders within each participant on different days (order did not matter much in our preliminary studies, but it was an additional source of noise). We also used a single similar-color set rather than two comparable single color sets. These changes both increased the amount of data acquired per participant and decreased potential sources of noise in our original design arising from variables that were incidental to testing the hypothesis. With the new data set, the crucial BF favoring the null was 10 – much more convincing.

Bayes factors are a really useful research tool, but of course they complement (not replace!) good experimental design. Everything you already know about designing a strong experiment still matters, perhaps more than ever, because simply surpassing some criterion is no longer your goal. If you are using Bayes factors, you want values as far from 1 as possible. You must assume that if you are making an argument that is at all controversial, the number that convinces you is likely to be smaller than the number that might convince anyone with an antagonistic outlook. This could change the time-to-publish dynamic: rather than rush out a sloppy experiment that luckily produced a low p-value, it may become strategically wiser to follow up the preliminary experiment with a better one, which should yield a more convincing BF. This way of thinking also reduces the circular notion that if you observed p-values less than the criterion, then your experiment must have been decent. Whichever hypothesis is supported, designing a well-considered experiment with minimal sources of noise and a strong manipulation is always an advantage. As a researcher I find this comforting: something I was trained to do, and that I know how to do well, matters in the current skeptical climate.
