What’s More Effective in Online Video: Text or Dialogue?

Guest Blogger

Jim Clinton, PhD & Joe Magliano, PhD

In an effort to bring academic findings into the business world, we’ve been reaching out to researchers across the country to share how their areas of expertise affect the PR & digital marketing world. Today, Jim Clinton, PhD and Joe Magliano, PhD of Northern Illinois University share thoughts on visual media.

Whether it’s a 15-second pre-roll ad on YouTube or a 90-second explainer video, marketing professionals all face the challenge of balancing text and dialogue against the images in online video. The goal is to engage viewers so that they remember the content of the ad and act on it. So when is it worth the investment to recruit voice actors to speak scripted dialogue instead of the more cost-effective option of showing text on the screen?

This post explores the question from the perspective of cognitive psychology, an area of psychology that studies how people experience, remember, and generally make sense of the world around them. Research in this area has much to say about the issue.

Walter Dill Scott (1869-1955) was one of the first psychologists to apply psychological principles of persuasion to advertising. He showed that using direct commands in language (e.g., “buy this product”) affects consumer behavior. These commands are particularly effective when accompanied by images, but only when it is easy to understand and process the language and its relation to the image. Right now, you may have a strong desire to “Get Duffed!”

[Image: Duff]

Cognitive psychologists have shown that pairing language with images works because it makes the message more memorable (Paivio, 1986). This happens because the message is stored in two forms, the language and the image, and each can cue the other. Remember the image, and you remember the command. Remember the command, and you remember the image.

But let’s return to the question posed at the start of this post: when should you invest in voice actors to convey the language in an ad? The answer gets tricky with video advertisements.

Anyone who has seen a subtitled film can tell you that constantly switching back and forth between reading and viewing can be exhausting. This is because we as viewers have a limit on how much information we can handle at any given moment, something called our working-memory capacity (Baddeley, 1992). The cost of alternating between reading and viewing is that a viewer might miss a key line of dialogue, a critical moment of action, or both.

So again, how do we balance the presentation of text and dialogue alongside images so as not to overwhelm the audience? Psychologist Richard Mayer, an expert in multimedia learning, provides one answer. He proposes that we provide as much overlap as possible between the text or dialogue and the images, and that we limit text to short, simple phrases (Mayer, 2009).

To see some of these principles in action, consider this commercial from Groupon. In the 60-second commercial, an omniscient narrator describes a series of vignettes typical of Groupon users. Here the voice-over complements the actions in the vignettes rather than drawing attention away from them. Also notice how little text is shown on the screen. The only text that appears for an extended period of time is the Groupon logo and the tagline, “Check Groupon First.” (Walter Dill Scott strikes again!)

Sometimes it’s best to limit language to your core message. Take, for example, the early YouTube teaser trailers for the Star Wars film The Force Awakens. These teasers contain quick glimpses of the film, but their focus is on the text displaying the movie title and theatrical release date. While the sights and sounds of the film might feel chaotic, the teaser certainly carved December 2015 into memory.

To revisit our opening question, when is the investment in voice actors worth it? Voice-over is most effective when you have more than one or two words paired with dynamically moving images. This is because voice-over does not force viewers to switch between reading and viewing, a behavior that consumes valuable cognitive resources and weakens memory for the ad.

Similarly, text works better when it is not in competition with other visuals. This gives the viewer the opportunity to focus on and thoroughly process the text, which ultimately forms a stronger memory of the ad. Leveraging video to tell an effective story can be incredibly powerful, especially when you are aware of Scott’s command principle and the limitations of human working memory.

Jim Clinton is a research scientist, and Joe Magliano is a professor of psychology at Northern Illinois University. They collaborate on research that examines how we process, understand, and are influenced by visual media.


Baddeley, A. (1992). Working memory. Science, 255(5044), 556-559.

Mayer, R. E. (2009). Multimedia learning. Cambridge University Press.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford University Press.