Dan Gilbert is one of the world's most famous psychologists.  I'm a big fan of his work and his teaching, and though I believe he's a great communicator and intelligent synthesizer, some of the studies he has quoted in his older work haven't held up over time.  Below are a few examples of citations I've checked because they were important enough to me to understand more thoroughly.

I really enjoyed Dan Gilbert's Stumbling on Happiness; Gilbert is a talented and well-read author, and his work stands out as an interesting synthesis of many ideas.  It's difficult to trust his narrative, though, when parts of it rest on misreporting (which I think is important) about the hedonic setpoint and the role of agency in our happiness.

The 2004 TED Talk

Dan Gilbert's 2004 TED Talk has been viewed almost 20 million times.  It has powerfully shaped the public discourse on the topic.  In a follow-up interview, he owned up to a couple of (forgivable) mistakes in the talk, which he deserves to be lauded for.

In my opinion, the only mistake of substance is this: contrary to what the talk claims, paraplegics are not as happy as controls or lottery winners after their accident.  This misstatement was based on 'Lottery Winners and Accident Victims: Is Happiness Relative?', a 1978 study of 22 lottery winners ($50k - $1 million), 22 controls, and 29 paralyzed accident victims.

It's important to note that this study was one survey in which they simply asked these small groups of people how happy they were "at this stage of their life" (not in the moment), how happy they were before the event, and how happy they expected to be in the future.  This survey was done one month to one year after the event: they tried to catch people right after it occurred (not after they'd had time to process and adapt to it).

Moreover, most of the lottery winners didn't win 'life-changing money' (only 23% reported lifestyle changes).  The ratings of life-satisfaction in that moment were 4.0, 3.8, and 3.0  (on a 6-point scale) for lottery winners, controls, and paraplegics (paraplegics were significantly worse).

The study also measures lottery-ticket buyers vs non-lottery-ticket buyers to see if they have different baselines in happiness, which came out to 3.8 vs 4.0 (lottery-ticket buyers have the lower average happiness).  With these small sample sizes, differences of a few tenths of a point on this six-point scale are not meaningful.
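To get a rough sense of why a gap of a few tenths of a point is hard to distinguish from noise at this sample size, here is a minimal sketch that computes the standard error of a difference between two independent group means. The group sizes (22 and 22) and the ratings (4.0 and 3.8) are the winners-versus-controls figures quoted above. The standard deviation of 1.0 scale points is purely an assumption for illustration, not a value taken from the paper, and the function names are hypothetical.

```python
import math

def se_of_difference(sd_a, n_a, sd_b, n_b):
    """Standard error of the difference between two independent group means."""
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)

def ci95_half_width(se):
    """Approximate 95% half-width using the normal critical value 1.96."""
    return 1.96 * se

ASSUMED_SD = 1.0   # ASSUMPTION: illustrative spread, not reported in this post
N_PER_GROUP = 22   # lottery winners and controls, as quoted above

se = se_of_difference(ASSUMED_SD, N_PER_GROUP, ASSUMED_SD, N_PER_GROUP)
half_width = ci95_half_width(se)
observed_gap = 4.0 - 3.8  # winners vs controls

print(f"standard error of the difference: {se:.2f}")
print(f"approximate 95% half-width:       {half_width:.2f}")
print(f"observed gap:                     {observed_gap:.2f}")
print("gap smaller than half-width:", observed_gap < half_width)
```

Under that assumed spread, the uncertainty on the difference is several times larger than the 0.2-point gap. A smaller true standard deviation would narrow it, so treat this as an order-of-magnitude check rather than a reanalysis.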

This is a poorly powered study that gets quoted a lot to justify the idea of a 'hedonic setpoint' or 'hedonic adaptation'– the idea we consistently will find ourselves feeling moderately happy, regardless of our situation.  

Its sample size is small.  While everyone seems clustered around 'slightly happy' regardless of condition, it's unclear whether this is evidence of a 'hedonic setpoint' as opposed to a social desirability bias or sampling bias. (A small subset of people were willing to be interviewed; they're likely the most public lottery winners and the best psychologically performing paraplegics; and everyone wants to project that they are doing relatively well.)  The biggest clear effect seems to be a nostalgia boost for paraplegics when recalling the past.

While Gilbert's misquote of the paper's results is forgivable, more concerning is the plot that he shows in his TED Talk while stating 'this is the real data', which shows paraplegics and lottery winners both around 50/70 happiness points 'one year later' (with paraplegics given a slight edge).  This plot bears no resemblance whatsoever to the original data (as Jason Collins has pointed out).

The gist of his points– (1) that by nature we are content, and will hedonically adapt to most circumstances, and (2) that we severely overestimate the impact of individual events on our happiness– is accurate.  There has been quite a body of research on this topic that bears out a soft interpretation of this idea.  Unfortunately, the example he chose to cite– individuals after a severe, long-term disability– is one clear example where hedonic setpoints don't apply.  From 'Beyond the Hedonic Treadmill: Revising the adaptation theory of well-being':

Lucas (2005a) used two large, nationally representative panel studies to examine adaptation to the onset of disability. Participants in this study (who were followed for an average of seven years before and seven years after onset) reported moderate to large drops in satisfaction and very little evidence of adaptation over time. For instance, those individuals who were certified as being 100% disabled reported life satisfaction scores that were 1.20 standard deviations lower than their nondisabled baseline levels.

Stumbling on Happiness

Gilbert's book shares common pitfalls with all of the psychological literature of the time– it cites underpowered research on effects which we now know don't replicate, like the widely discredited social priming research of John Bargh.  In Chapter 9 he discusses how we are susceptible to subconscious primes– words flashed on a screen that alter our behavior, even though we don't consciously see them (including the 'old words make people walk slower' experiment– one of the most famous casualties of the replication crisis).  This should be common knowledge at this point, but just to reiterate– no credible evidence of subconscious priming has ever been collected.

That mistake is very forgivable– almost everyone writing in this era had improper faith in Bargh's work.  Unfortunately, Gilbert very publicly doubled down in defence of social psychology in a Science editorial when the replication crisis was first breaking.  This decision resulted in a lengthy and public back and forth between Gilbert and Nosek (the man behind the Open Science Collaboration)– a back and forth which at times got a little heated (e.g., on Twitter).

The dust of this debate has settled, with almost all statistically strong observers landing squarely against Gilbert (for example Andrew Gelman, Sanjay Srivastava, Daniël Lakens, and Ulrich Schimmack).  The Cut and Retraction Watch provide good summaries of the nuance between the sides.  With the benefit of hindsight, it seems Gilbert's attempts to defend (now debunked) social psychology research were ill-advised.

Agency and Happiness

In a section on the importance of agency, Gilbert quotes this study (by Ellen Langer):

In one study, researchers gave elderly residents of a local nursing home a houseplant. They told half the residents that they were in control of the plant’s care and feeding (high-control group), and they told the remaining residents that a staff person would take responsibility for the plant’s well-being (low-control group). Six months later, 30 percent of the residents in the low-control group had died, compared with only 15 percent of the residents in the high-control group.

The shape of this anecdote should raise flags for anyone– it is pretty unbelievable that not having control of watering a houseplant– independent of all other influences on your agency in your life– makes you 15 percentage points more likely to die. In this example, participant age ranged from 65 to 90; and this difference (6 deaths out of 45 people) doesn't control for age or health status prior to the intervention.
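The mortality gap in the quoted passage can be read two ways, and the two readings sound quite different. A minimal sketch, using only the 30 percent and 15 percent figures from the passage (the function names are hypothetical, and no group sizes are assumed, so this says nothing about statistical significance):

```python
def risk_difference(p_group, p_reference):
    """Absolute gap between two proportions, as a fraction (0.15 = 15 percentage points)."""
    return p_group - p_reference

def relative_risk(p_group, p_reference):
    """Ratio of two proportions (2.0 = twice the rate)."""
    return p_group / p_reference

low_control_mortality = 0.30   # from the quoted passage
high_control_mortality = 0.15  # from the quoted passage

abs_gap = risk_difference(low_control_mortality, high_control_mortality)
rel = relative_risk(low_control_mortality, high_control_mortality)

print(f"absolute gap:  {abs_gap * 100:.0f} percentage points")
print(f"relative risk: {rel:.1f}x")
```

On these figures the absolute gap is 15 percentage points, while the relative risk is about 2: the low-control group is reported to have died at roughly twice the rate. Either way, that is a very large effect to attribute to a houseplant.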

While self- and nurse-reported surveys indicate there may have been some real health differences– relative to baseline– between the two groups, this stark mortality statistic actually calls the more plausible questionnaire-based differences between the two groups into question.  If an additional six participants died in the control arm, it is likely that their health deteriorated more precipitously during the study for reasons other than the houseplant intervention (age or prior illness, for instance).

Gilbert uses a second, unrelated study to reiterate his point about the importance of control:

Residents in the high-control group were allowed to control the timing and duration of the student’s visit (“Please come visit me next Thursday for an hour”), and residents in low-control group were not (“I’ll come visit you next Thursday for an hour”). After two months, residents in the high-control group were happier, healthier, more active, and taking fewer medications than those in the low-control group.

Gilbert's summary unfortunately misrepresents the original research;  here's a quote from the original study, which looked at four groups (the 'controllable' group– with agency over the visit schedule– and 'predictable' group without it, as well as two groups unmentioned in the above anecdote– a 'variable' group that didn't know when they would be visited, and a no-treatment group without additional visitors):

Subjects for whom the visits were predictable or controllable were consistently and significantly superior on indicators of physical and psychological status when compared to subjects who were visited on a random schedule or who received no visits. No significant differences were found between the predict and control groups or between the random and no treatment groups, suggesting that the positive outcome of the predict and control groups is attributable to predictability alone.

In fact, there was no difference between the groups Gilbert describes– those with agency vs. those without it.  The only thing that mattered was knowing when a visit would occur.  Perhaps other features are important to happiness, like (1) having future plans, (2) the knowledge that someone values you enough to set aside time to see you, and  (3) looking forward to the visit.  This study suggests agency isn't.

Gilbert doubles down on this misinterpretation:

...the researchers concluded their study and discontinued the student visits. Several months later they were chagrined to learn that a disproportionate number of residents who had been in the high-control group had died. Only in retrospect did the cause of this tragedy seem clear. The residents who had been given control, and who had benefited measurably from that control while they had it, were inadvertently robbed of control when the study ended. Apparently, gaining control can have a positive impact on one’s health and well-being, but losing control can be worse than never having had any at all.

In terms of mortality, the study actually states:

Two persons in the predict group and one person in the control-enhanced group died prior to the 24-month follow-up. A fourth person, also in the control-enhanced group, died between the 30- and 42-month follow-up.

Two people who knew (and could predict) when people would come, without agency, passed away; two people who were given agency also passed away.  No participants in the no treatment or random visit groups passed away.  

The 'in-control' residents did not perish at a disproportionate rate compared to the predictable-schedule group (two deaths in each), though four people who received some intervention did die.  The real question is whether any intervention at all (not agency) led to poor outcomes.

While the follow-up notes a lower average health status of the intervention groups compared to unvisited participants (for both the 'in-control' group as well as the predictable schedule group), it was a 1-point difference on a 9-point scale.  Moreover, this 1-point health difference is between small groups of elderly people with highly variable, rapidly changing situations.

The study's authors conclude that these interventions don't necessarily have long-term positive effects (in opposition to previous findings by Langer), and that there is an ethical imperative to evaluate long-term risks of study withdrawal.

From an ethical perspective, the critical question raised by this research is, Did the termination of the study actually harm the participants? While subjects in the enhanced groups did drop below baseline on both indicators, the analysis showed that these differences were only marginally significant. This is suggestive, but hardly compelling.

They conclude that this type of experimental work should continue because the real risk to the elderly is provably small– no one actually died (or even experienced statistically important long term effects) at the hands of this intervention.  This is the exact opposite conclusion from the one that Gilbert implies in his book.

Stumbling on Happiness implies that an intervention that modulates your control over watering a houseplant can lead to a 15-percentage-point increase in the likelihood of death; it then misreports the results of another study to suggest a difference in happiness and mortality statistics that did not exist, to support a claim that is in direct opposition to that study's (well-stated) conclusions.  In this case, the data cited to support the importance of agency for health and happiness seems to do the reverse.

Alexithymia Study

In another section of his book, Gilbert discusses alexithymics– people who have trouble distinguishing their own emotions:

When alexithymics are asked what they are feeling, they usually say, “Nothing,” and when they are asked how they are feeling, they usually say, “I don’t know.” Alas, theirs is not a malady that can be cured by a pocket thesaurus or a short course in word power, because alexithymics do not lack the traditional affective lexicon so much as they lack introspective awareness of their emotional states. They seem to have feelings, they just don’t seem to know about them. For instance, when researchers show volunteers emotionally evocative pictures of amputations and car wrecks, the physiological responses of alexithymics are indistinguishable from those of normal people. But when they are asked to make verbal ratings of the unpleasantness of those pictures, alexithymics are decidedly less capable than normal people of distinguishing them from pictures of rainbows and puppies.

Unfortunately, the cited research was conducted on 57 normal undergraduates, who were split into high and low 'alexithymic' groups at the median score on a test known as the Toronto Alexithymia Scale (TAS).  Given what we know about people with difficulty processing emotion, we'd expect true (probably quite mild) alexithymics to make up no more than 6 of the 57 participants.  While Gilbert accurately quotes the authors' interpretation of their findings, the findings themselves are exceedingly weak.

These 57 undergraduates were shown pleasant and unpleasant pictures, and while viewing them, loud noise bursts were introduced to invoke a startle blink.  The intensity of their blink is the measured physiological response.  They then ranked each picture for pleasantness on a scale of 1 to 6 (they were not asked to find words to describe the stimuli).  

This study has a small sample size, doesn't compare substantially alexithymic people to healthy controls, and doesn't show a meaningful difference in ability to assign correct emotion labels to images or blink amplitudes.  

The included graphs show standard errors instead of confidence intervals– a more accurate sense of the range of expected error would roughly double the size of the error bars.  While they correctly use repeated measures ANOVA for the analysis, the differences we're looking at stem from very small variations– 5.3 versus 5.4 when ranking pleasant images on a scale that runs up to 6 (from 'very pleasant to very unpleasant'), and 2.2 versus 1.9 when rating unpleasant images (with confidence intervals of roughly 0.2):

Figure 2 from Affective Reactions in the Blink of an Eye: Individual Differences in Subjective Experience and Physiological Responses to Emotional Stimuli. Note that the errors are standard errors, so we should double them in size to get a 95% confidence interval. The 'Alexithymic Group' seems very capable of distinguishing the stimuli.
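As a rough illustration of the 'double the error bars' point, the sketch below turns a standard error into an approximate 95% interval and checks whether the two groups' intervals overlap. The means (5.3 vs 5.4 and 2.2 vs 1.9) and the half-width of roughly 0.2 are the figures quoted above. The standard error of 0.1 is my assumption, chosen only because doubling it gives that half-width, and the function names are hypothetical. Overlapping intervals are an informal visual check and no substitute for the repeated measures ANOVA the authors ran.

```python
def ci95_half_width(standard_error):
    """Approximate 95% half-width: 1.96 standard errors (roughly double)."""
    return 1.96 * standard_error

def interval(mean, half_width):
    """Return (low, high) bounds around a mean."""
    return (mean - half_width, mean + half_width)

def overlaps(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

ASSUMED_SE = 0.1  # ASSUMPTION: picked so the half-width lands near the quoted 0.2
half_width = ci95_half_width(ASSUMED_SE)

# Group means as quoted above; which group is which does not matter here.
pleasant_overlap = overlaps(interval(5.3, half_width), interval(5.4, half_width))
unpleasant_overlap = overlaps(interval(2.2, half_width), interval(1.9, half_width))

print(f"95% half-width: {half_width:.2f}")
print("pleasant ratings overlap:  ", pleasant_overlap)
print("unpleasant ratings overlap:", unpleasant_overlap)
```

With these inputs both pairs of intervals overlap, whereas the gap between pleasant and unpleasant ratings within either group (roughly 3 points) is many times the interval width.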

User rankings might artificially reduce variance for purely numerical reasons– we've topped and bottomed out the scales (many people are ranking things as 'very unpleasant', the most unpleasant ranking).  Some participants might have attempted to spread their rankings over the pleasant and unpleasant slides to emphasize differences, while others might simply default to a binary extreme.  The study design also seems to socially reinforce extreme rankings, by telling the undergrads whether others ranked a picture 'pleasant' or 'unpleasant' after their ranking in between each trial; it may be that the more 'emotional' group was slightly more responsive to the suggestion.

Even with these caveats, there is almost no difference between groups.  The authors found no main effect of alexithymia on affective rankings, only an interaction effect (which means there is no real, interpretable relationship here).

The study that Gilbert describes as showing physiological similarity and verbal discontinuity between alexithymics and controls does not appear to support that claim.   Almost no one in the sample was alexithymic, and the 'measurable differences' are very unconvincing upon closer inspection.  This study has not captured evidence of definitive changes in physiological and semantic emotional processing; certainly the group that is called 'alexithymic' here had no trouble identifying the emotional valence of the images and rating them properly (with no statistically significant difference compared to their 'more emotional' colleagues).

Gilbert gets a pass here since the authors of the paper themselves overstate the data in their discussion.  It also may well be true that alexithymics do retain their physiological emotional responses, and simply struggle to interpret that interoceptive state.  (In fact, it's likely true for at least some of the many people that struggle with emotional processing.)  This citation, however, feels like a point in search of evidence, and Gilbert's book would be better without it.

Anchoring

Anchoring is a real phenomenon, but this explanation from Stumbling on Happiness misses the mark:

...consider a study in which volunteers were asked to guess how many African countries belonged to the United Nations. Rather than answering the question straightaway, the volunteers were asked to make their judgments by using the flip-then-flop method. Some volunteers were asked to give their answer by saying how much larger or smaller it was than ten, and other volunteers were asked to answer by saying how much larger or smaller it was than sixty. In other words, volunteers were given an arbitrary starting point and were asked to correct it until they reached an appropriate ending point...  Volunteers who started with ten guessed that there were about twenty-five African nations in the U.N., whereas volunteers who started with sixty guessed that there were about forty-five. Why such different answers? Because volunteers began their task by asking themselves whether the starting point could be the right answer, and then, realizing that it could not, moved slowly toward a more reasonable one (“Ten can’t be right. How about twelve? No, still too low. Fourteen? Maybe twenty-five?”).  Alas, because this process requires time and attention, the group that started with ten and the group that started with sixty got tired and quit before they met in the middle. This really isn’t so strange. If you asked a child to count upward from zero and another child to count downward from a million, you could be pretty sure that when they finally got exhausted, gave up, and went off in search of eggs to throw at your garage door, they would have reached very different numbers. Starting points matter because we often end up close to where we started.

Gilbert uses this to argue that we frequently reason by thinking of a specific, concrete exemplar, and then trying to correct for it given contextual information (which is absolutely valid).  His argument that people run out of attention as they slowly count up or down from a starting point seems exaggerated.  People with longer attention spans don't deterministically drift further from the anchor.

If you ask someone in the real world 'How many African countries are in the U.N.?', they will either (1) tell you they have no idea, or (2) assume you're playing a trivia game with the reasonable expectation that they have no idea, and simply guess.

When playing this kind of game, utilizing the anchor is a great strategy– you should assume they picked a number that is reasonable to make the question harder to answer.  You'd be a fool not to revise your a priori guess (which you know is poor) toward the anchor point.  

You'd also be a fool to make a real decision based on this guess, and no one would.  In normal life you'd just look it up.  Anchoring research is full of examples of these kinds of reasoning mistakes, which only really occur when you constrain people by taking away their ability to access trustworthy information to act as their own anchors.  When you take this ability away, people treat it like the game that it is.

Fin

Gilbert is a great author and his summaries and anecdotes are truly enlightening.  Unfortunately, there are a few places where data has been improperly interpreted.  Like much of the pop-psychology world, Stumbling on Happiness requires a thorough read of sources and citations to interpret it correctly.