02 January 2015

Re-doing one of the "best infographics of 2014"

Someone at work shared this Fast Company post on the 18 best infographics of 2014. First up, the sleep schedules of 27 great minds:

Image credit: New York Times

Except... I don't think it's that great. Granted, I'm heavily influenced by Kaiser Fung's blog, Junk Charts, and am coming off a bit of a junk chart high after hosting him as a speaker where I work last month :)

Some issues I have with this "best" infographic:

  • The polar coordinates artificially scale the durations. Check out Beethoven (toward the outside in yellowish) vs. Simenon (the very center, green). Same sleep schedule, but Beethoven looks like he hibernates every night compared to Simenon's cat nap.
  • Why color 12a - 12p in black when half of that is daylight?
  • There are too many people, each with a unique color, to keep track of. Whose little afternoon nap from 3-4p is shown in red? My first glance led my eye to either Milton or Amis, and for some reason I didn't see Darwin buried in there.
  • The faces are sort of a neat touch for well-known characters (I only recognized Franklin and Beethoven myself, but perhaps I'm just uncultured!), but they just clutter the chart for the rest. There are also only 10 of the 27 shown... how did they choose which were famous enough to display?
  • Adding the faces, I think, forced them to shift the labels around. When you see the left edge of an arc, sometimes the name is there and sometimes not. For example, I'm thinking Kant, Angelou, and Hugo are pushed to the right simply because Milton's bust is in the way. It would be easier to find the label if it were always at the start of the arc vs. having to track it to a different point.
  • Similarly, why start Styron anywhere other than at the very left? The reader has to crane their head to read a vertical or slightly upside-down label, when most of it could have been set in a nearly upright orientation.
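
To put a rough number on the polar-coordinate distortion from the first bullet, here is a minimal sketch. The radii are made-up values in arbitrary units (I haven't measured the actual rings), and `arc_length` is a hypothetical helper. The drawn length of an arc is radius times angle, so the same 8 hours gets longer the further out it sits.

```r
# Arc length on a 24-hour clock face: radius * angle (in radians)
arc_length <- function(hours, radius) {
  angle <- 2 * pi * hours / 24
  radius * angle
}

# Hypothetical radii (arbitrary units) for an inner and an outer ring
inner_radius <- 1
outer_radius <- 5

inner_arc <- arc_length(8, inner_radius)
outer_arc <- arc_length(8, outer_radius)

# Same 8 hours, but the outer arc is drawn outer_radius / inner_radius times longer
outer_arc / inner_arc
```

On a linear time axis, by contrast, every 8-hour bar has the same length regardless of which row it is on.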

Lastly, FastCo highlights the apparent point of the infographic:

"The infographic seems to debunk the myth that geniuses stay up through the wee hours working manically, and that you're more creative when you're tired—most of these 27 luminaries got a wholesome eight hours a night."

That's not all that apparent to me when I look at the chart, primarily because you have to sort of parse out the actual sleep duration from the text (i.e. "10 p.m. - 6 a.m."), not from the arcs.

I decided to give a shot at a [granted, less pretty] re-do by entering the data into my own spreadsheet for plotting with R. At first I tried simple numerical values (sleep: 0, wake: 8 for 12-8am), but it got a bit tricky with respect to accounting for crossing midnight and shifting the axes how I wanted them. I resorted to actual date/time format. Not my favorite thing to muck around with in R, but it wasn't too bad. The data is here if you want to try yourself.
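
To illustrate the midnight problem, here is a rough sketch (not the code used for the plots; the example night and the variable names are made up). With plain hour numbers, a 10 p.m. to 6 a.m. night gives a negative duration unless you special-case it, while date/times carry the day along and handle it directly.

```r
# Hypothetical example night: asleep at 22:00, awake at 06:00 the next day
sleep_hr <- 22
wake_hr  <- 6

# Plain hour numbers: naive subtraction goes negative across midnight
naive_duration <- wake_hr - sleep_hr

# One workaround is modular arithmetic on a 24-hour clock
wrapped_duration <- (wake_hr - sleep_hr) %% 24

# With date/times the day is part of the value, so no special case is needed
sleep_time <- as.POSIXct("2015-01-01 22:00:00", tz = "UTC")
wake_time  <- as.POSIXct("2015-01-02 06:00:00", tz = "UTC")
posix_duration <- as.numeric(difftime(wake_time, sleep_time, units = "hours"))
```

The modulo trick fixes the duration, but the axis positions would still need shifting by hand, which is the part that date/times make easier.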

### load libraries
library(ggplot2)
library(scales)
library(plyr)  # provides ddply(), used for the summary further down

### read in data

data <- read.csv("~/Desktop/sleep-hours-posix.csv")

### convert to POSIXct format

data$sleep_posix <- as.POSIXct(data$sleep, format = "%m/%d/%Y %H:%M:%S")
data$wake_posix <- as.POSIXct(data$wake, format = "%m/%d/%Y %H:%M:%S")

### I wanted to order the names by earliest bedtime.
### The way I plotted the night/day rectangle also required a numeric
### value for each name, so I added that after ordering the names
data <- data[order(data$sleep_posix, decreasing = TRUE), ]
data$name <- factor(data$name, levels = unique(data$name, fromLast = TRUE))
data$id <- as.numeric(data$name)
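
For anyone who wants to see what the ordering step does without the CSV, here is a small sketch on a made-up three-row table (`toy`, the names A/B/C, and the bedtimes are all hypothetical). It leaves out `fromLast` because the toy data has no repeated names.

```r
# Hypothetical stand-in for the real data: three made-up people and bedtimes
toy <- data.frame(
  name = c("A", "B", "C"),
  sleep_posix = as.POSIXct(c("2015-01-01 23:00:00",
                             "2015-01-01 21:00:00",
                             "2015-01-02 01:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Latest bedtime first, so it ends up with the lowest id (bottom of the y-axis)
toy <- toy[order(toy$sleep_posix, decreasing = TRUE), ]

# Factor levels follow the row order; as.numeric() then returns the level index
toy$name <- factor(toy$name, levels = unique(toy$name))
toy$id <- as.numeric(toy$name)

toy[, c("name", "id")]
```

The earliest sleeper (B here) gets the highest id and therefore lands at the top of the plot.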

With the data read in, here was my first plot attempt:

p <- ggplot()
p <- p + geom_segment(aes(x = sleep_posix, xend = wake_posix, y = id, yend = id,
                          group = as.factor(g)), data = data)
p <- p + geom_rect(aes(xmin = as.POSIXct("2015-01-01 22:00:00"),
                       xmax = as.POSIXct("2015-01-02 06:00:00"),
                       ymin = 0, ymax = 28),
                   fill = "black", alpha = 0.35)
p <- p + scale_x_datetime("", breaks = date_breaks("2 hours"),
                          labels = date_format("%H:%M"))
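### id is numeric, so use a continuous scale and label each tick with a name
### (recent ggplot2 versions reject numeric data on a discrete scale)
p <- p + scale_y_continuous("", breaks = seq_along(levels(data$name)),
                            labels = levels(data$name))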
p <- p + theme_bw()
p

And the plot:


Again, not as pretty, but I find this easier to interpret. I chose an estimate for "normal sleeping hours" to shade darker (10pm - 6am) instead of AM vs. PM. This was my attempt at picking out "odd" sleep schedules from what one might consider conventional. It's easy to see that Balzac goes to bed extremely early compared to the rest, and that Styron, Flaubert, and Fitzgerald are shifted pretty far to the right. The two nappers are also easily identified. There's no need to add the actual times to each person, and the shaded region (once you know it's 10p - 6a) and the tick spacing give additional reference points, so you don't always have to refer to the axes. For example, Mann is one major tick after both the start and the end of the shaded region (12 - 8a).

Or perhaps, if the goal is to illustrate that "geniuses" have reasonable bedtimes and sleep durations, who cares who they actually are individually? If that is the goal of the story, perhaps the names could be written in some body text and a simple scatter plot of bedtime vs. duration could be used?

### probably not elegant, but summarizing by name and g to capture the earlier
### bedtime and both durations, and then again by just the name to add the durations
### (ddply() comes from the plyr package; durations are forced to hours so that
### they can be summed safely)
data_sum <- ddply(data, .(name, g), summarize,
                  bedtime = sleep_posix,
                  duration = as.numeric(difftime(wake_posix, sleep_posix,
                                                 units = "hours")))

data_sum <- ddply(data_sum, .(name), summarize,
                  bedtime = bedtime[1],
                  duration = sum(duration))

### note the jittering due to overlapping data
p <- ggplot()

p <- p + geom_jitter(aes(x = bedtime, y = duration), data = data_sum, size = 3,
                     position = position_jitter(width = 120, height = 0.1))
p <- p + geom_rect(aes(xmin = as.POSIXct("2015-01-01 21:00:00"),
                       xmax = as.POSIXct("2015-01-01 23:00:00"),
                       ymin = 6,
                       ymax = 9),
                   fill = "black", alpha = 0.35)
p <- p + theme_bw()
p

The plot:



The darkened area captures those with a "normalish" bedtime (9-11p) and "normalish" duration (6-9 hrs). This was just one possible choice. One could highlight the entire width for durations of x-y hrs of sleep to show that regardless of bedtime, most of these individuals got a similar duration of sleep with only one at 5hrs/night. Or perhaps one could just highlight some portion of bedtimes, which might illustrate that 1/3 - 1/2 of the individuals actually were night owls to some extent.
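
The alternative highlights above boil down to counting who falls inside a band. Here is a minimal sketch of that count on a made-up table shaped like `data_sum` (the four rows, `toy_sum`, and the 11 p.m. cutoff are all hypothetical, not the real data).

```r
# Hypothetical summary table in the shape of data_sum: bedtime plus hours slept
toy_sum <- data.frame(
  name = c("A", "B", "C", "D"),
  bedtime = as.POSIXct(c("2015-01-01 21:30:00", "2015-01-01 22:00:00",
                         "2015-01-02 00:30:00", "2015-01-02 02:00:00"),
                       tz = "UTC"),
  duration = c(8, 7, 8, 5)
)

# The cutoff is a judgment call, just like the shaded rectangle
night_owl_cutoff <- as.POSIXct("2015-01-01 23:00:00", tz = "UTC")

# Share with a "normalish" duration (6-9 hrs), regardless of bedtime
share_normal_duration <- mean(toy_sum$duration >= 6 & toy_sum$duration <= 9)

# Share going to bed after the cutoff
share_night_owls <- mean(toy_sum$bedtime > night_owl_cutoff)
```

Running the same two lines on the real `data_sum` would give the actual shares, which could go in a caption next to whichever band is shaded.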

So, those are my attempts at putzing around with other ways to look at this data set. There are probably better ways, but I certainly don't think the linear scale and scatter plot are any worse than the original, which I don't think actually communicates the data/point at all. It's not even that pretty to me due to the hodgepodge of colors, and I think the main thing it accomplishes is looking "interesting from a distance." This is the bummer with a lot of infographics -- they work better as a sort of piece of artwork, as long as you don't start trying to look very closely. Almost like a divergent Magic Eye poster. For it to work you sort of have to stare through it. If you start looking for information in the pixels, you're in for a headache.


5 comments:

  1. I too found the Sleeping Habits infographic lacking in visual clarity. I took the scatterplot route for my alternative and a clear headline emerged: Mozart was a Machine. Here is my alternative take:

    https://public.tableausoftware.com/profile/boldmayer#!/vizhome/GeniusSleep/GeniusSleep

    -Steve

  2. @Steve: Cool -- nice redo! I do like the names next to the points on yours in the event someone's curious about a particular person, and having seen Mozart on your plot, I can clearly pick him out from my scatterplot as well. I still think I'd prefer a y-axis vs. colors (harder to look at a color and clearly gauge the corresponding value, plus you don't need the "later bedtime" note).

    Thanks for commenting, and out of curiosity -- how did you find the post (pretty new/un-developed blog)?

    Replies
    1. @jwhandy - great post, especially the thoroughness of your critique & the explanation of your alternate approaches. R code gets a little in the way of design critique, but your coding medium depends on your audience (R could be the right choice for your readers).

      Thanks for the feedback.

    2. Thanks for the comment! After thinking about this more, I think you might have interpreted my question as "How did you like/think of the post?" (which is great feedback as well!). I actually wondered how you made your way to this blog/discovered this post :) It was just so recent, and I've only done one other post here, that I was surprised anyone actually found it!

    3. Nevermind... I just saw that the post made it onto Junk Charts :)
