



Author of the book: Interactive web-based data visualization with R, plotly, and shiny.
Maintainer of the following R packages: plotly, LDAvis, thematic, bootstraplib, shinymeta.
Also a regular contributor to: shiny, rmarkdown, knitr, etc.
PhD in statistics at Iowa State University

Any graph can be broken down into the following components:
As a ggplot2 user, all you really need to provide is 1, 2, and 3. Everything else has smart defaults.
This helps minimize the cognitive burden, especially during the iteration phase.
R comes with some useful toy datasets (e.g., mtcars):
#> # A tibble: 32 x 5
#>    name                 wt   mpg am          cyl
#>    <chr>             <dbl> <dbl> <chr>     <dbl>
#>  1 Mazda RX4          2.62  21   manual        6
#>  2 Mazda RX4 Wag      2.88  21   manual        6
#>  3 Datsun 710         2.32  22.8 manual        4
#>  4 Hornet 4 Drive     3.22  21.4 automatic     6
#>  5 Hornet Sportabout  3.44  18.7 automatic     8
#>  6 Valiant            3.46  18.1 automatic     6
#>  7 Duster 360         3.57  14.3 automatic     8
#>  8 Merc 240D          3.19  24.4 automatic     4
#>  9 Merc 230           3.15  22.8 automatic     4
#> 10 Merc 280           3.44  19.2 automatic     6
#> # … with 22 more rows
library(ggplot2)
ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))
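The tibble printed above is a modified mtcars, not the built-in data frame. It has a `name` column and stores `am` as text ("automatic"/"manual"), and a later slide also maps `hp`. The slides don't show how it was prepared, so the following is only a minimal base-R sketch of one way to build an equivalent data set. It does not reproduce the tibble printing.

```r
# Start from the built-in data set (base R's mtcars lives in the datasets package)
mtcars <- datasets::mtcars

# Move the car names out of the row names into a regular column
mtcars$name <- rownames(mtcars)
rownames(mtcars) <- NULL

# Recode transmission: in the original data, am = 0 is automatic and am = 1 is manual
mtcars$am <- ifelse(mtcars$am == 1, "manual", "automatic")

# Preview the columns shown on the slide (hp and the others are kept as well)
head(mtcars[, c("name", "wt", "mpg", "am", "cyl")])
```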

library(ggplot2)
ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg, color = am))

library(ggplot2)
ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, color = am)) +
  scale_color_manual("Transmission", values = c(automatic = "blue", manual = "red"))

library(ggplot2)
ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, color = am)) +
  scale_color_brewer("Transmission", type = "qual")

library(ggplot2)
ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg, color = am, shape = am))

Outside aes(): set a property without scaling
library(ggplot2)
ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg, color = am, shape = am), size = 4)

Inside aes(): map a variable to a property, with scaling
library(ggplot2)
ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg, color = am, shape = am, size = hp))
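A common pitfall follows from this distinction: a literal value placed inside aes() is treated as data, not as a color. A minimal sketch, using base R's mtcars (any data set would do), compares the two placements by inspecting the colors ggplot2 computes with ggplot_build().

```r
library(ggplot2)

# Outside aes(): "blue" is used as-is; no scale and no legend
p_outside <- ggplot(datasets::mtcars) +
  geom_point(aes(x = wt, y = mpg), color = "blue")

# Inside aes(): "blue" becomes a one-level variable that goes through the default
# color scale, so the points get the scale's first hue (not blue) plus a legend
p_inside <- ggplot(datasets::mtcars) +
  geom_point(aes(x = wt, y = mpg, color = "blue"))

# Inspect the colors ggplot2 actually computed for each layer
unique(ggplot_build(p_outside)$data[[1]]$colour)
unique(ggplot_build(p_inside)$data[[1]]$colour)
```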

Mappings set in ggplot() are inherited by every layer
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point() +
  geom_smooth()

Layers can add their own mappings on top of those in ggplot()
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point(aes(shape = am), size = 3) +
  geom_smooth(aes(linetype = am))
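A layer can also remove an inherited mapping, not just add to it: setting an aesthetic to NULL inside a layer's aes() drops it for that layer only. A minimal sketch, using base R's mtcars with factor(am) standing in for the text `am` column, keeps the points colored by transmission but fits a single regression line to all cars.

```r
library(ggplot2)

# The color mapping in ggplot() is inherited by every layer...
p <- ggplot(datasets::mtcars, aes(x = wt, y = mpg, color = factor(am))) +
  geom_point() +
  # ...unless a layer overrides it: color = NULL drops the inherited mapping,
  # so this layer fits one line to all cars instead of one per transmission
  geom_smooth(aes(color = NULL), method = "lm", formula = y ~ x, se = FALSE)

built <- ggplot_build(p)
length(unique(built$data[[1]]$colour))  # points: one color per transmission type
length(unique(built$data[[2]]$group))   # smooth: a single group
```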

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point(aes(shape = am), size = 3) +
  geom_smooth(aes(linetype = am), method = "lm", se = FALSE)

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point(aes(shape = am), size = 3) +
  geom_smooth(aes(linetype = am), method = "lm", se = FALSE) +
  facet_wrap(~cyl)

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point(aes(shape = am), size = 3) +
  geom_smooth(aes(linetype = am), method = "lm", se = FALSE) +
  facet_wrap(~paste("Cylinders:", cyl))
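Pasting a prefix into the facet formula works, but ggplot2 also provides labeller functions that change the strip text without touching the data. A minimal sketch with the built-in `label_both` labeller, which should produce strips such as "cyl: 4" (slightly different text from the "Cylinders: 4" above), using base R's mtcars.

```r
library(ggplot2)

# label_both prints "variable: value" in each facet strip, e.g. "cyl: 4"
p <- ggplot(datasets::mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~cyl, labeller = label_both)

# One panel per distinct value of cyl
nrow(ggplot_build(p)$layout$layout)
```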

library(ggplot2)
ggplot(mtcars, aes(x = mpg, color = am)) +
  geom_density() +
  facet_wrap(~paste("Cylinders:", cyl))

library(ggplot2)
ggplot(mtcars, aes(x = mpg, color = factor(cyl))) +
  geom_density() +
  facet_wrap(~am)

library(plotly)
ggplotly() # picks up on the previously printed ggplot
library(plotly)
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
ggplotly(p)
library(plotly)
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point(aes(text = name)) +
  geom_smooth(method = "lm", se = FALSE)
ggplotly(p, tooltip = "text")
last_plot() %>%
  style(hoverlabel = list(bgcolor = "white"), hoverinfo = "x+y+text") %>%
  layout(
    xaxis = list(showspikes = TRUE),
    yaxis = list(showspikes = TRUE)
  )
Beyond ggplotly(): try plot_ly()!
plot_ly() is a more "direct" interface to the underlying plotly.js (JavaScript) library.
plot_ly(mtcars) %>% add_markers(x = ~wt, y = ~mpg, color = ~am)

plot_ly(): also inspired by the grammar of graphics
It focuses on 3 key aspects: data, mappings, and geoms.
plot_ly(mtcars) %>%
  add_markers(x = ~wt, y = ~mpg, color = ~am)

plot_ly(): embraces the pipe
To add to (or modify) a plotly object, use %>% instead of +.

Use multiple perceptual channels (e.g., color, symbol, linetype) to distinguish groups.
plot_ly(mtcars) %>%
  add_markers(x = ~wt, y = ~mpg, color = ~am, symbol = ~am)
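plot_ly() has its own version of the inside/outside aes() distinction: a formula (`~`) maps a variable through a scale, while wrapping a value in `I()` uses it literally. A minimal sketch, using base R's mtcars with factor(am) standing in for the text `am` column, counts the traces plotly generates in each case.

```r
library(plotly)

# A formula maps a data column through a color scale: one trace per transmission type
p_mapped <- plot_ly(datasets::mtcars, x = ~wt, y = ~mpg) %>%
  add_markers(color = ~factor(am))

# I() marks the value "as is": every marker is black and no legend is created
p_constant <- plot_ly(datasets::mtcars, x = ~wt, y = ~mpg) %>%
  add_markers(color = I("black"))

# plotly_build() resolves the mappings into plotly.js traces
length(plotly_build(p_mapped)$x$data)
length(plotly_build(p_constant)$x$data)
```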

toWebGL() (also works with ggplotly())
plot_ly(diamonds) %>%
  add_markers(x = ~carat, y = ~price) %>%
  toWebGL()

toWebGL() switches rendering from SVG to WebGL, which draws on an HTML canvas. The tradeoff is similar to using png() instead of pdf() for static plots: lower quality, but far more scalable.
plot_ly(diamonds) %>%
  add_markers(x = ~carat, y = ~price, alpha = 0.1) %>%
  toWebGL()

plot_ly(diamonds) %>%
  add_histogram2d(x = ~carat, y = ~price)

For "heavy-tailed" distributions, it can be useful to compute the summary (e.g., log counts) in R yourself. For more on this, see https://plotly-r.com/frequencies-2d
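A minimal sketch of doing that summary yourself: bin carat and price into a grid with base R, take log10 of the counts, and pass the resulting matrix to a heatmap. The 50-by-50 grid, the helper `bin_mids()`, and the choice of log10 are assumptions for illustration, not necessarily what the linked chapter does.

```r
library(plotly)  # also attaches ggplot2, which supplies the diamonds data

n_bins <- 50  # assumption: 50 equal-width bins per axis
carat_breaks <- seq(min(diamonds$carat), max(diamonds$carat), length.out = n_bins + 1)
price_breaks <- seq(min(diamonds$price), max(diamonds$price), length.out = n_bins + 1)

# Bin index of each diamond on each axis (the maximum value falls in the last bin)
carat_bin <- findInterval(diamonds$carat, carat_breaks, rightmost.closed = TRUE)
price_bin <- findInterval(diamonds$price, price_breaks, rightmost.closed = TRUE)

# Count diamonds per cell; explicit levels keep empty bins as zero counts
counts <- table(
  factor(price_bin, levels = seq_len(n_bins)),  # rows: price (y axis)
  factor(carat_bin, levels = seq_len(n_bins))   # columns: carat (x axis)
)

# Log-transform the counts; empty cells (log10(0) = -Inf) become NA gaps
z <- log10(unclass(counts))
z[!is.finite(z)] <- NA

# Hypothetical helper: bin midpoints serve as the heatmap coordinates
bin_mids <- function(breaks) (head(breaks, -1) + tail(breaks, -1)) / 2
p <- plot_ly(x = bin_mids(carat_breaks), y = bin_mids(price_breaks), z = z, type = "heatmap")
```

Print `p` in an interactive session to view it. Empty cells are left blank instead of being drawn in the lowest color.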
Go to our RStudio Cloud project and open the exercise.R script. Walk through the code by pressing Ctrl+Enter (Cmd+Enter on Mac) and answer the questions.
Feel free to send me a message through the Teams chat if you have questions and/or you're finished.
logs <- cranlogs::cran_downloads(
  c("plotly", "leaflet", "ggvis", "networkD3", "rbokeh"),
  from = Sys.Date() - 365,
  to = Sys.Date()
)
logs
# A tibble: 1,830 x 3
   date       count package
   <date>     <dbl> <chr>
 1 2019-04-21  2676 plotly
 2 2019-04-22  4549 plotly
 3 2019-04-23  5912 plotly
 4 2019-04-24  5368 plotly
 5 2019-04-25  5222 plotly
 6 2019-04-26  4903 plotly
 7 2019-04-27  3151 plotly
 8 2019-04-28  2982 plotly
 9 2019-04-29  4961 plotly
10 2019-04-30  5544 plotly
# … with 1,820 more rows
skimr::skim(logs)
── Data Summary ────────────────────────
                           Values
Name                       logs
Number of rows             1830
Number of columns          3
_______________________

── Variable type: character ──────────────────────────
  skim_variable n_missing complete_rate   min   max empty n_unique whitespace
1 package               0             1     5     9     0        5          0

── Variable type: Date ───────────────────────────────
  skim_variable n_missing complete_rate min        max        median     n_unique
1 date                  0             1 2019-04-21 2020-04-20 2019-10-20      366

── Variable type: numeric ────────────────────────────
  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist
1 count                 0             1 1441. 1787.     0   211   573  2229  8690 ▇▂▁▁▁
plot_ly(logs) %>% add_lines(x = ~date, y = ~count, color = ~package)

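# Roll within each package so that a 7-day window never spans two packages
logs$weekly_avg <- ave(
  logs$count, logs$package,
  FUN = function(x) zoo::rollapply(x, 7, mean, fill = "extend")
)
plot_ly(logs) %>% add_lines(x = ~date, y = ~weekly_avg, color = ~package)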
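As a dependency-free alternative to zoo, a centered 7-point moving average can also be computed with base R's `stats::filter()`, applied per group with `ave()` so that one package's window never spills into the next. This is a minimal sketch on hypothetical toy data. Unlike `fill = "extend"`, it leaves the first and last three values of each group as `NA`.

```r
# Hypothetical helper: centered 7-point moving average using only base R
# (stats::filter is spelled out because dplyr also exports a filter() function)
rolling_mean7 <- function(x) as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 2))

# Hypothetical toy data: two "packages" with 10 days of counts each
toy <- data.frame(
  package = rep(c("a", "b"), each = 10),
  count = c(1:10, 101:110)
)

# ave() applies the function separately within each package and
# returns the results in the original row order
toy$weekly_avg <- ave(toy$count, toy$package, FUN = rolling_mean7)
toy
```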

plot_ly(logs) %>%
  add_lines(x = ~date, y = ~weekly_avg, color = ~package) %>%
  layout(yaxis = list(type = "log"), hovermode = "compare")

subplot(
  shareX = TRUE, nrows = 2,
  plot_ly(logs) %>% add_heatmap(x = ~date, y = ~package, z = ~weekly_avg),
  plot_ly(logs) %>% add_lines(x = ~date, y = ~weekly_avg, color = ~package)
)

These questions drive at least two influential papers:
This figure is from Data Visualization for Social Science (highly recommended!), referring to Heer and Bostock.
Figure from Heer and Bostock (2010)

(Especially if we place "similar" packages near one another, which is easy thanks to heatmaply!)
Hadley Wickham
Imagine having many panels of scatterplots to sift through. If we attach numerical summaries to each one (e.g., slope, intercept, etc.), we can use them to decide which panels to view.
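Computing such summaries by hand is straightforward. The sketch below is a hypothetical illustration, not how trelliscopejs computes its cognostics: it treats each cylinder count in base R's mtcars as one mpg-versus-wt panel, fits a line per panel with lm(), and sorts the panels by slope.

```r
# Hypothetical setup: one scatterplot panel (mpg vs. wt) per cylinder count
panels <- split(datasets::mtcars, datasets::mtcars$cyl)

# One row of cognostics per panel: fitted intercept, slope, and panel size
cogs <- do.call(rbind, lapply(names(panels), function(key) {
  fit <- lm(mpg ~ wt, data = panels[[key]])
  data.frame(
    cyl = as.numeric(key),
    intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]),
    n = nrow(panels[[key]])
  )
}))

# Sort panels by slope to decide which ones to look at first
cogs[order(cogs$slope), ]
```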
These are the nine scagnostics (scatterplot cognostics) measures from Wilkinson and Wills (2008). The same concept can be applied to time series (see the tsfeatures package).
library(trelliscopejs)
library(plotly)
ggplot(logz) +
  geom_line(aes(date, weekly_avg)) +
  facet_trelliscope(~package, as_plotly = TRUE)
facet_trelliscope() makes it super easy to work around the "too many panels" issue of facet_wrap().
This automatically computes some sensible cognostics (e.g., mean, median, variance).
See here to learn how to customize the cognostics (and graphs).
See here for how I implemented the previous slide.
Open and run the trelliscope.R script on RStudio Cloud.
Sort the panels (i.e., countries) by highest/lowest mean life expectancy.
Think of how trelliscopejs might be useful for exploring your own data project.
Try to implement your idea, either on Cloud or locally:
install.packages("trelliscopejs")
Hadley Wickham
Statistical graphics perspective on "big data viz".
Ben Shneiderman
Information visualization perspective on "big data viz".
Image from Rob Hyndman's lecture on "Visualisation of big time series data"
See more about this data and analysis at https://github.com/cpsievert/pedestrians
The basics of dplyr (about 30-60 minutes):
The basics of shiny:
