class: center, middle, inverse, title-slide

# DSBA 5122: Visual Analytics
## Why R/RStudio and tidyverse?
### Ryan Wesslen
### August 26, 2019

---
class: center, middle

# What is R?

---
class: center, middle

# What is RStudio?

<img src="../../images/slides/dsba-rintro/rstudio.gif" width="95%" style="display: block; margin: auto;" />

---

# 1. Why R / RStudio?

It's free, as in ... free beer?

<p align="center">
<div style="width:80%;height:0;padding-bottom:77%;position:relative;"><iframe src="https://giphy.com/embed/3o6MbkGRfcasxz9rnW" width="80%" height="80%" style="position:absolute" align="middle" frameBorder="0" class="giphy-embed" allowFullScreen></iframe></div><p><a href="https://giphy.com/gifs/season-11-the-simpsons-11x5-3o6MbkGRfcasxz9rnW">via GIPHY</a></p>
</p>

---

# 2. Why R / RStudio?

<blockquote class="twitter-tweet tw-align-center" width="50%" data-lang="en"><p lang="en" dir="ltr">well this R=Batman, Python=Superman apparently showed up in a vendor pitch to one of my colleagues. That must make it official. 
<a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> <a href="https://twitter.com/hashtag/python?src=hash&ref_src=twsrc%5Etfw">#python</a> <a href="https://t.co/pkKGB8EOHj">pic.twitter.com/pkKGB8EOHj</a></p>— JD Long (@CMastication) <a href="https://twitter.com/CMastication/status/1037486624500854784?ref_src=twsrc%5Etfw">September 5, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

---

<img src="../../images/slides/dsba-rintro/r1.png" style="display: block; margin: auto;" />

---

<img src="../../images/slides/dsba-rintro/r2.png" style="display: block; margin: auto;" />

---

<img src="../../images/slides/dsba-rintro/r3.png" style="display: block; margin: auto;" />

---

<img src="../../images/slides/dsba-rintro/r4.png" style="display: block; margin: auto;" />

---

<img src="../../images/slides/dsba-rintro/r5.png" style="display: block; margin: auto;" />

---

<img src="../../images/slides/dsba-rintro/r6.png" style="display: block; margin: auto;" />

---

<img src="../../images/slides/dsba-rintro/r7.png" style="display: block; margin: auto;" />

---

# 3. Why R / RStudio?

The [`#rstats`](https://twitter.com/search?q=%23rstats) community (h/t [`@hrbrmstr`](https://rud.is/books/21-recipes/visualizing-a-graph-of-retweet-relationships.html))

```r
library(rtweet)
library(igraph)
library(ggraph)
library(tidyverse)

rt_g <- search_tweets("#rstats", n=3200) %>%     # query the Twitter API (needs credentials)
  filter(retweet_count > 0) %>%                   # keep tweets that were retweeted
  select(screen_name, mentions_screen_name) %>%   # edge list columns: from, to
  unnest(mentions_screen_name) %>%                # one row per mentioned account
  filter(!is.na(mentions_screen_name)) %>%        # drop rows with no mention
  graph_from_data_frame()                         # build an igraph graph object
```

---

# 3. Why R / RStudio?

The [`#rstats`](https://twitter.com/search?q=%23rstats) community (h/t [`@hrbrmstr`](https://rud.is/books/21-recipes/visualizing-a-graph-of-retweet-relationships.html))

```r
# ggplot2-inspired network plots: ggraph
# Note: node_label and node_size (vertex attributes of rt_g) and subt
# (the subtitle string) must be defined first; they are not shown on these slides.
ggraph(rt_g, layout = 'linear', circular = TRUE) +
  geom_edge_arc(edge_width=0.125, aes(alpha=..index..)) +
  geom_node_label(aes(label=node_label, size=node_size),
                  label.size=0, fill="#ffffff66", segment.colour="springgreen",
                  color="slateblue", repel=TRUE, fontface="bold") +
  coord_fixed() +
  scale_size_area(trans="sqrt") +
  labs(title="Retweet Relationships", subtitle= subt) +
  theme_graph() +
  theme(legend.position="none")
```

---
class: middle, center

<!-- -->

---

# And it's just fun!

<blockquote class="twitter-tweet tw-align-center" width="45%" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">I can and it's HORRIFYING😬😬😬😬 <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> <a href="https://t.co/aMQiwva7fr">pic.twitter.com/aMQiwva7fr</a></p>— Tyler Morgan-Wall (@tylermorganwall) <a href="https://twitter.com/tylermorganwall/status/1088978382195437568?ref_src=twsrc%5Etfw">January 26, 2019</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

---
class: middle

<blockquote class="twitter-tweet tw-align-center" data-conversation="none" data-lang="en"><p lang="und" dir="ltr"> <a href="https://t.co/BhS5yaeMbk">pic.twitter.com/BhS5yaeMbk</a></p>— David Schoppik (@schoppik) <a href="https://twitter.com/schoppik/status/1088983406220886016?ref_src=twsrc%5Etfw">January 26, 2019</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

---
class: middle, center

# tidyverse

<img src="../../images/slides/dsba-rintro/r8.png" width="60%" />

---

# What are R packages?

<img src="../../images/slides/dsba-rintro/r10.png" width="100%" />

---

# What are R packages?

<img src="../../images/slides/dsba-rintro/r11.png" width="100%" />

---

# What are R packages?

<img src="../../images/slides/dsba-rintro/r12.png" width="100%" />

---

# What are R packages?

<img src="../../images/slides/dsba-rintro/r12b.png" width="100%" />

---

# How to install and run packages:

<img src="../../images/slides/dsba-rintro/r13.png" width="100%" />

---

# How to install and run packages:

<img src="../../images/slides/dsba-rintro/r14.png" width="100%" />

--

<img src="../../images/slides/dsba-rintro/r15.png" width="100%" />

---

# How to install and run packages:

<img src="../../images/slides/dsba-rintro/r16.png" width="100%" />

---

# How to install and run packages:

<img src="../../images/slides/dsba-rintro/r17.png" width="100%" />

---

# Data science workflow

<img src="../../images/slides/dsba-rintro/r18.png" width="100%" />

---

# tidy data

<img src="../../images/slides/dsba-rintro/r19.png" width="100%" />

> “Tidy datasets are all alike but every messy dataset is messy in its own way.” – Hadley Wickham

---

# tidy data: "pivoting"

<img src="../../images/slides/dsba-rintro/r20.png" width="100%" />

---

# Core packages: dplyr, ggplot2, piping

<img src="../../images/slides/dsba-rintro/r21.png" width="90%" style="display: block; margin: auto;" />

---

# Core packages: dplyr, ggplot2, piping

<img src="../../images/slides/dsba-rintro/r22.png" width="90%" style="display: block; margin: auto;" />

---

# Core packages: dplyr, ggplot2, piping

What are the 3 most popular years for males named "Taylor"?

```r
# Load the data
library(dplyr)      # filter(), arrange(), select(), %>%
library(babynames)
data(babynames)
```

.pull-left[

```r
# Step by step, overwriting df each time
df <- filter(babynames, sex=="M")
df <- filter(df, name=="Taylor")
df <- arrange(df, desc(n))
df <- select(df, year, n, prop)
head(df, n=3)
```

```
## # A tibble: 3 x 3
##    year     n    prop
##   <dbl> <int>   <dbl>
## 1  1992  8240 0.00393
## 2  1991  7967 0.00376
## 3  1993  7688 0.00372
```
]

.pull-right[

```r
# Same steps chained with the pipe
babynames %>%
  filter(sex=="M") %>%
  filter(name=="Taylor") %>%
  arrange(desc(n)) %>%
  select(year, n, prop) %>%
  head(n=3)
```

```
## # A tibble: 3 x 3
##    year     n    prop
##   <dbl> <int>   <dbl>
## 1  1992  8240 0.00393
## 2  1991  7967 0.00376
## 3  1993  7688 0.00372
```
]

---
class: middle, center

# Happy R programming!

<iframe src="https://giphy.com/embed/JIX9t2j0ZTN9S" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/JIX9t2j0ZTN9S"></a></p>
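
---

# tidy data: "pivoting"

A backup slide for the pivoting idea shown earlier: a minimal sketch, assuming tidyr 1.0.0 or later (which provides `pivot_longer()` and `pivot_wider()`). The table `wide` and its counts are made up for illustration and are not course data.

```r
library(tidyr)   # pivot_longer(), pivot_wider(), %>%
library(tibble)  # tribble()

# Hypothetical "wide" table: one column per year (made-up counts)
wide <- tribble(
  ~name,    ~`2017`, ~`2018`,
  "Taylor",      10,      12,
  "Jordan",       7,       9
)

# Stack the year columns: one row per name-year pair
long <- wide %>%
  pivot_longer(cols = c(`2017`, `2018`),
               names_to = "year", values_to = "n")

# Spread the years back out into columns
wide_again <- long %>%
  pivot_wider(names_from = year, values_from = n)
```

`long` has one row per name-year pair, which is the layout that dplyr verbs and ggplot2 work with most smoothly.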