Multidimensional & Dimensionality Reduction
Materials for class on Monday, October 14, 2019
Announcements
Quiz 2 will be available tomorrow morning. You have until 5:59pm on Monday, October 21 to complete it. Canvas link to Quiz 2
Tomorrow from 12-1pm, RStudio engineer Davis Vaughan (SpeakerDeck) will present to the Charlotte RMeetup Group at CenterCity 1102. Remember that, per the Extra Credit Policy, you will receive 1% extra credit if you attend (you must stay for the full hour). I will take attendance after the session.
Optional final project presentations at the December RMeetup Group meeting: 21 students responded “Yes”. I’m therefore going to book our class to present up to five group projects at the last RMeetup Group meeting. You don’t have to do anything yet; simply mark your calendars for Tuesday, December 17 from 12-1pm at CenterCity.
DataCamp 3 is due in 3 weeks (November 5). DataCamp no longer offers Intro to Shiny, so you have two options, and you’ll need to report which one you chose on Canvas. Please see the page for details.
Tweet of the day
🍾🍾 New project in the family! 🍾🍾
— Yan Holtz (@R_Graph_Gallery) February 19, 2019
-> The https://t.co/tKJaXKC2qm
A gallery of 200 Simple charts made with d3.js, with reproducible, commented & editable code. #dataviz pic.twitter.com/m19BKxiNe5
Yan has an excellent visualization decision tree, along with r-graph-gallery.com, python-graph-gallery.com, and even d3-graph-gallery.com.
Slides:
- Mike Bostock (bl.ocks.org), “Nutrient Parallel Coordinates”
Lab: Text and t-SNE
We’ll run the text application in this RStudio.cloud project. It uses the reticulate package, which lets us call Python packages from R.
For the data, we’ll use a sample of the CFPB Public Complaints data set, limited to complaints about the credit reporting product.
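We’ll use a technique called t-SNE (t-distributed stochastic neighbor embedding) to represent the text data set in a low-dimensional space that we can plot. For background, see Wattenberg, et al., “How to Use t-SNE Effectively”, Distill, 2016. http://doi.org/10.23915/distill.00002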
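The Distill article spends much of its time on perplexity, the t-SNE setting that acts roughly as a guess at how many close neighbors each point has. To make that concrete, here is a minimal sketch in plain Python of one step inside t-SNE: picking a Gaussian bandwidth for a single point so that its neighbor probabilities reach a target perplexity. This is not the lab code. The function names, the toy distances, and the search bounds are all assumptions made for illustration, and real implementations repeat this for every point before optimizing the low-dimensional layout.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian-kernel neighbor probabilities for one point.

    sq_dists holds squared distances from that point to every other point.
    """
    # Shift by the smallest distance so the nearest neighbor has weight 1;
    # this leaves the normalized probabilities unchanged and avoids 0/0.
    nearest = min(sq_dists)
    weights = [math.exp(-(d - nearest) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """2 ** (Shannon entropy in bits): an 'effective number of neighbors'."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, steps=60):
    """Bisect on a log scale for the bandwidth whose perplexity hits target.

    The bounds lo and hi are illustrative assumptions, not library defaults.
    """
    for _ in range(steps):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if perplexity(conditional_probs(sq_dists, mid)) > target:
            hi = mid  # too many effective neighbors: narrow the kernel
        else:
            lo = mid  # too few effective neighbors: widen the kernel
    return math.sqrt(lo * hi)


# Toy squared distances from one point to five others (made-up numbers).
toy_sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma = sigma_for_perplexity(toy_sq_dists, target=3.0)
probs = conditional_probs(toy_sq_dists, sigma)
print(round(sigma, 3), [round(p, 3) for p in probs], round(perplexity(probs), 3))
```

A larger target perplexity widens the kernel, so more distant points count as neighbors. That is one reason the same data can look quite different across perplexity settings.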
TensorFlow Projector Dimensionality Reduction on Word Embeddings: https://projector.tensorflow.org/
For more information on the basics of text analysis, see Chapter 1: The tidy text format and Chapter 3: Analyzing word and document frequency: tf-idf in Julia Silge and David Robinson, Text Mining with R
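As a rough companion to the tf-idf chapter, the sketch below computes tf-idf by hand in plain Python. It assumes the same definitions that chapter uses: term frequency is a term's count divided by the number of tokens in the document, and inverse document frequency is the natural log of the number of documents over the number of documents containing the term. The tf_idf function and the three toy "complaints" are hypothetical and are not drawn from the CFPB data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {(doc_id, term): tf-idf score} for a dict of doc_id -> token list."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        for term, n in counts.items():
            tf = n / len(tokens)                    # share of this document
            idf = math.log(n_docs / doc_freq[term])  # rarity across documents
            scores[(doc_id, term)] = tf * idf
    return scores


# Made-up, already-tokenized toy documents (not real complaint text).
toy_docs = {
    "a": ["credit", "report", "error", "equifax"],
    "b": ["credit", "report", "dispute"],
    "c": ["credit", "score", "dropped", "score"],
}
scores = tf_idf(toy_docs)

# Print terms from highest to lowest score.
for (doc_id, term), value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(doc_id, term, round(value, 3))
```

A term that appears in every document, such as "credit" here, gets an idf of zero and so drops out, while terms concentrated in one document rise to the top.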