Multidimensional & Dimensionality Reduction

Materials for class on Monday, October 14, 2019

Announcements

  1. Quiz 2 will be available tomorrow morning. You have until 5:59pm on Monday, October 21 to complete it. Canvas link to Quiz 2

  2. Tomorrow from 12-1pm at CenterCity 1102, RStudio engineer Davis Vaughan (SpeakerDeck) will present to the Charlotte RMeetup Group. Per the Extra Credit Policy, you will receive 1% extra credit if you attend the full hour. I will take attendance after the session.

  3. Optional final project presentations at the December RMeetup Group meeting: 21 students responded “Yes”, so I’m going to book our class to present up to five group projects at the last RMeetup Group meeting. You don’t have to do anything; simply mark your calendars for Tuesday, December 17 from 12-1pm at CenterCity.

  4. DataCamp 3 is due in 3 weeks (November 5). DataCamp no longer offers Intro to Shiny, so you have two options, and you’ll need to report your choice on Canvas. Please see the page for details.

Tweet of the day

Yan has an excellent visualization decision tree, along with r-graph-gallery.com, python-graph-gallery.com, and even d3-graph-gallery.com.

Slides:

full screen / pdf version

Lab: Text and t-SNE

  1. We’ll run the text application in this RStudio.cloud project. It uses the reticulate package, which lets R call Python packages.

  2. For the data, we’ll use a sample of the CFPB Public Complaints data set, limited to complaints about the credit reporting product.

  3. We’ll use a dimensionality reduction technique called t-SNE to represent the text data set. For background, see Wattenberg, et al., “How to Use t-SNE Effectively”, Distill, 2016. http://doi.org/10.23915/distill.00002

  4. TensorFlow Projector Dimensionality Reduction on Word Embeddings: https://projector.tensorflow.org/
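
The most important knob in t-SNE is perplexity, which loosely sets how many neighbors each point pays attention to. The sketch below is not the lab's code; it is a minimal plain-Python illustration of that one step, assuming the standard t-SNE definitions (Gaussian affinities and perplexity = 2 raised to the Shannon entropy). The function names and the example distances are made up for illustration. In practice a library does this internally; for example, scikit-learn's `sklearn.manifold.TSNE` takes a `perplexity` argument.

```python
import math

def conditional_probs(sq_dists, beta):
    """Gaussian affinities p(j|i) for one point i.

    sq_dists: squared distances from point i to every other point.
    beta:     precision of the Gaussian, beta = 1 / (2 * sigma**2).
    """
    # Shift by the smallest distance for numerical stability;
    # the shift cancels out in the normalization.
    d_min = min(sq_dists)
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy measured in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def calibrate_beta(sq_dists, target_perplexity, tol=1e-5, max_iter=200):
    """Search for the beta whose perplexity matches the target.

    Perplexity shrinks as beta grows (a narrower Gaussian focuses on
    fewer neighbors), so a bisection-style search is enough.
    """
    lo, hi = 0.0, None  # hi stays None until an upper bound is found
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            # Too many effective neighbors: narrow the Gaussian.
            lo = beta
            beta = beta * 2 if hi is None else (beta + hi) / 2
        else:
            # Too few effective neighbors: widen the Gaussian.
            hi = beta
            beta = (lo + hi) / 2
    return beta

# Hypothetical squared distances from one document to five others.
sq_dists = [1.0, 2.0, 4.0, 9.0, 16.0]
beta = calibrate_beta(sq_dists, target_perplexity=2.0)
probs = conditional_probs(sq_dists, beta)
print(round(perplexity(probs), 3), [round(p, 3) for p in probs])
```

A low perplexity concentrates almost all of the probability on the nearest neighbors, while a high one spreads it out, which is one reason the same data can look very different across perplexity settings.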

For more information on the basics of text analysis, see Chapter 1: Tidytext format and Chapter 3: Analyzing word and document frequency: tf-idf in Julia Silge and David Robinson, Tidy Text Mining in R.
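
To make tf-idf concrete, here is a minimal plain-Python sketch. It assumes the common definition in which term frequency is a term's share of the words in a document and inverse document frequency is the natural log of (number of documents / number of documents containing the term). The complaint snippets are made up for illustration and are not taken from the CFPB data, and the whitespace tokenizer is a simplification.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf score} dict per document.

    tf  = count of the term / total terms in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical complaint snippets, invented for this example.
complaints = [
    "credit report shows wrong account",
    "credit report dispute ignored",
    "credit score dropped after dispute",
]
scores = tf_idf(complaints)

# Show the highest-scoring terms for the first complaint.
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {score:.3f}")
```

A word such as "credit" that appears in every document gets an idf of zero, so its score is zero no matter how often it occurs, while words unique to one complaint score highest.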