week1

This week covers:

  • An intro to Git and GitHub for sharing code
  • Command line tools
  • R and the Tidyverse

Day 1

Setup

Install tools: Ubuntu on Windows, GitHub for Windows, R, and RStudio

Ubuntu on Windows

  • Open http://aka.ms/wslstore and select Ubuntu on Windows
  • If the installation seems to hang, press Enter
  • Create a username and password
  • Update all packages with sudo apt-get update and sudo apt-get upgrade

Git / GitHub for Windows

  • Check that you have git under bash by typing git --version in the terminal
  • Install GitHub for Windows

R and RStudio

  • Download and install R from a CRAN mirror
  • Download and install RStudio
  • Open RStudio and install the tidyverse package, which includes dplyr, ggplot2, and more: install.packages('tidyverse', dependencies = TRUE)

Text editor

  • You'll need a plain text editing program
  • If you are familiar with emacs or vim, you can install them in Ubuntu with sudo apt-get install emacs or sudo apt-get install vim
  • Otherwise consider Visual Studio Code, Atom, or Sublime
  • Check your editor's settings for unix-friendly line endings
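
If you are unsure whether your editor is saving Windows-style (CRLF) line endings, a quick check from bash can help. The following is a minimal sketch; the helper name has_crlf and the scratch file are made up for illustration and are not part of the course materials.

```shell
# Hypothetical helper: succeeds (exit status 0) if file $1 contains a carriage return
has_crlf() {
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Build a scratch file with Windows line endings to try it on
demo=$(mktemp)
printf 'line one\r\nline two\r\n' > "$demo"

if has_crlf "$demo"; then
  echo "Windows line endings found"
fi

# Strip the carriage returns into a unix-friendly copy
tr -d '\r' < "$demo" > "$demo.unix"
```

Shell scripts saved with CRLF endings often fail in confusing ways under bash, so this check is most useful on any .sh file you edit from Windows.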

Filesystem setup

  • Files that you create in Ubuntu on Windows get stored in a somewhat hidden location within the Windows filesystem
  • To make it easier to find files you work on in Ubuntu, do the following:
    • Open a bash shell
    • Go to your home directory: cd ~
    • Create a symbolic link to your Documents folder: ln -s /mnt/c/Users/<your name>/Documents ~/Documents (if your name contains a space, escape it with a backslash; a handy trick is to type the first few letters of your name and press Tab to autocomplete the rest)
    • Change to this directory: cd ~/Documents
    • Do all of your work, including the following section, from within this folder, which you'll be able to see under "Documents" in the Windows Explorer
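
Quoting the path is an alternative to backslash escaping when your Windows user name contains a space. The sketch below rehearses the link step in a throwaway directory so nothing in your real home folder is touched; the user name "Jane Doe" and the scratch layout are assumptions for illustration only.

```shell
# Scratch directory standing in for /mnt/c and for your home directory
scratch=$(mktemp -d)
mkdir -p "$scratch/mnt/c/Users/Jane Doe/Documents" "$scratch/home"

# Quoting the whole path handles the space without backslashes
ln -s "$scratch/mnt/c/Users/Jane Doe/Documents" "$scratch/home/Documents"

# Show where the link points
ls -ld "$scratch/home/Documents"

# The real command would follow the same shape, for example:
#   ln -s "/mnt/c/Users/Jane Doe/Documents" ~/Documents
```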

Intro to Git(Hub)

Make your first commit and pull request

  • Sign up for a free GitHub account
  • Then follow this guide to fork your own copy of the course repository
  • Clone a copy of your forked repository, which should be located at https://github.com/<yourusername>/coursework.git, to your local machine
  • Once that's done, create a new file in the week1/students directory, <yourfirstname>.txt (e.g., jake.txt)
  • Use git add to add the file to your local repository
  • Use git commit and git push to commit and push your changes to your copy of the repository
  • Then issue a pull request to send the changes back to the original course repository
  • Finally, sync changes from the main repo to your fork with git pull upstream master (if your machine doesn't recognize upstream, do the following to create the upstream shortcut: git remote add upstream https://github.com/msr-ds3/coursework.git)
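
The last step above can be wrapped so that the upstream shortcut is only created when it is missing. This is a sketch, not part of the course repository; the function name ensure_upstream is hypothetical, and it must be run from inside your clone.

```shell
# Hypothetical helper: add the upstream remote only if it is not already defined
ensure_upstream() {
  if git remote | grep -qx upstream; then
    echo "upstream already configured"
  else
    git remote add upstream https://github.com/msr-ds3/coursework.git
  fi
}

# Typical use, from inside your clone of the fork:
#   ensure_upstream && git pull upstream master
```

Running git remote -v afterwards should list both origin (your fork) and upstream (the course repository).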

Learn more (optional)

Extra

Think about how to write a musical_pairs.sh script to determine your programming partner each day. We want the script to do the following:

  • Produce a (pseudo)random pairing of 6 groups of 2 people who get to work together each day on pair programming assignments
  • Any one of us should be able to run the script and get the same pairing on a given day (i.e., as long as our computers agree on the year/month/day)
  • It's interesting to think about how we might avoid repeated pairs from one day to the next, but for a first cut (and maybe final cut) version of the script you can ignore that issue
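
One possible starting point, offered as a sketch rather than the intended solution: give each name a sort key derived from the date plus the name, sort by that key, and pair up neighbors. It uses cksum because its checksum is standardized, so every machine should compute the same keys for the same day. The names below are placeholders, not the class roster, and the sketch makes no attempt to avoid repeated pairs.

```shell
#!/bin/bash
# musical_pairs.sh (sketch): deterministic daily pairing
# Placeholder names only; replace with the real roster
names="alice bob carol dave erin frank grace heidi ivan judy mallory niaj"

pair_up() {
  # $1 is a date string such as 20180604; it acts as the seed
  day=$1
  for name in $names; do
    # Same day and name always produce the same checksum
    key=$(printf '%s-%s' "$day" "$name" | cksum | cut -d' ' -f1)
    printf '%s %s\n' "$key" "$name"
  done | LC_ALL=C sort -n | cut -d' ' -f2 | paste -d' ' - -
}

# Today's pairing: six lines, two names per line
pair_up "$(date +%Y%m%d)"
```

Because the date is the only input that changes, anyone who runs the script on the same calendar day gets the same six pairs.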

Day 2

Intro to the Command Line

Learn more (optional)

Command line exercises

  • Pull changes from the msr-ds3/coursework repo: git pull upstream master
  • Use the download_trips.sh file to download Citibike trip data by running bash download_trips.sh or ./download_trips.sh
  • Fill in solutions under each comment in citibike.sh using the 201402-citibike-tripdata.csv file
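
Many counting questions on a CSV file follow the same pipeline shape: drop the header, cut out one column, then sort and count. The sketch below shows that shape on a tiny invented file; the rows and column names are toy data for illustration and do not reflect the layout of the real trip file.

```shell
# Toy data only: these rows and column names are invented for illustration
sample=$(mktemp)
cat > "$sample" <<'EOF'
station,usertype
A,Subscriber
B,Customer
A,Subscriber
C,Subscriber
EOF

# Skip the header, keep column 1, and count how often each value appears,
# most frequent first
station_counts=$(tail -n +2 "$sample" | cut -d, -f1 | sort | uniq -c | sort -rn)
echo "$station_counts"

rm -f "$sample"
```

Before adapting this to real data, running head -n 1 on the file shows the actual column order.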

Save your work

  • Make sure to save your work and push it to GitHub. Do this in three steps:
    1. git add and git commit any new files to your local repository. (Omit large data files.)
    2. git pull upstream master to grab changes from this repository, and resolve any merge conflicts, committing the final results.
    3. git push origin master to push things back up to your GitHub fork of the course repository.
  • Finish by submitting a pull request with your solutions so we can review them!
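
The three steps above can be chained so that each one runs only if the previous one succeeded. This is a sketch under the assumption that origin points at your fork and upstream at the course repository; the function name save_work is hypothetical. It adds only the files you name, which keeps large data files out of the commit.

```shell
# Hypothetical helper: commit the named files, sync with upstream, push to your fork
# Usage: save_work "commit message" file1 [file2 ...]
save_work() {
  msg=$1
  shift
  git add -- "$@" &&
    git commit -m "$msg" &&
    git pull upstream master &&
    git push origin master
}

# Example, from inside your clone:
#   save_work "Add citibike.sh solutions" citibike.sh
```

If the pull reports a merge conflict, the chain stops before the push, so you can resolve the conflict, commit, and then run git push origin master by hand.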

Day 3

Intro to R

R counting exercises

Learn more

Day 4

Plotting

Plotting exercises

  • Read chapter 3 of the online edition of R for Data Science and do the following exercises:
    • Section 3.3.1, exercises 1, 2, and 3
    • Section 3.5.1, exercises 1 and 4
    • Section 3.6.1, exercises 5 and 6
    • Section 3.8.1, exercises 1 and 2
  • Citibike plots
    • Modify and use the download_trips.sh script to grab all months of trip data from 2014
    • Run the load_trips.R file to generate trips.RData
    • Write code in plot_trips.R to create visualizations using trips.RData
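
For the step that grabs all months of 2014, the core change is a loop over month numbers. The sketch below only generates the expected file names, following the pattern of 201402-citibike-tripdata.csv; the function name trip_files is hypothetical, and the actual download command should be taken from download_trips.sh, which is not reproduced here.

```shell
# Hypothetical helper: print one trip file name per month of the given year
trip_files() {
  year=$1
  m=1
  while [ "$m" -le 12 ]; do
    # %02d pads single-digit months with a leading zero
    printf '%s%02d-citibike-tripdata.csv\n' "$year" "$m"
    m=$((m + 1))
  done
}

trip_files 2014
```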

Learn more