Course Title

Obtaining data


Course Instructor(s)

Jeff Leek


Course Description

Before you can work with data you have to get some. This course covers the basic ways to obtain data: from the web, from APIs, and from colleagues in various formats including raw text files, binary files, and databases. It also covers the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed up downstream data analysis tasks. The course also covers the components of a complete data set: raw data, processing instructions, codebooks, and processed data. Together, these are the basics needed for collecting, cleaning, and sharing data.


Course Content

  • Data collection
    • Raw files (.csv,.xlsx)
    • Databases (MySQL,MongoDB)
    • APIs
  • Data formats
    • Flat files (.csv,.txt)
    • XML
    • JSON
  • Making data tidy
  • Distributing data
  • Scripting for data cleaning
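
As a rough illustration of how the flat-file, JSON, and tidy-data topics above fit together, here is a minimal sketch. It is written in Python purely for illustration (the course itself uses R), and the sample data, the column names, and the `tidy_rows` helper are all hypothetical. It assumes "tidy" means one observation per row and one variable per column.

```python
import csv
import io
import json

# Hypothetical "wide" flat file: one column per year, which is not tidy
# because the year variable is spread across the column headers.
RAW_CSV = """country,2011,2012
A,10,12
B,7,9
"""


def tidy_rows(text):
    """Reshape the wide table into one observation per row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        country = record.pop("country")
        # Every remaining column header is a year; its cell is the value.
        for year, value in record.items():
            rows.append({"country": country, "year": int(year), "value": int(value)})
    return rows


tidy = tidy_rows(RAW_CSV)

# The same tidy records can be shared as JSON.
print(json.dumps(tidy, indent=2))
```

Each tidy record now carries its own `country`, `year`, and `value`, so the same structure can be written back out as a flat file or as JSON without losing information.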

Lecture Materials

Lecture videos will be released weekly and will remain available thereafter. You are welcome to view them at your convenience. Accompanying each video lecture will be a PDF copy of the slides and a link to an HTML5 version of the slides.


Weekly quizzes

Quiz 1

Assigned: Class open (1st of Month) Due: 7th of the Month 12:00 AM UTC

Quiz 2

Assigned: 8th of the Month 12:01 AM UTC Due: 14th of the Month 12:00 AM UTC

Quiz 3

Assigned: 15th of the Month 12:01 AM UTC Due: 21st of the Month 12:00 AM UTC

Quiz 4

Assigned: 22nd of the Month 12:01 AM UTC Due: 28th of the Month 12:00 AM UTC


Background lectures

Background lectures about the content of the course with respect to other quantitative courses, course logistics, and the R programming language are provided as reference material. It is not necessary to watch these videos to complete the course. However, they may be useful for explaining background, the grading schemes used, and how to use R.


Quiz Scoring

You may attempt each quiz up to 2 times. Only the score from your final attempt will count toward your grade.


Hard deadlines and soft deadlines

The reported due date is the soft deadline for each quiz. You may turn in quizzes up to two days after the soft deadline. The hard deadline is 23:30 UTC-5:00 on the Tuesday after the quiz is due. Each day late will incur a 10% penalty, but if you use a late day, the penalty will not be applied to that day.


Late Days for Quizzes

You are permitted 5 late days for quizzes in the course. If you use a late day, your quiz grade will not be affected.
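
The interaction between the 10% daily penalty and late days can be sketched as a small calculation. This is only an illustration, written in Python: the function name and parameters are hypothetical, and it assumes each penalized day removes 10 percentage points of the raw score, which the syllabus does not spell out.

```python
def penalized_score(raw_score, days_late, late_days_used=0, penalty_pct_per_day=10):
    """Return a quiz score after late penalties.

    Assumption: every late day not covered by a late day costs
    `penalty_pct_per_day` percentage points of the raw score.
    """
    if days_late < 0 or late_days_used < 0:
        raise ValueError("days_late and late_days_used must be non-negative")
    # A late day can only cancel the penalty for a day that was actually late.
    covered = min(late_days_used, days_late)
    penalized_days = days_late - covered
    remaining_pct = max(100 - penalty_pct_per_day * penalized_days, 0)
    return raw_score * remaining_pct / 100


# Two days late with one late day applied: only one day is penalized.
print(penalized_score(90, days_late=2, late_days_used=1))  # 81.0
```

Under these assumptions, a quiz scored 90 and submitted two days late would keep the full 90 if two late days are used, and drop to 72 if none are used.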


Dates for the project

Submission

Assigned: Class open (1st of Month) Due: 21st of the Month 12:00 AM UTC

Review

Assigned: 22nd of the Month 12:01 AM UTC Due: 28th of the Month 12:00 AM UTC


Typos

  • We are prone to a typo or two - please report them and we will try to update the notes accordingly.
  • In some cases, the videos may still contain typos that have been fixed in the lecture notes. The lecture notes represent the most up-to-date version of the course material.

Differences of opinion

Keep in mind that currently data analysis is as much art as it is science - so we may have a difference of opinion - and that is ok! Please refrain from angry, sarcastic, or abusive comments on the message boards. Our goal is to create a supportive community that helps the learning of all students, from the most advanced to those who are just seeing this material for the first time.


Peer Review

For many of the course projects, peer scoring will be necessary to evaluate the completion of the assignments. We have created and tested rubrics for each assignment. They are not perfect and will not be perfectly applied. However, we believe that the feedback from peer assessment adds value above simple multiple choice assessments.

  • We have tried to make the criteria as objective as possible. Please apply them to the best of your ability.
  • If you have questions or suggestions about the rubrics, please report them in the forum, "Rubric Issues".
  • If you disagree with the scores you received through peer review, you may report those issues in the "Grading Issues" forum. Please note that it will be impossible for us to revise peer-grades, but we will attempt to use reports to improve future versions of the rubric.

Plagiarism

Johns Hopkins University defines plagiarism as “…taking for one’s own use the words, ideas, concepts or data of another without proper attribution. Plagiarism includes both direct use or paraphrasing of the words, thoughts, or concepts of another without proper attribution.” We take plagiarism very seriously, as does Johns Hopkins University.

We recognize that many students may not have a clear understanding of what plagiarism is or why it is wrong. Please see the following guide for more information on plagiarism:

http://www.jhsph.edu/academics/degree-programs/master-of-public-health/current-students/JHSPH-ReferencingHandbook.pdf

It is critically important that you give people/sources credit when you use their words or ideas. If you do not give proper credit – particularly when quoting directly from a source – you violate the trust of your fellow students.

The Coursera Honor code includes an explicit statement about plagiarism:

I will register for only one account. My answers to homework, quizzes and exams will be my own work (except for assignments that explicitly permit collaboration). I will not make solutions to homework, quizzes or exams available to anyone else. This includes both solutions written by me, as well as any official solutions provided by the course staff. I will not engage in any other activities that will dishonestly improve my results or dishonestly improve/hurt the results of others.


Reporting plagiarism on course projects

One of the criteria in the project rubric focuses on plagiarism. Keep in mind that some components of the projects will be very similar across terms, so answers that appear similar may be honest coincidences. However, we would appreciate it if you did a basic check for obvious plagiarism and reported it during your peer assessment phase.

It is currently very difficult to prove or disprove a charge of plagiarism in the MOOC peer assessment setting. We are not in a position to evaluate whether or not a submission actually constitutes plagiarism, and we will not be able to entertain appeals or to alter any grades that have been assigned through the peer evaluation system.

But if you take the time to report suspected plagiarism, this will help us to understand the extent of the problem and work with Coursera to address critical issues with the current system.


Technical Information

Regardless of your platform (Windows or Mac) you will need a high-speed Internet connection in order to watch the videos on the Coursera web site. It is possible to download the video files and watch them on your computer rather than stream them from Coursera and this may be preferable for some of you.

Here is some platform-specific information:

Windows

The Coursera web site seems to work best with either the Chrome or the Firefox web browsers. In particular, you may run into trouble if you use Internet Explorer. The Chrome and Firefox browsers can be downloaded from:

  • Chrome: http://www.google.com/chrome
  • Firefox: http://www.mozilla.org

Mac

The Coursera site appears to work well with Safari, Chrome, or Firefox, so any of these browsers should be fine.