forked from DataScienceSpecialization/courses
Merge pull request DataScienceSpecialization#15 from rdpeng/master
Classes & Methods / R Packages for Data Products
Showing 14 changed files with 3,363 additions and 1 deletion.
@@ -0,0 +1,47 @@
#' Building a Model with Top Ten Features
#'
#' This function develops a prediction algorithm based on the top 10 features
#' in 'x' that are most predictive of 'y'.
#'
#' @param x an n x p matrix of n observations and p predictors
#' @param y a vector of length n representing the response
#' @return a vector of coefficients (intercept first) from the linear model
#' fit with the top 10 predictors
#' @author Roger Peng
#' @details
#' This function runs a univariate regression of y on each predictor in x and
#' calculates the p-value indicating the significance of the association. The
#' final set of 10 predictors is then taken from the 10 smallest p-values.
#' @seealso \code{lm}
#' @import stats
#' @export

topten <- function(x, y) {
    p <- ncol(x)
    if(p < 10)
        stop("there are fewer than 10 predictors")
    pvalues <- numeric(p)
    for(i in seq_len(p)) {
        fit <- lm(y ~ x[, i])
        summ <- summary(fit)
        pvalues[i] <- summ$coefficients[2, 4]
    }
    ord <- order(pvalues)
    ## Keep only the 10 columns with the smallest p-values
    x10 <- x[, ord[1:10]]
    fit <- lm(y ~ x10)
    coef(fit)
}

#' Prediction with Top Ten Features
#'
#' This function takes a set of coefficients produced by the \code{topten}
#' function and makes a prediction for each of the values provided in the
#' input 'X' matrix.
#'
#' @param X an n x 10 matrix containing n observations of the same 10
#' predictors, in the same column order, that were used to fit \code{b}
#' @param b a vector of coefficients obtained from the \code{topten} function
#' @return a numeric vector containing the predicted values

predict10 <- function(X, b) {
    X <- cbind(1, X)    ## add a column of 1s for the intercept
    drop(X %*% b)
}
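The screening step in `topten()` can also be written with `vapply()` instead of a preallocated vector and a `for` loop. The sketch below is an illustrative alternative, not part of the package: the simulated data are hypothetical (only the first 3 of 20 predictors affect `y`), and the last step repeats the `predict10()` rule by hand so the code runs on its own.

```r
## Hypothetical data: 100 observations, 20 predictors, signal in columns 1-3 only
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- drop(x[, 1:3] %*% c(2, -1.5, 1)) + rnorm(n)

## p-value for the slope of each univariate regression y ~ x[, i]
pvalues <- vapply(seq_len(p), function(i) {
    summary(lm(y ~ x[, i]))$coefficients[2, 4]
}, numeric(1))

## Indices of the 10 smallest p-values, then one joint fit on those columns
top <- order(pvalues)[1:10]
fit <- lm(y ~ x[, top])
b <- coef(fit)    ## intercept followed by 10 slopes

## Same rule as predict10(): prepend a column of 1s and multiply by b
yhat <- drop(cbind(1, x[, top]) %*% b)
```

Because the prediction uses exactly the columns the model was fit on, `yhat` agrees with `fitted(fit)`; with new data, the columns passed to `predict10()` must follow the same order as `top`.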
@@ -0,0 +1,317 @@
---
title : Building R Packages
subtitle :
author : Roger D. Peng, Associate Professor of Biostatistics
job : Johns Hopkins Bloomberg School of Public Health
logo : bloomberg_shield.png
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : tomorrow #
url:
  lib: ../../librariesNew
  assets: ../../assets
widgets : [mathjax] # {mathjax, quiz, bootstrap}
mode : selfcontained # {standalone, draft}
---

## What is an R Package?

- A mechanism for extending the basic functionality of R
- A collection of R functions, or other (data) objects
- Organized in a systematic fashion to provide a minimal amount of consistency
- Written by users/developers everywhere

---

## Where are These R Packages?

- Primarily available from CRAN and Bioconductor

- Also available from GitHub, Bitbucket, Gitorious, etc. (and elsewhere)

- Packages from CRAN can be installed with `install.packages()`; Bioconductor
packages are installed with `BiocManager::install()` from the <b>BiocManager</b> package

- Packages from GitHub can be installed using `install_github()` from
the <b>devtools</b> package

You do not have to put a package on a central repository, but doing so
makes it easier for others to install your package.

---

## What's the Point?

- "Why not just make some code available?"
- Documentation / vignettes
- Centralized resources like CRAN
- Minimal standards for reliability and robustness
- Maintainability / extension
- Interface definition / clear API
- Users know that it will at least load properly

---

## Package Development Process

- Write some code in an R script file (.R)
- Want to make code available to others
- Incorporate R script file into R package structure
- Write documentation for user functions
- Include some other material (examples, demos, datasets, tutorials)
- Package it up!

---

## Package Development Process

- Submit package to CRAN or Bioconductor
- Push source code repository to GitHub or other source code sharing web site
- People find all kinds of problems with your code
- Scenario #1: They tell you about those problems and expect you to fix them
- Scenario #2: They fix the problems for you and show you the changes
- You incorporate the changes and release a new version

---

## R Package Essentials

- An R package is started by creating a directory with the name of the R package
- A DESCRIPTION file with information about the package
- R code! (in the R/ sub-directory)
- Documentation (in the man/ sub-directory)
- NAMESPACE (optional, but do it)
- Full requirements in the <i>Writing R Extensions</i> manual

---

## The DESCRIPTION File

- <b>Package</b>: Name of the package (as used in `library(name)`)
- <b>Title</b>: Full name of package
- <b>Description</b>: Longer description of what the package does (one paragraph, usually one or two sentences)
- <b>Version</b>: Version number (usually M.m-p format)
- <b>Author</b>, <b>Authors@R</b>: Name of the original author(s)
- <b>Maintainer</b>: Name + email of person who fixes problems
- <b>License</b>: License for the source code

---

## The DESCRIPTION File

These fields are optional but commonly used

- <b>Depends</b>: R packages that your package depends on
- <b>Suggests</b>: Optional R packages that users may want to have installed
- <b>Date</b>: Release date in YYYY-MM-DD format
- <b>URL</b>: Package home page
- <b>Other</b> fields can be added

---

## DESCRIPTION File: `gpclib`

<b>Package</b>: gpclib<br />
<b>Title</b>: General Polygon Clipping Library for R<br />
<b>Description</b>: General polygon clipping routines for R based on Alan Murta's C library<br />
<b>Version</b>: 1.5-5<br />
<b>Author</b>: Roger D. Peng <[email protected]> with contributions from Duncan Murdoch and Barry Rowlingson; GPC library by Alan Murta<br />
<b>Maintainer</b>: Roger D. Peng <[email protected]><br />
<b>License</b>: file LICENSE<br />
<b>Depends</b>: R (>= 2.14.0), methods<br />
<b>Imports</b>: graphics<br />
<b>Date</b>: 2013-04-01<br />
<b>URL</b>: http://www.cs.man.ac.uk/~toby/gpc/, http://github.com/rdpeng/gpclib

---
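## DESCRIPTION Example: Reading the Fields

A DESCRIPTION file is written in Debian Control File (DCF) format: one `Field: value` pair per line, with indented continuation lines. As a small sketch, the fields below describe a hypothetical package named `topten` (all values, including the maintainer address, are placeholders) and are parsed with base R's `read.dcf()`; `packageDescription()` reads the same fields from a package that is already installed.

```r
## Placeholder DESCRIPTION for a hypothetical "topten" package
description <- c(
    "Package: topten",
    "Title: Building a Model with Top Ten Features",
    "Description: Selects the 10 predictors most associated with a",
    "    response and fits a linear model with them.",
    "Version: 1.0-0",
    "Author: Jane Doe",
    "Maintainer: Jane Doe <jane@example.com>",
    "License: GPL-3",
    "Imports: stats"
)

## read.dcf() returns a character matrix with one column per field
con <- textConnection(description)
desc <- read.dcf(con)
close(con)
desc[, "Package"]
desc[, "Imports"]

## The same kind of information for an installed package
packageDescription("stats")$Package
```

---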

## R Code

- Copy R code into the R/ sub-directory
- There can be any number of files in this directory
- Usually separate out files into logical groups
- Code for all functions should be included here and not anywhere else in the package

---

## The NAMESPACE File

- Used to indicate which functions are <b>exported</b>
- Exported functions can be called by the user and are considered the public API
- Non-exported functions cannot be called directly by the user (but the code can be viewed, and they can still be reached with `:::`)
- Hides implementation details from users and makes a cleaner package interface

---
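## NAMESPACE Example: Exported vs. Internal

To see the export boundary from the user's side, the sketch below inspects the stats package, whose NAMESPACE exports `lm()` but not `print.lm()` (which is registered only as an S3 method for `print()`). The function still exists inside the namespace, so `:::` can reach it, while a lookup restricted to exported objects (as `::` performs) fails.

```r
## Names exported by the stats NAMESPACE
exports <- getNamespaceExports("stats")
"lm" %in% exports            # exported: part of the public API
"print.lm" %in% exports      # not exported

## The non-exported function still lives inside the namespace...
exists("print.lm", envir = asNamespace("stats"), inherits = FALSE)
is.function(stats:::print.lm)   # ...so ::: can reach it

## ...but a lookup among exported objects only fails
via_export <- tryCatch({
    getExportedValue("stats", "print.lm")
    TRUE
}, error = function(e) FALSE)
via_export
```

---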

## The NAMESPACE File

- You can also indicate what functions you <b>import</b> from other packages
- This allows your package to use other packages without making them visible to the user
- Importing a function loads the package but does not attach it to the search list

---
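## NAMESPACE Example: Loading Without Attaching

The difference between loading and attaching can be seen interactively. This sketch uses `loadNamespace()` on splines (a package that ships with R) to mimic what an `import()` directive does for your package's code; it assumes splines has not already been attached in the session.

```r
## Load the namespace only (what importing does), without attaching it
ns <- loadNamespace("splines")

isNamespaceLoaded("splines")        # TRUE: the code is loaded
"package:splines" %in% search()     # FALSE: not on the user's search list

## Functions are still reachable through the namespace
basis <- splines::bs(1:10, df = 4)
dim(basis)
```

---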

## The NAMESPACE File

Key directives

- `export("<function>")`
- `import("<package>")`
- `importFrom("<package>", "<function>")`

Also important

- `exportClasses("<class>")`
- `exportMethods("<generic>")`

---

## NAMESPACE File: `mvtsplot` package

```r
export("mvtsplot")
importFrom(graphics, "Axis")
import(splines)
```

---

## NAMESPACE File: `gpclib` package

```r
export("read.polyfile", "write.polyfile")

importFrom(graphics, plot)

exportClasses("gpc.poly", "gpc.poly.nohole")

exportMethods("show", "get.bbox", "plot", "intersect", "union", "setdiff",
              "[", "append.poly", "scale.poly", "area.poly", "get.pts",
              "coerce", "tristrip", "triangulate")
```

---

## Documentation

- Documentation files (.Rd) placed in the man/ sub-directory
- Written in Rd, a LaTeX-like markup language
- Required for every exported function
- Another reason to limit exported functions
- You can document other things like concepts, package overview

---

## Help File Example: `line` Function

```
\name{line}
\alias{line}
\alias{residuals.tukeyline}
\title{Robust Line Fitting}
\description{
  Fit a line robustly as recommended in \emph{Exploratory Data Analysis}.
}
```

---

## Help File Example: `line` Function

```
\usage{
line(x, y)
}
\arguments{
  \item{x, y}{the arguments can be any way of specifying x-y pairs. See
    \code{\link{xy.coords}}.}
}
```

---

## Help File Example: `line` Function

```
\details{
  Cases with missing values are omitted.
  Long vectors are not supported.
}
\value{
  An object of class \code{"tukeyline"}.
  Methods are available for the generic functions \code{coef},
  \code{residuals}, \code{fitted}, and \code{print}.
}
```

---
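## Help File Example: Using `line`

The help file above promises an object of class `"tukeyline"` with methods for `coef`, `residuals`, `fitted`, and `print`. A quick check of that documented behavior, using the built-in `cars` data set purely as an example:

```r
## Tukey's resistant line for stopping distance on speed
fit <- line(cars$speed, cars$dist)

class(fit)                # "tukeyline", as documented in \value
coef(fit)                 # intercept and slope
head(residuals(fit))      # one residual per observation
head(fitted(fit))
```

---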

## Help File Example: `line` Function

```
\references{
  Tukey, J. W. (1977).
  \emph{Exploratory Data Analysis},
  Reading Massachusetts: Addison-Wesley.
}
```

---
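## Example: Parsing an Rd File

Rd files can be inspected from R before running `R CMD check`. The sketch below writes a minimal help file for the `topten()` function (the wording is illustrative, not the package's generated Rd) to a temporary file and parses it with `tools::parse_Rd()`; each top-level element of the result carries an `Rd_tag` attribute naming its section.

```r
## Minimal illustrative Rd source (backslashes are escaped in R strings)
rd_lines <- c(
    "\\name{topten}",
    "\\alias{topten}",
    "\\title{Building a Model with Top Ten Features}",
    "\\description{",
    "  Fit a linear model using the 10 predictors most associated with y.",
    "}",
    "\\usage{",
    "topten(x, y)",
    "}",
    "\\arguments{",
    "  \\item{x}{an n x p matrix of n observations and p predictors}",
    "  \\item{y}{a vector of length n representing the response}",
    "}",
    "\\value{A vector of coefficients from the fitted linear model.}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

rd <- tools::parse_Rd(rd_file)

## Section tags at the top level, ignoring the text (newlines) between them
tags <- vapply(rd, attr, character(1), which = "Rd_tag")
unique(tags[tags != "TEXT"])
```

---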

## Building and Checking

- `R CMD build` is a command-line program that creates a package archive
file (`.tar.gz`)

- `R CMD check` runs a battery of tests on the package

- You can run `R CMD build` or `R CMD check` from the command-line using a
terminal or command-shell application

- You can also run them from R using the `system()` function

```r
system("R CMD build newpackage")
system("R CMD check newpackage")
```

---
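## Example: Building a Minimal Package

The build step can be scripted end to end. A minimal sketch, assuming a hypothetical package named `greetpkg` with placeholder DESCRIPTION values: it writes the required files into a temporary directory and then calls `R CMD build` through `system2()`, using the R executable of the running session. No man/ directory is included, so the package builds, but `R CMD check` would warn about the undocumented `greet()` function.

```r
## Hypothetical package "greetpkg" assembled by hand in a temporary directory
root    <- tempfile("pkgdemo")
pkg_dir <- file.path(root, "greetpkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)

## DESCRIPTION with the required fields (all values are placeholders)
writeLines(c(
    "Package: greetpkg",
    "Title: Say Hello",
    "Description: A toy package used to demonstrate the build step.",
    "Version: 0.1.0",
    "Author: Jane Doe",
    "Maintainer: Jane Doe <jane@example.com>",
    "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

## R code in R/ and a NAMESPACE exporting the single user-facing function
writeLines('greet <- function(name) paste0("Hello, ", name, "!")',
           file.path(pkg_dir, "R", "greet.R"))
writeLines('export("greet")', file.path(pkg_dir, "NAMESPACE"))

## R CMD build writes the tarball into the working directory, so run it in root
r_bin  <- file.path(R.home("bin"), "R")
old_wd <- setwd(root)
status <- system2(r_bin, c("CMD", "build", "greetpkg"))
setwd(old_wd)

tarball <- file.path(root, "greetpkg_0.1.0.tar.gz")
file.exists(tarball)
```

---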

## Checking

- `R CMD check` runs a battery of tests
    - Documentation exists
    - Code can be loaded, no major coding problems or errors
    - Run examples in documentation
    - Check docs match code
- All tests must pass to put package on CRAN

---

## Getting Started

- The `package.skeleton()` function in the utils package creates a "skeleton" R package
- Directory structure (R/, man/), DESCRIPTION file, NAMESPACE file, documentation files
- If there are functions visible in your workspace, it writes R code files to the R/ directory
- Documentation stubs are created in man/
- You need to fill in the rest!

---
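## Example: `package.skeleton()`

A minimal sketch of the skeleton step, assuming a hypothetical function `greet()` defined in the global workspace: `package.skeleton()` writes it to R/, creates documentation stubs in man/, and adds DESCRIPTION and NAMESPACE files. A temporary directory is used so nothing is written to the working directory.

```r
## Hypothetical user function to be packaged
greet <- function(name) paste0("Hello, ", name, "!")

## Create the skeleton under a fresh temporary directory
out_dir <- tempfile("skeleton")
dir.create(out_dir)
utils::package.skeleton(name = "greetpkg", list = "greet", path = out_dir)

## Inspect what was generated
pkg <- file.path(out_dir, "greetpkg")
list.files(pkg, recursive = TRUE)
```

The stubs in man/ contain placeholder text, which is why the slide says you still need to fill in the rest before `R CMD check` will pass.

---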

## Summary

- R packages provide a systematic way to make R code available to others
- Standards ensure that packages have a minimal amount of documentation and robustness
- Obtained from CRAN, Bioconductor, GitHub, etc.

---

## Summary

- Create a new directory with R/ and man/ sub-directories (or just use `package.skeleton()`)
- Write a DESCRIPTION file
- Copy R code into the R/ sub-directory
- Write documentation files in the man/ sub-directory
- Write a NAMESPACE file with exports/imports
- Build and check