forked from tidyverse/dplyr
select.Rd
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/select.R
\name{select}
\alias{select}
\title{Subset columns using their names and types}
\usage{
select(.data, ...)
}
\arguments{
\item{.data}{A data frame, data frame extension (e.g. a tibble), or a
lazy data frame (e.g. from dbplyr or dtplyr). See \emph{Methods}, below, for
more details.}
\item{...}{<\code{\link[=dplyr_tidy_select]{tidy-select}}> One or more unquoted
expressions separated by commas. Variable names can be used as if they
were positions in the data frame, so expressions like \code{x:y} can
be used to select a range of variables.}
}
\value{
An object of the same type as \code{.data}.
\itemize{
\item Rows are not affected.
\item Output columns are a subset of input columns, potentially with a different
order. Columns will be renamed if \code{new_name = old_name} form is used.
\item Data frame attributes are preserved.
\item Groups are maintained; grouping variables cannot be dropped by a selection.
}
}
\description{
Select (and optionally rename) variables in a data frame, using a concise
mini-language that makes it easy to refer to variables based on their name
(e.g. \code{a:f} selects all columns from \code{a} on the left to \code{f} on the
right). You can also use predicate functions like \link{is.numeric} to select
variables based on their properties.
}
\section{Useful functions}{
As well as using existing functions like \code{:} and \code{c()}, there are
a number of special functions that only work inside \code{select()}:
\itemize{
\item \code{\link[=any_of]{any_of()}}, \code{\link[=all_of]{all_of()}}.
\item \code{\link[=starts_with]{starts_with()}}, \code{\link[=ends_with]{ends_with()}}, \code{\link[=contains]{contains()}}, \code{\link[=matches]{matches()}}.
\item \code{\link[=num_range]{num_range()}}.
\item \code{\link[=group_cols]{group_cols()}}, \code{\link[=last_col]{last_col()}}.
\item \code{\link[=everything]{everything()}}.
}
You can also use predicate functions (functions that return a single \code{TRUE}
or \code{FALSE}) like \code{is.numeric}, \code{is.character}, and \code{is.factor}
to select variables of specific types.
Selections can be combined using Boolean algebra:
\itemize{
\item \code{starts_with("a") & ends_with("x")}: variables with names that start with "a" and end with "x"
\item \code{starts_with("a") | starts_with("b")}: variables with names that start with "a" or "b"
\item \code{!starts_with("a")}: variables with names that do not start with "a"
}
To remove variables from a selection, use \code{-}:
\itemize{
\item \code{starts_with("a") - ends_with("x")}: variables with names that start with "a" and do not end with "x"
\item \code{is.numeric - c(a, b, c)}: numeric variables, except for \code{a}, \code{b}, and \code{c}.
}
See \link[tidyselect:select_helpers]{select helpers} for more details and
examples.
Note that except for \code{:}, \code{-} and \code{c()}, all complex expressions
are evaluated outside the data frame context. This is to prevent
accidental matching of data frame variables when you refer to
variables from the calling environment.
}
\section{Methods}{
This function is a \strong{generic}, which means that packages can provide
implementations (methods) for other classes. See the documentation of
individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("select")}.
}
\examples{
select(starwars, starts_with("h"))
select(starwars, ends_with("color"))
select(starwars, !contains("s"))
select(starwars, starts_with("h") & ends_with("color"))
select(starwars, is.numeric)
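# A hedged sketch of two operators described under "Useful functions"
# that the examples above do not exercise: `|` takes the union of two
# selections, and a leading `-` removes a variable from a selection.
select(starwars, starts_with("h") | starts_with("m"))
select(starwars, ends_with("color"), -hair_color)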
# Optionally, rename individual variables as they are selected,
# in the format `new_name = old_name`
select(starwars, character_name = name, character_height = height)
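# Column names held in a character vector live in the calling
# environment rather than in the data frame. A minimal sketch, assuming
# the any_of() and all_of() helpers listed under "Useful functions":
# any_of() ignores names that are absent, all_of() requires every name.
# `wanted` is an illustrative variable, not part of dplyr.
wanted <- c("name", "height", "not_a_column")
select(starwars, any_of(wanted))
select(starwars, all_of(c("name", "height")))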
# Use num_range() to select variables with numeric suffixes
df <- as.data.frame(matrix(runif(100), nrow = 10))
select(df, V4:V6) # Specify variable names explicitly
select(df, num_range(prefix = "V", range = 4:6)) # Or, specify the prefix used on a numeric range
# Select the existing grouping variables:
starwars \%>\% group_by(gender, eye_color) \%>\% select(group_cols())
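# last_col() selects the final column; as a sketch, its `offset`
# argument counts back from the end (offset = 1 is the second-to-last).
select(starwars, last_col())
select(starwars, last_col(offset = 1))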
# Using select() semantics in across()
starwars \%>\% summarise(across(height:mass, ~mean(.x, na.rm = TRUE)))
# Use `{{ }}` inside functions to tunnel data-variables through
# function arguments. See ?dplyr_tidy_eval for more information.
averages <- function(data, vars) {
  data \%>\%
    select({{ vars }}) \%>\%
    lapply(mean, na.rm = TRUE)
}
starwars \%>\% averages(height)
starwars \%>\% averages(c(height, mass))
# Modifying the order of variables --------------------------
# As of dplyr 1.0.0, use relocate(), not select():
starwars \%>\% select(name:birth_year) \%>\% relocate(birth_year, .before = 1)
starwars \%>\% select(name:birth_year) \%>\% relocate(name, .after = last_col())
}
\seealso{
Other single table verbs:
\code{\link{arrange}()},
\code{\link{filter}()},
\code{\link{mutate}()},
\code{\link{rename}()},
\code{\link{slice}()},
\code{\link{summarise}()}
}
\concept{single table verbs}