Monday, September 11, 2017

subset() {base}


The subset() function returns the elements of a vector, or the rows and columns of a matrix or data frame, that meet specified conditions.

The arguments are:
 - x: the vector, matrix or data frame to take the subset from.
 - subset: logical expression indicating the elements or rows to keep.
 - select: expression indicating the columns to select from a data frame.
 - drop: passed on to the `[` indexing operator.
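
Matrices take the same arguments. The following is a minimal sketch, assuming a small made-up matrix `m` with column names `x1`, `x2` and `x3` (none of which come from the examples in this post). Note that for a matrix the condition has to refer to the matrix explicitly:

```r
# Hypothetical 4 x 3 matrix with named columns, for illustration only
m <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("x1", "x2", "x3")))

# Keep the rows where column x1 exceeds 2, and only columns x1 and x3.
# The condition is a plain logical vector built from the matrix itself.
res <- subset(m, m[, "x1"] > 2, select = c(x1, x3))
print(res)
```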

Using vectors:
x <- 1:100
subset(x, x > 50)
##  [1]  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
## [18]  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
## [35]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
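
One difference from plain bracket indexing is worth knowing: subset() treats a missing value in the condition as false. A small sketch, assuming a made-up vector `y` that contains an NA:

```r
# Hypothetical vector with one missing value, for illustration only
y <- c(10, NA, 60, 75)

# subset() drops the element whose condition evaluates to NA
kept_subset <- subset(y, y > 50)

# Bracket indexing keeps an NA element in that position instead
kept_bracket <- y[y > 50]

print(kept_subset)
print(kept_bracket)
```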

Using data frames, the subset argument works on the rows:
#we will work with the dataset `ChickWeight`.
#to get to know the data we are working with:
head(ChickWeight)
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1
#plot weight against time, one panel per diet:
library(ggplot2)
ggplot(ChickWeight, aes(weight, Time)) +
  geom_point(aes(color = Diet, shape = Diet)) +
  facet_grid(. ~ Diet) +
  xlab("Weight") +
  ylab("Time") +
  ggtitle("ChickWeight")

#using the subset function with the dataset `ChickWeight`:
subset(ChickWeight, Chick == 2) #it is the same as: ChickWeight[ChickWeight$Chick == 2, ]
##    weight Time Chick Diet
## 13     40    0     2    1
## 14     49    2     2    1
## 15     58    4     2    1
## 16     72    6     2    1
## 17     84    8     2    1
## 18    103   10     2    1
## 19    122   12     2    1
## 20    138   14     2    1
## 21    162   16     2    1
## 22    187   18     2    1
## 23    209   20     2    1
## 24    215   21     2    1
subset(ChickWeight, Chick == 3 & weight > 70)
##    weight Time Chick Diet
## 29     84    8     3    1
## 30     99   10     3    1
## 31    115   12     3    1
## 32    138   14     3    1
## 33    163   16     3    1
## 34    187   18     3    1
## 35    198   20     3    1
## 36    202   21     3    1
subset(ChickWeight, Chick == 3, select = c(weight, Time))
##    weight Time
## 25     43    0
## 26     39    2
## 27     55    4
## 28     67    6
## 29     84    8
## 30     99   10
## 31    115   12
## 32    138   14
## 33    163   16
## 34    187   18
## 35    198   20
## 36    202   21
subset(ChickWeight, Chick == 2, select = -weight, drop = TRUE) #drop = TRUE changes nothing here, since three columns remain
##    Time Chick Diet
## 13    0     2    1
## 14    2     2    1
## 15    4     2    1
## 16    6     2    1
## 17    8     2    1
## 18   10     2    1
## 19   12     2    1
## 20   14     2    1
## 21   16     2    1
## 22   18     2    1
## 23   20     2    1
## 24   21     2    1
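
The drop argument only makes a visible difference when a single column is selected. A short sketch of that case, assuming a plain data frame copy of the dataset (the name `cw` is made up here, and the copy is used so that only the standard data frame indexing is involved):

```r
# Plain data frame copy of ChickWeight (hypothetical name `cw`)
cw <- as.data.frame(ChickWeight)

# Without drop, selecting one column still returns a data frame
w_df <- subset(cw, Chick == 2, select = weight)

# With drop = TRUE, the single column is simplified to a vector
w_vec <- subset(cw, Chick == 2, select = weight, drop = TRUE)

print(class(w_df))
print(w_vec)
```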
