Tuesday, September 12, 2017

sample() {base}


The sample() function takes a sample of the specified size from the elements of x, with or without replacement.

sample(x, size, replace = FALSE, prob = NULL)

Parameters:
x: vector of elements to sample from
size: number of elements to draw
replace: whether to sample with replacement (TRUE) or without replacement (FALSE, the default)
prob: a vector of probability weights for obtaining the elements of the vector being sampled
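
Two behaviours of the signature are easy to miss, so here is a minimal sketch of them. It assumes the defaults described in ?sample: when size is omitted, it defaults to the length of x, and when x is a single number n >= 1, the draw is taken from 1:n. The seed value is arbitrary and only there to make the run repeatable.

```r
set.seed(1)  # arbitrary seed, used only so the sketch is repeatable

# With only x given, size defaults to length(x) and replace to FALSE,
# so the result is a random permutation of x.
perm <- sample(1:10)
perm

# A single number n >= 1 is treated as the population 1:n.
perm_n <- sample(10)
perm_n

# Drawing one element from a vector.
one <- sample(c(7, 8, 9), size = 1)
one
```

Because of the second behaviour, calling sample(v) on a numeric vector v that happens to have length one draws from 1:v rather than returning v itself.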

x <- 1:100

replacesample =  sample(x, size = 20, replace = TRUE, prob = NULL) # numbers may be repeated
replacesample
##  [1] 87 76 56 65 44  4 47  3 90  9 68 31  9  4  4 53 33 83 48 23
sort(replacesample)
##  [1]  3  4  4  4  9  9 23 31 33 44 47 48 53 56 65 68 76 83 87 90
noreplacesample = sample(x, size = 20, replace = FALSE, prob = NULL) # numbers cannot be repeated
noreplacesample
##  [1] 14 22 75 46 48 78 33 73 44 67 74 28  1 20 32 37 88 66 89 30
sort(noreplacesample)
##  [1]  1 14 20 22 28 30 32 33 37 44 46 48 66 67 73 74 75 78 88 89
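
The output above changes on every run because sample() is random. As a hedged sketch, the difference between the two modes can be checked programmatically with anyDuplicated(), and set.seed() can be used to make a draw repeatable. The seed 123 and the variable names are illustrative choices, not part of the original example.

```r
x <- 1:100

set.seed(123)  # arbitrary seed chosen for illustration
with_repl <- sample(x, size = 20, replace = TRUE)
without_repl <- sample(x, size = 20, replace = FALSE)

# anyDuplicated() returns the index of the first repeated value, or 0 if none.
anyDuplicated(with_repl)     # may be 0 or positive, depending on the draw
anyDuplicated(without_repl)  # always 0 when replace = FALSE

# Resetting the same seed reproduces the same draw.
set.seed(123)
again <- sample(x, size = 20, replace = TRUE)
identical(with_repl, again)
```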

If replace = FALSE, size must be less than or equal to the length of x:
# size smaller than length(x):
sample(x, size = 50, replace = FALSE, prob = NULL) 
##  [1] 20 26 75 51 84 17 72 87 91 36 29 93 69  4 71 86 30 81 35 99  9 34 33
## [24] 46 66 83 48 70 80 50  6 43 61 28 42  1 67 11 18 52 13 56 65 15 54 16
## [47]  5 96 92 10
# size bigger than length(x) gives an error:
# sample(x, size = 150, replace = FALSE, prob = NULL)
Running the commented-out call gives the following error:
Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when ‘replace = FALSE’
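
If the requested size is not known in advance, the error can be handled rather than left to stop the script. The following is one possible sketch, assuming that either catching the error or capping the size at the population size is acceptable for your use case. The names result, safe_size and capped are illustrative.

```r
x <- 1:100

# Option 1: catch the error and keep its message instead of stopping.
result <- tryCatch(
  sample(x, size = 150, replace = FALSE),
  error = function(e) conditionMessage(e)
)
result  # a character string holding the error message

# Option 2: never ask for more elements than the population holds.
safe_size <- min(150, length(x))
capped <- sample(x, size = safe_size, replace = FALSE)
length(capped)
```

Option 2 silently returns fewer elements than requested, so it only fits cases where a smaller sample is acceptable.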

When replace = TRUE, the sample may be larger than the population:
# size bigger than length(x):
sample(x, size = 150, replace = TRUE, prob = NULL) 
##   [1]  18  82  14  29  21  29  16  53   7  47  21  57  84  21   8  64  90
##  [18] 100  56  85 100  38  33  80  41  32   9  95  28  46 100  75  75  54
##  [35]  52  83  78  85  42  63   3  55  82   9  75  13  10  43  90  88  58
##  [52]  25  26  93  56  31  96  86  62  87  80  50  36  98  86  24  53 100
##  [69]  59  14  24 100  76  58  28  51  87 100  18  79  13  42  84  88   7
##  [86]  95  65  75  45  79  49  12  75  70  28  57  61  45  53  53  66  69
## [103]  57  93  17  50   1  57   3   4  66  61  93  33  29  31  63  58  82
## [120]   6   3   8  16  81  12  23  97  41  10  88  40  78  76  34  90  71
## [137]  65  89  79  86  19  98  92  81  90  53  66  95  14  90
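
A sample of 150 values from a population of 100 must contain repeats: at most 100 of the draws can be distinct, so at least 50 are repeated values. A small sketch that checks this on a fresh draw (variable names are illustrative):

```r
x <- 1:100
big <- sample(x, size = 150, replace = TRUE)

n_distinct <- length(unique(big))     # cannot exceed length(x)
n_repeats <- length(big) - n_distinct # draws that repeat an earlier value

n_distinct <= 100
n_repeats >= 50
```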

The prob argument sets the probability of drawing each element:
y = 1:6
sample1 = sample(y, size = 10, replace = TRUE, prob = c(0.1, 0.1, 0.1, 0.1, 0.3, 0.3))
d1 = density(sample1)
plot(d1)                    # draw the density curve
polygon(d1, col='purple')   # fill the area under the curve
sample2 = sample(y, size = 10, replace = TRUE, prob = c(0.3, 0.3, 0.1, 0.1, 0.1, 0.1))
d2 = density(sample2)
plot(d2)                    # draw the density curve
polygon(d2, col='purple')   # fill the area under the curve
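
With only 10 draws the effect of prob is hard to see. As a rough sketch, drawing a much larger sample and tabulating the relative frequencies shows them approaching the weights. The sample size of 10000, the seed, and the names w, draws and freq are assumptions made for this illustration.

```r
set.seed(42)  # arbitrary seed, used only so the sketch is repeatable

y <- 1:6
w <- c(0.1, 0.1, 0.1, 0.1, 0.3, 0.3)  # weights from the first example

draws <- sample(y, size = 10000, replace = TRUE, prob = w)

# Relative frequency of each value; factor() keeps all six levels in order.
freq <- table(factor(draws, levels = y)) / length(draws)
round(freq, 2)
```

The frequencies for 5 and 6 should come out close to 0.3 and the others close to 0.1, with small deviations from run to run.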

