Thursday, September 28, 2017

paste(), paste0() {base}


paste() and paste0() convert their arguments to character vectors and concatenate them element by element.

paste(..., sep = " ", collapse = NULL)
paste0(..., collapse = NULL)

The parameters are:
 - …: one or more objects, each converted to a character vector
 - sep: character string placed between the terms
 - collapse: optional character string; if given, the elements of the result are joined into a single string separated by it
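
To make the difference between sep and collapse concrete, here is a minimal sketch. The vector ids and the "id" prefix are made up for illustration and do not come from the examples in this post.

```r
# Hypothetical input, chosen only for illustration
ids <- c("a", "b", "c")

# sep joins the arguments element by element; the length-1 "id" is recycled
by_element <- paste("id", ids, sep = "_")

# collapse joins the elements of the result into a single string
one_string <- paste(ids, collapse = ", ")

# Non-character arguments are converted to character first
converted <- paste(1.5, TRUE)

print(by_element)
print(one_string)
print(converted)
```

In short, sep works across the arguments and collapse works along the result.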

paste():
(x = c('A', 'B', 'C'))
## [1] "A" "B" "C"
(y = c('one', 'two', 'three'))
## [1] "one"   "two"   "three"
paste(x,y, sep = "")
## [1] "Aone"   "Btwo"   "Cthree"
paste(x,y, sep = " ")
## [1] "A one"   "B two"   "C three"
paste(x,y, sep = "_")
## [1] "A_one"   "B_two"   "C_three"
(x = c('A', 'B', 'C'))
## [1] "A" "B" "C"
(y = c('1', '2', '3'))
## [1] "1" "2" "3"
paste(x,y, sep = "", collapse = " -- ")
## [1] "A1 -- B2 -- C3"
paste(x,y, sep = " ", collapse = " -- ")
## [1] "A 1 -- B 2 -- C 3"

paste0():
paste0(..., collapse) is equivalent to paste(..., sep = "", collapse).

paste0(x,y)
## [1] "A1" "B2" "C3"
paste(x,y, sep = "")
## [1] "A1" "B2" "C3"
paste0(x,y, collapse = " -- ")
## [1] "A1 -- B2 -- C3"
paste() is also handy for building plot labels, such as the box names in the plots below:
par(mfrow = c(1,2))
plot(iris$Species, iris$Sepal.Length, names = paste(toupper(levels(iris$Species)), "S.L.", sep = " - "), las = 2, col = 'aquamarine2', cex.axis = 0.60)
plot(iris$Species, iris$Sepal.Width, names = paste(toupper(levels(iris$Species)), "S.W.", sep = " - "), las = 2, col = 'aquamarine4', cex.axis = 0.60)
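
The equivalence between paste0() and paste() with an empty sep can be checked with identical(). This is a small sketch that reuses the x and y vectors from the examples; the numbered-names use at the end is an assumed, typical application rather than something taken from the post.

```r
x <- c('A', 'B', 'C')
y <- c('1', '2', '3')

# paste0() should give exactly the same result as paste() with sep = ""
same_elementwise <- identical(paste0(x, y), paste(x, y, sep = ""))
same_collapsed <- identical(paste0(x, y, collapse = " -- "),
                            paste(x, y, sep = "", collapse = " -- "))
print(same_elementwise)
print(same_collapsed)

# Assumed typical use: generating numbered names such as "V1", "V2", "V3"
col_names <- paste0("V", 1:3)
print(col_names)
```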

Wednesday, September 27, 2017

jitter() {base}


jitter() is a function that adds a small amount of noise to a numeric vector.

jitter(x, factor = 1, amount = NULL)

The parameters are:
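  • x: numeric vector to which the noise is added
  • factor: numeric; scales the amount of noise
  • amount: numeric; if positive, used directly as the noise half-width a (see below)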


factor and amount:
The result is x + runif(n, -a, a), where n <- length(x) and a is the amount argument when it is given as a positive number.
If amount == 0, a <- factor * z/50, where z <- max(x) - min(x).
If amount is NULL (the default), a <- factor * d/5, where d is the smallest difference between adjacent unique x values.
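
As a rough check of these rules, the sketch below computes a by hand for x = 1:10 and verifies that the noise never exceeds it. The seed, the choice amount = 2, and the small tolerance tol are arbitrary assumptions made only for this illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
x <- 1:10
tol <- 1e-12  # guards against floating-point rounding in the comparison

z <- max(x) - min(x)              # range of x: 9
d <- min(diff(sort(unique(x))))   # smallest gap between adjacent unique values: 1

a_default <- 1 * d / 5    # amount = NULL, factor = 1
a_zero <- 1 * z / 50      # amount = 0,    factor = 1
a_given <- 2              # amount = 2 is used as is

ok_default <- all(abs(jitter(x) - x) <= a_default + tol)
ok_zero <- all(abs(jitter(x, amount = 0) - x) <= a_zero + tol)
ok_given <- all(abs(jitter(x, amount = 2) - x) <= a_given + tol)
print(c(a_default, a_zero, a_given))
print(c(ok_default, ok_zero, ok_given))
```

Note that a positive amount makes factor irrelevant, which is why jitter(x, factor = 1, amount = 10) and jitter(x, factor = 10, amount = 10) below have the same spread.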

(x = c(1:10))
##  [1]  1  2  3  4  5  6  7  8  9 10
jitter(x)
##  [1] 1.130417 1.986822 2.988324 3.859536 5.000287 5.857188 6.918340
##  [8] 8.169720 8.874839 9.905134
jitter(x, factor = 1)
##  [1]  0.834980  1.835347  3.013703  3.871540  5.081525  6.090221  6.962873
##  [8]  7.851089  8.928137 10.162149
jitter(x, factor = 100)
##  [1]  -2.893356 -17.441460   1.516910   9.771455   7.970764  -8.361613
##  [7]   5.423622  17.443090  22.896337  -4.537215
jitter(x, factor = 1000)
##  [1] -183.377948  134.473433 -184.563341  102.799489   94.493774
##  [6]  -51.109763   -8.312566  104.303829  -70.242552  151.695206
jitter(x, factor = 1, amount = 1)
##  [1] 1.569603 2.476006 2.667142 3.165431 4.174408 5.520519 7.001732
##  [8] 7.417415 8.922810 9.026437
jitter(x, factor = 1, amount = 10)
##  [1]  1.636452 -3.943719 -2.927030  4.313734 -4.221980  9.040693 14.964047
##  [8] 14.971467 16.330573 18.446910
jitter(x, factor = 10, amount = 10)
##  [1] 10.3411754  1.4124063  1.5017123 -1.0730307  7.1035618  7.6606934
##  [7] -2.9521261  1.3903118  0.6792946  8.9670520
jitter(x, factor = 10, amount = 100)
##  [1]  54.23251  64.86327 -14.59972  75.91645  84.13828  51.97269 -12.05809
##  [8] 106.60717  73.01726 -79.91987

jitter() is also useful for data visualization. In a scatter plot where one variable takes only a few distinct values, many points fall on top of each other, which makes the data hard to read.
#Data:
X=rep(1:5, each=50)
a=runif(50 , min=0 , max=10)
Y=c(a-2 , a-3 , a+2, a+4, a+3)
 
par(mfrow = c(1,2))
# plot without jitter (overlapping dots)
plot(X, Y, pch = 22, main = 'Without `jitter()`', cex.main = 0.75)
# plot with jitter
plot(jitter(X), Y, pch = 22, col = c('darkviolet'), xlab="X", ylab="Y", main = 'Using `jitter()`', cex.main = 0.75)

With jitter(), the individual points within each group are much easier to see.
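
The horizontal spread of the jittered points can be controlled with amount. The sketch below assumes the same X as above and an arbitrarily chosen amount of 0.1; it checks that the jittered values no longer coincide but still stay next to their original group.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
X <- rep(1:5, each = 50)

# 0.1 is an assumed value; tune it to the plot at hand
Xj <- jitter(X, amount = 0.1)

n_before <- length(unique(X))    # only 5 distinct positions: heavy overlap
n_after <- length(unique(Xj))    # many more distinct positions after jittering
still_grouped <- all(round(Xj) == X)  # each point stays nearest its original value

print(c(n_before, n_after))
print(still_grouped)
# Xj can then be used in place of X, e.g. plot(Xj, Y)
```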

Tuesday, September 26, 2017

cumsum(), cumprod(), cummax(), cummin() {base}


cumsum(), cumprod(), cummax() and cummin() return a vector whose elements are the cumulative sums, products, maxima and minima, respectively, of the elements of the argument.

cumsum(x)
cumprod(x)
cummax(x)
cummin(x)

The parameters are:
  • x: numeric vector 

x = c(1:5)
x
## [1] 1 2 3 4 5
cumsum(x)
## [1]  1  3  6 10 15
cumprod(x)
## [1]   1   2   6  24 120
par(mfrow = c(1,2))
#cumsum plot
plot(cumsum(x), col = 'gold', type = 'l',  main = 'cumsum() function')
points(x, type = 'b')
#cumprod plot
plot(cumprod(x), col = 'gold', type = 'l',  main = 'cumprod() function')
points(x, type = 'b')
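
One way to think about these functions is as a running fold over the vector. As a sketch, the same values can be obtained with Reduce() from base R and accumulate = TRUE; this comparison is only an illustration and is not part of the original examples.

```r
x <- 1:5

# Keep every intermediate result of the fold, not just the final one
running_sum <- Reduce(`+`, x, accumulate = TRUE)
running_prod <- Reduce(`*`, x, accumulate = TRUE)

sum_matches <- all(running_sum == cumsum(x))
prod_matches <- all(running_prod == cumprod(x))
print(c(sum_matches, prod_matches))

# The last cumulative value equals the overall sum or product
print(tail(cumsum(x), 1) == sum(x))
print(tail(cumprod(x), 1) == prod(x))
```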



y = c(2:7,5:1,1:4)
y
##  [1] 2 3 4 5 6 7 5 4 3 2 1 1 2 3 4
cummin(y)
##  [1] 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1
par(mfrow = c(1,2))
#cummin plot
plot(y, col = 'blue', type = 'b', main = 'cummin() function')
points(cummin(y), col = 'deeppink', type = 'l')
#cummax plot
plot(y, col = 'blue', type = 'b', main = 'cummax() function')
points(cummax(y), col = 'deeppink', type = 'l')
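
The cummax() values drawn in the second plot are not printed above. A short sketch with the same y shows them, together with one assumed use of cummax(): flagging the positions where a new running maximum is reached.

```r
y <- c(2:7, 5:1, 1:4)

running_max <- cummax(y)
print(running_max)

# Position i sets a new record when y[i] exceeds the maximum of all earlier values
is_new_record <- c(TRUE, y[-1] > running_max[-length(y)])
record_positions <- which(is_new_record)
print(record_positions)
```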
We will now use the AirPassengers dataset, which contains the monthly totals of international airline passengers from 1949 to 1960.
head(AirPassengers)
## [1] 112 118 132 129 121 135
# in data frame format (rows = years, columns = months):
AirPas <- matrix(AirPassengers, ncol = 12, byrow = TRUE, dimnames = list(as.character(1949:1960), month.abb))
AirPass = as.data.frame(AirPas)
head(AirPass)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
cumsum(AirPass)
##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1949  112  118  132  129  121  135  148  148  136  119  104  118
## 1950  227  244  273  264  246  284  318  318  294  252  218  258
## 1951  372  394  451  427  418  462  517  517  478  414  364  424
## 1952  543  574  644  608  601  680  747  759  687  605  536  618
## 1953  739  770  880  843  830  923 1011 1031  924  816  716  819
## 1954  943  958 1115 1070 1064 1187 1313 1324 1183 1045  919 1048
## 1955 1185 1191 1382 1339 1334 1502 1677 1671 1495 1319 1156 1326
## 1956 1469 1468 1699 1652 1652 1876 2090 2076 1850 1625 1427 1632
## 1957 1784 1769 2055 2000 2007 2298 2555 2543 2254 1972 1732 1968
## 1958 2124 2087 2417 2348 2370 2733 3046 3048 2658 2331 2042 2305
## 1959 2484 2429 2823 2744 2790 3205 3594 3607 3121 2738 2404 2710
## 1960 2901 2820 3242 3205 3262 3740 4216 4213 3629 3199 2794 3142
cummin(AirPass)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 112 118 132 129 121 135 148 148 136 119 104 118
## 1951 112 118 132 129 121 135 148 148 136 119 104 118
## 1952 112 118 132 129 121 135 148 148 136 119 104 118
## 1953 112 118 132 129 121 135 148 148 136 119 104 118
## 1954 112 118 132 129 121 135 148 148 136 119 104 118
## 1955 112 118 132 129 121 135 148 148 136 119 104 118
## 1956 112 118 132 129 121 135 148 148 136 119 104 118
## 1957 112 118 132 129 121 135 148 148 136 119 104 118
## 1958 112 118 132 129 121 135 148 148 136 119 104 118
## 1959 112 118 132 129 121 135 148 148 136 119 104 118
## 1960 112 118 132 129 121 135 148 148 136 119 104 118
cummax(AirPass)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 196 236 235 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
plot(AirPass$Jan, ylim = c(0,3000), col = 'darkblue', type = 'b')
points(cumsum(AirPass$Jan), type = 'l', col = 'cyan3')
The line shows the cumulative sum of January passengers from 1949 to 1960. The points show the number of passengers in each January.
plot(AirPass$Jan, col = 'darkblue', type = 'b')
points(cummax(AirPass$Jan), type = 'l', col = 'cyan3')
The line shows the cumulative maximum of January passengers from 1949 to 1960, and the points show the number of passengers in each January. Since the number of January passengers increased every year, the line (cumulative maxima) coincides with the points (number of passengers).
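
Note that on a data frame these functions work column by column, so cumsum(AirPass) accumulates each month across the years. If a running total within each year is wanted instead, one possible approach is to apply cumsum() to each row of the matrix. This is a sketch of something a reader might need, not a step from the original examples, and the name within_year is made up.

```r
AirPas <- matrix(AirPassengers, ncol = 12, byrow = TRUE,
                 dimnames = list(as.character(1949:1960), month.abb))

# apply() returns one column per row of AirPas, so transpose back to years x months
within_year <- t(apply(AirPas, 1, cumsum))
print(within_year["1949", ])

# The December column now holds each year's total
dec_is_total <- all(within_year[, "Dec"] == rowSums(AirPas))
print(dec_is_total)
```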

duplicated() {base}

The duplicated() function determines which elements are duplicated and returns a logical vector. The parameters of the function are:   ...