Wednesday, September 6, 2017

merge() {base}


The merge() function joins two dataframes horizontally (column-wise) by matching their rows on one or more key variables.

The main parameters are:
 - x: first dataframe to be merged.
 - y: second dataframe to be merged.
 - by: the variable or variables used to do the merging.
 - incomparables: values that cannot be matched; intended for merging on a single column.
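
Before turning to iris, here is a minimal sketch of how `by` pairs rows. The dataframes `left` and `right` and their columns are made up for this illustration only. With the default settings, merge() keeps just the keys found in both inputs.

```r
# Hypothetical toy data, not part of the iris example
left  <- data.frame(key = c(1, 2, 3), a = c("x", "y", "z"))
right <- data.frame(key = c(2, 3, 4), b = c(TRUE, FALSE, TRUE))

# Rows are paired on 'key'; keys 1 and 4 have no partner and are dropped
both <- merge(left, right, by = "key")
both
```

The result has one row per shared key, with the key column first, followed by the remaining columns of `left` and then those of `right`.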

#dataframe 1:
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
iris$ID <- row.names(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species ID
## 1          5.1         3.5          1.4         0.2  setosa  1
## 2          4.9         3.0          1.4         0.2  setosa  2
## 3          4.7         3.2          1.3         0.2  setosa  3
## 4          4.6         3.1          1.5         0.2  setosa  4
## 5          5.0         3.6          1.4         0.2  setosa  5
## 6          5.4         3.9          1.7         0.4  setosa  6
#dataframe 2:
ID <- c(1,3,5,7,9,11,13)
y <- rep('test', 7)
Sepal.Length <- c(5.1,5.1,4.9,5.1,5.1,5.1,4.8)
z <- data.frame(ID,y, Sepal.Length)
z
##   ID    y Sepal.Length
## 1  1 test          5.1
## 2  3 test          5.1
## 3  5 test          4.9
## 4  7 test          5.1
## 5  9 test          5.1
## 6 11 test          5.1
## 7 13 test          4.8

To get only the rows whose ID value appears in both dataframes:
#merge dataframes by 'ID' variable:
merge(iris, z, by = 'ID')
##   ID Sepal.Length.x Sepal.Width Petal.Length Petal.Width Species    y
## 1  1            5.1         3.5          1.4         0.2  setosa test
## 2 11            5.4         3.7          1.5         0.2  setosa test
## 3 13            4.8         3.0          1.4         0.1  setosa test
## 4  3            4.7         3.2          1.3         0.2  setosa test
## 5  5            5.0         3.6          1.4         0.2  setosa test
## 6  7            4.6         3.4          1.4         0.3  setosa test
## 7  9            4.4         2.9          1.4         0.2  setosa test
##   Sepal.Length.y
## 1            5.1
## 2            5.1
## 3            4.8
## 4            5.1
## 5            4.9
## 6            5.1
## 7            5.1
We get 7 rows.
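
The IDs above come back in the order 1, 11, 13, 3, ... because row.names(iris) is a character vector, so the merged result is sorted as text rather than as numbers. One possible workaround, sketched here on a copy called `iris2` (a name introduced only for this illustration), is to store the ID as an integer before merging:

```r
# Work on a copy so the original iris is left untouched (iris2 is a hypothetical name)
iris2 <- iris
iris2$ID <- seq_len(nrow(iris2))   # integer key instead of character row names

# Same second dataframe as in the post
z <- data.frame(ID = c(1, 3, 5, 7, 9, 11, 13),
                y = rep("test", 7),
                Sepal.Length = c(5.1, 5.1, 4.9, 5.1, 5.1, 5.1, 4.8))

merged <- merge(iris2, z, by = "ID")
merged$ID   # the keys are now sorted numerically
```

The same 7 rows are returned; only their order changes.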

To get only the rows that match on the ID variable, excluding the values that we put in the incomparables parameter:
#merge dataframes by 'ID' variable, excluding the `incomparables` values:
merge(iris, z, by = 'ID', incomparables = c(1,3,4,5,6,7))
##   ID Sepal.Length.x Sepal.Width Petal.Length Petal.Width Species    y
## 1 11            5.4         3.7          1.5         0.2  setosa test
## 2 13            4.8         3.0          1.4         0.1  setosa test
## 3  3            4.7         3.2          1.3         0.2  setosa test
## 4  9            4.4         2.9          1.4         0.2  setosa test
##   Sepal.Length.y
## 1            5.1
## 2            4.8
## 3            5.1
## 4            5.1
Here, we get 4 rows. 

To get the rows that match on both the ID and Sepal.Length variables:
#merge dataframes by 'ID' and 'Sepal.Length' variables:
merge(iris, z, by = c('ID', 'Sepal.Length'))
##   ID Sepal.Length Sepal.Width Petal.Length Petal.Width Species    y
## 1  1          5.1         3.5          1.4         0.2  setosa test
## 2 13          4.8         3.0          1.4         0.1  setosa test
Here, we get only 2 rows.
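
To see why only two rows remain, one can compare, for each row of z, the Sepal.Length stored there with the iris value at the same ID. This is a rough sketch that assumes the ID values are valid row positions in iris; the helper names `iris_len` and `same` are made up for the illustration.

```r
# Same second dataframe as in the post
z <- data.frame(ID = c(1, 3, 5, 7, 9, 11, 13),
                y = rep("test", 7),
                Sepal.Length = c(5.1, 5.1, 4.9, 5.1, 5.1, 5.1, 4.8))

# Sepal.Length recorded in iris at the row positions given by z$ID
iris_len <- iris$Sepal.Length[z$ID]

# TRUE where the Sepal.Length in z equals the iris value for that ID
same <- z$Sepal.Length == iris_len
z$ID[same]
```

Only IDs 1 and 13 agree on both variables, which is why the two-key merge keeps just those rows.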

