Saturday, October 14, 2017

var() {stats}


The var() function computes the variance of the values in x. If y is also supplied, it computes the covariance between x and y.

var(x, y = NULL, na.rm = FALSE)

The arguments are:
  • x: a numeric vector, matrix or data frame.
  • y: NULL (default) or a vector, matrix or data frame with dimensions compatible with x.
  • na.rm: logical value indicating whether NA values should be removed before the computation proceeds.
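
As a small illustration of x being more than a plain vector, the sketch below passes the four numeric columns of the built-in iris data set to var(). It relies on the documented behaviour of var() in the stats package, where a matrix or data frame input gives the covariance matrix of its columns. The names num_cols and cov_mat are only illustrative.

```r
# Numeric columns of the built-in iris data set
num_cols = iris[, 1:4]
# With a data frame, var() returns the covariance matrix of the columns
cov_mat = var(num_cols)
dim(cov_mat) #one row and one column per numeric variable
# The diagonal holds the variance of each individual column
all.equal(cov_mat["Sepal.Length", "Sepal.Length"], var(iris$Sepal.Length))
```

The off-diagonal entries are the pairwise covariances, the same values that var(x, y) returns for two vectors.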

Formula to calculate the standard deviation (the variance is its square, σ²):
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

The variance of a data set measures the dispersion of the data around the mean. However, this value is difficult to interpret in a real-world sense because the deviations used to calculate it are squared, so the result is in squared units.
The standard deviation, as the square root of the variance, gives a value that is in the same units as the original values, which makes it much easier to work with and to interpret. (https://rfunctionaday.blogspot.com.es/2017/10/sd-base.html)
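
The relationship between the two measures can be checked directly. This is a minimal sketch using the same kind of small vector as the examples in this post. The name sd_from_var is only illustrative.

```r
x = c(1,6,10,23,4,5,56)
sd_from_var = sqrt(var(x)) #square root of the variance
# sd() should agree with the square root of var()
all.equal(sd_from_var, sd(x))
```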

x:
x = c(1,6,10,23,4,5,56)
var1 = sum((x - mean(x))^2)/(length(x)-1)
var2 = var(x)
var1 ; var2 #same result
## [1] 378
## [1] 378
y:
var(1:10); var(1:10,1:10) #same result
## [1] 9.166667
## [1] 9.166667
y = c(10,45,10,3,24,54,5)
var1 = var(x); var3 = var(y)
var4 = var(x,y) #with two vectors, var() returns their covariance
var1 ; var3 ; var4
## [1] 378
## [1] 415.619
## [1] -195
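
The negative value returned by var(x, y) is the sample covariance of the two vectors. A short sketch, assuming the usual n - 1 denominator, reproduces it by hand and compares it with cov(). The name cov_manual is only illustrative.

```r
x = c(1,6,10,23,4,5,56)
y = c(10,45,10,3,24,54,5)
# Sample covariance: sum of the products of the deviations, divided by n - 1
cov_manual = sum((x - mean(x)) * (y - mean(y)))/(length(x)-1)
# The manual value, var(x, y) and cov(x, y) should all agree
all.equal(cov_manual, var(x, y))
all.equal(var(x, y), cov(x, y))
```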
na.rm:
z = c(1,6,10,23,NA, NA, 4,5,56, 56, NA)
var3 = sum((z - mean(z))^2)/(length(z)-1) #NA, because mean(z) is NA when z contains NA values
var4 = var(z) #NA, because na.rm is FALSE by default
var5 = var(z, na.rm = TRUE) #remove NA values to compute var
var3 ; var4 ; var5
## [1] NA
## [1] NA
## [1] 534.125
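
To see what na.rm = TRUE does, the sketch below drops the NA values by hand and applies the same formula as in the first example. The names z_clean and var_manual are only illustrative.

```r
z = c(1,6,10,23,NA, NA, 4,5,56, 56, NA)
z_clean = z[!is.na(z)] #keep only the observed values
# Same formula as before, applied to the cleaned vector
var_manual = sum((z_clean - mean(z_clean))^2)/(length(z_clean)-1)
all.equal(var_manual, var(z, na.rm = TRUE))
```

Note that n here is the number of non-missing values, not the length of the original vector.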
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
meanset = mean(iris$Sepal.Length[iris$Species=='setosa'])
meanversi = mean(iris$Sepal.Length[iris$Species=='versicolor'])
meanvir = mean(iris$Sepal.Length[iris$Species=='virginica'])

sdset = sd(iris$Sepal.Length[iris$Species=='setosa'])
sdversi = sd(iris$Sepal.Length[iris$Species=='versicolor'])
sdvir = sd(iris$Sepal.Length[iris$Species=='virginica'])

varset = var(iris$Sepal.Length[iris$Species=='setosa'])
varversi = var(iris$Sepal.Length[iris$Species=='versicolor'])
varvir = var(iris$Sepal.Length[iris$Species=='virginica'])
plot(iris$Species, iris$Sepal.Length, col = 'thistle1', main = 'SD/Var')
#SD segments
segments(1, meanset+sdset,1, meanset-sdset, col = 'deeppink', lwd = 5)
segments(2, meanversi+ sdversi, 2, meanversi-sdversi, col = 'deeppink', lwd = 5)
segments(3, meanvir + sdvir, 3, meanvir-sdvir, col = 'deeppink',lwd = 5)
#var segments
segments(1, meanset+varset,1, meanset-varset, col = 'darkviolet', lwd = 5)
segments(2, meanversi+ varversi, 2, meanversi-varversi, col = 'darkviolet', lwd = 5)
segments(3, meanvir + varvir, 3, meanvir-varvir, col = 'darkviolet',lwd = 5)
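
The per-species values used for the segments can also be computed in one step each. This is one possible shortcut using tapply(), not the approach used in the code above. The names group_mean, group_sd and group_var are only illustrative.

```r
# Apply each function to Sepal.Length within every level of Species
group_mean = tapply(iris$Sepal.Length, iris$Species, mean)
group_sd = tapply(iris$Sepal.Length, iris$Species, sd)
group_var = tapply(iris$Sepal.Length, iris$Species, var)
# One column per species, one row per statistic
rbind(mean = group_mean, sd = group_sd, var = group_var)
```

Each result is a named vector, so group_var[["setosa"]] gives the same value as varset in the code above.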
