Monday, August 7, 2017

summary() {base}


summary() is a generic function that returns summary values for an object. For a numeric vector, or for the numeric columns of a data frame, the summary values are Min., 1st Qu., Median, Mean, 3rd Qu. and Max.; factor columns are summarised by the count of each level:
summary(1:100)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   25.75   50.50   50.50   75.25  100.00
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
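
The result of summary() does not have to be printed; it can also be stored and queried. The following is a minimal sketch of that idea (the variable name s is only illustrative):

```r
# Store the result instead of printing it
s <- summary(1:100)

names(s)                    # labels of the six summary values
s[["Median"]]               # 50.5, the same value shown in the printed output
s[["Max."]] - s[["Min."]]   # range computed from the stored values

# For a data frame the result is a character table, one column per variable
class(summary(iris))
```

Extracting by name like this is usually more robust than relying on the position of a value in the printed output.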

The function has the following parameters:
 object: the object to summarise
 maxsum: maximum number of levels shown for factors; the remaining levels are grouped under (Other)
 digits: number of significant digits used when formatting the values
summary(1:100)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   25.75   50.50   50.50   75.25  100.00
summary(1:100, digits = 1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1      30      50      50      80     100
summary(iris, digits = 1)
##   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width        Species  
##  Min.   :4     Min.   :2    Min.   :1     Min.   :0.1   setosa    :50  
##  1st Qu.:5     1st Qu.:3    1st Qu.:2     1st Qu.:0.3   versicolor:50  
##  Median :6     Median :3    Median :4     Median :1.3   virginica :50  
##  Mean   :6     Mean   :3    Mean   :4     Mean   :1.2                  
##  3rd Qu.:6     3rd Qu.:3    3rd Qu.:5     3rd Qu.:1.8                  
##  Max.   :8     Max.   :4    Max.   :7     Max.   :2.5
summary(iris$Sepal.Width, digits = 2) #digits = 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0     2.8     3.0     3.1     3.3     4.4
summary(iris$Sepal.Width, digits = 4) #digits = 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.800   3.000   3.057   3.300   4.400
summary(iris, maxsum = 2, digits = 4) #maxsum = 2
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##     Species   
##  setosa : 50  
##  (Other):100  
##               
##               
##               
## 
summary(iris, maxsum = 1, digits = 4) #maxsum = 1
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##     Species   
##  (Other):150  
##               
##               
##               
##               
## 
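
The effect of maxsum is easier to see on a single factor. This is a small sketch with a made-up factor f (not part of the examples above): with three levels and maxsum = 2, only the most frequent level is kept and the rest are added up under (Other).

```r
# Hypothetical factor with three levels: a (3 times), b (2 times), c (once)
f <- factor(c("a", "a", "a", "b", "b", "c"))

summary(f)               # counts for every level: a = 3, b = 2, c = 1
res <- summary(f, maxsum = 2)
res                      # the most frequent level plus "(Other)"
names(res)
```

Because one slot is taken by (Other), maxsum = 2 leaves room for just one named level, which matches the iris output above where only setosa is listed.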

summary() can also be applied to the result of other functions such as lm() (linear model), where it reports different summary values: Residuals, Coefficients, Residual standard error, Multiple R-squared, Adjusted R-squared, F-statistic and p-value:
lmod <- lm(Temp~Ozone+Solar.R+Wind, data = airquality)
lmod
## 
## Call:
## lm(formula = Temp ~ Ozone + Solar.R + Wind, data = airquality)
## 
## Coefficients:
## (Intercept)        Ozone      Solar.R         Wind  
##   72.418579     0.171966     0.007276    -0.322945
summary(lmod)
## 
## Call:
## lm(formula = Temp ~ Ozone + Solar.R + Wind, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.942  -4.996   1.283   4.434  13.168 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 72.418579   3.215525  22.522  < 2e-16 ***
## Ozone        0.171966   0.026390   6.516 2.42e-09 ***
## Solar.R      0.007276   0.007678   0.948    0.345    
## Wind        -0.322945   0.233264  -1.384    0.169    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.834 on 107 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.4999, Adjusted R-squared:  0.4858 
## F-statistic: 35.65 on 3 and 107 DF,  p-value: 4.729e-16
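
The values printed by summary(lmod) can also be read individually from the returned object. A short sketch, assuming the same model as above (the name s is only illustrative):

```r
# Same model as in the example above
lmod <- lm(Temp ~ Ozone + Solar.R + Wind, data = airquality)
s <- summary(lmod)

s$r.squared        # Multiple R-squared
s$adj.r.squared    # Adjusted R-squared
s$sigma            # Residual standard error
s$fstatistic       # F-statistic with its degrees of freedom
coef(s)            # coefficient table as a numeric matrix

# p-value of a single term, taken from the coefficient table
coef(s)["Ozone", "Pr(>|t|)"]
```

This is handy when the numbers are needed later in a script, for example to compare several models, rather than only read on screen.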
