Friday, September 15, 2017

colSums(), rowSums(), colMeans(), rowMeans() {base}


colSums(), colMeans(), rowSums() and rowMeans() are functions that return the sums or means of the columns or rows of numeric arrays or data frames.

colSums (x, na.rm = FALSE, dims = 1) 
rowSums (x, na.rm = FALSE, dims = 1)
colMeans(x, na.rm = FALSE, dims = 1)
rowMeans(x, na.rm = FALSE, dims = 1)

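The parameters are:
 -x: array of two or more dimensions, or a numeric data frame
 -na.rm: logical; if TRUE, NA values are removed before the sum or mean is computed
 -dims: integer; which dimensions are treated as rows or columns. The row functions sum or average over dimensions dims+1 and up; the column functions sum or average over dimensions 1 to dims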
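
The dims argument is easiest to see on an array with more than two dimensions. The following is a minimal sketch on a made-up 2 x 3 x 4 array (arr and the result names are illustrative, not part of the examples in this post):

```r
# Hypothetical 3-dimensional array: 2 rows, 3 columns, 4 slices
arr <- array(1:24, dim = c(2, 3, 4))

# dims = 1 (default): sum over the first dimension only,
# leaving a 3 x 4 matrix
by_col <- colSums(arr)
dim(by_col)

# dims = 2: sum over the first two dimensions,
# leaving one total per slice (a vector of length 4)
by_slice <- colSums(arr, dims = 2)
by_slice

# rowSums with dims = 2: sum over the remaining (third) dimension,
# leaving a 2 x 3 matrix
over_last <- rowSums(arr, dims = 2)
dim(over_last)
```

For a plain matrix or data frame the default dims = 1 is all that is needed, which is why the rest of the examples leave it out.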

First, we will use the AirPassengers dataset, which contains the monthly totals of international airline passengers from 1949 to 1960.
head(AirPassengers)
## [1] 112 118 132 129 121 135
# in a data frame format:
AirPas <- matrix(AirPassengers, ncol = 12, byrow = TRUE, dimnames = list(as.character(1949:1960), month.abb))
AirPass <- as.data.frame(AirPas)
head(AirPass)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
dim(AirPass)
## [1] 12 12
colSums(AirPass)
##  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec 
## 2901 2820 3242 3205 3262 3740 4216 4213 3629 3199 2794 3142
rowSums(AirPass)
## 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 
## 1520 1676 2042 2364 2700 2867 3408 3939 4421 4572 5140 5714
colMeans(AirPass)
##      Jan      Feb      Mar      Apr      May      Jun      Jul      Aug 
## 241.7500 235.0000 270.1667 267.0833 271.8333 311.6667 351.3333 351.0833 
##      Sep      Oct      Nov      Dec 
## 302.4167 266.5833 232.8333 261.8333
rowMeans(AirPass)
##     1949     1950     1951     1952     1953     1954     1955     1956 
## 126.6667 139.6667 170.1667 197.0000 225.0000 238.9167 284.0000 328.2500 
##     1957     1958     1959     1960 
## 368.4167 381.0000 428.3333 476.1667
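
These four functions give the same results as apply() with sum or mean over the matching margin, but they are written for speed. A small sketch of that equivalence, rebuilding the data frame so the block runs on its own (same_col_sums and same_row_means are illustrative names):

```r
# Rebuild the 12 x 12 data frame of yearly rows and monthly columns
AirPass <- as.data.frame(matrix(AirPassengers, ncol = 12, byrow = TRUE,
                                dimnames = list(as.character(1949:1960), month.abb)))

# Margin 2 = columns, margin 1 = rows
same_col_sums  <- all.equal(colSums(AirPass),  apply(AirPass, 2, sum))
same_row_means <- all.equal(rowMeans(AirPass), apply(AirPass, 1, mean))

same_col_sums
same_row_means
```

Both comparisons are expected to be TRUE here; on large matrices the dedicated functions are usually the faster choice.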
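AirPass$Total_per_year = rowSums(AirPass)
# Note: AirPass now has 13 columns, so this mean also includes Total_per_year;
# use rowMeans(AirPass[, month.abb]) for the mean over the 12 months only
AirPass$Mean_per_year = rowMeans(AirPass)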

Total_per_month = colSums(AirPass)
Mean_per_month = colMeans(AirPass)

new = rbind(AirPass, Total_per_month, Mean_per_month)
rownames(new) = c(row.names(AirPass), 'Total_per_month', 'Mean_per_month')
new
##                     Jan  Feb       Mar       Apr       May       Jun
## 1949             112.00  118  132.0000  129.0000  121.0000  135.0000
## 1950             115.00  126  141.0000  135.0000  125.0000  149.0000
## 1951             145.00  150  178.0000  163.0000  172.0000  178.0000
## 1952             171.00  180  193.0000  181.0000  183.0000  218.0000
## 1953             196.00  196  236.0000  235.0000  229.0000  243.0000
## 1954             204.00  188  235.0000  227.0000  234.0000  264.0000
## 1955             242.00  233  267.0000  269.0000  270.0000  315.0000
## 1956             284.00  277  317.0000  313.0000  318.0000  374.0000
## 1957             315.00  301  356.0000  348.0000  355.0000  422.0000
## 1958             340.00  318  362.0000  348.0000  363.0000  435.0000
## 1959             360.00  342  406.0000  396.0000  420.0000  472.0000
## 1960             417.00  391  419.0000  461.0000  472.0000  535.0000
## Total_per_month 2901.00 2820 3242.0000 3205.0000 3262.0000 3740.0000
## Mean_per_month   241.75  235  270.1667  267.0833  271.8333  311.6667
##                       Jul       Aug       Sep       Oct       Nov
## 1949             148.0000  148.0000  136.0000  119.0000  104.0000
## 1950             170.0000  170.0000  158.0000  133.0000  114.0000
## 1951             199.0000  199.0000  184.0000  162.0000  146.0000
## 1952             230.0000  242.0000  209.0000  191.0000  172.0000
## 1953             264.0000  272.0000  237.0000  211.0000  180.0000
## 1954             302.0000  293.0000  259.0000  229.0000  203.0000
## 1955             364.0000  347.0000  312.0000  274.0000  237.0000
## 1956             413.0000  405.0000  355.0000  306.0000  271.0000
## 1957             465.0000  467.0000  404.0000  347.0000  305.0000
## 1958             491.0000  505.0000  404.0000  359.0000  310.0000
## 1959             548.0000  559.0000  463.0000  407.0000  362.0000
## 1960             622.0000  606.0000  508.0000  461.0000  390.0000
## Total_per_month 4216.0000 4213.0000 3629.0000 3199.0000 2794.0000
## Mean_per_month   351.3333  351.0833  302.4167  266.5833  232.8333
##                       Dec Total_per_year Mean_per_year
## 1949             118.0000       1520.000      233.8462
## 1950             140.0000       1676.000      257.8462
## 1951             166.0000       2042.000      314.1538
## 1952             194.0000       2364.000      363.6923
## 1953             201.0000       2700.000      415.3846
## 1954             229.0000       2867.000      441.0769
## 1955             278.0000       3408.000      524.3077
## 1956             306.0000       3939.000      606.0000
## 1957             336.0000       4421.000      680.1538
## 1958             337.0000       4572.000      703.3846
## 1959             405.0000       5140.000      790.7692
## 1960             432.0000       5714.000      879.0769
## Total_per_month 3142.0000      40363.000     6209.6923
## Mean_per_month   261.8333       3363.583      517.4744
par(mfrow = c(1, 2))
# Convert the year labels to numbers and the data frame row to a plain numeric vector before plotting
plot(as.numeric(row.names(new)[1:12]), new$Total_per_year[1:12], type = 's', las = 2, ylab = 'Number of passengers', xlab = 'Year', main = 'Total number of pass. per year', col = 'purple3')
plot(1:12, unlist(new['Mean_per_month', 1:12]), type = 's', xlab = 'Month', ylab = 'Number of passengers', las = 2, main = 'Mean number of pass. per month', col = 'violetred2')
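
Note that in the table above Mean_per_year was computed after Total_per_year had been added, so each value averages 13 columns (for 1949: (1520 + 1520) / 13 = 233.8462) rather than the 12 months. If the mean over the months alone is wanted, one option is to select the month columns explicitly. A minimal sketch, assuming the same AirPass data frame rebuilt from scratch:

```r
# Rebuild the data frame so the block runs on its own
AirPass <- as.data.frame(matrix(AirPassengers, ncol = 12, byrow = TRUE,
                                dimnames = list(as.character(1949:1960), month.abb)))

# Restrict both calculations to the 12 month columns,
# so the order in which the new columns are added no longer matters
AirPass$Total_per_year <- rowSums(AirPass[, month.abb])
AirPass$Mean_per_year  <- rowMeans(AirPass[, month.abb])

AirPass["1949", c("Total_per_year", "Mean_per_year")]
```

With this selection the 1949 mean is 1520 / 12, the same value rowMeans() returned earlier on the unmodified data frame.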

na.rm parameter:
To illustrate this parameter we will use the airquality dataset, which contains missing values. It holds daily air quality measurements in New York, from May to September 1973.
head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
summary(airquality) 
##      Ozone           Solar.R           Wind             Temp      
##  Min.   :  1.00   Min.   :  7.0   Min.   : 1.700   Min.   :56.00  
##  1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400   1st Qu.:72.00  
##  Median : 31.50   Median :205.0   Median : 9.700   Median :79.00  
##  Mean   : 42.13   Mean   :185.9   Mean   : 9.958   Mean   :77.88  
##  3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500   3rd Qu.:85.00  
##  Max.   :168.00   Max.   :334.0   Max.   :20.700   Max.   :97.00  
##  NA's   :37       NA's   :7                                       
##      Month            Day      
##  Min.   :5.000   Min.   : 1.0  
##  1st Qu.:6.000   1st Qu.: 8.0  
##  Median :7.000   Median :16.0  
##  Mean   :6.993   Mean   :15.8  
##  3rd Qu.:8.000   3rd Qu.:23.0  
##  Max.   :9.000   Max.   :31.0  
## 
colSums(airquality) # NAs kept: any column containing an NA returns NA
##   Ozone Solar.R    Wind    Temp   Month     Day 
##      NA      NA  1523.5 11916.0  1070.0  2418.0
colSums(airquality, na.rm = TRUE) # NAs removed before summing
##   Ozone Solar.R    Wind    Temp   Month     Day 
##  4887.0 27146.0  1523.5 11916.0  1070.0  2418.0
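
A related idiom is to combine these functions with is.na() to count the missing values themselves, since TRUE is summed as 1 and FALSE as 0. A short sketch (na_per_column and na_per_row are illustrative names):

```r
# is.na() returns a logical matrix; summing it counts the NAs
na_per_column <- colSums(is.na(airquality))
na_per_column

# The same idea per row: how many variables are missing on each day
na_per_row <- rowSums(is.na(airquality))
head(na_per_row)
```

The column counts should agree with the NA's line reported by summary(airquality): 37 for Ozone and 7 for Solar.R.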
rowSums(airquality)
##   [1] 311.4 241.0 255.6 413.5    NA    NA 407.6 203.8 122.1    NA    NA
##  [12] 367.7 394.2 385.9 174.2 444.5 441.0 182.4 455.5 151.7 103.7 447.6
##  [23] 127.7 226.0    NA    NA    NA 148.0 426.9 457.7 435.4    NA    NA
##  [34]    NA    NA    NA    NA 260.7    NA 480.8 476.5    NA    NA 280.0
##  [45]    NA    NA 325.9 436.7 155.2 241.5 262.3    NA    NA    NA    NA
##  [56]    NA    NA    NA    NA    NA    NA 500.1 400.2 368.2    NA 338.6
##  [67] 460.9 460.1 477.3 482.7 373.4    NA 380.3 317.9    NA 171.3 418.9
##  [78] 425.3 461.3 384.1 406.5 131.9    NA    NA 499.6 456.0 224.6 266.0
##  [89] 425.4 454.4 444.4 441.2 218.9 137.8 193.4    NA    NA    NA 485.0
## [100] 434.3 432.0    NA    NA 353.5 415.5 333.7    NA 204.3 220.3 247.4
## [111] 390.9 350.3 401.5 161.3    NA 377.7 523.4 416.0    NA 421.7 476.3
## [122] 461.3 412.3 370.9 383.1 363.8 390.6 250.4 238.5 378.9 348.3 354.9
## [133] 384.7 395.9 392.5 371.3 137.9 231.5 392.9 348.8 153.3 368.3 336.0
## [144] 357.6 148.2 298.3 168.3 147.6 334.9    NA 331.3 271.0 361.5
rowSums(airquality, na.rm = TRUE)
##   [1] 311.4 241.0 255.6 413.5  80.3 119.9 407.6 203.8 122.1 286.6 103.9
##  [12] 367.7 394.2 385.9 174.2 444.5 441.0 182.4 455.5 151.7 103.7 447.6
##  [23] 127.7 226.0 169.6 369.9  97.0 148.0 426.9 457.7 435.4 379.6 378.7
##  [34] 334.1 289.2 324.6 369.3 260.7 380.9 480.8 476.5 379.9 369.2 280.0
##  [45] 445.8 433.5 325.9 436.7 155.2 241.5 262.3 260.3 164.7 200.6 362.3
##  [56] 249.0 245.0 163.3 223.5 157.9 265.0 500.1 400.2 368.2 206.9 338.6
##  [67] 460.9 460.1 477.3 482.7 373.4 247.6 380.3 317.9 417.9 171.3 418.9
##  [78] 425.3 461.3 384.1 406.5 131.9 377.7 418.5 499.6 456.0 224.6 266.0
##  [89] 425.4 454.4 444.4 441.2 218.9 137.8 193.4 182.9 140.4 171.6 485.0
## [100] 434.3 432.0 340.6 253.5 353.5 415.5 333.7 177.5 204.3 220.3 247.4
## [111] 390.9 350.3 401.5 161.3 373.6 377.7 523.4 416.0 281.7 421.7 476.3
## [122] 461.3 412.3 370.9 383.1 363.8 390.6 250.4 238.5 378.9 348.3 354.9
## [133] 384.7 395.9 392.5 371.3 137.9 231.5 392.9 348.8 153.3 368.3 336.0
## [144] 357.6 148.2 298.3 168.3 147.6 334.9 271.2 331.3 271.0 361.5
colMeans(airquality)
##     Ozone   Solar.R      Wind      Temp     Month       Day 
##        NA        NA  9.957516 77.882353  6.993464 15.803922
colMeans(airquality, na.rm = TRUE)
##      Ozone    Solar.R       Wind       Temp      Month        Day 
##  42.129310 185.931507   9.957516  77.882353   6.993464  15.803922
head(rowMeans(airquality))
## [1] 51.90000 40.16667 42.60000 68.91667       NA       NA
head(rowMeans(airquality, na.rm = TRUE))
## [1] 51.90000 40.16667 42.60000 68.91667 20.07500 23.98000
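
With na.rm = TRUE the mean is divided by the number of non-missing values, not by the total number of columns. The sketch below checks this on row 5 of airquality, which has two NAs, and also shows the edge case of a row with no observed values at all (row5, n_obs and all_na are illustrative names):

```r
# Row 5 has NA for Ozone and Solar.R, leaving 4 observed values
row5  <- airquality[5, ]
n_obs <- sum(!is.na(row5))

# The mean with na.rm = TRUE equals the NA-free sum divided by n_obs
manual_mean <- unname(rowSums(row5, na.rm = TRUE)) / n_obs
builtin_mean <- unname(rowMeans(row5, na.rm = TRUE))
c(manual = manual_mean, builtin = builtin_mean)

# Edge case: a row made up only of NAs
all_na <- matrix(NA_real_, nrow = 1, ncol = 3)
rowSums(all_na, na.rm = TRUE)   # sum of nothing is 0
rowMeans(all_na, na.rm = TRUE)  # 0 / 0 gives NaN
```

This is why row 5 goes from NA to 20.075 (80.3 / 4) in the output above rather than to 80.3 / 6.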
