Statistical Analysis: an Introduction using R/R/Vectors
One of the most fundamental objects in R is the vector, used to store multiple measurements of the same type (e.g. data variables). There are several different sorts of data that can be stored in a vector. Most common is the numeric vector, in which each element of the vector is simply a number. Other commonly used types of vector are character vectors (where each element is a piece of text) and logical vectors (where each element is either
Note that the first part of your output may look slightly different to that above. Depending on the width of your screen, the number of elements printed on each line of output may differ. This is the reason for the numbers in square brackets, which are produced when vectors are printed to the screen. These bracketed numbers give the position of the first element on that line, which is a useful visual aid. For instance, looking at the printout of state.name, and counting across from the second line, we can tell that the eighth state is Delaware.
TRUE
or FALSE
[1]). In this topic we will use some example vectors provided by the "datasets" package, containing data on States of the USA (see ?state
).
R is an inherently vector-based program; in fact the numbers we have been using in previous calculations are just treated as vectors with a single element. This means that most basic functions in R will behave sensibly when given a vector as a argument, as shown below.
Input:
1 state.area #a NUMERIC vector giving the area of US states, in square miles
2 state.name #a CHARACTER vector (note the quote marks) of state names
3 sq.km <- state.area*2.59 #Arithmetic works on numeric vectors, e.g. convert sq miles to sq km
4 sq.km #... the new vector has the calculation applied to each element in turn
5 sqrt(sq.km) #Many mathematical functions also apply to each element in turn
6 range(state.area) #But some functions return different length vectors (here, just the max & min).
7 length(state.area) #and some, like this useful one, just return a single value.
Result:
> state.area #a NUMERIC vector giving the area of US states, in square miles [1] 51609 589757 113909 53104 158693 104247 5009 2057 58560 58876 6450 83557 56400 [14] 36291 56290 82264 40395 48523 33215 10577 8257 58216 84068 47716 69686 147138 [27] 77227 110540 9304 7836 121666 49576 52586 70665 41222 69919 96981 45333 1214 [40] 31055 77047 42244 267339 84916 9609 40815 68192 24181 56154 97914 > state.name #a CHARACTER vector (note the quote marks) of state names [1] "Alabama" "Alaska" "Arizona" "Arkansas" [5] "California" "Colorado" "Connecticut" "Delaware" [9] "Florida" "Georgia" "Hawaii" "Idaho" [13] "Illinois" "Indiana" "Iowa" "Kansas" [17] "Kentucky" "Louisiana" "Maine" "Maryland" [21] "Massachusetts" "Michigan" "Minnesota" "Mississippi" [25] "Missouri" "Montana" "Nebraska" "Nevada" [29] "New Hampshire" "New Jersey" "New Mexico" "New York" [33] "North Carolina" "North Dakota" "Ohio" "Oklahoma" [37] "Oregon" "Pennsylvania" "The smallest state" "South Carolina" [41] "South Dakota" "Tennessee" "Texas" "Utah" [45] "Vermont" "Virginia" "Washington" "West Virginia" [49] "Wisconsin" "Wyoming" > sq.km <- state.area*2.59 #Standard arithmatic works on numeric vectors, e.g. convert sq miles to sq km > sq.km #... giving another vector with the calculation performed on each element in turn [1] 133667.31 1527470.63 295024.31 137539.36 411014.87 269999.73 12973.31 5327.63 [9] 151670.40 152488.84 16705.50 216412.63 146076.00 93993.69 145791.10 213063.76 [17] 104623.05 125674.57 86026.85 27394.43 21385.63 150779.44 217736.12 123584.44 [25] 180486.74 381087.42 200017.93 286298.60 24097.36 20295.24 315114.94 128401.84 [33] 136197.74 183022.35 106764.98 181090.21 251180.79 117412.47 3144.26 80432.45 [41] 199551.73 109411.96 692408.01 219932.44 24887.31 105710.85 176617.28 62628.79 [49] 145438.86 253597.26 > sqrt(sq.km) #Many mathematical functions also apply to each element in turn [1] 365.60540 1235.90883 543.16140 370.86299 641.10441 519.61498 113.90044 72.99062 [9] 389.44884 390.49819 129.24976 465.20171 382.19890 306.58390 381.82601 461.58830 [17] 323.45487 354.50609 293.30334 165.51263 146.23826 388.30328 466.62203 351.54579 [25] 424.83731 617.32278 447.23364 535.06878 155.23324 142.46136 561.35100 358.33202 [33] 369.04978 427.81111 326.74911 425.54695 501.17940 342.65503 56.07370 283.60615 [41] 446.71213 330.77479 832.11058 468.96955 157.75712 325.13205 420.25859 250.25745 [49] 381.36447 503.58441 > range(state.area) #But some functions return different length vectors (here, just the max & min). [1] 1214 589757 > length(state.area) #and some, like this useful one, just return a single value. [1] 50
You may occasionally need to create your own vectors from scratch (although most vectors are obtained from processing data in already-existing files). The most commonly used function for constructing vectors is
c()
, so named because it concatenates objects together. However, if you wish to create vectors consisting of regular sequences of numbers (e.g. 2,4,6,8,10,12, or 1,1,2,2,1,1,2,2) there are several alternative functions you can use, including seq()
, rep()
, and the :
operator.
Input:
1 c("one", "two", "three", "pi") #Make a character vector
2 c(1,2,3,pi) #Make a numeric vector
3 seq(1,3) #Create a sequence of numbers
4 1:3 #A shortcut for the same thing (but less flexible)
5 i <- 1:3 #You can store a vector
6 i
7 i <- c(i,pi) #To add more elements, you must assign again, e.g. using c()
8 i
9 i <- c(i, "text") #A vector cannot contain different data types, so ...
10 i #... R converts all elements to the same type
11 i+1 #The numbers are now strings of text: arithmetic is impossible
12 rep(1, 10) #The "rep" function repeats its first argument
13 rep(3:1,10) #The first argument can also be a vector
14 huge.vector <- 0:(10^7) #R can easily cope with very big vectors
15 #huge.vector #VERY BAD IDEA TO UNCOMMENT THIS, unless you want to print out 10 million numbers
16 rm(huge.vector) #"rm" removes objects. Deleting huge unused objects is sensible
Result:
> c("one", "two", "three", "pi") #Make a character vector [1] "one" "two" "three" "pi" > c(1,2,3,pi) #Make a numeric vector [1] 1.000000 2.000000 3.000000 3.141593 > seq(1,3) #Create a sequence of numbers [1] 1 2 3 > 1:3 #A shortcut for the same thing (but less flexible) [1] 1 2 3 > i <- 1:3 #You can store a vector > i [1] 1 2 3 > i <- c(i,pi) #To add more elements, you must assign again, e.g. using c() > i [1] 1.000000 2.000000 3.000000 3.141593 > i <- c(i, "text") #A vector cannot contain different data types, so ... > i #... R converts all elements to the same type [1] "1" "2" "3" "3.14159265358979" "text" > i+1 #The numbers are now strings of text: arithmetic is impossible Error in i + 1 : non-numeric argument to binary operator > rep(1, 10) #The "rep" function repeats its first argument [1] 1 1 1 1 1 1 1 1 1 1 > rep(3:1,10) #The first argument can also be a vector [1] 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 > huge.vector <- 0:(10^7) #R can easily cope with very big vectors > #huge.vector #VERY BAD IDEA TO UNCOMMENT THIS, unless you want to print out 10 million numbers > rm(huge.vector) #"rm" removes objects. Deleting huge unused objects is sensible
Notes
- ↑ These are special words in R, and cannot be used as names for objects. The objects
T
andF
are temporary shortcuts forTRUE
andFALSE
, but if you use them, watch out: since T and F are just normal object names you can change their meaning by overwriting them.
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.