Statistical Analysis: an Introduction using R/R/Factors
The factor() function creates a factor and defines its available levels. By default the levels are the distinct values found in the vector, sorted into order (alphabetically, for text). You may not often need to call factor() directly, because versions of R before 4.0.0 converted text to factors by default when reading data in from a file (see Statistical_Analysis:_an_Introduction_using_R/R/R/Data_frames). Since R 4.0.0 the default is stringsAsFactors = FALSE, so text is kept as character vectors, and you may need to convert it with as.factor() or factor(). Internally, R stores each value as an integer code, from 1 upwards, that points to one of the levels. It is not always obvious which number corresponds to which level, and it should not normally be necessary to know.
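A minimal sketch of these points, using a small hypothetical vector called sizes (not part of the original text). It shows the default sorted levels, as.factor(), explicitly supplied levels, the underlying integer codes, and the stringsAsFactors argument of data.frame(), whose default depends on the R version.

```r
# Hypothetical example data: a short character vector
sizes <- c("small", "large", "medium", "small", "large")

# By default, factor() takes the levels from the distinct values, sorted
f <- factor(sizes)
levels(f)                  # "large" "medium" "small"

# as.factor() converts an existing vector; for a character vector
# the result is the same as calling factor()
g <- as.factor(sizes)
identical(f, g)            # TRUE

# Supplying levels explicitly sets both the allowed values and their order
f2 <- factor(sizes, levels = c("small", "medium", "large"))
levels(f2)                 # "small" "medium" "large"

# Each element is stored as an integer code indexing into levels()
as.integer(f)              # 3 1 2 3 1
as.integer(f2)             # 1 3 2 1 3

# Text columns become factors only when stringsAsFactors = TRUE
# (the default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onwards)
df <- data.frame(size = sizes, stringsAsFactors = TRUE)
class(df$size)             # "factor"
```

Note that the integer codes of the same text differ between f and f2, because they depend on the order of the levels; this is why it is safer to work with the levels themselves than with the codes.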
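Ordinal variables, that is, factors whose levels have a natural order, are known to R as ordered factors. They are created in the same way as any other factor, but with the additional argument ordered=TRUE. The order of the levels is then the order given in the levels argument, or the default sorted order if levels is not supplied.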
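A short sketch of an ordered factor, assuming a hypothetical vector of ratings (not from the original text). It shows that the level order is taken from the levels argument, that ordered() is a shorthand for factor(..., ordered = TRUE), and that comparisons, max() and sort() follow that order rather than alphabetical order.

```r
# Hypothetical example data: ratings on an ordinal scale
ratings <- c("low", "high", "medium", "low", "medium")

# ordered = TRUE makes an ordinal factor; the order comes from 'levels'
o <- factor(ratings, levels = c("low", "medium", "high"), ordered = TRUE)
o                          # printed levels: low < medium < high
is.ordered(o)              # TRUE

# ordered() is a shorthand for factor(..., ordered = TRUE)
identical(o, ordered(ratings, levels = c("low", "medium", "high")))  # TRUE

# Comparisons follow the level order, not alphabetical order
o > "low"                  # FALSE TRUE TRUE FALSE TRUE
max(o)                     # high
sort(o)                    # low low medium medium high
```

With an unordered factor, the same comparison o > "low" is not meaningful: R gives a warning and returns NA.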
Input:
state.region                    #an example of a factor: note that the levels are printed out
state.name                      #this is *NOT* a factor
state.name[1] <- "Any text"     #you can replace text in a character vector
state.region[1] <- "Any text"   #but you can't in a factor: R gives a warning and stores NA instead
state.region[1] <- "South"      #this is OK, because "South" is one of the levels
state.abb                       #this is not a factor, just a character vector
character.vector <- c("Female", "Female", "Male", "Male", "Male", "Female", "Female", "Male",
                      "Male", "Male", "Male", "Male", "Female", "Female", "Male", "Female",
                      "Female", "Male", "Male", "Male", "Male", "Female", "Female", "Female",
                      "Female", "Male", "Male", "Male", "Female", "Male", "Female", "Male",
                      "Male", "Male", "Male", "Male", "Female", "Male", "Male", "Male",
                      "Male", "Female", "Female", "Female")
#a bit tedious to do all that typing
#might be easier to use codes, e.g. 1 for female and 2 for male
Coded <- factor(c(1, 1, 2, 2, 2, 1, 1, 2,
                  2, 2, 2, 2, 1, 1, 2, 1,
                  1, 2, 2, 2, 2, 1, 1, 1,
                  1, 2, 2, 2, 1, 2, 1, 2,
                  2, 2, 2, 2, 1, 2, 2, 2,
                  2, 1, 1, 1))
Gender <- factor(Coded, labels=c("Female", "Male"))   #we can then convert this to named levels
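A smaller sketch of the same coding approach, using a hypothetical run of six codes so that it is quick to check by eye. It confirms that relabelling coded levels gives exactly the same factor as typing the text in full, counts the observations in each level with table(), and shows that a new value must be added to levels() before it can be stored in a factor. The names codes, coded, gender and typed are illustrative only.

```r
# Hypothetical short run of codes: 1 = female, 2 = male
codes <- c(1, 1, 2, 2, 2, 1)

# As in the example above: make a factor of codes, then relabel its levels
coded  <- factor(codes)
gender <- factor(coded, labels = c("Female", "Male"))

# Typing the text out in full gives the same factor
typed <- factor(c("Female", "Female", "Male", "Male", "Male", "Female"))
same  <- identical(gender, typed)
same                       # TRUE

# Count how many observations fall in each level
counts <- table(gender)
counts                     # Female 3, Male 3

# A value that is not yet a level must be added to levels() first;
# otherwise the assignment gives a warning and stores NA
levels(gender) <- c(levels(gender), "Other")
gender[1] <- "Other"
levels(gender)             # "Female" "Male" "Other"
```

The labels are matched to the levels in their sorted order ("1" then "2"), so the first label always goes with the smallest code.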