Beginner's guide to R: Easy ways to do basic data analysis

Part 3 of our hands-on series covers pulling stats from your data frame and related topics.

Then run the combn() function, which takes two arguments: your entire set first, and then the number of items you want in each group.

combn(mypeople, 2)

Figure: R's combn() function lists all possible combinations from a group.

Most experienced R users would probably combine these two steps into one, like this:

combn(c("Bob", "Joanne", "Sally", "Tim", "Neal"), 2)

But separating the two can be more readable for beginners.
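To see what combn() actually returns, here is a short sketch. It assumes mypeople holds the same five names as the one-line version above. combn() gives back a matrix with one combination per column, so five people taken two at a time produce choose(5, 2) = 10 columns. The optional third argument applies a function to each combination; pasting the names together is just one illustrative choice, and pairs and pair_labels are hypothetical variable names.

```r
# Assumption: mypeople holds the same five names as in the one-line example
mypeople <- c("Bob", "Joanne", "Sally", "Tim", "Neal")

# combn() returns a matrix: one row per group member, one column per combination
pairs <- combn(mypeople, 2)
dim(pairs)                       # 2 rows, 10 columns
choose(length(mypeople), 2)      # 10, the number of possible pairs

# t() flips the matrix so each pair sits on its own row, which can be easier to read
t(pairs)

# A function given as the third argument is applied to each combination;
# extra arguments such as collapse are passed along to that function
pair_labels <- combn(mypeople, 2, FUN = paste, collapse = " & ")
pair_labels[1]                   # "Bob & Joanne"
```

Because each column (or each row, after t()) is one group, this layout makes it easy to loop over or count the combinations.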

Get slices or subsets of your data

Maybe you don't need correlations for every column in your data frame and just want to work with a couple of columns, not 15. Perhaps you want to see only the data that meets a certain condition, such as values within 3 standard deviations of the mean. R lets you slice your data sets in various ways, depending on the data type.
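As one way to act on a condition like the one mentioned above, the sketch below keeps only the rows of R's built-in mtcars data frame whose mpg value lies within 3 standard deviations of that column's mean. The variable names are hypothetical, and logical indexing with square brackets is just one of several approaches R offers.

```r
# Hypothetical example: keep mtcars rows whose mpg is within
# 3 standard deviations of the mean mpg
mpg_mean <- mean(mtcars$mpg)
mpg_sd <- sd(mtcars$mpg)

# Logical vector: TRUE for each row that meets the condition
within_3sd <- abs(mtcars$mpg - mpg_mean) <= 3 * mpg_sd

# Rows go before the comma, columns after it;
# leaving the column slot empty keeps every column
mtcars_within <- mtcars[within_3sd, ]
nrow(mtcars_within)
```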

To select just certain columns from a data frame, you can either refer to the columns by name or by their location (i.e., column 1, 2, 3, etc.).

For example, the mtcars sample data frame has these column names: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb.
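Here's a brief sketch of both approaches using the column names listed above. Because hp is the fourth column in mtcars, selecting by name and selecting by position return the same data. The square-bracket form is standard base R; by_name and by_position are just illustrative variable names.

```r
# Select two columns by name: rows go before the comma (empty = all rows),
# columns after it
by_name <- mtcars[, c("mpg", "hp")]

# Select the same columns by position: mpg is column 1, hp is column 4
by_position <- mtcars[, c(1, 4)]

identical(by_name, by_position)   # TRUE
head(by_name)
```

Names are usually safer in scripts you'll reuse, since positions change if columns are added or reordered.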

Can't remember the names of all the columns in your data frame? If you just want to see the column names and nothing else, you can type the following instead of using functions such as str(mtcars) and head(mtcars):

names(mtcars)

That's handy if you want to store the names in a variable, perhaps called mtcars.colnames (or anything else you'd like to call it):

mtcars.colnames <- names(mtcars)
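Once the names are stored, you can treat them like any other character vector. The sketch below reuses the mtcars.colnames variable from above to count the columns and to look up a column's position from its name with match(), which is handy when you want to switch between referring to columns by name and by location.

```r
mtcars.colnames <- names(mtcars)

length(mtcars.colnames)          # 11 columns
mtcars.colnames[1:3]             # "mpg" "cyl" "disp"

# match() returns the position of the first matching name
match("hp", mtcars.colnames)     # 4
match("wt", mtcars.colnames)     # 6
```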

But back to the task at hand. To access only the data in the mpg column in mtcars, you can use R's dollar sign notation:

mtcars$mpg

More generally, then, the format for accessing a column by name is:

dataframename$columnname

That will give you a 1-dimensional vector of numbers like this:

 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8
[12] 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5
[23] 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4

The numbers in brackets are not part of your data, by the way. They show the position of the first item on each line. If you've only got one line of data, you'll just see [1]. If there's more than one line of data and only the first 11 entries fit on the first line, your second line will start with [12], and so on.
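To check that the bracketed numbers are positions, you can ask for an item by its index. In the output above, the line labeled [12] starts with 16.4, and indexing the vector at 12 returns that same value. The sketch also shows two other base R ways to get the same column as the dollar sign; the double-bracket form is the one to use when the column name is stored in a variable. The mpg and col variables are illustrative.

```r
mpg <- mtcars$mpg

length(mpg)      # 32 values, one per car
mpg[12]          # 16.4, the first value on the line labeled [12]

# Two other ways to pull out the same column as a plain vector
identical(mpg, mtcars[["mpg"]])   # TRUE
identical(mpg, mtcars[, "mpg"])   # TRUE

# [[ ]] accepts a name held in a variable; $ does not evaluate
# the variable, so mtcars$col would not return the mpg column
col <- "mpg"
identical(mpg, mtcars[[col]])     # TRUE
```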
