Beginner's guide to R: Get your data into R

In part 2 of our hands-on guide to the hot data-analysis environment, we provide some tips on how to import data in various formats, both local and on the Web.

Other formats

There are R packages that will read files from Excel, SPSS, SAS, Stata and various relational databases. I don't bother with the Excel package; it requires both Java and Perl, and in general I'd rather export a spreadsheet to CSV in hopes of not running into Microsoft special-character problems. For more info on other formats, see UCLA's How to input data into R, which discusses the foreign add-on package for importing files from several other statistical software programs.
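To give a feel for the foreign package, here is a minimal sketch that imports a Stata file. The data frame, its column names and the temporary file are made up for illustration; the example writes its own small .dta file first so that there is something to read back.

```r
library(foreign)  # recommended package included with standard R installs

# Hypothetical sample data, saved as a Stata file so the example
# has something to import.
scores <- data.frame(id = 1:3, score = c(88.5, 92.0, 79.25))
dta_path <- tempfile(fileext = ".dta")
write.dta(scores, dta_path)

# read.dta() imports a Stata file as an R data frame.
from_stata <- read.dta(dta_path)
str(from_stata)

# For SPSS files, the same package offers read.spss(); passing
# to.data.frame = TRUE returns a data frame instead of a list.
```

With a real file, you would skip the write.dta() step and pass the file's own path to read.dta().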

If you'd like to try to connect R with a database, there are several dedicated packages such as RPostgreSQL, RMySQL, RMongo, RSQLite and RODBC. The popular dplyr package also includes some database support.
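As a rough sketch of what a database connection can look like, the example below uses RSQLite together with the DBI package, which is an assumption on my part: DBI is the common interface that RSQLite plugs into and must be installed alongside it. The table name, columns and values are hypothetical, and the database lives only in memory, so nothing is written to disk.

```r
library(DBI)

# Open a throwaway SQLite database held in memory.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical data, copied into a database table called "orders".
orders <- data.frame(
  customer = c("ann", "bob", "ann"),
  amount   = c(10, 25, 5)
)
dbWriteTable(con, "orders", orders)

# Run a SQL query; the result comes back as an ordinary data frame.
totals <- dbGetQuery(
  con,
  "SELECT customer, SUM(amount) AS total
   FROM orders
   GROUP BY customer
   ORDER BY customer"
)

dbDisconnect(con)
print(totals)
```

The other database packages need their own connection details, such as a host, user name and password, in the connection step.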

(You can see the entire list of available R packages at the CRAN website.)

Remote data

read.csv() and read.table() work pretty much the same way on files from the Web as they do on local data: you pass the file's URL in place of a local file path.

Do you want Google Spreadsheets data in R? You don't have to download the spreadsheet to your local system as you do with a CSV. Instead, in your Google spreadsheet -- properly formatted with just one header row followed by one record per row -- select File > Publish to the Web. (This will make the data public, although only to someone who has or stumbles upon the correct URL. Be careful with this process, especially with sensitive data.)

Select the sheet with your data and click "Start publishing." You should see a box with the option to get a link to the published data. Change the format type from Web page to CSV and copy the link. Now you can read those data into R with a command such as:

mydata <- read.csv("http://bit.ly/10ER84j")

The command structure is the same for any file on the Web. For example, Pew Research Center data about mobile shopping are available as a CSV file for download. You can store the data in a variable called pew_data like this:

pew_data <- read.csv("http://bit.ly/11I3iuU")

It's important to make sure the file you're downloading is in an R-friendly format first: in other words, that it has a maximum of one header row, with each subsequent row having the equivalent of one data record. Even well-formed government data might include lots of blank rows followed by footnotes -- that's not what you want in an R data table if you plan on running statistical analysis functions on the file.
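If you can't fix the file at its source, read.csv() can sometimes work around the clutter. The sketch below builds a small, made-up file with a title line above the header and a footnote below the data, then uses the skip and nrows arguments to read only the real table. The file contents and column names are hypothetical, and the approach assumes you already know how many lines to skip and how many data rows to keep.

```r
# Hypothetical messy file: a title line, a blank line, the header row,
# three data rows, then a blank line and a footnote.
messy <- c(
  "Table 1. Example survey results",
  "",
  "state,respondents,pct_mobile",
  "Ohio,120,41.5",
  "Iowa,95,38.2",
  "Utah,60,44.0",
  "",
  "Note: figures are illustrative only."
)
path <- tempfile(fileext = ".csv")
writeLines(messy, path)

# skip = 2 drops the title and blank line above the header;
# nrows = 3 stops reading before the trailing blank line and footnote.
clean <- read.csv(path, skip = 2, nrows = 3)
str(clean)
```

For a file on the Web, the same arguments work with a URL in place of path.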
