Bonus special case: Grouping by date range
If you've got a series of dates and associated values, there's an extremely easy way to group them by date range such as week, month, quarter or year: R's cut() function.
Here are some sample data in a vector:
vDates <- as.Date(c("2013-06-01", "2013-07-08", "2013-09-01", "2013-09-15"))
Which creates:
[1] "2013-06-01" "2013-07-08" "2013-09-01" "2013-09-15"
The as.Date() function is important here; otherwise R will view each item as a string object and not a date object.
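A quick way to see why this matters is to compare the two classes. This is a minimal sketch; the names x_chr and x_date are only for illustration.

```r
# The same dates, once as plain strings and once converted with as.Date()
x_chr  <- c("2013-06-01", "2013-07-08")
x_date <- as.Date(x_chr)

class(x_chr)   # "character"
class(x_date)  # "Date"

# Date objects support date arithmetic; strings do not
diff(x_date)   # Time difference of 37 days

# cut() has no date-aware method for character vectors, so this call fails
chr_cut_failed <- tryCatch({
  cut(x_chr, breaks = "month")
  FALSE
}, error = function(e) TRUE)
chr_cut_failed  # TRUE
```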
If you want a second vector that groups the dates in vDates by month, you can use cut() with this basic syntax:
vDates.bymonth <- cut(vDates, breaks = "month")
That produces:
[1] 2013-06-01 2013-07-01 2013-09-01 2013-09-01
Levels: 2013-06-01 2013-07-01 2013-08-01 2013-09-01
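The other date ranges work the same way; only the breaks value changes. A minimal sketch with the same sample dates. Each bin is labeled with the first day of its period, and weekly bins start on Monday by default (cut()'s start.on.monday argument controls that).

```r
vDates <- as.Date(c("2013-06-01", "2013-07-08", "2013-09-01", "2013-09-15"))

# Same cut() call, different period lengths
vDates.byweek    <- cut(vDates, breaks = "week")     # weeks start on Monday by default
vDates.byquarter <- cut(vDates, breaks = "quarter")
vDates.byyear    <- cut(vDates, breaks = "year")

as.character(vDates.byquarter)
# "2013-04-01" "2013-07-01" "2013-07-01" "2013-07-01"
as.character(vDates.byyear)
# "2013-01-01" "2013-01-01" "2013-01-01" "2013-01-01"
```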
It might be easier to see what's happening if we combine vDates and vDates.bymonth into a data frame:
dfDates <- data.frame(vDates, vDates.bymonth)
Which creates:
| | vDates | vDates.bymonth |
|---|---|---|
| 1 | 2013-06-01 | 2013-06-01 |
| 2 | 2013-07-08 | 2013-07-01 |
| 3 | 2013-09-01 | 2013-09-01 |
| 4 | 2013-09-15 | 2013-09-01 |
The new column gives the starting date for each month, making it easy to then slice by month.
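For example, to total the values associated with each month, you could pass the new grouping to tapply(). This is a sketch under one assumption: vValues is a hypothetical vector of values, one per date.

```r
vDates  <- as.Date(c("2013-06-01", "2013-07-08", "2013-09-01", "2013-09-15"))
vValues <- c(10, 20, 30, 40)   # hypothetical values paired with vDates

vDates.bymonth <- cut(vDates, breaks = "month")

# Sum the values within each month; August has no dates, so its total is NA
monthlyTotals <- tapply(vValues, vDates.bymonth, sum)
monthlyTotals
```

Because cut() keeps empty months as factor levels, months with no data still show up in the result, which makes gaps easy to spot.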
Sorting your results
For a simple sort by one column in base R, you can get the order you want with the order() function, such as:
companyOrder <- order(companiesData$margin)
This tells you how your rows would be reordered, producing a vector of row numbers such as:
6 1 9 2 5 3 4 7 8
Chances are, you're not interested in the new order by row number but actually want to see the data reordered. You can use that order to reorder the rows in your data frame with this code:
companiesOrdered <- companiesData[companyOrder,]
where companyOrder is the order you created earlier. Or, you can do this in a single (but perhaps less human-readable) line of code:
companiesOrdered <- companiesData[order(companiesData$margin),]
If you forget that comma after the new order for your rows, R reads the order vector as column numbers instead of row numbers. companiesData has only five columns, so indices such as 9 don't exist and you'll get an error. Once again, a comma followed by nothing defaults to "all columns," but you can also specify just certain columns, like:
companiesOrdered <- companiesData[order(companiesData$margin),c("fy", "company")]
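Here is the base R approach end to end, including what happens without the comma. As a sketch, companiesData is rebuilt by hand from the fiscal year, revenue, profit and margin figures in the sorted results table at the end of this section; the third company in that data is Google.

```r
# companiesData rebuilt from the figures in this section's results table
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978),
  margin  = c(21.5, 23.9, 26.7, 29.0, 25.7, 21.4, 30.0, 33.1, 23.0)
)

companyOrder <- order(companiesData$margin)
companyOrder  # 6 1 9 2 5 3 4 7 8

# Reorder all columns, or keep only fy and company
companiesOrdered <- companiesData[companyOrder, ]
companiesSubset  <- companiesData[companyOrder, c("fy", "company")]

# Without the comma, the order is read as column numbers; columns 6-9 don't exist
missingComma <- tryCatch(companiesData[companyOrder],
                         error = function(e) conditionMessage(e))
missingComma  # "undefined columns selected"
```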
To sort in descending order, put a minus sign before the ordering column when you create companyOrder (this works for numeric columns such as margin):
companyOrder <- order(-companiesData$margin)
And then:
companiesOrdered <- companiesData[companyOrder,]
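A sketch of the descending sort on the same hand-built companiesData. The minus sign only works on numeric columns; for a text column such as company, order()'s decreasing = TRUE argument does the job instead.

```r
# companiesData rebuilt from the figures in this section's results table
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978),
  margin  = c(21.5, 23.9, 26.7, 29.0, 25.7, 21.4, 30.0, 33.1, 23.0)
)

# Descending by margin: negate the numeric column
companyOrder <- order(-companiesData$margin)
companiesOrdered <- companiesData[companyOrder, ]
companiesOrdered$margin[1:3]  # 33.1 30.0 29.0

# -companiesData$company would fail ("invalid argument to unary operator"),
# so use decreasing = TRUE for text columns
byCompanyDesc <- companiesData[order(companiesData$company, decreasing = TRUE), ]
byCompanyDesc$company[1]  # "Microsoft"
```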
I find dplyr's arrange() to be much more readable. It uses the format arrange(mydata, col1, col2) to arrange a data frame first by col1 and then by col2, or arrange(mydata, desc(col1), col2) if you want the first column to be in descending order. (Add desc() for any column that should be sorted in descending order.)
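If dplyr isn't available, base R's order() also accepts several columns, so arrange(mydata, desc(col1), col2) has a rough base R counterpart. This sketch sorts the hand-built companiesData by fiscal year (newest first) and then by company name; as before, negation assumes the descending column is numeric.

```r
# companiesData rebuilt from the figures in this section's results table
companiesData <- data.frame(
  fy      = rep(c(2010, 2011, 2012), times = 3),
  company = rep(c("Apple", "Google", "Microsoft"), each = 3),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175, 62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737, 18760, 23150, 16978),
  margin  = c(21.5, 23.9, 26.7, 29.0, 25.7, 21.4, 30.0, 33.1, 23.0)
)

# Newest fiscal year first, then companies alphabetically within each year
multiOrder <- order(-companiesData$fy, companiesData$company)
companiesByYear <- companiesData[multiOrder, c("fy", "company", "margin")]
companiesByYear
```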
With dplyr loaded, sorting companiesData by margin in descending order is as easy as:
companiesOrdered <- arrange(companiesData, desc(margin))
| | fy | company | revenue | profit | margin |
|---|---|---|---|---|---|
| 8 | 2011 | Microsoft | 69943 | 23150 | 33.1 |
| 7 | 2010 | Microsoft | 62484 | 18760 | 30.0 |
| 4 | 2010 | Google | 29321 | 8505 | 29.0 |
| 3 | 2012 | Apple | 156508 | 41733 | 26.7 |
| 5 | 2011 | Google | 37905 | 9737 | 25.7 |
| 2 | 2011 | Apple | 108249 | 25922 | 23.9 |
| 9 | 2012 | Microsoft | 73723 | 16978 | 23.0 |
| 1 | 2010 | Apple | 65225 | 14013 | 21.5 |
| 6 | 2012 | Google | 50175 | 10737 | 21.4 |