Rank() Over Partition by in R – Easiest Method to Follow

Topbullets.comCurrently I am working on R, SAS and SQL languages and recently I came across a new problem. I want to rank my data set on particular columns. Ranking on a single column is very easy but when it comes for multiple columns, it becomes little tough. I Googled my problem, searched for entire day but could not find any satisfactory solution.
Finally I made my own way and it really worked. Here is the solution to this problem.
Problem:


Country <- c("India","US", "India","US","India","India","India","US","US","UK","US","India")
State <- c ("IM","UM","IM","UA","IM","IH","IH","UM","UM","UKM","UM","IH")
Sales <- c(1245,3212,345,3412,2423,3787,997,12121,12232,12113,45432,12121)
Revenue <- c(124523,121253,23565,565765,56321,121434,3431212,4532312,122345,232542,12345232,2312212)
SalesReport <- data.frame(Country,State,Sales,Revenue)
View(SalesReport)

Country State Sales Revenue
India HM 1245 124523
UK KM 3212 121253
India KO 345 23565
US HM 3412 565765
India KM 2423 56321
India KO 3787 121434
India LH 997 3431212
US OL 12121 4532312
US NJ 12232 122345
US CA 12113 232542
US CA 45432 12345232
India CA 12121 2312212

I want to rank my data set over Revenue partition by Country and State. So output will be like.

Country State Sales Revenue Rank

India IH 997 3431212 1
India IH 12121 2312212 2
India IH 3787 121434 3
India IM 1245 124523 1
India IM 2423 56321 2
India IM 345 23565 3
UK UKM 12113 232542 1
US UA 3412 565765 1
US UM 45432 12345232 1
US UM 12121 4532312 2
US UM 12232 122345 3
US UM 3212 121253 4

SQL Program:


SELECT * RANK() OVER (PARTITION BY Country, State Order by Revenue DESC) as Rank FROM SalesReport GROUP BY Country, State

SAS Program:


proc rank data=SalesReport out=SalesReportRanking
descending
ties=low;
by Country State;
ranks Rank;
var Revenue;
run;

R Program:
1. After creating dataset, I am creating a KEY which is nothing but concatenate of all columns which are needed for partition by.
2. First I make the dataset ordered by these columns.
3. And now making a KEY “ID”
4. Now ranking the dataset by ID. -x for Descending order.
5. Arranging the dataset for better look.
6. Nullifying the ID (No one should know your trick :P) and view it.

SalesReport <- SalesReport[order(Country,State),]
row.names(SalesReport) <- NULL
SalesReport$ID <- paste(SalesReport[,1],SalesReport[,2])
SalesReportRanking <- transform(SalesReport, Rank = ave (Revenue, ID, FUN = function(x)
rank (-x, ties.method ="min")))
SalesReportRanking <- SalesReportRanking[order(SalesReportRanking$ID,SalesReportRanking$Rank),]
SalesReportRanking$ID <- NULL
View(SalesReportRanking)

The above code can be used for a single column as well. Just replace ID with your column name.
Compare your result with SAS output. Here we go. It matches if not please comment below. I hope it helped you a little bit. Thanks for reading.

Another important queries for this solution:

1. How to use Rank() in SQL Server
2. How to use Rank() in R
3. How to emulate SQL “partition by” in R?
4. How to partition when ranking on a particular column?

Signature

Deepesh Singh
logo

Advertisements

Please leave your valuable comment.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s