# Everything You Never Wanted To Know About Ratings



## elelegido (Sep 24, 2014)

There are lots of posts on here from drivers wondering why their ratings are falling, or rising, or wondering who rated them what, and the effects of 1* and other low ratings. The 500 rating rolling average removes a lot of clarity.

Uber's driver ratings average only takes into account the last 500 ratings. For drivers who have fewer than 500, it's pretty straightforward. Drivers with averages above 4, who get a 5* rating will increase their average, and any 4* rating or below will lower it.
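For anyone who wants to check the under-500 case themselves, here is a rough sketch in Python. It assumes the average is a plain arithmetic mean, as described above. The function name `average_after` and the sample driver (4.80 over 100 ratings) are made up for illustration.

```python
def average_after(current_avg, count, new_rating):
    """Plain mean after one more rating, for a driver with fewer than 500 ratings."""
    total = current_avg * count          # rebuild the running total from the average
    return (total + new_rating) / (count + 1)

# Hypothetical driver: 4.80 average over 100 ratings.
up = average_after(4.80, 100, 5)    # a 5* nudges the average up
down = average_after(4.80, 100, 4)  # a 4* pulls it down
print(round(up, 4), round(down, 4))
```

With fewer than 500 ratings nothing drops out of the calculation, so each new rating moves the average in the direction you would expect.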

However, for drivers with 500 or more ratings, it's not so straightforward. For these drivers, a series of new, back-to-back 5* ratings may do nothing to increase their average, and a 1* rating may also have no effect at all on their average. It's all because of the 500 rating rolling average system.

Example - a driver with a 4.73 rating and 500 ratings in total. If we knew what each one of his 500 ratings was, we could write his oldest rating, be it 1, 2, 3, 4 or 5 stars, on a slip of paper. We could then do the same for his next oldest rating, and place this slip of paper on top of the other one. We could repeat this 498 times and at the end of the process, we would have a stack of slips of paper, 500 slips high.

If we were to add up the numbers on all of these slips of paper, the total would be 2,365. This is because, to work out his average of 4.73, Uber adds up all of the last 500 ratings, and divides by 500. 2365/500 = 4.73.

When the driver gets a new rating from a pax, this is equivalent to writing this new rating on a new slip of paper and placing it on top of the stack. And because the average is of 500 ratings only, the oldest rating is removed from the bottom of the stack.

To see the effect that the new rating had on his average, the numbers on all the slips of paper could be added up again, and the total divided by 500.

Adding them all up again is not necessary though; we already know that the total in the stack before the new rating was added (and before the oldest rating was removed) was 2365. Suppose the old rating which was removed from the bottom of the stack was a 5. If the new rating put on top of the stack was also a 5, then the total in the stack is still going to be 2365 - therefore the new 5* rating will have no effect at all on the driver's average. Similarly, if the bottom 20, or 30 ratings in the stack are 5*, then a driver who is having a good week and getting 20, or 30 5* ratings won't improve his average at all.
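The stack of slips can be sketched in Python with a fixed-length `deque`. This is only an illustration of the mechanism described here, not Uber's actual code. The class name `RollingRating` and the rating history are invented, chosen so that the 500 slips add up to 2365.

```python
from collections import deque

WINDOW = 500  # window size described in this post

class RollingRating:
    """Keeps only the most recent WINDOW ratings plus their running total."""

    def __init__(self, ratings):
        self.stack = deque(ratings, maxlen=WINDOW)  # left end = oldest slip
        self.total = sum(self.stack)

    def add(self, new_rating):
        # Once the stack is full, the oldest slip falls off the bottom.
        dropped = self.stack[0] if len(self.stack) == WINDOW else 0
        self.stack.append(new_rating)  # a full deque discards its oldest item
        self.total += new_rating - dropped
        return dropped

    @property
    def average(self):
        return self.total / len(self.stack)

# Invented history, oldest first: 30 fives, then a bad patch, then 416 fives.
history = [5] * 30 + [1] * 27 + [4] * 27 + [5] * 416
driver = RollingRating(history)
print(driver.total, driver.average)  # 2365 4.73

for _ in range(30):  # a good week: thirty new 5* ratings
    driver.add(5)
print(driver.total, driver.average)  # 2365 4.73
```

Thirty new 5* ratings push thirty old 5* ratings out, so the total and the average stay exactly where they were.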

On the other hand, if the rating at the bottom of the stack is a 1*, and the driver gets a new 1* rating from a rider, this also will have no effect on his rating, as all he is doing is replacing a 1* at the bottom of the stack with a 1* at the top.

And if that 1* at the bottom of the stack is replaced by, say a new 3* rating, then that driver's average rating will actually go up, even though 3* is a crappy rating and well below his 4.73 average.
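These swap cases are easy to check with a few lines of Python. The helper name `average_after_swap` is made up for this sketch, and it assumes the stack is already full at 500 ratings with a total of 2365.

```python
def average_after_swap(total, dropped, new_rating, window=500):
    """Average after the oldest rating leaves and a new one joins a full stack."""
    return (total - dropped + new_rating) / window

total = 2365  # the 4.73 driver from the example

print(average_after_swap(total, 5, 5))  # 4.73  (5* replaces 5*: no change)
print(average_after_swap(total, 1, 1))  # 4.73  (1* replaces 1*: no change)
print(average_after_swap(total, 1, 3))  # 4.734 (3* replaces 1*: goes up)
print(average_after_swap(total, 5, 3))  # 4.726 (3* replaces 5*: goes down)
```

The same 3* rating raises or lowers the average depending entirely on which old rating it pushes out.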

So, the point here is that it's extremely difficult to pin down the reason for movements up and down in drivers' ratings. With the rolling average, a driver's average depends not only on what recent passengers are rating him, but also on what happened 500 ratings ago.

It's important to understand this if you are seeing weekly summaries that say for example, "congratulations, you got 33 5* ratings out of 35", and yet you don't see your average rising very much. The explanation for this could be that your oldest ratings were relatively high, then you went through a bad patch with some low ratings, and then improved. It will take time for the lower ratings to work their way to the bottom of the stack and then out of it.
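Here is a small Python sketch of that scenario. The history is invented (good oldest ratings, then a bad patch, then a recovery), the helper name `rolling_average` is hypothetical, and the 500-rating window is the one described in this post.

```python
from collections import deque

def rolling_average(history, new_ratings, window=500):
    """Average of the most recent `window` ratings after adding new ones."""
    stack = deque(history, maxlen=window)
    stack.extend(new_ratings)  # oldest ratings drop off as new ones arrive
    return sum(stack) / len(stack)

# Oldest first: 100 fives, a bad patch (ten 1* and twenty 3*), then 370 fives.
history = [5] * 100 + [1] * 10 + [3] * 20 + [5] * 370
good_week = [5] * 33 + [4] * 2  # "33 5* ratings out of 35"

before = sum(history) / len(history)
after = rolling_average(history, good_week)
print(before, after)  # 4.84 4.836
```

In this made-up case the good week replaces 35 old 5* ratings, so the average actually dips slightly. The bad patch only starts leaving the stack after another 65 ratings have come in.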

All the more reason not to try to figure out which specific rider gave which rating. Given all of the above, it is pretty pointless.


----------



## Fuzzyelvis (Dec 7, 2014)

elelegido said:


> There are lots of posts on here from drivers wondering why their ratings are falling, or rising, or wondering who rated them what, and the effects of 1* and other low ratings. The 500 rating rolling average for ratings removes a lot of clarity.
> 
> Uber's driver ratings average only takes into account the last 500 ratings. For drivers who have fewer than 500, it's pretty straightforward. Drivers with averages above 4, who get a 5* rating will increase their average, and any 4* rating or below will lower it.
> 
> ...


Explained well. As I pointed out to an asshole pax last night who said he would 1 star me that would hurt him more than me as he told me it was his 3rd ride. And yes he got 1 star from me as I'm sure he did the same to me.

He was pissed off and very verbally abusive due to the 2.7 surge. I almost kicked him out of my car but figured giving him 1 star was enough.

I think what made him really mad was when he *****ed and I told him without the surge I wouldn't have picked him up in the first place. He said I HAD to pick him up regardless and I told him not any more than he HAD to ride in my car.

I don't usually get into it with pax; there's no point. But every 100 rides or so I get one who just takes the cake.


----------



## elelegido (Sep 24, 2014)

Fuzzyelvis said:


> Explained well. As I pointed out to an asshole pax last night who said he would 1 star me that would hurt him more than me as he told me it was his 3rd ride. And yes he got 1 star from me as I'm sure he did the same to me.
> 
> He was pissed off and very verbally abusive due to the 2.7 surge. I almost kicked him out of my car but figured giving him 1 star was enough.
> 
> ...


Right, every driver needs a sizeable asshole cushion in their average for such jokers. If you have that then you can pretty much do what you want when you get the occasional UberLoser.


----------



## Desert Driver (Nov 9, 2014)

elelegido said:


> There are lots of posts on here from drivers wondering why their ratings are falling, or rising, or wondering who rated them what, and the effects of 1* and other low ratings. The 500 rating rolling average removes a lot of clarity.
> 
> Uber's driver ratings average only takes into account the last 500 ratings. For drivers who have fewer than 500, it's pretty straightforward. Drivers with averages above 4, who get a 5* rating will increase their average, and any 4* rating or below will lower it.
> 
> ...


Explained well. And if I may, I'd like to coattail you and explain why the rating system is meaningless from a statistical point of view.

Uber wants us to believe that because our driver ratings are the result of averaging the individual star ratings our paxs give us, it has created a fair and valid driver rating system. However, nothing could be further from the truth, statistically speaking. The paxs rate drivers on an interval scale. The intervals are 1, 2, 3, 4, and 5. There are no partial scores, like 3.5 or 4.8. However, Uber makes driver keep/kill decisions based on an ordinal scale. The problem is, you cannot use interval data to create an ordinal scale. Doing so results in a statistically invalid rating system that produces no meaningful output. And for those who understand statistics, it's basic statistical knowledge that mixing ordinal and interval scales produces no useable results.

In the current rating system, the validity of the score can be described as follows:

_Imagine receiving a message from Uber on your weekly summary that said, "Uber Partner, your driving rating score last week was lollipop. Two weeks ago your driving rating score was tarmac. Congratulations! You are a valued Partner. Keep up the good work and Uber on!"_

See the problem here? The data point lollipop has nothing to do with and possesses no relationship to the data point tarmac. Ergo, those two driver rating scores have precisely zero meaning. And this is exactly what happens when interval data (pax ratings of drivers) are used to create an ordinal scale (Uber's keep/kill threshold of 4.6.)


----------



## elelegido (Sep 24, 2014)

Desert Driver said:


> Explained well. And if I may, I'd like to coattail you and explain why the rating system is meaningless from a statistical point of view.
> 
> Uber wants us to believe that because our driver ratings are the result of averaging the individual star ratings our paxs give us, it has created a fair and valid driver rating system. However, nothing could be further from the truth, statistically speaking. The paxs rate drivers on an interval scale. The intervals are 1, 2, 3, 4, and 5. There are no partial scores, like 3.5 or 4.8. However, Uber makes driver keep/kill decisions based on an ordinal scale. The problem is, you cannot use interval data to create an ordinal scale. Doing so results in a statistically invalid rating system that produces no meaningful output. And for those who understand statistics, it's basic statistical knowledge that mixing ordinal and interval scales produces no useable results.
> 
> ...


The star rating scale is a very poor, Fisher Price-level, attempt at psychometrics. The "results" it produces are scientifically both invalid and unreliable, for a number of reasons.

But there is a need to evaluate driver performance, and I can't think of a fair and accurate way to do this that would not be expensive or too time consuming for customers.


----------



## Desert Driver (Nov 9, 2014)

elelegido said:


> The star rating scale is a very poor, Fisher Price-level, attempt at psychometrics. The "results" it produces are scientifically both invalid and unreliable, for a number of reasons.
> 
> But there is a need to evaluate driver performance, and I can't think of a fair and accurate way to do this that would not be expensive or too time consuming for customers.


Oh, it's very simple. Have the paxs use the same ordinal scale to rate drivers that Uber uses for its keep/kill decisions. Put very simply, if 4.6 is the threshold Uber uses to deactivate drivers, then paxs must have the ability to rate a driver at 4.6. So, the paxs would be given a scale that starts at 1.0 and ends at 5.0, and is incremented by tenths. Voilà! Rating system is now valid.


----------

