[Python][Math] Deciphering Uber rating

As a very anxious neurotic, I have always wondered: can I reconstruct my internal Uber statistics (how many 5s, 4s, 3s, 2s and 1s I was given) from the displayed rating(s) alone?

Hard to believe, but yes. You need as many ratings as possible, though, and they must form an uninterrupted 'chain': the rating as displayed after each consecutive rated ride, with none skipped. You also need to guess the minimal and maximal number of ratings you had received by the start of the chain. Taxi drivers don't always rate their passengers. In my experience, only about half of the rides get rated. (Maybe drivers are just too busy or lazy, but this is OK.)
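
To see why the bounds matter, here is a small sketch that lists which totals can produce a given displayed rating at all. It assumes the rating is a plain average of the stars, shown to two decimal places (the same assumption the script in this post makes); `feasible_totals` is a hypothetical helper name, not part of the original program.

```python
def feasible_totals(shown, min_rides, max_rides):
    """List (number of ratings, total points) pairs whose average displays as `shown`."""
    found = []
    for n in range(min_rides, max_rides + 1):
        # With n ratings of 1..5 stars, the total is between n and 5*n points.
        for points in range(n, 5 * n + 1):
            if abs(points / n - shown) < 0.005:
                found.append((n, points))
    return found

print(feasible_totals(4.69, 8, 24))  # [(13, 61), (16, 75)]
```

With between 8 and 24 ratings, a displayed 4.69 is only possible with 13 or 16 ratings, so even a single value narrows things down a lot. A wider range of rides would let in more candidates.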

Here is the whole Python source code. It works by backtracking, and can serve as a demonstration of backtracking, generators, recursion...

#!/usr/bin/env python3

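# my stat: the rating as displayed after each consecutive rated ride, oldest first
RATINGS=[4.69, 4.71, 4.73, 4.75, 4.76, 4.78, 4.79, 4.8]
# guessed bounds on the number of ratings I had at the first entry of RATINGS:
MIN_RIDES=len(RATINGS)
MAX_RIDES=len(RATINGS)*3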

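# stat is (number of 5s, 4s, 3s, 2s, 1s); the rating is their plain average:
def calc_rating(stat):
    a, b, c, d, e = stat
    return float(sum([a*5, b*4, c*3, d*2, e*1])) / float(sum([a, b, c, d, e]))

# ratings are given to two decimal places, so compare within half of 0.01:
def rating_eq(r1, r2):
    return abs(r1 - r2) < 0.005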

def find_first_stat (rating):
    # range() excludes its upper bound, hence +1, so that a count may reach MAX_RIDES:
    for a in range(MAX_RIDES+1):
        for b in range(MAX_RIDES+1):
            if a+b>MAX_RIDES:
                continue
            for c in range(MAX_RIDES+1):
                if a+b+c>MAX_RIDES:
                    continue
                for d in range(MAX_RIDES+1):
                    if a+b+c+d>MAX_RIDES:
                        continue
                    for e in range(MAX_RIDES+1):
                        if a+b+c+d+e<MIN_RIDES:
                            continue
                        if a+b+c+d+e>MAX_RIDES:
                            continue
                        if rating_eq(calc_rating((a, b, c, d, e)), rating):
                            yield a, b, c, d, e

# a rare case when tab use is justified:
def print_history (history):
    print ("")
    print ("ride#\t 5\t 4\t 3\t 2\t 1\trating\ttotal")
    print ("-"*8*8)
    for i, h in enumerate(history):
        print ("%d\t%2d\t%2d\t%2d\t%2d\t%2d\t%1.2f\t%d" % (i+1, h[0], h[1], h[2], h[3], h[4], RATINGS[i], sum(h)))

def try_next_rating(history, score, ratings):
    if len(ratings)==0:
        # stop. found correct history:
        print_history (history+[score])
        return

    a, b, c, d, e = score

    new_history=history+[score]

    # increment each score and then try recursively:

    incremented_stat=a+1, b, c, d, e
    if rating_eq(calc_rating (incremented_stat), ratings[0]):
        try_next_rating(new_history, incremented_stat, ratings[1:])

    incremented_stat=a, b+1, c, d, e
    if rating_eq(calc_rating (incremented_stat), ratings[0]):
        try_next_rating(new_history, incremented_stat, ratings[1:])

    incremented_stat=a, b, c+1, d, e
    if rating_eq(calc_rating (incremented_stat), ratings[0]):
        try_next_rating(new_history, incremented_stat, ratings[1:])

    incremented_stat=a, b, c, d+1, e
    if rating_eq(calc_rating (incremented_stat), ratings[0]):
        try_next_rating(new_history, incremented_stat, ratings[1:])

    incremented_stat=a, b, c, d, e+1
    if rating_eq(calc_rating (incremented_stat), ratings[0]):
        try_next_rating(new_history, incremented_stat, ratings[1:])

    # stop. can't do anymore
    # backtrack

# find the first stat.
# a different function is used at the start, because we need to find a score
# whose total is in the MIN_RIDES..MAX_RIDES range:
for first_stat in find_first_stat(RATINGS[0]):
    # then try next score, incrementing each score:
    try_next_rating([], first_stat, RATINGS[1:])
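
The five near-identical branches in try_next_rating() can also be written as one loop. The following is only a sketch of that idea, not the program that produced the results in this post: `average` and `histories` are names made up for the illustration, and the search starts from one known first stat instead of calling find_first_stat().

```python
def average(counts):
    """Mean star value for counts ordered (5s, 4s, 3s, 2s, 1s)."""
    points = sum(n * stars for n, stars in zip(counts, (5, 4, 3, 2, 1)))
    return points / sum(counts)

def histories(history, remaining):
    """Yield every extension of `history` that matches the `remaining` ratings."""
    if not remaining:
        # All ratings consumed: this history is consistent with the chain.
        yield history
        return
    last = history[-1]
    for i in range(5):
        # One more rated ride: bump exactly one of the five star counters.
        nxt = last[:i] + (last[i] + 1,) + last[i + 1:]
        if abs(average(nxt) - remaining[0]) < 0.005:
            yield from histories(history + [nxt], remaining[1:])
        # Otherwise this branch is dropped, which is the backtracking step.

ratings = [4.69, 4.71, 4.73, 4.75, 4.76, 4.78, 4.79, 4.8]
found = list(histories([(9, 4, 0, 0, 0)], ratings[1:]))
print(len(found), found[0][-1])
```

Yielding the histories instead of printing them leaves it to the caller to count, filter or print them.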

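And here is the result for my 'chain': five possible histories. Which one is correct? Impossible to say. All five start from the same 61 points over 13 ratings and then get a 5 on every ride, so they produce the same rating at every step, and riding more would not tell them apart.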

ride#	 5	 4	 3	 2	 1	rating	total
----------------------------------------------------------------
1	 9	 4	 0	 0	 0	4.69	13
2	10	 4	 0	 0	 0	4.71	14
3	11	 4	 0	 0	 0	4.73	15
4	12	 4	 0	 0	 0	4.75	16
5	13	 4	 0	 0	 0	4.76	17
6	14	 4	 0	 0	 0	4.78	18
7	15	 4	 0	 0	 0	4.79	19
8	16	 4	 0	 0	 0	4.80	20

ride#	 5	 4	 3	 2	 1	rating	total
----------------------------------------------------------------
1	10	 2	 1	 0	 0	4.69	13
2	11	 2	 1	 0	 0	4.71	14
3	12	 2	 1	 0	 0	4.73	15
4	13	 2	 1	 0	 0	4.75	16
5	14	 2	 1	 0	 0	4.76	17
6	15	 2	 1	 0	 0	4.78	18
7	16	 2	 1	 0	 0	4.79	19
8	17	 2	 1	 0	 0	4.80	20

ride#	 5	 4	 3	 2	 1	rating	total
----------------------------------------------------------------
1	11	 0	 2	 0	 0	4.69	13
2	12	 0	 2	 0	 0	4.71	14
3	13	 0	 2	 0	 0	4.73	15
4	14	 0	 2	 0	 0	4.75	16
5	15	 0	 2	 0	 0	4.76	17
6	16	 0	 2	 0	 0	4.78	18
7	17	 0	 2	 0	 0	4.79	19
8	18	 0	 2	 0	 0	4.80	20

ride#	 5	 4	 3	 2	 1	rating	total
----------------------------------------------------------------
1	11	 1	 0	 1	 0	4.69	13
2	12	 1	 0	 1	 0	4.71	14
3	13	 1	 0	 1	 0	4.73	15
4	14	 1	 0	 1	 0	4.75	16
5	15	 1	 0	 1	 0	4.76	17
6	16	 1	 0	 1	 0	4.78	18
7	17	 1	 0	 1	 0	4.79	19
8	18	 1	 0	 1	 0	4.80	20

ride#	 5	 4	 3	 2	 1	rating	total
----------------------------------------------------------------
1	12	 0	 0	 0	 1	4.69	13
2	13	 0	 0	 0	 1	4.71	14
3	14	 0	 0	 0	 1	4.73	15
4	15	 0	 0	 0	 1	4.75	16
5	16	 0	 0	 0	 1	4.76	17
6	17	 0	 0	 0	 1	4.78	18
7	18	 0	 0	 0	 1	4.79	19
8	19	 0	 0	 0	 1	4.80	20
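
A quick check on the five tables above, as a sketch that is not part of the original program: taking the first row of each table, the total number of points and the number of ratings turn out to be the same in all five. The average depends only on these two numbers, so the rating alone cannot separate these histories.

```python
# First row of each of the five tables: counts of (5s, 4s, 3s, 2s, 1s).
starts = [
    (9, 4, 0, 0, 0),
    (10, 2, 1, 0, 0),
    (11, 0, 2, 0, 0),
    (11, 1, 0, 1, 0),
    (12, 0, 0, 0, 1),
]

# Collect the distinct (total points, number of ratings) pairs.
totals = {
    (sum(n * stars for n, stars in zip(counts, (5, 4, 3, 2, 1))), sum(counts))
    for counts in starts
}
print(totals)  # {(61, 13)}
```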

I first wrote this program in Racket. See my previous blog post for more information.

(This post was first published on 20220507.)

