For instance, you want to calculate a remainder of some very huge bignum: x=bignum % divisor. But bignum is very huge, you want to parallelize the process.

This is kind of problem in style of Project Euler. Here I split the (random) bignum by 3 parts and calculate 3 remainders. Can you write the missing function, that will reconstruct the 'whole' remainder?

#!/usr/bin/env python3 import random CHUNK_SIZE=10000 for _ in range(30000): hi=random.randint(0, CHUNK_SIZE-1) mi=random.randint(0, CHUNK_SIZE-1) lo=random.randint(0, CHUNK_SIZE-1) dividend=hi*(CHUNK_SIZE**2) + mi*CHUNK_SIZE + lo divisor=random.randint(1,CHUNK_SIZE-1) # calculate 3 remainders in parallel: rem_hi=hi % divisor rem_mi=mi % divisor rem_lo=lo % divisor remainder_my=reconstruct_remainder(rem_hi, rem_mi, rem_lo, divisor) # self-test: remainder_real=dividend % divisor assert remainder_real==remainder_my

Stop reading here. Try to work on this problem for a while. To continue reading, scroll down for the solution.

And there it is, with comments:

#!/usr/bin/env python3 def reconstruct_remainder(rem_hi, rem_mi, rem_lo, divisor): # all these 4 methods work # simplest version: # two remainders are just shifted left # in other words, 3 remainders 'stretch' #x=(rem_hi*((CHUNK_SIZE**2))) #x=x+(rem_mi*CHUNK_SIZE) #x=x+rem_lo #x=x % divisor #return x # with early modulo operation, so that all values will stay under divisor # this is faster #x=((rem_hi*((CHUNK_SIZE**2) % divisor)) % divisor) #x=x+((rem_mi*(CHUNK_SIZE % divisor) % divisor)) #x=x+rem_lo #x=x % divisor #return x # but if chunk size known beforehand, we can precompute these constants: precomputed_mult_hi=(CHUNK_SIZE**2) % divisor precomputed_mult_mi=CHUNK_SIZE % divisor #x=rem_hi*precomputed_mult_hi #x=x+(rem_mi*precomputed_mult_mi) #x=x+rem_lo #x=x % divisor #return x # probably the most performant method # with early modulo operation # all values always stays under divisor x=(rem_hi*precomputed_mult_hi) % divisor x=x+((rem_mi*precomputed_mult_mi) % divisor) x=x+rem_lo x=x % divisor return x

It can run on any number of cores/threads.

Also, another fact from modulo arithmetics that helped me to solve that problem:

#!/usr/bin/env python3 import random CHUNK_SIZE=10000 for _ in range(10000): hi=random.randint(0, CHUNK_SIZE-1) lo=random.randint(0, CHUNK_SIZE-1) dividend=hi*CHUNK_SIZE + lo divisor=random.randint(1, 50) remainder=dividend % divisor rem_hi=hi % divisor rem_lo=lo % divisor # note that. here we combine remainder from the high part with low part # and we can divide it by divisor x=(rem_hi*CHUNK_SIZE + lo) % divisor assert x==remainder x=(((rem_hi*(CHUNK_SIZE % divisor)) % divisor)+rem_lo) % divisor assert x==remainder

Actually, I have no need in parallelizing such an operation. But I did this for better understanding of modulo arithmetics. Hopefully, it can be of help for someone else.

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.