Greedy Algorithms

In this programming problem and the next you'll code up the greedy algorithms from lecture for minimizing the weighted sum of completion times..

The file jobs.text describes a set of jobs with positive and integral weights and lengths. It has the format

[number_of_jobs]

[job_1_weight] [job_1_length]

[job_2_weight] [job_2_length]

...

For example, the third line of the file is "74 59", indicating that the second job has weight 74 and length 59.

You should NOT assume that edge weights or lengths are distinct.

Your task in this problem is to run the greedy algorithm that schedules jobs in decreasing order of the difference (weight - length). Recall from lecture that this algorithm is not always optimal. IMPORTANT: if two jobs have equal difference (weight - length), you should schedule the job with higher weight first. Beware: if you break ties in a different way, you are likely to get the wrong answer. You should report the sum of weighted completion times of the resulting schedule --- a positive integer --- in the box below.

ADVICE: If you get the wrong answer, try out some small test cases to debug your algorithm (and post your test cases to the discussion forum).



In [41]:

    
# Initialize Modules
import collections 
import heapq

Scheduling by Difference Score



In [38]:

    
def schedule_by_difference(data):
    scores = collections.defaultdict(list)

    for d in data:
        d = d.split(" ")
        w = int(d[0])
        l = int(d[1])
        score = w - l
        heapq.heappush(scores[score], (-w, l))

    # out = sorted(data, key = lambda x: int(x.split(" ")[0]) - int(x.split(" ")[1]), reverse=True)

    all_keys = sorted(scores.keys(), reverse=True)
    #print (all_keys)

    C = 0 # Sum of Weighted Completion Time
    L = 0 # Completion Time 
    for k in all_keys:
        while scores[k]:
            job_k = (heapq.heappop(scores[k]))
            c_k = L = L + job_k[1] 
            C += c_k * (-job_k[0])

    return L, C

For this problem, use the same data set as in the previous problem.

Your task now is to run the greedy algorithm that schedules jobs (optimally) in decreasing order of the ratio (weight/length). In this algorithm, it does not matter how you break ties. You should report the sum of weighted completion times of the resulting schedule --- a positive integer --- in the box below.

Scheduling by Ratio Score



In [39]:

    
def schedule_by_ratio(data):
    scores = collections.defaultdict(list)

    for d in data:
        d = d.split(" ")
        w = int(d[0])
        l = int(d[1])
        score = w / l
        heapq.heappush(scores[score], (-w, l))

    # out = sorted(data, key = lambda x: int(x.split(" ")[0]) - int(x.split(" ")[1]), reverse=True)

    all_keys = sorted(scores.keys(), reverse=True)
    #print (all_keys)

    C = 0 # Sum of Weighted Completion Time
    L = 0 # Completion Time 
    for k in all_keys:
        #print (k, scores[k])

        while scores[k]:
            job_k = (heapq.heappop(scores[k]))
            c_k = L = L + job_k[1] 
            C += c_k * (-job_k[0])

    return L, C
    #print ("Time to complete {} - Weighted Completion {}".format(L, C))

Unit Test



In [58]:

    
FILE = "jobs_test.txt"

fp = open(FILE, 'r')
data = fp.readlines()

n= data[0]
data= data[1:]

L1,C1 = schedule_by_difference(data)
# print ("Difference - Time to complete {} - Weighted Completion {}".format(L, C))

L2,C2 = schedule_by_ratio(data)
# print ("Ratio - Time to complete {} - Weighted Completion {}".format(L, C))

assert C1 == 68615, "Differece: error C is {}, expected {}".format(C1, 68615)
assert C2 == 67247, "Ratio: error C is {}, expected {}".format(C2, 67247)
print ("All Test Passed!")









    



All Test Passed!

Test on jobs.txt



In [59]:

    
FILE = "jobs.txt"
fp = open(FILE, 'r')
data = fp.readlines()

n= data[0]
data= data[1:]

L,C = schedule_by_difference(data)
print ("Difference - Time to complete {} - Weighted Completion {}".format(L, C))

L,C = schedule_by_ratio(data)
print ("Ratio - Time to complete {} - Weighted Completion {}".format(L, C))









    



Difference - Time to complete 510289 - Weighted Completion 69119377652
Ratio - Time to complete 510289 - Weighted Completion 67311454237