Assignment 6 Gale Shapley Python implementation

Assignment 6 - Gale Shapley Python implementation

Submit zip file to Canvas by 11:59 Saturday, September 23.

Goals

Understand the Gale Shapley algorithm deeply Apply your knowledge about the target complexity of the parts of the algorithm

o Write code that roughly meets these bounds Test and time your code to see if it meets the above goals

Assignment: You must create an implementation of the Gale-Shapley algorithm that runs in our test environment. You must also provide time tests of the sample input files provided for this assignment, along with a statement about why or why not your implementation met or did not roughly meet the expected time complexity with the different input sizes present in these input files.

We will execute your program from a shell script as follows:

python3 gs.py

where gs.py is your implementation, has our test problems in it, and is where you will put your matchings.

Canvas Submission: You must submit the following items in a zip file:

gs.py (your python program file that implements the G-S algorithm) any additional files it imports such as helper files we supply or other files you create a text file that shows the results of timing your program on different sized inputs and a brief discussion

of how well you think your implementation meets the theoretical complexity bounds for the algorithm based on input size

See the following sections on entry points and input and output specifications, and testing facilities you can use to check your work.

Partial Rubric: We expect your program to produce correct matchings with a variety of problem sizes, so we will test

correctness. We expect your program to perform at near-expected complexity bounds, so we will test your program on

both small and huge problem sizes. We expect you to use appropriate and efficient data structures and design in your implementation. We will

be checking your source code. We cannot tell you more than this, other than to say that just using lists is not sufficient and won't achieve expected timing bounds. We expect you to use GitHub as a remote repository and push your code to it periodically, not just at the end of your development. We will be checking the commit logs of your private GitHub repository. We reserve the right to change this rubric as necessary.

Download Helper and Testing Files: Go to the Assignments tab on the CS320 webpage, and use the link described for this assignment to download a zip file containing 2 sample input files and 2 sample output files, and 2 python files, generate_problems.py and stable_matching_helpers.py. The file that generates problems is described below in the Sample testing & timing section. The generator functions imports the helpers file.

You may use any of the functions in these two files in your implementation of the Gale-Shapley algorithm, or none of them. However, you will need to create stable matching problems of various sizes for testing, and while you are free to write your own code to do this, you may find it convenient to use the code we have provided. See below for directions on using this code.

Programmatic entry point:

Your implementation must be in a file called gs.py. This file must contain a main block:

if __name__ == "__main__": # do stuff

Input specification:

The first argument to your program (sys.argv[1]) is the name of a JSON text file containing a list of stable matching problem specifications. JSON is a mechanism to serialize data for easy transfer. In Python 3, it is used to serialize Python dictionaries. Therefore the JSON input file must be interpreted as a list of pairs of dictionaries that represent matching problems. Each problem specification consists of two dictionaries containing the preferences of the proposers and the preferences of the acceptors. For example an input file might contain:

[[{"a0": ["b0", "b1"], "a1": ["b1", "b0"]}, {"b0": ["a0", "a1"], "b1": ["a0", "a1"]}],

[{"j0": ["k0", "k1"], "j1": ["k0", "k1"]}, {"k0": ["j1", "j0"], "k1": ["j0", "j1"]}]]

The above file is a list of two separate stable matching problems. The first problem matches "a"s and "b"s. The second matches "j"s and "k"s. For every problem, the `people' in the first dictionary must do the asking during the algorithm, and must be the first element of any matching. Thus, the "a"s and the "j"s are the `people' who will do the asking during the algorithm.

The preferences are given as a mapping from the name of the person to that person's ordering of partners (by name), from most to least preferred.

In the first problem contained in our example, we see that:

"a0" prefers "b0" over "b1" "a1" prefers "b1" over "b0" "b0" prefers "a0" over "a1" "b1" has this same ordering of preferences.

Output specification:

The second argument to your program (sys.argv[2]) is the name of the JSON file to write the output to. The contents of this file must be a list of dictionaries of matchings corresponding to the Gale Shapley matching for each problem.

Since the "a"s are doing the asking, for this example, the dictionary form of a matching that might be computed for the "a" "b" example above is:

{"a0": "b0", "a1": "b1"}

A matching is represented by a mapping from each person in the original problem to their partner. The order of the key-value pairs is not important.

Similarly, for the "j" and "k" example, since the "j"s do the asking, a matching might be:

{"j0": "k1", "j1": "k0"}

The output matching file must then look like this:

[{"a0": "b0", "a1": "b1"}, {"j0": "k1", "j1": "k0"}]

Additional Resources:

JSON:

You can use functions like these to read and write JSON files. Remember that the objects you are reading and writing are dictionaries. The file generate_problems.py also has functions that read and write json files called parse_json_file and write_obj_to_json_file, respectively.

def write_json(obj, filename): with open(filename, mode='w') as f: json.dump(obj, f)

def read_json(filename): with open(filename) as f: return json.load(f)

Sample testing & timing:

We have provided 2 sample input files along with their associated output files so you can see if your program generates the correct output:

Input file name

Associated output file name

Number of problems & size

small_prob_input.json

small_prob_out.json

2 problems, one with 2 `people', one with 3 `people'

big_prob_input.json

big_prob_out.json

2 problems, one with 40 `people', one with 200 `people'

You need to create your own problem files of different numbers of `people' in order to see if your implementation performs according to the expected complexity. You'll need some problems that are very big ? on the order of thousands to really see how your data structures and design affect performance. Try creating several problems of different sizes such as n=50, 200, 500, 800, 1000, 2000, and even bigger to test performance.

Test your Gale-Shapley implementation's performance separately from the JSON serialization/deserialization like so:

if __name__ == "__main__": # read in input file import time start_time = time.process_time() # run Gale-Shapley calculations end_time = time.process_time() # write output file print("Ran in: {:.5f} secs".format(end_time - start_time))

Plot the timing results and analyze the growth behavior of your implementation relative to the bounds we proved in the abstract about the algorithm. We may be able to give you hints at reasonable expectations for growth behavior. If you find that your program does not perform as it should (is clearly worse than the big O bound for Gale-Shapley), then you should re-design your data structures and implementation and try again. Include your timing results and analysis in the text file included in the turn-in zip, and we will evaluate this as part of your grade.

Creating data sets of varying sizes:

To help you generate problem files, we have included a source file generate_problems.py that contains a function that will generate random problems. This python file imports a set of helper functions from stable_matching_helpers.py, which is also included.

def create_random_problems_json_file(filename, number_of_problems, sizes=(2, 3), specs=(None,), pretty_print=False):

To use this function, call it with the file name that you want to create, the number of problems you want in the file, the sizes of those problems (by default either 2 or 3 `people' per problem, a specification of the keys to use for each group in a problem (by default none, but you should include them ? see examples below), and whether the file should be pretty-printed (by default no).

Here is the call we used to create the small problem file (2 problems, one of each size 2 `people' and 3 `people', first problem keys `a' and `b', second problem keys `j' and `k', file pretty-printed):

generate_problems.create_random_problems_json_file ('small_prob_input.json',2, sizes=[2,3], specs = [{'group1': 'a', 'group2': 'b'}, {'group1': 'j', 'group2': 'k'}], pretty_print=True)

Here is the call we used to create the big problem file (2 problems, one of each size 40 and 200, first problem keys `a' and `b', second problem keys `j' and `k', file pretty-printed):

generate_problems.create_random_problems_json_file ('big_prob_input.json', 2, sizes=[40,200],

specs=[{'group1': 'a', 'group2': 'b', 'verbose':True}, {'group1': 'j', 'group2': 'k', 'verbose':True}, pretty_print=True)

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download