Add Common Optimization Methods #361

Merged
merged 19 commits into from
Apr 12, 2024

jenni-niels
Member

This PR adds common optimization methods to the gerrychain codebase.
The SingleMetricOptimizer class represents the class of optimization problems over a single plan metric and currently implements short bursts, a few variants, and tilted runs, with more to come.
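For readers unfamiliar with short bursts, the idea can be sketched on a toy problem. This is not the PR's implementation and does not use the gerrychain API: `short_bursts`, `toy_propose`, and `toy_score` are hypothetical names, and an integer random walk stands in for a chain of districting plans.

```python
import random


def short_bursts(initial, propose, score, burst_length, num_bursts, rng):
    """Maximize `score` with short bursts; returns (best_state, best_score)."""
    best, best_score = initial, score(initial)
    for _ in range(num_bursts):
        state = best  # every burst restarts from the best state seen so far
        for _ in range(burst_length):
            state = propose(state, rng)  # unbiased walk: every proposal is accepted
            current = score(state)
            if current >= best_score:  # ties also move, which helps cross plateaus
                best, best_score = state, current
    return best, best_score


def toy_propose(x, rng):
    """Toy proposal: step one unit left or right."""
    return x + rng.choice((-1, 1))


def toy_score(x):
    """Toy metric: closeness to the target value 10."""
    return -abs(x - 10)


rng = random.Random(0)
best, best_score = short_bursts(
    0, toy_propose, toy_score, burst_length=5, num_bursts=200, rng=rng
)
print(best, best_score)
```

A tilted run differs in that it is one long chain that always accepts improvements and accepts worse states with some probability p, rather than restarting from the best state.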

The Gingleator class is a subclass of SingleMetricOptimizer and can be used to search for plans with increased numbers of Gingles' districts.

…ased number of Gingles' districts; add further documentation SingleMetricOptimizer class methods; delint optimization.py
@pizzimathy
Member

I have no comments (for now) other than Gingleator is a great class name

Contributor

@gabeschoenbach gabeschoenbach left a comment

I think this is a great jumping-off point for us to bring short bursts into harmony with guided acceptance functions!

I like the flexibility of thresholding our searches. I'm going to keep saying "majority" (i.e. threshold = 0.50) for simplicity, but this should all work if the threshold is set to something else. But I think we should broaden the Gingleator initialization so it can search for majority-minority or majority-party districts. This flexibility is sketched out here and here in my One-Click-Chains repo, though admittedly not in an "object-oriented" format. While broadening to partisan metrics makes the code a little less straightforward, I think we could gain that back by dropping minority_perc_col altogether and expecting to be passed either a Tally updater name for the demographic group (along with a Tally updater for the total population) or an ElectionResults updater name on which we can call .percents(party) to get the party percentages.

Conceptually, one thing all of the score functions in the gingleator class have in common is that they look built to be used as simple comparators as you traverse the chain, i.e. accept the child if there is an improvement, otherwise accept with some fixed probability p. I think we should build out the functionality to make p dynamically change depending on how much worse the proposal is than its parent. In gingleator terms, this means a helper function like my get_majdistricts_info() function that returns the number of majority-{group} (demographic group or party) districts, and the percentages of a) the smallest district above the threshold and b) the largest district below the threshold. Then, in SingleMetricOptimizer we can build acceptance functions that use those helpers to cleverly accept with a variable p. As I see it, this would be an extension/improvement to your tilted_short_bursts().
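A helper of the kind described above could look roughly like the following. This is a sketch under assumptions, not the actual get_majdistricts_info() from the One-Click-Chains repo: the name `majority_district_info`, its signature, and the plain dict of district shares are all hypothetical.

```python
def majority_district_info(shares, threshold=0.5):
    """Summarize district shares relative to a threshold.

    `shares` maps district id -> fraction of the group (or party) in that district.
    Returns (count at or above threshold, smallest share at or above it,
    largest share below it); the last two are None when no district qualifies.
    """
    above = [s for s in shares.values() if s >= threshold]
    below = [s for s in shares.values() if s < threshold]
    smallest_above = min(above) if above else None
    largest_below = max(below) if below else None
    return len(above), smallest_above, largest_below


shares = {"A": 0.62, "B": 0.51, "C": 0.48, "D": 0.30}
info = majority_district_info(shares, threshold=0.5)
print(info)  # (2, 0.51, 0.48)
```

The two boundary values give an acceptance function something finer than the integer district count to compare between a parent plan and a proposal.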

This is really exciting. I think that if we build this right we can be really flexible in how we search through the metagraph, and I would love to run experiments to see if layering all of these tricks together gives us more of a leg up (variable-length short bursts with a custom acceptance function that rejects worse plans in proportion to how bad they are?? could be huge)...

Small bug, I think:

        if minority_perc_col is None:
            perc_up = {min_perc_column_name:
                            lambda part: {k: part[minority_pop_col][k] / part[total_pop_col][k]
                                          for k in part.parts.keys()}}
            initial_state.updaters.update(perc_up)

        score = partial(score_function, minority_perc_col=minority_perc_col, threshold=threshold)

        super().__init__(proposal, constraints, initial_state, score, minmax="max",
                         tracking_funct=tracking_funct)

    """
    Score Functions
    """

    @classmethod
    def num_opportunity_dists(cls, part, minority_perc_col, threshold):
        """
        Given a partition, returns the number of opportunity districts.
        :param `part`: Partition to score.
        :param `minority_perc_col`: Which updater is a mapping of district ids to the fraction of
            minority population within that district.
        :param `threshold`: Beyond which fraction to consider something a "Gingles"
            (or opportunity) district.
        :rtype int
        """
        dist_percs = part[minority_perc_col].values()
        return sum(list(map(lambda v: v >= threshold, dist_percs)))

...if minority_perc_col is None, then the updater that maps district IDs to the fraction of minority population is registered under the name min_perc_column_name. But the score functions all seem to look up partition[minority_perc_col], which would be partition[None] here and looks like it would raise an error in this case.
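One way to avoid this kind of mismatch is to resolve the column name once and pass the resolved name to the score function. The sketch below is illustrative only: a plain dict stands in for a gerrychain Partition, and `build_score` and `DEFAULT_PERC_NAME` are hypothetical names, not part of the PR.

```python
from functools import partial

DEFAULT_PERC_NAME = "_auto_minority_perc"  # hypothetical fallback updater name


def num_opportunity_dists(part, minority_perc_col, threshold):
    """Count districts whose minority share is at or above the threshold."""
    return sum(v >= threshold for v in part[minority_perc_col].values())


def build_score(part, minority_perc_col, minority_pop_col, total_pop_col, threshold):
    """Return a score function bound to a column name that is guaranteed to exist."""
    if minority_perc_col is None:
        # Resolve the name first, then register the derived shares under it,
        # so the score function never looks up part[None].
        minority_perc_col = DEFAULT_PERC_NAME
        part[minority_perc_col] = {
            k: part[minority_pop_col][k] / part[total_pop_col][k]
            for k in part[total_pop_col]
        }
    return partial(
        num_opportunity_dists, minority_perc_col=minority_perc_col, threshold=threshold
    )


# A dict standing in for a partition with two districts.
part = {"bvap": {1: 60, 2: 20}, "pop": {1: 100, 2: 100}}
score = build_score(part, None, "bvap", "pop", threshold=0.5)
print(score(part))  # 1
```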

@gabeschoenbach
Contributor

This looks really great! Just to check my understanding — if we called a simulated annealing run with a beta_function as something like:

gingles.hot_cold_cycle_beta_function_factory(0,1000)

(in other words only ever cold), would this be equivalent to a tilted run that always accepts better partitions, and accepts worse partitions with a dynamic probability p that depends on the beta_magnitude?
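To make the question concrete, here is a sketch of what a hot/cold cycling beta function might look like. It assumes the two arguments are (duration_hot, duration_cold) and that beta is 0 while hot and 1 while cold, which is what "only ever cold" suggests; the actual factory in the PR may differ, and `hot_cold_cycle_beta` is a hypothetical stand-in name.

```python
def hot_cold_cycle_beta(duration_hot, duration_cold):
    """Return beta(step): 0.0 during the hot phase of each cycle, 1.0 during the cold phase."""
    cycle_length = duration_hot + duration_cold

    def beta(step):
        # Position within the current cycle decides the phase.
        return 0.0 if step % cycle_length < duration_hot else 1.0

    return beta


always_cold = hot_cold_cycle_beta(0, 1000)
mixed = hot_cold_cycle_beta(2, 3)

print([always_cold(t) for t in (0, 1, 999, 1000)])  # [1.0, 1.0, 1.0, 1.0]
print([mixed(t) for t in range(6)])  # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
```

With a hot duration of 0 the schedule never leaves the cold phase, so beta is the constant 1.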

I made some small changes in docstrings, mostly just fixing some typos. I also want to flag a couple spots I think the documentation is unclear — might just be me, so would love to get other folks' input as well...

SingleMetricOptimizer optimizer.py, lines 12-25
I would change In instance of this class encapsulates the dualgraph and updaters via the initial partition to This class includes the initial partition (which gives access to the underlying dual graph and updaters).... I'm a little confused by Note that these are reset every time an optimization run is invoked and do not persist, but I'm not sure whether/how to reword that.

hot_cold_cycle_beta_function_factory optimizer.py, lines 137-144
I wonder if we can think of a more concise name for this function? And maybe add one more sentence of explanation as to its use, although I suppose this is pretty clear if you read the args on lines 140-141...

The Optimization notebook looks great! I made a small change to increase the size of the traceplots, so it's easier to see the differences in the different chains. I think it could be useful to add a little bit more documentation to the cells of the notebook, so someone could understand how the various functions work without going to the documentation. This is something I could add if you don't have the bandwidth!

@codecov-commenter

codecov-commenter commented Mar 1, 2022

Codecov Report

Attention: Patch coverage is 0% with 170 lines in your changes are missing coverage. Please review.

Project coverage is 80.14%. Comparing base (f2b1acd) to head (665d0ae).

❗ Current head 665d0ae differs from pull request most recent head f4725de. Consider uploading reports for the commit f4725de to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #361       +/-   ##
===========================================
- Coverage   91.91%   80.14%   -11.77%     
===========================================
  Files          38       40        +2     
  Lines        1942     1894       -48     
===========================================
- Hits         1785     1518      -267     
- Misses        157      376      +219     
Files Coverage Δ
gerrychain/optimization/__init__.py 0.00% <0.00%> (ø)
gerrychain/optimization/gingleator.py 0.00% <0.00%> (ø)
gerrychain/optimization/optimization.py 0.00% <0.00%> (ø)

... and 35 files with indirect coverage changes



@jenni-niels
Member Author

Thanks for the review! (as well as typo catching - spelling is not my strong suit)
I'll work on clarifying the documentation in the places you mentioned.

This looks really great! Just to check my understanding — if we called a simulated annealing run with a beta_function as something like:

gingles.hot_cold_cycle_beta_function_factory(0,1000)

(in other words only ever cold), would this be equivalent to a tilted run that always accepts better partitions, and accepts worse partitions with a dynamic probability p that depends on the beta_magnitude?

Yes, that would be equivalent to a tilted run with a dynamic probability of accepting worse-scoring plans. Although, it might be simpler to call a simulated annealing run with the beta function:

beta_function = lambda _: 1

which has slightly less computational overhead than calling the gingles.hot_cold_cycle_beta_function_factory method.
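The "dynamic probability" can be illustrated with a Metropolis-style rule. The formula exp(-beta * beta_magnitude * delta) is an assumption here, since the PR's exact acceptance rule is not shown in this thread, and `acceptance_probability` is a hypothetical name.

```python
import math


def acceptance_probability(parent_score, child_score, beta, beta_magnitude=1.0):
    """Probability of accepting a proposal when maximizing a score.

    Improvements and ties are always accepted; a worse proposal is accepted
    with probability exp(-beta * beta_magnitude * delta), where delta is how
    much the score dropped (assumed rule, for illustration only).
    """
    delta = parent_score - child_score
    if delta <= 0:
        return 1.0
    return math.exp(-beta * beta_magnitude * delta)


beta_function = lambda _: 1  # constant "cold" schedule, as suggested above

step = 42  # any step gives the same beta under a constant schedule
p_better = acceptance_probability(5, 6, beta_function(step))
p_slightly_worse = acceptance_probability(5, 4, beta_function(step))
p_much_worse = acceptance_probability(5, 2, beta_function(step))
print(p_better, p_slightly_worse, p_much_worse)
```

Under this rule a larger beta_magnitude or a larger score drop both lower the chance of accepting a worse plan, while beta = 0 accepts everything.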

The Optimization notebook looks great! I made a small change to increase the size of the traceplots, so it's easier to see the differences in the different chains. I think it could be useful to add a little bit more documentation to the cells of the notebook, so someone could understand how the various functions work without going to the documentation. This is something I could add if you don't have the bandwidth!

Yes, I can add some more context/docs to the notebook! I'd like to expand on the pros/cons of the different optimization methods, although that might take much longer runs to show in a plot, so I'm not sure an example notebook is the place for that code. I also think it might be useful to show the usage of the SingleMetricOptimizer class beyond the Gingleator use case. Perhaps seeking maps that are close to aggregate proportionality or some other target?
Thoughts?

@gabeschoenbach
Contributor

Definitely agree it would be good to show SingleMetricOptimizer for other things; aggregate proportionality would make sense. I also was thinking it would be cool to compare/contrast all these different methods, but I think that would be best in a different file, maybe not necessarily an intro notebook. If we do that comparison, one thing I'd love to try is just optimizing for cut edges, since for large graphs it's a pretty granular metric and it would be easy to see change over time.
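As a reminder of what a cut-edges score measures, here is a small self-contained sketch on a 2x2 grid. It does not use gerrychain's own cut-edges updater; the `cut_edges` function and the edge-list representation are illustrative assumptions.

```python
def cut_edges(edges, assignment):
    """Count edges whose endpoints are assigned to different districts."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# 2x2 grid graph: nodes 0 and 1 on the top row, 2 and 3 on the bottom row.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]

rows = {0: "A", 1: "A", 2: "B", 3: "B"}  # split into top and bottom rows
checkerboard = {0: "A", 1: "B", 2: "B", 3: "A"}  # diagonal pairs (not contiguous)

print(cut_edges(edges, rows))  # 2
print(cut_edges(edges, checkerboard))  # 4
```

Because the count changes by small integer steps from plan to plan, a minimizing run over this score should show gradual movement in a traceplot.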

jenni-niels and others added 3 commits March 1, 2022 14:36
…xpose `best_part`, `best_score`, and `score` as readonly properties. Add stubs for new cycling beta functions.
@gabeschoenbach
Contributor

Looks good! I just updated some stuff in the Optimization notebook so the annealing calls work with the new jumpcycle function.

@pjrule pjrule added the summer-project Summer projects for 2023 and beyond label Apr 25, 2023
@peterrrock2 peterrrock2 changed the base branch from main to dev/0.3.2 April 12, 2024 19:09
@peterrrock2 peterrrock2 dismissed gabeschoenbach’s stale review April 12, 2024 19:59

This was fixed, but GitHub will not show the comment in the code for me to resolve the conversation, so I have to do this the long way.

@peterrrock2 peterrrock2 merged commit 2fc3d1e into mggg:dev/0.3.2 Apr 12, 2024
1 check passed
Labels
summer-project Summer projects for 2023 and beyond work-in-progress
7 participants