Duplicated Maps #137

Closed
zmarkovich opened this issue May 10, 2022 · 3 comments

@zmarkovich

Hello,

While using the redist_smc function, we've encountered a problem where the simulations seem to include a very large number of identical maps. Indeed, the problem is so bad that in one case only 437 unique maps were generated out of 10,000 draws. This seems to occur regardless of the simulation parameters.

Is this intended behavior? Are there any parameters we should tweak to resolve the issue? I'd be happy to email you files/code for a reproducible example; the data is just a bit large for GitHub.

Thank you for your help with understanding this behavior.

Best,
Zach

@CoryMcCartan
Member

The SMC algorithm isn't designed to generate as many independent maps as possible; it tries to produce a representative sample so that when you take averages w.r.t. that sample, they are correct. That being said, 437 uniques out of 10,000 is on the low side. How many districts & precincts is your map, and are you using any constraints?
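To make that distinction concrete, here is a minimal base-R sketch. It is not redist's own code: the helper `count_unique_plans`, the toy plans, the weights, and the statistic `stat` are all made up for illustration, and the sketch assumes plans are stored as a matrix with one row per precinct and one column per draw. It shows that a weighted resample can contain very few distinct plans and still give an average close to the weighted target.

```r
# Toy illustration only; `count_unique_plans` is a hypothetical helper,
# not part of redist. Assumes one column per sampled plan.
count_unique_plans <- function(plans_mat) {
  # Collapse each column to a single string key, then count distinct keys.
  keys <- apply(plans_mat, 2, paste, collapse = ",")
  length(unique(keys))
}

set.seed(1)

# Four distinct toy "plans" over 6 precincts and 2 districts.
base_plans <- cbind(
  c(1, 1, 1, 2, 2, 2),
  c(1, 1, 2, 2, 2, 1),
  c(1, 2, 2, 2, 1, 1),
  c(2, 2, 1, 1, 1, 2)
)

# Made-up importance weights and a made-up per-plan statistic.
w    <- c(0.55, 0.25, 0.15, 0.05)
stat <- c(0.2, 0.4, 0.6, 0.8)

# Resampling in proportion to the weights produces many duplicate columns.
idx   <- sample(ncol(base_plans), size = 1000, replace = TRUE, prob = w)
draws <- base_plans[, idx]

n_unique    <- count_unique_plans(draws)  # distinct plans among 1000 draws
target_mean <- sum(w * stat)              # weighted average we want
sample_mean <- mean(stat[idx])            # plain average over the resample

cat("unique plans:", n_unique, "of", ncol(draws), "\n")
cat("target mean:", target_mean, " sample mean:", round(sample_mean, 3), "\n")
```

Note that this sketch compares raw district labels, so two plans that differ only by a relabeling of districts would count as different.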

@zmarkovich
Author

Thanks for the follow-up. We're drawing 63 districts out of about 15k precincts. In terms of constraints, we limited county splits using the "counties" argument (we specified 62 counties in the state). We also set seq_alpha=.25. The only other constraint is population tolerance (set to .05 for state legislative redistricting); compactness was left at the default (1), along with all other defaults. One interesting thing is that we didn't have nearly as much duplication in another set of simulations where we just left seq_alpha at its default; not sure if that's relevant info or not.

@CoryMcCartan
Member

OK, so 63 is a relatively large number of districts. I think seq_alpha=0.6 or 0.7 is probably more appropriate -- smaller values like 0.25 won't prune bad samples aggressively enough and will let the range of the weights grow too extreme.
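The effect of the exponent on the weight range can be sketched with a toy simulation in base R. This is a simplified model, not redist's implementation: it assumes that at each step particles are resampled with probability proportional to weight^alpha and carry forward weight^(1 - alpha), and the function names, particle count, step count, and noise level are all illustrative assumptions.

```r
# Toy model of sequential importance resampling; NOT redist's implementation.
# Assumption: at each step, particles are resampled with probability
# proportional to weight^alpha and carry forward weight^(1 - alpha).
simulate_weights <- function(alpha, n_particles = 5000, n_steps = 20, seed = 42) {
  set.seed(seed)
  log_w <- rep(0, n_particles)
  for (step in seq_len(n_steps)) {
    # Each particle picks up a random incremental log-weight.
    log_w <- log_w + rnorm(n_particles, mean = 0, sd = 0.5)
    if (step < n_steps) {
      # Resample in proportion to weight^alpha (shifted for numerical stability).
      p   <- exp(alpha * (log_w - max(log_w)))
      idx <- sample(n_particles, replace = TRUE, prob = p)
      # Survivors keep only the share of their weight not yet used.
      log_w <- (1 - alpha) * log_w[idx]
    }
  }
  log_w
}

# Effective sample size of the final weights, as a fraction of the particles.
ess_fraction <- function(log_w) {
  w <- exp(log_w - max(log_w))
  sum(w)^2 / sum(w^2) / length(w)
}

lw_low  <- simulate_weights(alpha = 0.25)
lw_high <- simulate_weights(alpha = 0.7)

cat("alpha = 0.25: sd(log w) =", round(sd(lw_low), 2),
    " ESS fraction =", round(ess_fraction(lw_low), 2), "\n")
cat("alpha = 0.70: sd(log w) =", round(sd(lw_high), 2),
    " ESS fraction =", round(ess_fraction(lw_high), 2), "\n")
```

In this toy model the smaller exponent leaves the final log-weights more spread out and the effective sample size lower, which is the kind of situation where a final resampling step would collapse onto a small number of distinct particles.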

If you install the current dev version (soon to be 4.0) with remotes::install_github("alarm-redist/redist@dev"), you can call summary() on your plans object and see some useful diagnostic information that will help with this. In particular, there's a column which keeps track of the rough number of unique plans seen at each iteration. As you adjust seq_alpha you should see how these trend -- ideally you want the # at each iteration to be roughly the same, & certainly not a big drop at the end.
