Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the pool API and default behavior, and change param precedence #207

Merged
merged 6 commits into from
Jul 2, 2024

Conversation

Gallaecio
Copy link
Contributor

@Gallaecio Gallaecio commented Jun 25, 2024

  • A backward-incompatible change to the way parameter precedence is handled, detailed already in the changelog. I think it makes more sense than the original one I implemented, the bad part is that it makes the code a bit more complex.
  • Added a zyte_api_session_pool request metadata key, to allow overriding the pool ID per request.
  • Made it so that using zyte_api_session_params or zyte_api_session_location request metadata keys results in different pool IDs by default per param set or location. It is <domain>[<number>] (e.g. example.com[0]) for param sets, with the corresponding parameters being logged as INFO for reference, and <domain>@<comma-separated non-empty location components> for locations (e.g. example.com@US,TX,10001).

@Gallaecio Gallaecio requested review from kmike and wRAR June 25, 2024 16:17
Copy link

codecov bot commented Jun 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.62%. Comparing base (2209d84) to head (f6a1db6).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #207      +/-   ##
==========================================
+ Coverage   97.58%   97.62%   +0.04%     
==========================================
  Files          14       14              
  Lines        1491     1517      +26     
  Branches      314      320       +6     
==========================================
+ Hits         1455     1481      +26     
  Misses         15       15              
  Partials       21       21              
Files Coverage Δ
scrapy_zyte_api/_session.py 100.00% <100.00%> (ø)

CHANGES.rst Outdated Show resolved Hide resolved
docs/usage/session.rst Outdated Show resolved Hide resolved
@Gallaecio
Copy link
Contributor Author

Gallaecio commented Jun 26, 2024

I must say I have second thoughts about the precedence order.

The main reasons behind the proposed change are the following assumptions:

  • If you set zyte_api_session_params in request meta, that should take priority over everything.
  • If you set zyte_api_session_location in request meta, that should take priority over everything except zyte_api_session_params, but if params() implements a better way to set the location than using the setLocation action, that params() implementation should be used with zyte_api_session_location as location.
  • Similar for settings, both with lower priority than request metadata.
  • location() should only be overridden to change the default location for a given session config, so any meta or setting that defines a different location should take precedence.

However, I think zyte_api_session_params and ZYTE_API_SESSION_PARAMS should triumph params(). The current implementation relies on developers enforcing that themselves in their implementation of params(), but maybe we should not rely on that. On the other hand, params() should still be used to set a location specified through zyte_api_session_location, ZYTE_API_SESSION_LOCATION or location(). I feel like implementing it this way would complicate further both the implementation and the docs (feels hard to explain the logic), but it may be worth it?

Updated the PR to prevent the params() method from overriding params defined through meta or settings.

@Gallaecio
Copy link
Contributor Author

Gallaecio commented Jul 1, 2024

I wonder if, in line with not calling params unless there is no meta or setting overriding it, if we should do something to achieve the same for location, i.e. make it much less likely for a developer to mess up and ignore meta and settings when overriding location, and as a result prevent user-defined values from being used.

We could, for example, define a location-specific counterpart to params, location_params, that receives a location as a parameter, and let that location come straight from meta or settings where appropriate, rather than from self.location. Users can still ignore that param in their implementation if they wish, but it would be more obvious that they are ignoring a value on purpose if they rely on a hardcoded value or a self.location(request) call when there is a location parameter available. Static tools may even report that they are not using the parameter.

The main issue I see is that, for existing params implementations that provide custom code for locations, there is no trivial way to warn about the backward-incompatible change or maintain backward compatibility.

@Gallaecio
Copy link
Contributor Author

Nevermind. I started working on that, but it complicates things too much. Having a separate location_params can be kind of nice, but having a check_location method may also be necessary, and then that could become a problem if e.g. you used zyte_api_session_params to set custom params on a request but those custom params include setLocation and its outcome should be checked. Because using zyte_api_session_params does not mean you are not setting a location, for that you would also need to set zyte_api_session_params to {} to override a setting-defined or fallback location.

@Gallaecio Gallaecio merged commit a9d5bff into scrapy-plugins:main Jul 2, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants