Adding bypass for Google Cookie Consent if present #105
📝 Walkthrough
The fetcher module gains consent wall detection and handling capabilities. New utility functions identify Google consent pages via HTML parsing and submit rejection forms.
Sequence Diagram
sequenceDiagram
participant Client
participant GoogleConsent as Google Consent<br/>(if present)
participant FlightsServer as Flights Server
Client->>GoogleConsent: GET /flights page
GoogleConsent-->>Client: HTML (consent wall)
Client->>Client: _is_consent_page() detects consent
Client->>Client: _submit_consent() extracts form data
Client->>GoogleConsent: POST /consent/save (reject form)
GoogleConsent-->>Client: Redirect/Response
Client->>FlightsServer: GET /flights page (retry)
FlightsServer-->>Client: HTML (flights data)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
fast_flights/fetcher.py (1)
24-29: Fragile reject-all heuristic: add a fallback or assertion.
Identifying the reject-all form by `set_eom == "true"` and absence of `set_sc` relies on undocumented Google field names that can change without notice. Since this is the single point of failure for the whole consent flow, consider:
- Additionally preferring a form whose action is `/save` and whose submit button text / aria-label matches something recognizable, OR
- Logging the discovered form fields when falling through so regressions are diagnosable, rather than just raising a bare `RuntimeError`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@fast_flights/fetcher.py` around lines 24 - 29, The reject-all form detection in the loop over parser.css("form") is fragile because it only checks inputs.get("set_eom") == "true" and absence of "set_sc"; update the logic in the same loop that builds inputs (and sets reject_form) to prefer forms whose action attribute equals "/save" or whose submit button text/aria-label contains recognizable labels (e.g., "Reject", "Decline"), and if no form matches, log the discovered forms/inputs (the inputs dicts) before raising RuntimeError so regressions are diagnosable; keep the existing reject_form variable and the break behavior but add the fallback selection and structured logging of fields when falling through.
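One possible shape for this fallback, as a minimal sketch rather than the PR's actual code. It assumes the standard-library `html.parser` in place of the `parser.css("form")` call used in the PR, and the names `FormCollector` and `find_reject_form` are hypothetical. It prefers forms whose action ends in `/save`, applies the existing `set_eom` / `set_sc` check, and logs the discovered field names before raising. Matching on submit button text or aria-label is left out.

```python
import logging
from html.parser import HTMLParser
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


class FormCollector(HTMLParser):
    """Collects each <form>'s action and the name/value pairs of its <input>s."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: List[dict] = []
        self._current: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form":
            self._current = {"action": attr.get("action") or "", "inputs": {}}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attr.get("name"):
            self._current["inputs"][attr["name"]] = attr.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_reject_form(html: str) -> Dict[str, str]:
    """Return the input fields of the reject-all form, or log and raise."""
    collector = FormCollector()
    collector.feed(html)
    forms = collector.forms
    # Prefer forms that post to .../save; fall back to every form on the page.
    save_forms = [f for f in forms if f["action"].rstrip("/").endswith("/save")]
    for form in save_forms or forms:
        inputs = form["inputs"]
        # Same heuristic as the PR: set_eom is "true" and set_sc is absent.
        if inputs.get("set_eom") == "true" and "set_sc" not in inputs:
            return inputs
    # Log actions and field names (not values) so a regression is diagnosable.
    logger.warning(
        "No 'Reject all' form matched; discovered forms: %s",
        [(f["action"], sorted(f["inputs"])) for f in forms],
    )
    raise RuntimeError("Could not find consent 'Reject all' form in Google consent page")
```

Logging only field names keeps any per-session token values out of the logs while still showing which fields Google renamed.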
def _is_consent_page(html: str) -> bool:
    return "consent.google.com/save" in html and "Before you continue" in html
Consent detection is English-only and will silently miss non-English locales.
The "Before you continue" substring only matches when Google serves the consent wall in English. Users in other locales (e.g., French "Avant de continuer", German "Bevor Sie fortfahren", etc.) will bypass this check, _submit_consent will not run, and they’ll continue hitting the consent wall. Prefer language-independent signals — the response URL (redirect to consent.google.com) or the form action — rather than localized body text.
🔧 Suggested approach
-def _is_consent_page(html: str) -> bool:
-    return "consent.google.com/save" in html and "Before you continue" in html
+def _is_consent_page(html: str) -> bool:
+    # Language-independent: the consent wall always posts to consent.google.com/save.
+    return "consent.google.com/save" in html

Or check `res.url` / response history from the `client.get` call, since Google typically 302-redirects to consent.google.com/... before serving the wall.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@fast_flights/fetcher.py` around lines 14 - 15, The _is_consent_page function
only checks for the English phrase "Before you continue" and will miss
non-English consent walls; update _is_consent_page (and any callers) to use
language-independent signals instead — e.g., detect a redirect/response URL
containing "consent.google.com" from the client.get response history or parse
the HTML for a form/action pointing to "consent.google.com/save" (or similar
consent domains) rather than relying on localized body text; ensure the change
is applied where _is_consent_page is used so _submit_consent still runs for
non-English locales.
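A minimal sketch of the URL-based variant, assuming the caller can pass in the final response URL and the URLs of any redirects as plain strings. The function name `is_consent_response` and its parameters are hypothetical and not part of the PR. It treats a response as a consent wall if any URL in the chain is on `consent.google.com`, and otherwise falls back to the form-action substring check.

```python
from typing import Iterable
from urllib.parse import urlsplit

CONSENT_HOST = "consent.google.com"


def is_consent_response(
    final_url: str,
    history_urls: Iterable[str] = (),
    html: str = "",
) -> bool:
    """Language-independent consent-wall check (hypothetical helper)."""
    # Signal 1: the request ended on, or was redirected through, the consent host.
    for url in (final_url, *history_urls):
        if urlsplit(url).hostname == CONSENT_HOST:
            return True
    # Signal 2: the page contains a form that posts to the consent save endpoint.
    return f"{CONSENT_HOST}/save" in html
```

Neither signal depends on localized body text, so a French or German consent wall would be detected the same way as the English one.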
    if reject_form is None:
        raise RuntimeError("Could not find consent 'Reject all' form in Google consent page")

    client.post(CONSENT_SAVE_URL, data=reject_form)
Don’t silently trust the POST; check status and guard against a consent-retry loop.
client.post(...) return value is discarded, so a failed rejection (non-2xx, captcha, rate limit) goes unnoticed and the subsequent client.get(URL, ...) on line 120 may still be a consent page — whose HTML will then be returned to parse() as if it were a flights page, yielding confusing downstream errors.
Consider:
- Capturing the POST response and raising on non-success status.
- After the retry GET on line 120, re-checking `_is_consent_page(res.text)` and raising a clear error if the wall persists, rather than returning consent HTML as flight HTML.
🔧 Proposed fix
     client.post(CONSENT_SAVE_URL, data=reject_form)

     res = client.get(URL, params=params)
-    if _is_consent_page(res.text):
-        _submit_consent(client, res.text)
-        res = client.get(URL, params=params)
+    if _is_consent_page(res.text):
+        _submit_consent(client, res.text)
+        res = client.get(URL, params=params)
+        if _is_consent_page(res.text):
+            raise RuntimeError("Google consent wall persisted after submitting 'Reject all'")
     return res.text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@fast_flights/fetcher.py` at line 34, Capture and check the response from
client.post(CONSENT_SAVE_URL, data=reject_form) and raise or handle when the
status is not successful (non-2xx) so failures (captcha/rate-limit) aren’t
ignored; after the retry GET (the call that currently feeds parse()), call
_is_consent_page(res.text) again and raise a clear error (or return an explicit
failure) if the consent wall still persists instead of passing consent HTML into
parse(), ensuring you reference client.post, CONSENT_SAVE_URL, reject_form, the
retry client.get call, _is_consent_page, and parse when making these changes.
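Both checks could be combined roughly as follows. This is a sketch under assumptions, not the PR's implementation: `ConsentError`, `submit_reject_form`, and `fetch_past_consent` are hypothetical names, the client is any object whose `get` / `post` return a response with `.text` and `.status_code`, and 3xx is accepted on the POST on the assumption that the client may not follow the post-consent redirect itself.

```python
class ConsentError(RuntimeError):
    """Hypothetical error for a consent flow that did not complete."""


def submit_reject_form(client, save_url, reject_form):
    """POST the reject-all form and fail loudly on an unsuccessful status."""
    res = client.post(save_url, data=reject_form)
    # Assumption: 2xx and 3xx count as success; anything else (for example a
    # rate-limit or captcha response) is surfaced instead of being ignored.
    if not 200 <= res.status_code < 400:
        raise ConsentError(f"Consent POST failed with HTTP {res.status_code}")
    return res


def fetch_past_consent(client, url, params, save_url, is_consent_page, extract_reject_form):
    """GET the page, clear the consent wall at most once, then re-check."""
    res = client.get(url, params=params)
    if is_consent_page(res.text):
        submit_reject_form(client, save_url, extract_reject_form(res.text))
        res = client.get(url, params=params)
        # A second wall means the rejection did not stick; do not hand
        # consent HTML to the flights parser.
        if is_consent_page(res.text):
            raise ConsentError(
                "Google consent wall persisted after submitting 'Reject all'"
            )
    return res.text
```

Because the retry happens exactly once, a wall that never clears ends in an error rather than a loop.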
Google now blocks requests behind a "Before you continue" consent wall, causing the upstream parser to crash on the ErrorResponse payload it gets back. Adapted from upstream PRs AWeirdDev#108 and AWeirdDev#105.

- preset SOCS cookie on the Google domain so Google skips the consent wall
- fall back to parsing and submitting the consent "Reject all" form if the wall still appears
- raise typed FlightsError on ErrorResponse / missing ds:1 script instead of crashing with JSONDecodeError
- drop debug print(data) from parser
- bump example.py date past today so it actually runs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
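The typed-error change in this commit might look roughly like the sketch below. The `FlightsError` class here is a stand-in written for illustration (the commit's own definition is not shown), and `parse_payload` is a hypothetical helper that receives the raw ds:1 script text, or `None` when the script is missing.

```python
import json
from typing import Optional


class FlightsError(Exception):
    """Stand-in for the commit's typed error; the real definition may differ."""


def parse_payload(raw: Optional[str]) -> list:
    """Decode the ds:1 payload, raising FlightsError instead of JSONDecodeError."""
    if raw is None:
        raise FlightsError("ds:1 script not found in response")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Keep the original decode error as the cause for debugging.
        raise FlightsError("ds:1 payload is not valid JSON") from exc
```

Callers can then catch a single exception type whether the page was a consent wall, an error response, or a malformed payload.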
Summary by CodeRabbit