
Adding bypass for Google Cookie Consent if present #105

Open
pete-mcneill wants to merge 1 commit into AWeirdDev:dev from pete-mcneill:cookies_consent

Conversation

pete-mcneill commented Apr 22, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Flight data retrieval now detects Google's cookie consent page, submits the "Reject all" form automatically, and retries the request, so flight information can be fetched without manual intervention.

@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Apr 22, 2026

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

The fetcher module gains consent-wall detection and handling. New helper functions detect Google consent pages and submit the rejection form, and fetch_flights_html now detects a consent response, posts the rejection form, and then re-requests the target page.

Changes

Cohort: Consent Page Handling
File(s): fast_flights/fetcher.py
Summary: Added _is_consent_page, which detects a Google consent wall by matching strings in the response HTML, and _submit_consent, which uses Selectolax to extract the rejection form data and submit it. Updated fetch_flights_html to detect consent responses, submit the "Reject all" form to CONSENT_SAVE_URL, and then re-request the flights page. Raises RuntimeError when the consent form cannot be found.
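As a rough illustration of the "extract rejection form data" step, the sketch below collects each form's named inputs and applies the reject-all heuristic described later in the review (set_eom == "true" and no set_sc field). It uses the standard library's html.parser instead of Selectolax so it runs without extra dependencies; _FormCollector and find_reject_form are hypothetical names, not the PR's actual code.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _FormCollector(HTMLParser):
    """Collect every <form> on the page as (form attributes, {input name: value})."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: list[tuple[dict[str, str], dict[str, str]]] = []
        self._current: dict[str, str] | None = None

    def handle_starttag(self, tag, attrs):
        attr_map = {k: (v or "") for k, v in attrs}
        if tag == "form":
            self.forms.append((attr_map, {}))
            self._current = self.forms[-1][1]  # inputs dict of the open form
        elif tag == "input" and self._current is not None and "name" in attr_map:
            self._current[attr_map["name"]] = attr_map.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_reject_form(html: str) -> dict[str, str] | None:
    """Return the inputs of the 'Reject all' form, or None if no form matches.

    Heuristic (as described in the review): set_eom == "true" and no set_sc.
    """
    collector = _FormCollector()
    collector.feed(html)
    for _attrs, inputs in collector.forms:
        if inputs.get("set_eom") == "true" and "set_sc" not in inputs:
            return inputs
    return None
```

The returned dict is what would be sent as the POST body to the consent save endpoint.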

Sequence Diagram

sequenceDiagram
    participant Client
    participant GoogleConsent as Google Consent<br/>(if present)
    participant FlightsServer as Flights Server

    Client->>GoogleConsent: GET /flights page
    GoogleConsent-->>Client: HTML (consent wall)
    Client->>Client: _is_consent_page() detects consent
    Client->>Client: _submit_consent() extracts form data
    Client->>GoogleConsent: POST /consent/save (reject form)
    GoogleConsent-->>Client: Redirect/Response
    Client->>FlightsServer: GET /flights page (retry)
    FlightsServer-->>Client: HTML (flights data)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A clever bypass for walls of consent,
Where forms are rejected ere flight is sent,
With parsing swift and forms now tamed,
The fetcher flows—no more constrained!
✨🛫

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: adding functionality to bypass Google Cookie Consent pages, which is the core purpose of the added consent detection and form submission logic.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

Warning

⚠️ This pull request might be slop. It has been flagged by CodeRabbit slop detection and should be reviewed carefully.

@dosubot dosubot Bot added the enhancement New feature or request label Apr 22, 2026
@pete-mcneill pete-mcneill mentioned this pull request Apr 22, 2026

coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
fast_flights/fetcher.py (1)

Lines 24-29: Fragile reject-all heuristic; add a fallback or an assertion.

Identifying the reject-all form by set_eom == "true" and absence of set_sc relies on undocumented Google field names that can change without notice. Since this is the single point of failure for the whole consent flow, consider:

  • Additionally preferring a form whose action is /save and whose submit button text / aria-label matches something recognizable, OR
  • Logging the discovered form fields when falling through so regressions are diagnosable, rather than just raising a bare RuntimeError.
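A minimal sketch of the suggested fallback, assuming the same Google field names: it keeps the original set_eom/set_sc heuristic, then falls back to a form whose action ends in /save and whose button text or aria-label contains a keyword, and logs the forms it saw before raising. select_reject_form, REJECT_LABELS, and the stdlib-based parser are hypothetical stand-ins for the PR's Selectolax code. Because button labels are localized, the keyword match only helps for some locales and should stay a fallback.

```python
from __future__ import annotations

import logging
from html.parser import HTMLParser

logger = logging.getLogger(__name__)

# Hypothetical English keyword hints; labels are localized, so this is only a fallback.
REJECT_LABELS = ("reject", "decline")


class _ConsentForms(HTMLParser):
    """Collect each form's action, named inputs, and button labels."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: list[dict] = []
        self._form: dict | None = None
        self._in_button = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "form":
            self._form = {"action": a.get("action", ""), "inputs": {}, "labels": []}
            self.forms.append(self._form)
        elif self._form is None:
            return  # ignore inputs/buttons outside any form
        elif tag == "input" and "name" in a:
            self._form["inputs"][a["name"]] = a.get("value", "")
        elif tag == "button":
            self._in_button = True
            if a.get("aria-label"):
                self._form["labels"].append(a["aria-label"])

    def handle_data(self, data):
        if self._in_button and self._form is not None and data.strip():
            self._form["labels"].append(data.strip())

    def handle_endtag(self, tag):
        if tag == "button":
            self._in_button = False
        elif tag == "form":
            self._form = None


def select_reject_form(html: str) -> dict[str, str]:
    parser = _ConsentForms()
    parser.feed(html)

    def looks_like_reject(form: dict) -> bool:
        return any(w in label.lower() for label in form["labels"] for w in REJECT_LABELS)

    # 1. Original heuristic from the PR.
    for form in parser.forms:
        if form["inputs"].get("set_eom") == "true" and "set_sc" not in form["inputs"]:
            return form["inputs"]
    # 2. Fallback: a form posting to .../save whose button reads like "Reject".
    for form in parser.forms:
        if form["action"].endswith("/save") and looks_like_reject(form):
            return form["inputs"]
    # 3. Nothing matched: log what was found so the regression is diagnosable.
    logger.error(
        "No 'Reject all' form found; forms seen: %r",
        [(f["action"], sorted(f["inputs"]), f["labels"]) for f in parser.forms],
    )
    raise RuntimeError("Could not find consent 'Reject all' form in Google consent page")
```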
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fast_flights/fetcher.py` around lines 24 - 29, The reject-all form detection
in the loop over parser.css("form") is fragile because it only checks
inputs.get("set_eom") == "true" and absence of "set_sc"; update the logic in the
same loop that builds inputs (and sets reject_form) to prefer forms whose action
attribute equals "/save" or whose submit button text/aria-label contains
recognizable labels (e.g., "Reject", "Decline"), and if no form matches, log the
discovered forms/inputs (the inputs dicts) before raising RuntimeError so
regressions are diagnosable; keep the existing reject_form variable and the
break behavior but add the fallback selection and structured logging of fields
when falling through.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ec9bbe1-0e96-4541-9708-64231498c6d6

📥 Commits

Reviewing files that changed from the base of the PR, between commits 0138641 and fa038f2.

📒 Files selected for processing (1)
  • fast_flights/fetcher.py

Comment thread fast_flights/fetcher.py
Comment on lines +14 to +15
def _is_consent_page(html: str) -> bool:
    return "consent.google.com/save" in html and "Before you continue" in html

⚠️ Potential issue | 🟠 Major

Consent detection is English-only and will silently miss non-English locales.

The "Before you continue" substring only matches when Google serves the consent wall in English. Users in other locales (e.g., French "Avant de continuer", German "Bevor Sie fortfahren", etc.) will bypass this check, _submit_consent will not run, and they’ll continue hitting the consent wall. Prefer language-independent signals — the response URL (redirect to consent.google.com) or the form action — rather than localized body text.

🔧 Suggested approach
-def _is_consent_page(html: str) -> bool:
-    return "consent.google.com/save" in html and "Before you continue" in html
+def _is_consent_page(html: str) -> bool:
+    # Language-independent: the consent wall always posts to consent.google.com/save.
+    return "consent.google.com/save" in html

Or check res.url / response history from the client.get call, since Google typically 302-redirects to consent.google.com/... before serving the wall.
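One way to combine both language-independent signals is sketched below, assuming the caller can pass the final response URL and any redirect URLs (how to read these depends on the HTTP client in use). The url and history_urls parameters are hypothetical additions to _is_consent_page's signature.

```python
from __future__ import annotations

from urllib.parse import urlparse

CONSENT_HOST = "consent.google.com"


def is_consent_page(html: str, url: str | None = None, history_urls: tuple[str, ...] = ()) -> bool:
    """Detect the consent wall without relying on localized body text."""
    # A redirect to (or final URL on) consent.google.com is the strongest signal.
    for candidate in (url, *history_urls):
        if candidate and urlparse(candidate).hostname == CONSENT_HOST:
            return True
    # Otherwise fall back to the form action, which is the same in every language.
    return "consent.google.com/save" in html
```

Comparing the parsed hostname, rather than searching the URL string, avoids matching consent.google.com when it only appears in a query parameter such as continue=.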

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fast_flights/fetcher.py` around lines 14 - 15, The _is_consent_page function
only checks for the English phrase "Before you continue" and will miss
non-English consent walls; update _is_consent_page (and any callers) to use
language-independent signals instead — e.g., detect a redirect/response URL
containing "consent.google.com" from the client.get response history or parse
the HTML for a form/action pointing to "consent.google.com/save" (or similar
consent domains) rather than relying on localized body text; ensure the change
is applied where _is_consent_page is used so _submit_consent still runs for
non-English locales.

Comment thread fast_flights/fetcher.py
    if reject_form is None:
        raise RuntimeError("Could not find consent 'Reject all' form in Google consent page")

    client.post(CONSENT_SAVE_URL, data=reject_form)

⚠️ Potential issue | 🟡 Minor

Don’t silently trust the POST; check status and guard against a consent-retry loop.

The return value of client.post(...) is discarded, so a failed rejection (non-2xx status, captcha, rate limit) goes unnoticed. The subsequent client.get(URL, ...) on line 120 may then still return a consent page, whose HTML is passed to parse() as if it were a flights page, producing confusing downstream errors.

Consider:

  • Capturing the POST response and raising on non-success status.
  • After the retry GET on line 120, re-checking _is_consent_page(res.text) and raising a clear error if the wall persists, rather than returning consent HTML as flight HTML.
🔧 Proposed fix
-    client.post(CONSENT_SAVE_URL, data=reject_form)
+    resp = client.post(CONSENT_SAVE_URL, data=reject_form)
+    if resp.status_code >= 400:
+        raise RuntimeError(f"Consent 'Reject all' POST failed with HTTP {resp.status_code}")
 ...
         res = client.get(URL, params=params)
         if _is_consent_page(res.text):
             _submit_consent(client, res.text)
             res = client.get(URL, params=params)
+            if _is_consent_page(res.text):
+                raise RuntimeError("Google consent wall persisted after submitting 'Reject all'")
         return res.text
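The same flow with both guards can be sketched as a standalone function. It takes the client and helpers as parameters so it can be exercised with a fake client; fetch_past_consent and ConsentError are hypothetical names, and the code only assumes that client.get and client.post return objects with status_code and text attributes. It treats any status below 400 as success on the assumption that, depending on the client's redirect settings, the save endpoint may answer with a redirect; tighten this to 2xx if redirects are always followed.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping


class ConsentError(RuntimeError):
    """Hypothetical error for a consent flow that could not be completed."""


def fetch_past_consent(
    client: Any,
    url: str,
    params: Mapping[str, str],
    consent_save_url: str,
    is_consent_page: Callable[[str], bool],
    find_reject_form: Callable[[str], Mapping[str, str]],
) -> str:
    res = client.get(url, params=params)
    if not is_consent_page(res.text):
        return res.text

    # Submit "Reject all" and check the result instead of discarding it.
    post = client.post(consent_save_url, data=find_reject_form(res.text))
    if not 200 <= post.status_code < 400:
        raise ConsentError(f"Consent POST failed with HTTP {post.status_code}")

    # Retry once; never hand consent HTML to the flights parser.
    res = client.get(url, params=params)
    if is_consent_page(res.text):
        raise ConsentError("Google consent wall persisted after submitting 'Reject all'")
    return res.text
```

Retrying exactly once keeps the function from looping if Google keeps serving the wall, for example after a captcha.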
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@fast_flights/fetcher.py` at line 34, Capture and check the response from
client.post(CONSENT_SAVE_URL, data=reject_form) and raise or handle when the
status is not successful (non-2xx) so failures (captcha/rate-limit) aren’t
ignored; after the retry GET (the call that currently feeds parse()), call
_is_consent_page(res.text) again and raise a clear error (or return an explicit
failure) if the consent wall still persists instead of passing consent HTML into
parse(), ensuring you reference client.post, CONSENT_SAVE_URL, reject_form, the
retry client.get call, _is_consent_page, and parse when making these changes.

thoughtpunch added a commit to thoughtpunch/flights that referenced this pull request May 25, 2026
Google now blocks requests behind a "Before you continue" consent wall,
causing the upstream parser to crash on the ErrorResponse payload it
gets back. Adapted from upstream PRs AWeirdDev#108 and AWeirdDev#105.

- preset SOCS cookie on the Google domain so Google skips the consent wall
- fall back to parsing and submitting the consent "Reject all" form if the
  wall still appears
- raise typed FlightsError on ErrorResponse / missing ds:1 script instead
  of crashing with JSONDecodeError
- drop debug print(data) from parser
- bump example.py date past today so it actually runs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
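The commit's "raise typed FlightsError instead of crashing with JSONDecodeError" change can be illustrated with a small sketch. Locating the ds:1 script is left to the caller, which passes its raw text (or None if it was not found); load_flights_payload is a hypothetical helper, and this FlightsError definition is an assumption, not the fork's actual class.

```python
from __future__ import annotations

import json


class FlightsError(Exception):
    """Raised when Google returns something other than flight data."""


def load_flights_payload(raw: str | None) -> object:
    """Decode the embedded JSON payload, or raise FlightsError with context."""
    if raw is None:
        # e.g. a consent wall or error page was served instead of results
        raise FlightsError("ds:1 data script not found in response")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Chain the original error so the decode position stays available.
        raise FlightsError(f"Could not decode flights payload: {exc.msg}") from exc
```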

Labels

enhancement New feature or request size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

1 participant