fix: use .get() for optional fields in User.__init__ and Client.request #418

Open
PawiX25 wants to merge 1 commit into d60:main from PawiX25:fix/keyerror-optional-fields

Conversation

@PawiX25 PawiX25 commented Apr 16, 2026

Fixes #417

The Twitter/X API does not always return every expected field in user legacy data or error response objects, which causes KeyError crashes.

Changes

twikit/user.py

  • Replace ~30 direct dict accesses (legacy['key']) in User.__init__ with .get('key', default) using appropriate type defaults:
    • str fields → ''
    • int fields → 0
    • bool fields → False
    • list fields → []
  • Handle nested entities.description.urls with chained .get() calls
  • Fields already using .get() (profile_banner_url, url, protected) left unchanged
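
The defaulting scheme listed above can be sketched as follows. This is a minimal illustration, not the code in twikit/user.py: the `parse_legacy` helper and the sample payload are hypothetical, and only a few of the roughly 30 fields are shown.

```python
from typing import Any


def parse_legacy(legacy: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: read a few legacy fields with type-appropriate defaults."""
    return {
        'name': legacy.get('name', ''),                                    # str  -> ''
        'followers_count': legacy.get('followers_count', 0),               # int  -> 0
        'verified': legacy.get('verified', False),                         # bool -> False
        'withheld_in_countries': legacy.get('withheld_in_countries', []),  # list -> []
    }


# Invented payload that omits every field except the name.
partial = parse_legacy({'name': 'example'})
```

With direct indexing (`legacy['followers_count']`) the same payload would raise KeyError; with `.get()` the missing fields fall back to their defaults.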

twikit/client/client.py

  • Change response_data['errors'][0]['code'] to .get('code') on line 158 to handle error responses without a code field
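
A minimal sketch of the effect of this one-line change, assuming an error payload shaped like the one the PR describes. The `extract_error_code` helper and the sample codes and messages are invented for illustration; they are not taken from twikit or from the API.

```python
from typing import Any, Optional


def extract_error_code(response_data: dict[str, Any]) -> Optional[int]:
    """Hypothetical helper mirroring the change: tolerate a missing 'code' field."""
    first_error = response_data['errors'][0]
    return first_error.get('code')  # None when the field is absent


# Invented sample payloads.
with_code = extract_error_code({'errors': [{'code': 123, 'message': 'example error'}]})
without_code = extract_error_code({'errors': [{'message': 'example error'}]})
```

Note that this still indexes `errors[0]` directly, so an empty `errors` list is not covered by this change.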

Testing

  • Verified fix resolves KeyError crashes observed in production (KeyError: 'urls', KeyError: 'withheld_in_countries', KeyError: 'code')
  • No behavioral changes for responses that include all fields — .get() returns the same value when the key is present

Summary by Sourcery

Make user and error parsing resilient to missing fields in Twitter API responses.

Bug Fixes:

  • Prevent KeyError crashes in User initialization when optional legacy user fields or nested entity URLs are absent.
  • Avoid KeyError when handling error responses that omit the error code field.

Enhancements:

  • Default missing user attributes to sensible empty values (empty strings, zero counts, empty lists, or False) when not provided by the API.

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened error handling for API responses where error code information may be missing, preventing potential failures.
    • Improved robustness of user profile data parsing by safely handling missing or incomplete fields such as profile attributes, verification status, and follower metrics, allowing the application to function gracefully.

Twitter/X API does not always return all expected fields in user legacy
data or error response objects, causing KeyError crashes.

Changes:
- twikit/user.py: Replace ~30 direct dict accesses in User.__init__
  with .get() and appropriate type defaults (str='', int=0, bool=False,
  list=[])
- twikit/client/client.py: Change response_data['errors'][0]['code']
  to .get('code') to handle error responses without a code field

Fixes d60#417

sourcery-ai Bot commented Apr 16, 2026

Reviewer's Guide

Makes User model initialization and client error handling resilient to missing fields in Twitter/X API responses by replacing direct dict indexing with safe .get() calls and appropriate defaults.

Class diagram for updated User model initialization

classDiagram
    class User {
        +Client _client
        +str id
        +str created_at
        +str name
        +str screen_name
        +str profile_image_url
        +str profile_banner_url
        +str url
        +str location
        +str description
        +list description_urls
        +list urls
        +list pinned_tweet_ids
        +bool is_blue_verified
        +bool verified
        +bool possibly_sensitive
        +bool can_dm
        +bool can_media_tag
        +bool want_retweets
        +bool default_profile
        +bool default_profile_image
        +bool has_custom_timelines
        +int followers_count
        +int fast_followers_count
        +int normal_followers_count
        +int following_count
        +int favourites_count
        +int listed_count
        +int media_count
        +int statuses_count
        +bool is_translator
        +str translator_type
        +list withheld_in_countries
        +bool protected
        +User(Client client, dict data)
    }

    class Client {
    }

    Client <.. User : uses

File-Level Changes

Change: Harden User.__init__ against missing fields in legacy user payloads.
Files: twikit/user.py
Details:
  • Use data.get('rest_id', '') instead of direct indexing for user id.
  • Use data.get('legacy', {}) and replace all legacy[...] field accesses with legacy.get(..., default) using type-appropriate defaults for strings, ints, bools, and lists.
  • Guard nested entities.description and entities.url lookups with chained .get() calls and default empty containers to avoid KeyError on missing nested objects.
  • Preserve existing .get() usage for fields that were already optional (profile_banner_url, url, protected).

Change: Relax client error parsing to tolerate error entries without a code field.
Files: twikit/client/client.py
Details:
  • Change response_data['errors'][0]['code'] to response_data['errors'][0].get('code') while keeping existing handling for the message field.
  • Ensure request() no longer raises KeyError when error responses omit the code attribute.

Assessment against linked issues

| Issue | Objective | Addressed | Explanation |
|---|---|---|---|
| #417 | Update User.__init__ to handle missing or partial Twitter user legacy data without raising KeyError by using .get() (with sensible defaults) for all optional fields, including nested entities like entities.description.urls. | | |
| #417 | Update Client.request error handling so that missing 'code' fields in response_data['errors'][0] do not raise KeyError, while still correctly processing error responses when 'code' is present. | | |



coderabbitai Bot commented Apr 16, 2026

Walkthrough

The pull request adds defensive programming measures to prevent KeyError exceptions when the Twitter API returns incomplete response payloads. Direct dictionary indexing is replaced with .get() calls using sensible defaults in two locations: error handling in Client.request and field extraction in User.__init__.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Error Handling Resilience: twikit/client/client.py | Changed error code extraction from direct indexing to .get('code') with implicit None default, allowing graceful handling when error objects lack a code field. |
| User Initialization Resilience: twikit/user.py | Converted ~30 direct dictionary accesses (data[...], legacy[...]) to safe .get() calls with defaults (empty strings, empty lists, False, 0, etc.), preventing KeyError when Twitter API omits expected fields like profile metadata and follower counts. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A rabbit hops through API calls with glee,
No more crashes when fields disagree!
With .get() and defaults, so safe and so kind,
Missing data will never make twikit unwind. 🌿✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly summarizes the main change: replacing direct dictionary access with .get() for optional fields in two specific modules (User.__init__ and Client.request). |
| Linked Issues check | ✅ Passed | The pull request fully addresses all coding requirements from issue #417: (1) replacing ~30 direct dict accesses in User.__init__ with .get() calls using appropriate defaults, (2) handling nested entities.description.urls safely, and (3) changing Client.request to use .get('code') for error handling. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the stated objectives: modifications to User.__init__ and Client.request to use .get() for optional fields, with no unrelated alterations or refactoring. |


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • Using data.get('rest_id', '') and defaulting many fields to ''/0/False may silently hide malformed responses; consider making these attributes Optional[...] and defaulting to None (or raising) for fields that are expected to be present in normal operation.
  • In Client.request, you guard for 'errors' in response_data but still assume response_data['errors'][0] exists; consider handling an empty errors array or non-list value to avoid a potential IndexError/TypeError.
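
The first point above, distinguishing "missing" from "empty", can be sketched like this. `UserSketch` is a hypothetical stand-in and not twikit's User class; treating `rest_id` as required and the other fields as `Optional` is an assumption made for illustration.

```python
from typing import Any, Optional


class UserSketch:
    """Hypothetical stand-in for a user model; not twikit's User class."""

    def __init__(self, data: dict[str, Any]) -> None:
        # Required identifier: fail loudly instead of silently storing ''.
        rest_id = data.get('rest_id')
        if rest_id is None:
            raise ValueError("user payload is missing 'rest_id'")
        self.id: str = rest_id

        legacy = data.get('legacy') or {}
        # Optional fields: None means "the API did not send this value".
        self.location: Optional[str] = legacy.get('location')
        self.followers_count: Optional[int] = legacy.get('followers_count')


user = UserSketch({'rest_id': '1', 'legacy': {'followers_count': 0}})
```

Here a real follower count of 0 stays distinguishable from an absent field, at the cost of callers having to handle `None`.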

Comment thread twikit/user.py
Comment on lines +102 to +104
self.description_urls: list = legacy.get('entities', {}).get('description', {}).get('urls', [])
self.urls: list = legacy.get('entities', {}).get('url', {}).get('urls')
self.pinned_tweet_ids: list[str] = legacy.get('pinned_tweet_ids_str', [])

suggestion: Align urls default with description_urls to consistently return a list.

description_urls always returns a list (defaulting to []), but urls may be None when entities/url is missing. This forces consumers to handle both None and list for similar fields. To keep the API consistent and predictable, consider defaulting urls to an empty list as well (e.g. ...get('urls', [])).

Suggested change
         self.description_urls: list = legacy.get('entities', {}).get('description', {}).get('urls', [])
-        self.urls: list = legacy.get('entities', {}).get('url', {}).get('urls')
+        self.urls: list = legacy.get('entities', {}).get('url', {}).get('urls', [])
         self.pinned_tweet_ids: list[str] = legacy.get('pinned_tweet_ids_str', [])
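
A short sketch of why the default matters to consumers, using an invented `entities` dict with no 'url' key. The `expanded_url` field name is an assumption about the shape of URL entities and is only used for illustration.

```python
entities: dict = {}  # invented payload: the 'url' key is missing

urls_without_default = entities.get('url', {}).get('urls')      # no default -> None
urls_with_default = entities.get('url', {}).get('urls', [])     # default -> []

# Iterating over None would raise TypeError; iterating over [] is a no-op.
expanded = [u.get('expanded_url') for u in urls_with_default]
```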


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@twikit/user.py`:
- Around line 102-103: The code uses chained dict.get calls on legacy which can
be None (e.g., legacy.get('entities', {}) returns None if entities exists but is
None), causing AttributeError; change accesses in the User initializer to
defensively coalesce with or {} (e.g., use (legacy.get('entities') or {}) then
.get('description') or {} and .get('url') or {}) and ensure both
self.description_urls and self.urls default to empty lists ([]) not None to keep
types stable; also consider hardening the earlier legacy extraction in
build_user_data by using (data.get('legacy') or {}) so legacy is never None.

📥 Commits

Reviewing files that changed from the base of the PR and between c3b7220 and 32f58db.

📒 Files selected for processing (2)
  • twikit/client/client.py
  • twikit/user.py

Comment thread twikit/user.py
Comment on lines +102 to +103
self.description_urls: list = legacy.get('entities', {}).get('description', {}).get('urls', [])
self.urls: list = legacy.get('entities', {}).get('url', {}).get('urls')

⚠️ Potential issue | 🟠 Major

Chained .get(..., {}) still breaks when intermediate values are explicitly None.

dict.get(key, default) only returns default when the key is missing; it returns None if the key is present with a None value. Looking at build_user_data in twikit/utils.py (used by follow_user/block_user/mute_user/etc.), legacy['entities'] is set via raw_data.get('entities'), so it can be None. In that case:

legacy.get('entities', {})          # -> None
    .get('description', {})         # -> AttributeError: 'NoneType' object has no attribute 'get'

This re-introduces the same class of crash the PR is trying to eliminate. Same concern applies to entities.description and entities.url being None.

Also, there is an inconsistency on Line 103: self.urls defaults to None while the docstring/type annotation says list, and self.description_urls on Line 102 defaults to []. Prefer [] for both to keep the attribute type stable for downstream consumers.

🛡️ Proposed fix
-        self.description_urls: list = legacy.get('entities', {}).get('description', {}).get('urls', [])
-        self.urls: list = legacy.get('entities', {}).get('url', {}).get('urls')
+        entities = legacy.get('entities') or {}
+        description_entities = entities.get('description') or {}
+        url_entities = entities.get('url') or {}
+        self.description_urls: list = description_entities.get('urls', [])
+        self.urls: list = url_entities.get('urls', [])

The same or {} pattern also covers the data.get('legacy', {}) on Line 91 being None if you want to harden that further:

-        legacy = data.get('legacy', {})
+        legacy = data.get('legacy') or {}
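
One way to generalize the `or {}` pattern is a small nested-lookup helper. This is a sketch under stated assumptions: `dig` is a hypothetical function that does not exist in twikit, and the payloads are invented.

```python
from typing import Any


def dig(mapping: Any, *keys: str, default: Any = None) -> Any:
    """Hypothetical helper: walk nested dicts, returning `default` when any
    level is missing, is None, or is not a dict."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


# Key present but explicitly None: chained .get(..., {}) would raise AttributeError here.
none_entities = dig({'entities': None}, 'entities', 'description', 'urls', default=[])

# Fully populated path returns the stored value unchanged.
populated = dig(
    {'entities': {'description': {'urls': [{'url': 'x'}]}}},
    'entities', 'description', 'urls', default=[],
)
```

The same helper would apply to the `entities.url` lookup, which keeps both attributes list-typed.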

@farzmain

thanks

ctrl-alt-raccoon pushed a commit to ctrl-alt-raccoon/twikit that referenced this pull request Apr 18, 2026
- PATCHES.md documents the cherry-picks that diverge this branch from
  upstream d60/twikit:main. Anyone landing here can see at a glance
  which commits are ours and why each exists.
- .github/workflows/drift-check.yml runs weekly, checks whether any of
  the upstream PRs listed in PATCHES.md have been merged, and opens an
  issue here when one has so we know to retire the cherry-pick.

Cherry-picks currently carried (see PATCHES.md for detail):
  - d60#419 (7 commits) — SearchTimeline queryId refresh +
    defensive parsing + 429 recursion guard + ondemand.s extractor.
  - d60#418 (1 commit)  — .get() for optional fields in
    User.__init__ and Client.request.
@PriyanshAroraa

Landed here from #421 (now closed as a duplicate). One small thing worth folding into this PR: the .get('code') fix still assumes errors[0] exists. In practice I've seen X return {"errors": []} during partial outages, which raises IndexError before .get() ever runs. Something like:

errors = response_data.get('errors') if isinstance(response_data, dict) else None
if errors and isinstance(errors, list):
    first_error = errors[0] if isinstance(errors[0], dict) else {}
    error_code = first_error.get('code')
    error_message = first_error.get('message')

This also handles malformed error cases (where errors[0] is a string or other non-dict), which I hit once during a weird gateway issue. Totally your call whether to include it here or leave it for a follow-up.
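
The snippet in this comment can be wrapped into a self-contained function so that both values are always defined. This is a sketch only: `first_error_fields` is a hypothetical name, and returning `(None, None)` for unusable payloads is an assumption rather than twikit's behavior.

```python
from typing import Any, Optional


def first_error_fields(response_data: Any) -> tuple[Optional[int], Optional[str]]:
    """Hypothetical helper: return (code, message) of the first error entry,
    or (None, None) when there is no usable entry."""
    errors = response_data.get('errors') if isinstance(response_data, dict) else None
    if not isinstance(errors, list) or not errors:
        return None, None
    # A non-dict first entry (for example a bare string) is treated as empty.
    first_error = errors[0] if isinstance(errors[0], dict) else {}
    return first_error.get('code'), first_error.get('message')


empty_errors = first_error_fields({'errors': []})
string_entry = first_error_fields({'errors': ['boom']})
normal_entry = first_error_fields({'errors': [{'code': 1, 'message': 'm'}]})
```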

@PawiX25

PawiX25 commented Apr 18, 2026

Stop copy-pasting AI suggestions you don't understand. You opened a duplicate PR, closed it, and now you're trying to pad your contribution by pushing bot-generated edge cases onto my fix. This PR addresses real crashes — if you have a real bug to fix, open your own PR and own it.

@C0ldSmi1e

twikit/guest/user.py has the same pattern and will break with identical KeyErrors on the guest client path.


Development

Successfully merging this pull request may close these issues.

KeyError in User.__init__ and Client.request when Twitter API omits expected fields

4 participants