Description
Patreon recently updated their API to represent post contents as JSON (the content_json_string field) instead of HTML (the content field, which is now gone). This broke additional URL extraction via PluginManager.ExtractSupportedUrls, because it expects HTML data.
To fix this, PluginManager and the plugins will likely need an additional enum input parameter that tells a plugin what kind of input data it is given (HTML or Text), and PatreonPageCrawler will likely need to pre-process the JSON into raw text.
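The pre-processing idea could look roughly like the sketch below. It is written in Python purely for illustration and is not the project's code. `InputKind` and `json_to_raw_text` are hypothetical names, and because the exact schema of content_json_string is not described here, the sketch makes no assumption about it: it simply collects every string value found anywhere in the JSON, so that URLs survive regardless of which key holds them.

```python
import json
from enum import Enum


class InputKind(Enum):
    """Hypothetical enum telling a plugin what kind of input it receives."""
    HTML = "html"
    TEXT = "text"


def json_to_raw_text(content_json_string: str) -> str:
    """Flatten a JSON document into raw text.

    Assumption: the schema is unknown, so every string value is kept,
    in document order, one per line. Keys themselves are not emitted.
    """
    def walk(node):
        if isinstance(node, str):
            yield node
        elif isinstance(node, dict):
            for value in node.values():
                yield from walk(value)
        elif isinstance(node, list):
            for item in node:
                yield from walk(item)
        # Numbers, booleans and null carry no URL text and are skipped.

    return "\n".join(walk(json.loads(content_json_string)))
```

The resulting text would then be passed to the plugins together with `InputKind.TEXT`, so that a plugin can skip HTML parsing and fall back to plain pattern matching. The output also contains non-content strings such as node type names, which is harmless for URL extraction but would matter if the text were used for anything else.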
Update:
A quick glance at the JSON format suggests that all links are provided as "type":"link" entries, so there might be a better way of handling this: find all of those entries in the JSON and create a new subentry of type PatreonCrawledUrlType.ExternalUrl for each of them. That would make the previous call to PluginManager.ExtractSupportedUrls completely obsolete.
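A minimal sketch of that approach follows, again in Python for illustration only. The only fact taken from the JSON is that link entries carry "type":"link". Where the URL itself lives is an assumption: the sketch looks for an `href` key, either inside an `attrs` object or directly on the entry, and that would need to be checked against real API responses. `find_link_urls` and `to_external_subentries` are hypothetical names.

```python
import json


def find_link_urls(content_json_string: str, url_key: str = "href") -> list:
    """Return the URLs of all objects whose "type" is "link", without duplicates.

    Assumption: the URL is stored under `url_key`, either in a nested
    "attrs" object or directly on the link object.
    """
    urls = []

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "link":
                attrs = node.get("attrs")
                url = attrs.get(url_key) if isinstance(attrs, dict) else None
                if url is None:
                    url = node.get(url_key)
                if isinstance(url, str) and url not in urls:
                    urls.append(url)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(content_json_string))
    return urls


def to_external_subentries(urls: list) -> list:
    """Hypothetical stand-in for creating one ExternalUrl subentry per link."""
    return [{"url_type": "ExternalUrl", "url": url} for url in urls]
```

Walking the whole tree instead of a fixed path keeps the sketch working whether link entries appear as top-level nodes or nested inside other entries.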
Please note that contributions created with the help of AI agents are not accepted.