RSS "pubDate", DST timezone identifiers not handled #237

@ghost

Description

DST timezone identifiers are not currently being handled properly in RFC 822 formatted dates.

Take the CBC RSS feeds as an example: Mon, 21 Apr 2025 06:00:00 EDT is wrongly parsed as UTC (06:00 UTC instead of 10:00 UTC), so items end up timestamped 4 hours off from their actual publication time.

RFC 822 does allow the use of defined offsets as timezone identifiers:

     zone        =  "UT"  / "GMT"                ; Universal Time
                                                 ; North American : UT
                 /  "EST" / "EDT"                ;  Eastern:  - 5/ - 4
                 /  "CST" / "CDT"                ;  Central:  - 6/ - 5
                 /  "MST" / "MDT"                ;  Mountain: - 7/ - 6
                 /  "PST" / "PDT"                ;  Pacific:  - 8/ - 7
                 /  1ALPHA                       ; Military: Z = UT;
                                                 ;  A:-1; (J not used)
                                                 ;  M:-12; N:+1; Y:+12
                 / ( ("+" / "-") 4DIGIT )        ; Local differential
                                                 ;  hours+min. (HHMM)

Now, one could argue that using RFC 822 in 2025 is a bit stupid, and I'd agree, but I don't work at CBC unfortunately. The issue has been reported to them and their reply was that their RSS feeds are not a priority.

Would you be open to adding a fix to dateparser.go to handle this edge case? I can contribute a PR.
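A rough sketch of what the fix could look like, based on the zone table quoted above: replace a trailing named zone with its numeric offset, then parse with a numeric-offset layout. The names rfc822Zones and parseRFC822Date are hypothetical and not taken from dateparser.go, and the sketch handles only one date layout. Military single-letter zones are left out on purpose.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// rfc822Zones maps the named zones from the RFC 822 grammar to numeric
// offsets. Hypothetical table for illustration; military zones are omitted.
var rfc822Zones = map[string]string{
	"UT": "+0000", "GMT": "+0000",
	"EST": "-0500", "EDT": "-0400",
	"CST": "-0600", "CDT": "-0500",
	"MST": "-0700", "MDT": "-0600",
	"PST": "-0800", "PDT": "-0700",
}

// parseRFC822Date rewrites a trailing named zone to its numeric offset and
// parses the result. Values that already carry a numeric offset pass through
// unchanged.
func parseRFC822Date(value string) (time.Time, error) {
	value = strings.TrimSpace(value)
	idx := strings.LastIndexByte(value, ' ')
	if idx < 0 {
		return time.Time{}, fmt.Errorf("no zone field in %q", value)
	}
	if offset, ok := rfc822Zones[value[idx+1:]]; ok {
		value = value[:idx+1] + offset
	}
	// time.RFC1123Z is "Mon, 02 Jan 2006 15:04:05 -0700".
	return time.Parse(time.RFC1123Z, value)
}
```

With this, the CBC example resolves to 10:00 UTC rather than 06:00 UTC. Doing the substitution before parsing avoids depending on the host's time zone database, since the offsets come from the RFC table itself.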
