Parse feeds in Python
"Confident" means "metadata of the document explicitly indicates that the encoding is UTF-8". ## Background of the patch When a UTF-8 feed has a few invalid characters but the rest is fine, feedparser will only parse it as `iso-8859-2` (or other encodings detected by `chardet`, if installed), even if both the HTTP and XML headers explicitly indicate that its encoding is `utf-8`. To handle it better, we should decode the feed as UTF-8 with `errors='replace'`. - I met the problem at https://github.com/Rongronggg9/RSS-to-Telegram-Bot/issues/391 - Feed URL: http://iptvin.ru/component/jcomments/?task=rss&object_id=1000707&object_group=com_content&tmpl=component - Snapshot of the feed: [iptvin.xml.gz](https://github.com/kurtmckee/feedparser/files/13762749/iptvin.xml.gz) - Snapshot of HTTP headers: ``` Date: Sun, 24 Dec 2023 16:23:48 GMT Server: Apache/2.0.59 (Win32) PHP/5.1.6 X-Powered-By: PHP/5.1.6 Cache-Control: no-store, no-cache, must-revalidate Expires: Sun, 24 Dec 2023 16:38:48 GMT Set-Cookie: REDACTED P3P: REDACTED Access-Control-Allow-Origin: * Transfer-Encoding: chunked Content-Type: application/rss+xml; charset=utf-8 ```
The issue was opened by Rongronggg9 and has received 2 comments.