Parse feeds in Python
Hi! First off, huge thanks for maintaining feedparser. It's legendary! We're all lucky to have it.

I hit a new (to me) `AssertionError` today when parsing the RSS at https://snrk.de/feed/ . Here's the relevant RSS snippet:

```xml
<content:encoded><![CDATA[
...
<p><strong>If you don’t like that, don’t use snrk.de!</strong><![dsgvo_service_control]></p>
...
]]></content:encoded>
```

...and here's the assert:

```
>>> feedparser.parse(rss)
Traceback (most recent call last):
  File ".../site-packages/feedparser/api.py", line 263, in parse
    saxparser.parse(source)
  File ".../python3.11/xml/sax/expatreader.py", line 111, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File ".../python3.11/xml/sax/xmlreader.py", line 125, in parse
    self.feed(buffer)
  File ".../python3.11/xml/sax/expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
  File "/private/tmp/pythonA3.11-20240402-4978-3ygh5v/Python-3.11.9/Modules/pyexpat.c", line 477, in EndElement
  File ".../python3.11/xml/sax/expatreader.py", line 395, in end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File ".../site-packages/feedparser/parsers/strict.py", line 124, in endElementNS
    self.unknown_endtag(localname)
  File ".../site-packages/feedparser/mixin.py", line 321, in unknown_endtag
    method()
  File ".../site-packages/feedparser/namespaces/_base.py", line 488, in _end_content
    value = self.pop_content('content')
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/feedparser/mixin.py", line 629, in pop_content
    value = self.pop(tag)
            ^^^^^^^^^^^^^
  File ".../site-packages/feedparser/mixin.py", line 548, in pop
    output = _sanitize_html(output, self.encoding, self.contentparams.get('type', 'text/html'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/feedparser/sanitizer.py", line 883, in _sanitize_html
    p.feed(html_source)
  File ".../site-packages/feedparser/html.py", line 156, in feed
    super(_BaseHTMLProcessor, self).feed(data)
  File ".../site-packages/sgmllib.py", line 98, in feed
    self.goahead(0)
  File ".../site-packages/sgmllib.py", line 168, in goahead
    k = self.parse_declaration(i)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../site-packages/feedparser/html.py", line 351, in parse_declaration
    return sgmllib.SGMLParser.parse_declaration(self, i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.11/_markupbase.py", line 91, in parse_declaration
    return self.parse_marked_section(i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.11/_markupbase.py", line 154, in parse_marked_section
    raise AssertionError(
AssertionError: unknown status keyword 'dsgvo_service_control' in marked section
```

Is this expected? Should I catch `AssertionError` everywhere I use feedparser? Any other thoughts?

feedparser 6.0.11, Python 3.11.9. Maybe related to #378...but not exactly the same.

Thanks in advance!
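One possible stopgap is to pre-process the raw feed text before it reaches feedparser. The idea is to drop bare `<![name]>` markers whose name is not one of the marked-section keywords that Python 3.11's `_markupbase.parse_marked_section` appears to accept: `temp`, `cdata`, `ignore`, `include` and `rcdata`, plus the MS Office `if`, `else` and `endif`. This keyword list is an assumption based on reading the CPython 3.11 source, and other Python versions may behave differently. This is a sketch, not a feedparser feature. The helper name `strip_unknown_marked_sections` is hypothetical. The helper only handles the bare `<![name]>` form seen in this feed, not marked sections that carry content.

```python
import re

# Marked-section keywords that CPython 3.11's _markupbase appears to accept
# (assumption from reading the 3.11 source; other versions may differ).
KNOWN_MARKED_SECTION_KEYWORDS = frozenset(
    {"temp", "cdata", "ignore", "include", "rcdata", "if", "else", "endif"}
)

# Matches a bare marker such as <![dsgvo_service_control]>. The name pattern
# mirrors the one _markupbase uses when scanning declaration names.
_BARE_MARKED_SECTION = re.compile(r"<!\[([A-Za-z][-_.A-Za-z0-9]*)\s*\]>")


def strip_unknown_marked_sections(text: str) -> str:
    """Remove bare <![name]> markers whose name the stdlib parser rejects.

    Hypothetical helper. Known keywords (e.g. the MS Office <![endif]>) are
    kept. <![CDATA[ is never matched because "CDATA" is followed by "[",
    not "]>".
    """

    def _replace(match: re.Match) -> str:
        if match.group(1).lower() in KNOWN_MARKED_SECTION_KEYWORDS:
            return match.group(0)  # keep markers the parser understands
        return ""  # drop the unknown marker entirely

    return _BARE_MARKED_SECTION.sub(_replace, text)


if __name__ == "__main__":
    rss_snippet = (
        "<content:encoded><![CDATA[<p><strong>If you don’t like that, "
        "don’t use snrk.de!</strong><![dsgvo_service_control]></p>]]>"
        "</content:encoded>"
    )
    print(strip_unknown_marked_sections(rss_snippet))
```

`feedparser.parse()` accepts a document string as well as a URL, so the cleaned text can be passed to it directly. The trade-off is that the marker text is silently dropped from the entry content. The helper also won't cover other malformed declarations, so a `try`/`except AssertionError` around the call may still be worth keeping.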
This issue was opened by snarfed and has received 2 comments. It reports a parsing bug and appears to still be under discussion.