Python script that scrapes public and private Mailman archive pages and republishes them to local files, and generates an RSS feed of recent emails.

Problem with multilingual scraping #5

Open
Opened 12/3/2013 by cjgb · 1 comment
cjgb

I was trying to scrape https://stat.ethz.ch/pipermail/r-help-es/. It seems that scrapeList takes the year-month values from the row labels in the table there, which happen to be in Spanish. The links themselves, however, are in English. So it tries to retrieve https://stat.ethz.ch/pipermail/r-help-es/2013-Diciembre/date.html, which does not exist. The page that does exist is https://stat.ethz.ch/pipermail/r-help-es/2013-December/date.html.

Wouldn't it be possible to take the relative path from the `<A>` element in the table instead, to avoid this kind of problem?

Best regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com
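The suggestion above could be sketched roughly as follows. This is only an illustration, not the script's actual code: the names `DateLinkParser` and `month_index_urls` are hypothetical, and the `SAMPLE` markup is an assumed, simplified version of a localized Pipermail index row. The idea is to read the `href` of each `<A>` element and resolve it against the archive URL, rather than rebuilding the path from the translated row label.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DateLinkParser(HTMLParser):
    """Collect href values of anchors that point at per-month date indexes."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...>
        # arrives here as tag "a" with attribute "href".
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith("/date.html"):
            self.hrefs.append(href)


def month_index_urls(index_html, base_url):
    """Return absolute URLs of the date indexes linked from an archive page."""
    parser = DateLinkParser()
    parser.feed(index_html)
    # Resolve each relative link against the archive's base URL.
    return [urljoin(base_url, href) for href in parser.hrefs]


# Assumed sample row: the label is localized, the link target is not.
SAMPLE = """
<table>
<tr><td>Diciembre 2013:</td>
<td><A href="2013-December/thread.html">[ Hilo ]</A>
<A href="2013-December/date.html">[ Fecha ]</A></td></tr>
</table>
"""

urls = month_index_urls(SAMPLE, "https://stat.ethz.ch/pipermail/r-help-es/")
print(urls)
```

Because the path comes from the link itself, the language of the row label no longer matters.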

