WeChat articles rely on JavaScript to display their content. Standard web scraping tools retrieve only an empty shell: the title and basic page structure, but not the article text.
Problem Analysis
Requesting a WeChat article URL with web_fetch returns only the title and basic structure. The content appears empty because the page depends on JavaScript running in a browser to display it.
Solution: curl with Browser Headers
The article content is actually present in the HTML source; JavaScript only controls whether it is displayed. Parsers that work on the rendered page therefore miss it. curl fetches the raw HTML, where the content sits in the element with id="js_content".
Step 1: Download Article HTML
curl -s -L \
-H "User-Agent: Mozilla/5.0 Chrome/120.0.0.0" \
-H "Accept-Language: zh-CN,zh;q=0.9" \
-H "Referer: https://mp.weixin.qq.com/" \
"article_url" -o article.html
Step 2: Extract Content
python3 /path/to/skills/wechat-article-fetch/scripts/extract.py article.html
Key Parameters
-s: Silent mode, no progress output
-L: Follow redirects
User-Agent: Pretend to be a Chrome browser
Referer: Pretend to come from the WeChat domain
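For environments where calling curl is inconvenient, the download step can be approximated with Python's standard library. This is a minimal sketch, not part of the skill itself: the names BROWSER_HEADERS, build_request, and download_article are hypothetical, and the header values are simply copied from the curl command above.

```python
import urllib.request

# Header values copied from the curl command above.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 Chrome/120.0.0.0",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Referer": "https://mp.weixin.qq.com/",
}


def build_request(article_url: str) -> urllib.request.Request:
    """Build a GET request carrying the same headers as the curl command."""
    return urllib.request.Request(article_url, headers=BROWSER_HEADERS)


def download_article(article_url: str, out_path: str = "article.html") -> None:
    """Fetch the raw HTML and save it, roughly like `curl -s -L ... -o article.html`."""
    # urlopen follows HTTP redirects by default, which corresponds to curl's -L.
    with urllib.request.urlopen(build_request(article_url), timeout=30) as resp:
        raw = resp.read()
    # Write bytes unchanged so the page's original encoding is preserved.
    with open(out_path, "wb") as f:
        f.write(raw)
```

Calling download_article with the article URL should leave an article.html file that can be passed to the extraction step. Whether WeChat serves the same page to this client as to curl is an assumption worth verifying on a real article.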
Principle
The article content is already in the HTML source; JavaScript only controls whether it is displayed. curl retrieves the raw HTML, and the content sits in the element with id="js_content".
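The contents of extract.py are not shown here, so the following is only an illustrative sketch of how text could be pulled out of the id="js_content" element using Python's built-in html.parser. The class JsContentExtractor and the function extract_text are hypothetical names, and the sketch assumes the element's inner tags are properly closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class JsContentExtractor(HTMLParser):
    """Collect the visible text inside the element with id="js_content"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # nesting depth inside js_content; 0 means outside
        self.skip = 0     # nesting depth inside <script>/<style>
        self.chunks = []  # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            if tag in ("script", "style"):
                self.skip += 1
        elif dict(attrs).get("id") == "js_content":
            self.depth = 1  # entered the article container

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        self.depth -= 1  # reaches 0 when the container itself closes

    def handle_data(self, data):
        # Keep non-empty text that is inside the container and not script/style.
        if self.depth and not self.skip and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the article text, one fragment per line (empty if not found)."""
    parser = JsContentExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

To try it, read the downloaded article.html as a string and pass it to extract_text. A dedicated HTML library would cope better with malformed markup; the standard-library parser is used here only to keep the sketch dependency-free.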
Skill Packaging
Package this as an OpenClaw Skill.