WeChat Article Fetching - Solving Dynamic Rendering

WeChat articles rely on JavaScript to display their content. Standard web scraping tools therefore return only an empty shell: the title and basic page framework, without the actual article text.

Problem Analysis

Requesting a WeChat article URL with web_fetch returns only the title and basic structure. The body comes back empty because the page depends on JavaScript running in the browser to display it.

Solution: curl with Browser Headers

The article content is actually present in the HTML source; JavaScript only controls whether it is displayed. Tools that return rendered content therefore miss it. curl downloads the raw HTML, where the content sits in the element with id="js_content".

Step 1: Download Article HTML

curl -s -L \
  -H "User-Agent: Mozilla/5.0 Chrome/120.0.0.0" \
  -H "Accept-Language: zh-CN,zh;q=0.9" \
  -H "Referer: https://mp.weixin.qq.com/" \
  "article_url" -o article.html

Step 2: Extract Content

python3 /path/to/skills/wechat-article-fetch/scripts/extract.py article.html
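
The contents of extract.py are not shown here, so the following is only a minimal sketch of how such an extraction step might work, assuming the article text sits inside the element with id="js_content" as described above. The names JsContentExtractor and extract_text are hypothetical and are not taken from the actual script. The sketch uses only the Python standard library and assumes reasonably well-formed HTML in which non-void tags are closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class JsContentExtractor(HTMLParser):
    """Hypothetical helper: collects text inside the element with id="js_content"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # 0 means we are outside the target element
        self.chunks = []  # non-empty text fragments found inside it

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            # Already inside the target element: one level deeper.
            self.depth += 1
        elif dict(attrs).get("id") == "js_content":
            # Entering the target element itself.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_text(html):
    """Return the text inside id="js_content", one fragment per line."""
    parser = JsContentExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

With the file saved in Step 1, calling extract_text(open("article.html", encoding="utf-8").read()) would return the article text as plain lines. Because nothing is executed or rendered, it makes no difference whether the element is visible in a browser.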

Key Parameters

-s: Silent mode, no progress output
-L: Follow redirects
User-Agent: Identifies the request as coming from a Chrome browser
Referer: Makes the request appear to come from the WeChat domain
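
For cases where the download needs to happen from a script rather than the shell, the same request can be sketched with Python's standard urllib module. This is an illustrative equivalent under stated assumptions, not part of the original skill: the function names build_request and download are hypothetical, and the header values are copied from the curl command above. urllib follows redirects by default, which corresponds to -L.

```python
import urllib.request

# Header values copied from the curl command in Step 1.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 Chrome/120.0.0.0",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Referer": "https://mp.weixin.qq.com/",
}


def build_request(url):
    """Hypothetical helper: build a GET request carrying the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def download(url, dest="article.html"):
    """Hypothetical helper: fetch the raw HTML and write it to dest, like curl -o."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        body = resp.read()
    with open(dest, "wb") as fh:
        fh.write(body)
    return dest
```

Calling download with the article URL would write the unrendered HTML to article.html, ready for the extraction step.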

Principle

The article text is already in the HTML source; JavaScript only controls whether it is displayed. Because curl returns the raw HTML without running any scripts, the text can be read directly from the element with id="js_content".

Skill Packaging

Packaged as an OpenClaw Skill.