Papers Skill
Overview
Papers Skill turns a coding agent into a literature-research assistant. It
orchestrates a bundled Python CLI (scripts/papers.py) that hits the free
Semantic Scholar and arXiv APIs, downloads arXiv PDFs, and extracts text with
PyMuPDF. The agent decides which subcommand to invoke and how to combine
results into a literature scan, a deep read of one paper, an impact analysis,
or a reading list.
This skill is the Skill-mode port of the papers-mcp MCP server by the same author. Both projects share the same feature set; this one ships as a Claude Code plugin so it can be installed with a single command and needs no long-running MCP process.
When to Use This Skill
- Use when the user asks to search academic papers by topic, author, or venue.
- Use when the user names a specific paper (by DOI, arXiv ID, or title) and wants metadata, the abstract, the TL;DR, or its reference list.
- Use when the user wants to find work that cites a known paper (impact analysis, follow-up tracking).
- Use when the user wants to download an arXiv PDF and have it summarized.
- Use when the user asks to build a reading list around a topic.
Do Not Use This Skill When
- The user wants paywalled non-arXiv full text. This skill cannot bypass publisher paywalls; it fetches full-text PDFs from arXiv only and metadata for everything else.
- The user wants OCR over scanned PDFs. PyMuPDF extracts embedded text only; scanned image-PDFs return the fallback message and need a separate OCR step.
- The user wants real-time citation alerts or RSS-style watching. This skill is request-driven.
How It Works
Step 1: Verify dependencies
Three Python packages are required. The skill should check once per session, using the same interpreter to import-check and install so the dependency check and install target stay in sync:
python -c "import httpx, arxiv, fitz" 2>&1 || python -m pip install httpx arxiv PyMuPDF
If python is not on PATH, fall back to py (Windows launcher) or the
absolute interpreter path — and remember to invoke pip via the same
interpreter, e.g. py -m pip install httpx arxiv PyMuPDF.
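The same check can be sketched in Python, which keeps the import check and the install target tied to one interpreter. This is a minimal sketch: `missing_packages` and `install_command` are hypothetical helper names, not part of the bundled script.

```python
import importlib.util
import sys

# Import name -> pip distribution name (PyMuPDF is imported as "fitz").
REQUIRED = {"httpx": "httpx", "arxiv": "arxiv", "fitz": "PyMuPDF"}


def missing_packages(required=REQUIRED):
    """Return pip names of modules the current interpreter cannot find."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


def install_command(packages, interpreter=sys.executable):
    """Build a pip command bound to one interpreter; empty if nothing is missing."""
    if not packages:
        return []
    return [interpreter, "-m", "pip", "install", *packages]
```

Passing the result of `install_command(missing_packages())` to `subprocess.run` would install only what is absent, using the interpreter that ran the check.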
Step 2: Invoke the bundled CLI
The script lives at ${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py
and is bundled with this skill (no separate install needed). Always quote the
path so it survives spaces.
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" <subcommand> [args]
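When the CLI is driven from Python rather than a shell, passing the arguments as a list sidesteps quoting altogether. The sketch below assumes `CLAUDE_PLUGIN_ROOT` is available as an environment variable; `papers_command` is a hypothetical helper, not something the skill ships.

```python
import os
import sys
from pathlib import Path


def papers_command(subcommand, *args, plugin_root=None):
    """Build an argv list for the bundled CLI, using the current interpreter."""
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT")
    if not root:
        raise RuntimeError("CLAUDE_PLUGIN_ROOT is not set")
    script = Path(root) / "skills" / "papers-skill" / "scripts" / "papers.py"
    # A list argv needs no shell quoting, so spaces in the path are safe.
    return [sys.executable, str(script), subcommand, *map(str, args)]


# Example (not executed here):
# subprocess.run(papers_command("search", "diffusion models", "--limit", 5),
#                capture_output=True, text=True)
```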
Step 3: Pick the right subcommand
| Subcommand | Purpose | Example |
|---|---|---|
| search <query> [--limit N] | Semantic Scholar search, max 20 | search "diffusion models" --limit 5 |
| detail <paper_id> | Full metadata, TL;DR, top references | detail 10.48550/arXiv.2310.06825 |
| citations <paper_id> [--limit N] | Papers citing this one, max 20 | citations <id> --limit 15 |
| arxiv <query> [--max-results N] | arXiv preprint search, max 10 | arxiv "RLHF" --max-results 5 |
| download <arxiv_id> [--save-dir D] | Save PDF locally | download 2310.06825 --save-dir ./pdfs |
| read <pdf_path> [--max-pages N] | Extract PDF text via PyMuPDF | read ./pdfs/foo.pdf --max-pages 20 |
detail and citations auto-detect the ID type: DOIs starting with 10.
are used as-is, bare numeric IDs of 10+ digits are treated as arXiv IDs, and
long hex strings are treated as Semantic Scholar paperIds.
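To illustrate the idea of ID auto-detection, here is a rough sketch. It does not reproduce the script's own logic, which is not shown in this document. It assumes modern arXiv identifiers of the form YYMM.NNNNN (optionally with a version suffix) and 40-character hex Semantic Scholar paperIds; `classify_paper_id` is a hypothetical name.

```python
import re

# Assumed shapes, not taken from the bundled script:
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")  # e.g. 2005.11401 or 2005.11401v4
S2_RE = re.compile(r"^[0-9a-fA-F]{40}$")           # 40-character hex paperId


def classify_paper_id(paper_id: str) -> str:
    """Return 'doi', 'arxiv', 's2', or 'unknown' for a user-supplied ID."""
    pid = paper_id.strip()
    if pid.startswith("10."):   # DOIs are checked first and used as-is
        return "doi"
    if ARXIV_RE.match(pid):
        return "arxiv"
    if S2_RE.match(pid):
        return "s2"
    return "unknown"
```

An "unknown" result is a reasonable cue to fall back to a title search rather than calling detail with a guess.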
Examples
Example 1: Literature scan on a topic
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" search "retrieval augmented generation" --limit 10
Present results as a ranked table with # | Title | Year | Citations | ID, then ask the user which papers to dig into.
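One way to render that table, as a sketch only: it assumes the search results have already been parsed into dictionaries with the hypothetical keys `title`, `year`, `citations`, and `id`. The CLI's actual output format is not specified here.

```python
def results_table(papers):
    """Render papers (already in ranked order) as a Markdown table."""
    lines = ["| # | Title | Year | Citations | ID |", "|---|---|---|---|---|"]
    for rank, p in enumerate(papers, start=1):
        # Escape pipes so a title cannot break the table layout.
        title = str(p.get("title", "")).replace("|", "\\|")
        lines.append(
            f"| {rank} | {title} | {p.get('year', '')} "
            f"| {p.get('citations', '')} | {p.get('id', '')} |"
        )
    return "\n".join(lines)
```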
Example 2: Deep-read one paper
# 1. Confirm match
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" detail 2005.11401
# 2. Download
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" download 2005.11401 --save-dir ./pdfs
# 3. Extract abstract + intro + conclusion
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" read ./pdfs/2005.11401v4.RAG.pdf --max-pages 10
Summarize as: problem · method · key result · limitations.
Example 3: Impact analysis on an anchor paper
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" detail 10.48550/arXiv.2005.11401
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" citations 10.48550/arXiv.2005.11401 --limit 20
Cluster the citing papers by year/theme and highlight the most-cited follow-ups.
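The by-year part of that clustering could look like the sketch below. It assumes each citing paper is a dictionary with hypothetical `year` and `citations` keys; theme clustering is left to the agent's judgment.

```python
from collections import defaultdict


def cluster_by_year(papers, top_n=3):
    """Group citing papers by year, keeping the top_n most-cited per year."""
    by_year = defaultdict(list)
    for p in papers:
        by_year[p.get("year")].append(p)
    # Sort years ascending; papers with no year are placed last.
    ordered = sorted(by_year.items(), key=lambda kv: (kv[0] is None, kv[0] or 0))
    return {
        year: sorted(group, key=lambda p: p.get("citations") or 0, reverse=True)[:top_n]
        for year, group in ordered
    }
```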
Best Practices
- ✅ Always call detail before download to confirm the paper matches user intent. Skipping this leads to wrong PDFs being fetched.
- ✅ Include the paper ID alongside every title in your output so the user can re-query precisely.
- ✅ Cite as [FirstAuthor et al., Year] *Title* (cites: N).
- ✅ For PDFs you download, always report the absolute save path.
- ❌ Don't crawl. The script auto-retries 429s with exponential backoff; don't pile on parallel queries.
- ❌ Don't raise --max-pages to 100+ without warning the user; it can consume a large amount of context.
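The citation format from the list above can be produced with a small formatter. This is a sketch under two assumptions that the source does not state: "FirstAuthor" is taken to be the last word of the first author's name, and "et al." is dropped for single-author papers. `format_citation` is a hypothetical name.

```python
def format_citation(authors, year, title, cites):
    """Format as: [FirstAuthor et al., Year] *Title* (cites: N)."""
    parts = authors[0].split() if authors else []
    surname = parts[-1] if parts else "Unknown"  # assumption: surname is the last token
    lead = f"{surname} et al." if len(authors) > 1 else surname
    return f"[{lead}, {year}] *{title}* (cites: {cites})"
```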
Limitations
- The skill cannot fetch full text from paywalled publishers (Elsevier, Springer, Wiley, etc.). It can only read open arXiv PDFs.
- PyMuPDF extracts embedded text only. Scanned image-PDFs return the fallback message PDF无法提取文本(可能是扫描件) ("Cannot extract text from PDF (possibly a scanned document)"); offer the user an alternative version or note that OCR is required.
- Semantic Scholar's anonymous tier rate-limits aggressively. The script retries 3× with exponential backoff; persistent 429s during heavy use surface as 搜索失败: rate limit, retries exhausted ("Search failed: rate limit, retries exhausted").
- This skill does not replace environment-specific validation, testing, or expert review. Stop and ask for clarification if required inputs are missing.
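For readers unfamiliar with the retry behavior mentioned in the limitations, here is a generic sketch of retrying with exponential backoff. It is not the script's code: the `RateLimited` exception, the one-second base delay, and the doubling factor are all assumptions chosen for illustration.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the server answered HTTP 429."""


def with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimited, wait base_delay * 2**attempt and retry.

    After `retries` failed retries the last RateLimited error is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

With these defaults the waits are 1, 2, and 4 seconds, so a caller that still sees a failure has already been waiting several seconds and should pause before trying again.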
Security & Safety Notes
- The CLI performs outbound HTTPS only to api.semanticscholar.org and arxiv.org (and the arXiv-listed mirror for the bundled arxiv package). No authentication tokens are sent.
- download writes a PDF to the directory the user specifies (default: the current working directory). Confirm the save path with the user before downloading to an unexpected location.
- read opens a local PDF file with PyMuPDF; make sure the path the user supplies is one they trust.
- No credentials or API keys are needed or stored anywhere.
Common Pitfalls
- Problem: 需要安装 arxiv: pip install arxiv ("arxiv must be installed: pip install arxiv") or 需要安装 PyMuPDF: pip install PyMuPDF ("PyMuPDF must be installed: pip install PyMuPDF").
  Solution: The script returns this friendly message instead of crashing when an optional dependency is missing. Offer to run the install command.
- Problem: 搜索失败: rate limit, retries exhausted ("Search failed: rate limit, retries exhausted") from search, detail, or citations.
  Solution: Semantic Scholar is rate-limiting. Wait ~10 seconds and retry once. For repeated runs, fall back to arxiv for arXiv-indexed work.
- Problem: download fails with 找不到 arXiv ID: … ("arXiv ID not found: …").
  Solution: The user gave a non-arXiv ID (likely a DOI for a non-arXiv paper). Use detail to inspect; only papers with an externalIds.ArXiv field can be downloaded.
- Problem: Garbled Chinese output on Windows.
  Solution: The script already forces UTF-8 stdout. If the host terminal is still misconfigured, set PYTHONIOENCODING=utf-8 in the shell environment.
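For context on the last pitfall, a Python program can typically force UTF-8 output as sketched below. This is an assumption about the general approach, not a copy of what scripts/papers.py does; `force_utf8` is a hypothetical helper.

```python
import sys


def force_utf8(streams=None):
    """Switch text streams to UTF-8 where the stream supports reconfigure()."""
    if streams is None:
        streams = (sys.stdout, sys.stderr)
    for stream in streams:
        # TextIOWrapper has reconfigure() on Python 3.7+; other stream
        # types (e.g. captured or replaced stdout) are skipped.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")
```

Setting PYTHONIOENCODING=utf-8 achieves a similar effect from outside the process, which is why it works as a fallback when the terminal itself is misconfigured.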
Additional Resources
- Skill home (this plugin): https://github.com/xwmxcz/papers-skill
- Upstream MCP server: https://github.com/xwmxcz/papers-mcp
- Semantic Scholar API docs: https://api.semanticscholar.org/
- arXiv API docs: https://info.arxiv.org/help/api/
- PyMuPDF docs: https://pymupdf.readthedocs.io/