Papers Skill

Papers Skill turns a coding agent into a literature-research assistant. It orchestrates a bundled Python CLI (`scripts/papers.py`) that hits the free Semantic Scholar and arXiv APIs, downloads arXiv PDFs, and extracts text with PyMuPDF. The agent decides which subcommand to invoke and how to combine results into a literature scan, a deep read of one paper, an impact analysis, or a reading list.

Category: General & Miscellaneous
Repo: antigravity-awesome-skills
Path: skills/papers-skill/SKILL.md
Updated: 6/12/2026, 5:19:41 PM

Overview

This skill is the Skill-mode port of the papers-mcp MCP server by the same author. Both projects share the same feature set; this one ships as a Claude Code plugin so it can be installed with a single command and needs no long-running MCP process.

When to Use This Skill

  • Use when the user asks to search academic papers by topic, author, or venue.
  • Use when the user names a specific paper (by DOI, arXiv ID, or title) and wants metadata, the abstract, the TL;DR, or its reference list.
  • Use when the user wants to find work that cites a known paper (impact analysis, follow-up tracking).
  • Use when the user wants to download an arXiv PDF and have it summarized.
  • Use when the user asks to build a reading list around a topic.

Do Not Use This Skill When

  • The user wants paywalled non-arXiv full text. This skill cannot bypass publisher paywalls; it fetches PDFs only from arXiv, although it can retrieve metadata for any paper the APIs index.
  • The user wants OCR over scanned PDFs. PyMuPDF extracts embedded text only; scanned image-PDFs return the fallback message and need a separate OCR step.
  • The user wants real-time citation alerts or RSS-style watching. This skill is request-driven.

How It Works

Step 1: Verify dependencies

Three Python packages are required: httpx, arxiv, and PyMuPDF (imported as fitz). Check for them once per session, and use the same interpreter for both the import check and the install so that the packages land in the environment that was checked:

python -c "import httpx, arxiv, fitz" 2>&1 || python -m pip install httpx arxiv PyMuPDF

If python is not on PATH, fall back to py (the Windows launcher) or the absolute interpreter path, and invoke pip through that same interpreter, e.g. py -m pip install httpx arxiv PyMuPDF.
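
The same check can be sketched in Python for hosts where a shell one-liner is awkward. This is a minimal illustration, not part of the bundled script: the helper names `missing_packages` and `install_command` are hypothetical, and the only facts taken from the text are the three package names and the rule that pip should run under the interpreter that was checked.

```python
import importlib.util
import sys

# Import name -> pip distribution name (PyMuPDF is imported as "fitz").
REQUIRED = {"httpx": "httpx", "arxiv": "arxiv", "fitz": "PyMuPDF"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


def install_command(packages):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Would run:", " ".join(install_command(missing)))
    else:
        print("All dependencies present")
```

Using `sys.executable` rather than a bare `pip` keeps the check and the install pointed at one environment, which is the point of the advice above.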

Step 2: Invoke the bundled CLI

The script lives at ${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py and is bundled with this skill (no separate install needed). Always quote the path so it survives spaces.

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" <subcommand> [args]
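
When the CLI is launched from Python rather than a shell, passing the arguments as a list sidesteps quoting altogether. The sketch below assumes the path layout given above; `papers_command` is a hypothetical helper, not something the skill ships.

```python
import os
import sys
from pathlib import Path


def papers_command(subcommand, *args, plugin_root=None):
    """Return the argv list for one papers.py invocation."""
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT")
    if not root:
        raise RuntimeError("CLAUDE_PLUGIN_ROOT is not set")
    script = Path(root) / "skills" / "papers-skill" / "scripts" / "papers.py"
    # Every argument is its own list item, so spaces in the path or the
    # query need no shell quoting.
    return [sys.executable, str(script), subcommand, *map(str, args)]
```

A call such as `subprocess.run(papers_command("search", "diffusion models", "--limit", 5), capture_output=True, text=True)` would then run the search and capture its output.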

Step 3: Pick the right subcommand

| Subcommand | Purpose | Example |
|---|---|---|
| `search <query> [--limit N]` | Semantic Scholar search, max 20 | `search "diffusion models" --limit 5` |
| `detail <paper_id>` | Full metadata, TL;DR, top references | `detail 10.48550/arXiv.2310.06825` |
| `citations <paper_id> [--limit N]` | Papers citing this one, max 20 | `citations <id> --limit 15` |
| `arxiv <query> [--max-results N]` | arXiv preprint search, max 10 | `arxiv "RLHF" --max-results 5` |
| `download <arxiv_id> [--save-dir D]` | Save PDF locally | `download 2310.06825 --save-dir ./pdfs` |
| `read <pdf_path> [--max-pages N]` | Extract PDF text via PyMuPDF | `read ./pdfs/foo.pdf --max-pages 20` |

detail and citations auto-detect the ID type: DOIs starting with 10. are used as-is, bare numeric IDs of 10+ digits are treated as arXiv IDs, and long hex strings are treated as Semantic Scholar paperIds.
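
An agent may want to guess the ID type itself before calling the CLI, for instance to decide whether download is even possible. The classifier below is only a rough sketch: the regular expressions are assumptions (the modern arXiv form used in the examples here, and a 40-character hex Semantic Scholar paperId), and the exact tests inside papers.py may differ.

```python
import re

# Assumed patterns, not copied from papers.py.
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
S2_HEX_RE = re.compile(r"^[0-9a-f]{40}$", re.IGNORECASE)


def classify_paper_id(paper_id):
    """Return 'doi', 'arxiv', 's2', or 'unknown' for a user-supplied ID."""
    pid = paper_id.strip()
    if pid.startswith("10."):
        return "doi"
    if ARXIV_RE.match(pid):
        return "arxiv"
    if S2_HEX_RE.match(pid):
        return "s2"
    return "unknown"
```

Checking the DOI prefix first matters because arXiv-issued DOIs such as 10.48550/arXiv.2310.06825 contain an arXiv ID but should still be passed through as DOIs.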

Examples

Example 1: Literature scan on a topic

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" search "retrieval augmented generation" --limit 10

Present results as a ranked table with # | Title | Year | Citations | ID, then ask the user which papers to dig into.
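
One way to produce that table is sketched here. The output schema of papers.py is not documented in this section, so the dictionary keys (`title`, `year`, `citations`, `id`) are hypothetical, and the rank is simply the order in which the results arrive.

```python
def results_table(papers):
    """Render papers as '# | Title | Year | Citations | ID' rows."""
    lines = ["# | Title | Year | Citations | ID"]
    for rank, paper in enumerate(papers, start=1):
        year = paper.get("year") or "?"          # missing year shown as "?"
        cites = paper.get("citations") or 0      # missing count shown as 0
        lines.append(f"{rank} | {paper['title']} | {year} | {cites} | {paper['id']}")
    return "\n".join(lines)
```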

Example 2: Deep-read one paper

# 1. Confirm match
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" detail 2005.11401
# 2. Download
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" download 2005.11401 --save-dir ./pdfs
# 3. Extract abstract + intro + conclusion
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" read ./pdfs/2005.11401v4.RAG.pdf --max-pages 10

Summarize as: problem · method · key result · limitations.

Example 3: Impact analysis on an anchor paper

python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" detail 10.48550/arXiv.2005.11401
python "${CLAUDE_PLUGIN_ROOT}/skills/papers-skill/scripts/papers.py" citations 10.48550/arXiv.2005.11401 --limit 20

Cluster the citing papers by year/theme and highlight the most-cited follow-ups.
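
The year-based half of that clustering is mechanical and can be sketched as follows; grouping by theme still needs the agent's judgment. The record keys (`title`, `year`, `citations`) are assumptions about how the agent has parsed the CLI output, not a documented schema.

```python
from collections import defaultdict


def cluster_by_year(citing, top_n=3):
    """Group citing papers by year and pick the top_n most-cited ones."""
    by_year = defaultdict(list)
    for paper in citing:
        by_year[paper.get("year") or "unknown"].append(paper)
    most_cited = sorted(
        citing, key=lambda p: p.get("citations") or 0, reverse=True
    )[:top_n]
    return dict(by_year), most_cited
```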

Best Practices

  • ✅ Always call detail before download to confirm the paper matches user intent. Skipping this leads to wrong PDFs being fetched.
  • ✅ Include the paper ID alongside every title in your output so the user can re-query precisely.
  • ✅ Cite as [FirstAuthor et al., Year] *Title* (cites: N).
  • ✅ For PDFs you download, always report the absolute save path.
  • ❌ Don't crawl. The script auto-retries 429s with exponential backoff; don't pile on parallel queries.
  • ❌ Don't raise --max-pages to 100+ without warning the user: the extracted text can consume a large share of the context window.
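
The citation and ID conventions above can be combined in a small formatter. This is an illustrative sketch: `format_citation` is a hypothetical helper, and dropping "et al." for single-author papers is a choice made here rather than something the skill prescribes.

```python
def format_citation(title, authors, year, citations, paper_id):
    """Format one paper as '[FirstAuthor et al., Year] *Title* (cites: N) ID: ...'."""
    if not authors:
        lead = "Unknown"
    elif len(authors) == 1:
        lead = authors[0]
    else:
        lead = f"{authors[0]} et al."
    # The ID is appended so the user can re-query the exact paper.
    return f"[{lead}, {year}] *{title}* (cites: {citations}) ID: {paper_id}"
```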

Limitations

  • The skill cannot fetch full text from paywalled publishers (Elsevier, Springer, Wiley, etc.). It can only read open arXiv PDFs.
  • PyMuPDF extracts embedded text only. Scanned image-PDFs return the fallback message PDF无法提取文本(可能是扫描件) ("cannot extract text from PDF (possibly a scan)"); offer the user an alternative version or note that OCR is required.
  • Semantic Scholar's anonymous tier rate-limits aggressively. The script retries 3× with exponential backoff; persistent 429s during heavy use surface as 搜索失败: rate limit, retries exhausted ("search failed: rate limit, retries exhausted").
  • This skill does not replace environment-specific validation, testing, or expert review. Stop and ask for clarification if required inputs are missing.
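
For readers unfamiliar with the retry pattern mentioned above, here is a generic sketch of exponential backoff. It is not the code in papers.py: the `RateLimited` exception, the one-second base delay, and the reading of "3×" as three retries after the first attempt are all assumptions.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal for an HTTP 429 response."""


def with_backoff(call, retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimited, wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise  # retries exhausted, surface the failure
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the schedule can be inspected without actually waiting. Since the script already retries on its own, an agent wrapping it in a second retry loop would multiply the load, which is why the advice is to wait and retry once.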

Security & Safety Notes

  • The CLI performs outbound HTTPS only to api.semanticscholar.org and arxiv.org (and the arXiv mirror that the arxiv package queries). No authentication tokens are sent.
  • download writes a PDF to the directory the user specifies (default: the current working directory). Confirm the save path with the user before downloading to an unexpected location.
  • read opens a local PDF file with PyMuPDF, so make sure the path the user supplies points to a file they trust.
  • No credentials or API keys are needed or stored anywhere.
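
A small helper can make the save-path check concrete. This is a sketch under one assumption: that "unexpected location" means a directory outside the current working directory. `resolve_save_dir` is a hypothetical name, not part of the CLI.

```python
from pathlib import Path


def resolve_save_dir(save_dir=".", cwd=None):
    """Return (absolute_path, is_inside_cwd) for a requested save directory."""
    base = Path(cwd).resolve() if cwd else Path.cwd().resolve()
    target = Path(save_dir).expanduser()
    if not target.is_absolute():
        target = base / target
    target = target.resolve()
    inside = target == base or base in target.parents
    return target, inside
```

If the second value is False, the agent would confirm with the user before running download; the first value is the absolute path to report afterwards.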

Common Pitfalls

  • Problem: 需要安装 arxiv: pip install arxiv or 需要安装 PyMuPDF: pip install PyMuPDF (需要安装 means "need to install"). Solution: The script returns this friendly message instead of crashing when an optional dependency is missing. Offer to run the install command.

  • Problem: 搜索失败: rate limit, retries exhausted ("search failed: ...") from search, detail, or citations. Solution: Semantic Scholar is rate-limiting. Wait ~10 seconds and retry once. If the failures persist, fall back to the arxiv subcommand for arXiv-indexed work.

  • Problem: download fails with 找不到 arXiv ID: … ("arXiv ID not found: …"). Solution: The user gave a non-arXiv ID (likely a DOI for a non-arXiv paper). Use detail to inspect; only papers with an externalIds.ArXiv field can be downloaded.

  • Problem: Garbled Chinese output on Windows. Solution: The script already forces UTF-8 stdout. If the host terminal is still misconfigured, set PYTHONIOENCODING=utf-8 in the shell environment.
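
Because the script reports these failures as messages rather than exceptions, an agent can look for the known markers in the output and respond accordingly. The lookup below is an illustrative sketch: the marker strings are taken from the messages quoted in this document, while the `hint_for` helper and the wording of the hints are hypothetical.

```python
# (marker substring, suggested next step), checked in order.
HINTS = [
    ("需要安装", "A dependency is missing; offer to run the pip command in the message."),
    ("retries exhausted", "Semantic Scholar is rate-limiting; wait about 10 seconds and retry once."),
    ("找不到 arXiv ID", "Not an arXiv paper; inspect it with detail before downloading."),
    ("PDF无法提取文本", "No embedded text; the PDF is probably a scan and needs OCR."),
]


def hint_for(output):
    """Return a suggested next step for known failure messages, else None."""
    for marker, hint in HINTS:
        if marker in output:
            return hint
    return None
```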
