
Topic analysis

ICLR 2026 – Institutional Affiliations Dataset and Analysis

End-to-end pipeline that turns 5,356 ICLR 2026 accepted papers into a clean, PDF-derived institutional-affiliation dataset and a publication-ready treemap of who is shaping AI research right now.

This avoids the OpenReview-profile drift problem, where an author's current job appears on every paper they ever wrote (e.g. listing Wyoming as the affiliation for a paper actually written at UBC). Affiliations come from the title block of each paper's PDF, not from author profiles.

Follow me for more analysis like this, plus AI engineering & research insights. If this dataset or the pipeline is useful to your work, a follow / star is the easiest way to encourage me to keep publishing this kind of analysis.

In the treemap, each rectangle is one institution, sized by the number of accepted papers it appears on (counted once per paper, regardless of how many of the paper's authors are affiliated with it). Region cells are sized by the cumulative count of their top-50 institutions. Lighter shade = academia / research institute; darker shade = industry. Square version (for social posts): charts/iclr2026_top50_treemap_unique_grouped_square.png

The plotting step reads data/iclr2026_public.csv and writes the treemap PNGs/SVGs into charts/. Add --shape square for a 1:1 version. Add --source openreview to compare against the OpenReview-profile-only version (requires running the scraper first). You only need the scraping step if you want to re-derive the dataset (e.g., for a new conference); it takes ~1–2 hours of network time and ~5 GB of disk for the PDF cache.

parse_pdf_affiliations.py handles four layout patterns common in ICLR template papers, plus a footnote-text filter that catches and discards "Equal contribution", "Corresponding author", "Project lead", and "These authors contributed equally"; these used to leak into affiliation strings before being filtered out. Result: 96% of papers parse successfully; the remaining 4% fall back to OpenReview profile data (transparently flagged in the Affiliation_source column).

License: MIT.
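The two counting rules described above (drop footnote text, count each institution at most once per paper) can be sketched as follows. This is a minimal sketch, not the repository's actual code: the `(paper_id, affiliation)` input shape and the exact phrase matching are assumptions; only the filter phrases themselves come from the description above.

```python
from collections import Counter

# Footnote phrases that must not be treated as affiliations
# (the four phrases listed in the summary above).
FOOTNOTE_PHRASES = {
    "equal contribution",
    "corresponding author",
    "project lead",
    "these authors contributed equally",
}

def is_footnote(text: str) -> bool:
    """True if a candidate affiliation string is really footnote text."""
    t = text.strip().rstrip(".").lower()
    return any(phrase in t for phrase in FOOTNOTE_PHRASES)

def count_papers_per_institution(rows):
    """rows: iterable of (paper_id, affiliation) pairs, one per author line.

    Each institution is counted at most once per paper, no matter how
    many of the paper's authors list it.
    """
    seen = set()       # (paper_id, institution) pairs already counted
    counts = Counter()
    for paper_id, affiliation in rows:
        if is_footnote(affiliation):
            continue   # discard "Equal contribution" etc.
        key = (paper_id, affiliation)
        if key not in seen:
            seen.add(key)
            counts[affiliation] += 1
    return counts

rows = [
    ("p1", "UBC"), ("p1", "UBC"), ("p1", "Equal contribution"),
    ("p2", "UBC"), ("p2", "MILA"),
]
print(count_papers_per_institution(rows))  # UBC counted once per paper, so 2
```

The once-per-paper rule is what makes the treemap measure institutional reach (papers an institution appears on) rather than headcount (authors per institution).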
The data is derived from publicly available OpenReview submissions and ICLR 2026 paper PDFs; please cite this repository if you use it in published work. If you build something on top of this, ping me; I'm always interested in seeing where this kind of pipeline gets used. And if you want more posts like this (research-engineering deep dives, applied AI analysis, papers I'm reading), the best place is:

— Dmytro Lopushanskyy

Heat score

1

Sources

1

Platforms

1

Relations

0
First seen
May 15, 2026, 6:50 AM
Last updated
May 15, 2026, 8:01 AM

Why this topic matters

ICLR 2026 – Institutional Affiliations Dataset and Analysis is currently shaped by signals from 1 source platform. This page organizes AI analysis summaries, 1 timeline event, and 0 relationship edges so search engines and AI systems can understand the topic's factual basis and propagation arc.

News

Keywords

9 tags
2026, pipeline, turns, 356, accepted, papers, clean, derived, affiliation

Source evidence

1 evidence item

ICLR 2026 – Institutional Affiliations Dataset and Analysis

News · 1
May 15, 2026, 6:50 AM

Timeline

ICLR 2026 – Institutional Affiliations Dataset and Analysis

May 15, 2026, 6:50 AM

Related topics

No related topics have been aggregated yet, but this page still preserves the AI summary, source links, and timeline.