
Topic analysis

ICLR 2026 – Institutional Affiliations Dataset and Analysis

End-to-end pipeline that turns 5,356 ICLR 2026 accepted papers into a clean, PDF-derived institutional-affiliation dataset and a publication-ready treemap of who is shaping AI research right now.

This avoids the OpenReview-profile drift problem, where an author's current job appears on every paper they ever wrote (e.g. listing Wyoming as the affiliation for a paper actually written at UBC). Affiliations come from the title block of each paper's PDF, not from author profiles.

Follow me for more analysis like this, plus AI engineering & research insights. If this dataset or the pipeline is useful to your work, a follow / star is the easiest way to encourage me to keep publishing this kind of analysis.

In the treemap, each rectangle is one institution, sized by the number of accepted papers it appears on (counted once per paper, regardless of how many of the paper's authors are affiliated with it). Region cells are sized by the cumulative count of their top-50 institutions. Lighter shade = academia / research institute; darker shade = industry. Square version (for social posts): charts/iclr2026_top50_treemap_unique_grouped_square.png

The plotting step reads data/iclr2026_public.csv and writes the treemap PNGs/SVGs into charts/. Add --shape square for a 1:1 version. Add --source openreview to compare against the OpenReview-profile-only version (requires running the scraper first). You only need the scraping step if you want to re-derive the dataset (e.g., for a new conference); it takes ~1–2 hours of network time and ~5 GB of disk for the PDF cache.

parse_pdf_affiliations.py handles four layout patterns common in ICLR template papers, plus a footnote-text filter that catches and discards "Equal contribution", "Corresponding author", "Project lead", and "These authors contributed equally"; these used to leak into affiliation strings before being filtered out. Result: 96% of papers parse successfully; the remaining 4% fall back to OpenReview profile data (transparently flagged in the Affiliation_source column).

License: MIT.
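The two counting rules described above (drop footnote text, count each institution at most once per paper) can be sketched as follows. This is a minimal sketch, not the repository's actual code: the `(paper_id, affiliation)` input shape and the exact phrase matching are assumptions; only the filter phrases themselves come from the description above.

```python
from collections import Counter

# Footnote phrases that must not be treated as affiliations
# (the four phrases listed in the summary above).
FOOTNOTE_PHRASES = {
    "equal contribution",
    "corresponding author",
    "project lead",
    "these authors contributed equally",
}

def is_footnote(text: str) -> bool:
    """True if a candidate affiliation string is really footnote text."""
    t = text.strip().rstrip(".").lower()
    return any(phrase in t for phrase in FOOTNOTE_PHRASES)

def count_papers_per_institution(rows):
    """rows: iterable of (paper_id, affiliation) pairs, one per author line.

    Each institution is counted at most once per paper, no matter how
    many of the paper's authors list it.
    """
    seen = set()       # (paper_id, institution) pairs already counted
    counts = Counter()
    for paper_id, affiliation in rows:
        if is_footnote(affiliation):
            continue   # discard "Equal contribution" etc.
        key = (paper_id, affiliation)
        if key not in seen:
            seen.add(key)
            counts[affiliation] += 1
    return counts

rows = [
    ("p1", "UBC"), ("p1", "UBC"), ("p1", "Equal contribution"),
    ("p2", "UBC"), ("p2", "MILA"),
]
print(count_papers_per_institution(rows))  # UBC counted once per paper, so 2
```

The once-per-paper rule is what makes the treemap measure institutional reach (papers an institution appears on) rather than headcount (authors per institution).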
The data is derived from publicly available OpenReview submissions and ICLR 2026 paper PDFs; please cite this repository if you use it in published work. If you build something on top of this, ping me; I'm always interested in seeing where this kind of pipeline gets used. And if you want more posts like this (research-engineering deep dives, applied AI analysis, papers I'm reading), the best place is:

— Dmytro Lopushanskyy

Heat score

1

Sources

1

Platforms

1

Relations

0
First seen
May 15, 2026, 6:50 AM
Last updated
May 15, 2026, 8:01 AM

Why this topic matters

ICLR 2026 – Institutional Affiliations Dataset and Analysis is currently shaped by signals from 1 source platform. This page organizes AI analysis summaries, 1 timeline event, and 0 relationship edges so search engines and AI systems can understand the topic's factual basis and propagation arc.

News

Keywords

9 tags
2026, pipeline, turns, 356, accepted, papers, clean, derived, affiliation

Source evidence

1 evidence item

ICLR 2026 – Institutional Affiliations Dataset and Analysis

News · 1
May 15, 2026, 6:50 AM

Timeline

ICLR 2026 – Institutional Affiliations Dataset and Analysis

May 15, 2026, 6:50 AM

Related topics

No related topics have been aggregated yet, but this page still preserves the AI summary, source links, and timeline.