🎓Paper2Poster
Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr
University of Waterloo National University of Singapore University of Oxford
† Equal Contribution ✉ Corresponding Authors

TL;DR

We address two questions: how to create a poster from a paper, and how to evaluate the resulting poster.


Can AI assistants create a well-designed Poster given a Paper?

Inputs: Paper (a PDF)

Outputs: Poster (designed by the author)

How do GPT-4o and open-source multi-agents behave?

Poster generated by GPT-4o-image

Poster generated by GPT-4o-HTML

Poster generated by PPTAgent

Poster generated by 🦉OWL

GPT-4o-image produces visually acceptable layouts at first glance, but zooming into regions reveals impaired text rendering, leading to poor readability of fine-grained details. GPT-4o-HTML and OWL generate blog-like, text-dense posters that suffer from low visual readability. PPTAgent struggles with layout control, often resulting in missing panels.

Poster Generated by PosterAgent


In contrast, PosterAgent generates structurally coherent and readable posters while using significantly fewer words.

What are the Challenges?

  • Long-Context Long-Horizon Task: Scientific papers span multiple pages and thousands of words. Summarizing key insights while preserving coherence demands hierarchical understanding and selective abstraction. The complexity further necessitates long-horizon reasoning and multiple iterative interactions, making the task especially challenging.
  • Interleaved Multimodal Inputs: Papers integrate numerous figures, tables, and charts, each semantically linked to the surrounding text. Successful poster generation demands the ability to extract, interpret, and align these multimodal elements in a contextually appropriate manner.
  • Layout-aware Multimodal Outputs: Unlike tasks focused solely on text (e.g., blog writing) or vision, poster generation requires producing interleaved text–image outputs within a constrained spatial layout. This necessitates joint reasoning over language, visual content, and layout to prevent overflow, imbalance, and logical misalignment; a minimal overflow check is sketched after this list.
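
To make the overflow constraint concrete, here is a minimal Python sketch of the kind of feasibility check a layout-aware generator needs. The Panel fields, the estimate_text_height heuristic, and its constants are illustrative assumptions, not the Paper2Poster implementation.

from dataclasses import dataclass

@dataclass
class Panel:
    x: float       # left edge, inches
    y: float       # top edge, inches
    width: float   # inches
    height: float  # inches

def estimate_text_height(text: str, width_in: float,
                         font_pt: float = 24.0,
                         chars_per_inch: float = 4.0) -> float:
    """Crude pre-render estimate of text height (hypothetical constants)."""
    chars_per_line = max(1, int(width_in * chars_per_inch))
    n_lines = -(-len(text) // chars_per_line)  # ceiling division
    line_height_in = font_pt * 1.2 / 72.0      # 1.2 line spacing, 72 pt per inch
    return n_lines * line_height_in

def overflows(panel: Panel, text: str) -> bool:
    """True if the text is expected to spill past the panel's bottom edge."""
    return estimate_text_height(text, panel.width) > panel.height

panel = Panel(x=0.5, y=0.5, width=4.0, height=3.0)
print(overflows(panel, "word " * 400))  # True -> trim text or shrink the font

A real renderer would measure text with actual font metrics; the point is only that overflow can be predicted and negotiated jointly with content selection, rather than discovered after the poster is assembled.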


How to create a poster 👉 PosterAgent

A top-down, visual-in-the-loop, efficient multi-agent pipeline.
(a) The Parser distills the paper into a structured asset library; (b) the Planner aligns text–visual pairs into a binary-tree layout that preserves reading order and spatial balance; and (c) the Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment.
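
To illustrate the control flow of the Painter-Commenter loop, here is a minimal Python sketch. The helper names (paint_fn, render_fn, critique_fn) and the feedback schema are hypothetical placeholders, not the released PosterAgent API.

MAX_ROUNDS = 3

def refine_panel(panel_plan, paint_fn, render_fn, critique_fn):
    """Visual-in-the-loop refinement of a single panel (sketch).

    paint_fn(plan, feedback) -> str    LLM drafts or revises rendering code
    render_fn(code)          -> image  executes the code, returns the panel image
    critique_fn(image, plan) -> dict   VLM feedback, e.g. {"ok": bool, "issues": [...]}
    """
    feedback = None
    for _ in range(MAX_ROUNDS):
        code = paint_fn(panel_plan, feedback)      # Painter: write or revise code
        image = render_fn(code)                    # execute to render the panel
        feedback = critique_fn(image, panel_plan)  # Commenter: inspect the result
        if feedback["ok"]:                         # no overflow, alignment looks right
            return code, image
    return code, image                             # best effort after MAX_ROUNDS

Capping the rounds keeps the loop cheap; a panel that still fails after MAX_ROUNDS is kept in its best-effort state rather than stalling the pipeline.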

How to evaluate a poster 👉 PaperQuiz

A good poster should convey the paper's core content visually.
Left: We automatically generate multiple-choice questions from each paper using an LLM (o3), forming our PaperQuiz evaluation. Right: In PaperQuiz, we simulate multiple readers by letting VLMs representing different expertise levels (e.g., student, professor) read each generated poster and answer the quiz. The poster that achieves the highest average score is considered the most effective at conveying the paper's content.
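
As a concrete illustration, here is a minimal Python sketch of how PaperQuiz-style scoring could be wired up. The READERS personas, the quiz item schema, and ask_vlm are assumptions for illustration, not the benchmark's actual interface.

READERS = ["undergraduate student", "PhD student", "professor"]  # illustrative personas

def paper_quiz_score(poster_image, quiz, ask_vlm) -> float:
    """Average quiz accuracy across simulated readers (sketch).

    quiz: list of {"question": str, "choices": [str, ...], "answer": int}
    ask_vlm(image, persona, question, choices) -> int, the chosen option index
    """
    total = correct = 0
    for persona in READERS:
        for item in quiz:
            pred = ask_vlm(poster_image, persona,
                           item["question"], item["choices"])
            correct += int(pred == item["answer"])
            total += 1
    return correct / total  # higher = the poster conveys more of the paper

Averaging over personas rewards posters that stay legible to non-experts while still carrying the details an expert reader would look for.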

Abstract

Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality—semantic alignment with human posters, (ii) Textual Coherence—language fluency, (iii) Holistic Assessment—six fine-grained aesthetic and informational criteria scored by a VLM-as-judge, and notably (iv) PaperQuiz—the poster's ability to convey core paper content as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: the (a) Parser distills the paper into a structured asset library; the (b) Planner aligns text–visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the (c) Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs—though visually appealing at first glance—often exhibit noisy text and poor PaperQuiz scores, and we find that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source Paper2Poster pipeline outperforms GPT-4o-based systems across nearly all metrics while consuming 87% fewer tokens. These findings chart clear directions for the next generation of fully automated poster-generation models.

Data Statistics

(a) Word cloud illustrating the diversity of research topics. (b) Textual-token and figure-count statistics for input papers vs. posters provided by authors.

Main Results on Existing Solutions

Detailed evaluation of Paper2Poster.

PaperQuiz evaluation on Paper2Poster.

Efficiency and cost analysis, demonstrating PosterAgent's strong efficiency and low API cost.

More Examples

Case-1: Conformal Semantic Keypoint Detection with Statistical Guarantees.

Case-2: Neural Tangent Kernels for Axis-Aligned Tree Ensembles.

Case-3: Truly Scale-Equivariant Deep Nets with Fourier Layers.

BibTeX


@misc{pang2025paper2postermultimodalposterautomation,
      title={Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers}, 
      author={Wei Pang and Kevin Qinghong Lin and Xiangru Jian and Xi He and Philip Torr},
      year={2025},
      eprint={2505.21497},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.21497}, 
}