November 25, 2024

AI

RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts

Nov 25, 2024

Authors: Hjalmar Wijk, Tao Lin, Joel Becker, Sami Jawhar, Neev Parikh, Thomas Broadley, Lawrence Chan,

AI

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement

Nov 25, 2024

Authors: Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal

Abstract: Recent text-to-video (T2V) diffusion models

AI

ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation

Nov 25, 2024

AI

Health AI Developer Foundations

Nov 25, 2024

AI

Measuring Bullshit in the Language Games played by ChatGPT

Nov 25, 2024