Svmuu News: OpenAI has released LifeSciBench, a new evaluation benchmark designed to measure how well AI systems perform in real-world scientific research scenarios. According to reports, LifeSciBench consists of 750 expert-authored tasks spanning seven types of research workflows and seven biological domains. The tasks were contributed by 173 PhD-holding researchers with experience in the biotechnology or pharmaceutical industries. Rather than posing simple factual questions, the benchmark emphasizes complex scientific capabilities, including evidence integration, experimental design, data analysis, scientific reasoning, and scientific communication. More than 79% of the tasks require multi-step reasoning, averaging approximately four reasoning steps per question, and the benchmark includes 1,062 real-world research data attachments (such as papers, charts, sequence data, and structural files).