AI News for 05-09-2025

Arxiv Papers

On Path to Multimodal Generalist: General-Level and General-Bench

Hao Fei and 31 co-authors present "On Path to Multimodal Generalist: General-Level and General-Bench". The paper introduces General-Level, an evaluation framework that defines five levels of MLLM performance and generality, giving a way to compare MLLMs and gauge progress toward more robust multimodal generalists. The framework is built around the concept of Synergy, which measures whether a model maintains consistent capabilities across comprehension and generation, and across multiple modalities.

Flow-GRPO: Integrating Online Reinforcement Learning into Flow Matching Models for Text-to-Image Generation

The researchers propose Flow-GRPO, a method that integrates online reinforcement learning (RL) into flow matching models for text-to-image (T2I) generation, using Group Relative Policy Optimization (GRPO) as the RL algorithm. The method addresses two challenges: flow models rely on a deterministic generative process, which makes the stochastic sampling that RL needs difficult, and online RL requires efficient sampling.
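
For readers unfamiliar with GRPO, the "group relative" step can be sketched in a few lines of Python. This is an illustrative sketch, not Flow-GRPO's implementation: the function name, the reward values, and the use of a population standard deviation are assumptions made for the example. The idea is that several samples are drawn for the same prompt, and each reward is normalized against the mean and spread of its own group.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    Hypothetical helper: samples that beat the group mean get a
    positive advantage, samples below it get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up reward scores for four images sampled from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group's own mean, no separate value network is needed to estimate it.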

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

The authors present a comprehensive survey of Large Multimodal Reasoning Models (LMRMs), which integrate multiple modalities like text, images, audio, and video to support complex reasoning capabilities. The survey discusses the evolution of multimodal reasoning from modular, perception-driven pipelines to unified, language-centric frameworks.

Sentient Agent as a Judge (SAGE): Evaluating Social Cognition of Large Language Models

The researchers propose SAGE, a novel framework that simulates human-like emotions and inner thoughts to assess how well a large language model (LLM) understands and responds to a user's emotional needs. SAGE uses a sentient agent that instantiates a persona, dialogue background, explicit conversation goal, and hidden intention.

Scalable Chain of Thoughts via Elastic Reasoning

The authors propose Elastic Reasoning, a framework that separates reasoning into two phases, thinking and solution, with independently allocated budgets. Elastic Reasoning consists of two key components: GRPO training with budget-constrained rollout, and separate budgeting for inference.
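
The separate budgets can be pictured with a small Python sketch. It is a toy under stated assumptions, not the authors' code: the `apply_budgets` helper, the `</think>` delimiter, and the token lists are all hypothetical, and a real system would cut off a model's generation rather than slice pre-made lists. The point is only that the thinking phase is truncated at its own limit, so the solution phase always keeps its reserved budget.

```python
END_THINK = "</think>"  # assumed delimiter between the two phases


def apply_budgets(think_stream, solution_stream, think_budget, solution_budget):
    """Truncate thinking and solution tokens under independent budgets."""
    thinking = []
    for tok in think_stream:
        # Stop when the model ends thinking itself or the budget runs out.
        if tok == END_THINK or len(thinking) >= think_budget:
            break
        thinking.append(tok)
    # The solution keeps its own budget no matter how long thinking ran.
    solution = list(solution_stream)[:solution_budget]
    return thinking + [END_THINK] + solution


out = apply_budgets(
    ["t1", "t2", "t3", "t4"],  # made-up thinking tokens
    ["s1", "s2", "s3"],        # made-up solution tokens
    think_budget=2,
    solution_budget=2,
)
```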

FG-CLIP: Fine-Grained Visual and Textual Alignment

The paper introduces Fine-Grained CLIP (FG-CLIP), a novel approach designed to enhance CLIP's fine-grained understanding capabilities. FG-CLIP leverages large multimodal models to generate 1.6 billion long caption-image pairs, capturing global-level semantic details.

A Survey on 3D Scene Generation

The survey provides a systematic overview of state-of-the-art approaches in 3D scene generation, organizing them into four paradigms: procedural generation, neural 3D-based generation, image-based generation, and video-based generation. The survey analyzes technical foundations, trade-offs, and representative results.

ICon: A Novel Data Selection Method for Improving Performance of Large Language Models

The researchers propose ICon, a novel data selection method for improving the performance of Large Language Models (LLMs) while reducing training costs. ICon leverages in-context learning (ICL) to measure the contribution of individual samples without relying on gradient computation or manual heuristics.

Can General-Domain Text-Based Post-Training Enable Generalizable Reasoning?

The authors explore whether general-domain text-based post-training can enable generalizable reasoning across modalities and domains. They propose a two-stage approach: supervised fine-tuning (SFT) on general-domain text data with distilled long chain-of-thoughts (CoTs), and reinforcement learning with verifiable rewards (RLVR).

LiftFeat: A Lightweight Network for 3D Geometry-Aware Local Feature Matching

The researchers propose LiftFeat, a novel lightweight network that integrates 3D geometric features to enhance the robustness of local feature matching. LiftFeat uses a pre-trained monocular depth estimation model to generate pseudo surface normal labels, which supervise the extraction of 3D geometric features.

StreamBridge: Transforming Offline Video-LLMs into Proactive Streaming Assistants

The authors present StreamBridge, a framework that transforms offline Video Large Language Models (Video-LLMs) into proactive streaming assistants. StreamBridge consists of a memory buffer, a round-decayed compression strategy, and an activation model.

Generating Physically Stable and Buildable LEGO Designs from Text

The researchers introduce LegoGPT, a novel approach for generating physically stable LEGO brick models from text prompts. They constructed a large-scale dataset of physically stable LEGO designs, along with their associated captions.

Crosslingual Reasoning through Test-Time Scaling

The authors investigate the effectiveness of test-time scaling for crosslingual reasoning in large language models (LLMs). They explore how scaling up inference compute for English-centric reasoning language models can improve multilingual mathematical reasoning.

BrowseComp-ZH: A Benchmark for Evaluating Web Browsing Abilities of Large Language Models in Chinese

The authors introduce BrowseComp-ZH, a benchmark designed to evaluate the web browsing abilities of large language models (LLMs) in the Chinese language. The benchmark addresses the gap in evaluating LLMs in non-English environments.

PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes

The researchers introduce PlaceIt3D, a novel task that involves language-guided object placement in real 3D scenes. They propose a benchmark, evaluation protocol, dataset, and baseline method for this task.

Chain-of-Thought Tokens are Computer Program Variables

The authors explore the role of Chain-of-Thought (CoT) tokens in Large Language Models (LLMs). They propose a hypothesis that CoT tokens function like computer program variables, storing intermediate values used in subsequent computations.
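
The analogy can be made concrete with a toy Python example. It is only an illustration of the hypothesis, not an experiment from the paper: the `multiply_with_trace` function and its variable names are invented here. Each intermediate value is written to a named slot, much as a model would write it out in its chain of thought, and later steps read those stored values back.

```python
def multiply_with_trace(a, b):
    """Long multiplication that records every intermediate value.

    The trace plays the role of the written-out chain of thought:
    one named "variable" per step, reused by the following step.
    """
    trace = []
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** place  # one partial product
        total += partial                        # depends on earlier values
        trace.append((f"partial_{place}", partial))
        trace.append((f"total_{place}", total))
    return total, trace


result, trace = multiply_with_trace(23, 47)
```

Here the final entry of `trace` holds the product, and each earlier entry is an intermediate value that a later step relied on.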

Social Media News

Companies

Companies mentioned in recent AI news include OpenAI, Nvidia, Mistral AI, Google, Apple, Hugging Face, and Meta. OpenAI has launched Reinforcement Fine-Tuning and Deep Research for GitHub repos. Nvidia has open-sourced its Open Code Reasoning models in 32B, 14B, and 7B versions under the Apache 2.0 license; they are compatible with vLLM and have demonstrated strong performance. Google has released Gemini 2.5 Pro, which has reportedly closed the performance gap with GPT models.

Models

Models mentioned include Open Code Reasoning (32B, 14B, 7B), Open Code Reasoning-Nemotron (32B, 14B), phi-4, and Qwen3-14B, the last of which is considered an excellent all-around choice for multiple tasks. Also mentioned were GPT-4o, GPT-3.5, Qwen3-0.6B-Base, Mistral Medium 3, and members of the Llama family, including Llama 4.0 and LLaMAX70B.

Topics

The following topics were discussed: reinforcement learning, fine-tuning, code generation, reasoning, vision, on-device AI, model performance, dataset releases, and model optimization.

Trends

Trends discussed include the continued development of AI safety standards, advancements in large language models, and the integration of more AI-driven tools within companies such as Microsoft, where CEO Satya Nadella announced an aggressive expansion into AI agent technology to simplify traditional productivity applications.

Research

Research discussions covered deep learning, natural language processing, machine learning frameworks such as PyTorch, and the increasing importance of reinforcement learning. They also touched on specific models such as LLaMA and on applications of large language models in computer vision and natural language processing.

People

Key figures include Sam Altman, the CEO of OpenAI; Fidji Simo, the new CEO of Applications at OpenAI; and other researchers.

LLM and Safety

Discussions of the current state of large language models (LLMs) have centered on integrating reasoning into coding tasks, with Qwen3-14B and phi-4 leading the pack. Google's Gemini has been highlighted for its competitive performance, while GPT-4o has been noted for its excessive personality. Grok 3.5 has drawn significant attention, and improved model interpretability has been underscored as crucial to addressing safety concerns.

Advances in reinforcement learning have also been reported, including the integration of reinforcement learning from human feedback (RLHF) into models, alongside growing interest in models that handle large context windows, such as LLaMA and its variants.

Applications of AI in industry, including code generation and language translation, point to a thriving ecosystem around AI research and engineering, and the ongoing development of tools such as the Hugging Face Transformers library indicates a move toward better research frameworks. Discussions have also covered the potential for AI to drive social progress, and there has been significant interest in applying machine learning to education, computer hardware, and software development, as exemplified by the interest in LLaMA Agents and LLMs in general.

News

Google Expands AI Safety Features

Google has announced a new suite of AI-powered safety features across its platforms to protect users from increasingly sophisticated scams. This was announced today, May 9, 2025, as part of Google's ongoing efforts to combat evolving scam tactics [1].

FDA Completes First AI-Assisted Scientific Review Pilot

The FDA has completed its first AI-assisted scientific review pilot and announced an aggressive timeline to scale AI use internally across all FDA centers by June 30, 2025. The generative AI tools allow scientists to spend less time on tedious tasks, with tasks that previously took three days now taking minutes [5].

Thomson Reuters Releases 2025 Generative AI Report

A new report by the Thomson Reuters Institute found that 95% of surveyed professionals believe generative AI will be central to their organization's workflow within the next five years. The report highlights the importance of professional-grade GenAI tools specifically designed for legal work [2].

New Study on Generative AI and Copyright

A new study released yesterday explores the role of generative AI in using copyrighted material. The research addresses copyright and compensation issues in generative AI, with potential implications for ongoing legal cases such as allegations that Meta used pirated books to train its AI models [3].

10Web Launches API for Generative AI Website Builder

10Web has made its generative AI website builder accessible via an API. The technology enables users to create and edit unique page structures, business-specific designs, content, functions, and visuals, making website creation more efficient [4].

IBM Introduces Generative Computing at Think2025

IBM unveiled "generative computing," a new approach that moves away from prompt engineering toward structured programming for large language models. The framework connects high-performance computing with quantum systems, enabling "quantum-centric supercomputing" [2].

YouTube Buzz

AI Robots and The $10T Arms Race

This video discusses major developments in AI technology, with a specific mention that "Google is about to drop something MASSIVE." The video appears to explore the intersection of AI and robotics, possibly focusing on significant investments and competition in this space, as suggested by the "$10T Arms Race" in the title.

Recreating DOOM with AI

This video explores the process of recreating the classic game DOOM using artificial intelligence technology. Released on May 8, 2025, the video has already garnered 108K views and runs for 26 minutes and 5 seconds.

AI thinks you're a GENIUS

Released on May 8, 2025, this 26-minute video has accumulated 104K views. The content appears to explore how artificial intelligence systems can perceive or evaluate human intelligence.

Most LLMs are Bad at this Simple Benchmark Test!

The video examines how large language models (LLMs) perform on a straightforward benchmark test. It reveals surprising shortcomings in model performance.

BREAKING: OpenAI ABANDONS Plan To Go For Profit

This recent video has gathered 66K views and runs for 18 minutes and 53 seconds. Released around May 8, 2025, it appears to cover significant news regarding OpenAI's business strategy.

Re-Coding Reality: Theorem Prover & Next-Gen AI (DeepSeek)

This video from May 8, 2025 discusses next-generation AI and the future of vision language agents.

Unlocking AI's True Potential: Game-Changing Insights for Business

Released on May 9, 2025, this video explores how businesses can effectively harness AI's power.

AI and the Law LLM at Queen Mary

Published on May 8, 2025, this video features Programme Director Laura Edgar discussing Queen Mary's brand new AI and the Law LLM programme.

Unlock the Future: Discover AI Integration in Microsoft Fabric!

This video from May 8, 2025 focuses on AI integration within Microsoft Fabric.

Deep Agent - Builds & Deploys Your App—No Coding Needed

This video explores DeepAgent from Abacus AI, an autonomous agent system that helps users build and deploy web apps and software using just a single prompt.

Free Beginners Step-by-Step Course: 10 Best ChatGPT AI Hacks

This comprehensive tutorial presents a beginner-friendly, step-by-step guide to leveraging ChatGPT for practical everyday tasks.

Easily Create An AI Cold Caller | AI Voice Agents

Demonstrating the latest advancements in conversational AI, this video provides a walkthrough on setting up an AI-powered cold calling system.

MolyPix AI Tutorial | The Smartest Graphic Editor You've Never Used?

This tutorial introduces MolyPix, a powerful and lesser-known AI-driven graphic editor.

FULLY FREE AI Coder to Build Apps Without Writing Code (Tested)

This video explores a new, completely free AI coding tool that allows users to build apps without writing any code.

These are My 3 Must-Have MCP Servers for AI Coding

In this video, the presenter shares three essential MCP (Model Context Protocol) servers for enhancing AI coding workflows.

Google’s New VIDEO VISION Model is INSANE, here are 10 SaaS Ideas

The video introduces Google's latest video vision model, which excels at video understanding.

Introducing Augment Remote Agent: Parallel Autonomous AI Agents

This video demonstrates how to leverage Augment Code's new remote agent feature within popular IDEs like VS Code.

Master AI in 2025 as a Beginner! (proven blueprint)

This comprehensive guide presents a structured roadmap for beginners aiming to master AI in 2025.

Gemini 2.5 Pro Just Changed Everything in AI Coding!

This video reviews the recent release of Gemini 2.5 Pro, focusing on its transformative impact on AI-assisted coding.

This NEW AI Video Generator Tool is INSANE (Access Every Model)

A walkthrough of an advanced AI video generation tool, this video demonstrates how users can create high-quality videos and images using the latest AI models.