Mastering Tool Use and Function Calling in LLMs
Explore function calling and tool use in LLMs, their mechanics, and designing effective tools for AI agents.
According to a post by Min Choi on X, OpenAI has released Codex Sites, a tool that converts plans, dashboards, launch documents, or ideas into interactive apps with URLs. The thread highlights five example use cases: public equity investing, product design, sales, data analytics, and creative production.
Explore prompt engineering fundamentals that still work for today's instruction-tuned models and understand what changed in AI interactions.
Explore what long context LLMs provide, their quadratic costs, and effective usage strategies for optimal performance.
Explore how the mixture of experts architecture efficiently scales parameters while minimizing cost per token in AI models.
Explore retrieval augmented generation, its mechanics, and advantages over fine-tuning in natural language processing.
Microsoft unveiled a wave of AI products including new MAI-branded foundation models, a Copilot super app with long-running Autopilot agents, and a built-in OpenClaw Companion agent for Windows.
OpenAI introduced role-specific plugins, annotations, and a preview of shareable interactive Sites for Codex, reporting that over 5 million people now use the tool weekly. Non-developers represent roughly 20% of users and are growing more than three times as fast as developers, according to the company.
OpenAI has called for the creation of an international youth safety institute to advance global standards for age-appropriate AI use ahead of the G7 Leaders' Summit in France. The company outlined nine principles for youth AI safety and detailed existing ChatGPT safeguards for minors.
OpenAI has made its frontier models and Codex generally available on AWS via Amazon Bedrock, allowing enterprises to deploy AI within existing security, governance, and procurement workflows. The announcement includes future plans to bring OpenAI's Daybreak cyber capabilities to AWS.
OpenAI announced the Rosalind Biodefense program to equip trusted developers with GPT-Rosalind for building defensive biosecurity tools. The company is also expanding access to the model for select U.S. and allied government partners, according to the OpenAI Blog.
OpenAI published recommendations for designing trustworthy third-party evaluations of frontier AI models, emphasizing that the surrounding "harness"—the environment, tools, and setup enabling agentic execution—fundamentally shapes measured capabilities and safeguard robustness. The post categorizes evaluation claims and urges evaluators to transparently report their setup, budget, and validity checks to avoid under-elicitation or miscalibrated results.
OpenAI has published a Frontier Governance Framework detailing how its safety and security practices align with emerging legal requirements such as California’s Transparency in Frontier AI Act and the EU AI Act. The document translates aspects of the company’s internal Preparedness Framework into public governance commitments covering risk assessment, mitigation, and reporting for advanced AI systems.
Anthropic announced Claude Opus 4.8, an upgrade to its flagship model that it says improves benchmark results, honesty, and collaboration while keeping the same price. The release also introduces effort controls for claude.ai, dynamic workflows in Claude Code, and cheaper fast-mode pricing, according to Anthropic News.
Anthropic launched Claude Design, a research-preview product powered by Claude Opus 4.7 that lets Claude Pro, Max, Team, and Enterprise subscribers collaborate with Claude to create designs, prototypes, slides, and other visual work. Enterprise organizations require administrator activation to enable access.
Anthropic announced Project Glasswing, a coalition including AWS, Apple, Google, Microsoft, and others, to use its unreleased Claude Mythos Preview model for defensive cybersecurity. The initiative aims to address the dual-use risk of advanced AI vulnerability-discovery capabilities by finding and fixing flaws in critical software before malicious actors can exploit them.
Anthropic interviewed 80,508 Claude users across 159 countries and 70 languages using an AI interviewer to understand their aspirations and concerns for artificial intelligence. The study found that respondents most often want AI to support professional excellence, personal transformation, and life management, while simultaneously holding multiple fears about the technology's impact.