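Building an AI Agent from Scratch: Tools

In the previous article we built the most basic AI agent, but that agent could only chat. This one shows you how to give it real tools, so the agent can not only talk but also operate your computer.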

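From Chatting to Doing

In the previous article we built the most basic AI agent: a loop connected to a language model that takes user input, maintains the conversation context, and keeps running. Apart from chatting, though, that agent can't do anything. It can only answer from the knowledge built into the model.

To make an agent genuinely useful, you have to give it a way to interact with the outside world. That way is a tool.

What Is a Tool?

A tool is a function or program that you expose to the LLM so that the model can call it on its own when it decides it needs to. A tool can be as simple as a Python function inside the agent's code, or as complex as a full service, reached through MCP (Model Context Protocol), that calls a remote API to read and write a database.

Note: MCP is outside the scope of this article and will be covered separately later.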

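How Does an Agent Use Tools?

Large language models output text, so how do they call tools? Early implementations had the model emit text in a specific format (for example, Action: web_fetch), and the agent's orchestration code parsed that text and ran the matching function. This was not very reliable, because models sometimes simply fail to produce exactly the format you expect.

That problem is now largely solved. Modern LLMs support native **tool calling** (also called function calling). These models are fine-tuned to emit structured JSON that names the tool they want to call and supplies its arguments according to the parameter schema you declare. Compared with parsing free text, this sharply reduces malformed or hallucinated calls and makes tool use far more reliable. Some APIs can additionally enforce the schema strictly. Even so, it is still worth validating arguments before you execute anything.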
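The difference is easiest to see side by side. Below is a minimal sketch. `parse_text_action` scrapes a hypothetical `Action:` / `Action Input:` text format. `parse_native_call` reads a dict shaped like an OpenAI-style `tool_calls` entry, whose `arguments` field arrives as a JSON string. Both function names and the sample payloads are illustrative assumptions.

```python
import json
import re


def parse_text_action(reply: str):
    """Old approach: scrape a tool call out of free text (hypothetical format)."""
    name = re.search(r"^Action:\s*(\w+)\s*$", reply, re.MULTILINE)
    raw_args = re.search(r"^Action Input:\s*(\{.*\})\s*$", reply, re.MULTILINE)
    if not name or not raw_args:
        return None  # the model drifted from the format; nothing to execute
    try:
        return name.group(1), json.loads(raw_args.group(1))
    except json.JSONDecodeError:
        return None


def parse_native_call(tool_call: dict):
    """Native approach: the call is already structured; only the arguments need decoding."""
    fn = tool_call["function"]
    return fn["name"], json.loads(fn["arguments"])


good = 'I need the page.\nAction: web_fetch\nAction Input: {"url": "https://example.com"}'
sloppy = "Sure! I will call web_fetch on https://example.com for you."
native = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "web_fetch", "arguments": '{"url": "https://example.com"}'},
}

print(parse_text_action(good))    # ('web_fetch', {'url': 'https://example.com'})
print(parse_text_action(sloppy))  # None
print(parse_native_call(native))  # ('web_fetch', {'url': 'https://example.com'})
```

The text parser returns None as soon as the model phrases its reply differently. The structured call only needs `json.loads`.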

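Extending Our Agent

This time we build on the basic agent from the previous article and implement the core set of tools that AI agents rely on most. Nearly every agent framework ships these as built-ins. Each one looks simple, but combined they are remarkably powerful.

1. Bash Execution Tool

```python
import subprocess


def run_bash(command: str) -> str:
    """Run a bash command and return its output."""
    # Note: shell=True runs the command through /bin/sh; pass
    # executable="/bin/bash" if you need bash-specific syntax.
    result = subprocess.run(
        command, shell=True, text=True, capture_output=True
    )
    output = result.stdout
    if result.stderr:
        # Keep stderr so the model can see error messages.
        output += f"\nSTDERR:\n{result.stderr}"
    return output or "(no output)"
```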

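This is the most powerful tool. Letting the agent run any bash command means it can act on everything on the machine: files, processes, the network, package management, all of it. The upside is that you don't need to write a separate tool for each program, because the model already knows how to invoke them through bash. The downside is just as obvious: this is also the most dangerous tool. A later article will cover how to harden it.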
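One hardening step is worth taking right away. `run_bash` has no timeout, so an interactive or never-ending command (for example `top`) would block the agent forever. The sketch below is a hypothetical variant, `run_bash_with_timeout`, that uses the standard `timeout` parameter of `subprocess.run`. The 30-second default is an arbitrary assumption. The variant also reports a non-zero exit code, so the model can tell a failed command from one that simply printed nothing.

```python
import subprocess


def run_bash_with_timeout(command: str, timeout: float = 30.0) -> str:
    """Hypothetical run_bash variant with a timeout and exit-code reporting."""
    try:
        result = subprocess.run(
            command, shell=True, text=True, capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # A command that never exits (e.g. an interactive program) would otherwise hang the agent.
        return f"Error: command timed out after {timeout}s"
    output = result.stdout
    if result.stderr:
        output += f"\nSTDERR:\n{result.stderr}"
    if result.returncode != 0:
        # Tell the model the command failed, even if it printed nothing.
        output += f"\n(exit code {result.returncode})"
    return output or "(no output)"


print(run_bash_with_timeout("echo hello"))
print(run_bash_with_timeout("sleep 3", timeout=1))  # Error: command timed out after 1s
```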

2. File Reading Tool

```python
from pathlib import Path


def read_file(path: str, offset: int = 1, limit: int = 200) -> str:
    """Read lines from a file, with optional offset and limit."""
    p = Path(path)
    if not p.is_file():
        return f"Error: file not found: {path}"
    lines = p.read_text(errors="replace").splitlines()
    offset = max(offset, 1)  # offset is a 1-based line number
    selected = lines[offset - 1 : offset - 1 + limit]
    # Prefix each line with its number so the model can refer to it precisely.
    return "\n".join(f"{offset + i}: {line}" for i, line in enumerate(selected))
```

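This tool lets the agent read files on the computer. The offset and limit paging parameters let it work through large files one chunk at a time. In coding scenarios, the agent can use this tool to read a codebase step by step and understand the project's structure and logic.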
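Paging works best when the model knows that a file continues. Here is a sketch of a hypothetical variant, `read_file_paged`, that appends a footer giving the total line count and the next offset to request. The footer wording is an assumption, not a standard.

```python
import tempfile
from pathlib import Path


def read_file_paged(path: str, offset: int = 1, limit: int = 200) -> str:
    """Hypothetical read_file variant that tells the model where the next page starts."""
    p = Path(path)
    if not p.is_file():
        return f"Error: file not found: {path}"
    lines = p.read_text(errors="replace").splitlines()
    offset = max(offset, 1)  # line numbers are 1-based
    selected = lines[offset - 1 : offset - 1 + limit]
    body = "\n".join(f"{offset + i}: {line}" for i, line in enumerate(selected))
    end = offset + len(selected) - 1  # last line number actually shown
    if end < len(lines):
        # Footer telling the model the file continues and how to request the next page.
        body += f"\n(showing lines {offset}-{end} of {len(lines)}; next offset={end + 1})"
    return body


sample = Path(tempfile.mkdtemp(), "sample.txt")
sample.write_text("a\nb\nc\nd\ne\n")
print(read_file_paged(str(sample), offset=1, limit=2))
```

For the five-line sample, the call prints lines 1 and 2 followed by `(showing lines 1-2 of 5; next offset=3)`.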

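3. File Search Tool

```python
import glob as glob_module
import os


def glob_files(pattern: str, path: str = ".") -> str:
    """Find files matching a glob pattern inside a directory."""
    # With recursive=True, "**" also matches zero directories, so this single
    # call covers files directly inside `path` as well as in subdirectories.
    matches = glob_module.glob(os.path.join(path, "**", pattern), recursive=True)
    unique = sorted(set(matches))
    return "\n".join(unique) if unique else "(no matches)"
```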

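This tool finds files in a directory that match a glob pattern. When exploring a codebase, the agent first uses it to learn which files exist and then decides which ones to read, exactly the way a human developer works.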
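In a real repository, a recursive glob also walks into directories such as `.git` or `node_modules`, which can flood the context with noise. The sketch below is a hypothetical variant built on `os.walk` and `fnmatch`. The `SKIP_DIRS` set and the `max_results` cap are assumptions to tune per project. Unlike `glob_files`, this variant matches the pattern against file names only, not against relative paths.

```python
import fnmatch
import os
import tempfile
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}  # assumption: tune per project


def glob_files_pruned(pattern: str, path: str = ".", max_results: int = 200) -> str:
    """Hypothetical glob_files variant: prunes noisy directories and caps the output."""
    matches = []
    for root, dirs, files in os.walk(path):
        # Editing `dirs` in place stops os.walk from descending into those directories.
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for name in files:
            if fnmatch.fnmatch(name, pattern):  # matches the file name, not the full path
                matches.append(os.path.join(root, name))
    if not matches:
        return "(no matches)"
    matches.sort()
    shown = matches[:max_results]
    if len(matches) > max_results:
        shown.append(f"... ({len(matches) - max_results} more)")
    return "\n".join(shown)


# Usage: a throwaway tree in which .git/ should be ignored.
root = tempfile.mkdtemp()
for rel in ["a.py", "sub/b.py", ".git/c.py", "notes.txt"]:
    f = Path(root, rel)
    f.parent.mkdir(parents=True, exist_ok=True)
    f.write_text("x\n")
print(glob_files_pruned("*.py", root))
```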

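4. Content Search Tool

```python
import glob as glob_module
import re
from pathlib import Path


def grep(pattern: str, path: str = ".", include: str = "*") -> str:
    """Search file contents for a regex pattern."""
    try:
        regex = re.compile(pattern)  # compile once and reject invalid patterns early
    except re.error as e:
        return f"Error: invalid regex {pattern!r}: {e}"
    results = []
    for filepath in glob_module.glob(f"{path}/**/{include}", recursive=True):
        fp = Path(filepath)
        if not fp.is_file():
            continue
        try:
            for i, line in enumerate(fp.read_text(errors="replace").splitlines(), 1):
                if regex.search(line):
                    results.append(f"{filepath}:{i}: {line}")
        except OSError:
            pass  # skip unreadable files (permissions, broken links, ...)
    return "\n".join(results) if results else "(no matches)"
```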

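This tool searches file contents for a regular expression and returns each matching line with its file path and line number. It pairs with glob_files: first find the files with glob, then search their contents with grep. The optional include parameter restricts the search to one kind of file (for example include="*.py"), so the agent doesn't wade through binary files.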
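The `include` filter only helps if the agent remembers to use it. Here is a sketch of a hypothetical variant that also skips binary files automatically, using a common heuristic: a NUL byte in the first 1 KB means binary. It also stops after `max_results` hits, so one broad pattern cannot fill the context window. The heuristic and the cap are assumptions, not part of the original tool.

```python
import fnmatch
import os
import re
import tempfile


def looks_binary(path: str, probe: int = 1024) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    try:
        with open(path, "rb") as f:
            return b"\0" in f.read(probe)
    except OSError:
        return True  # unreadable files are skipped too


def grep_text_only(pattern: str, path: str = ".", include: str = "*", max_results: int = 100) -> str:
    """Hypothetical grep variant: skips binary files and stops after max_results hits."""
    try:
        regex = re.compile(pattern)
    except re.error as e:
        return f"Error: invalid regex {pattern!r}: {e}"
    results = []
    for root, _dirs, files in os.walk(path):
        for name in sorted(files):
            fp = os.path.join(root, name)
            if not fnmatch.fnmatch(name, include) or looks_binary(fp):
                continue
            with open(fp, encoding="utf-8", errors="replace") as f:
                for i, line in enumerate(f, 1):
                    if regex.search(line):
                        text = line.rstrip("\r\n")
                        results.append(f"{fp}:{i}: {text}")
                        if len(results) >= max_results:
                            results.append("(result limit reached)")
                            return "\n".join(results)
    return "\n".join(results) if results else "(no matches)"


# Usage: one text file, plus one binary file that would also match.
root = tempfile.mkdtemp()
with open(os.path.join(root, "code.py"), "w", encoding="utf-8") as f:
    f.write("def foo():\n    return 1\n")
with open(os.path.join(root, "data.bin"), "wb") as f:
    f.write(b"\0def bar():\n")
print(grep_text_only(r"def \w+", root))
```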

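5. File Writing Tool

```python
from pathlib import Path


def write_file(path: str, content: str) -> str:
    """Write content to a file, creating parent directories if needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    data = content.encode("utf-8")
    p.write_bytes(data)
    # Report bytes, not characters: the two differ for non-ASCII text.
    return f"Wrote {len(data)} bytes to {path}"
```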

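This tool lets the agent create a new file and write content to it. It creates any missing parent directories automatically, so the agent doesn't have to check first whether the directory structure exists. Generating code, saving results, producing output files: this tool is the agent's main way of making real changes to the outside world.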
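If the process dies halfway through a write, the target file can be left truncated. A common remedy is to write to a temporary file in the same directory and then rename it over the target with `os.replace`, which is atomic on POSIX systems. The function name `write_file_atomic` is hypothetical. This is a sketch of the technique, not the article's tool.

```python
import os
import tempfile
from pathlib import Path


def write_file_atomic(path: str, content: str) -> str:
    """Hypothetical write_file variant: write a temp file, then rename it into place."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    data = content.encode("utf-8")
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=str(p.parent), prefix=f".{p.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, p)  # atomic on POSIX: readers see either the old file or the new one
    except Exception:
        os.unlink(tmp)  # don't leave a half-written temp file behind
        raise
    return f"Wrote {len(data)} bytes to {path}"


target = os.path.join(tempfile.mkdtemp(), "out", "hello.txt")
print(write_file_atomic(target, "héllo"))
```

The usage line also shows why the returned count should come from the encoded bytes: `"héllo"` is 5 characters but 6 bytes in UTF-8.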

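6. File Editing Tool

```python
from pathlib import Path


def edit_file(path: str, old_string: str, new_string: str) -> str:
    """Replace the first occurrence of old_string with new_string in a file."""
    p = Path(path)
    if not p.exists():
        return f"Error: file not found: {path}"
    if not old_string:
        # An empty string "matches" everywhere; replacing it would just prepend new_string.
        return "Error: old_string must not be empty"
    original = p.read_text()
    if old_string not in original:
        # Exact match only: no fuzzy or whitespace-insensitive matching.
        return f"Error: string not found in {path}"
    p.write_text(original.replace(old_string, new_string, 1))
    return f"Edited {path}"
```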

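write_file overwrites an entire file, whereas edit_file only performs an exact string replacement. That is much safer when you only need a small change to an existing file, because the agent can't clobber the whole file after reading only part of it. For coding agents that need to patch specific lines, this is the workhorse tool.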
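Replacing only the first occurrence has a catch: if `old_string` appears several times, the agent may patch the wrong one without noticing. A stricter sketch, `edit_file_unique` (a hypothetical name), refuses ambiguous edits and asks the model to include more surrounding context instead.

```python
import tempfile
from pathlib import Path


def edit_file_unique(path: str, old_string: str, new_string: str) -> str:
    """Hypothetical stricter edit_file: refuse to edit when old_string is ambiguous."""
    p = Path(path)
    if not p.is_file():
        return f"Error: file not found: {path}"
    if not old_string:
        return "Error: old_string must not be empty"
    original = p.read_text(encoding="utf-8")
    count = original.count(old_string)
    if count == 0:
        return f"Error: string not found in {path}"
    if count > 1:
        # Ask for more context instead of guessing which occurrence was meant.
        return f"Error: string occurs {count} times in {path}; include more surrounding context"
    p.write_text(original.replace(old_string, new_string, 1), encoding="utf-8")
    return f"Edited {path}"


target = Path(tempfile.mkdtemp(), "settings.py")
target.write_text("x = 1\ny = 1\n", encoding="utf-8")
print(edit_file_unique(str(target), "= 1", "= 2"))      # ambiguous: refused
print(edit_file_unique(str(target), "y = 1", "y = 2"))  # unique: applied
```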

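7. Web Fetching Tool

```python
import re
import urllib.request
from urllib.parse import urlparse

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
MAX_BYTES = 2 * 1024 * 1024  # read at most 2 MB of the response


def webfetch(url: str) -> str:
    """Fetch a URL and return its plain-text content (up to 2 MB)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return f"Error fetching {url}: unsupported scheme '{parsed.scheme}'."
    req = urllib.request.Request(url, headers={"User-Agent": "agent/1.0"})
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            raw = resp.read(MAX_BYTES).decode(charset, errors="replace")
    except (OSError, LookupError) as e:  # network errors, timeouts, unknown charset
        return f"Error fetching {url}: {e}"
    soup = BeautifulSoup(raw, "html.parser")
    text = soup.get_text(separator="\n", strip=True)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```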

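This tool lets the agent retrieve the text content of public web pages. It uses BeautifulSoup to strip the HTML tags and keep only readable text, which keeps the context window clean. It only accepts HTTP and HTTPS URLs and caps the download at 2 MB, so a huge page can't overwhelm the agent. Even 2 MB of text can exceed many models' context windows, though, so you may also want to truncate the extracted text.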
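If you'd rather not depend on BeautifulSoup, the standard library's `html.parser.HTMLParser` can do a rough version of the same extraction. This sketch (the class and function names are hypothetical) collects text nodes and skips the contents of `<script>` and `<style>`. It does not cover every edge case that a full HTML library handles.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping everything inside <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.parts)


page = ("<html><head><style>p {color: red}</style></head><body><h1>Hi</h1>"
        "<p>Plain <b>text</b> &amp; more.</p><script>alert(1)</script></body></html>")
print(html_to_text(page))
```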

Registering Tools: Giving the Model a Map

Once the tools are implemented, the agent still doesn't know they exist. You have to give the model a tool schema that describes which tools are available, what each one does, and which parameters it accepts.

```python
def get_tool_schemas():
    return [
        {
            "type": "function",
            "function": {
                "name": "run_bash",
                "description": "Run a bash command on the user's machine and return the output.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "command": {
                            "type": "string",
                            "description": "The bash command to execute.",
                        }
                    },
                    "required": ["command"],
                },
            },
        },
        # ... the schemas for the other tools follow the same pattern
        {
            "type": "function",
            "function": {
                "name": "webfetch",
                "description": "Fetch a URL and return its plain-text content.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "url": {
                            "type": "string",
                            "description": "The URL to fetch.",
                        }
                    },
                    "required": ["url"],
                },
            },
        },
    ]
```

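This schema format comes from OpenAI's Function Calling API and has become a de facto standard. Many providers accept it directly (DeepSeek, for example), and others such as Gemini and Claude offer OpenAI-compatible endpoints that accept it. Their native APIs can differ slightly, though: Anthropic's Messages API, for instance, expects `input_schema` where OpenAI uses `parameters`.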
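Hand-writing a schema for every tool is repetitive, and the schemas can drift away from the functions they describe. Here is a minimal sketch, assuming simple type annotations like the ones in this article. `schema_from_function` uses `inspect` to build an OpenAI-style schema from a function's signature and docstring. `to_anthropic` reshapes that schema into the `name` / `description` / `input_schema` form used by Anthropic's native tool definitions. Both helpers are hypothetical. The generated schemas also lack per-parameter descriptions, which you may still want to add by hand. The `read_file` here is a stub, because only its signature matters.

```python
import inspect

# Assumption: only these simple annotations need mapping for the tools in this article.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_function(fn) -> dict:
    """Build an OpenAI-style tool schema from a function's signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default, so the model must supply it
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def to_anthropic(tool: dict) -> dict:
    """Reshape an OpenAI-style schema into the name/description/input_schema form."""
    fn = tool["function"]
    return {"name": fn["name"], "description": fn["description"], "input_schema": fn["parameters"]}


def read_file(path: str, offset: int = 1, limit: int = 200) -> str:
    """Read lines from a file, with optional offset and limit."""
    raise NotImplementedError  # stub: only the signature is inspected here


schema = schema_from_function(read_file)
print(schema["function"]["parameters"]["required"])  # ['path']
print(to_anthropic(schema)["input_schema"]["properties"]["offset"])  # {'type': 'integer'}
```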

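Reworking the Agent Loop

With the tool implementations and schemas in place, the next step is to rework the agent's main loop. The old loop only did "receive input → model replies → print". Now it also has to handle tool calls:

```python
import json

from openai import OpenAI  # assumption: the OpenAI Python SDK (or a compatible endpoint)

client = OpenAI()  # reads OPENAI_API_KEY; pass base_url=... for other compatible providers


def agent_loop():
    messages = []
    tools = get_tool_schemas()
    # Map the tool names used in the schemas to the Python functions above.
    tool_map = {
        "run_bash": run_bash,
        "read_file": read_file,
        "glob_files": glob_files,
        "grep": grep,
        "write_file": write_file,
        "edit_file": edit_file,
        "webfetch": webfetch,
    }

    while True:
        user_input = input("You: ")
        if user_input.lower() in ("exit", "quit"):
            break
        messages.append({"role": "user", "content": user_input})

        # Tool loop: keep calling the model until it answers without requesting tools.
        while True:
            response = client.chat.completions.create(
                model="your-model",
                messages=messages,
                tools=tools,
            )
            choice = response.choices[0]

            if choice.finish_reason == "tool_calls":
                # The assistant message that requested the tools must precede their results.
                messages.append(choice.message)
                for tc in choice.message.tool_calls:
                    fn_name = tc.function.name
                    try:
                        args = json.loads(tc.function.arguments)
                        result = tool_map[fn_name](**args)
                    except Exception as e:
                        # Report failures (unknown tool, bad JSON, bad args) back to the model.
                        result = f"Error running {fn_name}: {e}"
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": result,
                    })
            else:
                print(f"Agent: {choice.message.content}")
                messages.append(choice.message)
                break
```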

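The key change is the inner while True loop. When the model's finish_reason is "tool_calls", the model wants to call tools. Instead of printing anything, you run the corresponding functions, append each result to the conversation as a message with role "tool" (linked to its request by tool_call_id), and let the model continue. Only when finish_reason is something else (normally "stop", or "length" if the reply was cut off) is the agent considered done with the current turn.

This pattern is called the tool loop. The model can request several tool calls in a single response (for example, reading two files at once). It can also decide its next step based on what a tool returned (for example, searching for files first and then reading the ones that look relevant). This is what lets an agent carry out complex tasks on its own.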
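You can exercise the tool loop without an API key by swapping the model for a scripted stand-in. In this sketch, `fake_model` and `run_turn` are hypothetical names, and the messages are plain dicts in the OpenAI chat shape. The loop checks for the presence of `tool_calls` rather than for `finish_reason`, because the fake model sets no finish reason. `max_steps` bounds the number of model calls per turn.

```python
import json


def fake_model(messages):
    """Scripted stand-in for the LLM: request a tool first, then answer from its result."""
    if messages[-1]["role"] == "user":
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
            }],
        }
    return {"role": "assistant", "content": f"The sum is {messages[-1]['content']}."}


def run_turn(messages, model, tool_map, max_steps=10):
    """Run one user turn: execute requested tools until the model replies in plain text."""
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)  # keep the assistant message, tool_calls included
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]
        for tc in calls:
            args = json.loads(tc["function"]["arguments"])
            result = tool_map[tc["function"]["name"]](**args)
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": str(result)})
    return "(stopped: too many tool steps)"


messages = [{"role": "user", "content": "What is 2 + 3?"}]
print(run_turn(messages, fake_model, {"add": lambda a, b: a + b}))  # The sum is 5.
print([m["role"] for m in messages])  # ['user', 'assistant', 'tool', 'assistant']
```

The printed roles show the message order the API expects: first the assistant message that requested the tool, then the tool result that answers it.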

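A Complete Example: Inspecting a System Through the Agent

Suppose you have given the agent all of the tools above and you ask it:

"What is running on my server? Are there any new errors in today's logs?"

The agent might work like this:

1. Call run_bash("w") to see who is currently logged in
2. Call run_bash("ps aux --sort=-%mem | head -10") to list the most memory-hungry processes
3. Call run_bash("journalctl --since today | grep -i error | tail -20") to check today's logs
4. If it finds a suspicious error message, use webfetch to pull up relevant documentation about it
5. Summarize its findings in a human-readable report for you

You don't have to type a single command: the agent plans the steps, runs the tools, and analyzes the results on its own.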

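Security Considerations

Letting an agent operate your computer means handing it the keys. Here are some basic safeguards:

Sandboxed execution: don't run the agent directly in your own working environment. Run it inside a Docker container, through subprocess with resource limits, or as a dedicated low-privilege user.

Command allowlist: for the bash tool, you can permit only allowlisted commands (such as ls, cat, git status) and forbid network commands, package-management operations, and file deletion.

Least privilege: give the account the agent runs as read-only access, and grant write access only to a few specific directories.

Human approval: for sensitive operations (deleting files, installing software, restarting services), have the agent stop and wait for the user to confirm.

Rate limiting: cap how often the agent can call tools, so that it can't get stuck in an endless loop unnoticed.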
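Several of these measures can be enforced in the dispatch code itself. The sketch below is hypothetical throughout: `check_command`, `ToolGuard`, the allowlist contents, and the 50-call budget are all assumptions. It rejects bash commands that aren't on an allowlist or that use shell operators for chaining and redirection. It asks for confirmation before write operations. It also stops after a fixed number of tool calls per session, as a simple stand-in for rate limiting. An allowlist does not replace a sandbox: even `cat` can read sensitive files.

```python
import shlex
from typing import Optional

# Assumptions: an example allowlist and the git subcommands treated as read-only.
ALLOWED_COMMANDS = {"ls", "cat", "head", "tail", "wc", "git"}
READ_ONLY_GIT = {"status", "log", "diff"}
SHELL_OPERATORS = set(";&|<>`$\n")  # chaining, pipes, redirection, substitution


def check_command(command: str) -> Optional[str]:
    """Return an error message if the command is not allowed, otherwise None."""
    if any(ch in SHELL_OPERATORS for ch in command):
        return "Error: pipes, redirection and command chaining are not allowed"
    try:
        argv = shlex.split(command)
    except ValueError as e:  # e.g. unbalanced quotes
        return f"Error: could not parse command: {e}"
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return f"Error: command not in allowlist: {argv[0] if argv else '(empty)'}"
    if argv[0] == "git" and (len(argv) < 2 or argv[1] not in READ_ONLY_GIT):
        return "Error: only read-only git subcommands are allowed"
    return None


class ToolGuard:
    """Wraps tool calls with an allowlist, human approval and a per-session call budget."""

    def __init__(self, max_calls=50, needs_approval=("write_file", "edit_file"), ask=input):
        self.max_calls = max_calls
        self.calls = 0
        self.needs_approval = set(needs_approval)
        self.ask = ask  # injectable, so it can be tested without a terminal

    def call(self, tool_map, name, args):
        self.calls += 1
        if self.calls > self.max_calls:
            return "Error: tool-call budget exhausted; stop and report to the user"
        if name == "run_bash":
            error = check_command(args.get("command", ""))
            if error:
                return error
        if name in self.needs_approval:
            answer = self.ask(f"Agent wants to run {name}({args}). Allow? [y/N] ")
            if answer.strip().lower() != "y":
                return "Error: the user rejected this action"
        return tool_map[name](**args)
```

To use it, create one `ToolGuard` per session and replace `tool_map[fn_name](**args)` in the agent loop with `guard.call(tool_map, fn_name, args)`. Rejections come back as ordinary tool results, so the model can see why an action was refused.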

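Next Steps

Your agent can now operate your computer, but this is only the beginning. The next challenges include giving the agent memory so that it can retain information across sessions, adopting the MCP protocol so that it can interact with external services, and building an evaluation setup to measure how well the agent performs.

Each of these will be covered in upcoming articles.

References

Original series: Build A Basic AI Agent From Scratch: Tools (Ruxu.dev)
OpenAI Function Calling documentation
Anthropic Tool Use documentation
Previous article in this series: Building an AI Agent from Scratch: Basics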
