Gemini Chat Completions Batch Test (BatchTest)¶
📝 Overview¶
This document describes the capabilities and parameter behavior of the Gemini model family that can be batch-verified against the OpenAI Chat Completions compatible endpoint (POST /v1/chat/completions), including: thinking / reasoning, streaming SSE, function calling, response_format, long context, and common generation parameters. Default gateway: https://api-cs-al.naci-tech.com/v1.
1. Project Layout¶
The Gemini batch test only needs the following directory and files (output/ is generated automatically after a run):
ks_gemini/
├── requirements.txt
├── test_models.py # batch test entry point
└── output/ # created after a run; contains test_results.json
2. Install Dependencies¶
Follow these steps in order:
Step 1: Change into the ks_gemini directory (the one containing requirements.txt and test_models.py).
Step 2: Create and activate a virtual environment (optional but recommended):
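For example, on macOS/Linux (a sketch; on Windows use `.venv\Scripts\activate` instead):

```shell
python3 -m venv .venv
source .venv/bin/activate
```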
Step 3: Install the dependencies in the current directory (requirements.txt lives here):
The contents of requirements.txt:
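Based on the imports in test_models.py (httpx and python-dotenv), a minimal requirements.txt looks like this (versions unpinned; pin them as needed):

```
httpx
python-dotenv
```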
If requirements.txt is missing from this directory, install the packages directly:
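Assuming only the two third-party imports in test_models.py, the direct install is:

```shell
pip install httpx python-dotenv
```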
3. Configure Environment Variables¶
Step 1: Create a .env file in the ks_gemini directory or the project root (or export the variable in your current shell).
Step 2: Write or set the API key:
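The key name must match what test_models.py reads (API_DEMO_API_KEY); the value below is a placeholder:

```
API_DEMO_API_KEY=your_api_key_here
```

Alternatively, export it in the current shell: `export API_DEMO_API_KEY=your_api_key_here`.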
The script loads .env from the current or parent directory automatically via python-dotenv.
4. Run the Tests¶
Step 1: Change into the Gemini test directory:
Step 2: Run the tests (choose one):
- Run all scenarios (no arguments):
- Run only selected scenarios (pass one or more scenario names or aliases):
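Both invocations, using aliases from the scenario table later in this document:

```shell
python test_models.py                 # all scenarios
python test_models.py thinking fc     # only Thinking and Function Calling
python test_models.py ctx mt mct      # long context + truncation scenarios
```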
Step 3: Review the results: the console prints a model × scenario PASS/FAIL table, and the full results are written to ks_gemini/output/test_results.json.
Models Covered¶
- gemini-2.5-flash-lite
- gemini-2.5-flash
- gemini-2.5-pro
- gemini-3-flash-preview
- gemini-3-pro-preview

Optional: gemini-3-pro-image-preview (add it to the model list if you want to test it).
📦 Output¶
- Console: a model × scenario PASS/FAIL table plus summary information
- Result file: full results are written to
ks_gemini/output/test_results.json
🧪 Scenarios and Aliases¶
| Scenario | Description | Aliases |
|---|---|---|
| Thinking | thinking / reasoning output | thinking, think |
| Function Calling | tool calling (streaming) | fc, function |
| Tool Choice | tool_choice behavior comparison | tc, tool |
| JSON Object | response_format: json_object | so, json |
| JSON Schema | response_format: json_schema | js, schema |
| 200k Context | long-context stress test | ctx, 200k |
| max_tokens | max_tokens truncation behavior | mt |
| max_completion_tokens | max_completion_tokens truncation behavior | mct |
| Gen Params | stop / streaming usage | gp, params |
🧾 Request Payload Examples¶
The following are request bodies for POST https://api-cs-al.naci-tech.com/v1/chat/completions (each request must also carry a model field). Streaming responses must be parsed as SSE on the client, concatenating content and thinking/reasoning_content chunk by chunk and merging tool_calls by index.
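The client-side merging described above can be sketched as follows; the SSE lines here are synthetic stand-ins for a real streamed response:

```python
import json

# Synthetic SSE lines, standing in for a real streamed response body.
sse_lines = [
    'data: {"choices":[{"delta":{"reasoning_content":"thinking..."}}]}',
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_1","function":{"name":"get_current_weather","arguments":"{\\"loc"}}]}}]}',
    'data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"ation\\":\\"Beijing\\"}"}}]}}]}',
    'data: [DONE]',
]

content, thinking, tool_calls = "", "", {}
for line in sse_lines:
    if not line.startswith("data: "):
        continue
    data = line[len("data: "):]
    if data == "[DONE]":
        break
    delta = json.loads(data)["choices"][0].get("delta", {})
    content += delta.get("content") or ""
    # Some gateways emit "thinking", others "reasoning_content"; merge both.
    thinking += (delta.get("thinking") or "") + (delta.get("reasoning_content") or "")
    for tc in delta.get("tool_calls", []):
        # Arguments arrive in fragments; accumulate them per tool-call index.
        slot = tool_calls.setdefault(tc["index"], {"id": tc.get("id"), "function": {"name": "", "arguments": ""}})
        slot["function"]["name"] += tc["function"].get("name", "")
        slot["function"]["arguments"] += tc["function"].get("arguments", "")

print(content)                                 # Hello
print(tool_calls[0]["function"]["arguments"])  # {"location":"Beijing"}
```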
Thinking (reasoning content)¶
{
  "messages": [{ "role": "user", "content": "你是谁" }],
  "enable_thinking": true,
  "reasoning_effort": "low",
  "stream": false
}
Function Calling (tool calls, streaming)¶
{
  "thinking": { "type": "enabled", "budget_tokens": 4096 },
  "top_p": 0.95,
  "stream_options": { "include_thinking": true },
  "stream": true,
  "messages": [
    { "role": "user", "content": "北京天气怎么样,以及北京是几点钟?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_time",
        "description": "当你想知道现在的时间时非常有用。",
        "parameters": { "type": "object", "properties": {} }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "当你想查询指定城市的天气时非常有用。",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "城市或县区,比如北京市、杭州市、余杭区等。"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
Tool Choice (required vs auto)¶
Send two requests and compare:
- tool_choice: "required"
- tool_choice: "auto"

Payload for the required variant (simplified tool definition):
{
  "temperature": 0.9,
  "stream": true,
  "messages": [{ "role": "user", "content": "杭州在哪个国家" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "required"
}
The auto variant only changes tool_choice to "auto".
JSON Object (structured output)¶
{
  "messages": [
    {
      "role": "user",
      "content": "generate a short json to describe the first digital computer, ENIAC, in the world."
    }
  ],
  "response_format": { "type": "json_object" }
}
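A quick client-side validity check mirrors what the batch script does: parse the returned content with json.loads. The content below is a stand-in for a real reply:

```python
import json

content = '{"name": "ENIAC", "year": 1945, "type": "digital computer"}'  # stand-in reply
try:
    data = json.loads(content)
    print("valid JSON object:", sorted(data))
except json.JSONDecodeError:
    print("model returned a non-JSON string")
```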
JSON Schema (structured output)¶
{
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "Extract the key information from this email: John Smith (john@example.com) is interested in our Enterprise plan and wants to schedule a demo for next Tuesday at 2pm."
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "email_extraction",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "email": { "type": "string" },
          "plan_interest": { "type": "string" },
          "demo_requested": { "type": "boolean" }
        },
        "required": ["name", "email", "plan_interest", "demo_requested"],
        "additionalProperties": false
      }
    }
  }
}
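For the schema case, the batch script additionally verifies that every required key is present. A sketch, with a stand-in reply:

```python
import json

required = ["name", "email", "plan_interest", "demo_requested"]
content = '{"name": "John Smith", "email": "john@example.com", "plan_interest": "Enterprise", "demo_requested": true}'  # stand-in reply

data = json.loads(content)
missing = [k for k in required if k not in data]
print("schema respected" if not missing else f"missing fields: {missing}")
```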
200k Context (long context)¶
Use a very long messages[0].content to exercise long-context handling:
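The appendix script builds this prompt by repeating a short passage about 7,140 times, yielding a few hundred thousand CJK characters:

```python
# Mirrors the prompt construction in test_models.py (see the appendix).
base_text = "这是测试上下文的内容。我们正在测试 Gemini 的上下文能力。"
large_content = (base_text + "\n") * 7140 + "\n请问上述文本主要在测试什么能力?"
payload = {
    "messages": [{"role": "user", "content": large_content}],
    "stream": False,
}
print(len(large_content))  # a few hundred thousand characters
```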
max_tokens / max_completion_tokens (truncation)¶
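The appendix script exercises both parameters the same way: it asks for a 100-character story but allows only 10 output tokens, so the response is typically truncated with finish_reason: "length". The payloads mirror that test:

```json
{
  "messages": [{ "role": "user", "content": "讲一个100字的故事" }],
  "max_tokens": 10
}
```

The max_completion_tokens variant only swaps the parameter name:

```json
{
  "messages": [{ "role": "user", "content": "讲一个100字的故事" }],
  "max_completion_tokens": 10
}
```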
Gen Params (stop + streaming usage)¶
{
  "messages": [{ "role": "user", "content": "北京天气怎么样" }],
  "top_p": 0.9,
  "stop": ["北京"],
  "stream": true,
  "stream_options": { "include_usage": true }
}
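After merging the stream, this scenario checks two things: the final usage object is present, and the stop sequence never appears in the content. A sketch with a stand-in merged response:

```python
merged = {  # stand-in for a merged streamed response
    "choices": [{"message": {"content": "今天天气晴朗。"}, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}

usage = merged.get("usage")
content = merged["choices"][0]["message"]["content"]
usage_ok = bool(usage and "total_tokens" in usage)
stop_ok = "北京" not in content  # the stop sequence must not leak into the output
print(usage_ok, stop_ok)
```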
Appendix: test_models.py Source¶
The complete test_models.py follows; copy it into ks_gemini/ and use it as-is.
import os
import json
import sys
from typing import List, Dict, Any
import httpx
from dotenv import load_dotenv
load_dotenv()
API_BASE_URL = "https://api-cs-al.naci-tech.com/v1"
API_KEY = os.getenv("API_DEMO_API_KEY")
MODELS = [
"gemini-2.5-flash-lite",
"gemini-2.5-flash",
"gemini-2.5-pro",
"gemini-3-flash-preview",
"gemini-3-pro-preview",
]
class GeminiTester:
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key
        self.client = httpx.Client(base_url=base_url, headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }, timeout=120.0)
    def run_test(self, model: str, test_name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        payload["model"] = model
        try:
            if payload.get("stream"):
                full_content = ""
                full_thinking = ""
                tool_calls_chunks = {}
                finish_reason = None
                usage = None
                with self.client.stream("POST", "/chat/completions", json=payload) as response:
                    if response.status_code != 200:
                        return {"success": False, "error": f"Status {response.status_code}: {response.read().decode()}"}
                    for line in response.iter_lines():
                        if not line.startswith("data: "):
                            continue
                        data_str = line[6:]
                        if data_str == "[DONE]":
                            break
                        try:
                            chunk = json.loads(data_str)
                            if "usage" in chunk and chunk["usage"]:
                                usage = chunk["usage"]
                            if not chunk.get("choices"):
                                continue
                            delta = chunk["choices"][0].get("delta", {})
                            if "content" in delta and delta["content"]:
                                full_content += delta["content"]
                            if "thinking" in delta and delta["thinking"]:
                                full_thinking += delta["thinking"]
                            if "reasoning_content" in delta and delta["reasoning_content"]:
                                full_thinking += delta["reasoning_content"]
                            if "tool_calls" in delta:
                                for tc in delta["tool_calls"]:
                                    idx = tc["index"]
                                    if idx not in tool_calls_chunks:
                                        tool_calls_chunks[idx] = {"id": tc.get("id"), "type": "function", "function": {"name": "", "arguments": ""}}
                                    f = tc["function"]
                                    if f.get("name"):
                                        tool_calls_chunks[idx]["function"]["name"] += f["name"]
                                    if f.get("arguments"):
                                        tool_calls_chunks[idx]["function"]["arguments"] += f["arguments"]
                            if chunk["choices"][0].get("finish_reason"):
                                finish_reason = chunk["choices"][0]["finish_reason"]
                        except Exception as e:
                            print(f"Failed to parse stream chunk: {e}")
                            continue
                final_tool_calls = [v for k, v in sorted(tool_calls_chunks.items())]
                simulated_response = {
                    "choices": [{"message": {"role": "assistant", "content": full_content, "thinking": full_thinking, "tool_calls": final_tool_calls if final_tool_calls else None}, "finish_reason": finish_reason}],
                    "usage": usage
                }
                return {"success": True, "response": simulated_response}
            else:
                response = self.client.post("/chat/completions", json=payload)
                if response.status_code == 200:
                    return {"success": True, "response": response.json()}
                return {"success": False, "error": f"Status {response.status_code}: {response.text}"}
        except Exception as e:
            return {"success": False, "error": str(e)}
    def test_thinking(self, model: str):
        payload = {"messages": [{"role": "user", "content": "你是谁"}], "enable_thinking": True, "reasoning_effort": "low", "stream": False}
        res = self.run_test(model, "Thinking", payload)
        if res["success"]:
            content = res["response"]["choices"][0]["message"]
            res["info"] = "Found thinking content" if ("thinking" in content or "reasoning_content" in content) else "No thinking content found"
        return res

    def test_function_calling(self, model: str):
        payload = {
            "thinking": {"type": "enabled", "budget_tokens": 4096}, "top_p": 0.95, "stream_options": {"include_thinking": True}, "stream": True,
            "messages": [{"role": "user", "content": "北京天气怎么样,以及北京是几点钟?"}],
            "tools": [
                {"type": "function", "function": {"name": "get_current_time", "description": "当你想知道现在的时间时非常有用。", "parameters": {"type": "object", "properties": {}}}},
                {"type": "function", "function": {"name": "get_current_weather", "description": "当你想查询指定城市的天气时非常有用。", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "城市或县区,比如北京市、杭州市、余杭区等。"}}, "required": ["location"]}}}
            ],
            "tool_choice": "auto"
        }
        return self.run_test(model, "Function Calling", payload)
    def test_tool_choice(self, model: str):
        payload_base = {"temperature": 0.9, "stream": True, "messages": [{"role": "user", "content": "杭州在哪个国家"}], "tools": [{"type": "function", "function": {"name": "get_current_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}]}
        payload_req = payload_base.copy()
        payload_req["tool_choice"] = "required"
        res_req = self.run_test(model, "Tool Choice Required", payload_req)
        payload_auto = payload_base.copy()
        payload_auto["tool_choice"] = "auto"
        res_auto = self.run_test(model, "Tool Choice Auto", payload_auto)
        final_res = {"success": True, "info": ""}
        if not res_req["success"] or not res_auto["success"]:
            final_res["success"] = False
            final_res["error"] = f"Req Error: {res_req.get('error')} | Auto Error: {res_auto.get('error')}"
            return final_res
        tc_req = res_req["response"]["choices"][0]["message"].get("tool_calls")
        tc_auto = res_auto["response"]["choices"][0]["message"].get("tool_calls")
        if tc_req and not tc_auto:
            final_res["info"] = "PASS: Both 'required' and 'auto' respected"
        else:
            final_res["success"] = False
            details = []
            if not tc_req: details.append("'required' failed")
            if tc_auto: details.append("'auto' failed")
            final_res["info"] = "FAIL: " + " & ".join(details)
        final_res["response"] = {"required": res_req["response"], "auto": res_auto["response"]}
        return final_res
    def test_structured_output(self, model: str):
        payload = {"messages": [{"role": "user", "content": "generate a short json to describe the first digital computer, ENIAC, in the world."}], "response_format": {"type": "json_object"}}
        res = self.run_test(model, "JSON Object", payload)
        if res["success"]:
            try:
                json.loads(res["response"]["choices"][0]["message"].get("content", ""))
                res["info"] = "Valid JSON Object returned"
            except (json.JSONDecodeError, TypeError):
                res["success"] = False
                res["info"] = "Invalid JSON string"
        return res

    def test_json_schema(self, model: str):
        payload = {
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": "Extract the key information from this email: John Smith (john@example.com) is interested in our Enterprise plan and wants to schedule a demo for next Tuesday at 2pm."}],
            "response_format": {"type": "json_schema", "json_schema": {"name": "email_extraction", "schema": {"type": "object", "properties": {"name": {"type": "string"}, "email": {"type": "string"}, "plan_interest": {"type": "string"}, "demo_requested": {"type": "boolean"}}, "required": ["name", "email", "plan_interest", "demo_requested"], "additionalProperties": False}}}
        }
        res = self.run_test(model, "JSON Schema", payload)
        if res["success"]:
            content = res["response"]["choices"][0]["message"].get("content", "")
            try:
                data = json.loads(content)
                res["info"] = "PASS: Schema respected" if all(k in data for k in ["name", "email", "plan_interest", "demo_requested"]) else "FAIL: Missing required fields"
                if "FAIL" in res["info"]:
                    res["success"] = False
            except (json.JSONDecodeError, TypeError):
                res["success"] = False
                res["info"] = "FAIL: Invalid JSON"
        return res

    def test_context_200k(self, model: str):
        base_text = "这是测试上下文的内容。我们正在测试 Gemini 的上下文能力。"
        large_content = (base_text + "\n") * 7140 + "\n请问上述文本主要在测试什么能力?"
        payload = {"messages": [{"role": "user", "content": large_content}], "stream": False}
        return self.run_test(model, "200k Context", payload)
    def test_max_tokens(self, model: str):
        payload = {"messages": [{"role": "user", "content": "讲一个100字的故事"}], "max_tokens": 10}
        res = self.run_test(model, "max_tokens", payload)
        if res["success"]:
            res["info"] = f"Finish reason: {res['response']['choices'][0].get('finish_reason')}"
        return res

    def test_max_completion_tokens(self, model: str):
        payload = {"messages": [{"role": "user", "content": "讲一个100字的故事"}], "max_completion_tokens": 10}
        res = self.run_test(model, "max_completion_tokens", payload)
        if res["success"]:
            res["info"] = f"Finish reason: {res['response']['choices'][0].get('finish_reason')}"
        return res

    def test_generation_params(self, model: str):
        payload = {"messages": [{"role": "user", "content": "北京天气怎么样"}], "top_p": 0.9, "stop": ["北京"], "stream": True, "stream_options": {"include_usage": True}}
        res = self.run_test(model, "Gen Params", payload)
        if res["success"]:
            usage = res["response"].get("usage")
            content = res["response"]["choices"][0]["message"].get("content", "")
            if usage and "total_tokens" in usage and "北京" not in content:
                res["info"] = f"PASS: Usage found ({usage.get('total_tokens')} tokens), Stop respected"
            elif not usage or "total_tokens" not in usage:
                res["success"] = False
                res["info"] = "FAIL: No usage info found"
            else:
                res["info"] = "Usage found, but Stop sequence failed"
        return res
def main():
    if not API_KEY:
        print("Error: please set the API_DEMO_API_KEY environment variable")
        return
    tester = GeminiTester(API_BASE_URL, API_KEY)
    scenarios = {
        "Thinking": {"func": tester.test_thinking, "aliases": ["thinking", "think"]},
        "Function Calling": {"func": tester.test_function_calling, "aliases": ["fc", "function"]},
        "Tool Choice": {"func": tester.test_tool_choice, "aliases": ["tc", "tool"]},
        "JSON Object": {"func": tester.test_structured_output, "aliases": ["so", "json"]},
        "JSON Schema": {"func": tester.test_json_schema, "aliases": ["js", "schema"]},
        "200k Context": {"func": tester.test_context_200k, "aliases": ["ctx", "200k"]},
        "max_tokens": {"func": tester.test_max_tokens, "aliases": ["mt"]},
        "max_completion_tokens": {"func": tester.test_max_completion_tokens, "aliases": ["mct"]},
        "Gen Params": {"func": tester.test_generation_params, "aliases": ["gp", "params"]},
    }
    args = sys.argv[1:]
    selected_scenarios = {}
    if not args:
        selected_scenarios = {k: v["func"] for k, v in scenarios.items()}
    else:
        for arg in args:
            arg_lower = arg.lower()
            for name, config in scenarios.items():
                if arg_lower == name.lower() or arg_lower in config["aliases"]:
                    selected_scenarios[name] = config["func"]
                    break
    if not selected_scenarios:
        print(f"No scenario matched arguments: {args}")
        return
    results = {}
    print(f"Testing Gemini models, API Base: {API_BASE_URL}")
    for model in MODELS:
        print(f"\n[{model}] testing...")
        model_results = {}
        for name, func in selected_scenarios.items():
            model_results[name] = func(model)
        results[model] = model_results
    print("\n" + "=" * 120)
    print(f"| {'Model':<30} | {'Scenario':<25} | {'Status':<10} | {'Details'} |")
    print("|" + "-" * 32 + "|" + "-" * 27 + "|" + "-" * 12 + "|" + "-" * 45 + "|")
    for model, m_results in results.items():
        for scenario, res in m_results.items():
            status = "✅ PASS" if res.get("success") else "❌ FAIL"
            display_info = res.get("info", "")
            if not res.get("success"):
                display_info = res.get("error") or res.get("info") or "Unknown error"
            if len(display_info) > 42:
                display_info = display_info[:42] + "..."
            print(f"| {model:<30} | {scenario:<25} | {status:<10} | {display_info:<45} |")
    print("=" * 120)
    output_dir = "output"
    os.makedirs(output_dir, exist_ok=True)
    output_path = os.path.join(output_dir, "test_results.json")
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)
    print(f"\nResults saved to {output_path}")


if __name__ == "__main__":
    main()