P0 context: per-model window + overflow recovery + catalog self-heal

Keep long conversations from breaking or overspending.

Per-model context window (before: a static 120k/200k budget for every model):
- cost.resolve_context_window: reads context_length from the OpenRouter/DeepSeek catalog in Redis, falling back to litellm. config.budget_for_window derives the budget from the real window (window - max_output - reserve). build_context applies it per turn (model_id param) instead of the fixed value from settings.
- OpenRouter catalog self-heal: the admin panel caches the catalog with a 1h TTL and only repopulates it when its AI window is opened, so at runtime the cache expired and both the window and the price were lost. Now cost._get_catalog refreshes it by itself (public fetch, same shape, 5 min cooldown, 24h TTL). This also fixes the cost, which had been falling back to the fixed price.

Overflow recovery:
- adapters.base.ContextOverflowError; openai_adapter translates the provider's context-length error (both at stream init and during stream iteration).
- base.py: proactive retry that re-compacts the context until it fits the window BEFORE calling the LLM; if it still does not fit, it raises an actionable error instead of breaking the session.
- engine.py: clear user-facing message (model + window).

Tests: window/budget, self-heal (mocked), overflow, and a REAL Redis session. 106 passing.

evals/: harness for evaluating the acai-code agent (driver + README + results). Comparison of kimi vs deepseek vs glm (deepseek-v4-pro high = best quality/price).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
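The per-model window and budget logic described in the message can be sketched as three small functions. This is a minimal sketch, not the project's code: the names mirror `resolve_context_window` and `budget_for_window` from the message, but their signatures, the catalog shape (`{model_id: {"context_length": int}}`), the injected `fallback` callable (standing in for the litellm lookup), `DEFAULT_WINDOW`, the hypothetical `compact_until_fits` helper and the plain-string messages are all assumptions.

```python
from typing import Callable, List, Mapping, Optional

DEFAULT_WINDOW = 128_000  # hypothetical last-resort default, not from the commit


class ContextOverflowError(Exception):
    """Raised when the prompt still exceeds the budget after compaction."""


def resolve_context_window(
    model_id: str,
    catalog: Mapping[str, dict],
    fallback: Callable[[str], Optional[int]],
) -> int:
    """Catalog first, then a fallback lookup, then a fixed default."""
    entry = catalog.get(model_id) or {}
    window = entry.get("context_length")
    if window:
        return int(window)
    window = fallback(model_id)  # assumption: plays the role of litellm here
    return int(window) if window else DEFAULT_WINDOW


def budget_for_window(window: int, max_output: int, reserve: int) -> int:
    """Input budget = window - max_output - reserve, clamped at zero."""
    return max(0, window - max_output - reserve)


def compact_until_fits(
    messages: List[str],
    budget: int,
    count_tokens: Callable[[List[str]], int],
    compact: Callable[[List[str]], List[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Re-compact BEFORE calling the LLM until the prompt fits the budget."""
    for _ in range(max_rounds):
        size = count_tokens(messages)
        if size <= budget:
            return messages
        smaller = compact(messages)
        if count_tokens(smaller) >= size:
            break  # compaction made no progress; stop retrying
        messages = smaller
    size = count_tokens(messages)
    if size <= budget:
        return messages
    # Actionable error instead of a raw provider failure mid-session.
    raise ContextOverflowError(
        f"prompt needs ~{size} tokens but the budget is {budget}; "
        "start a new session or pick a model with a larger window"
    )


if __name__ == "__main__":
    window = resolve_context_window(
        "model-a", {"model-a": {"context_length": 64_000}}, lambda m: None
    )
    budget = budget_for_window(window, max_output=8_000, reserve=2_000)
    print(window, budget)  # 64000 54000
```

Subtracting `max_output` and `reserve` keeps room for the model's reply, and the no-progress check stops the retry loop from spinning when compaction cannot shrink the prompt any further.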
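The catalog self-heal (refetch on a miss, 5 min cooldown between attempts, 24h TTL on success) can be illustrated with an in-process cache. This is a hypothetical sketch: the class name, the injected `fetch` and `clock` callables, and keeping the data in a Python attribute instead of a Redis key are assumptions made so the example runs on its own.

```python
import time
from typing import Callable, Dict, Optional

CATALOG_TTL_S = 24 * 3600    # 24h TTL for a refreshed catalog
REFRESH_COOLDOWN_S = 5 * 60  # at most one refetch attempt every 5 min

Catalog = Dict[str, dict]


class SelfHealingCatalog:
    """Hypothetical in-process stand-in for cost._get_catalog (no Redis)."""

    def __init__(
        self,
        fetch: Callable[[], Catalog],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # e.g. a call to the public catalog endpoint
        self._clock = clock  # injectable so tests can control time
        self._data: Optional[Catalog] = None
        self._expires_at = 0.0
        self._last_attempt: Optional[float] = None

    def get(self) -> Catalog:
        now = self._clock()
        if self._data is not None and now < self._expires_at:
            return self._data  # fresh cache hit
        cooled_down = (
            self._last_attempt is None
            or now - self._last_attempt >= REFRESH_COOLDOWN_S
        )
        if cooled_down:
            self._last_attempt = now
            try:
                fresh = self._fetch()
            except Exception:
                fresh = None  # network/provider failure: keep what we have
            if fresh:  # empty payloads do not overwrite the cache
                self._data = fresh
                self._expires_at = now + CATALOG_TTL_S
        # Possibly stale data, or {} if nothing was ever fetched successfully.
        return self._data or {}
```

One deliberate difference from a Redis key with a TTL: this sketch keeps serving the stale copy after expiry while a refetch is failing or on cooldown, so price and window lookups degrade instead of disappearing.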
tests/test_overflow_recovery.py (Normal file, +93 lines)
@@ -0,0 +1,93 @@
"""Tests for recovery from context-window overflow.

Covers: detection of the provider's context-length error, and the adapter
wrapper that translates it into the domain `ContextOverflowError`, whether it
fires when the stream starts or during iteration.
"""

from __future__ import annotations

import asyncio
import enum
import sys
import types

import pytest

# enum.StrEnum only exists on Python 3.11+; provide a minimal shim on older
# interpreters so the src modules that use it can still be imported.
if not hasattr(enum, "StrEnum"):
    class _CompatStrEnum(str, enum.Enum):
        pass

    enum.StrEnum = _CompatStrEnum

# Register minimal SDK stubs (unless the module is already imported) so the
# adapters can be imported without the real anthropic/openai clients.
if "anthropic" not in sys.modules:
    anthropic_stub = types.ModuleType("anthropic")
    anthropic_stub.AsyncAnthropic = type("_AsyncAnthropic", (), {})
    sys.modules["anthropic"] = anthropic_stub

if "openai" not in sys.modules:
    openai_stub = types.ModuleType("openai")
    openai_stub.AsyncOpenAI = type("_AsyncOpenAI", (), {})
    sys.modules["openai"] = openai_stub

# Imported after the stubs on purpose.
from src.adapters.base import ContextOverflowError  # noqa: E402
from src.adapters.openai_adapter import OpenAIAdapter, _is_context_overflow  # noqa: E402


class TestOverflowDetection:
    def test_detects_by_message(self):
        assert _is_context_overflow(
            Exception("This model's maximum context length is 8192 tokens, however you requested 9000")
        )
        assert _is_context_overflow(Exception("context_length_exceeded"))
        assert _is_context_overflow(Exception("Please reduce the length of the messages"))

    def test_does_not_flag_unrelated_errors(self):
        assert not _is_context_overflow(Exception("rate limit exceeded"))
        assert not _is_context_overflow(Exception("invalid api key"))

    def test_detects_by_type_name(self):
        class ContextWindowExceededError(Exception):
            pass

        assert _is_context_overflow(ContextWindowExceededError("boom"))


class TestStreamWrapperMapsOverflow:
    def _make_adapter(self):
        # Bypass __init__: no AsyncOpenAI client is needed because _stream_impl
        # is patched, so the test does not depend on the openai stub.
        return OpenAIAdapter.__new__(OpenAIAdapter)

    def test_overflow_at_stream_init_becomes_domain_error(self, monkeypatch):
        adapter = self._make_adapter()

        async def _impl(messages, tools=None, config=None):
            raise RuntimeError("maximum context length is 32768 tokens")
            yield  # noqa: unreachable, but turns this into an async generator

        monkeypatch.setattr(adapter, "_stream_impl", _impl)

        async def _run():
            async for _ in adapter.stream([{"role": "user", "content": "hola"}]):
                pass

        with pytest.raises(ContextOverflowError):
            asyncio.run(_run())

    def test_non_overflow_error_propagates_unchanged(self, monkeypatch):
        adapter = self._make_adapter()

        async def _impl(messages, tools=None, config=None):
            raise RuntimeError("connection reset by peer")
            yield  # noqa: unreachable, but turns this into an async generator

        monkeypatch.setattr(adapter, "_stream_impl", _impl)

        async def _run():
            async for _ in adapter.stream([{"role": "user", "content": "hola"}]):
                pass

        with pytest.raises(RuntimeError) as exc:
            asyncio.run(_run())
        assert not isinstance(exc.value, ContextOverflowError)
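The detection tests pin down `_is_context_overflow` only through its behavior; its implementation is not part of this diff. A minimal sketch that satisfies those same cases, assuming a fixed list of message fragments plus a check on the exception's class name (the name `is_context_overflow` and both marker tuples are illustrative, not the project's code):

```python
_OVERFLOW_MARKERS = (
    # Lower-cased message fragments taken from the test cases.
    "maximum context length",
    "context_length_exceeded",
    "reduce the length of the messages",
)
_OVERFLOW_TYPE_HINTS = ("ContextWindowExceeded", "ContextLengthExceeded")


def is_context_overflow(exc: BaseException) -> bool:
    """Heuristic: True if `exc` looks like a provider context-length error."""
    # Matching on the class *name* avoids importing SDK exception types,
    # which matters when the SDKs are stubbed out as in these tests.
    name = type(exc).__name__
    if any(hint in name for hint in _OVERFLOW_TYPE_HINTS):
        return True
    text = str(exc).lower()
    return any(marker in text for marker in _OVERFLOW_MARKERS)
```

String matching is brittle across providers, which is why the negative cases ("rate limit exceeded", "invalid api key") matter as much as the positive ones.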
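The `stream` wrapper exercised by `TestStreamWrapperMapsOverflow` is also outside this diff. One way to get the behavior the tests describe is to wrap the inner generator in a single try/except. This sketch assumes a hypothetical `SketchAdapter` base whose subclasses provide an async-generator `_stream_impl`, and uses a deliberately simplified detector:

```python
import asyncio
from typing import Any, AsyncIterator, List


class ContextOverflowError(Exception):
    """Domain error: the request does not fit the model's context window."""


def looks_like_overflow(exc: BaseException) -> bool:
    # Deliberately simplified detector for this sketch.
    text = str(exc).lower()
    return "maximum context length" in text or "context_length_exceeded" in text


class SketchAdapter:
    """Hypothetical adapter base: subclasses implement _stream_impl."""

    async def _stream_impl(self, messages, tools=None, config=None) -> AsyncIterator[Any]:
        raise NotImplementedError
        yield  # unreachable; makes this an async generator

    async def stream(self, messages, tools=None, config=None) -> AsyncIterator[Any]:
        # Async generators run lazily, so an error raised while the provider
        # call starts and one raised mid-stream both surface inside this loop.
        try:
            async for chunk in self._stream_impl(messages, tools=tools, config=config):
                yield chunk
        except ContextOverflowError:
            raise  # already a domain error
        except Exception as exc:
            if looks_like_overflow(exc):
                raise ContextOverflowError(str(exc)) from exc
            raise  # unrelated errors propagate unchanged


class MidStreamOverflow(SketchAdapter):
    async def _stream_impl(self, messages, tools=None, config=None):
        yield "partial"
        raise RuntimeError("maximum context length is 32768 tokens")


async def collect(adapter: SketchAdapter) -> List[Any]:
    chunks: List[Any] = []
    try:
        async for chunk in adapter.stream([{"role": "user", "content": "hi"}]):
            chunks.append(chunk)
    except ContextOverflowError as err:
        chunks.append(f"overflow: {err}")
    return chunks


if __name__ == "__main__":
    print(asyncio.run(collect(MidStreamOverflow())))
    # ['partial', 'overflow: maximum context length is 32768 tokens']
```

Chaining with `from exc` keeps the provider's original exception available as `__cause__` for logging, while callers only need to handle the domain error.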