Hardening: lock de sesion atomico, monitor off por defecto, fix DeepSeek reasoning-only

- session_lock: token uuid + compare-and-delete (Lua), TTL > timeout de
  ejecucion; abort solo limpia el lock tras cancelacion confirmada.
  Evita doble ejecucion concurrente sobre la misma sesion.
- monitor HTTP (puerto 4545) deshabilitado salvo MCP_MONITOR_ENABLED=true
  y atado a 127.0.0.1; no se acumula historial en memoria si esta off.
- DeepSeek/LiteLLM: turnos que llegan solo con reasoning_content (sin
  content ni tool_calls) ya no rompen la sesion (400 'Invalid assistant
  message') ni se pintan como 'pensando': se promueven a texto en el
  historial y en el snapshot persistido.
- litellm pinneado a ==1.80.0 (builds reproducibles).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
Jordan Diaz
2026-06-10 15:17:52 +00:00
parent 6a03fdf284
commit 43337e8554
8 changed files with 107 additions and 28 deletions

View File

@@ -413,9 +413,18 @@ class OpenAIAdapter(ModelAdapter):
"arguments": json.dumps(b.get("input", {}), ensure_ascii=False),
},
})
m: dict[str, Any] = {"role": "assistant", "content": ("\n".join(p for p in text_parts if p) or None)}
text_joined = "\n".join(p for p in text_parts if p)
m: dict[str, Any] = {"role": "assistant", "content": (text_joined or None)}
if reasoning_parts:
m["reasoning_content"] = "\n".join(reasoning_parts)
if not text_joined and not tool_calls:
# Quirk DeepSeek thinking: a veces emite TODA la respuesta
# en reasoning_content y cierra sin content ni tool_calls.
# Reenviar content=None sin tool_calls rompe la API
# ("content or tool_calls must be set"), asi que promovemos
# el reasoning a content (sin duplicarlo como reasoning_content).
m["content"] = "\n".join(reasoning_parts)
else:
m["reasoning_content"] = "\n".join(reasoning_parts)
if tool_calls:
m["tool_calls"] = tool_calls
out.append(m)