Hardening: atomic session lock, monitor off by default, fix for DeepSeek reasoning-only turns

- session_lock: uuid token + compare-and-delete (Lua), TTL > execution timeout; abort only clears the lock after the cancellation is confirmed. Prevents concurrent double execution on the same session.
- HTTP monitor (port 4545) disabled unless MCP_MONITOR_ENABLED=true, and bound to 127.0.0.1; no history accumulates in memory while it is off.
- DeepSeek/LiteLLM: turns that arrive with only reasoning_content (no content or tool_calls) no longer break the session (400 'Invalid assistant message') and are no longer rendered as 'thinking': they are promoted to text in the history and in the persisted snapshot.
- litellm pinned to ==1.80.0 (reproducible builds).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
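
The reasoning-only promotion can be sketched as a standalone function. This is a minimal illustration, not the code in the commit: the helper name `promote_thinking_only_turn` and the `Block` alias are hypothetical, and the block shapes (`{"type": "thinking", "thinking": ...}`, `{"type": "text", "text": ...}`) are assumed from the hunk in `base.py`.

```python
from typing import Any

# Hypothetical alias for the dict-shaped content blocks used in the diff.
Block = dict[str, Any]


def promote_thinking_only_turn(turn_blocks: list[Block]) -> tuple[list[Block], str]:
    """Return (blocks, promoted_text) for a finished assistant turn.

    If every block in the turn is a thinking block, collapse the turn into
    a single text block whose text is the non-empty thinking strings joined
    by newlines. Otherwise return the blocks unchanged and an empty string.
    """
    if turn_blocks and all(b.get("type") == "thinking" for b in turn_blocks):
        promoted = "\n".join(
            b.get("thinking", "") for b in turn_blocks if b.get("thinking")
        )
        return [{"type": "text", "text": promoted}], promoted
    # Mixed turns (or empty turns) are left as they are.
    return turn_blocks, ""
```

A caller would append the returned blocks to the conversation and, when the promoted text is non-empty, stream it as an ordinary delta. Note that a turn whose thinking blocks are all empty still yields a text block with an empty string in this sketch, which mirrors the condition in the hunk.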
2026-06-10 15:17:52 +00:00
parent 6a03fdf284
commit 43337e8554
8 changed files with 107 additions and 28 deletions
--- a/src/orchestrator/agents/base.py
+++ b/src/orchestrator/agents/base.py
@@ -289,6 +289,34 @@ class BaseAgent:

            # If no tool calls, we're done
            if not tool_calls:
+                # DeepSeek thinking quirk: sometimes the model emits its ENTIRE
+                # response as reasoning and ends the turn with no text or
+                # tool_use. If the turn ends with ONLY thinking blocks, we
+                # promote the thinking to a text block in the persisted
+                # snapshot, so the UI does not render it as "thinking" on
+                # reload and the next turn does not fail with
+                # "content or tool_calls must be set".
+                if turn_blocks and all(b.get("type") == "thinking" for b in turn_blocks):
+                    promoted = "\n".join(
+                        b.get("thinking", "") for b in turn_blocks if b.get("thinking")
+                    )
+                    turn_blocks = [{"type": "text", "text": promoted}]
+                    accumulated_content += promoted
+                    if promoted and self.profile.stream_deltas:
+                        # Live emission via the normal AGENT_DELTA: the
+                        # ClaudeFormatEmitter closes the open thinking block
+                        # (content_block_stop) and opens a new text block with
+                        # its own index (start/delta/stop), so the block
+                        # protocol is not broken.
+                        await self.sse.emit(
+                            EventType.AGENT_DELTA,
+                            {
+                                "agent": self.profile.role,
+                                "delta": promoted,
+                                "step": step,
+                            },
+                            session_id=session.session_id,
+                        )
                if turn_blocks:
                    conversation.append({"role": "assistant", "content": turn_blocks})
                elif full_text: