Compare commits
7 Commits
0dd3adbebd
...
6978764540
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
6978764540 | ||
|
|
7c891cf023 | ||
|
|
2712c2fd49 | ||
|
|
7bdb943e7f | ||
|
|
3aa7a463d0 | ||
|
|
414210bceb | ||
|
|
b88917c18d |
@@ -26,6 +26,12 @@ RUN pip install --no-cache-dir -r requirements.txt
|
|||||||
COPY agenticSystem/mcp-server/package.json agenticSystem/mcp-server/package-lock.json* ./mcp-server/
|
COPY agenticSystem/mcp-server/package.json agenticSystem/mcp-server/package-lock.json* ./mcp-server/
|
||||||
RUN cd mcp-server && npm install --production
|
RUN cd mcp-server && npm install --production
|
||||||
|
|
||||||
|
# Instalar dependencias de sistema de Playwright (como root)
|
||||||
|
RUN cd mcp-server && npx playwright install-deps chromium
|
||||||
|
|
||||||
|
# Instalar uv (incluye uvx) para mcp-server-fetch
|
||||||
|
RUN pip install --no-cache-dir uv
|
||||||
|
|
||||||
# Copiar codigo fuente del MCP server
|
# Copiar codigo fuente del MCP server
|
||||||
COPY agenticSystem/mcp-server/ ./mcp-server/
|
COPY agenticSystem/mcp-server/ ./mcp-server/
|
||||||
|
|
||||||
@@ -45,6 +51,9 @@ RUN useradd -m appuser \
|
|||||||
&& chown -R appuser:appuser /opt/acai/webs
|
&& chown -R appuser:appuser /opt/acai/webs
|
||||||
USER appuser
|
USER appuser
|
||||||
|
|
||||||
|
# Descargar Chromium como appuser (queda en ~/.cache/ms-playwright/)
|
||||||
|
RUN cd mcp-server && npx playwright install chromium
|
||||||
|
|
||||||
EXPOSE 8000
|
EXPOSE 8000
|
||||||
|
|
||||||
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
|
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
|
||||||
|
|||||||
16
README.md
16
README.md
@@ -59,6 +59,22 @@ python3 -m uvicorn src.main:app --reload --port 8001
|
|||||||
# 5. Dashboard en http://localhost:8001/dashboard/
|
# 5. Dashboard en http://localhost:8001/dashboard/
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Tests
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Ejecutar todos los tests unitarios (no necesita Docker, Redis ni LLM)
|
||||||
|
pip install pytest
|
||||||
|
python3 -m pytest tests/ -v
|
||||||
|
|
||||||
|
# Ejecutar un archivo específico
|
||||||
|
python3 -m pytest tests/test_compactor.py -v
|
||||||
|
|
||||||
|
# Ejecutar un test específico
|
||||||
|
python3 -m pytest tests/test_cost_calculation.py::TestCostCalculation::test_1m_input_tokens -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Los tests validan: compactación de contexto, extracción de key_data para historial, fingerprinting de tool calls, y cálculo de costes. Son 100% offline — no consumen tokens ni necesitan servicios externos.
|
||||||
|
|
||||||
### Cargar Knowledge Base
|
### Cargar Knowledge Base
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|||||||
@@ -105,7 +105,7 @@ Do NOT modify web-base files — they are shared across all projects.
|
|||||||
1. **Before working with any area (hooks, modules, templates, CSS/JS, etc.), read the corresponding documentation in `docs/` first.** Do not guess or assume — always consult the docs before taking action.
|
1. **Before working with any area (hooks, modules, templates, CSS/JS, etc.), read the corresponding documentation in `docs/` first.** Do not guess or assume — always consult the docs before taking action.
|
||||||
2. **NEVER use `mkdir` to create directories.** Instead, use the `Write` tool to create the first file inside the directory — this creates parent directories automatically. For example, to create a new module, directly write the `index-base.tpl` file.
|
2. **NEVER use `mkdir` to create directories.** Instead, use the `Write` tool to create the first file inside the directory — this creates parent directories automatically. For example, to create a new module, directly write the `index-base.tpl` file.
|
||||||
3. Only edit `index-base.tpl` in modules — `index.tpl`, `index-twig.tpl`, and `builder.json` are auto-generated
|
3. Only edit `index-base.tpl` in modules — `index.tpl`, `index-twig.tpl`, and `builder.json` are auto-generated
|
||||||
3. **After editing any `index-base.tpl`, ALWAYS call the `compile_module` MCP tool** to compile the module/section. This is mandatory — without compilation, changes won't take effect in the CMS.
|
3. **Edit `index-base.tpl` using `acai_write` or `acai_line_replace`** — the server compiles automatically when the file is saved. No need to call `compile_module` manually.
|
||||||
4. Use Twig **filters** (with `|`), never Twig functions
|
4. Use Twig **filters** (with `|`), never Twig functions
|
||||||
5. Table names without `cms_` prefix everywhere
|
5. Table names without `cms_` prefix everywhere
|
||||||
6. Primary key is `num`, never `id`
|
6. Primary key is `num`, never `id`
|
||||||
@@ -122,8 +122,8 @@ This project has MCP tools for managing modules, records, media, and more. **Bef
|
|||||||
See [docs/mcp-tools-reference.md](docs/mcp-tools-reference.md) for the complete list of available tools and step-by-step workflows.
|
See [docs/mcp-tools-reference.md](docs/mcp-tools-reference.md) for the complete list of available tools and step-by-step workflows.
|
||||||
|
|
||||||
Key workflows:
|
Key workflows:
|
||||||
- **Create module**: Read [docs/module-creation-guide.md](docs/module-creation-guide.md) first → `create_module` → `add_module_to_record` (returns sectionId) → `set_module_config_vars` (returns uploadFields) → images via uploadFields
|
- **Create module**: Read [docs/module-creation-guide.md](docs/module-creation-guide.md) first → Write `index-base.tpl` via `acai_write` → `add_module_to_record` (returns sectionId) → `set_module_config_vars` (returns uploadFields) → images via uploadFields
|
||||||
- **Edit module**: read vars → edit `index-base.tpl` → `compile_module`
|
- **Edit module**: read vars → edit `index-base.tpl` with `acai_write` or `acai_line_replace` (server compiles automatically)
|
||||||
- **Add images**: use `uploadFields` from `set_module_config_vars` response → `upload_record_image`
|
- **Add images**: use `uploadFields` from `set_module_config_vars` response → `upload_record_image`
|
||||||
- **Generate images**: `generate_image` → `upload_record_image` with returned URL
|
- **Generate images**: `generate_image` → `upload_record_image` with returned URL
|
||||||
|
|
||||||
|
|||||||
@@ -4,7 +4,7 @@
|
|||||||
|
|
||||||
| Tool | Categoría | Acción |
|
| Tool | Categoría | Acción |
|
||||||
|------|-----------|--------|
|
|------|-----------|--------|
|
||||||
| `create_module` | Módulos | Crea módulo nuevo (directorio + archivos + compila) |
|
| `create_module` | Módulos | (Legacy) Alternativa para crear módulo — preferir acai_write |
|
||||||
| `compile_module` | Módulos | Compila módulo tras editar index-base.tpl |
|
| `compile_module` | Módulos | Compila módulo tras editar index-base.tpl |
|
||||||
| `check_module` | Módulos | Preview de cómo renderiza un módulo |
|
| `check_module` | Módulos | Preview de cómo renderiza un módulo |
|
||||||
| `check_module_usage` | Módulos | Qué páginas usan un módulo |
|
| `check_module_usage` | Módulos | Qué páginas usan un módulo |
|
||||||
@@ -37,7 +37,7 @@
|
|||||||
|
|
||||||
### Crear un módulo nuevo desde cero
|
### Crear un módulo nuevo desde cero
|
||||||
|
|
||||||
1. `create_module` — Crea el directorio con index-base.tpl, style.css, script.js y compila
|
1. `acai_write` — Escribe `index-base.tpl` en `template/estandar/modulos/NOMBRE/`. El server crea la carpeta si no existe, compila y genera todos los archivos derivados (index-twig.tpl, index.tpl, builder.json, screenshots)
|
||||||
2. `add_module_to_record` — Añade el módulo a una página (tabla padre, ej: `apartados`)
|
2. `add_module_to_record` — Añade el módulo a una página (tabla padre, ej: `apartados`)
|
||||||
3. `set_module_config_vars` — Rellena las variables con contenido (textos, colores, opciones). **OBLIGATORIO** — sin esto el módulo no muestra nada. Devuelve:
|
3. `set_module_config_vars` — Rellena las variables con contenido (textos, colores, opciones). **OBLIGATORIO** — sin esto el módulo no muestra nada. Devuelve:
|
||||||
- `configVars`: mapa de variables → recordNums
|
- `configVars`: mapa de variables → recordNums
|
||||||
@@ -46,12 +46,13 @@
|
|||||||
4. Para imágenes: `generate_image` o `upload_record_image` usando el `recordNum` y `fieldName` del `uploadFields` devuelto en el paso 3
|
4. Para imágenes: `generate_image` o `upload_record_image` usando el `recordNum` y `fieldName` del `uploadFields` devuelto en el paso 3
|
||||||
5. Verificar con `check_module` o recargando la página
|
5. Verificar con `check_module` o recargando la página
|
||||||
|
|
||||||
|
> **Nota:** `create_module` es una alternativa legacy que hace lo mismo pero con menos control sobre el contenido del template.
|
||||||
|
|
||||||
### Editar un módulo existente
|
### Editar un módulo existente
|
||||||
|
|
||||||
1. `get_module_config_vars` — Leer el estado actual del módulo (variables, recordNums)
|
1. `get_module_config_vars` — Leer el estado actual del módulo (variables, recordNums)
|
||||||
2. Editar `index-base.tpl` con la tool `Write` o `Edit`
|
2. Editar `index-base.tpl` con `acai_write` o `acai_line_replace` — el server compila automáticamente al guardar
|
||||||
3. `compile_module` — **OBLIGATORIO** tras cada edición de index-base.tpl
|
3. Si cambias variables: `set_module_config_vars` para actualizar valores
|
||||||
4. Si cambias variables: `set_module_config_vars` para actualizar valores
|
|
||||||
|
|
||||||
### Añadir/modificar imágenes de un módulo
|
### Añadir/modificar imágenes de un módulo
|
||||||
|
|
||||||
|
|||||||
@@ -38,7 +38,7 @@ Each module lives in `template/estandar/modulos/<moduleId>/` with:
|
|||||||
## Creating a Module — Full Workflow
|
## Creating a Module — Full Workflow
|
||||||
|
|
||||||
1. **Read style reference** (steps above)
|
1. **Read style reference** (steps above)
|
||||||
2. **`create_module`** — Creates the directory with index-base.tpl, style.css, script.js and compiles. Use descriptive `moduleId` and clear `label`.
|
2. **`acai_write`** — Write `index-base.tpl` to `template/estandar/modulos/MODULE_ID/index-base.tpl`. The server automatically creates the directory, compiles and generates all derived files. `create_module` is a legacy alternative.
|
||||||
3. **`add_module_to_record`** — Adds the module to a page. Response includes `sectionId` — use it directly in the next step.
|
3. **`add_module_to_record`** — Adds the module to a page. Response includes `sectionId` — use it directly in the next step.
|
||||||
4. **`set_module_config_vars`** — Fill variables with content. Response includes `uploadFields` with `{ fieldName, recordNum }` for each upload variable.
|
4. **`set_module_config_vars`** — Fill variables with content. Response includes `uploadFields` with `{ fieldName, recordNum }` for each upload variable.
|
||||||
5. **Upload images** — Use `generate_image` then `upload_record_image` with the `recordNum` and `fieldName` from step 4's `uploadFields`. No need to read builder.json or call get_module_config_vars.
|
5. **Upload images** — Use `generate_image` then `upload_record_image` with the `recordNum` and `fieldName` from step 4's `uploadFields`. No need to read builder.json or call get_module_config_vars.
|
||||||
@@ -72,6 +72,6 @@ Modules with `MJMLModule: true` in their schema are email modules:
|
|||||||
- Use `section_id` variable for unique anchors/scoping
|
- Use `section_id` variable for unique anchors/scoping
|
||||||
- Use `interno` variable to detect CMS editor vs public view
|
- Use `interno` variable to detect CMS editor vs public view
|
||||||
- Include other modules with: `<module_id :param1="value1"></module_id>`
|
- Include other modules with: `<module_id :param1="value1"></module_id>`
|
||||||
- After editing `index-base.tpl`, ALWAYS call `compile_module`
|
- After editing `index-base.tpl` with `acai_write` or `acai_line_replace`, the server compiles automatically — no need to call `compile_module`
|
||||||
- Twig uses filters (with `|`), never functions
|
- Twig uses filters (with `|`), never functions
|
||||||
- Twig concatenation uses `~`: `'value=' ~ variable`
|
- Twig concatenation uses `~`: `'value=' ~ variable`
|
||||||
|
|||||||
@@ -22,9 +22,21 @@ registerResources(server);
|
|||||||
|
|
||||||
// Static env vars (web_url and website don't change, token does)
|
// Static env vars (web_url and website don't change, token does)
|
||||||
const projectDir = process.env.ACAI_PROJECT_DIR || "";
|
const projectDir = process.env.ACAI_PROJECT_DIR || "";
|
||||||
const website = process.env.ACAI_WEBSITE || "";
|
const acaiFilePath = projectDir ? path.join(projectDir, ".acai") : "";
|
||||||
const webUrl = process.env.ACAI_WEB_URL || "";
|
|
||||||
|
// Read .acai once at startup for URL fallbacks
|
||||||
|
let acaiFileData = {};
|
||||||
|
if (acaiFilePath) {
|
||||||
|
try {
|
||||||
|
acaiFileData = JSON.parse(fs.readFileSync(acaiFilePath, "utf-8"));
|
||||||
|
} catch { /* ignore - fall back to env vars */ }
|
||||||
|
}
|
||||||
|
|
||||||
|
const website = process.env.ACAI_WEBSITE || acaiFileData.domain || "";
|
||||||
|
const webUrl = process.env.ACAI_WEB_URL || acaiFileData.local_web_url || "";
|
||||||
const derivedForgeHost = (() => {
|
const derivedForgeHost = (() => {
|
||||||
|
// First check .acai for explicit forge host
|
||||||
|
if (acaiFileData.local_forge_host) return acaiFileData.local_forge_host;
|
||||||
if (!webUrl) return "";
|
if (!webUrl) return "";
|
||||||
try {
|
try {
|
||||||
const parsed = new URL(webUrl);
|
const parsed = new URL(webUrl);
|
||||||
@@ -35,7 +47,6 @@ const derivedForgeHost = (() => {
|
|||||||
})();
|
})();
|
||||||
const apiWebUrl = process.env.ACAI_API_WEB_URL || (derivedForgeHost ? "http://web:80/" : webUrl);
|
const apiWebUrl = process.env.ACAI_API_WEB_URL || (derivedForgeHost ? "http://web:80/" : webUrl);
|
||||||
const forgeHost = process.env.ACAI_FORGE_HOST || derivedForgeHost;
|
const forgeHost = process.env.ACAI_FORGE_HOST || derivedForgeHost;
|
||||||
const acaiFilePath = projectDir ? path.join(projectDir, ".acai") : "";
|
|
||||||
|
|
||||||
// Read fresh credentials from .acai file
|
// Read fresh credentials from .acai file
|
||||||
function readFreshCredentials() {
|
function readFreshCredentials() {
|
||||||
|
|||||||
6
mcp.json
6
mcp.json
@@ -3,13 +3,15 @@
|
|||||||
"acai-code": {
|
"acai-code": {
|
||||||
"command": "node",
|
"command": "node",
|
||||||
"args": ["mcp-server/stdio.js"],
|
"args": ["mcp-server/stdio.js"],
|
||||||
"env": {},
|
"env": {
|
||||||
|
"LOCAL_SERVER_URL": "http://app:9091"
|
||||||
|
},
|
||||||
"timeout": 30,
|
"timeout": 30,
|
||||||
"startup_timeout": 10
|
"startup_timeout": 10
|
||||||
},
|
},
|
||||||
"playwright": {
|
"playwright": {
|
||||||
"command": "npx",
|
"command": "npx",
|
||||||
"args": ["@playwright/mcp", "--headless"],
|
"args": ["@playwright/mcp", "--headless", "--executable-path", "/home/appuser/.cache/ms-playwright/chromium-1212/chrome-linux64/chrome"],
|
||||||
"timeout": 30,
|
"timeout": 30,
|
||||||
"startup_timeout": 15
|
"startup_timeout": 15
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -44,6 +44,7 @@ class OpenAIAdapter(ModelAdapter):
|
|||||||
"temperature": config.temperature,
|
"temperature": config.temperature,
|
||||||
"messages": messages,
|
"messages": messages,
|
||||||
"stream": True,
|
"stream": True,
|
||||||
|
"stream_options": {"include_usage": True},
|
||||||
}
|
}
|
||||||
if tools:
|
if tools:
|
||||||
kwargs["tools"] = self._format_tools(tools)
|
kwargs["tools"] = self._format_tools(tools)
|
||||||
@@ -52,9 +53,22 @@ class OpenAIAdapter(ModelAdapter):
|
|||||||
|
|
||||||
tool_calls_acc: dict[int, dict[str, str]] = {}
|
tool_calls_acc: dict[int, dict[str, str]] = {}
|
||||||
|
|
||||||
|
final_usage: dict[str, int] = {}
|
||||||
|
|
||||||
async for chunk in stream:
|
async for chunk in stream:
|
||||||
|
# With include_usage, the last chunk has usage but no choices
|
||||||
|
if chunk.usage:
|
||||||
|
final_usage = {
|
||||||
|
"input_tokens": chunk.usage.prompt_tokens or 0,
|
||||||
|
"output_tokens": chunk.usage.completion_tokens or 0,
|
||||||
|
}
|
||||||
|
|
||||||
choice = chunk.choices[0] if chunk.choices else None
|
choice = chunk.choices[0] if chunk.choices else None
|
||||||
if not choice:
|
if not choice:
|
||||||
|
# Usage-only chunk (last one with include_usage) — emit it
|
||||||
|
if final_usage:
|
||||||
|
yield StreamChunk(usage=final_usage)
|
||||||
|
final_usage = {} # Only emit once
|
||||||
continue
|
continue
|
||||||
|
|
||||||
delta = choice.delta
|
delta = choice.delta
|
||||||
@@ -99,16 +113,15 @@ class OpenAIAdapter(ModelAdapter):
|
|||||||
tool_arguments=acc["arguments"],
|
tool_arguments=acc["arguments"],
|
||||||
finish_reason="tool_use",
|
finish_reason="tool_use",
|
||||||
)
|
)
|
||||||
|
# Emit usage after tool_use chunks
|
||||||
|
if final_usage:
|
||||||
|
yield StreamChunk(usage=final_usage)
|
||||||
else:
|
else:
|
||||||
yield StreamChunk(
|
yield StreamChunk(
|
||||||
finish_reason="end_turn"
|
finish_reason="end_turn"
|
||||||
if choice.finish_reason == "stop"
|
if choice.finish_reason == "stop"
|
||||||
else choice.finish_reason,
|
else choice.finish_reason,
|
||||||
usage={
|
usage=final_usage,
|
||||||
"output_tokens": chunk.usage.completion_tokens
|
|
||||||
if chunk.usage
|
|
||||||
else 0
|
|
||||||
},
|
|
||||||
)
|
)
|
||||||
|
|
||||||
# ------------------------------------------------------------------
|
# ------------------------------------------------------------------
|
||||||
|
|||||||
@@ -48,6 +48,10 @@ class Settings(BaseSettings):
|
|||||||
mcp_timeout_seconds: float = 30.0
|
mcp_timeout_seconds: float = 30.0
|
||||||
mcp_startup_timeout_seconds: float = 10.0
|
mcp_startup_timeout_seconds: float = 10.0
|
||||||
|
|
||||||
|
# --- Pricing (per 1M tokens) ---
|
||||||
|
cost_per_1m_input: float = 2.50
|
||||||
|
cost_per_1m_output: float = 15.00
|
||||||
|
|
||||||
# --- Orchestrator ---
|
# --- Orchestrator ---
|
||||||
max_execution_steps: int = 25
|
max_execution_steps: int = 25
|
||||||
subagent_max_steps: int = 10
|
subagent_max_steps: int = 10
|
||||||
|
|||||||
@@ -62,10 +62,15 @@ class ContextEngine:
|
|||||||
session: SessionState,
|
session: SessionState,
|
||||||
agent: AgentProfile,
|
agent: AgentProfile,
|
||||||
artifacts: list[ArtifactSummary] | None = None,
|
artifacts: list[ArtifactSummary] | None = None,
|
||||||
working_items: list[dict[str, Any]] | None = None,
|
conversation: list[dict[str, Any]] | None = None,
|
||||||
extra_instructions: str = "",
|
extra_instructions: str = "",
|
||||||
) -> ContextPackage:
|
) -> ContextPackage:
|
||||||
"""Build a full ContextPackage for the given agent and session."""
|
"""Build a full ContextPackage for the given agent and session.
|
||||||
|
|
||||||
|
The conversation parameter contains real assistant/tool messages
|
||||||
|
with complete tool results. These go into the messages array,
|
||||||
|
not the system prompt — like professional agentic tools.
|
||||||
|
"""
|
||||||
|
|
||||||
sections: list[ContextSection] = []
|
sections: list[ContextSection] = []
|
||||||
allowed = set(agent.context_sections)
|
allowed = set(agent.context_sections)
|
||||||
@@ -88,28 +93,18 @@ class ContextEngine:
|
|||||||
if "task_state" in allowed and session.task_history:
|
if "task_state" in allowed and session.task_history:
|
||||||
sections.append(self._build_task_history(session))
|
sections.append(self._build_task_history(session))
|
||||||
|
|
||||||
# 5. Task state — current task
|
# 5. Task state — current task (includes compacted previous steps)
|
||||||
if "task_state" in allowed and session.current_task:
|
if "task_state" in allowed and session.current_task:
|
||||||
sections.append(self._build_task_state(session.current_task))
|
sections.append(self._build_task_state(session.current_task))
|
||||||
|
|
||||||
# 6. Artifact memory — summarised, never raw (only current task's)
|
|
||||||
if "artifact_memory" in allowed and artifacts:
|
|
||||||
sections.append(self._build_artifact_memory(artifacts))
|
|
||||||
|
|
||||||
# 6. Working context — recent relevant items
|
|
||||||
if "working_context" in allowed:
|
|
||||||
sections.append(
|
|
||||||
self._build_working_context(working_items or [], extra_instructions)
|
|
||||||
)
|
|
||||||
|
|
||||||
# Compact to fit budget
|
# Compact to fit budget
|
||||||
sections = self.compactor.compact_sections(sections)
|
sections = self.compactor.compact_sections(sections)
|
||||||
|
|
||||||
# Assemble system prompt from sections
|
# Assemble system prompt from sections
|
||||||
system_prompt = self._assemble_system_prompt(sections)
|
system_prompt = self._assemble_system_prompt(sections)
|
||||||
|
|
||||||
# Build messages (just user message — no chat history)
|
# Build messages with real conversation history
|
||||||
messages = self._build_messages(session)
|
messages = self._build_messages(session, conversation)
|
||||||
|
|
||||||
total_tokens = estimate_tokens(system_prompt) + sum(
|
total_tokens = estimate_tokens(system_prompt) + sum(
|
||||||
estimate_tokens(m.get("content", "")) for m in messages
|
estimate_tokens(m.get("content", "")) for m in messages
|
||||||
@@ -133,6 +128,7 @@ class ContextEngine:
|
|||||||
"preview": s.content[:150].replace("\n", " "),
|
"preview": s.content[:150].replace("\n", " "),
|
||||||
})
|
})
|
||||||
|
|
||||||
|
conv_len = len(conversation) if conversation else 0
|
||||||
debug_entry = {
|
debug_entry = {
|
||||||
"timestamp": time.time(),
|
"timestamp": time.time(),
|
||||||
"agent": agent.role.value,
|
"agent": agent.role.value,
|
||||||
@@ -144,7 +140,7 @@ class ContextEngine:
|
|||||||
"system_prompt_tokens": estimate_tokens(system_prompt),
|
"system_prompt_tokens": estimate_tokens(system_prompt),
|
||||||
"user_message_preview": messages[0]["content"][:200] if messages else "",
|
"user_message_preview": messages[0]["content"][:200] if messages else "",
|
||||||
"artifacts_count": len(artifacts) if artifacts else 0,
|
"artifacts_count": len(artifacts) if artifacts else 0,
|
||||||
"working_items_count": len(working_items) if working_items else 0,
|
"conversation_messages": conv_len,
|
||||||
}
|
}
|
||||||
|
|
||||||
history = self._history[session.session_id]
|
history = self._history[session.session_id]
|
||||||
@@ -153,18 +149,13 @@ class ContextEngine:
|
|||||||
self._history[session.session_id] = history[-self._max_history:]
|
self._history[session.session_id] = history[-self._max_history:]
|
||||||
|
|
||||||
logger.info(
|
logger.info(
|
||||||
"Context built for [%s/%s] — %d sections, ~%d tokens, artifacts=%d, working_items=%d",
|
"Context built for [%s/%s] — %d sections, ~%d tokens, artifacts=%d, conversation=%d msgs",
|
||||||
session.session_id[:8],
|
session.session_id[:8],
|
||||||
agent.role.value,
|
agent.role.value,
|
||||||
len(sections),
|
len(sections),
|
||||||
total_tokens,
|
total_tokens,
|
||||||
len(artifacts) if artifacts else 0,
|
len(artifacts) if artifacts else 0,
|
||||||
len(working_items) if working_items else 0,
|
conv_len,
|
||||||
)
|
|
||||||
for s in section_summary:
|
|
||||||
logger.debug(
|
|
||||||
" Section [%s] prio=%d tokens=%d chars=%d",
|
|
||||||
s["type"], s["priority"], s["tokens"], s["chars"],
|
|
||||||
)
|
)
|
||||||
|
|
||||||
return package
|
return package
|
||||||
@@ -236,10 +227,11 @@ class ContextEngine:
|
|||||||
[
|
[
|
||||||
"",
|
"",
|
||||||
"## Contrato de Contexto",
|
"## Contrato de Contexto",
|
||||||
"- NUNCA recibirás salidas crudas de herramientas en tu contexto.",
|
"- Los resultados de herramientas se incluyen completos en la conversación.",
|
||||||
"- Los resultados de herramientas se resumen como artefactos.",
|
"- Los steps anteriores pueden estar compactados como resúmenes.",
|
||||||
"- Solicita rehidratación si necesitas el contenido completo.",
|
|
||||||
"- Mantén las respuestas enfocadas en el paso actual.",
|
"- Mantén las respuestas enfocadas en el paso actual.",
|
||||||
|
"- Si ya tienes la información necesaria, genera tu respuesta final.",
|
||||||
|
"- NO repitas llamadas a herramientas con los mismos argumentos.",
|
||||||
"- Responde SIEMPRE en español.",
|
"- Responde SIEMPRE en español.",
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
@@ -406,6 +398,20 @@ class ContextEngine:
|
|||||||
lines.append(f" Result: {summary}")
|
lines.append(f" Result: {summary}")
|
||||||
if facts:
|
if facts:
|
||||||
lines.append(f" Facts: {'; '.join(facts[:5])}")
|
lines.append(f" Facts: {'; '.join(facts[:5])}")
|
||||||
|
# Key structured data (recordNums, sectionIds, etc.)
|
||||||
|
key_data = entry.get("key_data", {})
|
||||||
|
if key_data:
|
||||||
|
kd_parts = []
|
||||||
|
for table, nums in key_data.get("tables", {}).items():
|
||||||
|
kd_parts.append(f"{table}: records {nums}")
|
||||||
|
for page, num in key_data.get("pages", {}).items():
|
||||||
|
kd_parts.append(f"page '{page}' = record {num}")
|
||||||
|
if key_data.get("sections"):
|
||||||
|
kd_parts.append(f"sections: {key_data['sections']}")
|
||||||
|
if key_data.get("modules"):
|
||||||
|
kd_parts.append(f"modules: {key_data['modules']}")
|
||||||
|
if kd_parts:
|
||||||
|
lines.append(f" Key data: {'; '.join(kd_parts)}")
|
||||||
review = entry.get("review", "")
|
review = entry.get("review", "")
|
||||||
if review:
|
if review:
|
||||||
lines.append(f" Review: {review[:100]}")
|
lines.append(f" Review: {review[:100]}")
|
||||||
@@ -451,6 +457,14 @@ class ContextEngine:
|
|||||||
for c in task.constraints:
|
for c in task.constraints:
|
||||||
lines.append(f"- {c}")
|
lines.append(f"- {c}")
|
||||||
|
|
||||||
|
# Show compacted previous steps results
|
||||||
|
compacted_steps = [s for s in task.plan if s.compacted and s.result_summary]
|
||||||
|
if compacted_steps:
|
||||||
|
lines.append("")
|
||||||
|
lines.append("## Previous Steps (compacted)")
|
||||||
|
for step in compacted_steps:
|
||||||
|
lines.append(f"- [{step.agent_role}] {step.description}: {step.result_summary[:300]}")
|
||||||
|
|
||||||
# Show plan overview (compact)
|
# Show plan overview (compact)
|
||||||
if task.plan:
|
if task.plan:
|
||||||
lines.append("")
|
lines.append("")
|
||||||
@@ -458,8 +472,9 @@ class ContextEngine:
|
|||||||
for i, step in enumerate(task.plan):
|
for i, step in enumerate(task.plan):
|
||||||
marker = "→" if i == task.current_step_index else "·"
|
marker = "→" if i == task.current_step_index else "·"
|
||||||
status_label = step.status.value
|
status_label = step.status.value
|
||||||
|
compacted_label = " (compacted)" if step.compacted else ""
|
||||||
lines.append(
|
lines.append(
|
||||||
f" {marker} Step {i + 1} [{status_label}]: {step.description}"
|
f" {marker} Step {i + 1} [{status_label}{compacted_label}]: {step.description}"
|
||||||
)
|
)
|
||||||
|
|
||||||
content = "\n".join(lines)
|
content = "\n".join(lines)
|
||||||
@@ -483,26 +498,6 @@ class ContextEngine:
|
|||||||
token_estimate=estimate_tokens(content),
|
token_estimate=estimate_tokens(content),
|
||||||
)
|
)
|
||||||
|
|
||||||
def _build_working_context(
|
|
||||||
self,
|
|
||||||
items: list[dict[str, Any]],
|
|
||||||
extra_instructions: str,
|
|
||||||
) -> ContextSection:
|
|
||||||
lines = ["# Working Context"]
|
|
||||||
if extra_instructions:
|
|
||||||
lines.append(f"\n{extra_instructions}")
|
|
||||||
for item in items[: settings.working_context_max_items]:
|
|
||||||
role = item.get("role", "info")
|
|
||||||
content_val = item.get("content", "")
|
|
||||||
lines.append(f"[{role}] {content_val}")
|
|
||||||
content = "\n".join(lines)
|
|
||||||
return ContextSection(
|
|
||||||
section_type=ContextSectionType.WORKING_CONTEXT,
|
|
||||||
content=content,
|
|
||||||
priority=30,
|
|
||||||
token_estimate=estimate_tokens(content),
|
|
||||||
)
|
|
||||||
|
|
||||||
# ------------------------------------------------------------------
|
# ------------------------------------------------------------------
|
||||||
# Assembly
|
# Assembly
|
||||||
# ------------------------------------------------------------------
|
# ------------------------------------------------------------------
|
||||||
@@ -510,14 +505,11 @@ class ContextEngine:
|
|||||||
def _assemble_system_prompt(self, sections: list[ContextSection]) -> str:
|
def _assemble_system_prompt(self, sections: list[ContextSection]) -> str:
|
||||||
"""Combine sections into a single system prompt string."""
|
"""Combine sections into a single system prompt string."""
|
||||||
parts: list[str] = []
|
parts: list[str] = []
|
||||||
# Order: rules → profile → task → artifacts → working
|
|
||||||
order = [
|
order = [
|
||||||
ContextSectionType.IMMUTABLE_RULES,
|
ContextSectionType.IMMUTABLE_RULES,
|
||||||
ContextSectionType.PROJECT_PROFILE,
|
ContextSectionType.PROJECT_PROFILE,
|
||||||
ContextSectionType.KNOWLEDGE_BASE,
|
ContextSectionType.KNOWLEDGE_BASE,
|
||||||
ContextSectionType.TASK_STATE,
|
ContextSectionType.TASK_STATE,
|
||||||
ContextSectionType.ARTIFACT_MEMORY,
|
|
||||||
ContextSectionType.WORKING_CONTEXT,
|
|
||||||
]
|
]
|
||||||
section_map: dict[ContextSectionType, ContextSection] = {
|
section_map: dict[ContextSectionType, ContextSection] = {
|
||||||
s.section_type: s for s in sections
|
s.section_type: s for s in sections
|
||||||
@@ -527,11 +519,15 @@ class ContextEngine:
|
|||||||
parts.append(section_map[st].content)
|
parts.append(section_map[st].content)
|
||||||
return "\n\n---\n\n".join(parts)
|
return "\n\n---\n\n".join(parts)
|
||||||
|
|
||||||
def _build_messages(self, session: SessionState) -> list[dict[str, Any]]:
|
def _build_messages(
|
||||||
"""Build the messages array. We do NOT include chat history.
|
self,
|
||||||
|
session: SessionState,
|
||||||
|
conversation: list[dict[str, Any]] | None = None,
|
||||||
|
) -> list[dict[str, Any]]:
|
||||||
|
"""Build the messages array with real conversation history.
|
||||||
|
|
||||||
The user message is the current task objective (or a sentinel
|
Includes the user objective message followed by the full
|
||||||
if no task is active).
|
assistant/tool conversation — like professional agentic tools.
|
||||||
"""
|
"""
|
||||||
if session.current_task:
|
if session.current_task:
|
||||||
step = session.current_task.current_step()
|
step = session.current_task.current_step()
|
||||||
@@ -545,4 +541,10 @@ class ContextEngine:
|
|||||||
else:
|
else:
|
||||||
user_content = "Awaiting task assignment."
|
user_content = "Awaiting task assignment."
|
||||||
|
|
||||||
return [{"role": "user", "content": user_content}]
|
messages: list[dict[str, Any]] = [{"role": "user", "content": user_content}]
|
||||||
|
|
||||||
|
# Append real conversation (assistant messages + tool results)
|
||||||
|
if conversation:
|
||||||
|
messages.extend(conversation)
|
||||||
|
|
||||||
|
return messages
|
||||||
|
|||||||
@@ -74,6 +74,7 @@ class MCPClient:
|
|||||||
stdout=asyncio.subprocess.PIPE,
|
stdout=asyncio.subprocess.PIPE,
|
||||||
stderr=asyncio.subprocess.PIPE,
|
stderr=asyncio.subprocess.PIPE,
|
||||||
env=self._env,
|
env=self._env,
|
||||||
|
limit=1024 * 1024, # 1MB buffer for large MCP responses
|
||||||
)
|
)
|
||||||
self._running = True
|
self._running = True
|
||||||
self._reader_task = asyncio.create_task(self._read_loop())
|
self._reader_task = asyncio.create_task(self._read_loop())
|
||||||
|
|||||||
@@ -36,6 +36,7 @@ class TaskStep(BaseModel):
|
|||||||
status: TaskStatus = TaskStatus.PENDING
|
status: TaskStatus = TaskStatus.PENDING
|
||||||
result_summary: str = ""
|
result_summary: str = ""
|
||||||
tools_used: list[str] = Field(default_factory=list)
|
tools_used: list[str] = Field(default_factory=list)
|
||||||
|
compacted: bool = False # True when step results have been compacted
|
||||||
started_at: datetime | None = None
|
started_at: datetime | None = None
|
||||||
completed_at: datetime | None = None
|
completed_at: datetime | None = None
|
||||||
|
|
||||||
|
|||||||
@@ -33,7 +33,8 @@ class ToolExecution(BaseModel):
|
|||||||
tool_name: str
|
tool_name: str
|
||||||
arguments: dict[str, Any] = Field(default_factory=dict)
|
arguments: dict[str, Any] = Field(default_factory=dict)
|
||||||
status: ToolExecutionStatus = ToolExecutionStatus.PENDING
|
status: ToolExecutionStatus = ToolExecutionStatus.PENDING
|
||||||
result_summary: str = "" # Summarised result — raw output is NEVER stored here
|
result_summary: str = "" # Summarised result for artifacts and compacted history
|
||||||
|
raw_output: str = "" # Truncated raw output for conversation messages
|
||||||
error: str = ""
|
error: str = ""
|
||||||
duration_ms: float = 0.0
|
duration_ms: float = 0.0
|
||||||
started_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
|
started_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
|
||||||
|
|||||||
@@ -2,6 +2,7 @@
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
import json
|
import json
|
||||||
import logging
|
import logging
|
||||||
import time
|
import time
|
||||||
@@ -47,6 +48,10 @@ class BaseAgent:
|
|||||||
) -> dict[str, Any]:
|
) -> dict[str, Any]:
|
||||||
"""Run the agent's execution loop.
|
"""Run the agent's execution loop.
|
||||||
|
|
||||||
|
Uses real conversation messages with complete tool results,
|
||||||
|
like professional agentic tools (Claude Code, Cursor).
|
||||||
|
Compaction happens at the step level, not per tool result.
|
||||||
|
|
||||||
Returns a result dict with keys: content, artifacts, tool_executions.
|
Returns a result dict with keys: content, artifacts, tool_executions.
|
||||||
"""
|
"""
|
||||||
artifacts: list[ArtifactSummary] = await self.memory.list_artifacts(
|
artifacts: list[ArtifactSummary] = await self.memory.list_artifacts(
|
||||||
@@ -54,15 +59,20 @@ class BaseAgent:
|
|||||||
)
|
)
|
||||||
tool_executions: list[ToolExecution] = []
|
tool_executions: list[ToolExecution] = []
|
||||||
accumulated_content = ""
|
accumulated_content = ""
|
||||||
working_items: list[dict[str, Any]] = []
|
total_input_tokens = 0
|
||||||
|
total_output_tokens = 0
|
||||||
|
# Real conversation history: assistant messages + tool results
|
||||||
|
conversation: list[dict[str, Any]] = []
|
||||||
|
tool_fingerprints: dict[str, ToolExecution] = {}
|
||||||
|
all_duplicates_streak = 0 # consecutive steps where ALL calls are duplicates
|
||||||
|
|
||||||
for step in range(max_steps):
|
for step in range(max_steps):
|
||||||
# Build context — NEVER includes raw tool output
|
# Build context with real conversation
|
||||||
ctx = await self.context.build_context(
|
ctx = await self.context.build_context(
|
||||||
session=session,
|
session=session,
|
||||||
agent=self.profile,
|
agent=self.profile,
|
||||||
artifacts=artifacts,
|
artifacts=artifacts,
|
||||||
working_items=working_items,
|
conversation=conversation,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Prepare tool definitions
|
# Prepare tool definitions
|
||||||
@@ -77,7 +87,7 @@ class BaseAgent:
|
|||||||
|
|
||||||
full_text = ""
|
full_text = ""
|
||||||
tool_calls: list[dict[str, Any]] = []
|
tool_calls: list[dict[str, Any]] = []
|
||||||
current_tool: dict[str, Any] | None = None
|
active_tools: dict[str, dict[str, Any]] = {}
|
||||||
|
|
||||||
async for chunk in self.model.stream(
|
async for chunk in self.model.stream(
|
||||||
messages=ctx.to_messages(),
|
messages=ctx.to_messages(),
|
||||||
@@ -96,8 +106,9 @@ class BaseAgent:
|
|||||||
session_id=session.session_id,
|
session_id=session.session_id,
|
||||||
)
|
)
|
||||||
|
|
||||||
if chunk.tool_name and (current_tool is None or not current_tool.get("name")):
|
if chunk.tool_name and chunk.tool_call_id:
|
||||||
current_tool = {
|
if chunk.tool_call_id not in active_tools:
|
||||||
|
active_tools[chunk.tool_call_id] = {
|
||||||
"id": chunk.tool_call_id,
|
"id": chunk.tool_call_id,
|
||||||
"name": chunk.tool_name,
|
"name": chunk.tool_name,
|
||||||
"arguments": "",
|
"arguments": "",
|
||||||
@@ -108,23 +119,32 @@ class BaseAgent:
|
|||||||
session_id=session.session_id,
|
session_id=session.session_id,
|
||||||
)
|
)
|
||||||
|
|
||||||
if chunk.tool_arguments and current_tool is not None and not chunk.finish_reason:
|
if chunk.tool_arguments and chunk.tool_call_id and not chunk.finish_reason:
|
||||||
# Accumulate partial argument chunks (NOT the final one)
|
tool = active_tools.get(chunk.tool_call_id)
|
||||||
current_tool["arguments"] += chunk.tool_arguments
|
if tool:
|
||||||
|
tool["arguments"] += chunk.tool_arguments
|
||||||
|
|
||||||
if chunk.finish_reason == "tool_use" and current_tool is not None and current_tool.get("name"):
|
if chunk.finish_reason == "tool_use" and chunk.tool_call_id:
|
||||||
# Final chunk carries complete arguments — use those if
|
tool = active_tools.pop(chunk.tool_call_id, None)
|
||||||
# partial accumulation is empty, otherwise use accumulated
|
if not tool:
|
||||||
final_args = current_tool["arguments"] or chunk.tool_arguments or ""
|
tool = {
|
||||||
|
"id": chunk.tool_call_id,
|
||||||
|
"name": chunk.tool_name or "",
|
||||||
|
"arguments": "",
|
||||||
|
}
|
||||||
|
final_args = tool["arguments"] or chunk.tool_arguments or ""
|
||||||
try:
|
try:
|
||||||
args = json.loads(final_args) if final_args else {}
|
args = json.loads(final_args) if final_args else {}
|
||||||
except json.JSONDecodeError:
|
except json.JSONDecodeError:
|
||||||
logger.warning("Failed to parse tool args: %s", final_args[:200])
|
logger.warning("Failed to parse tool args: %s", final_args[:200])
|
||||||
args = {}
|
args = {}
|
||||||
current_tool["parsed_arguments"] = args
|
tool["parsed_arguments"] = args
|
||||||
logger.debug("Tool call finalized: %s args=%s", current_tool["name"], json.dumps(args)[:200])
|
tool_calls.append(tool)
|
||||||
tool_calls.append(current_tool)
|
|
||||||
current_tool = None
|
# Accumulate token usage from any chunk that has it
|
||||||
|
if chunk.usage:
|
||||||
|
total_input_tokens += chunk.usage.get("input_tokens", 0)
|
||||||
|
total_output_tokens += chunk.usage.get("output_tokens", 0)
|
||||||
|
|
||||||
if chunk.finish_reason == "end_turn":
|
if chunk.finish_reason == "end_turn":
|
||||||
break
|
break
|
||||||
@@ -133,28 +153,98 @@ class BaseAgent:
|
|||||||
|
|
||||||
# If no tool calls, we're done
|
# If no tool calls, we're done
|
||||||
if not tool_calls:
|
if not tool_calls:
|
||||||
|
# Add final assistant message to conversation
|
||||||
|
if full_text:
|
||||||
|
conversation.append({"role": "assistant", "content": full_text})
|
||||||
break
|
break
|
||||||
|
|
||||||
# Execute tool calls
|
# Add assistant message with tool calls to conversation
|
||||||
|
# (OpenAI format: assistant message carries tool_calls)
|
||||||
|
assistant_msg: dict[str, Any] = {"role": "assistant"}
|
||||||
|
if full_text:
|
||||||
|
assistant_msg["content"] = full_text
|
||||||
|
assistant_msg["tool_calls"] = [
|
||||||
|
{
|
||||||
|
"id": tc["id"],
|
||||||
|
"type": "function",
|
||||||
|
"function": {
|
||||||
|
"name": tc["name"],
|
||||||
|
"arguments": json.dumps(tc.get("parsed_arguments", {})),
|
||||||
|
},
|
||||||
|
}
|
||||||
|
for tc in tool_calls
|
||||||
|
]
|
||||||
|
conversation.append(assistant_msg)
|
||||||
|
|
||||||
|
# Execute tool calls and add COMPLETE results to conversation
|
||||||
|
duplicates_this_step = 0
|
||||||
for tc in tool_calls:
|
for tc in tool_calls:
|
||||||
|
fp_raw = f"{tc['name']}:{json.dumps(tc.get('parsed_arguments', {}), sort_keys=True)}"
|
||||||
|
fp = hashlib.md5(fp_raw.encode()).hexdigest()
|
||||||
|
|
||||||
|
if fp in tool_fingerprints:
|
||||||
|
prev_exec = tool_fingerprints[fp]
|
||||||
|
tool_executions.append(prev_exec)
|
||||||
|
duplicates_this_step += 1
|
||||||
|
# Return cached result as tool message
|
||||||
|
conversation.append({
|
||||||
|
"role": "tool",
|
||||||
|
"tool_call_id": tc["id"],
|
||||||
|
"content": f"[DUPLICADO] Ya ejecutada con mismos argumentos. Resultado: {prev_exec.raw_output[:2000]}",
|
||||||
|
})
|
||||||
|
logger.warning("Duplicate tool call skipped: %s (fingerprint: %s)", tc["name"], fp[:8])
|
||||||
|
continue
|
||||||
|
|
||||||
tool_exec = await self._execute_tool(
|
tool_exec = await self._execute_tool(
|
||||||
session=session,
|
session=session,
|
||||||
tool_name=tc["name"],
|
tool_name=tc["name"],
|
||||||
arguments=tc.get("parsed_arguments", {}),
|
arguments=tc.get("parsed_arguments", {}),
|
||||||
artifacts=artifacts,
|
artifacts=artifacts,
|
||||||
)
|
)
|
||||||
|
tool_fingerprints[fp] = tool_exec
|
||||||
tool_executions.append(tool_exec)
|
tool_executions.append(tool_exec)
|
||||||
|
|
||||||
# Add summarised result to working context (NEVER raw)
|
# COMPLETE result in conversation (truncated to safe limit)
|
||||||
working_items.append({
|
conversation.append({
|
||||||
"role": "tool_result",
|
"role": "tool",
|
||||||
"content": f"[{tc['name']}] {tool_exec.result_summary}",
|
"tool_call_id": tc["id"],
|
||||||
|
"content": tool_exec.raw_output[:8000] if tool_exec.raw_output else tool_exec.result_summary,
|
||||||
})
|
})
|
||||||
|
|
||||||
|
# Loop detection: if ALL tool calls in this step were duplicates
|
||||||
|
if duplicates_this_step == len(tool_calls):
|
||||||
|
all_duplicates_streak += 1
|
||||||
|
if all_duplicates_streak >= 2:
|
||||||
|
logger.warning("Loop detected: %d consecutive steps with all duplicate calls. Breaking.", all_duplicates_streak)
|
||||||
|
conversation.append({
|
||||||
|
"role": "user",
|
||||||
|
"content": "[SISTEMA] Se detectaron llamadas repetidas. Ya tienes toda la información necesaria. Genera tu respuesta final ahora.",
|
||||||
|
})
|
||||||
|
# One more chance to generate a final response
|
||||||
|
ctx = await self.context.build_context(
|
||||||
|
session=session, agent=self.profile,
|
||||||
|
artifacts=artifacts, conversation=conversation,
|
||||||
|
)
|
||||||
|
async for chunk in self.model.stream(
|
||||||
|
messages=ctx.to_messages(),
|
||||||
|
config=config,
|
||||||
|
):
|
||||||
|
if chunk.delta:
|
||||||
|
accumulated_content += chunk.delta
|
||||||
|
if chunk.finish_reason:
|
||||||
|
break
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
all_duplicates_streak = 0
|
||||||
|
|
||||||
return {
|
return {
|
||||||
"content": accumulated_content,
|
"content": accumulated_content,
|
||||||
"artifacts": artifacts,
|
"artifacts": artifacts,
|
||||||
"tool_executions": tool_executions,
|
"tool_executions": tool_executions,
|
||||||
|
"usage": {
|
||||||
|
"input_tokens": total_input_tokens,
|
||||||
|
"output_tokens": total_output_tokens,
|
||||||
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
async def _execute_tool(
|
async def _execute_tool(
|
||||||
@@ -200,6 +290,7 @@ class BaseAgent:
|
|||||||
|
|
||||||
tool_exec.status = ToolExecutionStatus.COMPLETED
|
tool_exec.status = ToolExecutionStatus.COMPLETED
|
||||||
tool_exec.result_summary = artifact.summary
|
tool_exec.result_summary = artifact.summary
|
||||||
|
tool_exec.raw_output = raw_output[:8000]
|
||||||
tool_exec.duration_ms = duration
|
tool_exec.duration_ms = duration
|
||||||
|
|
||||||
await self.sse.emit(
|
await self.sse.emit(
|
||||||
|
|||||||
@@ -5,7 +5,7 @@ from __future__ import annotations
|
|||||||
from ...models.agent import AgentProfile, AgentRole
|
from ...models.agent import AgentProfile, AgentRole
|
||||||
from .base import BaseAgent
|
from .base import BaseAgent
|
||||||
|
|
||||||
CODER_SYSTEM_PROMPT = """Eres un Agente Programador. Tu rol es ejecutar tareas de implementación usando las herramientas disponibles.
|
CODER_SYSTEM_PROMPT = """Eres un Agente Programador de Acai CMS. Tu rol es ejecutar tareas de implementación usando las herramientas MCP disponibles.
|
||||||
|
|
||||||
## Instrucciones
|
## Instrucciones
|
||||||
- Concéntrate en la descripción del paso actual.
|
- Concéntrate en la descripción del paso actual.
|
||||||
@@ -16,9 +16,17 @@ CODER_SYSTEM_PROMPT = """Eres un Agente Programador. Tu rol es ejecutar tareas d
|
|||||||
- Responde SIEMPRE en español.
|
- Responde SIEMPRE en español.
|
||||||
|
|
||||||
## Uso de herramientas
|
## Uso de herramientas
|
||||||
- Usa herramientas cuando necesites leer archivos, escribir código o ejecutar comandos.
|
- CONSULTA la Knowledge Base ANTES de actuar — tiene la referencia completa de tools y flujos de trabajo.
|
||||||
- Los resultados de herramientas se te presentarán resumidos — no verás la salida cruda.
|
- Para CREAR/EDITAR MÓDULOS usa `acai_write` sobre `template/estandar/modulos/NOMBRE/index-base.tpl`. El server crea la carpeta si no existe, compila y genera todos los archivos derivados automáticamente. NO necesitas compile_module.
|
||||||
- Si necesitas más detalle de un resultado, solicita rehidratación.
|
- `create_module` es legacy — funciona pero `acai_write` es el flujo estándar.
|
||||||
|
- Para GESTIONAR REGISTROS de tablas (apartados, travesias, etc.) usa `create_or_update_record`.
|
||||||
|
- Flujo de módulo nuevo: acai_write index-base.tpl → add_module_to_record → set_module_config_vars.
|
||||||
|
- tableName siempre SIN prefijo cms_ (ej: apartados, NO cms_apartados).
|
||||||
|
- La primary key es siempre `num`, nunca `id`.
|
||||||
|
|
||||||
|
## Datos del historial
|
||||||
|
- Si el historial de sesión incluye Key Data con recordNums o sectionIds, ÚSALOS directamente sin re-consultar.
|
||||||
|
- Ejemplo: si el historial dice "pages: Inicio = record 2", usa recordNum=2 para la portada.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
|
||||||
@@ -35,8 +43,6 @@ def create_coder_profile() -> AgentProfile:
|
|||||||
"project_profile",
|
"project_profile",
|
||||||
"knowledge_base",
|
"knowledge_base",
|
||||||
"task_state",
|
"task_state",
|
||||||
"artifact_memory",
|
|
||||||
"working_context",
|
|
||||||
],
|
],
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|||||||
@@ -55,9 +55,10 @@ def create_planner_profile() -> AgentProfile:
|
|||||||
class PlannerAgent(BaseAgent):
|
class PlannerAgent(BaseAgent):
|
||||||
"""Generates execution plans from objectives."""
|
"""Generates execution plans from objectives."""
|
||||||
|
|
||||||
async def plan(self, session: SessionState) -> list[TaskStep]:
|
async def plan(self, session: SessionState) -> tuple[list[TaskStep], dict[str, int]]:
|
||||||
"""Generate a plan and return TaskSteps."""
|
"""Generate a plan and return (TaskSteps, usage)."""
|
||||||
result = await self.execute(session, max_steps=1)
|
result = await self.execute(session, max_steps=1)
|
||||||
|
usage = result.get("usage", {"input_tokens": 0, "output_tokens": 0})
|
||||||
content = result["content"].strip()
|
content = result["content"].strip()
|
||||||
|
|
||||||
# Parse the JSON plan from the model output
|
# Parse the JSON plan from the model output
|
||||||
@@ -92,7 +93,7 @@ class PlannerAgent(BaseAgent):
|
|||||||
parsed.get("facts", [])
|
parsed.get("facts", [])
|
||||||
)
|
)
|
||||||
|
|
||||||
return steps
|
return steps, usage
|
||||||
|
|
||||||
except (json.JSONDecodeError, KeyError) as e:
|
except (json.JSONDecodeError, KeyError) as e:
|
||||||
logger.warning("Failed to parse planner output: %s", e)
|
logger.warning("Failed to parse planner output: %s", e)
|
||||||
@@ -104,4 +105,4 @@ class PlannerAgent(BaseAgent):
|
|||||||
else "Execute task",
|
else "Execute task",
|
||||||
agent_role="coder",
|
agent_role="coder",
|
||||||
)
|
)
|
||||||
]
|
], usage
|
||||||
|
|||||||
@@ -16,7 +16,7 @@ from ..context.engine import ContextEngine
|
|||||||
from ..mcp.manager import MCPManager
|
from ..mcp.manager import MCPManager
|
||||||
from ..memory.store import MemoryStore
|
from ..memory.store import MemoryStore
|
||||||
from ..models.agent import AgentRole
|
from ..models.agent import AgentRole
|
||||||
from ..models.session import SessionState, SessionStatus, TaskStatus
|
from ..models.session import SessionState, SessionStatus, TaskState, TaskStatus
|
||||||
from ..streaming.sse import SSEEmitter, EventType
|
from ..streaming.sse import SSEEmitter, EventType
|
||||||
from .agents.coder import CoderAgent, create_coder_profile
|
from .agents.coder import CoderAgent, create_coder_profile
|
||||||
from .agents.collector import CollectorAgent, create_collector_profile
|
from .agents.collector import CollectorAgent, create_collector_profile
|
||||||
@@ -115,9 +115,10 @@ class OrchestratorEngine:
|
|||||||
|
|
||||||
# 2. Plan
|
# 2. Plan
|
||||||
task.status = TaskStatus.PLANNING
|
task.status = TaskStatus.PLANNING
|
||||||
|
planner_usage: dict[str, int] = {"input_tokens": 0, "output_tokens": 0}
|
||||||
try:
|
try:
|
||||||
planner = self._create_agent(AgentRole.PLANNER)
|
planner = self._create_agent(AgentRole.PLANNER)
|
||||||
plan_steps = await planner.plan(session)
|
plan_steps, planner_usage = await planner.plan(session)
|
||||||
task.plan = plan_steps
|
task.plan = plan_steps
|
||||||
task.status = TaskStatus.EXECUTING
|
task.status = TaskStatus.EXECUTING
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
@@ -181,6 +182,10 @@ class OrchestratorEngine:
|
|||||||
for artifact in step_result.get("artifacts", []):
|
for artifact in step_result.get("artifacts", []):
|
||||||
task.facts_extracted.extend(artifact.facts[:5])
|
task.facts_extracted.extend(artifact.facts[:5])
|
||||||
|
|
||||||
|
# Decide if previous steps should be compacted
|
||||||
|
if i > 0:
|
||||||
|
self._maybe_compact_previous_steps(task, current_index=i)
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error("Step %d failed: %s", i + 1, e)
|
logger.error("Step %d failed: %s", i + 1, e)
|
||||||
step.status = TaskStatus.FAILED
|
step.status = TaskStatus.FAILED
|
||||||
@@ -230,6 +235,21 @@ class OrchestratorEngine:
|
|||||||
session_id=session.session_id,
|
session_id=session.session_id,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# Accumulate token usage: planner + all steps + review
|
||||||
|
total_input = planner_usage.get("input_tokens", 0)
|
||||||
|
total_output = planner_usage.get("output_tokens", 0)
|
||||||
|
for r in results:
|
||||||
|
total_input += r.get("usage", {}).get("input_tokens", 0)
|
||||||
|
total_output += r.get("usage", {}).get("output_tokens", 0)
|
||||||
|
# Add review usage if any
|
||||||
|
total_input += review_result.get("usage", {}).get("input_tokens", 0)
|
||||||
|
total_output += review_result.get("usage", {}).get("output_tokens", 0)
|
||||||
|
# Calculate cost
|
||||||
|
cost_usd = (
|
||||||
|
(total_input / 1_000_000) * settings.cost_per_1m_input
|
||||||
|
+ (total_output / 1_000_000) * settings.cost_per_1m_output
|
||||||
|
)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
"session_id": session.session_id,
|
"session_id": session.session_id,
|
||||||
"task_id": task.task_id,
|
"task_id": task.task_id,
|
||||||
@@ -241,6 +261,11 @@ class OrchestratorEngine:
|
|||||||
),
|
),
|
||||||
"review": review_result.get("content", ""),
|
"review": review_result.get("content", ""),
|
||||||
"status": status,
|
"status": status,
|
||||||
|
"usage": {
|
||||||
|
"input_tokens": total_input,
|
||||||
|
"output_tokens": total_output,
|
||||||
|
},
|
||||||
|
"total_cost_usd": round(cost_usd, 6),
|
||||||
}
|
}
|
||||||
|
|
||||||
def _error_result(self, session: SessionState, error: str) -> dict[str, Any]:
|
def _error_result(self, session: SessionState, error: str) -> dict[str, Any]:
|
||||||
@@ -292,12 +317,16 @@ class OrchestratorEngine:
|
|||||||
for step in task.plan:
|
for step in task.plan:
|
||||||
tools_used.update(step.tools_used)
|
tools_used.update(step.tools_used)
|
||||||
|
|
||||||
|
# Extract key structured data from tool executions
|
||||||
|
key_data = self._extract_key_data_from_results(results)
|
||||||
|
|
||||||
history_entry = {
|
history_entry = {
|
||||||
"task_id": task.task_id,
|
"task_id": task.task_id,
|
||||||
"objective": task.objective,
|
"objective": task.objective,
|
||||||
"status": task.status.value,
|
"status": task.status.value,
|
||||||
"steps": len(task.plan),
|
"steps": len(task.plan),
|
||||||
"facts": task.facts_extracted[-10:],
|
"facts": task.facts_extracted[-10:],
|
||||||
|
"key_data": key_data,
|
||||||
"tools_used": list(tools_used)[:10],
|
"tools_used": list(tools_used)[:10],
|
||||||
"artifacts_count": len(task_artifacts),
|
"artifacts_count": len(task_artifacts),
|
||||||
"summary": "; ".join(step_summaries)[:300],
|
"summary": "; ".join(step_summaries)[:300],
|
||||||
@@ -323,6 +352,102 @@ class OrchestratorEngine:
|
|||||||
task.task_id, len(task.facts_extracted), len(tools_used), len(task_artifacts),
|
task.task_id, len(task.facts_extracted), len(tools_used), len(task_artifacts),
|
||||||
)
|
)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _extract_key_data_from_results(results: list[dict[str, Any]]) -> dict[str, Any]:
|
||||||
|
"""Extract structured data from tool executions for task history.
|
||||||
|
|
||||||
|
Preserves key identifiers (recordNum, sectionId, tableName, moduleId)
|
||||||
|
so the model retains context across tasks without re-querying.
|
||||||
|
"""
|
||||||
|
key_data: dict[str, Any] = {}
|
||||||
|
seen_tables: dict[str, list[int]] = {} # tableName -> recordNums
|
||||||
|
seen_sections: list[str] = []
|
||||||
|
seen_modules: list[str] = []
|
||||||
|
seen_pages: dict[str, int] = {} # page name/url -> recordNum
|
||||||
|
|
||||||
|
for result in results:
|
||||||
|
for te in result.get("tool_executions", []):
|
||||||
|
args = te.arguments
|
||||||
|
name = te.tool_name
|
||||||
|
|
||||||
|
# Track table + record relationships
|
||||||
|
table = args.get("tableName", "")
|
||||||
|
record = args.get("recordNum")
|
||||||
|
if table and record:
|
||||||
|
record_int = int(record) if str(record).isdigit() else None
|
||||||
|
if record_int and table not in seen_tables:
|
||||||
|
seen_tables[table] = []
|
||||||
|
if record_int and record_int not in seen_tables.get(table, []):
|
||||||
|
seen_tables[table].append(record_int)
|
||||||
|
|
||||||
|
# Track section IDs
|
||||||
|
section = args.get("sectionId", "")
|
||||||
|
if section and section not in seen_sections:
|
||||||
|
seen_sections.append(section)
|
||||||
|
|
||||||
|
# Track modules
|
||||||
|
module = args.get("moduleId", "") or args.get("moduleName", "")
|
||||||
|
if module and module not in seen_modules:
|
||||||
|
seen_modules.append(module)
|
||||||
|
|
||||||
|
# Extract page info from raw output (enlace, name)
|
||||||
|
if te.raw_output and "enlace" in te.raw_output:
|
||||||
|
try:
|
||||||
|
import json as _json
|
||||||
|
# Try to parse structured data from output
|
||||||
|
for line in te.raw_output.splitlines():
|
||||||
|
line = line.strip()
|
||||||
|
if line.startswith("{"):
|
||||||
|
try:
|
||||||
|
data = _json.loads(line)
|
||||||
|
if "enlace" in data and "num" in data:
|
||||||
|
page_key = data.get("name", data["enlace"])
|
||||||
|
seen_pages[page_key] = int(data["num"])
|
||||||
|
except _json.JSONDecodeError:
|
||||||
|
pass
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
if seen_tables:
|
||||||
|
key_data["tables"] = {t: nums[:10] for t, nums in seen_tables.items()}
|
||||||
|
if seen_sections:
|
||||||
|
key_data["sections"] = seen_sections[:20]
|
||||||
|
if seen_modules:
|
||||||
|
key_data["modules"] = seen_modules[:20]
|
||||||
|
if seen_pages:
|
||||||
|
key_data["pages"] = dict(list(seen_pages.items())[:20])
|
||||||
|
|
||||||
|
return key_data
|
||||||
|
|
||||||
|
def _maybe_compact_previous_steps(
|
||||||
|
self, task: TaskState, current_index: int
|
||||||
|
) -> None:
|
||||||
|
"""Decide if previous steps should be compacted. Deterministic rules."""
|
||||||
|
current_step = task.plan[current_index]
|
||||||
|
|
||||||
|
for i in range(current_index):
|
||||||
|
prev = task.plan[i]
|
||||||
|
if prev.compacted or prev.status != TaskStatus.COMPLETED:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Rule 1: Change of agent role → previous steps are a different focus
|
||||||
|
if prev.agent_role != current_step.agent_role:
|
||||||
|
prev.compacted = True
|
||||||
|
logger.info(
|
||||||
|
"Compacted step %d (%s) — agent changed to %s",
|
||||||
|
i + 1, prev.agent_role, current_step.agent_role,
|
||||||
|
)
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Rule 2: More than 3 completed non-compacted steps → compact oldest
|
||||||
|
non_compacted = [
|
||||||
|
s for s in task.plan[:current_index]
|
||||||
|
if s.status == TaskStatus.COMPLETED and not s.compacted
|
||||||
|
]
|
||||||
|
if len(non_compacted) > 3:
|
||||||
|
non_compacted[0].compacted = True
|
||||||
|
logger.info("Compacted oldest step to stay within budget")
|
||||||
|
|
||||||
def _create_agent(self, role: AgentRole) -> PlannerAgent | CoderAgent | CollectorAgent | ReviewerAgent:
|
def _create_agent(self, role: AgentRole) -> PlannerAgent | CoderAgent | CollectorAgent | ReviewerAgent:
|
||||||
"""Instantiate a subagent for the given role."""
|
"""Instantiate a subagent for the given role."""
|
||||||
profile = self._profiles[role]
|
profile = self._profiles[role]
|
||||||
|
|||||||
0
tests/__init__.py
Normal file
0
tests/__init__.py
Normal file
8
tests/conftest.py
Normal file
8
tests/conftest.py
Normal file
@@ -0,0 +1,8 @@
|
|||||||
|
"""Configuracion de pytest para agenticSystem tests.
|
||||||
|
|
||||||
|
Estos tests son 100% standalone — no importan desde src/ directamente
|
||||||
|
porque el entorno de CI puede no tener las dependencias pesadas
|
||||||
|
(anthropic, tiktoken, pydantic, etc.) ni Python 3.11+.
|
||||||
|
|
||||||
|
La logica bajo test se replica o se extrae como funciones puras.
|
||||||
|
"""
|
||||||
362
tests/test_compactor.py
Normal file
362
tests/test_compactor.py
Normal file
@@ -0,0 +1,362 @@
|
|||||||
|
"""Tests para la logica de context/compactor.py — estimacion de tokens,
|
||||||
|
extraccion de facts, construccion de summaries y compactacion de secciones.
|
||||||
|
|
||||||
|
Se replica la logica pura sin importar src/ (evita dependencias pesadas).
|
||||||
|
"""
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import re
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from typing import List
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
# =====================================================================
|
||||||
|
# Replicas de la logica del compactor (funciones puras)
|
||||||
|
# =====================================================================
|
||||||
|
|
||||||
|
|
||||||
|
def estimate_tokens_fallback(text: str) -> int:
|
||||||
|
"""Replica del fallback de estimate_tokens (sin tiktoken)."""
|
||||||
|
if not text:
|
||||||
|
return 0
|
||||||
|
return max(1, len(text) // 4)
|
||||||
|
|
||||||
|
|
||||||
|
def extract_facts(raw_output: str) -> list:
|
||||||
|
"""Replica exacta de ContextCompactor._extract_facts."""
|
||||||
|
facts = []
|
||||||
|
lines = raw_output.strip().splitlines()
|
||||||
|
|
||||||
|
for line in lines[:100]:
|
||||||
|
line = line.strip()
|
||||||
|
if not line or len(line) < 10:
|
||||||
|
continue
|
||||||
|
if re.match(r"^[\w\s]+:\s+.+", line) and len(line) < 200:
|
||||||
|
facts.append(line)
|
||||||
|
elif re.match(r"^(✓|✗|PASS|FAIL|ERROR|OK|INFO|WARNING)", line):
|
||||||
|
facts.append(line)
|
||||||
|
elif re.match(r"^[\w/\\.]+\s*[:\-]\s*.+", line) and len(line) < 200:
|
||||||
|
facts.append(line)
|
||||||
|
|
||||||
|
seen = set()
|
||||||
|
unique = []
|
||||||
|
for f in facts:
|
||||||
|
if f not in seen:
|
||||||
|
seen.add(f)
|
||||||
|
unique.append(f)
|
||||||
|
return unique[:15]
|
||||||
|
|
||||||
|
|
||||||
|
def build_summary(tool_name: str, raw_output: str, facts: list) -> str:
|
||||||
|
"""Replica exacta de ContextCompactor._build_summary."""
|
||||||
|
lines = raw_output.strip().splitlines()
|
||||||
|
total_lines = len(lines)
|
||||||
|
char_count = len(raw_output)
|
||||||
|
|
||||||
|
parts = [f"Tool '{tool_name}' returned {total_lines} lines ({char_count} chars)."]
|
||||||
|
|
||||||
|
if facts:
|
||||||
|
parts.append(f"Key findings: {'; '.join(facts[:5])}")
|
||||||
|
|
||||||
|
meaningful = [l.strip() for l in lines if l.strip()]
|
||||||
|
if meaningful:
|
||||||
|
parts.append(f"First: {meaningful[0][:120]}")
|
||||||
|
if len(meaningful) > 1:
|
||||||
|
parts.append(f"Last: {meaningful[-1][:120]}")
|
||||||
|
|
||||||
|
return " ".join(parts)
|
||||||
|
|
||||||
|
|
||||||
|
def infer_artifact_type(tool_name: str) -> str:
|
||||||
|
"""Replica de ContextCompactor._infer_artifact_type."""
|
||||||
|
tool_lower = tool_name.lower()
|
||||||
|
if any(k in tool_lower for k in ("read", "file", "code", "write", "edit")):
|
||||||
|
return "code"
|
||||||
|
if any(k in tool_lower for k in ("test", "check", "lint", "validate")):
|
||||||
|
return "test_result"
|
||||||
|
if any(k in tool_lower for k in ("search", "find", "grep", "glob")):
|
||||||
|
return "analysis"
|
||||||
|
if any(k in tool_lower for k in ("plan", "design", "architect")):
|
||||||
|
return "plan"
|
||||||
|
return "general"
|
||||||
|
|
||||||
|
|
||||||
|
def summarize_tool_output(tool_name: str, raw_output: str, session_id: str, task_id: str) -> dict:
|
||||||
|
"""Replica simplificada de ContextCompactor.summarize_tool_output.
|
||||||
|
Devuelve un dict con los mismos campos que ArtifactSummary.
|
||||||
|
"""
|
||||||
|
facts = extract_facts(raw_output)
|
||||||
|
summary = build_summary(tool_name, raw_output, facts)
|
||||||
|
artifact_type = infer_artifact_type(tool_name)
|
||||||
|
artifact_id = hashlib.sha256(
|
||||||
|
f"{session_id}:{task_id}:{tool_name}:{raw_output[:200]}".encode()
|
||||||
|
).hexdigest()[:16]
|
||||||
|
|
||||||
|
return {
|
||||||
|
"artifact_id": artifact_id,
|
||||||
|
"session_id": session_id,
|
||||||
|
"task_id": task_id,
|
||||||
|
"artifact_type": artifact_type,
|
||||||
|
"title": f"Output of {tool_name}",
|
||||||
|
"summary": summary,
|
||||||
|
"facts": facts,
|
||||||
|
"source_tool": tool_name,
|
||||||
|
"char_count": len(raw_output),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# --- Modelo simplificado de ContextSection para test de compactacion ---
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Section:
|
||||||
|
section_type: str # "immutable_rules", "working_context", "task_state", etc.
|
||||||
|
content: str
|
||||||
|
priority: int = 0
|
||||||
|
token_estimate: int = 0
|
||||||
|
|
||||||
|
|
||||||
|
def compact_sections(sections: list, max_tokens: int) -> list:
|
||||||
|
"""Replica de ContextCompactor.compact_sections (logica pura)."""
|
||||||
|
# 1. Deduplicar
|
||||||
|
seen = set()
|
||||||
|
unique = []
|
||||||
|
for s in sections:
|
||||||
|
h = hashlib.md5(s.content.encode()).hexdigest()
|
||||||
|
if h not in seen:
|
||||||
|
seen.add(h)
|
||||||
|
unique.append(s)
|
||||||
|
sections = unique
|
||||||
|
|
||||||
|
# 2. Estimar tokens
|
||||||
|
for s in sections:
|
||||||
|
s.token_estimate = estimate_tokens_fallback(s.content)
|
||||||
|
|
||||||
|
total = sum(s.token_estimate for s in sections)
|
||||||
|
if total <= max_tokens:
|
||||||
|
return sections
|
||||||
|
|
||||||
|
# 3. Ordenar por prioridad (mayor primero)
|
||||||
|
sections.sort(key=lambda s: s.priority, reverse=True)
|
||||||
|
|
||||||
|
# 4. Trim de menor prioridad
|
||||||
|
while total > max_tokens and sections:
|
||||||
|
lowest = sections[-1]
|
||||||
|
if lowest.section_type == "immutable_rules":
|
||||||
|
break
|
||||||
|
# Compactacion simple: eliminar lineas vacias
|
||||||
|
compacted_lines = [l.rstrip() for l in lowest.content.splitlines() if l.strip()]
|
||||||
|
compacted = "\n".join(compacted_lines)
|
||||||
|
new_est = estimate_tokens_fallback(compacted)
|
||||||
|
saved = lowest.token_estimate - new_est
|
||||||
|
if saved > 0:
|
||||||
|
lowest.content = compacted
|
||||||
|
lowest.token_estimate = new_est
|
||||||
|
total -= saved
|
||||||
|
else:
|
||||||
|
total -= lowest.token_estimate
|
||||||
|
sections.pop()
|
||||||
|
|
||||||
|
return sections
|
||||||
|
|
||||||
|
|
||||||
|
# =====================================================================
|
||||||
|
# Tests: estimate_tokens
|
||||||
|
# =====================================================================
|
||||||
|
|
||||||
|
|
||||||
|
class TestEstimateTokens:
    """The fallback estimator returns an int, 0 for empty text, floored at 1 otherwise."""

    def test_positive_for_nonempty_text(self):
        estimate = estimate_tokens_fallback("Hello world, this is a test string.")
        assert isinstance(estimate, int)
        assert estimate > 0

    def test_zero_for_empty_string(self):
        assert estimate_tokens_fallback("") == 0

    def test_longer_text_more_tokens(self):
        brief = estimate_tokens_fallback("hi")
        lengthy = estimate_tokens_fallback("hi " * 500)
        assert lengthy > brief

    def test_returns_int_type(self):
        assert isinstance(estimate_tokens_fallback("cualquier texto"), int)

    def test_minimum_is_one_for_short_text(self):
        # "ab" -> len 2 // 4 = 0, but max(1, 0) = 1
        assert estimate_tokens_fallback("ab") == 1
|
||||||
|
|
||||||
|
|
||||||
|
# =====================================================================
|
||||||
|
# Tests: _extract_facts
|
||||||
|
# =====================================================================
|
||||||
|
|
||||||
|
|
||||||
|
class TestExtractFacts:
    """extract_facts pulls key/value lines and status markers out of raw tool output."""

    def test_extracts_key_value_lines(self):
        raw = "Status: running\nVersion: 3.2.1\nIgnored short\nName: my-module"
        facts = extract_facts(raw)
        for expected in ("Status: running", "Version: 3.2.1", "Name: my-module"):
            assert any(expected in fact for fact in facts)

    def test_extracts_status_indicators(self):
        raw = "PASS test_login completed\nFAIL test_logout broken\nOK everything fine"
        facts = extract_facts(raw)
        assert any("PASS" in fact for fact in facts)
        assert any("FAIL" in fact for fact in facts)

    def test_ignores_short_lines(self):
        facts = extract_facts("ok\nhi\nyes\nStatus: this is long enough to be a fact")
        assert all(fact not in ("ok", "hi", "yes") for fact in facts)

    def test_deduplicates(self):
        raw = "Status: running value\nStatus: running value\nStatus: running value"
        assert extract_facts(raw).count("Status: running value") == 1

    def test_limits_to_15(self):
        raw = "\n".join(f"Key{i}: value number {i} with enough length" for i in range(30))
        assert len(extract_facts(raw)) <= 15

    def test_empty_input(self):
        assert extract_facts("") == []
|
||||||
|
|
||||||
|
|
||||||
|
# =====================================================================
|
||||||
|
# Tests: _build_summary
|
||||||
|
# =====================================================================
|
||||||
|
|
||||||
|
|
||||||
|
class TestBuildSummary:
    """build_summary digests output into tool name, size stats, facts, and first line."""

    def test_includes_tool_name(self):
        assert "read_file" in build_summary("read_file", "line1\nline2\nline3", [])

    def test_includes_line_count(self):
        assert "3 lines" in build_summary("my_tool", "line1\nline2\nline3", [])

    def test_includes_char_count(self):
        payload = "some content here"
        assert str(len(payload)) in build_summary("my_tool", payload, [])

    def test_includes_facts_when_present(self):
        digest = build_summary("my_tool", "data", ["Status: ok", "Count: 42"])
        assert "Status: ok" in digest

    def test_includes_first_line(self):
        digest = build_summary("tool", "primera linea importante\nsegunda\ntercera", [])
        assert "primera linea importante" in digest
|
||||||
|
|
||||||
|
|
||||||
|
# =====================================================================
|
||||||
|
# Tests: summarize_tool_output
|
||||||
|
# =====================================================================
|
||||||
|
|
||||||
|
|
||||||
|
class TestSummarizeToolOutput:
    """summarize_tool_output wraps raw tool output into an artifact record dict."""

    def test_returns_dict_with_correct_fields(self):
        record = summarize_tool_output(
            tool_name="read_file",
            raw_output="Status: ok\nContent: hello world here",
            session_id="sess-001",
            task_id="task-001",
        )
        assert isinstance(record, dict)
        expected = {
            "session_id": "sess-001",
            "task_id": "task-001",
            "source_tool": "read_file",
            "title": "Output of read_file",
        }
        for field, value in expected.items():
            assert record[field] == value
        assert record["artifact_id"]  # non-empty
        assert record["summary"]  # non-empty
        assert record["char_count"] > 0

    def test_artifact_type_inference(self):
        cases = [
            ("read_file", "code"),
            ("test_run", "test_result"),
            ("search_records", "analysis"),
            ("deploy_app", "general"),
        ]
        for tool, expected_type in cases:
            assert summarize_tool_output(tool, "x", "s", "t")["artifact_type"] == expected_type

    def test_artifact_id_is_deterministic(self):
        first = summarize_tool_output("tool", "output", "s", "t")
        second = summarize_tool_output("tool", "output", "s", "t")
        assert first["artifact_id"] == second["artifact_id"]

    def test_artifact_id_length(self):
        assert len(summarize_tool_output("tool", "output", "s", "t")["artifact_id"]) == 16
|
||||||
|
|
||||||
|
|
||||||
|
# =====================================================================
|
||||||
|
# Tests: compact_sections
|
||||||
|
# =====================================================================
|
||||||
|
|
||||||
|
|
||||||
|
class TestCompactSections:
    """Budget-driven compaction: dedupe, protect immutable rules, trim low priority."""

    def test_never_removes_immutable_rules(self):
        rules = Section(
            section_type="immutable_rules",
            content="You must always follow these rules " * 20,
            priority=100,
        )
        context = Section(
            section_type="working_context",
            content="Some working context data " * 50,
            priority=1,
        )
        survivors = compact_sections([rules, context], max_tokens=50)
        assert "immutable_rules" in [s.section_type for s in survivors]

    def test_respects_priority_order(self):
        """Higher-priority sections survive compaction.

        The budget fits the high-priority section but not both."""
        high = Section(
            section_type="task_state",
            content="Important task data here",  # ~6 tokens
            priority=90,
        )
        low = Section(
            section_type="working_context",
            content="Low priority stuff " * 50,  # ~250 tokens
            priority=1,
        )
        # Budget fits high (~6) but not high+low (~256)
        survivors = compact_sections([high, low], max_tokens=20)
        assert "task_state" in [s.section_type for s in survivors]
        # The low-priority one should have been dropped or compacted
        assert len(survivors) <= 2

    def test_no_compaction_when_within_budget(self):
        only = Section(
            section_type="task_state",
            content="Short content",
            priority=50,
        )
        survivors = compact_sections([only], max_tokens=999_999)
        assert len(survivors) == 1
        assert survivors[0].content == "Short content"

    def test_deduplicates_identical_sections(self):
        twins = [
            Section(section_type="working_context", content="duplicated content", priority=10)
            for _ in range(2)
        ]
        assert len(compact_sections(twins, max_tokens=999_999)) == 1
|
||||||
71
tests/test_cost_calculation.py
Normal file
71
tests/test_cost_calculation.py
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
"""Tests para el calculo de costes del orquestador.
|
||||||
|
|
||||||
|
Replica la formula de coste de OrchestratorEngine._run_pipeline():
|
||||||
|
cost_usd = (input_tokens / 1_000_000) * cost_per_1m_input
|
||||||
|
+ (output_tokens / 1_000_000) * cost_per_1m_output
|
||||||
|
|
||||||
|
Defaults: cost_per_1m_input=2.50, cost_per_1m_output=15.00
|
||||||
|
"""
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
def calculate_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_1m_input: float = 2.50,
    cost_per_1m_output: float = 15.00,
) -> float:
    """Exact replica of the cost formula in engine.py.

    Prices are USD per million tokens; input and output are billed
    at independent rates and summed.
    """
    input_cost = (input_tokens / 1_000_000) * cost_per_1m_input
    output_cost = (output_tokens / 1_000_000) * cost_per_1m_output
    return input_cost + output_cost
|
||||||
|
|
||||||
|
|
||||||
|
class TestCostCalculation:
    """Pins the orchestrator cost formula at its documented price points."""

    def test_1m_input_tokens(self):
        assert calculate_cost(1_000_000, 0) == pytest.approx(2.50)

    def test_1m_output_tokens(self):
        assert calculate_cost(0, 1_000_000) == pytest.approx(15.00)

    def test_500k_input_100k_output(self):
        # (500_000 / 1_000_000) * 2.50 + (100_000 / 1_000_000) * 15.00
        # = 1.25 + 1.50 = 2.75
        assert calculate_cost(500_000, 100_000) == pytest.approx(2.75)

    def test_zero_tokens(self):
        assert calculate_cost(0, 0) == 0.0

    def test_custom_pricing(self):
        total = calculate_cost(
            1_000_000, 1_000_000,
            cost_per_1m_input=3.00,
            cost_per_1m_output=10.00,
        )
        assert total == pytest.approx(13.00)

    def test_small_token_count(self):
        """A handful of tokens costs very little, but never zero."""
        total = calculate_cost(100, 50)
        assert total > 0
        assert total < 0.01

    def test_round_to_6_decimals(self):
        """The engine applies round(cost_usd, 6)."""
        # (1/1M)*2.50 + (1/1M)*15.00 = 1.75e-05
        # round(1.75e-05, 6) = 1.7e-05 (banker's rounding: 5 rounds to even)
        assert round(calculate_cost(1, 1), 6) == pytest.approx(0.000017, abs=1e-7)

    def test_output_more_expensive_than_input(self):
        """With defaults, output is 6x the price of input."""
        per_m_input = calculate_cost(1_000_000, 0)
        per_m_output = calculate_cost(0, 1_000_000)
        assert per_m_output == pytest.approx(per_m_input * 6.0)
|
||||||
61
tests/test_fingerprint.py
Normal file
61
tests/test_fingerprint.py
Normal file
@@ -0,0 +1,61 @@
|
|||||||
|
"""Tests para la logica de fingerprinting/deduplicacion de tool calls.
|
||||||
|
|
||||||
|
Replica la logica de BaseAgent.execute() (lineas con hashlib.md5) sin
|
||||||
|
necesidad de instanciar BaseAgent ni sus dependencias.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
def compute_fingerprint(tool_name: str, args: dict) -> str:
    """Exact replica of the fingerprint logic in BaseAgent.execute().

    Arguments are serialized with sorted keys so the digest is stable
    regardless of dict insertion order.
    """
    canonical_args = json.dumps(args, sort_keys=True)
    return hashlib.md5(f"{tool_name}:{canonical_args}".encode()).hexdigest()
|
||||||
|
|
||||||
|
|
||||||
|
class TestFingerprint:
    """Dedup fingerprints: md5 over tool name plus sorted-key JSON args."""

    def test_same_tool_same_args_same_fingerprint(self):
        first = compute_fingerprint("read_file", {"path": "/index.html"})
        second = compute_fingerprint("read_file", {"path": "/index.html"})
        assert first == second

    def test_same_tool_different_args_different_fingerprint(self):
        assert compute_fingerprint("read_file", {"path": "/index.html"}) != compute_fingerprint(
            "read_file", {"path": "/style.css"}
        )

    def test_different_tool_same_args_different_fingerprint(self):
        assert compute_fingerprint("read_file", {"path": "/index.html"}) != compute_fingerprint(
            "write_file", {"path": "/index.html"}
        )

    def test_fingerprint_is_md5_hex_32_chars(self):
        digest = compute_fingerprint("any_tool", {"key": "value"})
        assert len(digest) == 32
        assert set(digest) <= set("0123456789abcdef")

    def test_arg_order_does_not_matter(self):
        """json.dumps with sort_keys=True normalizes key order."""
        assert compute_fingerprint("tool", {"b": 2, "a": 1}) == compute_fingerprint(
            "tool", {"a": 1, "b": 2}
        )

    def test_empty_args(self):
        digest = compute_fingerprint("tool", {})
        assert len(digest) == 32
        # Must be deterministic
        assert digest == compute_fingerprint("tool", {})

    def test_nested_args(self):
        payload = {"filter": {"table": "pages", "status": "active"}, "limit": 10}
        assert compute_fingerprint("search", payload) == compute_fingerprint("search", payload)

    def test_different_nested_values(self):
        assert compute_fingerprint("search", {"filter": {"status": "active"}}) != compute_fingerprint(
            "search", {"filter": {"status": "draft"}}
        )
|
||||||
152
tests/test_key_data_extraction.py
Normal file
152
tests/test_key_data_extraction.py
Normal file
@@ -0,0 +1,152 @@
|
|||||||
|
"""Tests para la logica de _extract_key_data_from_results del OrchestratorEngine.
|
||||||
|
|
||||||
|
Se replica la funcion como logica pura, sin importar src/ (evita dependencias).
|
||||||
|
Los ToolExecution se representan como SimpleNamespace con .arguments y .tool_name.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
from types import SimpleNamespace
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
def _make_tool_execution(tool_name: str, arguments: dict, raw_output: str = "") -> SimpleNamespace:
|
||||||
|
"""Crea un objeto similar a ToolExecution con los atributos necesarios."""
|
||||||
|
return SimpleNamespace(
|
||||||
|
tool_name=tool_name,
|
||||||
|
arguments=arguments,
|
||||||
|
raw_output=raw_output,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _make_result(*tool_executions) -> dict:
|
||||||
|
return {"tool_executions": list(tool_executions), "content": "ok"}
|
||||||
|
|
||||||
|
|
||||||
|
def extract_key_data_from_results(results: list) -> dict:
    """Exact replica of OrchestratorEngine._extract_key_data_from_results.

    Walks every tool execution in *results* and collects:
    - table name -> record numbers (from tableName/recordNum arguments)
    - section ids and module ids/names (deduplicated, insertion order)
    - page name -> page number, parsed from JSON lines in raw_output
      that mention "enlace"

    Caps: 10 records per table, 20 sections/modules/pages.
    """
    tables: dict[str, list] = {}
    section_ids: list = []
    module_ids: list = []
    pages: dict[str, int] = {}

    for result in results:
        for execution in result.get("tool_executions", []):
            arguments = execution.arguments
            _ = execution.tool_name  # read as in the original; not otherwise used

            table_name = arguments.get("tableName", "")
            record_num = arguments.get("recordNum")
            if table_name and record_num:
                # Only numeric, non-zero record numbers are kept.
                parsed = int(record_num) if str(record_num).isdigit() else None
                if parsed:
                    bucket = tables.setdefault(table_name, [])
                    if parsed not in bucket:
                        bucket.append(parsed)

            section_id = arguments.get("sectionId", "")
            if section_id and section_id not in section_ids:
                section_ids.append(section_id)

            module_ref = arguments.get("moduleId", "") or arguments.get("moduleName", "")
            if module_ref and module_ref not in module_ids:
                module_ids.append(module_ref)

            if execution.raw_output and "enlace" in execution.raw_output:
                try:
                    for line in execution.raw_output.splitlines():
                        line = line.strip()
                        if not line.startswith("{"):
                            continue
                        try:
                            payload = json.loads(line)
                            if "enlace" in payload and "num" in payload:
                                page_key = payload.get("name", payload["enlace"])
                                pages[page_key] = int(payload["num"])
                        except json.JSONDecodeError:
                            pass
                except Exception:
                    # Best-effort page extraction: any other parse failure
                    # aborts this raw_output, as in the original.
                    pass

    key_data: dict[str, Any] = {}
    if tables:
        key_data["tables"] = {name: nums[:10] for name, nums in tables.items()}
    if section_ids:
        key_data["sections"] = section_ids[:20]
    if module_ids:
        key_data["modules"] = module_ids[:20]
    if pages:
        key_data["pages"] = dict(list(pages.items())[:20])

    return key_data
|
||||||
|
|
||||||
|
|
||||||
|
# =====================================================================
|
||||||
|
# Tests
|
||||||
|
# =====================================================================
|
||||||
|
|
||||||
|
|
||||||
|
class TestExtractKeyDataFromResults:
    """Key-data extraction from agent results: tables, sections, modules, pages."""

    def test_extracts_table_and_record(self):
        te = _make_tool_execution("update_record", {"tableName": "pages", "recordNum": "42"})
        extracted = extract_key_data_from_results([_make_result(te)])
        assert "tables" in extracted
        assert "pages" in extracted["tables"]
        assert 42 in extracted["tables"]["pages"]

    def test_extracts_section_id(self):
        te = _make_tool_execution("get_section", {"sectionId": "hero-banner"})
        extracted = extract_key_data_from_results([_make_result(te)])
        assert "sections" in extracted
        assert "hero-banner" in extracted["sections"]

    def test_extracts_module_id(self):
        te = _make_tool_execution("compile_module", {"moduleId": "gallery-slider"})
        extracted = extract_key_data_from_results([_make_result(te)])
        assert "modules" in extracted
        assert "gallery-slider" in extracted["modules"]

    def test_extracts_module_name_fallback(self):
        te = _make_tool_execution("compile_module", {"moduleName": "contact-form"})
        extracted = extract_key_data_from_results([_make_result(te)])
        assert "modules" in extracted
        assert "contact-form" in extracted["modules"]

    def test_empty_results(self):
        assert extract_key_data_from_results([]) == {}

    def test_no_tool_executions_in_result(self):
        assert extract_key_data_from_results([{"content": "x", "tool_executions": []}]) == {}

    def test_result_without_tool_executions_key(self):
        assert extract_key_data_from_results([{"content": "just text"}]) == {}

    def test_tool_execution_without_relevant_args(self):
        te = _make_tool_execution("read_file", {"path": "/var/www/index.html"})
        assert extract_key_data_from_results([_make_result(te)]) == {}

    def test_multiple_tables_and_records(self):
        executions = [
            _make_tool_execution("update_record", {"tableName": "pages", "recordNum": "1"}),
            _make_tool_execution("update_record", {"tableName": "pages", "recordNum": "5"}),
            _make_tool_execution("get_record", {"tableName": "blog", "recordNum": "10"}),
        ]
        extracted = extract_key_data_from_results([_make_result(*executions)])
        assert 1 in extracted["tables"]["pages"]
        assert 5 in extracted["tables"]["pages"]
        assert 10 in extracted["tables"]["blog"]

    def test_deduplicates_records(self):
        first = _make_tool_execution("a", {"tableName": "t", "recordNum": "7"})
        second = _make_tool_execution("b", {"tableName": "t", "recordNum": "7"})
        extracted = extract_key_data_from_results([_make_result(first, second)])
        assert extracted["tables"]["t"].count(7) == 1

    def test_extracts_pages_from_raw_output(self):
        raw = '{"enlace": "/contacto", "num": 15, "name": "Contacto"}\nother line'
        te = _make_tool_execution("list_pages", {"tableName": "web"}, raw_output=raw)
        extracted = extract_key_data_from_results([_make_result(te)])
        assert "pages" in extracted
        assert extracted["pages"]["Contacto"] == 15
|
||||||
Reference in New Issue
Block a user