Changelog
v0.5.0
Project-scoped conversations, unified file workspace, Monaco code viewer, TODO approval flow, comprehensive agent guidance, and test infrastructure improvements.
Project-Scoped Conversations
- Mandatory projects -- Every conversation must belong to a project; the "All Conversations" concept has been removed entirely
- Onboarding project creation -- 3-step onboarding flow: Providers > Model > Create First Project. Users name their project and understand the concept from the start
- Orphan cleanup -- Existing conversations without a project are automatically deleted when the conversation list is loaded
- Project selector -- No more null state; always shows a real project. Dropdown removed the "All" option
Unified File Workspace
- Resources materialized as files -- PLAN.md, completion reports, and paper drafts are now written as real files to the project workspace (
.project-meta/plans/,.project-meta/reports/,papers/) and appear in the Files panel - Resources panel removed -- The separate Resources section in the right panel has been replaced by the unified FileTree. Tasks tab renamed to Todos and displayed alongside Files as stacked collapsible panels (no more tab buttons)
- workspace_files_changed SSE event -- FileTree auto-refreshes when resources are written to the workspace
- File badges -- Pin icon for PLAN.md, clipboard icon for reports, book icon for paper drafts in the FileTree
Monaco Code Viewer
- Agent/Editor tabs -- Main content area now has two tabs: Agent (chat) and Editor (read-only code viewer)
- Monaco Editor integration -- VS Code engine (
@monaco-editor/react) with full syntax highlighting, minimap, line numbers, bracket pair colorization, and word wrap. Supports 15+ languages (Python, TypeScript, JSON, Markdown, YAML, LaTeX, shell, etc.) - Multi-tab file viewer -- Open multiple files as tabs in the Editor, close individually. Closing the last tab switches back to the Agent tab
- FileTree integration -- Clicking a file in the Files panel opens it in the Editor tab; inline preview removed
Agent Improvements
- Message persistence fix -- Assistant text that precedes tool calls is now persisted to the database via an
assistant_messageevent, surviving page refreshes - Plan-mode research budget -- Warns after 5+ research tool calls in Plan mode; guides the agent to save comprehensive research for Execute mode tasks
- TODO approval in Execute mode -- Creating or adding tasks in Execute mode requires user approval via a dedicated review UI with current vs. proposed diff view and inline task editing
- Blank section enforcement -- Writing tool warns about incomplete sections after each write;
get_draftappends a WARNING listing all[Not yet written]placeholders
Workspace Targeting
- Project workspace auto-targeting --
read,write,edit, andbashtools automatically target the active project workspace directory. Files created by the agent appear in the Files panel - Workspace context wiring -- Both inline and Celery worker sessions resolve the project workspace and set context for workspace tools (knowledge graph, notes) and local tools (file operations)
Comprehensive Agent Guidance
- System prompt v7 -- Full rewrite with a Tool Selection Decision Tree, Workspace Structure reference, expanded Code Execution constraints, and strengthened Plan/Execute mode rules
- Expanded tool descriptions --
bash(Docker constraints, common patterns),read(offset/limit usage),write(workspace targeting, anti-pattern for papers),edit(find-and-replace semantics),research(available tools, iteration limits),writing(full workflow, placeholder enforcement),plan_tool(enforcement rules, workspace file locations) - Research sub-agent prompt -- Now lists available tools, constraints (60 iterations, ~190k token budget), stop conditions, error handling guidance, and clarifies that the sub-agent cannot write to the workspace
Testing & Infrastructure
- Pytest hang fix -- Removed deprecated
event_loopsession fixture, added_dispose_engine_at_exitfor async engine cleanup, setasyncio_mode = "auto"in pytest config. Full backend suite now exits cleanly in ~50s - 47 new backend tests -- Message persistence, research budget, project workspace targeting, incomplete section warnings, orphan cleanup, workspace path resolution, TODO approval Redis helpers, report persistence
- 22 new frontend tests -- EditorPanel (12 tests), FileTree (10 tests)
- ProjectSelector tests -- 12 tests verifying no-null project selection
- TodoReviewDrawer tests -- 10 tests for the approval UI
- Total: 858 backend + 226 frontend = 1,084 tests
Bug Fixes
- Fixed silent exception swallowing in
_wire_persistence(now logs errors vialogger.exception) - Fixed inline-mode SSE events missing
conversation_uuid(prevented cross-conversation event leakage) - Fixed
pnpm-lock.yamlnot including@monaco-editor/reactfor Docker builds - Fixed
test_list_conversationsandtest_conversations_isolated_by_userfor project-scoped queries - Fixed
test_conversations.pyintegration tests to create conversations with a project
v0.4.0
Projects, workspaces, interactive terminal, multi-provider model picker, security hardening, and centralized version management.
Projects & Workspaces
- Project management — Create, rename, archive, and manage projects via the sidebar or Settings. Each project gets a persistent workspace directory with standard subdirectories (
code/,data/,papers/,research/, etc.) - Default project — Every user gets an auto-created "All Conversations" project; conversations can be moved between projects
- File browser — Interactive file tree panel for browsing project workspace files, with lazy-loading directories, file-type icons, inline text preview, and file sizes on hover
- File operations API — Browse, read, write, delete, and upload files within project workspaces via REST endpoints
- Knowledge graph — Per-project persistent knowledge graph (backed by
networkx) for cross-conversation memory. Supports 10 entity types (paper, concept, method, dataset, etc.) and 13 relationship types (cites, implements, extends, etc.). Context is injected into new conversations automatically - Workspace persistence — Saves search results, research notes, parsed papers, tool failure logs, compute probes, experiment logs, and cross-conversation state to the project workspace
- Workspace agent tools — New
workspacetool with 8 operations:status,search,note,knowledge_add,knowledge_relate,knowledge_query,knowledge_summary, andrecent_failures - Project manage modal — Bulk project management UI with inline rename, conversation counts, and delete with confirmation
Interactive Terminal
- WebSocket PTY terminal — Full interactive bash shell in the browser via xterm.js, connected to a real server-side PTY process scoped to the project workspace
- Environment scrubbing — Terminal sessions receive a minimal allowlisted environment; server secrets (API keys, DATABASE_URL, JWT_SECRET_KEY, etc.) are never exposed
- Terminal UI — Maximize/minimize toggle, connection status indicator, manual reconnect, 5,000-line scrollback, JetBrains Mono font
Model Picker & Providers
- Multi-provider model catalog — Browse models from OpenAI, Anthropic, OpenRouter, OpenCode Go, Ollama, and LM Studio in a unified picker. Live model list fetched from
models.devwith hardcoded fallbacks - Custom providers — Register arbitrary OpenAI-compatible, Anthropic SDK, OpenRouter, or LiteLLM endpoints with custom base URLs and API keys. Fetch and cache model lists from custom endpoints
- Model picker UI — Two-tab modal with search/filter, provider logos, recently used models (up to 5), and a custom model ID input for arbitrary model strings
- Provider settings — Tabbed settings page (Models, Search, Papers, Compute, Others) with per-provider status indicators, inline API key management, and an "Add Custom Provider" modal
- Automatic SDK routing — Models are routed to the correct SDK (OpenAI or Anthropic) based on provider and model name. Custom providers specify their SDK type explicitly
- Recently used models — Last 5 used models tracked per-user and shown at the top of the picker
Security
- Path traversal prevention — All file operations use
Path.relative_to()containment checking; symlinks pointing outside the workspace are blocked for both reads and deletes - Resource limits — File uploads capped at 100 MB, writes at 10 MB, reads truncated at 500 KB; knowledge graph limited to 10,000 nodes / 50,000 edges; terminal input capped at 4 KB per message
- Process isolation — Terminal shells spawned with
start_new_session=True,close_fds=True,--norc --noprofile; zombie prevention via SIGTERM/SIGKILL escalation with properwaitpidreaping - Protected directories — Top-level workspace directories (
code,data,papers, etc.) cannot be deleted via the API - Config allowlist —
POST /api/configonly accepts a hardcoded list of 10 environment variable names
Version Management
- Single source of truth — New
VERSIONfile at repo root; all version references derive from it - Automatic version bumping —
make version-patch,make version-minor,make version-majorcommands bump and propagate across the entire monorepo - Explicit version setting —
make version-set V=X.Y.Zfor arbitrary version changes - Version display — Current version shown in the webapp sidebar footer and docs site footer
Bug Fixes
- Fixed
test_list_projects_apifailing due to auto-created default project not being accounted for - Fixed
KeyManagerpath to use/app/.keysinstead of/.keys - Fixed dev entrypoint to always sync dependencies on startup
- Fixed eslint pre-commit hook path resolution for new files
- Added missing
python-multipartdependency for file upload endpoint
Internal
- 65 new tests for projects, workspaces, knowledge graph, and workspace tools (740 backend tests, 186 frontend tests total)
- Unified version strings: removed hardcoded versions from
app.py,health.py,test_app.py - Changelog moved to repo root; synced into docs site at build time with VitePress frontmatter
- Pydantic request/response models for all API endpoints (
backend/openmlr/models.py)
v0.3.0
Compute environments, UI improvements, and bug fixes.
Compute Environments
- Multi-backend compute — Execute code on local Docker, SSH remotes, or Modal cloud
- SSH key management — Generate Ed25519/RSA keys, upload existing keys via Settings > Compute
- Compute probing — Detect OS, GPUs (with CUDA version), Python versions, and disk space
- Compute selection — Switch between configured compute nodes mid-conversation
- Connection pooling — SSH connections are reused across tool calls for performance
- Docker-in-Docker detection — Worker container executes commands directly when already in Docker
UI Improvements
- Collapsible Tasks & Resources — Click section headers in the right panel to collapse/expand
- Fixed right panel layout — Right panel no longer causes page scroll issues when toggled
- Improved scroll behavior — Message list scrolls correctly without affecting page layout
Bug Fixes
- Fixed
scrollIntoViewcausing entire page to scroll when RightPanel is open - Fixed
_get_draftdatabase call using wrong function name - Fixed test suite failures related to async database mocking
Internal
- Added
_running_in_container()detection for Docker Compose worker environments - Improved test coverage for compute and writing tools
- Updated Settings nav to reflect current menu structure
v0.2.0
Major rewrite of the mode system, paper writing, processing architecture, and UI routing.
Mode System
- Simplified from Plan/Research/Write to Plan + Execute (two modes)
- Plan mode: ask questions, gather context, create plans. No execution.
- Execute mode: all tools available. Follow the plan.
- Toggle with P/E button,
Cmd+B(Plan), orCmd+E(Execute) - Amber border for Plan messages, blue border for Execute messages
- Three-layer mode enforcement: system prompt, tool filtering, runtime blocking
Paper Writing
- Writing tool with auto-save to database — survives across workers and restarts
- Paper preview in the Paper tab in the UI
- Client-side export to Markdown and LaTeX
- Outline, sections, and bibliography managed as structured data
Processing Architecture
- Per-conversation processing state — multiple conversations run in parallel
- Background jobs via Celery + Redis — close the browser, come back later
- Redis-based interrupt relay — actually kills running worker tasks, not just a flag check
- Sub-agent streaming — research tool shows nested tool calls in real-time
Settings & UI
- Settings as routed pages (
/settings/providers,/settings/agent,/settings/sandbox,/settings/writing) — no longer a modal - Sticky model selection — persisted per-user in the database
- Onboarding flow — guided setup when no LLM provider is configured
- Route restructure — app served from
/instead of/app - SSE reconnection catch-up — missed events replayed on reconnect
- PLAN.md auto-generated and pinned in resources panel
Testing & CI
- 149 backend tests + 29 frontend tests — comprehensive coverage
- GitHub CI — tests run on push and pull request
make testruns all tests (backend + frontend + docs build)make test-backend,make test-frontend,make test-docsfor targeted runs
