Prerequisites:
- Python 3.10 or higher (3.11+ recommended)
- Basic understanding of async/await patterns
- SQLite3
Architecture Overview
CosmaSense uses a monorepo structure with three main packages:
- Backend (packages/cosma-backend): Quart async web framework + Uvicorn
- TUI (packages/cosma-tui): Textual framework for terminal UI
- CLI (root): Click-based orchestrator
Tech Stack
Backend Framework
| Component | Technology |
|---|---|
| Web Framework | Quart (async Flask) |
| ASGI Server | Uvicorn |
| Validation | QuartSchema |
| Database | asqlite (async SQLite) |
| File Watching | watchdog |
AI & Search
| Component | Technology |
|---|---|
| Vector Search | sqlite-vec |
| Keyword Search | FTS5 (Full-Text Search) |
| LLM Backend | LiteLLM/Ollama |
| Embeddings | sentence-transformers (e5-base-v2) |
| File Parsing | MarkItDown (20+ formats) |
Processing Pipeline
Files go through a 4-stage pipeline (see pipeline.py:56-174):
1. Discovery
Recursively scan directories and collect file metadata. Only files whose modification timestamp differs from the one stored in the database are processed.
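The discovery step can be sketched roughly as follows. This is a minimal illustration, not the code in pipeline.py: the function name `discover_changed` and the `known_mtimes` mapping (path to last stored modification time) are assumptions standing in for the real database lookup.

```python
import os
from pathlib import Path


def discover_changed(root: Path, known_mtimes: dict[str, float]) -> list[Path]:
    """Return files under root whose mtime differs from the stored value.

    known_mtimes is a hypothetical stand-in for the database: it maps a
    file path (as a string) to the modification time recorded at the last
    indexing run. Files missing from the mapping count as changed.
    """
    changed: list[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            mtime = path.stat().st_mtime
            if known_mtimes.get(str(path)) != mtime:
                changed.append(path)
    return changed
```

New files are picked up automatically because they have no stored timestamp to compare against.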
2. Parsing
Extract text from 20+ file formats using MarkItDown. Calculate content hash to detect changes.
Supported formats:
- Documents: PDF, DOCX, TXT, MD
- Images: PNG, JPG, GIF (with OCR)
- Code: PY, JS, TS, JAVA, etc.
- Spreadsheets: XLSX, CSV
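Change detection by content hash could look like the sketch below. The hash algorithm is not stated in this document, so SHA-256 is an assumption, and `content_hash` and `needs_reprocessing` are hypothetical helper names.

```python
import hashlib
from typing import Optional


def content_hash(text: str) -> str:
    """Hash the extracted text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reprocessing(text: str, stored_hash: Optional[str]) -> bool:
    """True if the text differs from what was hashed at the last run.

    stored_hash is None for files that have never been parsed.
    """
    return content_hash(text) != stored_hash
```

Hashing the extracted text rather than the raw bytes means a file that was touched but not edited can skip the later, more expensive stages.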
3. Summarization
AI generates:
- Title
- Summary (max 100 words)
- 3-5 relevant keywords
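One way to hold and check the summarization output is a small data class that enforces the limits listed above. The class name `FileSummary` and its fields are assumptions for illustration; the actual backend may validate this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSummary:
    """Hypothetical container for the AI-generated metadata of one file."""

    title: str
    summary: str
    keywords: tuple[str, ...]

    def __post_init__(self) -> None:
        # Enforce the limits from the list above.
        if len(self.summary.split()) > 100:
            raise ValueError("summary exceeds 100 words")
        if not 3 <= len(self.keywords) <= 5:
            raise ValueError("expected 3-5 keywords")
```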
4. Embedding
Create 768-dimensional vectors using the e5-base-v2 model for semantic search. Embeddings are stored in the file_embeddings virtual table, with triggers to keep them in sync.
Hybrid Search System
CosmaSense combines two search methods (see searcher.py:91-220):
Semantic Search (Vector Similarity)
- Embeds query using same e5-base-v2 model
- Calculates cosine similarity against file embeddings
- Score: exp(-distance) scaled to 0-0.5 range
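The semantic score can be sketched as below. Two things are assumptions here: that "distance" means cosine distance (1 minus cosine similarity), and that "scaled to 0-0.5" means multiplying by 0.5. The function names are illustrative only.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 for identical directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def semantic_score(distance: float) -> float:
    """Map a non-negative distance into (0, 0.5]; closer means higher."""
    return 0.5 * math.exp(-distance)
```

Under these assumptions a perfect match scores 0.5 and the score decays smoothly toward 0 as the distance grows.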
Keyword Search (FTS5)
- SQLite FTS5 searches content/title/keywords
- Uses BM25 ranking algorithm
- Score: relevance scaled to 0-0.5 range
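A keyword search along these lines might look like the following sketch, using Python's built-in sqlite3 module (which needs an SQLite build with FTS5 enabled). The table name `files_fts` and the normalization against the best hit are assumptions; this document does not say how relevance is scaled.

```python
import sqlite3


def create_index(conn: sqlite3.Connection) -> None:
    """Create a hypothetical FTS5 table over the three searched columns."""
    conn.execute(
        "CREATE VIRTUAL TABLE files_fts USING fts5(title, content, keywords)"
    )


def keyword_search(
    conn: sqlite3.Connection, query: str, limit: int = 10
) -> list[tuple[int, float]]:
    """Return (rowid, score) pairs, best first, with scores in (0, 0.5]."""
    rows = conn.execute(
        "SELECT rowid, bm25(files_fts) AS score FROM files_fts "
        "WHERE files_fts MATCH ? ORDER BY bm25(files_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
    if not rows:
        return []
    # FTS5's bm25() returns negative numbers; more negative is a better match.
    best = -rows[0][1]
    if best <= 0:
        return [(rowid, 0.5) for rowid, _ in rows]
    # Assumed scaling: the best hit gets 0.5, the rest proportionally less.
    return [(rowid, 0.5 * (-score / best)) for rowid, score in rows]
```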
Combined Scoring
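Since each method produces a score in the 0-0.5 range, a natural way to combine them is to add the two, giving a final score between 0 and 1. The sketch below assumes simple addition; the exact formula used in searcher.py is not shown in this document, and `merge_results` is a hypothetical name.

```python
def merge_results(
    semantic: dict[int, float], keyword: dict[int, float]
) -> list[tuple[int, float]]:
    """Sum per-file scores from both methods and sort best first.

    Each input maps a file id to a score in the 0-0.5 range. A file found
    by only one method contributes 0.0 for the other.
    """
    file_ids = semantic.keys() | keyword.keys()
    merged = [
        (fid, semantic.get(fid, 0.0) + keyword.get(fid, 0.0)) for fid in file_ids
    ]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```

With this scheme a file that matches both semantically and by keyword outranks one that matches strongly on a single method only.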
Database Schema
See schema.sql for full details.
Async Programming in CosmaSense
CosmaSense uses Python’s async/await for non-blocking I/O operations.
Key Concepts
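The core idea is that a coroutine yields control while it waits on I/O, so several operations can be in flight at once. A minimal sketch, where `read_file` is a hypothetical stand-in that simulates I/O with `asyncio.sleep`:

```python
import asyncio


async def read_file(path: str, delay: float) -> str:
    """Simulated I/O: sleeping stands in for a disk or network wait."""
    await asyncio.sleep(delay)
    return f"contents of {path}"


async def main() -> list[str]:
    # Both reads wait concurrently; gather returns results in call order.
    return await asyncio.gather(
        read_file("a.md", 0.01),
        read_file("b.md", 0.01),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```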
Real-time Updates
Server-Sent Events (SSE)
The backend uses SSE to push updates to the TUI:
- file_parsing: File is being parsed
- file_summarizing: AI is generating summary
- file_embedding: Creating vector embedding
- file_complete: File successfully indexed
- file_failed: Error during processing
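On the wire, each SSE message is a block of `field: value` lines terminated by a blank line. The helper below is a sketch of how one of the events above could be serialized; the function name `format_sse` and the JSON payload shape are assumptions, not the backend's actual code.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event with a JSON payload.

    The trailing blank line tells the client the message is complete.
    """
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

For example, `format_sse("file_complete", {"path": "a.md"})` yields an `event: file_complete` line, a `data:` line carrying the JSON payload, and an empty line.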
File Watching
Uses the watchdog library to monitor filesystem changes:
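A watcher built on watchdog might be wired up as in the sketch below. `Observer` and `FileSystemEventHandler` are watchdog's own classes; `should_index`, `start_watching`, and the `on_change` callback are hypothetical names, and the rule of skipping hidden files is an assumption for illustration.

```python
from pathlib import Path
from typing import Callable


def should_index(path: str, is_directory: bool) -> bool:
    """Hypothetical filter: ignore directories and hidden (dot) files."""
    return not is_directory and not Path(path).name.startswith(".")


def start_watching(root: str, on_change: Callable[[str], None]):
    """Start a recursive watcher; returns the running observer.

    on_change is called from watchdog's worker thread, not the event loop.
    Stop with observer.stop() followed by observer.join().
    """
    # Imported lazily so should_index works without watchdog installed.
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class Handler(FileSystemEventHandler):
        def on_created(self, event):
            self._handle(event)

        def on_modified(self, event):
            self._handle(event)

        def _handle(self, event):
            if should_index(event.src_path, event.is_directory):
                on_change(event.src_path)

    observer = Observer()
    observer.schedule(Handler(), root, recursive=True)
    observer.start()
    return observer
```

Because the callback runs on a separate thread, an async backend would typically hand the path over to its event loop (for example with `loop.call_soon_threadsafe`) rather than process it directly.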
Development Setup
1. Clone Repository
2. Install Dependencies
3. Run Backend
4. Run TUI
Testing
Troubleshooting
RuntimeWarning: coroutine was never awaited
You forgot to use await when calling an async function:
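A minimal illustration of the mistake and its fix, using a hypothetical `fetch_summary` coroutine:

```python
import asyncio


async def fetch_summary(path: str) -> str:
    """Hypothetical async function; sleep(0) stands in for real I/O."""
    await asyncio.sleep(0)
    return f"summary of {path}"


async def main() -> str:
    # Wrong: this only creates a coroutine object and never runs it,
    # which triggers the RuntimeWarning when the object is discarded.
    # result = fetch_summary("notes.md")

    # Right: await runs the coroutine and yields its return value.
    result = await fetch_summary("notes.md")
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))
```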
App freezes during file processing
You’re running blocking code in an async function. Use asyncio.to_thread():
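The sketch below moves a blocking call onto a worker thread so the event loop stays responsive. Hashing is used only as an example of blocking work; both function names are hypothetical.

```python
import asyncio
import hashlib


def hash_bytes_blocking(data: bytes) -> str:
    """Ordinary synchronous function that would block the event loop."""
    return hashlib.sha256(data).hexdigest()


async def hash_without_blocking(data: bytes) -> str:
    # to_thread runs the function in a worker thread and awaits the result,
    # so other coroutines keep running in the meantime.
    return await asyncio.to_thread(hash_bytes_blocking, data)
```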
Database locked errors
Multiple async operations are trying to write simultaneously. Use proper connection pooling:
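The idea can be sketched with the standard library alone: a small pool hands out connections one at a time, so writers queue up instead of colliding. This is not asqlite's API; `SQLitePool` and `write` are hypothetical names, and a pool size of 1 is an assumed default that fully serializes writes.

```python
import asyncio
import sqlite3
from contextlib import asynccontextmanager


class SQLitePool:
    """Hypothetical minimal pool; create it inside a running event loop."""

    def __init__(self, path: str, size: int = 1) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        for _ in range(size):
            # timeout makes SQLite wait for a lock instead of failing at once;
            # check_same_thread=False allows use from to_thread workers.
            conn = sqlite3.connect(path, timeout=5.0, check_same_thread=False)
            self._queue.put_nowait(conn)

    @asynccontextmanager
    async def acquire(self):
        conn = await self._queue.get()  # waits while all connections are busy
        try:
            yield conn
        finally:
            self._queue.put_nowait(conn)


async def write(pool: SQLitePool, sql: str, params: tuple = ()) -> None:
    """Run one write in a worker thread using a pooled connection."""
    async with pool.acquire() as conn:

        def _run() -> None:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)

        await asyncio.to_thread(_run)
```

Each connection is used by only one task at a time, which is what makes sharing it across worker threads safe in this sketch.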