Inside the Mitorix Engine
Mitorix is built on the principle that codebases are multidimensional spatial structures. This document details our parsing, vector-indexing, telemetry, and rendering engines.
1. Asynchronous Ingestion & Sandbox Lifecycle
Onboarding a repository starts an event-driven flow managed by **BullMQ**, with **Redis** as the queue backend. Because parsing a large codebase inside a request handler would block the server's event loop, the ingestion endpoint responds immediately with HTTP 202 Accepted and hands execution to background worker processes.
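The accept-then-defer pattern can be sketched without any framework. The following is a minimal illustration, not the Mitorix handler: `IngestionJob`, `JobQueue`, `InMemoryQueue`, and `acceptIngestion` are hypothetical names, and the queue is a synchronous in-memory stand-in. With BullMQ the enqueue step would instead be an awaited `queue.add(name, data)` call.

```typescript
// Hypothetical job payload and queue interface (illustrative names only).
interface IngestionJob {
  repoId: string;
  cloneUrl: string;
}

interface JobQueue {
  enqueue(job: IngestionJob): string; // returns a job id
}

interface HttpResponse {
  status: number;
  body: { jobId: string; state: string };
}

// Accept the request, hand the work to the queue, and reply before any parsing starts.
function acceptIngestion(queue: JobQueue, job: IngestionJob): HttpResponse {
  const jobId = queue.enqueue(job);
  return { status: 202, body: { jobId, state: "queued" } };
}

// In-memory stand-in for the Redis-backed queue, for illustration only.
class InMemoryQueue implements JobQueue {
  readonly pending: IngestionJob[] = [];

  enqueue(job: IngestionJob): string {
    this.pending.push(job);
    return `job-${this.pending.length}`;
  }
}
```

The client can poll with the returned `jobId` while a worker drains the queue in a separate process.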
Engine Process Sequence:
- A. Validation & Authentication:
Passport.js maps GitHub OAuth scopes to retrieve active personal access tokens. Private integrations authenticate using a server-side JSON Web Token keyed to a registered GitHub App client.
- B. Isolated Sandbox Disk Mount:
The worker mounts a short-lived storage volume at the host filesystem path `/tmp/mitorix-<repo_id>` and performs a shallow git clone with a depth limit of 1.
- C. Telemetry & Analysis Dispatch:
CodeFlow scanners index all file structures, triggering the language parsers to build Abstract Syntax Trees and generate vector embeddings.
- D. Secure Wipe Sequence:
Once vectors are upserted into Qdrant and schemas are cached in MongoDB, the worker runs a cleanup utility. The directory is removed from disk entirely, so no copy of the codebase is kept locally.
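Steps B and D of the sequence above can be sketched as pure helpers. This is an illustration under stated assumptions: `sandboxPath`, `shallowCloneArgs`, and `withSandbox` are hypothetical names, the repo-id check is an added precaution that the source does not mention, and the real worker would run these steps asynchronously.

```typescript
// Path layout taken from the text: /tmp/mitorix-<repo_id>.
function sandboxPath(repoId: string): string {
  // Reject ids that could escape /tmp (an added precaution, not from the source).
  if (!/^[A-Za-z0-9_-]+$/.test(repoId)) {
    throw new Error(`unsafe repo id: ${repoId}`);
  }
  return `/tmp/mitorix-${repoId}`;
}

// Arguments for `git`, producing a shallow clone with a depth limit of 1.
function shallowCloneArgs(cloneUrl: string, dir: string): string[] {
  return ["clone", "--depth", "1", cloneUrl, dir];
}

// Run `work` against the sandbox and always wipe it afterwards, even on failure.
function withSandbox<T>(
  dir: string,
  work: (dir: string) => T,
  wipe: (dir: string) => void,
): T {
  try {
    return work(dir);
  } finally {
    wipe(dir);
  }
}
```

Putting the wipe in a `finally` block means a failed parse or upsert still leaves no clone on disk.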
2. Tree-sitter AST Structural Deconstruction
Mitorix does not rely on flat, regex-based patterns to find scope boundaries. It uses **Tree-sitter** grammars compiled to **WASM modules**, which parse each source file into a syntax tree whose nodes represent classes, function bodies, variable declarations, imports, exports, and call parameters.
Each node records its start and end positions, so users can jump straight to the matching scope in Monaco.
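As a rough illustration of how start/end positions support jump-to-scope, the sketch below finds the smallest scope containing a given line. The `ScopeNode` record and `innermostScope` function are hypothetical simplifications and not the Tree-sitter node API.

```typescript
// Hypothetical flattened AST record; field names are illustrative.
interface ScopeNode {
  kind: "class" | "function" | "variable" | "import";
  name: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
}

// Return the smallest scope that contains `line`, or undefined if none does.
function innermostScope(nodes: ScopeNode[], line: number): ScopeNode | undefined {
  let best: ScopeNode | undefined;
  for (const node of nodes) {
    if (line < node.startLine || line > node.endLine) continue;
    const span = node.endLine - node.startLine;
    if (best === undefined || span < best.endLine - best.startLine) {
      best = node;
    }
  }
  return best;
}
```

An editor integration could pass the cursor line to such a lookup and highlight the returned range.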
3. Qdrant Vector DB & RAG Pipeline
Semantic lookups run against the **Qdrant Vector Database**. Mitorix splits functions and classes into discrete code blocks and passes them through **Nomic-embed-text** (running locally via **Ollama**) to generate 768-dimensional vectors that capture semantic context. Embedding is fully local, with no API cost and no code leaving the host.
To isolate lookups per MitorixSpace and prevent results from bleeding across repositories, each Qdrant point stores a metadata payload, and the API applies a strict match filter on `repo_id` to every query.
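A scoped search request might be built as follows. The body shape follows Qdrant's points search request (`vector`, `limit`, `with_payload`, and a `filter` with `must` / `match` conditions); the `buildScopedSearch` helper and its empty-id guard are assumptions added for illustration.

```typescript
// Request body shape for a Qdrant points search; `repo_id` is the payload field named in the text.
interface QdrantSearchRequest {
  vector: number[];
  limit: number;
  with_payload: boolean;
  filter: { must: Array<{ key: string; match: { value: string } }> };
}

// Build a search that can only ever return points belonging to one repository.
function buildScopedSearch(
  queryVector: number[],
  repoId: string,
  limit: number,
): QdrantSearchRequest {
  if (repoId.length === 0) {
    throw new Error("repo_id is required on every query");
  }
  return {
    vector: queryVector,
    limit,
    with_payload: true,
    filter: { must: [{ key: "repo_id", match: { value: repoId } }] },
  };
}
```

Building the filter in one place, rather than at each call site, makes it harder to issue an unscoped query by accident.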
When a user asks a question, we embed the query text locally via Nomic, retrieve the top 15 matching AST code blocks, and combine them with MongoDB file metadata to construct the RAG prompt for Google Gemini LLM, generating exact codebase references.
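The prompt-assembly step could look roughly like this. It is a sketch only: the `RetrievedBlock` fields stand in for the Qdrant hit plus MongoDB file metadata, and the instruction wording is invented for illustration. Only the top-15 cutoff comes from the text.

```typescript
// Hypothetical merged record: a vector hit joined with its file metadata.
interface RetrievedBlock {
  filePath: string;
  startLine: number;
  endLine: number;
  score: number; // similarity score from the vector search
  code: string;
}

const TOP_K = 15; // cutoff stated in the text

// Keep the best-scoring blocks and lay them out as numbered, citable excerpts.
function buildRagPrompt(question: string, hits: RetrievedBlock[]): string {
  const top = [...hits].sort((a, b) => b.score - a.score).slice(0, TOP_K);
  const context = top
    .map((h, i) => `[${i + 1}] ${h.filePath}:${h.startLine}-${h.endLine}\n${h.code}`)
    .join("\n\n");
  return [
    "Answer using only the code excerpts below and cite them by [number].",
    context,
    `Question: ${question}`,
  ].join("\n\n");
}
```

Including the file path and line range with each excerpt is what lets the model's answer point back to exact locations.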
4. Seven D3.js Visualization Modes
The CodeFlow engine translates AST entities and caller connections into interactive visualizations. D3.js scales vectors, handles canvas rendering, and coordinates hover logic with the Monaco Editor.
1. Force-Directed Node Graph
Maps files to nodes and caller-callee relationships to directed edges. Includes five layout options: Force-Directed, Radial, Hierarchical tree, strict Grid, and Metro lines.
2. Treemap View (d3-hierarchy)
Calculates nested directories as mosaic grids. Individual node rectangles represent files, with dimensions mapped to Lines of Code (LOC) and colors indicating folder structures.
3. Adjacency Dependency Matrix
Renders a matrix of import and function-call relationships. Rows and columns represent files, and deeply colored intersections flag zones of dense coupling.
4. Hierarchical Dendrogram
Extracts namespace mappings and projects them as horizontal node clusters. Smooth Bézier curves link folders to files, highlighting the hierarchy.
5. Sankey Flow Diagrams
Rolls up call streams into path flows between directories. Displays width scales mapped directly to the volume of inter-folder function call references.
6. Disjoint Clusters View
Groups files into distinct force boundaries based on folders. Surrounds components with bounding hulls to reveal inter-module connections.
7. Circular Edge Bundling Layout
Positions active code files on a circle, grouped by parent folder. Quadratic Bézier arcs curve through the interior to connect callers to callees, making cyclic dependencies easy to trace.
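Several of these modes depend on aggregating file-level edges before rendering. As one example, the folder roll-up behind the Sankey view could be sketched as below. `CallEdge`, `folderOf`, and `rollUpCalls` are hypothetical names, and the output uses the `source` / `target` / `value` link shape that Sankey layouts commonly consume.

```typescript
interface CallEdge {
  callerFile: string;
  calleeFile: string;
}

interface FolderFlow {
  source: string;
  target: string;
  value: number; // number of call references between the two folders
}

// Directory portion of a POSIX-style path; files at the root map to ".".
function folderOf(filePath: string): string {
  const idx = filePath.lastIndexOf("/");
  return idx === -1 ? "." : filePath.slice(0, idx);
}

// Aggregate file-level calls into folder-to-folder flows, skipping calls within one folder.
function rollUpCalls(edges: CallEdge[]): FolderFlow[] {
  const totals = new Map<string, FolderFlow>();
  for (const edge of edges) {
    const source = folderOf(edge.callerFile);
    const target = folderOf(edge.calleeFile);
    if (source === target) continue;
    const key = `${source}\u0000${target}`; // NUL separator cannot appear in a path
    const flow = totals.get(key);
    if (flow !== undefined) {
      flow.value += 1;
    } else {
      totals.set(key, { source, target, value: 1 });
    }
  }
  return Array.from(totals.values());
}
```

Each flow's `value` would then drive the link width in the diagram.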
5. Telemetry & Analytics Algorithms
■ Cyclomatic Complexity
Estimates cyclomatic complexity by traversing AST node types. It sums decision points by matching branch constructs (`if`, `else`, `for`, `while`, `switch`, `catch`) as well as the logical operators `&&` and `||` and ternary expressions.
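A minimal sketch of such a traversal follows. The `AstNode` shape and the node type names are simplified assumptions rather than a specific Tree-sitter grammar, and the final `+ 1` follows the conventional definition of cyclomatic complexity, which the source does not state explicitly.

```typescript
// Hypothetical simplified AST node; type names are illustrative, not a real grammar's.
interface AstNode {
  type: string;
  operator?: string;
  children?: AstNode[];
}

// Branch constructs listed in the text.
const BRANCH_TYPES = new Set(["if", "else", "for", "while", "switch", "catch", "ternary"]);
const LOGICAL_OPERATORS = new Set(["&&", "||"]);

// Recursively count every decision point in the subtree rooted at `node`.
function countDecisionPoints(node: AstNode): number {
  let count = BRANCH_TYPES.has(node.type) ? 1 : 0;
  if (
    node.type === "binary" &&
    node.operator !== undefined &&
    LOGICAL_OPERATORS.has(node.operator)
  ) {
    count += 1;
  }
  for (const child of node.children ?? []) {
    count += countDecisionPoints(child);
  }
  return count;
}

// Conventional cyclomatic complexity: decision points plus one for the entry path (assumption).
function cyclomaticComplexity(fn: AstNode): number {
  return countDecisionPoints(fn) + 1;
}
```

A function with no branches therefore scores 1, and each additional branch or short-circuit operator adds one.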
■ Dynamic Health Score Engine
Calculations start at a baseline of 100 points, applying deductions based on code quality and complexity metrics:
■ Blast Radius BFS Traversal
Computes structural risk coordinates when a file is modified. The algorithm triggers a **Breadth-First Search (BFS)** across the dependency graph:
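One way to sketch this traversal is shown below. It assumes the graph is stored as reverse edges (each file mapped to the files that depend on it) and reads the `1 / Depth` note in this section as a per-file weight that decays with distance from the change. Both are assumptions, and `blastRadius` is a hypothetical name.

```typescript
// graph.get(f) lists the files that depend on f (reverse dependency edges).
type DependentsGraph = Map<string, string[]>;

interface Impact {
  file: string;
  depth: number; // BFS distance from the changed file
  weight: number; // 1 / depth, so direct dependents count most (assumed reading)
}

// Level-by-level BFS from the changed file across its transitive dependents.
function blastRadius(graph: DependentsGraph, changedFile: string): Impact[] {
  const visited = new Set<string>([changedFile]);
  const impacts: Impact[] = [];
  let frontier: string[] = [changedFile];
  let depth = 0;

  while (frontier.length > 0) {
    depth += 1;
    const next: string[] = [];
    for (const file of frontier) {
      for (const dependent of graph.get(file) ?? []) {
        if (visited.has(dependent)) continue; // guards against cycles and diamonds
        visited.add(dependent);
        impacts.push({ file: dependent, depth, weight: 1 / depth });
        next.push(dependent);
      }
    }
    frontier = next;
  }
  return impacts;
}
```

Summing the weights gives a single risk figure, while the per-file depths show how far a change propagates.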
BFS Propagation Equation Parameters:
1 / Depth to transitive links.
■ Pull Request Risk Scoring
Evaluates codebase risk for branch merges. The score ranges from 0 to 100 based on several factors:
■ Design Pattern & Security Rules
- Singleton: Detects `getInstance` methods and static instance markers.
- Factory: Matches files named `*factory*` or containing `create` methods.
- Observer: Traces event handler bindings, calls to `addEventListener` or `emit`, and subscriptions.
- Repository: Matches `*repo*` structures containing database actions.
- SQL Injection: Detects SQL queries built with string concatenation.
- XSS Injection: Matches usages of `innerHTML` and `dangerouslySetInnerHTML`.
- Secrets: Traces variable keys matching `API_KEY`, `PASSWORD`, or `TOKEN`.
- Weak Cryptography: Flags implementations using outdated algorithms such as `md5` or `sha1`.
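The security rules in the list above can be approximated with line-based patterns. The sketch below is illustrative only: the regular expressions are assumptions, they work on raw text rather than AST nodes, and they will produce false positives and negatives that a real scanner would need to handle.

```typescript
interface Rule {
  id: string;
  pattern: RegExp;
}

// Illustrative patterns only; none carry the `g` flag, so `test` stays stateless.
const SECURITY_RULES: Rule[] = [
  { id: "xss", pattern: /\b(innerHTML|dangerouslySetInnerHTML)\b/ },
  { id: "secret", pattern: /\b\w*(API_KEY|PASSWORD|TOKEN)\w*\s*=/ },
  { id: "weak-crypto", pattern: /\b(md5|sha1)\b/i },
  // A quoted SQL statement immediately followed by `+` suggests concatenation.
  { id: "sql-injection", pattern: /["'`]\s*(SELECT|INSERT|UPDATE|DELETE)\b[^"'`]*["'`]\s*\+/i },
];

interface Finding {
  ruleId: string;
  line: number; // 1-based line number
}

// Check every line against every rule and report the matches in source order.
function scan(source: string, rules: Rule[] = SECURITY_RULES): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of rules) {
      if (rule.pattern.test(text)) {
        findings.push({ ruleId: rule.id, line: index + 1 });
      }
    }
  });
  return findings;
}
```

The design-pattern detectors could follow the same rule-table structure, with file-name globs added for the Factory and Repository cases.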
6. Hands-Free Speech Pipeline
The hands-free voice feature operates a sequential audio pipeline to capture, analyze, and reply to queries in real-time.
7. Production Infrastructure Stack
Mitorix is structured as a scalable, multi-container architecture.
The web server listens on port 80, serving the built Next.js frontend pages and proxying API calls to the Express application backend.
MongoDB persists user metadata, file paths, and conversation logs. Redis manages BullMQ queues, while Qdrant stores high-dimensional semantic code vectors.
Backend secrets are supplied through environment variables: API keys for Gemini LLM, AssemblyAI, and Murf AI, plus the JWT secrets. Embeddings run locally via Ollama, so they need no API key.
8. Vector MitorixLabs: Clone & Drift Telemetry
The MitorixLabs module reuses the Qdrant embeddings for analyses that plain text matching cannot perform.
■ Semantic Clone Detection
A clustering pass scans the vector space, comparing cosine distances between all AST fragments in the repository. It surfaces hidden duplication where code logic is semantically equivalent despite renamed variables or different formatting.
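The core comparison can be sketched with a brute-force pairwise scan. This is a simplification, not the clustering algorithm the text refers to: it is quadratic in the number of fragments, and the `0.95` threshold, `Fragment` shape, and `findClones` name are hypothetical.

```typescript
interface Fragment {
  id: string;
  vector: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

interface ClonePair {
  a: string;
  b: string;
  similarity: number;
}

// Compare every unordered pair once and keep those at or above the threshold.
function findClones(fragments: Fragment[], threshold = 0.95): ClonePair[] {
  const pairs: ClonePair[] = [];
  for (let i = 0; i < fragments.length; i++) {
    for (let j = i + 1; j < fragments.length; j++) {
      const similarity = cosineSimilarity(fragments[i].vector, fragments[j].vector);
      if (similarity >= threshold) {
        pairs.push({ a: fragments[i].id, b: fragments[j].id, similarity });
      }
    }
  }
  return pairs;
}
```

For large repositories the pairwise loop would likely be replaced by nearest-neighbour queries against the vector index.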
■ Version Drift Tracking
When a codebase is re-indexed, Mitorix computes the vector distance between the old and new fingerprints. This reveals semantic drift that standard git text diffs fail to capture.
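A drift comparison might look like the sketch below. The source does not define a fingerprint, so this assumes it is a map from symbol id to embedding. The `minDistance` cutoff of `0.1` is hypothetical, and symbols that were added or removed between runs are ignored here.

```typescript
// Assumed representation: symbol id -> embedding vector.
type Fingerprint = Map<string, number[]>;

// Cosine distance (1 - cosine similarity); zero-length vectors count as maximally distant.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / Math.sqrt(normA * normB);
}

interface DriftEntry {
  symbol: string;
  distance: number;
}

// Report symbols present in both runs whose embedding moved by at least `minDistance`.
function computeDrift(oldFp: Fingerprint, newFp: Fingerprint, minDistance = 0.1): DriftEntry[] {
  const drifted: DriftEntry[] = [];
  oldFp.forEach((oldVector, symbol) => {
    const newVector = newFp.get(symbol);
    if (newVector === undefined) return; // removed symbols are out of scope for this sketch
    const distance = cosineDistance(oldVector, newVector);
    if (distance >= minDistance) drifted.push({ symbol, distance });
  });
  return drifted.sort((x, y) => y.distance - x.distance);
}
```

A pure rename or reformat should leave the distance near zero, while a behavioural rewrite of the same function would rank near the top of the result.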