Developer Reference Specs v2.5

Inside the Mitorix Engine

Mitorix is built on the principle that codebases are multidimensional spatial structures. This document details our parsing, vector-indexing, telemetry, and rendering engines.

1. Asynchronous Ingestion & Sandbox Lifecycle

Onboarding a repository starts an event-driven flow managed by **BullMQ** and backed by a **Redis** job queue. Because cloning and analyzing a large codebase would block the request-handling event loop, the ingestion endpoint immediately returns HTTP 202 Accepted and hands execution to dedicated background workers.
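The accept-then-process pattern can be sketched as follows. This is a minimal illustration in Python, not the engine's actual handler: the in-process `queue.Queue` stands in for the Redis-backed BullMQ queue, and `handle_ingest_request` and the job fields are hypothetical names.

```python
import queue
import uuid

# Assumption: an in-process queue stands in for the Redis-backed BullMQ queue.
ingestion_queue = queue.Queue()


def handle_ingest_request(repo_url: str) -> tuple[int, dict]:
    """Enqueue an ingestion job and reply immediately without doing the work."""
    job = {"job_id": uuid.uuid4().hex, "repo_url": repo_url}
    ingestion_queue.put(job)
    # 202 Accepted: the request was received; a worker will process it later.
    return 202, {"job_id": job["job_id"], "status": "queued"}


status, body = handle_ingest_request("https://github.com/example/repo")
print(status, body["status"])
```

The caller gets a job identifier right away and can poll or subscribe for completion, while the heavy work stays off the request path.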

Engine Process Sequence:

  • A.
    Validation & Authentication:

    Passport.js runs the GitHub OAuth flow and obtains an access token for the granted scopes. Private integrations authenticate with a server-side JSON Web Token issued for a registered GitHub App.

  • B.
    Isolated Sandbox Disk Mount:

    The worker mounts a short-lived storage volume at /tmp/mitorix-<repo_id> on the host filesystem and performs a shallow git clone with a depth of 1.

  • C.
    Telemetry & Analysis Dispatch:

    CodeFlow scanners index every file, invoking the language parsers to build Abstract Syntax Trees and generate vector embeddings.

  • D.
    Secure Wipe Sequence:

    Once vectors are upserted into Qdrant and schemas are cached in MongoDB, the worker runs a cleanup utility that deletes the sandbox directory from disk, so no copy of the codebase remains in local storage.
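Steps B through D can be sketched as a single function whose cleanup runs even when analysis fails. This is a hedged sketch, not the worker's real code: `ingest_in_sandbox`, `shallow_clone`, and the `analyze` callback are hypothetical names, and the system temp directory is assumed to correspond to /tmp.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Any, Callable


def shallow_clone(repo_url: str, dest: Path) -> None:
    """Fetch only the latest commit (git clone --depth 1)."""
    subprocess.run(["git", "clone", "--depth", "1", repo_url, str(dest)], check=True)


def ingest_in_sandbox(
    repo_id: str,
    repo_url: str,
    analyze: Callable[[Path], Any],
    clone: Callable[[str, Path], None] = shallow_clone,
) -> Any:
    """Clone into a per-repository sandbox, analyze it, and always wipe it."""
    sandbox = Path(tempfile.gettempdir()) / f"mitorix-{repo_id}"
    sandbox.mkdir(parents=True, exist_ok=False)
    try:
        clone(repo_url, sandbox)
        return analyze(sandbox)
    finally:
        # Runs on success and on failure, so no source code is left behind.
        shutil.rmtree(sandbox, ignore_errors=True)
```

Putting the wipe in `finally` is the design point: a crash in parsing or embedding still leaves the disk clean.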

2. Tree-sitter AST Structural Deconstruction

Instead of relying on flat, regex-based patterns, Mitorix determines scope boundaries from real syntax trees. We use **Tree-sitter WASM modules**, that is, language grammars compiled to WebAssembly. Tree-sitter parses each source file into a tree of nodes representing classes, function bodies, imports, exports, and call parameters.

RESOLVED AST SCHEMAS (11 Languages)
● TypeScript (.ts, .tsx)
● JavaScript (.js, .jsx)
● Python (.py)
● Go (.go)
● Rust (.rs)
● Java (.java)
● C / C++ (.c, .cpp)
● C# (.cs)
● Ruby (.rb)
● PHP (.php)
● VBA (.vba)

SCOPED AST NODE ENTITIES

Nodes represent specific definitions: classes, function blocks, variable declarations, and dependency connections. Each node is indexed by its start and end line so users can jump straight to the corresponding scope in the Monaco editor.

*Files in languages outside the list above fall back to lightweight regex-based parsing.
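A node entity and the regex fallback might look like the sketch below. The `AstEntity` fields and the fallback pattern are illustrative assumptions, not the engine's actual schema or rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AstEntity:
    kind: str        # e.g. "class" or "function"
    name: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


# Hypothetical fallback pattern: "function name(", "def name(", or "sub name(".
_FALLBACK_FUNC = re.compile(r"^\s*(?:function|def|sub)\s+([A-Za-z_]\w*)\s*\(", re.IGNORECASE)


def regex_fallback(source: str) -> list[AstEntity]:
    """Extract function-like definitions when no grammar is available."""
    entities = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = _FALLBACK_FUNC.match(line)
        if match:
            # A regex cannot see where the scope ends, so end_line equals start_line.
            entities.append(AstEntity("function", match.group(1), line_no, line_no))
    return entities
```

The fallback yields less precise entities than a real parse: it knows where a definition starts but not where its scope closes.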

3. Qdrant Vector DB & RAG Pipeline

Semantic lookups run against the **Qdrant Vector Database**. Mitorix splits functions and classes into discrete code blocks and feeds them through **Nomic-embed-text** (running locally via **Ollama**) to generate 768-dimensional vectors that capture semantic context. Because the embedding step is fully local, it incurs no API cost and sends no source code to an external embedding service.

Vector Dimensionality (Nomic): 768
Distance Metric: Cosine Similarity
Vector Search Index: HNSW Graph

Payload Filtering Rules:

To isolate lookups per MitorixSpace and prevent results from one repository leaking into another, each Qdrant vector stores a metadata payload. The API enforces an exact match on the repo_id field for every query.
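As an illustration, a filtered search request could be assembled like this. The body follows the general shape of Qdrant's REST search request (`vector`, `limit`, `filter.must` with a `match` condition); the helper name `build_search_request` is hypothetical, and the default limit of 15 mirrors the retrieval size described in this section.

```python
def build_search_request(query_vector: list[float], repo_id: str, limit: int = 15) -> dict:
    """Build a search body that can only return points from one repository."""
    if len(query_vector) != 768:
        raise ValueError("expected a 768-dimensional Nomic embedding")
    return {
        "vector": query_vector,
        "limit": limit,
        "with_payload": True,
        # Every query carries a mandatory exact-match condition on repo_id.
        "filter": {"must": [{"key": "repo_id", "match": {"value": repo_id}}]},
    }


request_body = build_search_request([0.0] * 768, "repo-123")
print(request_body["filter"])
```

Building the filter inside one helper, rather than at each call site, makes it harder to issue an unscoped query by accident.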

When a user asks a question, we embed the query text locally via Nomic, retrieve the top 15 matching AST code blocks, and combine them with MongoDB file metadata to construct the RAG prompt for the Google Gemini LLM, which answers with references to specific locations in the codebase.
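Prompt assembly can be sketched as below. The block and metadata field names (`path`, `start_line`, `end_line`, `code`, `language`) and the instruction wording are assumptions made for illustration; the actual prompt template is not documented here.

```python
def build_rag_prompt(question: str, blocks: list[dict], file_meta: dict[str, dict]) -> str:
    """Combine retrieved code blocks with file metadata into one prompt."""
    sections = []
    for block in blocks[:15]:  # keep at most the top 15 retrieved blocks
        meta = file_meta.get(block["path"], {})
        language = meta.get("language", "unknown")
        header = f'{block["path"]}:{block["start_line"]}-{block["end_line"]} ({language})'
        sections.append(header + "\n" + block["code"])
    context = "\n\n".join(sections)
    return (
        "Answer using only the code below and cite file paths with line ranges.\n\n"
        + context
        + "\n\nQuestion: "
        + question
    )


prompt = build_rag_prompt(
    "What does add do?",
    [{"path": "src/a.py", "start_line": 1, "end_line": 2, "code": "def add(a, b):\n    return a + b"}],
    {"src/a.py": {"language": "python"}},
)
print(prompt)
```

Including the path and line range in each section header is what lets the model point back to exact locations.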

4. Seven D3.js Visualization Modes

The CodeFlow engine translates AST entities and caller connections into interactive visualizations. D3.js scales vectors, handles canvas rendering, and coordinates hover logic with the Monaco Editor.

1. Force-Directed Node Graph

Maps files as nodes and caller-callee relationships as directed edges. Includes 5 layout algorithms: Force-Directed, Radial, Hierarchical tree, strict Grid, and Metro lines.

2. Treemap View (d3-hierarchy)

Lays out nested directories as a mosaic of rectangles. Each rectangle represents a file, with its area mapped to Lines of Code (LOC) and its color indicating the containing folder.

3. Adjacency Dependency Matrix

Renders a grid in which both rows and columns represent files, and each cell marks the imports and function calls between that pair. Darker cells flag densely coupled zones.

4. Hierarchical Dendrogram

Extracts namespace mappings and projects them as horizontal node clusters. Smooth Bezier lines link folders to files, highlighting hierarchy.

5. Sankey Flow Diagrams

Rolls up call streams into flows between directories. The width of each band is proportional to the number of inter-folder function call references.

6. Disjoint Clusters View

Groups files into distinct force boundaries based on folders. Surrounds components with bounding hulls to reveal inter-module connections.

7. Circular Edge Bundling Layout

Positions active code files in a radial coordinate circle. Files are grouped by parent folders. Quadratic bezier arcs curve inside the circle to connect caller nodes, tracing cyclic bindings.

Render Limits:
Matrix: Max 40 files
Dendrogram: Max 80 files
Bundle: Max 70 files
Disjoint: Max 100 files
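To make the limits concrete, here is a sketch of the data behind the adjacency matrix with the 40-file cap applied. Raising an error above the cap is an assumption; the engine might instead truncate or disable the view. `build_adjacency_matrix` is a hypothetical helper.

```python
RENDER_LIMITS = {"matrix": 40, "dendrogram": 80, "bundle": 70, "disjoint": 100}


def build_adjacency_matrix(files: list[str], import_edges: list[tuple[str, str]]) -> list[list[int]]:
    """Count references from each row file to each column file."""
    if len(files) > RENDER_LIMITS["matrix"]:
        # Assumption: reject oversized inputs rather than truncate them.
        raise ValueError(f"matrix view supports at most {RENDER_LIMITS['matrix']} files")
    index = {path: i for i, path in enumerate(files)}
    matrix = [[0] * len(files) for _ in files]
    for importer, imported in import_edges:
        if importer in index and imported in index:
            matrix[index[importer]][index[imported]] += 1
    return matrix
```

A higher cell count corresponds to a darker cell in the rendered view. The caps matter because the matrix grows quadratically with the number of files.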

5. Telemetry & Analytics Algorithms

Cyclomatic Complexity

Estimates cyclomatic complexity by traversing AST node types. It sums decision points by matching branch constructs (if, else, for, while, switch, catch), the logical operators && and ||, and ternary expressions.
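The counting approach can be illustrated with Python's built-in `ast` module standing in for Tree-sitter. This is a sketch under stated assumptions: the node types are the Python equivalents of the constructs above, `else` is not a separate node in this AST and so is not counted here, and the classic McCabe formula would add 1 to the result.

```python
import ast


def count_decision_points(source: str) -> int:
    """Sum branch constructs, boolean operators, and ternaries in the source."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is one BoolOp holding three values, i.e. two operators.
            count += len(node.values) - 1
    return count


sample = """
def f(a, b):
    if a and b:
        return 1
    for i in range(3):
        while i:
            i -= 1
    try:
        pass
    except ValueError:
        pass
    return 2 if a else 3
"""
print(count_decision_points(sample))  # prints 6
```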

Dynamic Health Score Engine

Calculations start at a baseline of 100 points, applying deductions based on code quality and complexity metrics:

| Rule Metric Trigger | Maximum Deduction Impact |
| --- | --- |
| Vulnerability Leaks (SQL Injection, Credentials, XSS) | -20 points total (-5 per issue) |
| Circular/Cyclic Dependency Loops | -20 points total (-5 per link) |
| God Files (Files containing >15 distinct functions) | -15 points total (-3 per file) |
| Coupling Density (Input-Output ratio imbalances) | -15 points max |
| Dead Code Blocks (Percentage of unused functions) | -20 points max |
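The deduction rules can be expressed as a small function. The per-item deductions and caps come from the table above. Two parts are assumptions, because the table gives only their maximums: the coupling deduction is passed in precomputed, and the dead-code deduction scales linearly with the fraction of unused functions.

```python
def health_score(
    vulnerabilities: int,
    cyclic_links: int,
    god_files: int,
    coupling_penalty: float,
    dead_code_ratio: float,
) -> float:
    """Start at 100 and subtract capped deductions for each rule."""
    deductions = (
        min(20, 5 * vulnerabilities)                        # -5 per issue, max -20
        + min(20, 5 * cyclic_links)                         # -5 per link, max -20
        + min(15, 3 * god_files)                            # -3 per file, max -15
        + min(15.0, max(0.0, coupling_penalty))             # assumed precomputed, max -15
        + min(20.0, max(0.0, 20.0 * dead_code_ratio))       # assumed linear, max -20
    )
    return 100 - deductions


print(health_score(2, 1, 6, 4.0, 0.25))  # prints 61.0
```

Because the caps sum to 90, the lowest score reachable under these rules is 10.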

Blast Radius BFS Traversal

Estimates the structural risk of modifying a file. The algorithm runs a **Breadth-First Search (BFS)** across the dependency graph:

BFS propagation steps:

1. Identifies files directly importing the modified module (Depth 1).
2. Propagates downstream relationships up to Depth 3, applying a decay weight multiplier of 1 / Depth to transitive links.
3. Counts exported functions and traces incoming call counts to calculate centrality.
4. Sets the severity rating: **Critical** (≥8 direct links or ≥5 functions), **High** (≥4 links or ≥3 functions), **Medium** (≥2 links), or **Low**.
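The four steps above can be sketched as follows. The graph representation (a map from each file to the files that import it) and the interpretation of "functions" in the severity thresholds as exported functions are assumptions; how the weighted score feeds into centrality is not specified here, so the sketch only returns it.

```python
from collections import deque


def blast_radius(importers: dict[str, list[str]], modified: str, max_depth: int = 3):
    """Return ({file: depth}, weighted score) for files affected by a change."""
    depth_of = {modified: 0}
    frontier = deque([modified])
    while frontier:
        current = frontier.popleft()
        if depth_of[current] == max_depth:
            continue  # do not propagate beyond Depth 3
        for dependent in importers.get(current, []):
            if dependent not in depth_of:
                depth_of[dependent] = depth_of[current] + 1
                frontier.append(dependent)
    del depth_of[modified]
    # Each affected file contributes a decayed weight of 1 / depth.
    score = sum(1 / depth for depth in depth_of.values())
    return depth_of, score


def severity(direct_links: int, exported_functions: int) -> str:
    """Map link and function counts to the documented severity thresholds."""
    if direct_links >= 8 or exported_functions >= 5:
        return "Critical"
    if direct_links >= 4 or exported_functions >= 3:
        return "High"
    if direct_links >= 2:
        return "Medium"
    return "Low"
```

BFS guarantees that each file is recorded at its shortest distance from the modified module, so a file reachable at both Depth 1 and Depth 3 gets the Depth 1 weight.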

Pull Request Risk Scoring

Evaluates the risk a branch merge poses to the codebase. The score ranges from 0 to 100 and is the sum of four factors, each with its own maximum contribution:

Blast Radius Impact: +30 max
Lines of Code Altered: +25 max
Number of Files Modified: +20 max
Core & Configuration Edits: +25 max
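One way to combine the four factors is shown below. The caps are taken from the list above; how each factor is scaled inside its cap is not specified, so this sketch assumes each factor arrives as a normalized intensity between 0 and 1. The key names are hypothetical.

```python
PR_RISK_CAPS = {
    "blast_radius": 30,
    "lines_changed": 25,
    "files_modified": 20,
    "core_config_edits": 25,
}


def pr_risk_score(factors: dict[str, float]) -> int:
    """Weight each normalized factor (0..1) by its cap and sum to a 0-100 score."""
    total = 0.0
    for name, cap in PR_RISK_CAPS.items():
        intensity = min(1.0, max(0.0, factors.get(name, 0.0)))
        total += cap * intensity
    return round(total)


print(pr_risk_score({"blast_radius": 0.5, "lines_changed": 0.2}))  # prints 20
```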

Design Pattern & Security Rules

PATTERN RULES
  • Singleton: Detects getInstance methods and static instance markers.
  • Factory: Matches files named *factory* or containing `create` methods.
  • Observer: Traces event handler binds, calls to addEventListener, emit, or subscriptions.
  • Repository: Matches *repo* structures containing database actions.

SECURITY SCANNER REGEXES
  • SQL Injection: Detects SQL queries built with string concatenation.
  • XSS Injection: Matches usages of innerHTML and dangerouslySetInnerHTML.
  • Secrets: Traces variable keys matching API_KEY, PASSWORD, or TOKEN.
  • Weak Cryptography: Flags implementations using outdated algorithms like md5 or sha1.
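A line-based scanner for the security rules could look like the sketch below. The patterns are illustrative approximations written for this example, not the engine's actual regexes, and like any regex scanner they will produce false positives and miss cases.

```python
import re

# Hypothetical patterns approximating the four documented rules.
SECURITY_RULES = {
    "xss": re.compile(r"\b(?:innerHTML|dangerouslySetInnerHTML)\b"),
    "secret": re.compile(r"\b\w*(?:API_KEY|PASSWORD|TOKEN)\w*\s*[:=]", re.IGNORECASE),
    "weak_crypto": re.compile(r"\b(?:md5|sha1)\b", re.IGNORECASE),
    # A quoted SQL statement immediately followed by "+" (string concatenation).
    "sql_injection": re.compile(
        r"""["'][^"'\n]*\b(?:SELECT|INSERT|UPDATE|DELETE)\b[^"'\n]*["']\s*\+""",
        re.IGNORECASE,
    ),
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every rule that matches a line."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in SECURITY_RULES.items():
            if pattern.search(line):
                findings.append((line_no, rule))
    return findings


snippet = "\n".join([
    "el.innerHTML = userInput;",
    'const API_KEY = "abc123";',
    'const q = "SELECT * FROM users WHERE id = " + id;',
    'const h = crypto.createHash("md5");',
    "const total = a + b;",
])
print(scan(snippet))
```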

6. Hands-Free Speech Pipeline

The hands-free voice feature runs a three-stage sequential pipeline that captures a spoken query, answers it, and replies with synthesized speech in real time.

1. CAPTURE (STT)
The AssemblyAI streaming API transcribes audio input into plain text.
2. REASON (RAG)
Nomic-embed-text (Ollama) embeds the query and matches it against the indexed code blocks in Qdrant. The Gemini LLM generates the answer.
3. SYNTHESIS (TTS)
Murf AI converts Gemini's output into natural speech streams.

7. Production Infrastructure Stack

Mitorix is structured as a scalable, multi-container architecture.

CONTAINER DEFINITION SPECIFICATIONS
Nginx Reverse Proxy:

Listens on port 80, serving built Next.js frontend pages and proxying API calls to the Express application backend.

Database Engines:

MongoDB persists user metadata, file paths, and conversation logs. Redis manages BullMQ queues, while Qdrant stores high-dimensional semantic code vectors.

Environment Staging:

Backend parameters are supplied through environment variables, which hold the API keys for the Gemini LLM, AssemblyAI, and Murf AI alongside the JWT secrets. Embeddings run locally via Ollama, so no embedding API key is needed.

8. Vector MitorixLabs: Clone & Drift Telemetry

The MitorixLabs module reuses the Qdrant embeddings for codebase analyses that plain text matching cannot perform.

Semantic Clone Detection

A clustering algorithm scans the vector space and compares the cosine similarity of all AST fragments within the repository. It surfaces hidden duplication where code logic is semantically near-identical even when variables have been renamed or the formatting differs. Matches are graded by similarity threshold:

≥95% Near Identical Match
≥90% High Similarity Clone
≥85% Semantic Clone Boundary
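The thresholds above can be applied as in this sketch. It uses a brute-force pairwise comparison for clarity, which is a swapped-in simplification rather than the clustering algorithm itself. `find_clones` and the fragment naming are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_clone(similarity: float):
    """Map a similarity value to its documented tier, or None below 85%."""
    if similarity >= 0.95:
        return "Near Identical Match"
    if similarity >= 0.90:
        return "High Similarity Clone"
    if similarity >= 0.85:
        return "Semantic Clone Boundary"
    return None


def find_clones(fragments: dict[str, list[float]]) -> list[tuple[str, str, str]]:
    """Compare every pair of fragments and keep those at or above 85%."""
    results = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(fragments.items(), 2):
        label = classify_clone(cosine_similarity(vec_a, vec_b))
        if label:
            results.append((name_a, name_b, label))
    return results
```

Pairwise comparison is quadratic in the number of fragments, so a large repository would need an index-backed nearest-neighbor search instead.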

Version Drift Tracking

When a codebase is re-indexed, Mitorix computes the vector distance between each entity's old and new embedding fingerprints. This reveals semantic drift that a standard textual git diff does not capture.

Computes % Drift per AST Entity
Rolls up average drift per File Path
Cross-Repository version diffing
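Per-entity drift and the per-file rollup can be sketched as below. Defining drift as (1 - cosine similarity) × 100 is an assumption, since the exact formula is not given here. Entities are keyed by a hypothetical (file path, entity name) pair, and only entities present in both snapshots are compared.

```python
import math
from collections import defaultdict


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drift_report(old: dict, new: dict):
    """Return (% drift per entity, average % drift per file path)."""
    per_entity = {}
    for key in old.keys() & new.keys():
        # Assumed definition: drift is the cosine distance expressed as a percentage.
        per_entity[key] = (1.0 - cosine_similarity(old[key], new[key])) * 100.0
    by_file = defaultdict(list)
    for (path, _entity), pct in per_entity.items():
        by_file[path].append(pct)
    per_file = {path: sum(values) / len(values) for path, values in by_file.items()}
    return per_entity, per_file
```

An entity whose text changed heavily but whose behavior did not should show low drift under this definition, which is the contrast with a line-based diff.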