System Overview - Hymalaia Documentation

This page gives a high-level overview of how Hymalaia works. It aims to make the system design transparent so you can use Hymalaia with confidence. If you want to customize Hymalaia or contribute to the open source project, this is a good place to start.

System Architecture

Hymalaia can be deployed on a single instance or a container orchestration platform (e.g., Kubernetes). Regardless of where it’s deployed, the data flow remains consistent.

Documents are pulled and processed via connectors.
The processed documents are then stored in Vespa or Postgres, both of which run in containers within your setup.

LLM Communication

The only time sensitive data leaves your system is when Hymalaia sends a request to a Large Language Model (LLM) to generate a response.

All communication with the LLM is encrypted.
Data persistence policies depend on your LLM hosting provider.

🕵️ Hymalaia also includes minimal, anonymous telemetry to help improve the platform and monitor flaky connectors.
You can disable telemetry by setting the following environment variable:

DISABLE_TELEMETRY=True

Embedding Flow

Each document is split into chunks (smaller sections) for processing. Benefits of chunking:

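Less noise: only the relevant parts of a document are passed to the LLM.
Reduced cost: hosted LLMs typically charge per token, so sending fewer tokens costs less.
Improved detail retention: an embedding vector can only hold a limited amount of information, so smaller chunks preserve more detail.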
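The chunking step described above can be sketched as follows. This is a minimal illustration, not Hymalaia's actual chunker: the function name `split_into_chunks`, the word-based sizing, and the `chunk_size` and `overlap` values are all assumptions made for the example (a real pipeline would more likely count tokens).

```python
def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into word-based chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    All names and defaults here are hypothetical.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(20))
    for chunk in split_into_chunks(sample):
        print(chunk)
```

Only the chunks that match a query need to be sent to the LLM, which is where the noise and cost savings listed above come from.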

Mini-Chunks

Mini-chunks go one step further:

Create embeddings at multiple sizes for the same content.
Improve retrieval of both high-level context and fine-grained details.
Can be toggled using environment variables.

⚠️ Note: Mini-chunks may slow down indexing on low-performance hardware.
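One way to picture mini-chunks is as extra, smaller embedding units that point back to their parent chunk. The sketch below assumes that model. The names `build_embedding_units`, `mini_chunk_size`, and `enable_mini_chunks` are hypothetical, since the environment variables that control this feature are not named here.

```python
def split_words(text: str, size: int) -> list[str]:
    """Split text into consecutive groups of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_embedding_units(
    chunks: list[str],
    mini_chunk_size: int = 4,        # hypothetical value
    enable_mini_chunks: bool = True,  # hypothetical toggle
) -> list[tuple[int, str]]:
    """Return (parent_chunk_id, text) pairs to embed.

    Every chunk is embedded once as a whole. With mini-chunks enabled, each
    chunk is also embedded in smaller pieces that map back to the same parent,
    so a match on a fine-grained detail still retrieves the full chunk.
    """
    units: list[tuple[int, str]] = []
    for chunk_id, chunk in enumerate(chunks):
        units.append((chunk_id, chunk))
        if enable_mini_chunks:
            for mini in split_words(chunk, mini_chunk_size):
                # Skip a mini-chunk identical to its parent to avoid duplicates.
                if mini != chunk:
                    units.append((chunk_id, mini))
    return units
```

The extra units are also why indexing gets slower: each chunk now requires several embedding calls instead of one.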

Embedding Model

Hymalaia uses a state-of-the-art bi-encoder, optimized for:

Running on CPUs
Subsecond document retrieval
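A bi-encoder embeds the query and each document independently, then compares the resulting vectors. That lets document vectors be computed once at indexing time, which is what makes fast retrieval possible. The sketch below shows only this pattern. Its `embed` function is a toy word-count stand-in rather than a neural model, and none of the names come from Hymalaia.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[term] * b[term] for term in a)  # Counter yields 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, doc_vectors: dict[str, Counter], top_k: int = 2) -> list[str]:
    """Embed the query alone, then rank precomputed document vectors by similarity."""
    query_vector = embed(query)
    ranked = sorted(
        doc_vectors,
        key=lambda doc_id: cosine(query_vector, doc_vectors[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Document vectors are computed once, ahead of any query.
DOC_VECTORS = {
    "vespa": embed("vespa stores document vectors"),
    "postgres": embed("postgres stores user settings"),
    "llm": embed("the llm generates a response"),
}
```

At query time only the query needs to be embedded, for example with `retrieve("where are document vectors stored", DOC_VECTORS, top_k=1)`.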

Query Flow

The query pipeline is under constant improvement and incorporates new research and open-source techniques as they mature. Each stage is configurable:

Number of documents to retrieve
Number of reranked documents
Embedding and reranking models
Chunk selection passed to the LLM
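The configurable stages listed above can be thought of as a retrieve, rerank, and select pipeline. The following is a sketch under that assumption. The `QueryConfig` fields, their placeholder values, and the `run_query` function are hypothetical, and the two scoring functions stand in for the embedding and reranking models.

```python
from dataclasses import dataclass
from typing import Callable

ScoreFn = Callable[[str, str], float]  # (query, chunk) -> relevance score


@dataclass
class QueryConfig:
    # Placeholder values for illustration only, not Hymalaia defaults.
    num_retrieved: int = 50       # chunks kept after the retrieval stage
    num_reranked: int = 15        # top retrieved chunks passed to the reranker
    num_chunks_for_llm: int = 5   # chunks finally handed to the LLM


def run_query(
    query: str,
    chunks: list[str],
    retrieval_score: ScoreFn,
    rerank_score: ScoreFn,
    config: QueryConfig,
) -> list[str]:
    """Return the chunks that would be passed to the LLM for this query."""
    # Stage 1: cheap retrieval over all chunks.
    retrieved = sorted(
        chunks, key=lambda c: retrieval_score(query, c), reverse=True
    )[: config.num_retrieved]

    # Stage 2: the costlier reranker reorders only the head of the list.
    head = retrieved[: config.num_reranked]
    tail = retrieved[config.num_reranked:]
    reranked = sorted(head, key=lambda c: rerank_score(query, c), reverse=True)

    # Stage 3: select the chunks to include in the LLM prompt.
    return (reranked + tail)[: config.num_chunks_for_llm]
```

Reranking only the head of the retrieved list keeps the expensive model off most of the corpus, which is the usual reason for exposing the two counts separately.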

❓Have questions or ideas? Don’t hesitate to reach out to the maintainers.

Ready to dive deeper? Explore the Multilingual Setup or the Connector Guide to further customize your Hymalaia deployment.