HuggingFace Inference API - Hymalaia Documentation

To use HuggingFace Inference APIs with Hymalaia, follow the instructions below.

🧾 Prerequisites

You must have a Pro Account with HuggingFace to obtain an API key.

⚠️ Note: As of November 2023, HuggingFace no longer supports very large models (over 10GB), such as LLaMA-2-70B, on the Pro Plan. To use such models, you need either a dedicated Inference Endpoint (paid) or an Enterprise Plan.

The Pro Plan still works with smaller models, but these may yield suboptimal results for Hymalaia.

🔑 Get Your Access Token

Go to your HuggingFace user settings.
Copy your User Access Token (HFAccessToken).

⚙️ Set Up Hymalaia with HuggingFace

Refer to your deployment-specific documentation for setting environment variables.
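As an illustration of the environment-variable approach, the sketch below reads the access token from the environment and fails early if it is missing. The variable name HF_ACCESS_TOKEN and the helper load_hf_token are hypothetical and chosen only for this example; use whatever name your deployment documentation specifies.

```python
import os

# Hypothetical variable name used only for this sketch; the name your
# deployment actually expects is given in its own documentation.
TOKEN_ENV_VAR = "HF_ACCESS_TOKEN"


def load_hf_token(env=None):
    """Return the HuggingFace access token from the environment.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_ENV_VAR} is not set; copy your User Access Token "
            "from your HuggingFace user settings and export it first."
        )
    return token
```

Failing at startup rather than on the first request tends to make a missing or empty token easier to diagnose.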

🧠 Using LLaMA-2-70B via Inference API

To configure Hymalaia for next-token generation using HuggingFace’s Inference API:

Navigate to the LLM page in the Hymalaia Admin Panel.
Add a Custom LLM Provider with the following identifiers:

HFCustomLLMProvider1
HFCustomLLMProvider2

These custom providers allow Hymalaia to route prompt completion requests to the HuggingFace-hosted model endpoint.
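To make the routing concrete, here is a minimal sketch of the kind of prompt completion request that ends up at a HuggingFace-hosted model. It is not Hymalaia's internal implementation. It assumes the serverless Inference API URL pattern (a dedicated Inference Endpoint has its own URL), and the model ID, helper names, and max_new_tokens default are illustrative choices.

```python
import json
import urllib.request

# Assumption: base URL of the serverless Inference API. A dedicated
# Inference Endpoint is reached at its own endpoint URL instead.
HF_API_BASE = "https://api-inference.huggingface.co/models"


def build_generation_request(model_id, prompt, token, max_new_tokens=256):
    """Build (but do not send) a text-generation POST request."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{HF_API_BASE}/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it lets you inspect the URL, headers, and body before any network call is made, for example `build_generation_request("meta-llama/Llama-2-70b-chat-hf", "Hello", token)`.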

For more detailed setup and environment configuration examples, refer to the Model Configs.
