npuserver

NPU Server (npuserver)

Welcome to the npuserver documentation!

npuserver is a high-performance Python library and Flask backend tailored specifically for running Large Language Models (LLMs) locally on Intel NPUs using OpenVINO GenAI.

This server provides an OpenAI-compatible API for seamless integration with existing tools, robust NPU memory management, and dynamic on-the-fly hardware compilation of Hugging Face models into optimized NPU blobs.

Core Features


📦 Installation

Ensure you have Python installed and your Intel NPU drivers configured properly on Windows.

1. Clone the repository

git clone https://github.com/durgasai299792458/npuserver.git
cd npuserver

2. Setup a virtual environment

python -m venv venv
venv\Scripts\activate

3. Install the package locally

pip install -e .

Required Core Dependencies: openvino-genai, flask, huggingface-hub


🛠️ Getting Started

Starting the Server

The server runs on Flask. You can spin it up programmatically using the library or via a Python script:

from npuserver import run_server

# Starts the NPU backend on port 8080
run_server(port=8080)

By default, the npuserver binds to localhost on port 8080. All endpoint paths in this documentation are relative to this base URL: http://localhost:8080

Next Steps