npuserver)Welcome to the npuserver documentation!
npuserver is a high-performance Python library and Flask backend tailored specifically for running Large Language Models (LLMs) locally on Intel NPUs using OpenVINO GenAI.
This server provides an OpenAI-compatible API for seamless integration with existing tools, robust NPU memory management, and dynamic on-the-fly hardware compilation of Hugging Face models into optimized NPU blobs.
/v1/chat/completions endpoint. Fully supports real-time Server-Sent Event (SSE) streaming..blob before serving it.Ensure you have Python installed and your Intel NPU drivers configured properly on Windows.
git clone https://github.com/durgasai299792458/npuserver.git
cd npuserver
python -m venv venv
venv\Scripts\activate
pip install -e .
Required Core Dependencies: openvino-genai, flask, huggingface-hub
The server runs on Flask. You can spin it up programmatically using the library or via a Python script:
from npuserver import run_server
# Starts the NPU backend on port 8080
run_server(port=8080)
By default, the npuserver binds to localhost on port 8080. All endpoint paths in this documentation are relative to this base URL: http://localhost:8080