Last Update: April 23, 2025
Model Inference
Welcome to the Hyperstack Gen AI Platform inference documentation. Inference is the process of using a trained model to make predictions or generate text based on input data.
Whether you're building a chatbot, generating text, or simply exploring the capabilities of the Hyperstack Gen AI Platform, this page gives you the foundational knowledge you need to get started, covering the requirements for making API requests to the model.
Making API Requests
To make requests to the model, you'll need to include your API key in the request headers:
curl -X POST https://api.genai.hyperstack.cloud/tailor/v1/generate/stream \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: YOUR_API_KEY" \
  -d '{
    "adapter_name": "your-model-name",
    "messages": [
      {"role": "user", "content": "YOUR TEXT HERE"}
    ],
    "max_tokens": 100,
    "temperature": 0.5,
    "top_p": 0.5,
    "top_k": 40,
    "presence_penalty": 0,
    "repetition_penalty": 0.5
  }'
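The same request can be made from Python. The sketch below uses only the standard library, mirrors the curl example above, and assumes the streaming endpoint returns newline-delimited chunks; adjust the parsing to the actual response format if it differs.

```python
import json
import urllib.request

API_URL = "https://api.genai.hyperstack.cloud/tailor/v1/generate/stream"

def build_payload(prompt, adapter_name="your-model-name"):
    """Build the request body; values mirror the curl example above."""
    return {
        "adapter_name": adapter_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
        "temperature": 0.5,
        "top_p": 0.5,
        "top_k": 40,
        "presence_penalty": 0,
        "repetition_penalty": 0.5,
    }

def generate_stream(prompt, api_key):
    """POST the payload and yield raw response lines as they arrive."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:  # assumed: one chunk per line
            line = raw_line.decode("utf-8").strip()
            if line:
                yield line

# Usage:
# for chunk in generate_stream("YOUR TEXT HERE", "YOUR_API_KEY"):
#     print(chunk)
```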
Generation Parameters
The API accepts several parameters to control the text generation:
max_tokens: Maximum number of tokens to generate (default: 100)
temperature: Controls randomness in the output (0.0 to 2.0, default: 1)
top_p: Nucleus sampling parameter (0.0 to 1.0, default: 1)
top_k: Restricts sampling to the k most likely tokens (set to 40 in the example above)
presence_penalty: Penalizes new tokens based on their presence in the text (-2.0 to 2.0, default: 0)
repetition_penalty: Penalizes token repetition (-2.0 to 2.0, default: 0)
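To build intuition for how temperature and top_p shape the output, the toy sketch below applies both to a made-up next-token distribution. The numbers and functions are illustrative only, not the platform's implementation.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax with temperature: lower values sharpen the
    distribution (more deterministic), higher values flatten it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Nucleus sampling: keep the smallest set of tokens whose
    cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]            # toy next-token scores
sharp = apply_temperature(logits, 0.5)    # low temperature: peaked
flat = apply_temperature(logits, 2.0)     # high temperature: flatter
nucleus = top_p_filter(sharp, 0.9)        # only the most likely tokens survive
```

With temperature 0.5 the top token dominates, while at 2.0 the probabilities spread out; top_p then discards the unlikely tail before sampling, which is why low top_p values make output more focused.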