Last Update: April 23, 2025
Model Inference
Welcome to the Hyperstack Gen AI Platform inference documentation. Inference is the process of using a trained model to make predictions or generate text based on input data.
Whether you're building a chatbot, generating text, or simply exploring the capabilities of the Hyperstack Gen AI Platform, this page gives you the foundational knowledge you need to get started, covering the requirements for making API requests to the model.
Making API Requests
To make requests to the model, you'll need to include your API key in the request headers:
curl -X POST https://api.genai.hyperstack.cloud/tailor/v1/generate/stream \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: YOUR_API_KEY" \
  -d '{
    "adapter_name": "your-model-name",
    "messages": [
      {"role": "user", "content": "YOUR TEXT HERE"}
    ],
    "max_tokens": 100,
    "temperature": 0.5,
    "top_p": 0.5,
    "top_k": 40,
    "presence_penalty": 0,
    "repetition_penalty": 0.5
  }'
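The same request can be made from Python. The sketch below uses only the standard library, mirrors the curl example above, and assumes the streaming endpoint returns newline-delimited chunks; adjust the parsing to the actual response format if it differs.

```python
import json
import urllib.request

API_URL = "https://api.genai.hyperstack.cloud/tailor/v1/generate/stream"

def build_payload(prompt, adapter_name="your-model-name"):
    """Build the request body; values mirror the curl example above."""
    return {
        "adapter_name": adapter_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
        "temperature": 0.5,
        "top_p": 0.5,
        "top_k": 40,
        "presence_penalty": 0,
        "repetition_penalty": 0.5,
    }

def generate_stream(prompt, api_key):
    """POST the payload and yield raw response lines as they arrive."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:  # assumed: one chunk per line
            line = raw_line.decode("utf-8").strip()
            if line:
                yield line

# Usage:
# for chunk in generate_stream("YOUR TEXT HERE", "YOUR_API_KEY"):
#     print(chunk)
```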
Generation Parameters
The API accepts several parameters to control the text generation:
max_tokens: Maximum number of tokens to generate (default: 100)
temperature: Controls randomness in the output (0.0 to 2.0, default: 1)
top_p: Nucleus sampling parameter (0.0 to 1.0, default: 1)
top_k: Restricts sampling to the k most likely tokens (set to 40 in the example above)
presence_penalty: Penalizes new tokens based on their presence in the text (-2.0 to 2.0, default: 0)
repetition_penalty: Penalizes token repetition (-2.0 to 2.0, default: 0)
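To build intuition for how temperature and top_p shape the output, the toy sketch below applies both to a made-up next-token distribution. The numbers and functions are illustrative only, not the platform's implementation.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax with temperature: lower values sharpen the
    distribution (more deterministic), higher values flatten it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Nucleus sampling: keep the smallest set of tokens whose
    cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]            # toy next-token scores
sharp = apply_temperature(logits, 0.5)    # low temperature: peaked
flat = apply_temperature(logits, 2.0)     # high temperature: flatter
nucleus = top_p_filter(sharp, 0.9)        # only the most likely tokens survive
```

With temperature 0.5 the top token dominates, while at 2.0 the probabilities spread out; top_p then discards the unlikely tail before sampling, which is why low top_p values make output more focused.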