# Running Predictions with HuggingFace
Once you have a trained model — or want to use any public HuggingFace model — you can run inference directly from XGENIA.
## What you will learn in this guide
- How to run predictions through the AI chat
- How to use the Auto ML Predictor pro node
- Serverless vs. dedicated endpoint inference
## Inference Modes
| Mode | Description | Best For |
|---|---|---|
| Serverless | Uses HuggingFace's free inference API | Testing, low-volume predictions |
| Dedicated Endpoint | Uses a paid, always-on endpoint | Production workloads, guaranteed uptime |
## Predictions via AI Chat
Ask the AI to run a prediction with any HuggingFace model:
```
Classify this text using bert-base-uncased: "I love this product, it works great!"
```
Or use a custom model you trained:

```
Predict churn for this customer using my-username/churn-model:
{"tenure": 12, "monthly_charges": 65.5, "contract": "month-to-month"}
```
The AI calls `hf_predict` under the hood, which sends the request to HuggingFace's Inference API.
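For reference, a call like this can be sketched outside XGENIA against HuggingFace's public serverless Inference API. The endpoint URL and payload shape follow HuggingFace's documented API; the `build_request` helper and the `hf_...` token placeholder are illustrative, not XGENIA internals:

```python
# Minimal sketch of a serverless Inference API request, using only the
# standard library. This illustrates the shape of the call; it is not
# XGENIA's actual hf_predict implementation.
import json
import urllib.request

HF_API_BASE = "https://api-inference.huggingface.co/models"

def build_request(model_repo: str, inputs, token: str) -> urllib.request.Request:
    """Build a POST request for HuggingFace's serverless Inference API."""
    data = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{HF_API_BASE}/{model_repo}",
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "bert-base-uncased",
    'I love this product, it works great!',
    token="hf_...",  # placeholder; use a real HuggingFace token
)
# urllib.request.urlopen(req) would send the request (needs a valid token).
```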
### Response Format

```json
{
  "success": true,
  "prediction": [{"label": "POSITIVE", "score": 0.998}],
  "model": "bert-base-uncased",
  "mode": "serverless"
}
```
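A small helper can pull the top label and its confidence out of a response shaped like the one above. The `top_prediction` function is an illustrative sketch, not part of XGENIA's API:

```python
# Extract the highest-scoring (label, score) pair from an hf_predict-style
# response dict; returns (None, None) if the call failed or had no predictions.
def top_prediction(response: dict):
    if not response.get("success"):
        return None, None
    preds = response.get("prediction") or []
    best = max(preds, key=lambda p: p.get("score", 0.0), default=None)
    if best is None:
        return None, None
    return best.get("label"), best.get("score")

sample = {
    "success": True,
    "prediction": [{"label": "POSITIVE", "score": 0.998}],
    "model": "bert-base-uncased",
    "mode": "serverless",
}
print(top_prediction(sample))  # ('POSITIVE', 0.998)
```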
## Using the Auto ML Predictor Pro Node
For visual workflows, use the Auto ML Predictor node.
### Inputs

| Port | Type | Description |
|---|---|---|
| inputData | Any | Data to predict on (text, object, or array) |
| modelRepo | String | HuggingFace model repository (e.g. `username/my-model`) |
| inferenceMode | Enum | `serverless` or `endpoint` |
| endpointUrl | String | Dedicated endpoint URL (only for `endpoint` mode) |
| parameters | Object | Model parameters (e.g. `{temperature: 0.7}`) |
| hfToken | String | HuggingFace API token |
| mlServerUrl | String | ML Coordinator URL |
| Predict | Signal | Trigger prediction |
### Outputs

| Port | Type | Description |
|---|---|---|
| prediction | Any | The model's prediction result |
| confidence | Number | Confidence score (0–1) |
| rawResponse | Object | Full response from HuggingFace |
| error | String | Error message if prediction fails |
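The `inferenceMode` / `endpointUrl` pairing can be sketched as a small routing function: serverless mode derives the URL from the model repo, while endpoint mode requires an explicit `endpointUrl`. The function name and error handling here are assumptions for illustration, not XGENIA's actual node code:

```python
# Hypothetical sketch of how a predictor node could choose its request
# target from the inferenceMode and endpointUrl inputs.
def resolve_target(model_repo: str, inference_mode: str, endpoint_url: str = "") -> str:
    if inference_mode == "endpoint":
        if not endpoint_url:
            raise ValueError("endpoint mode requires endpointUrl")
        return endpoint_url  # paid, always-on endpoint
    if inference_mode == "serverless":
        # Serverless requests go to the shared Inference API, keyed by repo.
        return f"https://api-inference.huggingface.co/models/{model_repo}"
    raise ValueError(f"unknown inferenceMode: {inference_mode}")
```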
## Tips
- Cold start: Serverless models may take 10–30 seconds on first call while the model loads
- Rate limits: Free tier has rate limits — use dedicated endpoints for production
- Model compatibility: Ensure your input format matches what the model expects
- Parameters: Use `max_length`, `temperature`, `top_p` for text generation models
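For text generation models, these parameters ride alongside the inputs in a `parameters` object, matching the payload shape HuggingFace's Inference API documents. The prompt string below is just an example:

```python
# Example request payload for a text-generation model.
payload = {
    "inputs": "Once upon a time",
    "parameters": {
        "max_length": 50,    # cap on generated length
        "temperature": 0.7,  # higher = more random sampling
        "top_p": 0.9,        # nucleus sampling threshold
    },
}
```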
## End-to-End Example
A typical ML pipeline on the canvas:
- Query Records → fetches customer data from your database
- Auto ML Analyzer → analyzes the data, suggests target and features
- Auto ML Trainer → trains a model on HuggingFace
- Auto ML Predictor → runs predictions with the trained model
- UI components → display predictions to users
Wire the Trainer's hfModelRepo output directly to the Predictor's modelRepo input for seamless end-to-end workflows.
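Conceptually, that wiring is just passing one node's output into the next node's input. The stage functions below are stand-ins for the canvas nodes (not real XGENIA or HuggingFace APIs), showing how `hfModelRepo` flows from Trainer to Predictor:

```python
# Toy sketch of the canvas pipeline as plain function composition.
def query_records():
    # Stand-in for Query Records: would fetch from your database.
    return [{"tenure": 12, "monthly_charges": 65.5, "contract": "month-to-month"}]

def train_model(records):
    # Stand-in for Auto ML Trainer: would push a model to HuggingFace
    # and emit its repo id on the hfModelRepo output port.
    return {"hfModelRepo": "my-username/churn-model"}

def predict(records, model_repo):
    # Stand-in for Auto ML Predictor: would call the Inference API.
    return [{"record": r, "model": model_repo} for r in records]

records = query_records()
trainer_out = train_model(records)
# Trainer's hfModelRepo output wired to the Predictor's modelRepo input:
predictions = predict(records, trainer_out["hfModelRepo"])
```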