# Running Predictions with HuggingFace
Once you have a trained model — or want to use any public HuggingFace model — you can run inference directly from XGENIA.
## What you will learn in this guide
- How to run predictions through the AI chat
- How to use the Auto ML Predictor pro node
- Serverless vs. dedicated endpoint inference
## Inference Modes
| Mode | Description | Best For |
|---|---|---|
| Serverless | Uses HuggingFace's free inference API | Testing, low-volume predictions |
| Dedicated Endpoint | Uses a paid, always-on endpoint | Production workloads, guaranteed uptime |
## Predictions via AI Chat
Ask the AI to run a prediction with any HuggingFace model:
```
Classify this text using bert-base-uncased: "I love this product, it works great!"
```
Or use a custom model you trained:

```
Predict churn for this customer using my-username/churn-model:
{"tenure": 12, "monthly_charges": 65.5, "contract": "month-to-month"}
```
The AI calls `hf_predict` under the hood, which sends the request to HuggingFace's Inference API.
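For reference, a call like this can be sketched outside XGENIA against HuggingFace's public serverless Inference API. The endpoint URL and payload shape follow HuggingFace's documented API; the `build_request` helper and the `hf_...` token placeholder are illustrative, not XGENIA internals:

```python
# Minimal sketch of a serverless Inference API request, using only the
# standard library. This illustrates the shape of the call; it is not
# XGENIA's actual hf_predict implementation.
import json
import urllib.request

HF_API_BASE = "https://api-inference.huggingface.co/models"

def build_request(model_repo: str, inputs, token: str) -> urllib.request.Request:
    """Build a POST request for HuggingFace's serverless Inference API."""
    data = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{HF_API_BASE}/{model_repo}",
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "bert-base-uncased",
    'I love this product, it works great!',
    token="hf_...",  # placeholder; use a real HuggingFace token
)
# urllib.request.urlopen(req) would send the request (needs a valid token).
```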
### Response Format

```json
{
  "success": true,
  "prediction": [{"label": "POSITIVE", "score": 0.998}],
  "model": "bert-base-uncased",
  "mode": "serverless"
}
```
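A small helper can pull the top label and its confidence out of a response shaped like the one above. The `top_prediction` function is an illustrative sketch, not part of XGENIA's API:

```python
# Extract the highest-scoring (label, score) pair from an hf_predict-style
# response dict; returns (None, None) if the call failed or had no predictions.
def top_prediction(response: dict):
    if not response.get("success"):
        return None, None
    preds = response.get("prediction") or []
    best = max(preds, key=lambda p: p.get("score", 0.0), default=None)
    if best is None:
        return None, None
    return best.get("label"), best.get("score")

sample = {
    "success": True,
    "prediction": [{"label": "POSITIVE", "score": 0.998}],
    "model": "bert-base-uncased",
    "mode": "serverless",
}
print(top_prediction(sample))  # ('POSITIVE', 0.998)
```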
## Using the Auto ML Predictor Pro Node
For visual workflows, use the Auto ML Predictor node.
### Inputs

| Port | Type | Description |
|---|---|---|
| inputData | Any | Data to predict on (text, object, or array) |
| modelRepo | String | HuggingFace model repository (e.g. `username/my-model`) |
| inferenceMode | Enum | `serverless` or `endpoint` |
| endpointUrl | String | Dedicated endpoint URL (only for `endpoint` mode) |
| parameters | Object | Model parameters (e.g. `{temperature: 0.7}`) |
| hfToken | String | HuggingFace API token |
| mlServerUrl | String | ML Coordinator URL |
| Predict | Signal | Trigger prediction |
### Outputs

| Port | Type | Description |
|---|---|---|
| prediction | Any | The model's prediction result |
| confidence | Number | Confidence score (0–1) |
| rawResponse | Object | Full response from HuggingFace |
| error | String | Error message if prediction fails |
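The `inferenceMode` / `endpointUrl` pairing can be sketched as a small routing function: serverless mode derives the URL from the model repo, while endpoint mode requires an explicit `endpointUrl`. The function name and error handling here are assumptions for illustration, not XGENIA's actual node code:

```python
# Hypothetical sketch of how a predictor node could choose its request
# target from the inferenceMode and endpointUrl inputs.
def resolve_target(model_repo: str, inference_mode: str, endpoint_url: str = "") -> str:
    if inference_mode == "endpoint":
        if not endpoint_url:
            raise ValueError("endpoint mode requires endpointUrl")
        return endpoint_url  # paid, always-on endpoint
    if inference_mode == "serverless":
        # Serverless requests go to the shared Inference API, keyed by repo.
        return f"https://api-inference.huggingface.co/models/{model_repo}"
    raise ValueError(f"unknown inferenceMode: {inference_mode}")
```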
## Tips
- Cold start: Serverless models may take 10–30 seconds on first call while the model loads
- Rate limits: Free tier has rate limits — use dedicated endpoints for production
- Model compatibility: Ensure your input format matches what the model expects
- Parameters: Use `max_length`, `temperature`, `top_p` for text generation models
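For text generation models, these parameters ride alongside the inputs in a `parameters` object, matching the payload shape HuggingFace's Inference API documents. The prompt string below is just an example:

```python
# Example request payload for a text-generation model.
payload = {
    "inputs": "Once upon a time",
    "parameters": {
        "max_length": 50,    # cap on generated length
        "temperature": 0.7,  # higher = more random sampling
        "top_p": 0.9,        # nucleus sampling threshold
    },
}
```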
## End-to-End Example
A typical ML pipeline on the canvas:
- Query Records → fetches customer data from your database
- Auto ML Analyzer → analyzes the data, suggests target and features
- Auto ML Trainer → trains a model on HuggingFace
- Auto ML Predictor → runs predictions with the trained model
- UI components → display predictions to users
Wire the Trainer's hfModelRepo output directly to the Predictor's modelRepo input for seamless end-to-end workflows.
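Conceptually, that wiring is just passing one node's output into the next node's input. The stage functions below are stand-ins for the canvas nodes (not real XGENIA or HuggingFace APIs), showing how `hfModelRepo` flows from Trainer to Predictor:

```python
# Toy sketch of the canvas pipeline as plain function composition.
def query_records():
    # Stand-in for Query Records: would fetch from your database.
    return [{"tenure": 12, "monthly_charges": 65.5, "contract": "month-to-month"}]

def train_model(records):
    # Stand-in for Auto ML Trainer: would push a model to HuggingFace
    # and emit its repo id on the hfModelRepo output port.
    return {"hfModelRepo": "my-username/churn-model"}

def predict(records, model_repo):
    # Stand-in for Auto ML Predictor: would call the Inference API.
    return [{"record": r, "model": model_repo} for r in records]

records = query_records()
trainer_out = train_model(records)
# Trainer's hfModelRepo output wired to the Predictor's modelRepo input:
predictions = predict(records, trainer_out["hfModelRepo"])
```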