
Running Predictions with HuggingFace

Once you have a trained model — or want to use any public HuggingFace model — you can run inference directly from XGENIA.

What you will learn in this guide

  • How to run predictions through the AI chat
  • How to use the Auto ML Predictor pro node
  • Serverless vs. dedicated endpoint inference

Inference Modes

| Mode | Description | Best For |
| --- | --- | --- |
| Serverless | Uses HuggingFace's free Inference API | Testing, low-volume predictions |
| Dedicated Endpoint | Uses a paid, always-on endpoint | Production workloads, guaranteed uptime |

Predictions via AI Chat

Ask the AI to run a prediction with any HuggingFace model:

Classify this text using bert-base-uncased: "I love this product, it works great!"

Or use a custom model you trained:

Predict churn for this customer using my-username/churn-model:
{"tenure": 12, "monthly_charges": 65.5, "contract": "month-to-month"}

The AI calls hf_predict under the hood, which sends the request to HuggingFace's Inference API.
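To make the flow concrete, here is a minimal sketch of what an hf_predict-style call could look like. It assumes the standard HuggingFace serverless Inference API URL scheme; hf_predict itself is internal to XGENIA, so the function name and signature here are illustrative, not the product API.

```python
import json
import urllib.request

# Standard base URL for HuggingFace's serverless Inference API.
HF_API_BASE = "https://api-inference.huggingface.co/models"

def build_request(model_repo: str, inputs, token: str):
    """Build the URL, headers, and JSON body for a serverless inference call."""
    url = f"{HF_API_BASE}/{model_repo}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return url, headers, body

def hf_predict(model_repo: str, inputs, token: str):
    """Send the request and return the parsed JSON prediction."""
    url, headers, body = build_request(model_repo, inputs, token)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For the chat example above, this would amount to `hf_predict("bert-base-uncased", "I love this product, it works great!", token)`.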

Response Format

{
  "success": true,
  "prediction": [{"label": "POSITIVE", "score": 0.998}],
  "model": "bert-base-uncased",
  "mode": "serverless"
}
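A small helper like the following can unpack a classification response of this shape. The field names mirror the example above; they are not a guaranteed schema for every model type.

```python
def top_prediction(response: dict):
    """Return the highest-scoring (label, score) pair from a classification response."""
    if not response.get("success"):
        raise RuntimeError("prediction failed")
    # "prediction" is a list of {"label": ..., "score": ...} candidates.
    best = max(response["prediction"], key=lambda p: p["score"])
    return best["label"], best["score"]
```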

Using the Auto ML Predictor Pro Node

For visual workflows on the canvas, use the Auto ML Predictor pro node.

Inputs

| Port | Type | Description |
| --- | --- | --- |
| inputData | Any | Data to predict on (text, object, or array) |
| modelRepo | String | HuggingFace model repository (e.g. username/my-model) |
| inferenceMode | Enum | serverless or endpoint |
| endpointUrl | String | Dedicated endpoint URL (only for endpoint mode) |
| parameters | Object | Model parameters (e.g. {temperature: 0.7}) |
| hfToken | String | HuggingFace API token |
| mlServerUrl | String | ML Coordinator URL |
| Predict | Signal | Trigger prediction |
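The inferenceMode and endpointUrl inputs together determine where the request is sent. A sketch of that resolution logic, with names mirroring the ports above (the logic itself is illustrative, not the node's actual implementation):

```python
def resolve_url(model_repo: str, inference_mode: str, endpoint_url: str = None) -> str:
    """Pick the request target based on the node's inferenceMode input."""
    if inference_mode == "endpoint":
        # Dedicated endpoints are addressed directly by their URL.
        if not endpoint_url:
            raise ValueError("endpoint mode requires endpointUrl")
        return endpoint_url
    if inference_mode == "serverless":
        # Serverless requests go to the shared Inference API, keyed by repo.
        return f"https://api-inference.huggingface.co/models/{model_repo}"
    raise ValueError(f"unknown inferenceMode: {inference_mode}")
```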

Outputs

| Port | Type | Description |
| --- | --- | --- |
| prediction | Any | The model's prediction result |
| confidence | Number | Confidence score (0–1) |
| rawResponse | Object | Full response from HuggingFace |
| error | String | Error message if prediction fails |

Tips

  • Cold start: Serverless models may take 10–30 seconds on first call while the model loads
  • Rate limits: Free tier has rate limits — use dedicated endpoints for production
  • Model compatibility: Ensure your input format matches what the model expects
  • Parameters: Use max_length, temperature, top_p for text generation models
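For text generation models, those parameters ride along with the input in the request body. A sketch of such a payload, following the HuggingFace Inference API's inputs/parameters convention (the default values here are arbitrary examples):

```python
def generation_payload(prompt: str, max_length: int = 100,
                       temperature: float = 0.7, top_p: float = 0.9) -> dict:
    """Build a text-generation request body with sampling parameters."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_length": max_length,
            "temperature": temperature,
            "top_p": top_p,
        },
    }
```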

End-to-End Example

A typical ML pipeline on the canvas:

  1. Query Records → fetches customer data from your database
  2. Auto ML Analyzer → analyzes the data, suggests target and features
  3. Auto ML Trainer → trains a model on HuggingFace
  4. Auto ML Predictor → runs predictions with the trained model
  5. UI components → display predictions to users

Wire the Trainer's hfModelRepo output directly to the Predictor's modelRepo input for seamless end-to-end workflows.
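The data flow through those five nodes can be sketched as a plain function pipeline. Each function below stands in for one canvas node and is entirely hypothetical; the point is how the Trainer's model repo output feeds the Predictor's input.

```python
def run_pipeline(query_records, analyze, train, predict, display):
    """Illustrative data flow for the canvas pipeline above."""
    rows = query_records()                       # Query Records
    target, features = analyze(rows)             # Auto ML Analyzer
    model_repo = train(rows, target, features)   # Auto ML Trainer -> hfModelRepo
    predictions = predict(model_repo, rows)      # Auto ML Predictor <- modelRepo
    display(predictions)                         # UI components
    return predictions
```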