Analyzing Data for Machine Learning

Before you train a model, XGENIA can analyze your dataset to detect the problem type, suggest a likely target column, flag data quality issues, and recommend preprocessing steps.

What you will learn in this guide

  • How to use the AI chat to analyze data
  • What insights the analysis provides
  • How to use the Auto ML Analyzer pro node

Using the AI Chat

The simplest way to analyze data is through the AI assistant. State your prediction goal and include or describe the data, for example:

Analyze this customer data for churn prediction:
[paste your data or describe it]

Under the hood, the AI calls the hf_analyze_data tool, which sends your data to the ML Coordinator for analysis.

Analysis Output

The tool returns structured insights:

| Field | Description |
|---|---|
| problemType | Classification, regression, or clustering |
| suggestedTarget | The column most likely to be the prediction target |
| suggestedFeatures | Columns that are good predictors |
| hfTaskType | The corresponding HuggingFace task (e.g. tabular-classification) |
| qualityIssues | Missing values, outliers, class imbalance warnings |
| preprocessingSteps | Recommended data preparation steps |
| columns | Per-column statistics (type, uniqueness, missing rate) |
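
To make the columns field more concrete, here is a minimal Python sketch of how per-column statistics such as type, uniqueness, and missing rate could be computed from an array of records. It illustrates the idea only and is not XGENIA's implementation; the function name column_stats, the result keys (type, uniqueness, missingRate), and the sample data are all hypothetical.

```python
from typing import Any


def column_stats(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Compute an illustrative type, uniqueness, and missing rate per column.

    Assumes every value is a scalar (hashable); None and "" count as missing.
    """
    if not records:
        return {}
    # Collect every column name that appears in any record, preserving order.
    columns = list(dict.fromkeys(key for row in records for key in row))
    total = len(records)
    stats: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None and v != ""]
        # Treat a column as numeric only if every present value is a number.
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        stats[col] = {
            "type": "numeric" if is_numeric else "categorical",
            # Share of distinct values among the non-missing ones.
            "uniqueness": len(set(present)) / len(present) if present else 0.0,
            # Share of records where the value is missing.
            "missingRate": (total - len(present)) / total,
        }
    return stats


# Hypothetical sample data, for illustration only.
sample = [
    {"plan": "basic", "monthly_spend": 20.0, "churned": "yes"},
    {"plan": "pro", "monthly_spend": None, "churned": "no"},
    {"plan": "basic", "monthly_spend": 35.5, "churned": "no"},
    {"plan": "pro", "monthly_spend": 35.5, "churned": "no"},
]
stats = column_stats(sample)
print(stats["monthly_spend"])
```

A high missing rate is the kind of signal that would surface under qualityIssues, and a uniqueness close to 1.0 on a non-numeric column often indicates an identifier rather than a useful feature.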

Using the Auto ML Analyzer Pro Node

For visual workflows, drag the Auto ML Analyzer node onto your canvas.

Inputs

| Port | Type | Description |
|---|---|---|
| data | Array/Object | Your dataset (array of records) |
| mlServerUrl | String | ML server URL (default: http://localhost:3001) |
| context | String | What you want to predict (e.g. "customer churn") |
| hfToken | String | HuggingFace API token |
| Analyze | Signal | Triggers the analysis |
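
If you prepare these values in code before handing them to the node, a small check can catch malformed input early. The following Python sketch is an assumption about how you might do that, not part of XGENIA: the helper build_analyzer_inputs and the token value are hypothetical, while the dictionary keys and the default URL are taken from the inputs table.

```python
from typing import Any

# Default taken from the inputs table above.
DEFAULT_ML_SERVER_URL = "http://localhost:3001"


def build_analyzer_inputs(
    data: list[dict[str, Any]],
    context: str,
    hf_token: str,
    ml_server_url: str = DEFAULT_ML_SERVER_URL,
) -> dict[str, Any]:
    """Validate the values and return them keyed by the node's input ports.

    Hypothetical helper: XGENIA does not necessarily enforce these checks.
    """
    if not isinstance(data, list) or not data:
        raise ValueError("data must be a non-empty array of records")
    if not all(isinstance(row, dict) for row in data):
        raise ValueError("every record in data must be an object")
    if not context.strip():
        raise ValueError("context should describe what you want to predict")
    return {
        "data": data,
        "mlServerUrl": ml_server_url,
        "context": context.strip(),
        "hfToken": hf_token,
    }


# "hf_example_token" is a placeholder, not a real token.
inputs = build_analyzer_inputs(
    data=[{"plan": "basic", "churned": "no"}],
    context="customer churn",
    hf_token="hf_example_token",
)
print(inputs["mlServerUrl"])
```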

Outputs

| Port | Type | Description |
|---|---|---|
| insights | Object | Full analysis results |
| problemType | String | Detected problem type |
| targetColumn | String | Suggested target column |
| suggestedFeatures | Array | Recommended feature columns |
| dataQualityIssues | Array | Data quality warnings |
| preprocessingSteps | Array | Recommended preprocessing |
| hfTaskType | String | HuggingFace task type |
| error | String | Error message if analysis fails |

Example Workflow

  1. Connect a Query Records node or REST node to provide data
  2. Wire the data output to the Analyzer's data input
  3. Set the context to describe your prediction goal
  4. Trigger the Analyze signal
  5. Use the outputs to configure training (target column, features, task type)
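
Step 5 can be sketched in Python as a function that turns the analyzer outputs into a training configuration. This is a minimal sketch under stated assumptions: the outputs are represented as a plain dictionary keyed by the output port names, while the function training_config_from_outputs, the shape of the returned config, and the example values are hypothetical and not an XGENIA API.

```python
from typing import Any


def training_config_from_outputs(outputs: dict[str, Any]) -> dict[str, Any]:
    """Map analyzer outputs to a hypothetical training configuration."""
    # Check the error output first; the other outputs may be empty on failure.
    if outputs.get("error"):
        raise RuntimeError(f"Analysis failed: {outputs['error']}")
    target = outputs["targetColumn"]
    # Never train on the target itself, even if it was listed as a feature.
    features = [col for col in outputs["suggestedFeatures"] if col != target]
    return {
        "target": target,
        "features": features,
        "taskType": outputs["hfTaskType"],
        "preprocessing": list(outputs.get("preprocessingSteps", [])),
    }


# Hypothetical output values, for illustration only.
example_outputs = {
    "problemType": "classification",
    "targetColumn": "churned",
    "suggestedFeatures": ["plan", "monthly_spend", "churned"],
    "hfTaskType": "tabular-classification",
    "preprocessingSteps": ["impute missing monthly_spend"],
    "error": "",
}
config = training_config_from_outputs(example_outputs)
print(config)
```

Checking error before reading anything else keeps a failed analysis from silently producing a training run with an empty target or feature list.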

Retention Analysis

For customer retention specifically, use the Client Retention Analyzer node. It provides specialized outputs:

  • Churn rate — percentage of churned customers
  • Risk factors — columns correlated with churn, ranked by significance
  • Recommendations — actionable suggestions to reduce churn

Wire it to a Retention Action Engine node to automatically generate action plans based on the risk factors.
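
To show what the churn rate and risk factor outputs represent, the Python sketch below computes a churn percentage and ranks numeric columns by the absolute value of their Pearson correlation with churn. Pearson correlation is used here as a simple stand-in for "significance"; how the Client Retention Analyzer actually scores risk factors is not documented in this guide. The function names, column names, and sample data are hypothetical.

```python
from math import sqrt
from typing import Any


def churn_rate(records: list[dict[str, Any]], churn_col: str = "churned") -> float:
    """Percentage of records whose churn flag is truthy."""
    return 100.0 * sum(1 for row in records if row[churn_col]) / len(records)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_risk_factors(
    records: list[dict[str, Any]],
    feature_cols: list[str],
    churn_col: str = "churned",
) -> list[tuple[str, float]]:
    """Rank numeric columns by absolute correlation with churn, strongest first."""
    churn = [1.0 if row[churn_col] else 0.0 for row in records]
    scored = [
        (col, pearson([float(row[col]) for row in records], churn))
        for col in feature_cols
    ]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical sample data, for illustration only.
customers = [
    {"support_tickets": 5, "tenure_months": 2, "churned": True},
    {"support_tickets": 4, "tenure_months": 3, "churned": True},
    {"support_tickets": 1, "tenure_months": 24, "churned": False},
    {"support_tickets": 0, "tenure_months": 36, "churned": False},
    {"support_tickets": 2, "tenure_months": 12, "churned": False},
]
rate = churn_rate(customers)
ranking = rank_risk_factors(customers, ["tenure_months", "support_tickets"])
print(rate)
print(ranking)
```

A positive correlation means higher values of the column go with more churn, and a negative one means the opposite. Correlation alone does not establish cause, so treat a ranking like this as a starting point for investigation.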