Text Classification for Survey Data Using SetFit
What is SetFit?
SetFit (Sentence Transformer Fine-Tuning) is a framework for few-shot text classification: you need only 8-16 labeled examples per category to train a high-quality classifier. It also works well with multilingual data (Bahasa Indonesia and English).
Step 1: Installation
Using uv (recommended)
pyproject.toml:
[project]
dependencies = [
"datasets>=4.8.4",
"setfit>=1.1.3",
"torch>=2.11.0",
"transformers<5",
"pandas",
"openpyxl",
"scikit-learn",
"matplotlib",
]
Then run:
uv sync
Using pip
pip install "setfit>=1.1.3" "transformers<5" "datasets>=4.8.4" "torch>=2.11.0" pandas openpyxl scikit-learn matplotlib
transformers<5 is required to avoid the default_logdir import error.
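To confirm the pin took effect, a small helper can check the installed version string's major version. The helper below is a convenience sketch, not part of any library; feed it the version reported by pip show transformers:

```python
def transformers_pin_ok(version: str) -> bool:
    """Return True if a version string satisfies the transformers<5 pin."""
    major = int(version.split(".")[0])
    return major < 5

# Example: check the version string reported by `pip show transformers`
print(transformers_pin_ok("4.44.2"))  # True: the <5 pin is satisfied
print(transformers_pin_ok("5.0.0"))   # False: would hit the default_logdir error
```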
Step 2: Download the Base Model Locally (Optional but Recommended)
Download the model once, then load it from the local path so it is not re-downloaded on every run:
from sentence_transformers import SentenceTransformer
# Download and save to local folder (run once)
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
model.save("./models/paraphrase-multilingual-MiniLM-L12-v2")
Then use the local path in SetFit:
from setfit import SetFitModel
# Load from local path instead of HuggingFace Hub
model_col1 = SetFitModel.from_pretrained(
"./models/paraphrase-multilingual-MiniLM-L12-v2"
)
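If the local folder might not exist yet (for example, on a fresh machine), a small fallback keeps the script runnable either way. The resolve_base_model helper is an assumption for illustration, not part of the SetFit API:

```python
import os

HUB_ID = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
LOCAL_DIR = "./models/paraphrase-multilingual-MiniLM-L12-v2"

def resolve_base_model(local_dir: str = LOCAL_DIR, hub_id: str = HUB_ID) -> str:
    """Prefer the local copy if it exists; otherwise fall back to the Hub id."""
    return local_dir if os.path.isdir(local_dir) else hub_id

# SetFitModel.from_pretrained(resolve_base_model()) then works on any machine
print(resolve_base_model("/definitely/missing/path"))  # falls back to the Hub id
```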
Step 3: Prepare Your Labeled Data
You need to manually label a small sample from your data.
Column 1: Suggestion Classification
Label ~8-16 examples per category:
from datasets import Dataset
# Example labeled data for Column 1
train_data_col1 = Dataset.from_dict({
"text": [
# === suggestion (label: 1) ===
"Tambahkan fitur timer agar peserta tahu sisa waktu",
"Sebaiknya ada practice test sebelum ujian dimulai",
"Improve the UI, it's hard to navigate",
"Akan lebih baik jika soal bisa di-review sebelum submit",
"Tolong perbaiki loading time, terlalu lama",
"Please add a progress bar",
"Mungkin bisa ditambahkan instruksi yang lebih jelas",
"Saran saya, buat tampilan lebih user friendly",
# === not-suggestion (label: 0) ===
"Tidak ada",
"-",
"Sudah bagus",
"No suggestion",
"N/A",
"Oke semua",
"Nothing to add",
"Sudah cukup baik",
],
"label": [1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0],
})
# Label mapping
col1_labels = {0: "not-suggestion", 1: "suggestion"}
Column 2: Feedback Classification
Label examples for each of the 4 categories (4 per category minimum; 8-16 per category is better):
train_data_col2 = Dataset.from_dict({
"text": [
# === good no input (label: 0) ===
"Bagus",
"Good experience",
"Sudah oke",
"No feedback, everything is fine",
# === good with input (label: 1) ===
"Bagus, tapi waktu pengerjaan bisa ditambah sedikit",
"Good overall, the instructions were clear and helpful",
"Pengalaman baik, UI nya intuitif dan mudah dipahami",
"Great assessment, especially the coding section was well designed",
# === bad no input (label: 2) ===
"Jelek",
"Bad experience",
"Kurang bagus",
"Not good",
# === bad with input (label: 3) ===
"Pengalaman buruk karena loading sangat lambat dan sering error",
"Bad experience, the timer was too short for the number of questions",
"Kurang bagus, soalnya terlalu banyak dan tidak relevan",
"Poor experience because the system crashed twice during my test",
],
"label": [0, 0, 0, 0,
1, 1, 1, 1,
2, 2, 2, 2,
3, 3, 3, 3],
})
col2_labels = {
0: "good no input",
1: "good with input",
2: "bad no input",
3: "bad with input",
}
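Before training, it is worth sanity-checking that texts and labels line up and that every label appears in the mapping. The checker below is a convenience sketch (not part of the datasets API) that also returns per-class counts, so imbalances are easy to spot:

```python
def check_training_data(data: dict, label_names: dict) -> dict:
    """Validate a {'text': [...], 'label': [...]} dict; return counts per class."""
    texts, labels = data["text"], data["label"]
    assert len(texts) == len(labels), "text/label lengths differ"
    counts = {}
    for lbl in labels:
        assert lbl in label_names, f"label {lbl} missing from mapping"
        counts[label_names[lbl]] = counts.get(label_names[lbl], 0) + 1
    return counts

data = {"text": ["Bagus", "Jelek", "Sudah oke"], "label": [0, 2, 0]}
names = {0: "good no input", 1: "good with input", 2: "bad no input", 3: "bad with input"}
print(check_training_data(data, names))  # {'good no input': 2, 'bad no input': 1}
```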
Use real examples from YOUR data for better accuracy.
Pick clear, unambiguous examples for training.
Step 4: Prepare Evaluation Data
Create a separate labeled dataset (do NOT reuse training data):
# Evaluation data for Column 1
eval_data_col1 = Dataset.from_dict({
"text": [
# Examples NOT used in training
"Mohon tambahkan dark mode", # suggestion
"Bisa ditambah fitur bookmark", # suggestion
"Sudah baik", # not-suggestion
"Tidak ada saran", # not-suggestion
"Please make the font bigger", # suggestion
"-", # not-suggestion
"Sebaiknya waktu ditambah", # suggestion
"Everything is fine", # not-suggestion
],
"label": [1, 1, 0, 0, 1, 0, 1, 0],
})
# Evaluation data for Column 2
eval_data_col2 = Dataset.from_dict({
"text": [
"Mantap", # good no input
"Oke lah", # good no input
"Bagus, soalnya relevan dengan posisi", # good with input
"Good, clear instructions and fair time limit", # good with input
"Buruk", # bad no input
"Disappointing", # bad no input
"Jelek, soalnya tidak relevan dan waktu kurang", # bad with input
"Bad, the system lagged and I lost my answers", # bad with input
],
"label": [0, 0, 1, 1, 2, 2, 3, 3],
})
Step 5: Train the Models
Train Column 1 Model (Suggestion Classifier)
from setfit import SetFitModel, Trainer, TrainingArguments
# Load from local path (see Step 2) or from HuggingFace Hub
BASE_MODEL = "./models/paraphrase-multilingual-MiniLM-L12-v2"
# Or use HuggingFace Hub directly:
# BASE_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
model_col1 = SetFitModel.from_pretrained(BASE_MODEL)
args = TrainingArguments(
batch_size=8,
num_epochs=3,
num_iterations=20,
eval_strategy="epoch", # evaluate after each epoch
)
trainer_col1 = Trainer(
model=model_col1,
args=args,
train_dataset=train_data_col1,
eval_dataset=eval_data_col1,
metric="accuracy",
)
trainer_col1.train()
# Save the trained model
model_col1.save_pretrained("./models/model_suggestion")
Train Column 2 Model (Feedback Classifier)
model_col2 = SetFitModel.from_pretrained(BASE_MODEL)
trainer_col2 = Trainer(
model=model_col2,
args=args,
train_dataset=train_data_col2,
eval_dataset=eval_data_col2,
metric="accuracy",
)
trainer_col2.train()
# Save the trained model
model_col2.save_pretrained("./models/model_feedback")
Step 6: Evaluate the Models
Quick Accuracy Check
# Evaluate Column 1
metrics_col1 = trainer_col1.evaluate(eval_data_col1)
print(f"Column 1 Accuracy: {metrics_col1['accuracy']:.2%}")
# Evaluate Column 2
metrics_col2 = trainer_col2.evaluate(eval_data_col2)
print(f"Column 2 Accuracy: {metrics_col2['accuracy']:.2%}")
Detailed Metrics (Precision, Recall, F1)
from sklearn.metrics import classification_report
# Column 1
preds_col1 = model_col1.predict(eval_data_col1["text"])
print("=== Column 1: Suggestion Classification ===")
print(classification_report(
eval_data_col1["label"],
preds_col1,
target_names=["not-suggestion", "suggestion"]
))
# Column 2
preds_col2 = model_col2.predict(eval_data_col2["text"])
print("=== Column 2: Feedback Classification ===")
print(classification_report(
eval_data_col2["label"],
preds_col2,
target_names=["good no input", "good with input", "bad no input", "bad with input"]
))
Example output (illustrative numbers, consistent with 7 of 8 correct):
=== Column 1: Suggestion Classification ===
                precision    recall  f1-score   support

not-suggestion       0.80      1.00      0.89         4
    suggestion       1.00      0.75      0.86         4

      accuracy                           0.88         8
     macro avg       0.90      0.88      0.87         8
  weighted avg       0.90      0.88      0.87         8
Confusion Matrix
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Column 1 Confusion Matrix
cm1 = confusion_matrix(eval_data_col1["label"], preds_col1)
disp1 = ConfusionMatrixDisplay(cm1, display_labels=["not-suggestion", "suggestion"])
disp1.plot()
plt.title("Column 1 - Suggestion Classification")
plt.tight_layout()
plt.show()
# Column 2 Confusion Matrix
cm2 = confusion_matrix(eval_data_col2["label"], preds_col2)
disp2 = ConfusionMatrixDisplay(
cm2,
display_labels=["good no input", "good with input", "bad no input", "bad with input"]
)
disp2.plot()
plt.title("Column 2 - Feedback Classification")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
What to Look At
| Metric | What it Tells You |
|---|---|
| Accuracy | Overall correctness |
| Precision | Of all predicted as X, how many were actually X |
| Recall | Of all actual X, how many were predicted as X |
| F1-score | Balance of precision and recall (most important for imbalanced data) |
| Confusion Matrix | Exactly where the model makes mistakes |
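To make the table concrete, the per-class numbers can be computed by hand from true and predicted labels. This is a minimal sketch of what classification_report does internally, using raw true-positive/false-positive/false-negative counts:

```python
def per_class_metrics(y_true, y_pred, cls):
    """Precision, recall, and F1 for one class, computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # one actual suggestion was missed
print(per_class_metrics(y_true, y_pred, 1))  # (1.0, 0.75, ~0.857)
```

High precision with lower recall, as here, means the model rarely cries "suggestion" falsely but misses some real ones; that asymmetry is invisible in accuracy alone.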
Step 7: Predict on Your Full Dataset
import pandas as pd
# Load your data
df = pd.read_excel("your_data.xlsx") # <-- change filename
# Column names
col1 = "What would you suggest to improve the Online Assessment experience? (It is okay to answer in Bahasa)"
col2 = "Please share any additional input or feedback regarding your experience with this Online Assessment. (It is okay to answer in Bahasa)"
# Handle missing/empty values
df[col1] = df[col1].fillna("").astype(str).str.strip()
df[col2] = df[col2].fillna("").astype(str).str.strip()
# Load saved models
model_col1 = SetFitModel.from_pretrained("./models/model_suggestion")
model_col2 = SetFitModel.from_pretrained("./models/model_feedback")
# Predict Column 1
texts_col1 = df[col1].replace("", "Tidak ada").tolist()
predictions_col1 = model_col1.predict(texts_col1)
df["suggestion_flag"] = [col1_labels[int(p)] for p in predictions_col1]
# Predict Column 2
texts_col2 = df[col2].replace("", "Tidak ada").tolist()
predictions_col2 = model_col2.predict(texts_col2)
df["feedback_flag"] = [col2_labels[int(p)] for p in predictions_col2]
# Save results
df.to_excel("results_classified.xlsx", index=False)
print("\nDone! Results saved to results_classified.xlsx")
print("\n=== Summary ===")
print("\nSuggestion flags:")
print(df["suggestion_flag"].value_counts())
print("\nFeedback flags:")
print(df["feedback_flag"].value_counts())
Step 8: Improve Accuracy
If accuracy is low, here's how to improve:
1. Add more training examples
Focus on patterns the model gets wrong (check the confusion matrix):
# Add more examples to training data
additional_data = Dataset.from_dict({
"text": [
# Add examples the model misclassified
"Mungkin bisa diperbaiki tampilannya", # suggestion
"Cukup", # not-suggestion
],
"label": [1, 0],
})
# Combine with original training data
from datasets import concatenate_datasets
train_data_col1 = concatenate_datasets([train_data_col1, additional_data])
2. Increase training iterations
args = TrainingArguments(
batch_size=8,
num_epochs=3,
num_iterations=40, # increased from 20
eval_strategy="epoch",
)
3. Try a larger base model
# Larger model = better accuracy, slower speed
model = SetFitModel.from_pretrained(
"sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)
4. Review random samples from predictions
# Sample 50 random rows to manually review
sample = df.sample(50, random_state=42)
sample[
[col1, "suggestion_flag", col2, "feedback_flag"]
].to_excel("review_sample.xlsx", index=False)
Full Pipeline (Copy-Paste Ready)
"""
Complete SetFit Classification Pipeline
For survey feedback analysis (Bahasa + English)
Requirements (pyproject.toml):
datasets>=4.8.4
setfit>=1.1.3
torch>=2.11.0
transformers<5
pandas
openpyxl
scikit-learn
matplotlib
Install with: uv sync
"""
import pandas as pd
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# ============================================================
# 1. CONFIG
# ============================================================
# Use local model path (see Step 2) or HuggingFace Hub
BASE_MODEL = "./models/paraphrase-multilingual-MiniLM-L12-v2"
# BASE_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
# ============================================================
# 2. PREPARE TRAINING DATA
# Replace these examples with REAL samples from your data!
# ============================================================
train_col1 = Dataset.from_dict({
"text": [
# suggestion (1)
"Tambahkan fitur timer agar peserta tahu sisa waktu",
"Sebaiknya ada practice test sebelum ujian dimulai",
"Improve the UI, it's hard to navigate",
"Akan lebih baik jika soal bisa di-review sebelum submit",
"Tolong perbaiki loading time, terlalu lama",
"Please add a progress bar",
"Mungkin bisa ditambahkan instruksi yang lebih jelas",
"Saran saya, buat tampilan lebih user friendly",
# not-suggestion (0)
"Tidak ada", "-", "Sudah bagus", "No suggestion",
"N/A", "Oke semua", "Nothing to add", "Sudah cukup baik",
],
"label": [1,1,1,1,1,1,1,1, 0,0,0,0,0,0,0,0],
})
train_col2 = Dataset.from_dict({
"text": [
# good no input (0)
"Bagus", "Good experience", "Sudah oke", "No feedback",
# good with input (1)
"Bagus, tapi waktu pengerjaan bisa ditambah",
"Good overall, instructions were clear and helpful",
"Pengalaman baik, UI intuitif dan mudah dipahami",
"Great assessment, coding section was well designed",
# bad no input (2)
"Jelek", "Bad experience", "Kurang bagus", "Not good",
# bad with input (3)
"Buruk karena loading lambat dan sering error",
"Bad, timer was too short for the questions",
"Kurang bagus, soalnya terlalu banyak",
"Poor, system crashed twice during my test",
],
"label": [0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3],
})
col1_labels = {0: "not-suggestion", 1: "suggestion"}
col2_labels = {
0: "good no input",
1: "good with input",
2: "bad no input",
3: "bad with input",
}
# ============================================================
# 3. PREPARE EVALUATION DATA
# ============================================================
eval_col1 = Dataset.from_dict({
"text": [
"Mohon tambahkan dark mode",
"Bisa ditambah fitur bookmark",
"Sudah baik",
"Tidak ada saran",
"Please make the font bigger",
"-",
"Sebaiknya waktu ditambah",
"Everything is fine",
],
"label": [1, 1, 0, 0, 1, 0, 1, 0],
})
eval_col2 = Dataset.from_dict({
"text": [
"Mantap",
"Oke lah",
"Bagus, soalnya relevan dengan posisi",
"Good, clear instructions and fair time limit",
"Buruk",
"Disappointing",
"Jelek, soalnya tidak relevan dan waktu kurang",
"Bad, the system lagged and I lost my answers",
],
"label": [0, 0, 1, 1, 2, 2, 3, 3],
})
# ============================================================
# 4. TRAIN MODELS
# ============================================================
args = TrainingArguments(
batch_size=8,
num_epochs=3,
num_iterations=20,
eval_strategy="epoch",
)
print("Training Column 1 model (suggestion)...")
model_col1 = SetFitModel.from_pretrained(BASE_MODEL)
trainer_col1 = Trainer(
model=model_col1,
args=args,
train_dataset=train_col1,
eval_dataset=eval_col1,
metric="accuracy",
)
trainer_col1.train()
model_col1.save_pretrained("./models/model_suggestion")
print("\nTraining Column 2 model (feedback)...")
model_col2 = SetFitModel.from_pretrained(BASE_MODEL)
trainer_col2 = Trainer(
model=model_col2,
args=args,
train_dataset=train_col2,
eval_dataset=eval_col2,
metric="accuracy",
)
trainer_col2.train()
model_col2.save_pretrained("./models/model_feedback")
# ============================================================
# 5. EVALUATE MODELS
# ============================================================
# Accuracy
metrics_col1 = trainer_col1.evaluate(eval_col1)
metrics_col2 = trainer_col2.evaluate(eval_col2)
print(f"\nColumn 1 Accuracy: {metrics_col1['accuracy']:.2%}")
print(f"Column 2 Accuracy: {metrics_col2['accuracy']:.2%}")
# Detailed Report
preds_col1 = model_col1.predict(eval_col1["text"])
preds_col2 = model_col2.predict(eval_col2["text"])
print("\n=== Column 1: Suggestion Classification ===")
print(classification_report(
eval_col1["label"], preds_col1,
target_names=["not-suggestion", "suggestion"]
))
print("\n=== Column 2: Feedback Classification ===")
print(classification_report(
eval_col2["label"], preds_col2,
target_names=["good no input", "good with input", "bad no input", "bad with input"]
))
# Confusion Matrices
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
cm1 = confusion_matrix(eval_col1["label"], preds_col1)
ConfusionMatrixDisplay(cm1, display_labels=["not-suggestion", "suggestion"]).plot(ax=axes[0])
axes[0].set_title("Column 1 - Suggestion")
cm2 = confusion_matrix(eval_col2["label"], preds_col2)
ConfusionMatrixDisplay(
cm2,
display_labels=["good\nno input", "good\nwith input", "bad\nno input", "bad\nwith input"]
).plot(ax=axes[1])
axes[1].set_title("Column 2 - Feedback")
plt.tight_layout()
plt.savefig("confusion_matrices.png", dpi=150)
plt.show()
# ============================================================
# 6. PREDICT ON FULL DATASET
# ============================================================
# Load your data
df = pd.read_excel("your_data.xlsx") # <-- change filename
col1 = "suggestion"  # <-- change to your actual column names
col2 = "feedback"
df[col1] = df[col1].fillna("").astype(str).str.strip()
df[col2] = df[col2].fillna("").astype(str).str.strip()
# Predict
preds1 = model_col1.predict(df[col1].replace("", "Tidak ada").tolist())
df["suggestion_flag"] = [col1_labels[int(p)] for p in preds1]
preds2 = model_col2.predict(df[col2].replace("", "Tidak ada").tolist())
df["feedback_flag"] = [col2_labels[int(p)] for p in preds2]
# Save results
df.to_excel("results_classified.xlsx", index=False)
print("\nDone! Results saved to results_classified.xlsx")
print("\n=== Summary ===")
print("\nSuggestion flags:")
print(df["suggestion_flag"].value_counts())
print("\nFeedback flags:")
print(df["feedback_flag"].value_counts())
Get Confidence Scores
Use predict_proba instead of predict to get confidence scores:
# predict gives labels only
predictions = model_col1.predict(["Tambahkan fitur timer"])
print(predictions) # [1]
# predict_proba gives confidence scores per class
probabilities = model_col1.predict_proba(["Tambahkan fitur timer"])
print(probabilities)
# [[0.12, 0.88]]
# meaning: 12% not-suggestion, 88% suggestion
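One practical use of these scores is routing low-confidence rows to manual review instead of trusting the predicted label. The helper and the 0.7 threshold below are assumptions to illustrate the pattern:

```python
def label_with_confidence(proba_row, label_names, threshold=0.7):
    """Map one predict_proba row to (label, confidence), flagging uncertain rows."""
    best = max(range(len(proba_row)), key=lambda i: proba_row[i])
    confidence = proba_row[best]
    label = label_names[best] if confidence >= threshold else "needs-review"
    return label, confidence

names = {0: "not-suggestion", 1: "suggestion"}
print(label_with_confidence([0.12, 0.88], names))  # ('suggestion', 0.88)
print(label_with_confidence([0.45, 0.55], names))  # ('needs-review', 0.55)
```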
Key Tips
- Use REAL examples from your data — the tutorial examples are just placeholders
- More examples = better accuracy — 8 per class is minimum, 16-20 is better
- Pick clear examples — avoid ambiguous cases for training data
- Empty/blank responses — always handle with .fillna("") before predicting
- Training time — usually 2-5 minutes on CPU for small training sets
- Prediction speed — ~100-500 rows/second depending on hardware
- Local models — download once, load from local path to avoid re-downloading
- transformers<5 — pin this to avoid the default_logdir import error
- Evaluate before deploying — always check accuracy on held-out data first
- Confusion matrix — shows exactly where the model fails, so you know what training examples to add
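The empty-response tip above can be pushed a bit further: placeholder answers like "-" or "N/A" behave like blanks, so normalizing them before prediction keeps the model's input consistent. The placeholder list here is an assumption; extend it with whatever actually appears in your data:

```python
PLACEHOLDERS = {"", "-", "n/a", "na", "."}

def normalize_response(text) -> str:
    """Strip whitespace and map blank/placeholder answers to a canonical token."""
    cleaned = str(text).strip() if text is not None else ""
    return "Tidak ada" if cleaned.lower() in PLACEHOLDERS else cleaned

print(normalize_response("  -  "))        # 'Tidak ada'
print(normalize_response(None))           # 'Tidak ada'
print(normalize_response("Sudah bagus"))  # 'Sudah bagus'
```

Applied column-wise (e.g. df[col1].map(normalize_response)), this replaces the fillna/replace chain in Step 7 with one place to maintain the placeholder list.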