Threshold Optimization and Calibration Guide

Introduction

Detection thresholds represent the critical balance point between catching violations and avoiding false positives: set them too low and legitimate content gets flagged, set them too high and obvious violations slip through. The three configurable thresholds in Telegram Bot App (Image Detection, Sentiment Analysis, and Spam Detection) control how confident the AI must be before triggering enforcement, making threshold calibration one of the most important administrative skills for effective community moderation.

Understanding threshold optimization requires grasping the fundamental relationship between sensitivity and specificity. Lower thresholds (0.60-0.70) create high sensitivity—the system catches more violations including borderline cases, but also generates more false positives. Higher thresholds (0.80-0.90) create high specificity—the system only flags content it's very confident violates rules, minimizing false positives but potentially missing subtle violations. The optimal threshold depends on your community's specific needs, tolerance for false positives, and the severity of undetected violations.

This guide gives you a method for calibrating thresholds from your community's own data rather than guesswork. It covers how to interpret confidence scores, analyze violation patterns, recognize calibration signals, and adjust settings systematically so that detection performance fits your community's context.

Understanding How Thresholds Work

The Confidence Score System

Every detection system (NSFW analysis, sentiment analysis, spam detection) produces a confidence score between 0.0 and 1.0 (displayed as 0-100% in the interface) indicating how certain the AI is that content violates rules. A confidence score of 0.85 means the system is 85% confident the content is inappropriate—based on patterns in its training data and statistical analysis of the specific content.

Thresholds act as gates that determine which confidence scores trigger enforcement. If your NSFW threshold is set to 0.70 (70%) and an image receives a confidence score of 0.75, enforcement triggers (0.75 > 0.70). If the same image receives 0.65, it passes through without action (0.65 < 0.70). The threshold defines the minimum confidence required for the system to act.

This threshold mechanism allows administrators to control the enforcement point without changing the underlying detection models. The AI still analyzes all content and produces confidence scores—thresholds simply determine where the enforcement boundary lies on the confidence spectrum.
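The gate described above can be sketched in a few lines. This is an illustration only, not the bot's actual code: the function name `should_enforce` is hypothetical, and the handling of a score exactly equal to the threshold is an assumption, since the guide only shows examples above and below it.

```python
def should_enforce(score: float, threshold: float) -> bool:
    """Return True when a confidence score is high enough to trigger enforcement."""
    if not 0.0 <= score <= 1.0 or not 0.0 <= threshold <= 1.0:
        raise ValueError("score and threshold must be between 0.0 and 1.0")
    # Assumption: a score exactly at the threshold also triggers enforcement,
    # reading the threshold as the "minimum confidence required".
    return score >= threshold


print(should_enforce(0.75, 0.70))  # True: 0.75 is above the 0.70 threshold
print(should_enforce(0.65, 0.70))  # False: 0.65 is below the 0.70 threshold
```

Note that the detection model is untouched here: only the comparison value changes when an administrator moves the threshold.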

The Three Adjustable Thresholds

Image Detection Threshold (0.0-1.0):

Controls NSFW content detection in images, GIFs, stickers, and profile pictures
Affects detection of pornographic content, sexual content, racy content, and spoofed content
Default: 0.70 (70%)
Uses quota: Yes (Premium feature)

Sentiment Detection Threshold (0.0-1.0):

Controls toxicity, profanity, insult, and threat detection in text messages
Evaluates language across four distinct dimensions
Default: 0.70 (70%)
Uses quota: Yes (Premium feature)

Spam Detection Threshold (0.0-1.0):

Controls machine learning-based spam pattern detection
Analyzes message structure, language patterns, and link characteristics
Default: 0.75 (75%)
Uses quota: No (Free feature)

Each threshold operates independently—you can set image detection to 0.80, sentiment to 0.65, and spam to 0.75 if that configuration matches your community's needs.
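As a rough sketch of how three independent thresholds might be represented, the snippet below uses a small data class with the defaults listed above. The class and field names (`ThresholdConfig`, `image`, `sentiment`, `spam`) are hypothetical and are not taken from the bot's settings format.

```python
from dataclasses import dataclass


@dataclass
class ThresholdConfig:
    # Defaults as listed in this guide; field names are illustrative only.
    image: float = 0.70
    sentiment: float = 0.70
    spam: float = 0.75

    def __post_init__(self) -> None:
        # Each threshold is validated on its own; none depends on the others.
        for name in ("image", "sentiment", "spam"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} threshold must be between 0.0 and 1.0, got {value}")


# The example configuration from the text: stricter images, more sensitive sentiment.
custom = ThresholdConfig(image=0.80, sentiment=0.65, spam=0.75)
print(custom)
```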

Confidence Score Interpretation Ranges

Understanding what different confidence ranges typically represent helps interpret threshold settings:

0.95-1.0 (Very High Confidence):

Blatant, unmistakable violations
Example: Hardcore pornography, severe hate speech, obvious spam
False positive rate: <1%

0.85-0.94 (High Confidence):

Clear violations with strong indicators
Example: Sexually explicit content, toxic language with slurs, promotional spam
False positive rate: 1-3%

0.70-0.84 (Moderate-High Confidence):

Likely violations with substantial evidence
Example: Suggestive content, insulting language, affiliate links
False positive rate: 3-8%

0.50-0.69 (Moderate Confidence):

Borderline content with mixed signals
Example: Artistic nudity, strong language without slurs, promotional but relevant content
False positive rate: 8-20%

0.00-0.49 (Low Confidence):

Content with some flags but weak evidence
Example: Fashion photography, emphatic language, legitimate marketing
False positive rate: 20-50%

These ranges guide threshold selection—setting thresholds in the 0.70-0.80 range captures moderate-high confidence violations while avoiding the high false positive rates of lower thresholds.
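When reviewing individual violations, it can help to label each score with the range it falls into. The lookup below is a minimal sketch using the range boundaries from this section; the names `BANDS` and `confidence_band` are hypothetical.

```python
# Lower bound of each range from this guide, highest first.
BANDS = [
    (0.95, "Very High"),
    (0.85, "High"),
    (0.70, "Moderate-High"),
    (0.50, "Moderate"),
    (0.00, "Low"),
]


def confidence_band(score: float) -> str:
    """Return the name of the confidence range that contains the score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: every valid score is >= 0.0")


print(confidence_band(0.72))  # Moderate-High
```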

Calibration Methodology

Step 1: Establish Baseline

Before adjusting any thresholds, document your current configuration and performance:

Record Current Settings:

- Image threshold: ___
- Sentiment threshold: ___
- Spam threshold: ___

Capture Baseline Statistics (from Group Statistics dashboard):

- Total messages (last 7 days): ___
- Total violations (last 7 days): ___
- Punishment rate per 1K messages: ___
- Top 3 violation types and counts: ___

Note Subjective Assessment:

- Are obvious violations being missed? (Yes/No)
- Are legitimate messages being flagged? (Yes/No)
- General satisfaction with current moderation: (Low/Medium/High)

This baseline provides the reference point for evaluating whether changes improve or worsen performance.
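If you keep the baseline in a script or spreadsheet rather than on paper, the punishment rate per 1K messages is simply violations divided by messages, multiplied by 1,000. The sketch below assumes that definition; the `Baseline` class and the sample numbers are hypothetical and not real community data.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    image_threshold: float
    sentiment_threshold: float
    spam_threshold: float
    total_messages: int    # last 7 days
    total_violations: int  # last 7 days

    @property
    def punishment_rate_per_1k(self) -> float:
        """Violations per 1,000 messages over the baseline window."""
        if self.total_messages == 0:
            return 0.0
        return self.total_violations * 1000 / self.total_messages


# Made-up sample week: 42 violations in 8,400 messages.
baseline = Baseline(0.70, 0.70, 0.75, total_messages=8400, total_violations=42)
print(baseline.punishment_rate_per_1k)  # 5.0
```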

Step 2: Identify Calibration Signals

Examine your statistics and member feedback to identify which thresholds need adjustment:

Signals Threshold Too Low (too sensitive):

Members complaining about legitimate content being removed
High punishment rate (>10 per 1K messages)
Many violations with confidence scores just above the threshold (clustering within about 0.05 of it)
User Intelligence reports showing trusted users (spam rating <0.30) with violations

Signals Threshold Too High (not sensitive enough):

Obvious violations visible in chat before removal
Members reporting spam/inappropriate content that wasn't caught
Very low violation rate (<1 per 1K messages) despite known problem content
No violations detected in specific category despite community complaints

Signals Threshold Well-Calibrated:

Violations caught quickly with minimal member complaints
Moderate punishment rate (2-8 per 1K messages)
Confidence scores distributed across range (not clustering at threshold)
Few administrator overrides needed

Use these signals to determine which thresholds need adjustment and in which direction.
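The numeric signals above can be checked mechanically. The following sketch covers only the two measurable ones, punishment rate and clustering just above the threshold. The function names, the signal labels, and the 50% cutoff for "many violations" are assumptions made for illustration; member complaints and administrator overrides still need human judgment.

```python
def near_threshold_share(scores, threshold, margin=0.05):
    """Fraction of violation scores that sit within `margin` above the threshold."""
    if not scores:
        return 0.0
    near = sum(1 for s in scores if threshold <= s <= threshold + margin)
    return near / len(scores)


def calibration_signals(rate_per_1k, scores, threshold, cluster_cutoff=0.5):
    """List the numeric calibration signals observed for one threshold."""
    signals = []
    if rate_per_1k > 10:
        signals.append("too_low:rate")        # more than 10 punishments per 1K messages
    if near_threshold_share(scores, threshold) >= cluster_cutoff:
        signals.append("too_low:clustering")  # assumed cutoff: half of all violations
    if rate_per_1k < 1:
        signals.append("too_high:rate")       # fewer than 1 punishment per 1K messages
    if 2 <= rate_per_1k <= 8:
        signals.append("ok:rate")             # inside the 2-8 target range
    return signals


print(calibration_signals(12.0, [0.71, 0.72, 0.90], threshold=0.70))
```

An empty result means the numbers alone are inconclusive (for example, a rate between 8 and 10), so fall back on the qualitative signals.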

Step 3: Make Single Targeted Adjustment

Adjust only ONE threshold at a time by 0.05-0.10 (5-10 percentage points):

If threshold too low (reduce sensitivity):

Increase threshold by 0.05-0.10
Example: 0.70 → 0.75 or 0.80

If threshold too high (increase sensitivity):

Decrease threshold by 0.05-0.10
Example: 0.75 → 0.70 or 0.65

Avoid changing multiple thresholds simultaneously—this makes it impossible to determine which change caused which effects. Make one adjustment, monitor results, then make the next adjustment if needed.
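The one-change-at-a-time rule can be enforced in a helper that touches a single named threshold and rejects steps outside the 0.05-0.10 range. This is a sketch under those assumptions; `adjust_threshold` and the dictionary keys are hypothetical names.

```python
def adjust_threshold(thresholds, name, direction, step=0.05):
    """Return a copy of `thresholds` with exactly one value raised or lowered."""
    if name not in thresholds:
        raise KeyError(f"unknown threshold: {name}")
    if direction not in ("raise", "lower"):
        raise ValueError("direction must be 'raise' or 'lower'")
    if not 0.05 <= step <= 0.10:
        raise ValueError("step should stay between 0.05 and 0.10")
    delta = step if direction == "raise" else -step
    updated = dict(thresholds)  # leave the original settings untouched
    # Clamp to the valid range and round away floating-point noise.
    updated[name] = round(min(1.0, max(0.0, thresholds[name] + delta)), 2)
    return updated


current = {"image": 0.70, "sentiment": 0.70, "spam": 0.75}
print(adjust_threshold(current, "image", "raise"))
```

Returning a copy keeps the previous settings available, which makes reverting straightforward if the monitoring period shows the change made things worse.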

Step 4: Monitor Impact Period (3-7 Days)

After making an adjustment, monitor performance for 3-7 days, treating 3 days as the minimum:

Check Statistics Daily:

- Violation count trends
- Punishment rate changes
- Violation type distribution shifts

Review Individual Violations:

- Examine confidence scores in User Intelligence reports
- Verify flagged content was actually violating
- Check for increased false positives or missed violations

Collect Member Feedback:

- Ask trusted members if they notice moderation changes
- Watch for complaints about over-enforcement or under-enforcement

Avoid judging results too quickly: random variance can make 1-2 days unrepresentative. A full week gives the most reliable picture of the adjustment's true impact.
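To see why a single day can mislead, compare daily rates with the rate over the whole monitoring window. The sketch below assumes you record (messages, violations) per day by hand; the function names and the sample counts are hypothetical.

```python
def daily_rates(daily_counts):
    """Punishment rate per 1K messages for each day, from (messages, violations) pairs."""
    return [v * 1000 / m if m else 0.0 for m, v in daily_counts]


def period_rate(daily_counts, min_days=3):
    """Rate per 1K messages over the whole window; refuses windows shorter than min_days."""
    if len(daily_counts) < min_days:
        raise ValueError(f"need at least {min_days} days of data, got {len(daily_counts)}")
    messages = sum(m for m, _ in daily_counts)
    violations = sum(v for _, v in daily_counts)
    return violations * 1000 / messages if messages else 0.0


# Made-up counts for three days after an adjustment.
days = [(1000, 4), (1200, 9), (800, 2)]
print(daily_rates(days))  # [4.0, 7.5, 2.5]
print(period_rate(days))  # 5.0
```

In this made-up sample the daily values range from 2.5 to 7.5 while the window as a whole sits at 5.0, which is the kind of spread that makes one or two days unreliable.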

Step 5: Evaluate and Iterate

After the monitoring period, evaluate whether the adjustment improved performance:

Improvement Indicators:

Violation rate moved toward target range (2-8 per 1K messages)
Confidence score distribution looks healthier (less clustering)
Member feedback positive or neutral
Balance between false positives and false negatives improved

Worsening Indicators:

Violation rate moved away from target range
New categories of problems emerged
Member complaints increased
Balance between errors worsened

If performance improved, keep the change and consider whether a further adjustment in the same direction would help. If performance worsened, revert the change and try adjusting in the opposite direction or adjusting a different threshold.
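The keep-or-revert decision can be approximated from two of the indicators above: distance from the 2-8 per 1K target range and the number of member complaints. This is a simplified sketch, not a complete evaluation. The function names and the rule that any increase in complaints means "revert" are assumptions, and it ignores confidence score distribution and new problem categories.

```python
TARGET_LOW, TARGET_HIGH = 2.0, 8.0  # target punishment rate per 1K messages


def distance_to_target(rate):
    """How far a rate lies outside the target range (0.0 when inside it)."""
    if rate < TARGET_LOW:
        return TARGET_LOW - rate
    if rate > TARGET_HIGH:
        return rate - TARGET_HIGH
    return 0.0


def evaluate_adjustment(rate_before, rate_after, complaints_before, complaints_after):
    """Return 'keep' or 'revert' based on rate movement and complaint counts."""
    moved_away = distance_to_target(rate_after) > distance_to_target(rate_before)
    complaints_up = complaints_after > complaints_before
    # Assumption: either worsening indicator alone is enough to revert.
    return "revert" if moved_away or complaints_up else "keep"


print(evaluate_adjustment(12.0, 7.0, complaints_before=5, complaints_after=3))  # keep
```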