Scoring Methodology¶
python-checkup produces a single 0-100 health score by combining six category scores with configurable weights.
Overview¶
When a category has no data source (e.g., mypy is not installed), its weight is redistributed proportionally to the remaining categories. This ensures you always get a meaningful score, even with partial tool coverage.
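One way to read "combining category scores with weights" is a weighted average over the categories that actually produced a score. The sketch below assumes exactly that; the function name `overall_score` and the category keys are illustrative, not python-checkup's real API. Dividing by the sum of the available weights has the same effect as redistributing the missing weight proportionally.

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the categories that produced a score.

    Assumption: the overall score is a plain weighted mean. Categories
    absent from `scores` (no analyzer available) are simply left out.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("no scored categories with non-zero weight")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical keys; the weights are the documented defaults.
weights = {"quality": 25, "type_safety": 20, "security": 20,
           "complexity": 15, "dead_code": 10, "dependencies": 10}

# Only two categories have data: (90 * 25 + 50 * 15) / (25 + 15)
print(overall_score({"quality": 90, "complexity": 50}, weights))  # 75.0
```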
Category Formulas¶
Except where noted, formulas use density-based metrics (issues per 1,000 lines of code) so scores scale fairly from small scripts to large codebases.
Quality (default weight: 25)¶
Tool: Ruff
issues_per_kloc = (errors * 5 + warnings * 2) / (lines / 1000)
score = max(0, 100 - issues_per_kloc)
Errors are weighted 2.5 times as heavily as warnings (5 vs. 2) to reflect their higher impact.
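The Quality formula can be sketched as a small Python function. The function name and the handling of an empty codebase are assumptions made for illustration; only the arithmetic comes from the formula above.

```python
def quality_score(errors: int, warnings: int, lines: int) -> float:
    """Quality score from Ruff error and warning counts."""
    if lines <= 0:
        # Assumption: with no code to analyze there is nothing to penalize.
        return 100.0
    issues_per_kloc = (errors * 5 + warnings * 2) / (lines / 1000)
    return max(0.0, 100.0 - issues_per_kloc)


# 4 errors and 10 warnings in 2,000 lines:
# (4 * 5 + 10 * 2) / 2 = 20 weighted issues per 1,000 lines
print(quality_score(errors=4, warnings=10, lines=2000))  # 80.0
```

Because the penalty is a density, the same 4 errors and 10 warnings in a 200-line script would give 200 weighted issues per 1,000 lines and a score of 0.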
Type Safety (default weight: 20)¶
Tool: mypy
Type errors are weighted heavily because they indicate fundamental contract violations.
Security (default weight: 20)¶
Tools: Bandit, detect-secrets
Critical security findings (SQL injection, hardcoded passwords, shell=True) cap the score at 40, regardless of other results.
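The cap can be pictured as a final clamp applied after the base Security score is computed. This is a minimal sketch: the base formula is not shown here, so `base_score` is taken as an input, and the function and constant names are hypothetical.

```python
CRITICAL_CAP = 40.0  # cap value stated in the text above


def apply_critical_cap(base_score: float, critical_findings: int) -> float:
    """Limit the Security score when any critical finding is present."""
    if critical_findings > 0:
        # The cap only lowers a score; it never raises one that is already below 40.
        return min(base_score, CRITICAL_CAP)
    return base_score


print(apply_critical_cap(90.0, critical_findings=1))  # 40.0
print(apply_critical_cap(25.0, critical_findings=2))  # 25.0
print(apply_critical_cap(90.0, critical_findings=0))  # 90.0
```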
Complexity (default weight: 15)¶
Tool: Radon
This category uses Radon's Maintainability Index (MI), which is already on a 0-100 scale where higher is better, so the MI value is used directly as the category score.
Dead Code (default weight: 10)¶
Tool: Vulture
Dependencies (default weight: 10)¶
Tools: pip-audit, deptry
Known vulnerabilities are penalized heavily. Unused or missing dependencies receive a lighter penalty.
Labels¶
| Score | Label |
|---|---|
| 75-100 | Healthy |
| 50-74 | Needs work |
| 0-49 | Critical |
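The table above amounts to two cut-offs. A minimal sketch of the lookup, assuming the default thresholds and no rounding of the score; the function and parameter names are hypothetical:

```python
def label_for(score: float, healthy: float = 75, needs_work: float = 50) -> str:
    """Map a 0-100 score to its label using the default thresholds."""
    if score >= healthy:
        return "Healthy"
    if score >= needs_work:
        return "Needs work"
    return "Critical"


print(label_for(82))  # Healthy
print(label_for(74))  # Needs work
print(label_for(49))  # Critical
```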
Thresholds are configurable in pyproject.toml:
Weight Redistribution¶
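When a category has no analyzer available, its weight is distributed to the remaining categories in proportion to their original weights. In effect, each remaining weight is multiplied by the sum of all weights divided by the sum of the remaining weights.
For example, if mypy is not installed, the type_safety weight of 20 is redistributed. The remaining weights sum to 80, so each remaining category's effective weight is scaled by 100 / 80 = 1.25, a 25% increase (Quality goes from 25 to 31.25).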
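The redistribution rule can be sketched in a few lines of Python. The function name and every category key other than `type_safety` are assumptions made for illustration; the weights are the documented defaults.

```python
DEFAULT_WEIGHTS = {
    "quality": 25, "type_safety": 20, "security": 20,
    "complexity": 15, "dead_code": 10, "dependencies": 10,
}


def redistribute(weights: dict[str, float], available: set[str]) -> dict[str, float]:
    """Scale the weights of available categories so they keep the original total."""
    remaining = {name: w for name, w in weights.items() if name in available}
    total_remaining = sum(remaining.values())
    if total_remaining == 0:
        return {}  # nothing can be scored
    scale = sum(weights.values()) / total_remaining
    return {name: w * scale for name, w in remaining.items()}


# mypy is missing, so type_safety drops out and the rest scale by 100 / 80.
effective = redistribute(DEFAULT_WEIGHTS, set(DEFAULT_WEIGHTS) - {"type_safety"})
print(effective["quality"])   # 31.25
print(effective["security"])  # 25.0
```

The effective weights still sum to 100, so the overall score stays on the same 0-100 scale regardless of which tools are installed.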