Evidence Grading Methodology
How we evaluate and grade supplement research.
Every ingredient-condition claim on our sites receives a letter grade (A, B, C, D, or F) based on the totality of available peer-reviewed evidence. This page explains how we arrive at each grade, so that the methodology is transparent and reproducible.
Grade Definitions
A: Strong Evidence
Multiple high-quality randomized controlled trials (RCTs) or meta-analyses with consistent positive results. Combined sample size is at least 500 participants across 5 or more independent studies.
Example: Melatonin for sleep onset latency, supported by 20+ RCTs with consistent results.
B: Moderate Evidence
At least one well-designed RCT showing positive results, supported by additional studies. Results are mostly consistent across the evidence base, with adequate combined sample sizes.
Example: Magnesium for sleep quality, supported by several RCTs with mostly positive outcomes.
C: Limited Evidence
Preliminary positive findings from small RCTs or observational studies, or mixed results across studies. Evidence is promising but insufficient to draw firm conclusions.
Example: L-theanine for sleep, where small positive studies exist but larger trials are needed.
D: Preliminary Evidence
Only in vitro, animal, case report, or pilot study data are available, or human studies exist but report inconsistent or inconclusive results. More research is needed.
Example: Ashwagandha for hair growth, limited to preclinical and small pilot studies.
F: Negative Evidence
Fewer than 30% of studies show positive effects, with two or more studies available. The weight of evidence suggests the ingredient does not provide the claimed benefit, or may cause harm.
Example: An ingredient where multiple RCTs show no benefit over placebo.
Scoring Algorithm
The evidence grade is calculated from four independent scoring dimensions. The four dimension scores are summed into a cumulative score (between -3 and 7, given the ranges below), and that score maps to a letter grade.
| Dimension | Score Range | Description |
|---|---|---|
| Study Type Quality | 0–4 | Highest quality study type: Meta-Analysis (4), RCT (3), CCT/Cohort (2), Observational (1), In Vitro/Review (0) |
| Consistency | -1 to +1 | >70% positive results: +1, <30% positive: -1, otherwise 0 |
| Sample Size | -1 to +1 | Total participants: ≥500 (+1), ≥100 (0), <100 (-1) |
| Study Count | -1 to +1 | Number of studies: ≥5 (+1), ≥2 (0), <2 (-1) |
Final grade mapping: score ≥6 → A, ≥4 → B, ≥2 → C, ≥0 → D. The grade is forced to F, regardless of score, when fewer than 30% of studies are positive and at least 2 studies are available.
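The table and mapping above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our production code: the `Study` record, the study-type labels, and the function name `score_evidence` are hypothetical, and the handling of scores below 0 is an assumption because the mapping above does not define it.

```python
from dataclasses import dataclass

# Points for the highest-quality study type present (values from the table).
# The dictionary keys are hypothetical labels chosen for this sketch.
STUDY_TYPE_POINTS = {
    "meta_analysis": 4,
    "rct": 3,
    "cct": 2,
    "cohort": 2,
    "observational": 1,
    "in_vitro": 0,
    "review": 0,
}


@dataclass
class Study:
    study_type: str    # one of the STUDY_TYPE_POINTS keys
    participants: int  # sample size of this study
    positive: bool     # whether the study reported a positive result


def score_evidence(studies):
    """Return (score, grade) for a non-empty list of Study records."""
    if not studies:
        raise ValueError("at least one study is required")

    count = len(studies)
    positive_share = sum(s.positive for s in studies) / count
    total_n = sum(s.participants for s in studies)

    type_score = max(STUDY_TYPE_POINTS[s.study_type] for s in studies)
    consistency = 1 if positive_share > 0.70 else -1 if positive_share < 0.30 else 0
    sample = 1 if total_n >= 500 else 0 if total_n >= 100 else -1
    study_count = 1 if count >= 5 else 0 if count >= 2 else -1
    score = type_score + consistency + sample + study_count

    # Forced F: fewer than 30% positive with two or more studies.
    if positive_share < 0.30 and count >= 2:
        return score, "F"

    for threshold, grade in ((6, "A"), (4, "B"), (2, "C"), (0, "D")):
        if score >= threshold:
            return score, grade

    # ASSUMPTION: scores below 0 are not covered by the published mapping;
    # this sketch falls back to D (Preliminary) for them.
    return score, "D"
```

As a worked case, one meta-analysis plus five RCTs of 100 participants each, all positive, scores 4 + 1 + 1 + 1 = 7 and maps to A. Three RCTs of 50 participants each, none positive, score 3 - 1 + 0 + 0 = 2 but are forced to F.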
Research Process
Systematic Search
Identify relevant research from PubMed
For each ingredient-condition pair, we conduct systematic PubMed searches using MeSH terms and title/abstract keywords. We prioritize randomized controlled trials (RCTs), meta-analyses, and systematic reviews, while also including observational studies and pilot trials for emerging evidence.
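A search of this kind might be assembled as follows. This is a sketch only: the helper names `build_pubmed_query` and `build_esearch_url` are hypothetical, the exact query template we use is not reproduced here, and a free-text term is not guaranteed to be a valid MeSH heading. The `[MeSH Terms]` and `[Title/Abstract]` field tags and the ESearch endpoint are standard PubMed and NCBI E-utilities features. The code only builds strings and makes no network request.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(ingredient, condition):
    """Match each concept as a MeSH term or in the title/abstract."""
    def concept(term):
        return f'("{term}"[MeSH Terms] OR "{term}"[Title/Abstract])'

    return f"{concept(ingredient)} AND {concept(condition)}"


def build_esearch_url(query, retmax=100):
    """Return an ESearch URL for the query (retmax of 100 is an assumed default)."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    query = build_pubmed_query("melatonin", "sleep")
    print(query)
    print(build_esearch_url(query))
```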
Paper Screening
Filter for relevance and quality
Retrieved papers are screened for relevance to the specific ingredient-condition relationship. We filter by study type (prioritizing interventional over observational), population (human studies preferred), and publication quality (peer-reviewed journals only).
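The screening rules can be read as one hard filter plus two preferences. The sketch below assumes a hypothetical `Paper` record with pre-assigned flags (how relevance is judged is outside its scope): papers that are off-topic or not peer-reviewed are dropped, and the rest are ordered so that human and interventional studies come first.

```python
from dataclasses import dataclass

# Study-type labels treated as interventional in this sketch (hypothetical labels).
INTERVENTIONAL = {"rct", "cct"}


@dataclass
class Paper:
    title: str
    study_type: str
    human: bool          # conducted in human participants
    peer_reviewed: bool  # published in a peer-reviewed journal
    relevant: bool       # addresses this ingredient-condition pair


def screen(papers):
    """Drop irrelevant or non-peer-reviewed papers, then order by preference."""
    kept = [p for p in papers if p.relevant and p.peer_reviewed]
    # False sorts before True, so human studies come first, and within each
    # group interventional designs come before observational ones.
    return sorted(
        kept,
        key=lambda p: (not p.human, p.study_type not in INTERVENTIONAL),
    )
```

Sorting rather than discarding non-human and observational work keeps that evidence available for ingredient-condition pairs where nothing stronger exists.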
PICO Extraction
Extract structured study data
From each included study, we extract structured PICO data: Population (sample size, demographics), Intervention (substance, dosage, duration, form), Comparison (placebo or active comparator), and Outcome (primary endpoint, effect size, statistical significance). AI-assisted extraction is validated against source text.
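One way to hold the extracted fields is a small set of typed records, shown here as a sketch. The class and field names are hypothetical and mirror the PICO elements listed above. The `quote_found_in_source` check is only one simple form that validation against source text could take, and it is an assumption rather than a description of our actual validation step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Population:
    sample_size: int
    demographics: str


@dataclass
class Intervention:
    substance: str
    dosage: str
    duration: str
    form: str


@dataclass
class Outcome:
    primary_endpoint: str
    effect_size: Optional[float]  # None when the paper does not report one
    p_value: Optional[float]      # None when the paper does not report one


@dataclass
class PicoRecord:
    population: Population
    intervention: Intervention
    comparison: str  # e.g. "placebo" or the name of an active comparator
    outcome: Outcome


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not block a match."""
    return " ".join(text.lower().split())


def quote_found_in_source(quote: str, source_text: str) -> bool:
    """ASSUMED validation: the supporting quote must appear in the paper text."""
    return _normalize(quote) in _normalize(source_text)
```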
Evidence Grading
Calculate algorithmic grade (A-F)
Our grading algorithm scores each ingredient-condition pair based on four dimensions: (1) highest study type quality, (2) consistency of positive results across studies, (3) total combined sample size, and (4) number of independent studies. The final score maps to a letter grade from A (Strong) to F (Negative).
Publication
Review and publish evidence summaries
Generated evidence summaries undergo a review for compliance with FDA and FTC requirements. All language uses structure/function claims only. Evidence grades are recalculated automatically when new research is added to the database, so that grades reflect the most current body of evidence.
Data Sources
Limitations
Our methodology has known limitations that users should be aware of:
- We primarily search PubMed, which may not capture all relevant research (e.g., studies published in non-indexed journals).
- AI-assisted data extraction, while validated, may occasionally misinterpret complex study designs.
- Our grading algorithm weighs study count and sample size equally, which may not reflect the true importance of each factor for every context.
- Evidence grades reflect the current state of research and may change as new studies are published.
- Individual responses to supplements vary. A high evidence grade does not guarantee effectiveness for every individual.
FDA Disclaimer: These statements have not been evaluated by the Food and Drug Administration. The products and information on this website are not intended to diagnose, treat, cure, or prevent any disease. The evidence grades presented are based on our analysis of published peer-reviewed research and do not constitute medical advice. Always consult your healthcare provider before starting any supplement regimen.