Getting Started

Classify lets you automatically categorize documents into types you define (for example: invoice, receipt, contract) using natural-language rules.

Warning: Classification is currently in beta and is subject to breaking changes.

Use Cases

- Use as a pre-processing step:
  - Before extraction: Classify first, then run schema-specific extraction (e.g., invoice vs. contract) with a different LlamaExtract agent per type to improve accuracy and reduce cost.
  - Before parsing: Classify first, then run LlamaParse over the labeled files with parse settings tuned for each category to improve accuracy and reduce cost.
  - Before indexing: Classify first, then send the classified files to the appropriate LlamaCloud indices with tailored chunking, metadata, and access controls to improve retrieval quality.
- Intake routing for back-office documents: Auto-separate invoices, receipts, purchase orders, and bank statements, and route each to the right queue, storage bucket, or approval workflow.
- Dataset curation: Auto-tag large archives into meaningful categories to create labeled subsets for model training.
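The pre-processing and routing use cases above share one pattern: look up a downstream step by the predicted type, and fall back to a default when the type is unknown. The sketch below illustrates only that dispatch pattern. The handler functions, the `ROUTES` table, and the `route` helper are hypothetical names invented for this example; they are not part of any LlamaCloud SDK, and a real pipeline would call its own extraction, parsing, or indexing code in their place.

```python
from typing import Callable, Dict, Optional


# Hypothetical downstream steps. In a real pipeline these would invoke
# an extraction agent, a parser, or an index upload for the given file.
def extract_invoice(path: str) -> str:
    return f"invoice-extraction:{path}"


def extract_contract(path: str) -> str:
    return f"contract-extraction:{path}"


def send_to_review(path: str) -> str:
    return f"manual-review:{path}"


# Map each classification label to the step that should handle it.
ROUTES: Dict[str, Callable[[str], str]] = {
    "invoice": extract_invoice,
    "contract": extract_contract,
}


def route(path: str, predicted_type: Optional[str]) -> str:
    """Dispatch a file to the handler for its predicted type.

    Labels without a registered handler (including None) go to review.
    """
    handler = ROUTES.get(predicted_type, send_to_review)
    return handler(path)
```

Keeping the label-to-handler mapping in a single table means that adding a new document type only requires a new rule and a new table entry.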

- Rule: A content-based criterion for a document type. Each rule has:
  - type: the label to assign (e.g., "invoice").
  - description: a natural-language description of the content that should match this type.
- Parsing configuration (optional): Controls how documents are parsed before classification (e.g., language, page limits). Useful for speed/accuracy tradeoffs.
- Results: For each file, you get a type (the predicted label), a confidence score (0.0–1.0), and reasoning (a step-by-step explanation).
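As a rough illustration of how the rule and result fields described above fit together, here is a minimal sketch that models them as plain Python dataclasses and only accepts a prediction when its confidence clears a threshold. The class names `Rule` and `Result`, the `accepted_type` helper, and the 0.8 threshold are all assumptions made for this example. They mirror the field names listed above but are not the SDK's actual types, and a suitable threshold depends on your own documents.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a classification rule."""
    type: str          # label to assign, e.g. "invoice"
    description: str   # natural-language description of matching content


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for a per-file classification result."""
    type: Optional[str]  # predicted label, if any
    confidence: float    # score between 0.0 and 1.0
    reasoning: str       # explanation of the prediction


def accepted_type(result: Result, known_types: Set[str],
                  min_confidence: float = 0.8) -> Optional[str]:
    """Return the predicted type only if it is a known label and the
    confidence meets the (assumed) threshold; otherwise return None."""
    if result.type in known_types and result.confidence >= min_confidence:
        return result.type
    return None


rules = [
    Rule("invoice", "Contains an invoice number, line items, and an amount due."),
    Rule("receipt", "Shows a completed payment with a merchant name and total."),
]
known = {rule.type for rule in rules}

confident = Result("invoice", 0.93, "Mentions an invoice number and amount due.")
uncertain = Result("receipt", 0.41, "Only a total is visible.")

print(accepted_type(confident, known))
print(accepted_type(uncertain, known))
```

Files for which `accepted_type` returns `None` can be sent to manual review, with the `reasoning` text kept alongside them for whoever checks the result.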