Getting Started
Classify lets you automatically categorize documents into types you define (for example: invoice, receipt, contract) using natural-language rules.
Classification is currently a restricted feature. Please contact our team for early access.
Classification is currently in beta and is subject to breaking changes.
Use Cases
- Use as a pre-processing step
  - Before extraction: Classify first, then run schema-specific extraction (e.g., invoice vs. contract) with different LlamaExtract agents to improve accuracy and reduce cost.
  - Before parsing: Classify first, then run LlamaParse over labeled files with finely tuned parse settings for each classified category to improve accuracy and reduce cost.
  - Before indexing: Classify first, then send classified files into appropriate LlamaCloud indices with tailored chunking, metadata, and access controls to improve retrieval quality.
- Intake routing for back-office documents: Auto-separate invoices, receipts, purchase orders, and bank statements to the right queues, storage buckets, or approval workflows.
- Dataset curation: Auto-tag large archives into meaningful categories to create labeled subsets for model training.
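The pre-processing and routing use cases above come down to a dispatch on the predicted type. Below is a minimal sketch in plain Python that makes no LlamaCloud calls. The labels, the queue names, and the `route` helper are all hypothetical and only illustrate the pattern.

```python
# Hypothetical mapping from a predicted document type to a downstream queue.
# Neither the labels nor the queue names come from the LlamaCloud API.
ROUTES = {
    "invoice": "accounts-payable",
    "receipt": "expenses",
    "purchase_order": "procurement",
    "bank_statement": "treasury",
}


def route(predicted_type: str, default: str = "manual-review") -> str:
    """Return the queue for a predicted type, falling back to a default queue."""
    return ROUTES.get(predicted_type, default)


if __name__ == "__main__":
    for label in ("invoice", "contract"):
        print(label, "->", route(label))
```

Sending unknown labels to a default queue means documents outside the defined types still get handled rather than silently dropped.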
Concepts
- Rule: A content-based criterion for a document type. Each rule has:
  - type: the label to assign (e.g., "invoice").
  - description: a natural-language description of the content that should match this type.
- Parsing configuration (optional): Controls how we parse documents before classification (e.g., language, page limits). Useful for speed/accuracy tradeoffs.
- Results: For each file you get a type (predicted), confidence (0.0–1.0), and reasoning (step-by-step explanation).
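To make the shape of these concepts concrete, here is a small sketch that models a rule and a result as plain Python dataclasses. The field names follow the descriptions above. The class names, the `needs_review` helper, and the 0.8 threshold are assumptions for illustration, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Illustrative stand-in for a classification rule (not the SDK's class)."""
    type: str         # label to assign, e.g. "invoice"
    description: str  # natural-language description of matching content


@dataclass(frozen=True)
class Result:
    """Illustrative stand-in for a per-file classification result."""
    type: str          # predicted label
    confidence: float  # 0.0 to 1.0
    reasoning: str     # explanation of the prediction


def needs_review(result: Result, threshold: float = 0.8) -> bool:
    """Flag results below an (arbitrary, assumed) confidence threshold."""
    return result.confidence < threshold


if __name__ == "__main__":
    rule = Rule(type="invoice", description="Bills with line items and a total due")
    result = Result(type=rule.type, confidence=0.55, reasoning="Mentions an amount due")
    print(result.type, needs_review(result))
```

A sensible threshold depends on your documents and on how costly a misrouted file is, so treat the default here as a placeholder.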
Typical flow
- Upload your files to LlamaCloud
- Create rules for your target classes
- Create a classify job with the file ids and rules
- Fetch results and consume the predictions
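The flow above can be sketched end to end against a hypothetical client interface. The method names `upload_file`, `create_job`, and `get_results` are assumptions made for this sketch and may not match the real SDK; check the Python SDK guide for the actual calls. `FakeClient` is an in-memory stub so the example runs offline.

```python
from __future__ import annotations

from typing import Protocol


class ClassifyClientLike(Protocol):
    """Hypothetical interface; the real SDK's method names may differ."""

    def upload_file(self, path: str) -> str: ...
    def create_job(self, file_ids: list[str], rules: list[dict]) -> str: ...
    def get_results(self, job_id: str) -> list[dict]: ...


def classify_files(client: ClassifyClientLike, paths: list[str], rules: list[dict]) -> list[dict]:
    """Upload files, create a classify job from the given rules, fetch results."""
    file_ids = [client.upload_file(p) for p in paths]  # upload files
    job_id = client.create_job(file_ids, rules)        # create the classify job
    return client.get_results(job_id)                  # fetch the predictions


class FakeClient:
    """Offline stub: labels every file with the first rule's type."""

    def __init__(self) -> None:
        self._jobs: dict[str, list[dict]] = {}

    def upload_file(self, path: str) -> str:
        return f"file-{path}"

    def create_job(self, file_ids: list[str], rules: list[dict]) -> str:
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = [
            {"file_id": fid, "type": rules[0]["type"], "confidence": 1.0, "reasoning": "stub"}
            for fid in file_ids
        ]
        return job_id

    def get_results(self, job_id: str) -> list[dict]:
        return self._jobs[job_id]


if __name__ == "__main__":
    rules = [{"type": "invoice", "description": "Bills with line items and a total due"}]
    for item in classify_files(FakeClient(), ["a.pdf", "b.pdf"], rules):
        print(item["file_id"], item["type"], item["confidence"])
```

Keeping the flow behind a small interface like this lets you swap in the real client later without touching the code that consumes the predictions.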
Next steps
- Make sure you have an API key: Get an API key
- Jump straight to the SDK guide to run your first job: Python SDK
- For use with other languages, see our API reference