
Getting Started

Classify lets you automatically categorize documents into types you define (for example: invoice, receipt, contract) using natural-language rules.

Warning: Classification is currently a restricted feature. Please contact our team for early access.

Warning: Classification is currently in beta and is subject to breaking changes.

Use Cases

  • Use as a pre-processing step
    • Before extraction: Classify first, then run schema-specific extraction (e.g., invoice vs. contract) with different LlamaExtract agents to improve accuracy and reduce cost.
    • Before parsing: Classify first, then run LlamaParse over labeled files with finely tuned parse settings for each classified category to improve accuracy and reduce cost.
    • Before indexing: Classify first, then send classified files into appropriate LlamaCloud indices with tailored chunking, metadata, and access controls to improve retrieval quality.
  • Intake routing for back-office documents: Auto-separate invoices, receipts, purchase orders, and bank statements to the right queues, storage buckets, or approval workflows.
  • Dataset curation: Auto-tag large archives into meaningful categories to create labeled subsets for model training.
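
The routing use cases above come down to a lookup from a predicted type to a destination. A minimal Python sketch, assuming hypothetical queue names and a hypothetical `route` helper (neither is part of the Classify API):

```python
# Hypothetical mapping from a predicted document type to a destination queue.
# The type labels and queue names are illustrative assumptions.
ROUTES = {
    "invoice": "accounts-payable",
    "receipt": "expenses",
    "purchase_order": "procurement",
    "bank_statement": "treasury",
}

# Anything that does not map to a known queue is sent here for a human to check.
FALLBACK_QUEUE = "manual-review"


def route(predicted_type):
    """Return the destination queue for a predicted type.

    Unknown or missing types fall back to the manual-review queue.
    """
    if predicted_type is None:
        return FALLBACK_QUEUE
    return ROUTES.get(predicted_type, FALLBACK_QUEUE)


for doc_type in ["invoice", "contract", None]:
    print(doc_type, "->", route(doc_type))
```

Keeping the mapping in one table means a new document type needs a new rule and one new entry, with no change to the routing logic.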

Concepts

  • Rule: A content-based criterion for a document type. Each rule has:

    • type: the label to assign (e.g., "invoice").
    • description: a natural-language description of the content that should match this type.
  • Parsing configuration (optional): Controls how documents are parsed before classification (e.g., language, page limits). Use it to trade speed against accuracy.

  • Results: For each file you get a type (predicted), confidence (0.0–1.0), and reasoning (step-by-step explanation).
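
To make these shapes concrete, here is a minimal Python sketch of a rule and a result as plain dataclasses. The class names `Rule` and `Prediction`, the `needs_review` helper, and the 0.8 threshold are illustrative assumptions, not the service's SDK types; only the field names come from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A content-based criterion for one document type (hypothetical class)."""
    type: str          # label to assign, e.g. "invoice"
    description: str   # natural-language description of matching content


@dataclass(frozen=True)
class Prediction:
    """The per-file result fields described above (hypothetical class)."""
    type: str          # predicted label
    confidence: float  # between 0.0 and 1.0
    reasoning: str     # explanation of the prediction


def needs_review(prediction, threshold=0.8):
    """Flag predictions below a confidence threshold.

    The 0.8 default is an arbitrary assumption; tune it on your own data.
    """
    if not 0.0 <= prediction.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return prediction.confidence < threshold


rules = [
    Rule(type="invoice", description="A bill requesting payment for goods or services."),
    Rule(type="receipt", description="Proof that a payment has already been made."),
]

prediction = Prediction(type="invoice", confidence=0.62, reasoning="Mentions an amount due.")
print(needs_review(prediction))
```

Treating low-confidence predictions as candidates for manual review is one possible design choice; the right threshold depends on how costly a misrouted document is.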

Typical flow

  1. Upload your files to LlamaCloud

  2. Create rules for your target classes

  3. Create a classify job with the file IDs and rules

  4. Fetch results and consume the predictions
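
The four steps above can be sketched end to end in Python. Everything here is hypothetical: `FakeClassifyClient` is an in-memory stand-in written for this example, its method names are assumptions, and its keyword matching is a toy that does not reflect how the service classifies. It only shows the order of the calls and the shape of the data passed between them.

```python
import itertools


class FakeClassifyClient:
    """Hypothetical in-memory stand-in for the service; NOT the real client."""

    def __init__(self):
        self._files = {}
        self._jobs = {}
        self._ids = itertools.count(1)

    def upload_file(self, name, text):
        # Step 1: store the file and hand back an ID.
        file_id = f"file-{next(self._ids)}"
        self._files[file_id] = (name, text)
        return file_id

    def create_job(self, file_ids, rules):
        # Step 3: a job ties a set of file IDs to a set of rules.
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = (list(file_ids), list(rules))
        return job_id

    def get_results(self, job_id):
        # Step 4: one result per file, with type, confidence, and reasoning.
        file_ids, rules = self._jobs[job_id]
        results = []
        for file_id in file_ids:
            _, text = self._files[file_id]
            # Toy matching: first rule whose type name appears in the text.
            match = next((r for r in rules if r["type"] in text.lower()), None)
            results.append({
                "file_id": file_id,
                "type": match["type"] if match else None,
                "confidence": 1.0 if match else 0.0,
                "reasoning": "toy keyword match" if match else "no rule matched",
            })
        return results


client = FakeClassifyClient()

# 1. Upload files.
file_ids = [
    client.upload_file("a.txt", "Invoice #123, total due on receipt of goods"),
    client.upload_file("b.txt", "Thanks for visiting"),
]

# 2. Create rules for the target classes.
rules = [
    {"type": "invoice", "description": "A bill requesting payment."},
    {"type": "contract", "description": "A signed agreement between parties."},
]

# 3. Create a classify job with the file IDs and rules.
job_id = client.create_job(file_ids, rules)

# 4. Fetch results and consume the predictions.
results = client.get_results(job_id)
for result in results:
    print(result["file_id"], result["type"], result["confidence"])
```

With a real client, the fetch step may involve waiting for the job to finish before results are available; this stand-in skips that and returns results immediately.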

Next steps