
Python SDK

This guide shows how to classify documents using the Python SDK. You will:

  • Create classification rules
  • Upload files
  • Submit a classify job
  • Read predictions (type, confidence, reasoning)

The SDK is available in llama-cloud-services.

Setup

First, get a LlamaCloud API key (see "Get an API key").

Put it in a .env file:

LLAMA_CLOUD_API_KEY=llx-xxxxxx

Install dependencies:

pip install llama-cloud-services python-dotenv

or with uv:

uv add llama-cloud-services python-dotenv
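Before making any API calls, you can check that the key is actually being picked up. The helper below is a minimal sketch: `get_llama_cloud_api_key` is a hypothetical name, not part of the SDK. It loads ./.env with python-dotenv when that package is installed, then fails fast with a clear message if the variable is missing.

```python
import os

try:
    # python-dotenv is optional here; without it we rely on the process environment
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_llama_cloud_api_key(var_name: str = "LLAMA_CLOUD_API_KEY") -> str:
    """Return the API key from ./.env or the environment, or raise if it is absent."""
    if load_dotenv is not None:
        # loads ./.env if present; variables already set in the environment are not overridden
        load_dotenv(".env")
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; add it to your .env file or export it in your shell"
        )
    return key
```

Call it once at startup and pass the returned value as the client token.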

Quick start

The snippet below uses ClassifyClient, a convenience wrapper from llama-cloud-services that uploads your files, creates a classify job, polls until the job completes, and returns the results. Replace the project and organization IDs with your own.

import os
from dotenv import load_dotenv
from llama_cloud.client import AsyncLlamaCloud
from llama_cloud.types import ClassifierRule, ClassifyParsingConfiguration, ParserLanguages
from llama_cloud_services.classify.client import ClassifyClient # helper wrapper

load_dotenv()

client = AsyncLlamaCloud(token=os.environ["LLAMA_CLOUD_API_KEY"])
project_id = "your-project-id"
organization_id = "your-organization-id"
classify = ClassifyClient(client, project_id=project_id, organization_id=organization_id)

rules = [
    ClassifierRule(
        type="invoice",
        description="Documents that contain an invoice number, invoice date, bill-to section, and line items with totals.",
    ),
    ClassifierRule(
        type="receipt",
        description="Short purchase receipts, typically from POS systems, with merchant, items and total, often a single page.",
    ),
]

parsing = ClassifyParsingConfiguration(
    lang=ParserLanguages.EN,
    max_pages=5,  # optional, parse at most 5 pages
    # target_pages=[1],  # optional, parse only these pages (1-indexed); can't be combined with max_pages
)

# for async usage, use `await classify.aclassify_file_paths(...)`
results = classify.classify_file_paths(
    rules=rules,
    file_input_paths=["/path/to/doc1.pdf", "/path/to/doc2.pdf"],
    parsing_configuration=parsing,
)

for item in results.items:
    # if the job only partially succeeds, some items may have no result
    if item.result is None:
        print(f"Classification job {item.classify_job_id} failed on file {item.file_id}")
        continue
    print(item.file_id, item.result.type, item.result.confidence)
    print(item.result.reasoning)
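Printing each result is fine for a first run; in a pipeline you will usually want to separate failures and route uncertain predictions to a human. A minimal sketch, assuming each item exposes `file_id` and `result` as in the loop above and that `confidence` is a float between 0 and 1. The `summarize_results` name and the 0.8 cutoff are hypothetical, and the SimpleNamespace objects are stand-ins so the helper can be tried offline.

```python
from types import SimpleNamespace


def summarize_results(items, min_confidence=0.8):
    """Split classify results into confident, needs-review, and failed groups.

    Each item is expected to expose `file_id` and `result`, where `result` is
    None for files that failed and otherwise has `type` and `confidence`.
    `min_confidence` is a hypothetical cutoff; tune it on your own documents.
    """
    confident, needs_review, failed = [], [], []
    for item in items:
        if item.result is None:
            failed.append(item.file_id)
        elif item.result.confidence < min_confidence:
            needs_review.append((item.file_id, item.result.type, item.result.confidence))
        else:
            confident.append((item.file_id, item.result.type))
    return confident, needs_review, failed


# Stand-in objects shaped like results.items, for trying the helper offline
demo_items = [
    SimpleNamespace(file_id="f1", result=SimpleNamespace(type="invoice", confidence=0.95)),
    SimpleNamespace(file_id="f2", result=SimpleNamespace(type="receipt", confidence=0.55)),
    SimpleNamespace(file_id="f3", result=None),
]
confident, needs_review, failed = summarize_results(demo_items)
```

With real output, call `summarize_results(results.items)` after the quick start.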

Notes:

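  • ClassifierRule requires a type and a description. Write the description as concrete criteria the model can follow to recognize that type.
  • ClassifyParsingConfiguration is optional; set lang, max_pages, or target_pages to control parsing. max_pages and target_pages cannot be used together.
  • results.items contains one FileClassification per file. On success, result.type, result.confidence, and result.reasoning hold the prediction; if classification failed for a file, result is None.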
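The quick start notes that target_pages is 1-indexed and can't be combined with max_pages. A minimal sketch that enforces those two rules locally before a job is submitted; `page_options` is a hypothetical helper, and it assumes target_pages takes a list of integers, as in the commented-out line of the quick start.

```python
def page_options(max_pages=None, target_pages=None):
    """Return page-selection kwargs for ClassifyParsingConfiguration.

    Enforces two rules from the quick start: max_pages and target_pages are
    mutually exclusive, and target_pages uses 1-indexed page numbers.
    """
    if max_pages is not None and target_pages is not None:
        raise ValueError("max_pages and target_pages cannot be used together")
    if max_pages is not None:
        if max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        return {"max_pages": max_pages}
    if target_pages is not None:
        if not target_pages or min(target_pages) < 1:
            raise ValueError("target_pages must be a non-empty list of 1-indexed page numbers")
        # drop duplicates and keep pages in ascending order
        return {"target_pages": sorted(set(target_pages))}
    # neither option set: leave page selection to the service defaults
    return {}
```

The result unpacks into the configuration, for example `ClassifyParsingConfiguration(lang=ParserLanguages.EN, **page_options(max_pages=5))`.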

Tips for writing good rules

  • Be specific about content features that distinguish the type.
  • Include key fields the document usually contains (e.g., invoice number, total amount).
  • Add multiple rules when needed to cover distinct patterns.
  • Start simple, test on a small set, then refine.
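Some of these tips can be checked mechanically before you send a job. The sketch below lints rule definitions kept as plain (type, description) pairs, flagging duplicate types and thin descriptions. The `lint_rules` name, the tuple format, and the 8-word minimum are hypothetical heuristics, not SDK requirements; convert the pairs to ClassifierRule objects once they pass.

```python
def lint_rules(rule_specs, min_words=8):
    """Return human-readable problems found in (type, description) pairs."""
    problems = []
    seen = set()
    for rule_type, description in rule_specs:
        key = rule_type.strip().lower()
        if not key:
            problems.append("empty type")
        elif key in seen:
            problems.append(f"duplicate type: {rule_type!r}")
        seen.add(key)
        # short descriptions rarely list the features that distinguish a type
        if len(description.split()) < min_words:
            problems.append(f"{rule_type!r}: description is short; list distinguishing features")
    return problems


rule_specs = [
    ("invoice", "Documents that contain an invoice number, invoice date, bill-to section, and line items with totals."),
    ("receipt", "Short purchase receipts, typically from POS systems, with merchant, items and total, often a single page."),
]
problems = lint_rules(rule_specs)
# once clean: rules = [ClassifierRule(type=t, description=d) for t, d in rule_specs]
```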

Next steps

  • Batch programmatically and track progress: upload many files concurrently and classify them in a single job.
  • Combine with Extract to pull structured data from each document after it has been classified.
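classify_file_paths already uploads a list of files and classifies them in one job. If you want incremental progress over a very large set, one alternative pattern (not an SDK feature) is to split the paths into chunks, run each chunk as its own job, and cap how many run at once with asyncio.Semaphore. In the sketch below, `chunk` and `classify_in_chunks` are hypothetical helpers, and `classify_chunk` is any coroutine function that classifies one list of paths.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


def chunk(paths: Sequence[str], size: int) -> List[List[str]]:
    """Split file paths into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(paths[i : i + size]) for i in range(0, len(paths), size)]


async def classify_in_chunks(
    paths: Sequence[str],
    classify_chunk: Callable[[List[str]], Awaitable[object]],
    chunk_size: int = 20,
    max_concurrency: int = 3,
) -> list:
    """Run one classify call per chunk with at most `max_concurrency` in flight.

    Prints progress as chunks finish; returns the results in chunk order.
    """
    chunks = chunk(paths, chunk_size)
    semaphore = asyncio.Semaphore(max_concurrency)
    done = 0

    async def run(batch: List[str]):
        nonlocal done
        async with semaphore:
            result = await classify_chunk(batch)
        done += 1
        print(f"classified chunk {done}/{len(chunks)} ({len(batch)} files)")
        return result

    # gather preserves the order of its arguments, so results match chunk order
    return await asyncio.gather(*(run(batch) for batch in chunks))
```

Inside an async function you could pass `lambda batch: classify.aclassify_file_paths(rules=rules, file_input_paths=batch, parsing_configuration=parsing)` as `classify_chunk`. Each chunk becomes a separate classify job, so this trades the single-job behavior for visible progress; the default `chunk_size` and `max_concurrency` values are placeholders, since service rate limits are not covered here.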