Python SDK
For a more programmatic approach, the Python SDK is the recommended way to experiment with different schemas and run extractions at scale. The GitHub repo for the Python SDK is here.
First, get an API key. We recommend putting your key in a file called .env
that looks like this:
LLAMA_CLOUD_API_KEY=llx-xxxxxx
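It can help to fail fast when the key is missing rather than hit an authentication error later. The following is a minimal sketch using only the standard library. The helper name `require_api_key` is hypothetical and not part of the SDK, and the `llx-` prefix check is an assumption based on the example key above.

```python
import os


def require_api_key(environ=None, var_name="LLAMA_CLOUD_API_KEY"):
    """Return the API key from the environment or raise a clear error."""
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    # Assumption: keys start with "llx-", as in the example above.
    if not key.startswith("llx-"):
        raise RuntimeError(f"{var_name} does not look like a LlamaCloud key")
    return key
```

Call `require_api_key()` once the contents of .env have been loaded into the environment and before creating the client.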
Set up a new Python environment using the tool of your choice (we used poetry init).
Then install the dependencies you'll need:
pip install llama-extract python-dotenv
Now that we have our libraries and our API key available, let's create an extract.py
file and extract data from files. In this case,
we're using some sample resumes from our example:
Quick Start
from llama_extract import LlamaExtract
from pydantic import BaseModel, Field
# bring in our LLAMA_CLOUD_API_KEY
from dotenv import load_dotenv
load_dotenv()
# Initialize client
extractor = LlamaExtract()
# Define schema using Pydantic
class Resume(BaseModel):
    name: str = Field(description="Full name of candidate")
    email: str = Field(description="Email address")
    skills: list[str] = Field(description="Technical skills and technologies")
# Create extraction agent
agent = extractor.create_agent(name="resume-parser", data_schema=Resume)
# Extract data from document
result = agent.extract("resume.pdf")
print(result.data)
Now run it like any Python file. This will print the results of the extraction.
python extract.py
Defining Schemas
Schemas can be defined using either Pydantic models or JSON Schema. Refer to the Schemas page for more details.
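For comparison with the Pydantic model in the quick start, here is a hand-written JSON Schema describing the same three resume fields as a plain Python dict. This is a sketch, not output generated by the SDK or by Pydantic, and passing the dict as `data_schema` is an assumption based on the statement above that JSON Schema is accepted.

```python
import json

# Hand-written JSON Schema mirroring the Resume Pydantic model.
resume_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Full name of candidate"},
        "email": {"type": "string", "description": "Email address"},
        "skills": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Technical skills and technologies",
        },
    },
    "required": ["name", "email", "skills"],
}

# Assumption: the dict can be passed in place of the Pydantic class, e.g.
# extractor.create_agent(name="resume-parser", data_schema=resume_schema)
print(json.dumps(resume_schema, indent=2))
```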
Other Extraction APIs
Batch Processing
Process multiple files asynchronously:
# Queue multiple files for extraction
# (`await` must run inside an async function or a notebook with top-level await)
jobs = await agent.queue_extraction(["resume1.pdf", "resume2.pdf"])
# Check job status
for job in jobs:
    status = agent.get_extraction_job(job.id).status
    print(f"Job {job.id}: {status}")
# Get results when complete
results = [agent.get_extraction_run_for_job(job.id) for job in jobs]
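Because queued jobs finish asynchronously, results should only be fetched once no job is still pending. The following is a minimal polling sketch that is independent of the SDK: `wait_for_jobs` is a hypothetical helper, and the status value `"PENDING"` is an assumption, so adjust `pending_statuses` to whatever values your jobs actually report.

```python
import time


def wait_for_jobs(get_status, job_ids, poll_seconds=2.0, timeout_seconds=300.0,
                  pending_statuses=("PENDING",), sleep=time.sleep):
    """Poll until no job is pending and return {job_id: final_status}."""
    deadline = time.monotonic() + timeout_seconds
    remaining = list(job_ids)
    final = {}
    while remaining:
        still_pending = []
        for job_id in remaining:
            status = get_status(job_id)
            if status in pending_statuses:
                still_pending.append(job_id)
            else:
                final[job_id] = status
        remaining = still_pending
        if remaining:
            # Give up once the overall timeout has elapsed.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"Jobs still pending: {remaining}")
            sleep(poll_seconds)
    return final
```

With the agent from the batch example, this could be called as `wait_for_jobs(lambda job_id: agent.get_extraction_job(job_id).status, [job.id for job in jobs])` before collecting the results.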
Updating Schemas
Schemas can be modified and updated after creation:
# Update schema
agent.data_schema = new_schema
# Save changes
agent.save()
Managing Agents
# List all agents
agents = extractor.list_agents()
# Get specific agent
agent = extractor.get_agent(name="resume-parser")
# Delete agent
extractor.delete_agent(agent.id)
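When a script runs more than once, it is often convenient to reuse an existing agent instead of creating a new one each time. The sketch below builds a "get or create" helper on top of `list_agents()` and `create_agent()` from the examples above. The helper name `get_or_create_agent` is hypothetical, and it assumes that agent objects expose a `name` attribute.

```python
def get_or_create_agent(extractor, name, data_schema):
    """Reuse the agent called `name` if it exists, otherwise create it."""
    for agent in extractor.list_agents():
        # Assumption: agent objects expose a `name` attribute.
        if getattr(agent, "name", None) == name:
            return agent
    return extractor.create_agent(name=name, data_schema=data_schema)
```

For the quick start, this would be called as `get_or_create_agent(extractor, "resume-parser", Resume)`.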
Examples
For more detailed examples on how to use the Python SDK, visit our GitHub repo.