
API & Clients

This guide highlights the core workflow: set up the API client, create an index (pipeline) with files, data sources, and data sinks, then monitor ingestion and run retrieval.

tip

See full API reference here.

App setup

Install API client package

pip install llama-cloud 

Import and configure client

from llama_cloud.client import LlamaCloud

client = LlamaCloud(token='<llama-cloud-api-key>')
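
If you prefer not to hard-code the key, one common pattern (not required by the client) is to read it from an environment variable. The variable name LLAMA_CLOUD_API_KEY below is an illustrative choice:

import os

from llama_cloud.client import LlamaCloud

# Assumes the key is exported as LLAMA_CLOUD_API_KEY (illustrative name).
api_key = os.environ.get('LLAMA_CLOUD_API_KEY')
if not api_key:
    raise RuntimeError('Set LLAMA_CLOUD_API_KEY before creating the client')

client = LlamaCloud(token=api_key)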

Create new index

Upload files

with open('test.pdf', 'rb') as f:
    file = client.files.upload_file(upload_file=f)

tip

See Files API for full details on file management.
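
If you have several local files, you can upload them in a loop with the same call and keep the returned ids for the later steps. A minimal sketch, assuming each upload returns an object with an id attribute as used further down in this guide:

paths = ['test.pdf', 'report.pdf']  # illustrative file names

uploaded_files = []
for path in paths:
    with open(path, 'rb') as f:
        uploaded_files.append(client.files.upload_file(upload_file=f))

# Ids to register with the index (pipeline) later.
file_ids = [uf.id for uf in uploaded_files]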

Configure data sources

from llama_cloud.types import CloudS3DataSource

ds = {
    'name': 's3',
    'source_type': 'S3',
    'component': CloudS3DataSource(bucket='test-bucket')
}
data_source = client.data_sources.create_data_source(request=ds)

tip

See Data Sources API for full details on data source management.

See full list of data sources and specifications.

Configure data sinks

from llama_cloud.types import CloudPineconeVectorStore

ds = {
    'name': 'pinecone',
    'sink_type': 'PINECONE',
    'component': CloudPineconeVectorStore(api_key='test-key', index_name='test-index')
}
data_sink = client.data_sinks.create_data_sink(request=ds)

tip

See Data Sinks API for full details on data sink management.

See full list of data sinks and specifications.

Create index (i.e. pipeline)

pipeline = {
    'name': 'test-pipeline',
    'configured_transformations': [
        # Chunking: split documents into sentence-aware nodes.
        {
            'configurable_transformation_type': 'SENTENCE_AWARE_NODE_PARSER',
            'component': {
                'chunk_size': 1024,
                'chunk_overlap': 20,
            }
        },
        # Embedding: embed each node with an OpenAI embedding model.
        {
            'configurable_transformation_type': 'OPENAI_EMBEDDING',
            'component': {
                'model_name': 'text-embedding-ada-002',
                'api_key': 'sk-...',
            }
        }
    ],
    'data_sink_id': data_sink.id
}

pipeline = client.pipelines.upsert_pipeline(request=pipeline)

tip

See Pipeline API for full details on index (i.e. pipeline) management.

Add files to index

files = [
    {'file_id': file.id}
]

pipeline_files = client.pipelines.add_files_to_pipeline(pipeline.id, request=files)

Add data sources to index

data_sources = [
    {'data_source_id': data_source.id}
]

pipeline_data_sources = client.pipelines.add_data_sources_to_pipeline(pipeline.id, request=data_sources)

Add documents to index

from llama_cloud.types import CloudDocumentCreate

documents = [
    CloudDocumentCreate(
        text='test-text',
        metadata={
            'test-key': 'test-val'
        }
    )
]

documents = client.pipelines.create_batch_pipeline_documents(pipeline.id, request=documents)
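
To ingest several raw-text documents at once, build one CloudDocumentCreate per text and pass them in the same batch call. A small sketch, using only the text and metadata fields shown above:

from llama_cloud.types import CloudDocumentCreate

# Illustrative raw texts and metadata.
texts = ['first snippet', 'second snippet']

batch = [
    CloudDocumentCreate(text=t, metadata={'source': 'manual-upload'})
    for t in texts
]

created = client.pipelines.create_batch_pipeline_documents(pipeline.id, request=batch)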

Observe ingestion status & history

Get index status

status = client.pipelines.get_pipeline_status(pipeline.id)
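
Ingestion runs asynchronously, so a simple polling loop is a convenient way to wait for it to finish. This sketch assumes the response exposes a status field with terminal values such as SUCCESS or ERROR; check the Pipeline API reference for the exact schema:

import time

# Poll until ingestion reaches a terminal state (field and value names are assumptions).
while True:
    status = client.pipelines.get_pipeline_status(pipeline.id)
    if str(status.status) in ('SUCCESS', 'ERROR'):
        break
    time.sleep(5)

print('final status:', status.status)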

Get ingestion job history

jobs = client.pipelines.list_pipeline_jobs(pipeline.id)
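
Each entry describes one ingestion run; a quick way to inspect the history is to print a couple of fields from each job (the id and status attributes are assumptions about the job schema, see the Pipeline API reference):

for job in jobs:
    # id and status are assumed attribute names.
    print(job.id, job.status)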

Run search (i.e. retrieval endpoint)

results = client.pipelines.run_search(pipeline.id, query='test-query')
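
The response bundles the retrieved chunks with their similarity scores. The sketch below assumes a retrieval_nodes list whose entries expose score and node.text; check the retrieval endpoint reference for the exact field names:

# retrieval_nodes, score, and node.text are assumptions about the response
# schema; adjust to match the API reference.
for item in results.retrieval_nodes:
    print(round(item.score, 3), item.node.text[:200])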