API & Clients
This guide highlights the core workflow for building and querying a LlamaCloud index with the Python and TypeScript API clients.
tip
See the full API reference here.
App setup
- Python Client
- TypeScript Client
Install API client package
pip install llama-cloud
Import and configure client
from llama_cloud.client import LlamaCloud
client = LlamaCloud(token='<llama-cloud-api-key>')
Install API client package
npm install @llamaindex/cloud
Import and configure client
import { LlamaCloudApiClient } from '@llamaindex/cloud';
const apiKey = '<llama-cloud-api-key>';
const client = new LlamaCloudApiClient({
  token: apiKey
});
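Hard-coding the token works for quick experiments, but in shared code you would typically read it from an environment variable instead. A minimal Python sketch, assuming the key is stored in a LLAMA_CLOUD_API_KEY environment variable (the variable name here is just a convention):
import os
from llama_cloud.client import LlamaCloud
# Read the API key from the environment instead of hard-coding it
api_key = os.environ['LLAMA_CLOUD_API_KEY']
client = LlamaCloud(token=api_key)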
Create new index
Upload files
- Python Client
- TypeScript Client
with open('test.pdf', 'rb') as f:
file = client.files.upload_file(upload_file=f)
import fs from "fs";
const filePath = "node_modules/llamaindex/examples/abramov.txt";
const file = await client.files.uploadFile(projectId, fs.createReadStream(filePath));
tip
See Files API for full details on file management.
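After uploading, you can verify what the project contains by listing its files. A minimal Python sketch, assuming the Python client exposes a list_files method and that file objects carry id and name attributes (these are assumptions; check the Files API reference):
# List files in the project (method and attribute names assumed; see the Files API)
uploaded_files = client.files.list_files()
for f in uploaded_files:
    print(f.id, f.name)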
Configure data sources
- Python Client
- TypeScript Client
from llama_cloud.types import CloudS3DataSource
ds = {
    'name': 's3',
    'source_type': 'S3',
    'component': CloudS3DataSource(bucket='test-bucket')
}
data_source = client.data_sources.create_data_source(request=ds)
const s3 = {
  name: 's3',
  sourceType: 'S3',
  component: {
    bucket: 'test-bucket'
  }
};
const dataSource = await client.dataSources.createDataSource({
  projectId: projectId,
  body: s3
});
tip
See Data Sources API for full details on data source management.
Configure data sinks
- Python Client
- TypeScript Client
from llama_cloud.types import CloudPineconeVectorStore
ds = {
    'name': 'pinecone',
    'sink_type': 'PINECONE',
    'component': CloudPineconeVectorStore(api_key='test-key', index_name='test-index')
}
data_sink = client.data_sinks.create_data_sink(request=ds)
const pinecone = {
  name: 'pinecone',
  sinkType: 'PINECONE',
  component: {
    api_key: 'test-key',
    index_name: 'test-index'
  }
};
const dataSink = await client.dataSinks.createDataSink({
  projectId: projectId,
  body: pinecone
});
tip
See Data Sinks API for full details on data sink management.
Setup transformation and embedding config
tip
See Parsing & Transformation for full details on configuring transformations.
# Embedding config
embedding_config = {
    'type': 'OPENAI_EMBEDDING',
    'component': {
        'api_key': '<YOUR_API_KEY_HERE>',  # editable
        'model_name': 'text-embedding-ada-002'  # editable
    }
}
# Transformation auto config
transform_config = {
    'mode': 'auto',
    'config': {
        'chunk_size': 1024,  # editable
        'chunk_overlap': 20  # editable
    }
}
Create index (i.e. pipeline)
- Python Client
- TypeScript Client
pipeline = {
    'name': 'test-pipeline',
    'embedding_config': embedding_config,
    'transform_config': transform_config,
    'data_sink_id': data_sink.id
}
pipeline = client.pipelines.upsert_pipeline(request=pipeline)
const pipelineRequest = {
  name: 'test-pipeline',
  embedding_config: embedding_config,
  transform_config: transform_config,
  dataSinkId: dataSink.id
};
const pipeline = await client.pipelines.upsertPipeline({
  projectId: projectId,
  body: pipelineRequest
});
tip
See Pipeline API for full details on index (i.e. pipeline) management.
Add files to index
- Python Client
- TypeScript Client
files = [
    {'file_id': file.id}
]
pipeline_files = client.pipelines.add_files_to_pipeline(pipeline.id, request=files)
const files = [
  { file_id: file.id }
];
const pipelineFiles = await client.pipelines.addFilesToPipeline(pipeline.id, files);
Add data sources to index
- Python Client
- TypeScript Client
data_sources = [
    {
        'data_source_id': data_source.id,
        'sync_interval': 43200.0  # Optional, scheduled sync frequency in seconds. In this case, every 12 hours.
    }
]
pipeline_data_sources = client.pipelines.add_data_sources_to_pipeline(pipeline.id, request=data_sources)
const dataSources = [
  {
    data_source_id: dataSource.id,
    sync_interval: 43200.0 // Optional, scheduled sync frequency in seconds. In this case, every 12 hours.
  }
];
const pipelineDataSources = await client.pipelines.addDataSourcesToPipeline(pipeline.id, dataSources);
tip
For more details on scheduled sync, including how sync timing works and the available sync frequencies, refer to Scheduled sync.
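If you would rather not wait for the next scheduled sync while testing, the pipeline can typically also be synced on demand. A minimal Python sketch, assuming the client exposes a sync_pipeline method (the method name is an assumption; check your client version and the Pipeline API reference):
# Trigger an immediate sync of the pipeline's data sources (method name assumed)
client.pipelines.sync_pipeline(pipeline.id)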
Add documents to index
- Python Client
- TypeScript Client
from llama_cloud.types import CloudDocumentCreate
documents = [
    CloudDocumentCreate(
        text='test-text',
        metadata={
            'test-key': 'test-val'
        }
    )
]
documents = client.pipelines.create_batch_pipeline_documents(pipeline.id, request=documents)
const documents = [
  {
    text: 'test-text',
    metadata: {
      'test-key': 'test-val'
    }
  }
];
const pipelineDocuments = await client.pipelines.createBatchPipelineDocuments(pipeline.id, documents);
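To confirm what the index now contains, you can list its documents. A minimal Python sketch, assuming the client exposes a list_pipeline_documents method (the name is an assumption; check the Pipeline API reference):
# List documents currently attached to the pipeline (method name assumed)
pipeline_documents = client.pipelines.list_pipeline_documents(pipeline.id)
print(len(pipeline_documents))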
Observe ingestion status & history
Get index status
- Python Client
- TypeScript Client
status = client.pipelines.get_pipeline_status(pipeline.id)
const status = await client.pipelines.getPipelineStatus(pipeline.id);
Get ingestion job history
- Python Client
- TypeScript Client
jobs = client.pipelines.list_pipeline_jobs(pipeline.id)
const jobs = await client.pipelines.listPipelineJobs(pipeline.id);
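In practice you usually want to wait for ingestion to finish before querying. A minimal polling sketch in Python, assuming the status response exposes a status field with terminal values such as SUCCESS or ERROR (field and value names are assumptions; check the Pipeline API reference):
import time
# Poll until the pipeline reaches a terminal state (field/value names assumed)
while True:
    status = client.pipelines.get_pipeline_status(pipeline.id)
    if status.status in ('SUCCESS', 'ERROR'):
        break
    time.sleep(5)  # wait a few seconds between polls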
Run search (i.e. retrieval endpoint)
- Python Client
- TypeScript Client
results = client.pipelines.run_search(pipeline.id, query='test-query')
const results = await client.pipelines.runSearch(pipeline.id, {
  query: 'test-query'
});
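The search response contains the retrieved chunks and their similarity scores. A minimal Python sketch for inspecting them, assuming the response exposes a retrieval_nodes list whose entries carry a score and a node with the chunk text (attribute names are assumptions; check the retrieval API reference):
# Print each retrieved chunk and its score (attribute names assumed)
for item in results.retrieval_nodes:
    print(item.score, item.node.text[:200])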