Sync Pipeline Data Source
POST /api/v1/pipelines/:pipeline_id/data-sources/:data_source_id/sync
Runs ingestion for the pipeline data source, incrementally updating the data sink with upstream changes from the data source.
Request
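Path Parameters
pipeline_id
string
required
ID of the pipeline.
data_source_id
string
required
ID of the data source.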
Cookie Parameters
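The request needs only the two path parameters; no request body is described on this page. The sketch below builds the call with Python's standard library without sending it. The base URL `https://api.cloud.llamaindex.ai` and the bearer-token `Authorization` header are assumptions not stated in this reference, and `build_sync_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request

# Assumption: base URL and bearer-token auth are not stated on this page.
BASE_URL = "https://api.cloud.llamaindex.ai"


def build_sync_request(pipeline_id: str, data_source_id: str, api_key: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a data source sync."""
    # Percent-encode the IDs so they are safe inside URL path segments.
    path = "/api/v1/pipelines/{}/data-sources/{}/sync".format(
        urllib.parse.quote(pipeline_id, safe=""),
        urllib.parse.quote(data_source_id, safe=""),
    )
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` should return the pipeline object described under the 200 response; a 422 is raised as `urllib.error.HTTPError`, whose body follows the validation error schema.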
Responses
- 200: Successful Response
- 422: Validation Error
200 Successful Response
Content type: application/json
Schema
Named schema components, grouped by the field they describe:
- embedding_config (oneOf): AzureOpenAIEmbeddingConfig, CohereEmbeddingConfig, GeminiEmbeddingConfig, HuggingFaceInferenceAPIEmbeddingConfig, OpenAIEmbeddingConfig, VertexAIEmbeddingConfig, BedrockEmbeddingConfig
- HuggingFace pooling strategy: Pooling
- configured_transformations component (anyOf): CharacterSplitter, PageSplitterNodeParser, CodeSplitter, SentenceSplitter, TokenTextSplitter, MarkdownNodeParser, MarkdownElementNodeParser (which references LLM, BasePromptTemplate, and NodeParser)
- config_hash: PipelineConfigurationHashes
- transform_config (anyOf): AutoTransformConfig, AdvancedModeTransformConfig; segmentation: NoneSegmentationConfig, PageSegmentationConfig, ElementSegmentationConfig; chunking: NoneChunkingConfig, CharacterChunkingConfig, TokenChunkingConfig, SentenceChunkingConfig, SemanticChunkingConfig
- search_filters: MetadataFilters, MetadataFilter, FilterCondition
- llama_parse_parameters: LlamaParseParameters
- data_sink: DataSink, with component variants CloudPineconeVectorStore, CloudPostgresVectorStore, CloudQdrantVectorStore, CloudAzureAISearchVectorStore, CloudMongoDBAtlasVectorSearch, CloudMilvusVectorStore
id
string
Unique identifier
created_at
object
Creation datetime
anyOf
string
updated_at
object
Update datetime
anyOf
string
name
string
project_id
string
pipeline_type
string
Possible values: [PLAYGROUND, MANAGED]
Default value: MANAGED
Enum representing the type of a pipeline.
managed_pipeline_id
object
The ID of the ManagedPipeline this playground pipeline is linked to.
anyOf
string
embedding_config
object
required
oneOf
Possible values: [AZURE_EMBEDDING]
Default value: AZURE_EMBEDDING
Type of the embedding model.
component
object
Configuration for the Azure OpenAI embedding model.
Default value: text-embedding-ada-002
The name of the OpenAI embedding model.
Possible values: > 0 and <= 2048
Default value: 10
The batch size for embedding calls.
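The batch size bounds how many texts go into each embedding call. As a client-side illustration only (the service's own batching is not described here), this sketch enforces the documented range (> 0 and <= 2048) and default (10) while slicing a list of texts; `iter_batches` is a hypothetical name.

```python
from typing import Iterator, List

MAX_BATCH_SIZE = 2048  # documented upper bound (> 0 and <= 2048)
DEFAULT_BATCH_SIZE = 10  # documented default


def iter_batches(texts: List[str], batch_size: int = DEFAULT_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if not 0 < batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be > 0 and <= {MAX_BATCH_SIZE}, got {batch_size}")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]
```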
num_workers
object
The number of workers to use for async embedding calls.
anyOf
integer
Additional kwargs for the OpenAI API.
api_key
object
The OpenAI API key.
anyOf
string
The base URL for the Azure deployment.
The version of the Azure OpenAI API.
Default value: 10
Maximum number of retries.
Default value: 60
Timeout for each request.
default_headers
object
The default headers for API requests.
anyOf
Default value: true
Whether to reuse the OpenAI client between requests. When making large volumes of async API calls, setting this to false can improve stability.
dimensions
object
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
anyOf
integer
azure_endpoint
object
The Azure endpoint to use.
anyOf
string
azure_deployment
object
The Azure deployment to use.
anyOf
string
Default value: AzureOpenAIEmbedding
Possible values: [COHERE_EMBEDDING]
Default value: COHERE_EMBEDDING
Type of the embedding model.
component
object
Configuration for the Cohere embedding model.
Default value: embed-english-v3.0
The modelId of the Cohere model to use.
Possible values: > 0 and <= 2048
Default value: 10
The batch size for embedding calls.
num_workers
object
The number of workers to use for async embedding calls.
anyOf
integer
api_key
object
required
The Cohere API key.
anyOf
string
Default value: END
Truncation type: START, END, or NONE.
input_type
object
Model input type. If not provided, search_document and search_query are used as appropriate.
anyOf
string
Default value: float
Embedding type. If not provided, float is used.
Default value: CohereEmbedding
Possible values: [GEMINI_EMBEDDING]
Default value: GEMINI_EMBEDDING
Type of the embedding model.
component
object
Configuration for the Gemini embedding model.
Default value: models/embedding-001
The modelId of the Gemini model to use.
Possible values: > 0 and <= 2048
Default value: 10
The batch size for embedding calls.
num_workers
object
The number of workers to use for async embedding calls.
anyOf
integer
title
object
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
anyOf
string
task_type
object
The task for the embedding model.
anyOf
string
api_key
object
API key to access the model. Defaults to None.
anyOf
string
api_base
object
API base to access the model. Defaults to None.
anyOf
string
transport
object
Transport to access the model. Defaults to None.
anyOf
string
Default value: GeminiEmbedding
Possible values: [HUGGINGFACE_API_EMBEDDING]
Default value: HUGGINGFACE_API_EMBEDDING
Type of the embedding model.
component
object
Configuration for the HuggingFace Inference API embedding model.
model_name
object
Hugging Face model name. If None, the task will be used.
anyOf
string
Possible values: > 0 and <= 2048
Default value: 10
The batch size for embedding calls.
num_workers
object
The number of workers to use for async embedding calls.
anyOf
integer
pooling
object
Pooling strategy. If None, the model's default pooling is used.
anyOf
string
Possible values: [cls, mean, last]
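The three pooling values name common ways of collapsing per-token vectors into a single embedding. The sketch below shows the usual meaning of each on plain Python lists; it assumes unpadded input and ignores attention masks, which real model implementations typically apply. `pool` is a hypothetical helper.

```python
from typing import List, Sequence

Vector = List[float]


def pool(token_embeddings: Sequence[Vector], strategy: str) -> Vector:
    """Collapse per-token vectors into one vector (no attention-mask handling)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    if strategy == "cls":
        # First token's vector (the [CLS] position in BERT-style models).
        return list(token_embeddings[0])
    if strategy == "last":
        # Final token's vector, common for decoder-style models.
        return list(token_embeddings[-1])
    if strategy == "mean":
        # Average each dimension across all tokens.
        n = len(token_embeddings)
        return [sum(dim) / n for dim in zip(*token_embeddings)]
    raise ValueError(f"unknown pooling strategy: {strategy!r}")
```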
query_instruction
object
Instruction to prepend during query embedding.
anyOf
string
text_instruction
object
Instruction to prepend during text embedding.
anyOf
string
token
object
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
anyOf
string
boolean
timeout
object
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
anyOf
number
headers
object
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
anyOf
cookies
object
Additional cookies to send to the server.
anyOf
task
object
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
anyOf
string
Default value: HuggingFaceInferenceAPIEmbedding
Possible values: [OPENAI_EMBEDDING]
Default value: OPENAI_EMBEDDING
Type of the embedding model.
component
object
Configuration for the OpenAI embedding model.
Default value: text-embedding-ada-002
The name of the OpenAI embedding model.
Possible values: > 0 and <= 2048
Default value: 10
The batch size for embedding calls.
num_workers
object
The number of workers to use for async embedding calls.
anyOf
integer
Additional kwargs for the OpenAI API.
api_key
object
The OpenAI API key.
anyOf
string
api_base
object
The base URL for OpenAI API.
anyOf
string
api_version
object
The version of the OpenAI API.
anyOf
string
Default value: 10
Maximum number of retries.
Default value: 60
Timeout for each request.
default_headers
object
The default headers for API requests.
anyOf
Default value: true
Whether to reuse the OpenAI client between requests. When making large volumes of async API calls, setting this to false can improve stability.
dimensions
object
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
anyOf
integer
Default value: OpenAIEmbedding
Possible values: [VERTEXAI_EMBEDDING]
Default value: VERTEXAI_EMBEDDING
Type of the embedding model.
component
object
Configuration for the VertexAI embedding model.
Default value: textembedding-gecko@003
The modelId of the VertexAI model to use.
Possible values: > 0 and <= 2048
Default value: 10
The batch size for embedding calls.
num_workers
object
The number of workers to use for async embedding calls.
anyOf
integer
The default location to use when making API calls.
The default GCP project to use when making Vertex API calls.
Possible values: [default, classification, clustering, similarity, retrieval]
Default value: retrieval
The embedding mode to use.
Additional kwargs for Vertex AI.
client_email
object
required
The client email for the VertexAI credentials.
anyOf
string
token_uri
object
required
The token URI for the VertexAI credentials.
anyOf
string
private_key_id
object
required
The private key ID for the VertexAI credentials.
anyOf
string
private_key
object
required
The private key for the VertexAI credentials.
anyOf
string
Default value: VertexTextEmbedding
Possible values: [BEDROCK_EMBEDDING]
Default value: BEDROCK_EMBEDDING
Type of the embedding model.
component
object
Configuration for the Bedrock embedding model.
Default value: amazon.titan-embed-text-v1
The modelId of the Bedrock model to use.
Possible values: > 0 and <= 2048
Default value: 10
The batch size for embedding calls.
num_workers
object
The number of workers to use for async embedding calls.
anyOf
integer
profile_name
object
The name of the AWS profile to use. If not given, the default profile is used.
anyOf
string
aws_access_key_id
object
AWS Access Key ID to use
anyOf
string
aws_secret_access_key
object
AWS Secret Access Key to use
anyOf
string
aws_session_token
object
AWS Session Token to use
anyOf
string
region_name
object
AWS region name to use. If not passed, the region configured in the AWS CLI is used.
anyOf
string
Possible values: > 0
Default value: 10
The maximum number of API retries.
Default value: 60
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
Additional kwargs for the Bedrock client.
Default value: BedrockEmbedding
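`embedding_config` is a oneOf whose variants are distinguished by the embedding type value. The lookup table below transcribes the documented type values, default component class names, and default models from this schema. The `EmbeddingDefaults` and `defaults_for` names are hypothetical, and the function takes the type string directly because the field that holds it is not named on this page.

```python
from typing import Dict, NamedTuple, Optional


class EmbeddingDefaults(NamedTuple):
    class_name: str        # documented default class name of the component
    model: Optional[str]   # documented default model; None where none is given


# Transcribed from the schema above. The HuggingFace variant has no default
# model name: if it is None, the configured task is used instead.
EMBEDDING_DEFAULTS: Dict[str, EmbeddingDefaults] = {
    "AZURE_EMBEDDING": EmbeddingDefaults("AzureOpenAIEmbedding", "text-embedding-ada-002"),
    "COHERE_EMBEDDING": EmbeddingDefaults("CohereEmbedding", "embed-english-v3.0"),
    "GEMINI_EMBEDDING": EmbeddingDefaults("GeminiEmbedding", "models/embedding-001"),
    "HUGGINGFACE_API_EMBEDDING": EmbeddingDefaults("HuggingFaceInferenceAPIEmbedding", None),
    "OPENAI_EMBEDDING": EmbeddingDefaults("OpenAIEmbedding", "text-embedding-ada-002"),
    "VERTEXAI_EMBEDDING": EmbeddingDefaults("VertexTextEmbedding", "textembedding-gecko@003"),
    "BEDROCK_EMBEDDING": EmbeddingDefaults("BedrockEmbedding", "amazon.titan-embed-text-v1"),
}


def defaults_for(embedding_type: str) -> EmbeddingDefaults:
    """Resolve the oneOf variant's defaults from its embedding type value."""
    try:
        return EMBEDDING_DEFAULTS[embedding_type]
    except KeyError:
        raise ValueError(f"unsupported embedding type: {embedding_type!r}") from None
```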
configured_transformations
object[]
Deprecated; do not use. List of configured transformations.
configurable_transformation_type
string
Possible values: [CHARACTER_SPLITTER, PAGE_SPLITTER_NODE_PARSER, CODE_NODE_PARSER, SENTENCE_AWARE_NODE_PARSER, TOKEN_AWARE_NODE_PARSER, MARKDOWN_NODE_PARSER, MARKDOWN_ELEMENT_NODE_PARSER]
The type of transformation (e.g. SIMPLE_NODE_PARSER). Can also be an instance of the llama_index.ingestion.transformations.ConfigurableTransformations enum, which is converted to ConfigurableTransformationNames.
component
object
required
Component that implements the transformation
anyOf
Default value: true
Whether or not to consider metadata when splitting.
Default value: true
Include prev/next node relationships.
id_func
object
Function to generate node IDs.
anyOf
string
Possible values: > 0
Default value: 1024
The token chunk size for each chunk.
Default value: 200
The token overlap of each chunk when splitting.
Default value: ` ` (a single space)
Default separator for splitting into words.
Default value: `\n\n\n`
Separator between paragraphs.
secondary_chunking_regex
object
Backup regex for splitting into sentences.
anyOf
string
Default value: SentenceSplitter
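Chunk size and chunk overlap interact as a sliding window: each new chunk starts `chunk_size - chunk_overlap` tokens after the previous one. The sketch shows only that arithmetic on a pre-tokenized list with the documented defaults (1024 and 200). It is a simplification, not SentenceSplitter's algorithm, which also tries to keep sentences and paragraphs intact; `window_chunks` is hypothetical.

```python
from typing import List, Sequence


def window_chunks(tokens: Sequence[str], chunk_size: int = 1024,
                  chunk_overlap: int = 200) -> List[List[str]]:
    """Split tokens into fixed-size windows; consecutive windows share
    `chunk_overlap` tokens. No sentence or paragraph awareness."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be > 0")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

With the defaults, a 1,025-token input yields two chunks: tokens 0 to 1023, then tokens 824 to 1024.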
Default value: true
Whether or not to consider metadata when splitting.
Default value: true
Include prev/next node relationships.
id_func
object
Function to generate node IDs.
anyOf
string
page_separator
object
Separator to split text into pages.
anyOf
string
Default value: base_component
Default value: true
Whether or not to consider metadata when splitting.
Default value: true
Include prev/next node relationships.
id_func
object
Function to generate node IDs.
anyOf
string
The programming language of the code being split.
Possible values: > 0
Default value: 40
The number of lines to include in each chunk.
Possible values: > 0
Default value: 15
How many lines of code each chunk overlaps with.
Possible values: > 0
Default value: 1500
Maximum number of characters per chunk.
Default value: CodeSplitter
Default value: true
Whether or not to consider metadata when splitting.
Default value: true
Include prev/next node relationships.
id_func
object
Function to generate node IDs.
anyOf
string
Possible values: > 0
Default value: 1024
The token chunk size for each chunk.
Default value: 200
The token overlap of each chunk when splitting.
Default value: ` ` (a single space)
Default separator for splitting into words.
Default value: `\n\n\n`
Separator between paragraphs.
secondary_chunking_regex
object
Backup regex for splitting into sentences.
anyOf
string
Default value: SentenceSplitter
Default value: true
Whether or not to consider metadata when splitting.
Default value: true
Include prev/next node relationships.
id_func
object
Function to generate node IDs.
anyOf
string
Possible values: > 0
Default value: 1024
The token chunk size for each chunk.
Default value: 20
The token overlap of each chunk when splitting.
Default value: ` ` (a single space)
Default separator for splitting into words.
Additional separators for splitting.
Default value: TokenTextSplitter
Default value: true
Whether or not to consider metadata when splitting.
Default value: true
Include prev/next node relationships.
id_func
object
Function to generate node IDs.
anyOf
string
Default value: MarkdownNodeParser
Default value: true
Whether or not to consider metadata when splitting.
Default value: true
Include prev/next node relationships.
id_func
object
Function to generate node IDs.
anyOf
string
llm
object
LLM model to use for summarization.
anyOf
system_prompt
object
System prompt for LLM calls.
anyOf
string
Function to convert a list of messages to an LLM prompt.
Function to convert a completion to an LLM prompt.
output_parser
object
Output parser to parse, validate, and correct errors programmatically.
anyOf
Possible values: [default, openai, llm, function, guidance, lm-format-enforcer]
Default value: default
Pydantic program mode.
query_wrapper_prompt
object
Query wrapper prompt for LLM calls.
anyOf
kwargs
object
required
output_parser
object
required
anyOf
template_var_mappings
object
Template variable mappings (Optional).
anyOf
function_mappings
object
Function mappings (Optional). This is a mapping from template variable names to functions that take in the current kwargs and return a string.
anyOf
Default value: What is this table about? Give a very concise summary (imagine you are adding a new caption and summary for this table), and output the real/existing table title/caption if context provided.and output the real/existing table id if context provided.and also output whether or not the table should be kept.
Query string to use for summarization.
Default value: 4
Number of workers for async jobs.
Default value: true
Whether to show progress.
nested_node_parser
object
Other types of node parsers to handle some types of nodes.
anyOf
Default value: true
Whether or not to consider metadata when splitting.
Default value: true
Include prev/next node relationships.
id_func
object
Function to generate node IDs.
anyOf
string
Default value: base_component
Default value: MarkdownElementNodeParser
config_hash
object
Hashes for the configuration of the pipeline.
anyOf
embedding_config_hash
object
Hash of the embedding config.
anyOf
string
parsing_config_hash
object
Hash of the llama parse parameters.
anyOf
string
transform_config_hash
object
Hash of the transform config.
anyOf
string
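The three hashes let a client tell whether the embedding, parsing, or transform configuration changed between runs. This reference does not say how LlamaCloud computes them, so the sketch assumes one common scheme, SHA-256 over canonical (key-sorted) JSON, purely to illustrate the idea; `config_fingerprint` is hypothetical and its output will not match the server's hashes.

```python
import hashlib
import json
from typing import Any, Mapping


def config_fingerprint(config: Mapping[str, Any]) -> str:
    """Hash a JSON-serializable config so that logically equal configs match."""
    # sort_keys and fixed separators make the text independent of key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```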
transform_config
object
Configuration for the transformation.
anyOf
Possible values: [auto]
Default value: auto
Possible values: > 0
Default value: 1024
Chunk size for the transformation.
Default value: 200
Chunk overlap for the transformation.
Possible values: [advanced]
Default value: advanced
segmentation_config
object
Configuration for the segmentation.
anyOf
Possible values: [none]
Default value: none
Possible values: [page]
Default value: page
Default value: `
`
Possible values: [element]
Default value: element
chunking_config
object
Configuration for the chunking.
anyOf
Possible values: [none]
Default value: none
Possible values: > 0
Default value: 1024
Default value: 200
Possible values: [character]
Default value: character
Possible values: > 0
Default value: 1024
Default value: 200
Possible values: [token]
Default value: token
Default value:
Possible values: > 0
Default value: 1024
Default value: 200
Possible values: [sentence]
Default value: sentence
Default value:
Default value: `
`
Possible values: [semantic]
Default value: semantic
Default value: 1
Default value: 95
preset_retrieval_parameters
object
Preset retrieval parameters for the pipeline.
dense_similarity_top_k
object
Number of nodes for dense retrieval.
anyOf
integer
Possible values: >= 1 and <= 100
sparse_similarity_top_k
object
Number of nodes for sparse retrieval.
anyOf
integer
Possible values: >= 1 and <= 100
enable_reranking
object
Enable reranking for retrieval
anyOf
boolean
rerank_top_n
object
Number of reranked nodes to return.
anyOf
integer
Possible values: >= 1 and <= 100
alpha
object
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
anyOf
number
Possible values: <= 1
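The schema fixes only the endpoints of `alpha` (0 is sparse retrieval, 1 is dense retrieval). A linear blend of the two scores is one common way to realize such a weight; whether the service normalizes scores first or blends them differently is not stated, so treat this as an assumption. The sketch also rejects negative values, which the schema does not explicitly forbid; `hybrid_score` is hypothetical.

```python
def hybrid_score(dense: float, sparse: float, alpha: float) -> float:
    """Blend dense and sparse relevance scores linearly.

    alpha = 1.0 gives pure dense retrieval; alpha = 0.0 gives pure sparse.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense + (1.0 - alpha) * sparse
```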
search_filters
object
Search filters for retrieval.
anyOf
filters
object[]
required
anyOf
value
object
required
anyOf
integer
number
string
string[]
number[]
integer[]
Possible values: [==, >, <, !=, >=, <=, in, nin, any, all, text_match, contains, is_empty]
Default value: ==
Vector store filter operator.
condition
object
anyOf
string
Possible values: [and, or]
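To show how `filters`, `operator`, and `condition` combine, the sketch below evaluates a list of filters against a local metadata dictionary for a subset of the documented operators. The per-operator semantics are assumptions (server-side behavior may differ), and the `key` field each filter uses is hypothetical because the filter's key field is not named on this page. An unsupported operator raises `KeyError`.

```python
import operator
from typing import Any, Callable, Dict, List, Mapping

# Assumed local semantics for a subset of the documented operators.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "in": lambda field, value: field in value,        # field is one of the listed values
    "nin": lambda field, value: field not in value,   # field is none of the listed values
    "contains": lambda field, value: value in field,  # list/string field contains value
}


def matches(metadata: Mapping[str, Any], filters: List[Mapping[str, Any]],
            condition: str = "and") -> bool:
    """Return True if `metadata` satisfies `filters` combined with `condition`."""
    results = []
    for f in filters:
        op = _OPS[f.get("operator", "==")]  # "==" is the documented default operator
        key = f["key"]  # hypothetical field name for the metadata key
        # A filter on a key that is absent from the metadata never matches.
        results.append(key in metadata and op(metadata[key], f["value"]))
    if condition == "and":
        return all(results)
    if condition == "or":
        return any(results)
    raise ValueError(f"unknown condition: {condition!r}")
```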
files_top_k
object
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
anyOf
integer
Possible values: >= 1 and <= 5
retrieval_mode
string
Possible values: [chunks, files_via_metadata, files_via_content]
Default value: chunks
The retrieval mode for the query.
retrieve_image_nodes
boolean
Whether to retrieve image nodes.
class_name
string
Default value: base_component
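The documented ranges can be checked client-side before saving retrieval presets. This hypothetical `check_retrieval_params` helper uses the field names from the example response and the ranges stated above; flagging `files_top_k` in `chunks` mode is a design choice based on the note that it applies only to the `files_via_*` modes.

```python
from typing import Any, List, Mapping

RETRIEVAL_MODES = ("chunks", "files_via_metadata", "files_via_content")
TOP_K_FIELDS = ("dense_similarity_top_k", "sparse_similarity_top_k", "rerank_top_n")


def check_retrieval_params(params: Mapping[str, Any]) -> List[str]:
    """Return a list of problems with `params` according to the documented ranges."""
    problems = []
    mode = params.get("retrieval_mode", "chunks")  # documented default
    if mode not in RETRIEVAL_MODES:
        problems.append(f"retrieval_mode must be one of {RETRIEVAL_MODES}")
    for field in TOP_K_FIELDS:
        value = params.get(field)  # all three are optional (anyOf integer/null)
        if value is not None and not 1 <= value <= 100:
            problems.append(f"{field} must be between 1 and 100")
    files_top_k = params.get("files_top_k")
    if files_top_k is not None:
        if not 1 <= files_top_k <= 5:
            problems.append("files_top_k must be between 1 and 5")
        if mode == "chunks":
            # Documented as applying only to the files_via_* retrieval modes.
            problems.append("files_top_k has no effect when retrieval_mode is 'chunks'")
    return problems
```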
eval_parameters
object
Eval parameters for the pipeline.
llm_model
string
Possible values: [GPT_3_5_TURBO, GPT_4, GPT_4_TURBO, GPT_4O, GPT_4O_MINI, AZURE_OPENAI]
Default value: GPT_4O
The LLM model to use within eval execution.
qa_prompt_tmpl
string
Default value: `Context information is below.\n---------------------\n{context_str}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query_str}\nAnswer: `
The template to use for the question answering prompt.
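The default question-answering template has two placeholders, `{context_str}` and `{query_str}`, which Python's `str.format` fills directly. The template text is copied from the default value above; joining retrieved chunks with blank lines is an assumption made for illustration, and `render_qa_prompt` is a hypothetical helper.

```python
from typing import List

# Default qa_prompt_tmpl, copied from the eval_parameters default value.
QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def render_qa_prompt(context_chunks: List[str], query: str) -> str:
    """Fill the template's two placeholders with joined context and the query."""
    context_str = "\n\n".join(context_chunks)  # assumption: blank-line separated chunks
    return QA_PROMPT_TMPL.format(context_str=context_str, query_str=query)
```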
llama_parse_parameters
object
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
anyOf
Possible values: [af, az, bs, cs, cy, da, de, en, es, et, fr, ga, hr, hu, id, is, it, ku, la, lt, lv, mi, ms, mt, nl, no, oc, pi, pl, pt, ro, rs_latin, sk, sl, sq, sv, sw, tl, tr, uz, vi, ar, fa, ug, ur, bn, as, mni, ru, rs_cyrillic, be, bg, uk, mn, abq, ady, kbd, ava, dar, inh, che, lbe, lez, tab, tjk, hi, mr, ne, bh, mai, ang, bho, mah, sck, new, gom, sa, bgc, th, ch_sim, ch_tra, ja, ko, ta, te, kn], >= 1
Default value: true
page_separator
object
anyOf
string
azure_openai_deployment_name
object
anyOf
string
azure_openai_endpoint
object
anyOf
string
azure_openai_api_version
object
anyOf
string
azure_openai_key
object
anyOf
string
data_sink
object
The data sink for the pipeline. If None, the pipeline will use the fully managed data sink.
anyOf
Unique identifier
created_at
object
Creation datetime
anyOf
string
updated_at
object
Update datetime
anyOf
string
The name of the data sink.
Possible values: [PINECONE, POSTGRES, QDRANT, AZUREAI_SEARCH, MONGODB_ATLAS, MILVUS]
component
object
required
anyOf
Possible values: [true]
Default value: true
namespace
object
anyOf
string
insert_kwargs
object
anyOf
Default value: CloudPineconeVectorStore
Possible values: [false]
hybrid_search
object
anyOf
boolean
Default value: CloudPostgresVectorStore
Possible values: [true]
Default value: true
Default value: 3
Default value: CloudQdrantVectorStore
Possible values: [true]
Default value: true
search_service_api_version
object
anyOf
string
index_name
object
anyOf
string
filterable_metadata_field_keys
object
anyOf
embedding_dimension
object
anyOf
integer
client_id
object
anyOf
string
client_secret
object
anyOf
string
tenant_id
object
anyOf
string
Default value: CloudAzureAISearchVectorStore
Possible values: [false]
vector_index_name
object
anyOf
string
fulltext_index_name
object
anyOf
string
Default value: CloudMongoDBAtlasVectorSearch
Possible values: [false]
collection_name
object
anyOf
string
token
object
anyOf
string
embedding_dimension
object
anyOf
integer
Default value: CloudMilvusVectorStore
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"created_at": "2024-10-03T01:48:33.777Z",
"updated_at": "2024-10-03T01:48:33.777Z",
"name": "string",
"project_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"pipeline_type": "MANAGED",
"managed_pipeline_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"embedding_config": {},
"configured_transformations": [
{
"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"configurable_transformation_type": "CHARACTER_SPLITTER",
"component": {}
}
],
"config_hash": {},
"transform_config": {},
"preset_retrieval_parameters": {
"dense_similarity_top_k": 0,
"sparse_similarity_top_k": 0,
"enable_reranking": true,
"rerank_top_n": 0,
"alpha": 0,
"search_filters": {},
"files_top_k": 0,
"retrieval_mode": "chunks",
"retrieve_image_nodes": false,
"class_name": "base_component"
},
"eval_parameters": {
"llm_model": "GPT_4O",
"qa_prompt_tmpl": "Context information is below.\n---------------------\n{context_str}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query_str}\nAnswer: "
},
"llama_parse_parameters": {},
"data_sink": {}
}
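A minimal sketch of reading the 200 body with the standard library, pulling a few fields and applying the documented defaults when a value is absent. `summarize_pipeline` is hypothetical, and treating a missing or null `data_sink` as the fully managed sink follows the schema's description of `data_sink`.

```python
import json
from typing import Any, Dict


def summarize_pipeline(body: str) -> Dict[str, Any]:
    """Extract a few fields from a 200 response body (the pipeline object)."""
    pipeline = json.loads(body)
    retrieval = pipeline.get("preset_retrieval_parameters") or {}
    return {
        "id": pipeline["id"],
        "type": pipeline.get("pipeline_type", "MANAGED"),  # documented default
        "retrieval_mode": retrieval.get("retrieval_mode", "chunks"),  # documented default
        # Per the schema, a None data_sink means the fully managed data sink.
        "managed_sink": pipeline.get("data_sink") is None,
    }
```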
422 Validation Error
Content type: application/json
Schema
detail
object[]
loc
object[]
required
anyOf
string
integer
msg
string
required
type
string
required
{
"detail": [
{
"loc": [
"string",
0
],
"msg": "string",
"type": "string"
}
]
}
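Each `loc` entry mixes strings (field names) and integers (list indices), so joining it into a dotted path gives a readable location for every error. The sketch below formats each entry of `detail` as one line; `format_validation_errors` is a hypothetical helper.

```python
import json
from typing import List


def format_validation_errors(body: str) -> List[str]:
    """Turn a 422 body into readable 'location: message (type)' lines."""
    payload = json.loads(body)
    lines = []
    for error in payload.get("detail", []):
        # loc mixes strings (field names) and integers (list indices).
        location = ".".join(str(part) for part in error.get("loc", []))
        lines.append(f"{location}: {error.get('msg', '')} ({error.get('type', '')})")
    return lines
```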