
Sync Pipeline

POST /api/v1/pipelines/:pipeline_id/sync

Runs ingestion for the pipeline, incrementally updating the data sink with upstream changes from its data sources and files.
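A minimal standard-library sketch of calling this endpoint. It assumes the LlamaCloud base URL `https://api.cloud.llamaindex.ai` and bearer-token authentication with an API key; neither is stated on this page (which documents only a `session` cookie), and the helper names `build_sync_request` and `sync_pipeline` are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumed base URL; this page does not state it.
DEFAULT_BASE_URL = "https://api.cloud.llamaindex.ai"


def build_sync_request(pipeline_id: str, api_key: str,
                       base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the sync endpoint."""
    # pipeline_id must be a UUID; uuid.UUID raises ValueError otherwise.
    pid = str(uuid.UUID(pipeline_id))
    url = f"{base_url.rstrip('/')}/api/v1/pipelines/{pid}/sync"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )


def sync_pipeline(pipeline_id: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the pipeline object returned on success."""
    req = build_sync_request(pipeline_id, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Validating `pipeline_id` locally with `uuid.UUID` catches malformed IDs before a network round trip.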

Request

Path Parameters

    pipeline_id uuid required

Cookie Parameters

    session any

Responses

Successful Response

Schema

    id uuid required

    Unique identifier

    created_at (string, optional)

    Creation datetime

    updated_at (string, optional)

    Update datetime

    name Name (string) required

    project_id uuid required

    embedding_model_config_id (string, optional)

    The ID of the EmbeddingModelConfig this pipeline is using.

    pipeline_type PipelineType (string)

    Enum for representing the type of a pipeline

    Possible values: [PLAYGROUND, MANAGED]

    Default value: MANAGED

    managed_pipeline_id (string, optional)

    The ID of the ManagedPipeline this playground pipeline is linked to.

    embedding_config

    object

    required

    oneOf

    type Type (string)

    Type of the embedding model.

    Possible values: [AZURE_EMBEDDING]

    Default value: AZURE_EMBEDDING

    component

    object

    Configuration for the Azure OpenAI embedding model.

    model_name Model Name (string)

    The name of the OpenAI embedding model.

    Default value: text-embedding-ada-002
    embed_batch_size Embed Batch Size (integer)

    The batch size for embedding calls.

    Possible values: > 0 and <= 2048

    Default value: 10

    num_workers (integer, optional)

    The number of workers to use for async embedding calls.

    additional_kwargs object

    Additional kwargs for the OpenAI API.

    api_key (string, optional)

    The OpenAI API key.

    api_base Api Base (string)

    The base URL for the Azure deployment.

    Default value: ""
    api_version Api Version (string)

    The Azure OpenAI API version.

    Default value: ""
    max_retries Max Retries (integer)

    Maximum number of retries.

    Default value: 10
    timeout Timeout (number)

    Timeout for each request.

    Default value: 60

    default_headers (object, optional)

    The default headers for API requests, as a map from header name to string value.

    reuse_client Reuse Client (boolean)

    Reuse the OpenAI client between requests. When making large volumes of async API calls, setting this to false can improve stability.

    Default value: true

    dimensions (integer, optional)

    The number of dimensions of the output embedding vectors. Works only with v3 embedding models.

    azure_endpoint (string, optional)

    The Azure endpoint to use.

    azure_deployment (string, optional)

    The Azure deployment to use.

    class_name Class Name (string)
    Default value: AzureOpenAIEmbedding
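The sketch below builds an `embedding_config` payload for the `AZURE_EMBEDDING` variant using the defaults listed above, and shows texts grouped into batches of at most `embed_batch_size` items, one embedding call per batch. The helper names and example endpoint and deployment values are hypothetical, and how the service itself schedules batches is not specified on this page.

```python
from typing import Iterator, List, Optional


def azure_embedding_config(azure_endpoint: str, azure_deployment: str,
                           api_key: Optional[str] = None,
                           embed_batch_size: int = 10,
                           model_name: str = "text-embedding-ada-002") -> dict:
    """Hypothetical helper: build an AZURE_EMBEDDING embedding_config dict."""
    # Enforce the documented range for embed_batch_size: > 0 and <= 2048.
    if not 0 < embed_batch_size <= 2048:
        raise ValueError("embed_batch_size must be > 0 and <= 2048")
    return {
        "type": "AZURE_EMBEDDING",
        "component": {
            "model_name": model_name,
            "embed_batch_size": embed_batch_size,
            "api_key": api_key,
            "azure_endpoint": azure_endpoint,
            "azure_deployment": azure_deployment,
            "max_retries": 10,   # documented default
            "timeout": 60,       # documented default
            "reuse_client": True,
            "class_name": "AzureOpenAIEmbedding",
        },
    }


def batches(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]
```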

    configured_transformations

    object[]

    Deprecated; do not use. List of configured transformations.

    Array [

    id uuid
    configurable_transformation_type ConfigurableTransformationNames (string) required

    Name for the type of transformation (for example, SENTENCE_AWARE_NODE_PARSER). Can also be an enum instance of llama_index.ingestion.transformations.ConfigurableTransformations, which is converted to ConfigurableTransformationNames.

    Possible values: [CHARACTER_SPLITTER, PAGE_SPLITTER_NODE_PARSER, CODE_NODE_PARSER, SENTENCE_AWARE_NODE_PARSER, TOKEN_AWARE_NODE_PARSER, MARKDOWN_NODE_PARSER, MARKDOWN_ELEMENT_NODE_PARSER]

    component

    object

    required

    Component that implements the transformation

    anyOf

    object

    ]

    config_hash

    object

    Hashes for the configuration of the pipeline.

    embedding_config_hash (string, optional)

    Hash of the embedding config.

    parsing_config_hash (string, optional)

    Hash of the LlamaParse parameters.

    transform_config_hash (string, optional)

    Hash of the transform config.
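Because `config_hash` holds one hash per configuration area, a client could compare two snapshots of the pipeline object (for example, responses from two syncs) to see which area changed. This is a hypothetical client-side helper, not an API feature; it treats a missing or null hash as `None`.

```python
from typing import Dict, List, Optional

HASH_KEYS = ("embedding_config_hash", "parsing_config_hash", "transform_config_hash")


def changed_config_areas(old: Optional[Dict[str, Optional[str]]],
                         new: Optional[Dict[str, Optional[str]]]) -> List[str]:
    """Return the hash fields whose values differ between two config_hash objects."""
    # config_hash itself may be null, so fall back to empty dicts.
    old = old or {}
    new = new or {}
    return [key for key in HASH_KEYS if old.get(key) != new.get(key)]
```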

    transform_config

    object

    Configuration for the transformation.

    anyOf

    mode Mode (string)

    Possible values: [auto]

    Default value: auto
    chunk_size Chunk Size (integer)

    Chunk size for the transformation.

    Possible values: > 0

    Default value: 1024
    chunk_overlap Chunk Overlap (integer)

    Chunk overlap for the transformation.

    Default value: 200
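To illustrate what `chunk_size` and `chunk_overlap` mean, this sketch splits a token list into windows of `chunk_size` tokens in which consecutive windows share `chunk_overlap` tokens. It is a simplified fixed-window splitter under that assumption; the service's `auto` mode may respect sentence or token boundaries differently, and the whitespace tokenization in the test is only for demonstration.

```python
from typing import List


def split_with_overlap(tokens: List[str], chunk_size: int = 1024,
                       chunk_overlap: int = 200) -> List[List[str]]:
    """Split tokens into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be > 0")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

With the defaults, each new chunk starts 824 tokens after the previous one.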

    preset_retrieval_parameters

    object

    Preset retrieval parameters for the pipeline.

    dense_similarity_top_k (integer, optional)

    Number of nodes for dense retrieval.

    Possible values: >= 1 and <= 100

    dense_similarity_cutoff (number, optional)

    Minimum similarity score with respect to the query for retrieval.

    Possible values: <= 1

    sparse_similarity_top_k (integer, optional)

    Number of nodes for sparse retrieval.

    Possible values: >= 1 and <= 100

    enable_reranking (boolean, optional)

    Enable reranking for retrieval.

    rerank_top_n (integer, optional)

    Number of reranked nodes to return.

    Possible values: >= 1 and <= 100

    alpha (number, optional)

    Alpha value for hybrid retrieval, which sets the weights between dense and sparse retrieval: 0 is pure sparse retrieval and 1 is pure dense retrieval.

    Possible values: <= 1
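One common reading of `alpha` is as the weight in a convex combination of normalized dense and sparse scores, so `alpha = 1` uses only dense scores and `alpha = 0` only sparse scores, matching the endpoints described above. The exact fusion formula LlamaCloud uses is not documented here; this is an assumed illustration with a hypothetical `hybrid_scores` helper.

```python
from typing import Dict


def hybrid_scores(dense: Dict[str, float], sparse: Dict[str, float],
                  alpha: float) -> Dict[str, float]:
    """Blend per-node scores as alpha * dense + (1 - alpha) * sparse (assumed formula).

    Nodes missing from one retriever contribute 0 for that side. Scores are
    assumed to be normalized to a comparable range already.
    """
    # Assumes alpha lies in [0, 1], per the description of its endpoints.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    node_ids = set(dense) | set(sparse)
    return {
        node_id: alpha * dense.get(node_id, 0.0) + (1 - alpha) * sparse.get(node_id, 0.0)
        for node_id in node_ids
    }
```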

    search_filters

    object

    Search filters for retrieval.

    anyOf

    Metadata filters for vector stores.

    filters

    object[]

    required

    Array [

    anyOf

    Comprehensive metadata filter for vector stores that supports more operators.

    The value uses Strict* types because int, float, and str are mutually compatible types that were previously all converted to strings.

    See: https://docs.pydantic.dev/latest/usage/types/#strict-types

    key Key (string) required

    value

    object

    required

    anyOf

    integer

    operator FilterOperator (string)

    Vector store filter operator.

    Possible values: [==, >, <, !=, >=, <=, in, nin, any, all, text_match, contains, is_empty]

    Default value: ==
    ]

    condition (string, optional)

    Vector store filter condition used to combine different filters.

    Possible values: [and, or]
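A `search_filters` value pairs a list of filters with an `and`/`or` condition. The sketch builds such a payload and adds a small local evaluator for the comparison and membership operators to show their intended semantics. The helper names are hypothetical, the local semantics are assumptions, and operators whose exact matching rules are not documented here (`any`, `all`, `text_match`, `contains`, `is_empty`) are accepted in payloads but not evaluated locally.

```python
import operator
from typing import Any, Dict, List

OPERATORS = {"==", ">", "<", "!=", ">=", "<=", "in", "nin", "any", "all",
             "text_match", "contains", "is_empty"}

# Assumed local semantics for a subset of the operators.
_LOCAL = {
    "==": operator.eq, ">": operator.gt, "<": operator.lt,
    "!=": operator.ne, ">=": operator.ge, "<=": operator.le,
    "in": lambda actual, expected: actual in expected,
    "nin": lambda actual, expected: actual not in expected,
}


def metadata_filter(key: str, value: Any, op: str = "==") -> Dict[str, Any]:
    """Build one filter entry; the operator defaults to '==' as documented."""
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    return {"key": key, "value": value, "operator": op}


def search_filters(filters: List[Dict[str, Any]], condition: str = "and") -> Dict[str, Any]:
    """Combine filters with 'and' or 'or'."""
    if condition not in ("and", "or"):
        raise ValueError("condition must be 'and' or 'or'")
    return {"filters": filters, "condition": condition}


def matches(metadata: Dict[str, Any], spec: Dict[str, Any]) -> bool:
    """Evaluate a search_filters payload against one node's metadata locally."""
    results = []
    for f in spec["filters"]:
        if f["operator"] not in _LOCAL:
            raise NotImplementedError(f"operator {f['operator']} is not modeled locally")
        if f["key"] not in metadata:
            results.append(False)  # a missing key never matches in this sketch
            continue
        results.append(_LOCAL[f["operator"]](metadata[f["key"]], f["value"]))
    # condition is nullable; treat null as 'and' (assumption).
    combine = all if (spec.get("condition") or "and") == "and" else any
    return combine(results)
```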

    files_top_k (integer, optional)

    Number of files to retrieve (only for the files_via_metadata and files_via_content retrieval modes).

    Possible values: >= 1 and <= 5

    retrieval_mode RetrievalMode (string)

    The retrieval mode for the query.

    Possible values: [chunks, files_via_metadata, files_via_content, auto_routed]

    Default value: chunks
    retrieve_image_nodes Retrieve Image Nodes (boolean)

    Whether to retrieve image nodes.

    Default value: false
    class_name Class Name (string)
    Default value: base_component
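The ranges above can be checked client-side before sending a `preset_retrieval_parameters` object. A minimal sketch, assuming unset fields are simply omitted or null (they are optional); `validate_retrieval_params` is a hypothetical name, and it encodes only the bounds and mode restrictions stated in this schema.

```python
from typing import Any, Dict, List

RETRIEVAL_MODES = {"chunks", "files_via_metadata", "files_via_content", "auto_routed"}

# field -> (lower bound or None, upper bound), from the ranges listed above.
_RANGES = {
    "dense_similarity_top_k": (1, 100),
    "sparse_similarity_top_k": (1, 100),
    "rerank_top_n": (1, 100),
    "files_top_k": (1, 5),
    "dense_similarity_cutoff": (None, 1),
    "alpha": (None, 1),
}


def validate_retrieval_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for field, (low, high) in _RANGES.items():
        value = params.get(field)
        if value is None:
            continue  # optional field left unset
        if (low is not None and value < low) or value > high:
            problems.append(f"{field}={value} is outside the documented range")
    mode = params.get("retrieval_mode", "chunks")  # documented default
    if mode not in RETRIEVAL_MODES:
        problems.append(f"unknown retrieval_mode: {mode}")
    if params.get("files_top_k") is not None and mode not in ("files_via_metadata", "files_via_content"):
        problems.append("files_top_k only applies to files_via_metadata and files_via_content")
    return problems
```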

    eval_parameters

    object

    Eval parameters for the pipeline.

    llm_model SupportedLLMModelNames (string)

    The LLM model to use within eval execution.

    Possible values: [GPT_3_5_TURBO, GPT_4, GPT_4_TURBO, GPT_4O, GPT_4O_MINI, AZURE_OPENAI]

    Default value: GPT_4O
    qa_prompt_tmpl Qa Prompt Tmpl (string)

    The template to use for the question answering prompt.

    Default value: Context information is below. --------------------- {context_str} --------------------- Given the context information and not prior knowledge, answer the query. Query: {query_str} Answer:
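The default `qa_prompt_tmpl` uses `{context_str}` and `{query_str}` placeholders, which Python's `str.format` fills directly. A sketch of rendering it; the line breaks in the template are assumed (the page shows it flattened onto one line), joining chunks with blank lines is an assumption, and `render_qa_prompt` is a hypothetical helper.

```python
from typing import List

DEFAULT_QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def render_qa_prompt(context_chunks: List[str], query: str,
                     template: str = DEFAULT_QA_PROMPT_TMPL) -> str:
    """Join retrieved chunks into context_str (assumed separator) and fill both placeholders."""
    context_str = "\n\n".join(context_chunks)
    return template.format(context_str=context_str, query_str=query)
```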

    llama_parse_parameters

    object

    Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.

    languages ParserLanguages (string)[]

    Possible values: [af, az, bs, cs, cy, da, de, en, es, et, fr, ga, hr, hu, id, is, it, ku, la, lt, lv, mi, ms, mt, nl, no, oc, pi, pl, pt, ro, rs_latin, sk, sl, sq, sv, sw, tl, tr, uz, vi, ar, fa, ug, ur, bn, as, mni, ru, rs_cyrillic, be, bg, uk, mn, abq, ady, kbd, ava, dar, inh, che, lbe, lez, tab, tjk, hi, mr, ne, bh, mai, ang, bho, mah, sck, new, gom, sa, bgc, th, ch_sim, ch_tra, ja, ko, ta, te, kn], >= 1

    parsing_instruction Parsing Instruction (string)
    Default value: ""
    disable_ocr Disable Ocr (boolean)
    Default value: false
    annotate_links Annotate Links (boolean)
    Default value: false
    disable_reconstruction Disable Reconstruction (boolean)
    Default value: false
    disable_image_extraction Disable Image Extraction (boolean)
    Default value: false
    invalidate_cache Invalidate Cache (boolean)
    Default value: false
    do_not_cache Do Not Cache (boolean)
    Default value: false
    fast_mode Fast Mode (boolean)
    Default value: false
    skip_diagonal_text Skip Diagonal Text (boolean)
    Default value: false
    gpt4o_mode Gpt4O Mode (boolean)
    Default value: false
    gpt4o_api_key Gpt4O Api Key (string)
    Default value: ""
    do_not_unroll_columns Do Not Unroll Columns (boolean)
    Default value: false
    guess_xlsx_sheet_name Guess Xlsx Sheet Name (boolean)
    Default value: false

    page_separator (string, optional)

    bounding_box Bounding Box (string)
    Default value: ""
    target_pages Target Pages (string)
    Default value: ""
    use_vendor_multimodal_model Use Vendor Multimodal Model (boolean)
    Default value: false
    vendor_multimodal_model_name Vendor Multimodal Model Name (string)
    Default value: ""
    vendor_multimodal_api_key Vendor Multimodal Api Key (string)
    Default value: ""
    page_prefix Page Prefix (string)
    Default value: ""
    page_suffix Page Suffix (string)
    Default value: ""
    webhook_url Webhook Url (string)
    Default value: ""
    take_screenshot Take Screenshot (boolean)
    Default value: false
    is_formatting_instruction Is Formatting Instruction (boolean)
    Default value: true
    premium_mode Premium Mode (boolean)
    Default value: false
    continuous_mode Continuous Mode (boolean)
    Default value: false
    s3_input_path S3 Input Path (string)
    Default value: ""
    s3_output_path_prefix S3 Output Path Prefix (string)
    Default value: ""

    project_id (string, optional)

    azure_openai_deployment_name (string, optional)

    azure_openai_endpoint (string, optional)

    azure_openai_api_version (string, optional)

    azure_openai_key (string, optional)

    input_url (string, optional)

    http_proxy (string, optional)

    auto_mode Auto Mode (boolean)
    Default value: false

    auto_mode_trigger_on_regexp_in_page (string, optional)

    auto_mode_trigger_on_text_in_page (string, optional)

    auto_mode_trigger_on_table_in_page Auto Mode Trigger On Table In Page (boolean)
    Default value: false
    auto_mode_trigger_on_image_in_page Auto Mode Trigger On Image In Page (boolean)
    Default value: false

    data_sink

    object

    The data sink for the pipeline. If null, the pipeline uses the fully managed data sink.

    anyOf

    Schema for a data sink.

    id uuid required

    Unique identifier

    created_at (string, optional)

    Creation datetime

    updated_at (string, optional)

    Update datetime

    name Name (string) required

    The name of the data sink.

    sink_type ConfigurableDataSinkNames (string) required

    Possible values: [PINECONE, POSTGRES, QDRANT, AZUREAI_SEARCH, MONGODB_ATLAS, MILVUS]

    component

    object

    required

    anyOf

    object

    project_id uuid required
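Per the `data_sink` description, a null value means the pipeline writes to the fully managed data sink. A small sketch that summarizes where a synced pipeline stores its data, given the decoded JSON response; `describe_sink` and the sample values in the test are hypothetical.

```python
from typing import Any, Dict

SINK_TYPES = {"PINECONE", "POSTGRES", "QDRANT", "AZUREAI_SEARCH", "MONGODB_ATLAS", "MILVUS"}


def describe_sink(pipeline: Dict[str, Any]) -> str:
    """Return 'managed' for the fully managed sink, else '<sink_type>:<name>'."""
    sink = pipeline.get("data_sink")
    if sink is None:
        return "managed"  # null data_sink: fully managed sink
    sink_type = sink["sink_type"]
    if sink_type not in SINK_TYPES:
        raise ValueError(f"unexpected sink_type: {sink_type}")
    return f"{sink_type}:{sink['name']}"
```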