LlamaParse Document Pipeline Triggers

This document provides detailed information about all available triggers in the LlamaParse document pipeline. These triggers can be used in the auto_mode_configuration_json to conditionally apply specific parsing configurations to pages that match certain criteria.

Content-Based Triggers

| Trigger | Description | Example |
| --- | --- | --- |
| text_in_page | Activates when a specific text string is found within the page's text or markdown content. | "text_in_page": "Executive Summary" |
| table_in_page | Activates when the page contains an HTML table element or markdown table syntax. | "table_in_page": true |
| image_in_page | Activates when the page contains images (excluding full-page screenshots). | "image_in_page": true |
| regexp_in_page | Activates when the page's markdown content matches a specified regular expression pattern. | "regexp_in_page": "\\d{4}-\\d{2}-\\d{2}" |

File-Based Triggers

| Trigger | Description | Example |
| --- | --- | --- |
| filename_regexp | Activates when the filename matches a specified regular expression pattern. | "filename_regexp": "invoice.*\\.pdf" |
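
The doubled backslashes in the regular-expression examples are JSON string escaping: once the JSON is decoded, each pattern contains single backslashes. The following is a minimal sketch using only the Python standard library. Whether LlamaParse searches for the pattern anywhere in the input or requires a full match is not stated in this document, so the `re.search` calls and the sample inputs are assumptions for illustration only.

```python
import json
import re

# Raw strings keep the backslashes literal on the Python side.
date_pattern = r"\d{4}-\d{2}-\d{2}"
invoice_pattern = r"invoice.*\.pdf"

# json.dumps doubles each backslash, producing the escaped form shown in the tables.
fragment = json.dumps(
    {"regexp_in_page": date_pattern, "filename_regexp": invoice_pattern}
)
print(fragment)

# Decoding the JSON recovers the original single-backslash patterns.
decoded = json.loads(fragment)

# Assumption: search semantics (match anywhere), chosen only for this demo.
date_found = re.search(decoded["regexp_in_page"], "Report issued 2024-03-15") is not None
file_found = re.search(decoded["filename_regexp"], "invoice_0042.pdf") is not None
```

Generating the JSON from raw strings this way avoids hand-counting backslashes when a pattern becomes more complex.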

Text Metrics Triggers

| Trigger | Description | Example |
| --- | --- | --- |
| page_longer_than_n_chars | Activates when the page's text or markdown content exceeds a specified character count. | "page_longer_than_n_chars": "1000" |
| page_shorter_than_n_chars | Activates when the page's text or markdown content is less than a specified character count. | "page_shorter_than_n_chars": "500" |
| page_contains_at_least_n_words | Activates when the page contains more than a specified number of valid words (2+ characters). | "page_contains_at_least_n_words": "200" |
| page_contains_at_most_n_words | Activates when the page contains fewer than a specified number of valid words (2+ characters). | "page_contains_at_most_n_words": "50" |
| page_contains_at_least_n_lines | Activates when the page has more than a specified number of non-empty lines. | "page_contains_at_least_n_lines": "20" |
| page_contains_at_most_n_lines | Activates when the page has fewer than a specified number of non-empty lines. | "page_contains_at_most_n_lines": "10" |

Element Count Triggers

| Trigger | Description | Example |
| --- | --- | --- |
| page_contains_at_least_n_images | Activates when the page contains more than a specified number of images. | "page_contains_at_least_n_images": "2" |
| page_contains_at_most_n_images | Activates when the page contains fewer than a specified number of images. | "page_contains_at_most_n_images": "1" |
| page_contains_at_least_n_tables | Activates when the page contains more than a specified number of tables. | "page_contains_at_least_n_tables": "1" |
| page_contains_at_most_n_tables | Activates when the page contains fewer than a specified number of tables. | "page_contains_at_most_n_tables": "3" |
| page_contains_at_least_n_links | Activates when the page contains more than a specified number of links. | "page_contains_at_least_n_links": "5" |
| page_contains_at_most_n_links | Activates when the page contains fewer than a specified number of links. | "page_contains_at_most_n_links": "10" |
| page_contains_at_least_n_charts | Activates when the page contains more than a specified number of charts. | "page_contains_at_least_n_charts": "1" |
| page_contains_at_most_n_charts | Activates when the page contains fewer than a specified number of charts. | "page_contains_at_most_n_charts": "2" |
| page_contains_at_least_n_layout_elements | Activates when the page contains more than a specified number of layout elements. | "page_contains_at_least_n_layout_elements": "10" |
| page_contains_at_most_n_layout_elements | Activates when the page contains fewer than a specified number of layout elements. | "page_contains_at_most_n_layout_elements": "5" |

Numeric Content Triggers

| Trigger | Description | Example |
| --- | --- | --- |
| page_contains_at_least_n_percent_numbers | Activates when more than a specified percentage of words in the page are numbers. Numbers with punctuation (like "1,000.50") are correctly identified. | "page_contains_at_least_n_percent_numbers": "30" |
| page_contains_at_most_n_percent_numbers | Activates when less than a specified percentage of words in the page are numbers. Numbers with punctuation are correctly identified. | "page_contains_at_most_n_percent_numbers": "10" |
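
To give a feel for what a "percentage of words that are numbers" could mean, here is a rough Python sketch. It is not LlamaParse's implementation: the tokenization (whitespace splitting), the `NUMBER_TOKEN` pattern, and the `percent_numbers` helper are all hypothetical and chosen only to show how a token such as "1,000.50" can be counted as one number.

```python
import re

# Hypothetical rule: optional sign, digits with optional thousands separators,
# optional decimal part, optional trailing percent sign.
NUMBER_TOKEN = re.compile(r"[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?%?")


def percent_numbers(text: str) -> float:
    """Return the percentage of whitespace-separated tokens that look numeric."""
    tokens = text.split()
    if not tokens:
        return 0.0
    # Strip surrounding punctuation and currency symbols before testing each token.
    numeric = sum(
        1 for tok in tokens if NUMBER_TOKEN.fullmatch(tok.strip("()$€£,;:"))
    )
    return 100.0 * numeric / len(tokens)


page = "Revenue rose to 1,000.50 in 2023 from 870.25 in 2022"
share = percent_numbers(page)

# Thresholds are given as strings in the configuration, so convert before comparing.
triggered = share > float("30")
```

In this sample, 4 of the 10 tokens are numeric, so a threshold of "30" would be exceeded under these assumptions.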

Layout-Based Triggers

| Trigger | Description | Example |
| --- | --- | --- |
| layout_element_in_page | Activates when the page contains a specific layout element type. | "layout_element_in_page": "table" |
| layout_element_in_page_confidence_threshold | Specifies the minimum confidence level for the layout_element_in_page trigger. | "layout_element_in_page_confidence_threshold": "0.8" |

Usage Example

Here's an example of how to use these triggers in an auto_mode_configuration_json:

[
  {
    "parsing_conf": {
      "user_prompt": "Extract all tabular data into a structured format"
    },
    "table_in_page": true
  },
  {
    "parsing_conf": {
      "user_prompt": "Summarize the executive summary section"
    },
    "text_in_page": "Executive Summary"
  },
  {
    "parsing_conf": {
      "user_prompt": "Extract financial figures from this numbers-heavy page"
    },
    "page_contains_at_least_n_percent_numbers": "25"
  }
]
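
One way to avoid JSON syntax errors such as trailing commas is to build the configuration as native data and serialize it. The sketch below uses only the Python standard library and assumes, as the name suggests, that auto_mode_configuration_json is supplied as a JSON string. How that string is handed to LlamaParse (client argument, API field) is outside the scope of this example, and the variable names are illustrative.

```python
import json

# Each entry pairs a parsing_conf with one or more triggers.
auto_mode_configuration = [
    {
        "parsing_conf": {
            "user_prompt": "Extract all tabular data into a structured format"
        },
        "table_in_page": True,
    },
    {
        "parsing_conf": {
            "user_prompt": "Summarize the executive summary section"
        },
        "text_in_page": "Executive Summary",
    },
    {
        "parsing_conf": {
            "user_prompt": "Extract financial figures from this numbers-heavy page"
        },
        # Numeric thresholds are passed as strings.
        "page_contains_at_least_n_percent_numbers": "25",
    },
]

# json.dumps emits valid JSON: Python True becomes true, no trailing commas.
auto_mode_configuration_json = json.dumps(auto_mode_configuration, indent=2)
print(auto_mode_configuration_json)
```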

Notes

  • Multiple triggers can be specified in a single configuration object. All specified conditions must be met for the parsing configuration to be applied.
  • Values for numeric thresholds should be provided as strings, as shown in the examples.
  • Regular expressions should use proper escaping as shown in the examples.
  • When a page matches multiple configurations, only the first matching configuration in the array will be applied.
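
The two matching rules in these notes (all triggers in an entry must hold, and the first matching entry wins) can be illustrated with a small simulation. This is a toy model, not the LlamaParse pipeline: the `Page` shape, the `EVALUATORS` table, and `select_parsing_conf` are hypothetical, and only three triggers are modeled.

```python
from typing import Any, Callable, Optional

# Hypothetical page summary; the real pipeline's representation is not documented here.
Page = dict[str, Any]

# Toy evaluators for three triggers (assumed semantics, for illustration only).
EVALUATORS: dict[str, Callable[[Page, Any], bool]] = {
    "table_in_page": lambda page, want: bool(want) and page["tables"] > 0,
    "text_in_page": lambda page, needle: needle in page["text"],
    "page_longer_than_n_chars": lambda page, n: len(page["text"]) > int(n),
}


def select_parsing_conf(
    page: Page, configs: list[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return the parsing_conf of the first entry whose triggers all match."""
    for entry in configs:
        triggers = {k: v for k, v in entry.items() if k != "parsing_conf"}
        # Assumption: an entry without triggers never matches in this sketch.
        if triggers and all(
            EVALUATORS[name](page, value) for name, value in triggers.items()
        ):
            return entry["parsing_conf"]
    return None


configs = [
    # Needs both a table and the phrase; the sample page lacks the phrase.
    {"parsing_conf": {"user_prompt": "A"}, "table_in_page": True,
     "text_in_page": "Executive Summary"},
    {"parsing_conf": {"user_prompt": "B"}, "table_in_page": True},
    # Would also match, but comes after the first match.
    {"parsing_conf": {"user_prompt": "C"}, "page_longer_than_n_chars": "10"},
]

page = {"text": "Quarterly results table follows", "tables": 1}
chosen = select_parsing_conf(page, configs)
```

Here the first entry fails because only one of its two triggers holds, so the second entry is selected even though the third would match as well. Ordering entries from most specific to most general follows from this behavior.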