Run Job With Parsed File Test

POST /api/v1/extractionv2/jobs/parsed/test

Request

Cookie Parameters

    session any
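
As a rough illustration of how the method, path, and optional session cookie above fit together, the following sketch builds (but does not send) a request with the Python standard library. The base URL, the `build_request` helper, and the JSON content type are assumptions made for this example; only the POST method, the path, and the `session` cookie name come from this page.

```python
import json
import urllib.request

# Path taken from this page; the base URL is supplied by the caller.
ENDPOINT_PATH = "/api/v1/extractionv2/jobs/parsed/test"


def build_request(base_url, body, session_cookie=None):
    """Build a POST request for the endpoint without sending it.

    `build_request` is a hypothetical helper, not part of any SDK.
    """
    url = base_url.rstrip("/") + ENDPOINT_PATH
    # Assumption: the body is sent as JSON.
    headers = {"Content-Type": "application/json"}
    if session_cookie is not None:
        # The page lists a single cookie parameter named `session`.
        headers["Cookie"] = f"session={session_cookie}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; the host to target depends on your deployment.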

Body

required

    job_create

    object

    required

    Schema for creating an extraction job.

    extraction_agent_id uuid required

    The id of the extraction agent

    file_id uuid required

    The id of the file

    data_schema_override

    object

    The data schema to override the extraction agent's data schema with

    anyOf

    property name*

    object

    anyOf

    object

    config_override

    object

    The config to override the extraction agent's config with

    anyOf

    Additional parameters for the extraction agent.

    extraction_mode ExtractMode (string)

    The extraction mode specified.

    Possible values: [PER_DOC, PER_PAGE]

    Default value: PER_DOC

    handle_missing Handle Missing (boolean)

    Whether to handle missing fields in the schema.

    Default value: false

    system_prompt

    object

    The system prompt to use for the extraction.

    anyOf

    string

    extract_settings

    object

    All settings for the extraction agent. Only the settings in ExtractConfig are exposed to the user.

    model Model (string)

    The model to use for the extraction.

    Default value: gpt-4o

    temperature Temperature (number)

    The temperature to use for the extraction.

    Default value: 0

    max_file_size Max File Size (integer)

    The maximum file size (in bytes) allowed for the document.

    Default value: 5242880

    max_num_pages Max Num Pages (integer)

    The maximum number of pages allowed for the document.

    Default value: 30

    extraction_prompt Extraction Prompt (string)

    The prompt to use for the extraction.

    Default value: The extracted data using the given JSON schema.

    error_handling_prompt Error Handling Prompt (string)

    The prompt to use for error handling.

    Default value: If the text does not contain enough information to comply with the schema, explain the reason. Else, output null and fill out the 'extracted' field.

    llama_parse_params

    object

    Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.

    languages ParserLanguages (string)[]

    Possible values: [af, az, bs, cs, cy, da, de, en, es, et, fr, ga, hr, hu, id, is, it, ku, la, lt, lv, mi, ms, mt, nl, no, oc, pi, pl, pt, ro, rs_latin, sk, sl, sq, sv, sw, tl, tr, uz, vi, ar, fa, ug, ur, bn, as, mni, ru, rs_cyrillic, be, bg, uk, mn, abq, ady, kbd, ava, dar, inh, che, lbe, lez, tab, tjk, hi, mr, ne, bh, mai, ang, bho, mah, sck, new, gom, sa, bgc, th, ch_sim, ch_tra, ja, ko, ta, te, kn]. The list must contain at least one language (>= 1 item).

    parsing_instruction Parsing Instruction (string)
    Default value:
    disable_ocr Disable Ocr (boolean)
    Default value: false
    annotate_links Annotate Links (boolean)
    Default value: false
    disable_reconstruction Disable Reconstruction (boolean)
    Default value: false
    disable_image_extraction Disable Image Extraction (boolean)
    Default value: false
    invalidate_cache Invalidate Cache (boolean)
    Default value: false
    output_pdf_of_document Output Pdf Of Document (boolean)
    Default value: false
    do_not_cache Do Not Cache (boolean)
    Default value: false
    fast_mode Fast Mode (boolean)
    Default value: false
    skip_diagonal_text Skip Diagonal Text (boolean)
    Default value: false
    gpt4o_mode Gpt4O Mode (boolean)
    Default value: false
    gpt4o_api_key Gpt4O Api Key (string)
    Default value:
    do_not_unroll_columns Do Not Unroll Columns (boolean)
    Default value: false
    extract_layout Extract Layout (boolean)
    Default value: false
    html_make_all_elements_visible Html Make All Elements Visible (boolean)
    Default value: false
    html_remove_navigation_elements Html Remove Navigation Elements (boolean)
    Default value: false
    html_remove_fixed_elements Html Remove Fixed Elements (boolean)
    Default value: false
    guess_xlsx_sheet_name Guess Xlsx Sheet Name (boolean)
    Default value: false

    page_separator

    object

    anyOf

    string

    bounding_box Bounding Box (string)
    Default value:

    bbox_top

    object

    anyOf

    number

    bbox_right

    object

    anyOf

    number

    bbox_bottom

    object

    anyOf

    number

    bbox_left

    object

    anyOf

    number

    target_pages Target Pages (string)
    Default value:
    use_vendor_multimodal_model Use Vendor Multimodal Model (boolean)
    Default value: false
    vendor_multimodal_model_name Vendor Multimodal Model Name (string)
    Default value:
    vendor_multimodal_api_key Vendor Multimodal Api Key (string)
    Default value:
    page_prefix Page Prefix (string)
    Default value:
    page_suffix Page Suffix (string)
    Default value:
    webhook_url Webhook Url (string)
    Default value:
    take_screenshot Take Screenshot (boolean)
    Default value: false
    is_formatting_instruction Is Formatting Instruction (boolean)
    Default value: true
    premium_mode Premium Mode (boolean)
    Default value: false
    continuous_mode Continuous Mode (boolean)
    Default value: false
    s3_input_path S3 Input Path (string)
    Default value:
    input_s3_region Input S3 Region (string)
    Default value:
    s3_output_path_prefix S3 Output Path Prefix (string)
    Default value:
    output_s3_region Output S3 Region (string)
    Default value:

    project_id

    object

    anyOf

    string

    azure_openai_deployment_name

    object

    anyOf

    string

    azure_openai_endpoint

    object

    anyOf

    string

    azure_openai_api_version

    object

    anyOf

    string

    azure_openai_key

    object

    anyOf

    string

    input_url

    object

    anyOf

    string

    http_proxy

    object

    anyOf

    string

    auto_mode Auto Mode (boolean)
    Default value: false

    auto_mode_trigger_on_regexp_in_page

    object

    anyOf

    string

    auto_mode_trigger_on_text_in_page

    object

    anyOf

    string

    auto_mode_trigger_on_table_in_page Auto Mode Trigger On Table In Page (boolean)
    Default value: false
    auto_mode_trigger_on_image_in_page Auto Mode Trigger On Image In Page (boolean)
    Default value: false
    structured_output Structured Output (boolean)
    Default value: false

    structured_output_json_schema

    object

    anyOf

    string

    structured_output_json_schema_name

    object

    anyOf

    string

    max_pages

    object

    anyOf

    integer

    max_pages_enforced

    object

    anyOf

    integer

    extract_charts Extract Charts (boolean)
    Default value: false

    formatting_instruction

    object

    anyOf

    string

    complemental_formatting_instruction

    object

    anyOf

    string

    content_guideline_instruction

    object

    anyOf

    string

    spreadsheet_extract_sub_tables Spreadsheet Extract Sub Tables (boolean)
    Default value: false

    job_timeout_in_seconds

    object

    anyOf

    number

    job_timeout_extra_time_per_page_in_seconds

    object

    anyOf

    number

    strict_mode_image_extraction Strict Mode Image Extraction (boolean)
    Default value: false
    strict_mode_image_ocr Strict Mode Image Ocr (boolean)
    Default value: false
    strict_mode_reconstruction Strict Mode Reconstruction (boolean)
    Default value: false
    strict_mode_buggy_font Strict Mode Buggy Font (boolean)
    Default value: false
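
To make the request body above more concrete, here is a minimal sketch that assembles it as a Python dictionary. It assumes `job_create` and `extract_settings` are sibling top-level keys of the body, as the listing suggests; the `build_body` helper and its argument names are hypothetical. Field names, the allowed `extraction_mode` values, and the defaults are taken from the schema above.

```python
import uuid

# Allowed values for config_override.extraction_mode, per the schema above.
EXTRACTION_MODES = ("PER_DOC", "PER_PAGE")


def build_body(extraction_agent_id, file_id, extraction_mode="PER_DOC",
               handle_missing=False, system_prompt=None, extract_settings=None):
    """Assemble a request body dict (hypothetical helper, not an SDK call)."""
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(f"extraction_mode must be one of {EXTRACTION_MODES}")

    config_override = {
        "extraction_mode": extraction_mode,
        "handle_missing": handle_missing,
    }
    if system_prompt is not None:
        config_override["system_prompt"] = system_prompt

    job_create = {
        # uuid.UUID raises ValueError if the value is not a valid UUID.
        "extraction_agent_id": str(uuid.UUID(str(extraction_agent_id))),
        "file_id": str(uuid.UUID(str(file_id))),
        "config_override": config_override,
    }

    body = {"job_create": job_create}
    if extract_settings:
        # Assumption: extract_settings sits next to job_create in the body.
        body["extract_settings"] = dict(extract_settings)
    return body
```

For example, `build_body(agent_id, file_id, extraction_mode="PER_PAGE", extract_settings={"temperature": 0, "max_num_pages": 30})` would override the agent's extraction mode and restate two of the documented defaults. Omitted optional fields are left out of the body so the server-side defaults apply.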

Responses

Successful Response

Schema

    run_id uuid required

    The id of the extraction run

    extraction_agent_id uuid required

    The id of the extraction agent

    data

    object

    required

    The data extracted from the file

    anyOf

    property name*

    object

    anyOf

    object

    extraction_metadata

    object

    required

    The metadata extracted from the file

    property name*

    object

    anyOf

    object
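
The response schema above can be checked on the client side with a small amount of code. The sketch below assumes the response has already been decoded from JSON into a dictionary; `parse_run_response` is a hypothetical helper name. The four required fields and their types are the ones listed in the schema.

```python
import uuid

# Fields marked as required in the response schema above.
REQUIRED_FIELDS = ("run_id", "extraction_agent_id", "data", "extraction_metadata")


def parse_run_response(payload):
    """Validate a decoded response dict and convert the ids to uuid.UUID."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return {
        # uuid.UUID raises ValueError on malformed ids.
        "run_id": uuid.UUID(payload["run_id"]),
        "extraction_agent_id": uuid.UUID(payload["extraction_agent_id"]),
        # `data` and `extraction_metadata` are passed through unchanged,
        # since their shape depends on the data schema in use.
        "data": payload["data"],
        "extraction_metadata": payload["extraction_metadata"],
    }
```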