Migration Guide: Parse Upload Endpoint v1 to v2

This guide will help you migrate from the v1 Parse upload endpoint to the new v2 endpoint, which introduces a structured configuration approach and improved organization of parsing options.

⚠️ Alpha Version Warning: The v2 endpoint is currently in alpha (v2alpha1) and is subject to breaking changes until the stable release. We recommend testing thoroughly and being prepared for potential API changes during development.

Overview of Changes

The v2 endpoint replaces individual form parameters with a single JSON configuration string, providing:

  • Better organization: Related options are grouped into logical sections
  • Type safety: Structured validation with clear schemas
  • Parse mode separation: Only relevant options for your chosen parse mode are required
  • Extensibility: Easier to add new features without endpoint bloat
  • Validation: Better error messages and configuration validation

Key Differences

v1 Endpoint

POST /api/v1/parsing/upload
Content-Type: multipart/form-data

- 70+ individual form parameters
- Flat parameter structure
- All parameters available regardless of parse mode

v2 Endpoint

POST /api/v2alpha1/parse/upload
Content-Type: multipart/form-data

- Single 'configuration' JSON string parameter
- Hierarchical, structured configuration
- Parse mode-specific options
- Strict validation with clear error messages

Migration Steps

1. Update the Endpoint URL

Before (v1):

POST https://api.cloud.llamaindex.ai/api/v1/parsing/upload

After (v2):

POST https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload

2. Replace Form Parameters with Configuration JSON

Instead of sending individual form parameters, you now send a single configuration parameter containing a JSON string.

Note: The file parameter remains unchanged - you still upload files the same way using multipart form data. Only the configuration approach has changed.
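
As a rough illustration of this step, the sketch below serializes a configuration dictionary into the single configuration form field. The helper name build_v2_form_fields is hypothetical and is not part of any SDK; only the field name configuration and the nested keys come from this guide.

```python
import json


def build_v2_form_fields(config: dict) -> dict:
    """Return the non-file multipart form fields for a v2 upload.

    Hypothetical helper: the whole configuration travels as ONE form
    field named "configuration" whose value is a JSON string.
    """
    return {"configuration": json.dumps(config)}


config = {
    "parse_options": {
        "parse_mode": "preset",
        "preset_options": {"preset": "scientific"},
    }
}

fields = build_v2_form_fields(config)
print(fields["configuration"])
```

With an HTTP client such as requests, these fields would typically be passed as data=fields next to files={"file": open_file_handle}, so the file upload itself stays as it was in v1.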

3. Migration Checklist

Before migrating, review this checklist:

  • Check for always-enabled parameters: adaptive_long_table, high_res_ocr, merge_tables_across_pages_in_markdown, outlined_table_extraction are always enabled in v2
  • Update page indexing: Change target_pages from 0-based to 1-based indexing
  • Replace deprecated parameters: Remove gpt4o_mode, premium_mode, fast_mode, etc.
  • Move language parameter: Move language to parse mode specific ocr_parameters
  • Update cache parameters: Replace invalidate_cache + do_not_cache with single disable_cache
  • Convert webhooks: Change from single webhook_url to webhook_configurations array
  • Update prompts: Move prompt parameters to parse mode specific sections
  • Test thoroughly: The alpha API may have additional breaking changes
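
Part of this checklist can be automated. The sketch below scans a dictionary of v1 parameters and reports the ones that need attention. The function name audit_v1_params and the warning wording are illustrative assumptions; the parameter names are taken from the checklist above, and the REMOVED set covers only the three parameters the checklist names explicitly.

```python
# Parameters named in the checklist as removed in v2 (the checklist's
# "etc." means this set is not exhaustive).
REMOVED = {"gpt4o_mode", "premium_mode", "fast_mode"}

# Parameters the checklist lists as always enabled in v2.
ALWAYS_ENABLED = {
    "adaptive_long_table",
    "high_res_ocr",
    "merge_tables_across_pages_in_markdown",
    "outlined_table_extraction",
}


def audit_v1_params(v1_params: dict) -> list:
    """Return human-readable warnings for v1 parameters that need changes."""
    warnings = []
    for name, value in v1_params.items():
        if name in REMOVED:
            warnings.append(f"{name}: removed in v2, pick a parse mode instead")
        elif name in ALWAYS_ENABLED and value is False:
            warnings.append(f"{name}: always enabled in v2, cannot be disabled")
        elif name == "target_pages":
            warnings.append("target_pages: convert from 0-based to 1-based")
        elif name == "webhook_url":
            warnings.append("webhook_url: move into webhook_configurations")
    return warnings


for line in audit_v1_params({"fast_mode": True, "high_res_ocr": False, "max_pages": 5}):
    print(line)
```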

Configuration Structure

The v2 configuration follows this structure:

{
  "client_name": "string (optional)",
  "parse_options": {
    "parse_mode": "preset|parse_with_llm|parse_with_agent|etc.",
    // Mode-specific options (see examples below)
  },
  "source_url": {
    "url": "string (optional)",
    "http_proxy": "string (optional)"
  },
  "webhook_configurations": [...],
  "input_options": {...},
  "crop_box": {...},
  "page_ranges": {...},
  "disable_cache": "boolean (optional)",
  "output_options": {...},
  "processing_control": {...}
}

Parse Mode Options

Important: You can only include the sub-object that corresponds to your chosen parse mode. For example, if you choose parse_mode: "preset", you can only include preset_options, not parse_with_llm_options.
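
A client-side check for this rule might look like the sketch below. It assumes that every mode's sub-object is named by appending _options to the parse mode, which matches the modes shown in this guide but is not confirmed for all modes. The function name check_parse_options is hypothetical.

```python
def check_parse_options(parse_options: dict) -> list:
    """Return a list of problems; an empty list means the check passed.

    Assumption: the sub-object for a mode is named "<parse_mode>_options".
    """
    mode = parse_options.get("parse_mode")
    if mode is None:
        return ["parse_mode is missing"]

    allowed_key = f"{mode}_options"
    problems = []
    for key in parse_options:
        # Any other "*_options" key belongs to a different parse mode.
        if key.endswith("_options") and key != allowed_key:
            problems.append(f"{key} cannot be used with parse_mode '{mode}'")
    return problems


bad = {
    "parse_mode": "preset",
    "preset_options": {"preset": "scientific"},
    "parse_with_llm_options": {"model": "gpt-4o"},
}
print(check_parse_options(bad))
```

Running a check like this before sending the request only gives faster feedback; the server-side validation remains the authority.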

Preset Mode

v1 Example:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F "preset=scientific" \
  -F "language=en,es" \
  "https://api.cloud.llamaindex.ai/api/v1/parsing/upload"

v2 Example:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F 'configuration={
    "parse_options": {
      "parse_mode": "preset",
      "preset_options": {
        "preset": "scientific",
        "ocr_parameters": {
          "languages": ["en", "es"]
        }
      }
    }
  }' \
  "https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"

Parse with LLM Mode

v1 Example:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F "parse_mode=parse_page_with_llm" \
  -F "model=gpt-4o" \
  -F "user_prompt=Extract key information" \
  -F "disable_ocr=true" \
  "https://api.cloud.llamaindex.ai/api/v1/parsing/upload"

v2 Example:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F 'configuration={
    "parse_options": {
      "parse_mode": "parse_with_llm",
      "parse_with_llm_options": {
        "model": "gpt-4o",
        "prompts": {
          "user_prompt": "Extract key information"
        },
        "ignore": {
          "ignore_text_in_image": true
        }
      }
    }
  }' \
  "https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"

External Provider Mode (Azure OpenAI)

v1 Example:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F "parse_mode=parse_page_with_lvm" \
  -F "azure_openai_endpoint=https://myresource.openai.azure.com/" \
  -F "azure_openai_deployment_name=gpt-4-vision" \
  -F "azure_openai_key=your-key" \
  -F "azure_openai_api_version=2024-02-01" \
  "https://api.cloud.llamaindex.ai/api/v1/parsing/upload"

v2 Example:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F 'configuration={
    "parse_options": {
      "parse_mode": "parse_with_external_provider",
      "parse_with_external_provider_options": {
        "azure_openai": {
          "endpoint": "https://myresource.openai.azure.com/",
          "deployment_name": "gpt-4-vision",
          "api_key": "your-key",
          "api_version": "2024-02-01"
        }
      }
    }
  }' \
  "https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"

Parameter Mapping Reference

Basic Options

v1 Parameter | v2 Location | Notes
input_url | source_url.url | Moved to structured source configuration
http_proxy | source_url.http_proxy | Same functionality
max_pages | page_ranges.max_pages | Same functionality
target_pages | page_ranges.target_pages | Breaking change: Now uses 1-based indexing (user inputs "1,2,3" instead of "0,1,2")
invalidate_cache and do_not_cache | disable_cache | Breaking change: Single boolean combines both v1 parameters
language | parse_options.{mode}_options.ocr_parameters.languages | Same functionality; the examples in this guide pass a comma-separated string in v1 ("en,es") and a JSON array in v2 (["en", "es"])

Important: In v1, target_pages used 0-based indexing (e.g., "0,1,2" for pages 1, 2, 3). In v2, it uses 1-based indexing (e.g., "1,2,3" for the same pages) to be consistent with the rest of the platform.
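
The basic-option changes above can be sketched as three small conversions. All function names are hypothetical. Two behaviors are assumptions rather than documented facts: that target_pages may also contain ranges such as "0-2" (the regex simply shifts every integer it finds), and that the two v1 cache flags combine with a logical OR.

```python
import re


def shift_target_pages(v1_target_pages: str) -> str:
    """Convert a 0-based v1 page list ("0,1,2") to 1-based ("1,2,3").

    Every integer in the string is incremented, so a range such as "0-2"
    (assumed syntax) becomes "1-3".
    """
    return re.sub(r"\d+", lambda m: str(int(m.group()) + 1), v1_target_pages)


def merge_cache_flags(invalidate_cache: bool = False, do_not_cache: bool = False) -> bool:
    """Collapse the two v1 cache flags into disable_cache (assumed: logical OR)."""
    return bool(invalidate_cache or do_not_cache)


def split_languages(v1_language: str) -> list:
    """Turn the v1 comma-separated language string into the v2 list form."""
    return [code.strip() for code in v1_language.split(",") if code.strip()]


print(shift_target_pages("0,1,2"))
print(merge_cache_flags(invalidate_cache=True))
print(split_languages("en,es"))
```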

Always Enabled in v2 (Breaking Changes)

The following parameters are always enabled in v2 and cannot be disabled. We're doing this to simplify calling LlamaParse and because these options give better results:

v1 Parameter | v2 Behavior | Breaking Change
adaptive_long_table | Always true | Breaking: Cannot be disabled in v2
high_res_ocr | Always true | Breaking: Cannot be disabled in v2
merge_tables_across_pages_in_markdown | Always true | Breaking: Cannot be disabled in v2
outlined_table_extraction | Always true | Breaking: Cannot be disabled in v2

Removed/Deprecated Parameters

The following v1 parameters are not supported in v2:

v1 Parameter | v2 Status | Migration Path
use_vendor_multimodal_model | Removed (was deprecated) | Use parse_mode: "parse_with_external_provider" instead
gpt4o_mode | Removed | Use parse_mode: "parse_with_llm" with model: "gpt-4o"
gpt4o_api_key | Removed | Use parse_mode: "parse_with_external_provider" with appropriate provider config
premium_mode | Removed | Use appropriate parse mode instead
fast_mode | Removed | Use parse_mode: "parse_without_ai" for faster processing
continuous_mode | Removed | No direct equivalent
parsing_instruction | Renamed | Use parse_options.{mode}_options.prompts.user_prompt
formatting_instruction | Renamed | Use parse_options.{mode}_options.prompts.user_prompt
system_prompt | Renamed | Use parse_options.{mode}_options.prompts.system_prompt_append
bounding_box | Renamed | Use crop_box object instead
input_s3_path and input_s3_region | Removed | Not supported in v2alpha1
output_s3_path_prefix and output_s3_region | Removed | Not supported in v2alpha1
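
For the renamed prompt parameters, a conversion could look like the sketch below. The helper name migrate_prompts is hypothetical. Because parsing_instruction and formatting_instruction both map to user_prompt, the sketch joins them with a newline when both are present; that merging rule is an assumption, not documented behavior.

```python
def migrate_prompts(v1_params: dict) -> dict:
    """Build the v2 "prompts" object from v1 prompt-related parameters."""
    # Both v1 instruction parameters map to prompts.user_prompt.
    user_parts = [
        v1_params[key]
        for key in ("parsing_instruction", "formatting_instruction")
        if v1_params.get(key)
    ]

    prompts = {}
    if user_parts:
        # Assumption: concatenate with a newline when both are supplied.
        prompts["user_prompt"] = "\n".join(user_parts)
    if v1_params.get("system_prompt"):
        prompts["system_prompt_append"] = v1_params["system_prompt"]
    return prompts


print(migrate_prompts({
    "parsing_instruction": "Extract key information",
    "system_prompt": "Be concise",
}))
```

The returned dictionary would then be placed under parse_options.{mode}_options.prompts for the chosen parse mode.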

Webhook Configuration Breaking Changes

v1 Parameter | v2 Location | Notes
webhook_url | webhook_configurations[0].webhook_url | Breaking: Now an array, but only the first entry is used at the moment
webhook_configurations (string) | webhook_configurations (array) | Breaking: Format changed from JSON string to structured array
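
Converting a single v1 webhook_url into the v2 array form might be sketched as follows. The helper name is hypothetical, and the default event "parse.done" is borrowed from the complex example later in this guide rather than from a documented list of events.

```python
def to_webhook_configurations(webhook_url, events=("parse.done",)):
    """Wrap a v1 webhook_url in the v2 webhook_configurations array."""
    if not webhook_url:
        return []
    # Per the table above, only the first entry is currently used,
    # so a single-element list is sufficient.
    return [{"webhook_url": webhook_url, "webhook_events": list(events)}]


print(to_webhook_configurations("https://example.com/webhook"))
```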

Not Yet Implemented in v2

The following options exist in the v2 schema but are not yet implemented:

  • ignore_strikethrough_text (exists in schema but not processed)
  • input_options.pdf.password (placeholder for future implementation)

Crop Box Options

v1 Parameter | v2 Location
bbox_top | crop_box.top
bbox_bottom | crop_box.bottom
bbox_left | crop_box.left
bbox_right | crop_box.right

Input Format Options

v1 Parameter | v2 Location
html_make_all_elements_visible | input_options.html.make_all_elements_visible
html_remove_fixed_elements | input_options.html.remove_fixed_elements
html_remove_navigation_elements | input_options.html.remove_navigation_elements
disable_image_extraction | input_options.pdf.disable_image_extraction
spreadsheet_extract_sub_tables | input_options.spreadsheet.detect_sub_tables_in_sheets

Ignore Options (Parse Mode Specific)

v1 Parameter | v2 Location | Available In Modes
skip_diagonal_text | parse_options.{mode}_options.ignore.ignore_diagonal_text | All modes except preset
disable_ocr | parse_options.{mode}_options.ignore.ignore_text_in_image | All modes except preset

Output Options

v1 Parameter | v2 Location
annotate_links | output_options.markdown.annotate_links
page_prefix | output_options.markdown.pages.prefix
page_separator | output_options.markdown.pages.custom_page_separator
page_suffix | output_options.markdown.pages.suffix
hide_headers | output_options.markdown.headers_footers.hide_headers
hide_footers | output_options.markdown.headers_footers.hide_footers
compact_markdown_table | output_options.markdown.tables.compact_markdown_tables
output_tables_as_HTML | output_options.markdown.tables.output_tables_as_markdown (inverted)
guess_xlsx_sheet_name | output_options.tables_as_spreadsheet.guess_sheet_name
extract_layout | output_options.extract_layout.enable
take_screenshot | output_options.screenshots.enable
output_pdf_of_document | output_options.export_pdf.enable

Processing Control

v1 Parameter | v2 Location
job_timeout_in_seconds | processing_control.timeouts.base_in_seconds
job_timeout_extra_time_per_page_in_seconds | processing_control.timeouts.extra_time_per_page_in_seconds
page_error_tolerance | processing_control.job_failure_conditions.allowed_page_failure_ratio
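
Because most mappings in these tables are a flat v1 name going to a dotted v2 path, a generic converter is one possible approach. The sketch below includes only a subset of the rows, and the helper names set_path and map_flat_params are hypothetical. It assumes the v1 values are already Python booleans, numbers, or strings rather than raw form strings such as "true".

```python
# Subset of the mapping tables above: v1 name -> dotted v2 path.
V1_TO_V2_PATHS = {
    "bbox_top": "crop_box.top",
    "html_remove_fixed_elements": "input_options.html.remove_fixed_elements",
    "page_prefix": "output_options.markdown.pages.prefix",
    "hide_headers": "output_options.markdown.headers_footers.hide_headers",
    "extract_layout": "output_options.extract_layout.enable",
    "job_timeout_in_seconds": "processing_control.timeouts.base_in_seconds",
}

# Flags whose meaning is inverted in v2 (HTML tables vs. markdown tables).
INVERTED_PATHS = {
    "output_tables_as_HTML": "output_options.markdown.tables.output_tables_as_markdown",
}


def set_path(config: dict, dotted_path: str, value) -> None:
    """Set config["a"]["b"]["c"] = value for the path "a.b.c"."""
    *parents, leaf = dotted_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def map_flat_params(v1_params: dict):
    """Return (v2_config_fragment, names_without_a_known_mapping)."""
    config, unmapped = {}, []
    for name, value in v1_params.items():
        if name in V1_TO_V2_PATHS:
            set_path(config, V1_TO_V2_PATHS[name], value)
        elif name in INVERTED_PATHS:
            set_path(config, INVERTED_PATHS[name], not value)
        else:
            unmapped.append(name)
    return config, unmapped


fragment, leftover = map_flat_params(
    {"page_prefix": "## Page ", "hide_headers": True, "output_tables_as_HTML": True}
)
print(fragment)
print(leftover)
```

Returning the unmapped names separately makes it harder to silently drop a parameter that needs manual handling, such as the removed or mode-specific ones.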

Complete Migration Examples

Simple Document Parsing

v1:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F "parse_mode=parse_page_with_agent" \
  "https://api.cloud.llamaindex.ai/api/v1/parsing/upload"

v2:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F 'configuration={
    "parse_options": {
      "parse_mode": "parse_with_agent",
      "parse_with_agent_options": {}
    }
  }' \
  "https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"

Complex Configuration with Custom Output

v1:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F "parse_mode=parse_page_with_llm" \
  -F "model=gpt-4o" \
  -F "user_prompt=Extract financial data" \
  -F "max_pages=10" \
  -F "page_prefix=## Page " \
  -F "hide_headers=true" \
  -F "extract_layout=true" \
  -F "webhook_url=https://example.com/webhook" \
  "https://api.cloud.llamaindex.ai/api/v1/parsing/upload"

v2:

curl -X POST \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  -F "file=@document.pdf" \
  -F 'configuration={
    "parse_options": {
      "parse_mode": "parse_with_llm",
      "parse_with_llm_options": {
        "model": "gpt-4o",
        "prompts": {
          "user_prompt": "Extract financial data"
        }
      }
    },
    "page_ranges": {
      "max_pages": 10
    },
    "output_options": {
      "markdown": {
        "pages": {
          "prefix": "## Page "
        },
        "headers_footers": {
          "hide_headers": true
        }
      },
      "extract_layout": {
        "enable": true
      }
    },
    "webhook_configurations": [{
      "webhook_url": "https://example.com/webhook",
      "webhook_events": ["parse.done"]
    }]
  }' \
  "https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"

Python SDK Migration

v1 (llama-parse):

from llama_cloud_services import LlamaParse

parser = LlamaParse(
    api_key="llx-...",
    result_type="markdown",
    parsing_instruction="Extract key information",
    max_pages=10
)
result = parser.load_data("document.pdf")

v2 (llama-cloud-services):

from llama_cloud_services import LlamaParse

# Simple preset usage
parser = LlamaParse(
    api_key="llx-...",
    preset="scientific",
    max_pages=10
)
result = parser.parse("document.pdf")

# Advanced configuration
config = {
    "parse_options": {
        "parse_mode": "parse_with_llm",
        "parse_with_llm_options": {
            "prompts": {
                "user_prompt": "Extract key information"
            }
        }
    },
    "page_ranges": {"max_pages": 10}
}

parser = LlamaParse(api_key="llx-...", configuration=config)
result = parser.parse("document.pdf")

Error Handling

v2 provides more detailed error messages:

v1 Errors:

400: Invalid parameter combination

v2 Errors:

{
  "detail": [
    {
      "type": "value_error",
      "loc": ["parse_options", "parse_with_llm_options"],
      "msg": "parse_with_llm_options can only be used with parse_mode 'parse_with_llm'",
      "input": {...}
    }
  ]
}
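
On the client side, a structured error like this can be flattened into readable messages. The sketch below assumes the body always has the shape shown above, with a detail list whose entries carry loc and msg; it also tolerates a plain-string detail, which is a guess about other error cases. The function name and the sample body are illustrative, and the elided input object is replaced by an empty stand-in.

```python
import json


def summarize_validation_errors(response_body: str) -> list:
    """Return one "location: message" line per entry in the error detail."""
    detail = json.loads(response_body).get("detail", [])
    if isinstance(detail, str):
        # Assumed fallback: some errors may carry a plain string instead.
        return [detail]

    lines = []
    for err in detail:
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', '')}")
    return lines


sample_body = json.dumps({
    "detail": [{
        "type": "value_error",
        "loc": ["parse_options", "parse_with_llm_options"],
        "msg": "parse_with_llm_options can only be used with parse_mode 'parse_with_llm'",
        "input": {},  # stand-in for the elided input object
    }]
})

for line in summarize_validation_errors(sample_body):
    print(line)
```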

The v1 endpoint will remain available for the foreseeable future, so you can migrate at your own pace. However, new features and improvements will be focused on the v2 endpoint structure.