Migration Guide: Parse Upload Endpoint v1 to v2
This guide will help you migrate from the v1 Parse upload endpoint to the new v2 endpoint, which introduces a structured configuration approach and improved organization of parsing options.
⚠️ Alpha Version Warning: The v2 endpoint is currently in alpha (v2alpha1) and is subject to breaking changes until the stable release. We recommend testing thoroughly and being prepared for potential API changes during development.
Overview of Changes
The v2 endpoint replaces individual form parameters with a single JSON configuration string, providing:
- Better organization: Related options are grouped into logical sections
- Type safety: Structured validation with clear schemas
- Parse mode separation: Only the options relevant to your chosen parse mode are accepted
- Extensibility: Easier to add new features without endpoint bloat
- Validation: Better error messages and configuration validation
Key Differences
v1 Endpoint
POST /api/v1/parsing/upload
Content-Type: multipart/form-data
- 70+ individual form parameters
- Flat parameter structure
- All parameters available regardless of parse mode
v2 Endpoint
POST /api/v2alpha1/parse/upload
Content-Type: multipart/form-data
- Single 'configuration' JSON string parameter
- Hierarchical, structured configuration
- Parse mode-specific options
- Strict validation with clear error messages
Migration Steps
1. Update the Endpoint URL
Before (v1):
POST https://api.cloud.llamaindex.ai/api/v1/parsing/upload
After (v2):
POST https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload
2. Replace Form Parameters with Configuration JSON
Instead of sending individual form parameters, you now send a single configuration parameter containing a JSON string.
Note: The file parameter remains unchanged; you still upload files the same way using multipart form data. Only the configuration approach has changed.
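For reference, here is a minimal Python sketch (standard library only) of the request that the curl examples in this guide produce. It has one file part, unchanged from v1, plus a single configuration part that holds the JSON string. The helper build_upload_request is hypothetical. The API key and file contents are placeholders, and the request is only built, never sent.

```python
import json
import urllib.request
import uuid

# v2 endpoint URL from this guide.
V2_UPLOAD_URL = "https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"


def build_upload_request(api_key: str, file_name: str, file_bytes: bytes,
                         configuration: dict) -> urllib.request.Request:
    """Build (without sending) a v2 multipart/form-data upload request.

    Hypothetical helper: the body has a 'file' part (same as v1) and a single
    'configuration' part whose value is the JSON-encoded configuration.
    """
    boundary = uuid.uuid4().hex
    config_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="configuration"\r\n\r\n'
        f"{json.dumps(configuration)}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = config_part + file_header + file_bytes + b"\r\n" + closing
    return urllib.request.Request(
        V2_UPLOAD_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_upload_request(
    api_key="llx-...",
    file_name="document.pdf",
    file_bytes=b"%PDF-1.4 placeholder",
    configuration={"parse_options": {"parse_mode": "parse_with_agent",
                                     "parse_with_agent_options": {}}},
)
# urllib.request.urlopen(request) would send it; omitted so nothing hits the network.
```

In practice, an HTTP client library can assemble the multipart body for you. Building it by hand here simply makes the two-part layout visible.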
3. Migration Checklist
Before migrating, review this checklist:
- Check for always-enabled parameters: adaptive_long_table, high_res_ocr, merge_tables_across_pages_in_markdown, and outlined_table_extraction are always enabled in v2
- Update page indexing: Change target_pages from 0-based to 1-based indexing
- Replace deprecated parameters: Remove gpt4o_mode, premium_mode, fast_mode, etc.
- Move language parameter: Move language to the parse-mode-specific ocr_parameters
- Update cache parameters: Replace invalidate_cache + do_not_cache with a single disable_cache
- Convert webhooks: Change from a single webhook_url to a webhook_configurations array
- Update prompts: Move prompt parameters to the parse-mode-specific sections
- Test thoroughly: The alpha API may have additional breaking changes
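If you maintain many v1 integrations, part of this checklist can be automated. The sketch below scans a dict of v1 form fields and returns one warning per field that needs attention. The function name migration_warnings and the message wording are hypothetical. The parameter names are taken from this guide.

```python
# Parameter names are taken from this guide; the grouping and messages are
# illustrative, not an official migration tool.
ALWAYS_ENABLED = {
    "adaptive_long_table",
    "high_res_ocr",
    "merge_tables_across_pages_in_markdown",
    "outlined_table_extraction",
}
REMOVED = {
    "use_vendor_multimodal_model", "gpt4o_mode", "gpt4o_api_key",
    "premium_mode", "fast_mode", "continuous_mode",
    "input_s3_path", "input_s3_region",
    "output_s3_path_prefix", "output_s3_region",
}
PROMPTS = {"parsing_instruction", "formatting_instruction",
           "system_prompt", "user_prompt"}


def migration_warnings(v1: dict[str, str]) -> list[str]:
    """Return one message per v1 form field that needs attention in v2."""
    warnings = []
    for name, value in v1.items():
        if name in ALWAYS_ENABLED and value.lower() == "false":
            warnings.append(f"{name}: always enabled in v2 and cannot be disabled")
        elif name in REMOVED:
            warnings.append(f"{name}: removed in v2, see the migration path table")
        elif name == "target_pages":
            warnings.append("target_pages: switch from 0-based to 1-based indexing")
        elif name == "language":
            warnings.append("language: move to {mode}_options.ocr_parameters.languages")
        elif name in ("invalidate_cache", "do_not_cache"):
            warnings.append(f"{name}: replace with disable_cache")
        elif name == "webhook_url":
            warnings.append("webhook_url: move into the webhook_configurations array")
        elif name in PROMPTS:
            # Double braces keep a literal "{mode}" placeholder in the f-string.
            warnings.append(f"{name}: move to {{mode}}_options.prompts")
    return warnings
```

Fields that map one-to-one (for example max_pages) produce no warning. You still move them to their v2 location using the mapping tables later in this guide.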
Configuration Structure
The v2 configuration follows this structure:
{
  "client_name": "string (optional)",
  "parse_options": {
    "parse_mode": "preset|parse_with_llm|parse_with_agent|etc.",
    // Mode-specific options (see examples below)
  },
  "source_url": {
    "url": "string (optional)",
    "http_proxy": "string (optional)"
  },
  "webhook_configurations": [...],
  "input_options": {...},
  "crop_box": {...},
  "page_ranges": {...},
  "disable_cache": "boolean (optional)",
  "output_options": {...},
  "processing_control": {...}
}
Parse Mode Options
Important: You can only include the sub-object that corresponds to your chosen parse mode. For example, if you choose parse_mode: "preset", you can only include preset_options, not parse_with_llm_options.
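A client-side pre-check of this rule can catch mistakes before a request is sent. This is a hypothetical helper, not part of any LlamaIndex SDK. It assumes parse_mode is present and treats any other key ending in _options as a mismatch. The server's own validation remains authoritative.

```python
def check_mode_options(parse_options: dict) -> None:
    """Client-side pre-check of the one-sub-object rule (hypothetical helper).

    Raises ValueError if parse_options contains a *_options sub-object other
    than the one named after parse_mode.
    """
    mode = parse_options["parse_mode"]
    allowed = f"{mode}_options"
    extra = [key for key in parse_options
             if key.endswith("_options") and key != allowed]
    if extra:
        raise ValueError(
            f"{', '.join(extra)} cannot be used with parse_mode '{mode}'"
        )


# Passes: the sub-object matches the mode.
check_mode_options({"parse_mode": "preset",
                    "preset_options": {"preset": "scientific"}})

# Fails: an LLM sub-object alongside parse_mode "preset".
try:
    check_mode_options({"parse_mode": "preset", "parse_with_llm_options": {}})
except ValueError as err:
    print(err)  # parse_with_llm_options cannot be used with parse_mode 'preset'
```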
Preset Mode
v1 Example:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F "preset=scientific" \
-F "language=en,es" \
"https://api.cloud.llamaindex.ai/api/v1/parsing/upload"
v2 Example:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F 'configuration={
  "parse_options": {
    "parse_mode": "preset",
    "preset_options": {
      "preset": "scientific",
      "ocr_parameters": {
        "languages": ["en", "es"]
      }
    }
  }
}' \
"https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"
Parse with LLM Mode
v1 Example:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F "parse_mode=parse_page_with_llm" \
-F "model=gpt-4o" \
-F "user_prompt=Extract key information" \
-F "disable_ocr=true" \
"https://api.cloud.llamaindex.ai/api/v1/parsing/upload"
v2 Example:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F 'configuration={
  "parse_options": {
    "parse_mode": "parse_with_llm",
    "parse_with_llm_options": {
      "model": "gpt-4o",
      "prompts": {
        "user_prompt": "Extract key information"
      },
      "ignore": {
        "ignore_text_in_image": true
      }
    }
  }
}' \
"https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"
External Provider Mode (Azure OpenAI)
v1 Example:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F "parse_mode=parse_page_with_lvm" \
-F "azure_openai_endpoint=https://myresource.openai.azure.com/" \
-F "azure_openai_deployment_name=gpt-4-vision" \
-F "azure_openai_key=your-key" \
-F "azure_openai_api_version=2024-02-01" \
"https://api.cloud.llamaindex.ai/api/v1/parsing/upload"
v2 Example:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F 'configuration={
  "parse_options": {
    "parse_mode": "parse_with_external_provider",
    "parse_with_external_provider_options": {
      "azure_openai": {
        "endpoint": "https://myresource.openai.azure.com/",
        "deployment_name": "gpt-4-vision",
        "api_key": "your-key",
        "api_version": "2024-02-01"
      }
    }
  }
}' \
"https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"
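If many v1 integrations use Azure OpenAI, the field renaming between the two examples above can be done mechanically. The sketch assumes the four v1 azure_openai_* fields map one-to-one onto the keys of the v2 azure_openai object, as in the example. The helper name is hypothetical.

```python
# Hypothetical mapping from the v1 Azure OpenAI form fields to the keys used
# inside parse_with_external_provider_options.azure_openai in the v2 example.
AZURE_FIELD_MAP = {
    "azure_openai_endpoint": "endpoint",
    "azure_openai_deployment_name": "deployment_name",
    "azure_openai_key": "api_key",
    "azure_openai_api_version": "api_version",
}


def convert_azure_openai(v1_fields: dict[str, str]) -> dict:
    """Build the v2 parse_options object for an Azure OpenAI deployment."""
    azure = {new: v1_fields[old]
             for old, new in AZURE_FIELD_MAP.items() if old in v1_fields}
    return {
        "parse_mode": "parse_with_external_provider",
        "parse_with_external_provider_options": {"azure_openai": azure},
    }
```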
Parameter Mapping Reference
Basic Options
v1 Parameter | v2 Location | Notes |
---|---|---|
input_url | source_url.url | Moved to structured source configuration |
http_proxy | source_url.http_proxy | Same functionality |
max_pages | page_ranges.max_pages | Same functionality |
target_pages | page_ranges.target_pages | Breaking change: Now uses 1-based indexing (user inputs "1,2,3" instead of "0,1,2") |
invalidate_cache and do_not_cache | disable_cache | Breaking change: Single boolean combines both v1 parameters |
language | parse_options.{mode}_options.ocr_parameters.languages | Now an array of language codes (e.g. ["en", "es"]) instead of a comma-separated string |
Important: In v1, target_pages used 0-based indexing (e.g., "0,1,2" for pages 1, 2, 3). In v2, it uses 1-based indexing (e.g., "1,2,3" for the same pages) to be consistent with the rest of the platform.
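The rows of the Basic Options table can be converted mechanically. The sketch below is one hedged reading of the table and rests on three assumptions. First, v1 target_pages is a plain comma-separated list of page numbers, with no ranges. Second, v1 language is a comma-separated list of codes. Third, disable_cache should be true when either v1 cache flag was "true". The function name is hypothetical.

```python
def convert_basic_options(v1: dict[str, str], mode: str) -> dict:
    """Map the v1 'Basic Options' form fields to a partial v2 configuration.

    Assumptions (not confirmed by the API reference): v1 target_pages and
    language are comma-separated strings, and disable_cache is true when
    either v1 cache flag was "true".
    """
    config: dict = {}

    # input_url / http_proxy -> source_url.url / source_url.http_proxy
    source = {}
    if "input_url" in v1:
        source["url"] = v1["input_url"]
    if "http_proxy" in v1:
        source["http_proxy"] = v1["http_proxy"]
    if source:
        config["source_url"] = source

    # max_pages / target_pages -> page_ranges; pages shift from 0-based to 1-based
    page_ranges = {}
    if "max_pages" in v1:
        page_ranges["max_pages"] = int(v1["max_pages"])
    if "target_pages" in v1:
        page_ranges["target_pages"] = ",".join(
            str(int(page) + 1) for page in v1["target_pages"].split(",")
        )
    if page_ranges:
        config["page_ranges"] = page_ranges

    # invalidate_cache + do_not_cache -> one disable_cache boolean
    cache_flags = [v1[k] for k in ("invalidate_cache", "do_not_cache") if k in v1]
    if cache_flags:
        config["disable_cache"] = any(flag.lower() == "true" for flag in cache_flags)

    # language "en,es" -> parse_options.{mode}_options.ocr_parameters.languages
    mode_options: dict = {}
    if "language" in v1:
        languages = [code.strip() for code in v1["language"].split(",")]
        mode_options["ocr_parameters"] = {"languages": languages}
    config["parse_options"] = {"parse_mode": mode, f"{mode}_options": mode_options}
    return config


example = convert_basic_options(
    {"target_pages": "0,1,2", "language": "en,es", "do_not_cache": "true"},
    mode="preset",
)
```

With the inputs above, the page_ranges.target_pages value becomes "1,2,3", matching the v2 indexing described in this section.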
Always Enabled in v2 (Breaking Changes)
The following parameters are always enabled in v2 and cannot be disabled. We're doing this to simplify calling LlamaParse and because these options give better results:
v1 Parameter | v2 Behavior | Breaking Change |
---|---|---|
adaptive_long_table | Always true | Breaking: Cannot be disabled in v2 |
high_res_ocr | Always true | Breaking: Cannot be disabled in v2 |
merge_tables_across_pages_in_markdown | Always true | Breaking: Cannot be disabled in v2 |
outlined_table_extraction | Always true | Breaking: Cannot be disabled in v2 |
Removed/Deprecated Parameters
The following v1 parameters are not supported in v2:
v1 Parameter | v2 Status | Migration Path |
---|---|---|
use_vendor_multimodal_model | Removed (was deprecated) | Use parse_mode: "parse_with_external_provider" instead |
gpt4o_mode | Removed | Use parse_mode: "parse_with_llm" with model: "gpt-4o" |
gpt4o_api_key | Removed | Use parse_mode: "parse_with_external_provider" with appropriate provider config |
premium_mode | Removed | Use appropriate parse mode instead |
fast_mode | Removed | Use parse_mode: "parse_without_ai" for faster processing |
continuous_mode | Removed | No direct equivalent |
parsing_instruction | Renamed | Use parse_options.{mode}_options.prompts.user_prompt |
formatting_instruction | Renamed | Use parse_options.{mode}_options.prompts.user_prompt |
system_prompt | Renamed | Use parse_options.{mode}_options.prompts.system_prompt_append |
bounding_box | Renamed | Use crop_box object instead |
input_s3_path and input_s3_region | Removed | Not supported in v2alpha1 |
output_s3_path_prefix and output_s3_region | Removed | Not supported in v2alpha1 |
Webhook Configuration Breaking Changes
v1 Parameter | v2 Location | Notes |
---|---|---|
webhook_url | webhook_configurations[0].webhook_url | Breaking: Now an array, but only the first entry is currently used |
webhook_configurations (string) | webhook_configurations (array) | Breaking: Format changed from JSON string to structured array |
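Here is a sketch of the webhook conversion. It assumes, without confirmation, that a v1 webhook_configurations JSON string already uses the v2 field names and only needs decoding. The helper name is hypothetical. The webhook_events field, which appears in the complex example later in this guide, is not set here.

```python
import json


def convert_webhooks(v1: dict[str, str]) -> list[dict]:
    """Build the v2 webhook_configurations array from v1 form fields.

    Hypothetical helper. Assumes a v1 webhook_configurations JSON string
    already uses v2 field names, so it is only decoded; if present it takes
    precedence over webhook_url. Per this guide, v2 currently uses only the
    first entry.
    """
    if "webhook_configurations" in v1:
        decoded = json.loads(v1["webhook_configurations"])
        return decoded if isinstance(decoded, list) else [decoded]
    if "webhook_url" in v1:
        return [{"webhook_url": v1["webhook_url"]}]
    return []
```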
Not Yet Implemented in v2
The following options exist in the v2 schema but are not yet implemented:
- ignore_strikethrough_text (exists in the schema but is not processed)
- input_options.pdf.password (placeholder for future implementation)
Crop Box Options
v1 Parameter | v2 Location |
---|---|
bbox_top | crop_box.top |
bbox_bottom | crop_box.bottom |
bbox_left | crop_box.left |
bbox_right | crop_box.right |
Input Format Options
v1 Parameter | v2 Location |
---|---|
html_make_all_elements_visible | input_options.html.make_all_elements_visible |
html_remove_fixed_elements | input_options.html.remove_fixed_elements |
html_remove_navigation_elements | input_options.html.remove_navigation_elements |
disable_image_extraction | input_options.pdf.disable_image_extraction |
spreadsheet_extract_sub_tables | input_options.spreadsheet.detect_sub_tables_in_sheets |
Ignore Options (Parse Mode Specific)
v1 Parameter | v2 Location | Available In Modes |
---|---|---|
skip_diagonal_text | parse_options.{mode}_options.ignore.ignore_diagonal_text | All modes except preset |
disable_ocr | parse_options.{mode}_options.ignore.ignore_text_in_image | All modes except preset |
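The ignore options live under the mode-specific sub-object, so their v2 location depends on parse_mode. This hypothetical helper places them under {mode}_options.ignore. Following the Available In Modes column above, it refuses preset mode.

```python
# v1 form field -> key inside parse_options.{mode}_options.ignore
IGNORE_FIELD_MAP = {
    "skip_diagonal_text": "ignore_diagonal_text",
    "disable_ocr": "ignore_text_in_image",
}


def convert_ignore_options(v1: dict[str, str], mode: str) -> dict:
    """Return a v2 parse_options object with an ignore block under {mode}_options.

    Hypothetical helper; raises for preset mode, where these options are
    listed as unavailable.
    """
    ignore = {IGNORE_FIELD_MAP[name]: value.lower() == "true"
              for name, value in v1.items() if name in IGNORE_FIELD_MAP}
    if ignore and mode == "preset":
        raise ValueError("ignore options are not available with parse_mode 'preset'")
    mode_options = {"ignore": ignore} if ignore else {}
    return {"parse_mode": mode, f"{mode}_options": mode_options}
```

For disable_ocr=true with parse_with_llm, this yields the same ignore block as the Parse with LLM v2 example earlier in the guide.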
Output Options
v1 Parameter | v2 Location |
---|---|
annotate_links | output_options.markdown.annotate_links |
page_prefix | output_options.markdown.pages.prefix |
page_separator | output_options.markdown.pages.custom_page_separator |
page_suffix | output_options.markdown.pages.suffix |
hide_headers | output_options.markdown.headers_footers.hide_headers |
hide_footers | output_options.markdown.headers_footers.hide_footers |
compact_markdown_table | output_options.markdown.tables.compact_markdown_tables |
output_tables_as_HTML | output_options.markdown.tables.output_tables_as_markdown (inverted: output_tables_as_HTML=true corresponds to output_tables_as_markdown: false) |
guess_xlsx_sheet_name | output_options.tables_as_spreadsheet.guess_sheet_name |
extract_layout | output_options.extract_layout.enable |
take_screenshot | output_options.screenshots.enable |
output_pdf_of_document | output_options.export_pdf.enable |
Processing Control
v1 Parameter | v2 Location |
---|---|
job_timeout_in_seconds | processing_control.timeouts.base_in_seconds |
job_timeout_extra_time_per_page_in_seconds | processing_control.timeouts.extra_time_per_page_in_seconds |
page_error_tolerance | processing_control.job_failure_conditions.allowed_page_failure_ratio |
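The remaining tables are one-to-one renames, so a lookup from each v1 field to a dotted v2 path can drive the conversion. In this sketch the dictionary is copied from the Crop Box, Input Format, Output Options, and Processing Control tables. The string-to-type coercion is a naive assumption: form fields arrive as strings, but JSON wants booleans and numbers. The page prefix, separator, and suffix fields are left as text. All function names are hypothetical.

```python
# v1 form field -> dotted v2 path, copied from the mapping tables in this guide.
FLAT_MAP = {
    "bbox_top": "crop_box.top",
    "bbox_bottom": "crop_box.bottom",
    "bbox_left": "crop_box.left",
    "bbox_right": "crop_box.right",
    "html_make_all_elements_visible": "input_options.html.make_all_elements_visible",
    "html_remove_fixed_elements": "input_options.html.remove_fixed_elements",
    "html_remove_navigation_elements": "input_options.html.remove_navigation_elements",
    "disable_image_extraction": "input_options.pdf.disable_image_extraction",
    "spreadsheet_extract_sub_tables": "input_options.spreadsheet.detect_sub_tables_in_sheets",
    "annotate_links": "output_options.markdown.annotate_links",
    "page_prefix": "output_options.markdown.pages.prefix",
    "page_separator": "output_options.markdown.pages.custom_page_separator",
    "page_suffix": "output_options.markdown.pages.suffix",
    "hide_headers": "output_options.markdown.headers_footers.hide_headers",
    "hide_footers": "output_options.markdown.headers_footers.hide_footers",
    "compact_markdown_table": "output_options.markdown.tables.compact_markdown_tables",
    "guess_xlsx_sheet_name": "output_options.tables_as_spreadsheet.guess_sheet_name",
    "extract_layout": "output_options.extract_layout.enable",
    "take_screenshot": "output_options.screenshots.enable",
    "output_pdf_of_document": "output_options.export_pdf.enable",
    "job_timeout_in_seconds": "processing_control.timeouts.base_in_seconds",
    "job_timeout_extra_time_per_page_in_seconds": "processing_control.timeouts.extra_time_per_page_in_seconds",
    "page_error_tolerance": "processing_control.job_failure_conditions.allowed_page_failure_ratio",
}
TEXT_FIELDS = {"page_prefix", "page_separator", "page_suffix"}


def set_path(config: dict, dotted: str, value) -> None:
    """Create intermediate objects as needed and assign value at the dotted path."""
    *parents, leaf = dotted.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def coerce(raw: str):
    """Naive coercion of a form string to bool, int, or float; else keep the string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def convert_flat_options(v1: dict[str, str]) -> dict:
    config: dict = {}
    for name, raw in v1.items():
        if name in FLAT_MAP:
            value = raw if name in TEXT_FIELDS else coerce(raw)
            set_path(config, FLAT_MAP[name], value)
        elif name == "output_tables_as_HTML":
            # Inverted flag: HTML tables in v1 means markdown tables are off in v2.
            set_path(config, "output_options.markdown.tables.output_tables_as_markdown",
                     not coerce(raw))
    return config
```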
Complete Migration Examples
Simple Document Parsing
v1:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F "parse_mode=parse_page_with_agent" \
"https://api.cloud.llamaindex.ai/api/v1/parsing/upload"
v2:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F 'configuration={
  "parse_options": {
    "parse_mode": "parse_with_agent",
    "parse_with_agent_options": {}
  }
}' \
"https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"
Complex Configuration with Custom Output
v1:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F "parse_mode=parse_page_with_llm" \
-F "model=gpt-4o" \
-F "user_prompt=Extract financial data" \
-F "max_pages=10" \
-F "page_prefix=## Page " \
-F "hide_headers=true" \
-F "extract_layout=true" \
-F "webhook_url=https://example.com/webhook" \
"https://api.cloud.llamaindex.ai/api/v1/parsing/upload"
v2:
curl -X POST \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-F "file=@document.pdf" \
-F 'configuration={
  "parse_options": {
    "parse_mode": "parse_with_llm",
    "parse_with_llm_options": {
      "model": "gpt-4o",
      "prompts": {
        "user_prompt": "Extract financial data"
      }
    }
  },
  "page_ranges": {
    "max_pages": 10
  },
  "output_options": {
    "markdown": {
      "pages": {
        "prefix": "## Page "
      },
      "headers_footers": {
        "hide_headers": true
      }
    },
    "extract_layout": {
      "enable": true
    }
  },
  "webhook_configurations": [{
    "webhook_url": "https://example.com/webhook",
    "webhook_events": ["parse.done"]
  }]
}' \
"https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload"
Python SDK Migration
v1 (llama-parse):
from llama_cloud_services import LlamaParse
parser = LlamaParse(
api_key="llx-...",
result_type="markdown",
parsing_instruction="Extract key information",
max_pages=10
)
result = parser.load_data("document.pdf")
v2 (llama-cloud-services):
from llama_cloud_services import LlamaParse
# Simple preset usage
parser = LlamaParse(
api_key="llx-...",
preset="scientific",
max_pages=10
)
result = parser.parse("document.pdf")
# Advanced configuration
config = {
"parse_options": {
"parse_mode": "parse_with_llm",
"parse_with_llm_options": {
"prompts": {
"user_prompt": "Extract key information"
}
}
},
"page_ranges": {"max_pages": 10}
}
parser = LlamaParse(api_key="llx-...", configuration=config)
result = parser.parse("document.pdf")
Error Handling
v2 provides more detailed error messages:
v1 Errors:
400: Invalid parameter combination
v2 Errors:
{
  "detail": [
    {
      "type": "value_error",
      "loc": ["parse_options", "parse_with_llm_options"],
      "msg": "parse_with_llm_options can only be used with parse_mode 'parse_with_llm'",
      "input": {...}
    }
  ]
}
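Because v2 errors are structured, a client can turn them into readable messages. The sketch assumes the detail-list shape shown above and falls back to passing through a plain-string detail. The helper name is hypothetical.

```python
import json


def summarize_errors(response_body: str) -> list[str]:
    """Flatten a v2 error body into 'location: message' lines.

    Assumes the shape {"detail": [{"loc": [...], "msg": ...}, ...]}; a
    plain-string detail is returned unchanged as a fallback.
    """
    detail = json.loads(response_body).get("detail", [])
    if isinstance(detail, str):
        return [detail]
    lines = []
    for error in detail:
        location = ".".join(str(part) for part in error.get("loc", []))
        lines.append(f"{location}: {error.get('msg', '')}")
    return lines
```

For the example above, this yields the single line "parse_options.parse_with_llm_options: parse_with_llm_options can only be used with parse_mode 'parse_with_llm'".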
The v1 endpoint will remain available for the foreseeable future, so you can migrate at your own pace. However, new features and improvements will be focused on the v2 endpoint structure.