Advanced Parsing Modes
LlamaParse leverages Large Language Models (LLMs) and Large Vision Models (LVMs) to parse documents. By setting parse_mode
it is possible to control which parsing method is used.
Parse without LLM
Used by setting parse_mode="parse_page_without_llm" on the API. Equivalent to setting fast_mode=True.
In this mode LlamaParse will not use an LLM or LVM to parse the document. Only layered text is output. This mode does not return markdown.
By default this mode extracts images from the document and runs OCR on them.
If faster results are required, it is possible to disable image extraction (by setting disable_image_extraction=True) and OCR (by setting disable_ocr=True).
In Python:
parser = LlamaParse(
  parse_mode="parse_page_without_llm"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'parse_mode="parse_page_without_llm"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
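For the fastest possible raw extraction, the flags mentioned above can be combined with this mode. A minimal Python sketch, assuming the llama-parse package's standard import and load_data method (the file path is a placeholder as in the curl example):

# Minimal sketch: fastest parse, no LLM/LVM, no image extraction, no OCR.
# Assumes the llama-parse Python package; adjust the import if your install differs.
from llama_parse import LlamaParse

parser = LlamaParse(
    parse_mode="parse_page_without_llm",   # no LLM or LVM involved
    disable_image_extraction=True,          # skip image extraction for speed
    disable_ocr=True,                       # skip OCR for speed
)

documents = parser.load_data("/path/to/your/file.pdf")  # placeholder path
print(documents[0].text)  # plain layered text; this mode does not return markdown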
Parse page with LLM
Used by setting parse_mode="parse_page_with_llm" on the API.
This is our default mode (equivalent to balanced mode). In this mode LlamaParse will first extract layered text (output as text), then feed it to a Large Language Model to reconstruct the page structure (output as markdown).
This mode feeds the document to the model page by page.
It offers a good quality / cost balance (LLMs are cheaper to run than LVMs).
The model used is not configurable.
In Python:
parser = LlamaParse(
  parse_mode="parse_page_with_llm"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'parse_mode="parse_page_with_llm"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
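As a usage sketch, the markdown reconstruction described above can be read back from the returned documents. The result_type option is an assumption here (it is not covered on this page) and the file path is a placeholder:

from llama_parse import LlamaParse

# Default / balanced mode: layered text extraction followed by LLM reconstruction.
# result_type is assumed to select the markdown output described above.
parser = LlamaParse(
    parse_mode="parse_page_with_llm",
    result_type="markdown",
)

documents = parser.load_data("/path/to/your/file.pdf")  # placeholder path
for doc in documents:
    print(doc.text)  # markdown reconstruction of each page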
Parse document with LLM
Used by setting parse_mode="parse_document_with_llm" on the API.
Same as parse_page_with_llm, but the document is fed to the model in full, leading to better coherence across headings and multi-page tables.
In Python:
parser = LlamaParse(
  parse_mode="parse_document_with_llm"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'parse_mode="parse_document_with_llm"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
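Because this mode is geared toward whole-document coherence, it pairs naturally with a reader that hands complete files to the parser. A sketch using the common SimpleDirectoryReader integration; the import paths and the directory name are assumptions, not part of this page:

from llama_index.core import SimpleDirectoryReader
from llama_parse import LlamaParse

# Whole-document mode: better coherence for headings and multi-page tables.
parser = LlamaParse(parse_mode="parse_document_with_llm")

# Route every PDF in a folder through LlamaParse; "./reports" is a placeholder.
reader = SimpleDirectoryReader(
    input_dir="./reports",
    file_extractor={".pdf": parser},
)
documents = reader.load_data()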
Parse page with LVM
Used by setting parse_mode="parse_page_with_lvm" on the API. Equivalent to use_vendor_multimodal_model=True.
In this mode LlamaParse will take a screenshot of each page of the document and feed it to an LVM for reconstruction.
By default this mode uses openai-gpt4o as the model. This can be changed using vendor_multimodal_model_name=<model_name>.
Available models are:

| Model | Model string | Price |
|---|---|---|
| OpenAI GPT-4o (default) | openai-gpt4o | 10 credits per page (3c/page) |
| OpenAI GPT-4o Mini | openai-gpt-4o-mini | 5 credits per page (1.5c/page) |
| Anthropic Sonnet 3.5 | anthropic-sonnet-3.5 | 20 credits per page (6c/page) |
| Anthropic Sonnet 3.7 | anthropic-sonnet-3.7 | 20 credits per page (6c/page) |
| Gemini 2.0 Flash 001 | gemini-2.0-flash-001 | 5 credits per page (1.5c/page) |
| Gemini 1.5 Flash | gemini-1.5-flash | 5 credits per page (1.5c/page) |
| Gemini 1.5 Pro | gemini-1.5-pro | 10 credits per page (3c/page) |
| Custom Azure Model | custom-azure-model | N/A |
See Multimodal for more options / details.
In Python:
parser = LlamaParse(
  parse_mode="parse_page_with_lvm"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'parse_mode="parse_page_with_lvm"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
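To pick a specific vision model from the table above, pass its model string via vendor_multimodal_model_name. A minimal sketch (the import and file path follow the same assumptions as the earlier examples):

from llama_parse import LlamaParse

# Screenshot-based parsing with a cheaper vision model from the table above.
parser = LlamaParse(
    parse_mode="parse_page_with_lvm",
    vendor_multimodal_model_name="openai-gpt-4o-mini",  # 5 credits per page
)

documents = parser.load_data("/path/to/your/file.pdf")  # placeholder path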
Parse page with Agent
Used by setting parse_mode="parse_page_with_agent" on the API. Equivalent to premium_mode=True.
This is our most accurate mode. In this mode LlamaParse will first extract layered text (output as text) and take a screenshot of each page of the document.
It then uses an agentic process to feed these to a Large Language Model / Large Vision Model to reconstruct the page structure (output as markdown).
This mode feeds the document to the model(s) page by page.
The models used are not configurable.
In Python:
parser = LlamaParse(
  parse_mode="parse_page_with_agent"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'parse_mode="parse_page_with_agent"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
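Since this mode is stated above to be equivalent to premium mode, the two spellings in this sketch should be interchangeable in the Python client (same import and placeholder-path assumptions as the earlier examples):

from llama_parse import LlamaParse

# The two configurations below are described above as equivalent.
parser = LlamaParse(parse_mode="parse_page_with_agent")
# parser = LlamaParse(premium_mode=True)

documents = parser.load_data("/path/to/your/file.pdf")  # placeholder path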