
Multimodal Parsing

You can use a vendor multimodal model to handle document extraction. This is more expensive than regular parsing but can yield better results for some documents.

Supported models are:

| Model | Model string | Price |
| --- | --- | --- |
| OpenAI GPT-4o (Default) | openai-gpt4o | 10 credits per page (3c/page) |
| OpenAI GPT-4o Mini | openai-gpt-4o-mini | 5 credits per page (1.5c/page) |
| Sonnet 3.5 | anthropic-sonnet-3.5 | 20 credits per page (6c/page) |
| Gemini 1.5 Flash | gemini-1.5-flash | 5 credits per page (1.5c/page) |
| Gemini 1.5 Pro | gemini-1.5-pro | 10 credits per page (3c/page) |
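Since 1 credit corresponds to 0.3c, the per-page prices in the table follow directly from the credit counts. A quick sketch of the arithmetic (credit values are taken from the table above; the helper function is illustrative, not part of LlamaParse):

```python
# Credits charged per page for each vendor_multimodal_model_name (from the table above).
CREDITS_PER_PAGE = {
    "openai-gpt4o": 10,
    "openai-gpt-4o-mini": 5,
    "anthropic-sonnet-3.5": 20,
    "gemini-1.5-flash": 5,
    "gemini-1.5-pro": 10,
}

CENTS_PER_CREDIT = 0.3  # 10 credits = 3c/page, so 1 credit = 0.3c


def parse_cost_cents(model: str, pages: int) -> float:
    """Estimated cost in cents to parse `pages` pages with the given model."""
    return CREDITS_PER_PAGE[model] * CENTS_PER_CREDIT * pages


# Parsing a 100-page document with Sonnet 3.5: 20 credits x 0.3c x 100 pages.
print(parse_cost_cents("anthropic-sonnet-3.5", 100))  # 600.0 cents, i.e. $6
```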

When using this mode, LlamaParse's regular parsing is bypassed and instead the following process is used:

  • A screenshot of every page of your document is taken.
  • Each page screenshot is sent to the multimodal model with instructions to extract the page content as markdown.
  • The resulting markdown of each page is consolidated into the final result.
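The consolidation step above amounts to joining the per-page markdown in page order. A minimal sketch of that final step (the page extraction itself happens server-side via the selected multimodal model; the separator choice here is an assumption for illustration):

```python
def consolidate_pages(page_markdowns: list[str]) -> str:
    """Join per-page markdown extractions into a single document.

    `page_markdowns` holds one markdown string per page screenshot,
    in page order, as produced by the multimodal model.
    """
    # A horizontal-rule separator keeps page boundaries visible in the result.
    return "\n\n---\n\n".join(md.strip() for md in page_markdowns)


pages = [
    "# Page 1\n\nIntro text.",
    "# Page 2\n\n| a | b |\n|---|---|\n| 1 | 2 |",
]
print(consolidate_pages(pages))
```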

Using Multimodal mode

To use the multimodal mode, set use_vendor_multimodal_model to True. You can then select which model to use by setting vendor_multimodal_model_name to the model you want to target (e.g. anthropic-sonnet-3.5).

In Python:
parser = LlamaParse(
  use_vendor_multimodal_model=True,
  vendor_multimodal_model_name="anthropic-sonnet-3.5",
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model=true' \
  --form 'vendor_multimodal_model_name=anthropic-sonnet-3.5' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'

Bring your own LLM key (Optional)

When using the multimodal mode, you can supply your own vendor API key to parse the document. If you choose to do so, LlamaParse will only charge you 1 credit (0.3c) per page.

Using your own API key will incur charges from your model provider, and could lead to failed pages or documents if your usage limits are not high enough.

To use your own API key, set the parameter vendor_multimodal_api_key to your own key value. In Python:

parser = LlamaParse(
  use_vendor_multimodal_model=True,
  vendor_multimodal_model_name="openai-gpt4o",
  vendor_multimodal_api_key="sk-proj-xxxxxx",
)
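With your own key, LlamaParse charges a flat 1 credit (0.3c) per page regardless of model, and the vendor bills you separately for the API calls. A rough sketch of the LlamaParse-side savings (vendor charges not included; credit values from the table above; the helper is illustrative, not part of LlamaParse):

```python
CENTS_PER_CREDIT = 0.3
BYO_KEY_CREDITS = 1  # flat LlamaParse charge per page when you supply your own key


def llamaparse_savings_cents(model_credits: int, pages: int) -> float:
    """LlamaParse-side savings when supplying your own vendor API key.

    `model_credits` is the normal per-page credit cost of the chosen model
    (e.g. 10 for openai-gpt4o). Vendor API charges are NOT included.
    """
    return (model_credits - BYO_KEY_CREDITS) * pages * CENTS_PER_CREDIT


# openai-gpt4o normally costs 10 credits/page; with your own key you pay 1,
# so on 1000 pages you save 9 credits x 1000 x 0.3c = $27 of credits.
print(llamaparse_savings_cents(10, 1000))
```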

Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model=true' \
  --form 'vendor_multimodal_model_name=openai-gpt4o' \
  --form 'vendor_multimodal_api_key=sk-proj-xxxxxx' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'

[Deprecated] GPT-4o mode

By setting gpt4o_mode to True, LlamaParse will use OpenAI GPT-4o to do the document reconstruction. This still works, but we recommend setting use_vendor_multimodal_model to True and vendor_multimodal_model_name to openai-gpt4o instead.

The parameter gpt4o_api_key still works, but we recommend using the parameter vendor_multimodal_api_key instead.
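The deprecated parameters map one-to-one onto the new ones. A small helper sketching that migration (illustrative only, not part of LlamaParse):

```python
def migrate_gpt4o_params(params: dict) -> dict:
    """Translate deprecated gpt4o_* parameters to the vendor_multimodal_* form."""
    migrated = dict(params)
    # gpt4o_mode=True becomes the vendor-multimodal flags targeting openai-gpt4o.
    if migrated.pop("gpt4o_mode", False):
        migrated["use_vendor_multimodal_model"] = True
        migrated["vendor_multimodal_model_name"] = "openai-gpt4o"
    # gpt4o_api_key is renamed to vendor_multimodal_api_key.
    if "gpt4o_api_key" in migrated:
        migrated["vendor_multimodal_api_key"] = migrated.pop("gpt4o_api_key")
    return migrated


old = {"gpt4o_mode": True, "gpt4o_api_key": "sk-proj-xxxxxx"}
print(migrate_gpt4o_params(old))
```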