Multimodal Parsing
You can use a vendor multimodal model to handle document extraction. This is more expensive than regular parsing but can yield better results for some documents.
The supported models are:
| Model | Model string | Price |
|---|---|---|
| OpenAI GPT-4o (Default) | `openai-gpt4o` | 10 credits per page (3c/page) |
| OpenAI GPT-4o Mini | `openai-gpt-4o-mini` | 5 credits per page (1.5c/page) |
| Anthropic Sonnet 3.5 | `anthropic-sonnet-3.5` | 20 credits per page (6c/page) |
| Gemini 1.5 Flash | `gemini-1.5-flash` | 5 credits per page (1.5c/page) |
| Gemini 1.5 Pro | `gemini-1.5-pro` | 10 credits per page (3c/page) |
| Custom Azure Model | `custom-azure-model` | N/A |
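Since pricing is a flat per-page credit rate (1 credit = 0.3c), you can estimate the cost of a job up front. The sketch below uses the numbers from the table above; `estimate_cost` is just an illustrative helper, not part of the LlamaParse API:

```python
# Credits per page for each multimodal model (from the pricing table above).
CREDITS_PER_PAGE = {
    "openai-gpt4o": 10,
    "openai-gpt-4o-mini": 5,
    "anthropic-sonnet-3.5": 20,
    "gemini-1.5-flash": 5,
    "gemini-1.5-pro": 10,
}

def estimate_cost(model: str, pages: int) -> tuple[int, float]:
    """Return (credits, dollars) for parsing `pages` pages with `model`.

    Illustrative helper only. 1 credit = $0.003 (0.3 cents).
    """
    credits = CREDITS_PER_PAGE[model] * pages
    return credits, credits * 0.003

credits, dollars = estimate_cost("anthropic-sonnet-3.5", 100)
print(credits, dollars)  # 2000 credits, about $6.00
```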
When using this mode, LlamaParse's regular parsing is bypassed and instead the following process is used:
- A screenshot of every page of your document is taken
- Each page screenshot is sent to the multimodal model with instructions to extract the page content as markdown
- The resulting markdown of each page is consolidated into the final result
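The consolidation step can be pictured as a simple join over the per-page extractions. This is a simplified sketch of the idea, not LlamaParse's internal code (the actual page separator and post-processing may differ):

```python
def consolidate_pages(page_markdowns: list[str]) -> str:
    """Join per-page markdown extractions into a single document.

    Illustrative only: assumes a horizontal rule between pages,
    which may not match LlamaParse's actual output format.
    """
    return "\n\n---\n\n".join(md.strip() for md in page_markdowns)

result = consolidate_pages(["# Page 1\nIntro text", "# Page 2\nMore details"])
print(result)
```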
Using Multimodal mode
To use multimodal mode, set `use_vendor_multimodal_model` to `True`. You can then select which model to use by setting `vendor_multimodal_model_name` to the model you want to target (e.g. `anthropic-sonnet-3.5`).
In Python:

```python
parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5",
)
```

Using the API:

```shell
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'vendor_multimodal_model_name="anthropic-sonnet-3.5"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
```
Bring your own LLM key (Optional)
When using multimodal mode, you can supply your own vendor API key to parse the document. If you choose to do so, LlamaParse will only charge you 1 credit (0.3c) per page.
Using your own API key will incur charges from your model provider, and could lead to failed pages or documents if you do not have high enough usage limits.
To use your own API key, set the parameter `vendor_multimodal_api_key` to your own key value.
In Python:
```python
parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    vendor_multimodal_api_key="sk-proj-xxxxxx",
)
```

Using the API:

```shell
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'vendor_multimodal_model_name="openai-gpt4o"' \
  --form 'vendor_multimodal_api_key="sk-proj-xxxxxx"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
```
Custom Azure Model
You can also use your own Azure OpenAI model deployment by setting the following parameters:
In Python:

```python
parser = LlamaParse(
    use_vendor_multimodal_model=True,
    azure_openai_deployment_name="llamaparse-gpt-4o",
    azure_openai_endpoint="https://<org>.openai.azure.com/openai/deployments/<dep>/chat/completions?api-version=<ver>",
    azure_openai_api_version="2024-02-15-preview",
    azure_openai_key="xxx",
)
```

Using the API:

```shell
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'azure_openai_deployment_name="llamaparse-gpt-4o"' \
  --form 'azure_openai_endpoint="https://<org>.openai.azure.com/openai/deployments/<dep>/chat/completions?api-version=<ver>"' \
  --form 'azure_openai_api_version="2024-02-15-preview"' \
  --form 'azure_openai_key="xxx"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
```
[Deprecated] GPT-4o mode
By setting `gpt4o_mode` to `True`, LlamaParse will use OpenAI GPT-4o to do the document reconstruction. This still works, but we recommend setting `use_vendor_multimodal_model` to `True` and `vendor_multimodal_model_name` to `openai-gpt4o` instead.
The parameter `gpt4o_api_key` also still works, but we recommend using the parameter `vendor_multimodal_api_key` instead.