Multimodal Parsing
You can use a vendor multimodal model to handle document extraction. This is more expensive than regular parsing, but it can produce better results for some documents.
Supported models are listed here.
When you use this mode, LlamaParse bypasses its regular parsing and uses the following process instead:
- A screenshot is taken of every page of your document.
- Each page screenshot is sent to the multimodal model with instructions to extract its content as markdown.
- The resulting markdown of each page is consolidated into the final result.
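The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LlamaParse's implementation: it assumes the page screenshots have already been taken, extract_markdown is a hypothetical stand-in for the multimodal model call, and the instruction text and blank-line separator are assumptions.

```python
from typing import Callable, List


def parse_multimodal(
    page_images: List[bytes],
    extract_markdown: Callable[[bytes, str], str],
    separator: str = "\n\n",
) -> str:
    # Hypothetical instruction text; the prompt LlamaParse actually sends is not documented here.
    instruction = "Extract the content of this page as markdown."
    # Send each page screenshot to the model, one call per page.
    pages_md = [extract_markdown(image, instruction) for image in page_images]
    # Consolidate the per-page markdown into one result, keeping page order.
    return separator.join(md.strip() for md in pages_md)


def fake_model(image: bytes, instruction: str) -> str:
    # Stand-in for a real multimodal model call, for demonstration only.
    return f"# Page {image.decode()}\n"


result = parse_multimodal([b"1", b"2"], fake_model)
```

Because each page is handled independently, one model call is made per page, which is one reason this mode costs more than regular parsing.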
Using Multimodal mode
To use multimodal mode, set use_vendor_multimodal_model to True. You can then select which model to use by setting vendor_multimodal_model_name to the model you want to target (e.g. anthropic-sonnet-3.5).
Python:
from llama_parse import LlamaParse

parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5",
)
API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'vendor_multimodal_model_name="anthropic-sonnet-3.5"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
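If you call the upload endpoint from Python rather than curl, the form fields from the curl example can be built as a plain dictionary. This is a sketch: build_multimodal_form is a hypothetical helper, and the commented-out request assumes the third-party requests package. Multipart form values are sent as strings, so the boolean is written as "true".

```python
from typing import Dict

# Endpoint taken from the curl example.
UPLOAD_URL = "https://api.cloud.llamaindex.ai/api/parsing/upload"


def build_multimodal_form(model_name: str) -> Dict[str, str]:
    # Hypothetical helper mirroring the --form fields of the curl example.
    return {
        "use_vendor_multimodal_model": "true",
        "vendor_multimodal_model_name": model_name,
    }


form = build_multimodal_form("anthropic-sonnet-3.5")

# Sending it might look like this with the `requests` package
# (not executed here; it needs a real API key and file):
# import os, requests
# with open("/path/to/your/file.pdf", "rb") as f:
#     resp = requests.post(
#         UPLOAD_URL,
#         headers={"Authorization": f"Bearer {os.environ['LLAMA_CLOUD_API_KEY']}"},
#         data=form,
#         files={"file": ("file.pdf", f, "application/pdf")},
#     )
```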
Bring your own LLM key (Optional)
When using multimodal mode, you can supply your own vendor API key to parse the document. If you do so, LlamaParse charges only 1 credit (0.3 cents) per page.
Using your own API key means your model provider will also bill you for the model usage, and pages or entire documents may fail to parse if your account's usage limits are not high enough.
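The LlamaParse side of this pricing is simple arithmetic. The sketch below uses only the rates stated above (1 credit per page, 1 credit = 0.3 cents); the function name is hypothetical, and it deliberately excludes whatever your model provider bills you.

```python
from decimal import Decimal
from typing import Tuple

# Rates stated above for bring-your-own-key parsing.
CREDITS_PER_PAGE = 1
CENTS_PER_CREDIT = Decimal("0.3")


def llamaparse_byok_cost(pages: int) -> Tuple[int, Decimal]:
    # Hypothetical helper: returns (credits, US dollars) charged by LlamaParse only.
    if pages < 0:
        raise ValueError("pages must be non-negative")
    credits = pages * CREDITS_PER_PAGE
    dollars = credits * CENTS_PER_CREDIT / 100  # cents -> dollars
    return credits, dollars


# 1000 pages -> 1000 credits -> 300 cents -> $3.00
credits, dollars = llamaparse_byok_cost(1000)
```

Decimal is used so that the 0.3-cent rate does not pick up binary floating-point rounding error.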
To use your own API key, set the vendor_multimodal_api_key parameter to your key value.
Python:
from llama_parse import LlamaParse

parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    vendor_multimodal_api_key="sk-proj-xxxxxx",
)
API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'vendor_multimodal_model_name="openai-gpt4o"' \
  --form 'vendor_multimodal_api_key="sk-proj-xxxxxx"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
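Rather than hardcoding the vendor key in source as in the example, you may prefer to read it from an environment variable. This is a sketch: byok_parser_kwargs is a hypothetical helper, and OPENAI_API_KEY is only an assumed default variable name; it builds the same keyword arguments shown in the Python example.

```python
import os
from typing import Any, Dict


def byok_parser_kwargs(model_name: str, key_env_var: str = "OPENAI_API_KEY") -> Dict[str, Any]:
    # Hypothetical helper; key_env_var is an assumed name, use whichever variable holds your key.
    api_key = os.environ.get(key_env_var)
    if not api_key:
        raise RuntimeError(f"Set {key_env_var} to your vendor API key")
    return {
        "use_vendor_multimodal_model": True,
        "vendor_multimodal_model_name": model_name,
        "vendor_multimodal_api_key": api_key,
    }


# Usage (requires the llama_parse package):
# from llama_parse import LlamaParse
# parser = LlamaParse(**byok_parser_kwargs("openai-gpt4o"))
```

Failing early with a clear error avoids sending a request with an empty key.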
Custom Azure Model
You can also use your own Azure OpenAI model deployment by setting the following parameters:
Python:
from llama_parse import LlamaParse

parser = LlamaParse(
    use_vendor_multimodal_model=True,
    azure_openai_deployment_name="llamaparse-gpt-4o",
    azure_openai_endpoint="https://<org>.openai.azure.com/openai/deployments/<dep>/chat/completions?api-version=<ver>",
    azure_openai_api_version="2024-02-15-preview",
    azure_openai_key="xxx",
)
API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'azure_openai_deployment_name="llamaparse-gpt-4o"' \
  --form 'azure_openai_endpoint="https://<org>.openai.azure.com/openai/deployments/<dep>/chat/completions?api-version=<ver>"' \
  --form 'azure_openai_api_version="2024-02-15-preview"' \
  --form 'azure_openai_key="xxx"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
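The azure_openai_endpoint value follows the template in the example, with placeholders for your organization, deployment, and API version. The sketch below fills that template in; azure_openai_endpoint here is a hypothetical helper, "my-org" is a made-up organization name, and it assumes the <dep> segment is the same deployment you pass as azure_openai_deployment_name.

```python
from urllib.parse import quote, urlencode


def azure_openai_endpoint(org: str, deployment: str, api_version: str) -> str:
    # Fills https://<org>.openai.azure.com/openai/deployments/<dep>/chat/completions?api-version=<ver>
    for name, value in (("org", org), ("deployment", deployment), ("api_version", api_version)):
        if not value:
            raise ValueError(f"{name} must be non-empty")
    query = urlencode({"api-version": api_version})
    return (
        f"https://{org}.openai.azure.com/openai/deployments/"
        f"{quote(deployment, safe='')}/chat/completions?{query}"
    )


endpoint = azure_openai_endpoint("my-org", "llamaparse-gpt-4o", "2024-02-15-preview")
```

Passing the same version string to both the endpoint and azure_openai_api_version keeps the two parameters from drifting apart.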
GPT-4o mode (Deprecated)
If you set gpt4o_mode to True, LlamaParse uses OpenAI GPT-4o to do the document reconstruction. This still works, but we recommend setting use_vendor_multimodal_model to True and vendor_multimodal_model_name to openai-gpt4o instead.
The gpt4o_api_key parameter still works, but we recommend using the vendor_multimodal_api_key parameter instead.
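If existing code still passes the deprecated parameters, the replacements described above can be applied mechanically. This is a sketch: migrate_gpt4o_params is a hypothetical helper that works on a plain dictionary of LlamaParse keyword arguments, and it follows only the mapping stated in this section.

```python
from typing import Any, Dict


def migrate_gpt4o_params(params: Dict[str, Any]) -> Dict[str, Any]:
    # Returns a copy with the deprecated parameters swapped for the recommended ones.
    new = dict(params)
    if new.pop("gpt4o_mode", False):
        # gpt4o_mode=True -> use_vendor_multimodal_model=True with the openai-gpt4o model.
        new["use_vendor_multimodal_model"] = True
        new["vendor_multimodal_model_name"] = "openai-gpt4o"
    key = new.pop("gpt4o_api_key", None)
    if key is not None:
        # gpt4o_api_key -> vendor_multimodal_api_key.
        new["vendor_multimodal_api_key"] = key
    return new


old = {"gpt4o_mode": True, "gpt4o_api_key": "sk-proj-xxxxxx"}
new = migrate_gpt4o_params(old)
# new == {"use_vendor_multimodal_model": True,
#         "vendor_multimodal_model_name": "openai-gpt4o",
#         "vendor_multimodal_api_key": "sk-proj-xxxxxx"}
```

The input dictionary is copied rather than modified, so the original settings remain available for comparison.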