Multimodal Parsing
You can use a vendor multimodal model to handle document extraction. This is more expensive than regular parsing, but can produce better results for some documents.
Supported models are:

| Model | Model string | Price |
|---|---|---|
| OpenAI GPT-4o (default) | `openai-gpt4o` | 10 credits per page (3c/page) |
| OpenAI GPT-4o Mini | `openai-gpt-4o-mini` | 5 credits per page (1.5c/page) |
| Anthropic Sonnet 3.5 | `anthropic-sonnet-3.5` | 20 credits per page (6c/page) |
| Gemini 1.5 Flash | `gemini-1.5-flash` | 5 credits per page (1.5c/page) |
| Gemini 1.5 Pro | `gemini-1.5-pro` | 10 credits per page (3c/page) |
| Custom Azure Model | `custom-azure-model` | N/A |
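Since pricing is flat per page, the total credit cost of a job is simply the per-page rate times the page count. A minimal sketch (the rates mirror the table above; the helper itself is hypothetical, not part of the LlamaParse API):

```python
# Hypothetical helper: estimate credit cost from the pricing table above.
CREDITS_PER_PAGE = {
    "openai-gpt4o": 10,
    "openai-gpt-4o-mini": 5,
    "anthropic-sonnet-3.5": 20,
    "gemini-1.5-flash": 5,
    "gemini-1.5-pro": 10,
}

def estimate_credits(model: str, num_pages: int) -> int:
    """Return the total credits a job of num_pages pages would cost."""
    return CREDITS_PER_PAGE[model] * num_pages

# A 100-page document parsed with Sonnet 3.5 costs 100 * 20 = 2000 credits.
print(estimate_credits("anthropic-sonnet-3.5", 100))
```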
When using this mode, LlamaParse's regular parsing is bypassed and the following process is used instead:

- A screenshot of every page of your document is taken.
- Each page screenshot is sent to the multimodal model with instructions to extract the content as markdown.
- The resulting markdown of each page is consolidated into the final result.
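The steps above can be sketched as a simple loop. Note this is only an illustration of the flow, not LlamaParse's actual implementation; `model_to_markdown` stands in for the call to the multimodal model:

```python
from typing import Callable, List

def parse_multimodal(
    pages: List[bytes],
    model_to_markdown: Callable[[bytes], str],
) -> str:
    """Illustrative flow: each page screenshot goes to the multimodal model,
    and the per-page markdown is consolidated into one document."""
    page_markdown = [model_to_markdown(image) for image in pages]
    # Join pages with a separator so page boundaries remain visible.
    return "\n\n---\n\n".join(page_markdown)

# Stub model call for demonstration only.
fake_model = lambda img: f"# Page of {len(img)} bytes"
print(parse_multimodal([b"abc", b"defg"], fake_model))
```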
Using multimodal mode
To use multimodal mode, set `use_vendor_multimodal_model` to `True`. You can then select which model to use by setting `vendor_multimodal_model_name` to the model you want to target (e.g. `anthropic-sonnet-3.5`).
In Python:

```python
parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5",
)
```

Using the API:
```bash
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'vendor_multimodal_model_name="anthropic-sonnet-3.5"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
```
Bring your own LLM key (Optional)
When using multimodal mode, you can supply your own vendor API key to parse the document. If you choose to do so, LlamaParse will only charge you 1 credit (0.3c) per page.

Using your own API key will incur charges from your model provider, and could lead to failed pages/documents if your usage limits are not high enough.
To use your own API key, set the parameter `vendor_multimodal_api_key` to your own key value.
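Rather than hard-coding the key in source, you might read it from the environment before passing it to `vendor_multimodal_api_key`. A small sketch; the variable name `OPENAI_API_KEY` and the helper are illustrative, not required by LlamaParse:

```python
import os

def resolve_vendor_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Fetch the vendor key from the environment, failing early if unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before parsing")
    return key

os.environ["OPENAI_API_KEY"] = "sk-proj-xxxxxx"  # demo value only
print(resolve_vendor_key())
```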
In Python:

```python
parser = LlamaParse(
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    vendor_multimodal_api_key="sk-proj-xxxxxx",
)
```

Using the API:
```bash
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'vendor_multimodal_model_name="openai-gpt4o"' \
  --form 'vendor_multimodal_api_key="sk-proj-xxxxxx"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
```
Custom Azure Model
You can also use your own Azure OpenAI model deployment by supplying the following parameters:
In Python:

```python
parser = LlamaParse(
    use_vendor_multimodal_model=True,
    azure_openai_deployment_name="llamaparse-gpt-4o",
    azure_openai_endpoint="https://<org>.openai.azure.com/openai/deployments/<dep>/chat/completions?api-version=<ver>",
    azure_openai_api_version="2024-02-15-preview",
    azure_openai_key="xxx",
)
```

Using the API:
```bash
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'use_vendor_multimodal_model="true"' \
  --form 'azure_openai_deployment_name="llamaparse-gpt-4o"' \
  --form 'azure_openai_endpoint="https://<org>.openai.azure.com/openai/deployments/<dep>/chat/completions?api-version=<ver>"' \
  --form 'azure_openai_api_version="2024-02-15-preview"' \
  --form 'azure_openai_key="xxx"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
```
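The Azure endpoint above follows a fixed URL shape, so it can be assembled from the resource name, deployment name, and API version. A hypothetical helper (the placeholder values match the snippet above; this function is not part of the LlamaParse API):

```python
def azure_chat_endpoint(resource: str, deployment: str, api_version: str) -> str:
    """Build the chat-completions endpoint URL for an Azure OpenAI deployment."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

print(azure_chat_endpoint("myorg", "llamaparse-gpt-4o", "2024-02-15-preview"))
```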
[Deprecated] GPT-4o mode
By setting `gpt4o_mode` to `True`, LlamaParse will use OpenAI GPT-4o for document reconstruction. This still works, but we recommend setting `use_vendor_multimodal_model` to `True` and `vendor_multimodal_model_name` to `openai-gpt4o` instead.
The parameter `gpt4o_api_key` still works, but we recommend using the parameter `vendor_multimodal_api_key` instead.