Skip to main content

Retrieving Image Content

In addition to retrieving text content from your indexed documents, LlamaCloud also supports retrieving images sourced from these documents.

This is particularly useful for applications that require visual context, such as presentations, reports, or any other document type that includes images.

Image sources

Images are extracted from Files attached to an index in the following ways:

  • Page Screenshots
    • A screenshot of each page in a file is taken and stored as an image.
    • This screenshot image data can be downloaded from the Page Screenshots API
  • Page Figures
    • If a file contains figures embedded in its pages, these figures are extracted and stored as images.
    • This figure image data can be downloaded from the Page Figures API
    • Important Note: Please note that Page Figure extraction is currently not supported for self-hosted (aka BYOC) deployments of LlamaCloud. We will be adding support for this environment in the near future!

Enabling Image Indexing

To enable image retrieval in your index, you need to have the correct parsing parameters setup on your index.

Setting up via LlamaCloud UI

When creating a new index or editing an existing one, simply ensure you've toggled Enable Multi-modal retrieval under the Multi-Modal Indexing section.

Enable Multi-modal retrieval

Setting up via API

To enable image retrieval programmatically, you need to toggle the correct flags under the llama_parse_parameters on your index.

For enabling Page Screenshot indexing & retrieval, set the llama_parse_parameters.take_screenshot flag to true. For enabling Page Figure indexing & retrieval, set the llama_parse_parameters.extract_layout flag to true.

Here is an example of setting this up using the LlamaCloudIndex class:

from llama_cloud import LlamaParseParameters
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex.create_index(
name="my_image_index",
project_name="Default",
api_key="llx-...",
llama_parse_parameters=LlamaParseParameters(
take_screenshot=True,
extract_layout=True,
),
)

Retrieving Images

Once your index is set up to support image indexing, you can retrieve images using the retriever interface on your LlamaCloudIndex. Image retrieval works similarly to text retrieval, but you must specify which types of images you want to retrieve: page screenshots and/or page figures.

Retrieving Page Screenshots

To retrieve page screenshots (full-page images for each page in your indexed files), use the as_retriever method with the retrieve_page_screenshot_nodes=True parameter:

from llama_cloud import LlamaParseParameters
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# Assume you have already created and ingested files into your index
index = LlamaCloudIndex(
name="my_image_index",
project_name="Default",
api_key="llx-...",
llama_parse_parameters=LlamaParseParameters(
take_screenshot=True,
),
)

# Wait for ingestion to complete if needed
index.wait_for_completion()

# Get a retriever that will return page screenshot images
retriever = index.as_retriever(retrieve_page_screenshot_nodes=True)

# Retrieve images relevant to your query
nodes = retriever.retrieve("What color is the company's logo?")

# Filter for image nodes (optional)
from llama_index.core.schema import ImageNode
image_nodes = [n.node for n in nodes if isinstance(n.node, ImageNode)]

# Each ImageNode contains a base64-encoded image and metadata
for img_node in image_nodes:
print(img_node.metadata) # e.g., file_id, page_index, file_name
# img_node.image is a base64-encoded image string

Retrieving Page Figures

To retrieve figures (e.g., charts, diagrams, images) extracted from your documents, use the retrieve_page_figure_nodes=True parameter:

from llama_cloud import LlamaParseParameters
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# Assume you have already created and ingested files into your index
index = LlamaCloudIndex(
name="my_image_index",
project_name="Default",
api_key="llx-...",
llama_parse_parameters=LlamaParseParameters(
extract_layout=True,
),
)

# Wait for ingestion to complete if needed
index.wait_for_completion()

# Get a retriever that will return page figure images
retriever = index.as_retriever(retrieve_page_figure_nodes=True)

nodes = retriever.retrieve("Describe the chart showing future growth projections")

image_nodes = [n.node for n in nodes if isinstance(n.node, ImageNode)]
for img_node in image_nodes:
print(img_node.metadata) # includes file_id, page_index, figure_name, file_name

Of course, to retrieve both page screenshots and figures, you can set both retrieve_page_screenshot_nodes=True & retrieve_page_figure_nodes=True.

Just ensure you've also set take_screenshot=True and extract_layout=True in your index's llama_parse_parameters to enable the necessary image extraction.

Conclusion

That's it! You can now retrieve images (screenshots and figures) from your indexed documents using LlamaCloud.

For more advanced use cases, such as composite retrieval or async usage, refer to the framework integration guide and the API reference.