Selecting what to parse

By default, LlamaParse extracts all the visible content of every page of a document.

Parsing only some pages

You can specify the pages you want to parse by passing specific page numbers as a comma-separated list in the target_pages argument. Pages are numbered starting at 0.

In Python:
parser = LlamaParse(
  target_pages="0,1,2,22,33"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'target_pages="0,1,2,22,33"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'

Range syntax is also supported: target_pages=0-2,6-22,33.
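
When the page list is generated programmatically, a small helper can collapse page indices into the comma-and-range form described above. This is only a minimal sketch: to_target_pages is a hypothetical helper name, not part of LlamaParse, and it assumes its input is already 0-based.

```python
def to_target_pages(pages):
    """Collapse 0-based page indices into a target_pages string.

    Hypothetical helper, not part of LlamaParse. Consecutive runs are
    written with the range syntax; isolated pages are written as-is.
    """
    ordered = sorted(set(pages))
    if any(p < 0 for p in ordered):
        raise ValueError("page indices start at 0")

    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current run while the next index is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


print(to_target_pages([0, 1, 2, 22, 33]))  # prints 0-2,22,33
```

The resulting string can be passed as the target_pages argument. If the page numbers come from a viewer that counts from 1, subtract 1 from each before calling the helper.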

Parsing only a targeted area of a document

You can specify an area of a document that you want to parse. This can be helpful to remove headers and footers.

To do so, provide the bounding box margins, expressed as ratios of the page size between 0 and 1, in bbox_left, bbox_right, bbox_top and bbox_bottom.

Examples:

  • To not parse the top 10% of a document: bbox_top=0.1
  • To not parse the top 10% and bottom 20% of a document: bbox_top=0.1 and bbox_bottom=0.2
In Python:
parser = LlamaParse(
  bbox_left=0.2
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'bbox_left=0.2' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
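
If the header and footer sizes are known in absolute units (for example, PDF points), they have to be converted to ratios first. The sketch below shows one way to do that. margin_ratios is a hypothetical helper, and it assumes that left and right margins are ratios of the page width while top and bottom margins are ratios of the page height.

```python
def margin_ratios(page_width, page_height, left=0.0, right=0.0, top=0.0, bottom=0.0):
    """Convert margins given in page units into bbox_* ratios (0 to 1).

    Hypothetical helper, not part of LlamaParse. Assumes horizontal
    margins are relative to the page width and vertical margins to the
    page height.
    """
    ratios = {
        "bbox_left": left / page_width,
        "bbox_right": right / page_width,
        "bbox_top": top / page_height,
        "bbox_bottom": bottom / page_height,
    }
    for name, value in ratios.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Opposite margins must leave some area of the page to parse.
    if ratios["bbox_left"] + ratios["bbox_right"] >= 1:
        raise ValueError("left and right margins cover the whole page width")
    if ratios["bbox_top"] + ratios["bbox_bottom"] >= 1:
        raise ValueError("top and bottom margins cover the whole page height")
    return ratios


# Example: a 612 x 792 point page with a 72-point header and footer.
bbox_kwargs = margin_ratios(612, 792, top=72, bottom=72)
print(bbox_kwargs)
```

The returned dictionary uses the same names as the bbox_left, bbox_right, bbox_top and bbox_bottom arguments, so it can be unpacked into the parser call, for example LlamaParse(**bbox_kwargs).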

bounding_box (legacy)

A deprecated alternative is still supported: pass the bounding box margins as a comma-separated string, in clockwise order starting from the top (top, right, bottom, left), in the bounding_box argument. The margins are expressed as ratios of the page size, between 0 and 1.

Examples:

  • To not parse the top 10% of a document: bounding_box="0.1,0,0,0"
  • To not parse the top 10% and bottom 20% of a document: bounding_box="0.1,0,0.2,0"
In Python:
parser = LlamaParse(
  bounding_box="0.1,0.4,0.2,0.3"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'bounding_box="0.1,0.4,0.2,0.3"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'

Limiting the number of pages to parse

To limit the number of pages that are parsed, use the max_pages parameter. LlamaParse will stop parsing the document after the specified number of pages.

In Python:
parser = LlamaParse(
  max_pages=25
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'max_pages=25' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'