Selecting what to parse

By default, LlamaParse extracts all the visible content of every page of a document.

Parsing only some pages

You can specify the pages you want to parse by passing specific page numbers as a comma-separated list in the target_pages argument. Pages are numbered starting at 0.

In Python:
from llama_parse import LlamaParse

# Parse only pages 0, 1, 2, 22 and 33 (0-based indices)
parser = LlamaParse(
  target_pages="0,1,2,22,33"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'target_pages="0,1,2,22,33"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
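
Because target_pages expects 0-based indices while most PDF viewers show 1-based page numbers, it can be handy to build the string programmatically. The sketch below is only an illustration: to_target_pages is a hypothetical helper, not part of LlamaParse, and it assumes the input is a list of 1-based page numbers.

```python
from typing import Iterable


def to_target_pages(page_numbers: Iterable[int]) -> str:
    """Build a target_pages string from 1-based page numbers.

    Hypothetical helper for illustration; not part of the LlamaParse API.
    """
    indices = set()
    for number in page_numbers:
        if number < 1:
            raise ValueError(f"page numbers start at 1, got {number}")
        indices.add(number - 1)  # LlamaParse numbers pages from 0
    # Sort and de-duplicate so the resulting string is stable
    return ",".join(str(i) for i in sorted(indices))


print(to_target_pages([1, 2, 3, 23, 34]))  # 0,1,2,22,33
```

The resulting string can then be passed as the target_pages argument, either in Python or in the form field of the API request.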

Parsing only a targeted area of a document

You can specify an area of a document that you want to parse. This can be helpful to remove headers and footers.

To do so, provide the bounding box margins in clockwise order starting from the top (top, right, bottom, left) as a comma-separated string in the bounding_box argument. Each margin is expressed as a fraction of the page size, between 0 and 1.

Examples:

  • To skip the top 10% of each page: bounding_box="0.1,0,0,0"
  • To skip the top 10% and bottom 20% of each page: bounding_box="0.1,0,0.2,0"
In Python:
from llama_parse import LlamaParse

# Margins: top 10%, right 40%, bottom 20%, left 30%
parser = LlamaParse(
  bounding_box="0.1,0.4,0.2,0.3"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'bounding_box="0.1,0.4,0.2,0.3"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
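
The clockwise ordering (top, right, bottom, left) is easy to get wrong when writing the string by hand. A minimal sketch of a builder with named margins follows; to_bounding_box is a hypothetical helper, not part of LlamaParse, and the check that opposite margins leave some area to parse is a sanity check added here, not a documented API rule.

```python
def to_bounding_box(top: float = 0.0, right: float = 0.0,
                    bottom: float = 0.0, left: float = 0.0) -> str:
    """Build a bounding_box string from named margins.

    Hypothetical helper for illustration; not part of the LlamaParse API.
    Each margin is a fraction of the page size between 0 and 1.
    """
    margins = (top, right, bottom, left)  # clockwise, starting from the top
    for margin in margins:
        if not 0 <= margin <= 1:
            raise ValueError(f"margin must be between 0 and 1, got {margin}")
    # Assumption: opposite margins covering the whole page leave nothing to parse
    if top + bottom >= 1 or left + right >= 1:
        raise ValueError("margins leave no area to parse")
    # The 'g' format drops trailing zeros, so 0.0 becomes "0"
    return ",".join(f"{margin:g}" for margin in margins)


print(to_bounding_box(top=0.1, bottom=0.2))  # 0.1,0,0.2,0
```

Naming the margins keeps the intent readable at the call site, and the output string can be passed directly as the bounding_box argument.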