Selecting what to parse
By default, LlamaParse extracts all the visible content of every page of a document.
Parsing only some pages
You can specify the pages you want to parse by passing their page numbers as a comma-separated list in the target_pages argument. Pages are numbered starting at 0.
parser = LlamaParse(
    target_pages="0,1,2,22,33"
)
Using the API:
curl -X 'POST' \
'https://api.cloud.llamaindex.ai/api/parsing/upload' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
--form 'target_pages="0,1,2,22,33"' \
-F 'file=@/path/to/your/file.pdf;type=application/pdf'
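Because target_pages is 0-based, it is easy to be off by one when working from the page numbers shown in a PDF viewer, which usually start at 1. The following is a minimal sketch of a conversion helper; the function name to_target_pages is hypothetical and not part of LlamaParse, and sorting and de-duplicating the pages is a choice made here for tidiness, not a documented requirement.

```python
from typing import Iterable


def to_target_pages(page_numbers: Iterable[int]) -> str:
    """Convert 1-based page numbers into a 0-based, comma-separated string.

    Hypothetical helper (not part of LlamaParse). Duplicates are dropped
    and the result is sorted; both are assumptions made for readability.
    """
    indices = set()
    for number in page_numbers:
        if number < 1:
            raise ValueError(f"page numbers start at 1, got {number}")
        indices.add(number - 1)  # shift to the 0-based numbering
    return ",".join(str(index) for index in sorted(indices))


# Pages 1, 2, 3, 23 and 34 as shown in a viewer.
target_pages = to_target_pages([1, 2, 3, 23, 34])
print(target_pages)  # 0,1,2,22,33
```

The resulting string can then be passed as the target_pages argument, as in the parser example above.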
Parsing only a targeted area of a document
You can specify an area of a document that you want to parse. This can be helpful to remove headers and footers.
To do so, provide the bounding box margins in clockwise order from the top (top, right, bottom, left) as a comma-separated string in the bounding_box argument. Each margin is expressed as a fraction of the page size, between 0 and 1.
Examples:
- To not parse the top 10% of a document:
bounding_box="0.1,0,0,0"
- To not parse the top 10% and bottom 20% of a document:
bounding_box="0.1,0,0.2,0"
parser = LlamaParse(
    bounding_box="0.1,0.4,0.2,0.3"
)
Using the API:
curl -X 'POST' \
'https://api.cloud.llamaindex.ai/api/parsing/upload' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
--form 'bounding_box="0.1,0.4,0.2,0.3"' \
-F 'file=@/path/to/your/file.pdf;type=application/pdf'
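Since the order of the four margins is easy to mix up, a small helper with named arguments can build the bounding_box string. This is only a sketch: to_bounding_box is a hypothetical name, not part of LlamaParse, and the check that opposite margins leave some area to parse is an added sanity check, not documented behavior.

```python
def to_bounding_box(top: float = 0.0, right: float = 0.0,
                    bottom: float = 0.0, left: float = 0.0) -> str:
    """Build a bounding_box string in clockwise order starting from the top.

    Hypothetical helper (not part of LlamaParse). Each margin is a
    fraction of the page size between 0 and 1.
    """
    margins = (top, right, bottom, left)
    for margin in margins:
        if not 0 <= margin <= 1:
            raise ValueError(f"margin must be between 0 and 1, got {margin}")
    # Assumption: opposite margins that cover the whole page are a mistake.
    if top + bottom >= 1 or left + right >= 1:
        raise ValueError("margins leave no area to parse")
    # The 'g' format drops trailing zeros, so 0.0 is written as 0.
    return ",".join(f"{margin:g}" for margin in margins)


print(to_bounding_box(top=0.1))              # 0.1,0,0,0
print(to_bounding_box(top=0.1, bottom=0.2))  # 0.1,0,0.2,0
```

The two printed values match the examples given above for skipping the top 10% of a page, and the top 10% plus bottom 20%.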