Layout Extraction
LlamaParse supports layout extraction. This is useful when you want to reconstruct the original look of a document by placing each element back at its original position.
If you set extract_layout=True on the API and request JSON output, the response will include bounding boxes for the following element types:
- tables
- figures
- titles
- text
- lists
The layout data is returned in the JSON output as a layout property attached to each page.
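As a rough illustration of reading this structure, the sketch below walks a parsed JSON result and yields every layout entry together with the index of the page it belongs to. The top-level "pages" key, the function name iter_layout_entries, and the sample data are assumptions made for this example and are not taken from the API reference; only the per-page layout property comes from the description above.

```python
def iter_layout_entries(result):
    """Yield (page_index, layout_entry) pairs from a parsed JSON result.

    Assumption: the result is a dict with a top-level "pages" list.
    Each page may carry a "layout" list; pages without one are skipped.
    """
    for page_index, page in enumerate(result.get("pages", [])):
        for entry in page.get("layout", []):
            yield page_index, entry


# Hypothetical, hand-written sample (not real API output).
sample_result = {
    "pages": [
        {"layout": [{"label": "title"}, {"label": "text"}]},
        {"layout": [{"label": "table"}]},
        {},  # a page with no layout data
    ]
}

for page_index, entry in iter_layout_entries(sample_result):
    print(page_index, entry["label"])
```

Using a generator keeps the traversal lazy, which may matter for long documents with many layout entries per page.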
Each layout entry contains:
- A bbox expressed as fractions of the page width and height (each value is a number between 0 and 1)
- An image name corresponding to an image of the element. This can be retrieved with the image API just like other images.
- A confidence score (from 0 to 1, where 1 means good)
- A label indicating the type of element
- isLikelyNoise, set to true if our NMS detects that the element is likely to be noise.
Example
{
  "bbox": {
    "x": 0.176,
    "y": 0.497,
    "w": 0.651,
    "h": 0.112
  },
  "image": "page_1_text_1.jpg",
  "confidence": 0.996,
  "label": "text",
  "isLikelyNoise": false
}
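To place an element back on a rendered page, the fractional bbox has to be scaled by the page size. The following is a minimal sketch of that conversion, plus a simple filter on confidence and isLikelyNoise. It assumes that x and y refer to the top-left corner of the box, that the page is rendered at 1000 by 1400 pixels, and that 0.5 is a sensible confidence cutoff; none of these are stated by the API, and the helper names bbox_to_pixels and keep_entry are hypothetical.

```python
def bbox_to_pixels(bbox, page_width_px, page_height_px):
    """Scale a fractional bbox to integer pixel values.

    Assumption: x and y locate the top-left corner of the box.
    Returns (left, top, width, height) in pixels.
    """
    left = round(bbox["x"] * page_width_px)
    top = round(bbox["y"] * page_height_px)
    width = round(bbox["w"] * page_width_px)
    height = round(bbox["h"] * page_height_px)
    return left, top, width, height


def keep_entry(entry, min_confidence=0.5):
    """Keep entries that are not flagged as noise and meet the cutoff.

    The 0.5 default is an arbitrary choice for this sketch.
    """
    if entry.get("isLikelyNoise", False):
        return False
    return entry.get("confidence", 0.0) >= min_confidence


# The layout entry from the example above.
entry = {
    "bbox": {"x": 0.176, "y": 0.497, "w": 0.651, "h": 0.112},
    "image": "page_1_text_1.jpg",
    "confidence": 0.996,
    "label": "text",
    "isLikelyNoise": False,
}

if keep_entry(entry):
    # Assumed page size in pixels; use the real rendered size in practice.
    print(entry["label"], bbox_to_pixels(entry["bbox"], 1000, 1400))
```

Because the bbox values are fractions, the same entry can be mapped onto any rendering resolution by changing only the page dimensions passed in.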
Cost
Layout extraction costs 1 extra credit per page.