Layout Extraction

LlamaParse supports layout extraction. This is useful if you want to reconstruct the original look of the document by placing elements back in their original positions.

If you set extract_layout=True on the API and request JSON output, the result will include bounding boxes for the following element types:

  • tables
  • figures
  • titles
  • text
  • lists

The layout data is returned in the JSON data, as a layout property attached to each page.

Each layout entry contains:

  • A bbox expressed as a fraction of page width and height (a number between 0 and 1)
  • An image name corresponding to an image of the element. This can be retrieved with the image API just like other images.
  • A confidence score (from 0 to 1, where 1 means high confidence)
  • A label indicating the type of element
  • isLikelyNoise, set to true if our NMS (non-maximum suppression) step detects that the element is likely to be noise.
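
As a rough illustration of how these fields might be used together, the Python sketch below keeps only the layout entries of a page that are not flagged as noise and that meet a confidence floor. The helper name keep_layout_entries, the 0.5 threshold, and the sample page dictionary are assumptions made for this example; only the layout, confidence, label, and isLikelyNoise keys come from the description above.

```python
def keep_layout_entries(page, min_confidence=0.5):
    """Return the page's layout entries that are not flagged as noise
    and whose confidence is at least min_confidence.

    `page` is assumed to be one page object from the JSON output,
    carrying a "layout" list as described above.
    """
    kept = []
    for entry in page.get("layout", []):
        if entry.get("isLikelyNoise", False):
            continue  # skip entries flagged as likely noise
        if entry.get("confidence", 0.0) < min_confidence:
            continue  # skip low-confidence detections
        kept.append(entry)
    return kept


# Illustrative data only, not real LlamaParse output.
sample_page = {
    "layout": [
        {"label": "text", "confidence": 0.996, "isLikelyNoise": False},
        {"label": "figure", "confidence": 0.31, "isLikelyNoise": False},
        {"label": "title", "confidence": 0.92, "isLikelyNoise": True},
    ]
}

kept_labels = [entry["label"] for entry in keep_layout_entries(sample_page)]
print(kept_labels)
```

The threshold is a tuning choice: a higher value drops more uncertain elements, at the cost of possibly discarding real content.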

Example

{
  "bbox": {
    "x": 0.176,
    "y": 0.497,
    "w": 0.651,
    "h": 0.112
  },
  "image": "page_1_text_1.jpg",
  "confidence": 0.996,
  "label": "text",
  "isLikelyNoise": false
}
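
Because the bbox values are fractions of the page size, they need to be scaled before an element can be placed on a rendered page. The sketch below shows one way this might be done in Python. It assumes that x and y give the top-left corner of the box and that the origin is the top-left of the page, which this page does not state explicitly; the function name bbox_to_pixels and the 1000 x 1000 pixel page size are also illustrative.

```python
def bbox_to_pixels(bbox, page_width_px, page_height_px):
    """Convert a fractional bbox into (left, top, right, bottom) pixels.

    Assumption: (x, y) is the top-left corner of the box, measured from
    the top-left of the page, and w/h are the box width and height.
    """
    left = round(bbox["x"] * page_width_px)
    top = round(bbox["y"] * page_height_px)
    right = round((bbox["x"] + bbox["w"]) * page_width_px)
    bottom = round((bbox["y"] + bbox["h"]) * page_height_px)
    return left, top, right, bottom


# The bbox from the example entry, on a hypothetical 1000 x 1000 px page.
example_bbox = {"x": 0.176, "y": 0.497, "w": 0.651, "h": 0.112}
pixel_box = bbox_to_pixels(example_bbox, 1000, 1000)
print(pixel_box)
```

The (left, top, right, bottom) ordering is the box convention many imaging libraries accept for cropping or drawing rectangles, so the result can usually be passed along unchanged.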

Cost

Layout extraction costs 1 extra credit per page.