Parsing options

Result type

By default, LlamaParse returns your results as parsed text. The other available options are markdown, which formats the output as clean Markdown, and json, which returns a data structure representing the parsed object.

In Python:
from llama_parse import LlamaParse

parser = LlamaParse(
  result_type="markdown"  # "text" (the default), "markdown", or "json"
)
Using the API (fetching the result of a finished job is a GET request and sends no form data):
curl -X 'GET' \
  'https://api.cloud.llamaindex.ai/api/parsing/job/<job_id>/result/markdown'  \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"
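
The result type is selected by the last path segment of the result URL. As a minimal sketch, the hypothetical helper below (`result_url` is not part of any LlamaParse package) builds that URL for a given job. It assumes the text and json results follow the same path pattern as the markdown example above, which this page does not confirm.

```python
# Base path taken from the curl example above.
BASE_URL = "https://api.cloud.llamaindex.ai/api/parsing"

# The three result types described in this section.
RESULT_TYPES = ("text", "markdown", "json")


def result_url(job_id: str, result_type: str = "text") -> str:
    """Return the URL for a job's result in the requested format (hypothetical helper)."""
    if result_type not in RESULT_TYPES:
        raise ValueError(
            f"unknown result_type {result_type!r}; expected one of {RESULT_TYPES}"
        )
    # Assumption: "text" and "json" use the same path pattern as "markdown".
    return f"{BASE_URL}/job/{job_id}/result/{result_type}"
```

Validating the value up front surfaces a typo such as "markdwon" locally instead of as a failed request.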

Set language

LlamaParse uses OCR to extract text from images. Our OCR supports a long list of languages, and you can tell LlamaParse which language(s) to parse for by setting this option. To specify multiple languages, separate them with commas. This option only affects text extracted from images.

In Python:
parser = LlamaParse(
  language="fr"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'language="fr"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
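
When the language codes come from a list, the comma-separated value described above can be built rather than typed by hand. The helper below is a hypothetical sketch (`language_option` is an illustrative name, not a LlamaParse function), and it assumes the codes are joined with no spaces around the commas.

```python
def language_option(codes):
    """Join language codes into a single comma-separated string (hypothetical helper)."""
    # Drop empty entries and stray whitespace before joining.
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return ",".join(cleaned)
```

For example, `language_option(["fr", "en"])` returns `"fr,en"`, which could then be supplied as the language value in either the Python or the API form.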

Parsing instructions

LlamaParse can use LLMs under the hood, so you can give it natural-language instructions about what it is parsing and how to parse it. This is a powerful feature: the instruction can describe the document type and name the fields you want extracted.

In Python:
parser = LlamaParse(
  parsing_instruction="You are parsing a receipt from a restaurant. Please extract the total amount paid and the tip."
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'parsing_instruction="string"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'

Skip diagonal text

By default, LlamaParse will attempt to parse text that is diagonal on the page. This can be useful for some documents, but it can also lead to errors. If you're seeing strange results, try setting skip_diagonal_text to True.

In Python:
parser = LlamaParse(
  skip_diagonal_text=True
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'skip_diagonal_text="true"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'

Do not unroll columns

By default, LlamaParse will attempt to unroll columns (putting them after each other in reading order). Setting do_not_unroll_columns to True will prevent LlamaParse from doing so.

In Python:
parser = LlamaParse(
  do_not_unroll_columns=True
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'do_not_unroll_columns="true"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
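
The Python examples pass booleans as True, while the curl examples send the string "true" as a form field. As a rough sketch of that mapping, the hypothetical helper below (`to_form_fields` is an illustrative name) converts a dictionary of options into form-field strings. Sending "false" for False is an assumption made by symmetry; this page only shows "true".

```python
def to_form_fields(options):
    """Convert Python option values into strings for multipart form fields (hypothetical helper)."""
    fields = {}
    for name, value in options.items():
        if value is None:
            continue  # leave unset options out so the service default applies
        if isinstance(value, bool):
            # The curl examples send lowercase "true"; "false" is assumed by symmetry.
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields
```

Calling `to_form_fields({"skip_diagonal_text": True, "language": "fr"})` yields `{"skip_diagonal_text": "true", "language": "fr"}`, matching the `--form` values used in the curl examples.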

Page separator

By default, LlamaParse separates pages in the markdown and text output with \n---\n. You can change this separator by setting page_separator to the desired string.

In Python:
parser = LlamaParse(
  page_separator="\n=================\n"
)
Using the API:
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/parsing/upload'  \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --form 'page_separator="\n=================\n"' \
  -F 'file=@/path/to/your/file.pdf;type=application/pdf'
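
Because pages are joined with a known separator, the text or markdown output can be split back into individual pages. The sketch below uses a hypothetical helper, `split_pages`, and a made-up sample string. It assumes the separator appears in the output exactly as configured.

```python
DEFAULT_PAGE_SEPARATOR = "\n---\n"  # the default separator stated above


def split_pages(text, separator=DEFAULT_PAGE_SEPARATOR):
    """Split parsed text or markdown output into one string per page (hypothetical helper)."""
    return text.split(separator)


# Made-up sample output with two pages joined by the default separator.
sample = "Page one text\n---\nPage two text"
pages = split_pages(sample)
```

A line containing only --- is also a thematic break in Markdown, so a document that uses one would be split mid-page by the default separator. Setting a more distinctive page_separator, such as the \n=================\n value shown above, reduces that risk.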