Using in Python
First, get an API key. We recommend putting your key in a file called .env
that looks like this:
LLAMA_CLOUD_API_KEY=llx-xxxxxx
Set up a new Python environment using the tool of your choice (we used poetry init). Then install the dependencies you'll need:
pip install llama-index-core llama-parse llama-index-readers-file python-dotenv
Now that we have our libraries and our API key available, let's create a parse.py file and parse a document. In this case, we're using this list of fun facts about Canada:
# bring in our LLAMA_CLOUD_API_KEY
from dotenv import load_dotenv
load_dotenv()
# bring in deps
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader
# set up parser
parser = LlamaParse(
    result_type="markdown"  # "markdown" and "text" are available
)
# use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(input_files=['data/canada.pdf'], file_extractor=file_extractor).load_data()
print(documents)
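As an aside, SimpleDirectoryReader isn't strictly required; you can also pass a file straight to the parser. Here's a minimal sketch (not part of parse.py), assuming the same data/canada.pdf path:
# parse the file directly with LlamaParse, skipping SimpleDirectoryReader
documents = parser.load_data("data/canada.pdf")
# each Document carries the parsed text; preview the first 500 characters
print(documents[0].text[:500])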
Now run it like any Python file:
python parse.py
This will print an object that contains the full text of the parsed document. Let's go a step further and query this document using an LLM! For this, you will need an OpenAI API key (LlamaIndex supports dozens of LLMs; we're just picking a popular one). Get an OpenAI API key and add it to your .env file:
OPENAI_API_KEY=sk-proj-xxxxxx
We'll also need to install the OpenAI LLM and embedding packages, to encode the document into an index:
pip install llama-index-llms-openai llama-index-embeddings-openai
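By default, LlamaIndex will pick up OPENAI_API_KEY from your environment and use its default OpenAI models. If you'd rather be explicit about which models are used, you can optionally configure them via Settings. This is just a sketch; gpt-4o-mini and text-embedding-3-small are example model names, not requirements:
# optional: pin the LLM and embedding model explicitly
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")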
Now, add these lines to your parse.py:
# one extra dep
from llama_index.core import VectorStoreIndex
# create an index from the parsed markdown
index = VectorStoreIndex.from_documents(documents)
# create a query engine for the index
query_engine = index.as_query_engine()
# query the engine
query = "What can you do in the Bay of Fundy?"
response = query_engine.query(query)
print(response)
Which will give us this output:
You can raft-surf the world’s highest tides at the Bay of Fundy.
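You probably don't want to re-parse and re-embed the document on every run. One common pattern is to persist the index to disk and reload it later; here's a rough sketch, assuming a local ./storage directory:
# save the index to disk so we don't have to re-parse next time
index.storage_context.persist(persist_dir="./storage")

# later, reload it instead of rebuilding
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)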
Congratulations! You’ve used industry-leading PDF parsing and are ready to integrate it into your app. You can learn more about building LlamaIndex apps in our Python documentation.