Using in Python
First, get an API key. We recommend putting your key in a file called .env
that looks like this:
LLAMA_CLOUD_API_KEY=llx-xxxxxx
Set up a new Python environment using the tool of your choice (we used poetry init). Then install the dependencies you'll need:
pip install llama-index-core llama-parse llama-index-readers-file python-dotenv
Now that we have our libraries and our API key available, let's create a parse.py file and parse a document. In this case, we're using this list of fun facts about Canada:
# bring in our LLAMA_CLOUD_API_KEY
from dotenv import load_dotenv
load_dotenv()
# bring in deps
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader
# set up parser
parser = LlamaParse(
    result_type="markdown"  # "markdown" and "text" are available
)
# use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(input_files=['data/canada.pdf'], file_extractor=file_extractor).load_data()
print(documents)
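As an aside, SimpleDirectoryReader isn't strictly required; you can also pass a file straight to the parser. Here's a minimal sketch (not part of parse.py), assuming the same data/canada.pdf path:
# parse the file directly with LlamaParse, skipping SimpleDirectoryReader
documents = parser.load_data("data/canada.pdf")
# each Document carries the parsed text; preview the first 500 characters
print(documents[0].text[:500])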
Now run it like any Python file:
python parse.py
This will print an object that contains the full text of the parsed document. Let's go a step further and query this document using an LLM! For this, you will need an OpenAI API key (LlamaIndex supports dozens of LLMs; we're just picking a popular one). Get an OpenAI API key and add it to your .env file:
OPENAI_API_KEY=sk-proj-xxxxxx
We'll also need to install the OpenAI LLM and embedding packages, to encode the document into an index:
pip install llama-index-llms-openai llama-index-embeddings-openai
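By default, LlamaIndex will pick up OPENAI_API_KEY from your environment and use its default OpenAI models. If you'd rather be explicit about which models are used, you can optionally configure them via Settings. This is just a sketch; gpt-4o-mini and text-embedding-3-small are example model names, not requirements:
# optional: pin the LLM and embedding model explicitly
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")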
Now, add these lines to your parse.py:
# one extra dep
from llama_index.core import VectorStoreIndex
# create an index from the parsed markdown
index = VectorStoreIndex.from_documents(documents)
# create a query engine for the index
query_engine = index.as_query_engine()
# query the engine
query = "What can you do in the Bay of Fundy?"
response = query_engine.query(query)
print(response)
Which will give us this output:
You can raft-surf the world’s highest tides at the Bay of Fundy.
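You probably don't want to re-parse and re-embed the document on every run. One common pattern is to persist the index to disk and reload it later; here's a rough sketch, assuming a local ./storage directory:
# save the index to disk so we don't have to re-parse next time
index.storage_context.persist(persist_dir="./storage")

# later, reload it instead of rebuilding
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)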
Congratulations! You’ve used industry-leading PDF parsing and are ready to integrate it into your app. You can learn more about building LlamaIndex apps in our Python documentation.