Using in Python
First, get an API key. We recommend putting your key in a file called .env
that looks like this:
LLAMA_CLOUD_API_KEY=llx-xxxxxx
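A missing or misspelled key tends to surface later as a confusing authentication error, so it can help to check for it up front. The following is a minimal sketch using only the standard library; the helper name require_env is hypothetical and not part of LlamaParse or python-dotenv, and it assumes the variables from .env have already been loaded into the environment.

```python
import os


def require_env(name: str) -> str:
    """Return the value of the environment variable `name`, or raise a clear error.

    Hypothetical helper: it only reads os.environ, so the .env file must
    already have been loaded (for example with python-dotenv's load_dotenv()).
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file and load it first."
        )
    return value
```

In the script below, one option would be to call require_env("LLAMA_CLOUD_API_KEY") right after load_dotenv(), so the script stops early with a readable message if the key is absent.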
Set up a new Python environment using the tool of your choice (we used poetry init), then install the dependencies you’ll need:
pip install llama-index-core llama-parse llama-index-readers-file python-dotenv
Now that our libraries and API key are available, let’s create a parse.py file and parse a document. In this case, we're using this list of fun facts about Canada:
# bring in our LLAMA_CLOUD_API_KEY
from dotenv import load_dotenv
load_dotenv()
# bring in deps
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader
# set up parser
parser = LlamaParse(
    result_type="markdown",  # "markdown" and "text" are available
)
# use SimpleDirectoryReader to parse our file
file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(input_files=['data/canada.pdf'], file_extractor=file_extractor).load_data()
print(documents)
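Printing the whole list can be hard to read for a long PDF. As an alternative, here is a small sketch that writes the parsed text to a file for inspection. The helper name save_parsed_text and the output path are hypothetical, and the sketch assumes each returned document exposes its content through a .text attribute.

```python
from pathlib import Path
from typing import Iterable


def save_parsed_text(documents: Iterable, out_path: str) -> int:
    """Write the text of every parsed document to one file.

    Assumption: each item in `documents` has a `.text` attribute holding
    its parsed content. Returns the number of characters written.
    """
    # Separate documents with a blank line so markdown blocks do not run together.
    combined = "\n\n".join(doc.text for doc in documents)
    Path(out_path).write_text(combined, encoding="utf-8")
    return len(combined)
```

In parse.py, a call such as save_parsed_text(documents, "canada.md") could stand in for the print(documents) line.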
Now run it like any Python file:
python parse.py
This will print an object that contains the full text of the parsed document. Let’s go a step further and query this document using an LLM. For this, you will need an OpenAI API key (LlamaIndex supports dozens of LLMs; we're just picking a popular one). Get an OpenAI API key and add it to your .env file:
OPENAI_API_KEY=sk-proj-xxxxxx
We'll also need to install the OpenAI LLM and embedding packages, which are used to encode the document into an index:
pip install llama-index-llms-openai llama-index-embeddings-openai
Now, add these lines to your parse.py:
# one extra dep
from llama_index.core import VectorStoreIndex
# create an index from the parsed markdown
index = VectorStoreIndex.from_documents(documents)
# create a query engine for the index
query_engine = index.as_query_engine()
# query the engine
query = "What can you do in the Bay of Fundy?"
response = query_engine.query(query)
print(response)
This should give us output along these lines:
You can raft-surf the world’s highest tides at the Bay of Fundy.
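The script above asks a single question, but the same query engine can be reused for several. A minimal sketch, assuming only that the engine has the query() method used above; the helper name ask_all is hypothetical.

```python
def ask_all(query_engine, questions):
    """Run each question through the query engine.

    Returns a dict mapping each question to the answer text.
    Assumes `query_engine` has a `query(str)` method, as used in parse.py.
    """
    answers = {}
    for question in questions:
        response = query_engine.query(question)
        # str() yields the same text that print(response) displays.
        answers[question] = str(response)
    return answers
```

For example, ask_all(query_engine, ["What can you do in the Bay of Fundy?"]) would return a one-entry dict. Each question triggers its own LLM call, so the cost grows with the number of questions.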
Congratulations! You’ve used industry-leading PDF parsing and are ready to integrate it into your app. You can learn more about building LlamaIndex apps in our Python documentation.
Examples
For Python notebook examples, visit our GitHub repo.
For guided content, take a look at our official YouTube tutorials.