Service Configurations
Individual LlamaCloud services can be configured based on your specific needs. This page will cover the different configurations for each service.
Global Configurations
At the time of writing, the only global configurations are for external dependencies. For more information, please refer to the Dependencies page.
Backend Service
Qdrant
Qdrant is a popular vector database used to store and retrieve embeddings. Users can configure Qdrant as a Data Sink on a project-by-project basis or, if they prefer, as a Data Sink shared across all projects and organizations. For the latter, the following configurations can be set:
# basic example
backend:
  config:
    qdrant:
      enabled: true
      url: "http://qdrant:6333"
      apiKey: "your-api-key"

# or, if you prefer to use an existing secret
backend:
  config:
    qdrant:
      enabled: true
      existingSecret: "qdrant-secret"
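If you use the existingSecret option, the referenced Kubernetes Secret must already exist in the release namespace. The sketch below shows one way it could be created; the key names (url and api_key) are assumptions for illustration, so confirm the exact keys the chart expects before relying on this.

# hypothetical Secret backing the existingSecret option above;
# the key names url and api_key are assumptions, not confirmed by the chart
apiVersion: v1
kind: Secret
metadata:
  name: qdrant-secret
type: Opaque
stringData:
  url: "http://qdrant:6333"
  api_key: "your-api-key"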
Jobs Worker Service
There are several configuration options that modify how the Jobs Worker handles job executions.
Concurrency Settings
These settings modify how concurrent jobs get distributed across the job worker pods. These are mainly used to:
- Prevent a noisy neighbor problem, whereby a single user floods the job workers and starves other users of capacity
- Prevent external resources such as MongoDB, the embeddings API, and vector DBs from being overloaded with too many requests
- maxJobsInExecutionPerJobType: This setting defines the maximum number of concurrent jobs a user can have running per job type. It is used by the job runner to help prevent any one user from overloading the system.
  - Set this to 0 to disable this concurrency check.
- maxIndexJobsInExecution: This configuration specifies the maximum number of ingestion (indexing) jobs that a single pipeline is allowed to execute concurrently. It is applied to pipelines handling document ingestion and indexing operations to control resource usage.
  - Set this to 0 to disable this concurrency check.
- maxDocumentIngestionJobsInExecution: This parameter limits the number of concurrent document ingestion jobs a user can have in execution. Document ingestion is typically resource intensive, so this should be kept relatively low to avoid overloading the system.
  - Set this to 0 to disable this concurrency check.
# example values.yaml for high throughput
jobsWorker:
  config:
    maxJobsInExecutionPerJobType: 25
    maxIndexJobsInExecution: 0
    maxDocumentIngestionJobsInExecution: 10
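Conversely, if noisy neighbors or pressure on external dependencies are a bigger concern than raw throughput, you can tighten these limits. The values below are illustrative only, a sketch of a more conservative setup rather than a recommendation.

# example values.yaml for a more conservative setup (illustrative values)
jobsWorker:
  config:
    maxJobsInExecutionPerJobType: 5
    maxIndexJobsInExecution: 2
    maxDocumentIngestionJobsInExecution: 2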
LlamaParse
Job Throughput Settings
- maxQueueConcurrency: This configuration sets the maximum number of jobs that the LlamaParse service can process concurrently, allowing it to work through a high volume of jobs efficiently. Higher values consume more resources, so be mindful when increasing it.
  - Default value is 3.
# example values.yaml for high throughput
llamaParse:
  config:
    maxQueueConcurrency: 10
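Because higher concurrency increases resource consumption, you may want to raise the pod's resource requests and limits alongside maxQueueConcurrency. The sketch below assumes the chart exposes a standard Helm resources block for the LlamaParse service; the resource values are illustrative, so verify both against the chart's values before using them.

# sketch only: assumes a standard Helm resources block is available for llamaParse
llamaParse:
  config:
    maxQueueConcurrency: 10
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi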