Customize Engine Settings

This guide walks you through customizing your engine settings by editing the nitro.json file.

  1. Navigate to App Settings > Advanced > Open App Directory, then open the ~/jan/engines folder.
cd ~/jan/engines
  2. Modify the nitro.json file to suit your needs. The default settings are shown below.
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}

The table below describes the parameters in the nitro.json file.

Parameter | Type | Description
ctx_len | Integer | Context length. Typically set to 2048, which provides ample context for model operations like GPT-3.5. (Minimum: 1, Maximum: 4096)
ngl | Integer | Number of model layers offloaded to the GPU. Defaults to 100.
cpu_threads | Integer | Number of CPU threads used for inference, limited by hardware and OS. (Maximum determined by system)
cont_batching | Boolean | Enables continuous batching, which improves throughput for LLM inference.
embedding | Boolean | Enables embeddings for tasks such as document-enhanced chat in RAG-based applications.
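The constraints in the table can be checked before saving the file. The following is a minimal sketch, not part of Jan or Nitro: `validate_settings` is a hypothetical helper, and it assumes that all five keys are present, that `cont_batching` and `embedding` are booleans (their defaults are `false`), that `ngl` must not be negative, and that the system-determined maximum for `cpu_threads` is the logical CPU count reported by the OS.

```python
import os

# Range for ctx_len as stated in the parameter table.
CTX_LEN_MIN, CTX_LEN_MAX = 1, 4096


def is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_settings(settings: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    ctx_len = settings.get("ctx_len")
    if not is_int(ctx_len) or not (CTX_LEN_MIN <= ctx_len <= CTX_LEN_MAX):
        problems.append("ctx_len must be an integer between 1 and 4096")

    # Assumption: a negative number of GPU layers is never meaningful.
    ngl = settings.get("ngl")
    if not is_int(ngl) or ngl < 0:
        problems.append("ngl must be a non-negative integer")

    # Assumption: the system maximum is the logical CPU count.
    max_threads = os.cpu_count() or 1
    cpu_threads = settings.get("cpu_threads")
    if not is_int(cpu_threads) or not (1 <= cpu_threads <= max_threads):
        problems.append(f"cpu_threads must be an integer between 1 and {max_threads}")

    # Assumption: both feature switches are JSON booleans.
    for key in ("cont_batching", "embedding"):
        if not isinstance(settings.get(key), bool):
            problems.append(f"{key} must be true or false")

    return problems
```

Calling `validate_settings` on the default settings shown earlier should return an empty list, while a value such as `"ctx_len": 8192` would be reported as a problem.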
  • By default, ngl is set to 100, which offloads all model layers to the GPU. To offload only about 50% of the layers, set ngl to 15, because most Mistral and Llama models have around 30 layers.
  • To use the embedding feature, include the JSON parameter "embedding": true. This enables Nitro to process inferences with embedding capabilities. For a more detailed explanation, refer to the Embedding section of the Nitro documentation.
  • To use continuous batching, which boosts throughput and minimizes latency in large language model (LLM) inference, include the JSON parameter "cont_batching": true. For details, refer to the Continuous Batching section of the Nitro documentation.
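The adjustments described in the bullets above can also be scripted. This is a rough sketch under stated assumptions, not an official Jan or Nitro tool: `ngl_for_fraction` and `update_settings` are hypothetical helper names, the file is assumed to contain a single JSON object, and the layer count of 30 is the approximate figure quoted above rather than a value read from a model. The demo writes to a temporary file so that the real ~/jan/engines/nitro.json is left untouched.

```python
import json
import math
import tempfile
from pathlib import Path


def ngl_for_fraction(total_layers: int, fraction: float) -> int:
    """Number of layers to offload for a given fraction of the model's layers."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return math.floor(total_layers * fraction)


def update_settings(path: Path, **overrides) -> dict:
    """Merge overrides into the JSON object stored at path and write it back."""
    settings = json.loads(path.read_text(encoding="utf-8"))
    settings.update(overrides)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


if __name__ == "__main__":
    # Demo on a temporary copy of the default settings.
    defaults = {
        "ctx_len": 2048,
        "ngl": 100,
        "cpu_threads": 1,
        "cont_batching": False,
        "embedding": False,
    }
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "nitro.json"
        config.write_text(json.dumps(defaults, indent=2), encoding="utf-8")
        updated = update_settings(
            config,
            ngl=ngl_for_fraction(30, 0.5),  # roughly half of ~30 layers
            embedding=True,
            cont_batching=True,
        )
        print(json.dumps(updated, indent=2))
```

To apply the same change to a real installation, you would pass the path to your own nitro.json instead of the temporary file, ideally after making a backup copy.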
Assistance and Support

If you have questions, please join our Discord community for support, updates, and discussions.