Vic's Blog
Understanding the VRAM Requirements for LLMs on ThinkPad T15g Gen 2 Intel
Table of Contents #
- Introduction
- Checking Your System Specs
- Downloading and Installing Ollama
- Adding Ollama as a Startup Service
- Setting Up and Testing LLM Models
- Configuring Continue.dev for VS Code
- Determining Model Fit for VRAM
Introduction #
In this guide, we'll work out which LLMs fit the VRAM on a ThinkPad T15g Gen 2 Intel and walk through installing and configuring Ollama on Linux. VRAM matters more than CPU or system RAM for inference speed: a model that fits entirely on the GPU runs quickly, while one that spills into system memory slows down dramatically. By the end, you'll be equipped to run LLMs efficiently on your ThinkPad.
Checking Your System Specs #
First, let's check your system specs to understand the VRAM available. You can use the following commands to get detailed information about your system.
Note
VRAM is the most important factor in determining the performance of LLMs.
vic@debian:~/Documents$ neofetch
Output:
_,met$$$$$gg. vic@debian
,g$$$$$$$$$$$$$$$P. -------
,g$$P" """Y$$.". OS: Debian GNU/Linux trixie/sid x86_64
,$$P' `$$$. Host: 20YS005NUS ThinkPad T15g Gen 2i
',$$P ,ggs. `$$b: Kernel: 6.9.10-amd64
`d$$' ,$P"' . $$$ Uptime: 7 mins
$$P d$' , $$P Packages: 1922 (dpkg)
$$: $$. - ,d$$' Shell: bash 5.2.21
$$; Y$b._ _,d$P' Resolution: 1920x1080, 2560x1440
Y$$. `.`"Y$$$$P"' DE: Plasma 5.27.11
`$$b "-.__ WM: KWin
`Y$$ Theme: [Plasma], Breeze [GTK2/3]
`Y$$. Icons: breeze [Plasma], breeze [GTK2/3]
`$$b. Terminal: konsole
`Y$$b. CPU: 11th Gen Intel i7-11800H (16) @ 4.600GHz
`"Y$b._ GPU: NVIDIA GeForce RTX 3080 Mobile / Max-Q 8GB/16GB
`""" Memory: 4055MiB / 64098MiB
Next, check the VRAM usage with nvidia-smi:
vic@debian:~/Documents$ sudo nvidia-smi
Output:
Sat Aug 3 21:02:14 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02 Driver Version: 550.107.02 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3080 ... Off | 00000000:01:00.0 On | N/A |
| N/A 51C P8 19W / 90W | 1371MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1130 G /usr/lib/xorg/Xorg 333MiB |
| 0 N/A N/A 1336 G /usr/bin/kwalletd5 3MiB |
| 0 N/A N/A 1443 G /usr/bin/ksmserver 3MiB |
| 0 N/A N/A 1445 G /usr/bin/kded5 3MiB |
| 0 N/A N/A 1446 G /usr/bin/kwin_x11 578MiB |
| 0 N/A N/A 1515 G /usr/bin/plasmashell 61MiB |
| 0 N/A N/A 1530 G ...c/polkit-kde-authentication-agent-1 3MiB |
| 0 N/A N/A 1649 G ...86_64-linux-gnu/libexec/kdeconnectd 3MiB |
| 0 N/A N/A 1671 G /usr/bin/kaccess 3MiB |
| 0 N/A N/A 1678 G ...-linux-gnu/libexec/DiscoverNotifier 3MiB |
| 0 N/A N/A 1680 G /usr/bin/kalendarac 3MiB |
| 0 N/A N/A 1854 G ...86_64-linux-gnu/libexec/baloorunner 3MiB |
| 0 N/A N/A 1860 G /usr/bin/konsole 3MiB |
| 0 N/A N/A 2195 G ...erProcess --variations-seed-version 82MiB |
| 0 N/A N/A 2241 G ...-gnu/libexec/xdg-desktop-portal-kde 3MiB |
| 0 N/A N/A 2824 G /usr/lib/firefox/firefox-bin 192MiB |
| 0 N/A N/A 3844 G ...erProcess --variations-seed-version 29MiB |
+-----------------------------------------------------------------------------------------+
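If you only need the memory numbers rather than the full table, nvidia-smi can query them directly (the exact CSV formatting may vary slightly between driver versions):
nvidia-smi --query-gpu=name,memory.total,memory.used,memory.free --format=csv
Here, roughly 1.4 GiB of the 16 GiB is already taken by the desktop session, leaving about 15 GiB for model weights and context.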
Downloading and Installing Ollama #
Next, download and install Ollama for Linux. Here's how to do it manually:
Tip
For detailed instructions, visit the Ollama GitHub page.
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama
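A quick sanity check that the binary is executable and on your PATH (it may warn that it can't reach a running server yet, which is expected before the service below is set up):
ollama --version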
Adding Ollama as a Startup Service #
To ensure Ollama starts automatically, follow these steps:
Create a user for Ollama:
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
Create a service file in /etc/systemd/system/ollama.service:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
Reload systemd, enable the service so it starts on boot, and start it now:
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
Output:
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Sat 2024-08-03 21:05:34 PDT; 3s ago
Invocation: 84aac1fbcdda4830b75d098a7d7bd070
Main PID: 4807 (ollama)
Tasks: 17 (limit: 76831)
Memory: 1.1G (peak: 1.1G)
CPU: 7.657s
CGroup: /system.slice/ollama.service
└─4807 /usr/bin/ollama serve
Aug 03 21:05:34 debian ollama[4807]: Your new public key is:
Aug 03 21:05:34 debian ollama[4807]: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL18vrhzQX/bbL112Im2qGVy8O4EcxJ0PcSanEs2+ECh
Aug 03 21:05:34 debian ollama[4807]: 2024/08/03 21:05:34 routes.go:1108: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:ht>
Aug 03 21:05:34 debian ollama[4807]: time=2024-08-03T21:05:34.282-07:00 level=INFO source=images.go:781 msg="total blobs: 0"
Aug 03 21:05:34 debian ollama[4807]: time=2024-08-03T21:05:34.282-07:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug 03 21:05:34 debian ollama[4807]: time=2024-08-03T21:05:34.282-07:00 level=INFO source=routes.go:1155 msg="Listening on 127.0.0.1:11434 (version 0.3.3)"
Aug 03 21:05:34 debian ollama[4807]: time=2024-08-03T21:05:34.283-07:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3690093923/runners
Aug 03 21:05:36 debian ollama[4807]: time=2024-08-03T21:05:36.905-07:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]"
Aug 03 21:05:36 debian ollama[4807]: time=2024-08-03T21:05:36.905-07:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug 03 21:05:36 debian ollama[4807]: time=2024-08-03T21:05:36.983-07:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-91ba6083-78a5-b37c-10cf-4d9e2c60925b library=cuda compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3080 Laptop GPU" to>
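With the service running, Ollama listens on 127.0.0.1:11434, as the log above shows. A plain HTTP request to the root path should answer with "Ollama is running":
curl http://127.0.0.1:11434/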
Setting Up and Testing LLM Models #
To set up and test LLM models, start by creating a modelfile for deepseek-coder-v2:
vic@debian:~/ollama_projects/modelfiles$ cat deepseek_instruct_16k
# Modelfile for custom model setup
# Specify the base model to use
FROM deepseek-coder-v2:16b
# Set model parameters for temperature and context window size
PARAMETER temperature 0.7
PARAMETER num_ctx 16384
Create the model:
vic@debian:~/ollama_projects/modelfiles$ ollama create deepseek_instruct_16k -f deepseek_instruct_16k
Output:
transferring model data
pulling manifest
pulling 5ff0abeeac1d... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 8.9 GB
pulling b321cd7de6c7... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 111 B
pulling 4bb71764481f... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 13 KB
pulling 1c8f573e830c... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling 19f2fb9e8bc6... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 32 B
pulling 34488e453cfe... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 568 B
verifying sha256 digest
writing manifest
removing any unused layers
success
using existing layer sha256:5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046
using existing layer sha256:b321cd7de6c7494351e6f0f6b4588378af4bf9cb6d2e0bba022ad81e72d9a776
using existing layer sha256:4bb71764481f96d4161efc810c6185a0d0eb5a50ab7a0dedbdd283670cbcc2b5
using existing layer sha256:1c8f573e830ca9b3ebfeb7ace1823146e22b66f99ee223840e7637c9e745e1c7
creating new layer sha256:fe7e91ac8c07fca04d448419757c4d574c0bcf0dabf997bdbc01243b84bc5c1e
creating new layer sha256:a911ce456c516614471a08ba4968b77537c0dd3555f3d356e61e4d5cff0f7968
writing manifest
success
vic@debian:~/ollama_projects/modelfiles$ ollama list
NAME ID SIZE MODIFIED
deepseek-coder-v2:16b 8577f96d693e 8.9 GB 34 seconds ago
deepseek_instruct_16k:latest 3e306615964d 8.9 GB 34 seconds ago
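To confirm the model actually loads and to see how much VRAM it takes, give it a quick prompt (the prompt here is only an illustration) and watch nvidia-smi in a second terminal while it answers:
ollama run deepseek_instruct_16k "Write a Python function that reverses a string."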
For embeddings, use nomic-embed-text:
vic@debian:~/ollama_projects/modelfiles$ cat nomic_embed_8k
# Modelfile for custom model setup
# Specify the base model to use
FROM nomic-embed-text
# Set model parameters for context window size and GPU layer usage
PARAMETER num_ctx 8192
PARAMETER num_gpu 31
Create the model:
vic@debian:~/ollama_projects/modelfiles$ ollama create nomic_embed_8k -f nomic_embed_8k
Output:
transferring model data
pulling manifest
pulling 970aa74c0a90... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 274 MB
pulling c71d239df917... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling ce4a164fc046... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 17 B
pulling 31df23ea7daa... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 420 B
verifying sha256 digest
writing manifest
removing any unused layers
success
using existing layer sha256:970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
using existing layer sha256:c71d239df91726fc519c6eb72d318ec65820627232b2f796219e87dcf35d0ab4
creating new layer sha256:1872e07c1c1dffda6adc3a9e13ef8c78cdf4e3520c05614a141805b61fd06bd2
creating new layer sha256:978b1d3bbe6d409aae4ea80f3e217a66cd57264606e1f079cf51581d9c3c02ce
writing manifest
success
vic@debian:~/ollama_projects/modelfiles$ ollama list
NAME ID SIZE MODIFIED
nomic-embed-text:latest 0a109f422b47 274 MB 2 seconds ago
nomic_embed_8k:latest 4c9aa2c3e095 274 MB 2 seconds ago
deepseek-coder-v2:16b 8577f96d693e 8.9 GB About a minute ago
deepseek_instruct_16k:latest 3e306615964d 8.9 GB About a minute ago
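To verify the embedding model end to end, you can call Ollama's embeddings endpoint directly (the prompt text is just an example); the response should be a JSON object containing an "embedding" array:
curl http://127.0.0.1:11434/api/embeddings -d '{"model": "nomic_embed_8k", "prompt": "The quick brown fox"}'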
Configuring Continue.dev for VS Code #
Download and install the Continue.dev extension from the VS Code marketplace, then point it at your local Ollama models in the Continue config file.
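The original config isn't reproduced here, so below is a minimal sketch of what a ~/.continue/config.json pointing at the models created above could look like; treat the exact keys and file location as assumptions that may differ between Continue versions:
{
  "models": [
    {
      "title": "DeepSeek Coder V2 16K",
      "provider": "ollama",
      "model": "deepseek_instruct_16k"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder V2",
    "provider": "ollama",
    "model": "deepseek_instruct_16k"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic_embed_8k"
  }
}
After saving the config, reload VS Code and Continue should list the Ollama models in its model picker.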
Determining Model Fit for VRAM #
To ensure that the LLM models you use fit within the VRAM available on your ThinkPad T15g Gen 2 Intel, follow these steps:
Monitor VRAM Usage:
Use the nvidia-smi command to check the current VRAM usage. This shows how much VRAM is already taken by other processes, such as the desktop environment.
Experiment with Model Parameters:
Adjust the num_ctx and num_predict parameters in your modelfiles. These parameters significantly affect VRAM requirements:
- num_ctx: sets the context window size. A larger context window requires more VRAM but lets the model handle longer sequences of text.
- num_predict: sets the maximum number of tokens the model will generate. Longer outputs require additional VRAM.
Example:
vic@debian:~/ollama_projects/modelfiles$ cat custom_model
# Modelfile for custom model setup
# Specify the base model to use
FROM custom-model-v2:10b
# Set model parameters for temperature, context window size, and prediction length
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
PARAMETER num_predict 2048
Test the Model:
After adjusting the parameters, create the model and monitor the VRAM usage again with nvidia-smi while the model is loaded (see the commands below). If the model exceeds the available VRAM, reduce the num_ctx or num_predict values and test again.
Optimize VRAM Usage:
Fine-tune the parameters to find the best balance between capability and VRAM usage. Aim for the largest num_ctx and num_predict values that still fit within the available VRAM.
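While tuning, ollama ps is handy alongside nvidia-smi: it reports how much of the loaded model sits on the GPU versus the CPU, and anything other than 100% GPU means the model no longer fits in VRAM and generation will slow down noticeably:
ollama ps
watch -n 1 nvidia-smi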
By experimenting with these parameters, you can ensure that your LLM models run efficiently on your ThinkPad T15g Gen 2 Intel without exceeding the VRAM limits.
That's it! You're now set up to use Ollama for running LLMs on your ThinkPad T15g Gen 2 Intel with optimal VRAM usage.