Vic's Blog
Understanding the VRAM Requirements for LLMs on ThinkPad T15g Gen 2 Intel
Table of Contents #
- Introduction
- Checking Your System Specs
- Downloading and Installing Ollama
- Adding Ollama as a Startup Service
- Setting Up and Testing LLM Models
- Configuring Continue.dev for VS Code
- Determining Model Fit for VRAM
Introduction #
In this guide, we'll work out which LLMs fit the VRAM on a ThinkPad T15g Gen 2 Intel and walk through installing and configuring Ollama on Linux. VRAM matters more than CPU or system RAM for inference speed: a model that fits entirely on the GPU runs quickly, while one that spills into system memory slows down dramatically. By the end, you'll be equipped to run LLMs efficiently on your ThinkPad.
Checking Your System Specs #
First, let's check your system specs to understand the VRAM available. You can use the following commands to get detailed information about your system.
Note
VRAM is the most important factor in determining the performance of LLMs.
vic@debian:~/Documents$ neofetch
Output:
_,met$$$$$gg. vic@debian
,g$$$$$$$$$$$$$$$P. -------
,g$$P" """Y$$.". OS: Debian GNU/Linux trixie/sid x86_64
,$$P' `$$$. Host: 20YS005NUS ThinkPad T15g Gen 2i
',$$P ,ggs. `$$b: Kernel: 6.9.10-amd64
`d$$' ,$P"' . $$$ Uptime: 7 mins
$$P d$' , $$P Packages: 1922 (dpkg)
$$: $$. - ,d$$' Shell: bash 5.2.21
$$; Y$b._ _,d$P' Resolution: 1920x1080, 2560x1440
Y$$. `.`"Y$$$$P"' DE: Plasma 5.27.11
`$$b "-.__ WM: KWin
`Y$$ Theme: [Plasma], Breeze [GTK2/3]
`Y$$. Icons: breeze [Plasma], breeze [GTK2/3]
`$$b. Terminal: konsole
`Y$$b. CPU: 11th Gen Intel i7-11800H (16) @ 4.600GHz
`"Y$b._ GPU: NVIDIA GeForce RTX 3080 Mobile / Max-Q 8GB/16GB
`""" Memory: 4055MiB / 64098MiB
Next, check the VRAM usage with nvidia-smi:
vic@debian:~/Documents$ sudo nvidia-smi
Output:
Sat Aug 3 21:02:14 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02 Driver Version: 550.107.02 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3080 ... Off | 00000000:01:00.0 On | N/A |
| N/A 51C P8 19W / 90W | 1371MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1130 G /usr/lib/xorg/Xorg 333MiB |
| 0 N/A N/A 1336 G /usr/bin/kwalletd5 3MiB |
| 0 N/A N/A 1443 G /usr/bin/ksmserver 3MiB |
| 0 N/A N/A 1445 G /usr/bin/kded5 3MiB |
| 0 N/A N/A 1446 G /usr/bin/kwin_x11 578MiB |
| 0 N/A N/A 1515 G /usr/bin/plasmashell 61MiB |
| 0 N/A N/A 1530 G ...c/polkit-kde-authentication-agent-1 3MiB |
| 0 N/A N/A 1649 G ...86_64-linux-gnu/libexec/kdeconnectd 3MiB |
| 0 N/A N/A 1671 G /usr/bin/kaccess 3MiB |
| 0 N/A N/A 1678 G ...-linux-gnu/libexec/DiscoverNotifier 3MiB |
| 0 N/A N/A 1680 G /usr/bin/kalendarac 3MiB |
| 0 N/A N/A 1854 G ...86_64-linux-gnu/libexec/baloorunner 3MiB |
| 0 N/A N/A 1860 G /usr/bin/konsole 3MiB |
| 0 N/A N/A 2195 G ...erProcess --variations-seed-version 82MiB |
| 0 N/A N/A 2241 G ...-gnu/libexec/xdg-desktop-portal-kde 3MiB |
| 0 N/A N/A 2824 G /usr/lib/firefox/firefox-bin 192MiB |
| 0 N/A N/A 3844 G ...erProcess --variations-seed-version 29MiB |
+-----------------------------------------------------------------------------------------+
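If you only need the memory numbers rather than the full table, nvidia-smi can query them directly (the exact CSV formatting may vary slightly between driver versions):
nvidia-smi --query-gpu=name,memory.total,memory.used,memory.free --format=csv
Here, roughly 1.4 GiB of the 16 GiB is already taken by the desktop session, leaving about 15 GiB for model weights and context.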
Downloading and Installing Ollama #
Next, download and install Ollama for Linux. Here's how to do it manually:
Tip
For detailed instructions, visit the Ollama GitHub page.
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama
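A quick sanity check that the binary is executable and on your PATH (it may warn that it can't reach a running server yet, which is expected before the service below is set up):
ollama --version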
Adding Ollama as a Startup Service #
To ensure Ollama starts automatically, follow these steps:
Create a user for Ollama:
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
Create a service file in /etc/systemd/system/ollama.service:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=default.target
Reload systemd, enable the service so it starts on boot, and start it now:
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
Output:
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Sat 2024-08-03 21:05:34 PDT; 3s ago
Invocation: 84aac1fbcdda4830b75d098a7d7bd070
Main PID: 4807 (ollama)
Tasks: 17 (limit: 76831)
Memory: 1.1G (peak: 1.1G)
CPU: 7.657s
CGroup: /system.slice/ollama.service
└─4807 /usr/bin/ollama serve
Aug 03 21:05:34 debian ollama[4807]: Your new public key is:
Aug 03 21:05:34 debian ollama[4807]: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL18vrhzQX/bbL112Im2qGVy8O4EcxJ0PcSanEs2+ECh
Aug 03 21:05:34 debian ollama[4807]: 2024/08/03 21:05:34 routes.go:1108: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:ht>
Aug 03 21:05:34 debian ollama[4807]: time=2024-08-03T21:05:34.282-07:00 level=INFO source=images.go:781 msg="total blobs: 0"
Aug 03 21:05:34 debian ollama[4807]: time=2024-08-03T21:05:34.282-07:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug 03 21:05:34 debian ollama[4807]: time=2024-08-03T21:05:34.282-07:00 level=INFO source=routes.go:1155 msg="Listening on 127.0.0.1:11434 (version 0.3.3)"
Aug 03 21:05:34 debian ollama[4807]: time=2024-08-03T21:05:34.283-07:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3690093923/runners
Aug 03 21:05:36 debian ollama[4807]: time=2024-08-03T21:05:36.905-07:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]"
Aug 03 21:05:36 debian ollama[4807]: time=2024-08-03T21:05:36.905-07:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug 03 21:05:36 debian ollama[4807]: time=2024-08-03T21:05:36.983-07:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-91ba6083-78a5-b37c-10cf-4d9e2c60925b library=cuda compute=8.6 driver=12.4 name="NVIDIA GeForce RTX 3080 Laptop GPU" to>
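With the service running, Ollama listens on 127.0.0.1:11434, as the log above shows. A plain HTTP request to the root path should answer with "Ollama is running":
curl http://127.0.0.1:11434/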
Setting Up and Testing LLM Models #
To set up and test LLM models, start by creating a modelfile for deepseek-coder-v2:
vic@debian:~/ollama_projects/modelfiles$ cat deepseek_instruct_16k
# Modelfile for custom model setup
# Specify the base model to use
FROM deepseek-coder-v2:16b
# Set model parameters for temperature and context window size
PARAMETER temperature 0.7
PARAMETER num_ctx 16384
Create the model:
vic@debian:~/ollama_projects/modelfiles$ ollama create deepseek_instruct_16k -f deepseek_instruct_16k
Output:
transferring model data
pulling manifest
pulling 5ff0abeeac1d... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 8.9 GB
pulling b321cd7de6c7... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 111 B
pulling 4bb71764481f... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 13 KB
pulling 1c8f573e830c... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling 19f2fb9e8bc6... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 32 B
pulling 34488e453cfe... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 568 B
verifying sha256 digest
writing manifest
removing any unused layers
success
using existing layer sha256:5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046
using existing layer sha256:b321cd7de6c7494351e6f0f6b4588378af4bf9cb6d2e0bba022ad81e72d9a776
using existing layer sha256:4bb71764481f96d4161efc810c6185a0d0eb5a50ab7a0dedbdd283670cbcc2b5
using existing layer sha256:1c8f573e830ca9b3ebfeb7ace1823146e22b66f99ee223840e7637c9e745e1c7
creating new layer sha256:fe7e91ac8c07fca04d448419757c4d574c0bcf0dabf997bdbc01243b84bc5c1e
creating new layer sha256:a911ce456c516614471a08ba4968b77537c0dd3555f3d356e61e4d5cff0f7968
writing manifest
success
vic@debian:~/ollama_projects/modelfiles$ ollama list
NAME ID SIZE MODIFIED
deepseek-coder-v2:16b 8577f96d693e 8.9 GB 34 seconds ago
deepseek_instruct_16k:latest 3e306615964d 8.9 GB 34 seconds ago
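To confirm the model actually loads and to see how much VRAM it takes, give it a quick prompt (the prompt here is only an illustration) and watch nvidia-smi in a second terminal while it answers:
ollama run deepseek_instruct_16k "Write a Python function that reverses a string."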
For embeddings, use nomic-embed-text:
vic@debian:~/ollama_projects/modelfiles$ cat nomic_embed_8k
# Modelfile for custom model setup
# Specify the base model to use
FROM nomic-embed-text
# Set model parameters for context window size and GPU layer usage
PARAMETER num_ctx 8192
PARAMETER num_gpu 31
Create the model:
vic@debian:~/ollama_projects/modelfiles$ ollama create nomic_embed_8k -f nomic_embed_8k
Output:
transferring model data
pulling manifest
pulling 970aa74c0a90... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 274 MB
pulling c71d239df917... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling ce4a164fc046... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 17 B
pulling 31df23ea7daa... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 420 B
verifying sha256 digest
writing manifest
removing any unused layers
success
using existing layer sha256:970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
using existing layer sha256:c71d239df91726fc519c6eb72d318ec65820627232b2f796219e87dcf35d0ab4
creating new layer sha256:1872e07c1c1dffda6adc3a9e13ef8c78cdf4e3520c05614a141805b61fd06bd2
creating new layer sha256:978b1d3bbe6d409aae4ea80f3e217a66cd57264606e1f079cf51581d9c3c02ce
writing manifest
success
vic@debian:~/ollama_projects/modelfiles$ ollama list
NAME ID SIZE MODIFIED
nomic-embed-text:latest 0a109f422b47 274 MB 2 seconds ago
nomic_embed_8k:latest 4c9aa2c3e095 274 MB 2 seconds ago
deepseek-coder-v2:16b 8577f96d693e 8.9 GB About a minute ago
deepseek_instruct_16k:latest 3e306615964d 8.9 GB About a minute ago
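To verify the embedding model end to end, you can call Ollama's embeddings endpoint directly (the prompt text is just an example); the response should be a JSON object containing an "embedding" array:
curl http://127.0.0.1:11434/api/embeddings -d '{"model": "nomic_embed_8k", "prompt": "The quick brown fox"}'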
Configuring Continue.dev for VS Code #
Download and install the Continue.dev extension from the VS Code marketplace, then point it at your local Ollama models in the Continue config file.
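The original config isn't reproduced here, so below is a minimal sketch of what a ~/.continue/config.json pointing at the models created above could look like; treat the exact keys and file location as assumptions that may differ between Continue versions:
{
  "models": [
    {
      "title": "DeepSeek Coder V2 16K",
      "provider": "ollama",
      "model": "deepseek_instruct_16k"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder V2",
    "provider": "ollama",
    "model": "deepseek_instruct_16k"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic_embed_8k"
  }
}
After saving the config, reload VS Code and Continue should list the Ollama models in its model picker.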
Determining Model Fit for VRAM #
To ensure that the LLM models you use fit within the VRAM available on your ThinkPad T15g Gen 2 Intel, follow these steps:
Monitor VRAM Usage:
Use the nvidia-smi command to check the current VRAM usage. This shows how much VRAM is already taken by other processes, such as the desktop environment.
Experiment with Model Parameters:
Adjust the num_ctx and num_predict parameters in your modelfiles. These parameters significantly affect VRAM requirements:
- num_ctx: sets the context window size. A larger context window requires more VRAM but lets the model handle longer sequences of text.
- num_predict: sets the maximum number of tokens the model will generate. Longer outputs require additional VRAM.
Example:
vic@debian:~/ollama_projects/modelfiles$ cat custom_model
# Modelfile for custom model setup
# Specify the base model to use
FROM custom-model-v2:10b
# Set model parameters for temperature, context window size, and prediction length
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
PARAMETER num_predict 2048
Test the Model:
After adjusting the parameters, create the model and monitor the VRAM usage again with nvidia-smi while the model is loaded (see the commands below). If the model exceeds the available VRAM, reduce the num_ctx or num_predict values and test again.
Optimize VRAM Usage:
Fine-tune the parameters to find the best balance between capability and VRAM usage. Aim for the largest num_ctx and num_predict values that still fit within the available VRAM.
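While tuning, ollama ps is handy alongside nvidia-smi: it reports how much of the loaded model sits on the GPU versus the CPU, and anything other than 100% GPU means the model no longer fits in VRAM and generation will slow down noticeably:
ollama ps
watch -n 1 nvidia-smi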
By experimenting with these parameters, you can ensure that your LLM models run efficiently on your ThinkPad T15g Gen 2 Intel without exceeding the VRAM limits.
That's it! You're now set up to use Ollama for running LLMs on your ThinkPad T15g Gen 2 Intel with optimal VRAM usage.