How to set up your Jetson device for LLM inference and fine-tuning

Michael Yuan
5 min read · Oct 2, 2023


by Tony Yuan and Michael Yuan

When it comes to large language model (LLM, e.g., Llama 2) inference and fine-tuning, GPUs are critical resources. Those computations are simply too slow on CPUs. In particular, the GPU needs a large amount of RAM to hold the entire model. For example, even after quantization, the Llama 2 70B model requires at least 48GB of GPU RAM. Yet, those high-end GPUs are very expensive.
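You can sanity-check that 48GB figure with a back-of-the-envelope calculation: the weights alone take the parameter count times the bytes per parameter. The ~20% overhead factor below for the KV cache and activations is my own rough assumption for illustration, not an official sizing formula; real usage varies with context length.

```python
# Rough estimate of GPU RAM needed for quantized model weights.
# The 1.2x overhead factor is an illustrative assumption, not a spec.

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

weights = weight_memory_gb(70e9, 4)   # Llama 2 70B at 4-bit: 35.0 GB
total = weights * 1.2                 # ~42 GB with assumed runtime overhead
print(f"weights: {weights:.0f} GB, with overhead: ~{total:.0f} GB")
```

With those numbers, a 48GB card is about the practical floor for a 4-bit 70B model, while an unquantized 16-bit copy of the same weights would need roughly 140GB.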

Comparing the commonly available options, only the Nvidia H100, AWS g5.12xlarge (based on the Nvidia A10G), Apple Silicon Macs (e.g., M2), and the Nvidia Jetson can meet the 48GB minimum requirement for 70B models.

For even larger models, such as Falcon 180B, which requires at least 132GB of GPU RAM, the Mac is pretty much the only choice under $10,000. For Nvidia devices, you can use NVLink to connect multiple GPUs and pool their RAM, but that is generally a very expensive solution.

It is obvious that the Jetson AGX Orin, while still quite expensive, offers the biggest bang for the buck for personal and research uses! So, I bought one!

However, the device comes with almost no information about the software on it. I will show you how I set it up for LLM inference in this article!

An alternative way to set up the Jetson device is to connect it to your computer via USB and then burn a customized version of Linux OS into it. You can use the Nvidia SDK manager to do this. It is a complex process and we will not cover it here.

Starting up

In order to use the Jetson device as a standalone computer, you will need to connect a keyboard, a mouse, and a monitor (display) to it.

I just used the basic USB keyboard and mouse. I plugged them into the USB port on the device.

The display is a little more complex since the Jetson device only has a DisplayPort (DP), while my flat-panel LED display, like most displays, only supports HDMI. So, I bought a $6 DP-to-HDMI adapter and it worked just fine.

The power brick that comes with the Jetson device has a USB-C plug. You should plug it into the port right above the power port on the device. As soon as I plugged it in, a small white light came on and the device booted. Press any key on the keyboard and you should see an Nvidia logo on the display screen.

Next, you will be guided by a series of screens to agree to Nvidia’s software licenses, select a system language, create a username and password, select a time zone, and finally connect to wifi. After all these are done, you can log into the device using the username and password you just created.

Default Linux distro

Once you log in, you should be able to see a Terminal icon on the home screen. Double-click on it to bring up a command line terminal. Type in the following two commands to see the Linux distribution and kernel version.

lsb_release -a

and

uname -r

As you can see, the Jetson device comes with Ubuntu Linux 20.04 pre-installed. It has a specialized kernel built for the Nvidia Tegra architecture, which refers to Jetson's integrated design with the CPU, GPU, and shared RAM on a single chip. Cool!
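You can spot a Tegra kernel from the `uname -r` output itself: the release string carries a `-tegra` suffix (for example, `5.10.104-tegra` on JetPack 5; your exact version may differ). Here is a small illustrative sketch of checking for that suffix in a script; the helper function and sample string are my own, not part of any Nvidia tooling:

```python
# Hypothetical helper: detect a Tegra-built kernel from a `uname -r` string.
import platform

def is_tegra_kernel(release: str) -> bool:
    """True if the kernel release string carries the Tegra suffix."""
    return release.endswith("-tegra")

sample = "5.10.104-tegra"       # example output on JetPack 5; may differ
print(is_tegra_kernel(sample))  # True

# On the machine running this script, check the live kernel release:
print(is_tegra_kernel(platform.release()))
```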

Since this is from Nvidia, maybe it has the Nvidia GPU drivers and developer tools pre-installed as well? Let's try the nvcc command, which is part of the Nvidia CUDA developer toolkit.

nvcc --version

Nope. Cannot find CUDA. That is a bummer! The software team and hardware team really should talk more!

Install JetPack

Now, Nvidia has a software package for Jetson devices called JetPack. It contains all the device drivers and developer tools for the Jetson hardware, including the CUDA developer toolkit as well as advanced libraries such as cuDNN and TensorRT. Yet, this important package is NOT pre-installed on the Jetson device. Oh well, we can install it using Ubuntu's built-in apt package manager.

sudo apt update
sudo apt install nvidia-jetpack

Once JetPack is installed, you still need to tell your shell where to find the CUDA toolkit's binaries and libraries. Add the following lines to the end of your ~/.profile file.

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"

Then reboot your device and log in again.

sudo reboot
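After logging back in, you can sanity-check that the CUDA bin directory actually made it onto your PATH before trying nvcc. This is a minimal sketch of my own; the directory is the one assumed in the ~/.profile lines above:

```shell
# Quick sanity check after re-login: is the CUDA bin directory on PATH?
case ":$PATH:" in
  *:/usr/local/cuda/bin:*) echo "CUDA bin directory is on PATH" ;;
  *)                       echo "CUDA bin directory is missing from PATH" ;;
esac
```

If it reports missing, double-check the ~/.profile edit and make sure you logged in again (or rebooted) so the file was re-read.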

Now, you can see that nvcc is working! We have the CUDA toolkit v11.4 installed.

nvcc --version

GPU status

While the CUDA toolkit is installed, how do we know that GPUs are working properly? On most Nvidia devices, including AWS g5 servers, you can simply use the nvidia-smi tool to check the GPU status.

However, nvidia-smi is part of the Nvidia drivers for standalone GPUs. Jetson devices have the GPU integrated on the same board as the CPU with shared RAM, the aforementioned Tegra architecture. So, nvidia-smi is NOT available on Jetson devices.

Fortunately, there is a great GPU-monitoring tool that works on Tegra devices: a Python program called jtop. Let's install it through Python's package manager, pip.

sudo apt install python3-pip
sudo pip3 install -U jetson-stats

Reboot the device again.

sudo reboot

Now, run the jtop command in a terminal window. You can see colorful graphs of GPU and CPU status right in the terminal!

jtop
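Beyond the interactive terminal UI, the jetson-stats package also exposes a Python API (a `jtop` class), so you can read the same readings from your own scripts. Here is a minimal sketch under that assumption, with a graceful fallback so it does not crash on a machine without jetson-stats installed; the exact fields inside `jetson.stats` depend on your jetson-stats version:

```python
# Sketch: read Jetson CPU/GPU stats from a script via jetson-stats.
# Assumes `sudo pip3 install -U jetson-stats` from above; real readings
# are only available on a Jetson, so fall back gracefully elsewhere.
try:
    from jtop import jtop
except ImportError:
    jtop = None

if jtop is None:
    print("jetson-stats is not installed on this machine")
else:
    with jtop() as jetson:       # connects to the jtop background service
        if jetson.ok():
            print(jetson.stats)  # dict of CPU, GPU, RAM, temperature readings
```

This can be handy for logging GPU utilization over time while a model is running, instead of watching the jtop screen by hand.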

Next steps

Now, we have confirmed that the GPU drivers and CUDA toolkit are all working well on the device. The next step is to install WasmEdge and then run LLMs on this device!
