Because the Nvidia vGPU driver does not support it, I gave up on building this on Proxmox VE 7 and switched to PVE 6.
Install Debian 10, then install PVE 6 on top of it; the method is here:
https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Buster
You can follow this guy's walkthrough:
https://www.youtube.com/watch?v=cPrOoeMxzu0&ab_channel=CraftComputing
Summarized, the steps are as follows:
nano /etc/apt/sources.list
deb http://download.proxmox.com/debian/pve buster pve-no-subscription
After adding the repo, run apt update; you may hit a NO_PUBKEY error.
Fix it with sudo apt-key adv --keyserver keys.gnupg.net --recv-keys [the key ID from the error]
Also see the PVE repo part of the troubleshooting section below.
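If apt-key or the keyserver gives you trouble, downloading the repository key directly (the approach from the official wiki; this assumes PVE 6 on Buster) should also work:
wget http://download.proxmox.com/debian/proxmox-ve-release-6.x.gpg -O /etc/apt/trusted.gpg.d/proxmox-ve-release-6.x.gpg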
apt update
apt -y upgrade
apt -y install python3 python3-pip git build-essential pve-headers dkms jq
# this one can probably be skipped
# pip3 install frida
# The commands below unlock vGPU splitting on consumer GPUs; they are not needed for this build. If you want that, go back to the YouTube link given above
# git clone https://github.com/DualCoder/vgpu_unlock
# chmod -R +x vgpu_unlock
wget http://ftp.br.debian.org/debian/pool/main/m/mdevctl/mdevctl_0.81-1_all.deb
dpkg -i mdevctl_0.81-1_all.deb
reboot
nano /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
or
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
update-grub
nano /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u
reboot
After rebooting, check that the IOMMU is enabled:
dmesg | grep -e DMAR -e IOMMU
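Another quick check, assuming a normally mounted sysfs: a populated /sys/kernel/iommu_groups tree also means the IOMMU is active.
find /sys/kernel/iommu_groups/ -type l | wc -l
# a non-zero count means IOMMU groups were created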
Go to https://nvid.nvidia.com/dashboard/#/dashboard (this requires an Nvidia enterprise account).
Then open the software downloads tab on the left and filter the software for vGPU-LinuxKVM (pick this part according to your virtualization platform; for PVE, the one named here is correct).
Find the newest release and download it.
The downloaded archive contains a file named something like NVIDIA-Linux-x86_64-470.63-vgpu-kvm.run;
this is the GPU driver the server host needs, so upload it to the server. The remaining installers in the archive are the guest drivers for the VMs.
On the host, chmod +x the file, then run it with the --dkms flag.
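For example, using the file name from the download above:
chmod +x NVIDIA-Linux-x86_64-470.63-vgpu-kvm.run
./NVIDIA-Linux-x86_64-470.63-vgpu-kvm.run --dkms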
It should install successfully. If it fails, the cause is probably missing build dependencies or a kernel version mismatch; you are on your own there. This article succeeded on Linux kernel 5.4 (for newer kernels, see the troubleshooting section below).
Then reboot. At this point the host-side preparation is done.
Run mdevctl types; it should list your physical GPU's PCI address and the vGPU profiles it supports.
Alternatively, running nvidia-smi vgpu -s
gives similar output.
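The same information can also be read straight from the kernel's mdev sysfs interface; a sketch using this article's PCI address and profile (adjust both to your card):
cat /sys/bus/pci/devices/0000:3b:00.0/mdev_supported_types/nvidia-233/name
cat /sys/bus/pci/devices/0000:3b:00.0/mdev_supported_types/nvidia-233/available_instances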
Pick the profile you need from the supported list. Generally you want a Q-series vGPU: it exposes the most features, and its default frame-rate limit is the highest at 60 FPS, while some other series are capped at 45.
Series      Optimal Workload
Q-series    Virtual workstations for creative and technical professionals who require the performance and features of Quadro technology
C-series    Compute-intensive server workloads, such as artificial intelligence (AI), deep learning, or high-performance computing (HPC)
B-series    Virtual desktops for business professionals and knowledge workers
A-series    App streaming or session-based solutions for virtual applications users
When enabled, the frame-rate limiter (FRL) limits the maximum frame rate in frames per second (FPS) for a vGPU as follows:
For B-series vGPUs, the maximum frame rate is 45 FPS.
For Q-series, C-series, and A-series vGPUs, the maximum frame rate is 60 FPS.
This article picks nvidia-233.
Next, generate a few GUIDs (any online generator works),
then assemble commands like the following (the example creates two vGPUs; it is just to show the idea):
# GUID1: 5c4d18e7-5bc0-4792-900d-c9a17a644e92
# GUID2: 03d16c31-c5dd-433e-b442-936c8f1dfb3f
mdevctl start -u 5c4d18e7-5bc0-4792-900d-c9a17a644e92 -p 0000:3b:00.0 --type nvidia-233
mdevctl start -u 03d16c31-c5dd-433e-b442-936c8f1dfb3f -p 0000:3b:00.0 --type nvidia-233
mdevctl define --auto --uuid 5c4d18e7-5bc0-4792-900d-c9a17a644e92
mdevctl define --auto --uuid 03d16c31-c5dd-433e-b442-936c8f1dfb3f
The start commands above must be re-run on every host boot, so set up something to run them automatically (one possible sketch below).
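For example, a systemd oneshot unit. The unit name here is made up, and mdevctl's path and the Nvidia service names should be verified on your machine:
# /etc/systemd/system/vgpu-mdev.service (hypothetical name)
[Unit]
Description=Create vGPU mdev devices at boot
# the two Nvidia services come with the vGPU manager; check with: systemctl list-units 'nvidia*'
After=nvidia-vgpud.service nvidia-vgpu-mgr.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/mdevctl start -u 5c4d18e7-5bc0-4792-900d-c9a17a644e92 -p 0000:3b:00.0 --type nvidia-233
ExecStart=/usr/sbin/mdevctl start -u 03d16c31-c5dd-433e-b442-936c8f1dfb3f -p 0000:3b:00.0 --type nvidia-233

[Install]
WantedBy=multi-user.target
Enable it with systemctl daemon-reload && systemctl enable vgpu-mdev.service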
At this point the card is partitioned; the next step is passing a virtual GPU through to a VM.
Not much to say here, just mind the creation parameters:
BIOS must be OVMF (UEFI), and Machine must be q35.
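For reference, an equivalent VM could be created from the host shell with qm; a sketch with disks and ISO omitted, reusing this article's example values:
qm create 101 --name VGPU2 --bios ovmf --machine q35 \
  --cores 4 --sockets 2 --memory 16384 --ostype win10 \
  --net0 e1000,bridge=vmbr1,firewall=1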
Then start the VM and install the OS. Shut the VM down once the install is done.
The VM's config file lives on the host at /etc/pve/qemu-server/[VMID].conf; open it and make a small change.
Note this differs slightly from the video tutorial.
Original file contents:
bios: ovmf
boot: order=ide0;ide2;net0
cores: 4
efidisk0: local:101/vm-101-disk-1.qcow2,size=128K
ide0: local:101/vm-101-disk-0.qcow2,size=120G
ide2: local:iso/cn_windows_10_business_editions_version_1903_updated_sept_2019_x64_dvd_2f5281e1.iso,media=cdrom
machine: pc-q35-5.2
memory: 16384
name: VGPU2
net0: e1000=66:AB:4D:07:99:E5,bridge=vmbr1,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=061a1415-6b6a-414f-ba80-5f5d93893f6c
sockets: 2
vmgenid: e8d21fb3-583f-4e2b-bbd3-fac2c58d55b5
Add one line at the top of the original file:
args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/03d16c31-c5dd-433e-b442-936c8f1dfb3f,display=off' -uuid 1f4558e8-f7be-4b80-9f7e-6723d2df0c89
Two UUIDs appear here. The first is the one we assigned to the virtual GPU when partitioning the card above; the second is this VM's UUID, so generate a fresh one online and do not reuse the first.
Save the file when done. The VM's vGPU passthrough is now configured.
PS: the tutorial's version of the command is below; it can additionally spoof the GPU model the VM sees. Again, regenerate the value after the trailing -uuid:
args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/[UUID],display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x[PCI-ID],x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x11A0' -uuid [UUID]
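To find a device ID worth spoofing, lspci -nn on a machine with the target card prints the vendor:device pair in brackets at the end of each line (10de is Nvidia's vendor ID); otherwise look the model up in the public PCI ID database:
lspci -nn | grep -i nvidia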
After powering on, Device Manager will show a display adapter with no driver; it can be installed with the guest driver from the archive downloaded in step 3. But do not install it right away: first open port 3389 (RDP) or set up some other remote-control tool, because once the GPU driver is in, the noVNC console in PVE may stop working and you lose control of the VM. For remote control to be reachable, the VM also needs a network connection plus port forwarding on the host; both are covered below.
Once remote control works and the GPU driver is installed, the VM build is done.
Set up the network following this article:
https://www.flomain.de/2015/05/how-to-proxmox-networking/
(a copy is saved as an attachment)
What we need is Routed Networking, so the config written out looks like this:
cat /etc/network/interfaces
auto lo
iface lo inet loopback
# the NIC the BlackStone bare-metal server uses to reach the internet
auto eth0
iface eth0 inet dhcp
post-up echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
# the BlackStone server's second NIC; irrelevant here
iface eth1 inet manual
auto vmbr1
iface vmbr1 inet static
address 10.3.5.1/24
bridge-ports none
bridge-stp off
bridge-fd 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up iptables -t nat -A POSTROUTING -s '10.3.5.0/24' -o eth0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '10.3.5.0/24' -o eth0 -j MASQUERADE
A reboot may be needed for this to take effect; I applied it with PVE's Apply Configuration button, found under the node's Network panel.
Then add a NIC to the VM, attached to the vmbr1 bridge configured above. Inside the guest, set a static IP: address 10.3.5.* (this article uses 10.3.5.10), netmask 255.255.255.0, gateway 10.3.5.1, and whatever public DNS works for you.
With that in place the VM should be able to open Baidu.
Three lines to make Linux forward a port for me:
iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 55555 -j DNAT --to-destination 10.3.5.10:3389
iptables -A FORWARD -p tcp -d 10.3.5.10 --dport 3389 -j ACCEPT
iptables -A POSTROUTING -t nat -s 10.3.5.10 -o eth0 -j MASQUERADE
But the outbound NAT was already set up in the network-config section above, so in practice only the first line is needed.
That maps port 3389 of the VM at 10.3.5.10 to port 55555 on the host.
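Hand-entered iptables rules are lost on reboot. One way to persist the DNAT rule, matching the style of the bridge config above, is to append it to the vmbr1 stanza in /etc/network/interfaces:
post-up iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 55555 -j DNAT --to-destination 10.3.5.10:3389
post-down iptables -t nat -D PREROUTING -i eth0 -p tcp --dport 55555 -j DNAT --to-destination 10.3.5.10:3389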
Finally, patch QEMU's source so the VM is harder to detect as a virtual machine.
On the host:
git clone --recursive git://git.proxmox.com/git/pve-qemu.git
Open qemu/target/i386/cpu.c inside the cloned repo and find this line:
*ecx = env->features[FEAT_1_ECX];
Change it to the following, which masks off bit 31 of CPUID leaf 1's ECX, the "hypervisor present" flag that guests check to detect virtualization:
*ecx = env->features[FEAT_1_ECX] & 0x7fffffff;
# install basic build dependencies and debian build helpers
apt install devscripts build-essential
# clone repo recursive (just delete the old directory previously)
git clone --recursive git://git.proxmox.com/git/pve-qemu.git
cd pve-qemu
# install all build dependencies of the pve-qemu package
mk-build-deps --install debian/control
# this is where the original guide edits GUI_REFRESH_INTERVAL_DEFAULT; in our case, apply the cpu.c change above instead (use nano, vim, or any editor)
nano qemu/include/ui/console.h
# build and install package (can need a few minutes up to half an hour depending on resources)
make dinstall
If the build errors out, the problem may be that the CPU has too many cores and the parallel build breaks; test with make -j32. So damn stupid.
I had this same issue and beat my head against a wall for the longest time. The answer is indeed in the Package_Repositories page, but it's not as explicit as you'd like. I had to edit my repositories list at /etc/apt/sources.list as stated in the post, but you also have to comment out the line listed in /etc/apt/sources.list.d/pve-enterprise.list. If you don't comment out the line in the second file, it still tries to use the enterprise repo. After commenting that out, run apt update and it should work.
As described above: comment out the repo in pve-enterprise.list, then add deb http://download.proxmox.com/debian/pve buster pve-no-subscription to /etc/apt/sources.list (use the codename matching your Debian release; buster for this article).
When building Nvidia's KVM vGPU driver (e.g. NVIDIA-Linux-x86_64-470.63-vgpu-kvm.run),
the compile fails with errors like "set_fs" being undefined. This is because Linux kernels from 5.11 on changed parts of the kernel API, while the Nvidia driver still targets older kernels.
Some kind soul published a patch at https://github.com/rupansh/vgpu_unlock_5.12; its README follows:
This section assumes you have gone through the vgpu_unlock wiki.
Linux Kernel 5.12 removed set_fs, which prevented nvidia from using hacks to bypass the eventfd api.
This repo has a specific patch which may be used for users on kernel 5.12.
Copy the patch to the folder which contains the vgpu driver related binaries.
To apply the patch you must do the following:
./NVIDIA-Linux-x86_64-<version>-vgpu-kvm.run -x
This will create a new folder with the driver-related files in NVIDIA-Linux-x86_64-<version>-vgpu-kvm
cd NVIDIA-Linux-x86_64-<version>-vgpu-kvm
patch -p0 < ../twelve.patch
The module will still not build so you must do manual changes.
#include <disclaimer.h>
// Note that this change may or may not be illegal for an individual user to do.
// I am not a lawyer and I am not responsible for any trouble you land in.
you must change the MODULE_LICENSE of nvidia & nvidia-vgpu-vfio modules to something that is compatible with GPL-only symbols, e.g Dual MIT/GPL.
The files containing them for the respective modules are kernel/nvidia/nv-frontend.c and kernel/nvidia-vgpu-vfio ;)))
Once the changes are applied, install the driver with the following command:
nvidia-installer --dkms
The rest of the procedure is the same as the wiki.
The merged driver is available here (you must still make the MODULE_LICENSE changes yourself):
https://drive.google.com/file/d/119I9SxxfQ-mheinVjRN3Woep7YsKl7FS/view?usp=sharing
Install it with nvidia-installer --dkms
What the MODULE_LICENSE change means: replace statements like MODULE_LICENSE("*");
with MODULE_LICENSE("Dual MIT/GPL");
Only two places need replacing; open the source folder and do a global search for "MODULE_LICENSE".
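To locate both occurrences from the shell (assuming the extracted driver folder from earlier):
grep -rn 'MODULE_LICENSE' NVIDIA-Linux-x86_64-<version>-vgpu-kvm/kernel/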
The modified code is available in the attachment.
cd into the source folder and run nvidia-installer --dkms. A few dialogs will pop up complaining about this or that; clicking OK through them is fine, and at the end it should report that the driver installed successfully.
Finally, run nvidia-smi
to test; it should print the GPU's status.
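To check the vGPU side as well, the driver's vgpu subcommand lists supported profiles and any running vGPUs:
nvidia-smi vgpu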
noVNC fails to start, with an error like "VM 106 qmp command 'change' failed - The command change has not been found". I have not found a fix for this, but you can connect via SPICE instead: in the web console select the target VM -> Display -> set it to SPICE, then reboot. Afterwards the Console button in the top right has a dropdown; pick SPICE and a .vv file will be downloaded. Google for a SPICE viewer and install one; it can be downloaded here: https://virt-manager.org/download/. After installation it associates itself with .vv files, so just open the downloaded file.
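If you would rather not rely on the file association, remote-viewer from the virt-viewer package should open the file directly:
remote-viewer /path/to/downloaded.vv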