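Why You Can't Log cpuid by Modifying QEMU

Background

Suppose there is a program that is very hard to analyze. It relies on cpuid for something important (such as detecting a virtual machine), but protections such as virtualization and anti-debugging keep us from analyzing it directly. In that situation, one idea is to run it inside a virtual machine and add logging to QEMU's cpuid handling logic.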

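Building PVE-QEMU

The script below should make the process clear. After the checkout you need to re-initialize the submodules, otherwise the build fails with an error saying that some submodule does not exist. I have forgotten exactly which command I used to initialize the submodules; trying a few of the usual ones will get you there.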

    apt install -y devscripts build-essential pve-kernel-libc-dev
    git clone --recursive https://git.proxmox.com/git/pve-qemu.git
    cd pve-qemu
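    # check out version 5.2.0-3
    git checkout 970196f # 5.2.0-3
    # re-initialize the submodules (the exact command was not recorded;
    # this standard git invocation is an assumption)
    git submodule update --init --recursive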
    # install all build dependencies of the pve-qemu package
    echo Y | mk-build-deps --install debian/control
    # build
    make

    # to rebuild after modifying the code
    make clean
    make

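Modifying the QEMU source

Now that we know how to build, the next step is to change the source. QEMU's cpuid handling lives in cpu_x86_cpuid() in qemu/target/i386/cpu.c. Look at the code below; it reads quite naturally.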

    switch (index) {
    case 0:
        *eax = env->cpuid_level;
        *ebx = env->cpuid_vendor1;
        *edx = env->cpuid_vendor2;
        *ecx = env->cpuid_vendor3;
        break;
    case 1:
        *eax = env->cpuid_version;
        *ebx = (cpu->apic_id << 24) |
               8 << 8; /* CLFLUSH size in quad words, Linux wants it. */
        *ecx = env->features[FEAT_1_ECX];
        if ((*ecx & CPUID_EXT_XSAVE) && (env->cr[4] & CR4_OSXSAVE_MASK)) {
            *ecx |= CPUID_EXT_OSXSAVE;
        }
        *edx = env->features[FEAT_1_EDX];
        if (cs->nr_cores * cs->nr_threads > 1) {
            *ebx |= (cs->nr_cores * cs->nr_threads) << 16;
            *edx |= CPUID_HT;
        }
        if (!cpu->enable_pmu) {
            *ecx &= ~CPUID_EXT_PDCM;
        }
        break;
    case 2:

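If the goal is to change what cpuid returns (for example, the vendor string), modifying this function is enough. For logging, however, it does not work. Since the values cpuid returns are fixed, QEMU has a caching mechanism and does not call this function every time. From my analysis, kvm_arch_init_vcpu() in qemu/target/i386/kvm/kvm.c reads all the cpuid values in one go and stores them in its own data structure. Here is the relevant fragment of kvm_arch_init_vcpu. (If it is hard to follow, see the explanation of the cpuid instruction right after it.)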

    cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);

    for (i = 0; i <= limit; i++) {
        if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
            fprintf(stderr, "unsupported level value: 0x%x\n", limit);
            abort();
        }
        c = &cpuid_data.entries[cpuid_i++];

        switch (i) {
        case 2: {
            /* Keep reading function 2 till all the input is received */
            int times;

            c->function = i;
            c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC |
                       KVM_CPUID_FLAG_STATE_READ_NEXT;
            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
            times = c->eax & 0xff;

            for (j = 1; j < times; ++j) {
                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
                    fprintf(stderr, "cpuid_data is full, no space for "
                            "cpuid(eax:2):eax & 0xf = 0x%x\n", times);
                    abort();
                }
                c = &cpuid_data.entries[cpuid_i++];
                c->function = i;
                c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
                cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
            }
            break;

First, it executes cpuid with EAX=0 to read the highest basic cpuid leaf that is supported. The cpuid instruction is designed this way:

INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor Information and the Vendor Identification String
When CPUID executes with EAX set to 0, the processor returns the highest value the CPUID recognizes for returning basic processor information. The value is returned in the EAX register (see second table) and is processor specific.
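The same EAX=0 leaf also carries the vendor identification string, which is what case 0 of cpu_x86_cpuid() fills in through EBX, EDX and ECX (in that order). As a minimal sketch of how those three registers turn into text, the helper names below (unpack_reg, cpuid_vendor_string) are hypothetical and not part of QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of QEMU: unpack one 32-bit register
 * into four ASCII bytes, least-significant byte first. */
static void unpack_reg(uint32_t reg, char *dst)
{
    for (int i = 0; i < 4; i++) {
        dst[i] = (char)((reg >> (8 * i)) & 0xff);
    }
}

/* Build the 12-character vendor string from a leaf-0 result.
 * The register order is EBX, EDX, ECX. `out` must hold 13 bytes. */
static void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                                char out[13])
{
    assert(out != NULL);
    unpack_reg(ebx, out);
    unpack_reg(edx, out + 4);
    unpack_reg(ecx, out + 8);
    out[12] = '\0';
}
```

Calling cpuid_vendor_string(0x756e6547, 0x49656e69, 0x6c65746e, buf) leaves "GenuineIntel" in buf. Changing the vendor string in QEMU amounts to returning different values in these three registers.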

Then, for EAX=2, it loops until all the EAX=2 data has been read, again because cpuid is designed this way:

INPUT EAX = 2: Cache and TLB Information Returned in EAX, EBX, ECX, EDX
When CPUID executes with EAX set to 2, the processor returns information about the processor’s internal caches and TLBs in the EAX, EBX, ECX, and EDX registers.
The encoding is as follows: – The least-significant byte in register EAX (register AL) indicates the number of times the CPUID instruction must be executed with an input value of 2 to get a complete description of the processor’s caches and TLBs. The first member of the family of Pentium 4 processors will return a 1.
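The `times = c->eax & 0xff` line in the fragment above is exactly this rule. A small sketch, with hypothetical helper names that do not exist in QEMU, makes the relationship between AL and the loop bounds explicit:

```c
#include <stdint.h>

/* Hypothetical helper: number of times CPUID must be executed with
 * EAX = 2, taken from the low byte (AL) of the first result. */
static unsigned cpuid2_required_runs(uint32_t eax)
{
    return eax & 0xff;
}

/* Number of additional executions after the first one, matching the
 * `for (j = 1; j < times; ++j)` loop in kvm_arch_init_vcpu(). */
static unsigned cpuid2_extra_runs(uint32_t eax)
{
    unsigned times = cpuid2_required_runs(eax);
    return times > 1 ? times - 1 : 0;
}
```

When AL is 1, as the manual says the first Pentium 4 processors report, the inner loop body never runs and a single table entry is stored for leaf 2.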

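This shows that QEMU caches the cpuid return values, so adding a log statement in cpu.c to record the result of every cpuid call does not work.

Still in the same init_vcpu function, note this line: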

    r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);

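cpuid_data is the cached cpuid data. It appears to be handed to the kernel, and judging by the name, to the kvm module. That being the case, modifying QEMU is a dead end.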
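To make the consequence concrete, here is a toy model of the behavior described above. It is a sketch under stated assumptions, not QEMU or KVM code: every name prefixed with toy_ is hypothetical, the table layout is simplified, and the "kernel" is just a static array. It shows that a log line placed in the userspace function fires only while the table is being filled, never when the guest later executes cpuid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ENTRIES 8   /* toy limit; stands in for KVM_MAX_CPUID_ENTRIES */

struct toy_entry { uint32_t function, eax, ebx, ecx, edx; };

static struct toy_entry kernel_table[MAX_ENTRIES]; /* models the table held by KVM */
static int kernel_count;
static int userspace_calls;  /* how often the "QEMU side" function ran */

/* Stands in for cpu_x86_cpuid(): the only place a log line could be added. */
static void toy_cpu_x86_cpuid(uint32_t index, struct toy_entry *out)
{
    userspace_calls++;
    out->function = index;
    out->eax = (index == 0) ? 2 : 0;  /* leaf 0 reports highest leaf = 2 */
    out->ebx = out->ecx = out->edx = 0;
    fprintf(stderr, "log: cpuid(0x%x) computed in userspace\n", (unsigned)index);
}

/* Stands in for kvm_arch_init_vcpu(): fill the table once, hand it over. */
static void toy_init_vcpu(void)
{
    struct toy_entry first;
    toy_cpu_x86_cpuid(0, &first);
    uint32_t limit = first.eax;

    kernel_count = 0;
    for (uint32_t i = 0; i <= limit; i++) {
        assert(kernel_count < MAX_ENTRIES);
        toy_cpu_x86_cpuid(i, &kernel_table[kernel_count++]);
    }
}

/* Stands in for the guest executing cpuid: served from the table only. */
static int toy_guest_cpuid(uint32_t index, struct toy_entry *out)
{
    for (int i = 0; i < kernel_count; i++) {
        if (kernel_table[i].function == index) {
            *out = kernel_table[i];
            return 0;
        }
    }
    return -1;  /* leaf not present in the table */
}
```

After toy_init_vcpu() runs, userspace_calls stays constant no matter how many times toy_guest_cpuid() is called, which mirrors why a log statement in cpu.c only captures the initialization pass.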
