Odroid 3.8.y #30

mdrjr · 2013-03-09T14:16:30Z

Test

commit dacae5a upstream. snd_ali_pointer function is called with local interrupts disabled. However it seems very strange to reenable them in such way. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Denis Efremov <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit f49a59c upstream. According to the other code in this driver and similar code in rme96 it seems, that spin_lock_irq in snd_rme32_capture_close function should be paired with spin_unlock_irq. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Denis Efremov <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit edac894 upstream. snd-aloop driver has no proper PM implementation, thus the PM resume may trigger Oops due to leftover timer instance. This patch adds the missing suspend/resume implementation. Reported-and-tested-by: El boulangero <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 4d9b109 upstream. This change fixes a deadlock when the multiplexer is closed while there are still client side ports open. When the multiplexer is closed and there are active tty's it tries to close them with tty_vhangup. This has a problem though, because tty_vhangup needs the tty_lock. This patch changes it to unlock the tty_lock before attempting the hangup and relocks afterwards. The additional call to tty_port_tty_set is needed because otherwise the port stays active because of the reference counter. This change also exposed another problem that other code paths don't expect that the multiplexer could have been closed. This patch also adds checks for these cases in the gsmtty_ class of function that could be called. The documentation explicitly states that "first close all virtual ports before closing the physical port" but we've found this to not always reality in our field situations. The GPRS / UTMS modem sometimes crashes and needs a power cycle in that case which means cleanly shutting down everything is not always possible. This change makes it much more robust for our situation where at least the system is recoverable with this patch and doesn't hang in a deadlock situation inside the kernel. The patch is against the long term support kernel (3.4.27) and should apply cleanly to more recent branches. Tested with a Telit GE864-QUADV2 and Telit HE910 modem. Signed-off-by: Dirkjan Bussink <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 183d95c upstream. See https://bugzilla.redhat.com/show_bug.cgi?id=904907 read command causes bash to abort with double free or corruption (out). A simple test-case from Roman: // Compile the reproducer and send sigchld ti that process. // EINTR occurs even if SA_RESTART flag is set. void handler(int sig) { } main() { struct sigaction act; act.sa_handler = handler; act.sa_flags = SA_RESTART; sigaction (SIGCHLD, &act, 0); struct termio ttp; ioctl(0, TCGETA, &ttp); while(1) { if (ioctl(0, TCSETAW, ttp) < 0) { if (errno == EINTR) { fprintf(stderr, "BUG!"); return(1); } } } } Change set_termios/set_termiox to return -ERESTARTSYS to fix this particular problem. I didn't dare to change other EINTR's in drivers/tty/, but they look equally wrong. Reported-by: Roman Rakus <[email protected]> Reported-by: Lingzhu Xiang <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Cc: Jiri Slaby <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit b2ca699 upstream. Make sure serial-driver dtr_rts is called with disc_mutex held after checking the disconnected flag. Due to a bug in the tty layer, dtr_rts may get called after a device has been disconnected and the tty-device unregistered. Some drivers have had individual checks for disconnect to make sure the disconnected interface was not accessed, but this should really be handled in usb-serial core (at least until the long-standing tty-bug has been fixed). Note that the problem has been made more acute with commit 0998d06 ("device-core: Ensure drvdata = NULL when no driver is bound") as the port data is now also NULL when dtr_rts is called resulting in further oopses. Reported-by: Chris Ruehl <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 677fe55 upstream. commit 9ec1882 (tty: serial: imx: console write routing is unsafe on SMP) introduced a recursive locking bug in imx_console_write(). The callchain is: imx_rxint() spin_lock_irqsave(&sport->port.lock,flags); ... uart_handle_sysrq_char(); sysrq_function(); printk(); imx_console_write(); spin_lock_irqsave(&sport->port.lock,flags); <--- DEAD The bad news is that the kernel debugging facilities can dectect the problem, but the printks never surface on the serial console for obvious reasons. There is a similar issue with oops_in_progress. If the kernel crashes we really don't want to be stuck on the lock and unable to tell what happened. In general most UP originated drivers miss these checks and nobody ever notices because CONFIG_PROVE_LOCKING seems to be still ignored by a large number of developers. The solution is to avoid locking in the sysrq case and trylock in the oops_in_progress case. This scheme is used in other drivers as well and it would be nice if we could move this to a common place, so the usual copy/paste/modify bugs can be avoided. Now there is another issue with this scheme: CPU0 CPU1 printk() rxint() sysrq_detection() -> sets port->sysrq return from interrupt console_write() if (port->sysrq) avoid locking port->sysrq is reset with the next receive character. So as long as the port->sysrq is not reset and this can take an endless amount of time if after the break no futher receive character follows, all console writes happen unlocked. While the current writer is protected against other console writers by the console sem, it's unprotected against open/close or other operations which fiddle with the port. That's what the above mentioned commit tried to solve. That's an issue in all drivers which use that scheme and unfortunately there is no easy workaround. The only solution is to have a separate indicator port->sysrq_cpu. uart_handle_sysrq_char() then sets it to smp_processor_id() before calling into handle_sysrq() and resets it to -1 after that. Then change the locking check to: if (port->sysrq_cpu == smp_processor_id()) locked = 0; else if (oops_in_progress) locked = spin_trylock_irqsave(port->lock, flags); else spin_lock_irqsave(port->lock, flags); That would force all other cpus into the spin_lock path. Problem solved, but that's way beyond the scope of this fix and really wants to be implemented in a common function which calls the uart specific write function to avoid another gazillion of hard to debug copy/paste/modify bugs. Reported-and-tested-by: Tim Sander <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 85f0244 upstream. It was mistakenly defined to be 24 instead of the next higher number 25. Reported-by: Alexander Shishkin <[email protected]> Cc: Stephen Hurd <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit ccae0e5 upstream. Bastian Bittorf reported that some of the silent freezes on a Linksys WRT54G were due to overflow of the RX DMA ring buffer, which was created with 64 slots. That finding reminded me that I was seeing similar crashed on a netbook, which also has a relatively slow processor. After increasing the number of slots to 128, runs on the netbook that previously failed now worked; however, I found that 109 slots had been used in one test. For that reason, the number of slots is being increased to 256. Signed-off-by: Larry Finger <[email protected]> Cc: Bastian Bittorf <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 957f4ac upstream. When the new_id entry in /sysfs is used for a foreign USB device, rtlwifi BUGS with a NULL pointer dereference because the per-driver configuration data is not available. The probe function has been restructured as suggested by Ben Hutchings <[email protected]>. Signed-off-by: Larry Finger <[email protected]> Signed-off-by: John W. Linville <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 8708aac upstream. A new model of the RTL8188CUS has appeared. Reported-and-tested-by: Thomas Rosenkrantz <[email protected]> Signed-off-by: Larry Finger <[email protected]> Signed-off-by: John W. Linville <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…er separately commit bc6b892 upstream. rtlwifi allocates both setup_packet and data buffer of control message urb, using shared kmalloc in _usbctrl_vendorreq_async_write. Structure used for allocating is: struct { u8 data[254]; struct usb_ctrlrequest dr; }; Because 'struct usb_ctrlrequest' is __packed, setup packet is unaligned and DMA mapping of both 'data' and 'dr' confuses ARM/sunxi, leading to memory corruptions and freezes. Patch changes setup packet to be allocated separately. [v2]: - Use WARN_ON_ONCE instead of WARN_ON Signed-off-by: Jussi Kivilinna <[email protected]> Signed-off-by: John W. Linville <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit a883b70 upstream. Commit 81732c3 ("tty vt: Fix line garbage in virtual console on command line edition") broke insert_char() in multiple ways. Then commit b1a925f ("tty vt: Fix a regression in command line edition") partially fixed it. However, the buffer being moved is still too large and overflowing beyond the end of the current line, corrupting existing characters on the next line. Example test case: echo -e "abc\nde\x1b[A\x1b[4h \x1b[4l\x1b[B" Expected result: ab c de Current result: ab c e Needless to say that this is very annoying when inserting words in the middle of paragraphs with certain text editors. Signed-off-by: Nicolas Pitre <[email protected]> Acked-by: Jean-François Moine <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 76eaca0 upstream. There is a loophole between Xen's current implementation of pv-spinlocks and the scheduler. This was triggerable through a testcase until v3.6 changed the TLB flushing code. The problem potentially is still there just not observable in the same way. What could happen was (is): 1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number). 2. CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen. 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop. 4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck. 5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#. To avoid this and since the unlocking code has no real sense of which waiter is best suited to grab the lock, just send the IPI to all of them. This causes the waiters to return from the hyper- call (those not interrupted at least) and do active spinlocking. BugLink: http://bugs.launchpad.net/bugs/1011792 Acked-by: Jan Beulich <[email protected]> Signed-off-by: Stefan Bader <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit e7e44e4 upstream. Signed-off-by: Wei Liu <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 513b032 upstream. The PPS serial line discipline wants to attach a PPS device to a tty without changing the tty code to add a struct pps_device * pointer. Since the number of PPS devices in a typical system is generally very low (n=1 is by far the most common), it's practical to search the entire list of allocated pps devices. (We capture the timestamp before the lookup, so the timing isn't affected.) It is a bit ugly that this function, which is part of the in-kernel PPS API, has to be in pps.c as opposed to kapi,c, but that's not something that affects users. Signed-off-by: George Spelvin <[email protected]> Acked-by: Rodolfo Giometti <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 03a7ffe upstream. Now that N_TTY uses tty->disc_data for its private data, 'subclass' ldiscs cannot use ->disc_data for their own private data. (This is a regression is v3.8-rc1) Use pps_lookup_dev to associate the tty with the pps source instead. This fixes a crashing regression in 3.8-rc1. Signed-off-by: George Spelvin <[email protected]> Acked-by: Rodolfo Giometti <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit d953e0e upstream. Remove the cdev from the system (with cdev_del) *before* deallocating it (in pps_device_destruct, called via kobject_put from device_destroy). Also prevent deallocating a device with open file handles. A better long-term fix is probably to remove the cdev from the pps_device entirely, and instead have all devices reference one global cdev. Then the deallocation ordering becomes simpler. But that's more complex and invasive change, so we leave that for later. Signed-off-by: George Spelvin <[email protected]> Acked-by: Rodolfo Giometti <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 7e5a510 upstream. Now zram allocates new page with GFP_KERNEL in zram I/O path if IO is partial. Unfortunately, It may cause deadlock with reclaim path like below. write_page from fs fs_lock allocation(GFP_KERNEL) reclaim pageout write_page from fs fs_lock <-- deadlock This patch fixes it by using GFP_NOIO. In read path, we reorganize code flow so that kmap_atomic is called after the GFP_NOIO allocation. Acked-by: Jerome Marchand <[email protected]> Acked-by: Nitin Gupta <[email protected]> [ [email protected]: don't use GFP_ATOMIC ] Signed-off-by: Pekka Enberg <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 4fa3e78 upstream. A bus_type has a list of devices (klist_devices), but the list and the subsys_private structure that contains it are not initialized until the bus_type is registered with bus_register(). The panic/reboot path has fixups that look up devices in pci_bus_type. If we panic before registering pci_bus_type, the bus_type exists but the list does not, so mach_reboot_fixups() trips over a null pointer and panics again: mach_reboot_fixups pci_get_device .. bus_find_device(&pci_bus_type, ...) bus->p is NULL Joonsoo reported a problem when panicking before PCI was initialized. I think this patch should be sufficient to replace the patch he posted here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip reboot_fixups in early boot phase") Reported-by: Joonsoo Kim <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…allouts commit 751efd8 upstream. There is a race condition between mmu_notifier_unregister() and __mmu_notifier_release(). Assume two tasks, one calling mmu_notifier_unregister() as a result of a filp_close() ->flush() callout (task A), and the other calling mmu_notifier_release() from an mmput() (task B). A B t1 srcu_read_lock() t2 if (!hlist_unhashed()) t3 srcu_read_unlock() t4 srcu_read_lock() t5 hlist_del_init_rcu() t6 synchronize_srcu() t7 srcu_read_unlock() t8 hlist_del_rcu() <--- NULL pointer deref. Additionally, the list traversal in __mmu_notifier_release() is not protected by the by the mmu_notifier_mm->hlist_lock which can result in callouts to the ->release() notifier from both mmu_notifier_unregister() and __mmu_notifier_release(). -stable suggestions: The stable trees prior to 3.7.y need commits 21a9273 and 7040030 cherry-picked in that order prior to cherry-picking this commit. The 3.7.y tree already has those two commits. Signed-off-by: Robin Holt <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Xiao Guangrong <[email protected]> Cc: Avi Kivity <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Sagi Grimberg <[email protected]> Cc: Haggai Eran <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 55c171a upstream. Running under a kvm host does not necessarily imply the presence of a page mapped above the main memory with the virtio information; however, the code includes a hard coded access to that page. Instead, check for the presence of the page and exit gracefully before we hit an addressing exception if it does not exist. Reviewed-by: Marcelo Tosatti <[email protected]> Reviewed-by: Alexander Graf <[email protected]> Signed-off-by: Cornelia Huck <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 15bc8d8 upstream. On store status we need to copy the current state of registers into a save area. Currently we might save stale versions: The sie state descriptor doesnt have fields for guest ACRS,FPRS, those registers are simply stored in the host registers. The host program must copy these away if needed. We do that in vcpu_put/load. If we now do a store status in KVM code between vcpu_put/load, the saved values are not up-to-date. Lets collect the ACRS/FPRS before saving them. This also fixes some strange problems with hotplug and virtio-ccw, since the low level machine check handler (on hotplug a machine check will happen) will revalidate all registers with the content of the save area. Signed-off-by: Christian Borntraeger <[email protected]> Signed-off-by: Gleb Natapov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit fe2b05f upstream. This reverts commit ec0c427. get_robust_list() is in use and a removal would break existing user space. With the permission checks in place it's not longer a security hole. Remove the deprecation warnings. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 676a067 upstream. Running the command: inotifywait -e unmount /mnt/disk immediately aborts with a -EINVAL return code. This is however a valid parameter. This abort occurs only if unmount is the sole event parameter. If other event parameters are supplied, then the unmount event wait will work. The problem was introduced by commit 44b350f ("inotify: Fix mask checks"). In that commit, it states: The mask checks in inotify_update_existing_watch() and inotify_new_watch() are useless because inotify_arg_to_mask() sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway. But instead of removing the useless checks, it did this: mask = inotify_arg_to_mask(arg); - if (unlikely(!mask)) + if (unlikely(!(mask & IN_ALL_EVENTS))) return -EINVAL; The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS. So the check should be: if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT)))) But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask anyway, so the check is always going to pass and thus should simply be removed. Also note that inotify_arg_to_mask completely controls what mask bits get set from arg, there's no way for invalid bits to get enabled there. Lets fix it by simply removing the useless broken checks. Signed-off-by: Jim Somerville <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: John McCutchan <[email protected]> Cc: Robert Love <[email protected]> Cc: Eric Paris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…disk() commit 7630b66 upstream. We found that bdev->bd_invalidated was left set once revalidate_disk() is called, which results in page cache flush every time that device is open. Specifically, we found this problem in MD block device. Once we resize a MD device, mdadm --monitor periodically flush all page cache for that device every 60 or 1000 seconds when it opens the device. This bug lies since at least 3.2.0 till the latest kernel(3.6.2). Patch is attached. The following steps will reproduce the problem. 1. prepair a block device (eg /dev/sdb). 2. create two partitions: sudo parted /dev/sdb mklabel gpt mkpart primary 0% 50% mkpart primary 50% 100% 3. create a md device. sudo mdadm -C /dev/md/hoge -l 1 -n 2 -e 1.2 --assume-clean --auto=md --symlink=no /dev/sdb1 /dev/sdb2 4. create file system and mount it sudo mkfs.ext3 /dev/md/hoge sudo mkdir /mnt/test sudo mount /dev/md/hoge /mnt/test 5. try to resize the device sudo mdadm -G /dev/md/hoge --size=max 6. create a file to fill file cache. sudo dd if=/dev/urandom of=/mnt/test/data bs=1M count=10 and verify the current status of file by free command. 7. mdadm monitor will open the md device every 1000 seconds and you will find all file cache on the device are cleared. The timing can be reduced by the following steps. a) kill mdadm and restart it with --delay option /sbin/mdadm --monitor --delay=30 --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog or open the md device directly. sudo dd if=/dev/md/hoge of=/dev/null bs=4096 count=1 Signed-off-by: MITSUNARI Shigeo <[email protected]> Cc: Al Viro <[email protected]> Cc: Jeff Moyer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 3278bb7 upstream. If lockres refresh failed, the super lock will never be released which will cause some processes on other cluster nodes hung forever. Signed-off-by: Junxiao Bi <[email protected]> Cc: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 5eb02c0 upstream. Clearing the NSTBY bit in the control register also automatically clears the BLEN bit. So we need to make sure to set it again during resume, otherwise the backlight will stay off. Signed-off-by: Lars-Peter Clausen <[email protected]> Acked-by: Michael Hennerich <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit fe9453a upstream. A patch to fix some unreachable code in search_my_process_keyrings() got applied twice by two different routes upstream as commits e67eab3 and b010520 (both "fix unreachable code"). Unfortunately, the second application removed something it shouldn't have and this wasn't detected by GIT. This is due to the patch not having sufficient lines of context to distinguish the two places of application. The effect of this is relatively minor: inside the kernel, the keyring search routines may search multiple keyrings and then prioritise the errors if no keys or negative keys are found in any of them. With the extra deletion, the presence of a negative key in the thread keyring (causing ENOKEY) is incorrectly overridden by an error searching the process keyring. So revert the second application of the patch. Signed-off-by: David Howells <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 5f00110 upstream. The tmpfs remount logic preserves filesystem mempolicy if the mpol=M option is not specified in the remount request. A new policy can be specified if mpol=M is given. Before this patch remounting an mpol bound tmpfs without specifying mpol= mount option in the remount request would set the filesystem's mempolicy object to a freed mempolicy object. To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run: # mkdir /tmp/x # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0 # mount -o remount,size=200M nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0 # note ? garbage in mpol=... output above # dd if=/dev/zero of=/tmp/x/f count=1 # panic here Panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) [...] Oops: 0010 [#1] SMP DEBUG_PAGEALLOC Call Trace: mpol_shared_policy_init+0xa5/0x160 shmem_get_inode+0x209/0x270 shmem_mknod+0x3e/0xf0 shmem_create+0x18/0x20 vfs_create+0xb5/0x130 do_last+0x9a1/0xea0 path_openat+0xb3/0x4d0 do_filp_open+0x42/0xa0 do_sys_open+0xfe/0x1e0 compat_sys_open+0x1b/0x20 cstar_dispatch+0x7/0x1f Non-debug kernels will not crash immediately because referencing the dangling mpol will not cause a fault. Instead the filesystem will reference a freed mempolicy object, which will cause unpredictable behavior. The problem boils down to a dropped mpol reference below if shmem_parse_options() does not allocate a new mpol: config = *sbinfo shmem_parse_options(data, &config, true) mpol_put(sbinfo->mpol) sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */ This patch avoids the crash by not releasing the mempolicy if shmem_parse_options() doesn't create a new mpol. How far back does this issue go? I see it in both 2.6.36 and 3.3. I did not look back further. Signed-off-by: Greg Thelen <[email protected]> Acked-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>