Saturday, 1 February 2014

Linux Kernel Modules: When to use try_module_get / module_put

I was reading the LKMPG (see Section 4.1.4, Unregistering A Device) and it wasn't clear to me when to use the try_module_get / module_put functions. Some of the LKMPG examples use them, some don't.
To add to the confusion, try_module_get appears 282 times in 193 files in the 2.6.24 source, yet in Linux Device Drivers (LDD3) and Essential Linux Device Drivers it does not appear in a single code example.
I thought maybe they were tied to the old register_chrdev interface (superseded in 2.6 by the cdev interface), but the two appear together in the same file only 8 times:
find . -type f -name '*.c' | xargs grep -l try_module_get | sort -u | xargs grep -l register_chrdev | sort -u | grep -c .
So when is it appropriate to use these functions and are they tied to the use of a particular interface or set of circumstances?
Edit
I loaded the sched.c example from the LKMPG and tried the following experiment:
anon@anon:~/kernel-source/lkmpg/2.6.24$ tail /proc/sched -f &
Timer called 5041 times so far
[1] 14594

anon@anon:~$ lsmod | grep sched
sched 2868 1

anon@anon:~$ sudo rmmod sched
ERROR: Module sched is in use
This leads me to believe that the kernel now does its own accounting and that the gets / puts may be obsolete. Can anyone verify this?

Answers:-

You should essentially never have to use try_module_get(THIS_MODULE); pretty much all such uses are unsafe since if you are already in your module, it's too late to bump the reference count -- there will always be a (small) window where you are executing code in your module but haven't incremented the reference count. If someone removes the module exactly in that window, then you're in the bad situation of running code in an unloaded module.
The particular example you linked in LKMPG where the code does try_module_get() in the open() method would be handled in the modern kernel by setting the .owner field in struct file_operations:
struct file_operations fops = {
        .owner = THIS_MODULE,
        .open = device_open,
        //...
};
This makes the VFS code take a reference to the module before calling into it, which eliminates the unsafe window -- either the try_module_get() succeeds before the call to .open(), or the try_module_get() fails and the VFS never calls into the module. In either case, we never run code from a module that has already been unloaded.
The only good time to use try_module_get() is when you want to take a reference on a different module before calling into it or using it in some way (e.g. as the file open code does in the example I explained above). There are a number of uses of try_module_get(THIS_MODULE) in the kernel source, but most if not all of them are latent bugs that should be cleaned up.
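To make the ordering concrete, here is a minimal user-space sketch of the idea. It is not kernel code and does not reproduce the real VFS implementation: model_module, model_fops, model_try_get, model_put and model_vfs_open are hypothetical names invented for this illustration, and the -1 error value is an arbitrary placeholder. It only shows that the caller pins the owner before calling in, and never calls in if the pin fails.

```c
/* User-space model only; NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_module {
    int  refcount;  /* references currently held on the module */
    bool going;     /* set once removal of the module has begun */
};

struct model_fops {
    struct model_module *owner;  /* plays the role of .owner = THIS_MODULE */
    int (*open)(void);           /* code that lives "inside" the module */
};

/* Loosely modelled on try_module_get(): refuse once removal has begun. */
static bool model_try_get(struct model_module *m)
{
    if (m == NULL)
        return true;             /* no owner recorded: nothing to pin */
    if (m->going)
        return false;
    m->refcount++;
    return true;
}

/* Loosely modelled on module_put(): drop one reference. */
static void model_put(struct model_module *m)
{
    if (m == NULL)
        return;
    assert(m->refcount > 0);
    m->refcount--;
}

/* The caller (outside the module) takes the reference BEFORE calling in. */
static int model_vfs_open(const struct model_fops *fops)
{
    if (!model_try_get(fops->owner))
        return -1;               /* module code is never entered */

    int err = fops->open();
    if (err)
        model_put(fops->owner);  /* failed open: give the reference back */
    return err;
}

/* Stand-in for the module's own open handler. */
static int open_calls;
static int device_open(void)
{
    open_calls++;
    return 0;
}
```

With a model_fops whose owner points at a module that is not going away, model_vfs_open() bumps the count and then runs device_open(). Once going is set, it returns the error without touching device_open() at all, which is the property that a try_module_get(THIS_MODULE) inside the handler cannot provide.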
The reason you were not able to unload the sched example is that your
$ tail /proc/sched -f &
command keeps /proc/sched open, and because of
        Our_Proc_File->owner = THIS_MODULE;
in the sched.c code, opening /proc/sched increments the reference count of the sched module, which accounts for the 1 reference that your lsmod shows. From a quick skim of the rest of the code, I think if you release /proc/sched by killing your tail command, you would be able to remove the sched module.
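The accounting behind that lsmod / rmmod behaviour can be sketched with another small user-space model. Again, this is an illustration under assumptions rather than kernel code: model_module, model_open, model_release and model_rmmod are hypothetical names, and the real kernel does considerably more than compare a counter.

```c
/* User-space model only; NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct model_module {
    int  refcount;  /* corresponds to the use count that lsmod prints */
    bool loaded;
};

/* Opening a file owned by the module pins the module. */
static void model_open(struct model_module *m)
{
    m->refcount++;
}

/* Closing the file (e.g. when tail is killed) drops that reference. */
static void model_release(struct model_module *m)
{
    assert(m->refcount > 0);
    m->refcount--;
}

/* Removal is refused while any reference is outstanding. */
static bool model_rmmod(struct model_module *m)
{
    if (m->refcount > 0)
        return false;   /* the "Module ... is in use" case */
    m->loaded = false;
    return true;
}
```

In this model, model_rmmod() fails between model_open() and model_release() and succeeds afterwards, mirroring the experiment with tail holding /proc/sched open.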
