The Linux Kernel API

This documentation is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

For more details see the file COPYING in the source distribution of Linux.

Table of Contents
1. Driver Basics
    Driver Entry and Exit points
    Atomic and pointer manipulation
    Delaying, scheduling, and timer routines
    High-resolution timers
    Internal Functions
    Kernel objects manipulation
    Kernel utility functions
2. Data Types
    Doubly Linked Lists
3. Basic C Library Functions
    String Conversions
    String Manipulation
    Bit Operations
4. Memory Management in Linux
    The Slab Cache
    User Space Memory Access
    More Memory Management Functions
5. Kernel IPC facilities
    IPC utilities
6. FIFO Buffer
    kfifo interface
7. The proc filesystem
    sysctl interface
    proc filesystem interface
8. The debugfs filesystem
    debugfs interface
9. The Linux VFS
    The Filesystem types
    The Directory Cache
    Inode Handling
    Registration and Superblocks
    File Locks
    Other Functions
10. Linux Networking
    Networking Base Types
    Socket Buffer Functions
    Socket Filter
    Generic Network Statistics
    SUN RPC subsystem
11. Network device support
    Driver Support
    Synchronous PPP
12. Module Support
    Module Loading
    Inter Module support
13. 
Hardware Interfaces Interrupt Handling Resources Management MTRR Handling PCI Support Library PCI Hotplug Support Library MCA Architecture MCA Device Functions MCA Bus DMA 14. The Device File System devfs_mk_dir 15. The Filesystem for Exporting Kernel Objects sysfs_create_file sysfs_update_file sysfs_chmod_file sysfs_remove_file sysfs_create_dir sysfs_remove_dir sysfs_create_link sysfs_remove_link sysfs_create_bin_file sysfs_remove_bin_file 16. Security Framework register_security unregister_security mod_reg_security mod_unreg_security capable 17. Power Management pm_register pm_unregister pm_unregister_all pm_send_all 18. Device drivers infrastructure Device Drivers Base Device Drivers Power Management Device Drivers ACPI Support Device drivers PnP support 19. Block Devices blk_get_backing_dev_info blk_queue_prep_rq blk_queue_merge_bvec blk_queue_make_request blk_queue_ordered blk_queue_issue_flush_fn blk_queue_bounce_limit blk_queue_max_sectors blk_queue_max_phys_segments blk_queue_max_hw_segments blk_queue_max_segment_size blk_queue_hardsect_size blk_queue_stack_limits blk_queue_segment_boundary blk_queue_dma_alignment blk_queue_find_tag blk_queue_free_tags blk_queue_init_tags blk_queue_resize_tags blk_queue_end_tag blk_queue_start_tag blk_queue_invalidate_tags generic_unplug_device blk_start_queue blk_stop_queue blk_sync_queue blk_run_queue blk_cleanup_queue blk_init_queue blk_requeue_request blk_insert_request blk_rq_map_user blk_rq_map_user_iov blk_rq_unmap_user blk_rq_map_kern blk_execute_rq_nowait blk_execute_rq blkdev_issue_flush blk_end_sync_rq blk_congestion_wait generic_make_request submit_bio end_that_request_first end_that_request_chunk blk_complete_request 20. Miscellaneous Devices misc_register misc_deregister 21. Video4Linux video_register_device video_unregister_device 22. 
Sound Devices snd_printk snd_printd snd_assert snd_printdd register_sound_special_device register_sound_mixer register_sound_midi register_sound_dsp register_sound_synth unregister_sound_special unregister_sound_mixer unregister_sound_midi unregister_sound_dsp unregister_sound_synth snd_pcm_playback_ready snd_pcm_capture_ready snd_pcm_playback_data snd_pcm_playback_empty snd_pcm_capture_empty snd_pcm_format_cpu_endian snd_pcm_new_stream snd_pcm_new snd_device_new snd_device_free snd_device_register snd_iprintf snd_info_get_line snd_info_get_str snd_info_create_module_entry snd_info_create_card_entry snd_card_proc_new snd_info_free_entry snd_info_register snd_info_unregister snd_rawmidi_receive snd_rawmidi_transmit_empty snd_rawmidi_transmit_peek snd_rawmidi_transmit_ack snd_rawmidi_transmit snd_rawmidi_new snd_rawmidi_set_ops snd_request_card snd_lookup_minor_data snd_register_device snd_unregister_device copy_to_user_fromio copy_from_user_toio snd_pcm_lib_preallocate_free_for_all snd_pcm_lib_preallocate_pages snd_pcm_lib_preallocate_pages_for_all snd_pcm_sgbuf_ops_page snd_pcm_lib_malloc_pages snd_pcm_lib_free_pages snd_card_new snd_card_disconnect snd_card_free snd_card_free_in_thread snd_card_register snd_component_add snd_card_file_add snd_card_file_remove snd_power_wait snd_dma_program snd_dma_disable snd_dma_pointer snd_ctl_new snd_ctl_new1 snd_ctl_free_one snd_ctl_add snd_ctl_remove snd_ctl_remove_id snd_ctl_rename_id snd_ctl_find_numid snd_ctl_find_id snd_pcm_set_ops snd_pcm_set_sync snd_interval_refine snd_interval_ratnum snd_interval_list snd_pcm_hw_rule_add snd_pcm_hw_constraint_integer snd_pcm_hw_constraint_minmax snd_pcm_hw_constraint_list snd_pcm_hw_constraint_ratnums snd_pcm_hw_constraint_ratdens snd_pcm_hw_constraint_msbits snd_pcm_hw_constraint_step snd_pcm_hw_constraint_pow2 snd_pcm_hw_param_value_min snd_pcm_hw_param_value_max snd_pcm_hw_param_first snd_pcm_hw_param_last snd_pcm_hw_param_set snd_pcm_hw_param_mask snd_pcm_hw_param_near 
snd_pcm_lib_ioctl snd_pcm_period_elapsed snd_hwdep_new snd_pcm_stop snd_pcm_suspend snd_pcm_suspend_all snd_malloc_pages snd_free_pages snd_dma_alloc_pages snd_dma_alloc_pages_fallback snd_dma_free_pages snd_dma_get_reserved_buf snd_dma_reserve_buf
23. 16x50 UART Driver uart_handle_dcd_change uart_handle_cts_change uart_update_timeout uart_get_baud_rate uart_get_divisor uart_register_driver uart_unregister_driver uart_add_one_port uart_remove_one_port serial8250_suspend_port serial8250_resume_port serial8250_register_port serial8250_unregister_port
24. Z85230 Support Library z8530_interrupt z8530_sync_open z8530_sync_close z8530_sync_dma_open z8530_sync_dma_close z8530_sync_txdma_open z8530_sync_txdma_close z8530_describe z8530_init z8530_shutdown z8530_channel_load z8530_null_rx z8530_queue_xmit z8530_get_stats
25. Frame Buffer Library
    Frame Buffer Memory
    Frame Buffer Colormap
    Frame Buffer Video Mode Database
    Frame Buffer Macintosh Video Mode Database
    Frame Buffer Fonts

Chapter 1. Driver Basics

Driver Entry and Exit points

Name
    module_init -- driver initialization entry point
Synopsis
    module_init(x);
Arguments
    x    function to be run at kernel boot time or module insertion
Description
    module_init will either be called during do_initcalls (if builtin) or at module insertion time (if a module). There can only be one per module.

Name
    module_exit -- driver exit entry point
Synopsis
    module_exit(x);
Arguments
    x    function to be run when driver is removed
Description
    module_exit will wrap the driver clean-up code with cleanup_module when used with rmmod when the driver is a module. If the driver is statically compiled into the kernel, module_exit has no effect. There can only be one per module.

Atomic and pointer manipulation

Name
    atomic_read -- read atomic variable
Synopsis
    atomic_read(v);
Arguments
    v    pointer of type atomic_t
Description
    Atomically reads the value of v.

Name
    atomic_set -- set atomic variable
Synopsis
    atomic_set(v, i);
Arguments
    v    pointer of type atomic_t
    i    required value
Description
    Atomically sets the value of v to i.

Name
    atomic_add -- add integer to atomic variable
Synopsis
    void atomic_add(int i, atomic_t *v);
Arguments
    i    integer value to add
    v    pointer of type atomic_t
Description
    Atomically adds i to v.

Name
    atomic_sub -- subtract integer from atomic variable
Synopsis
    void atomic_sub(int i, atomic_t *v);
Arguments
    i    integer value to subtract
    v    pointer of type atomic_t
Description
    Atomically subtracts i from v.

Name
    atomic_sub_and_test -- subtract value from variable and test result
Synopsis
    int atomic_sub_and_test(int i, atomic_t *v);
Arguments
    i    integer value to subtract
    v    pointer of type atomic_t
Description
    Atomically subtracts i from v and returns true if the result is zero, or false for all other cases.

Name
    atomic_inc -- increment atomic variable
Synopsis
    void atomic_inc(atomic_t *v);
Arguments
    v    pointer of type atomic_t
Description
    Atomically increments v by 1.

Name
    atomic_dec -- decrement atomic variable
Synopsis
    void atomic_dec(atomic_t *v);
Arguments
    v    pointer of type atomic_t
Description
    Atomically decrements v by 1.

Name
    atomic_dec_and_test -- decrement and test
Synopsis
    int atomic_dec_and_test(atomic_t *v);
Arguments
    v    pointer of type atomic_t
Description
    Atomically decrements v by 1 and returns true if the result is 0, or false for all other cases.
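
A common use of atomic_inc and atomic_dec_and_test is reference counting. The sketch below is a user-space model only: it uses C11 <stdatomic.h> in place of the kernel's atomic_t, and every demo_ name is hypothetical. It illustrates the return-value convention described above (true exactly when the result reaches zero), not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } demo_atomic_t;

static inline void demo_atomic_set(demo_atomic_t *v, int i)
{
    atomic_store(&v->counter, i);
}

static inline int demo_atomic_read(demo_atomic_t *v)
{
    return atomic_load(&v->counter);
}

static inline void demo_atomic_inc(demo_atomic_t *v)
{
    atomic_fetch_add(&v->counter, 1);
}

/* atomic_fetch_sub returns the OLD value, so the new value is zero
 * exactly when the old value equals the amount subtracted. */
static inline bool demo_atomic_dec_and_test(demo_atomic_t *v)
{
    return atomic_fetch_sub(&v->counter, 1) == 1;
}

static inline bool demo_atomic_sub_and_test(int i, demo_atomic_t *v)
{
    return atomic_fetch_sub(&v->counter, i) == i;
}

/* Hypothetical reference-counted object. */
struct demo_object {
    demo_atomic_t refs;
    bool released;
};

static inline void demo_object_init(struct demo_object *obj)
{
    demo_atomic_set(&obj->refs, 1);   /* the creator holds one reference */
    obj->released = false;
}

static inline void demo_object_get(struct demo_object *obj)
{
    demo_atomic_inc(&obj->refs);
}

/* Drops one reference; returns true if this call released the object. */
static inline bool demo_object_put(struct demo_object *obj)
{
    if (demo_atomic_dec_and_test(&obj->refs)) {
        obj->released = true;         /* real code would free resources here */
        return true;
    }
    return false;
}
```

Only the caller that sees demo_atomic_dec_and_test return true runs the release path, which is why the decrement and the test have to be a single atomic step.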

Name
    atomic_inc_and_test -- increment and test
Synopsis
    int atomic_inc_and_test(atomic_t *v);
Arguments
    v    pointer of type atomic_t
Description
    Atomically increments v by 1 and returns true if the result is zero, or false for all other cases.

Name
    atomic_add_negative -- add and test if negative
Synopsis
    int atomic_add_negative(int i, atomic_t *v);
Arguments
    i    integer value to add
    v    pointer of type atomic_t
Description
    Atomically adds i to v and returns true if the result is negative, or false when the result is greater than or equal to zero.

Name
    atomic_add_return -- add and return
Synopsis
    int atomic_add_return(int i, atomic_t *v);
Arguments
    i    integer value to add
    v    pointer of type atomic_t
Description
    Atomically adds i to v and returns the result (i + v).

Name
    atomic_add_unless -- add unless the number is a given value
Synopsis
    atomic_add_unless(v, a, u);
Arguments
    v    pointer of type atomic_t
    a    the amount to add to v...
    u    ...unless v is equal to u.
Description
    Atomically adds a to v, so long as it was not u. Returns non-zero if v was not u, and zero otherwise.

Name
    get_unaligned -- get value from possibly mis-aligned location
Synopsis
    get_unaligned(ptr);
Arguments
    ptr    pointer to value
Description
    This macro should be used for accessing values larger in size than single bytes at locations that are expected to be improperly aligned, e.g. retrieving a u16 value from a location not u16-aligned. Note that unaligned accesses can be very expensive on some architectures.

Name
    put_unaligned -- put value to a possibly mis-aligned location
Synopsis
    put_unaligned(val, ptr);
Arguments
    val    value to place
    ptr    pointer to location
Description
    This macro should be used for placing values larger in size than single bytes at locations that are expected to be improperly aligned, e.g.
    writing a u16 value to a location not u16-aligned. Note that unaligned accesses can be very expensive on some architectures.

Delaying, scheduling, and timer routines

Name
    pid_alive -- check that a task structure is not stale
Synopsis
    int pid_alive(struct task_struct *p);
Arguments
    p    Task structure to be checked.
Description
    Test if a process is not yet dead (at most zombie state). If pid_alive fails, then pointers within the task structure can be stale and must not be dereferenced.

Name
    __wake_up -- wake up threads blocked on a waitqueue.
Synopsis
    void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, void *key);
Arguments
    q    the waitqueue
    mode    which threads
    nr_exclusive    how many wake-one or wake-many threads to wake up
    key    is directly passed to the wakeup function

Name
    __wake_up_sync -- wake up threads blocked on a waitqueue.
Synopsis
    void fastcall __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr_exclusive);
Arguments
    q    the waitqueue
    mode    which threads
    nr_exclusive    how many wake-one or wake-many threads to wake up
Description
    The sync wakeup differs in that the waker knows that it will schedule away soon, so while the target thread will be woken up, it will not be migrated to another CPU, i.e. the two threads are 'synchronized' with each other. This can prevent needless bouncing between CPUs. On UP it can prevent extra preemption.

Name
    task_nice -- return the nice value of a given task.
Synopsis
    int task_nice(const task_t *p);
Arguments
    p    the task in question.

Name
    sched_setscheduler -- change the scheduling policy and/or RT priority of a thread
Synopsis
    int sched_setscheduler(struct task_struct *p, int policy, struct sched_param *param);
Arguments
    p    the task in question.
    policy    new policy.
    param    structure containing the new RT priority.
Description
    Changes the scheduling policy and/or RT priority of a thread.

Name
    yield -- yield the current processor to other threads.
Synopsis
    void __sched yield(void);
Arguments
    None.
Description
    This is a shortcut for kernel-space yielding: it marks the thread runnable and calls sys_sched_yield.

Name
    schedule_timeout -- sleep until timeout
Synopsis
    signed long __sched schedule_timeout(signed long timeout);
Arguments
    timeout    timeout value in jiffies
Description
    Make the current task sleep until timeout jiffies have elapsed. The routine will return immediately unless the current task state has been set (see set_current_state).

    You can set the task state as follows:

    TASK_UNINTERRUPTIBLE - at least timeout jiffies are guaranteed to pass before the routine returns. The routine will return 0.

    TASK_INTERRUPTIBLE - the routine may return early if a signal is delivered to the current task. In this case the remaining time in jiffies will be returned, or 0 if the timer expired in time.

    The current task state is guaranteed to be TASK_RUNNING when this routine returns.

    Specifying a timeout value of MAX_SCHEDULE_TIMEOUT will schedule the CPU away without a bound on the timeout. In this case the return value will be MAX_SCHEDULE_TIMEOUT.

    In all cases the return value is guaranteed to be non-negative.

Name
    msleep -- sleep safely even with waitqueue interruptions
Synopsis
    void msleep(unsigned int msecs);
Arguments
    msecs    Time in milliseconds to sleep for

Name
    msleep_interruptible -- sleep waiting for signals
Synopsis
    unsigned long msleep_interruptible(unsigned int msecs);
Arguments
    msecs    Time in milliseconds to sleep for

High-resolution timers

Name
    ktime_set -- Set a ktime_t variable from a seconds/nanoseconds value
Synopsis
    ktime_t ktime_set(const long secs, const unsigned long nsecs);
Arguments
    secs    seconds to set
    nsecs    nanoseconds to set
Description
    Returns the ktime_t representation of the value.

Name
    ktime_sub -- subtract two ktime_t variables
Synopsis
    ktime_t ktime_sub(const ktime_t lhs, const ktime_t rhs);
Arguments
    lhs    minuend
    rhs    subtrahend
Description
    Returns the difference lhs - rhs.

Name
    ktime_add -- add two ktime_t variables
Synopsis
    ktime_t ktime_add(const ktime_t add1, const ktime_t add2);
Arguments
    add1    addend1
    add2    addend2
Description
    Returns the sum of addend1 and addend2.

Name
    ktime_add_ns -- Add a scalar nanoseconds value to a ktime_t variable
Synopsis
    ktime_t ktime_add_ns(const ktime_t kt, u64 nsec);
Arguments
    kt    addend
    nsec    the scalar nsec value to add
Description
    Returns the sum of kt and nsec in ktime_t format.

Name
    timespec_to_ktime -- convert a timespec to ktime_t format
Synopsis
    ktime_t timespec_to_ktime(const struct timespec ts);
Arguments
    ts    the timespec variable to convert
Description
    Returns a ktime_t variable with the converted timespec value.

Name
    timeval_to_ktime -- convert a timeval to ktime_t format
Synopsis
    ktime_t timeval_to_ktime(const struct timeval tv);
Arguments
    tv    the timeval variable to convert
Description
    Returns a ktime_t variable with the converted timeval value.

Name
    ktime_to_timespec -- convert a ktime_t variable to timespec format
Synopsis
    struct timespec ktime_to_timespec(const ktime_t kt);
Arguments
    kt    the ktime_t variable to convert
Description
    Returns the timespec representation of the ktime value.

Name
    ktime_to_timeval -- convert a ktime_t variable to timeval format
Synopsis
    struct timeval ktime_to_timeval(const ktime_t kt);
Arguments
    kt    the ktime_t variable to convert
Description
    Returns the timeval representation of the ktime value.

Name
    ktime_to_clock_t -- convert a ktime_t variable to clock_t format
Synopsis
    clock_t ktime_to_clock_t(const ktime_t kt);
Arguments
    kt    the ktime_t variable to convert
Description
    Returns a clock_t variable with the converted value.

Name
    ktime_to_ns -- convert a ktime_t variable to scalar nanoseconds
Synopsis
    u64 ktime_to_ns(const ktime_t kt);
Arguments
    kt    the ktime_t variable to convert
Description
    Returns the scalar nanoseconds representation of kt.

Name
    struct hrtimer -- the basic hrtimer structure
Synopsis
    struct hrtimer {
        struct rb_node node;
        ktime_t expires;
        enum hrtimer_state state;
        int (* function) (void *);
        void * data;
        struct hrtimer_base * base;
    };
Members
    node    red black tree node for time ordered insertion
    expires    the absolute expiry time in the hrtimers internal representation.
    The time is related to the clock on which the timer is based.
    state    state of the timer
    function    timer expiry callback function
    data    argument for the callback function
    base    pointer to the timer base (per cpu and per clock)
Description
    The hrtimer structure must be initialized by init_hrtimer_#CLOCKTYPE

Name
    struct hrtimer_base -- the timer base for a specific clock
Synopsis
    struct hrtimer_base {
        clockid_t index;
        spinlock_t lock;
        struct rb_root active;
        struct rb_node * first;
        ktime_t resolution;
        ktime_t (* get_time) (void);
        struct hrtimer * curr_timer;
    };
Members
    index    clock type index for per_cpu support when moving a timer to a base on another cpu.
    lock    lock protecting the base and associated timers
    active    red black tree root node for the active timers
    first    pointer to the timer node which expires first
    resolution    the resolution of the clock, in nanoseconds
    get_time    function to retrieve the current time of the clock
    curr_timer    the timer which is executing a callback right now

Name
    ktime_get_real -- get the real (wall-) time in ktime_t format
Synopsis
    ktime_t ktime_get_real(void);
Arguments
    None.
Description
    Returns the time in ktime_t format.

Name
    ktime_get_ts -- get the monotonic clock in timespec format
Synopsis
    void ktime_get_ts(struct timespec *ts);
Arguments
    ts    pointer to timespec variable
Description
    The function calculates the monotonic clock from the realtime clock and the wall_to_monotonic offset and stores the result in normalized timespec format in the variable pointed to by ts.
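
The arithmetic behind the ktime_t helpers in this section can be modeled in user space by treating a time value as a signed 64-bit nanosecond count. This is an assumption made for illustration: the kernel's actual ktime_t layout is not described here, and all demo_ names, including struct demo_timespec, are hypothetical. The sketch also shows what a "normalized" timespec means: the nanosecond field always lies in the range 0 to 999999999.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Assumption: model a ktime value as signed 64-bit nanoseconds. */
typedef int64_t demo_ktime_t;

/* Hypothetical stand-in for struct timespec. */
struct demo_timespec {
    long long tv_sec;
    long tv_nsec;
};

static inline demo_ktime_t demo_ktime_set(long secs, unsigned long nsecs)
{
    return (demo_ktime_t)secs * DEMO_NSEC_PER_SEC + (demo_ktime_t)nsecs;
}

static inline demo_ktime_t demo_ktime_add(demo_ktime_t add1, demo_ktime_t add2)
{
    return add1 + add2;
}

static inline demo_ktime_t demo_ktime_sub(demo_ktime_t lhs, demo_ktime_t rhs)
{
    return lhs - rhs;
}

static inline demo_ktime_t demo_ktime_add_ns(demo_ktime_t kt, uint64_t nsec)
{
    return kt + (demo_ktime_t)nsec;
}

static inline int64_t demo_ktime_to_ns(demo_ktime_t kt)
{
    return kt;
}

/* Convert to a normalized timespec: 0 <= tv_nsec < 1e9, even when kt < 0. */
static inline struct demo_timespec demo_ktime_to_timespec(demo_ktime_t kt)
{
    struct demo_timespec ts;

    ts.tv_sec = kt / DEMO_NSEC_PER_SEC;          /* truncates toward zero */
    ts.tv_nsec = (long)(kt % DEMO_NSEC_PER_SEC); /* may be negative here */
    if (ts.tv_nsec < 0) {
        ts.tv_sec -= 1;
        ts.tv_nsec += (long)DEMO_NSEC_PER_SEC;
    }
    return ts;
}

static inline demo_ktime_t demo_timespec_to_ktime(struct demo_timespec ts)
{
    return (demo_ktime_t)ts.tv_sec * DEMO_NSEC_PER_SEC + ts.tv_nsec;
}
```

In this model, subtracting 1.5 s from 0.75 s gives -750000000 ns, which normalizes to tv_sec = -1 and tv_nsec = 250000000.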

Internal Functions

Name
    reparent_to_init -- Reparent the calling kernel thread to the init task.
Synopsis
    void reparent_to_init(void);
Arguments
    None.
Description
    If a kernel thread is launched as a result of a system call, or if it ever exits, it should generally reparent itself to init so that it is correctly cleaned up on exit.

    Task state such as scheduling policy and priority may have been inherited from a user process, so we reset them to sane values here.

    NOTE that reparent_to_init gives the caller full capabilities.

Name
    sys_tgkill -- send signal to one specific thread
Synopsis
    long sys_tgkill(int tgid, int pid, int sig);
Arguments
    tgid    the thread group ID of the thread
    pid    the PID of the thread
    sig    signal to be sent
Description
    This syscall also checks the tgid and returns -ESRCH even if the PID exists but no longer belongs to the target process. This method solves the problem of threads exiting and PIDs getting reused.

Kernel objects manipulation

Name
    kobject_init -- initialize object.
Synopsis
    void kobject_init(struct kobject *kobj);
Arguments
    kobj    object in question.

Name
    kobject_add -- add an object to the hierarchy.
Synopsis
    int kobject_add(struct kobject *kobj);
Arguments
    kobj    object.

Name
    kobject_register -- initialize and add an object.
Synopsis
    int kobject_register(struct kobject *kobj);
Arguments
    kobj    object in question.

Name
    kobject_set_name -- Set the name of an object
Synopsis
    int kobject_set_name(struct kobject *kobj, const char *fmt, ...);
Arguments
    kobj    object.
    fmt    format string used to build the name
    ...    variable arguments
Description
    If strlen(name) >= KOBJ_NAME_LEN, then use a dynamically allocated string that kobj->k_name points to. Otherwise, use the static kobj->name array.
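
The storage rule described for kobject_set_name (short names in the embedded array, long names in a dynamically allocated string) can be sketched in user space as follows. This is an illustrative model, not the kernel code: struct demo_kobject, demo_set_name, demo_release_name, and the value 20 chosen for DEMO_NAME_LEN are all assumptions, and malloc/free stand in for kernel allocation.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_NAME_LEN 20   /* assumed stand-in for KOBJ_NAME_LEN */

/* Hypothetical model of the two name fields mentioned in the text. */
struct demo_kobject {
    char *k_name;              /* always points at the current name */
    char name[DEMO_NAME_LEN];  /* static storage for short names */
};

static int demo_set_name(struct demo_kobject *kobj, const char *fmt, ...)
{
    va_list args;
    char *name;
    int need;

    /* First pass: measure the formatted length without writing. */
    va_start(args, fmt);
    need = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    if (need < 0)
        return -EINVAL;

    if (need < DEMO_NAME_LEN) {
        name = kobj->name;               /* fits in the static array */
    } else {
        name = malloc((size_t)need + 1); /* too long: allocate */
        if (!name)
            return -ENOMEM;
    }

    /* Second pass: format into the chosen buffer. */
    va_start(args, fmt);
    vsnprintf(name, (size_t)need + 1, fmt, args);
    va_end(args);

    /* Free a previous dynamically allocated name, if there was one. */
    if (kobj->k_name && kobj->k_name != kobj->name)
        free(kobj->k_name);
    kobj->k_name = name;
    return 0;
}

static void demo_release_name(struct demo_kobject *kobj)
{
    if (kobj->k_name && kobj->k_name != kobj->name)
        free(kobj->k_name);
    kobj->k_name = NULL;
}
```

Readers of the name always go through k_name, so they do not need to know which of the two storage strategies is in effect.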

Name
    kobject_del -- unlink kobject from hierarchy.
Synopsis
    void kobject_del(struct kobject *kobj);
Arguments
    kobj    object.

Name
    kobject_unregister -- remove object from hierarchy and decrement refcount.
Synopsis
    void kobject_unregister(struct kobject *kobj);
Arguments
    kobj    object going away.

Name
    kobject_get -- increment refcount for object.
Synopsis
    struct kobject *kobject_get(struct kobject *kobj);
Arguments
    kobj    object.

Name
    kobject_put -- decrement refcount for object.
Synopsis
    void kobject_put(struct kobject *kobj);
Arguments
    kobj    object.
Description
    Decrement the refcount, and if 0, call kobject_cleanup.

Name
    kset_register -- initialize and add a kset.
Synopsis
    int kset_register(struct kset *k);
Arguments
    k    kset.

Name
    kset_unregister -- remove a kset.
Synopsis
    void kset_unregister(struct kset *k);
Arguments
    k    kset.

Name
    kset_find_obj -- search for object in kset.
Synopsis
    struct kobject *kset_find_obj(struct kset *kset, const char *name);
Arguments
    kset    kset we're looking in.
    name    object's name.
Description
    Lock kset via kset->subsys, and iterate over kset->list, looking for a matching kobject. If a matching object is found, take a reference and return the object.

Name
    subsystem_register -- register a subsystem.
Synopsis
    int subsystem_register(struct subsystem *s);
Arguments
    s    the subsystem we're registering.
Description
    Once we register the subsystem, we want to make sure that the kset points back to this subsystem for correct usage of the rwsem.

Name
    subsys_create_file -- export sysfs attribute file.
Synopsis
    int subsys_create_file(struct subsystem *s, struct subsys_attribute *a);
Arguments
    s    subsystem.
    a    subsystem attribute descriptor.

Name
    subsys_remove_file -- remove sysfs attribute file.
Synopsis
    void subsys_remove_file(struct subsystem *s, struct subsys_attribute *a);
Arguments
    s    subsystem.
    a    attribute descriptor.

Kernel utility functions

Name
    container_of -- cast a member of a structure out to the containing structure
Synopsis
    container_of(ptr, type, member);
Arguments
    ptr    the pointer to the member.
    type    the type of the container struct this is embedded in.
    member    the name of the member within the struct.

Name
    printk -- print a kernel message
Synopsis
    int printk(const char *fmt, ...);
Arguments
    fmt    format string
    ...    variable arguments
Description
    This is printk. It can be called from any context. We want it to work.

    We try to grab the console_sem. If we succeed, it's easy: we log the output and call the console drivers. If we fail to get the semaphore, we place the output into the log buffer and return. The current holder of the console_sem will notice the new output in release_console_sem and will send it to the consoles before releasing the semaphore.

    One effect of this deferred printing is that code which calls printk and then changes console_loglevel may break. This is because console_loglevel is inspected when the actual printing occurs.
See also
    printf(3)

Name
    acquire_console_sem -- lock the console system for exclusive use.
Synopsis
    void acquire_console_sem(void);
Arguments
    None.
Description
    Acquires a semaphore which guarantees that the caller has exclusive access to the console system and the console_drivers list.

    Can sleep, returns nothing.

Name
    release_console_sem -- unlock the console system
Synopsis
    void release_console_sem(void);
Arguments
    None.
Description
    Releases the semaphore which the caller holds on the console system and the console driver list.

    While the semaphore was held, console output may have been buffered by printk.
    If this is the case, release_console_sem emits the output prior to releasing the semaphore.

    If there is output waiting for klogd, we wake it up.

    release_console_sem may be called from any context.

Name
    console_conditional_schedule -- yield the CPU if required
Synopsis
    void __sched console_conditional_schedule(void);
Arguments
    None.
Description
    If the console code is currently allowed to sleep, and if this CPU should yield the CPU to another task, do so here.

    Must be called within acquire_console_sem.

Name
    panic -- halt the system
Synopsis
    NORET_TYPE void panic(const char *fmt, ...);
Arguments
    fmt    The text string to print
    ...    variable arguments
Description
    Display a message, then perform cleanups.

    This function never returns.

Name
    notifier_chain_register -- Add notifier to a notifier chain
Synopsis
    int notifier_chain_register(struct notifier_block **list, struct notifier_block *n);
Arguments
    list    Pointer to root list pointer
    n    New entry in notifier chain
Description
    Adds a notifier to a notifier chain.

    Currently always returns zero.

Name
    notifier_chain_unregister -- Remove notifier from a notifier chain
Synopsis
    int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n);
Arguments
    nl    Pointer to root list pointer
    n    Entry to remove from notifier chain
Description
    Removes a notifier from a notifier chain.

    Returns zero on success, or -ENOENT on failure.

Name
    notifier_call_chain -- Call functions in a notifier chain
Synopsis
    int __kprobes notifier_call_chain(struct notifier_block **n, unsigned long val, void *v);
Arguments
    n    Pointer to root pointer of notifier chain
    val    Value passed unmodified to notifier function
    v    Pointer passed unmodified to notifier function
Description
    Calls each function in a notifier chain in turn.
    If the return value of a notifier function has the NOTIFY_STOP_MASK bit set, then notifier_call_chain will return immediately, with the return value of the notifier function which halted execution. Otherwise, the return value is the return value of the last notifier function called.

Name
    register_reboot_notifier -- Register function to be called at reboot time
Synopsis
    int register_reboot_notifier(struct notifier_block *nb);
Arguments
    nb    Info about notifier function to be called
Description
    Registers a function with the list of functions to be called at reboot time.

    Currently always returns zero, as notifier_chain_register always returns zero.

Name
    unregister_reboot_notifier -- Unregister previously registered reboot notifier
Synopsis
    int unregister_reboot_notifier(struct notifier_block *nb);
Arguments
    nb    Hook to be unregistered
Description
    Unregisters a previously registered reboot notifier function.

    Returns zero on success, or -ENOENT on failure.

Name
    emergency_restart -- reboot the system
Synopsis
    void emergency_restart(void);
Arguments
    None.
Description
    Reboot the system without shutting down any hardware or taking any locks. This is called when we know we are in trouble, so this is our best effort to reboot. This is safe to call in interrupt context.

Name
    kernel_restart -- reboot the system
Synopsis
    void kernel_restart(char *cmd);
Arguments
    cmd    pointer to buffer containing command to execute for restart or NULL
Description
    Shutdown everything and perform a clean reboot. This is not safe to call in interrupt context.

Name
    kernel_kexec -- reboot the system
Synopsis
    void kernel_kexec(void);
Arguments
    None.
Description
    Move into place and start executing a preloaded standalone executable. If nothing was preloaded return an error.
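
Returning to the notifier chain functions documented earlier in this section, the register, unregister, and call semantics can be modeled in user space as below. This is a sketch under stated assumptions: the struct layout, the descending-priority ordering, the numeric NOTIFY_ values, and all demo_ names are illustrative choices rather than something the text above specifies. What it does take from the text is the return-value behavior: registration returns zero, removal returns -ENOENT when the entry is absent, and a callback result with NOTIFY_STOP_MASK set ends the walk early.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed values, chosen for the demo. */
#define DEMO_NOTIFY_DONE      0x0000
#define DEMO_NOTIFY_OK        0x0001
#define DEMO_NOTIFY_STOP_MASK 0x8000
#define DEMO_NOTIFY_STOP      (DEMO_NOTIFY_OK | DEMO_NOTIFY_STOP_MASK)

/* Hypothetical model of a notifier block: callback, link, priority. */
struct demo_notifier {
    int (*call)(struct demo_notifier *self, unsigned long val, void *v);
    struct demo_notifier *next;
    int priority;
};

/* Insert n, keeping the chain sorted by descending priority (assumed). */
static int demo_chain_register(struct demo_notifier **list,
                               struct demo_notifier *n)
{
    while (*list && n->priority <= (*list)->priority)
        list = &(*list)->next;
    n->next = *list;
    *list = n;
    return 0;
}

/* Unlink n; returns 0 on success or -ENOENT if n is not on the chain. */
static int demo_chain_unregister(struct demo_notifier **list,
                                 struct demo_notifier *n)
{
    for (; *list; list = &(*list)->next) {
        if (*list == n) {
            *list = n->next;
            return 0;
        }
    }
    return -ENOENT;
}

/* Call each notifier in turn; stop early if the STOP bit is set. */
static int demo_chain_call(struct demo_notifier **list,
                           unsigned long val, void *v)
{
    int ret = DEMO_NOTIFY_DONE;
    struct demo_notifier *nb;

    for (nb = *list; nb; nb = nb->next) {
        ret = nb->call(nb, val, v);
        if (ret & DEMO_NOTIFY_STOP_MASK)
            return ret;
    }
    return ret;
}

/* Example callbacks: v is treated as a pointer to an int call counter. */
static int demo_count_ok(struct demo_notifier *self, unsigned long val, void *v)
{
    (void)self; (void)val;
    ++*(int *)v;
    return DEMO_NOTIFY_OK;
}

static int demo_count_stop(struct demo_notifier *self, unsigned long val, void *v)
{
    (void)self; (void)val;
    ++*(int *)v;
    return DEMO_NOTIFY_STOP;
}
```

With three notifiers on a chain where the second returns DEMO_NOTIFY_STOP, the third is never invoked and the caller receives the second notifier's return value.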

Name
    kernel_halt -- halt the system
Synopsis
    void kernel_halt(void);
Arguments
    None.
Description
    Shutdown everything and perform a clean system halt.

Name
    kernel_power_off -- power off the system
Synopsis
    void kernel_power_off(void);
Arguments
    None.
Description
    Shutdown everything and perform a clean system power off.

Name
    call_rcu -- Queue an RCU callback for invocation after a grace period.
Synopsis
    void fastcall call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
Arguments
    head    structure to be used for queueing the RCU updates.
    func    actual update function to be invoked after the grace period
Description
    The update function will be invoked some time after a full grace period elapses, in other words after all currently executing RCU read-side critical sections have completed. RCU read-side critical sections are delimited by rcu_read_lock and rcu_read_unlock, and may be nested.

Name
    call_rcu_bh -- Queue an RCU callback for invocation after a quicker grace period.
Synopsis
    void fastcall call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
Arguments
    head    structure to be used for queueing the RCU updates.
    func    actual update function to be invoked after the grace period
Description
    The update function will be invoked some time after a full grace period elapses, in other words after all currently executing RCU read-side critical sections have completed. call_rcu_bh assumes that the read-side critical sections end on completion of a softirq handler. This means that read-side critical sections in process context must not be interrupted by softirqs. This interface is to be used when most of the read-side critical sections are in softirq context.
RCU read-side critical sections are delimited by rcu_read_lock and rcu_read_unlock if in interrupt context, or by rcu_read_lock_bh and rcu_read_unlock_bh if in process context. These may be nested.

Name: rcu_barrier -- Wait until all the in-flight RCU callbacks are complete.
Synopsis: void rcu_barrier(void);
Arguments: none

Name: synchronize_rcu -- wait until a grace period has elapsed.
Synopsis: void synchronize_rcu(void);
Arguments: none
Description: Control will return to the caller some time after a full grace period has elapsed, in other words after all currently executing RCU read-side critical sections have completed. RCU read-side critical sections are delimited by rcu_read_lock and rcu_read_unlock, and may be nested. If your read-side code is not protected by rcu_read_lock, do -not- use synchronize_rcu.

Chapter 2. Data Types

Doubly Linked Lists

Name: list_add -- add a new entry
Synopsis: void list_add(struct list_head *new, struct list_head *head);
Arguments:
    new: new entry to be added
    head: list head to add it after
Description: Insert a new entry after the specified head. This is good for implementing stacks.

Name: list_add_tail -- add a new entry
Synopsis: void list_add_tail(struct list_head *new, struct list_head *head);
Arguments:
    new: new entry to be added
    head: list head to add it before
Description: Insert a new entry before the specified head. This is useful for implementing queues.

Name: list_add_rcu -- add a new entry to rcu-protected list
Synopsis: void list_add_rcu(struct list_head *new, struct list_head *head);
Arguments:
    new: new entry to be added
    head: list head to add it after
Description: Insert a new entry after the specified head. This is good for implementing stacks.
The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_add_rcu or list_del_rcu, running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu.

Name: list_add_tail_rcu -- add a new entry to rcu-protected list
Synopsis: void list_add_tail_rcu(struct list_head *new, struct list_head *head);
Arguments:
    new: new entry to be added
    head: list head to add it before
Description: Insert a new entry before the specified head. This is useful for implementing queues. The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_add_tail_rcu or list_del_rcu, running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu.

Name: list_del -- deletes entry from list.
Synopsis: void list_del(struct list_head *entry);
Arguments:
    entry: the element to delete from the list.
Note: list_empty on entry does not return true after this, the entry is in an undefined state.

Name: list_del_rcu -- deletes entry from list without re-initialization
Synopsis: void list_del_rcu(struct list_head *entry);
Arguments:
    entry: the element to delete from the list.
Note: list_empty on entry does not return true after this, the entry is in an undefined state. It is useful for RCU based lockfree traversal. In particular, it means that we can not poison the forward pointers that may still be used for walking the list. The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as list_del_rcu or list_add_rcu, running on this same list.
However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as list_for_each_entry_rcu. Note that the caller is not permitted to immediately free the newly deleted entry. Instead, either synchronize_rcu or call_rcu must be used to defer freeing until an RCU grace period has elapsed.

Name: list_del_init -- deletes entry from list and reinitialize it.
Synopsis: void list_del_init(struct list_head *entry);
Arguments:
    entry: the element to delete from the list.

Name: list_move -- delete from one list and add as another's head
Synopsis: void list_move(struct list_head *list, struct list_head *head);
Arguments:
    list: the entry to move
    head: the head that will precede our entry

Name: list_move_tail -- delete from one list and add as another's tail
Synopsis: void list_move_tail(struct list_head *list, struct list_head *head);
Arguments:
    list: the entry to move
    head: the head that will follow our entry

Name: list_empty -- tests whether a list is empty
Synopsis: int list_empty(const struct list_head *head);
Arguments:
    head: the list to test.

Name: list_empty_careful -- tests whether a list is empty _and_ checks that no other CPU might be in the process of still modifying either member
Synopsis: int list_empty_careful(const struct list_head *head);
Arguments:
    head: the list to test.
Note: using list_empty_careful without synchronization can only be safe if the only activity that can happen to the list entry is list_del_init. For example, it cannot be used if another CPU could re-list_add it.

Name: list_splice -- join two lists
Synopsis: void list_splice(struct list_head *list, struct list_head *head);
Arguments:
    list: the new list to add.
    head: the place to add it in the first list.

Name: list_splice_init -- join two lists and reinitialise the emptied list.
Synopsis: void list_splice_init(struct list_head *list, struct list_head *head);
Arguments:
    list: the new list to add.
    head: the place to add it in the first list.
Description: The list at list is reinitialised

Name: list_entry -- get the struct for this entry
Synopsis: list_entry(ptr, type, member);
Arguments:
    ptr: the &struct list_head pointer.
    type: the type of the struct this is embedded in.
    member: the name of the list_struct within the struct.

Name: list_for_each -- iterate over a list
Synopsis: list_for_each(pos, head);
Arguments:
    pos: the &struct list_head to use as a loop counter.
    head: the head for your list.

Name: __list_for_each -- iterate over a list
Synopsis: __list_for_each(pos, head);
Arguments:
    pos: the &struct list_head to use as a loop counter.
    head: the head for your list.
Description: This variant differs from list_for_each in that it's the simplest possible list iteration code, no prefetching is done. Use this for code that knows the list to be very short (empty or 1 entry) most of the time.

Name: list_for_each_prev -- iterate over a list backwards
Synopsis: list_for_each_prev(pos, head);
Arguments:
    pos: the &struct list_head to use as a loop counter.
    head: the head for your list.

Name: list_for_each_safe -- iterate over a list safe against removal of list entry
Synopsis: list_for_each_safe(pos, n, head);
Arguments:
    pos: the &struct list_head to use as a loop counter.
    n: another &struct list_head to use as temporary storage
    head: the head for your list.

Name: list_for_each_entry -- iterate over list of given type
Synopsis: list_for_each_entry(pos, head, member);
Arguments:
    pos: the type * to use as a loop counter.
    head: the head for your list.
    member: the name of the list_struct within the struct.
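The stack-versus-queue behaviour of list_add and list_add_tail, and the way list_entry recovers the containing structure, can be tried outside the kernel. What follows is a minimal user-space sketch written for illustration, not the kernel header: the macros carry no prefetching, and list_insert_between, struct item and collect are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Hypothetical helper: link entry between two known neighbours. */
static void list_insert_between(struct list_head *entry,
				struct list_head *prev,
				struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert right after head: last in, first out (stack). */
static void list_add(struct list_head *new, struct list_head *head)
{
	list_insert_between(new, head, head->next);
}

/* Insert right before head: first in, first out (queue). */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	list_insert_between(new, head->prev, head);
}

/* Recover the enclosing structure from its embedded list_head. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

struct item {
	int value;
	struct list_head link;
};

/* Adds 1, 2, 3 using 'add' and returns the values in list order as digits. */
static int collect(void (*add)(struct list_head *, struct list_head *))
{
	struct list_head head = LIST_HEAD_INIT(head);
	struct item items[3];
	struct list_head *pos;
	int i, digits = 0;

	for (i = 0; i < 3; i++) {
		items[i].value = i + 1;
		add(&items[i].link, &head);
	}
	assert(head.next != &head);	/* the list is no longer empty */

	list_for_each(pos, &head)
		digits = digits * 10 + list_entry(pos, struct item, link)->value;
	return digits;
}
```

Calling collect(list_add) walks the entries in reverse insertion order, while collect(list_add_tail) walks them in insertion order.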
Name: list_for_each_entry_reverse -- iterate backwards over list of given type.
Synopsis: list_for_each_entry_reverse(pos, head, member);
Arguments:
    pos: the type * to use as a loop counter.
    head: the head for your list.
    member: the name of the list_struct within the struct.

Name: list_prepare_entry -- prepare a pos entry for use as a start point in list_for_each_entry_continue
Synopsis: list_prepare_entry(pos, head, member);
Arguments:
    pos: the type * to use as a start point
    head: the head of the list
    member: the name of the list_struct within the struct.

Name: list_for_each_entry_continue -- iterate over list of given type, continuing after existing point
Synopsis: list_for_each_entry_continue(pos, head, member);
Arguments:
    pos: the type * to use as a loop counter.
    head: the head for your list.
    member: the name of the list_struct within the struct.

Name: list_for_each_entry_safe -- iterate over list of given type safe against removal of list entry
Synopsis: list_for_each_entry_safe(pos, n, head, member);
Arguments:
    pos: the type * to use as a loop counter.
    n: another type * to use as temporary storage
    head: the head for your list.
    member: the name of the list_struct within the struct.

Name: list_for_each_entry_safe_continue -- iterate over list of given type
Synopsis: list_for_each_entry_safe_continue(pos, n, head, member);
Arguments:
    pos: the type * to use as a loop counter.
    n: another type * to use as temporary storage
    head: the head for your list.
    member: the name of the list_struct within the struct.
Description: Continues after an existing point; safe against removal of list entry.

Name: list_for_each_entry_safe_reverse -- iterate backwards over list of given type safe against removal of list entry
Synopsis: list_for_each_entry_safe_reverse(pos, n, head, member);
Arguments:
    pos: the type * to use as a loop counter.
    n: another type * to use as temporary storage
    head: the head for your list.
    member: the name of the list_struct within the struct.

Name: list_for_each_rcu -- iterate over an rcu-protected list
Synopsis: list_for_each_rcu(pos, head);
Arguments:
    pos: the &struct list_head to use as a loop counter.
    head: the head for your list.
Description: This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu as long as the traversal is guarded by rcu_read_lock.

Name: list_for_each_safe_rcu -- iterate over an rcu-protected list safe against removal of list entry
Synopsis: list_for_each_safe_rcu(pos, n, head);
Arguments:
    pos: the &struct list_head to use as a loop counter.
    n: another &struct list_head to use as temporary storage
    head: the head for your list.
Description: This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu as long as the traversal is guarded by rcu_read_lock.

Name: list_for_each_entry_rcu -- iterate over rcu list of given type
Synopsis: list_for_each_entry_rcu(pos, head, member);
Arguments:
    pos: the type * to use as a loop counter.
    head: the head for your list.
    member: the name of the list_struct within the struct.
Description: This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu as long as the traversal is guarded by rcu_read_lock.

Name: list_for_each_continue_rcu -- iterate over an rcu-protected list
Synopsis: list_for_each_continue_rcu(pos, head);
Arguments:
    pos: the &struct list_head to use as a loop counter.
    head: the head for your list.
Description: This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as list_add_rcu as long as the traversal is guarded by rcu_read_lock.

Name: hlist_del_rcu -- deletes entry from hash list without re-initialization
Synopsis: void hlist_del_rcu(struct hlist_node *n);
Arguments:
    n: the element to delete from the hash list.
Note: hlist_unhashed on entry does not return true after this; the entry is in an undefined state. It is useful for RCU based lockfree traversal. In particular, it means that we can not poison the forward pointers that may still be used for walking the hash list. The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu or hlist_del_rcu, running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu.

Name: hlist_add_head_rcu -- adds the specified element to the specified hlist
Synopsis: void hlist_add_head_rcu(struct hlist_node *n, struct hlist_head *h);
Arguments:
    n: the element to add to the hash list.
    h: the list to add to.
Description: The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu or hlist_del_rcu, running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu, which are used to prevent memory-consistency problems on Alpha CPUs. Regardless of the type of CPU, the list-traversal primitive must be guarded by rcu_read_lock.

Name: hlist_add_before_rcu -- adds the specified element to the specified hlist
Synopsis: void hlist_add_before_rcu(struct hlist_node *n, struct hlist_node *next);
Arguments:
    n: the new element to add to the hash list.
    next: the existing element to add the new element before.
Description: The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu or hlist_del_rcu, running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu, which are used to prevent memory-consistency problems on Alpha CPUs.
Name: hlist_add_after_rcu -- adds the specified element to the specified hlist
Synopsis: void hlist_add_after_rcu(struct hlist_node *prev, struct hlist_node *n);
Arguments:
    prev: the existing element to add the new element after.
    n: the new element to add to the hash list.
Description: The caller must take whatever precautions are necessary (such as holding appropriate locks) to avoid racing with another list-mutation primitive, such as hlist_add_head_rcu or hlist_del_rcu, running on this same list. However, it is perfectly legal to run concurrently with the _rcu list-traversal primitives, such as hlist_for_each_entry_rcu, which are used to prevent memory-consistency problems on Alpha CPUs.

Name: hlist_for_each_entry -- iterate over list of given type
Synopsis: hlist_for_each_entry(tpos, pos, head, member);
Arguments:
    tpos: the type * to use as a loop counter.
    pos: the &struct hlist_node to use as a loop counter.
    head: the head for your list.
    member: the name of the hlist_node within the struct.
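The two-cursor form of hlist_for_each_entry (tpos for the containing structure, pos for the hlist_node) is easier to follow with a concrete loop. The sketch below is a single-threaded user-space model with no RCU or locking. hlist_add_head and hlist_del stand in as plain analogues of the _rcu primitives documented above, struct entry and sum_after_removing_even are hypothetical, and __typeof__ is assumed to be available (GCC and Clang provide it).

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* Plain (non-RCU) insertion at the head of a hash bucket. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Plain (non-RCU) removal; the node itself is left untouched. */
static void hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}

#define hlist_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* tpos walks the containing structs, pos walks the embedded nodes. */
#define hlist_for_each_entry(tpos, pos, head, member) \
	for (pos = (head)->first; \
	     pos && (tpos = hlist_entry(pos, __typeof__(*tpos), member), 1); \
	     pos = pos->next)

/* n caches the successor so the current entry may be removed. */
#define hlist_for_each_entry_safe(tpos, pos, n, head, member) \
	for (pos = (head)->first; \
	     pos && (n = pos->next, 1) && \
	     (tpos = hlist_entry(pos, __typeof__(*tpos), member), 1); \
	     pos = n)

struct entry {
	int key;
	struct hlist_node node;
};

/* Inserts keys 1..4, removes the even ones, and sums what is left. */
static int sum_after_removing_even(void)
{
	struct hlist_head bucket = { NULL };
	struct entry e[4];
	struct entry *tpos;
	struct hlist_node *pos, *n;
	int i, sum = 0;

	for (i = 0; i < 4; i++) {
		e[i].key = i + 1;
		hlist_add_head(&e[i].node, &bucket);
	}
	assert(bucket.first == &e[3].node);	/* last insert is at the head */

	hlist_for_each_entry_safe(tpos, pos, n, &bucket, node) {
		if (tpos->key % 2 == 0)
			hlist_del(&tpos->node);
	}
	hlist_for_each_entry(tpos, pos, &bucket, node)
		sum += tpos->key;
	return sum;
}
```

The removal pass needs the _safe variant because hlist_del is applied to the entry currently under the cursor; the summing pass does not modify the list and can use the plain iterator.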
Name: hlist_for_each_entry_continue -- iterate over a hlist continuing after existing point
Synopsis: hlist_for_each_entry_continue(tpos, pos, member);
Arguments:
    tpos: the type * to use as a loop counter.
    pos: the &struct hlist_node to use as a loop counter.
    member: the name of the hlist_node within the struct.

Name: hlist_for_each_entry_from -- iterate over a hlist continuing from existing point
Synopsis: hlist_for_each_entry_from(tpos, pos, member);
Arguments:
    tpos: the type * to use as a loop counter.
    pos: the &struct hlist_node to use as a loop counter.
    member: the name of the hlist_node within the struct.

Name: hlist_for_each_entry_safe -- iterate over list of given type safe against removal of list entry
Synopsis: hlist_for_each_entry_safe(tpos, pos, n, head, member);
Arguments:
    tpos: the type * to use as a loop counter.
    pos: the &struct hlist_node to use as a loop counter.
    n: another &struct hlist_node to use as temporary storage
    head: the head for your list.
    member: the name of the hlist_node within the struct.

Name: hlist_for_each_entry_rcu -- iterate over rcu list of given type
Synopsis: hlist_for_each_entry_rcu(tpos, pos, head, member);
Arguments:
    tpos: the type * to use as a loop counter.
    pos: the &struct hlist_node to use as a loop counter.
    head: the head for your list.
    member: the name of the hlist_node within the struct.
Description: This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu as long as the traversal is guarded by rcu_read_lock.

Chapter 3. Basic C Library Functions

When writing drivers, you cannot in general use routines which are from the C Library. Some of the functions have been found generally useful and they are listed below.
The behaviour of these functions may vary slightly from those defined by ANSI, and these deviations are noted in the text.

String Conversions

Name: simple_strtoll -- convert a string to a signed long long
Synopsis: long long simple_strtoll(const char *cp, char **endp, unsigned int base);
Arguments:
    cp: The start of the string
    endp: A pointer to the end of the parsed string will be placed here
    base: The number base to use

Name: simple_strtoul -- convert a string to an unsigned long
Synopsis: unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base);
Arguments:
    cp: The start of the string
    endp: A pointer to the end of the parsed string will be placed here
    base: The number base to use

Name: simple_strtol -- convert a string to a signed long
Synopsis: long simple_strtol(const char *cp, char **endp, unsigned int base);
Arguments:
    cp: The start of the string
    endp: A pointer to the end of the parsed string will be placed here
    base: The number base to use

Name: simple_strtoull -- convert a string to an unsigned long long
Synopsis: unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base);
Arguments:
    cp: The start of the string
    endp: A pointer to the end of the parsed string will be placed here
    base: The number base to use

Name: vsnprintf -- Format a string and place it in a buffer
Synopsis: int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
Arguments:
    buf: The buffer to place the result into
    size: The size of the buffer, including the trailing null space
    fmt: The format string to use
    args: Arguments for the format string
Description: The return value is the number of characters which would be generated for the given input, excluding the trailing '\0', as per ISO C99.
If you want to have the exact number of characters written into buf as return value (not including the trailing '\0'), use vscnprintf. If the return is greater than or equal to size, the resulting string is truncated. Call this function if you are already dealing with a va_list. You probably want snprintf instead.

Name: vscnprintf -- Format a string and place it in a buffer
Synopsis: int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
Arguments:
    buf: The buffer to place the result into
    size: The size of the buffer, including the trailing null space
    fmt: The format string to use
    args: Arguments for the format string
Description: The return value is the number of characters which have been written into the buf not including the trailing '\0'. If size is <= 0 the function returns 0. Call this function if you are already dealing with a va_list. You probably want scnprintf instead.

Name: snprintf -- Format a string and place it in a buffer
Synopsis: int snprintf(char *buf, size_t size, const char *fmt, ...);
Arguments:
    buf: The buffer to place the result into
    size: The size of the buffer, including the trailing null space
    fmt: The format string to use
    ...: Arguments for the format string (variable arguments)
Description: The return value is the number of characters which would be generated for the given input, excluding the trailing null, as per ISO C99. If the return is greater than or equal to size, the resulting string is truncated.

Name: scnprintf -- Format a string and place it in a buffer
Synopsis: int scnprintf(char *buf, size_t size, const char *fmt, ...);
Arguments:
    buf: The buffer to place the result into
    size: The size of the buffer, including the trailing null space
    fmt: The format string to use
    ...: Arguments for the format string
    (variable arguments)
Description: The return value is the number of characters written into buf not including the trailing '\0'. If size is <= 0 the function returns 0. Output that does not fit is truncated, so the return value is always less than size.

Name: vsprintf -- Format a string and place it in a buffer
Synopsis: int vsprintf(char *buf, const char *fmt, va_list args);
Arguments:
    buf: The buffer to place the result into
    fmt: The format string to use
    args: Arguments for the format string
Description: The function returns the number of characters written into buf. Use vsnprintf or vscnprintf in order to avoid buffer overflows. Call this function if you are already dealing with a va_list. You probably want sprintf instead.

Name: sprintf -- Format a string and place it in a buffer
Synopsis: int sprintf(char *buf, const char *fmt, ...);
Arguments:
    buf: The buffer to place the result into
    fmt: The format string to use
    ...: Arguments for the format string (variable arguments)
Description: The function returns the number of characters written into buf. Use snprintf or scnprintf in order to avoid buffer overflows.

Name: vsscanf -- Unformat a buffer into a list of arguments
Synopsis: int vsscanf(const char *buf, const char *fmt, va_list args);
Arguments:
    buf: input buffer
    fmt: format of buffer
    args: arguments

Name: sscanf -- Unformat a buffer into a list of arguments
Synopsis: int sscanf(const char *buf, const char *fmt, ...);
Arguments:
    buf: input buffer
    fmt: formatting of buffer
    ...: resulting arguments
    (variable arguments)

String Manipulation

Name: strnicmp -- Case insensitive, length-limited string comparison
Synopsis: int strnicmp(const char *s1, const char *s2, size_t len);
Arguments:
    s1: One string
    s2: The other string
    len: the maximum number of characters to compare

Name: strcpy -- Copy a NUL terminated string
Synopsis: char *strcpy(char *dest, const char *src);
Arguments:
    dest: Where to copy the string to
    src: Where to copy the string from

Name: strncpy -- Copy a length-limited, NUL-terminated string
Synopsis: char *strncpy(char *dest, const char *src, size_t count);
Arguments:
    dest: Where to copy the string to
    src: Where to copy the string from
    count: The maximum number of bytes to copy
Description: The result is not NUL-terminated if the source exceeds count bytes. If the length of src is less than count, the remainder of dest is padded with NUL.

Name: strlcpy -- Copy a NUL terminated string into a sized buffer
Synopsis: size_t strlcpy(char *dest, const char *src, size_t size);
Arguments:
    dest: Where to copy the string to
    src: Where to copy the string from
    size: size of destination buffer
Description: Compatible with BSD strlcpy: the result is always a valid NUL-terminated string that fits in the buffer (unless, of course, the buffer size is zero). It does not pad out the result like strncpy does.
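The difference between strncpy and strlcpy on an over-long source is easy to check in user space. This is a sketch under stated assumptions: sketch_strlcpy is a hypothetical stand-in written for this example, not the kernel source, and it is assumed to return the length of src as the BSD function does (the text above does not state the return value).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy; returns strlen(src) (assumed, as in BSD). */
static size_t sketch_strlcpy(char *dest, const char *src, size_t size)
{
	size_t ret = strlen(src);

	if (size) {
		size_t len = (ret >= size) ? size - 1 : ret;

		memcpy(dest, src, len);
		dest[len] = '\0';	/* always terminated when size > 0 */
	}
	return ret;
}

/* Returns 1 if strncpy left a 4-byte buffer without any NUL byte. */
static int strncpy_leaves_unterminated(void)
{
	char buf[4];

	memset(buf, 'x', sizeof(buf));
	strncpy(buf, "abcdef", sizeof(buf));	/* source exceeds count */
	return memchr(buf, '\0', sizeof(buf)) == NULL;
}

/* Returns 1 if the stand-in truncated to "abc" and reported length 6. */
static int strlcpy_terminates(void)
{
	char buf[4];
	size_t wanted = sketch_strlcpy(buf, "abcdef", sizeof(buf));

	assert(buf[sizeof(buf) - 1] == '\0');
	return wanted == 6 && strcmp(buf, "abc") == 0;
}
```

A return value greater than or equal to the buffer size tells the caller that truncation happened, which strncpy cannot report.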
Name: strcat -- Append one NUL-terminated string to another
Synopsis: char *strcat(char *dest, const char *src);
Arguments:
    dest: The string to be appended to
    src: The string to append to it

Name: strncat -- Append a length-limited, NUL-terminated string to another
Synopsis: char *strncat(char *dest, const char *src, size_t count);
Arguments:
    dest: The string to be appended to
    src: The string to append to it
    count: The maximum number of bytes to copy
Description: Note that in contrast to strncpy, strncat ensures the result is terminated.

Name: strlcat -- Append a length-limited, NUL-terminated string to another
Synopsis: size_t strlcat(char *dest, const char *src, size_t count);
Arguments:
    dest: The string to be appended to
    src: The string to append to it
    count: The size of the destination buffer.

Name: strcmp -- Compare two strings
Synopsis: int strcmp(const char *cs, const char *ct);
Arguments:
    cs: One string
    ct: Another string

Name: strncmp -- Compare two length-limited strings
Synopsis: int strncmp(const char *cs, const char *ct, size_t count);
Arguments:
    cs: One string
    ct: Another string
    count: The maximum number of bytes to compare

Name: strchr -- Find the first occurrence of a character in a string
Synopsis: char *strchr(const char *s, int c);
Arguments:
    s: The string to be searched
    c: The character to search for

Name: strrchr -- Find the last occurrence of a character in a string
Synopsis: char *strrchr(const char *s, int c);
Arguments:
    s: The string to be searched
    c: The character to search for

Name: strnchr -- Find a character in a length limited string
Synopsis: char *strnchr(const char *s, size_t count, int c);
Arguments:
    s: The string to be searched
    count: The number of characters to be searched
    c: The character to search for

Name: strlen -- Find the length of a string
Synopsis: size_t strlen(const char *s);
Arguments:
    s: The string to be sized

Name: strnlen -- Find the length of a length-limited string
Synopsis: size_t strnlen(const char *s, size_t count);
Arguments:
    s: The string to be sized
    count: The maximum number of bytes to search

Name: strspn -- Calculate the length of the initial substring of s which contains only letters in accept
Synopsis: size_t strspn(const char *s, const char *accept);
Arguments:
    s: The string to be searched
    accept: The string to search for

Name: strcspn -- Calculate the length of the initial substring of s which does not contain letters in reject
Synopsis: size_t strcspn(const char *s, const char *reject);
Arguments:
    s: The string to be searched
    reject: The string to avoid

Name: strpbrk -- Find the first occurrence of a set of characters
Synopsis: char *strpbrk(const char *cs, const char *ct);
Arguments:
    cs: The string to be searched
    ct: The characters to search for

Name: strsep -- Split a string into tokens
Synopsis: char *strsep(char **s, const char *ct);
Arguments:
    s: The string to be searched
    ct: The characters to search for
Description: strsep updates s to point after the token, ready for the next call. It returns empty tokens, too, behaving exactly like the libc function of that name. In fact, it was stolen from glibc2 and de-fancy-fied. Same semantics, slimmer shape. ;)

Name: memset -- Fill a region of memory with the given value
Synopsis: void *memset(void *s, int c, size_t count);
Arguments:
    s: Pointer to the start of the area.
    c: The byte to fill the area with
    count: The size of the area.
Description: Do not use memset to access IO space, use memset_io instead.
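The remark in the strsep entry that empty tokens are returned too is worth seeing on a concrete input. The following user-space sketch uses a hypothetical sketch_strsep built on strpbrk to mirror the described semantics; it is an illustration, not the kernel or glibc source, and count_tokens is likewise a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: cut at the first delimiter and advance *s past it. */
static char *sketch_strsep(char **s, const char *ct)
{
	char *sbegin;
	char *end;

	assert(s != NULL);
	sbegin = *s;
	if (sbegin == NULL)
		return NULL;		/* previous call consumed the last token */

	end = strpbrk(sbegin, ct);
	if (end)
		*end++ = '\0';		/* terminate token, step over delimiter */
	*s = end;			/* NULL once no delimiter is left */
	return sbegin;
}

/*
 * Splits 'text' in place on ',' and counts tokens.
 * With only_empty set, counts just the zero-length tokens.
 */
static int count_tokens(char *text, int only_empty)
{
	char *cursor = text;
	char *tok;
	int count = 0;

	while ((tok = sketch_strsep(&cursor, ",")) != NULL) {
		if (!only_empty || *tok == '\0')
			count++;
	}
	return count;
}
```

For the input "a,,b" the loop sees three tokens, the middle one empty, which is the point where strsep differs from strtok-style splitting.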
Name: memcpy -- Copy one area of memory to another
Synopsis: void *memcpy(void *dest, const void *src, size_t count);
Arguments:
    dest: Where to copy to
    src: Where to copy from
    count: The size of the area.
Description: You should not use this function to access IO space, use memcpy_toio or memcpy_fromio instead.

Name: memmove -- Copy one area of memory to another
Synopsis: void *memmove(void *dest, const void *src, size_t count);
Arguments:
    dest: Where to copy to
    src: Where to copy from
    count: The size of the area.
Description: Unlike memcpy, memmove copes with overlapping areas.

Name: memcmp -- Compare two areas of memory
Synopsis: int memcmp(const void *cs, const void *ct, size_t count);
Arguments:
    cs: One area of memory
    ct: Another area of memory
    count: The size of the area.

Name: memscan -- Find a character in an area of memory.
Synopsis: void *memscan(void *addr, int c, size_t size);
Arguments:
    addr: The memory area
    c: The byte to search for
    size: The size of the area.
Description: Returns the address of the first occurrence of c, or 1 byte past the area if c is not found.

Name: strstr -- Find the first substring in a NUL terminated string
Synopsis: char *strstr(const char *s1, const char *s2);
Arguments:
    s1: The string to be searched
    s2: The string to search for

Name: memchr -- Find a character in an area of memory.
Synopsis: void *memchr(const void *s, int c, size_t n);
Arguments:
    s: The memory area
    c: The byte to search for
    n: The size of the area.
Description: Returns the address of the first occurrence of c, or NULL if c is not found.

Bit Operations

Name: set_bit -- Atomically set a bit in memory
Synopsis: void set_bit(int nr, volatile unsigned long *addr);
Arguments:
    nr: the bit to set
    addr: the address to start counting from
Description: This function is atomic and may not be reordered. See __set_bit if you do not require the atomic guarantees.
Note: there are no guarantees that this function will not be reordered on non-x86 architectures, so if you are writing portable code, make sure not to rely on its reordering guarantees. Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.

Name: __set_bit -- Set a bit in memory
Synopsis: void __set_bit(int nr, volatile unsigned long *addr);
Arguments:
    nr: the bit to set
    addr: the address to start counting from
Description: Unlike set_bit, this function is non-atomic and may be reordered. If it's called on the same region of memory simultaneously, the effect may be that only one operation succeeds.

Name: clear_bit -- Clears a bit in memory
Synopsis: void clear_bit(int nr, volatile unsigned long *addr);
Arguments:
    nr: Bit to clear
    addr: Address to start counting from
Description: clear_bit is atomic and may not be reordered. However, it does not contain a memory barrier, so if it is used for locking purposes, you should call smp_mb__before_clear_bit and/or smp_mb__after_clear_bit in order to ensure changes are visible on other processors.

Name: __change_bit -- Toggle a bit in memory
Synopsis: void __change_bit(int nr, volatile unsigned long *addr);
Arguments:
    nr: the bit to change
    addr: the address to start counting from
Description: Unlike change_bit, this function is non-atomic and may be reordered.
    If it's called on the same region of memory simultaneously, the effect may be that only one operation succeeds.

Name
    change_bit -- Toggle a bit in memory
Synopsis
    void change_bit (nr, addr);
    int nr;
    volatile unsigned long * addr;
Arguments
    nr    Bit to change
    addr  Address to start counting from
Description
    change_bit is atomic and may not be reordered on x86; it may be reordered on other architectures. Note that nr may be almost arbitrarily large; this function is not restricted to acting on a single-word quantity.

Name
    test_and_set_bit -- Set a bit and return its old value
Synopsis
    int test_and_set_bit (nr, addr);
    int nr;
    volatile unsigned long * addr;
Arguments
    nr    Bit to set
    addr  Address to count from
Description
    This operation is atomic and cannot be reordered on x86; it may be reordered on other architectures. It also implies a memory barrier.

Name
    __test_and_set_bit -- Set a bit and return its old value
Synopsis
    int __test_and_set_bit (nr, addr);
    int nr;
    volatile unsigned long * addr;
Arguments
    nr    Bit to set
    addr  Address to count from
Description
    This operation is non-atomic and can be reordered. If two instances of this operation race, one can appear to succeed but actually fail. You must protect multiple accesses with a lock.

Name
    test_and_clear_bit -- Clear a bit and return its old value
Synopsis
    int test_and_clear_bit (nr, addr);
    int nr;
    volatile unsigned long * addr;
Arguments
    nr    Bit to clear
    addr  Address to count from
Description
    This operation is atomic and cannot be reordered on x86; it can be reordered on other architectures. It also implies a memory barrier.

Name
    __test_and_clear_bit -- Clear a bit and return its old value
Synopsis
    int __test_and_clear_bit (nr, addr);
    int nr;
    volatile unsigned long * addr;
Arguments
    nr    Bit to clear
    addr  Address to count from
Description
    This operation is non-atomic and can be reordered.
    If two instances of this operation race, one can appear to succeed but actually fail. You must protect multiple accesses with a lock.

Name
    test_and_change_bit -- Change a bit and return its old value
Synopsis
    int test_and_change_bit (nr, addr);
    int nr;
    volatile unsigned long * addr;
Arguments
    nr    Bit to change
    addr  Address to count from
Description
    This operation is atomic and cannot be reordered. It also implies a memory barrier.

Name
    test_bit -- Determine whether a bit is set
Synopsis
    int test_bit (nr, addr);
    int nr;
    const volatile void * addr;
Arguments
    nr    bit number to test
    addr  Address to start counting from

Name
    find_first_zero_bit -- find the first zero bit in a memory region
Synopsis
    int find_first_zero_bit (addr, size);
    const unsigned long * addr;
    unsigned size;
Arguments
    addr  The address to start the search at
    size  The maximum size to search
Description
    Returns the bit-number of the first zero bit, not the number of the byte containing a bit.

Name
    find_next_zero_bit -- find the next zero bit in a memory region
Synopsis
    int find_next_zero_bit (addr, size, offset);
    const unsigned long * addr;
    int size;
    int offset;
Arguments
    addr    The address to base the search on
    size    The maximum size to search
    offset  The bit number to start searching at

Name
    __ffs -- find first set bit in word.
Synopsis
    unsigned long __ffs (word);
    unsigned long word;
Arguments
    word  The word to search
Description
    Undefined if no bit is set, so code should check against 0 first.

Name
    find_first_bit -- find the first set bit in a memory region
Synopsis
    unsigned find_first_bit (addr, size);
    const unsigned long * addr;
    unsigned size;
Arguments
    addr  The address to start the search at
    size  The maximum size to search
Description
    Returns the bit-number of the first set bit, not the number of the byte containing a bit.
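The bitmap search routines above can be pictured with a small userspace model. This is a sketch, not the kernel implementation: every `model_` name is hypothetical, the set operation is the non-atomic flavour (in the spirit of __set_bit), and returning `size` when nothing is found is an assumption of the model, since the text above does not say what the search functions return in that case.

```c
#include <assert.h>
#include <limits.h>

/* Width of one bitmap word on the build host (an assumption of this model). */
#define MODEL_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Non-atomic set: nr selects a word first, then a bit inside that word. */
void model_set_bit(unsigned int nr, unsigned long *addr)
{
    addr[nr / MODEL_BITS_PER_LONG] |= 1UL << (nr % MODEL_BITS_PER_LONG);
}

int model_test_bit(unsigned int nr, const unsigned long *addr)
{
    return (int)((addr[nr / MODEL_BITS_PER_LONG] >> (nr % MODEL_BITS_PER_LONG)) & 1UL);
}

/* Index of the lowest set bit. Like __ffs, the caller must not pass 0. */
unsigned long model_ffs_word(unsigned long word)
{
    unsigned long bit = 0;

    assert(word != 0);
    while ((word & 1UL) == 0) {
        word >>= 1;
        bit++;
    }
    return bit;
}

/* Bit number (not byte number) of the first set bit, or size if none. */
unsigned int model_find_first_bit(const unsigned long *addr, unsigned int size)
{
    unsigned int base;

    for (base = 0; base < size; base += MODEL_BITS_PER_LONG) {
        unsigned long word = addr[base / MODEL_BITS_PER_LONG];

        if (word != 0) {
            unsigned int bit = base + (unsigned int)model_ffs_word(word);

            return bit < size ? bit : size;
        }
    }
    return size;
}

/* Same search on the inverted words: first zero bit, or size if none. */
unsigned int model_find_first_zero_bit(const unsigned long *addr, unsigned int size)
{
    unsigned int base;

    for (base = 0; base < size; base += MODEL_BITS_PER_LONG) {
        unsigned long word = ~addr[base / MODEL_BITS_PER_LONG];

        if (word != 0) {
            unsigned int bit = base + (unsigned int)model_ffs_word(word);

            return bit < size ? bit : size;
        }
    }
    return size;
}
```

The model checks each word against 0 before calling `model_ffs_word`, which mirrors the advice given for __ffs, and it shows why nr is not limited to a single word: the word index is simply nr divided by the word width.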
Name
    find_next_bit -- find the next set bit in a memory region
Synopsis
    int find_next_bit (addr, size, offset);
    const unsigned long * addr;
    int size;
    int offset;
Arguments
    addr    The address to base the search on
    size    The maximum size to search
    offset  The bit number to start searching at

Name
    ffz -- find first zero in word.
Synopsis
    unsigned long ffz (word);
    unsigned long word;
Arguments
    word  The word to search
Description
    Undefined if no zero exists, so code should check against ~0UL first.

Name
    ffs -- find first bit set
Synopsis
    int ffs (x);
    int x;
Arguments
    x  the word to search
Description
    This is defined the same way as the libc and compiler builtin ffs routines, and therefore differs in spirit from the above ffz (man ffs).

Name
    fls -- find last bit set
Synopsis
    int fls (x);
    int x;
Arguments
    x  the word to search
Description
    This is defined the same way as ffs.

Name
    hweight32 -- returns the Hamming weight of a 32-bit word
Synopsis
    hweight32 (x);
    x;
Arguments
    x  the word to weigh
Description
    The Hamming weight of a number is the total number of bits set in it.

Chapter 4. Memory Management in Linux

The Slab Cache

Name
    kmem_cache_create -- Create a cache.
Synopsis
    struct kmem_cache * kmem_cache_create (name, size, align, flags, ctor, dtor);
    const char * name;
    size_t size;
    size_t align;
    unsigned long flags;
    void (*ctor) (void*, struct kmem_cache *, unsigned long);
    void (*dtor) (void*, struct kmem_cache *, unsigned long);
Arguments
    name   A string which is used in /proc/slabinfo to identify this cache.
    size   The size of objects to be created in this cache.
    align  The required alignment for the objects.
    flags  SLAB flags
    ctor   A constructor for the objects.
    dtor   A destructor for the objects.
Description
    Returns a pointer to the cache on success, NULL on failure.
    Cannot be called from within an interrupt, but can be interrupted. The ctor is run when new pages are allocated by the cache and the dtor is run before the pages are handed back.
    name must be valid until the cache is destroyed. This implies that the module calling this has to destroy the cache before getting unloaded.
    The flags are:
    SLAB_POISON         Poison the slab with a known test pattern (a5a5a5a5) to catch references to uninitialised memory.
    SLAB_RED_ZONE       Insert `Red' zones around the allocated memory to check for buffer overruns.
    SLAB_NO_REAP        Don't automatically reap this cache when we're under memory pressure.
    SLAB_HWCACHE_ALIGN  Align the objects in this cache to a hardware cacheline. This can be beneficial if you're counting cycles as closely as davem.

Name
    kmem_cache_shrink -- Shrink a cache.
Synopsis
    int kmem_cache_shrink (cachep);
    struct kmem_cache * cachep;
Arguments
    cachep  The cache to shrink.
Description
    Releases as many slabs as possible for a cache. To help debugging, a zero exit status indicates all slabs were released.

Name
    kmem_cache_destroy -- delete a cache
Synopsis
    int kmem_cache_destroy (cachep);
    struct kmem_cache * cachep;
Arguments
    cachep  the cache to destroy
Description
    Remove a struct kmem_cache object from the slab cache. Returns 0 on success.
    It is expected this function will be called by a module when it is unloaded. This will remove the cache completely, and avoid a duplicate cache being allocated each time a module is loaded and unloaded, if the module doesn't have persistent in-kernel storage across loads and unloads.
    The cache must be empty before calling this function.
    The caller must guarantee that no one will allocate memory from the cache during the kmem_cache_destroy.

Name
    kmem_cache_alloc -- Allocate an object
Synopsis
    void * kmem_cache_alloc (cachep, flags);
    struct kmem_cache * cachep;
    gfp_t flags;
Arguments
    cachep  The cache to allocate from.
    flags   See kmalloc.
Description
    Allocate an object from this cache. The flags are only relevant if the cache has no available objects.

Name
    kmem_cache_alloc_node -- Allocate an object on the specified node
Synopsis
    void * kmem_cache_alloc_node (cachep, flags, nodeid);
    struct kmem_cache * cachep;
    gfp_t flags;
    int nodeid;
Arguments
    cachep  The cache to allocate from.
    flags   See kmalloc.
    nodeid  node number of the target node.
Description
    Identical to kmem_cache_alloc, except that this function is slow and can sleep. It allocates memory on the given node, which can improve the performance for cpu bound structures.
    New and improved: it will now make sure that the object gets put on the correct node list so that there is no false sharing.

Name
    __alloc_percpu -- allocate one copy of the object for every present cpu in the system, zeroing them.
Synopsis
    void * __alloc_percpu (size);
    size_t size;
Arguments
    size  how many bytes of memory are required.
Description
    Objects should be dereferenced using the per_cpu_ptr macro only.

Name
    kmem_cache_free -- Deallocate an object
Synopsis
    void kmem_cache_free (cachep, objp);
    struct kmem_cache * cachep;
    void * objp;
Arguments
    cachep  The cache the allocation was from.
    objp    The previously allocated object.
Description
    Free an object which was previously allocated from this cache.

Name
    kfree -- free previously allocated memory
Synopsis
    void kfree (objp);
    const void * objp;
Arguments
    objp  pointer returned by kmalloc.
Description
    If objp is NULL, no operation is performed.
    Don't free memory not originally allocated by kmalloc or you will run into trouble.

Name
    free_percpu -- free previously allocated percpu memory
Synopsis
    void free_percpu (objp);
    const void * objp;
Arguments
    objp  pointer returned by alloc_percpu.
Description
    Don't free memory not originally allocated by alloc_percpu. The complemented objp is to check for that.

User Space Memory Access

Name
    access_ok -- Checks if a user space pointer is valid
Synopsis
    access_ok (type, addr, size);
    type;
    addr;
    size;
Arguments
    type  Type of access: VERIFY_READ or VERIFY_WRITE. Note that VERIFY_WRITE is a superset of VERIFY_READ - if it is safe to write to a block, it is always safe to read from it.
    addr  User space pointer to start of block to check
    size  Size of block to check
Context
    User context only. This function may sleep.
Description
    Checks if a pointer to a block of memory in user space is valid.
    Returns true (nonzero) if the memory block may be valid, false (zero) if it is definitely invalid.
    Note that, depending on architecture, this function probably just checks that the pointer is in the user space range - after calling this function, memory access functions may still return -EFAULT.

Name
    get_user -- Get a simple variable from user space.
Synopsis
    get_user (x, ptr);
    x;
    ptr;
Arguments
    x    Variable to store result.
    ptr  Source address, in user space.
Context
    User context only. This function may sleep.
Description
    This macro copies a single simple variable from user space to kernel space. It supports simple types like char and int, but not larger data types like structures or arrays.
    ptr must have pointer-to-simple-variable type, and the result of dereferencing ptr must be assignable to x without a cast.
    Returns zero on success, or -EFAULT on error. On error, the variable x is set to zero.

Name
    put_user -- Write a simple value into user space.
Synopsis
    put_user (x, ptr);
    x;
    ptr;
Arguments
    x    Value to copy to user space.
    ptr  Destination address, in user space.
Context
    User context only. This function may sleep.
Description
    This macro copies a single simple value from kernel space to user space.
    It supports simple types like char and int, but not larger data types like structures or arrays.
    ptr must have pointer-to-simple-variable type, and x must be assignable to the result of dereferencing ptr.
    Returns zero on success, or -EFAULT on error.

Name
    __get_user -- Get a simple variable from user space, with less checking.
Synopsis
    __get_user (x, ptr);
    x;
    ptr;
Arguments
    x    Variable to store result.
    ptr  Source address, in user space.
Context
    User context only. This function may sleep.
Description
    This macro copies a single simple variable from user space to kernel space. It supports simple types like char and int, but not larger data types like structures or arrays.
    ptr must have pointer-to-simple-variable type, and the result of dereferencing ptr must be assignable to x without a cast.
    Caller must check the pointer with access_ok before calling this function.
    Returns zero on success, or -EFAULT on error. On error, the variable x is set to zero.

Name
    __put_user -- Write a simple value into user space, with less checking.
Synopsis
    __put_user (x, ptr);
    x;
    ptr;
Arguments
    x    Value to copy to user space.
    ptr  Destination address, in user space.
Context
    User context only. This function may sleep.
Description
    This macro copies a single simple value from kernel space to user space. It supports simple types like char and int, but not larger data types like structures or arrays.
    ptr must have pointer-to-simple-variable type, and x must be assignable to the result of dereferencing ptr.
    Caller must check the pointer with access_ok before calling this function.
    Returns zero on success, or -EFAULT on error.

Name
    __copy_to_user_inatomic -- Copy a block of data into user space, with less checking.
Synopsis
    __always_inline unsigned long __must_check __copy_to_user_inatomic (to, from, n);
    void __user * to;
    const void * from;
    unsigned long n;
Arguments
    to    Destination address, in user space.
    from  Source address, in kernel space.
    n     Number of bytes to copy.
Context
    User context only. This function may sleep.
Description
    Copy data from kernel space to user space. Caller must check the specified block with access_ok before calling this function.
    Returns number of bytes that could not be copied. On success, this will be zero.

Name
    __copy_from_user_inatomic -- Copy a block of data from user space, with less checking.
Synopsis
    __always_inline unsigned long __copy_from_user_inatomic (to, from, n);
    void * to;
    const void __user * from;
    unsigned long n;
Arguments
    to    Destination address, in kernel space.
    from  Source address, in user space.
    n     Number of bytes to copy.
Context
    User context only. This function may sleep.
Description
    Copy data from user space to kernel space. Caller must check the specified block with access_ok before calling this function.
    Returns number of bytes that could not be copied. On success, this will be zero.
    If some data could not be copied, this function will pad the copied data to the requested size using zero bytes.

Name
    strlen_user -- Get the size of a string in user space.
Synopsis
    strlen_user (str);
    str;
Arguments
    str  The string to measure.
Context
    User context only. This function may sleep.
Description
    Get the size of a NUL-terminated string in user space.
    Returns the size of the string INCLUDING the terminating NUL. On exception, returns 0.
    If there is a limit on the length of a valid string, you may wish to consider using strnlen_user instead.

Name
    __strncpy_from_user -- Copy a NUL terminated string from userspace, with less checking.
Synopsis
    long __strncpy_from_user (dst, src, count);
    char * dst;
    const char __user * src;
    long count;
Arguments
    dst    Destination address, in kernel space. This buffer must be at least count bytes long.
    src    Source address, in user space.
    count  Maximum number of bytes to copy, including the trailing NUL.
Description
    Copies a NUL-terminated string from userspace to kernel space. Caller must check the specified block with access_ok before calling this function.
    On success, returns the length of the string (not including the trailing NUL).
    If access to userspace fails, returns -EFAULT (some data may have been copied).
    If count is smaller than the length of the string, copies count bytes and returns count.

Name
    strncpy_from_user -- Copy a NUL terminated string from userspace.
Synopsis
    long strncpy_from_user (dst, src, count);
    char * dst;
    const char __user * src;
    long count;
Arguments
    dst    Destination address, in kernel space. This buffer must be at least count bytes long.
    src    Source address, in user space.
    count  Maximum number of bytes to copy, including the trailing NUL.
Description
    Copies a NUL-terminated string from userspace to kernel space.
    On success, returns the length of the string (not including the trailing NUL).
    If access to userspace fails, returns -EFAULT (some data may have been copied).
    If count is smaller than the length of the string, copies count bytes and returns count.

Name
    clear_user -- Zero a block of memory in user space.
Synopsis
    unsigned long clear_user (to, n);
    void __user * to;
    unsigned long n;
Arguments
    to  Destination address, in user space.
    n   Number of bytes to zero.
Description
    Zero a block of memory in user space.
    Returns number of bytes that could not be cleared. On success, this will be zero.

Name
    __clear_user -- Zero a block of memory in user space, with less checking.
Synopsis
    unsigned long __clear_user (to, n);
    void __user * to;
    unsigned long n;
Arguments
    to  Destination address, in user space.
    n   Number of bytes to zero.
Description
    Zero a block of memory in user space. Caller must check the specified block with access_ok before calling this function.
    Returns number of bytes that could not be cleared. On success, this will be zero.
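The return convention documented for strncpy_from_user above is easy to misread, so here is a minimal userspace sketch of it. The `model_` functions are hypothetical and operate on ordinary pointers; the -EFAULT path is not modelled at all. The point is only to show that a return value equal to count means the string was truncated, and that in this reading of the description the destination then carries no terminating NUL.

```c
#include <assert.h>

/*
 * Model of the documented return values only:
 *   - string fits: copy it including the NUL, return its length
 *   - string too long: copy exactly count bytes, return count
 * A faulting user access (-EFAULT) is outside this model.
 */
long model_strncpy_from_user(char *dst, const char *src, long count)
{
    long i;

    for (i = 0; i < count; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;   /* length, not counting the trailing NUL */
    }
    return count;       /* truncated: dst holds no NUL terminator */
}

/*
 * Hypothetical caller: returns 1 when dst holds the complete string,
 * 0 when the string was cut short (dst is then forcibly terminated).
 */
int model_copy_name(char *dst, long dst_size, const char *src)
{
    long len;

    if (dst_size <= 0)
        return 0;
    len = model_strncpy_from_user(dst, src, dst_size);
    if (len == dst_size) {
        dst[dst_size - 1] = '\0';
        return 0;
    }
    return 1;
}
```

A caller written this way treats "returned count" as an overflow condition instead of assuming the buffer is always a valid C string.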
Name
    strnlen_user -- Get the size of a string in user space.
Synopsis
    long strnlen_user (s, n);
    const char __user * s;
    long n;
Arguments
    s  The string to measure.
    n  The maximum valid length
Description
    Get the size of a NUL-terminated string in user space.
    Returns the size of the string INCLUDING the terminating NUL. On exception, returns 0. If the string is too long, returns a value greater than n.

Name
    copy_to_user -- Copy a block of data into user space.
Synopsis
    unsigned long copy_to_user (to, from, n);
    void __user * to;
    const void * from;
    unsigned long n;
Arguments
    to    Destination address, in user space.
    from  Source address, in kernel space.
    n     Number of bytes to copy.
Context
    User context only. This function may sleep.
Description
    Copy data from kernel space to user space.
    Returns number of bytes that could not be copied. On success, this will be zero.

Name
    copy_from_user -- Copy a block of data from user space.
Synopsis
    unsigned long copy_from_user (to, from, n);
    void * to;
    const void __user * from;
    unsigned long n;
Arguments
    to    Destination address, in kernel space.
    from  Source address, in user space.
    n     Number of bytes to copy.
Context
    User context only. This function may sleep.
Description
    Copy data from user space to kernel space.
    Returns number of bytes that could not be copied. On success, this will be zero.
    If some data could not be copied, this function will pad the copied data to the requested size using zero bytes.

More Memory Management Functions

Name
    page_dup_rmap -- duplicate pte mapping to a page
Synopsis
    void page_dup_rmap (page);
    struct page * page;
Arguments
    page  the page to add the mapping to
Description
    For copy_page_range only: minimal extract from page_add_rmap, avoiding unnecessary tests (already checked) so it's quicker.
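Looking back at the User Space Memory Access routines, the copy_from_user contract (return the number of bytes that could not be copied, and zero-pad the destination) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: the `model_` names are hypothetical, and the `readable` parameter is an invented stand-in for "how many source bytes can be read before the access faults".

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Copies up to n bytes. Only the first 'readable' source bytes are
 * treated as accessible (hypothetical stand-in for a user-space fault).
 * The rest of the destination is padded with zero bytes, and the number
 * of bytes that could not be copied is returned.
 */
unsigned long model_copy_from_user(void *to, const void *from,
                                   unsigned long n, unsigned long readable)
{
    unsigned long copied = n < readable ? n : readable;

    memcpy(to, from, copied);
    memset((char *)to + copied, 0, n - copied);
    return n - copied;
}

/* Typical caller shape: any uncopied byte is reported as -EFAULT. */
int model_fetch_request(void *kbuf, const void *ubuf,
                        unsigned long n, unsigned long readable)
{
    if (model_copy_from_user(kbuf, ubuf, n, readable) != 0)
        return -EFAULT;
    return 0;
}
```

Because the tail is zero-filled, a caller that ignores the return value never sees stale kernel memory in the buffer, but it still has to check the result to know the data is incomplete.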
Name
    read_cache_pages -- populate an address space with some pages, and
Synopsis
    int read_cache_pages (mapping, pages, filler, data);
    struct address_space * mapping;
    struct list_head * pages;
    int (*filler) (void *, struct page *);
    void * data;
Arguments
    mapping  the address_space
    pages    The address of a list_head which contains the target pages. These pages have their ->index populated and are otherwise uninitialised.
    filler   callback routine for filling a single page.
    data     private data for the callback routine.
Description
    Hides the details of the LRU cache etc from the filesystems.

Name
    filemap_fdatawait -- walk the list of under-writeback pages of the given address space and wait for all of them.
Synopsis
    int filemap_fdatawait (mapping);
    struct address_space * mapping;
Arguments
    mapping  address space structure to wait for

Name
    unlock_page -- unlock a locked page
Synopsis
    void fastcall unlock_page (page);
    struct page * page;
Arguments
    page  the page
Description
    Unlocks the page and wakes up sleepers in ___wait_on_page_locked. Also wakes sleepers in wait_on_page_writeback because the wakeup mechanism between PageLocked pages and PageWriteback pages is shared. But that's OK - sleepers in wait_on_page_writeback just go back to sleep.
    The first mb is necessary to safely close the critical section opened by the TestSetPageLocked; the second mb is necessary to enforce ordering between the clear_bit and the read of the waitqueue (to avoid SMP races with a parallel wait_on_page_locked).

Name
    find_lock_page -- locate, pin and lock a pagecache page
Synopsis
    struct page * find_lock_page (mapping, offset);
    struct address_space * mapping;
    unsigned long offset;
Arguments
    mapping  the address_space to search
    offset   the page index
Description
    Locates the desired pagecache page, locks it, increments its reference count and returns its address.
    Returns zero if the page was not present. find_lock_page may sleep.

Name
    find_or_create_page -- locate or add a pagecache page
Synopsis
    struct page * find_or_create_page (mapping, index, gfp_mask);
    struct address_space * mapping;
    unsigned long index;
    gfp_t gfp_mask;
Arguments
    mapping   the page's address_space
    index     the page's index into the mapping
    gfp_mask  page allocation mode
Description
    Locates a page in the pagecache. If the page is not present, a new page is allocated using gfp_mask and is added to the pagecache and to the VM's LRU list. The returned page is locked and has its reference count incremented.
    find_or_create_page may sleep, even if gfp_mask specifies an atomic allocation!
    find_or_create_page returns the desired page's address, or zero on memory exhaustion.

Name
    unmap_mapping_range -- unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
Synopsis
    void unmap_mapping_range (mapping, holebegin, holelen, even_cows);
    struct address_space * mapping;
    loff_t const holebegin;
    loff_t const holelen;
    int even_cows;
Arguments
    mapping    the address space containing mmaps to be unmapped.
    holebegin  byte in first page to unmap, relative to the start of the underlying file. This will be rounded down to a PAGE_SIZE boundary. Note that this is different from vmtruncate, which must keep the partial page. In contrast, we must get rid of partial pages.
    holelen    size of prospective hole in bytes. This will be rounded up to a PAGE_SIZE boundary. A holelen of zero truncates to the end of the file.
    even_cows  1 when truncating a file, unmap even private COWed pages; but 0 when invalidating pagecache, don't throw away private data.

Name
    vfree -- release memory allocated by vmalloc
Synopsis
    void vfree (addr);
    void * addr;
Arguments
    addr  memory base address
Description
    Free the virtually contiguous memory area starting at addr, as obtained from vmalloc, vmalloc_32 or __vmalloc. If addr is NULL, no operation is performed.
    Must not be called in interrupt context.

Name
    vunmap -- release virtual mapping obtained by vmap
Synopsis
    void vunmap (addr);
    void * addr;
Arguments
    addr  memory base address
Description
    Free the virtually contiguous memory area starting at addr, which was created from the page array passed to vmap.
    Must not be called in interrupt context.
Name
    vmap -- map an array of pages into virtually contiguous space
Synopsis
    void * vmap (pages, count, flags, prot);
    struct page ** pages;
    unsigned int count;
    unsigned long flags;
    pgprot_t prot;
Arguments
    pages  array of page pointers
    count  number of pages to map
    flags  vm_area->flags
    prot   page protection for the mapping
Description
    Maps count pages from pages into contiguous kernel virtual space.

Name
    __vmalloc_node -- allocate virtually contiguous memory
Synopsis
    void * __vmalloc_node (size, gfp_mask, prot, node);
    unsigned long size;
    gfp_t gfp_mask;
    pgprot_t prot;
    int node;
Arguments
    size      allocation size
    gfp_mask  flags for the page level allocator
    prot      protection mask for the allocated pages
    node      node to use for allocation or -1
Description
    Allocate enough pages to cover size from the page level allocator with gfp_mask flags. Map them into contiguous kernel virtual space, using a pagetable protection of prot.

Name
    vmalloc -- allocate virtually contiguous memory
Synopsis
    void * vmalloc (size);
    unsigned long size;
Arguments
    size  allocation size
Description
    Allocate enough pages to cover size from the page level allocator and map them into contiguous kernel virtual space.
    For tight control over page level allocator and protection flags use __vmalloc instead.

Name
    vmalloc_node -- allocate memory on a specific node
Synopsis
    void * vmalloc_node (size, node);
    unsigned long size;
    int node;
Arguments
    size  allocation size
    node  numa node
Description
    Allocate enough pages to cover size from the page level allocator and map them into contiguous kernel virtual space.
    For tight control over page level allocator and protection flags use __vmalloc instead.

Name
    vmalloc_32 -- allocate virtually contiguous memory (32bit addressable)
Synopsis
    void * vmalloc_32 (size);
    unsigned long size;
Arguments
    size  allocation size
Description
    Allocate enough 32bit PA addressable pages to cover size from the page level allocator and map them into contiguous kernel virtual space.

Name
    mempool_create -- create a memory pool
Synopsis
    mempool_t * mempool_create (min_nr, alloc_fn, free_fn, pool_data);
    int min_nr;
    mempool_alloc_t * alloc_fn;
    mempool_free_t * free_fn;
    void * pool_data;
Arguments
    min_nr     the minimum number of elements guaranteed to be allocated for this pool.
    alloc_fn   user-defined element-allocation function.
    free_fn    user-defined element-freeing function.
    pool_data  optional private data available to the user-defined functions.
Description
    This function creates and allocates a guaranteed size, preallocated memory pool. The pool can be used from the mempool_alloc and mempool_free functions. This function might sleep.
    Both the alloc_fn and the free_fn functions might sleep - as long as the mempool_alloc function is not called from IRQ contexts.

Name
    mempool_resize -- resize an existing memory pool
Synopsis
    int mempool_resize (pool, new_min_nr, gfp_mask);
    mempool_t * pool;
    int new_min_nr;
    gfp_t gfp_mask;
Arguments
    pool        pointer to the memory pool which was allocated via mempool_create.
    new_min_nr  the new minimum number of elements guaranteed to be allocated for this pool.
    gfp_mask    the usual allocation bitmask.
Description
    This function shrinks/grows the pool. In the case of growing, it cannot be guaranteed that the pool will be grown to the new size immediately, but new mempool_free calls will refill it.
    Note, the caller must guarantee that no mempool_destroy is called while this function is running. mempool_alloc and mempool_free might be called (e.g. from IRQ contexts) while this function executes.

Name
    mempool_destroy -- deallocate a memory pool
Synopsis
    void mempool_destroy (pool);
    mempool_t * pool;
Arguments
    pool  pointer to the memory pool which was allocated via mempool_create.
Description
    This function only sleeps if the free_fn function sleeps. The caller has to guarantee that all elements have been returned to the pool (i.e. freed) prior to calling mempool_destroy.

Name
    mempool_alloc -- allocate an element from a specific memory pool
Synopsis
    void * mempool_alloc (pool, gfp_mask);
    mempool_t * pool;
    gfp_t gfp_mask;
Arguments
    pool      pointer to the memory pool which was allocated via mempool_create.
    gfp_mask  the usual allocation bitmask.
Description
    This function only sleeps if the alloc_fn function sleeps or returns NULL. Note that due to preallocation, this function *never* fails when called from process contexts. (It might fail if called from an IRQ context.)

Name
    mempool_free -- return an element to the pool.
Synopsis
    void mempool_free (element, pool);
    void * element;
    mempool_t * pool;
Arguments
    element  pool element pointer.
    pool     pointer to the memory pool which was allocated via mempool_create.
Description
    This function only sleeps if the free_fn function sleeps.

Name
    balance_dirty_pages_ratelimited -- balance dirty memory state
Synopsis
    void balance_dirty_pages_ratelimited (mapping);
    struct address_space * mapping;
Arguments
    mapping  address_space which was dirtied
Description
    Processes which are dirtying memory should call in here once for each page which was newly dirtied. The function will periodically check the system's dirty state and will initiate writeback if needed.
    On really big machines, get_writeback_state is expensive, so try to avoid calling it too often (ratelimiting). But once we're over the dirty memory limit we decrease the ratelimiting by a lot, to prevent individual processes from overshooting the limit by (ratelimit_pages) each.

Name
    write_one_page -- write out a single page and optionally wait on I/O
Synopsis
    int write_one_page (page, wait);
    struct page * page;
    int wait;
Arguments
    page  the page to write
    wait  if true, wait on writeout
Description
    The page must be locked by the caller and will be unlocked upon return.
    write_one_page returns a negative error code if I/O failed.
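Returning to the mempool routines (mempool_create, mempool_alloc, mempool_free, mempool_destroy), the preallocation idea can be sketched in userspace. This is a simplified model and not the kernel implementation: all `model_` names are hypothetical, there is no locking or sleeping, and trying alloc_fn before touching the reserve is an assumption that is consistent with, but not spelled out by, the descriptions. What the text does state is that the pool is preallocated to min_nr elements and that mempool_free calls refill it.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *(*model_alloc_fn)(void *pool_data);
typedef void (*model_free_fn)(void *element, void *pool_data);

struct model_mempool {
    int min_nr;              /* reserve size guaranteed by the pool */
    int curr_nr;             /* elements currently held in reserve */
    void **elements;
    model_alloc_fn alloc_fn;
    model_free_fn free_fn;
    void *pool_data;
};

/* Hand every reserved element back to free_fn, then drop the pool. */
void model_mempool_destroy(struct model_mempool *pool)
{
    while (pool->curr_nr > 0)
        pool->free_fn(pool->elements[--pool->curr_nr], pool->pool_data);
    free(pool->elements);
    free(pool);
}

/* Preallocate min_nr elements up front; NULL if that is not possible. */
struct model_mempool *model_mempool_create(int min_nr, model_alloc_fn alloc_fn,
                                           model_free_fn free_fn, void *pool_data)
{
    struct model_mempool *pool;

    assert(min_nr > 0);
    pool = malloc(sizeof(*pool));
    if (!pool)
        return NULL;
    pool->elements = malloc(sizeof(void *) * (size_t)min_nr);
    if (!pool->elements) {
        free(pool);
        return NULL;
    }
    pool->min_nr = min_nr;
    pool->curr_nr = 0;
    pool->alloc_fn = alloc_fn;
    pool->free_fn = free_fn;
    pool->pool_data = pool_data;
    while (pool->curr_nr < min_nr) {
        void *element = alloc_fn(pool_data);

        if (!element) {
            model_mempool_destroy(pool);
            return NULL;
        }
        pool->elements[pool->curr_nr++] = element;
    }
    return pool;
}

/* Try the underlying allocator first; fall back to the reserve. */
void *model_mempool_alloc(struct model_mempool *pool)
{
    void *element = pool->alloc_fn(pool->pool_data);

    if (element)
        return element;
    if (pool->curr_nr > 0)
        return pool->elements[--pool->curr_nr];
    return NULL; /* this model gives up; it never sleeps waiting for a free */
}

/* Refill the reserve first; only surplus elements reach free_fn. */
void model_mempool_free(void *element, struct model_mempool *pool)
{
    if (pool->curr_nr < pool->min_nr)
        pool->elements[pool->curr_nr++] = element;
    else
        pool->free_fn(element, pool->pool_data);
}

/* Example callbacks: pool_data points to an int, non-zero simulates failure. */
void *model_flaky_alloc(void *pool_data)
{
    const int *fail = pool_data;

    return *fail ? NULL : malloc(64);
}

void model_flaky_free(void *element, void *pool_data)
{
    (void)pool_data;
    free(element);
}
```

With a reserve of two elements, two allocations still succeed while `model_flaky_alloc` is failing, and the reserve is topped up again as elements are freed. The model returns NULL once the reserve is empty, which loosely corresponds to the IRQ-context case mentioned for mempool_alloc.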
Name
truncate_inode_pages_range -- truncate range of pages specified by start and

Synopsis
void truncate_inode_pages_range (mapping, lstart, lend);
struct address_space * mapping;
loff_t lstart;
loff_t lend;

Arguments
mapping - mapping to truncate
lstart - offset from which to truncate
lend - offset to which to truncate

Description
Truncate the page cache, removing the pages that are between the specified offsets (and zeroing out the partial page if lstart is not page aligned).

Truncate takes two passes - the first pass is nonblocking. It will not block on page locks and it will not block on writeback. The second pass will wait. This is to prevent as much IO as possible in the affected region. The first pass will remove most pages, so the search cost of the second pass is low.

When looking at page->index outside the page lock we need to be careful to copy it into a local to avoid races (it could change at any time).

We pass down the cache-hot hint to the page freeing code. Even if the mapping is large, it is probably the case that the final pages are the most recently touched, and freeing happens in ascending file offset order.
Name
truncate_inode_pages -- truncate *all* the pages from an offset

Synopsis
void truncate_inode_pages (mapping, lstart);
struct address_space * mapping;
loff_t lstart;

Arguments
mapping - mapping to truncate
lstart - offset from which to truncate

Description
Called under (and serialised by) inode->i_mutex.

Name
invalidate_inode_pages2_range -- remove range of pages from an address_space

Synopsis
int invalidate_inode_pages2_range (mapping, start, end);
struct address_space * mapping;
pgoff_t start;
pgoff_t end;

Arguments
mapping - the address_space
start - the page offset 'from' which to invalidate
end - the page offset 'to' which to invalidate (inclusive)

Description
Any pages which are found to be mapped into pagetables are unmapped prior to invalidation.

Returns -EIO if any pages could not be invalidated.

Name
invalidate_inode_pages2 -- remove all pages from an address_space

Synopsis
int invalidate_inode_pages2 (mapping);
struct address_space * mapping;

Arguments
mapping - the address_space

Description
Any pages which are found to be mapped into pagetables are unmapped prior to invalidation.

Returns -EIO if any pages could not be invalidated.

Chapter 5. Kernel IPC facilities

IPC utilities

Name
ipc_init -- initialise IPC subsystem

Synopsis
int __init ipc_init (void);

Arguments
None.

Description
The various system5 IPC resources (semaphores, messages and shared memory) are initialised.

Name
ipc_init_ids -- initialise IPC identifiers

Synopsis
void __init ipc_init_ids (ids, size);
struct ipc_ids * ids;
int size;

Arguments
ids - Identifier set
size - Number of identifiers

Description
Given a size for the ipc identifier range (limited below IPCMNI), set up the sequence range to use, then allocate and initialise the array itself.
Name
ipc_init_proc_interface -- Create a proc interface for sysipc types

Synopsis
void __init ipc_init_proc_interface (path, header, ids, show);
const char * path;
const char * header;
struct ipc_ids * ids;
int (*show) (struct seq_file *, void *);

Arguments
path - Path in procfs
header - Banner to be printed at the beginning of the file.
ids - ipc id table to iterate.
show - show routine.

Description
The proc interface is created using a seq_file interface.

Name
ipc_findkey -- find a key in an ipc identifier set

Synopsis
int ipc_findkey (ids, key);
struct ipc_ids * ids;
key_t key;

Arguments
ids - Identifier set
key - The key to find

Description
Requires ipc_ids.sem locked. Returns the identifier if found or -1 if not.

Name
ipc_addid -- add an IPC identifier

Synopsis
int ipc_addid (ids, new, size);
struct ipc_ids * ids;
struct kern_ipc_perm * new;
int size;

Arguments
ids - IPC identifier set
new - new IPC permission set
size - new size limit for the id array

Description
Add an entry 'new' to the IPC arrays. The permissions object is initialised, the first free entry is set up, and the id assigned is returned. The list is returned in a locked state on success. On failure the list is not locked and -1 is returned.

Called with ipc_ids.sem held.

Name
ipc_rmid -- remove an IPC identifier

Synopsis
struct kern_ipc_perm* ipc_rmid (ids, id);
struct ipc_ids * ids;
int id;

Arguments
ids - identifier set
id - Identifier to remove

Description
The identifier must be valid, and in use. The kernel will panic if fed an invalid identifier. The entry is removed and internal variables recomputed. The object associated with the identifier is returned.

ipc_ids.sem and the spinlock for this ID are held before this function is called, and remain locked on exit.
Name
ipc_alloc -- allocate ipc space

Synopsis
void* ipc_alloc (size);
int size;

Arguments
size - size desired

Description
Allocate memory from the appropriate pools and return a pointer to it. NULL is returned if the allocation fails.

Name
ipc_free -- free ipc space

Synopsis
void ipc_free (ptr, size);
void * ptr;
int size;

Arguments
ptr - pointer returned by ipc_alloc
size - size of block

Description
Free a block created with ipc_alloc. The caller must know the size used in the allocation call.

Name
ipc_rcu_alloc -- allocate ipc and rcu space

Synopsis
void* ipc_rcu_alloc (size);
int size;

Arguments
size - size desired

Description
Allocate memory for the rcu header structure + the object. Returns the pointer to the object. NULL is returned if the allocation fails.

Name
ipc_schedule_free -- free ipc + rcu space

Synopsis
void ipc_schedule_free (head);
struct rcu_head * head;

Arguments
head - RCU callback structure for queued work

Description
Since the RCU callback function is called in bh, we need to defer the vfree to schedule_work.

Name
ipc_immediate_free -- free ipc + rcu space

Synopsis
void ipc_immediate_free (head);
struct rcu_head * head;

Arguments
head - RCU callback structure that contains pointer to be freed

Description
Free from the RCU callback context.

Name
ipcperms -- check IPC permissions

Synopsis
int ipcperms (ipcp, flag);
struct kern_ipc_perm * ipcp;
short flag;

Arguments
ipcp - IPC permission set
flag - desired permission set.

Description
Check user, group, other permissions for access to ipc resources.
Returns 0 if allowed.

Name
kernel_to_ipc64_perm -- convert kernel ipc permissions to user

Synopsis
void kernel_to_ipc64_perm (in, out);
struct kern_ipc_perm * in;
struct ipc64_perm * out;

Arguments
in - kernel permissions
out - new style IPC permissions

Description
Turn the kernel object 'in' into a set of permissions descriptions for returning to userspace (out).

Name
ipc64_perm_to_ipc_perm -- convert new ipc permissions to old

Synopsis
void ipc64_perm_to_ipc_perm (in, out);
struct ipc64_perm * in;
struct ipc_perm * out;

Arguments
in - new style IPC permissions
out - old style IPC permissions

Description
Turn the new style permissions object 'in' into a compatibility object and store it into the 'out' pointer.

Name
ipc_parse_version -- IPC call version

Synopsis
int ipc_parse_version (cmd);
int * cmd;

Arguments
cmd - pointer to command

Description
Return IPC_64 for new style IPC and IPC_OLD for old style IPC. The cmd value is turned from an encoding of command and version into just the command code.

Chapter 6. FIFO Buffer

kfifo interface

Name
__kfifo_reset -- removes the entire FIFO contents, no locking version

Synopsis
void __kfifo_reset (fifo);
struct kfifo * fifo;

Arguments
fifo - the fifo to be emptied.

Name
kfifo_reset -- removes the entire FIFO contents

Synopsis
void kfifo_reset (fifo);
struct kfifo * fifo;

Arguments
fifo - the fifo to be emptied.

Name
kfifo_put -- puts some data into the FIFO

Synopsis
unsigned int kfifo_put (fifo, buffer, len);
struct kfifo * fifo;
unsigned char * buffer;
unsigned int len;

Arguments
fifo - the fifo to be used.
buffer - the data to be added.
len - the length of the data to be added.

Description
This function copies at most 'len' bytes from the 'buffer' into the FIFO depending on the free space, and returns the number of bytes copied.
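The "copies at most 'len' bytes depending on the free space" contract can be illustrated with a small userspace model. This is only a sketch, not the kernel's implementation: `struct ring`, `ring_put`, `ring_get`, `ring_len` and `RING_SIZE` are hypothetical names, there is no locking, and the buffer size is assumed to be a power of two so that indices can be reduced with a mask. A matching get is included so the effect of a partial put can be observed.

```c
#include <assert.h>
#include <string.h>

#define RING_SIZE 8u /* assumption of this sketch: a power of two */

/* Hypothetical userspace stand-in for a byte FIFO; this is not struct kfifo. */
struct ring {
    unsigned char buf[RING_SIZE];
    unsigned int in;  /* total number of bytes ever written */
    unsigned int out; /* total number of bytes ever read */
};

/* Bytes currently stored; unsigned wraparound keeps the difference valid. */
static unsigned int ring_len(const struct ring *r)
{
    return r->in - r->out;
}

/* Copy at most len bytes into the ring; return how many actually fit. */
static unsigned int ring_put(struct ring *r, const unsigned char *src,
                             unsigned int len)
{
    unsigned int space = RING_SIZE - ring_len(r);
    unsigned int off, first;

    assert(r && src);
    if (len > space)
        len = space;                 /* limited by the free space */
    off = r->in & (RING_SIZE - 1);   /* write position inside buf */
    first = len < RING_SIZE - off ? len : RING_SIZE - off;
    memcpy(r->buf + off, src, first);         /* up to the end of buf */
    memcpy(r->buf, src + first, len - first); /* remainder wraps to the start */
    r->in += len;
    return len;
}

/* Copy at most len bytes out of the ring; return how many were available. */
static unsigned int ring_get(struct ring *r, unsigned char *dst,
                             unsigned int len)
{
    unsigned int avail = ring_len(r);
    unsigned int off, first;

    assert(r && dst);
    if (len > avail)
        len = avail;
    off = r->out & (RING_SIZE - 1);
    first = len < RING_SIZE - off ? len : RING_SIZE - off;
    memcpy(dst, r->buf + off, first);
    memcpy(dst + first, r->buf, len - first);
    r->out += len;
    return len;
}
```

With an 8-byte ring that already holds 6 bytes, a 4-byte put in this model stores 2 bytes and returns 2; the caller is expected to check the return value rather than assume the whole buffer was queued.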
Name
kfifo_get -- gets some data from the FIFO

Synopsis
unsigned int kfifo_get (fifo, buffer, len);
struct kfifo * fifo;
unsigned char * buffer;
unsigned int len;

Arguments
fifo - the fifo to be used.
buffer - where the data must be copied.
len - the size of the destination buffer.

Description
This function copies at most 'len' bytes from the FIFO into the 'buffer' and returns the number of copied bytes.

Name
__kfifo_len -- returns the number of bytes available in the FIFO, no locking version

Synopsis
unsigned int __kfifo_len (fifo);
struct kfifo * fifo;

Arguments
fifo - the fifo to be used.

Name
kfifo_len -- returns the number of bytes available in the FIFO

Synopsis
unsigned int kfifo_len (fifo);
struct kfifo * fifo;

Arguments
fifo - the fifo to be used.

Name
kfifo_init -- allocates a new FIFO using a preallocated buffer

Synopsis
struct kfifo * kfifo_init (buffer, size, gfp_mask, lock);
unsigned char * buffer;
unsigned int size;
gfp_t gfp_mask;
spinlock_t * lock;

Arguments
buffer - the preallocated buffer to be used.
size - the size of the internal buffer; this has to be a power of 2.
gfp_mask - get_free_pages mask, passed to kmalloc
lock - the lock to be used to protect the fifo buffer

Description
Do NOT pass the kfifo to kfifo_free after use! Simply free the struct kfifo with kfree.

Name
kfifo_alloc -- allocates a new FIFO and its internal buffer

Synopsis
struct kfifo * kfifo_alloc (size, gfp_mask, lock);
unsigned int size;
gfp_t gfp_mask;
spinlock_t * lock;

Arguments
size - the size of the internal buffer to be allocated.
gfp_mask - get_free_pages mask, passed to kmalloc
lock - the lock to be used to protect the fifo buffer

Description
The size will be rounded-up to a power of 2.

Name
kfifo_free -- frees the FIFO

Synopsis
void kfifo_free (fifo);
struct kfifo * fifo;

Arguments
fifo - the fifo to be freed.
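The power-of-2 size requirement of kfifo_init and the rounding done by kfifo_alloc can be sketched in plain C. The helper names `is_pow2` and `round_up_pow2` are hypothetical and are not the kernel's helpers; the sketch only shows the arithmetic a caller could use to pick or validate a size. A power-of-two size is convenient because a running index can then be reduced to a buffer offset with `index & (size - 1)`, which is presumably why the interface insists on it.

```c
#include <assert.h>

/* Nonzero if n is a power of two (exactly one bit set). */
static int is_pow2(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two that is >= n. Assumes 1 <= n <= 2^31. */
static unsigned int round_up_pow2(unsigned int n)
{
    unsigned int p = 1;

    assert(n != 0 && n <= 0x80000000u);
    while (p < n)
        p <<= 1;
    return p;
}
```

Under this model, asking for 1000 bytes yields a 1024-byte buffer, while a request that is already a power of two is left unchanged.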
Name
__kfifo_put -- puts some data into the FIFO, no locking version

Synopsis
unsigned int __kfifo_put (fifo, buffer, len);
struct kfifo * fifo;
unsigned char * buffer;
unsigned int len;

Arguments
fifo - the fifo to be used.
buffer - the data to be added.
len - the length of the data to be added.

Description
This function copies at most 'len' bytes from the 'buffer' into the FIFO depending on the free space, and returns the number of bytes copied.

Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use these functions.

Name
__kfifo_get -- gets some data from the FIFO, no locking version

Synopsis
unsigned int __kfifo_get (fifo, buffer, len);
struct kfifo * fifo;
unsigned char * buffer;
unsigned int len;

Arguments
fifo - the fifo to be used.
buffer - where the data must be copied.
len - the size of the destination buffer.

Description
This function copies at most 'len' bytes from the FIFO into the 'buffer' and returns the number of copied bytes.

Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use these functions.

Chapter 7. The proc filesystem

sysctl interface

Name
register_sysctl_table -- register a sysctl hierarchy

Synopsis
struct ctl_table_header * register_sysctl_table (table, insert_at_head);
ctl_table * table;
int insert_at_head;

Arguments
table - the top-level table structure
insert_at_head - whether the entry should be inserted in front or at the end

Description
Register a sysctl table hierarchy. table should be a filled in ctl_table array. An entry with a ctl_name of 0 terminates the table.

The members of the ctl_table structure are used as follows:

ctl_name - This is the numeric sysctl value used by sysctl(2). The number must be unique within that level of sysctl
procname - the name of the sysctl file under /proc/sys. Set to NULL to not enter a sysctl file
data - a pointer to data for use by proc_handler
maxlen - the maximum size in bytes of the data
mode - the file permissions for the /proc/sys file, and for sysctl(2)
child - a pointer to the child sysctl table if this entry is a directory, or NULL.
proc_handler - the text handler routine (described below)
strategy - the strategy routine (described below)
de - for internal use by the sysctl routines
extra1, extra2 - extra pointers usable by the proc handler routines

Leaf nodes in the sysctl tree will be represented by a single file under /proc; non-leaf nodes will be represented by directories.

sysctl(2) can automatically manage read and write requests through the sysctl table. The data and maxlen fields of the ctl_table struct enable minimal validation of the values being written to be performed, and the mode field allows minimal authentication.

More sophisticated management can be enabled by the provision of a strategy routine with the table entry. This will be called before any automatic read or write of the data is performed.

The strategy routine may return:

< 0 - Error occurred (error is passed to user process)
0 - OK - proceed with automatic read or write.
> 0 - OK - read or write has been done by the strategy routine, so return immediately.

There must be a proc_handler routine for any terminal nodes mirrored under /proc/sys (non-terminals are handled by a built-in directory handler). Several default handlers are available to cover common cases - proc_dostring, proc_dointvec, proc_dointvec_jiffies, proc_dointvec_userhz_jiffies, proc_dointvec_minmax, proc_doulongvec_ms_jiffies_minmax, proc_doulongvec_minmax.

It is the handler's job to read the input buffer from user memory and process it. The handler should return 0 on success.

This routine returns NULL on a failure to register, and a pointer to the table header on success.
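The three-way return convention of the strategy routine can be modelled in userspace as follows. This is a sketch of the convention only: `struct entry`, `entry_write`, `reject_negative` and `store_doubled` are hypothetical names, the entry holds a single int rather than a real ctl_table, and the real sysctl code paths are not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct entry;
typedef int (*strategy_fn)(struct entry *e, int newval);

/* Hypothetical, much reduced stand-in for one table entry. */
struct entry {
    int *data;            /* value managed by this entry */
    strategy_fn strategy; /* optional hook, may be NULL */
};

/* Write newval through the entry; returns 0 or a negative error code. */
static int entry_write(struct entry *e, int newval)
{
    assert(e && e->data);
    if (e->strategy) {
        int rc = e->strategy(e, newval);

        if (rc < 0)
            return rc; /* < 0: error is passed back to the caller */
        if (rc > 0)
            return 0;  /* > 0: strategy already did the write */
        /* 0: fall through to the automatic write */
    }
    *e->data = newval;
    return 0;
}

/* Example strategy: veto negative values, otherwise let the write proceed. */
static int reject_negative(struct entry *e, int newval)
{
    (void)e;
    return newval < 0 ? -EINVAL : 0;
}

/* Example strategy: perform the write itself and say so. */
static int store_doubled(struct entry *e, int newval)
{
    *e->data = 2 * newval;
    return 1;
}
```

In this model an entry without a strategy simply stores the value, `reject_negative` leaves the stored value untouched on error, and `store_doubled` shows the "> 0" case where the automatic write is skipped.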
Name
unregister_sysctl_table -- unregister a sysctl table hierarchy

Synopsis
void unregister_sysctl_table (header);
struct ctl_table_header * header;

Arguments
header - the header returned from register_sysctl_table

Description
Unregisters the sysctl table and all children. proc entries may not actually be removed until they are no longer used by anyone.

Name
proc_dostring -- read a string sysctl

Synopsis
int proc_dostring (table, write, filp, buffer, lenp, ppos);
ctl_table * table;
int write;
struct file * filp;
void __user * buffer;
size_t * lenp;
loff_t * ppos;

Arguments
table - the sysctl table
write - TRUE if this is a write to the sysctl file
filp - the file structure
buffer - the user buffer
lenp - the size of the user buffer
ppos - file position

Description
Reads/writes a string from/to the user buffer. If the kernel buffer provided is not large enough to hold the string, the string is truncated. The copied string is NULL-terminated. If the string is being read by the user process, it is copied and a newline '\n' is added. It is truncated if the buffer is not large enough.

Returns 0 on success.

Name
proc_dointvec -- read a vector of integers

Synopsis
int proc_dointvec (table, write, filp, buffer, lenp, ppos);
ctl_table * table;
int write;
struct file * filp;
void __user * buffer;
size_t * lenp;
loff_t * ppos;

Arguments
table - the sysctl table
write - TRUE if this is a write to the sysctl file
filp - the file structure
buffer - the user buffer
lenp - the size of the user buffer
ppos - file position

Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string.

Returns 0 on success.
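The idea of reading "up to maxlen/sizeof(unsigned int) integer values" from an ASCII string can be illustrated with a userspace parser. This is a sketch under assumptions, not the kernel's parser: `parse_ints` is a hypothetical helper, values are assumed to be separated by whitespace, and parsing stops at the first token that is not a number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse whitespace-separated decimal integers from text into dst.
 * At most cap values are stored (cap plays the role of
 * maxlen / sizeof(int)). Returns the number of values stored.
 */
static size_t parse_ints(const char *text, int *dst, size_t cap)
{
    size_t n = 0;

    assert(text && dst);
    while (n < cap) {
        char *end;
        long v = strtol(text, &end, 10); /* skips leading whitespace */

        if (end == text)
            break;                       /* no further number found */
        dst[n++] = (int)v;
        text = end;
    }
    return n;
}
```

For a two-element vector, the input "10 20 30" stores 10 and 20 and ignores the rest, which mirrors the "up to" wording of the description.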
Name
proc_dointvec_minmax -- read a vector of integers with min/max values

Synopsis
int proc_dointvec_minmax (table, write, filp, buffer, lenp, ppos);
ctl_table * table;
int write;
struct file * filp;
void __user * buffer;
size_t * lenp;
loff_t * ppos;

Arguments
table - the sysctl table
write - TRUE if this is a write to the sysctl file
filp - the file structure
buffer - the user buffer
lenp - the size of the user buffer
ppos - file position

Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string.

This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).

Returns 0 on success.

Name
proc_doulongvec_minmax -- read a vector of long integers with min/max values

Synopsis
int proc_doulongvec_minmax (table, write, filp, buffer, lenp, ppos);
ctl_table * table;
int write;
struct file * filp;
void __user * buffer;
size_t * lenp;
loff_t * ppos;

Arguments
table - the sysctl table
write - TRUE if this is a write to the sysctl file
filp - the file structure
buffer - the user buffer
lenp - the size of the user buffer
ppos - file position

Description
Reads/writes up to table->maxlen/sizeof(unsigned long) unsigned long values from/to the user buffer, treated as an ASCII string.

This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).

Returns 0 on success.
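A range check of the kind described for the minmax handlers might look like the following userspace sketch. The text does not say whether out-of-range values are rejected, skipped or clamped, so the policy here (reject the whole write with -EINVAL and store nothing) is purely an assumption of the sketch; `store_ints_minmax` is a hypothetical helper, with `min` and `max` standing in for the values that extra1 and extra2 point to.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Store up to cap values from src into dst, but only if every value
 * lies within [min, max]. Returns 0 on success, -EINVAL otherwise.
 * On failure dst is left unchanged (policy assumed by this sketch).
 */
static int store_ints_minmax(int *dst, size_t cap,
                             const int *src, size_t n,
                             int min, int max)
{
    size_t i;

    assert(dst && src);
    if (n > cap)
        n = cap; /* "up to maxlen/sizeof(int)" values */

    for (i = 0; i < n; i++)
        if (src[i] < min || src[i] > max)
            return -EINVAL;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
    return 0;
}
```

Validating everything before storing anything keeps the vector consistent when a later element is out of range; a handler that skips or clamps individual values would need a different loop.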
Name
proc_doulongvec_ms_jiffies_minmax -- read a vector of millisecond values with min/max values

Synopsis
int proc_doulongvec_ms_jiffies_minmax (table, write, filp, buffer, lenp, ppos);
ctl_table * table;
int write;
struct file * filp;
void __user * buffer;
size_t * lenp;
loff_t * ppos;

Arguments
table - the sysctl table
write - TRUE if this is a write to the sysctl file
filp - the file structure
buffer - the user buffer
lenp - the size of the user buffer
ppos - file position

Description
Reads/writes up to table->maxlen/sizeof(unsigned long) unsigned long values from/to the user buffer, treated as an ASCII string. The values are treated as milliseconds, and converted to jiffies when they are stored.

This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).

Returns 0 on success.

Name
proc_dointvec_jiffies -- read a vector of integers as seconds

Synopsis
int proc_dointvec_jiffies (table, write, filp, buffer, lenp, ppos);
ctl_table * table;
int write;
struct file * filp;
void __user * buffer;
size_t * lenp;
loff_t * ppos;

Arguments
table - the sysctl table
write - TRUE if this is a write to the sysctl file
filp - the file structure
buffer - the user buffer
lenp - the size of the user buffer
ppos - file position

Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string. The values read are assumed to be in seconds, and are converted into jiffies.

Returns 0 on success.
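The unit conversions these handlers perform can be sketched with plain arithmetic. Everything here is an assumption made for illustration: `SKETCH_HZ` is a made-up tick rate of 250 per second (the real value is a kernel configuration choice), the function names are hypothetical, millisecond values are rounded up to whole ticks, and overflow is not handled.

```c
#include <assert.h>

#define SKETCH_HZ 250UL /* assumed tick rate, ticks per second */

/* Seconds to ticks: what a write of a "seconds" value would store. */
static unsigned long secs_to_ticks(unsigned long secs)
{
    return secs * SKETCH_HZ;
}

/* Ticks back to seconds: what a read would report (truncating). */
static unsigned long ticks_to_secs(unsigned long ticks)
{
    return ticks / SKETCH_HZ;
}

/* Milliseconds to ticks, rounding up so a short delay never becomes 0. */
static unsigned long msecs_to_ticks(unsigned long ms)
{
    assert(ms <= 1000000UL); /* keep the multiplication far from overflow */
    return (ms * SKETCH_HZ + 999UL) / 1000UL;
}
```

At 250 ticks per second, 3 seconds corresponds to 750 ticks and 10 milliseconds rounds up to 3 ticks, so a value written and then read back may differ slightly from what was written when the unit is finer than one tick.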
Name
proc_dointvec_userhz_jiffies -- read a vector of integers as 1/USER_HZ seconds

Synopsis
int proc_dointvec_userhz_jiffies (table, write, filp, buffer, lenp, ppos);
ctl_table * table;
int write;
struct file * filp;
void __user * buffer;
size_t * lenp;
loff_t * ppos;

Arguments
table - the sysctl table
write - TRUE if this is a write to the sysctl file
filp - the file structure
buffer - the user buffer
lenp - the size of the user buffer
ppos - pointer to the file position

Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string. The values read are assumed to be in 1/USER_HZ seconds, and are converted into jiffies.

Returns 0 on success.

Name
proc_dointvec_ms_jiffies -- read a vector of integers as milliseconds

Synopsis
int proc_dointvec_ms_jiffies (table, write, filp, buffer, lenp, ppos);
ctl_table * table;
int write;
struct file * filp;
void __user * buffer;
size_t * lenp;
loff_t * ppos;

Arguments
table - the sysctl table
write - TRUE if this is a write to the sysctl file
filp - the file structure
buffer - the user buffer
lenp - the size of the user buffer
ppos - the current position in the file

Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string. The values read are assumed to be in 1/1000 seconds, and are converted into jiffies.

Returns 0 on success.

proc filesystem interface

Name
proc_pid_unhash -- Unhash /proc/pid entry from the dcache.

Synopsis
struct dentry * proc_pid_unhash (p);
struct task_struct * p;

Arguments
p - task that should be flushed.

Description
Drops the /proc/pid dcache entry from the hash chains.

Dropping /proc/pid entries and detach_pid must be synchronous, otherwise e.g. /proc/pid/exe might point to the wrong executable, if the pid value is immediately reused.
This is enforced by:
- caller must acquire spin_lock(p->proc_lock)
- must be called before detach_pid
- proc_pid_lookup acquires proc_lock, and checks that the target is not dead by looking at the attach count of PIDTYPE_PID.

Name
proc_pid_flush -- recover memory used by stale /proc/pid/x entries

Synopsis
void proc_pid_flush (proc_dentry);
struct dentry * proc_dentry;

Arguments
proc_dentry - directory to prune.

Description
Shrink the /proc directory that was used by the just killed thread.

Chapter 8. The debugfs filesystem

debugfs interface

Name
debugfs_create_file -- create a file in the debugfs filesystem

Synopsis
struct dentry * debugfs_create_file (name, mode, parent, data, fops);
const char * name;
mode_t mode;
struct dentry * parent;
void * data;
struct file_operations * fops;

Arguments
name - a pointer to a string containing the name of the file to create.
mode - the permission that the file should have
parent - a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
data - a pointer to something that the caller will want to get to later on. The inode.u.generic_ip pointer will point to this value on the open call.
fops - a pointer to a struct file_operations that should be used for this file.

Description
This is the basic “create a file” function for debugfs. It allows for a wide range of flexibility in creating a file or a directory (if you want to create a directory, the debugfs_create_dir function is recommended instead).

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, NULL will be returned.
If debugfs is not enabled in the kernel, the value -ENODEV will be returned. It is not wise to check for this value; check for NULL or !NULL instead, so as to eliminate the need for #ifdef in the calling code.

Name
debugfs_create_dir -- create a directory in the debugfs filesystem

Synopsis
struct dentry * debugfs_create_dir (name, parent);
const char * name;
struct dentry * parent;

Arguments
name - a pointer to a string containing the name of the directory to create.
parent - a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the debugfs filesystem.

Description
This function creates a directory in debugfs with the given name.

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, NULL will be returned.

If debugfs is not enabled in the kernel, the value -ENODEV will be returned.
It is not wise to check for this value; check for NULL or !NULL instead, so as to eliminate the need for #ifdef in the calling code.

Name
debugfs_remove -- removes a file or directory from the debugfs filesystem

Synopsis
void debugfs_remove (dentry);
struct dentry * dentry;

Arguments
dentry - a pointer to the dentry of the file or directory to be removed.

Description
This function removes a file or directory in debugfs that was previously created with a call to another debugfs function (like debugfs_create_file or variants thereof).

This function is required to be called in order for the file to be removed; no automatic cleanup of files will happen when a module is removed, you are responsible here.

Name
debugfs_create_u8 -- create a file in the debugfs filesystem that is used to read and write an unsigned 8 bit value.
Synopsis
struct dentry * debugfs_create_u8 (name, mode, parent, value);
const char * name;
mode_t mode;
struct dentry * parent;
u8 * value;

Arguments
name - a pointer to a string containing the name of the file to create.
mode - the permission that the file should have
parent - a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
value - a pointer to the variable that the file should read from and write to.

Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, NULL will be returned.

If debugfs is not enabled in the kernel, the value -ENODEV will be returned. It is not wise to check for this value; check for NULL or !NULL instead, so as to eliminate the need for #ifdef in the calling code.
Name
debugfs_create_u16 -- create a file in the debugfs filesystem that is used to read and write an unsigned 16 bit value.

Synopsis
struct dentry * debugfs_create_u16 (name, mode, parent, value);
const char * name;
mode_t mode;
struct dentry * parent;
u16 * value;

Arguments
name
    a pointer to a string containing the name of the file to create.
mode
    the permission that the file should have
parent
    a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
value
    a pointer to the variable that the file should read from and write to.

Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, NULL will be returned.

If debugfs is not enabled in the kernel, the value -ENODEV will be returned. It is not wise to check for this value; check for NULL or !NULL instead, so as to eliminate the need for #ifdef in the calling code.

Name
debugfs_create_u32 -- create a file in the debugfs filesystem that is used to read and write an unsigned 32 bit value.

Synopsis
struct dentry * debugfs_create_u32 (name, mode, parent, value);
const char * name;
mode_t mode;
struct dentry * parent;
u32 * value;

Arguments
name
    a pointer to a string containing the name of the file to create.
mode
    the permission that the file should have
parent
    a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
value
    a pointer to the variable that the file should read from and write to.

Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, NULL will be returned.

If debugfs is not enabled in the kernel, the value -ENODEV will be returned. It is not wise to check for this value; check for NULL or !NULL instead, so as to eliminate the need for #ifdef in the calling code.

Name
debugfs_create_bool -- create a file in the debugfs filesystem that is used to read and write a boolean value.

Synopsis
struct dentry * debugfs_create_bool (name, mode, parent, value);
const char * name;
mode_t mode;
struct dentry * parent;
u32 * value;

Arguments
name
    a pointer to a string containing the name of the file to create.
mode
    the permission that the file should have
parent
    a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
value
    a pointer to the variable that the file should read from and write to.

Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, NULL will be returned.

If debugfs is not enabled in the kernel, the value -ENODEV will be returned. It is not wise to check for this value; check for NULL or !NULL instead, so as to eliminate the need for #ifdef in the calling code.

Chapter 9. The Linux VFS

The Filesystem types

Name
enum positive_aop_returns -- aop return codes with specific semantics

Synopsis
enum positive_aop_returns {
  AOP_WRITEPAGE_ACTIVATE,
  AOP_TRUNCATED_PAGE
};

Constants
AOP_WRITEPAGE_ACTIVATE
    Informs the caller that page writeback has completed, that the page is still locked, and should be considered active. The VM uses this hint to return the page to the active list -- it won't be a candidate for writeback again in the near future. Other callers must be careful to unlock the page if they get this return. Returned by writepage.
AOP_TRUNCATED_PAGE
    The AOP method that was handed a locked page has unlocked it and the page might have been truncated. The caller should back up to acquiring a new page and trying again. The aop will be taking reasonable precautions not to livelock. If the caller held a page reference, it should drop it before retrying. Returned by readpage, prepare_write, and commit_write.

Description
address_space_operation functions return these large constants to indicate special semantics to the caller. These are much larger than the bytes in a page to allow for functions that return the number of bytes operated on in a given page.
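The return convention described for enum positive_aop_returns can be sketched in userspace C. This is only an illustration under stated assumptions: the numeric values are assumed (the text above says only that they are much larger than the bytes in a page), TOY_PAGE_SIZE assumes 4 KiB pages, and struct toy_op, toy_aop and toy_call_with_retry are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>

/* Assumed values; the only property relied on is "larger than a page". */
enum positive_aop_returns {
	AOP_WRITEPAGE_ACTIVATE = 0x80000,
	AOP_TRUNCATED_PAGE     = 0x80001
};

#define TOY_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Hypothetical operation state: how often to report truncation first. */
struct toy_op {
	int truncations_left;
	int bytes;		/* byte count reported once the call succeeds */
};

/* Returns either a special code or the number of bytes operated on. */
static int toy_aop(struct toy_op *op)
{
	assert(op->bytes >= 0 && op->bytes <= TOY_PAGE_SIZE);
	if (op->truncations_left > 0) {
		op->truncations_left--;
		return AOP_TRUNCATED_PAGE;
	}
	return op->bytes;
}

/* Caller side: retry while the page was truncated, else pass the result on. */
static int toy_call_with_retry(struct toy_op *op, int *retries)
{
	int ret;

	*retries = 0;
	while ((ret = toy_aop(op)) == AOP_TRUNCATED_PAGE)
		(*retries)++;
	return ret;
}
```

Because every valid byte count is at most one page, a caller can tell a byte count from a special code with a single comparison. A real caller would also drop its page reference and acquire a new page before each retry.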
Name
struct export_operations -- for nfsd to communicate with file systems

Synopsis
struct export_operations {
  struct dentry *(* decode_fh) (struct super_block *sb, __u32 *fh, int fh_len, int fh_type,
      int (*acceptable)(void *context, struct dentry *de), void *context);
  int (* encode_fh) (struct dentry *de, __u32 *fh, int *max_len, int connectable);
  int (* get_name) (struct dentry *parent, char *name, struct dentry *child);
  struct dentry * (* get_parent) (struct dentry *child);
  struct dentry * (* get_dentry) (struct super_block *sb, void *inump);
  struct dentry * (* find_exported_dentry) (struct super_block *sb, void *obj, void *parent,
      int (*acceptable)(void *context, struct dentry *de), void *context);
};

Members
decode_fh
    decode a file handle fragment and return a struct dentry
encode_fh
    encode a file handle fragment from a dentry
get_name
    find the name for a given inode in a given directory
get_parent
    find the parent of a given directory
get_dentry
    find a dentry for the inode given a file handle sub-fragment
find_exported_dentry
    set by the exporting module to a standard helper function.

Description
The export_operations structure provides a means for nfsd to communicate with a particular exported file system, particularly enabling nfsd and the filesystem to co-operate when dealing with file handles.

export_operations contains two basic operations for dealing with file handles, decode_fh and encode_fh, and allows for some other operations to be defined which standard helper routines use to get specific information from the filesystem.

nfsd encodes information used to determine which filesystem a filehandle applies to in the initial part of the file handle. The remainder, termed a file handle fragment, is controlled completely by the filesystem. The standard helper routines assume that this fragment will contain one or two sub-fragments, one which identifies the file, and one which may be used to identify the (a) directory containing the file.
In some situations, nfsd needs to get a dentry which is connected into a specific part of the file tree. To allow for this, it passes the function acceptable together with a context which can be used to see if the dentry is acceptable. As there can be multiple dentries for a given file, the filesystem should check each one for acceptability before looking for the next. As soon as an acceptable one is found, it should be returned.

decode_fh
decode_fh is given a struct super_block (sb), a file handle fragment (fh, fh_len) and an acceptability testing function (acceptable, context). It should return a struct dentry which refers to the same file that the file handle fragment refers to, and which passes the acceptability test. If it cannot, it should return a NULL pointer if the file was found but no acceptable dentries were available, or an ERR_PTR error code indicating why it couldn't be found (e.g. ENOENT or ENOMEM).

encode_fh
encode_fh should store in the file handle fragment fh (using at most max_len bytes) information that can be used by decode_fh to recover the file referred to by the struct dentry de. If the connectable flag is set, encode_fh should store sufficient information so that a good attempt can be made to find not only the file but also its place in the filesystem. This typically means storing a reference to de->d_parent in the filehandle fragment. encode_fh should return the number of bytes stored or a negative error code such as -ENOSPC.

get_name
get_name should find a name for the given child in the given parent directory. The name should be stored in name (with the understanding that it already points to a buffer of NAME_MAX+1 bytes). get_name should return 0 on success or a negative error code on error. get_name will be called without parent->i_mutex held.

get_parent
get_parent should find the parent directory for the given child which is also a directory.
In the event that it cannot be found, or storage space cannot be allocated, an ERR_PTR should be returned.

get_dentry
Given a struct super_block (sb) and a pointer to a file-system specific inode identifier, possibly an inode number, (inump) get_dentry should find the identified inode and return a dentry for that inode. Any suitable dentry can be returned including, if necessary, a new dentry created with d_alloc_root. The caller can then find any other extant dentries by following the d_alias links. If a new dentry was created using d_alloc_root, DCACHE_NFSD_DISCONNECTED should be set, and the dentry should be d_rehashed.

If the inode cannot be found, either a NULL pointer or an ERR_PTR code can be returned. The inump will be whatever was passed to nfsd_find_fh_dentry in either the obj or parent parameters.

Locking rules
get_parent is called with child->d_inode->i_mutex down.
get_name is not (which is possibly inconsistent).

The Directory Cache

Name
d_invalidate -- invalidate a dentry

Synopsis
int d_invalidate (dentry);
struct dentry * dentry;

Arguments
dentry
    dentry to invalidate

Description
Try to invalidate the dentry if it turns out to be possible. If there are other dentries that can be reached through this one we can't delete it and we return -EBUSY. On success we return 0.

no dcache lock.

Name
shrink_dcache_sb -- shrink dcache for a superblock

Synopsis
void shrink_dcache_sb (sb);
struct super_block * sb;

Arguments
sb
    superblock

Description
Shrink the dcache for the specified super block. This is used to free the dcache before unmounting a file system.

Name
have_submounts -- check for mounts over a dentry

Synopsis
int have_submounts (parent);
struct dentry * parent;

Arguments
parent
    dentry to check.
Description
Return true if the parent or its subdirectories contain a mount point.

Name
shrink_dcache_parent -- prune dcache

Synopsis
void shrink_dcache_parent (parent);
struct dentry * parent;

Arguments
parent
    parent of entries to prune

Description
Prune the dcache to remove unused children of the parent dentry.

Name
d_alloc -- allocate a dcache entry

Synopsis
struct dentry * d_alloc (parent, name);
struct dentry * parent;
const struct qstr * name;

Arguments
parent
    parent of entry to allocate
name
    qstr of the name

Description
Allocates a dentry. It returns NULL if there is insufficient memory available. On a success the dentry is returned. The name passed in is copied and the copy passed in may be reused after this call.

Name
d_instantiate -- fill in inode information for a dentry

Synopsis
void d_instantiate (entry, inode);
struct dentry * entry;
struct inode * inode;

Arguments
entry
    dentry to complete
inode
    inode to attach to this dentry

Description
Fill in inode information in the entry.

This turns negative dentries into productive full members of society.

NOTE! This assumes that the inode count has been incremented (or otherwise set) by the caller to indicate that it is now in use by the dcache.

Name
d_instantiate_unique -- instantiate a non-aliased dentry

Synopsis
struct dentry * d_instantiate_unique (entry, inode);
struct dentry * entry;
struct inode * inode;

Arguments
entry
    dentry to instantiate
inode
    inode to attach to this dentry

Description
Fill in inode information in the entry. On success, it returns NULL. If an unhashed alias of “entry” already exists, then we return the aliased dentry instead and drop one reference to inode.

Note that in order to avoid conflicts with rename etc, the caller had better be holding the parent directory semaphore.
This also assumes that the inode count has been incremented (or otherwise set) by the caller to indicate that it is now in use by the dcache.

Name
d_alloc_root -- allocate root dentry

Synopsis
struct dentry * d_alloc_root (root_inode);
struct inode * root_inode;

Arguments
root_inode
    inode to allocate the root for

Description
Allocate a root (“/”) dentry for the inode given. The inode is instantiated and returned. NULL is returned if there is insufficient memory or the inode passed is NULL.

Name
d_alloc_anon -- allocate an anonymous dentry

Synopsis
struct dentry * d_alloc_anon (inode);
struct inode * inode;

Arguments
inode
    inode to allocate the dentry for

Description
This is similar to d_alloc_root. It is used by filesystems when creating a dentry for a given inode, often in the process of mapping a filehandle to a dentry. The returned dentry may be anonymous, or may have a full name (if the inode was already in the cache). The file system may need to make further efforts to connect this dentry into the dcache properly.

When called on a directory inode, we must ensure that the inode only ever has one dentry. If a dentry is found, that is returned instead of allocating a new one.

On successful return, the reference to the inode has been transferred to the dentry. If NULL is returned (indicating kmalloc failure), the reference on the inode has not been released.

Name
d_splice_alias -- splice a disconnected dentry into the tree if one exists

Synopsis
struct dentry * d_splice_alias (inode, dentry);
struct inode * inode;
struct dentry * dentry;

Arguments
inode
    the inode which may have a disconnected dentry
dentry
    a negative dentry which we want to point to the inode.

Description
If inode is a directory and has a 'disconnected' dentry (i.e. IS_ROOT and DCACHE_DISCONNECTED), then d_move that in place of the given dentry and return it, else simply d_add the inode to the dentry and return NULL.
This is needed in the lookup routine of any filesystem that is exportable (via knfsd) so that we can build dcache paths to directories effectively.

If a dentry was found and moved, then it is returned. Otherwise NULL is returned. This matches the expected return value of ->lookup.

Name
d_lookup -- search for a dentry

Synopsis
struct dentry * d_lookup (parent, name);
struct dentry * parent;
struct qstr * name;

Arguments
parent
    parent dentry
name
    qstr of name we wish to find

Description
Searches the children of the parent dentry for the name in question. If the dentry is found its reference count is incremented and the dentry is returned. The caller must use dput to free the entry when it has finished using it. NULL is returned on failure.

__d_lookup is dcache_lock free. The hash list is protected using RCU. Memory barriers are used while updating and doing lockless traversal. To avoid races with d_move while rename is happening, d_lock is used.

Overflows in memcmp during d_move are avoided by keeping the length and name pointer in one structure pointed to by d_qstr.

rcu_read_lock and rcu_read_unlock are used to disable preemption while lookup is going on.

The dentry_unused list is not updated even if lookup finds the required dentry in there. It is updated in places such as prune_dcache, shrink_dcache_sb, select_parent and __dget_locked. This laziness saves lookup from dcache_lock acquisition.

d_lookup is protected against concurrent renames in some unrelated directory using the seqlock_t rename_lock.

Name
d_validate -- verify dentry provided from insecure source

Synopsis
int d_validate (dentry, dparent);
struct dentry * dentry;
struct dentry * dparent;

Arguments
dentry
    The dentry alleged to be valid child of dparent
dparent
    The parent dentry (known to be valid)

Description
An insecure source has sent us a dentry, here we verify it and dget it. This is used by ncpfs in its readdir implementation.
Zero is returned if the dentry is invalid.

Name
d_delete -- delete a dentry

Synopsis
void d_delete (dentry);
struct dentry * dentry;

Arguments
dentry
    The dentry to delete

Description
Turn the dentry into a negative dentry if possible, otherwise remove it from the hash queues so it can be deleted later.

Name
d_rehash -- add an entry back to the hash

Synopsis
void d_rehash (entry);
struct dentry * entry;

Arguments
entry
    dentry to add to the hash

Description
Adds a dentry to the hash according to its name.

Name
d_move -- move a dentry

Synopsis
void d_move (dentry, target);
struct dentry * dentry;
struct dentry * target;

Arguments
dentry
    entry to move
target
    new dentry

Description
Update the dcache to reflect the move of a file name. Negative dcache entries should not be moved in this way.

Name
find_inode_number -- check for dentry with name

Synopsis
ino_t find_inode_number (dir, name);
struct dentry * dir;
struct qstr * name;

Arguments
dir
    directory to check
name
    Name to find.

Description
Check whether a dentry already exists for the given name, and return the inode number if it has an inode. Otherwise 0 is returned.

This routine is used to post-process directory listings for filesystems using synthetic inode numbers, and is necessary to keep getcwd working.

Name
__d_drop -- drop a dentry

Synopsis
void __d_drop (dentry);
struct dentry * dentry;

Arguments
dentry
    dentry to drop

Description
d_drop unhashes the entry from the parent dentry hashes, so that it won't be found through a VFS lookup any more. Note that this is different from deleting the dentry: d_delete will try to mark the dentry negative if possible, giving a successful _negative_ lookup, while d_drop will just make the cache lookup fail.

d_drop is used mainly for stuff that wants to invalidate a dentry for some reason (NFS timeouts or autofs deletes).

__d_drop requires dentry->d_lock.
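The difference between d_delete and d_drop described above can be pictured with a small userspace model. This is a toy sketch, not kernel code: struct toy_dentry and the toy_* functions are hypothetical, locking is omitted, and treating "the caller holds the only reference" as the condition under which a dentry can be made negative is an assumption made for the illustration.

```c
/* Toy state of one cache entry; hypothetical, not the kernel's struct dentry. */
struct toy_dentry {
	int hashed;	/* reachable through a cache lookup */
	int has_inode;	/* 0 means the entry is negative */
	int d_count;	/* number of references held */
};

enum toy_lookup_result { TOY_MISS, TOY_NEGATIVE, TOY_POSITIVE };

/* What a cache lookup of this entry would report. */
static enum toy_lookup_result toy_lookup(const struct toy_dentry *d)
{
	if (!d->hashed)
		return TOY_MISS;
	return d->has_inode ? TOY_POSITIVE : TOY_NEGATIVE;
}

/* d_drop model: unhash only, so later cache lookups simply fail. */
static void toy_d_drop(struct toy_dentry *d)
{
	d->hashed = 0;
}

/* d_delete model: go negative if possible, otherwise unhash. */
static void toy_d_delete(struct toy_dentry *d)
{
	if (d->d_count == 1)	/* assumption: sole user, safe to detach inode */
		d->has_inode = 0;
	else
		d->hashed = 0;
}
```

After toy_d_delete on an entry with a single reference, a lookup still succeeds but reports a negative entry. After toy_d_drop the lookup misses, and the filesystem would have to be consulted again.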
Name
d_add -- add dentry to hash queues

Synopsis
void d_add (entry, inode);
struct dentry * entry;
struct inode * inode;

Arguments
entry
    dentry to add
inode
    The inode to attach to this dentry

Description
This adds the entry to the hash queues and initializes inode. The entry was actually filled in earlier during d_alloc.

Name
d_add_unique -- add dentry to hash queues without aliasing

Synopsis
struct dentry * d_add_unique (entry, inode);
struct dentry * entry;
struct inode * inode;

Arguments
entry
    dentry to add
inode
    The inode to attach to this dentry

Description
This adds the entry to the hash queues and initializes inode. The entry was actually filled in earlier during d_alloc.

Name
dget -- get a reference to a dentry

Synopsis
struct dentry * dget (dentry);
struct dentry * dentry;

Arguments
dentry
    dentry to get a reference to

Description
Given a dentry or NULL pointer increment the reference count if appropriate and return the dentry. A dentry will not be destroyed when it has references. dget should never be called for dentries with zero reference counter. For these cases (preferably none, functions in dcache.c are sufficient for normal needs and they take necessary precautions) you should hold dcache_lock and call dget_locked instead of dget.

Name
d_unhashed -- is dentry hashed

Synopsis
int d_unhashed (dentry);
struct dentry * dentry;

Arguments
dentry
    entry to check

Description
Returns true if the dentry passed is not currently hashed.

Inode Handling

Name
clear_inode -- clear an inode

Synopsis
void clear_inode (inode);
struct inode * inode;

Arguments
inode
    inode to clear

Description
This is called by the filesystem to tell us that the inode is no longer useful. We just terminate it with extreme prejudice.
Name
invalidate_inodes -- discard the inodes on a device

Synopsis
int invalidate_inodes (sb);
struct super_block * sb;

Arguments
sb
    superblock

Description
Discard all of the inodes for a given superblock. If the discard fails because there are busy inodes then a non zero value is returned. If the discard is successful all the inodes have been discarded.

Name
new_inode -- obtain an inode

Synopsis
struct inode * new_inode (sb);
struct super_block * sb;

Arguments
sb
    superblock

Description
Allocates a new inode for given superblock.

Name
iunique -- get a unique inode number

Synopsis
ino_t iunique (sb, max_reserved);
struct super_block * sb;
ino_t max_reserved;

Arguments
sb
    superblock
max_reserved
    highest reserved inode number

Description
Obtain an inode number that is unique on the system for a given superblock. This is used by file systems that have no natural permanent inode numbering system. An inode number is returned that is higher than the reserved limit but unique.

BUGS
With a large number of inodes live on the file system this function currently becomes quite slow.

Name
ilookup5_nowait -- search for an inode in the inode cache

Synopsis
struct inode * ilookup5_nowait (sb, hashval, test, data);
struct super_block * sb;
unsigned long hashval;
int (*test) (struct inode *, void *);
void * data;

Arguments
sb
    super block of file system to search
hashval
    hash value (usually inode number) to search for
test
    callback used for comparisons between inodes
data
    opaque data pointer to pass to test

Description
ilookup5 uses ifind to search for the inode specified by hashval and data in the inode cache. This is a generalized version of ilookup for file systems where the inode number is not sufficient for unique identification of an inode.

If the inode is in the cache, the inode is returned with an incremented reference count.
Note, the inode lock is not waited upon so you have to be very careful what you do with the returned inode. You probably should be using ilookup5 instead.

Otherwise NULL is returned.

Note, test is called with the inode_lock held, so can't sleep.

Name
ilookup5 -- search for an inode in the inode cache

Synopsis
struct inode * ilookup5 (sb, hashval, test, data);
struct super_block * sb;
unsigned long hashval;
int (*test) (struct inode *, void *);
void * data;

Arguments
sb
    super block of file system to search
hashval
    hash value (usually inode number) to search for
test
    callback used for comparisons between inodes
data
    opaque data pointer to pass to test

Description
ilookup5 uses ifind to search for the inode specified by hashval and data in the inode cache. This is a generalized version of ilookup for file systems where the inode number is not sufficient for unique identification of an inode.

If the inode is in the cache, the inode lock is waited upon and the inode is returned with an incremented reference count.

Otherwise NULL is returned.

Note, test is called with the inode_lock held, so can't sleep.

Name
ilookup -- search for an inode in the inode cache

Synopsis
struct inode * ilookup (sb, ino);
struct super_block * sb;
unsigned long ino;

Arguments
sb
    super block of file system to search
ino
    inode number to search for

Description
ilookup uses ifind_fast to search for the inode ino in the inode cache. This is for file systems where the inode number is sufficient for unique identification of an inode.

If the inode is in the cache, the inode is returned with an incremented reference count.

Otherwise NULL is returned.
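The role of the test callback and the opaque data pointer in the ilookup5 family can be illustrated with a userspace model of a hashed inode cache. Everything here is hypothetical and simplified: struct toy_inode, its generation field, the fixed-size table and the toy_* functions are assumptions for the sketch, and no locking or waiting on the inode lock is modelled.

```c
#include <stddef.h>

#define TOY_HASH_SIZE 8

/* Toy inode: the number alone does not identify it, generation is needed too. */
struct toy_inode {
	unsigned long i_ino;
	unsigned int generation;	/* hypothetical extra identity field */
	int i_count;			/* reference count */
	struct toy_inode *next;		/* hash chain link */
};

static struct toy_inode *toy_hash[TOY_HASH_SIZE];

/* Add an inode to the chain selected by hashval. */
static void toy_insert(struct toy_inode *inode, unsigned long hashval)
{
	struct toy_inode **head = &toy_hash[hashval % TOY_HASH_SIZE];

	inode->next = *head;
	*head = inode;
}

/* Walk the chain and let test() decide; take a reference on a hit. */
static struct toy_inode *toy_ilookup5(unsigned long hashval,
				      int (*test)(struct toy_inode *, void *),
				      void *data)
{
	struct toy_inode *inode;

	for (inode = toy_hash[hashval % TOY_HASH_SIZE]; inode; inode = inode->next) {
		if (test(inode, data)) {
			inode->i_count++;
			return inode;
		}
	}
	return NULL;	/* not in the cache */
}

/* Caller-defined key passed through the opaque data pointer. */
struct toy_key {
	unsigned long ino;
	unsigned int generation;
};

static int toy_test(struct toy_inode *inode, void *data)
{
	struct toy_key *key = data;

	return inode->i_ino == key->ino && inode->generation == key->generation;
}
```

Two inodes that share a hash value and an inode number are told apart only by toy_test. This is the situation in which a file system would use ilookup5 rather than ilookup.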
Name
iget5_locked -- obtain an inode from a mounted file system

Synopsis
struct inode * iget5_locked (sb, hashval, test, set, data);
struct super_block * sb;
unsigned long hashval;
int (*test) (struct inode *, void *);
int (*set) (struct inode *, void *);
void * data;

Arguments
sb
    super block of file system
hashval
    hash value (usually inode number) to get
test
    callback used for comparisons between inodes
set
    callback used to initialize a new struct inode
data
    opaque data pointer to pass to test and set

Description
This is iget without the read_inode portion of get_new_inode.

iget5_locked uses ifind to search for the inode specified by hashval and data in the inode cache and if present it is returned with an increased reference count. This is a generalized version of iget_locked for file systems where the inode number is not sufficient for unique identification of an inode.

If the inode is not in cache, get_new_inode is called to allocate a new inode and this is returned locked, hashed, and with the I_NEW flag set. The file system gets to fill it in before unlocking it via unlock_new_inode.

Note both test and set are called with the inode_lock held, so can't sleep.

Name
iget_locked -- obtain an inode from a mounted file system

Synopsis
struct inode * iget_locked (sb, ino);
struct super_block * sb;
unsigned long ino;

Arguments
sb
    super block of file system
ino
    inode number to get

Description
This is iget without the read_inode portion of get_new_inode_fast.

iget_locked uses ifind_fast to search for the inode specified by ino in the inode cache and if present it is returned with an increased reference count. This is for file systems where the inode number is sufficient for unique identification of an inode.

If the inode is not in cache, get_new_inode_fast is called to allocate a new inode and this is returned locked, hashed, and with the I_NEW flag set.
The file system gets to fill it in before unlocking it via unlock_new_inode.

Name
__insert_inode_hash -- hash an inode

Synopsis
void __insert_inode_hash (inode, hashval);
struct inode * inode;
unsigned long hashval;

Arguments
inode
    unhashed inode
hashval
    unsigned long value used to locate this object in the inode_hashtable.

Description
Add an inode to the inode hash for this superblock.

Name
remove_inode_hash -- remove an inode from the hash

Synopsis
void remove_inode_hash (inode);
struct inode * inode;

Arguments
inode
    inode to unhash

Description
Remove an inode from the superblock.

Name
iput -- put an inode

Synopsis
void iput (inode);
struct inode * inode;

Arguments
inode
    inode to put

Description
Puts an inode, dropping its usage count. If the inode use count hits zero, the inode is then freed and may also be destroyed.

Consequently, iput can sleep.

Name
bmap -- find a block number in a file

Synopsis
sector_t bmap (inode, block);
struct inode * inode;
sector_t block;

Arguments
inode
    inode of file
block
    block to find

Description
Returns the block number on the device holding the inode that is the disk block number for the block of the file requested. That is, asked for block 4 of inode 1 the function will return the disk block relative to the disk start that holds that block of the file.

Name
touch_atime -- update the access time

Synopsis
void touch_atime (mnt, dentry);
struct vfsmount * mnt;
struct dentry * dentry;

Arguments
mnt
    mount the inode is accessed on
dentry
    dentry accessed

Description
Update the accessed time on an inode and mark it for writeback. This function automatically handles read only file systems and media, as well as the “noatime” flag and inode specific “noatime” markers.
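The mapping that bmap performs, from a block index within a file to a block number on the device, can be pictured with a toy lookup table in userspace C. This is a sketch under assumptions, not the kernel implementation: struct toy_inode_map, TOY_NBLOCKS and toy_bmap are hypothetical, and using 0 to mark a hole or an out-of-range block is a convention chosen for this model.

```c
#define TOY_NBLOCKS 6

/* Toy per-file table: index is the file block, value is the disk block. */
struct toy_inode_map {
	unsigned long blocks[TOY_NBLOCKS];	/* 0 marks a hole (model convention) */
};

/* Translate a file-relative block to a disk-relative block. */
static unsigned long toy_bmap(const struct toy_inode_map *map, unsigned long block)
{
	if (block >= TOY_NBLOCKS)
		return 0;	/* beyond the end of the toy file */
	return map->blocks[block];
}
```

With a table of { 100, 101, 0, 250, 251, 0 }, asking for block 4 of the file yields disk block 251, and asking for block 2 yields 0 because that part of the file is a hole in this model.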
Name
file_update_time -- update mtime and ctime time

Synopsis
void file_update_time (file);
struct file * file;

Arguments
file
    file accessed

Description
Update the mtime and ctime members of an inode and mark the inode for writeback. Note that this function is meant exclusively for usage in the file write path of filesystems, and filesystems may choose to explicitly ignore update via this function with the S_NOCTIME inode flag, e.g. for network filesystems where these timestamps are handled by the server.

Name
make_bad_inode -- mark an inode bad due to an I/O error

Synopsis
void make_bad_inode (inode);
struct inode * inode;

Arguments
inode
    Inode to mark bad

Description
When an inode cannot be read due to a media or remote network failure this function makes the inode “bad” and causes I/O operations on it to fail from this point on.

Name
is_bad_inode -- is an inode errored

Synopsis
int is_bad_inode (inode);
struct inode * inode;

Arguments
inode
    inode to test

Description
Returns true if the inode in question has been marked as bad.

Registration and Superblocks

Name
deactivate_super -- drop an active reference to superblock

Synopsis
void deactivate_super (s);
struct super_block * s;

Arguments
s
    superblock to deactivate

Description
Drops an active reference to superblock, acquiring a temporary one if there are no active references left. In that case we lock superblock, tell fs driver to shut it down and drop the temporary reference we had just acquired.

Name
generic_shutdown_super -- common helper for ->kill_sb

Synopsis
void generic_shutdown_super (sb);
struct super_block * sb;

Arguments
sb
    superblock to kill

Description
generic_shutdown_super does all fs-independent work on superblock shutdown.
Typical ->kill_sb should pick all fs-specific objects that need destruction out of superblock, call generic_shutdown_super and release aforementioned objects. Note: dentries and inodes _are_ taken care of and do not need specific handling.

Name
sget -- find or create a superblock

Synopsis
struct super_block * sget (type, test, set, data);
struct file_system_type * type;
int (*test) (struct super_block *,void *);
int (*set) (struct super_block *,void *);
void * data;

Arguments
type
    filesystem type superblock should belong to
test
    comparison callback
set
    setup callback
data
    argument to each of them

Name
get_super -- get the superblock of a device

Synopsis
struct super_block * get_super (bdev);
struct block_device * bdev;

Arguments
bdev
    device to get the superblock for

Description
Scans the superblock list and finds the superblock of the file system mounted on the device given. NULL is returned if no match is found.

File Locks

Name
posix_lock_file -- Apply a POSIX-style lock to a file

Synopsis
int posix_lock_file (filp, fl);
struct file * filp;
struct file_lock * fl;

Arguments
filp
    The file to apply the lock to
fl
    The lock to be applied

Description
Add a POSIX style lock to a file. We merge adjacent & overlapping locks whenever possible. POSIX locks are sorted by owner task, then by starting address.

Name
posix_lock_file_wait -- Apply a POSIX-style lock to a file

Synopsis
int posix_lock_file_wait (filp, fl);
struct file * filp;
struct file_lock * fl;

Arguments
filp
    The file to apply the lock to
fl
    The lock to be applied

Description
Add a POSIX style lock to a file. We merge adjacent & overlapping locks whenever possible.
POSIX locks are sorted by owner task, then by starting address.

Name
locks_mandatory_area -- Check for a conflicting lock

Synopsis
int locks_mandatory_area (read_write, inode, filp, offset, count);
int read_write;
struct inode * inode;
struct file * filp;
loff_t offset;
size_t count;

Arguments
read_write
    FLOCK_VERIFY_WRITE for exclusive access, FLOCK_VERIFY_READ for shared
inode
    the file to check
filp
    how the file was opened (if it was)
offset
    start of area to check
count
    length of area to check

Description
Searches the inode's list of locks to find any POSIX locks which conflict. This function is called from rw_verify_area and locks_verify_truncate.

Name
__break_lease -- revoke all outstanding leases on file

Synopsis
int __break_lease (inode, mode);
struct inode * inode;
unsigned int mode;

Arguments
inode
    the inode of the file to return
mode
    the open mode (read or write)

Description
break_lease (inlined for speed) has checked there already is a lease on this file. Leases are broken on a call to open or truncate. This function can sleep unless you specified O_NONBLOCK to your open.

Name
lease_get_mtime --

Synopsis
void lease_get_mtime (inode, time);
struct inode * inode;
struct timespec * time;

Arguments
inode
    the inode
time
    pointer to a timespec which will contain the last modified time

Description
This is to force NFS clients to flush their caches for files with exclusive leases. The justification is that if someone has an exclusive lease, then they could be modifying it.

Name
flock_lock_file_wait -- Apply a FLOCK-style lock to a file

Synopsis
int flock_lock_file_wait (filp, fl);
struct file * filp;
struct file_lock * fl;

Arguments
filp
    The file to apply the lock to
fl
    The lock to be applied

Description
Add a FLOCK style lock to a file.
Name
posix_block_lock -- blocks waiting for a file lock

Synopsis
void posix_block_lock (blocker, waiter);
struct file_lock * blocker;
struct file_lock * waiter;

Arguments
blocker
    the lock which is blocking
waiter
    the lock which conflicts and has to wait

Description
lockd needs to block waiting for locks.

Name
posix_unblock_lock -- stop waiting for a file lock

Synopsis
int posix_unblock_lock (filp, waiter);
struct file * filp;
struct file_lock * waiter;

Arguments
filp
    how the file was opened
waiter
    the lock which was waiting

Description
lockd needs to block waiting for locks.

Name
lock_may_read -- checks that the region is free of locks

Synopsis
int lock_may_read (inode, start, len);
struct inode * inode;
loff_t start;
unsigned long len;

Arguments
inode
    the inode that is being read
start
    the first byte to read
len
    the number of bytes to read

Description
Emulates Windows locking requirements. Whole-file mandatory locks (share modes) can prohibit a read and byte-range POSIX locks can prohibit a read if they overlap.

N.B. this function is only ever called from knfsd and ownership of locks is never checked.

Name
lock_may_write -- checks that the region is free of locks

Synopsis
int lock_may_write (inode, start, len);
struct inode * inode;
loff_t start;
unsigned long len;

Arguments
inode
    the inode that is being written
start
    the first byte to write
len
    the number of bytes to write

Description
Emulates Windows locking requirements. Whole-file mandatory locks (share modes) can prohibit a write and byte-range POSIX locks can prohibit a write if they overlap.

N.B. this function is only ever called from knfsd and ownership of locks is never checked.
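The byte-range part of lock_may_read and lock_may_write comes down to an interval overlap test. The following user-space sketch shows one way such a test could look. It is an illustration under stated assumptions, not the kernel code: struct toy_lock, toy_region_conflicts and toy_region_is_free are hypothetical names, lock ranges are taken to be inclusive, and whole-file share modes are not modelled. As in the description above, lock ownership is never checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inclusive lock range; not the kernel's struct file_lock. */
struct toy_lock {
    long long start;
    long long end;
};

/*
 * Return true when the request covering bytes [start, start + len - 1]
 * shares at least one byte with the given lock.
 */
bool toy_region_conflicts(const struct toy_lock *lk, long long start,
                          unsigned long len)
{
    assert(len > 0);
    long long last = start + (long long)len - 1;

    return lk->start <= last && start <= lk->end;
}

/* Return true when no lock in the array overlaps the requested region. */
bool toy_region_is_free(const struct toy_lock *locks, int nlocks,
                        long long start, unsigned long len)
{
    for (int i = 0; i < nlocks; i++) {
        if (toy_region_conflicts(&locks[i], start, len))
            return false;
    }
    return true;
}
```

For locks covering bytes 100..199 and 400..499, a request for bytes 0..99 is free, while a request for bytes 0..100 conflicts with the first lock.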
Name
locks_mandatory_locked -- Check for an active lock

Synopsis
int locks_mandatory_locked (inode);
struct inode * inode;

Arguments
inode
    the file to check

Description
Searches the inode's list of locks to find any POSIX locks which conflict. This function is called from locks_verify_locked only.

Name
fcntl_getlease -- Enquire what lease is currently active

Synopsis
int fcntl_getlease (filp);
struct file * filp;

Arguments
filp
    the file

Description
The value returned by this function will be one of

(if no lease break is pending):
F_RDLCK to indicate a shared lease is held.
F_WRLCK to indicate an exclusive lease is held.
F_UNLCK to indicate no lease is held.

(if a lease break is pending):
F_RDLCK to indicate an exclusive lease needs to be changed to a shared lease (or removed).
F_UNLCK to indicate the lease needs to be removed.

XXX
sfr & willy disagree over whether F_INPROGRESS should be returned to userspace.

Name
__setlease -- sets a lease on an open file

Synopsis
int __setlease (filp, arg, flp);
struct file * filp;
long arg;
struct file_lock ** flp;

Arguments
filp
    file pointer
arg
    type of lease to obtain
flp
    input - file_lock to use, output - file_lock inserted

Description
The (input) flp->fl_lmops->fl_break function is required by break_lease.

Called with kernel lock held.

Name
fcntl_setlease -- sets a lease on an open file

Synopsis
int fcntl_setlease (fd, filp, arg);
unsigned int fd;
struct file * filp;
long arg;

Arguments
fd
    open file descriptor
filp
    file pointer
arg
    type of lease to obtain

Description
Call this fcntl to establish a lease on the file. Note that you also need to call F_SETSIG to receive a signal when the lease is broken.

Name
sys_flock -- flock system call.

Synopsis
long sys_flock (fd, cmd);
unsigned int fd;
unsigned int cmd;

Arguments
fd
    the file descriptor to lock.
cmd
    the type of lock to apply.
Description
Apply a FL_FLOCK style lock to an open file descriptor. The cmd can be one of

LOCK_SH -- a shared lock.
LOCK_EX -- an exclusive lock.
LOCK_UN -- remove an existing lock.
LOCK_MAND -- a `mandatory' flock. This exists to emulate Windows Share Modes.

LOCK_MAND can be combined with LOCK_READ or LOCK_WRITE to allow other processes read and write access respectively.

Name
get_locks_status -- reports lock usage in /proc/locks

Synopsis
int get_locks_status (buffer, start, offset, length);
char * buffer;
char ** start;
off_t offset;
int length;

Arguments
buffer
    address in userspace to write into
start
    ?
offset
    how far we are through the buffer
length
    how much to read

Other Functions

Name
mpage_readpages -- populate an address space with some pages, and

Synopsis
int mpage_readpages (mapping, pages, nr_pages, get_block);
struct address_space * mapping;
struct list_head * pages;
unsigned nr_pages;
get_block_t get_block;

Arguments
mapping
    the address_space
pages
    The address of a list_head which contains the target pages. These pages have their ->index populated and are otherwise uninitialised.
nr_pages
    The number of pages at *pages
get_block
    The filesystem's block mapper function.

Description
This function walks the pages and the blocks within each page, building and emitting large BIOs.

If anything unusual happens, such as:

- encountering a page which has buffers
- encountering a page which has a non-hole after a hole
- encountering a page with non-contiguous blocks

then this code just gives up and calls the buffer_head-based read function. It does handle a page which has holes at the end - that is a common case: the end-of-file on blocksize < PAGE_CACHE_SIZE setups.

BH_Boundary explanation
There is a problem. The mpage read code assembles several pages, gets all their disk mappings, and then submits them all. That's fine, but obtaining the disk mappings may require I/O. Reads of indirect blocks, for example.

So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be submitted in the following order:

    12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16

because the indirect block has to be read to get the mappings of blocks 13,14,15,16. Obviously, this impacts performance.

So what we do is to allow the filesystem's get_block function to set BH_Boundary when it maps block 11. BH_Boundary says: mapping of the block after this one will require I/O against a block which is probably close to this one. So you should push what I/O you have currently accumulated.

This all causes the disk requests to be issued in the correct order.
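The ordering problem described under BH_Boundary can be reproduced with a small user-space model. The sketch below is a toy simulation, not kernel code. Every name in it (toy_io_log, toy_submit, toy_mpage_read, the TOY_ constants) is hypothetical. It assumes the disk layout implied by the example above: twelve direct blocks at disk blocks 0 to 11, the indirect block at disk block 12, and the following data blocks from disk block 13 onwards. It also assumes the indirect block is read as soon as the first mapping needs it.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_DIRECT 12   /* file blocks mapped without the indirect block */
#define TOY_MAX_IO    32   /* capacity of the submission log */

/* Records disk block numbers in the order their I/O is submitted. */
struct toy_io_log {
    int order[TOY_MAX_IO];
    int count;
};

static void toy_submit(struct toy_io_log *log, int disk_block)
{
    assert(log->count < TOY_MAX_IO);
    log->order[log->count++] = disk_block;
}

/*
 * Simulate an mpage-style read of the first nr_blocks file blocks.
 * Mapped blocks are accumulated and submitted together at the end.
 * With use_boundary set, the accumulated I/O is pushed right after the
 * last direct block is mapped, which is the BH_Boundary hint.
 * Returns the number of I/Os recorded in the log.
 */
int toy_mpage_read(struct toy_io_log *log, int nr_blocks, bool use_boundary)
{
    int pending[TOY_MAX_IO];
    int nr_pending = 0;
    bool indirect_read = false;

    assert(nr_blocks >= 0 && nr_blocks < TOY_MAX_IO);
    log->count = 0;

    for (int fb = 0; fb < nr_blocks; fb++) {
        int disk;

        if (fb < TOY_NR_DIRECT) {
            disk = fb;
        } else {
            if (!indirect_read) {
                /* Mapping needs the indirect block: read it right now. */
                toy_submit(log, TOY_NR_DIRECT);
                indirect_read = true;
            }
            disk = fb + 1;   /* data blocks sit after the indirect block */
        }
        pending[nr_pending++] = disk;

        if (use_boundary && fb == TOY_NR_DIRECT - 1) {
            /* Boundary hint: push what has been accumulated so far. */
            for (int i = 0; i < nr_pending; i++)
                toy_submit(log, pending[i]);
            nr_pending = 0;
        }
    }

    for (int i = 0; i < nr_pending; i++)
        toy_submit(log, pending[i]);
    return log->count;
}
```

Reading 16 blocks without the hint logs disk block 12 first, followed by 0 to 11 and 13 to 16. With the hint the log runs from 0 to 16 in ascending order.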
Name
mpage_writepages -- walk the list of dirty pages of the given

Synopsis
int mpage_writepages (mapping, wbc, get_block);
struct address_space * mapping;
struct writeback_control * wbc;
get_block_t get_block;

Arguments
mapping
    address space structure to write
wbc
    subtract the number of written pages from *wbc->nr_to_write
get_block
    the filesystem's block mapper function. If this is NULL then use a_ops->writepage. Otherwise, go direct-to-BIO.

Description
This is a library function, which implements the writepages address_space_operation.

If a page is already under I/O, generic_writepages skips it, even if it's dirty. This is desirable behaviour for memory-cleaning writeback, but it is INCORRECT for data-integrity system calls such as fsync. fsync and msync need to guarantee that all the data which was dirty at the time the call was made get new I/O started against them. If wbc->sync_mode is WB_SYNC_ALL then we were called for data integrity and we must wait for existing IO to complete.
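The difference between memory-cleaning writeback and data-integrity writeback described above can be reduced to a per-page decision. The sketch below is a user-space illustration of that decision only. The types toy_page, toy_sync_mode and toy_action and the function toy_writeback_action are hypothetical, and the two mode values merely stand in for "not data integrity" and the WB_SYNC_ALL case named in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the writeback modes discussed above. */
enum toy_sync_mode { TOY_SYNC_NONE, TOY_SYNC_ALL };

/* What the writeback loop would do with one page. */
enum toy_action { TOY_SKIP, TOY_WRITE, TOY_WAIT_THEN_WRITE };

/* Hypothetical page state; not the kernel's struct page. */
struct toy_page {
    bool dirty;
    bool under_io;
};

enum toy_action toy_writeback_action(const struct toy_page *page,
                                     enum toy_sync_mode mode)
{
    assert(page);

    if (!page->dirty)
        return TOY_SKIP;             /* nothing to write */

    if (page->under_io) {
        /* Memory cleaning may skip the page; data integrity may not. */
        return mode == TOY_SYNC_ALL ? TOY_WAIT_THEN_WRITE : TOY_SKIP;
    }

    return TOY_WRITE;
}
```

A dirty page that is already under I/O is skipped in the memory-cleaning mode, but in the data-integrity mode the caller waits for the existing I/O and then starts new I/O against the page.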
Name
generic_permission -- check for access rights on a Posix-like filesystem

Synopsis
int generic_permission (inode, mask, check_acl);
struct inode * inode;
int mask;
int (*check_acl) (struct inode *inode, int mask);

Arguments
inode
    inode to check access rights for
mask
    right to check for (MAY_READ, MAY_WRITE, MAY_EXEC)
check_acl
    optional callback to check for Posix ACLs

Description
Used to check for read/write/execute permissions on a file. We use “fsuid” for this, letting us set arbitrary permissions for filesystem access without changing the “normal” uids which are used for other things.

Name
vfs_permission -- check for access rights to a given path

Synopsis
int vfs_permission (nd, mask);
struct nameidata * nd;
int mask;

Arguments
nd
    lookup result that describes the path
mask
    right to check for (MAY_READ, MAY_WRITE, MAY_EXEC)

Description
Used to check for read/write/execute permissions on a path. We use “fsuid” for this, letting us set arbitrary permissions for filesystem access without changing the “normal” uids which are used for other things.

Name
file_permission -- check for additional access rights to a given file

Synopsis
int file_permission (file, mask);
struct file * file;
int mask;

Arguments
file
    file to check access rights for
mask
    right to check for (MAY_READ, MAY_WRITE, MAY_EXEC)

Description
Used to check for read/write/execute permissions on an already opened file.

Note
Do not use this function in new code. All access checks should be done using vfs_permission.

Name
lookup_create -- lookup a dentry, creating it if it doesn't exist

Synopsis
struct dentry * lookup_create (nd, is_dir);
struct nameidata * nd;
int is_dir;

Arguments
nd
    nameidata info
is_dir
    directory flag

Description
Simple function to lookup and return a dentry and create it if it doesn't exist. Is SMP-safe.

Returns with nd->dentry->d_inode->i_mutex locked.
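The permission entries above check a read/write/execute mask against the file mode using the caller's filesystem uid. The user-space sketch below illustrates the classic owner/group/other selection under simplifying assumptions. It is not the kernel function: toy_inode, toy_cred and toy_permission are hypothetical names, the TOY_MAY_ values simply mirror the x/w/r mode bits, only a single group id is compared, and the ACL callback and any capability-based overrides are left out.

```c
#include <assert.h>

/* Hypothetical mask bits, laid out like the x/w/r bits of a mode. */
#define TOY_MAY_EXEC  1
#define TOY_MAY_WRITE 2
#define TOY_MAY_READ  4

/* Hypothetical, much reduced stand-ins for an inode and caller identity. */
struct toy_inode {
    unsigned int mode;   /* permission bits, e.g. 0640 */
    unsigned int uid;
    unsigned int gid;
};

struct toy_cred {
    unsigned int fsuid;
    unsigned int fsgid;
};

/* Return 0 when every right in mask is granted, -1 otherwise. */
int toy_permission(const struct toy_inode *inode,
                   const struct toy_cred *cred, int mask)
{
    unsigned int mode = inode->mode;

    assert((mask & ~7) == 0);

    if (cred->fsuid == inode->uid)
        mode >>= 6;              /* use the owner bits */
    else if (cred->fsgid == inode->gid)
        mode >>= 3;              /* use the group bits */
    /* otherwise the low three "other" bits apply as they are */

    return ((unsigned int)mask & ~mode & 7u) == 0 ? 0 : -1;
}
```

For a file with mode 0640, the owner may read and write but not execute, a group member may only read, and everyone else is refused.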
Name
freeze_bdev -- lock a filesystem and force it into a consistent state

Synopsis
struct super_block * freeze_bdev (bdev);
struct block_device * bdev;

Arguments
bdev
    blockdevice to lock

Description
This takes the block device bd_mount_sem to make sure no new mounts happen on bdev until thaw_bdev is called. If a superblock is found on this device, we take the s_umount semaphore on it to make sure nobody unmounts until the snapshot creation is done.

Name
thaw_bdev -- unlock filesystem

Synopsis
void thaw_bdev (bdev, sb);
struct block_device * bdev;
struct super_block * sb;

Arguments
bdev
    blockdevice to unlock
sb
    associated superblock

Description
Unlocks the filesystem and marks it writeable again after freeze_bdev.

Name
sync_mapping_buffers -- write out and wait upon a mapping's “associated”

Synopsis
int sync_mapping_buffers (mapping);
struct address_space * mapping;

Arguments
mapping
    the mapping which wants those buffers written

Description
Starts I/O against the buffers at mapping->private_list, and waits upon that I/O.

Basically, this is a convenience function for fsync. mapping is a file or directory which needs those buffers to be written for a successful fsync.

Name
mark_buffer_dirty -- mark a buffer_head as needing writeout

Synopsis
void fastcall mark_buffer_dirty (bh);
struct buffer_head * bh;

Arguments
bh
    the buffer_head to mark dirty

Description
mark_buffer_dirty will set the dirty bit against the buffer, then set its backing page dirty, then tag the page as dirty in its address_space's radix tree and then attach the address_space's inode to its superblock's dirty inode list.

mark_buffer_dirty is atomic.
It takes bh->b_page->mapping->private_lock, mapping->tree_lock and the global inode_lock.

Name
__bread -- reads a specified block and returns the bh

Synopsis
struct buffer_head * __bread (bdev, block, size);
struct block_device * bdev;
sector_t block;
int size;

Arguments
bdev
    the block_device to read from
block
    number of block
size
    size (in bytes) to read

Description
Reads a specified block, and returns buffer head that contains it. It returns NULL if the block was unreadable.

Name
try_to_release_page -- release old fs-specific metadata on a page

Synopsis
int try_to_release_page (page, gfp_mask);
struct page * page;
gfp_t gfp_mask;

Arguments
page
    the page which the kernel is trying to free
gfp_mask
    memory allocation flags (and I/O mode)

Description
The address_space is to try to release any data against the page (presumably at page->private). If the release was successful, return `1'. Otherwise return zero.

The gfp_mask argument specifies whether I/O may be performed to release this page (__GFP_IO), and whether the call may block (__GFP_WAIT).

NOTE
gfp_mask may go away, and this function may become non-blocking.

Name
block_invalidatepage -- invalidate part or all of a buffer-backed page

Synopsis
int block_invalidatepage (page, offset);
struct page * page;
unsigned long offset;

Arguments
page
    the page which is affected
offset
    the index of the truncation point

Description
block_invalidatepage is called when all or part of the page has become invalidated by a truncate operation.

block_invalidatepage does not have to release all buffers, but it must ensure that no dirty buffer is left outside offset and that no I/O is underway against any of the blocks which are outside the truncation point, because the caller is about to free (and possibly reuse) those blocks on-disk.

Name
ll_rw_block -- low-level access to block devices (DEPRECATED)

Synopsis
void ll_rw_block (rw, nr, bhs[]);
int rw;
int nr;
struct buffer_head * bhs[];

Arguments
rw
    whether to READ or WRITE or SWRITE or maybe READA (readahead)
nr
    number of struct buffer_heads in the array
bhs[]
    array of pointers to struct buffer_head

Description
ll_rw_block takes an array of pointers to struct buffer_heads, and requests an I/O operation on them, either a READ or a WRITE. The third option, SWRITE, is like WRITE except that we make sure that the *current* data in the buffers is sent to disk. The fourth option, READA, is described in the documentation for generic_make_request, which ll_rw_block calls.

This function drops any buffer that it cannot get a lock on (with the BH_Lock state bit) unless SWRITE is required, any buffer that appears to be clean when doing a write request, and any buffer that appears to be up-to-date when doing a read request. Further it marks as clean buffers that are processed for writing (the buffer cache won't assume that they are actually clean until the buffer gets unlocked).

ll_rw_block sets b_end_io to a simple completion handler that marks the buffer up-to-date (if appropriate), unlocks the buffer and wakes any waiters.
All of the buffers must be for the same device, and must also be a multiple of the current approved size for the device.

Name
bio_alloc_bioset -- allocate a bio for I/O

Synopsis
struct bio * bio_alloc_bioset (gfp_mask, nr_iovecs, bs);
gfp_t gfp_mask;
int nr_iovecs;
struct bio_set * bs;

Arguments
gfp_mask
    the GFP_ mask given to the slab allocator
nr_iovecs
    number of iovecs to pre-allocate
bs
    the bio_set to allocate from

Description
bio_alloc_bioset will first try its own mempool to satisfy the allocation. If __GFP_WAIT is set then we will block on the internal pool waiting for a struct bio to become free.

Allocates the bio and iovecs from the memory pools specified by the bio_set structure.

Name
bio_put -- release a reference to a bio

Synopsis
void bio_put (bio);
struct bio * bio;

Arguments
bio
    bio to release reference to

Description
Put a reference to a struct bio, either one you have gotten with bio_alloc or bio_get. The last put of a bio will free it.

Name
__bio_clone -- clone a bio

Synopsis
void __bio_clone (bio, bio_src);
struct bio * bio;
struct bio * bio_src;

Arguments
bio
    destination bio
bio_src
    bio to clone

Description
Clone a bio. Caller will own the returned bio, but not the actual data it points to. Reference count of returned bio will be one.

Name
bio_clone -- clone a bio

Synopsis
struct bio * bio_clone (bio, gfp_mask);
struct bio * bio;
gfp_t gfp_mask;

Arguments
bio
    bio to clone
gfp_mask
    allocation priority

Description
Like __bio_clone, only also allocates the returned bio.

Name
bio_get_nr_vecs -- return approx number of vecs

Synopsis
int bio_get_nr_vecs (bdev);
struct block_device * bdev;

Arguments
bdev
    I/O target

Description
Return the approximate number of pages we can send to this target. There's no guarantee that you will be able to fit this number of pages into a bio; it does not account for dynamic restrictions that vary on offset.
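The bio_put entry above relies on reference counting: each holder drops its reference, and the last put frees the object. The user-space sketch below shows that pattern in isolation. It is not the bio implementation: struct toy_bio and the toy_bio_alloc, toy_bio_get and toy_bio_put helpers are hypothetical, the counter is a plain int rather than an atomic, and the freed_flag pointer exists only so a caller can observe when the object is released.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not the kernel's struct bio. */
struct toy_bio {
    int refcount;
    int *freed_flag;   /* set to 1 when the object is freed (may be NULL) */
};

/* Allocate an object holding one reference; returns NULL on failure. */
struct toy_bio *toy_bio_alloc(int *freed_flag)
{
    struct toy_bio *bio = malloc(sizeof(*bio));

    if (!bio)
        return NULL;
    bio->refcount = 1;
    bio->freed_flag = freed_flag;
    return bio;
}

/* Take an additional reference. */
void toy_bio_get(struct toy_bio *bio)
{
    assert(bio->refcount > 0);
    bio->refcount++;
}

/* Drop a reference; the last put frees the object. */
void toy_bio_put(struct toy_bio *bio)
{
    assert(bio->refcount > 0);
    if (--bio->refcount == 0) {
        if (bio->freed_flag)
            *bio->freed_flag = 1;
        free(bio);
    }
}
```

After one toy_bio_alloc and one toy_bio_get, the first toy_bio_put leaves the object alive and the second one frees it.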
Name
bio_add_pc_page -- attempt to add page to bio

Synopsis
int bio_add_pc_page (q, bio, page, len, offset);
request_queue_t * q;
struct bio * bio;
struct page * page;
unsigned int len;
unsigned int offset;

Arguments
q
    the target queue
bio
    destination bio
page
    page to add
len
    vec entry length
offset
    vec entry offset

Description
Attempt to add a page to the bio_vec maplist. This can fail for a number of reasons, such as the bio being full or target block device limitations. The target block device must allow bios smaller than PAGE_SIZE, so it is always possible to add a single page to an empty bio. This should only be used by REQ_PC bios.

Name
bio_add_page -- attempt to add page to bio

Synopsis
int bio_add_page (bio, page, len, offset);
struct bio * bio;
struct page * page;
unsigned int len;
unsigned int offset;

Arguments
bio
    destination bio
page
    page to add
len
    vec entry length
offset
    vec entry offset

Description
Attempt to add a page to the bio_vec maplist. This can fail for a number of reasons, such as the bio being full or target block device limitations. The target block device must allow bios smaller than PAGE_SIZE, so it is always possible to add a single page to an empty bio.

Name
bio_uncopy_user -- finish previously mapped bio

Synopsis
int bio_uncopy_user (bio);
struct bio * bio;

Arguments
bio
    bio being terminated

Description
Free pages allocated from bio_copy_user and write back data to user space in case of a read.

Name
bio_copy_user -- copy user data to bio

Synopsis
struct bio * bio_copy_user (q, uaddr, len, write_to_vm);
request_queue_t * q;
unsigned long uaddr;
unsigned int len;
int write_to_vm;

Arguments
q
    destination block queue
uaddr
    start of user address
len
    length in bytes
write_to_vm
    bool indicating writing to pages or not

Description
Prepares and returns a bio for indirect user io, bouncing data to/from kernel pages as necessary.
Must be paired with a call to bio_uncopy_user on io completion.

Name
bio_map_user -- map user address into bio

Synopsis
struct bio * bio_map_user (q, bdev, uaddr, len, write_to_vm);
request_queue_t * q;
struct block_device * bdev;
unsigned long uaddr;
unsigned int len;
int write_to_vm;

Arguments
q
    the request_queue_t for the bio
bdev
    destination block device
uaddr
    start of user address
len
    length in bytes
write_to_vm
    bool indicating writing to pages or not

Description
Map the user space address into a bio suitable for io to a block device. Returns an error pointer in case of error.

Name
bio_unmap_user -- unmap a bio

Synopsis
void bio_unmap_user (bio);
struct bio * bio;

Arguments
bio
    the bio being unmapped

Description
Unmap a bio previously mapped by bio_map_user. Must be called with a process context.

bio_unmap_user may sleep.

Name
bio_map_kern -- map kernel address into bio

Synopsis
struct bio * bio_map_kern (q, data, len, gfp_mask);
request_queue_t * q;
void * data;
unsigned int len;
gfp_t gfp_mask;

Arguments
q
    the request_queue_t for the bio
data
    pointer to buffer to map
len
    length in bytes
gfp_mask
    allocation flags for bio allocation

Description
Map the kernel address into a bio suitable for io to a block device. Returns an error pointer in case of error.

Name
bio_endio -- end I/O on a bio

Synopsis
void bio_endio (bio, bytes_done, error);
struct bio * bio;
unsigned int bytes_done;
int error;

Arguments
bio
    bio
bytes_done
    number of bytes completed