2007-12-25  

This file implements several system call interfaces: sys_dup, sys_dup2, sys_fcntl, and fasync (asynchronous notification).

1. sys_dup sys_dup2

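These calls clone one fd into another. The implementation is unremarkable: it just manipulates the fd array reached through task_struct's struct files_struct *files, and the index into that array is the well-known fd.
Below is a brief excerpt from man dup.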
dup() and dup2() create a copy of the file descriptor oldfd.

After a successful return from dup() or dup2(), the old and new file descriptors may be
used interchangeably. They refer to the same open file description (see open(2)) and
thus share file offset and file status flags; for example, if the file offset is modified
by using lseek(2) on one of the descriptors, the offset is also changed for the
other.

The two descriptors do not share file descriptor flags (the close-on-exec flag).
 The close-on-exec flag (FD_CLOEXEC; see fcntl(2)) for the duplicate descriptor is off.

dup() uses the lowest-numbered unused descriptor for the new descriptor.

dup2() makes newfd be the copy of oldfd, closing newfd first if necessary.
After dup, the close_on_exec flag is cleared on the new descriptor.
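The behaviour quoted from the man page can be checked from user space. The following is a minimal sketch, not taken from the kernel source: the function name check_dup_semantics and the scratch file under /tmp are assumptions made for illustration. It returns 0 when every check passes and a negative step number otherwise (error paths do not bother to close descriptors).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if dup()/dup2() behave as the man page
 * excerpt describes, or a negative number naming the first failed step. */
int check_dup_semantics(void)
{
    char path[] = "/tmp/dup_demo_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* file lives until the last close */

    /* Mark the original close-on-exec so the copy can be seen to lose it. */
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0)
        return -2;

    int copy = dup(fd);                     /* lowest unused descriptor */
    if (copy < 0)
        return -3;

    /* Same open file description: writing through fd moves copy's offset. */
    if (write(fd, "hello", 5) != 5)
        return -4;
    if (lseek(copy, 0, SEEK_CUR) != 5)
        return -5;

    /* Descriptor flags are per fd: the duplicate has FD_CLOEXEC off. */
    if ((fcntl(fd, F_GETFD) & FD_CLOEXEC) == 0)
        return -6;
    if ((fcntl(copy, F_GETFD) & FD_CLOEXEC) != 0)
        return -7;

    /* dup2() onto an fd that is already open closes that fd first. */
    int target = open("/dev/null", O_RDONLY);
    if (target < 0)
        return -8;
    if (dup2(fd, target) != target)
        return -9;
    if (lseek(target, 0, SEEK_CUR) != 5)    /* target now refers to the scratch file */
        return -10;

    close(target);
    close(copy);
    close(fd);
    return 0;
}
```

A caller can simply write assert(check_dup_semantics() == 0); inside main.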
Worth mentioning is task_struct->files: the fd table grows dynamically, which adds some complexity.
struct files_struct {
atomic_t count;
rwlock_t file_lock;
int max_fds;
int max_fdset;
int next_fd;
struct file ** fd; /* current fd array, initially points to fd_array */
fd_set *close_on_exec;
fd_set *open_fds; /*bitmap for open fd_array*/
fd_set close_on_exec_init;
fd_set open_fds_init;
struct file * fd_array[NR_OPEN_DEFAULT]; /*inline fd array*/
};
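sys_dup looks for a free fd, while sys_dup2 uses the fd it is given (the file previously open on that fd is released).
I will not walk through the code that grows the fd table. Still, the races involved and the precise meaning of each field of files_struct are worth studying... (skipping them is probably no great loss...)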
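To make the "find a free fd, grow the table when it is full" idea concrete, here is a simplified user-space model. It is only a sketch under stated assumptions: every toy_* name is hypothetical, one byte per slot stands in for the open_fds bitmap, and there is no locking at all, so none of the races mentioned above are modelled. The only fields borrowed from files_struct are max_fds and next_fd.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an fd table; this is not kernel code. */
struct toy_files {
    int max_fds;            /* current capacity of the table */
    int next_fd;            /* hint: no free slot exists below this index */
    unsigned char *open;    /* one byte per fd, standing in for a bitmap */
};

int toy_init(struct toy_files *f, int initial)
{
    if (initial <= 0)
        return -1;
    f->max_fds = initial;
    f->next_fd = 0;
    f->open = calloc((size_t)initial, 1);
    return f->open ? 0 : -1;
}

/* Double the capacity and mark the new slots as free. */
static int toy_expand(struct toy_files *f)
{
    int new_max = f->max_fds * 2;
    unsigned char *p = realloc(f->open, (size_t)new_max);
    if (!p)
        return -1;
    memset(p + f->max_fds, 0, (size_t)(new_max - f->max_fds));
    f->open = p;
    f->max_fds = new_max;
    return 0;
}

/* Allocate the lowest free fd that is >= start, growing the table if needed. */
int toy_get_unused_fd(struct toy_files *f, int start)
{
    int fd = start > f->next_fd ? start : f->next_fd;

    for (;;) {
        while (fd < f->max_fds && f->open[fd])
            fd++;
        if (fd < f->max_fds)
            break;                  /* found a free slot */
        if (toy_expand(f) < 0)
            return -1;              /* out of memory */
    }
    f->open[fd] = 1;
    if (start <= f->next_fd)
        f->next_fd = fd + 1;        /* everything below fd + 1 is now in use */
    return fd;
}

/* Release an fd and pull the hint back if a lower slot became free. */
void toy_put_unused_fd(struct toy_files *f, int fd)
{
    f->open[fd] = 0;
    if (fd < f->next_fd)
        f->next_fd = fd;
}
```

In this model a dup-style allocation is toy_get_unused_fd(&f, 0), while passing a non-zero start mimics "first free fd at or above a given number".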


2. fcntl ioctl sysctl
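Only fcntl is covered here. sysctl controls parameters of the system itself, and ioctl is mostly used to control device drivers. These ctl functions are offered to user space so that
it can tune various parts of the kernel. Their implementation is essentially a dispatch.
The most direct approach is to read the man page and then skim the source until the flow is clear. The individual features are numerous and intricate, and each belongs to a different area.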

fcntl overview:

  • Duplicating a file descriptor
  • File descriptor flags (close on exec)
       F_GETFD
Read the file descriptor flags.
F_SETFD
Set the file descriptor flags to the value specified by arg.
  • File status flags
       F_GETFL
Read the file status flags.
F_SETFL
Set the file status flags to the value specified by arg. File access mode
(O_RDONLY, O_WRONLY, O_RDWR) and file creation flags (i.e., O_CREAT, O_EXCL,
O_NOCTTY, O_TRUNC) in arg are ignored. On Linux this command can only change
the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags.
  • Advisory locking
       F_GETLK,  F_SETLK and F_SETLKW are used to acquire, release, and test for the existence
of record locks (also known as file-segment or file-region locks). The third argument
lock is a pointer to a structure that has at least the following fields (in unspecified
order).

struct flock {
...
short l_type; /* Type of lock: F_RDLCK,
F_WRLCK, F_UNLCK */
short l_whence; /* How to interpret l_start:
SEEK_SET, SEEK_CUR, SEEK_END */
off_t l_start; /* Starting offset for lock */
off_t l_len; /* Number of bytes to lock */
pid_t l_pid; /* PID of process blocking our lock
(F_GETLK only) */
...
};
  • Mandatory locking
(Non-POSIX.) The above record locks may be either advisory or mandatory, and are
advisory by default.

Advisory locks are not enforced and are useful only between cooperating processes.

Mandatory locks are enforced for all processes. If a process tries to perform an
incompatible access (e.g., read(2) or write(2)) on a file region that has an incompatible
mandatory lock, then the result depends upon whether the O_NONBLOCK flag is enabled
for its open file description. If the O_NONBLOCK flag is not enabled, then system call
is blocked until the lock is removed or converted to a mode that is compatible with the
access. If the O_NONBLOCK flag is enabled, then the system call fails with the error
EAGAIN or EWOULDBLOCK.

To make use of mandatory locks, mandatory locking must be enabled both on the file system
that contains the file to be locked, and on the file itself. Mandatory locking is
enabled on a file system using the "-o mand" option to mount(8), or the MS_MANDLOCK
flag for mount(2). Mandatory locking is enabled on a file by disabling group execute
permission on the file and enabling the set-group-ID permission bit (see chmod(1) and
chmod(2)).

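Speaking of file locking, Linux has two system calls for it:
flock and fcntl (lockf is just a wrapper around fcntl). fcntl locks are
advisory by default. To get a mandatory lock, the file system has to be
mounted in the special way described above and the file's permissions changed to:
disabling group execute + enabling the set-group-ID. A lock taken with fcntl
after that is a mandatory lock.
flock only allows locking a whole file and is BSD style. fcntl locks are the POSIX locks. Inside Linux both are implemented
with struct file_lock.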

struct file_lock {
struct file_lock *fl_next; /* singly linked list for this inode */
struct list_head fl_link; /* doubly linked list of all locks */
struct list_head fl_block; /* circular list of blocked processes */
fl_owner_t fl_owner;
unsigned int fl_pid;
wait_queue_head_t fl_wait;
struct file *fl_file;
unsigned char fl_flags;
unsigned char fl_type;
loff_t fl_start;
loff_t fl_end;

void (*fl_notify)(struct file_lock *); /* unblock callback */
void (*fl_insert)(struct file_lock *); /* lock insertion callback */
void (*fl_remove)(struct file_lock *); /* lock removal callback */

struct fasync_struct * fl_fasync; /* for lease break notifications */

union {
struct nfs_lock_info nfs_fl;
} fl_u;
};
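The advisory record locks described above can be observed from user space. The sketch below is an illustration only: the function name check_record_lock and the scratch file under /tmp are assumptions. The parent takes a write lock on bytes 0..9 with F_SETLK, and a forked child (fcntl locks are owned per process and are not inherited across fork) probes the region with F_GETLK.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a child process sees the parent's
 * write lock through F_GETLK, or a negative number on failure. */
int check_record_lock(void)
{
    char path[] = "/tmp/lock_demo_XXXXXX";  /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 10;                          /* lock bytes 0..9 */
    if (fcntl(fd, F_SETLK, &fl) < 0)
        return -2;

    pid_t parent = getpid();
    pid_t pid = fork();
    if (pid < 0)
        return -3;

    if (pid == 0) {
        /* Child: ask whether a write lock on byte 5 could be placed. */
        struct flock probe = {0};
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        probe.l_start = 5;
        probe.l_len = 1;
        if (fcntl(fd, F_GETLK, &probe) < 0)
            _exit(1);
        /* F_GETLK overwrote probe with the conflicting lock. */
        if (probe.l_type != F_WRLCK || probe.l_pid != parent)
            _exit(2);

        /* Byte 10 lies outside the locked range, so nothing conflicts. */
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        probe.l_start = 10;
        probe.l_len = 1;
        if (fcntl(fd, F_GETLK, &probe) < 0)
            _exit(3);
        if (probe.l_type != F_UNLCK)
            _exit(4);
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -4;

    fl.l_type = F_UNLCK;                    /* release the lock */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -5;
}
```

Because the lock is advisory, nothing here stops the child from reading or writing bytes 0..9; only cooperating processes that ask via fcntl notice it.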

  • Managing signals
       F_GETOWN, F_SETOWN, F_GETSIG and F_SETSIG are used to manage I/O availability signals:
  • Leases
       F_SETLEASE  and F_GETLEASE (Linux 2.4 onwards) are used (respectively) to establish and
retrieve the current setting of the calling process’s lease on the file referred to by
fd. A file lease provides a mechanism whereby the process holding the lease (the
"lease holder") is notified (via delivery of a signal) when a process (the "lease
breaker") tries to open(2) or truncate(2) that file.
The man page explains leases quite clearly.


Let's look at what fcntl actually does. Only brief annotations are given here.

static long do_fcntl(unsigned int fd, unsigned int cmd,
unsigned long arg, struct file * filp)
{
long err = -EINVAL;

switch (cmd) {
case F_DUPFD: /* like dup2, but arg is not forced to be the new fd; dup to the first free fd >= arg */
if (arg < NR_OPEN) {
get_file(filp);
err = dupfd(filp, arg);
}
break;
case F_GETFD: /* get set close on exec*/
.....
case F_SETFD:
.....
case F_GETFL: /*get set filp->f_flags*/
......
case F_SETFL:
.....
case F_GETLK: /* POSIX lock operations */
err = fcntl_getlk(fd, (struct flock *) arg);
break;
case F_SETLK:
case F_SETLKW:
err = fcntl_setlk(fd, cmd, (struct flock *) arg);
break;
case F_GETOWN: /* As noted in the dnotify.c analysis, the owner records the process
to notify when the file changes. The owner is used differently in different contexts:
1. dir notify 2. lease (via fl->fl_fasync)
3. fasync (via specific fasync queue)

*/

/*
* XXX If f_owner is a process group, the
* negative return value will get converted
* into an error. Oops. If we keep the
* current syscall conventions, the only way
* to fix this will be in libc.
*/
err = filp->f_owner.pid;
break;
case F_SETOWN:
lock_kernel();
filp->f_owner.pid = arg;
filp->f_owner.uid = current->uid;
filp->f_owner.euid = current->euid;
err = 0;
if (S_ISSOCK (filp->f_dentry->d_inode->i_mode))
err = sock_fcntl (filp, F_SETOWN, arg);
unlock_kernel();
break;
case F_GETSIG:
err = filp->f_owner.signum;
break;
case F_SETSIG:
/* arg == 0 restores default behaviour. */
if (arg < 0 || arg > _NSIG) {
break;
}
err = 0;
filp->f_owner.signum = arg;
break;
case F_GETLEASE:
err = fcntl_getlease(filp);
break;
case F_SETLEASE:
err = fcntl_setlease(fd, filp, arg);
break;
case F_NOTIFY: /* dir notify */
err = fcntl_dirnotify(fd, filp, arg);
break;
default:
/* sockets need a few special fcntls. */
err = -EINVAL;
if (S_ISSOCK (filp->f_dentry->d_inode->i_mode))
err = sock_fcntl (filp, cmd, arg);
break;
}

return err;
}
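Several of the commands dispatched by do_fcntl can be exercised on an ordinary pipe. This is a hedged user-space sketch, not part of the kernel source; the function name check_fcntl_commands and the starting descriptor number 50 are arbitrary choices for illustration (the latter assumes the process's descriptor limit is above 50).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if F_DUPFD, F_GETFD/F_SETFD and
 * F_GETFL/F_SETFL behave as annotated above, negative otherwise. */
int check_fcntl_commands(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    /* F_DUPFD: the first free descriptor at or above arg, not arg itself. */
    int copy = fcntl(p[0], F_DUPFD, 50);
    if (copy < 50)
        return -2;

    /* F_GETFD / F_SETFD: the close-on-exec flag belongs to the descriptor. */
    if (fcntl(copy, F_GETFD) & FD_CLOEXEC)
        return -3;
    if (fcntl(copy, F_SETFD, FD_CLOEXEC) < 0)
        return -4;
    if (!(fcntl(copy, F_GETFD) & FD_CLOEXEC))
        return -5;
    if (fcntl(p[0], F_GETFD) & FD_CLOEXEC)  /* the original is unaffected */
        return -6;

    /* F_GETFL / F_SETFL: access mode bits passed in arg are ignored. */
    int fl = fcntl(p[0], F_GETFL);
    if (fl < 0 || (fl & O_ACCMODE) != O_RDONLY)
        return -7;
    if (fcntl(p[0], F_SETFL, fl | O_NONBLOCK | O_WRONLY) < 0)
        return -8;
    int now = fcntl(p[0], F_GETFL);
    if (!(now & O_NONBLOCK) || (now & O_ACCMODE) != O_RDONLY)
        return -9;

    /* Status flags live in the shared struct file, so the duplicate
     * sees O_NONBLOCK as well. */
    if (!(fcntl(copy, F_GETFL) & O_NONBLOCK))
        return -10;

    /* Empty non-blocking pipe with the write end still open:
     * read() fails with EAGAIN instead of blocking. */
    char c;
    if (read(copy, &c, 1) != -1 || errno != EAGAIN)
        return -11;

    close(copy);
    close(p[0]);
    close(p[1]);
    return 0;
}
```

The contrast between steps -6 and -10 is the point: F_SETFD touches per-descriptor state, while F_SETFL changes filp->f_flags, which both descriptors share.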

3. fasync

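Here I focus on how filp->f_owner, dnotify, lease, and fasync relate to one another, that is, how asynchronous file notification is used.
The idea is quite simple:
A file struct belongs to one particular open() (several file structs can share the same dentry), so it is effectively tied to the process that opened it. Its
f_owner records the pid/uid/euid/signum of the process that wants asynchronous notification; f_owner identifies the process to notify.
f_owner is then used in the following ways: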
1) dnotify_struct -> filep->f_owner
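dnotify uses the asynchronous notification mechanism. A dnotify_struct is linked under the inode; when needed, the owner is found from the inode and a signal is sent to that process.
When a dnotify is created, the owner is initialized directly and all of its parameters are set.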

2) fasync: asynchronous file notification
struct fasync_struct {
int magic;
int fa_fd;
struct fasync_struct *fa_next; /* singly linked list */
struct file *fa_file;
};
In fact dir notify could use this too. It is a generic structure and not really different from dnotify_struct.
Two functions provide fasync support:

a) Create a fasync structure and link it onto the given queue.
extern int fasync_helper(int, struct file *, int, struct fasync_struct **);

b) Notify the designated owner.
extern void kill_fasync(struct fasync_struct **, int, int);


3) file lease
This is an application built on the fasync mechanism. The only special thing is that it hangs off file_lock->fl_fasync.... (if that even counts as special)

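4) the fasync call
file_operations{ fasync }
This simply calls fasync_helper; only the queue differs. At the appropriate moment a signal is sent to the designated process....
(Despite the similar name, this has nothing to do with fsync.)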
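Seen from user space, the whole chain (F_SETOWN fills in f_owner, setting O_ASYNC through F_SETFL invokes the file's fasync method, and incoming data later triggers the signal) can be sketched as follows. This is an illustration under assumptions: the names check_sigio_notification and on_sigio are hypothetical, a UNIX socketpair is used as the file, and the default signal SIGIO is kept (F_SETSIG is not used).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio;

static void on_sigio(int signo)
{
    (void)signo;
    got_sigio = 1;
}

/* Hypothetical helper: returns 0 if SIGIO arrives after data is written
 * to the peer socket, or a negative number naming the failed step. */
int check_sigio_notification(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    struct sigaction sa = {0};
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -2;

    /* F_SETOWN records who should be notified (filp->f_owner). */
    if (fcntl(sv[0], F_SETOWN, getpid()) < 0)
        return -3;
    if (fcntl(sv[0], F_GETOWN) != getpid())
        return -4;

    /* Turning on O_ASYNC is what reaches the file's fasync method. */
    int fl = fcntl(sv[0], F_GETFL);
    if (fl < 0 || fcntl(sv[0], F_SETFL, fl | O_ASYNC) < 0)
        return -5;

    got_sigio = 0;
    if (write(sv[1], "x", 1) != 1)          /* makes sv[0] readable */
        return -6;

    /* Wait up to about one second for the handler to run. */
    struct timespec ts = {0, 10 * 1000 * 1000};
    for (int i = 0; i < 100 && !got_sigio; i++)
        nanosleep(&ts, NULL);

    close(sv[0]);
    close(sv[1]);
    return got_sigio ? 0 : -7;
}
```

Note that the handler stays installed after the function returns; a real program would restore the previous disposition if it cared.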