
Friday, May 15, 2015

[Repost] Linux socket select: purpose and usage

(2012-12-07 00:42:56)
 
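The select system call lets a program monitor several file descriptors for changes in state. The program blocks in select until one or more of the monitored descriptors changes state, or until the timeout expires if one was given.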

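File descriptors are everywhere in Linux. If you run man on a function and its return-value section says that a file descriptor is created on success, that value is one. For example, man socket says "On success, a file descriptor for the new socket is returned." and man 2 open says "open() and creat() return the new file descriptor". A file descriptor is in fact just an integer, as the declaration of socket makes clear: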
int socket(int domain, int type, int protocol);
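The descriptors we know best are of course 0, 1 and 2: 0 is standard input, 1 is standard output and 2 is standard error. Those are the integer forms; the corresponding FILE * streams are stdin, stdout and stderr respectively.
For example, both of the following programs read 9 bytes from standard input: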
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[10] = "";              /* zero-filled, so the data stays NUL-terminated */
    read(0, buf, 9);                /* descriptor 0 is standard input */
    fprintf(stdout, "%s\n", buf);   /* print through the FILE * stream */
    return 0;
}

#include <stdio.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    char buf[10] = "";              /* zero-filled, so strlen() below is safe */
    fread(buf, 1, 9, stdin);        /* read up to 9 bytes through the FILE * stream */
    write(1, buf, strlen(buf));     /* descriptor 1 is standard output */
    return 0;
}
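The link between the integer descriptors and the FILE * streams can also be checked directly. The sketch below assumes a POSIX system, where fileno() returns the descriptor behind a stream; the function name std_streams_match_fds is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for fileno() under strict -std= modes */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the three standard streams sit on
 * descriptors 0, 1 and 2, and 0 otherwise. */
int std_streams_match_fds(void)
{
    return fileno(stdin)  == STDIN_FILENO  &&   /* 0 */
           fileno(stdout) == STDOUT_FILENO &&   /* 1 */
           fileno(stderr) == STDERR_FILENO;     /* 2 */
}
```

In a normally started process this returns 1, which is why read(0, ...) and fread(..., stdin) end up pulling from the same input.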
Back to select, which monitors one or more descriptors for state changes. Its prototype is:
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
The last parameter, timeout, is the time limit. Its type is struct timeval *, a pointer to a struct timeval variable, so the program declares struct timeval tv; and passes its address &tv to select. The structure looks like this:

struct timeval {
    long tv_sec;    /* seconds */
    long tv_usec;   /* microseconds */
};
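Since tv_usec counts microseconds rather than milliseconds, it is easy to get a timeout wrong by a factor of 1000. A small sketch of a conversion helper follows; ms_to_timeval is a hypothetical name, not part of any library.

```c
#include <sys/time.h>

/* Hypothetical helper: split a non-negative millisecond count into the
 * seconds + microseconds pair that struct timeval expects. */
struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;
    tv.tv_sec  = ms / 1000;             /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;    /* remainder, converted to microseconds */
    return tv;
}
```

For instance, 1500 ms becomes tv_sec = 1 and tv_usec = 500000, so half a second is 500000 in tv_usec, not 500.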
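Parameters 2, 3 and 4 all have the same type, fd_set *. The program declares a few fd_set variables, say rdfds, wtfds and exfds, and passes their addresses &rdfds, &wtfds and &exfds to select. Each of the three is a set of descriptors. The first, rdfds, holds the descriptors to watch for readability: when one of them becomes readable, select returns. Likewise the second, wtfds, holds the descriptors to watch for writability. The third, exfds, covers exceptional conditions: select returns when something exceptional happens on one of its descriptors, for example when the peer sends urgent data over a socket. If the program only needs to check whether one socket has data to read, it can do this: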
fd_set rdfds;
struct timeval tv;
int ret;

FD_ZERO(&rdfds);                /* start from an empty set */
FD_SET(sockfd, &rdfds);         /* sockfd: the socket descriptor to watch */
tv.tv_sec = 1;
tv.tv_usec = 500;               /* timeout = 1 second + 500 microseconds */
ret = select(sockfd + 1, &rdfds, NULL, NULL, &tv);
if (ret < 0) perror("select");
else if (ret == 0) printf("timeout\n");
else {
    printf("ret=%d\n", ret);
    if (FD_ISSET(sockfd, &rdfds)) {
        recv(...);
    }
}
Note the first argument to select: it must be the highest-numbered descriptor in any of the sets, plus 1. Suppose we create three descriptors:

int sa, sb, sc;
sa = socket(...);
connect(sa,...);
sb = socket(...);
connect(sb,...);
sc = socket(...);
connect(sc,...);

FD_SET(sa, &rdfds);
FD_SET(sb, &rdfds);
FD_SET(sc, &rdfds);
Before calling select we must work out which of the three descriptors is the largest. The usual approach is to keep the maximum in a variable, like this:
int maxfd = 0;
if(sa > maxfd) maxfd = sa;
if(sb > maxfd) maxfd = sb;
if(sc > maxfd) maxfd = sc;
Then call select:
ret = select(maxfd + 1, &rdfds, NULL, NULL, &tv);
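With more than a handful of descriptors, the set and the maximum are usually built in one loop. A minimal sketch, assuming the descriptors are kept in an array; fill_set is a hypothetical helper name.

```c
#include <sys/select.h>

/* Hypothetical helper: clear *set, add every descriptor in fds[0..n-1] to it,
 * and return the largest descriptor (or -1 if n is 0).
 * Every descriptor must be in the range 0..FD_SETSIZE-1. */
int fill_set(const int *fds, int n, fd_set *set)
{
    int maxfd = -1;
    int i;

    FD_ZERO(set);
    for (i = 0; i < n; i++) {
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];     /* remember the highest descriptor seen */
    }
    return maxfd;
}
```

The return value plus 1 is then what goes into the first argument, as in select(maxfd + 1, &rdfds, NULL, NULL, &tv).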
By the same logic, to detect whether the user has typed something on the keyboard, put descriptor 0 (standard input) into the set passed to select:
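FD_ZERO(&rdfds);
FD_SET(0, &rdfds);              /* 0 = standard input */
tv.tv_sec = 1;
tv.tv_usec = 0;
ret = select(1, &rdfds, NULL, NULL, &tv);   /* highest descriptor (0) + 1 */
if (ret < 0) perror("select");
else if (ret == 0) printf("timeout\n");
else {
    scanf("%s", buf);           /* input is pending on stdin; buf must be large enough */
}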


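select is used with non-blocking I/O to notify you when a socket, or any of a group of sockets, has activity. The system provides select to implement the I/O multiplexing model. Prototype: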
#include <sys/time.h>
#include <unistd.h>
int select(int maxfd,fd_set *rdset,fd_set *wrset,fd_set *exset,struct timeval *timeout);
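The maxfd parameter is the highest descriptor to monitor, plus 1. rdset, wrset and exset are the sets of descriptors to check for readability, writability and exceptional conditions respectively. struct timeval describes a length of time: if no event occurs on any monitored descriptor within that time, the function returns 0.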
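fd_set (important enough to introduce first) is a set of file descriptors (fd) that uses one bit to represent each fd. An fd_set is manipulated through the following four macros:
FD_ZERO(fd_set *fdset); clears the given descriptor set. A set must be initialized before descriptors are added to it: memory is usually not zeroed when it is allocated, so without clearing the set its contents are unpredictable.
FD_SET(int fd, fd_set *fdset); adds a descriptor to the set.
FD_CLR(int fd, fd_set *fdset); removes a descriptor from the set.
FD_ISSET(int fd, fd_set *fdset); tests whether the given descriptor is in the set.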
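The four macros can be exercised without any real I/O, since they only flip and test bits. A short sketch follows; fd_set_roundtrip is a made-up name for this illustration.

```c
#include <sys/select.h>

/* Hypothetical helper: walk one descriptor number through the four macros.
 * Returns 1 if every step behaves as described, 0 otherwise.
 * fd must be in the range 0..FD_SETSIZE-1. */
int fd_set_roundtrip(int fd)
{
    fd_set set;

    FD_ZERO(&set);                          /* start from an empty set */
    if (FD_ISSET(fd, &set)) return 0;       /* nothing is a member yet */

    FD_SET(fd, &set);                       /* add fd */
    if (!FD_ISSET(fd, &set)) return 0;      /* now it is a member */

    FD_CLR(fd, &set);                       /* remove fd again */
    if (FD_ISSET(fd, &set)) return 0;       /* and it is gone */

    return 1;
}
```

After select returns, FD_ISSET is the same test used to find out which of the descriptors that were added are actually ready.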

Wednesday, April 15, 2015

kernel:Neighbour table overflow

Reposted:

Kernel: Neighbour table overflow

While randomly browsing some sites, I came across this page, which described a problem I had about a year ago. I thought I should document it on my own site as well for reference.

We were expanding our internal network (we couldn't stay within a /24 range anymore), so I decided to create a /22.
So we had 192.168.5.0 to 192.168.5.255, and now we have 192.168.4.0 to 192.168.7.255. Nice, extra IPs!
All went well, until I noticed errors appearing on our syslog server:
Apr 17 15:22:42 uma kernel: [12562.837562] Neighbour table overflow.
Apr 17 15:22:42 uma kernel: [12562.867554] printk: 87 messages suppressed.
These warnings mean that the ARP table (the kernel's neighbour table) on the server is full and needs to be expanded if you want to avoid these overflows. With the /24 I used before, the subnet had at most 254 usable host addresses. Now that I've changed to a /22 it has 1022, so the number of possible ARP entries has quadrupled!
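As a quick check on the address counts involved, the number of usable hosts in an IPv4 subnet follows from the prefix length. A minimal sketch, with usable_hosts as a hypothetical helper name:

```c
/* Hypothetical helper: usable IPv4 host addresses for a prefix length
 * between 1 and 30, i.e. every address in the block minus the network
 * address and the broadcast address. */
unsigned long usable_hosts(int prefix_len)
{
    unsigned long total = 1UL << (32 - prefix_len);   /* addresses in the block */
    return total - 2;
}
```

usable_hosts(24) gives 254 and usable_hosts(22) gives 1022, so a fully populated /22 comes very close to the default hard limit of 1024 entries listed in the settings for this table.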
There are some options in sysctl you can use to tune the ARP table size and the gc's (garbage collector's) 'attitude' ;) These are the settings and their default values:
# The minimum number of entries to keep in the ARP cache. The garbage collector will
# not run if there are fewer than this number of entries in the cache.
net.ipv4.neigh.default.gc_thresh1 = 128

# The soft maximum number of entries to keep in the ARP cache. The garbage collector will
# allow the number of entries to exceed this for 5 seconds before collection will be performed.
net.ipv4.neigh.default.gc_thresh2 = 512

# The hard maximum number of entries to keep in the ARP cache. The garbage collector will
# always run if there are more than this number of entries in the cache.
net.ipv4.neigh.default.gc_thresh3 = 1024

# How frequently the garbage collector for neighbour entries should attempt to run.
net.ipv4.neigh.default.gc_interval = 30

# Determines how often to check for stale neighbour entries. When a neighbour entry
# is considered stale it is resolved again before sending data to it.
net.ipv4.neigh.default.gc_stale_time = 60
There are more options you can change to tune the neighbour entry list, but those are often fine with their default settings.
Now that you know what the settings mean, you can change them. As a rule of thumb, if you change the gc_threshN values, change them all, and multiply them by 2, until the warnings don’t show up in your syslog anymore.
For me, these values worked out fine:
net.ipv4.neigh.default.gc_thresh1 = 256
net.ipv4.neigh.default.gc_thresh2 = 1024
net.ipv4.neigh.default.gc_thresh3 = 2048
net.ipv4.neigh.default.gc_interval = 60
net.ipv4.neigh.default.gc_stale_time = 120
After a reboot your changes will be lost. To prevent that, add the above settings to the file /etc/sysctl.conf.
Once you have added the settings, run:
sysctl -p
to activate the changes without needing a reboot.
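To see how close a machine is to the limits, the current number of entries can be compared with the thresholds. The sketch below assumes a Linux system where /proc/net/arp lists the IPv4 entries (one header line, then one line per entry) and where the sysctl net.ipv4.neigh.default.gc_thresh3 is readable as /proc/sys/net/ipv4/neigh/default/gc_thresh3. The function names are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: count the entries in a stream formatted like
 * /proc/net/arp (one header line followed by one line per entry).
 * Assumes every line is shorter than the buffer. Returns -1 on a NULL stream. */
long count_arp_entries(FILE *f)
{
    char line[512];
    long n = 0;

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL)    /* skip the header line */
        return 0;
    while (fgets(line, sizeof line, f) != NULL)
        n++;
    return n;
}

/* Hypothetical helper: read one integer value from a /proc/sys file.
 * Returns -1 if the file cannot be opened or parsed. */
long read_sysctl_long(const char *path)
{
    long v = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```

Passing fopen("/proc/net/arp", "r") to count_arp_entries and comparing the result with read_sysctl_long("/proc/sys/net/ipv4/neigh/default/gc_thresh3") gives a rough idea of the remaining headroom.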