Non-blocking Unix domain socket
Problem
I've quickly developed two kinds of socket use: the first in blocking mode and the second in non-blocking mode. The sockets are Unix domain sockets. My problem is that the kernel consumes a huge amount of CPU (approximately 85%). My goal is to minimize kernel CPU usage and to increase throughput.
I use the taskset command to pin each process to a particular CPU core. The blocking-mode Unix socket achieves approximately 1.3 GB/s; the non-blocking mode achieves approximately 170 MB/s.
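For reference, the pinning typically looks like this (core numbers and binary names are illustrative, not taken from the question):

```shell
# Run server and client on separate, fixed CPU cores so the two
# processes neither migrate nor compete for the same core.
taskset -c 0 ./server &
taskset -c 1 ./client
```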
The blocking version is faster than the non-blocking (+ epoll) version by approximately 8×.
Blocking version:
client.c
```
#define _DEFAULT_SOURCE   /* for htobe32() in <endian.h> */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <endian.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SOCK_PATH "echo_socket"

typedef struct proto_t {
    uint32_t len;
    uint8_t *data;
} proto_t;

int main(void)
{
    int s;
    int len;
    struct sockaddr_un remote;
    char buffer[1400];
    proto_t *frame = (proto_t *)buffer;

    if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }

    printf("Trying to connect...\n");

    remote.sun_family = AF_UNIX;
    strcpy(remote.sun_path, SOCK_PATH);
    len = strlen(remote.sun_path) + sizeof(remote.sun_family);
    if (connect(s, (struct sockaddr *)&remote, len) == -1) {
        perror("connect");
        exit(1);
    }

    printf("Connected.\n");

    srand(time(NULL));
    for (;;) {
        len = (rand() % (sizeof(buffer) - sizeof(uint32_t))) + sizeof(uint32_t);
        frame->len = htobe32(len - sizeof(uint32_t));
        if (send(s, frame, len, 0) == -1) {
            perror("send");
            close(s);
            exit(1);
        }
    }

    close(s);
    return 0;
}
```
server.c
```
#define _DEFAULT_SOURCE   /* for htobe32()/be32toh() in <endian.h> */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <endian.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SOCK_PATH "echo_socket"
```
Solution
I think you need to rethink your assumptions. For starters, it's not at all obvious that non-blocking would be faster than blocking. The big difference between your two programs is that the non-blocking one can handle multiple clients while your blocking one can't, so it's reasonable to expect that flexibility to cost some performance. If you start multiple clients against the non-blocking server, the total throughput may well be higher.
Now, as to your non-blocking code, it seems somewhat inefficient. For example, you spend a lot of time adding and removing file descriptors from your epoll instance. That shouldn't be necessary in a properly structured program: when the descriptor is readable, you read it and then go back to epoll_wait(). Add the descriptor once, then leave it there.
Secondly, you spend time working out exactly how many bytes to read for the framing. Don't do that. Keep a buffer of, say, 64 KB, read whatever bytes are available into it, and then parse the framing out of that. Look up buffering.
An important question for network performance is: how many syscalls are you doing per message? It seems to me you're doing at least five, but with the parameters you have you should be able to write a version doing at most two syscalls per packet on average (the epoll_wait() and a single read()), and fewer when there are lots of packets (read more than one packet per read()). You know the maximum frame size up front, which simplifies the coding considerably.
Also, debug print statements cost performance.
Context
StackExchange Code Review Q#98558, answer score: 6