While developing a program, a programmer has to keep several things in mind: the code should not be needlessly complex, so that it stays maintainable, and it should be portable. There are good practices that a programmer should follow in order to produce good code. This article focuses on good practices to follow while working with system calls in Linux.
What is a system call?
A system call is a special function call that requests a service from the kernel. The requested service could be creating a new process, accessing hardware such as a hard disk, and so on. When a system call is made, execution switches from user mode to kernel mode; once the kernel has provided the requested service, execution switches back to user mode. Examples of system calls are fork(), read() and write().
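As a rough illustration of the idea above, the sketch below asks the kernel for the process ID in two ways: through the usual C library wrapper getpid(), and through the generic syscall() interface with the SYS_getpid number. The function names pid_via_wrapper() and pid_via_syscall() are made up for this example, and it assumes a Linux system with glibc.

```c
#define _DEFAULT_SOURCE   /* exposes the syscall() declaration on glibc */
#include <sys/syscall.h>  /* SYS_getpid */
#include <sys/types.h>    /* pid_t */
#include <unistd.h>       /* getpid(), syscall() */

/* Hypothetical helper: request the process ID through the C library wrapper. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Hypothetical helper: make the same request through the generic
 * syscall() entry point, naming the system call by its number. */
pid_t pid_via_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both helpers end up in the same kernel service, so they report the same process ID. In everyday code the wrapper is the one to use; the syscall() form is shown only to make the user-mode to kernel-mode request visible.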
Dealing with system calls
The following points should be kept in mind while dealing with system calls:
- The programmer should know the system call inside out: what exactly it does, which system resources it uses, what arguments it expects and, especially, in which cases it fails.
- Most Linux system calls report an error code when they fail, and the code varies with the type of error that caused the failure. Proper error handling should be in place so that each kind of error is handled appropriately and escalated clearly (either to the user or to the parent module).
- For thorough knowledge of a system call and the error codes it returns, I strongly recommend reading the man page of that specific system call. Man pages are the best reference to begin with and help develop a good fundamental understanding of any system call in Linux.
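To make the second point more concrete, here is a minimal sketch of handling each kind of failure separately. It relies on the errno variable, which is described later in this article. The enum open_result and the function try_open() are hypothetical names invented for this example; ENOENT and EACCES are two of the error codes listed in the open() man page.

```c
#include <errno.h>   /* errno, ENOENT, EACCES */
#include <fcntl.h>   /* open(), O_RDONLY */

/* Hypothetical result codes that a parent module could act upon. */
enum open_result { OPEN_OK, OPEN_MISSING, OPEN_DENIED, OPEN_OTHER };

/* Hypothetical helper: opens path read-only and translates the failure
 * cause into one of the result codes above. On success the descriptor
 * is stored in *fd_out. */
enum open_result try_open(const char *path, int *fd_out)
{
    int fd = open(path, O_RDONLY);
    if (fd != -1) {
        *fd_out = fd;
        return OPEN_OK;
    }

    switch (errno) {
    case ENOENT:            /* the file does not exist */
        return OPEN_MISSING;
    case EACCES:            /* the caller lacks permission */
        return OPEN_DENIED;
    default:                /* anything else: let the caller escalate */
        return OPEN_OTHER;
    }
}
```

A caller can then decide per result code whether to create the file, ask the user for different permissions, or give up and report the problem.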
General system call failures
Although the exact cause depends on the error encountered during execution, the following reasons account for most system call failures:
- If a system call tries to access hardware that is unavailable or faulty, the system call fails.
- If a signal is delivered while a system call is executing, the call may be interrupted and fail (typically with the error EINTR).
- Sometimes a program uses a system call to perform a task that requires special or root privileges. If the program does not have those privileges, the system call fails.
- Passing invalid arguments is another very common reason for system calls to fail.
- If a system call requests memory and the system is unable to allocate it to the requesting process, the system call fails.
The above list is not exhaustive; there are numerous other reasons why a system call can fail.
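Two of the failure causes above can be sketched in a few lines: an interruption by a signal, which is usually handled by simply retrying, and an invalid argument, which is reported to the caller. The helper name read_retry() is an assumption made for this example; read() and the error codes EINTR and EBADF are standard.

```c
#include <errno.h>    /* errno, EINTR */
#include <unistd.h>   /* read(), ssize_t */

/* Hypothetical helper: behaves like read(), but restarts the call
 * when it was interrupted by a signal before any data was read. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);

    /* Any other failure (for example EBADF for an invalid descriptor)
     * is passed on to the caller with errno left untouched. */
    return n;
}
```

For instance, read_retry(-1, buf, sizeof buf) returns -1 with errno set to EBADF, because -1 is not a valid file descriptor.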
Working with error codes
As already discussed, a failed system call reports a specific error code for the type of error it encountered. Identifying and communicating this error information is a vital part of programming. In general, most system calls return 0 or another non-negative value on success and -1 on failure, while functions that return a pointer to memory (such as the library function malloc()) return NULL on failure and a non-NULL pointer on success.
NOTE: The above observation does not hold for every system call. There could very well be some exceptions.
Coming back to error codes: as discussed, they provide vital information about the cause of a system call failure. Since each error code is associated with a specific reason, a program could keep its own map from error codes to text describing the cause of each error. However, this is inefficient and impractical, as it would require a lot of mapping for every system call used in the program. So the question is: what would be a more efficient way of achieving this?
The ‘errno’ variable
From the man page of this variable :
The <errno.h> header file defines the integer variable errno, which is set by system calls and some library functions in the event of an error to indicate what went wrong. Its value is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno.
Valid error numbers are all non-zero; errno is never set to zero by any system call or library function. For some system calls and library functions (e.g., getpriority), -1 is a valid return on success. In such cases, a successful return can be distinguished from an error return by setting errno to zero before the call, and then, if the call returns a status that indicates that an error may have occurred, checking to see if errno has a non-zero value. errno is defined by the ISO C standard to be a modifiable lvalue of type int, and must not be explicitly declared; errno may be a macro. errno is thread-local; setting it in one thread does not affect its value in any other thread.
So, from the description above, it is quite clear that errno is a very handy tool for handling system call errors on Linux and can save us a lot of hard work. In a multi-threaded program, keep in mind that errno is local to each thread: a thread sees only the value set by its own calls, so a value set in one thread cannot be read from any other thread.
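The man page excerpt above mentions getpriority() as a call for which -1 is a valid successful return value. A minimal sketch of the recommended pattern (reset errno, make the call, then check errno) could look like this. The helper name current_nice_value() is hypothetical.

```c
#include <errno.h>         /* errno */
#include <sys/resource.h>  /* getpriority(), PRIO_PROCESS */

/* Hypothetical helper: stores the nice value of the calling process in
 * *nice_out. Returns 0 on success and -1 on failure (errno is set). */
int current_nice_value(int *nice_out)
{
    errno = 0;                                  /* clear any stale value */
    int value = getpriority(PRIO_PROCESS, 0);   /* 0 = calling process */

    /* -1 alone is ambiguous here; it is an error only if errno was set. */
    if (value == -1 && errno != 0)
        return -1;

    *nice_out = value;
    return 0;
}
```

For calls such as open(), where -1 can only mean failure, the return value alone is enough and errno does not need to be cleared first.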
The strerror() API
One problem with using only errno is that it is still just an integer value. A description is always more helpful when logging or when reporting the cause of an error to the user, so there has to be a map from error codes to the causes they stand for. This is where the strerror() API comes in. This function takes an error number (typically the value of errno) as its argument and returns a pointer to a string that describes the cause that error code maps to.
#include <string.h>

char *strerror(int errnum);
Other variants of this function, such as strerror_r(), are also available. For more information, please see the man page for this API.
NOTE: Interested readers can also look at the perror() API, which prints the error message for a system call failure on standard error.
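The two APIs can be compared with a short sketch. The helpers format_error() and report_error() are hypothetical names used only for this illustration: the first builds a message of the form "context: description" into a caller-supplied buffer using strerror(), and the second lets perror() print a message of the same shape for the current value of errno.

```c
#include <errno.h>    /* errno */
#include <stdio.h>    /* snprintf(), perror() */
#include <string.h>   /* strerror() */

/* Hypothetical helper: writes "context: description" for errnum into
 * buf and returns the value of snprintf(). */
int format_error(char *buf, size_t size, const char *context, int errnum)
{
    return snprintf(buf, size, "%s: %s", context, strerror(errnum));
}

/* Hypothetical helper: prints "context: description" for the current
 * errno value on standard error. */
void report_error(const char *context)
{
    perror(context);
}
```

With glibc, format_error(buf, sizeof buf, "open", ENOENT) leaves the text "open: No such file or directory" in buf. Building the string yourself is useful when the message goes to a log file rather than to standard error.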
An Example
Let's take an example to demonstrate the use of errno and strerror():
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = -1;

    // Reset errno before use.
    errno = 0;

    // Make sure you are opening a file that does not exist
    fd = open("abcd", O_RDONLY);
    if (fd == -1)
    {
        // An error occurred. Use strerror() to print its description
        printf("\nStrerror() says -> [%s]\n", strerror(errno));
        return 1;
    }

    // The file unexpectedly exists; release the descriptor
    close(fd);
    return 0;
}
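In the code above:
- errno is reset to 0 before the call, since an earlier call may have left a non-zero value in it. This is not strictly required here, because open() signals failure by returning -1, but it is essential for calls where -1 can be a valid return value.
- A non-existent file is opened so that the open() system call fails.
- The strerror() API is then used to print the error message that corresponds to the errno code.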
When the above program is run:
$ ./strerror

Strerror() says -> [No such file or directory]
So we see that the output contains a meaningful error message instead of an error code.
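One detail worth keeping in mind comes from the errno man page quoted earlier: a function that succeeds is allowed to change errno. If the error is logged before it is handed to the caller, it is therefore safer to copy errno first. The sketch below shows one way of doing this; the function name open_or_log() is an assumption made for this example.

```c
#include <errno.h>    /* errno */
#include <fcntl.h>    /* open(), O_RDONLY */
#include <stdio.h>    /* fprintf(), stderr */
#include <string.h>   /* strerror() */

/* Hypothetical helper: opens path read-only. On failure it logs the
 * cause on standard error and returns -1 with errno still describing
 * the open() failure. */
int open_or_log(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved_errno = errno;   /* copy before calling anything else */

        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(saved_errno));

        errno = saved_errno;       /* fprintf() may have changed errno */
        return -1;
    }
    return fd;
}
```

After open_or_log() returns -1, the caller can still inspect errno to decide how to react, for example by comparing it against ENOENT.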
love it! i want it to be huge series thanks man!
Hi,
Thanks….
very useful material…
great as usual!!
thanks
Why is this needed?
// Always Reset errno before use.
errno = 0;
If the call to open fails, errno will be updated accordingly, and if open succeeds, the value of errno has no meaning.
As the article said, open will return -1 if it fails.
“strerror” is new for me~ quite helpful~
Thanks
It’s a file handle (should be declared as such) so it must be some value other than zero, plus you are depending on that value to indicate a failure in the open api. As he stated this becomes much more complex when you have a system that you coded and it requires a root level at places to do what it needs. When a user is running it, you don’t want a failure to cause the user to end up with all permissions. Usually you are also including code to log the error and do what needs to be done to clean up after yourself. I think it’s a good point.
Jack