How does an operating system implement the C library?
Problem
I have an idea of how a C program is turned into machine code by the compiler. I also know how the processor processes the instructions (this video has a good introduction: https://www.youtube.com/watch?v=cNN_tTXABUA). What I don't understand is how an operating system (usually written in C or some other low-level language) can run programs that are also written in C (or another low-level language). Does the OS read the code and then process it with some internal functions, or does it just load the machine code and send it to the processor, which does the rest? In the case of the second option, how does the OS control which instructions are allowed to be executed and which are not? (For example, I might write a program containing an instruction that jumps to a forbidden part of RAM; how does the OS prevent that from happening?)
I don't expect to understand it fully from the answers to this post, but if you could give me an idea and then some books or tags to search for, I'd be happy!
Solution
It's hard to give a full answer to this question, as it would basically amount to an introductory text in operating system design, so I will try to give some pointers.
First of all, if you really want to know (including all the gritty details) how an operating system actually works, there's probably no way around looking at an actual implementation of one. Luckily, xv6 provides quite a good example; see xv6 and follow the references there.
Before diving into the code, it's probably a good idea to have a look at the basic components of an operating system, see Operating System Components.
I think your question has three main aspects:
- How does an operating system load a (compiled) C program?
- How does such a program interact with the system it runs on?
- How does the operating system prevent bad things from happening and regain control once the program is running?
(Warning: The following might be oversimplifying things)
For the first question, see Loader. A compiled C program is just binary data. When loading it (let us ignore some of the bookkeeping that also goes on around this), the operating system simply copies this binary data into memory, does some initialisation (e.g., copying the arguments to places where the program can find them) and then sets the instruction pointer to the start of the loaded program. Basically, from this point on the program is in control (see below). Eventually, the program will (hopefully) issue a system call, namely exit, when it is done, thus handing control back to the operating system; see Exit (system call). It might also terminate by simply returning from main, but let us ignore that case for the moment.

This leads to the second question. The interaction with the operating system takes place via system calls. These, in turn, are implemented via interrupts (or, on more recent processors, via dedicated instructions such as x86-64's `syscall`, which serve the same purpose). Roughly speaking, you can think of a special processor instruction being executed (i.e., it's a feature of the processor), which stops the execution of the current code and starts the execution of an interrupt handler, which is part of the operating system. The interrupt handler then tries to figure out why it was called and acts accordingly. How does the processor know where to find the interrupt handler? This is set up while the operating system itself is loaded at boot time; see e.g. Kernel startup stage.
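Both steps can be poked at from user space. The sketch below assumes Linux with glibc; `elf32_entry` and `raw_write` are hypothetical helper names, not part of any library. `elf32_entry` reads the field of a 32-bit little-endian ELF header that tells the loader where to point the instruction pointer (for a statically linked program like the one at the end of this answer, that is where execution starts). `raw_write` skips `printf` and asks the kernel directly through glibc's generic `syscall()` wrapper, which issues the actual system call instruction.

```c
#define _GNU_SOURCE          /* exposes syscall() in glibc's <unistd.h> */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>          /* syscall() */

/* Loader's view: return the entry point stored in a 32-bit, little-endian
   ELF header, i.e. the address execution starts at. Returns 0 if the
   bytes do not look like such a header. */
uint32_t elf32_entry(const unsigned char *hdr, size_t len) {
    if (len < 52 || memcmp(hdr, "\x7f" "ELF", 4) != 0)
        return 0;                      /* an Elf32_Ehdr is 52 bytes long */
    if (hdr[4] != 1 || hdr[5] != 1)    /* ELFCLASS32, ELFDATA2LSB */
        return 0;
    /* e_entry is the 4-byte little-endian field at offset 24 */
    return (uint32_t)hdr[24] | (uint32_t)hdr[25] << 8 |
           (uint32_t)hdr[26] << 16 | (uint32_t)hdr[27] << 24;
}

/* Program's view: ask the kernel to write a string by issuing the write
   system call directly. Returns the number of bytes written, or -1. */
long raw_write(int fd, const char *msg) {
    return syscall(SYS_write, (long)fd, msg, strlen(msg));
}
```

Fed the first 52 bytes of the `xxd` dump at the end of this answer, `elf32_entry` should return `0x08048080`, the address where the disassembly of `.text` begins. `raw_write(1, "Hello, world!\n")` performs the same `write` system call as the assembler example, and should return 14.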
For the third question, see Process management:
There are two possible ways for an OS to regain control of the processor during a program’s execution in order for the OS to perform de-allocation or allocation:
- The process issues a system call (sometimes called a software interrupt); for example, an I/O request occurs requesting to access a file on hard disk.
- A hardware interrupt occurs; for example, a key was pressed on the keyboard, or a timer runs out (used in pre-emptive multitasking).
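The second bullet (the timer) can be observed indirectly from user space. This is only an analogy, assuming a POSIX system such as Linux: a signal is not itself a hardware interrupt, but it shows that a loop which never calls into the OS still gets interrupted, because the kernel regains control of the CPU when the timer expires and then arranges for a handler to run. `spin_until_timer` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/time.h>   /* setitimer, struct itimerval */

static volatile sig_atomic_t timer_fired = 0;

/* Invoked by the kernel in the middle of the busy loop below, once the
   one-shot timer has expired and SIGALRM is delivered. */
static void on_alarm(int sig) {
    (void)sig;
    timer_fired = 1;
}

/* Spins in a loop that makes no system calls and returns how many
   iterations completed before a one-shot timer of `usec` microseconds
   expired. Returns 0 for non-positive `usec` (a zero timer never fires). */
unsigned long spin_until_timer(long usec) {
    if (usec <= 0)
        return 0;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval it;
    memset(&it, 0, sizeof it);            /* it_interval = 0: fire once */
    it.it_value.tv_sec = usec / 1000000;
    it.it_value.tv_usec = usec % 1000000;

    timer_fired = 0;
    setitimer(ITIMER_REAL, &it, NULL);

    unsigned long iterations = 0;
    while (!timer_fired)                  /* pure computation, no syscalls */
        iterations++;
    return iterations;
}
```

The loop never yields voluntarily; it only stops because the handler set the flag, which requires the kernel to have taken the CPU away from the loop in between.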
The situation is slightly different on multi-core architectures, since there the operating system and the loaded program can obviously run concurrently.
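For the memory-related question, see Memory protection. Again, the short answer is that this is supported by processor features: the operating system can set boundaries when loading a program, and the processor raises an interrupt (strictly speaking, an exception such as a page fault) when these boundaries are violated. Another important related feature is CPU modes (e.g., user mode vs. kernel mode), which determine, among other things, which instructions a program is allowed to execute.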
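A minimal Linux/POSIX sketch of such boundaries in action: the program asks the OS (via `mmap` and `mprotect`) to change the permissions of one page, and the OS then enforces them through the processor's memory management unit. A write to the now read-only page faults, and the kernel kills the offending process with `SIGSEGV`. The write is done in a child process (`fork`) so that the crash does not take down the caller; `write_after_mprotect` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps one page, writes to it, makes it read-only, then lets a child
   process try to write again. Returns the signal that killed the child
   (SIGSEGV is expected), 0 if the child exited normally, or -1 if a
   setup step failed. */
int write_after_mprotect(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'a';                              /* allowed: page is writable */

    if (mprotect(p, page, PROT_READ) != 0) { /* OS tightens the boundary */
        munmap(p, page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, page);
        return -1;
    }
    if (pid == 0) {
        volatile char *vp = p;
        vp[0] = 'b';   /* MMU raises a page fault; the kernel sends SIGSEGV */
        _exit(0);      /* only reached if the write was (unexpectedly) allowed */
    }

    int status = 0;
    waitpid(pid, &status, 0);
    munmap(p, page);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The first write succeeds and the second does not, although both are the same kind of machine instruction; only the permissions the OS recorded for that page changed in between.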
To sum things up, in order to understand operating systems, it's also important to understand computer architecture, as many features of an operating system are based on corresponding features of processors. I have always found the following a good introductory textbook: "Computer Organization and Design: The Hardware/Software Interface" by David Patterson and John Hennessy.
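Finally, the following (Linux-specific) example might help. The usual

```c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    printf("Hello, world!");   /* buffered by the C library, written via the write system call */
    exit(0);                   /* flushes stdout and ends the process via a system call */
}
```

will turn into assembler code as follows (more or less; stolen from the Wikipedia article on the Netwide Assembler):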
```nasm
; compile on 64-bit Linux using
; nasm -f elf syscall.asm
; ld -m elf_i386 -s -o syscall syscall.o
global _start

section .text
_start:
    mov eax, 4        ; system call number 4: write
    mov ebx, 1        ; file descriptor 1: stdout
    mov ecx, msg      ; address of the message
    mov edx, msg.len  ; length of the message (14 bytes)
    int 0x80          ; Passes control to interrupt vector
                      ; invokes system call, in this case system call
                      ; write(stdout, msg, strlen(msg));
    mov eax, 1        ; system call number 1: exit()
    mov ebx, 0        ; exit status 0
    int 0x80          ; Passes control to interrupt vector
                      ; invokes system call, in this case system call
                      ; number 1 with argument 0, i.e., exit(0)

section .data
msg:    db "Hello, world!", 10
.len:   equ $ - msg
```

Once actually compiled, this will look like the following (only the hex columns in the middle are the actual file content; the offsets on the left and the ASCII rendering on the right are for illustration purposes, generated using `xxd programName`):
```
0000000: 7f45 4c46 0101 0100 0000 0000 0000 0000  .ELF............
0000010: 0200 0300 0100 0000 8080 0408 3400 0000  ............4...
0000020: cc00 0000 0000 0000 3400 2000 0200 2800  ........4. ...(.
0000030: 0400 0300 0100 0000 0000 0000 0080 0408  ................
0000040: 0080 0408 a200 0000 a200 0000 0500 0000  ................
0000050: 0010 0000 0100 0000 a400 0000 a490 0408  ................
0000060: a490 0408 0e00 0000 0e00 0000 0600 0000  ................
0000070: 0010 0000 0000 0000 0000 0000 0000 0000  ................
0000080: b804 0000 00bb 0100 0000 b9a4 9004 08ba  ................
0000090: 0e00 0000 cd80 b801 0000 00bb 0000 0000  ................
00000a0: cd80 0000 4865 6c6c 6f2c 2077 6f72 6c64  ....Hello, world
00000b0: 210a 002e 7368 7374 7274 6162 002e 7465  !...shstrtab..te
00000c0: 7874 002e 6461 7461 0000 0000 0000 0000  xt..data........
00000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000f0: 0000 0000 0b00 0000 0100 0000 0600 0000  ................
0000100: 8080 0408 8000 0000 2200 0000 0000 0000  ........".......
0000110: 0000 0000 1000 0000 0000 0000 1100 0000  ................
0000120: 0100 0000 0300 0000 a490 0408 a400 0000  ................
0000130: 0e00 0000 0000 0000 0000 0000 0400 0000  ................
0000140: 0000 0000 0100 0000 0300 0000 0000 0000  ................
0000150: 0000 0000 b200 0000 1700 0000 0000 0000  ................
0000160: 0000 0000 0100 0000 0000 0000            ............
```

The section contents and disassembly of the same binary (objdump output):

```
Contents of section .text:
 8048080 b8040000 00bb0100 0000b9a4 900408ba  ................
 8048090 0e000000 cd80b801 000000bb 00000000  ................
 80480a0 cd80                                 ..
Contents of section .data:
 80490a4 48656c6c 6f2c2077 6f726c64 210a      Hello, world!.

Disassembly of section .text:

08048080 <.text>:
 8048080:  b8 04 00 00 00    mov    $0x4,%eax
 8048085:  bb 01 00 00 00    mov    $0x1,%ebx
 804808a:  b9 a4 90 04 08    mov    $0x80490a4,%ecx
 804808f:  ba 0e 00 00 00    mov    $0xe,%edx
 8048094:  cd 80             int    $0x80
 8048096:  b8 01 00 00 00    mov    $0x1,%eax
 804809b:  bb 00 00 00 00    mov    $0x0,%ebx
 80480a0:  cd 80             int    $0x80
```
Context
StackExchange Computer Science Q#23463, answer score: 14