Hello everyone,
I'm learning Linux and there is something about my folder sizes that intrigues me. When I use the ls command in my ~ I can see that a lot of folders are the same size (4.0K), even though some of them are empty and others contain a lot of files. So what is happening there? This makes me think that the real size of a folder is 4.0K and that the ls command shows that, not the combined size of the contents inside a folder. Is this correct? If yes, is there an option in ls or another command to see the size of a folder including the elements inside of it?
The directory size of 4.0K is, as @KGIII mentioned, the size of the directory file itself in the output described above. The directory file is not usually accessed directly by the user, as distinct from the directory contents, which are shown using the ls command.
The directory file holds the directory's entries: a list that maps each file name in the directory, including links, to its inode number. The inode, in turn, holds the file's metadata and the pointers to the blocks that contain the file data. This information is what enables the commands used to list files to produce accurate output.
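The inode numbers recorded in a directory can be inspected with ls -i. The following is a minimal sketch, assuming GNU coreutils (for stat -c); demo_dir and the file names in it are hypothetical and are created in a temporary location:

```shell
#!/bin/sh
# Hypothetical scratch directory, not part of the thread above
demo_dir=$(mktemp -d)
echo hello > "$demo_dir/file1"
# A hard link adds a second name in the directory for the same inode
ln "$demo_dir/file1" "$demo_dir/file1_hardlink"

# ls -i prints the inode number next to each name in the directory
ls -i "$demo_dir"

# stat reads the inode: %i is the inode number, %h the number of hard links
inode_a=$(stat -c %i "$demo_dir/file1")
inode_b=$(stat -c %i "$demo_dir/file1_hardlink")
links=$(stat -c %h "$demo_dir/file1")
echo "file1: inode $inode_a, file1_hardlink: inode $inode_b, link count: $links"
```

Both names should report the same inode number, since the directory only stores names that point at the inode.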
To compare the sizes of directory files that hold greater and lesser amounts of metadata, one can run the following. The /usr/bin directory usually contains over a thousand files, whereas /mnt contains none if nothing is mounted there:
Code:
$ ls -alhd /usr/bin
drwxr-xr-x 2 root root 116K Feb 8 18:35 /usr/bin
$ ls -alhd /mnt
drwxr-xr-x 2 root root 4.0K Nov 15 2023 /mnt
The directory file holding the metadata for over a thousand files in the /usr/bin directory, which is 116K in the output above, clearly needs to be larger than the directory file holding the metadata for the /mnt directory. In both cases, the directory file is built from blocks of 4096 bytes, which amount to 4 kibibytes each.
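To check the block arithmetic, the directory size can be read in bytes and divided by 4096. This is a rough sketch, assuming GNU stat; dir_size_in_blocks is a hypothetical helper name, and the 4096-byte block size is an assumption that holds for the output above but not for every filesystem:

```shell
#!/bin/sh
# Hypothetical helper: print a directory file's size in bytes and in 4096-byte blocks.
# 4096 is assumed from the output above; other filesystems may report directory sizes differently.
dir_size_in_blocks() {
    size=$(stat -c %s "$1") || return 1    # %s is the size in bytes
    echo "$1: $size bytes = $((size / 4096)) block(s) of 4096, remainder $((size % 4096))"
}

for d in /usr/bin /mnt; do
    # Skip any directory that does not exist on this system
    [ -d "$d" ] && dir_size_in_blocks "$d"
done
```

If the 116K shown above is exact, it is 118784 bytes, which is 29 blocks of 4096 bytes.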
In relation to ordinary file sizes, there are two aspects which are best differentiated: the file size and the disk usage.
The disk usage is the amount of space taken up on the disk by the files and the directories. It is not a measure of the size of an individual file, nor of the total obtained by adding all the file sizes together.
Rather, when a file is written it is allocated blocks on the filesystem, so the disk usage is the size of the accumulated number of the blocks the files and directories use.
To make this clearer, here are some examples.
The file named file1 contains the single word "hello" followed by a line feed.
To see the size of the file one can run the following:
Code:
$ ls -al file1
-rw-rw-r-- 1 tom tom 6 Aug 11 20:53 file1
Note that the number 6 before the date is the size in bytes. This can be confirmed by running wc -c on file1, where the -c option to the wc command outputs the number of bytes.
There are 5 characters and a line feed (or newline byte) in this file which makes its size 6 bytes. Each character and the line feed use 1 byte.
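The individual bytes, including the line feed, can be made visible with od -c. A small sketch, using a hypothetical temporary file in place of file1:

```shell
#!/bin/sh
# Hypothetical stand-in for file1, created in a temporary location
demo_file=$(mktemp)
echo hello > "$demo_file"

# od -c prints each byte as a character; the line feed appears as \n
od -c "$demo_file"

# wc -c counts bytes; reading from stdin keeps the file name out of the output
byte_count=$(wc -c < "$demo_file")
echo "bytes: $byte_count"
```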
When the disk usage of the file is sought using the du command, the file is seen to use a lot more disk space than just 6 bytes:
Code:
$ du -sh file1
4.0K file1
The file is using 4 kibibytes, or 4096 bytes, of disk space. The file has been allocated a block on the disk. If the file is enlarged with added text, it will still occupy a single block until the contents exceed that single block size, at which point the filesystem will allocate a second block to accommodate them.
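The jump from one block to two can be observed by growing a file across the 4096-byte boundary. The following is a sketch, assuming GNU du; report_sizes and demo_file are hypothetical names, and the disk usage figures depend on the filesystem and its block size:

```shell
#!/bin/sh
# Hypothetical scratch file, not part of the thread above
demo_file=$(mktemp)

# Hypothetical helper: print apparent size and disk usage, both in bytes
report_sizes() {
    apparent=$(du -b "$1" | cut -f1)     # -b: apparent size in bytes
    on_disk=$(du -B1 "$1" | cut -f1)     # -B1: allocated space, in units of 1 byte
    echo "apparent: $apparent bytes, disk usage: $on_disk bytes"
}

for bytes in 6 4096 4097; do
    # Write exactly $bytes zero bytes into the file
    head -c "$bytes" /dev/zero > "$demo_file"
    sync    # flush to disk so that the allocation is reported
    report_sizes "$demo_file"
done
```

On a filesystem with 4096-byte blocks, one would expect the disk usage to stay at 4096 for the first two sizes and to rise to 8192 for the 4097-byte file. Other filesystems may report different figures.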
To get the exact size of the file with the du command, one can use the command and option as follows:
Code:
$ du --apparent-size -sh file1
6 file1
This shows the same size as the ls output for the same file above.
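The same du options answer the original question about a folder and everything inside it. A minimal sketch, assuming GNU du and find; demo_dir and its files are hypothetical:

```shell
#!/bin/sh
# Hypothetical scratch directory holding two 6-byte files
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/file1"
printf 'hello\n' > "$demo_dir/file2"

ls -ld "$demo_dir"                    # size of the directory file only
du -sh "$demo_dir"                    # disk usage of the directory and its contents
du -sh --apparent-size "$demo_dir"    # total of the apparent sizes instead

# Apparent total in bytes; depending on the du version this may or may not
# include the size of the directory file itself
total_apparent=$(du -sb "$demo_dir" | cut -f1)

# Sum of the regular files' sizes only, in bytes
files_total=$(find "$demo_dir" -type f -printf '%s\n' | awk '{ s += $1 } END { print s }')
echo "du apparent total: $total_apparent bytes, files only: $files_total bytes"
```

For an existing folder, du -sh followed by the folder name is usually all that is needed; the -s option prints one summary line instead of one line per subdirectory.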
In relation to the expression "everything is a file", it has a metaphorical "truth" for Linux, as it originated as a description of the same aspect of the UNIX operating system, which pre-dates Linux. It is derived from the fact that everything in UNIX and Linux is treated as a file in the sense that files are just a stream of bytes with no special structure. The structure of such files is determined by the programs that use the files. Perhaps the expression "everything is treated as a file" is more accurate, since there are some elements in the system which are not really accessible as ordinary files, such as network interfaces and some low-level file types that keep track of disks, but they are still treated as files consisting of a stream of bytes. YMMV.