Rush: ft_tar

The main purpose of the rush is to re-create the GNU tar command. tar (short for tape archive) combines multiple files into a single archive file, which makes them easier to send to people. Unlike gzip, tar does not compress anything by itself; compression is a separate step.

Note: I’ve heard from others that if you did init, this will be easy. If you didn’t, you’re in the same boat as me, haha: struggling.

[Brief Overview]

Exercise 00: Archiving and Unarchiving
This is a substep of tar. Tar is able to combine files into a single file, and it is also able to read a .tar file to extract (recover) the files stored in it.

Exercise 01: Archiving and Unarchiving with Directories (recursively)
Some files may be in different folders. We have to take this into account!

Exercise 02: Managing permissions and the dates of the file archives
A step closer toward creating ft_tar! It’s time to look at flags! The flags we need to consider are xcvftp.

Exercise 03: Extract an archive generated by ft_tar using tar
If you create a .tar file using ft_tar, you should be able to untar it using the system’s tar command!

Exercise 04: Compressing files via ft_tar
Time to reduce the size of your files so they transfer faster over a network and take up less disk space. This should mimic tar’s compression option.

[Basic Understanding of Tar]

→ Creating a tar file (https://kb.iu.edu/d/acfi)

tar -cvf toto.tar file1 file2

-c = create a new archive file
-v = verbose (list each file as it is processed)
-f = use the archive file named next (here toto.tar) instead of the default tape device

→ To tar & compress the files:
tar -cvzf toto.tar.gz file1 file2

This combines all the files and compresses the result into a compressed archive file.

-z = uses the gzip command to compress the archive

Note: .tgz and .tar.gz are the same

Alternative tar & compression
(if your tar does not support the -z flag; the tar on our Mac computers does):

tar -cvf - file1 file2 | gzip > my_files.tar.gz

tar -cvf - file1 file2 | compress > my_files.tar.Z

Here "-f -" tells tar to write the archive to standard output so it can be piped.

→ Extracting archive file

tar -xvf toto.tar

tar -xvzf toto.tar.gz

Alternative extraction & decompression
(if your tar does not support the -z flag; the tar on our Mac computers does):

gunzip -c toto.tar.gz | tar -xvf - # for files compressed with gzip

uncompress -c toto.tar.Z | tar -xvf - # for files compressed with compress

[Simple Tar Program]

Obtain: (Link)
1) File Name
2) Length of File Name
3) File
4) File Size

Put it in the format:
FileNameLength + FileName + SizeOfFile + File (concatenated with no separators)

Example:
File1: xxx
File2: yyyyy

5File13xxx5File25yyyyy

You can additionally add the file type (file, link, directory) and permissions.
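The format above can be sketched in a few lines of C. This is only a rough illustration, not the rush's required layout: pack_entry is a hypothetical helper name, the lengths are written as plain decimal digits exactly as in the example, and the content is assumed to be text so the result can be printed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Appends one entry as <name length><name><content size><content> to out.
 * pack_entry is a hypothetical helper, not part of any library.
 * Returns the number of bytes written, or -1 if the entry does not fit.
 */
int pack_entry(char *out, size_t cap, const char *name,
               const char *content, size_t size)
{
    /* Write the name length, the name, and the content size as text. */
    int n = snprintf(out, cap, "%zu%s%zu", strlen(name), name, size);

    if (n < 0 || (size_t)n + size + 1 > cap)
        return -1;
    memcpy(out + n, content, size);   /* raw file content follows */
    out[n + size] = '\0';             /* terminate so it prints as a string */
    return n + (int)size;
}
```

Calling it once for File1 and once for File2 (writing the second entry right after the first) reproduces the example string. Note that decimal lengths without separators become ambiguous when a file name starts with a digit, which is one reason real tar uses fixed-width fields.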

[Finding the Size of Files]

**Alternatively, you can read the size from stat (scroll down for more info)**

https://stackoverflow.com/questions/238603/how-can-i-get-a-files-size-in-c

fseek(fptr, 0, SEEK_END); // seek to end of file
long int size = ftell(fptr); // get current file position (= number of bytes in the file)

To learn more about fseek: https://www.tutorialspoint.com/c_standard_library/c_function_fseek.htm

To learn more about ftell: https://www.tutorialspoint.com/c_standard_library/c_function_ftell.htm
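For context, here is a minimal sketch of the fseek/ftell approach wrapped in a function with error checks. The name file_size is my own choice, and returning -1 on failure is an assumption about how you might want to report errors.

```c
#include <stdio.h>

/*
 * Returns the size of the file at path in bytes, or -1 on error.
 * file_size is a hypothetical helper name.
 */
long file_size(const char *path)
{
    FILE *fptr = fopen(path, "rb");   /* binary mode: count raw bytes */
    long size = -1;

    if (fptr == NULL)
        return -1;
    if (fseek(fptr, 0, SEEK_END) == 0)   /* jump to the end of the file */
        size = ftell(fptr);              /* position at the end = size */
    fclose(fptr);
    return size;
}
```

This opens the file just to measure it, so stat is usually the lighter option when you also need the permissions and dates.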

[Understanding the Mac’s Tar Archive File Format]

http://www.onicos.com/staff/iz/formats/tar.html
https://www.gnu.org/software/tar/manual/html_node/Standard.html

test.c (has chmod 777)
12345

test1.c
67891

toto.tar
test.c000777 134063 122320 00000000005 13621744055 013557 0ustar00vinguyen2019_july000000 000000 12345test1.c000644 134063 122320 00000000005 13621744063 013630 0ustar00vinguyen2019_july000000 000000 67891

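test.c = file name

000777 = file mode in octal (the same permission bits as chmod; rwx for everyone, 777, is stored as 000777)

134063 = user ID (in octal)

122320 = group ID (in octal)

00000000005 = file size in bytes (in octal)

13621744055 = modify time (seconds since the Unix epoch, in octal)

013557 = header checksum (in octal)

0 = type flag (0 for a regular file, 1 for a hard link, 2 for a symbolic link…)

ustar = magic (allows tar to determine that the USTAR format is being used)

00 = version

vinguyen = owner user name

2019_july = owner group name

000000 = device major number

000000 = device minor number

Prefix (155 bytes, empty here), followed by 12 bytes of padding that fill the header up to 512 bytes

12345 = file content (padded with NUL bytes to a multiple of 512 bytes)

macOS uses the USTAR format for its tar.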

Offset   Length        Contents
0        100 bytes     File name ('\0' terminated, 99 maximum length)
100      8 bytes       File mode (in octal ascii)
108      8 bytes       User ID (in octal ascii)
116      8 bytes       Group ID (in octal ascii)
124      12 bytes      File size (s) (in octal ascii)
136      12 bytes      Modify time (in octal ascii)
148      8 bytes       Header checksum (in octal ascii)
156      1 byte        Link flag (or type flag)
157      100 bytes     Linkname ('\0' terminated, 99 maximum length)
257      8 bytes       Magic ("ustar\0") followed by version ("00")
265      32 bytes      User name ('\0' terminated, 31 maximum length)
297      32 bytes      Group name ('\0' terminated, 31 maximum length)
329      8 bytes       Major device ID (in octal ascii)
337      8 bytes       Minor device ID (in octal ascii)
345      167 bytes     Padding (in USTAR: a 155-byte name prefix plus 12 bytes of padding)
512      (s+p) bytes   File contents, (s+p) := ((s + 511) & ~511), i.e. the size s rounded up to a multiple of 512 bytes

struct posix_header
{                           /* byte offset */
    char name[100];         /*   0 */
    char mode[8];           /* 100 */
    char uid[8];            /* 108 */
    char gid[8];            /* 116 */
    char size[12];          /* 124 */
    char mtime[12];         /* 136 */
    char chksum[8];         /* 148 */
    char typeflag;          /* 156 */
    char linkname[100];     /* 157 */
    char magic[6];          /* 257 */
    char version[2];        /* 263 */
    char uname[32];         /* 265 */
    char gname[32];         /* 297 */
    char devmajor[8];       /* 329 */
    char devminor[8];       /* 337 */
    char prefix[155];       /* 345 */
                            /* 500 */
};
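Two small pieces of arithmetic come up again and again with this layout: writing a number into a field as octal ASCII, and rounding the content size up to a whole number of 512-byte blocks. The sketch below shows one way to do both. The names put_octal and padded_size are hypothetical, and filling the whole field with digits followed by a NUL is an assumption; the dump above shows that the Mac's tar ends some fields with a space instead.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Writes value into field as zero-padded octal digits followed by a NUL,
 * using exactly len bytes (for example len = 12 for size, 8 for mode).
 * put_octal is a hypothetical helper name.
 */
void put_octal(char *field, size_t len, unsigned long value)
{
    /* len - 1 digits, and snprintf adds the terminating NUL */
    snprintf(field, len, "%0*lo", (int)(len - 1), value);
}

/*
 * Rounds a content size s up to the next multiple of 512,
 * the same formula as in the table: (s + 511) & ~511.
 */
size_t padded_size(size_t s)
{
    return (s + 511) & ~(size_t)511;
}
```

For the 5-byte file in the example, put_octal on a 12-byte field gives 00000000005, and padded_size(5) is 512, so the content occupies one full block after the header.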

[Obtain Header Block Details via STAT() System Call]

https://linuxhint.com/stat-system-call-linux/

#include <sys/stat.h>
int stat(const char *path, struct stat *buf);

On success, stat returns 0. If there is an error, it returns -1.

const char *path = name of the file

struct stat *buf = pointer to a stat structure

Relevant Fields:
st_size = size of the file in bytes
st_mtime = modified time
st_uid = uid (user ID)
st_gid = gid (group ID)
st_mode = mode (file type and permissions)

Matching Needed Parameters:
name = argv[i]
mode = st_mode
uid = st_uid
gid = st_gid
size = st_size
mtime = st_mtime
*chksum = computed from the finished header (see 4 below)
typeflag = derived from st_mode using sys/stat.h (see 3 below)
linkname = the name of the file that the link points to (see 5 below)
*magic = "ustar"
*version = "00" (Link)
*uname = s_pwd->pw_name (see 1 below)
*gname = s_grp->gr_name (see 2 below)
devmajor = major(statObject.st_rdev) (only meaningful for device files)
devminor = minor(statObject.st_rdev) (only meaningful for device files)

*These do not come from the stat struct; they need another struct or a computation.
1) Password structure will be used (Link)

#include <pwd.h>
struct passwd *s_pwd = getpwuid(statObject.st_uid); // look up the owner by user ID (NULL if not found)
uname = s_pwd->pw_name;

2) Group structure will be used (Link)

#include <grp.h>
struct group *s_grp = getgrgid(statObject.st_gid); // look up the group by group ID (NULL if not found)
gname = s_grp->gr_name;
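The user and group lookups can be put together as in the sketch below. It is only an illustration under a few assumptions: owner_names is a hypothetical helper, the 32-byte buffers mirror the uname and gname header fields, and a missing passwd or group entry simply leaves the name empty.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <pwd.h>
#include <grp.h>
#include <string.h>

/*
 * Fills uname and gname (32 bytes each, like the header fields) with the
 * owner user name and group name of path. owner_names is a hypothetical
 * helper. Returns 0 on success, -1 if stat fails.
 */
int owner_names(const char *path, char uname[32], char gname[32])
{
    struct stat st;
    struct passwd *s_pwd;
    struct group *s_grp;

    memset(uname, 0, 32);             /* header fields are NUL-filled */
    memset(gname, 0, 32);
    if (stat(path, &st) == -1)
        return -1;
    s_pwd = getpwuid(st.st_uid);      /* NULL if the uid has no entry */
    s_grp = getgrgid(st.st_gid);      /* NULL if the gid has no entry */
    if (s_pwd != NULL)
        strncpy(uname, s_pwd->pw_name, 31);   /* keep the final NUL */
    if (s_grp != NULL)
        strncpy(gname, s_grp->gr_name, 31);
    return 0;
}
```

Both getpwuid and getgrgid return pointers to static storage, so the names are copied out right away instead of keeping the pointers around.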

3) Type flag Field in C for tar (Link)
From src/tar.h

You have to write a function that determines which type flag applies to each file. If the file is a directory, the name stored in the archive has to reflect that (tar stores directory names with a trailing slash).

From sys/stat.h
The function has to look at the mode of the file. The file's mode contains both the file type code and the access permission bits. More information can be found here: https://www.gnu.org/software/libc/manual/html_node/Testing-File-Type.html

4) Check Sum (chksum)
A checksum is a small value computed from a block of data. It is used to verify data integrity and to detect errors that may have been introduced during transmission or storage: if the checksum computed by the reader matches the stored one, the data is very likely unchanged.

The checksum is calculated by taking the sum of the unsigned byte values of the header record with the eight checksum bytes taken to be ascii spaces (decimal value 32). It is stored as a six digit octal number with leading zeroes followed by a NUL and then a space.

To compute the checksum of a file: https://people.cs.umu.se/isak/snippets/checksum.c
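The rule described above (sum every byte of the header, counting the eight checksum bytes as spaces) can be sketched like this. The function names are hypothetical, and the offsets 148 to 155 come from the header layout shown earlier.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Sums the 512 unsigned bytes of a header block, treating the eight
 * checksum bytes (offsets 148..155) as ASCII spaces.
 * header_checksum is a hypothetical helper name.
 */
unsigned int header_checksum(const unsigned char *header)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < 512; i++)
    {
        if (i >= 148 && i < 156)
            sum += ' ';           /* checksum field counts as spaces */
        else
            sum += header[i];
    }
    return sum;
}

/*
 * Stores the checksum as six octal digits, a NUL, and then a space.
 */
void store_checksum(unsigned char *header)
{
    unsigned int sum = header_checksum(header);

    snprintf((char *)header + 148, 7, "%06o", sum);  /* digits + NUL */
    header[155] = ' ';
}
```

Because the checksum field is always counted as spaces, the checksum has to be stored last, after every other field is filled in, and recomputing it on a finished header gives the same value again.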

5) LinkName
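Soft Link (symbolic link) = a shortcut that points to another file by name. When you delete the original, the shortcut points to nothing and has no purpose.

Hard Link = another name for the same file (same inode), not a copy. You can delete the original name and the content stays reachable through the other name.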

ln sourceFN linkFN

sourceFN
12345

linkFN
12345 (I didn’t need to add it since I hardlinked it to sourceFN)

For soft links, to get the name of what the link points to, use the command readlink fileName. This outputs the filename of what it is linked to.

For hard links, you have to get the inode with the ls -i flag. Once you have the inode, you can search for all the files that are hardlinked to it using find . -inum inodeValue

In this case, test5.c is the original file. test6.c was hardlinked to test5.c and test9.c was hardlinked to test6.c. They both have a linkname of test5.c
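Inside a C program, the readlink command has a matching system call with the same name. The sketch below shows how it might fill the linkname field for a soft link. symlink_target is a hypothetical helper, the 100-byte buffer mirrors the header field, and rejecting targets that do not fit is my own simplification.

```c
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

/*
 * Copies the target of the symbolic link at path into linkname
 * (100 bytes, like the header field). symlink_target is a hypothetical
 * helper. Returns 0 on success, -1 if path is not a symbolic link or
 * the target does not fit with a terminating NUL.
 */
int symlink_target(const char *path, char linkname[100])
{
    ssize_t n;

    memset(linkname, 0, 100);             /* header fields are NUL-filled */
    n = readlink(path, linkname, 100);    /* does not add a NUL itself */
    if (n < 0 || n >= 100)
        return -1;
    linkname[n] = '\0';
    return 0;
}
```

There is no equivalent single call for hard links. One possible approach (an assumption, not something the subject prescribes) is to remember the st_ino of every file already archived and, when the same inode shows up again, store the first name as the linkname.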

Using Stat:

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat sfile; // stat struct that the call fills in

    if (stat("stat.c", &sfile) == -1) // stat system call; -1 means it failed
    {
        perror("stat");
        return 1;
    }
    printf("st_mode = %o\n", (unsigned int)sfile.st_mode); // st_mode holds the file type and permission bits
    return 0;
}
