Bash Fundamentals

Need a refresher on bash fundamentals? I did too. An HTML-rendered version of the Jupyter Notebook follows, and the notebook can be accessed at https://github.com/tjnewton/snippets/blob/master/bash/bash_fundamentals.ipynb

My journey [re]learning bash, by Tyler Newton.

Purpose of notebook:

  1. reference for [re]learning the fundamentals of bash, the Bourne Again Shell, a command line interpreter.

This notebook is admittedly pretty meta, as running a bash shell from a jupyter notebook is roundabout, but I enjoy note-taking in notebooks so here we are in the land of redundancy. Content adapted from linux manual pages, the second chapter of C and Data Structures by J. Sventek, and personal trial and error.

In [ ]:
%%bash 
# %%bash is cell magic that runs the notebook cell with bash

# We have to start somewhere so let's begin with some random 
# and simple bash commands. bash is picky about whitespace in
# some cases so keep that in mind while scrolling. 

# print the date
date

# print the path to the current working directory
pwd

# list files in current working directory
ls

# date, pwd, and ls don't require additional information, or
# arguments. However, most commands require arguments, like echo.
# print 'testing' using echo. 
echo testing

# semicolons are command separators, notice how spaces are treated
echo 1; echo 2.0; echo       three; echo ' four'; echo five and six
# note how consecutive whitespace is ignored unless it is inside
# quotes, e.g. "" or ''.
In [ ]:
%%bash

################################################
# now let's back up. how does bash read input? #
################################################

# bash breaks up input by spaces (blanks), tabs, or punctuation (e.g. ;).
# the first word of the input denotes the program to execute, so echo is 
# the program being executed in "echo testing". The remaining contents
# of the input (before any ;) are supplied to the program as a list of 
# words. This is why leading or trailing whitespace is ignored by echo:
# it does not change the list of words. Items surrounded by quotes are 
# treated as a single word. For this same reason we can put quotes around
# characters that we want to be considered a single word, even if they
# contain a quote. For example:

echo "Howdy y'all!"

echo 'That snake just said "sssssss" (⊙_☉)'
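The word-splitting rule described above can be checked directly. The sketch below uses a hypothetical helper function, count_args (it is not part of bash), that prints how many words it received; `$#` holds the number of arguments.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper for illustration: it prints the
# number of words (arguments) that bash passed to it.
count_args() {
    echo "$#"
}

# unquoted: bash splits on the blanks, producing three words
unquoted=$(count_args one   two   three)

# quoted: everything inside the quotes is a single word
quoted=$(count_args "one   two   three")

# mixed quotes: each quoted item is one word, embedded quotes included
mixed=$(count_args "Howdy y'all" 'said "sss"')

echo "unquoted: $unquoted, quoted: $quoted, mixed: $mixed"
```

The program being run never sees the quotes or the extra blanks, only the final list of words.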
In [ ]:
%%bash

#########
# paths #
#########

# Using cd you can change the working directory to the path
# of your choice. You can reference folders in the current
# directory by typing the directory name after cd. ~ is a 
# shortcut to your home directory, so "cd ~" will change the
# working directory to your home directory. "cd ", cd followed
# by a space, will also change the working directory to your 
# home directory. . (a single period) is a shortcut for the 
# current working directory, and .. (two periods) is a shortcut
# for the parent directory of the working directory, or the
# folder that contains the folder you are currently working
# in. 

# list files in current directory
ls

# add a blank line
echo

# since . is the current directory, ls . prints the same thing
ls .

# add a blank line
echo

# what is in the parent directory?
ls ..

# add a blank line
echo

# what is in our home directory?
ls ~

# add a blank line
echo

# we can also check by changing the working directory to that
# directory
cd ~
ls
In [ ]:
%%bash

# You can chain these shortcuts together to access files in
# any directory on your machine.
# move to the parent directory
cd ..
# print the current working directory
pwd
# list the files in the current working directory
ls

# add a blank line
echo

# move to the parent directory of the current directory,
# which is the grandparent directory of the original directory
cd ..
# print the current working directory
pwd
# list the files in the current working directory
ls

# one more time
echo
cd ..
pwd
ls
In [ ]:
%%bash

###############################
# program options (arguments) #
###############################

# Short options start with a - and are usually a single letter,
# and long options start with -- and are written out in full
# (e.g. --name=value). 

# cd changes the working directory and .. specifies the parent
# directory of the current working directory (there are more 
# files in that folder for the examples below)
cd ..

# normal ls without options as seen above
ls

# add blank line under the output from above
echo  

# short option to list all files in current working directory
# (including files that start with ".")
ls -a

# add blank line under the output from above
echo  

# long option to list all files in current working directory (same as -a)
ls --all

# ! # ! # ! # ! # ! # ! # ! # ! # ! # ! # ! # ! #
#   The above long option won't work on macOS.  #
# Use VirtualBox and ArchLinux to explore bash. #
# ! # ! # ! # ! # ! # ! # ! # ! # ! # ! # ! # ! #
In [ ]:
%%bash

####################
# pattern matching #
####################

# bash also checks for *, ?, [, and ] when breaking up input.
# These characters are used for pattern matching.

# cd changes the working directory and .. specifies the parent
# directory of the current working directory (there are more 
# files in that folder for the examples below)
cd ..

# The wildcard character, *, matches any sequence of characters
# (including none). So the command below lists all files in the
# current directory that end in .md
ls *.md

# add blank line under the output from above
echo 

# The ? character matches any single character. The command below
# lists all files with a three character file extension.
ls *.???

# add blank line under the output from above
echo 

# The square brackets allow specification of a range or list of 
# characters to match. - denotes a range. The command below lists
# all files that start with an R or P followed by 5 characters and a .md
# extension. 
ls [RP]?????.md

# add blank line under the output from above
echo 

# The command below lists all files that start with any lowercase
# letter followed by 5 characters and a .md extension (there are none
# so this command will raise an error).
ls [a-z]?????.md
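The patterns above depend on which files happen to be in the directory, so here is a self-contained sketch that builds its own throwaway directory first. The file names are made up for illustration, and echo is used instead of ls simply to show what bash expanded each pattern into.

```shell
#!/usr/bin/env bash
# work in a throwaway directory so no real files are touched
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# create a few empty files with hypothetical names
touch README.md Plan01.md notes.md data.csv a.txt

# * matches any run of characters: every file ending in .md
md_files=$(echo *.md)

# R or P, then exactly five characters, then .md
rp_files=$(echo [RP]?????.md)

# any name ending in a dot followed by exactly three characters
ext3_files=$(echo *.???)

echo "md:   $md_files"
echo "RP:   $rp_files"
echo "ext3: $ext3_files"

# clean up
cd / && rm -r "$demo_dir"
```

Note that bash performs the expansion before the command runs, so the command only ever sees the resulting list of file names.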
In [ ]:
%%bash

######################
# manual (man) pages #
######################

# to find out more information about a command and its arguments
# use "man" followed by the command. Optional arguments are 
# surrounded by brackets [], vertical bars | separate choices,
# and ellipses ... indicate repetition. 
man ls
In [ ]:
%%bash

####################################
# listing files and their metadata #
####################################

# move to the parent directory where there are more files
cd ..

# the -l option of ls prints the number of blocks of disk
# space the listed files occupy, followed by information for
# each file consisting of: 
#   -first character indicates directory (d) or file (-)
#   -characters 2-4 indicate the read, write, and execute
#     permissions for the owner
#   -characters 5-7 indicate the read, write, and execute
#     permissions for the associated user group
#   -characters 8-10 indicate the read, write, and execute
#     permissions for everyone
#   -the next field indicates the number of links
#   -next are the owner of the file, and the user group
#   -the size of the file in bytes follows
#   -next the date and time of last modification
#   -followed by the name of the file
ls -l
In [ ]:
%%bash

# additionally, you can display information for only a 
# specific file
ls -l bash_fundamentals.ipynb
In [ ]:
%%bash

##################################
# inspecting short file contents #
##################################
# move to the parent directory where there are more files
cd ..

# if you don't need to edit a file, why open it in a text editor?
# cat prints the contents of each file argument to the terminal,
# and because of this it is ideal for short files
cat README.md
In [ ]:
%%bash

#################################
# inspecting long file contents #
#################################

# more allows paging through text one screenful at a time,
# however this doesn't work in a jupyter notebook because it
# allows the program a very large screen. Try it in terminal!
more bash_fundamentals.ipynb
In [ ]:
%%bash

######################################
# inspecting very long file contents #
######################################

# less allows paging through text one screenful at a time,
# backward movement, and optimal file loading for large files,
# however this doesn't work in a jupyter notebook because it
# allows the program a very large screen. Try it in terminal!
less bash_fundamentals.ipynb
In [ ]:
%%bash

#####################
# previewing a file #
#####################

# head displays the first 10 lines of a file by default,
# but you can change the number of lines displayed. 
head bash_fundamentals.ipynb
In [ ]:
%%bash

# and tail displays the last 10 lines of the file by default.
# In the same way as head, the number of lines can be specified,
# as in the example below.
tail -n 4 bash_fundamentals.ipynb
In [ ]:
%%bash

###################
# comparing files #
###################

# cmp and diff allow us to tell if files are different, and 
# how different files differ from one another. 

# first make a file using echo and some features that will be
# described in future sections
echo -e "file to move,\ncopy,\nedit,\nand delete" >move.file

# then make a copy of move.file so the contents can be changed
# for comparison
cp move.file move2.file

# then edit the file to make it different from move.file
In [ ]:
%%bash

# cmp compares the two files and prints the first difference
cmp move.file move2.file
In [ ]:
%%bash

# diff prints the line numbers that are different and shows 
# lines from the first file with a < prefix and lines from 
# the second file with the > prefix.
diff move.file move2.file

# remove that file
rm move2.file
In [ ]:
%%bash

#######################
# count file contents #
#######################

# wc displays the number of lines, words (sequences of
# non-whitespace characters separated from other words by 
# whitespace), and bytes, in that order.
wc bash_fundamentals.ipynb
In [ ]:
# RUN THIS CELL IN TERMINAL

########################
# translate characters #
########################

# tr replaces all occurrences of the characters in its first
# argument with the corresponding characters in its second
# argument. tr doesn't work interactively in a jupyter notebook
# because it reads standard input from the terminal and 
# writes to standard output, so try the following in your 
# terminal:
tr e a
# then type:
hello
# then "hallo" would be printed to the terminal
# then press "ctl-d" to exit (press the control or ^ key 
# and the d key at the same time).

# tr can also accept character classes. use "man tr" for more
tr '[:upper:]' '[:lower:]'
# then type:
Hey Folks
# then "hey folks" would be printed to the terminal
# then press "ctl-d" to exit

#####################
# delete characters #
#####################
# tr can also delete specified characters using the -d option.
# Run the following in your terminal
tr -d l
# then type:
hello
# then "heo" would be printed to the terminal
# then press "ctl-d" to exit
In [ ]:
%%bash

#########################
# inspecting file lines #
#########################

# uniq filters out repeated lines that are adjacent to each
# other, so sort the input first if repeats may be scattered
# through the file. adding the -c option also prints the number
# of times each line occurred. uniq is case sensitive. 
uniq -c move.file
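uniq only compares neighbouring lines, which is easy to miss. A minimal sketch, using a made-up three-line file in which the repeated line is not adjacent to its twin:

```shell
#!/usr/bin/env bash
# a hypothetical file in which "apple" appears twice, but not on
# adjacent lines
fruit_file=$(mktemp)
printf 'apple\nbanana\napple\n' >"$fruit_file"

# uniq alone only collapses adjacent repeats, so all three lines survive
uniq_only=$(uniq "$fruit_file" | wc -l | tr -d ' ')

# sorting first places the repeats next to each other
sorted_uniq=$(sort "$fruit_file" | uniq | wc -l | tr -d ' ')

echo "uniq alone: $uniq_only lines; sort then uniq: $sorted_uniq lines"

rm "$fruit_file"
```

The `tr -d ' '` step is only there because some versions of wc pad their output with blanks.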
In [ ]:
%%bash

######################
# sorting file lines #
######################

# sort returns the lines of a file in sorted order. The order
# depends on the locale; in the C locale (plain ASCII) it is
# blank, digit, uppercase letter, lowercase letter.
sort move.file
In [ ]:
%%bash

# we can reverse the sort order using the -r option
sort -r move.file
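Sort order depends on the locale, so results can differ between machines. The sketch below pins the locale with LC_ALL=C to get plain ASCII ordering, and also shows the -n option for comparing numbers by value; the sample input is made up.

```shell
#!/usr/bin/env bash
# LC_ALL=C asks sort to compare raw byte values (ASCII order):
# blank, then digits, then uppercase, then lowercase
c_order=$(printf 'b\nB\n2\n b\n' | LC_ALL=C sort | tr '\n' '|')
echo "$c_order"

# -n compares leading numbers by value rather than character by character
numeric=$(printf '10\n9\n100\n' | sort -n | tr '\n' ' ')
echo "$numeric"
```

Without -n, the line 10 would sort before 9 because the character 1 comes before the character 9.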
In [ ]:
%%bash

#########################
# pattern matching pt.2 #
#########################

# grep (globally search for a regular expression and print) is a
# command that searches the file arguments for lines that contain
# a match to the specified pattern. The below command looks for
# the string "to" (anywhere in a line, even inside a longer word)
# in the file move.file:
grep to move.file
In [ ]:
%%bash

# additionally, we can use grep to print all lines that do
# not contain the specified pattern
grep -v to move.file
In [ ]:
%%bash

# if you give grep multiple files to search it will prefix
# the result with the originating file
grep to move.file ../README.md
In [ ]:
%%bash

############################
# copying and moving files #
############################

# cp copies the specified file to the target file path. 
# Note that if the target file path already exists it will
# be overwritten. Be careful!
cp move.file move_copy.file

# mv moves the specified file to the target file path,
# and we can see from ls and cat that move_copy.file has
# been moved to moved.file and the former file named 
# move_copy.file no longer exists. Note that if the target
# file path already exists it will be overwritten.
mv move_copy.file moved.file

# list all files in the current working directory
ls -a

# view the contents of move_copy.file (it doesn't exist)
cat move_copy.file
In [ ]:
%%bash

##################
# removing files #
##################

# BE CAREFUL
# BE CAREFUL
# BE CAREFUL
# BE CAREFUL

# It is easy to delete all of the files on your hard drive
# using the rm command. Be careful. Use VirtualBox with
# ArchLinux to explore rm safely. 

# You can also alias the rm command to 'rm -i' to prompt at 
# deletion with the command alias rm='rm -i'
# Alternatively, you can create a folder named trash and 
# alias the rm command to move files into the trash folder,
# imitating the concept of a trash bin. To do that,
# use the command alias rm='mv -t ~/trash' (the -t option is
# specific to GNU mv, so this won't work with the mv on macOS).
# You will then need to unalias the rm command when deleting
# the contents of the trash folder. 

# rm removes the specified files
rm moved.file

# check that the file was removed
ls -a
In [ ]:
%%bash

####################
# create directory #
####################

# mkdir creates a new directory of the specified name
mkdir newDirectory

# check for the directory
ls

# delete the directory, alternatively use rm -d
rmdir newDirectory
In [ ]:
%%bash

##############################
# list environment variables #
##############################

# env lists your environment variables, which are just 
# specific cases of shell variables.
env
In [ ]:
%%bash 

# check the current value of USER using echo
echo $USER
In [ ]:
%%bash

# add environment variables by defining the variable first
ID=/usr/local/include

# check the variable
echo $ID

# add the variable to the environment
export ID

# add a blank line
echo

# print the environment variables and check for ID
env
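What export changes is whether child processes can see the variable. A minimal sketch, assuming the hypothetical variable names SHELL_ONLY, EXPORTED, and GREETING are not already set in your environment:

```shell
#!/usr/bin/env bash
# SHELL_ONLY and EXPORTED are hypothetical variable names
SHELL_ONLY=local
EXPORTED=shared
export EXPORTED

# a child bash process only inherits exported variables;
# ${NAME:-unset} prints "unset" when NAME is empty or undefined
child_view=$(bash -c 'echo "${SHELL_ONLY:-unset} ${EXPORTED:-unset}"')
echo "child sees: $child_view"

# a variable can also be placed in the environment of one command only
one_off=$(GREETING=hello bash -c 'echo "$GREETING"')
echo "one-off: $one_off"
```

The single quotes matter here: they stop the parent shell from expanding the variables before the child process starts.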
In [ ]:
%%bash

# PATH is a special environment variable consisting of a list
# of directory names separated by colons (:). bash searches
# for an executable file with the command name specified in 
# each entry of PATH from left to right until it finds a file
# or has searched all paths in PATH. 
echo $PATH
In [ ]:
%%bash

# which searches for the path associated with a program. In 
# the example below we are searching for the path corresponding
# to the ls program.
which ls
In [ ]:
%%bash

#########################
# editing PATH variable #
#########################

# bash searches PATH directories left to right. This is handy
# if you want to find a particular program before another
# program with the same name, because you can add the path
# containing the desired program in your PATH variable before
# the undesirable path. This cell demonstrates that with the 
# ls program. 

# make a personal bin directory in your home directory
mkdir ~/bin

# make a copy of ls in your personal bin directory
cp /bin/ls ~/bin

# Insert ~/bin at the front of the PATH variable by prepending
# "~/bin:" to the current value of PATH. $ is the syntax for
# reading the value stored in a variable
PATH=~/bin:$PATH
    
# what is the path of the ls program?
which ls

# remove the ls copy from ~/bin
rm ~/bin/ls

# add a blank line
echo

# what is the path of the ls program now?
which ls

# delete the directory, alternatively use rm -d
rmdir ~/bin
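The same left-to-right search can be tried without touching your home directory or copying ls. The sketch below creates a made-up program named hello in a temporary directory; `command -v` is a shell builtin that reports what a name resolves to, similar in spirit to which.

```shell
#!/usr/bin/env bash
# build a throwaway directory holding a hypothetical program named "hello"
demo_bin=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo_bin"\n' >"$demo_bin/hello"
chmod +x "$demo_bin/hello"

# remember the original PATH so it can be restored afterwards
old_path=$PATH

# put the demo directory first so it is searched before everything else
PATH="$demo_bin:$PATH"

# ask bash what "hello" resolves to, then run it by name
resolved=$(command -v hello)
greeting=$(hello)
echo "$resolved"
echo "$greeting"

# restore PATH and clean up
PATH=$old_path
rm -r "$demo_bin"
```

A change to PATH made this way only lasts for the current shell session (or notebook cell).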
In [ ]:
%%bash

##################
# standard input #
##################

# Standard input is the default means by which a program reads
# data, normally the keyboard. The shell is in charge of this
# input so it can redirect it to come from a file or another
# program instead. 

# We saw above that the program cat will print the text in a
# file. Instead of specifying a file, type only the command
# "cat" to read from standard input (the keyboard). As mentioned
# above, reading from standard input doesn't work in a .ipynb
# so try this in your terminal. 
cat
# type something
# that something will be echoed back to the terminal window
# press ctl+d to indicate end of input
In [ ]:
%%bash

# a file can be supplied as a command's standard input using the < symbol

# let's check the result of cat move.file
cat move.file

# So that's what it looks like when cat checks a file, and 
# the cell above illustrated the cat program's ability to take
# standard input. Now let's feed move.file to the cat program
# using standard input and <.
# first print a blank line
echo
cat <move.file

# they are the same. The cat program doesn't know the difference.
# This is why < is powerful in bash. Stay tuned!
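cat prints the same thing either way, but some programs do reveal whether they were handed a file name or only standard input. A small sketch with wc, using a made-up three-line temporary file:

```shell
#!/usr/bin/env bash
# a hypothetical three-line file
lines_file=$(mktemp)
printf 'one\ntwo\nthree\n' >"$lines_file"

# given a file argument, wc knows the name and prints it after the count
with_name=$(wc -l "$lines_file")

# given only standard input, wc has no name to print
from_stdin=$(wc -l <"$lines_file" | tr -d ' ')

echo "argument: $with_name"
echo "stdin:    $from_stdin"

rm "$lines_file"
```

With <, it is bash that opens the file; the program just reads whatever arrives on its standard input.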
In [ ]:
%%bash

###################
# standard output #
###################

# Standard output is the default channel to which a program
# writes results, usually the terminal window (or a jupyter 
# notebook window). Output can be redirected similar to how
# input can be redirected, but instead using > to write to 
# a file, and >> to append to the end of a file. Additionally,
# commands can be grouped to act as a single command, shown
# below.

# list all files in current working directory and their metadata,
# then print a blank line, then print all files in the parent 
# folder and their metadata, then write to the file tmp.out. 
(ls -l; echo ; ls -l ..) >tmp.out

# print the contents of tmp.out
cat tmp.out

# delete tmp.out
rm tmp.out
In [ ]:
%%bash

#########################
# standard error output #
#########################

# Standard error output is the default means by which a program
# writes error messages, normally the terminal window. Standard
# error is redirected very similarly to standard output, but 
# instead using 2> to write to a file, and 2>> to append to the
# end of a file. 

# here the error message goes to standard error, which appears in
# the terminal (or notebook) by default. The error occurs because
# there is a typo in the filename
cat moove.file

# here the error is directed to the file tmp.err
cat moove.file 2>tmp.err

# try another file that doesn't exist, append error to tmp.err
cat mooove.file 2>>tmp.err

# view contents of tmp.err
cat tmp.err

# delete the file
rm tmp.err

# note that in the same way 2> means write standard error 
# output to a file, 1> means write standard output to a file,
# so we could have used 1> and 1>> in the previous cell instead
# of > and >>.
In [ ]:
%%bash

# To redirect standard error to the standard output file, use
# the 2>&1 syntax, like below. The syntax means to redirect 
# standard error output to the same stream as standard output.
cat move.file mooove.file >tmp.out 2>&1

# view the file
cat tmp.out

# delete the file
rm tmp.out
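The order of the redirections matters, because bash applies them from left to right. The sketch below uses a hypothetical helper, both_streams, that writes one line to each stream, and compares the two orderings.

```shell
#!/usr/bin/env bash
out_file=$(mktemp)

# hypothetical helper that writes one line to each stream
both_streams() {
    echo "to stdout"
    echo "to stderr" >&2
}

# >file first, then 2>&1: both streams end up in the file
both_streams >"$out_file" 2>&1
lines_both=$(wc -l <"$out_file" | tr -d ' ')

# 2>&1 first, then >file: standard error is pointed at the old standard
# output (captured here by the command substitution), and only standard
# output goes to the file
leaked=$(both_streams 2>&1 >"$out_file")
lines_stdout_only=$(wc -l <"$out_file" | tr -d ' ')

echo "file lines with >file 2>&1: $lines_both"
echo "file lines with 2>&1 >file: $lines_stdout_only (leaked: $leaked)"

rm "$out_file"
```

In other words, 2>&1 means "wherever standard output points right now", not "wherever it ends up".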
In [ ]:
%%bash

#############
# pipelines #
#############

# The power of bash is in its ability to manage multiple 
# processes while allowing the user to specify how these 
# processes talk to each other, without the processes being
# aware of this communication. This is done with pipelines. 
# To count the number of files in a directory we can pair ls,
# wc, and a pipeline (often referred to as a pipe, denoted by
# | ).

# ls . lists the contents of the current working directory,
# then the pipe, or |, feeds the contents of the output from ls
# to wc, the same as if you were to specify a file containing 
# the output from ls . as the input for wc. The -w option tells
# wc to print only the number of words, thus telling us how many
# files are present (as long as no file name contains whitespace). 
ls . | wc -w
In [ ]:
%%bash

# pipelines only consider standard input and output by default,
# not standard error output. The 2>&1 syntax works for redirecting
# error messages with output. 2>&1 denotes that the standard
# error output should be directed to the same place as the 
# standard output (the pipe). 
cat move.file muve.file 2>&1 | more
In [ ]:
%%bash

####################################
# bash's order of operations & tee #
####################################

# bash first searches for pipe | symbols and redirects the left
# process's standard output to the pipe and reads the right 
# process's standard input from the redirected output on the 
# left side of the pipe. Then bash processes any other input or
# output (I/O) redirection for each command. 

# Pipes are great for redirecting input, but if you want the
# output and errors to be printed to the terminal as well as
# be logged to a file, tee is a convenient command, as it copies
# its standard input to its standard output and to zero or more
# files named as arguments. Thus the below command prints
# to standard output and the file "log". Errors from wc would
# also have gone to both places (errors from ls would instead be
# fed to wc and counted as words), but there are no errors.
ls . 2>&1 | wc -w 2>&1 | tee log

# add a blank line
echo

# check the contents of log
cat log

# delete log
rm log
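A smaller tee sketch that can be run anywhere, using a temporary file in place of "log". The -a option, which appends rather than overwrites, is a standard tee option.

```shell
#!/usr/bin/env bash
log_file=$(mktemp)

# tee writes its standard input to standard output and to the named file
shown=$(printf 'alpha\nbeta\n' | tee "$log_file" | wc -l | tr -d ' ')

# -a appends to the file instead of overwriting it
echo gamma | tee -a "$log_file" >/dev/null

logged=$(wc -l <"$log_file" | tr -d ' ')
echo "lines passed along: $shown, lines in log: $logged"

rm "$log_file"
```

Because tee passes its input along unchanged, it can sit in the middle of a pipeline as well as at the end.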
In [ ]:
%%bash 

# another pipe example

# To determine the frequency of words in a file we can utilize 
# tr to break up the file into one word on each line using the
# command tr -s '[:blank:]' '\n' then we can use sort on the 
# output of tr, followed by using uniq on the output of sort 
# with the -c flag to count the frequency of each word.
tr -s '[:blank:]' '\n' <../README.md | sort | uniq -c
In [ ]:
%%bash

# If a numeric sort (sort -n) is added to the command pipeline
# above, the results will be sorted by word frequency.
tr -s '[:blank:]' '\n' <../README.md | sort | uniq -c | sort -n
In [ ]:
%%bash

#################
# shell scripts #
#################

# Jupyter notebook cells are extremely similar to shell scripts
# in that they run a block containing bash commands. Shell
# scripts are files that contain shell (bash in this case)
# commands.

# save pwd and ls -l commands to a shell script
echo 'pwd; ls -l' >dirinfo

# run the file
bash dirinfo
In [ ]:
%%bash

# To change the permissions associated with a file, use the
# chmod command. The +x option adds executable permissions
# to the file. 

# give dirinfo executable permissions so we can invoke it by name
chmod +x dirinfo

# run dirinfo directly without the "bash " prefix from terminal
./dirinfo
In [ ]:
%%bash

# to run the file without the "./" prefix, move the file to 
# the ~/bin directory and add the ~/bin directory to the 
# PATH variable. Then you can invoke the program with "dirinfo"

# delete the file used in the previous cell
rm dirinfo
In [ ]:
%%bash

##########################
# shell script arguments #
##########################

# bash uses the $ syntax to indicate arguments in a script. 

# first make a script (as above) that accepts one argument, 
# denoted $1. $1 specifies that bash should replace $1 with 
# the first argument specified after the command dirinfo. The
# first nine arguments are available as $1 through $9; later
# ones need braces, e.g. ${10}.
echo 'pwd; ls $1' >dirinfo

# run the file
bash dirinfo -l

# print a blank line
echo

# if no argument is supplied, $1 is replaced with an empty string
bash dirinfo

# delete the file
rm dirinfo
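Positional parameters behave the same way inside shell functions, which makes them easy to experiment with. In the sketch below, show_args and pitfall are hypothetical function names; `$#` is the number of arguments received.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical function; functions receive positional
# parameters the same way scripts do
show_args() {
    # $# is the number of arguments, and arguments past the ninth
    # need braces, e.g. ${10}
    echo "count=$# first=$1 tenth=${10}"
}

summary=$(show_args a b c d e f g h i j)
echo "$summary"

# without braces, $10 is read as $1 followed by a literal 0
pitfall() {
    echo "$10"
}
surprise=$(pitfall a b c d e f g h i j)
echo "$surprise"
```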
In [ ]:
%%bash

############################
# for, do, done statements #
############################

# The for-do-done flow control allows commands to be executed
# in sequence for each of a number of values.

# print the loop iteration. On each iteration of the loop, 
# the variable "index" takes the next value, starting at 1 and
# ending at 4.
for index in 1 2 3 4; do
    echo Iteration number $index
done
In [ ]:
%%bash

#########################
# arguments from output #
#########################

# To use line items in a file as items in a for loop, bash 
# allows construction of a single string containing all lines
# from a specified input with all newline characters (\n)
# replaced by spaces. There are two different syntaxes that
# accomplish this, with the latter being preferred.

# make a file with a different word on each line
echo -e "word1\nword2\nword3" > wordlist

# display the contents of wordlist
cat wordlist

# make a blank line
echo

# first syntax
echo `cat wordlist`

# make a blank line
echo

# preferred second syntax
echo $(cat wordlist)
In [ ]:
%%bash

# this can be used in conjunction with for loops
for word in $(cat wordlist); do
    echo 'The word is' $word
done

# delete the file
rm wordlist
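One caveat with looping over $(cat file): the output is split on every blank, not only on newlines. When lines may contain spaces, a while loop with the read builtin is a common alternative. A sketch using a made-up two-line file:

```shell
#!/usr/bin/env bash
# a hypothetical file whose second line contains a space
phrase_file=$(mktemp)
printf 'word1\ntwo words\n' >"$phrase_file"

# $(cat ...) is split on every blank, so this loop runs three times
for_count=0
for item in $(cat "$phrase_file"); do
    for_count=$((for_count + 1))
done

# read consumes one whole line per iteration, so this loop runs twice
read_count=0
while IFS= read -r line; do
    read_count=$((read_count + 1))
done <"$phrase_file"

echo "for loop: $for_count items; while read loop: $read_count lines"

rm "$phrase_file"
```

Here `IFS=` keeps leading and trailing blanks in each line, and -r stops read from treating backslashes specially.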
In [ ]:
%%bash

########################
# case-esac statements #
########################

# case statements allow different actions to be taken based on
# the value of a variable. The value specified after case is 
# compared with the values listed below it to the left of the 
# ). When a match is found, all commands to the right of ") "
# are executed until ;; is found, indicating the end of commands.
for index in 1 2 3 4 5 6 7 8 9; do
    x="$index-th"
    case $index in
        1) x="1-st";;
        2) x="2-nd";;
        3) x="3-rd";;
    esac
    echo $x iteration
done
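The values to the left of each ) are patterns, so they can use the same wildcards as file name matching, and | separates alternatives. The sketch below extends the idea above with a hypothetical helper, ordinal, that also handles numbers such as 11 and 22.

```shell
#!/usr/bin/env bash
# ordinal is a hypothetical helper that appends an English suffix
ordinal() {
    case $1 in
        # | separates alternative patterns
        11|12|13) echo "${1}th";;
        # * inside a pattern is a wildcard: anything ending in 1, 2, or 3
        *1) echo "${1}st";;
        *2) echo "${1}nd";;
        *3) echo "${1}rd";;
        # a bare * matches anything not handled above
        *) echo "${1}th";;
    esac
}

result=$(for n in 1 2 3 4 11 22; do ordinal "$n"; done | tr '\n' ' ')
echo "$result"
```

Patterns are tried from top to bottom and only the first match runs, which is why 11, 12, and 13 are listed before the wildcard patterns.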
In [ ]:
%%bash

#################################
# if, then, else, fi statements #
#################################

# bash allows if-then-else statements
for index in 1 2 3 4; do
    if [[ $index -eq 1 ]]; then
        echo first line
    else
        echo 'not first line'
    fi
done
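Besides numeric comparisons such as -eq, the [[ ]] construct can test files: -d is true for a directory and -f for a regular file. A sketch with a hypothetical helper, describe, run against a throwaway directory:

```shell
#!/usr/bin/env bash
# describe is a hypothetical helper that classifies a path
describe() {
    if [[ -d $1 ]]; then
        echo directory
    elif [[ -f $1 ]]; then
        echo file
    else
        echo missing
    fi
}

probe_dir=$(mktemp -d)
touch "$probe_dir/note.txt"

kinds="$(describe "$probe_dir") $(describe "$probe_dir/note.txt") $(describe "$probe_dir/nope")"
echo "$kinds"

rm -r "$probe_dir"
```

elif chains a second test onto the same statement, so only one fi is needed.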
In [ ]:
%%bash

########
# find #
########

# find walks a directory tree and applies the specified tests and
# actions to every file in it. The below command prints the names
# of all objects in the tree rooted at the current directory. 
find . -print
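find becomes more useful once tests are added to narrow down what it reports. The sketch below builds a small made-up tree and uses the standard -name and -type tests.

```shell
#!/usr/bin/env bash
# build a small hypothetical tree to search
tree_dir=$(mktemp -d)
mkdir -p "$tree_dir/sub/deeper"
touch "$tree_dir/a.md" "$tree_dir/sub/b.md" "$tree_dir/sub/deeper/c.txt"

# -name takes a quoted pattern so that find, not bash, expands the *
md_count=$(find "$tree_dir" -name '*.md' | wc -l | tr -d ' ')

# -type d restricts the matches to directories (the root counts too)
dir_count=$(find "$tree_dir" -type d | wc -l | tr -d ' ')

echo "markdown files: $md_count, directories: $dir_count"

rm -r "$tree_dir"
```

Unlike a bare *.md given to ls, find also descends into subdirectories.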
In [ ]:
%%bash

#######
# tar #
#######

# tar (tape archive) packages many files into a single file.
# The -c option creates an archive, -v writes the name of each
# file as it is added to the archive, and -f specifies the
# filename to write to. 

# change to parent directory
cd ..

# let's tar the grids directory
tar -cvf grids.tar grids

# move that file into the bash folder
mv grids.tar ./bash/grids.tar
In [ ]:
%%bash

# we can also check the contents of a tar archive using the -t
# option to display the table of contents
tar -tf grids.tar
In [ ]:
%%bash

# and even further information about the archive contents can
# be displayed, similar to ls -l
tar -tvf grids.tar grids/resampling_grids.ipynb
In [ ]:
%%bash

# to extract files from the tar archive use -xf. tar will
# extract into the current working directory by default; use
# the -C flag to specify a different path

# extract grids.tar to the current working directory
tar -xf grids.tar -C .

# list the files in grids
ls grids

# delete the grids folder
rm -r grids
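For experimenting without the grids directory, here is a self-contained round trip. All paths are made up and live in a temporary directory; the sketch assumes a tar that accepts -C, which both GNU tar and the tar shipped with macOS do.

```shell
#!/usr/bin/env bash
# all paths below are hypothetical and live in a throwaway directory
work_dir=$(mktemp -d)
mkdir "$work_dir/src" "$work_dir/dest"
echo "sample" >"$work_dir/src/sample.txt"

# -C changes directory before archiving, so the archive stores the
# relative path src/sample.txt rather than the full temporary path
tar -cf "$work_dir/demo.tar" -C "$work_dir" src

# without -C this would extract into the current working directory;
# with -C the files land in dest instead
tar -xf "$work_dir/demo.tar" -C "$work_dir/dest"

restored=$(cat "$work_dir/dest/src/sample.txt")
listing=$(tar -tf "$work_dir/demo.tar" | tr '\n' ' ')
echo "restored: $restored"
echo "archive members: $listing"

rm -r "$work_dir"
```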
In [ ]:
%%bash

# one can also extract a file to standard output using tar
tar -xOf grids.tar grids/resampling_grids.ipynb >notebookcopy.ipynb

# preview contents of notebookcopy.ipynb
head notebookcopy.ipynb

# delete files
rm notebookcopy.ipynb grids.tar
In [ ]:
%%bash

###############
# compression #
###############

# gzip encodes files using Lempel-Ziv coding
# gunzip decodes an encoded file into the original file
# zcat decodes an encoded file into the original file on standard
# output

# by default gzip replaces the file with the .gz compressed file
# so let's make a copy of a file first
cp move.file moved.file

# compress it
gzip moved.file

# check the files present
ls
In [ ]:
%%bash

# gunzip replaces the compressed .gz file with the unencoded file
gunzip moved.file.gz

# check the files present
ls
In [ ]:
%%bash

# zcat decodes each file with a .gz extension and writes that
# content to standard output

# first let's make a gz file
gzip moved.file

# zcat syntax for linux
# zcat moved.file.gz

# zcat syntax for macos
zcat <moved.file.gz

# delete the file
rm moved.file.gz move.file
In [ ]:
%%bash

#######################
# compressed archives #
#######################

# Combining tar and gzip for compressed archives seems like
# the next logical step, but gzip and gunzip both delete the
# starting file and store both compressed and uncompressed 
# versions of the archive during processing. This isn't ideal.
# tar's -z option solves this and compresses or decompresses
# archives during processing. Let's make a compressed archive
# of the same directory as above. 

# change to parent directory
cd ..

# tar compresses the grids directory, .tgz is the extension
# used for gzipped tar archives
tar -zcf grids.tgz grids

# and -t lists the contents of the compressed archive
tar -ztf grids.tgz

# to decompress an archive into the current working directory 
# in a verbose manner use:
tar -xzvf grids.tgz

# remove the tarchive
rm grids.tgz
In [ ]:
%%bash

#######
# zip #
#######

# zip archives and compresses files, much like tar with -z. 

# change to parent directory
cd ..

# zip the grids folder; -r recurses into the directory so that
# its contents are included
zip -r grids.zip grids

# unzip the zipped file, -l lists the contents of the zipped file
unzip -l grids.zip

# remove the zip file
rm grids.zip

# unzip has a -p option that extracts a member of the 
# archive to standard output. Files are extracted into the 
# current working directory by default. The -o option overwrites
# existing files without a prompt.
In [ ]:
%%bash

###############################
# compression with a manifest #
###############################

# a tgz archive can be created from a list of files, e.g.
# tar -zcvf new_archive.tgz file1 file2 file3 file4 file5
# Instead, a file manifest can be generated and supplied to
# tar. 

# the command touch creates files. We can combine this with
# a loop to generate some temporary files
for index in 1 2 3 4 5; do
    touch file$index
done

# generate a file with the list of files
echo -e "file1\nfile2\nfile3\nfile4\nfile5" >manifest

# inspect manifest
cat manifest

# make a tgz file from the manifest
tar -zcvf new_archive.tgz $(cat manifest)

# print blank line
echo

# inspect elements of tgz file
tar -ztf new_archive.tgz

# remove the files
rm new_archive.tgz manifest file1 file2 file3 file4 file5
In [ ]:
# as a file argument to many commands, a hyphen - denotes standard
# input (N/A in a jupyter notebook)

# Keyboard commands
# pressing control+d will end standard input
# pressing control+c will stop a running command
# pressing control+u will erase the current line in a bash terminal