Linux/Unix Commands
Commands are how you interact with a Linux or Unix operating system: each command passes an instruction to the OS, which performs the operation and returns the result. Linux is one of the most popular and powerful operating systems and is used by many organizations. Developers and administrators use Linux commands and shell scripts for day-to-day work, running commands in sequence as the task requires. Shell scripts, cron jobs and other programs can automate repetitive daily tasks. Shell scripts are also used to interact programmatically with databases such as Oracle, SQL Server and MySQL, and with REST APIs.
A shell script is a set of Linux commands, written in a particular order and combined with programming constructs, that automates a recurring task. Shell scripts can be invoked directly from the command line or scheduled through cron jobs. In a nutshell, learning at least the basic Linux commands is essential for starting a Data Engineering career. We are going to cover at least the top 50 common Linux commands, which are very useful and easy to learn.
Linux file commands
- ls : List directory contents
- cd : Change directory
- pwd : Print working directory
- mkdir : Create a new directory
- rmdir : Remove an empty directory
- touch : Create an empty file or update file timestamps
- rm : Remove files and directories
- cp : Copy files and directories
- mv : Move or rename files and directories
- chmod : Change file permissions
- cat : Displays the contents of a file
- head : Displays the first ten lines of a file
- tail : Displays the last ten lines of a file
- find : Search for files and directories
- vi : Editor to open a file in edit mode: vi <filename>. To save and quit, press Esc and type :wq. To quit without saving, press Esc and type :q!
- sort : Sort lines of text
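- uniq : Display unique lines by removing adjacent duplicate lines, so the input is usually sorted first
- tee : Read standard input and write it to both standard output and one or more files
- tr : Translate or delete characters read from standard input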
- diff : Compare two files or directories
- wc : Count words, lines, and characters in a file
- ln : Create links between files
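Several of the file commands above are usually chained together in a pipeline. The sketch below assumes a made-up file, cities.txt, created in a temporary directory purely for illustration. It shows why uniq is normally paired with sort, and how tr, tee and wc fit into the same pipeline.

```shell
# Work in a throwaway directory so nothing real is touched
work=$(mktemp -d)
cd "$work"

# Hypothetical sample data: duplicates exist, but none are adjacent
printf '%s\n' pune delhi pune mumbai delhi pune > cities.txt

# uniq alone only removes ADJACENT duplicates, so nothing is removed here
unsorted_unique=$(uniq cities.txt | wc -l | tr -d ' ')

# sorting first puts duplicates next to each other, so uniq can drop them
sorted_unique=$(sort cities.txt | uniq | wc -l | tr -d ' ')

# count occurrences of each value, most frequent first
sort cities.txt | uniq -c | sort -rn

# upper-case the values, de-duplicate, show them and save them to upper.txt
tr 'a-z' 'A-Z' < cities.txt | sort -u | tee upper.txt

echo "uniq only: $unsorted_unique lines, sort | uniq: $sorted_unique lines"
```

tee is useful here because the result is both printed on the screen and kept in a file in a single step.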
Linux System Information Commands
- top : Display system resource usage and running processes
- ps : Show information about running processes
- uptime : Show system uptime
- df : Show disk space usage
- du : Show disk usage of files and directories
- ifconfig : Show network interface configuration
- netstat : Show network statistics and connections
- ping : Test network connectivity
Linux networking commands
- ifconfig : Configure and display network interfaces
- route : Display or modify the routing table
- netstat : Display network connections and statistics
- ping : Test network connectivity
- traceroute : Display the path packets take to reach a destination
- nslookup : Query DNS to look up the IP address of a host name, or the host name of an IP address
- telnet : Open a telnet connection to another system: telnet <host name>
- nc -z -v <ip address> <port> : Check whether a port is reachable. If the port is open and the firewall allows it, the connection succeeds; otherwise it fails
- ssh : Connect to a remote server securely
- scp : Copy files securely between servers
- sftp : Securely transfer files between servers
- wget : Download files from the web
- curl : Transfer data from or to a server
- netcat : Network connection tool for sending and receiving data
- tcpdump : Network packet analyzer
Linux Compression Commands
- gzip : Compress files using the gzip algorithm
- gunzip : Decompress files compressed with gzip
- tar : Create and extract archive files
- zip : Compress files using the zip algorithm
- unzip : Decompress files compressed with zip
Linux Process Management
- kill : Send a signal to a process
- pkill : Send a signal to a process by name
- ps : Show information about running processes
- top : Display system resource usage and running processes
- nice : Run a command with modified scheduling priority
- renice : Change the scheduling priority of a running process
- bg : Put a process in the background
- fg : Bring a process to the foreground
Linux Text Processing Commands
- awk : Process text files and generate reports
- sed : Stream editor for modifying text
- grep : Search for patterns in text files
- cut : Extract columns from text files
- paste : Merge lines from multiple text files
Miscellaneous linux commands
- echo : Print a message to the screen
- date : Display or set the system date and time
- cal : Display a calendar
- uname : Print system information
- history : Show command history
Now let's dive deeper into a few Linux commands that a data engineer needs frequently in day-to-day work.
Linux ls command options
The “ls” command is used in Linux and Unix operating systems to list the files and directories in a given directory. Some commonly used “ls” options are listed below:
- ls -l : This command shows the permissions, owner, group, size, and modification time of each file and directory while displaying them in long format.
- ls -a : This command lists all files and directories, including hidden files that start with a dot (“.”) in the current directory.
- ls -lh : This command displays the long format with file sizes shown in human-readable units such as K, M, or G.
- ls -t : This command displays a list of files and directories ordered by the date of last modification, starting with the most recent ones.
- ls -lrt : This command displays a list of files and directories in long format sorted by modification time, with the oldest entries first and the most recent ones last. The “l” option presents the files and folders in long format; the “t” option sorts them by the date and time of last modification, newest first; and the “r” option reverses that order.
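The options above can be tried safely in a scratch directory. This is a small sketch with made-up file names (old.txt, new.txt, .hidden, reports); the timestamps are set by hand with touch -t so that the sort order is predictable.

```shell
# Scratch directory with a few hypothetical entries
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch old.txt new.txt .hidden
mkdir reports

# Back-date two entries so the time ordering is unambiguous
touch -t 202001010000 old.txt
touch -t 202101010000 reports

ls -l     # long format: permissions, owner, group, size, modification time
ls -a     # also lists .hidden, plus the . and .. entries
ls -lh    # long format with human-readable sizes
ls -t     # newest first
ls -lrt   # long format, oldest first

# Capture the first entry of each ordering to compare them
newest_first=$(ls -t | head -n 1)
oldest_first=$(ls -rt | head -n 1)
echo "ls -t starts with $newest_first, ls -rt starts with $oldest_first"
```

ls -lrt is handy in directories that fill up with logs or data files, because the most recently changed entries end up at the bottom of the listing, right above the prompt.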
Linux cd command options
The “cd” command changes the current working directory. Below are a few examples:
To move into a directory several levels down, give its path, for example “cd /home/user/Directory_examples”. You are now inside the Directory_examples directory. To go back up, use “cd ../../”. Each double dot followed by a forward slash moves up one directory, so “../../” moves up two levels. To return directly to the previous working directory, use “cd -”.
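The navigation steps described above can be sketched as follows. To keep the example harmless it recreates the home/user/Directory_examples path under a temporary directory instead of using the real /home; that temporary base is an assumption of this sketch.

```shell
# Build a hypothetical directory tree under a temporary base
base=$(mktemp -d)
mkdir -p "$base/home/user/Directory_examples"

# Go several levels down in one step
cd "$base/home/user/Directory_examples"
start=$(pwd)

# ../../ moves up two levels: Directory_examples -> user -> home
cd ../../
two_up=$(pwd)

# cd - returns to the previous working directory (it also prints the path)
cd - > /dev/null
back=$(pwd)

echo "started in $start"
echo "after cd ../../ : $two_up"
echo "after cd -      : $back"
```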
Linux Commands to create a file
There are multiple ways to create a file in Linux. Below are the ways a developer can create a file based on the requirement :
- touch <filename> : Creates an empty zero-byte file in the current directory. If the file already exists, only its timestamps are updated. Type touch <filename> and press Enter.
- Using the vi or vim editor : Text editors such as vim and nano can create and edit a file. Run vi <filename>, press “i” to enter insert mode, write the contents, press “Esc” to leave insert mode, and type “:wq” to save and exit.
- Using the cat command : cat normally displays the contents of a file on the screen, but it can also create a file with “cat > test_file.txt”. Type the content, then press “Ctrl + d” on a new line to save and exit.
The main difference between vi and cat when creating or editing a file is that in vi you can discard your changes: press “Esc” and type “:q!” to exit without saving. With “cat > test_file.txt” there is no such option. The shell empties the file as soon as the command starts, so any old content is already gone, and whatever you type before pressing “Ctrl + d” becomes the new content.
- Using the echo command : echo prints a message on the screen, but the message can also be redirected to a file, for example “echo hello > test_file.txt”. Use “>>” instead of “>” to append to the file rather than overwrite it.
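The non-interactive ways of creating a file can be tried with the sketch below. The file names are made up for illustration, and a here-document stands in for typing the content and pressing “Ctrl + d”, so the example can run unattended.

```shell
work=$(mktemp -d)
cd "$work"

# touch: creates an empty, zero-byte file
touch empty.txt

# cat with redirection: the here-document plays the role of typed input
cat > notes.txt <<'EOF'
first line
second line
EOF

# echo with > creates (or overwrites) the file; >> appends to it
echo "hello" > greeting.txt
echo "again" >> greeting.txt

# Redirecting with > into an existing file discards its old content
echo "replaced" > notes.txt

ls -l
cat greeting.txt
cat notes.txt
```

Because “>” silently overwrites, “>>” is the safer choice when adding to a file whose existing content matters.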
Cut command in Linux with examples
The cut command extracts characters or fields from each line of its input. It can extract a single column, several columns, or a range of columns from each line of a file. The syntax of the cut command is “cut [options] [filename]”,
where “filename” is the name of the file to be processed. Without a filename, “cut” reads standard input. Below are two frequently used “cut” options:
- The option (-f) indicates the fields (columns) that are to be extracted.
- The field delimiter is specified by the -d option.
Example:
Let’s say a file called “test_file.txt” holds delimited rows, one record per line, and that its second column is “location”.
To extract the “location” column, pass the file's delimiter with -d and select the second field with -f2. If -d is omitted, cut assumes the fields are separated by tabs.
The “cut” command can also extract several columns at once. For instance, -f1,3 selects the first and third columns of test_file.txt, and -f1-3 selects the range from the first to the third.
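Here is a worked sketch of the cut example. The contents of test_file.txt below are hypothetical: a comma-separated file with name, location and role columns is assumed only so that the commands have something to operate on.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical sample content; the comma delimiter is an assumption
cat > test_file.txt <<'EOF'
name,location,role
asha,Pune,engineer
ravi,Delhi,analyst
EOF

# Second column only (location)
cut -d',' -f2 test_file.txt

# First and third columns; the delimiter is kept in the output
cut -d',' -f1,3 test_file.txt

# A range of columns: first through second
cut -d',' -f1-2 test_file.txt

# cut can also select character positions instead of fields
cut -c1-4 test_file.txt

# Keep a couple of results for comparison
last_location=$(cut -d',' -f2 test_file.txt | tail -n 1)
first_and_third=$(cut -d',' -f1,3 test_file.txt | head -n 1)
```

cut works well for simple, consistently delimited files. It does not understand quoted fields, so a CSV file with commas inside quoted values needs a different tool.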
Linux compression commands with example
Compression reduces the size of a large file or folder so that it consumes less space. These Linux commands are very useful for archiving and for making effective use of allocated space.
“gzip” and “gunzip” compress and decompress files in the gzip format, while “zip” and “unzip” create and extract zip archives. Here are a few examples:
Command : “gzip test_file.txt”. This compresses the file and replaces it with test_file.txt.gz in the same directory. To decompress, use “gunzip test_file.txt.gz”, which restores the original test_file.txt and removes the .gz file.
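The gzip round trip can be checked with a short sketch like the one below. The file contents are made up, and a second copy of the file is kept only so that the restored file can be compared with the original.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical sample file, plus a reference copy for comparison
printf 'line %s\n' 1 2 3 > test_file.txt
cp test_file.txt original_copy.txt

# gzip replaces test_file.txt with test_file.txt.gz
gzip test_file.txt
if [ -f test_file.txt.gz ] && [ ! -f test_file.txt ]; then
  replaced_by_gz=yes
else
  replaced_by_gz=no
fi

# gunzip restores test_file.txt and removes the .gz file
gunzip test_file.txt.gz

# cmp -s is silent and succeeds only if the two files are identical
if cmp -s test_file.txt original_copy.txt; then
  round_trip=identical
else
  round_trip=different
fi

# To keep the original as well, write the compressed data to stdout with -c
gzip -c test_file.txt > kept_copy.gz

echo "replaced by .gz: $replaced_by_gz, after gunzip: $round_trip"
```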
Creating a ZIP archive with multiple files:
To create a ZIP archive containing multiple files, use the following command:
Command : “zip archive.zip file1.txt file2.txt file3.txt”
The files “file1.txt,” “file2.txt,” and “file3.txt” will be contained in a ZIP archive with the name “archive.zip” as a result.
File extraction from a ZIP archive:
Use the following command to extract files from a ZIP archive with the name “archive.zip”:
Command : “unzip archive.zip”
With the following command, you can additionally specify particular files or directories to extract:
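Command : “unzip archive.zip file1.txt 'dir1/*'”
This extracts the file “file1.txt” and everything under the archive’s directory “dir1” into the current directory. The quotes keep the shell from expanding the wildcard, so that unzip matches it against the names inside the archive.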
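The zip workflow above can be sketched end to end as follows. The file and directory names mirror the ones used in the text but their contents are made up. Because zip and unzip are not installed on every system, the sketch checks for them first and skips the demo if either is missing. The -r, -q, -l and -d options are additions of this sketch: recursive, quiet, list, and target directory.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical files to archive
mkdir dir1
echo one   > file1.txt
echo two   > file2.txt
echo three > dir1/file3.txt

if command -v zip > /dev/null 2>&1 && command -v unzip > /dev/null 2>&1; then
  # -r recurses into dir1, -q suppresses the per-file progress lines
  zip -q -r archive.zip file1.txt file2.txt dir1

  # List the archive contents without extracting anything
  unzip -l archive.zip

  # Extract only file1.txt and the contents of dir1 into ./extracted
  mkdir extracted
  unzip -q archive.zip file1.txt 'dir1/*' -d extracted
  zip_demo_ran=yes
else
  echo "zip or unzip is not installed; skipping the demo"
  zip_demo_ran=no
fi
```

Extracting into a separate directory with -d avoids overwriting files of the same name in the current directory.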
Common Linux commands using grep keyword
The command-line tool “grep” is used in Linux to search for text or regular expression patterns in one or more files. These are some instances of “grep” in action:
- To search for a pattern “test” in a file named “file.txt” : “grep test file.txt”. This displays the lines containing the text test.
- To search for a pattern across multiple files : “grep -ril insert .”. This searches for the word insert, ignoring case (-i), recursively through the current directory and its subdirectories (-r), and prints only the names of the files where it was found (-l).
- To display line numbers along with the matched lines : “grep -n test file.txt”. This prints each matching line prefixed with its line number.
- To match whole words only : “grep -w test file_1.txt”. By default grep also reports lines where the pattern is part of a longer word, such as testing. The -w option restricts matches to the whole word.
- To invert the match : “grep -v bye file_1.txt”. This prints all the lines that do not contain bye.
- Linux grep multiple patterns or linux grep or condition : Separate the patterns with an escaped pipe and quote the whole expression so that the shell does not interpret it: “grep -ir 'pattern1\|pattern2' directory/” (GNU grep). With the -E option the pipe needs no backslash: “grep -irE 'pattern1|pattern2' directory/”.
- Linux grep regex commands : The “grep” command can also use regular expressions (regex) to look for more complex patterns. The most common regex building blocks are:
- [ ] : Matches any single character from the list inside the brackets.
- [a-z] with a hyphen : Matches any single character in the range a to z.
- ^ : Anchors the pattern that follows it to the beginning of a line.
- ^ inside [ ] : [^abc] matches any single character that is not in the list.
- $ : Anchors the pattern that precedes it to the end of a line.
- . (dot) : Matches any single character.
- \ (backslash) : Removes the special meaning of the character that follows it, so that character is matched literally.
- * : Zero or more occurrences of the previous character.
- .* (dot followed by *) : Zero or more of any character.
Along with grep, Linux also has the “egrep” and “fgrep” commands, which behave like grep with a different pattern syntax. egrep is equivalent to “grep -E” and uses extended regular expressions, where characters such as “?”, “+” and “|” work as operators without a backslash. “fgrep” is equivalent to “grep -F”. It searches for fixed strings rather than regular expressions, so special characters are treated as literals. For example : “fgrep 'learn data engineering skills' test_file.txt” searches for that exact phrase. The quotes make the shell pass the phrase, spaces included, as a single pattern, so the spaces do not need to be escaped. Current GNU grep documentation describes egrep and fgrep as obsolescent and recommends “grep -E” and “grep -F” instead.
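The grep options and regex building blocks above can be compared side by side with a small sketch. The contents of file_1.txt are invented for this purpose, and grep -c (count matching lines) is used here only to make the differences easy to see.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical sample file
cat > file_1.txt <<'EOF'
test one
testing two
Bye now
bye then
a.b
axb
EOF

grep -n test file_1.txt        # matching lines with their line numbers
grep -ril BYE .                # case-insensitive, recursive, file names only

# Substring match versus whole-word match
substring_hits=$(grep -c test file_1.txt)       # "test one" and "testing two"
whole_word_hits=$(grep -cw test file_1.txt)     # only "test one"

# -v inverts the match; it is case-sensitive, so "Bye now" is kept
without_bye=$(grep -vc bye file_1.txt)

# Either pattern, using extended regex
either_hits=$(grep -cE 'one|two' file_1.txt)

# Anchors: starts with t, then anything, ends with o
anchored_hits=$(grep -c '^t.*o$' file_1.txt)    # only "testing two"

# In a regex the dot matches any character; with -F it is a literal dot
regex_dot_hits=$(grep -c 'a.b' file_1.txt)      # "a.b" and "axb"
fixed_dot_hits=$(grep -cF 'a.b' file_1.txt)     # only "a.b"

echo "substring=$substring_hits word=$whole_word_hits either=$either_hits"
echo "regex dot=$regex_dot_hits fixed dot=$fixed_dot_hits"
```

Single quotes around the pattern are a good habit: they stop the shell from interpreting characters such as |, * and $ before grep sees them.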
Run shell script in background with nohup
A shell script is a file that contains a series of commands to perform certain steps. It can contain commands, programming logic, loops, etc. Below are a few ways to execute a shell script:
- Running the script directly from the current folder : “./script.sh”. The script needs execute permission for this, which “chmod +x script.sh” grants.
- Running the script with the sh command : “sh script.sh” or “sh -x script.sh”. The -x option enables debugging output: each command is printed, with its variables expanded, before it runs. This helps when testing or debugging a script.
- Running the script with the “bash” command : “bash script.sh”. This invokes the bash shell to execute the script.
- Running the script with the “source” command : “source script.sh”. This runs the commands in the current shell instead of a new process, so variables set in the file remain available afterwards. It is useful when you keep environment variables in one file and want to use them in another shell script.
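The difference between running a script and sourcing it is easiest to see with a variable. The sketch below uses a made-up file, env.sh, that sets a hypothetical variable APP_ENV. It uses “.”, the portable spelling of “source”, so that it also works in shells that do not provide the source keyword.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical settings file
cat > env.sh <<'EOF'
APP_ENV=staging
echo "env.sh ran with APP_ENV=$APP_ENV"
EOF

unset APP_ENV

# Running the file starts a child shell; its variables vanish when it exits
sh env.sh
after_run=${APP_ENV:-unset}

# Sourcing the file runs its commands in the current shell
. ./env.sh
after_source=${APP_ENV:-unset}

echo "after sh env.sh : APP_ENV is $after_run"
echo "after . env.sh  : APP_ENV is $after_source"
```

This is why a shared settings file is sourced at the top of other scripts rather than executed.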
These are the ways to execute a shell script. Now suppose your shell script is large and takes an hour to finish on a remote server. If your network connection drops or your local machine loses power in the meantime, the terminal session ends, the script stops midway, and you have to run it again once you are reconnected. To avoid this, run the script in the background with nohup. The script then ignores the hangup signal sent when the session closes and keeps running on the server even if your local connection is lost.
Command to run the script in background : nohup sh -x script.sh > script.log 2>&1 &
nohup keeps the script running after the session ends, “> script.log 2>&1” writes both standard output and standard error to script.log, and & submits the script in background mode. To list the jobs running in the background of the current shell, use “jobs -l”. Now suppose you need to stop a script that is still running in the background.
First identify its pid : “ps -ef | grep -i script.sh”. If the script is still running, this displays two rows: one for the sh -x script.sh command and another for the grep -i script.sh command itself. The pid is in the second column of the sh -x script.sh row.
Then stop the script with “kill <pid>”, which asks the process to terminate. If it does not exit, “kill -9 <pid>” forces it to stop.
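The whole background workflow can be rehearsed with a harmless stand-in script. In the sketch below, script.sh is a made-up script that prints a line every second. Instead of searching the ps output, the pid is taken from the shell variable $!, which holds the pid of the most recent background command, and “kill -0” is used as a quiet way to test whether the process still exists.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical long-running script: prints a tick every second
cat > script.sh <<'EOF'
#!/bin/sh
i=0
while [ "$i" -lt 30 ]; do
  echo "tick $i"
  sleep 1
  i=$((i + 1))
done
EOF

# Start it in the background, immune to hangups, with all output in script.log
nohup sh script.sh > script.log 2>&1 &
script_pid=$!

# Give it a moment to start, then list the background jobs of this shell
sleep 1
jobs -l

# Ask the script to terminate (SIGTERM), then wait for it to exit
kill "$script_pid"
wait "$script_pid" 2> /dev/null || true

# kill -0 sends no signal; it only reports whether the pid still exists
if kill -0 "$script_pid" 2> /dev/null; then
  still_running=yes   # last resort in that case: kill -9 "$script_pid"
else
  still_running=no
fi

echo "pid $script_pid still running: $still_running"
cat script.log
```

Capturing $! right after starting the job avoids the extra grep row in the ps output and the risk of killing a different process with a similar name.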