This article is part of the ongoing Unix Sed Tips and Tricks series.
Like other programming languages, sed provides branching commands to control the flow of a script.
In this article, let us review the following two types of sed branching.
- Sed Unconditional Branch
- Sed Conditional Branch
Sed Unconditional Branch Syntax:
$ sed ':label
command(s)
b label'
- :label – specification of label.
- commands – Any sed command(s)
- label – Any Name for the label
- b label – jumps to the label without checking any condition. If no label is given, it jumps to the end of the script.
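As a small illustration of the unlabeled form, here is a sketch that is not part of the original series. The input lines are made up, and the variable name out is only for the demonstration. Lines that start with keep branch to the end of the script, so the substitution never reaches them.

```shell
# "b" with no label jumps to the end of the script for matching lines;
# every other line falls through to the substitution.
out=$(printf 'keep foo\nchange foo\n' | sed '/^keep/b
s/foo/bar/')
printf '%s\n' "$out"
# keep foo
# change bar
```

The branch acts like an early exit for the current line: the line is still printed, it just skips the remaining commands.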
Sed Conditional Branch Syntax:
$ sed ':label
command(s)
t label'
- :label – specification of label.
- commands – Any sed command(s)
- label – Any Name for the label
- t label – jumps to the label only if a substitute command has modified the pattern space since the last input line was read or the last t branch was taken. If no label is given, it jumps to the end of the script.
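The following sketch, with made-up input and a hypothetical variable name out, shows t without a label. As soon as one substitution succeeds, the rest of the script is skipped, so only lines that matched nothing receive the fallback suffix.

```shell
# Each "t" jumps to the end of the script if the preceding "s" succeeded.
# Lines that match neither pattern reach the last command.
out=$(printf 'cat\ndog\nbird\n' | sed 's/cat/feline/
t
s/dog/canine/
t
s/$/ (unknown)/')
printf '%s\n' "$out"
# feline
# canine
# bird (unknown)
```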
Create a sample test file
Let us first create thegeekstuff.txt file that will be used in the examples mentioned below.
$ cat thegeekstuff.txt
Linux
        Administration
        Scripting
                Tips and Tricks
Windows
        Administration
Database
        Administration of Oracle
        Administration of Mysql
Security
        Network
                 Online\
        Security
Productivity
        Google Search\
        Tips
        "Web Based Time Tracking,
        Web Based Todo list and
        Reduce Key Stores etc"
$
I. Sed Examples for Unconditional Branch
Sed Example 1. Replace the first occurrence of a pattern in a whole file
In the file thegeekstuff.txt, replace the first occurrence of “Administration” with “Supervision”.
$ sed '/Administration/{
s/Administration/Supervision/
:loop
n
b loop
}' thegeekstuff.txt
Linux
        Supervision
        Scripting
                Tips and Tricks
Windows
        Administration
Database
        Administration of Oracle
        Administration of Mysql
Security
        Network
                 Online\
        Security
Productivity
        Google Search\
        Tips
        "Web Based Time Tracking,
        Web Based Todo list and
        Reduce Key Stores etc"
- The sed command reads the file line by line and prints each line unchanged until a line containing Administration occurs.
- On that line, Administration is substituted with Supervision (a single occurrence only; note that the substitution has no ‘g’ flag).
- Once the first occurrence has been replaced, the loop just reads and prints the rest of the file.
- “n” is a sed command that prints the pattern space and then overwrites it with the next line of input.
- “loop” is the label. b loop jumps back to :loop, so “n” runs again for every remaining line. Because the loop never returns to the start of the script, the substitution is not applied to any later line. When there is no next line, “n” ends the script (GNU sed prints the pattern space first).
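To see the loop on a smaller input, the sketch below runs the same script on three made-up lines and compares it with the 0,/regex/ address form. That address form is a GNU sed extension, so this assumes GNU sed; the variable names are only for the demonstration.

```shell
# Three lines that all contain the word; only the first should change.
input='a Administration
b Administration
c Administration'

# Branch-based version from the example above.
loop_out=$(printf '%s\n' "$input" | sed '/Administration/{
s/Administration/Supervision/
:loop
n
b loop
}')

# GNU-only alternative: the range 0,/regex/ ends at the first match,
# and the empty regex in s// reuses the last regex that was applied.
range_out=$(printf '%s\n' "$input" | sed '0,/Administration/s//Supervision/')

printf '%s\n' "$loop_out"
# a Supervision
# b Administration
# c Administration
```

Both forms should give the same result here. The loop version has the advantage of not depending on a GNU-specific address.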
Sed Example 2. Remove the text between double quotes in a whole file
In our example file, the last three lines are enclosed in a pair of double quotes.
$ sed -e ':loop
$!{
N
/\n$/!b loop
}
s/\"[^\"]*\"//g' thegeekstuff.txt
Linux
        Administration
        Scripting
                Tips and Tricks
Windows
        Administration
Database
        Administration of Oracle
        Administration of Mysql
Security
        Network
                 Online\
        Security
Productivity
        Google Search\
        Tips
$
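- The above command keeps appending lines to the pattern space until the end of the file is reached.
- $! – If this is not the last line of the file.
- N – Append the next line to the pattern space, separated by \n.
- /\n$/!b loop – If the pattern space does not end with a newline (that is, the line just appended is not empty), jump back to :loop. Note that this tests for an empty line, not for end of file; the $! guard already handles end of file. If the input contains a blank line, the loop stops early at that point, so a plain b loop is the more robust choice.
- Now all the lines are in the pattern space, separated by newlines. Substitute every occurrence of text between double quotes with the empty string.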
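The effect of the /\n$/! test can be checked with a small made-up input that has a blank line inside the quoted text. The sketch below, assuming GNU sed and using hypothetical variable names, runs the script from this example and a variant that uses a plain b loop.

```shell
# Quoted text spanning a blank line.
input=$(printf 'abcd\n"efg\n\nhij"\nend')

# Script from the example: the loop stops at the blank line, so the
# opening and closing quotes never share one pattern space.
orig_out=$(printf '%s\n' "$input" | sed -e ':loop
$!{
N
/\n$/!b loop
}
s/\"[^\"]*\"//g')

# Variant with an unconditional branch: the whole file is joined first.
fixed_out=$(printf '%s\n' "$input" | sed -e ':loop
$!{
N
b loop
}
s/"[^"]*"//g')

printf '%s\n' "$fixed_out"
# abcd
#
# end
```

With this input the first script leaves the text unchanged, while the variant removes everything between the quotes. The same reasoning explains why a bare s/"[^"]*"//g without any loop cannot work on the sample file: sed processes one line at a time, and no single line holds both quotes.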
Sed Example 3. Remove the HTML tags of a file
Let us say we have a file with the following HTML content, in which some tags span more than one line.
$ cat index.html
<html><body>
<table
border=2><tr><td valign=top
align=right>1.</td>
<td>Line 1 Column 2</
td>
</table>
</body></html>
The following sed command removes all the HTML tags from the given file.
$ sed '/</{
:loop
s/<[^<]*>//g
/</{
N
b loop
}
}' index.html
1.
Line 1 Column 2
Lines that held only tags are printed as empty lines; those are omitted from the output shown here.
- Each time a line containing ‘<’ is found, first remove all complete HTML tags from that line.
- If the pattern space still contains ‘<’, a tag continues on the next line. Repeat the following loop:
  - Join the next line to the pattern space.
  - Remove all HTML tags, and check again for a remaining ‘<’.
- When no ‘<’ is left in the pattern space, it is printed and a new cycle starts.
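The steps above can be tried on a short made-up fragment in which one tag is split across two lines. This is only a sketch, assuming GNU sed; the variable name html_out is hypothetical.

```shell
# The <a ...> tag starts on the second line and ends on the third.
html_out=$(printf '<p>Hello <b>big</b>\n<a\nhref="x">world</a></p>\n' | sed '/</{
:loop
s/<[^<]*>//g
/</{
N
b loop
}
}')
printf '%s\n' "$html_out"
# Hello big
# world
```

On the second input line the substitution finds no complete tag, so N pulls in the third line and the loop runs again. The bracket expression [^<] also matches the embedded newline, which is what lets the split tag be removed.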
II. Sed Examples for Conditional Branch
Sed Example 4. If a line ends with a backslash append the next line to it.
Our example file has two lines that end with a backslash. Each of them has to be joined with the line that follows it.
$ sed '
:loop
/\\$/N
s/\\\n */ /
t loop' thegeekstuff.txt
Linux
        Administration
        Scripting
                Tips and Tricks
Windows
        Administration
Database
        Administration of Oracle
        Administration of Mysql
Security
        Network
                 Online Security
Productivity
        Google Search Tips
        "Web Based Time Tracking,
        Web Based Todo list and
        Reduce Key Stores etc"
- Check whether the line ends with a backslash (/\\$/). If it does, read the next line and append it to the pattern space. Then substitute the backslash at the end of the line, the newline, and any spaces that follow with a single space.
- If the substitution succeeds, repeat the above steps. The branch is taken only when the substitution succeeds.
- Conditional branches are mostly used for patterns that have to be applied repeatedly.
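The repetition matters when several continuation lines follow each other. The sketch below uses made-up input with two consecutive backslash-terminated lines; the variable name joined is only for the demonstration.

```shell
# "one\" and "   two\" both end with a backslash, so three lines
# have to be merged into one; "four" must stay on its own line.
joined=$(printf 'one\\\n   two\\\n   three\nfour\n' | sed '
:loop
/\\$/N
s/\\\n */ /
t loop')
printf '%s\n' "$joined"
# one two three
# four
```

After the first join the pattern space again ends with a backslash, and the t loop branch is what gives the script a second pass over it.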
Sed Example 5. Commify a numeric string
$ sed '
:loop
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t loop'
12342342342343434
12,342,342,342,343,434
- Split the digits into two groups.
- The first group is everything up to the last three digits of the rightmost run of four or more digits. Those last three digits are captured in the second group.
- The two groups are then separated by a comma, and the same rule is applied to the line again and again until all the digits have been grouped in threes.
- For example, after the first iteration the line is 12342342342343,434
- After the next iteration it is 12342342342,343,434, and so on until no run of four or more digits remains.
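A quick way to check the stopping condition is to feed numbers of different lengths through the same script. In this sketch the wrapper function commify is a hypothetical name, introduced only so the script can be reused.

```shell
# Hypothetical helper that wraps the script from the example.
commify() {
    sed '
:loop
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t loop'
}

three=$(echo 123 | commify)                # too short, left unchanged
four=$(echo 1234 | commify)                # one comma inserted
long=$(echo 12342342342343434 | commify)   # value used in the article

printf '%s\n' "$three" "$four" "$long"
# 123
# 1,234
# 12,342,342,342,343,434
```

A three-digit number never matches, because the pattern needs at least one digit in front of the final three.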
Sed Example 6. Formatting: Replace every leading space of a line with ‘+’
$ sed '
s/^ */&\n/
:loop
s/^\n//;s/ \n/\n+/
t loop' test
Linux
++++++++Administration
++++++++Scripting
++++++++++++++++Tips and Tricks
Windows
++++++++Administration
Database
++++++++Administration of Oracle
++++++++Administration of Mysql
Security
++++++++Network
+++++++++++++++++Online\
++++++++Security
Productivity
++++++++Google Search\
++++++++Tips
++++++++"Web Based Time Tracking,
++++++++Web Based Todo list and
++++++++Reduce Key Stores etc"
- Separate the leading spaces from the rest of the line with a newline character.
- Then replace a space followed by the newline with a newline followed by +. On each pass the rightmost remaining space becomes + and the newline moves one character to the left.
- Finally the newline reaches the beginning of the line, where s/^\n// removes it.
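The passes can be followed on a two-line made-up input, one line without indentation and one with three leading spaces. The sketch assumes GNU sed, since \n in the replacement part of s is not portable to every sed; the variable name plus_out is hypothetical.

```shell
# "top" has no leading spaces; "   three" has exactly three.
plus_out=$(printf 'top\n   three\n' | sed '
s/^ */&\n/
:loop
s/^\n//;s/ \n/\n+/
t loop')
printf '%s\n' "$plus_out"
# top
# +++three
```

For the indented line the pattern space goes through "  \n+three", " \n++three" and "\n+++three" before the leading newline is dropped. Spaces elsewhere in a line are not touched, because only a space directly in front of the marker newline is replaced.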
Comments on this entry are closed.
This is a good article. I like it!
In Example 2,
the 4th line, ‘/\n$/!b loop’, is said to check whether the last line of the file has been reached,
but actually it checks for a blank line. When a blank line is appended to the pattern buffer, only a ‘\n’ is added to separate the original contents from the new, empty line, so the buffer ends with a ‘\n’.
This means the script stops looping when it sees a blank line. For example, if the file is:
abcd
"efg
(blank line)
hij"
the script will not remove the text between the multi-line double quotes.
And because the 2nd line of the script already checks for the last line, I wonder whether we can use ‘b loop’ directly in the 4th line of the script.
Thank you very much for sharing such great things with us, ^_^
Writing larger pieces in sed can quickly get awkward (no pun intended). In those cases I recommend Perl, also because it has PCRE which are a lot more logical than the REs of the old tools.
I also recommend perl.
For a beginner, this seems difficult. Can you please suggest something for a beginner to start with for learning sed/awk?
In my tests on Example 2,
/\n$/!
should be left out altogether.
If there are blank lines between "Web Based Time Tracking, and Reduce Key Stores etc",
it will skip concatenating the next line with the previous one,
and thus the s command regex will fail to match.
For Example 2:
sed 's/\"[^\"]*\"//g' thegeekstuff.txt
Why is this not working?
Please answer.