The uniq command is helpful for removing or detecting duplicate entries in a file. This tutorial explains a few of the most frequently used uniq command line options.
The following test file is used in some of the examples to show how the uniq command works.
$ cat test
aa
aa
bb
bb
bb
xx
1. Basic Usage
Syntax:
$ uniq [options] [input [output]]
For example, when the uniq command is run without any option, it collapses each run of adjacent duplicate lines into a single line and displays the result as shown below.
$ uniq test
aa
bb
xx
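To try this yourself, the test file can be recreated with printf. The sketch below assumes a scratch directory made with mktemp (an assumption for the demo, not part of the original example) so that no existing file named test gets overwritten.

```shell
# Assumption: a throwaway directory from mktemp keeps the demo away from real files.
workdir=$(mktemp -d)

# Recreate the six-line test file used in this tutorial.
printf '%s\n' aa aa bb bb bb xx > "$workdir/test"

# No options: every run of identical adjacent lines is printed once.
basic_out=$(uniq "$workdir/test")
printf '%s\n' "$basic_out"

# Clean up the scratch directory.
rm -r "$workdir"
```

The output should match the three lines shown above: aa, bb and xx.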
2. Count Number of Occurrences using -c option
This option prefixes each output line with the number of times it occurred consecutively in the file.
$ uniq -c test
      2 aa
      3 bb
      1 xx
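A common use of -c is a frequency table. The following is a minimal sketch, not part of the original example: it assumes unsorted sample input that is fed through sort first (uniq only counts adjacent lines), and it pipes the result through sort -rn to put the most frequent line on top. The sed call only strips the padding that uniq puts in front of each count.

```shell
# Hypothetical unsorted input; sort groups equal lines so uniq -c can count them.
ranked=$(printf '%s\n' bb aa bb xx aa bb |
  sort |
  uniq -c |
  sort -rn |
  sed 's/^ *//')

# Prints one "count line" pair per row, most frequent first.
printf '%s\n' "$ranked"
```

With this input, bb (3 occurrences) should come first, followed by aa (2) and xx (1).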
3. Print only Duplicate Lines using -d option
This option prints only the lines that are repeated in the file, one copy per group. As you can see below, it does not display the line "xx", as that line is not duplicated in the test file.
$ uniq -d test
aa
bb
The above example displayed each duplicated line only once. The -D option, in contrast, prints every copy of each duplicated line. For example, the line "aa" appears twice in the test file, so the following uniq command displays "aa" twice in its output.
$ uniq -D test
aa
aa
bb
bb
bb
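The difference between -d and -D is easiest to see side by side. This sketch rebuilds the test file in a scratch directory (an assumption made for the demo). Note that -D is not part of POSIX, so it may be missing from some uniq implementations; the sketch assumes GNU coreutils.

```shell
# Assumption: scratch directory so the demo does not touch existing files.
workdir=$(mktemp -d)
printf '%s\n' aa aa bb bb bb xx > "$workdir/test"

# -d: one representative line per duplicated group.
once_each=$(uniq -d "$workdir/test")

# -D (GNU extension): every line that belongs to a duplicated group.
every_copy=$(uniq -D "$workdir/test")

printf -- '-d output:\n%s\n-D output:\n%s\n' "$once_each" "$every_copy"

rm -r "$workdir"
```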
4. Print only Unique Lines using -u option
This option prints only the lines that are not repeated in the file.
$ uniq -u test
xx
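Like the other options, -u only looks at neighbouring lines. A small sketch with made-up input shows why unsorted data usually needs a sort first:

```shell
# Hypothetical input: "aa" appears twice, but the copies are not adjacent.
input=$(printf '%s\n' aa xx aa)

# Without sorting, no line has an identical neighbour, so all three are printed.
unsorted_out=$(printf '%s\n' "$input" | uniq -u)

# After sorting, the two "aa" lines sit together and only "xx" is unique.
sorted_out=$(printf '%s\n' "$input" | sort | uniq -u)

printf 'unsorted:\n%s\nsorted:\n%s\n' "$unsorted_out" "$sorted_out"
```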
If you would like to delete duplicate lines from a file based on a certain pattern, you can use the sed delete command.
5. Limit Comparison to ‘N’ characters using -w option
This option restricts the comparison to the first 'N' characters of each line. For this example, use the following test2 input file.
$ cat test2
hi Linux
hi LinuxU
hi LinuxUnix
hi Unix
The following uniq command uses the -w option to compare only the first 8 characters of each line, and the -c option to print the number of occurrences of each line.
$ uniq -c -w 8 test2
      3 hi Linux
      1 hi Unix
The following uniq command uses the -w option to compare only the first 8 characters of each line, and the -D option to print all duplicate lines.
$ uniq -D -w 8 test2
hi Linux
hi LinuxU
hi LinuxUnix
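How lines are grouped depends entirely on the value of N. The sketch below, which assumes GNU uniq (the -w option is not in POSIX) and a scratch copy of the four-line input file, compares a narrow and a wider window. The sed call only removes the padding in front of the counts.

```shell
# Assumption: scratch directory holding a copy of the four-line input file.
workdir=$(mktemp -d)
printf '%s\n' 'hi Linux' 'hi LinuxU' 'hi LinuxUnix' 'hi Unix' > "$workdir/test2"

# Only the first 2 characters ("hi") are compared, so all four lines form one group.
w2_out=$(uniq -c -w 2 "$workdir/test2" | sed 's/^ *//')

# The first 8 characters are compared: three lines share "hi Linux", one does not.
w8_out=$(uniq -c -w 8 "$workdir/test2" | sed 's/^ *//')

printf -- '-w 2:\n%s\n-w 8:\n%s\n' "$w2_out" "$w8_out"

rm -r "$workdir"
```

In both cases uniq prints the first line of each group, which is why "hi Linux" stands in for the longer lines.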
6. Avoid Comparing first ‘N’ Characters using -s option
This option skips the first 'N' characters of each line when comparing. For this example, use the following test3 input file.
$ cat test3
aabb
xxbb
bbc
bbd
The following uniq command uses the -s option to skip the first 2 characters of each line, and the -D option to print all duplicate lines.
Here, the first 2 characters ('aa' in the 1st line and 'xx' in the 2nd line) are not compared. The remaining characters, 'bb', are the same in both lines, so the two lines are reported as duplicates.
$ uniq -D -s 2 test3
aabb
xxbb
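Combining -s with -c makes the grouping visible for every line, not just the duplicates. This is a small sketch on a scratch copy of the same four-line input (the mktemp directory is an assumption for the demo):

```shell
# Assumption: scratch directory holding a copy of the four-line input file.
workdir=$(mktemp -d)
printf '%s\n' aabb xxbb bbc bbd > "$workdir/test3"

# Skip 2 characters, then compare the rest: "bb", "bb", "c", "d".
# sed strips the padding that uniq -c puts before each count.
skip_counts=$(uniq -c -s 2 "$workdir/test3" | sed 's/^ *//')
printf '%s\n' "$skip_counts"

rm -r "$workdir"
```

The first two lines form one group (reported under its first member, aabb), while bbc and bbd stay separate because they differ after the skipped characters.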
7. Avoid Comparing first ‘N’ Fields using -f option
This option skips the first 'N' fields of each line when comparing. Fields are separated by blanks. For this example, use the following test4 input file.
$ cat test4
hi hello Linux
hi friend Linux
hi hello LinuxUnix
The following uniq command uses the -f option to skip the first 2 fields of each line, and the -D option to print all duplicate lines.
Here, the first 2 fields ('hi hello' in the 1st line and 'hi friend' in the 2nd line) are not compared. The next field, 'Linux', is the same in both lines, so the two lines are reported as duplicates.
$ uniq -D -f 2 test4
hi hello Linux
hi friend Linux
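Skipping fields is handy when every line starts with something that always differs, such as a timestamp. The log lines below are made up for illustration; the sketch skips the first field and collapses consecutive repeats of the same message.

```shell
# Hypothetical log: the first field is a timestamp, the rest is the message.
log=$(printf '%s\n' '10:01 disk full' '10:02 disk full' '10:03 disk ok')

# -f 1 ignores the timestamp, so the two "disk full" lines count as duplicates.
# uniq keeps the first line of each group, timestamp included.
collapsed=$(printf '%s\n' "$log" | uniq -f 1)
printf '%s\n' "$collapsed"
```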
For these commands to work as expected, the input file must be sorted. People not familiar with this tool are going to be confused by that unless it is specifically stated. In your example files the lines are sorted, but there is no explicit mention of that.
From the man page:
Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.
Good article.
One point you want to say is that it only works if the duplicate lines are next to one another.
So I always have to sort the file then pass it to uniq like this
cat filename | sort | uniq
Without the sort, a file that has this will not work as expected
aa
bb
aa
bb
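A quick way to check this is the sketch below, which uses the four lines above as input and compares plain uniq with the two sorted variants mentioned in these comments:

```shell
# The four-line input from the comment above: no two equal lines are adjacent.
unsorted=$(printf '%s\n' aa bb aa bb)

# uniq alone finds no adjacent duplicates and passes everything through.
plain=$(printf '%s\n' "$unsorted" | uniq)

# Sorting first brings equal lines together, so uniq can collapse them.
sorted_uniq=$(printf '%s\n' "$unsorted" | sort | uniq)

# sort -u does the sorting and the de-duplication in one step.
sort_u=$(printf '%s\n' "$unsorted" | sort -u)

printf 'uniq:\n%s\nsort | uniq:\n%s\nsort -u:\n%s\n' "$plain" "$sorted_uniq" "$sort_u"
```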
Very interesting options.
Thanks.
Hi,
Thanks a lot,,,
thanks for this wonderful and powerful tip..
Is there a way to avoid the use of "sort" if I had:
$ cat extensions.txt
jpg
png
jpg
gif
and only need to retrieve one occurrence of each line? (If a line is not duplicated, show it; if it is repeated, show it only once.) Currently I use
$ sort extensions.txt |uniq
gif
jpg
png
To remove non-adjacent duplicate lines without re-ordering the file, use awk:
awk '!x[$0]++' "$file"
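Applied to the extensions.txt contents from the question, the awk approach can be sketched as follows. The array name seen is an arbitrary choice (the one-liner above uses x), and the input is kept in a variable instead of a file so the example is self-contained.

```shell
# The extensions.txt contents from the question, kept in a variable.
extensions=$(printf '%s\n' jpg png jpg gif)

# seen[$0]++ evaluates to 0 (false) the first time a line appears, so
# !seen[$0]++ is true only for first occurrences; awk's default action
# prints those lines. The original order is preserved and no sort is needed.
first_seen=$(printf '%s\n' "$extensions" | awk '!seen[$0]++')
printf '%s\n' "$first_seen"
```

Unlike sort | uniq, this keeps jpg, png and gif in the order in which they first appear. The trade-off is that awk has to remember every distinct line it has seen.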
thanks..
Really helpful………….. 🙂 thanx
Ur tutorials r quite useful fr beginners lyk me!!Thanks a lot!!
Awesome ! I Iike your all articles .. Very Useful…. Thank You…………………….
Very Helpful.Thank you:)
Command: echo -e "test\ntest\ntest\nanother test\ntest"
Output:
test
test
test
another test
test
Command: echo -e "test\ntest\ntest\nanother test\ntest" | uniq
Output:
test
another test
test
Even with the 'uniq' command, "test" still appears twice in the output. Why is that?