7 Linux Uniq Command Examples to Remove Duplicate Lines from File

by Himanshu Arora on May 30, 2013

The uniq command is helpful for removing or detecting duplicate entries in a file. This tutorial explains a few of the most frequently used uniq command-line options.

The following test file is used in some of the examples to show how the uniq command works.

$ cat test
aa
aa
bb
bb
bb
xx

1. Basic Usage

Syntax:

$ uniq [OPTION]... [INPUT [OUTPUT]]

For example, when the uniq command is run without any option, it collapses adjacent duplicate lines and displays one copy of each line, as shown below.

$ uniq test
aa
bb
xx
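
Because uniq compares each line only with the line before it, duplicates that are not adjacent survive. The following is a minimal sketch of that behavior, assuming a hypothetical sample file created with mktemp (the file contents and variable names are illustrative, not part of the test file above):

```shell
# Hypothetical sample data: the duplicates are NOT adjacent.
tmpfile=$(mktemp)
printf 'aa\nbb\naa\nbb\n' > "$tmpfile"

# uniq alone compares neighbouring lines only, so nothing is removed here.
unsorted_result=$(uniq "$tmpfile")

# Sorting first makes identical lines adjacent, so uniq can collapse them.
sorted_result=$(sort "$tmpfile" | uniq)

printf 'uniq alone:\n%s\n' "$unsorted_result"
printf 'sort | uniq:\n%s\n' "$sorted_result"
rm -f "$tmpfile"
```

The first result still contains all four lines, while the second contains only aa and bb.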

2. Count Number of Occurrences using -c option

This option prefixes each output line with the number of times that line occurred in the file.

$ uniq -c test
      2 aa
      3 bb
      1 xx
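
A common use of the count is to rank lines by frequency. The sketch below assumes an unsorted, hypothetical input file, so it sorts before counting and then sorts the counts numerically in reverse. The exact padding that uniq -c puts in front of the count can differ between implementations, so the example extracts fields with awk rather than relying on column positions.

```shell
# Hypothetical unsorted input with three distinct values.
tmpfile=$(mktemp)
printf 'bb\naa\nbb\nxx\naa\nbb\n' > "$tmpfile"

# sort groups identical lines, uniq -c counts each group,
# sort -rn puts the largest count first.
ranking=$(sort "$tmpfile" | uniq -c | sort -rn)

# Pull out the most frequent value and its count.
top_value=$(printf '%s\n' "$ranking" | head -n 1 | awk '{print $2}')
top_count=$(printf '%s\n' "$ranking" | head -n 1 | awk '{print $1}')

printf '%s\n' "$ranking"
printf 'most frequent: %s (%s times)\n' "$top_value" "$top_count"
rm -f "$tmpfile"
```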

3. Print only Duplicate Lines using -d option

This option prints only the lines that are repeated, one copy per group. As you see below, the line “xx” is not displayed, because it is not duplicated in the test file.

$ uniq -d test
aa
bb

The above example displayed each duplicated line only once. The -D option, in contrast, prints every copy of the duplicated lines. For example, the line “aa” appears twice in the test file, so the following uniq command displays “aa” twice in its output.

$ uniq -D test
aa
aa
bb
bb
bb

4. Print only Unique Lines using -u option

This option prints only the lines that are not repeated in the file.

$ uniq -u test
xx
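
The -d and -u options can also be used to compare two lists. The sketch below assumes two hypothetical files, each of which has no duplicates of its own. After the files are sorted together, a line that appears twice must be present in both files, and a line that appears once is present in only one of them.

```shell
# Two hypothetical lists; each has no internal duplicates.
list_a=$(mktemp)
list_b=$(mktemp)
printf 'jpg\npng\ngif\n' > "$list_a"
printf 'png\nbmp\ngif\n' > "$list_b"

# Lines that occur in both files show up twice after the combined sort.
in_both=$(sort "$list_a" "$list_b" | uniq -d)

# Lines that occur in only one of the files show up once.
in_one=$(sort "$list_a" "$list_b" | uniq -u)

printf 'in both files:\n%s\n' "$in_both"
printf 'in only one file:\n%s\n' "$in_one"
rm -f "$list_a" "$list_b"
```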

If you would like to delete duplicate lines from a file based on a certain pattern, you can use the sed delete command.

5. Limit Comparison to ‘N’ characters using -w option

This option restricts the comparison to the first ‘N’ characters of each line. For this example, use the following test2 input file.

$ cat test2
hi Linux
hi LinuxU
hi LinuxUnix
hi Unix

The following uniq command uses the ‘w’ option to compare only the first 8 characters of each line, and the ‘c’ option to print the number of occurrences of each line.

$ uniq -c -w 8 test2
      3 hi Linux
      1 hi Unix

The following uniq command uses the ‘w’ option to compare only the first 8 characters of each line, and the ‘D’ option to print all duplicate lines.

$ uniq -D -w 8 test2
hi Linux
hi LinuxU
hi LinuxUnix
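
One place where -w can be handy is grouping lines by a fixed-width prefix such as a date. The sketch below assumes GNU uniq (the -w option is a GNU extension and is not part of POSIX) and a hypothetical log whose first 10 characters are the date.

```shell
# Hypothetical log: the first 10 characters of each line are the date.
log=$(mktemp)
printf '%s\n' \
  '2013-05-30 login alice' \
  '2013-05-30 login bob' \
  '2013-05-31 login alice' > "$log"

# Compare only the date, then keep just the date column:
# this yields one line per day.
days=$(uniq -w 10 "$log" | cut -c 1-10)

# Count how many entries the first day has.
first_day_count=$(uniq -c -w 10 "$log" | head -n 1 | awk '{print $1}')

printf 'days seen:\n%s\n' "$days"
printf 'entries on first day: %s\n' "$first_day_count"
rm -f "$log"
```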

6. Avoid Comparing first ‘N’ Characters using -s option

This option skips the first ‘N’ characters of each line when comparing. For this example, use the following test3 input file.

$ cat test3
aabb
xxbb
bbc
bbd

The following uniq command uses the ‘s’ option to skip the first 2 characters of each line, and the ‘D’ option to print all duplicate lines.

Here, the first 2 characters (‘aa’ in the 1st line and ‘xx’ in the 2nd line) are not compared. The remaining characters, ‘bb’, are the same in both lines, so the two lines are shown as duplicates.

$ uniq -D -s 2 test3
aabb
xxbb
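
The same idea applies when each line starts with a fixed-width prefix that should be ignored, such as a sequence number. This is a small sketch under that assumption, using hypothetical data in which the first 3 characters are a two-digit number and a space.

```shell
# Hypothetical data: a 2-digit sequence number and a space, then the value.
tmpfile=$(mktemp)
printf '%s\n' '01 apple' '02 apple' '03 pear' '04 pear' '05 apple' > "$tmpfile"

# Skip the 3-character prefix; only the value is compared.
# uniq keeps the first line of each run of equal values.
collapsed=$(uniq -s 3 "$tmpfile")

printf '%s\n' "$collapsed"
rm -f "$tmpfile"
```

Note that the last line is kept: its value matches the first two lines, but it is not adjacent to them.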

7. Avoid Comparing first ‘N’ Fields using -f option

This option skips the first ‘N’ fields of each line when comparing. For this example, use the following test4 input file.

$ cat test4
hi hello Linux
hi friend Linux
hi hello LinuxUnix

The following uniq command uses the ‘f’ option to skip the first 2 fields of each line, and the ‘D’ option to print all duplicate lines.

Here, the first 2 fields (‘hi hello’ in the 1st line and ‘hi friend’ in the 2nd line) are not compared. The next field, ‘Linux’, is the same in both lines, so the two lines are shown as duplicates.

$ uniq -D -f 2 test4
hi hello Linux
hi friend Linux
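
Skipping fields is useful when the leading fields always differ, for example a timestamp in a log. The following sketch assumes a hypothetical log format of time, host, and message, and collapses consecutive lines that carry the same message.

```shell
# Hypothetical log format: <time> <host> <message...>
log=$(mktemp)
printf '%s\n' \
  '10:01 host1 disk full' \
  '10:02 host1 disk full' \
  '10:03 host1 link down' \
  '10:04 host1 disk full' > "$log"

# Skip the time and host fields; compare only the message.
# The first line of each run of identical messages is kept.
collapsed=$(uniq -f 2 "$log")

printf '%s\n' "$collapsed"
rm -f "$log"
```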


Don Dailey May 30, 2013, 8:29 am

For these commands to work, the original file must be properly sorted. People not familiar with this tool are going to be confused by that unless it is specifically stated. In your example file the lines are sorted, but there is no explicit mention of that.

From the man page:

Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.

∞
Bob May 30, 2013, 9:48 am

Good article.
One point worth adding is that it only works if the duplicate lines are next to one another.

So I always have to sort the file and then pass it to uniq, like this:

cat filename | sort | uniq

Without the sort, a file with the following contents will not work as expected:

aa
bb
aa
bb

∞
Júlio Hoffimann Mendes May 30, 2013, 10:02 am

Very interesting options.

Thanks.

∞
Jalal Hajigholamali May 30, 2013, 2:38 pm

Hi,

Thanks a lot,,,

∞
whizid May 30, 2013, 9:58 pm

thanks for this wonderful and powerful tip..

∞
Javier E. Pérez P. May 31, 2013, 2:40 pm

Is there a way to avoid using “sort” if I have:

$ cat extensions.txt
jpg
png
jpg
gif

and only need to retrieve one occurrence of each line (if a line is not duplicated, show it, and if it is repeated, show it only once)? Currently I use:

$ sort extensions.txt | uniq
gif
jpg
png

∞
Chris F.A. Johnson May 31, 2013, 7:56 pm

To remove non-adjacent duplicate lines without re-ordering the file, use awk:

awk '!x[$0]++' "$file"
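
As a rough illustration of how this one-liner works: the array counts how many times each line has been seen. The first time a line appears its count is 0, so the negated expression is true and awk prints the line. On later appearances the count is non-zero and the line is skipped. The sketch below uses the sample data from the question above, with the array renamed to seen for readability.

```shell
# Sample data from the question above: jpg appears twice, not adjacent.
tmpfile=$(mktemp)
printf 'jpg\npng\njpg\ngif\n' > "$tmpfile"

# seen[$0]++ evaluates to the count BEFORE incrementing:
# 0 on first sight (so !0 is true and the line is printed), non-zero afterwards.
deduped=$(awk '!seen[$0]++' "$tmpfile")

printf '%s\n' "$deduped"
rm -f "$tmpfile"
```

Unlike sort | uniq, this keeps the lines in their original order, at the cost of holding every distinct line in memory.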

∞
lau June 1, 2013, 5:23 am

thanks..

∞
shweta June 8, 2013, 3:31 am

Really helpful………….. 🙂 thanx

∞
VIVEK July 30, 2013, 8:36 pm

Ur tutorials r quite useful fr beginners lyk me!!Thanks a lot!!

∞
Gayatri November 27, 2013, 5:10 am

Awesome ! I Iike your all articles .. Very Useful…. Thank You…………………….

∞
Avinash Raju October 17, 2014, 12:24 am

Very Helpful.Thank you:)

∞
sunil August 2, 2015, 11:58 pm

Command: echo -e "test\ntest\ntest\nanother test\ntest"
Output:
test
test
test
another test
test

Command: echo -e "test\ntest\ntest\nanother test\ntest" | uniq
Output:
test
another test
test

Even with the ‘uniq’ command, “test” still appears twice in the output. Why is that?

∞
brown clark January 28, 2017, 5:07 am

Duplicate file are really disturbing. Just use “DuplicateFilesDeleter program”

∞
