How to Find Duplicate Data in a Linux Text File With uniq

Have you ever come across text files containing duplicate lines? Perhaps you often deal with command output and wish to filter it down to distinct lines. In Linux, the uniq command is your best choice for cleaning up text files and removing redundant data.

In this post, we will go over the uniq command in detail and provide a step-by-step guide on how to use it to eliminate duplicate lines from a text file.

What Is the uniq Command?

In Linux, the uniq command is used to detect and filter repeated lines in a text file. If you wish to eliminate duplicate lines from a text file, use this command. Because uniq only compares adjacent lines to discover redundant copies, it works reliably only when duplicates sit next to each other, which in practice means sorted text files.

Fortunately, you can pipe the output of the sort command into uniq so that the text arrives in an order the command can work with. In addition to filtering repeated lines, the uniq command can count how many times each line occurs in a text file.
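
To see why adjacency matters, here is a minimal sketch that feeds uniq three made-up lines (apple, banana, apple are placeholder data, not from the example file used later) with and without sorting first:

```shell
# The two copies of "apple" are not adjacent, so plain uniq keeps both.
printf 'apple\nbanana\napple\n' | uniq
# apple
# banana
# apple

# Sorting first places the copies next to each other, so uniq drops one.
printf 'apple\nbanana\napple\n' | sort | uniq
# apple
# banana
```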

How to Use the uniq Command

You can use uniq with a variety of options and flags. Some are simple, such as displaying repeated lines, while others are for advanced users who routinely work with text files on Linux.

Basic Syntax

The basic syntax of the uniq command is:

uniq option input output

…where option is a flag that selects a specific behavior of the command, input is the text file to be processed, and output is the path to the file where the output will be stored.

The output parameter is optional and may be left out entirely, in which case uniq prints to standard output. If the user does not provide an input file, uniq reads data from standard input. This lets you use uniq in a pipeline with other Linux commands.
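
The three ways of calling uniq described here can be sketched as follows. The file names sample.txt and result.txt are hypothetical and exist only for this illustration:

```shell
# Hypothetical sample file with one adjacent duplicate.
printf 'a\na\nb\n' > sample.txt

# Form 1: input and output files given; the result is written to result.txt.
uniq sample.txt result.txt
cat result.txt

# Form 2: input file only; the result goes to standard output.
uniq sample.txt

# Form 3: no file arguments; uniq reads standard input, here from a pipe.
cat sample.txt | uniq
```

All three forms produce the same two lines, a and b.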

Example Text File

The text file duplicate.txt will be used as the command’s input.

127.0.0.1 TCP
127.0.0.1 UDP
Do catch this
DO CATCH THIS
Don't match this
Don't catch this
This is a text file.
This is a text file.
THIS IS A TEXT FILE.
Unique lines are really rare.

In this text file, the duplicate lines are already next to one another. If you’re dealing with another text file, use the following command to sort it first:

sort filename.txt > sorted.txt

Remove Duplicate Lines

The most basic use of uniq is to remove repeated lines from the input and print the unique output.

uniq duplicate.txt

The output omits the second occurrence of the line This is a text file. Note that the command only prints the file’s unique lines and has no effect on the original text file’s content.
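
As a sketch of this run, the block below recreates duplicate.txt from the listing above and shows what uniq prints for it:

```shell
# Recreate the sample file from this article.
cat > duplicate.txt <<'EOF'
127.0.0.1 TCP
127.0.0.1 UDP
Do catch this
DO CATCH THIS
Don't match this
Don't catch this
This is a text file.
This is a text file.
THIS IS A TEXT FILE.
Unique lines are really rare.
EOF

uniq duplicate.txt
# 127.0.0.1 TCP
# 127.0.0.1 UDP
# Do catch this
# DO CATCH THIS
# Don't match this
# Don't catch this
# This is a text file.
# THIS IS A TEXT FILE.
# Unique lines are really rare.
```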

Count Repeated Lines

Use the -c flag with the default command to report how many times each line is repeated in a text file.

uniq -c duplicate.txt

The output prefixes each line with the number of times it occurs. The line This is a text file. appears twice in the file, while THIS IS A TEXT FILE. is counted separately because the uniq command is case-sensitive by default.
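
The counts can be reproduced with the sketch below. The column layout shown in the comments is what GNU uniq prints; other implementations may pad the count differently.

```shell
# Recreate the sample file from this article.
cat > duplicate.txt <<'EOF'
127.0.0.1 TCP
127.0.0.1 UDP
Do catch this
DO CATCH THIS
Don't match this
Don't catch this
This is a text file.
This is a text file.
THIS IS A TEXT FILE.
Unique lines are really rare.
EOF

uniq -c duplicate.txt
#       1 127.0.0.1 TCP
#       1 127.0.0.1 UDP
#       1 Do catch this
#       1 DO CATCH THIS
#       1 Don't match this
#       1 Don't catch this
#       2 This is a text file.
#       1 THIS IS A TEXT FILE.
#       1 Unique lines are really rare.
```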

Print Only Repeated Lines

Use the -D option to print every copy of each duplicated line in a text file and nothing else. The -D stands for Duplicate.

uniq -D duplicate.txt

The system will display output as follows.

This is a text file.
This is a text file.
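
Two related flags are worth knowing here. The uppercase -D is an extension that is not part of POSIX, while the lowercase -d prints one copy of each repeated line and -u prints only the lines that are not repeated. A short sketch on the same sample file:

```shell
# Recreate the sample file from this article.
cat > duplicate.txt <<'EOF'
127.0.0.1 TCP
127.0.0.1 UDP
Do catch this
DO CATCH THIS
Don't match this
Don't catch this
This is a text file.
This is a text file.
THIS IS A TEXT FILE.
Unique lines are really rare.
EOF

# -d: one copy of each line that occurs more than once in a row.
uniq -d duplicate.txt
# This is a text file.

# -u: only the lines that are not repeated (eight lines for this file).
uniq -u duplicate.txt
```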

Skip Fields While Checking for Duplicates

If you want to skip a certain number of fields while matching the strings, you can use the -f flag with the command. The -f stands for Field.

Consider the following text file fields.txt.

192.168.0.1 TCP
127.0.0.1 TCP
354.231.1.1 TCP
Linux FS
Windows FS
macOS FS

To skip the first field:

uniq -f 1 fields.txt

Output:

192.168.0.1 TCP
Linux FS

The command above skipped the first field (the IP addresses and OS names) and compared the second field (TCP and FS). It then displayed the first occurrence of each match as the output.

Ignore Characters When Comparing

Just as you can skip fields, you can skip characters as well. The -s flag allows you to specify the number of characters to skip while matching duplicate lines. This feature helps when the data you are working with is in the form of a list as follows:

1. First
2. Second
3. Second
4. Second
5. Third
6. Third
7. Fourth
8. Fifth

To ignore the first two characters (the list numberings) in the file list.txt:

uniq -s 2 list.txt

The command disregarded the first two characters of each line and compared the remaining characters, so each repeated entry such as Second appears only once in the output.
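
As a sketch, the following recreates list.txt from the listing above and shows the result of skipping two characters:

```shell
# Recreate the numbered list from this article.
printf '%s\n' '1. First' '2. Second' '3. Second' '4. Second' \
  '5. Third' '6. Third' '7. Fourth' '8. Fifth' > list.txt

# Skip the first two characters (digit and dot) before comparing lines.
uniq -s 2 list.txt
# 1. First
# 2. Second
# 5. Third
# 7. Fourth
# 8. Fifth
```

Skipping exactly two characters only works while the numbering stays at a single digit; a line starting with 10. would need three.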

Check First N Number of Characters for Duplicates

The -w switch lets you compare only a set number of leading characters when checking for duplicates. As an example:

uniq -w 2 duplicate.txt

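The preceding command compares only the first two characters of each line. Adjacent lines that begin with the same two characters count as duplicates, and only the first of them is printed. For example, 127.0.0.1 UDP is dropped because it starts with 12, just like the line before it.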
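
The result for the sample file can be sketched as follows. The -w option is not part of POSIX, so this assumes GNU uniq:

```shell
# Recreate the sample file from this article.
cat > duplicate.txt <<'EOF'
127.0.0.1 TCP
127.0.0.1 UDP
Do catch this
DO CATCH THIS
Don't match this
Don't catch this
This is a text file.
This is a text file.
THIS IS A TEXT FILE.
Unique lines are really rare.
EOF

# Compare only the first two characters of each line (GNU uniq assumed).
uniq -w 2 duplicate.txt
# 127.0.0.1 TCP
# Do catch this
# DO CATCH THIS
# Don't match this
# This is a text file.
# THIS IS A TEXT FILE.
# Unique lines are really rare.
```

Don't match this survives because the line before it starts with DO rather than Do, and the comparison is still case-sensitive.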

Remove Case Sensitivity

As previously stated, uniq is case-sensitive when matching lines in a file. Use the -i flag with the command to ignore character case.

uniq -i duplicate.txt

With case ignored, uniq does not show the lines DO CATCH THIS and THIS IS A TEXT FILE. in the output, because each now counts as a duplicate of the line before it.
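
A sketch of the case-insensitive run on the sample file:

```shell
# Recreate the sample file from this article.
cat > duplicate.txt <<'EOF'
127.0.0.1 TCP
127.0.0.1 UDP
Do catch this
DO CATCH THIS
Don't match this
Don't catch this
This is a text file.
This is a text file.
THIS IS A TEXT FILE.
Unique lines are really rare.
EOF

# Ignore case when comparing adjacent lines.
uniq -i duplicate.txt
# 127.0.0.1 TCP
# 127.0.0.1 UDP
# Do catch this
# Don't match this
# Don't catch this
# This is a text file.
# Unique lines are really rare.
```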

Send Output to a File

You can use the output redirection (>) character to send the output of the uniq command to a file as follows:

uniq -i duplicate.txt > otherfile.txt

The system does not show the command’s output on the terminal when it is redirected to a file. The cat command can be used to inspect the contents of the new file.

cat otherfile.txt

In Linux, you may also use alternative methods to save command line output to a file.

Analyzing Duplicate Data With uniq

When administering Linux servers, you will spend most of your time either on the terminal or editing text files. Knowing how to eliminate redundant copies of lines in a text file is therefore a valuable addition to your Linux skill set.

Working with text files can be difficult if you don’t know how to filter and sort text. Linux has a number of text-processing utilities, such as sed and awk, that enable you to work quickly with text files and command-line output.
