Count Frequency of Word by Removing Punctuation Character

 

Python Program to Count Frequency of each Word in a given file by Removing Punctuation Character.

Problem Definition

Develop a Python program to read through the lines of the file, break each line into a list of words, remove all punctuation characters, and then loop through each of the words in the line and count the frequency of each word using a dictionary.

Step-by-Step Solution to the Problem

First, import the string module, which defines the punctuation constant. The maketrans() and translate() functions used later are methods of Python 3 str objects, so they need no extra import.

Read the file name from the user and store it in a variable, say fname. Check whether the file exists by opening it in read mode inside a try block. If the file cannot be opened, display a suitable message to the user in the except block of the program.
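The file check described above can be sketched as a small helper. The function name open_for_reading is hypothetical and is not part of the original program; catching OSError rather than every exception is an assumption about which failures should be reported as "cannot be opened".

```python
def open_for_reading(fname):
    """Return an open file object, or None if the file cannot be opened.

    Hypothetical helper for illustration only.
    """
    try:
        return open(fname)  # default mode is 'r' (read, text)
    except OSError:
        # Covers a missing file, a directory name, missing permissions, etc.
        print('File cannot be opened:', fname)
        return None


# Assuming no file with this name exists, the message is printed
# and None is returned.
fhand = open_for_reading('no_such_file_for_this_sketch.txt')
```

In the actual program the name would come from input('Enter the file name: ') instead of a fixed string.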

Create an empty dictionary to store the frequency of each word in the given file.

Use a for loop to read the contents of the file line by line, and remove any leading and trailing whitespace from each line using the strip() function.

Use the maketrans() function to build a translation table that deletes every punctuation character, and apply that table to the line with translate(). Then split the line into words and store them in a list named words.
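As a minimal sketch of this step in isolation: the helper name strip_punctuation and the variable table are illustrative choices, not names from the original program. The three-argument form of str.maketrans() maps every character in its third argument to None, so translate() deletes it.

```python
import string

# Translation table that deletes every character in string.punctuation
table = str.maketrans('', '', string.punctuation)


def strip_punctuation(line):
    """Trim surrounding whitespace, then drop all punctuation characters."""
    return line.strip().translate(table)


# First line of the sample file, including its trailing newline
words = strip_punctuation('HIT Nidasoshi. HIT,\n').split()
print(words)  # ['HIT', 'Nidasoshi', 'HIT']
```

Building the table once and reusing it for every line avoids recreating it inside the loop.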

Take each word from the words list and check whether it is already present in the dictionary using the in operator. If the word is present, increase its value by 1; otherwise, add the word to the dictionary with a value of 1.
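The counting step on its own might look like the sketch below. The function name count_words is hypothetical; the body is the same in / else test described above, wrapped in a function so it can be tried on a plain list of words.

```python
def count_words(words, counts=None):
    """Add the words to the counts dictionary and return it.

    Hypothetical helper for illustration only.
    """
    if counts is None:
        counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1   # seen before: increase its frequency
        else:
            counts[word] = 1    # first occurrence
    return counts


counts = count_words(['HIT', 'Nidasoshi', 'HIT'])
print(counts)  # {'HIT': 2, 'Nidasoshi': 1}
```

The if/else pair can also be written in one line as counts[word] = counts.get(word, 0) + 1, which behaves the same way.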

Finally, print the dictionary, which contains the frequency of each word in the file, where the key is the word and the value is its frequency.

Contents of the sample input file used to demonstrate the program, say test1.txt:

HIT Nidasoshi. HIT,

VTU, BGM

HSIT Nidasoshi@!

@VTU! BELAGAVI

Program Source code to Count Frequency of Word by Removing Punctuation Character

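import string

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except OSError:
    # Opening failed (missing file, no permission, ...): report and stop
    print('File cannot be opened:', fname)
    exit()

counts = dict()  # maps each word to the number of times it occurs
# Translation table that deletes every punctuation character
table = str.maketrans('', '', string.punctuation)
with fhand:  # the file is closed automatically when this block ends
    for line in fhand:
        line = line.strip()
        line = line.translate(table)
        words = line.split()
        for word in words:
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
print(counts)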

Output of Program

Enter the file name: test1.txt

{'HIT': 2, 'Nidasoshi': 2, 'VTU': 2, 'BGM': 1, 'HSIT': 1, 'BELAGAVI': 1}

Note: The output dictionary contains key as word and its frequency as value.
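As an alternative sketch, not the approach used in the program above, the standard library's collections.Counter can do the counting. The function name word_frequencies is hypothetical, and the list sample simply repeats the four lines of test1.txt so the example runs without a file.

```python
import string
from collections import Counter


def word_frequencies(lines):
    """Count words across an iterable of lines, ignoring punctuation."""
    table = str.maketrans('', '', string.punctuation)
    counts = Counter()
    for line in lines:
        # split() with no arguments also discards surrounding whitespace
        counts.update(line.translate(table).split())
    return counts


# The four lines of the sample file test1.txt
sample = ['HIT Nidasoshi. HIT,', 'VTU, BGM', 'HSIT Nidasoshi@!', '@VTU! BELAGAVI']
frequencies = word_frequencies(sample)

# Print the words from most to least frequent
for word, freq in frequencies.most_common():
    print(word, freq)
```

Because an open file is also an iterable of lines, the same function could be called as word_frequencies(fhand).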

Summary:

This tutorial discusses how to develop a Python Program to Count the Frequency of each Word in a given file by Removing Punctuation characters.
