Python program to retrieve all links from Webpage

Python program to retrieve all links from a given Webpage

Write a Python program to retrieve all links from a given webpage and save them to a text file.

Installing Necessary Libraries to get all links from a given Webpage

The following libraries are required for the Python program that retrieves all links from a given webpage:

  1. BeautifulSoup4
  2. requests

Use the following commands to install the above libraries:

pip install beautifulsoup4==4.9.2

pip install requests==2.24.0

Source Code to retrieve all links from a given Webpage using BeautifulSoup

import requests as rq
from bs4 import BeautifulSoup

url = input("Enter website Link: ")

# If the link already starts with http:// or https://, use it as is;
# otherwise prepend https:// before calling rq.get()
if not url.startswith(("http://", "https://")):
    url = "https://" + url
data = rq.get(url)

# Parse the HTML data using BeautifulSoup's html.parser
s = BeautifulSoup(data.text, "html.parser")

# Collect the href of every <a> tag, skipping tags that have no href
links = []
for link in s.find_all("a"):
    href = link.get("href")
    if href:
        links.append(href)

print("All links of the given website:")
for link in links:
    print(link)

# Write the output links to a file (myLinks.txt), one link per line
with open("myLinks.txt", "w") as saved:
    for link in links:
        print(link, file=saved)

Explanation:

First, import the necessary libraries or modules. Next, ask the user to enter the website link.

Once the user enters the website link, check whether it already includes the http or https scheme. If it does, pass it to rq.get(url) as is; otherwise prepend https:// to the URL before calling rq.get().
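The scheme check is easy to get subtly wrong. In Python, `("https" or "http") in url` evaluates to `"https" in url`, because `or` returns its first truthy operand, so a link such as http://example.com is not recognized as already having a scheme. Below is a minimal sketch of a stricter check. It assumes a hypothetical helper named `normalize_url` (not part of the original program) and `https` as the default scheme.

```python
def normalize_url(url, default_scheme="https"):
    """Return url unchanged if it already starts with http:// or https://,
    otherwise prepend default_scheme. Hypothetical helper for illustration."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    return f"{default_scheme}://{url}"


# `or` returns its first truthy operand, so this expression is just "https"
assert ("https" or "http") == "https"

print(normalize_url("www.vtupulse.com"))
print(normalize_url("http://example.com"))
```

Checking the start of the string, rather than searching anywhere in it, also avoids treating a link like example.com/why-https as if it already had a scheme.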

Parse the HTML data using BeautifulSoup's html.parser.

Find all the “a” tags, read the “href” attribute of each one, and store the values in the links list. Finally, display the links on screen and save them to a text file.
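The extraction and saving steps can also be tried offline on a small HTML string. The sketch below is a dependency-free variant that swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser`. The class `LinkCollector`, the function `save_links`, the sample HTML, and the file name `links_demo.txt` are all illustrative assumptions, not part of the original program. It also resolves relative links against a base URL with `urllib.parse.urljoin`, which the original program does not do.

```python
import os
import tempfile
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip <a> tags that have no href attribute
            # Resolve relative links such as "/category/perl/" against the base URL
            self.links.append(urljoin(self.base_url, href))


def save_links(links, path):
    """Write one link per line so the file is easy to read back."""
    with open(path, "w", encoding="utf-8") as saved:
        for link in links:
            saved.write(link + "\n")


# Sample HTML made up for this example
html = """
<a href="https://www.vtupulse.com/">Home</a>
<a href="/category/perl/">Perl</a>
<a name="top">No href here</a>
"""

collector = LinkCollector("https://www.vtupulse.com/")
collector.feed(html)

out_path = os.path.join(tempfile.gettempdir(), "links_demo.txt")
save_links(collector.links, out_path)
print(collector.links)
```

Writing one link per line, instead of printing the whole list into the file, keeps the output easy to read back with `splitlines()` or to process with other tools.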

Output:

Enter website Link: https://www.vtupulse.com/
All links of the given website:

https://www.vtupulse.com/
https://www.vtupulse.com/category/cplusplus-programs/
https://www.vtupulse.com/category/computer-graphics/
https://www.vtupulse.com/python-programs/python-application-programming-tutorial/
https://www.vtupulse.com/julia-tutorial/introduction-to-julia-julia-tutorial/
https://www.vtupulse.com/cbcs-cse-notes/big-data-analytics-17cs82-vtu-cbcs-notes/
https://www.vtupulse.com/cbcs-cse-notes/15cs73-machine-learning-vtu-notes/
https://www.vtupulse.com/category/perl/

Summary:

This tutorial showed how to write a Python program that gets all links from a given webpage and saves them to a text file.
