HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Image downloader for a website

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
websitedownloaderimagefor

Problem

This code takes a website and downloads all .jpg images in the webpage. It supports only websites that have the ` element and src` contains a .jpg link.

(Tested here)

import random
import urllib.request
import requests
from bs4 import BeautifulSoup

def Download_Image_from_Web(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    raw_text = r'links.txt'
    with open(raw_text, 'w') as fw:
        for link in soup.findAll('img'):
            image_links = link.get('src')
            if '.jpg' in image_links:
                for i in image_links.split("\\n"):
                    fw.write(i + '\n')
    num_lines = sum(1 for line in open('links.txt'))
    if num_lines == 0:
        print("There is 0 photo in this web page.")
    elif num_lines == 1:
        print("There is", num_lines, "photo in this web page:")
    else:
        print("There are", num_lines, "photos in this web page:")
    k = 0
    while k <= (num_lines-1):
        name = random.randrange(1, 1000)
        fullName = str(name) + ".jpg"
        with open('links.txt', 'r') as f:
            lines = f.readlines()[k]
            urllib.request.urlretrieve(lines, fullName)
            print(lines+fullName+'\n')
        k += 1

Download_Image_from_Web("https://pixabay.com")

Solution

Unnecessary file operations

This is horribly inefficient:

k = 0
while k <= (num_lines-1):
    name = random.randrange(1, 1000)
    fullName = str(name) + ".jpg"
    with open('links.txt', 'r') as f:
        lines = f.readlines()[k]
        urllib.request.urlretrieve(lines, fullName)
        print(lines+fullName+'\n')
    k += 1


Re-reading the same file num_lines times, to download the k-th!

Btw, do you really need to write the list of urls to a file?
Why not just keep them in a list?
Even if you want the urls in a file,
you could keep them in a list in memory and never read that file, only write.

Code organization

Instead of having all the code in a single function that does multiple things,
it would be better to organize your program into smaller functions,
each with a single responsibility.

Python conventions

Python has a well-defined set of coding conventions in PEP8,
many of which are violated here.
I suggest to read through that document,
and follow as much as possible.

Code Snippets

k = 0
while k <= (num_lines-1):
    name = random.randrange(1, 1000)
    fullName = str(name) + ".jpg"
    with open('links.txt', 'r') as f:
        lines = f.readlines()[k]
        urllib.request.urlretrieve(lines, fullName)
        print(lines+fullName+'\n')
    k += 1

Context

StackExchange Code Review Q#162123, answer score: 4

Revisions (0)

No revisions yet.