HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Download a list of songs from Youtube using Selenium

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
seleniumyoutubesongsusingdownloadlistfrom

Problem

Given a text file containing a list of links, I wrote some code to download them, one by one, from YouTube, using either YoutubeMP3 or Convert2mp3 logging any songs that failed to download.

The code with Youtube-mp3 is significantly faster than Convert2mp3 but less accurate (some songs fail for unknown reasons, even manually), due to the nature of the websites, but any tips on improving performance would be greatly appreciated.

Main.py

```
import sys
from tkinter.filedialog import askdirectory, askopenfilename

from selenium import webdriver
from selenium.common.exceptions import NoSuchWindowException, WebDriverException

from downloader import downloader
from downloader import logger

FIREFOX_DRIVER_PATH = "" # A path need to coded here.
CONVERTER2MP3 = "http://convert2mp3.net/en/index.php"
MP3_CONVERTER = "http://www.youtube-mp3.org/"
WEBSITE = {"1": CONVERTER2MP3, "2": MP3_CONVERTER}

def get_firefox_profile(download_directory):
profile = webdriver.FirefoxProfile()
profile.set_preference("dom.popup_maximum", 0)
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.dir", download_directory)
profile.set_preference("browser.download.panel.shown", False)
profile.set_preference("privacy.popups.showBrowserMessage", False)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", ".mp3 audio/mpeg")
return profile

def get_user_choice():
input("press enter when ready to choose the list of songs")
songs_list = askopenfilename(initialdir='.')
if not songs_list:
raise ValueError("You did not choose the songs list file.")
if not songs_list.endswith("txt"):
raise ValueError("A text file must be chosen.")

input("press enter when ready to choose the download directory")
download_directory = askdirectory(initialdir='.')
if not download_directory:
raise ValueError("

Solution

Here are some first impressions and thoughts.

Code Organization

  • Move all selenium specific code to the "downloader". I think your "main" script should only be responsible for getting the input parameters and then invoke the downloader which "hides" the way it does the downloading internally. I would move the profile preparation and the webdriver initialization to the "downloader".



  • to continue on the topic of your "downloader". You should consider creating a "Downloader" class. One of the benefits of having a class is that you may keep your driver as an instance variable. And, you may also define the WebDriverWait instance once and reuse instead of re-instantiating it every time you need to wait.



Code Style

  • consider using multi-line strings for your JS scripts



  • add meaningful docstrings to your modules and functions



-
you can improve the get_firefox_profile() function by pre-defining a dictionary of preferences and setting them at once. Something like:

FIREFOX_PREFERENCES = {
    "dom.popup_maximum": 0,
    "browser.download.folderList": 2,
    "browser.download.panel.shown": False,
    "privacy.popups.showBrowserMessage": False,
    "browser.download.manager.showWhenStarting": False,
    "browser.helperApps.neverAsk.saveToDisk": ".mp3 audio/mpeg",
}

def get_firefox_profile(download_directory):
    profile = webdriver.FirefoxProfile()
    profile.default_preferences.update(FIREFOX_PREFERENCES)
    profile.set_preference("browser.download.dir", download_directory)

    return profile


Reliability

  • Fragile locators. Even though you are defining JS scripts as locators, the locators themselves are fragile - for instance, relying on the a element to be 12th or 9th in the DOM tree may be broken easily.



  • To continue on this topic, I would expect the "downloader" to use Selenium-specific location techniques instead of doing it in pure JavaScript. You could have used CSS selectors or XPath expressions based on meaningful element classes or other attributes. Though, I suspect that you did that intentionally to minimize the number of Selenium commands processed via JSON over HTTP. If this is the case, I think you sacrifice reliability. And, also make things more difficult to understand. For instance, it is not immediately obvious what PRESS_DOWNLOAD script does.



  • avoid using time.sleep() - it generally leads to waiting more than needed in 95% of cases and waiting less in 5%. Ideally, you should use Explicit Waits if it is at all possible in your situation.



Other Random Thoughts

  • Improve reading command-line arguments. You don't have a lot of CLI arguments, but you can still improve the way you do it - consider switching to argparse which will help to make this part of the code more readable and concise



  • If the project and the browser interaction logic will continue to grow, consider using a Page Object pattern



  • selenium brings you a lot of overhead, please see if you can either find an mp3 converter that has a public API, or see if you can use existing two without using a real browser

Code Snippets

FIREFOX_PREFERENCES = {
    "dom.popup_maximum": 0,
    "browser.download.folderList": 2,
    "browser.download.panel.shown": False,
    "privacy.popups.showBrowserMessage": False,
    "browser.download.manager.showWhenStarting": False,
    "browser.helperApps.neverAsk.saveToDisk": ".mp3 audio/mpeg",
}


def get_firefox_profile(download_directory):
    profile = webdriver.FirefoxProfile()
    profile.default_preferences.update(FIREFOX_PREFERENCES)
    profile.set_preference("browser.download.dir", download_directory)

    return profile

Context

StackExchange Code Review Q#160318, answer score: 2

Revisions (0)

No revisions yet.