patternpythonMinor
Download a list of songs from Youtube using Selenium
Viewed 0 times
seleniumyoutubesongsusingdownloadlistfrom
Problem
Given a text file containing a list of links, I wrote some code to download them, one by one, from YouTube, using either YoutubeMP3 or Convert2mp3 logging any songs that failed to download.
The code with Youtube-mp3 is significantly faster than Convert2mp3 but less accurate (some songs fail for unknown reasons, even manually), due to the nature of the websites, but any tips on improving performance would be greatly appreciated.
Main.py
```
import sys
from tkinter.filedialog import askdirectory, askopenfilename
from selenium import webdriver
from selenium.common.exceptions import NoSuchWindowException, WebDriverException
from downloader import downloader
from downloader import logger
FIREFOX_DRIVER_PATH = "" # A path need to coded here.
CONVERTER2MP3 = "http://convert2mp3.net/en/index.php"
MP3_CONVERTER = "http://www.youtube-mp3.org/"
WEBSITE = {"1": CONVERTER2MP3, "2": MP3_CONVERTER}
def get_firefox_profile(download_directory):
profile = webdriver.FirefoxProfile()
profile.set_preference("dom.popup_maximum", 0)
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.dir", download_directory)
profile.set_preference("browser.download.panel.shown", False)
profile.set_preference("privacy.popups.showBrowserMessage", False)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", ".mp3 audio/mpeg")
return profile
def get_user_choice():
input("press enter when ready to choose the list of songs")
songs_list = askopenfilename(initialdir='.')
if not songs_list:
raise ValueError("You did not choose the songs list file.")
if not songs_list.endswith("txt"):
raise ValueError("A text file must be chosen.")
input("press enter when ready to choose the download directory")
download_directory = askdirectory(initialdir='.')
if not download_directory:
raise ValueError("
The code with Youtube-mp3 is significantly faster than Convert2mp3 but less accurate (some songs fail for unknown reasons, even manually), due to the nature of the websites, but any tips on improving performance would be greatly appreciated.
Main.py
```
import sys
from tkinter.filedialog import askdirectory, askopenfilename
from selenium import webdriver
from selenium.common.exceptions import NoSuchWindowException, WebDriverException
from downloader import downloader
from downloader import logger
FIREFOX_DRIVER_PATH = "" # A path need to coded here.
CONVERTER2MP3 = "http://convert2mp3.net/en/index.php"
MP3_CONVERTER = "http://www.youtube-mp3.org/"
WEBSITE = {"1": CONVERTER2MP3, "2": MP3_CONVERTER}
def get_firefox_profile(download_directory):
profile = webdriver.FirefoxProfile()
profile.set_preference("dom.popup_maximum", 0)
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.dir", download_directory)
profile.set_preference("browser.download.panel.shown", False)
profile.set_preference("privacy.popups.showBrowserMessage", False)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", ".mp3 audio/mpeg")
return profile
def get_user_choice():
input("press enter when ready to choose the list of songs")
songs_list = askopenfilename(initialdir='.')
if not songs_list:
raise ValueError("You did not choose the songs list file.")
if not songs_list.endswith("txt"):
raise ValueError("A text file must be chosen.")
input("press enter when ready to choose the download directory")
download_directory = askdirectory(initialdir='.')
if not download_directory:
raise ValueError("
Solution
Here are some first impressions and thoughts.
Code Organization
Code Style
-
you can improve the
Reliability
Other Random Thoughts
Code Organization
- Move all
seleniumspecific code to the "downloader". I think your "main" script should only be responsible for getting the input parameters and then invoke the downloader which "hides" the way it does the downloading internally. I would move the profile preparation and the webdriver initialization to the "downloader".
- to continue on the topic of your "downloader". You should consider creating a "Downloader" class. One of the benefits of having a class is that you may keep your
driveras an instance variable. And, you may also define theWebDriverWaitinstance once and reuse instead of re-instantiating it every time you need to wait.
Code Style
- consider using multi-line strings for your JS scripts
- add meaningful docstrings to your modules and functions
-
you can improve the
get_firefox_profile() function by pre-defining a dictionary of preferences and setting them at once. Something like:FIREFOX_PREFERENCES = {
"dom.popup_maximum": 0,
"browser.download.folderList": 2,
"browser.download.panel.shown": False,
"privacy.popups.showBrowserMessage": False,
"browser.download.manager.showWhenStarting": False,
"browser.helperApps.neverAsk.saveToDisk": ".mp3 audio/mpeg",
}
def get_firefox_profile(download_directory):
profile = webdriver.FirefoxProfile()
profile.default_preferences.update(FIREFOX_PREFERENCES)
profile.set_preference("browser.download.dir", download_directory)
return profileReliability
- Fragile locators. Even though you are defining JS scripts as locators, the locators themselves are fragile - for instance, relying on the
aelement to be 12th or 9th in the DOM tree may be broken easily.
- To continue on this topic, I would expect the "downloader" to use Selenium-specific location techniques instead of doing it in pure JavaScript. You could have used CSS selectors or XPath expressions based on meaningful element classes or other attributes. Though, I suspect that you did that intentionally to minimize the number of Selenium commands processed via JSON over HTTP. If this is the case, I think you sacrifice reliability. And, also make things more difficult to understand. For instance, it is not immediately obvious what
PRESS_DOWNLOADscript does.
- avoid using
time.sleep()- it generally leads to waiting more than needed in 95% of cases and waiting less in 5%. Ideally, you should use Explicit Waits if it is at all possible in your situation.
Other Random Thoughts
- Improve reading command-line arguments. You don't have a lot of CLI arguments, but you can still improve the way you do it - consider switching to
argparsewhich will help to make this part of the code more readable and concise
- If the project and the browser interaction logic will continue to grow, consider using a Page Object pattern
seleniumbrings you a lot of overhead, please see if you can either find an mp3 converter that has a public API, or see if you can use existing two without using a real browser
Code Snippets
FIREFOX_PREFERENCES = {
"dom.popup_maximum": 0,
"browser.download.folderList": 2,
"browser.download.panel.shown": False,
"privacy.popups.showBrowserMessage": False,
"browser.download.manager.showWhenStarting": False,
"browser.helperApps.neverAsk.saveToDisk": ".mp3 audio/mpeg",
}
def get_firefox_profile(download_directory):
profile = webdriver.FirefoxProfile()
profile.default_preferences.update(FIREFOX_PREFERENCES)
profile.set_preference("browser.download.dir", download_directory)
return profileContext
StackExchange Code Review Q#160318, answer score: 2
Revisions (0)
No revisions yet.