Sweep: add new parameter to download playlist or specific book #4

makhalaf · 2023-08-08T07:36:32Z

add new parameter to download playlist or specific book

sweep-ai · 2023-08-08T07:36:36Z

Here's the PR! #5.


Step 1: 🔍 Code Search

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I looked at. If a file is missing from here, you can mention its path in the ticket description.

safaribooks/safaribooks.py

Lines 766 to 1124 in 48ae994

	        return dirname if not clean_space else dirname.replace(" ", "")

	    def create_dirs(self):
	        if os.path.isdir(self.BOOK_PATH):
	            self.display.log("Book directory already exists: %s" % self.BOOK_PATH)

	        else:
	            os.makedirs(self.BOOK_PATH)

	        oebps = os.path.join(self.BOOK_PATH, "OEBPS")
	        if not os.path.isdir(oebps):
	            self.display.book_ad_info = True
	            os.makedirs(oebps)

	        self.css_path = os.path.join(oebps, "Styles")
	        if os.path.isdir(self.css_path):
	            self.display.log("CSSs directory already exists: %s" % self.css_path)

	        else:
	            os.makedirs(self.css_path)
	            self.display.css_ad_info.value = 1

	        self.images_path = os.path.join(oebps, "Images")
	        if os.path.isdir(self.images_path):
	            self.display.log("Images directory already exists: %s" % self.images_path)

	        else:
	            os.makedirs(self.images_path)
	            self.display.images_ad_info.value = 1

	    def save_page_html(self, contents):
	        self.filename = self.filename.replace(".html", ".xhtml")
	        open(os.path.join(self.BOOK_PATH, "OEBPS", self.filename), "wb") \
	            .write(self.BASE_HTML.format(contents[0], contents[1]).encode("utf-8", 'xmlcharrefreplace'))
	        self.display.log("Created: %s" % self.filename)

	    def get(self):
	        len_books = len(self.book_chapters)

	        for _ in range(len_books):
	            if not len(self.chapters_queue):
	                return

	            first_page = len_books == len(self.chapters_queue)

	            next_chapter = self.chapters_queue.pop(0)
	            self.chapter_title = next_chapter["title"]
	            self.filename = next_chapter["filename"]

	            asset_base_url = next_chapter['asset_base_url']
	            api_v2_detected = False
	            if 'v2' in next_chapter['content']:
	                asset_base_url = SAFARI_BASE_URL + "/api/v2/epubs/urn:orm:book:{}/files".format(self.book_id)
	                api_v2_detected = True

	            if "images" in next_chapter and len(next_chapter["images"]):
	                for img_url in next_chapter['images']:
	                    if api_v2_detected:
	                        self.images.append(asset_base_url + '/' + img_url)
	                    else:
	                        self.images.append(urljoin(next_chapter['asset_base_url'], img_url))

	            # Stylesheets
	            self.chapter_stylesheets = []
	            if "stylesheets" in next_chapter and len(next_chapter["stylesheets"]):
	                self.chapter_stylesheets.extend(x["url"] for x in next_chapter["stylesheets"])

	            if "site_styles" in next_chapter and len(next_chapter["site_styles"]):
	                self.chapter_stylesheets.extend(next_chapter["site_styles"])

	            if os.path.isfile(os.path.join(self.BOOK_PATH, "OEBPS", self.filename.replace(".html", ".xhtml"))):
	                if not self.display.book_ad_info and \
	                        next_chapter not in self.book_chapters[:self.book_chapters.index(next_chapter)]:
	                    self.display.info(
	                        ("File `%s` already exists.\n"
	                         " If you want to download again all the book,\n"
	                         " please delete the output directory '" + self.BOOK_PATH + "' and restart the program.")
	                        % self.filename.replace(".html", ".xhtml")
	                    )
	                    self.display.book_ad_info = 2

	            else:
	                self.save_page_html(self.parse_html(self.get_html(next_chapter["content"]), first_page))

	            self.display.state(len_books, len_books - len(self.chapters_queue))

	    def _thread_download_css(self, url):
	        css_file = os.path.join(self.css_path, "Style{0:0>2}.css".format(self.css.index(url)))
	        if os.path.isfile(css_file):
	            if not self.display.css_ad_info.value and url not in self.css[:self.css.index(url)]:
	                self.display.info(("File `%s` already exists.\n"
	                                   " If you want to download again all the CSSs,\n"
	                                   " please delete the output directory '" + self.BOOK_PATH + "'"
	                                   " and restart the program.") %
	                                  css_file)
	                self.display.css_ad_info.value = 1

	        else:
	            response = self.requests_provider(url)
	            if response == 0:
	                self.display.error("Error trying to retrieve this CSS: %s\n From: %s" % (css_file, url))

	            with open(css_file, 'wb') as s:
	                s.write(response.content)

	        self.css_done_queue.put(1)
	        self.display.state(len(self.css), self.css_done_queue.qsize())

	    def _thread_download_images(self, url):
	        image_name = url.split("/")[-1]
	        image_path = os.path.join(self.images_path, image_name)
	        if os.path.isfile(image_path):
	            if not self.display.images_ad_info.value and url not in self.images[:self.images.index(url)]:
	                self.display.info(("File `%s` already exists.\n"
	                                   " If you want to download again all the images,\n"
	                                   " please delete the output directory '" + self.BOOK_PATH + "'"
	                                   " and restart the program.") %
	                                  image_name)
	                self.display.images_ad_info.value = 1

	        else:
	            response = self.requests_provider(urljoin(SAFARI_BASE_URL, url), stream=True)
	            if response == 0:
	                self.display.error("Error trying to retrieve this image: %s\n From: %s" % (image_name, url))
	                return

	            with open(image_path, 'wb') as img:
	                for chunk in response.iter_content(1024):
	                    img.write(chunk)

	        self.images_done_queue.put(1)
	        self.display.state(len(self.images), self.images_done_queue.qsize())

	    def _start_multiprocessing(self, operation, full_queue):
	        if len(full_queue) > 5:
	            for i in range(0, len(full_queue), 5):
	                self._start_multiprocessing(operation, full_queue[i:i + 5])

	        else:
	            process_queue = [Process(target=operation, args=(arg,)) for arg in full_queue]
	            for proc in process_queue:
	                proc.start()

	            for proc in process_queue:
	                proc.join()

	    def collect_css(self):
	        self.display.state_status.value = -1

	        # "self._start_multiprocessing" seems to cause problem. Switching to mono-thread download.
	        for css_url in self.css:
	            self._thread_download_css(css_url)

	    def collect_images(self):
	        if self.display.book_ad_info == 2:
	            self.display.info("Some of the book contents were already downloaded.\n"
	                              " If you want to be sure that all the images will be downloaded,\n"
	                              " please delete the output directory '" + self.BOOK_PATH +
	                              "' and restart the program.")

	        self.display.state_status.value = -1

	        # "self._start_multiprocessing" seems to cause problem. Switching to mono-thread download.
	        for image_url in self.images:
	            self._thread_download_images(image_url)

	    def create_content_opf(self):
	        self.css = next(os.walk(self.css_path))[2]
	        self.images = next(os.walk(self.images_path))[2]

	        manifest = []
	        spine = []
	        for c in self.book_chapters:
	            c["filename"] = c["filename"].replace(".html", ".xhtml")
	            item_id = escape("".join(c["filename"].split(".")[:-1]))
	            manifest.append("<item id=\"{0}\" href=\"{1}\" media-type=\"application/xhtml+xml\" />".format(
	                item_id, c["filename"]
	            ))
	            spine.append("<itemref idref=\"{0}\"/>".format(item_id))

	        for i in set(self.images):
	            dot_split = i.split(".")
	            head = "img_" + escape("".join(dot_split[:-1]))
	            extension = dot_split[-1]
	            manifest.append("<item id=\"{0}\" href=\"Images/{1}\" media-type=\"image/{2}\" />".format(
	                head, i, "jpeg" if "jp" in extension else extension
	            ))

	        for i in range(len(self.css)):
	            manifest.append("<item id=\"style_{0:0>2}\" href=\"Styles/Style{0:0>2}.css\" "
	                            "media-type=\"text/css\" />".format(i))

	        authors = "\n".join("<dc:creator opf:file-as=\"{0}\" opf:role=\"aut\">{0}</dc:creator>".format(
	            escape(aut.get("name", "n/d"))
	        ) for aut in self.book_info.get("authors", []))

	        subjects = "\n".join("<dc:subject>{0}</dc:subject>".format(escape(sub.get("name", "n/d")))
	                             for sub in self.book_info.get("subjects", []))

	        return self.CONTENT_OPF.format(
	            (self.book_info.get("isbn", self.book_id)),
	            escape(self.book_title),
	            authors,
	            escape(self.book_info.get("description", "")),
	            subjects,
	            ", ".join(escape(pub.get("name", "")) for pub in self.book_info.get("publishers", [])),
	            escape(self.book_info.get("rights", "")),
	            self.book_info.get("issued", ""),
	            self.cover,
	            "\n".join(manifest),
	            "\n".join(spine),
	            self.book_chapters[0]["filename"].replace(".html", ".xhtml")
	        )

	    @staticmethod
	    def parse_toc(l, c=0, mx=0):
	        r = ""
	        for cc in l:
	            c += 1
	            if int(cc["depth"]) > mx:
	                mx = int(cc["depth"])

	            r += "<navPoint id=\"{0}\" playOrder=\"{1}\">" \
	                 "<navLabel><text>{2}</text></navLabel>" \
	                 "<content src=\"{3}\"/>".format(
	                    cc["fragment"] if len(cc["fragment"]) else cc["id"], c,
	                    escape(cc["label"]), cc["href"].replace(".html", ".xhtml").split("/")[-1]
	                 )

	            if cc["children"]:
	                sr, c, mx = SafariBooks.parse_toc(cc["children"], c, mx)
	                r += sr

	            r += "</navPoint>\n"

	        return r, c, mx

	    def create_toc(self):
	        response = self.requests_provider(urljoin(self.api_url, "toc/"))
	        if response == 0:
	            self.display.exit("API: unable to retrieve book chapters. "
	                              "Don't delete any files, just run again this program"
	                              " in order to complete the `.epub` creation!")

	        response = response.json()

	        if not isinstance(response, list) and len(response.keys()) == 1:
	            self.display.exit(
	                self.display.api_error(response) +
	                " Don't delete any files, just run again this program"
	                " in order to complete the `.epub` creation!"
	            )

	        navmap, _, max_depth = self.parse_toc(response)
	        return self.TOC_NCX.format(
	            (self.book_info["isbn"] if self.book_info["isbn"] else self.book_id),
	            max_depth,
	            self.book_title,
	            ", ".join(aut.get("name", "") for aut in self.book_info.get("authors", [])),
	            navmap
	        )

	    def create_epub(self):
	        open(os.path.join(self.BOOK_PATH, "mimetype"), "w").write("application/epub+zip")
	        meta_info = os.path.join(self.BOOK_PATH, "META-INF")
	        if os.path.isdir(meta_info):
	            self.display.log("META-INF directory already exists: %s" % meta_info)

	        else:
	            os.makedirs(meta_info)

	        open(os.path.join(meta_info, "container.xml"), "wb").write(
	            self.CONTAINER_XML.encode("utf-8", "xmlcharrefreplace")
	        )
	        open(os.path.join(self.BOOK_PATH, "OEBPS", "content.opf"), "wb").write(
	            self.create_content_opf().encode("utf-8", "xmlcharrefreplace")
	        )
	        open(os.path.join(self.BOOK_PATH, "OEBPS", "toc.ncx"), "wb").write(
	            self.create_toc().encode("utf-8", "xmlcharrefreplace")
	        )

	        zip_file = os.path.join(PATH, "Books", self.book_id)
	        if os.path.isfile(zip_file + ".zip"):
	            os.remove(zip_file + ".zip")

	        shutil.make_archive(zip_file, 'zip', self.BOOK_PATH)
	        os.rename(zip_file + ".zip", os.path.join(self.BOOK_PATH, self.book_id) + ".epub")


	# MAIN
	if __name__ == "__main__":
	    arguments = argparse.ArgumentParser(prog="safaribooks.py",
	                                        description="Download and generate an EPUB of your favorite books"
	                                                    " from Safari Books Online.",
	                                        add_help=False,
	                                        allow_abbrev=False)

	    login_arg_group = arguments.add_mutually_exclusive_group()
	    login_arg_group.add_argument(
	        "--cred", metavar="<EMAIL:PASS>", default=False,
	        help="Credentials used to perform the auth login on Safari Books Online."
	             " Es. ` --cred \"[email protected]:password01\" `."
	    )
	    login_arg_group.add_argument(
	        "--login", action='store_true',
	        help="Prompt for credentials used to perform the auth login on Safari Books Online."
	    )

	    arguments.add_argument(
	        "--no-cookies", dest="no_cookies", action='store_true',
	        help="Prevent your session data to be saved into `cookies.json` file."
	    )
	    arguments.add_argument(
	        "--kindle", dest="kindle", action='store_true',
	        help="Add some CSS rules that block overflow on `table` and `pre` elements."
	             " Use this option if you're going to export the EPUB to E-Readers like Amazon Kindle."
	    )
	    arguments.add_argument(
	        "--preserve-log", dest="log", action='store_true', help="Leave the `info_XXXXXXXXXXXXX.log`"
	                                                                " file even if there isn't any error."
	    )
	    arguments.add_argument("--help", action="help", default=argparse.SUPPRESS, help='Show this help message.')
	    arguments.add_argument(
	        "bookid", metavar='<BOOK ID>',
	        help="Book digits ID that you want to download. You can find it in the URL (X-es):"
	             " `" + SAFARI_BASE_URL + "/library/view/book-name/XXXXXXXXXXXXX/`"
	    )

	    args_parsed = arguments.parse_args()
	    if args_parsed.cred or args_parsed.login:
	        user_email = ""
	        pre_cred = ""

	        if args_parsed.cred:
	            pre_cred = args_parsed.cred

	        else:
	            user_email = input("Email: ")
	            passwd = getpass.getpass("Password: ")
	            pre_cred = user_email + ":" + passwd

	        parsed_cred = SafariBooks.parse_cred(pre_cred)

	        if not parsed_cred:
	            arguments.error("invalid credential: %s" % (
	                args_parsed.cred if args_parsed.cred else (user_email + ":*******")
	            ))

	        args_parsed.cred = parsed_cred

	    else:
	        if args_parsed.no_cookies:
	            arguments.error("invalid option: `--no-cookies` is valid only if you use the `--cred` option")

	    SafariBooks(args_parsed)
	    # Hint: do you want to download more then one book once, initialized more than one instance of `SafariBooks`...
	    sys.exit(0)
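
The argument parser above is where the requested parameter would most likely be added. As a minimal sketch only, and not the change made in the PR, the parser could accept either a book ID or a playlist ID. The `--playlist` option, its `<PLAYLIST ID>` metavar, and the `build_parser` and `parse_target` helpers are hypothetical names that do not exist in `safaribooks.py`. How a playlist ID is resolved into book IDs is not covered by the snippets, so it is left out.

```python
import argparse


def build_parser():
    """Sketch of a parser that accepts a book ID or a (hypothetical) playlist ID."""
    parser = argparse.ArgumentParser(prog="safaribooks.py", add_help=False, allow_abbrev=False)
    parser.add_argument("--help", action="help", default=argparse.SUPPRESS,
                        help="Show this help message.")
    # Hypothetical option: not part of the original script.
    parser.add_argument("--playlist", metavar="<PLAYLIST ID>", default=None,
                        help="Download every book of the given playlist.")
    # nargs="?" makes the positional optional, so `--playlist` can be used on its own.
    parser.add_argument("bookid", metavar="<BOOK ID>", nargs="?", default=None,
                        help="Book digits ID that you want to download.")
    return parser


def parse_target(argv):
    """Return ("book", id) or ("playlist", id); exactly one of the two must be given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if bool(args.bookid) == bool(args.playlist):
        # parser.error() prints the usage line and exits with status 2.
        parser.error("provide either <BOOK ID> or `--playlist <PLAYLIST ID>`")
    if args.playlist:
        return "playlist", args.playlist
    return "book", args.bookid
```

With this shape, the existing single-book invocation keeps working unchanged, while a playlist run would create one `SafariBooks` instance per resolved book, as the hint comment at the end of the script suggests.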

safaribooks/safaribooks.py

Lines 191 to 448 in 48ae994

	                self.SH_DEFAULT + ("%4s" % progress) + "%" + ("\n" if progress == 100 else "")
	            )

	    def done(self, epub_file):
	        self.info("Done: %s\n\n" % epub_file +
	                  " If you like it, please * this project on GitHub to make it known:\n"
	                  " https://github.com/lorenzodifuccia/safaribooks\n"
	                  " e don't forget to renew your Safari Books Online subscription:\n"
	                  " " + SAFARI_BASE_URL + "\n\n" +
	                  self.SH_BG_RED + "[!]" + self.SH_DEFAULT + " Bye!!")

	    @staticmethod
	    def api_error(response):
	        message = "API: "
	        if "detail" in response and "Not found" in response["detail"]:
	            message += "book's not present in Safari Books Online.\n" \
	                       " The book identifier is the digits that you can find in the URL:\n" \
	                       " `" + SAFARI_BASE_URL + "/library/view/book-name/XXXXXXXXXXXXX/`"

	        else:
	            os.remove(COOKIES_FILE)
	            message += "Out-of-Session%s.\n" % (" (%s)" % response["detail"]) if "detail" in response else "" + \
	                Display.SH_YELLOW + "[+]" + Display.SH_DEFAULT + \
	                " Use the `--cred` or `--login` options in order to perform the auth login to Safari."

	        return message


	class WinQueue(list):  # TODO: error while use `process` in Windows: can't pickle _thread.RLock objects
	    def put(self, el):
	        self.append(el)

	    def qsize(self):
	        return self.__len__()

	class SafariBooks:
	    LOGIN_URL = ORLY_BASE_URL + "/member/auth/login/"
	    LOGIN_ENTRY_URL = SAFARI_BASE_URL + "/login/unified/?next=/home/"

	    API_TEMPLATE = SAFARI_BASE_URL + "/api/v1/book/{0}/"

	    BASE_01_HTML = "<!DOCTYPE html>\n" \
	                   "<html lang=\"en\" xml:lang=\"en\" xmlns=\"https://www.w3.org/1999/xhtml\"" \
	                   " xmlns:xsi=\"https://www.w3.org/2001/XMLSchema-instance\"" \
	                   " xsi:schemaLocation=\"https://www.w3.org/2002/06/xhtml2/" \
	                   " https://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd\"" \
	                   " xmlns:epub=\"https://www.idpf.org/2007/ops\">\n" \
	                   "<head>\n" \
	                   "{0}\n" \
	                   "<style type=\"text/css\">" \
	                   "body{{margin:1em;background-color:transparent!important;}}" \
	                   "#sbo-rt-content *{{text-indent:0pt!important;}}#sbo-rt-content .bq{{margin-right:1em!important;}}"

	    KINDLE_HTML = "#sbo-rt-content *{{word-wrap:break-word!important;" \
	                  "word-break:break-word!important;}}#sbo-rt-content table,#sbo-rt-content pre" \
	                  "{{overflow-x:unset!important;overflow:unset!important;" \
	                  "overflow-y:unset!important;white-space:pre-wrap!important;}}"

	    BASE_02_HTML = "</style>" \
	                   "</head>\n" \
	                   "<body>{1}</body>\n</html>"

	    CONTAINER_XML = "<?xml version=\"1.0\"?>" \
	                    "<container version=\"1.0\" xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\">" \
	                    "<rootfiles>" \
	                    "<rootfile full-path=\"OEBPS/content.opf\" media-type=\"application/oebps-package+xml\" />" \
	                    "</rootfiles>" \
	                    "</container>"

	    # Format: ID, Title, Authors, Description, Subjects, Publisher, Rights, Date, CoverId, MANIFEST, SPINE, CoverUrl
	    CONTENT_OPF = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" \
	                  "<package xmlns=\"https://www.idpf.org/2007/opf\" unique-identifier=\"bookid\" version=\"2.0\" >\n" \
	                  "<metadata xmlns:dc=\"https://purl.org/dc/elements/1.1/\" " \
	                  " xmlns:opf=\"https://www.idpf.org/2007/opf\">\n" \
	                  "<dc:title>{1}</dc:title>\n" \
	                  "{2}\n" \
	                  "<dc:description>{3}</dc:description>\n" \
	                  "{4}" \
	                  "<dc:publisher>{5}</dc:publisher>\n" \
	                  "<dc:rights>{6}</dc:rights>\n" \
	                  "<dc:language>en-US</dc:language>\n" \
	                  "<dc:date>{7}</dc:date>\n" \
	                  "<dc:identifier id=\"bookid\">{0}</dc:identifier>\n" \
	                  "<meta name=\"cover\" content=\"{8}\"/>\n" \
	                  "</metadata>\n" \
	                  "<manifest>\n" \
	                  "<item id=\"ncx\" href=\"toc.ncx\" media-type=\"application/x-dtbncx+xml\" />\n" \
	                  "{9}\n" \
	                  "</manifest>\n" \
	                  "<spine toc=\"ncx\">\n{10}</spine>\n" \
	                  "<guide><reference href=\"{11}\" title=\"Cover\" type=\"cover\" /></guide>\n" \
	                  "</package>"

	    # Format: ID, Depth, Title, Author, NAVMAP
	    TOC_NCX = "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\" ?>\n" \
	              "<!DOCTYPE ncx PUBLIC \"-//NISO//DTD ncx 2005-1//EN\"" \
	              " \"https://www.daisy.org/z3986/2005/ncx-2005-1.dtd\">\n" \
	              "<ncx xmlns=\"https://www.daisy.org/z3986/2005/ncx/\" version=\"2005-1\">\n" \
	              "<head>\n" \
	              "<meta content=\"ID:ISBN:{0}\" name=\"dtb:uid\"/>\n" \
	              "<meta content=\"{1}\" name=\"dtb:depth\"/>\n" \
	              "<meta content=\"0\" name=\"dtb:totalPageCount\"/>\n" \
	              "<meta content=\"0\" name=\"dtb:maxPageNumber\"/>\n" \
	              "</head>\n" \
	              "<docTitle><text>{2}</text></docTitle>\n" \
	              "<docAuthor><text>{3}</text></docAuthor>\n" \
	              "<navMap>{4}</navMap>\n" \
	              "</ncx>"

	    HEADERS = {
	        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
	        "Accept-Encoding": "gzip, deflate",
	        "Referer": LOGIN_ENTRY_URL,
	        "Upgrade-Insecure-Requests": "1",
	        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
	                      "Chrome/90.0.4430.212 Safari/537.36"
	    }

	    COOKIE_FLOAT_MAX_AGE_PATTERN = re.compile(r'(max-age=\d*\.\d*)', re.IGNORECASE)

	    def __init__(self, args):
	        self.args = args
	        self.display = Display("info_%s.log" % escape(args.bookid))
	        self.display.intro()

	        self.session = requests.Session()
	        if USE_PROXY:  # DEBUG
	            self.session.proxies = PROXIES
	            self.session.verify = False

	        self.session.headers.update(self.HEADERS)

	        self.jwt = {}

	        if not args.cred:
	            if not os.path.isfile(COOKIES_FILE):
	                self.display.exit("Login: unable to find `cookies.json` file.\n"
	                                  " Please use the `--cred` or `--login` options to perform the login.")

	            self.session.cookies.update(json.load(open(COOKIES_FILE)))

	        else:
	            self.display.info("Logging into Safari Books Online...", state=True)
	            self.do_login(*args.cred)
	            if not args.no_cookies:
	                json.dump(self.session.cookies.get_dict(), open(COOKIES_FILE, 'w'))

	        self.check_login()

	        self.book_id = args.bookid
	        self.api_url = self.API_TEMPLATE.format(self.book_id)

	        self.display.info("Retrieving book info...")
	        self.book_info = self.get_book_info()
	        self.display.book_info(self.book_info)

	        self.display.info("Retrieving book chapters...")
	        self.book_chapters = self.get_book_chapters()

	        self.chapters_queue = self.book_chapters[:]

	        if len(self.book_chapters) > sys.getrecursionlimit():
	            sys.setrecursionlimit(len(self.book_chapters))

	        self.book_title = self.book_info["title"]
	        self.base_url = self.book_info["web_url"]

	        self.clean_book_title = "".join(self.escape_dirname(self.book_title).split(",")[:2]) \
	            + " ({0})".format(self.book_id)

	        books_dir = os.path.join(PATH, "Books")
	        if not os.path.isdir(books_dir):
	            os.mkdir(books_dir)

	        self.BOOK_PATH = os.path.join(books_dir, self.clean_book_title)
	        self.display.set_output_dir(self.BOOK_PATH)
	        self.css_path = ""
	        self.images_path = ""
	        self.create_dirs()

	        self.chapter_title = ""
	        self.filename = ""
	        self.chapter_stylesheets = []
	        self.css = []
	        self.images = []

	        self.display.info("Downloading book contents... (%s chapters)" % len(self.book_chapters), state=True)
	        self.BASE_HTML = self.BASE_01_HTML + (self.KINDLE_HTML if not args.kindle else "") + self.BASE_02_HTML

	        self.cover = False
	        self.get()
	        if not self.cover:
	            self.cover = self.get_default_cover() if "cover" in self.book_info else False
	            cover_html = self.parse_html(
	                html.fromstring("<div id=\"sbo-rt-content\"><img src=\"Images/{0}\"></div>".format(self.cover)), True
	            )

	            self.book_chapters = [{
	                "filename": "default_cover.xhtml",
	                "title": "Cover"
	            }] + self.book_chapters

	            self.filename = self.book_chapters[0]["filename"]
	            self.save_page_html(cover_html)

	        self.css_done_queue = Queue(0) if "win" not in sys.platform else WinQueue()
	        self.display.info("Downloading book CSSs... (%s files)" % len(self.css), state=True)
	        self.collect_css()
	        self.images_done_queue = Queue(0) if "win" not in sys.platform else WinQueue()
	        self.display.info("Downloading book images... (%s files)" % len(self.images), state=True)
	        self.collect_images()

	        self.display.info("Creating EPUB file...", state=True)
	        self.create_epub()

	        if not args.no_cookies:
	            json.dump(self.session.cookies.get_dict(), open(COOKIES_FILE, "w"))

	        self.display.done(os.path.join(self.BOOK_PATH, self.book_id + ".epub"))
	        self.display.unregister()

	        if not self.display.in_error and not args.log:
	            os.remove(self.display.log_file)

	    def handle_cookie_update(self, set_cookie_headers):
	        for morsel in set_cookie_headers:
	            # Handle Float 'max-age' Cookie
	            if self.COOKIE_FLOAT_MAX_AGE_PATTERN.search(morsel):
	                cookie_key, cookie_value = morsel.split(";")[0].split("=")
	                self.session.cookies.set(cookie_key, cookie_value)

	    def requests_provider(self, url, is_post=False, data=None, perform_redirect=True, **kwargs):
	        try:
	            response = getattr(self.session, "post" if is_post else "get")(
	                url,
	                data=data,
	                allow_redirects=False,
	                **kwargs
	            )

	            self.handle_cookie_update(response.raw.headers.getlist("Set-Cookie"))

	            self.display.last_request = (
	                url, data, kwargs, response.status_code, "\n".join(
	                    ["\t{}: {}".format(*h) for h in response.headers.items()]
	                ), response.text
	            )

	        except (requests.ConnectionError, requests.ConnectTimeout, requests.RequestException) as request_exception:
	            self.display.error(str(request_exception))
	            return 0

	        if response.is_redirect and perform_redirect:
	            return self.requests_provider(response.next.url, is_post, None, perform_redirect)
	            # TODO How about **kwargs?

	        return response

safaribooks/README.md

Lines 1 to 176 in 48ae994

	# SafariBooks
	Download and generate EPUB of your favorite books from [Safari Books Online](https://www.safaribooksonline.com) library.
	I'm not responsible for the use of this program, this is only for personal and educational purpose.
	Before any usage please read the O'Reilly's [Terms of Service](https://learning.oreilly.com/terms/).

	> ## ⚠ Attention needed ⚠
	> If you are a developer and want to help this project, please take a look to the current [Milestone](https://github.com/lorenzodifuccia/safaribooks/milestone/1).
	> Checkout also the new APIv2 branch: [apiv2](https://github.com/lorenzodifuccia/safaribooks/tree/apiv2)
	> The Community thanks 🙏🏻

	> ## ✨ ADV ✨
	> Take a look at my other GitHub projects: https://github.com/lorenzodifuccia 👀 ❤️

	## Overview:
	* [Requirements & Setup](#requirements--setup)
	* [Usage](#usage)
	* [Single Sign-On (SSO), Company, University Login](https://github.com/lorenzodifuccia/safaribooks/issues/150#issuecomment-555423085)
	* [Calibre EPUB conversion](https://github.com/lorenzodifuccia/safaribooks#calibre-epub-conversion)
	* [Example: Download Test-Driven Development with Python, 2nd Edition](#download-test-driven-development-with-python-2nd-edition)
	* [Example: Use or not the `--kindle` option](#use-or-not-the---kindle-option)

	## Requirements & Setup:
	First of all, it requires `python3` and `pip3` or `pipenv` to be installed.
	```shell
	$ git clone https://github.com/lorenzodifuccia/safaribooks.git
	Cloning into 'safaribooks'...

	$ cd safaribooks/
	$ pip3 install -r requirements.txt

	OR

	$ pipenv install && pipenv shell
	```

	The program depends of only two Python _3_ modules:
	```python3
	lxml>=4.1.1
	requests>=2.20.0
	```

	## Usage:
	It's really simple to use, just choose a book from the library and replace in the following command:
	* X-es with its ID,
	* `email:password` with your own.

	```shell
	$ python3 safaribooks.py --cred "[email protected]:password01" XXXXXXXXXXXXX
	```

	The ID is the digits that you find in the URL of the book description page:
	`https://www.safaribooksonline.com/library/view/book-name/XXXXXXXXXXXXX/`
	Like: `https://www.safaribooksonline.com/library/view/test-driven-development-with/9781491958698/`
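
If you start from a URL rather than a bare ID, the digits can be pulled out with a small helper. This is only a sketch: `extract_book_id` and `BOOK_ID_PATTERN` are hypothetical names that are not part of `safaribooks.py`, and the pattern assumes the `/library/view/<book-name>/<digits>/` URL layout shown above.

```python
import re

# Assumption: the ID is the all-digit path segment that follows the book name
# in a `/library/view/<book-name>/<digits>/` URL.
BOOK_ID_PATTERN = re.compile(r"/library/view/[^/]+/(\d+)/?")


def extract_book_id(url):
    """Return the digits ID found in a book URL, or None if the URL does not match."""
    match = BOOK_ID_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_book_id(
        "https://www.safaribooksonline.com/library/view/test-driven-development-with/9781491958698/"
    ))
```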

	#### Program options:
	```shell
	$ python3 safaribooks.py --help
	usage: safaribooks.py [--cred <EMAIL:PASS> | --login] [--no-cookies]
	                      [--kindle] [--preserve-log] [--help]
	                      <BOOK ID>

	Download and generate an EPUB of your favorite books from Safari Books Online.

	positional arguments:
	  <BOOK ID>            Book digits ID that you want to download. You can find
	                       it in the URL (X-es):
	                       `https://learning.oreilly.com/library/view/book-
	                       name/XXXXXXXXXXXXX/`

	optional arguments:
	  --cred <EMAIL:PASS>  Credentials used to perform the auth login on Safari
	                       Books Online. Es. ` --cred
	                       "[email protected]:password01" `.
	  --login              Prompt for credentials used to perform the auth login
	                       on Safari Books Online.
	  --no-cookies         Prevent your session data to be saved into
	                       `cookies.json` file.
	  --kindle             Add some CSS rules that block overflow on `table` and
	                       `pre` elements. Use this option if you're going to
	                       export the EPUB to E-Readers like Amazon Kindle.
	  --preserve-log       Leave the `info_XXXXXXXXXXXXX.log` file even if there
	                       isn't any error.
	  --help               Show this help message.
	```

	The first time you use the program, you'll have to specify your Safari Books Online account credentials (look [`here`](/../../issues/15) for special character).
	The next times you'll download a book, before session expires, you can omit the credential, because the program save your session cookies in a file called `cookies.json`.
	For SSO, please use the `sso_cookies.py` program in order to create the `cookies.json` file from the SSO cookies retrieved by your browser session (please follow [`these steps`](/../../issues/150#issuecomment-555423085)).
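
The cookie caching described here amounts to a JSON round trip of a name-to-value mapping. The sketch below illustrates that idea with a plain `dict` standing in for the session cookies. `save_cookies` and `load_cookies` are hypothetical helpers, not functions of `safaribooks.py`, which does the same thing inline with `json.dump` and `json.load`.

```python
import json
import os


def save_cookies(cookies, path):
    """Write a name -> value cookie mapping to `path` as JSON."""
    with open(path, "w") as cookie_file:
        json.dump(cookies, cookie_file)


def load_cookies(path):
    """Return the saved mapping, or None when no cookie file exists yet."""
    if not os.path.isfile(path):
        return None
    with open(path) as cookie_file:
        return json.load(cookie_file)
```

Returning `None` for a missing file mirrors the case where the program asks you to log in with `--cred` or `--login` because no `cookies.json` is available.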

	Pay attention if you use a shared PC, because everyone that has access to your files can steal your session.
	If you don't want to cache the cookies, just use the `--no-cookies` option and provide all time your credential through the `--cred` option or the more safe `--login` one: this will prompt you for credential during the script execution.

	You can configure proxies by setting on your system the environment variable `HTTPS_PROXY` or using the `USE_PROXY` directive into the script.

	#### Calibre EPUB conversion
	Important: since the script only downloads HTML pages and creates a raw EPUB, many of the CSS and XML/HTML directives are wrong for an E-Reader. To ensure the best output quality, I suggest that you always convert the `EPUB` obtained by the script to a standard `EPUB` with [Calibre](https://calibre-ebook.com/).
	You can also use the command-line version of Calibre with `ebook-convert`, e.g.:
	```bash
	$ ebook-convert "XXXX/safaribooks/Books/Test-Driven Development with Python 2nd Edition (9781491958698)/9781491958698.epub" "XXXX/safaribooks/Books/Test-Driven Development with Python 2nd Edition (9781491958698)/9781491958698_CLEAR.epub"
	```
	After the conversion, you can read `9781491958698_CLEAR.epub` in any E-Reader and delete all the other files.

	The program also offers an option to ensure the best compatibility for those who want to export the `EPUB` to E-Readers like Amazon Kindle: `--kindle`, which blocks overflow on `table` and `pre` elements (see [example](#use-or-not-the---kindle-option)).
	In this case, I suggest that you convert the `EPUB` to `AZW3` or `MOBI` with Calibre, and remember to select `Ignore margins` in the conversion options:

	![Calibre IgnoreMargins](https://github.com/lorenzodifuccia/cloudflare/raw/master/Images/safaribooks/safaribooks_calibre_IgnoreMargins.png "Select Ignore margins")

	## Examples:
	* ## Download [Test-Driven Development with Python, 2nd Edition](https://www.safaribooksonline.com/library/view/test-driven-development-with/9781491958698/):
	```shell
	$ python3 safaribooks.py --cred "[email protected]:MyPassword1!" 9781491958698

	____ ___ _
	/ __/__ _/ _/__ _____(_)
	_\ \/ _ `/ _/ _ `/ __/ /
	/___/\_,_/_/ \_,_/_/ /_/
	/ _ )___ ___ / /__ ___
	/ _ / _ \/ _ \/ '_/(_-<
	/____/\___/\___/_/\_\/___/

	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	[-] Logging into Safari Books Online...
	[*] Retrieving book info...
	[-] Title: Test-Driven Development with Python, 2nd Edition
	[-] Authors: Harry J.W. Percival
	[-] Identifier: 9781491958698
	[-] ISBN: 9781491958704
	[-] Publishers: O'Reilly Media, Inc.
	[-] Rights: Copyright © O'Reilly Media, Inc.
	[-] Description: By taking you through the development of a real web application
	from beginning to end, the second edition of this hands-on guide demonstrates the
	practical advantages of test-driven development (TDD) with Python. You’ll learn
	how to write and run tests before building each part of your app, and then develop
	the minimum amount of code required to pass those tests. The result? Clean code
	that works.In the process, you’ll learn the basics of Django, Selenium, Git,
	jQuery, and Mock, along with curre...
	[-] Release Date: 2017-08-18
	[-] URL: https://learning.oreilly.com/library/view/test-driven-development-with/9781491958698/
	[*] Retrieving book chapters...
	[*] Output directory:
	/XXXX/safaribooks/Books/Test-Driven Development with Python 2nd Edition (9781491958698)
	[-] Downloading book contents... (53 chapters)
	[#####################################################################] 100%
	[-] Downloading book CSSs... (2 files)
	[#####################################################################] 100%
	[-] Downloading book images... (142 files)
	[#####################################################################] 100%
	[-] Creating EPUB file...
	[*] Done: /XXXX/safaribooks/Books/Test-Driven Development with Python 2nd Edition
	(9781491958698)/9781491958698.epub

	If you like it, please * this project on GitHub to make it known:
	https://github.com/lorenzodifuccia/safaribooks
	e don't forget to renew your Safari Books Online subscription:
	https://learning.oreilly.com

	[!] Bye!!
	```
	The result will be (opening the `EPUB` file with Calibre):

	![Book Appearance](https://github.com/lorenzodifuccia/cloudflare/raw/master/Images/safaribooks/safaribooks_example01_TDD.png "Book opened with Calibre")

	* ## Use or not the `--kindle` option:
	```bash
	$ python3 safaribooks.py --kindle 9781491958698
	```
	On the right is the book created with the `--kindle` option; on the left is the book created without it (default):

	![NoKindle Option](https://github.com/lorenzodifuccia/cloudflare/raw/master/Images/safaribooks/safaribooks_example02_NoKindle.png "Version compare")

	---

	## Thanks!!
	For any kind of problem, please don't hesitate to open an issue here on GitHub.

	Lorenzo Di Fuccia

safaribooks/safaribooks.py

Lines 166 to 250 in 48ae994

 
            except (html.etree.ParseError, html.etree.ParserError) as e:
                self.log("Error parsing the description: %s" % e)
                return "n/d"

        def book_info(self, info):
            description = self.parse_description(info.get("description", None)).replace("\n", " ")
            for t in [
                ("Title", info.get("title", "")), ("Authors", ", ".join(aut.get("name", "") for aut in info.get("authors", []))),
                ("Identifier", info.get("identifier", "")), ("ISBN", info.get("isbn", "")),
                ("Publishers", ", ".join(pub.get("name", "") for pub in info.get("publishers", []))),
                ("Rights", info.get("rights", "")),
                ("Description", description[:500] + "..." if len(description) >= 500 else description),
                ("Release Date", info.get("issued", "")),
                ("URL", info.get("web_url", ""))
            ]:
                self.info("{0}{1}{2}: {3}".format(self.SH_YELLOW, t[0], self.SH_DEFAULT, t[1]), True)

        def state(self, origin, done):
            progress = int(done * 100 / origin)
            bar = int(progress * (self.columns - 11) / 100)
            if self.state_status.value < progress:
                self.state_status.value = progress
                sys.stdout.write(
                    "\r " + self.SH_BG_YELLOW + "[" + ("#" * bar).ljust(self.columns - 11, "-") + "]" +
                    self.SH_DEFAULT + ("%4s" % progress) + "%" + ("\n" if progress == 100 else "")
                )

        def done(self, epub_file):
            self.info("Done: %s\n\n" % epub_file +
                      " If you like it, please * this project on GitHub to make it known:\n"
                      " https://github.com/lorenzodifuccia/safaribooks\n"
                      " e don't forget to renew your Safari Books Online subscription:\n"
                      " " + SAFARI_BASE_URL + "\n\n" +
                      self.SH_BG_RED + "[!]" + self.SH_DEFAULT + " Bye!!")

        @staticmethod
        def api_error(response):
            message = "API: "
            if "detail" in response and "Not found" in response["detail"]:
                message += "book's not present in Safari Books Online.\n" \
                           " The book identifier is the digits that you can find in the URL:\n" \
                           " `" + SAFARI_BASE_URL + "/library/view/book-name/XXXXXXXXXXXXX/`"

            else:
                os.remove(COOKIES_FILE)
                message += "Out-of-Session%s.\n" % (" (%s)" % response["detail"]) if "detail" in response else "" + \
                           Display.SH_YELLOW + "[+]" + Display.SH_DEFAULT + \
                           " Use the `--cred` or `--login` options in order to perform the auth login to Safari."

            return message

    class WinQueue(list):  # TODO: error while use `process` in Windows: can't pickle _thread.RLock objects
        def put(self, el):
            self.append(el)

        def qsize(self):
            return self.__len__()

    class SafariBooks:
        LOGIN_URL = ORLY_BASE_URL + "/member/auth/login/"
        LOGIN_ENTRY_URL = SAFARI_BASE_URL + "/login/unified/?next=/home/"

        API_TEMPLATE = SAFARI_BASE_URL + "/api/v1/book/{0}/"

        BASE_01_HTML = "<!DOCTYPE html>\n" \
                       "<html lang=\"en\" xml:lang=\"en\" xmlns=\"https://www.w3.org/1999/xhtml\"" \
                       " xmlns:xsi=\"https://www.w3.org/2001/XMLSchema-instance\"" \
                       " xsi:schemaLocation=\"https://www.w3.org/2002/06/xhtml2/" \
                       " https://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd\"" \
                       " xmlns:epub=\"https://www.idpf.org/2007/ops\">\n" \
                       "<head>\n" \
                       "{0}\n" \
                       "<style type=\"text/css\">" \
                       "body{{margin:1em;background-color:transparent!important;}}" \
                       "#sbo-rt-content *{{text-indent:0pt!important;}}#sbo-rt-content .bq{{margin-right:1em!important;}}"

        KINDLE_HTML = "#sbo-rt-content *{{word-wrap:break-word!important;" \
                      "word-break:break-word!important;}}#sbo-rt-content table,#sbo-rt-content pre" \
                      "{{overflow-x:unset!important;overflow:unset!important;" \
                      "overflow-y:unset!important;white-space:pre-wrap!important;}}"

        BASE_02_HTML = "</style>" \

Step 2: 🧐 Snippet Analysis

From looking through the relevant snippets, I decided to make the following modifications:

| File Path | Proposed Changes |
|---|---|
| `safaribooks.py` | Add a new argument in the argparse section for the playlist or specific book. Modify the main function to handle the new argument. If the argument is a playlist, iterate over the books in the playlist and download each one. If the argument is a specific book, download that book. Modify the download function to handle the new types of input (playlist or specific book). |

Step 3: 📝 Planning

I have created a plan for writing the pull request. I am now working through my plan and coding the required changes to address this issue. Here is the planned pull request:

Add new parameter to download playlist or specific book
sweep/playlist-download

Description

This PR adds a new feature to the safaribooks codebase that allows downloading a playlist or a specific book. The existing codebase only supports downloading a book based on its ID. With this new feature, users can now specify a playlist or a specific book to download.

Summary of Changes

* Added a new command-line argument `--playlist` to accept a playlist URL or a specific book ID.
* Modified the main function to handle the new argument. If the argument is a playlist, the code iterates over the books in the playlist and downloads each one. If the argument is a specific book, the code downloads that book.
* Modified the download function to handle the new types of input (playlist or specific book).

This PR enhances the functionality of the safaribooks codebase by allowing users to download playlists or specific books, providing more flexibility and convenience.
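The argument handling described in the summary could look roughly like the sketch below. It is not the code from the PR: the positional `bookid` argument and the `--playlist` flag follow the names used in this thread, while `parse_args`, `books_to_download`, and the injected `get_books_from_playlist` callable are hypothetical names chosen for illustration.

```python
import argparse


def parse_args(argv):
    """Parse either a positional book ID or a --playlist value (exactly one)."""
    parser = argparse.ArgumentParser(prog="safaribooks.py")
    # The book ID becomes optional so that --playlist can be used on its own.
    parser.add_argument("bookid", nargs="?", default=None, metavar="<BOOK ID>")
    # Hypothetical flag: the playlist value format is an assumption.
    parser.add_argument("--playlist", default=None, metavar="<PLAYLIST>")
    args = parser.parse_args(argv)
    # Reject the call when neither or both targets are given.
    if bool(args.bookid) == bool(args.playlist):
        parser.error("provide either a book ID or --playlist, but not both")
    return args


def books_to_download(args, get_books_from_playlist):
    """Return the list of book IDs to fetch.

    `get_books_from_playlist` is a hypothetical callable that maps a
    playlist value to an iterable of book IDs; it is passed in because
    this thread does not show how a playlist is actually resolved.
    """
    if args.playlist:
        return list(get_books_from_playlist(args.playlist))
    return [args.bookid]
```

With this shape, the main function can loop over `books_to_download(...)` and run the existing single-book download once per ID, so the download path itself does not need to know whether the IDs came from a playlist.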

Step 4: ⌨️ Coding

File	Instructions	Progress
`safaribooks.py`	Add a new argument in the argparse section for the playlist or specific book. Modify the main function to handle the new argument. If the argument is a playlist, iterate over the books in the playlist and download each one. If the argument is a specific book, download that book. Modify the download function to handle the new types of input (playlist or specific book).	✅ Done with commit `159eedc`

Step 5: 🔁 Code Review

Here are my self-reviews of my changes at sweep/playlist-download.

Here is the 1st review

Thanks for your contribution. There are a few changes that need to be made:

* In `safaribooks.py`, you've called a function `get_books_from_playlist` on line 1125, but it's not defined or imported anywhere in the changes. Please make sure to implement or import this function.
* Also, please ensure that the `bookid` and `playlist` arguments are properly validated. They should be valid IDs, and the code should handle the case when they are not provided.
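One way the requested validation might be approached is sketched below. It is only an illustration under stated assumptions: the thread does not define what a valid playlist ID looks like, so the sketch merely requires a non-empty string, and it treats a book ID as a run of digits, following the "digits that you can find in the URL" wording in the script's own error message. The names `BOOK_ID_RE` and `validate_target` are hypothetical.

```python
import re

# Assumption: a book ID consists only of digits, e.g. 9781491958698.
BOOK_ID_RE = re.compile(r"\d+")


def validate_target(bookid=None, playlist=None):
    """Return ("book", id) or ("playlist", value), or raise ValueError."""
    if bookid is None and playlist is None:
        raise ValueError("either a book ID or a playlist must be provided")
    if bookid is not None and playlist is not None:
        raise ValueError("provide a book ID or a playlist, not both")
    if bookid is not None:
        if not BOOK_ID_RE.fullmatch(bookid):
            raise ValueError("invalid book ID: %r" % bookid)
        return ("book", bookid)
    # Assumption: any non-empty playlist value is accepted at this stage.
    cleaned = playlist.strip()
    if not cleaned:
        raise ValueError("playlist must not be empty")
    return ("playlist", cleaned)
```

Raising `ValueError` keeps the check independent of the command-line layer; the caller can translate the message into whatever error output the script already uses.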

Keep up the good work!

I finished incorporating these changes.

To recreate the pull request, leave a comment prefixed with "sweep:" or edit the issue.

makhalaf added the sweep label Aug 8, 2023

sweep-ai bot linked a pull request Aug 8, 2023 that will close this issue

Add new parameter to download playlist or specific book #5

Open
