Merge multiple PDF files based on their name using Python (Real-World Example)

Views: 11,784

Coding Is Fun

1 day ago

Comments: 47
@CodingIsFun 2 years ago
*The task was somewhat specific, but I hope you learned something new! :)*
@asankacool1 2 years ago
Wow, this video represents a very practical scenario in the field of science and industrial operations data analytics. Another suggestion would be the same scenario for Excel files: appending by a 'financial year month' key, which then also needs to be converted to a DATE format for proper analytics, graphs, and exact time-series order. Btw, great video Sven!!! 👍👍👍
@CodingIsFun 2 years ago
Glad you find the video helpful. Thanks a lot for watching & your great suggestion.
@s33wagz 1 year ago
I've been looking into the best way to create a simple GUI that shows a list of PDFs in a folder, has an area for creating an output PDF to combine files into (a list of multiple output files, as these are "PDF packages" that are being built), and has a button to copy a selected PDF into the desired output file (I imagine this would just be the file path of the selected PDF to append to or merge with the desired output PDF). I was considering doing everything in Excel, but I'm now considering React/JS or maybe Python. What would you suggest?
@CodingIsFun 1 year ago
Thank you for tuning in. If you're interested in building a GUI for your Python project, there are several options available. Personally, I highly recommend "PySimpleGUI". I've already created several tutorials on this library on my channel, so be sure to check them out. Happy coding!
@s33wagz 1 year ago
@CodingIsFun I did poke around after my comment and found your stuff on GUIs. I just gotta find all the pieces to this puzzle, is all.
@KhalilYasser 2 years ago
Awesome. I am waiting for your videos day after day.
@CodingIsFun 2 years ago
Happy to hear that! 😃 As always, thank you very much for your comment. Your support is much appreciated!
@tobiewaldeck7105 4 months ago
Hi. I have tried PyMuPDF and PyPDF2 to merge forms with fillable fields in them. Either fields are missing from the resulting pages, or all the fillable field values are the same. What is going on?
@CodingIsFun 4 months ago
Thanks for watching. Hard to tell from a distance. Sorry that I cannot help. Cheers, Sven ✌️
@brazilleros 2 years ago
As always, understandable, clean code and a perfect solution! Thank you, Sven, for your videos and professional attitude.
@CodingIsFun 2 years ago
Happy to hear that you enjoyed this one too! Thanks for the comments and support!
@yasinnabi 2 years ago
Wow, this is a wonderful video, and it pushed me to some other videos on your channel. Great content. Thanks for the uploads.
@CodingIsFun 2 years ago
Glad you like the videos! Thanks for watching & your comment! :)
@diegodanciguer4901 1 year ago
Very good video. Is there a way to choose the order in which the PDFs are merged?
@CodingIsFun 1 year ago
Thanks! Try sorting the list of file names; see the following example: stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list Happy Coding!
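A minimal sketch of that idea, assuming the files are collected as `Path` objects before merging. The `desired_order` list and the `sort_by_desired_order` helper are hypothetical names for illustration, not part of the video's script; files that are not listed are placed at the end in alphabetical order.

```python
from pathlib import Path


def sort_by_desired_order(files, desired_order):
    """Return files sorted by their position in desired_order.

    Files whose name is not listed are placed at the end, alphabetically.
    """
    rank = {name: index for index, name in enumerate(desired_order)}
    return sorted(files, key=lambda f: (rank.get(f.name, len(rank)), f.name))


# Hypothetical file names, for illustration only
files = [Path("c.pdf"), Path("a.pdf"), Path("b.pdf")]
ordered = sort_by_desired_order(files, ["b.pdf", "c.pdf"])
print([f.name for f in ordered])  # ['b.pdf', 'c.pdf', 'a.pdf']
```

The sorted list can then be passed to the merge loop in place of the unsorted one.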
@dain7787 6 months ago
Hi, just one question. I've got an error on the file.name part (line 18). Are there any solutions?
@CodingIsFun 6 months ago
Hey there, thanks for watching the video! I'm sorry I can't help you with your problem based on the information you provided. To give me a better idea of what's going on, it would be super helpful if you could write down which line of code is causing the error, let me know if you modified the code from the tutorial, and explain in more detail what you did to troubleshoot the problem. Don't forget to also give me some context about your setup and environment. If you're having trouble figuring things out, another option is to join our Discord server at pythonandvba.com/discord. You can ask your question there, and maybe someone in the community can help out. Thanks for understanding.
@dule1635 2 years ago
Please share a lesson on how to add bookmarks to the combined PDF file. Thank you very much!
@CodingIsFun 2 years ago
Thanks for watching and your suggestion!
@Aditya-mx4gv 1 year ago
Hi, when I run the program, it starts scanning the already merged files again. I want it to scan only the files newly added to the folder and merge only those. Could you help me with this? Thank you.
@CodingIsFun 1 year ago
Thanks for watching. Sure! To achieve this, you can maintain a record of the merged files in a separate text file. You can then read this file before scanning the folder to exclude any previously merged files. Here's an updated version of your script to do this: pastebin.com/ejfsSAXB
@Aditya-mx4gv 1 year ago
@CodingIsFun Hey, thanks for this, it worked really well! 👍
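The record-keeping approach described in this thread could look roughly like the sketch below. It is not the linked pastebin script; the helper names and the one-name-per-line log format are assumptions made for illustration.

```python
from pathlib import Path


def load_merged_names(log_path):
    """Read the names of already merged files (one name per line)."""
    log_path = Path(log_path)
    if not log_path.exists():
        return set()
    return set(log_path.read_text(encoding="utf-8").splitlines())


def find_new_pdfs(pdf_dir, log_path):
    """Return the PDFs in pdf_dir that are not yet listed in the log."""
    merged = load_merged_names(log_path)
    return sorted(p for p in Path(pdf_dir).glob("*.pdf") if p.name not in merged)


def record_merged(log_path, pdf_files):
    """Append the names of freshly merged files to the log."""
    with Path(log_path).open("a", encoding="utf-8") as log:
        for pdf in pdf_files:
            log.write(pdf.name + "\n")
```

In a script like the one from the video, `find_new_pdfs` would be called before the merge loop, and `record_merged` only after the merged file has been written successfully, so a failed run does not mark files as done.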
@everythinginpython1657 1 year ago
If someone wants to authenticate the data, this code might help them:
for key in keys:
    merger = PdfMerger()
    base_file_name = None
    for file in pdf_files:
        str_pdf_file = str(file)
        split_str_pdf_files = str_pdf_file.split(" ")
        if split_str_pdf_files[0].endswith(key):
            merger.append(PdfReader(str(file), "rb"))
            if len(file.name) >= BASE_FILE_NAME_LENGTH:
                base_file_name = file.name
    if base_file_name:
        print(base_file_name)
        merger.write(str(pdf_output_dir / base_file_name))
    merger.close()
@CodingIsFun 1 year ago
Thanks for watching. Could you please let me know what you mean by authenticating the data? Thanks! :)
@dimasramawib 1 year ago
Hi, Sven! Thank you for such a helpful video! One question though: I have multiple files, just like you've shown in the video. My files look something like '001.pdf', '002.pdf', '001 - Content.pdf', '002 - Another Content.pdf', mixed in a single folder just like in the video. However, when I run the code, the merged file has '001 - Content.pdf' on the first page and '001.pdf' on the second page. How can I swap the order so that the merged file has '001.pdf' on the first page and '001 - Content.pdf' on the second page? Cheers
@CodingIsFun 1 year ago
Thanks for watching. Your script could look something like this:

from pathlib import Path

from PyPDF2 import PdfFileMerger, PdfFileReader  # pip install PyPDF2

# Define input directory for the pdf files
pdf_dir = Path(__file__).parent / "pdf_files"

# Define & create output directory
pdf_output_dir = Path(__file__).parent / "OUPUT"
pdf_output_dir.mkdir(parents=True, exist_ok=True)

# Determine the file name length of the base file
# Example of the base files:
# '902 17.03.2022 2000004496.pdf', '904 17.03.2022 2000004497.pdf'
BASE_FILE_NAME_LENGTH = 20

# Define the desired order of the pdf files with specific key
pdf_order = {
    '902': ['902 17.03.2022 2000004496.pdf', '902 18.03.2022 2000004496.pdf'],
    '905': ['905 17.03.2022 2000004495.pdf'],
    '904': ['904 18.03.2022 2000004497.pdf'],
}

for key, files in pdf_order.items():
    merger = PdfFileMerger()
    for file in files:
        pdf_file = pdf_dir / file
        if pdf_file.is_file():
            merger.append(PdfFileReader(str(pdf_file), "rb"))
            if len(file) >= BASE_FILE_NAME_LENGTH:
                base_file_name = file
    merger.write(str(pdf_output_dir / base_file_name))
    merger.close()
___
Coffee donations are always welcome: pythonandvba.com/coffee-donation
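For the specific file names in the question, an alternative to listing every file by hand might be to sort by the file stem. This is only a sketch under the assumption that the base file's stem is a prefix of its attachments' stems; it is not taken from the video.

```python
from pathlib import Path

# File names taken from the question above
names = ["001 - Content.pdf", "001.pdf", "002 - Another Content.pdf", "002.pdf"]
pdf_files = [Path(name) for name in names]

# Sorting by full name puts "001 - Content.pdf" first, because " " sorts before ".".
# Sorting by stem (the name without ".pdf") puts the shorter base name first.
ordered = sorted(pdf_files, key=lambda f: f.stem)
print([f.name for f in ordered])
# ['001.pdf', '001 - Content.pdf', '002.pdf', '002 - Another Content.pdf']
```

Appending the files to the merger in this order would place '001.pdf' before '001 - Content.pdf'.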
@torque6389 2 years ago
Love the videos! Very helpful. Thank you!
@CodingIsFun 2 years ago
Happy to hear that. Thanks for watching :)
@raajashekaran 2 years ago
Hi, thank you for your wonderful video. Could you please include adding a header and footer along with the merge?
@CodingIsFun 2 years ago
Thanks! What exactly do you mean by Header & Footer? Could you please provide more details? Thanks!
@raajashekaran 2 years ago
@CodingIsFun Hi, my work involves combining PDF files with the same date and adding a confidentiality note to the header and footer of each page, such as a yellow/green "confidential" note in the header and footer of all PDF files. I am unable to add it to existing PDF files. Can you help me? 🙏
@CodingIsFun 2 years ago
@raajashekaran, I do not have the time to code an entire solution for you, but you might want to check out the following blog article on how to add a header/footer to PDFs using Python: dock2learn.com/tech/how-to-add-headers-and-footers-to-existing-pdf-document-using-python/ I hope it helps! Happy Coding!
@manuelbibbes2914 1 year ago
Thank you for your video. Yes, I already learned something, even if it didn't work for me. I got an error, "TypeError: unhashable type: 'list'", and don't know how to handle it for now. Do you have a tip for me?
@CodingIsFun 1 year ago
Thanks for watching. Can you please clone the repo and try again? Thanks!
@manuelbibbes2914 1 year ago
@CodingIsFun Thank you for your superfast reply. Amazing, it worked just fine after renaming PdfFileReader and PdfFileMerger to PdfReader and PdfMerger. 😁🤙
@gaganrastogi9624 2 years ago
Please make a video, or just explain or give a clue, on how to convert all PDFs in a folder to Excel, or extract the tables and save an Excel file for each PDF.
@CodingIsFun 2 years ago
Thanks for watching & your suggestion. Regarding your request, have a look at the following blog/video:
1. Extract table from PDF and convert to pandas DataFrame: www.geeksforgeeks.org/how-to-extract-pdf-tables-in-python/
2. Export pandas DataFrames to new & existing Excel workbook: kzbin.info/www/bejne/eqPSkpmNhr2ketk
Once you have a working solution, you could iterate over all PDF files in a folder: kzbin.info/www/bejne/rWeQY2ugmNdjb7M
I hope it helps! Happy Coding!
@gaganrastogi9624 2 years ago
Thank you so much for your valuable time.
@jastorgallywix4424 1 year ago
IMPORTANT: Change all occurrences of "PdfFileMerger" to "PdfMerger" and "PdfFileReader" to "PdfReader"; then the code will work. PdfFileMerger and PdfFileReader are no longer available (removed in PyPDF2 3.0.0).
@CodingIsFun 1 year ago
Hi Jastor Gallywix, Thanks for pointing this out. Feel free to send a pull request on GitHub (github.com/Sven-Bo/merge-pdfs-based-on-name). Your support is much appreciated! Cheers, Sven ✌
@qulinxao 2 years ago
Sorry, but your algorithm is O(n^2). A simple change is to build the keys like this:
keys = {}
for file in pdf_files:
    keys.setdefault(file.name[:3], []).append(file.name)
Now you don't need to rescan all pdf_files for each key, just:
for key in keys:
    merger = PdfFileMerger()
    for file in keys[key]:
        merger.append.....
@CodingIsFun 2 years ago
Thanks, appreciate that! 👍 Additionally, I also had to convert the file string to a Path object:
for key in keys:
    merger = PdfFileMerger()
    for file in keys[key]:
        file = pdf_dir / file
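The single-pass grouping suggested in this thread can also be sketched with `collections.defaultdict`. The `group_by_key` helper and its `key_length` parameter are hypothetical names; storing the `Path` objects themselves (rather than the names) is a design choice that avoids the extra string-to-Path conversion mentioned above.

```python
from collections import defaultdict
from pathlib import Path


def group_by_key(pdf_files, key_length=3):
    """Group files by the first key_length characters of their name."""
    groups = defaultdict(list)
    for file in pdf_files:
        groups[file.name[:key_length]].append(file)
    return dict(groups)


# The first two names follow the pattern shown in the thread; the last one is made up
pdf_files = [
    Path("902 17.03.2022 2000004496.pdf"),
    Path("904 17.03.2022 2000004497.pdf"),
    Path("902 attachment.pdf"),
]
groups = group_by_key(pdf_files)
print({key: [f.name for f in files] for key, files in groups.items()})
```

Each file is visited once while grouping, and the merge loop then only iterates over the files that belong to each key.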
@alejandramunoz8597 1 year ago
Hi, could you help me with the error below? How can I define Path? Thank you.
NameError                                 Traceback (most recent call last)
Cell In [12], line 1
----> 1 pdf_dir = Path(__file__).parent / "pdf_files"
NameError: name 'Path' is not defined
@CodingIsFun 1 year ago
Make sure to import Path from pathlib:
from pathlib import Path
@alejandramunoz8597 1 year ago
@CodingIsFun Thank you, it works! Is there any way to set the order of the PDF pages that we have combined?
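A note on the NameError discussed in this thread: the "Cell In [12]" line suggests the code was run in a notebook, where `__file__` may also be undefined. The sketch below shows one possible fallback to the current working directory; `get_base_dir` is a hypothetical helper, not part of the video's script.

```python
from pathlib import Path


def get_base_dir():
    """Return the script's folder, or the working directory in a notebook."""
    try:
        return Path(__file__).resolve().parent
    except NameError:
        # __file__ is not defined in interactive sessions such as Jupyter
        return Path.cwd()


pdf_dir = get_base_dir() / "pdf_files"
print(pdf_dir)
```

When run as a script, this resolves to the folder containing the script; in a notebook, it resolves relative to the folder the notebook was started from.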