I recently had a new need to fulfil: merging a bunch of PDF files into a single one.
Of course, I could have purchased the tool I recommend to all my clients (Foxit PhantomPDF), but there is no fun for me in that!
Instead, I decided to spend part of my Labor Day holiday creating my own tool, and to share a good part of it with you!
Available source code
Both VB and C# projects are provided this month.
The solution was created using Visual Studio 2019 but should also work in earlier versions.
Figure 1: The demo application in action
Searching for a (free) library
When I thought of building my own tool, my first reaction was to use a library I really like: Aspose.PDF. But I decided to search for a free library since I wanted to share my solution with you, and I know it is often hard to convince bosses to spend a few bucks on good libraries.
I then remembered that a couple of years ago, many developers were pushing to use the free PDFsharp library (available on NuGet and from its official site). It seems that this project has not been updated since April 2019: not a good choice when you build a new tool and have other options!
I also remembered another free library used by a lot of developers: iTextSharp. I found out that the library has evolved under the name iText7, and one of its packages is still free. The iText website also offers many examples. By default, the samples are in Java, but a toggle at the top of each sample switches it to the C# version.
For the purpose of this demo, I will use the free iText7 package available on NuGet.
Figure 2: Referencing the library
Building the UI
For the demo, I wanted something simple.
As you can see in Figure 1, there is a button to select the files to be merged. An OpenFileDialog lets the user select multiple PDF files, and the selected files are displayed in the grid (a DataGridView control).
Because it is easy to bind a grid to a DataTable, that is the mechanism I chose.
This snippet shows how the DataTable object is initialized and how the OpenFileDialog is used to fill it (its content then shows up in the DataGridView).
```vb
Private _dtFilesToMerge As DataTable

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles Me.Load
    InitializeDataTable()
End Sub

Private Sub InitializeDataTable()
    'initialize an in-memory table to keep the list of files to merge
    _dtFilesToMerge = New DataTable()
    _dtFilesToMerge.Columns.Add("Path", GetType(String))
    _dtFilesToMerge.Columns.Add("FileName", GetType(String))
    dataGridView1.DataSource = _dtFilesToMerge
End Sub

Private Sub btnAddFiles_Click(sender As Object, e As EventArgs) Handles btnAddFiles.Click
    'use an OpenFileDialog component to let users select the files to be merged
    '(Using disposes the dialog once we are done with it)
    Using ofd As New OpenFileDialog With {
        .Filter = "PDF (*.PDF)|*.PDF",
        .Multiselect = True,
        .Title = "Select files to merge"
    }
        If ofd.ShowDialog() = DialogResult.OK Then
            'add the selected files to the data table
            For Each file As String In ofd.FileNames
                Try
                    Dim newRow As DataRow = _dtFilesToMerge.NewRow()
                    newRow("Path") = IO.Path.GetDirectoryName(file)
                    newRow("FileName") = IO.Path.GetFileName(file)
                    _dtFilesToMerge.Rows.Add(newRow)
                Catch ex As Exception
                    MessageBox.Show("Error: " & ex.Message)
                End Try
            Next
        End If
    End Using
End Sub
```
```csharp
private DataTable _dtFilesToMerge;

public Form1()
{
    InitializeComponent();
    InitializeDataTable();
}

private void InitializeDataTable()
{
    //initialize an in-memory table to keep the list of files to merge
    _dtFilesToMerge = new DataTable();
    _dtFilesToMerge.Columns.Add("Path", typeof(string));
    _dtFilesToMerge.Columns.Add("FileName", typeof(string));
    dataGridView1.DataSource = _dtFilesToMerge;
}

private void btnAddFiles_Click(object sender, EventArgs e)
{
    //use an OpenFileDialog component to let users select the files to be merged
    //(the using block disposes the dialog once we are done with it)
    using (OpenFileDialog ofd = new OpenFileDialog
    {
        Filter = "PDF (*.PDF)|*.PDF",
        Multiselect = true,
        Title = "Select files to merge"
    })
    {
        if (ofd.ShowDialog() == DialogResult.OK)
        {
            //add the selected files to the data table
            foreach (string file in ofd.FileNames)
            {
                try
                {
                    DataRow newRow = _dtFilesToMerge.NewRow();
                    newRow["Path"] = System.IO.Path.GetDirectoryName(file);
                    newRow["FileName"] = System.IO.Path.GetFileName(file);
                    _dtFilesToMerge.Rows.Add(newRow);
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: " + ex.Message);
                }
            }
        }
    }
}
```
Merging the PDFs
Now that we have the list of files to be merged, we need to know the output file name. You can type it in the TextBox or use the browse button to the right of the TextBox control. This button uses a SaveFileDialog to let the user select the folder and file name of the output.
So far, nothing relates to the library that does the actual merging, and using the library turns out to be simple. You first create a new PdfDocument object (wrapping a PdfWriter) to receive the merged output. Then you loop through the selected files and open each one into another PdfDocument object (wrapping a PdfReader). Finally, the CopyPagesTo method copies the pages of each input into the output document, appending them at the end.
```vb
Private Sub btnBrowseOutput_Click(sender As Object, e As EventArgs) Handles btnBrowseOutput.Click
    'use a SaveFileDialog component to let users specify where to save the merged file
    Using sfd As New SaveFileDialog With {
        .Title = "Merge Files into",
        .CheckPathExists = True,
        .Filter = "PDF (*.PDF)|*.PDF",
        .RestoreDirectory = True
    }
        If sfd.ShowDialog() = DialogResult.OK Then
            txtOutput.Text = sfd.FileName
        End If
    End Using
End Sub

Private Sub btnMergeFiles_Click(sender As Object, e As EventArgs) Handles btnMergeFiles.Click
    'validate required fields
    If String.IsNullOrWhiteSpace(txtOutput.Text) Then
        MessageBox.Show("You must fill the output file name first!")
        Return
    End If
    If _dtFilesToMerge.Rows.Count = 0 Then
        MessageBox.Show("You must add files to merge first!")
        Return
    End If

    Try
        'create a document that will contain all merged documents
        Using pdfDoc As New PdfDocument(New PdfWriter(txtOutput.Text))
            'add all documents to be merged, in the order of the grid
            For Each dr As DataRow In _dtFilesToMerge.Rows
                Dim inputPath As String = IO.Path.Combine(dr("Path").ToString(), dr("FileName").ToString())
                Using readerDoc As New PdfDocument(New PdfReader(inputPath))
                    'append all pages of this input at the end of the output
                    readerDoc.CopyPagesTo(1, readerDoc.GetNumberOfPages(), pdfDoc)
                End Using
            Next
        End Using
        MessageBox.Show("All files merged")
    Catch ex As Exception
        'for example, one of the inputs is not a valid PDF
        MessageBox.Show("Error: " & ex.Message)
    End Try
End Sub
```
```csharp
private void btnBrowseOutput_Click(object sender, EventArgs e)
{
    //use a SaveFileDialog component to let users specify where to save the merged file
    using (SaveFileDialog sfd = new SaveFileDialog
    {
        Title = "Merge Files into",
        CheckPathExists = true,
        Filter = "PDF (*.PDF)|*.PDF",
        RestoreDirectory = true
    })
    {
        if (sfd.ShowDialog() == DialogResult.OK)
        {
            txtOutput.Text = sfd.FileName;
        }
    }
}

private void btnMergeFiles_Click(object sender, EventArgs e)
{
    //validate required fields
    if (string.IsNullOrWhiteSpace(txtOutput.Text))
    {
        MessageBox.Show("You must fill the output file name first!");
        return;
    }
    if (_dtFilesToMerge.Rows.Count == 0)
    {
        MessageBox.Show("You must add files to merge first!");
        return;
    }

    try
    {
        //create a document that will contain all merged documents
        using (PdfDocument pdfDoc = new PdfDocument(new PdfWriter(txtOutput.Text)))
        {
            //add all documents to be merged, in the order of the grid
            foreach (DataRow dr in _dtFilesToMerge.Rows)
            {
                string inputPath = System.IO.Path.Combine(dr["Path"].ToString(), dr["FileName"].ToString());
                using (PdfDocument readerDoc = new PdfDocument(new PdfReader(inputPath)))
                {
                    //append all pages of this input at the end of the output
                    readerDoc.CopyPagesTo(1, readerDoc.GetNumberOfPages(), pdfDoc);
                }
            }
        }
        MessageBox.Show("All files merged");
    }
    catch (Exception ex)
    {
        //for example, one of the inputs is not a valid PDF
        MessageBox.Show("Error: " + ex.Message);
    }
}
```
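iText7 also ships a dedicated PdfMerger class (namespace iText.Kernel.Utils) that performs the same page copying as the CopyPagesTo loop above. The following is a minimal sketch, not the code of the demo project, assuming the same itext7 NuGet package is referenced. The class PdfMergeSketch and its MergeFiles method are hypothetical names; the method takes plain file paths instead of the DataTable so it can be reused outside the form.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

// Hypothetical helper class: not part of the article's demo project.
public static class PdfMergeSketch
{
    // Appends every page of each input file, in the given order, to a new output file.
    public static void MergeFiles(string outputPath, IEnumerable<string> inputPaths)
    {
        // Disposing the destination document writes the merged file and closes it.
        using (PdfDocument destination = new PdfDocument(new PdfWriter(outputPath)))
        {
            PdfMerger merger = new PdfMerger(destination);
            foreach (string inputPath in inputPaths)
            {
                // Each source document is closed as soon as its pages have been copied.
                using (PdfDocument source = new PdfDocument(new PdfReader(inputPath)))
                {
                    // Copy pages 1..N of the source to the end of the destination.
                    merger.Merge(source, 1, source.GetNumberOfPages());
                }
            }
        }
    }
}
```

A call such as `PdfMergeSketch.MergeFiles(@"C:\Temp\merged.pdf", new[] { @"C:\Temp\a.pdf", @"C:\Temp\b.pdf" })` (hypothetical paths) would produce a file holding all pages of a.pdf followed by all pages of b.pdf. Keeping the merge logic out of the click handler also makes it easier to test without the UI.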
Building a table of contents
A nice feature I really wanted to implement is a simple table of contents, added as the first page of the new document and listing all the files it contains.
This is done by adding a custom page completely built using the library.
The first part of the following code goes into the btnMergeFiles_Click event handler (just after the pdfDoc object has been created and just before the loop over the rows of the DataTable). The second part is the AddParagraphWithTabs helper method, which writes the centered title.
```vb
'add a table of contents if needed
If chkTableOfContents.Checked Then
    Dim doc As New Document(pdfDoc)
    'fully qualified to avoid a clash with System.Drawing.Rectangle in a WinForms project
    Dim pageSize As iText.Kernel.Geom.Rectangle = pdfDoc.GetDefaultPageSize()
    Dim sngWidth As Single = pageSize.GetWidth() - doc.GetLeftMargin() - doc.GetRightMargin()
    Dim line As New SolidLine()
    AddParagraphWithTabs(doc, line, sngWidth)
    'list the name of every file to be merged, one per row
    Dim table As New Table(1)
    For Each dr As DataRow In _dtFilesToMerge.Rows
        table.AddCell(dr("FileName").ToString())
    Next
    doc.Add(table)
End If

Private Shared Sub AddParagraphWithTabs(ByVal pDocument As Document, ByVal pLine As ILineDrawer, ByVal pWidth As Single)
    'add a title to the Table of contents page
    Dim tabStops As New List(Of TabStop)()
    'Create a TabStop at the middle of the page
    tabStops.Add(New TabStop(pWidth / 2, iText.Layout.Properties.TabAlignment.CENTER, pLine))
    'Create a TabStop at the end of the page
    tabStops.Add(New TabStop(pWidth, iText.Layout.Properties.TabAlignment.LEFT, pLine))
    Dim bold As PdfFont = PdfFontFactory.CreateFont(iText.IO.Font.Constants.StandardFonts.HELVETICA_BOLD)
    Dim pageHeader As Text = New Text("Table of contents").SetFont(bold)
    Dim p As Paragraph = New Paragraph().AddTabStops(tabStops)
    p.Add(New Tab()).Add(pageHeader).Add(New Tab())
    pDocument.Add(p)
End Sub
```
```csharp
//add a table of contents if needed
if (chkTableOfContents.Checked)
{
    Document doc = new Document(pdfDoc);
    //fully qualified to avoid a clash with System.Drawing.Rectangle in a WinForms project
    iText.Kernel.Geom.Rectangle pageSize = pdfDoc.GetDefaultPageSize();
    float width = pageSize.GetWidth() - doc.GetLeftMargin() - doc.GetRightMargin();
    SolidLine line = new SolidLine();
    AddParagraphWithTabs(doc, line, width);
    //list the name of every file to be merged, one per row
    Table table = new Table(1);
    foreach (DataRow dr in _dtFilesToMerge.Rows)
    {
        table.AddCell(dr["FileName"].ToString());
    }
    doc.Add(table);
}

private static void AddParagraphWithTabs(Document pDocument, ILineDrawer pLine, float pWidth)
{
    //add a title to the Table of contents page
    List<TabStop> tabStops = new List<TabStop>();
    // Create a TabStop at the middle of the page
    tabStops.Add(new TabStop(pWidth / 2, iText.Layout.Properties.TabAlignment.CENTER, pLine));
    // Create a TabStop at the end of the page
    tabStops.Add(new TabStop(pWidth, iText.Layout.Properties.TabAlignment.LEFT, pLine));
    PdfFont bold = PdfFontFactory.CreateFont(iText.IO.Font.Constants.StandardFonts.HELVETICA_BOLD);
    Text pageHeader = new Text("Table of contents").SetFont(bold);
    Paragraph p = new Paragraph().AddTabStops(tabStops);
    p.Add(new Tab()).Add(pageHeader).Add(new Tab());
    pDocument.Add(p);
}
```
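The table of contents above lists only file names. If you later want it to show the page where each file starts (one way to get the more complete table of contents mentioned in the conclusion), the arithmetic is simple: the first file starts right after the table-of-contents pages, and each following file starts after the pages of all the files before it. Here is a minimal, BCL-only sketch, assuming the page counts are known before the table of contents is written (for example by opening each input once and calling GetNumberOfPages()); TocPageMath and StartPages are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical helper class: not part of the article's demo project.
public static class TocPageMath
{
    // Given the page count of each input file (in merge order) and the number of
    // pages taken by the table of contents, returns the 1-based page number at
    // which each file starts in the merged document.
    public static List<int> StartPages(IReadOnlyList<int> pageCounts, int tocPageCount)
    {
        var starts = new List<int>(pageCounts.Count);
        int nextPage = tocPageCount + 1; // first page after the table of contents
        foreach (int count in pageCounts)
        {
            starts.Add(nextPage);
            nextPage += count; // the next file starts after this file's pages
        }
        return starts;
    }
}
```

For inputs of 3, 1 and 5 pages behind a one-page table of contents, StartPages returns 2, 5 and 6.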
Conclusion
A useful tool, built in a few hours with a free library. Of course, the software I could have bought does much more, but for now I am satisfied with this custom tool. I am fairly sure that I will spend more time adding features (like reordering the pages) or a more complete table of contents. Time will tell.