I have a new need to fulfil. I need to merge a bunch of PDF files into a single one.
Of course, I could have purchased the tool I recommend to all my clients (Foxit PhantomPDF), but there is no fun for me in that!
Instead, I decided to spend part of my Labor Day holiday creating my own tool, and to share a good part of it with you!
Available source code
Both VB and C# projects are provided this month.
The solution was created using Visual Studio 2019 but should also work in earlier versions.
Figure 1: the demo application in action
Searching for a (free) library
When I thought of building my own tool, my first reaction was to use a library I really like: Aspose.PDF. But I decided to search for a free library since I wanted to share my solution with you, and I know it is often hard to convince bosses to spend a few bucks on good libraries.
I then remembered that a couple of years ago, many developers were pushing the free PDFsharp library (available on NuGet and from the official site). It seems that this project has not been updated since April 2019. Not a good choice when you are building a new tool and have other options!
I then remembered the other free library used by a lot of developers: iTextSharp. I found out that the library has evolved under the name iText7, but one of the packages is still free. The iText website also offers many examples. By default, the samples are in Java, but a toggle at the top of each sample switches to the C# version.
For the purpose of this demo, I will use the free iText7 package available on NuGet.
Figure 2: Referencing the library
Building the UI
For the demo, I wanted something simple.
As you can see in figure 1, there is a button to select the files to be merged. An Open File Dialog is used to let the user select multiple PDF files. The selected files are displayed in the grid control.
Because it is easy to bind a grid to a DataTable, that is the mechanism I chose.
This snippet of code shows how the DataTable object is initialized and how the OpenFileDialog is called to fill the DataTable object (whose content then shows in the DataGridView).
' VB version
Private _dtFilesToMerge As DataTable

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles Me.Load
    InitializeDataTable()
End Sub

Private Sub InitializeDataTable()
    'initialize an in-memory table to keep the list of files to merge
    _dtFilesToMerge = New DataTable()
    _dtFilesToMerge.Columns.Add("Path", GetType(String))
    _dtFilesToMerge.Columns.Add("FileName", GetType(String))
    dataGridView1.DataSource = _dtFilesToMerge
End Sub

Private Sub btnAddFiles_Click(sender As Object, e As EventArgs) Handles btnAddFiles.Click
    'use an OpenFileDialog component to let users select the files to be merged
    '(the Using block disposes the dialog once we are done with it)
    Using ofd As New OpenFileDialog With {
        .Filter = "PDF (*.PDF)|*.PDF",
        .Multiselect = True,
        .Title = "Select files to merge"
    }
        If ofd.ShowDialog() = DialogResult.OK Then
            'add the selected files to the data table
            For Each file As String In ofd.FileNames
                Try
                    Dim newRow As DataRow = _dtFilesToMerge.NewRow()
                    newRow(0) = IO.Path.GetDirectoryName(file)
                    newRow(1) = IO.Path.GetFileName(file)
                    _dtFilesToMerge.Rows.Add(newRow)
                Catch ex As Exception
                    MessageBox.Show("Error: " & ex.Message)
                End Try
            Next
        End If
    End Using
End Sub
// C# version
private DataTable _dtFilesToMerge;

public Form1()
{
    InitializeComponent();
    InitializeDataTable();
}

private void InitializeDataTable()
{
    //initialize an in-memory table to keep the list of files to merge
    _dtFilesToMerge = new DataTable();
    _dtFilesToMerge.Columns.Add("Path", typeof(string));
    _dtFilesToMerge.Columns.Add("FileName", typeof(string));
    dataGridView1.DataSource = _dtFilesToMerge;
}

private void btnAddFiles_Click(object sender, EventArgs e)
{
    //use an OpenFileDialog component to let users select the files to be merged
    //(the using block disposes the dialog once we are done with it)
    using (OpenFileDialog ofd = new OpenFileDialog
    {
        Filter = "PDF (*.PDF)|*.PDF",
        Multiselect = true,
        Title = "Select files to merge"
    })
    {
        if (ofd.ShowDialog() == DialogResult.OK)
        {
            //add the selected files to the data table
            foreach (string file in ofd.FileNames)
            {
                try
                {
                    DataRow newRow = _dtFilesToMerge.NewRow();
                    newRow[0] = System.IO.Path.GetDirectoryName(file);
                    newRow[1] = System.IO.Path.GetFileName(file);
                    _dtFilesToMerge.Rows.Add(newRow);
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: " + ex.Message);
                }
            }
        }
    }
}
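To try the Path/FileName round trip without any UI, here is a minimal sketch of the same idea as a small static class. The `FileTable` class and its method names are hypothetical (they are not part of the downloadable projects), and the extension check is an extra I am assuming could be useful, since a user can type any file name into the dialog regardless of the filter.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

// Hypothetical helper, not part of the downloadable projects.
public static class FileTable
{
    // Same two-column layout as InitializeDataTable in the demo.
    public static DataTable Create()
    {
        var table = new DataTable();
        table.Columns.Add("Path", typeof(string));
        table.Columns.Add("FileName", typeof(string));
        return table;
    }

    // Adds one file; returns false (and adds nothing) when the extension is not .pdf.
    public static bool AddFile(DataTable table, string fullPath)
    {
        string extension = Path.GetExtension(fullPath);
        if (!string.Equals(extension, ".pdf", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }
        table.Rows.Add(Path.GetDirectoryName(fullPath), Path.GetFileName(fullPath));
        return true;
    }

    // Rebuilds the full path of every row, in the order shown in the grid.
    public static List<string> FullPaths(DataTable table)
    {
        var paths = new List<string>();
        foreach (DataRow row in table.Rows)
        {
            paths.Add(Path.Combine(row["Path"].ToString(), row["FileName"].ToString()));
        }
        return paths;
    }
}
```

In the form, `FileTable.Create()` could replace the body of InitializeDataTable, and the list returned by `FullPaths` is what a merge loop would iterate over.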
Merging the PDFs
Now that we have the list of files to be merged, we need to know the output file name. You can type it in the TextBox control or use the browse button to its right. This button uses a SaveFileDialog to let the user select the folder and file name for the output.
So far, nothing has involved the PDF library. The library makes the merge itself simple. First, create a new PdfDocument object (backed by a PdfWriter) that will hold the merged output. Then loop through the selected files, opening each one in its own PdfDocument object (backed by a PdfReader), and call the library's CopyPagesTo method to copy its pages into the output document. The copied pages are appended at the end of the output.
' VB version (PdfDocument, PdfWriter and PdfReader need: Imports iText.Kernel.Pdf)
Private Sub btnBrowseOutput_Click(sender As Object, e As EventArgs) Handles btnBrowseOutput.Click
    'use a SaveFileDialog component to let users specify where to save the merged file
    Dim sfd As New SaveFileDialog With {
        .Title = "Merge Files into",
        .CheckPathExists = True,
        .Filter = "PDF (*.PDF)|*.PDF",
        .RestoreDirectory = True
    }
    If sfd.ShowDialog() = DialogResult.OK Then
        txtOutput.Text = sfd.FileName
    End If
End Sub

Private Sub btnMergeFiles_Click(sender As Object, e As EventArgs) Handles btnMergeFiles.Click
    'validate required fields
    If String.IsNullOrWhiteSpace(txtOutput.Text) Then
        MessageBox.Show("You must fill the output file name first!")
        Return
    End If
    If _dtFilesToMerge.Rows.Count = 0 Then
        MessageBox.Show("You must add files to merge first!")
        Return
    End If

    'create a document that will contain all merged documents
    Dim pdfDoc As PdfDocument = New PdfDocument(New PdfWriter(txtOutput.Text))

    'add all other documents to be merged
    For Each dr As DataRow In _dtFilesToMerge.Rows
        Dim readerDoc As New PdfDocument(New PdfReader(IO.Path.Combine(dr("Path").ToString(), dr("FileName").ToString())))
        'copy every page (page numbers start at 1) to the end of the output document
        readerDoc.CopyPagesTo(1, readerDoc.GetNumberOfPages(), pdfDoc)
        readerDoc.Close()
    Next

    'closing the output document writes the merged file to disk
    pdfDoc.Close()
    MessageBox.Show("All files merged")
End Sub
// C# version (PdfDocument, PdfWriter and PdfReader need: using iText.Kernel.Pdf;)
private void btnBrowseOutput_Click(object sender, EventArgs e)
{
    //use a SaveFileDialog component to let users specify where to save the merged file
    SaveFileDialog sfd = new SaveFileDialog
    {
        Title = "Merge Files into",
        CheckPathExists = true,
        Filter = "PDF (*.PDF)|*.PDF",
        RestoreDirectory = true
    };
    if (sfd.ShowDialog() == DialogResult.OK)
    {
        txtOutput.Text = sfd.FileName;
    }
}

private void btnMergeFiles_Click(object sender, EventArgs e)
{
    //validate required fields
    if (string.IsNullOrWhiteSpace(txtOutput.Text))
    {
        MessageBox.Show("You must fill the output file name first!");
        return;
    }
    if (_dtFilesToMerge.Rows.Count == 0)
    {
        MessageBox.Show("You must add files to merge first!");
        return;
    }

    //create a document that will contain all merged documents
    PdfDocument pdfDoc = new PdfDocument(new PdfWriter(txtOutput.Text));

    //add all other documents to be merged
    foreach (DataRow dr in _dtFilesToMerge.Rows)
    {
        PdfDocument readerDoc = new PdfDocument(new PdfReader(System.IO.Path.Combine(dr["Path"].ToString(), dr["FileName"].ToString())));
        //copy every page (page numbers start at 1) to the end of the output document
        readerDoc.CopyPagesTo(1, readerDoc.GetNumberOfPages(), pdfDoc);
        readerDoc.Close();
    }

    //closing the output document writes the merged file to disk
    pdfDoc.Close();
    MessageBox.Show("All files merged");
}
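The two validations in the merge handler only cover empty fields. Because the output file is opened for writing before any input is read, picking one of the inputs as the output would most likely destroy that input. Here is a minimal sketch of a few extra checks that could run first. The `MergeChecks` class is hypothetical, and the case-insensitive path comparison is an assumption that fits Windows file systems, which is where this WinForms demo runs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical pre-merge checks, not part of the downloadable projects.
public static class MergeChecks
{
    // Returns a list of human-readable problems; an empty list means "go ahead".
    public static List<string> FindProblems(IEnumerable<string> inputPaths, string outputPath)
    {
        var problems = new List<string>();
        if (string.IsNullOrWhiteSpace(outputPath))
        {
            problems.Add("No output file name.");
            return problems;
        }

        string output = Path.GetFullPath(outputPath);
        int inputCount = 0;
        foreach (string input in inputPaths)
        {
            inputCount++;
            if (!File.Exists(input))
            {
                // The file may have been moved or deleted since it was selected.
                problems.Add("Missing input: " + input);
            }
            else if (string.Equals(Path.GetFullPath(input), output, StringComparison.OrdinalIgnoreCase))
            {
                // Assumption: paths that differ only by case name the same file.
                problems.Add("Output would overwrite input: " + input);
            }
        }

        if (inputCount == 0)
        {
            problems.Add("No files to merge.");
        }
        return problems;
    }
}
```

In the handler, the merge would only start when `FindProblems` returns an empty list; otherwise the messages could be joined and shown in a single MessageBox.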
Building a table of contents
One nice feature I really wanted to implement is a simple table of contents, added as the first page of the new document, that lists all the files it contains.
This is done by adding a custom page built entirely with the library.
The first part of the code below goes into the btnMergeFiles_Click event handler (just after the pdfDoc object has been created and just before looping through the rows of the DataTable). The second part is a separate helper method.
' VB version
'add a table of contents if needed
If chkTableOfContents.Checked Then
    Dim doc As New Document(pdfDoc)
    Dim pageSize As Rectangle = pdfDoc.GetDefaultPageSize()
    Dim sngWidth As Single = pageSize.GetWidth() - doc.GetLeftMargin() - doc.GetRightMargin()
    Dim line As New SolidLine()
    AddParagraphWithTabs(doc, line, sngWidth)
    Dim table As New Table(1)
    For Each dr As DataRow In _dtFilesToMerge.Rows
        table.AddCell(dr("FileName").ToString())
    Next
    doc.Add(table)
End If

Private Shared Sub AddParagraphWithTabs(ByVal pDocument As Document, ByVal pLine As ILineDrawer, ByVal pWidth As Single)
    'add a title to the Table of contents page
    Dim tabStops As New List(Of TabStop)()
    'Create a TabStop at the middle of the page
    tabStops.Add(New TabStop(pWidth / 2, Properties.TabAlignment.CENTER, pLine))
    'Create a TabStop at the end of the page
    tabStops.Add(New TabStop(pWidth, Properties.TabAlignment.LEFT, pLine))
    Dim bold As PdfFont = PdfFontFactory.CreateFont(iText.IO.Font.Constants.StandardFonts.HELVETICA_BOLD)
    Dim pageHeader As Text = New Text("Table of contents").SetFont(bold)
    Dim p As Paragraph = New Paragraph().AddTabStops(tabStops)
    p.Add(New Tab()).Add(pageHeader).Add(New Tab())
    pDocument.Add(p)
End Sub
// C# version
//add a table of contents if needed
if (chkTableOfContents.Checked)
{
    Document doc = new Document(pdfDoc);
    Rectangle pageSize = pdfDoc.GetDefaultPageSize();
    float width = pageSize.GetWidth() - doc.GetLeftMargin() - doc.GetRightMargin();
    SolidLine line = new SolidLine();
    AddParagraphWithTabs(doc, line, width);
    Table table = new Table(1);
    foreach (DataRow dr in _dtFilesToMerge.Rows)
    {
        table.AddCell(dr["FileName"].ToString());
    }
    doc.Add(table);
}

private static void AddParagraphWithTabs(Document pDocument, ILineDrawer pLine, float pWidth)
{
    //add a title to the Table of contents page
    List<TabStop> tabStops = new List<TabStop>();
    // Create a TabStop at the middle of the page
    tabStops.Add(new TabStop(pWidth / 2, iText.Layout.Properties.TabAlignment.CENTER, pLine));
    // Create a TabStop at the end of the page
    tabStops.Add(new TabStop(pWidth, iText.Layout.Properties.TabAlignment.LEFT, pLine));
    PdfFont bold = PdfFontFactory.CreateFont(iText.IO.Font.Constants.StandardFonts.HELVETICA_BOLD);
    Text pageHeader = new Text("Table of contents").SetFont(bold);
    Paragraph p = new Paragraph().AddTabStops(tabStops);
    p.Add(new Tab()).Add(pageHeader).Add(new Tab());
    pDocument.Add(p);
}
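The table of contents above lists file names only. If you wanted page numbers next to them, the starting page of each file can be computed from the page counts, which the merge loop already obtains through GetNumberOfPages. Here is a minimal sketch of that arithmetic. The `TocPages` class is hypothetical, and it assumes you know how many pages the table of contents itself occupies (one page for a short list of files, zero when the checkbox is not ticked).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the downloadable projects.
public static class TocPages
{
    // pageCounts[i] is the number of pages in the i-th file to merge.
    // tocPageCount is the number of pages used by the table of contents (0 if none).
    // Returns the 1-based page on which each file starts in the merged document.
    public static List<int> StartPages(IReadOnlyList<int> pageCounts, int tocPageCount)
    {
        var starts = new List<int>(pageCounts.Count);
        int nextPage = tocPageCount + 1;
        foreach (int count in pageCounts)
        {
            starts.Add(nextPage);
            nextPage += count;
        }
        return starts;
    }
}
```

For example, three files of 3, 5 and 2 pages behind a one-page table of contents start on pages 2, 5 and 10. Getting the counts before writing the table means opening each input once before the merge loop, which is a trade-off to weigh for large files.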
Conclusion
A useful tool, built in a few hours using a free library. Of course, the software I could have bought would have done much more, but for now I am satisfied with this custom tool. I am fairly sure that I will spend more time adding features (like reordering the pages or a more complete table of contents). Time will tell.