Convert Excel to PDF in Java using iText

Excel is well suited to storing and manipulating data, but PDF is the better format for sharing and printing because it fixes the layout and formatting. iText is a widely used Java library for creating and manipulating PDF documents. It does not read Excel files itself, so this post pairs it with Apache POI: POI reads the workbook and iText writes the PDF. The post covers core concepts, typical usage scenarios, a working code example, common pitfalls, and best practices.

Table of Contents#

  1. Core Concepts
  2. Typical Usage Scenarios
  3. Prerequisites
  4. Code Example
  5. Common Pitfalls
  6. Best Practices
  7. Conclusion
  8. FAQ
  9. References

Core Concepts#

iText#

iText is a Java library that allows developers to create, manipulate, and convert PDF documents programmatically. It provides a rich set of APIs for working with PDF content, such as text, images, tables, and more.

Excel to PDF Conversion#

Converting an Excel file to PDF involves reading the data and formatting from the workbook and then building a corresponding PDF document. Because iText has no Excel reader, a library such as Apache POI extracts the text, numbers, and formatting information from each cell, and iText then lays that content out, typically as a table, to produce the same or a similar layout.

Typical Usage Scenarios#

  • Report Generation: Many businesses generate reports in Excel format and then need to convert them to PDF for distribution to clients or stakeholders. PDF reports are more professional-looking and easier to share.
  • Printing: Excel files may not always print correctly, especially when it comes to page breaks and formatting. A PDF fixes the page layout, so the document prints the same way regardless of the machine or application used to open it.
  • Archiving: PDF is a fixed-layout format that renders independently of the application that created it, which makes it a common choice for long-term archiving.

Prerequisites#

  • Java Development Kit (JDK) installed on your system.
  • iText library added to your Java project. You can add it using a build tool like Maven or Gradle. For Maven, add the following dependency to your pom.xml:
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.5</version>
    <type>pom</type>
</dependency>
  • Apache POI library to read Excel files. Add both of the following Maven dependencies (poi-ooxml is needed for the .xlsx format):
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>

Code Example#

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Cell;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Table;
// POI also has a Cell type, so the POI classes are imported one by one
// and the POI Cell is referred to by its fully qualified name below.
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileInputStream;
import java.io.IOException;

public class ExcelToPdfConverter {

    public static void convertExcelToPdf(String excelFilePath, String pdfFilePath) throws IOException {
        // Open the Excel file; try-with-resources closes the stream and workbook even on failure
        try (FileInputStream excelFile = new FileInputStream(excelFilePath);
             Workbook workbook = new XSSFWorkbook(excelFile)) {
            Sheet sheet = workbook.getSheetAt(0);

            // Use the first row to decide how many columns the PDF table needs
            Row firstRow = sheet.getRow(0);
            if (firstRow == null || firstRow.getLastCellNum() <= 0) {
                throw new IOException("The first row of the sheet is empty: " + excelFilePath);
            }
            int numColumns = firstRow.getLastCellNum();

            // Create a new PDF document; closing the Document also closes the underlying PdfDocument
            try (Document document = new Document(new PdfDocument(new PdfWriter(pdfFilePath)))) {
                // Create a table in the PDF
                Table table = new Table(numColumns);

                // Iterate through the Excel rows
                for (Row row : sheet) {
                    // Walk the columns by index so that missing cells become empty PDF cells
                    // and the columns stay aligned; cells beyond the first row's width are ignored
                    for (int col = 0; col < numColumns; col++) {
                        String cellValue = getCellValue(row.getCell(col));
                        // iText 7.2 has no Cell.add(String), so the text is wrapped in a Paragraph
                        table.addCell(new Cell().add(new Paragraph(cellValue)));
                    }
                }

                // Add the table to the PDF document
                document.add(table);
            }
        }
    }

    private static String getCellValue(org.apache.poi.ss.usermodel.Cell cell) {
        if (cell == null) {
            return ""; // the cell does not exist in this row
        }
        switch (cell.getCellType()) {
            case STRING:
                return cell.getStringCellValue();
            case NUMERIC:
                if (DateUtil.isCellDateFormatted(cell)) {
                    return cell.getDateCellValue().toString();
                } else {
                    return String.valueOf(cell.getNumericCellValue());
                }
            case BOOLEAN:
                return String.valueOf(cell.getBooleanCellValue());
            case FORMULA:
                // Returns the formula text (for example SUM(A1:A3)), not its computed result
                return cell.getCellFormula();
            default:
                return "";
        }
    }

    public static void main(String[] args) {
        try {
            String excelFilePath = "input.xlsx";
            String pdfFilePath = "output.pdf";
            convertExcelToPdf(excelFilePath, pdfFilePath);
            System.out.println("Excel file converted to PDF successfully.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Explanation of the code#

  1. Reading the Excel file: We use Apache POI to read the Excel file. We open the file, get the first sheet, and iterate through its rows and cells.
  2. Creating the PDF document: We use iText to create a new PDF document and a table in the PDF.
  3. Adding data to the PDF table: We iterate through the Excel cells, extract their values, and add them to the PDF table.
  4. Closing the documents: Finally, we close the PDF document, the Excel workbook, and the input file stream.

Common Pitfalls#

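  • Formatting Loss: Excel files can have complex formatting, such as fonts, colors, and cell borders. The example above copies only the cell text, so this formatting, along with merged cells and column widths, is not carried over to the PDF.
  • Large Excel Files: XSSFWorkbook loads the entire workbook into memory, so large files can be memory-intensive. For very large inputs, consider a streaming approach such as Apache POI's event-based (SAX) API for .xlsx files, and process the data in chunks to avoid running out of memory.
  • Date and Number Formats: Excel stores dates as numbers and applies a display format on top. The example prints raw values, so a date appears in Java's default Date.toString() form and a whole number such as 42 appears as 42.0, which may not match what the user sees in Excel.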
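
For the date and number pitfall, one option is to let Apache POI render each cell the way Excel would display it instead of converting raw values by hand. The following is a minimal sketch, assuming POI's DataFormatter and FormulaEvaluator are acceptable for your use case; the class name CellTextSketch and the method displayText are hypothetical names chosen for illustration.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.IOException;
import java.util.Locale;

// Hypothetical helper class, not part of the original example.
public class CellTextSketch {

    // A fixed locale keeps decimal and thousands separators predictable across machines.
    private static final DataFormatter FORMATTER = new DataFormatter(Locale.US);

    /** Returns the text Excel would display for the cell; formulas are evaluated first. */
    static String displayText(Cell cell, FormulaEvaluator evaluator) {
        return cell == null ? "" : FORMATTER.formatCellValue(cell, evaluator);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory workbook stands in for a real input file.
        try (Workbook workbook = new XSSFWorkbook()) {
            Row row = workbook.createSheet("demo").createRow(0);

            // A number stored as 1234.5 but styled with a thousands separator and two decimals.
            CellStyle money = workbook.createCellStyle();
            money.setDataFormat(workbook.createDataFormat().getFormat("#,##0.00"));
            Cell amount = row.createCell(0);
            amount.setCellValue(1234.5);
            amount.setCellStyle(money);

            // A formula cell; the evaluator computes its result before formatting.
            Cell doubled = row.createCell(1);
            doubled.setCellFormula("A1*2");

            FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();
            System.out.println(displayText(amount, evaluator));
            System.out.println(displayText(doubled, evaluator));
        }
    }
}
```

With Locale.US, the first value should come out as 1,234.50 rather than 1234.5, and the formula cell yields its computed result instead of the formula text. A helper like this could replace the getCellValue switch in the main example.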

Best Practices#

  • Use a Template: If you need to preserve a specific layout and formatting, consider using a PDF template. You can populate the template with data from the Excel file.
  • Error Handling: Implement proper error handling in your code to handle issues such as file not found, invalid Excel format, or out-of-memory errors.
  • Testing: Test your conversion process with different types of Excel files, including files with complex formatting, large datasets, and special characters.
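
The error-handling advice can start with a few cheap checks before any library is invoked. The sketch below uses only the JDK; the class name InputChecks and the method requireExcelFile are hypothetical, and the extension check is only a heuristic, since a wrongly named file can still fail later when POI parses it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper class, not part of the original example.
public class InputChecks {

    /** Validates the input path and returns it, or throws a descriptive exception. */
    static Path requireExcelFile(String excelFilePath) throws IOException {
        Path path = Paths.get(excelFilePath);
        if (!Files.isRegularFile(path)) {
            throw new NoSuchFileException(excelFilePath, null, "not an existing regular file");
        }
        if (!Files.isReadable(path)) {
            throw new IOException("Cannot read " + excelFilePath);
        }
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        if (!name.endsWith(".xlsx") && !name.endsWith(".xls")) {
            // Heuristic only: the extension does not prove the content is a valid workbook.
            throw new IllegalArgumentException("Expected an .xls or .xlsx file: " + excelFilePath);
        }
        return path;
    }

    public static void main(String[] args) {
        try {
            Path input = requireExcelFile(args.length > 0 ? args[0] : "input.xlsx");
            System.out.println("Ready to convert " + input);
        } catch (IllegalArgumentException | IOException e) {
            // Report a short message instead of a raw stack trace.
            System.err.println("Conversion aborted: " + e.getMessage());
        }
    }
}
```

Failing early with a specific message tends to be easier to diagnose than an exception thrown from deep inside the parsing or PDF-writing code.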

Conclusion#

Converting Excel files to PDF in Java using iText is a useful technique for many real-world applications. By understanding the core concepts, typical usage scenarios, and following best practices, you can effectively convert Excel files to PDF while minimizing common pitfalls. The provided code example serves as a starting point for your own projects, and you can further enhance it to meet your specific requirements.

FAQ#

Q1: Can iText handle all types of Excel files?#

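A1: iText itself does not read Excel files; Apache POI does. POI supports both .xls (Excel 97-2003, via HSSFWorkbook) and .xlsx (Excel 2007 and later, via XSSFWorkbook). The code example above uses XSSFWorkbook, so as written it accepts only .xlsx; POI's WorkbookFactory can open either format.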
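
As a rough illustration of supporting both formats, the sketch below opens a workbook through POI's WorkbookFactory, which inspects the content and returns the matching implementation. It assumes both the poi and poi-ooxml dependencies are on the classpath; the class name WorkbookOpener and its methods are hypothetical, and in-memory workbooks stand in for real files.

```java
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of the original example.
public class WorkbookOpener {

    /** Opens either an .xls or an .xlsx stream; POI detects the format from the content. */
    static Workbook open(InputStream in) throws IOException {
        return WorkbookFactory.create(in);
    }

    /** Serializes a workbook to bytes so the demo needs no files on disk. */
    static byte[] toBytes(Workbook workbook) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        workbook.write(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try (Workbook xls = new HSSFWorkbook(); Workbook xlsx = new XSSFWorkbook()) {
            xls.createSheet("legacy");
            xlsx.createSheet("modern");

            // Reopen both formats through the same entry point.
            try (Workbook a = open(new ByteArrayInputStream(toBytes(xls)));
                 Workbook b = open(new ByteArrayInputStream(toBytes(xlsx)))) {
                System.out.println(a.getClass().getSimpleName());
                System.out.println(b.getClass().getSimpleName());
            }
        }
    }
}
```

In the converter, a call like this could take the place of the direct XSSFWorkbook constructor, since the rest of the code only depends on the Workbook interface.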

Q2: How can I preserve the formatting of the Excel file in the PDF?#

A2: Preserving all formatting can be challenging. You may need to manually extract formatting information from the Excel cells (such as font, color, and borders) and apply the corresponding formatting in iText.
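
To make the manual approach concrete, here is a minimal sketch that copies three properties (bold, italic, and font size) plus horizontal alignment from a POI cell style onto an iText cell. It assumes iText 7.2.x, where TextAlignment lives in com.itextpdf.layout.properties; the class name StyleMappingSketch and its methods are hypothetical, and colors, borders, and merged cells are deliberately left out.

```java
import com.itextpdf.layout.element.Cell;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.properties.TextAlignment;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.HorizontalAlignment;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.IOException;

// Hypothetical helper class, not part of the original example.
public class StyleMappingSketch {

    /** Maps POI's horizontal alignment onto the closest iText text alignment. */
    static TextAlignment toTextAlignment(HorizontalAlignment alignment) {
        switch (alignment) {
            case CENTER:
                return TextAlignment.CENTER;
            case RIGHT:
                return TextAlignment.RIGHT;
            case JUSTIFY:
                return TextAlignment.JUSTIFIED;
            default:
                // Simplification: GENERAL and the remaining values fall back to left alignment.
                return TextAlignment.LEFT;
        }
    }

    /** Builds an iText cell that mirrors the font weight, style, size, and alignment of a POI cell. */
    static Cell toPdfCell(Workbook workbook, org.apache.poi.ss.usermodel.Cell excelCell) {
        CellStyle style = excelCell.getCellStyle();
        Font font = workbook.getFontAt(style.getFontIndex());

        Paragraph text = new Paragraph(new DataFormatter().formatCellValue(excelCell))
                .setFontSize(font.getFontHeightInPoints());
        if (font.getBold()) {
            text.setBold();
        }
        if (font.getItalic()) {
            text.setItalic();
        }
        return new Cell().add(text).setTextAlignment(toTextAlignment(style.getAlignment()));
    }

    public static void main(String[] args) throws IOException {
        try (Workbook workbook = new XSSFWorkbook()) {
            org.apache.poi.ss.usermodel.Cell excelCell =
                    workbook.createSheet("demo").createRow(0).createCell(0);
            excelCell.setCellValue("Total");

            // Give the Excel cell a bold, right-aligned style to map.
            Font bold = workbook.createFont();
            bold.setBold(true);
            CellStyle style = workbook.createCellStyle();
            style.setFont(bold);
            style.setAlignment(HorizontalAlignment.RIGHT);
            excelCell.setCellStyle(style);

            Cell pdfCell = toPdfCell(workbook, excelCell);
            System.out.println("Mapped alignment: " + toTextAlignment(style.getAlignment()));
            System.out.println("PDF cell created: " + (pdfCell != null));
        }
    }
}
```

The same pattern extends to other properties one at a time, which is why full fidelity takes considerable effort: each Excel style attribute needs its own mapping to an iText equivalent.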

Q3: What if my Excel file has multiple sheets?#

A3: In the provided code example, we only process the first sheet. You can modify the code to iterate through all sheets in the Excel workbook and add them to the PDF document.
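
One possible shape for that loop is sketched below: it walks every sheet, starts each one after the first on a new page, and writes the sheet name as a heading. It assumes iText 7.2.x package names; the class name MultiSheetSketch and the method writeAllSheets are hypothetical, and the table-building step is left as a placeholder comment.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.AreaBreak;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.properties.AreaBreakType;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical helper class, not part of the original example.
public class MultiSheetSketch {

    /** Writes a heading per sheet, one sheet per page group, and returns the sheet count. */
    static int writeAllSheets(Workbook workbook, Document document) {
        int count = workbook.getNumberOfSheets();
        for (int i = 0; i < count; i++) {
            Sheet sheet = workbook.getSheetAt(i);
            if (i > 0) {
                // Start every sheet after the first on a fresh page.
                document.add(new AreaBreak(AreaBreakType.NEXT_PAGE));
            }
            document.add(new Paragraph(sheet.getSheetName()).setBold());
            // Placeholder: build the table for this sheet and add it to the document here.
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory workbook and output stream stand in for real files.
        try (Workbook workbook = new XSSFWorkbook();
             ByteArrayOutputStream out = new ByteArrayOutputStream();
             Document document = new Document(new PdfDocument(new PdfWriter(out)))) {
            workbook.createSheet("Q1");
            workbook.createSheet("Q2");
            System.out.println("Sheets written: " + writeAllSheets(workbook, document));
        }
    }
}
```

Keeping the per-sheet work in one method makes it straightforward to reuse the row and cell loop from the main example for each sheet.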

References#