NAME PDF::Data - Manipulate PDF files and objects as data structures VERSION version v1.0.0 SYNOPSIS use PDF::Data; DESCRIPTION This module can read and write PDF files, and represents PDF objects as data structures that can be readily manipulated. METHODS new my $pdf = PDF::Data->new(-compress => 1, -minify => 1); Constructor to create an empty PDF::Data object instance. Any arguments passed to the constructor are treated as key/value pairs, and included in the $pdf hash object returned from the constructor. When the PDF file data is generated, this hash is written to the PDF file as the trailer dictionary. However, hash keys starting with "-" are ignored when writing the PDF file, as they are considered to be flags or metadata. For example, $pdf->{-compress} is a flag which controls whether or not streams will be compressed when generating PDF file data. This flag can be set in the constructor (as shown above), or set directly on the object. The $pdf->{-minify} flag controls whether or not to save space in the generated PDF file data by removing comments and extra whitespace from content streams. This flag can be used along with $pdf->{-compress} to make the generated PDF file data even smaller, but this transformation is not reversible. clone my $pdf_clone = $pdf->clone; Deep copy the entire PDF::Data object itself. new_page my $page = $pdf->new_page(8.5, 11); Create a new page object with the specified size. copy_page my $copied_page = $pdf->copy_page($page); Deep copy a single page object. append_page $page = $pdf->append_page($page); Append the specified page object to the end of the PDF page tree. read_pdf my $pdf = PDF::Data->read_pdf($file, %args); Read a PDF file and parse it with $pdf->parse_pdf(), returning a new object instance. Any streams compressed with the /FlateDecode filter will be automatically decompressed. Unless the $pdf->{-decompress} flag is set, the same streams will also be automatically recompressed again when generating PDF file data. parse_pdf my $pdf = PDF::Data->parse_pdf($data, %args); Used by $pdf->read_pdf() to parse the raw PDF file data and create a new object instance. This method can also be called directly instead of calling $pdf->read_pdf() if the PDF file data comes another source instead of a regular file. write_pdf $pdf->write_pdf($file, $time); Generate and write a new PDF file from the current state of the PDF data. The $time parameter is optional; if not defined, it defaults to the current time. If $time is defined but false (zero or empty string), no timestamp will be set. The optional $time parameter may be used to specify the modification timestamp to save in the PDF metadata and to set the file modification timestamp of the output file. If not specified, it defaults to the current time. If a false value is specified, this method will skip setting the modification time in the PDF metadata, and skip setting the timestamp on the output file. pdf_file_data my $pdf_file_data = $document->pdf_file_data($time); Generate PDF file data from the current state of the PDF data structure, suitable for writing to an output PDF file. This method is used by the write_pdf() method to generate the raw string of bytes to be written to the output PDF file. This data can be directly used (e.g. as a MIME attachment) without the need to actually write a PDF file to disk. The optional $time parameter may be used to specify the modification timestamp to save in the PDF metadata. If not specified, it defaults to the current time. If a false value is specified, this method will skip setting the modification time in the PDF metadata. dump_pdf $pdf->dump_pdf($file); Dump the PDF internal structure and data for debugging. dump_outline $pdf->dump_outline($file); Dump an outline of the PDF internal structure for debugging. merge_content_streams my $stream = $pdf->merge_content_streams($array_of_streams); Merge multiple content streams into a single content stream. find_bbox $pdf->find_bbox($content_stream); Find bounding box by analyzing a content stream. This is only partially implemented. new_bbox $new_content = $pdf->new_bbox($content_stream); Find bounding box by analyzing a content stream. This is only partially implemented. timestamp my $timestamp = $pdf->timestamp($time); my $now = $pdf->timestamp; Generate timestamp in PDF internal format. UTILITY METHODS round my @numbers = $pdf->round(@numbers); Round numeric values to 12 significant digits to avoid floating-point rounding error and remove trailing zeroes. concat_matrix my $matrix = $pdf->concat_matrix($transformation_matrix, $original_matrix); Concatenate a transformation matrix with an original matrix, returning a new matrix. This is for arrays of 6 elements representing standard 3x3 transformation matrices as used by PostScript and PDF. invert_matrix my $inverse = $pdf->invert_matrix($matrix); Calculate the inverse of a matrix, if possible. Returns undef if not invertible. translate my $matrix = $pdf->translate($x, $y); Returns a 6-element transformation matrix representing translation of the origin to the specified coordinates. scale my $matrix = $pdf->scale($x, $y); Returns a 6-element transformation matrix representing scaling of the coordinate space by the specified horizontal and vertical scaling factors. rotate my $matrix = $pdf->rotate($angle); Returns a 6-element transformation matrix representing counterclockwise rotation of the coordinate system by the specified angle (in degrees). INTERNAL METHODS validate $pdf->validate; Used by new(), parse_pdf() and write_pdf() to validate some parts of the PDF structure. validate_key $pdf->validate_key($hash, $key, $value, $label); Used by validate() to validate specific hash key values. get_hash_node my $hash = $pdf->get_hash_node($path); Used by validate_key() to get a hash node from the PDF structure by path. parse_objects my @objects = $pdf->parse_objects($objects, $data, $offset); Used by parse_pdf() to parse PDF objects into Perl representations. parse_content my @objects = $pdf->parse_data($data); Uses parse_objects() to parse PDF objects from standalone PDF data. filter_stream $pdf->filter_stream($stream); Used by parse_objects() to inflate compressed streams. compress_stream $new_stream = $pdf->compress_stream($stream); Used by write_object() to compress streams if enabled. This is controlled by the $pdf->{-compress} flag, which is set automatically when reading a PDF file with compressed streams, but must be set manually for PDF files created from scratch, either in the constructor arguments or after the fact. resolve_references $object = $pdf->resolve_references($objects, $object); Used by parse_pdf() to replace parsed indirect object references with direct references to the objects in question. write_indirect_objects my $xrefs = $pdf->write_indirect_objects($pdf_file_data, $objects, $seen); Used by write_pdf() to write all indirect objects to a string of new PDF file data. enumerate_indirect_objects $pdf->enumerate_indirect_objects($objects); Used by write_indirect_objects() to identify which objects in the PDF data structure need to be indirect objects. enumerate_shared_objects $pdf->enumerate_shared_objects($objects, $seen, $ancestors, $object); Used by enumerate_indirect_objects() to find objects which are already shared (referenced from multiple objects in the PDF data structure). add_indirect_objects $pdf->add_indirect_objects($objects, @objects); Used by enumerate_indirect_objects() and enumerate_shared_objects() to add objects to the list of indirect objects to be written out. write_object $pdf->write_object($pdf_file_data, $objects, $seen, $object, $indent); Used by write_indirect_objects(), and called by itself recursively, to write direct objects out to the string of new PDF file data. dump_object my $output = $pdf->dump_object($object, $label, $seen, $indent, $mode); Used by dump_pdf(), and called by itself recursively, to dump/outline the specified PDF object.