emergingtrajectories.pdf
========================

.. py:module:: emergingtrajectories.pdf

.. autoapi-nested-parse::

   A small set of utility functions for loading text content from PDFs. For simple cases it may be easier to use PyPDF directly and skip this module. It exists as a home for future specialized functions and classes that handle more advanced PDF tasks (e.g., OCR, table extraction).


Functions
---------

.. autoapisummary::

   emergingtrajectories.pdf.get_PDF_content_from_file_by_page
   emergingtrajectories.pdf.get_PDF_content_from_url_by_page
   emergingtrajectories.pdf.get_PDF_content_by_page_from_file
   emergingtrajectories.pdf.get_PDF_content_by_page_from_url


Module Contents
---------------

.. py:function:: get_PDF_content_from_file_by_page(file_path: str) -> list

   Loads a PDF file and extracts the text into a list of strings, one for each page.

   :param file_path: The path to the PDF file.
   :type file_path: str

   :returns: A list of strings, one for each page.
   :rtype: list
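
   The per-page extraction described above can be sketched directly with ``pypdf``, which the module description mentions as an alternative. This is a minimal sketch that assumes the ``pypdf`` package is installed. ``extract_pages_from_file`` is a hypothetical name used for illustration, and this is not necessarily how the function is implemented.

   .. code-block:: python

      def extract_pages_from_file(file_path: str) -> list:
          """Return the text of each page of a PDF as a list of strings."""
          # Imported here so pypdf is only required when a PDF is actually read.
          from pypdf import PdfReader

          reader = PdfReader(file_path)
          # Pages without extractable text (e.g., scanned images) may give an
          # empty result, so fall back to "" to keep one string per page.
          return [page.extract_text() or "" for page in reader.pages]

   Keeping one entry per page, even when a page yields no text, means list indices still line up with page positions.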


.. py:function:: get_PDF_content_from_url_by_page(url: str) -> list

   Loads a PDF file from a URL and extracts the text into a list of strings, one for each page.

   :param url: The URL of the PDF file.
   :type url: str

   :returns: A list of strings, one for each page.
   :rtype: list
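
   Loading from a URL can be sketched as a download into memory followed by the same per-page extraction. This is a rough illustration that assumes the ``pypdf`` package is installed and uses the standard library's ``urllib.request`` for the download. The name ``extract_pages_from_url`` and the ``timeout`` parameter are hypothetical and not part of this module.

   .. code-block:: python

      import io
      import urllib.request


      def extract_pages_from_url(url: str, timeout: float = 30.0) -> list:
          """Download a PDF and return its text as one string per page."""
          # Imported here so pypdf is only required when a PDF is actually read.
          from pypdf import PdfReader

          with urllib.request.urlopen(url, timeout=timeout) as response:
              data = response.read()
          # PdfReader needs a seekable stream, so wrap the bytes in BytesIO.
          reader = PdfReader(io.BytesIO(data))
          return [page.extract_text() or "" for page in reader.pages]

   Reading the whole response into memory keeps the sketch short. For very large PDFs, writing to a temporary file first may be preferable.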


.. py:function:: get_PDF_content_by_page_from_file(file_path: str) -> str

   Loads a PDF file and extracts the text into a single string (not a per-page list, despite the ``by_page`` in the name).

   :param file_path: The path to the PDF file.
   :type file_path: str

   :returns: The text content of the PDF file.
   :rtype: str
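
   The relationship between a per-page list and a single string can be illustrated by joining the page texts. The documentation does not state how pages are concatenated, so the newline separator below is an assumption, and ``join_pages`` is a hypothetical helper rather than part of this module.

   .. code-block:: python

      def join_pages(pages: list, separator: str = "\n") -> str:
          """Concatenate per-page text into a single string."""
          # The separator is an assumption; the module may join pages differently.
          return separator.join(pages)


      pages = ["First page text.", "Second page text."]
      full_text = join_pages(pages)
      print(full_text)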


.. py:function:: get_PDF_content_by_page_from_url(url: str) -> str

   Loads a PDF file from a URL and extracts the text into a single string (not a per-page list, despite the ``by_page`` in the name).

   :param url: The URL of the PDF file.
   :type url: str

   :returns: The text content of the PDF file.
   :rtype: str