emergingtrajectories.pdf

This module is a small set of utility functions for loading PDF content. It may well be easier to use PyPDF directly and skip this module altogether. We created it as a placeholder: in the future, we might add specialized functions and classes for doing “fancy” things with PDFs (e.g., OCR, table extraction).
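For comparison, a minimal sketch of the direct PyPDF route might look like the following. It assumes the `pypdf` package is installed. `pdf_text_by_page` is a hypothetical helper name, not part of this module, and the module's own implementation may differ.

```python
from typing import List


def pdf_text_by_page(file_path: str) -> List[str]:
    """Return the extracted text of each page, in page order.

    Hypothetical helper for illustration; not part of emergingtrajectories.
    """
    # Imported lazily so this file still loads where pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(file_path)
    # Fall back to an empty string if a page yields no text
    # (for example, a scanned page with no text layer).
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `pdf_text_by_page` on a local PDF path returns a list with one entry per page, which is the same shape the by-page functions in this module are documented to return.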

Functions

`get_PDF_content_from_file_by_page`(→ list)	Loads a PDF file and extracts the text into a list of strings, one for each page.
`get_PDF_content_from_url_by_page`(→ list)	Loads a PDF file from a URL and extracts the text into a list of strings, one for each page.
`get_PDF_content_by_page_from_file`(→ str)	Loads a PDF file and extracts the text into one big string.
`get_PDF_content_by_page_from_url`(→ str)	Loads a PDF file from a URL and extracts the text into one big string.

Module Contents

emergingtrajectories.pdf.get_PDF_content_from_file_by_page(file_path: str) → list

Loads a PDF file and extracts the text into a list of strings, one for each page.

Parameters: file_path (str) – The path to the PDF file.
Returns: A list of strings, one for each page.
Return type: list
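Because the result is a plain list with one string per page, it can be processed with ordinary list operations. The sketch below finds which pages mention a term. `pages_containing` is a hypothetical helper, and it assumes the list is in page order, which the documentation above does not state explicitly.

```python
from typing import List


def pages_containing(pages: List[str], term: str) -> List[int]:
    """Return 1-based page numbers whose text contains `term` (case-insensitive)."""
    needle = term.lower()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.lower()
    ]


# Hypothetical usage, assuming emergingtrajectories is installed and a
# local file named "report.pdf" exists:
#   from emergingtrajectories.pdf import get_PDF_content_from_file_by_page
#   pages = get_PDF_content_from_file_by_page("report.pdf")
#   print(pages_containing(pages, "inflation"))
```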

emergingtrajectories.pdf.get_PDF_content_from_url_by_page(url: str) → list

Loads a PDF file from a URL and extracts the text into a list of strings, one for each page.

Parameters: url (str) – The URL of the PDF file.
Returns: A list of strings, one for each page.
Return type: list
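One plausible way to load a PDF from a URL is to download the bytes and parse them in memory. The sketch below does this with the standard library and `pypdf`. It is an assumption about how such a loader could work, not a description of this module's implementation; `fetch_bytes` and `pdf_text_by_page_from_bytes` are hypothetical names.

```python
import io
import urllib.request
from typing import List


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download and return the raw bytes behind `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def pdf_text_by_page_from_bytes(data: bytes) -> List[str]:
    """Parse in-memory PDF bytes and return one string per page."""
    # Lazy import; requires the pypdf package to be installed.
    from pypdf import PdfReader

    # PdfReader accepts a file-like object, so no temporary file is needed.
    reader = PdfReader(io.BytesIO(data))
    return [page.extract_text() or "" for page in reader.pages]
```

Splitting the download from the parsing keeps each step easy to test and lets the caller add retries or custom headers to the download without touching the PDF logic.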

emergingtrajectories.pdf.get_PDF_content_by_page_from_file(file_path: str) → str

Loads a PDF file and extracts the text into one big string.

Parameters: file_path (str) – The path to the PDF file.
Returns: The text content of the PDF file.
Return type: str
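Conceptually, a single-string result is the per-page text joined together. The sketch below shows that relationship. The blank-line separator is an assumption, since the documentation does not say how (or whether) pages are delimited in the returned string, and `join_pages` is a hypothetical helper.

```python
from typing import List


def join_pages(pages: List[str], separator: str = "\n\n") -> str:
    """Concatenate per-page text into one string.

    The default separator is an illustrative choice, not the module's
    documented behavior.
    """
    return separator.join(pages)
```

If page boundaries matter downstream (for example, for citing a page number), the by-page list is the safer starting point, because a joined string no longer records where each page ended.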

emergingtrajectories.pdf.get_PDF_content_by_page_from_url(url: str) → str

Loads a PDF file from a URL and extracts the text into one big string.

Parameters: url (str) – The URL of the PDF file.
Returns: The text content of the PDF file.
Return type: str
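A URL does not always return a PDF; a server may respond with an HTML error page instead. The documentation does not say how this function handles that case, so a caller might want a cheap check on downloaded bytes before parsing. The sketch below is a heuristic based on the `%PDF-` header marker; `looks_like_pdf` is a hypothetical helper and the 1024-byte search window is an assumption.

```python
def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Heuristically check whether `data` appears to be a PDF.

    PDF files carry a "%PDF-" header marker near the start. Searching a
    small leading window instead of requiring offset 0 tolerates files
    with a few stray bytes before the header.
    """
    return b"%PDF-" in data[:window]
```

A check like this only rules out obviously wrong content; a file that passes can still be truncated or corrupt, so parsing errors still need to be handled.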