emergingtrajectories.chunkers
=============================

.. py:module:: emergingtrajectories.chunkers

.. autoapi-nested-parse::

   ``chunkers.py`` splits content into facts using different strategies. Emerging Trajectories started by chunking with GPT-4, but chunking by sentences, paragraphs, or other verbatim approaches is also useful. We'll be adding more chunkers over time.

   A chunker should simply take a piece of content and split it into a list of facts.



Attributes
----------

.. autoapisummary::

   emergingtrajectories.chunkers.fact_system_prompt


Classes
-------

.. autoapisummary::

   emergingtrajectories.chunkers.ChunkerGPT4
   emergingtrajectories.chunkers.ChunkerNewLines


Module Contents
---------------

.. py:data:: fact_system_prompt
   :value: Multiline-String

   .. code-block:: python

      """You are a researcher helping extract facts about {topic}, trends, and related observations. We will give you a piece of content scraped on the web. Please extract facts from this. Each fact should stand on its own, and can be several sentences long if need be. You can have as many facts as needed. For each fact, please start it as a new line with "---" as the bullet point. For example:
      --- Fact 1... This is the fact.
      --- Here is a second fact.
      --- And a third fact.
      Please do not include new lines between bullet points. Make sure you write your facts in ENGLISH. Translate any foreign language content/facts/observations into ENGLISH. We will simply provide you with content and you will just provide facts."""
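
The prompt asks the model to begin every fact with ``---``. As a minimal sketch, assuming the reply follows that format, such a reply could be split into a list of facts as below. ``parse_fact_bullets`` is a hypothetical helper written for illustration; it is not necessarily how ``ChunkerGPT4.chunk`` parses the model output.

.. code-block:: python

   from __future__ import annotations


   def parse_fact_bullets(reply: str) -> list[str]:
       """Split a reply written in the "---" bullet format into facts (hypothetical helper)."""
       # Text before the first "---" (e.g. a preamble such as "Here are the facts:")
       # is not a fact, so the first piece is skipped.
       pieces = reply.split("---")[1:]
       # Collapse internal whitespace so a fact that wraps across lines becomes
       # one string, and drop pieces that are empty after stripping.
       return [" ".join(piece.split()) for piece in pieces if piece.strip()]


   reply = (
       "--- Fact 1... This is the fact.\n"
       "--- Here is a second fact.\n"
       "--- And a third fact."
   )
   print(parse_fact_bullets(reply))
   # ['Fact 1... This is the fact.', 'Here is a second fact.', 'And a third fact.']

Splitting on the marker rather than on line breaks keeps a multi-sentence fact together even if the model wraps it; the trade-off is that a literal ``---`` inside a fact would split it in two.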
.. py:class:: ChunkerGPT4(openai_api_key: str, model='gpt-4-turbo')

   Chunker that uses GPT-4 to read text and return a list of facts.

   :param openai_api_key: The OpenAI API key.
   :type openai_api_key: str
   :param model: The OpenAI model to use. Defaults to "gpt-4-turbo".
   :type model: str


   .. py:attribute:: openai_api_key


   .. py:attribute:: model
      :value: 'gpt-4-turbo'


   .. py:method:: chunk(content: str, topic: str) -> list[str]

      Chunk text into facts.

      :param content: The content to chunk.
      :type content: str
      :param topic: The topic to focus on when building facts.
      :type topic: str
      :returns: The list of facts.
      :rtype: list[str]


.. py:class:: ChunkerNewLines(min_length: int = 7)

   Chunker that splits content on line breaks.

   :param min_length: The minimum length (in characters) of a fact. Defaults to 7 characters.
   :type min_length: int


   .. py:attribute:: min_length
      :value: 7


   .. py:method:: chunk(content: str, topic: str = None) -> list[str]

      Chunk text into facts.

      :param content: The content to chunk.
      :type content: str
      :param topic: The topic to focus on when building facts. This defaults to None so that calls have the same form as for other chunkers.
      :type topic: str
      :returns: The list of facts.
      :rtype: list[str]
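
The shared chunker contract (content in, list of facts out) and the line-break strategy of ``ChunkerNewLines`` can be sketched as follows. This is a hypothetical re-implementation, not the library's code: ``Chunker``, ``NewLineChunkerSketch``, and ``chunk_all`` are illustrative names, and the sketch assumes each line is stripped of surrounding whitespace and kept only if it has at least ``min_length`` characters.

.. code-block:: python

   from __future__ import annotations

   from typing import Optional, Protocol


   class Chunker(Protocol):
       """Assumed common shape of a chunker: content in, list of facts out."""

       def chunk(self, content: str, topic: str) -> list[str]: ...


   class NewLineChunkerSketch:
       """Hypothetical stand-in for the idea behind ChunkerNewLines."""

       def __init__(self, min_length: int = 7) -> None:
           self.min_length = min_length

       def chunk(self, content: str, topic: Optional[str] = None) -> list[str]:
           # topic is accepted only so the call matches other chunkers; it is unused here.
           facts = []
           for line in content.splitlines():
               line = line.strip()
               # Assumption: blank lines and lines shorter than min_length are dropped.
               if len(line) >= self.min_length:
                   facts.append(line)
           return facts


   def chunk_all(chunker: Chunker, content: str, topic: str) -> list[str]:
       # Any object with a compatible chunk() method can be passed in.
       return chunker.chunk(content, topic)


   text = "Prices rose 3% in May.\n\nOK\nAnalysts expect further increases."
   print(chunk_all(NewLineChunkerSketch(), text, topic="inflation"))
   # ['Prices rose 3% in May.', 'Analysts expect further increases.']

Using ``typing.Protocol`` lets any class with a matching ``chunk`` method be swapped in without a shared base class; whether the library itself defines such a base class is not stated in this reference.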