emergingtrajectories.chunkers

chunkers.py is used to chunk facts using different strategies. Emerging Trajectories started by chunking via GPT-4, but we can also appreciate using sentences, paragraphs, or other verbatim approaches. We’ll be adding more chunkers as time goes on.

Chunkers should simply take a piece of content and chunk it into a list of facts.

Attributes

Classes

ChunkerGPT4

Chunker based on GPT-4 reading text and providing a list of facts.

ChunkerNewLines

Chunker using line breaks for content.

Module Contents

emergingtrajectories.chunkers.fact_system_prompt = Multiline-String
Show Value
"""You are a researcher helping extract facts about {topic}, trends, and related observations. We will give you a piece of content scraped on the web. Please extract facts from this. Each fact should stand on its own, and can be several sentences long if need be. You can have as many facts as needed. For each fact, please start it as a new line with "---" as the bullet point. For example:

--- Fact 1... This is the fact.
--- Here is a second fact.
--- And a third fact.

Please do not include new lines between bullet points. Make sure you write your facts in ENGLISH. Translate any foreign language content/facts/observations into ENGLISH.

We will simply provide you with content and you will just provide facts."""
class emergingtrajectories.chunkers.ChunkerGPT4(openai_api_key: str, model='gpt-4-turbo')

Chunker based on GPT-4 reading text and providing a list of facts.

Parameters:
  • openai_api_key (str) – The OpenAI API key.

  • model (str) – The OpenAI model to use. Defaults to “gpt-4-turbo”.

openai_api_key
model = 'gpt-4-turbo'
chunk(content: str, topic: str) list[str]

Chunk text into facts.

Parameters:
  • content (str) – The content to chunk.

  • topic (str) – The topic to focus on when building facts.

Returns:

The list of facts.

Return type:

list[str]

class emergingtrajectories.chunkers.ChunkerNewLines(min_length: int = 7)

Chunker using line breaks for content.

Parameters:

min_length (int) – The minimum length (in characters) of a fact. Defaults to 7 characters.

min_length = 7
chunk(content: str, topic: str = None) list[str]

Chunk text into facts.

Parameters:
  • content (str) – The content to chunk.

  • topic (str) – The topic to focus on when building facts. This defaults to None so we can keep the same function calls as other chunkers.

Returns:

The list of facts.

Return type:

list[str]