emergingtrajectories.knowledge

Solutions for finding, extracting, storing, and revisiting knowledge.

Classes

KnowledgeBaseFileCache

The KnowledgeBaseFileCache is a simple file-based cache for web content and local files. The cache stores the original HTML, PDF, or TXT content and tracks when (if ever) an agent actually accessed the content.

Functions

statement_to_search_queries(→ list[str])

Given a statement ID, returns a list of queries you can enter into a search engine to find relevant information.

uri_to_local(→ str)

Converts a URI to a local file name, typically the MD5 sum of the URI.

Module Contents

emergingtrajectories.knowledge.statement_to_search_queries(statement_id: int, client: emergingtrajectories.Client, openai_api_key: str, num_queries: int = 3) → list[str]

Given a statement ID, returns a list of queries you can enter into a search engine to find relevant information.

Parameters:
  • statement_id (int) – The ID of the statement to get search queries for.

  • client (Client) – The Emerging Trajectories API client.

  • openai_api_key (str) – The OpenAI API key.

  • num_queries (int, optional) – The number of queries to return. Defaults to 3.

Returns:

A list of search queries.

Return type:

list[str]

emergingtrajectories.knowledge.uri_to_local(uri: str) → str

Converts a URI to a local file name, typically the MD5 sum of the URI.

Parameters:

uri (str) – The URI to convert.

Returns:

The MD5 sum of the URI.

Return type:

str
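
As a rough illustration of the MD5-based mapping described above, the sketch below hashes a URI into a fixed-length, filesystem-safe name. The function name `uri_to_local_sketch`, the UTF-8 encoding, and the hex-digest output are assumptions made for illustration; the library's actual output format may differ.

```python
import hashlib


def uri_to_local_sketch(uri: str) -> str:
    """Hypothetical stand-in: map a URI to an MD5 hex digest."""
    # Assumption: the URI is encoded as UTF-8 before hashing.
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


name = uri_to_local_sketch("https://example.com/report.pdf")
print(len(name))  # MD5 hex digests are always 32 characters long
```

Because the digest depends only on the URI string, the same URI always maps to the same file name, which is what makes it usable as a cache key.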

class emergingtrajectories.knowledge.KnowledgeBaseFileCache(folder_path: str, cache_file: str = 'cache.json')

The KnowledgeBaseFileCache is a simple file-based cache for web content and local files. The cache stores the original HTML, PDF, or TXT content and tracks when (if ever) an agent actually accessed the content.

Parameters:
  • folder_path (str) – The folder where the cache will be stored.

  • cache_file (str, optional) – The name of the cache file. Defaults to “cache.json”.

root_path
root_parsed
root_original
cache_file
cache
save_state() → None

Saves the in-memory changes to the knowledge base to the JSON cache file.

load_cache() → None

Loads the cache from the cache file, or creates the relevant files and folders if one does not exist.
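
The load-or-create behavior described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the helper name `load_or_create_cache` is hypothetical, and the real method may create additional folders or use a different JSON layout.

```python
import json
import os
import tempfile


def load_or_create_cache(folder_path: str, cache_file: str = "cache.json") -> dict:
    """Hypothetical helper: load a JSON cache, creating it if missing."""
    # Create the cache folder if it does not exist yet.
    os.makedirs(folder_path, exist_ok=True)
    path = os.path.join(folder_path, cache_file)
    if not os.path.exists(path):
        # First run: write an empty cache so later loads succeed.
        with open(path, "w", encoding="utf-8") as f:
            json.dump({}, f)
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Demonstrate against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "kb")
    first = load_or_create_cache(folder)
    created = os.path.exists(os.path.join(folder, "cache.json"))
```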

in_cache(uri: str) → bool

Checks if a URI is in the cache already.

Parameters:

uri (str) – The URI to check.

Returns:

True if the URI is in the cache, False otherwise.

Return type:

bool

update_cache(uri: str, obtained_on: datetime.datetime, last_accessed: datetime.datetime) → None

Updates the cache file for a given URI, recording when the content was obtained and when it was last accessed.

Parameters:
  • uri (str) – The URI to update.

  • obtained_on (datetime) – The date and time when the content was obtained.

  • last_accessed (datetime) – The date and time when the content was last accessed.

log_access(uri: str) → None

Saves the last accessed time and updates the accessed tracker for a given URI.

Parameters:

uri (str) – The URI to update.

get_unaccessed_content() → list[str]

Returns a list of URIs that have not been accessed by the agent.

Returns:

A list of URIs that have not been accessed by the agent.

Return type:

list[str]
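
To make the access-tracking idea concrete, here is a minimal in-memory sketch of how obtained and last-accessed timestamps could be recorded and then filtered. The entry layout (`obtained_on`, `last_accessed`, `accessed`) and the function names are assumptions for illustration only; the library's cache file may store this differently.

```python
from datetime import datetime, timezone

# Assumed entry layout: uri -> {"obtained_on", "last_accessed", "accessed"}
cache = {}


def update_entry(uri, obtained_on, last_accessed=None):
    """Hypothetical: record when content was obtained and last accessed."""
    cache[uri] = {
        "obtained_on": obtained_on.isoformat(),
        "last_accessed": last_accessed.isoformat() if last_accessed else None,
        "accessed": last_accessed is not None,
    }


def log_access_sketch(uri):
    """Hypothetical: stamp the current time and mark the URI as accessed."""
    cache[uri]["last_accessed"] = datetime.now(timezone.utc).isoformat()
    cache[uri]["accessed"] = True


def unaccessed():
    """Return the URIs that have never been marked as accessed."""
    return [uri for uri, entry in cache.items() if not entry["accessed"]]


now = datetime.now(timezone.utc)
update_entry("https://example.com/a", now)
update_entry("https://example.com/b", now)
log_access_sketch("https://example.com/a")
pending = unaccessed()  # only "https://example.com/b" remains unread
```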

get(uri: str) → str

Returns the content for a given URI. If the content is not in the cache, it will be scraped and added to the cache.

Parameters:

uri (str) – The URI to get the content for.

Returns:

The content for the given URI.

Return type:

str
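
The cache-then-scrape flow described for `get` follows a common pattern: return the stored copy if there is one, otherwise fetch and store it. The sketch below shows that pattern with an in-memory dictionary and an injected fetch function. `TinyCache` and `fake_fetch` are hypothetical names, and the real class scrapes the web and writes to disk rather than calling a stub.

```python
from typing import Callable


class TinyCache:
    """Hypothetical in-memory stand-in for the cache-then-fetch flow."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch
        self._store = {}
        self.fetch_count = 0  # how many times the fetcher was actually called

    def get(self, uri: str) -> str:
        if uri not in self._store:
            # Cache miss: fetch the content once and remember it.
            self._store[uri] = self._fetch(uri)
            self.fetch_count += 1
        return self._store[uri]


def fake_fetch(uri: str) -> str:
    # Stand-in for a real scraper; returns deterministic text.
    return f"content of {uri}"


tiny = TinyCache(fake_fetch)
first = tiny.get("https://example.com/a")
second = tiny.get("https://example.com/a")  # served from the cache
```

Injecting the fetcher keeps the sketch testable without network access; the second call does not trigger another fetch.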

add_content(content: str, uri: str = None) → None

Adds content to the cache.

Parameters:
  • content (str) – The content to add to the cache.

  • uri (str, optional) – The URI to use for the content. Defaults to None, in which case an MD5 sum of the content will be used.
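
The default-URI behavior can be sketched as follows: when no URI is supplied, a key is derived from the content's MD5 sum. This is an illustrative approximation with assumed names (`add_content_sketch`, `store`) and an assumed UTF-8 encoding; unlike the documented method, which returns None, the sketch returns the key so the result can be inspected.

```python
import hashlib
from typing import Optional

store = {}


def add_content_sketch(content: str, uri: Optional[str] = None) -> str:
    """Hypothetical: store content under `uri`, or under its MD5 sum."""
    if uri is None:
        # Assumption: hash the UTF-8 bytes of the content to derive a key.
        uri = hashlib.md5(content.encode("utf-8")).hexdigest()
    store[uri] = content
    return uri


key_a = add_content_sketch("hello")
key_b = add_content_sketch("hello")  # same content, same derived key
key_c = add_content_sketch("hello", uri="notes/hello.txt")
```

One consequence of a content-derived key is that adding identical content twice without a URI overwrites the same entry instead of creating a duplicate.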

add_content_from_file(filepath: str, uri: str = None) → None

Adds content from a text file to the cache.

Parameters:
  • filepath (str) – The path to the file to add to the cache.

  • uri (str, optional) – The URI to use for the content. Defaults to None, in which case an MD5 sum of the content will be used.