emergingtrajectories.knowledge
Solutions for finding, extracting, storing, and revisiting knowledge.
Classes
- KnowledgeBaseFileCache: A simple file-based cache for web content and local files. The cache stores the original HTML, PDF, or TXT content and tracks when (if ever) an agent actually accessed the content.
Functions
- statement_to_search_queries(statement_id, client, openai_api_key, num_queries): Given a specific statement ID, returns a list of queries you can put into a search engine to get useful information.
- uri_to_local(uri): Converts a URI to a local file name, typically the MD5 sum of the URI.
Module Contents
- emergingtrajectories.knowledge.statement_to_search_queries(statement_id: int, client: emergingtrajectories.Client, openai_api_key: str, num_queries: int = 3) -> list[str]
Given a specific statement ID, this will return a list of queries you can put into a search engine to get useful information.
- Parameters:
statement_id (int) – The ID of the statement to get search queries for.
client (Client) – The Emerging Trajectories API client.
openai_api_key (str) – The OpenAI API key.
num_queries (int, optional) – The number of queries to return. Defaults to 3.
- Returns:
A list of search queries.
- Return type:
list[str]
- emergingtrajectories.knowledge.uri_to_local(uri: str) -> str
Converts a URI to a local file name, typically the MD5 sum of the URI.
- Parameters:
uri (str) – The URI to convert.
- Returns:
The MD5 sum of the URI.
- Return type:
str
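The mapping from URI to file name can be illustrated with a minimal sketch. It assumes the file name is the hexadecimal MD5 digest of the UTF-8 encoded URI; the exact encoding and any file extension used by the library are not stated here, and `uri_to_local_sketch` is a hypothetical stand-in, not the library function.

```python
import hashlib


def uri_to_local_sketch(uri: str) -> str:
    """Hypothetical stand-in: hex MD5 digest of the UTF-8 encoded URI (assumed)."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


# The same URI always maps to the same 32-character name.
name = uri_to_local_sketch("https://example.com/report.pdf")
print(name, len(name))
```

Hashing gives a fixed-length name that is safe on any file system, at the cost of the name no longer being human-readable.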
- class emergingtrajectories.knowledge.KnowledgeBaseFileCache(folder_path: str, cache_file: str = 'cache.json')
The KnowledgeBaseFileCache is a simple file-based cache for web content and local files. The cache stores the original HTML, PDF, or TXT content and tracks when (if ever) an agent actually accessed the content.
- Parameters:
folder_path (str) – The folder where the cache will be stored.
cache_file (str, optional) – The name of the cache file. Defaults to “cache.json”.
- root_path
- root_parsed
- root_original
- cache_file
- cache
- save_state() -> None
Saves the in-memory changes to the knowledge base to the JSON cache file.
- load_cache() -> None
Loads the cache from the cache file, or creates the relevant files and folders if one does not exist.
- in_cache(uri: str) -> bool
Checks if a URI is in the cache already.
- Parameters:
uri (str) – The URI to check.
- Returns:
True if the URI is in the cache, False otherwise.
- Return type:
bool
- update_cache(uri: str, obtained_on: datetime.datetime, last_accessed: datetime.datetime) -> None
Updates the cache file for a given URI, specifically when it was obtained and last accessed.
- Parameters:
uri (str) – The URI to update.
obtained_on (datetime) – The date and time when the content was obtained.
last_accessed (datetime) – The date and time when the content was last accessed.
- log_access(uri: str) -> None
Saves the last accessed time and updates the accessed tracker for a given URI.
- Parameters:
uri (str) – The URI to update.
- get_unaccessed_content() -> list[str]
Returns a list of URIs that have not been accessed by the agent.
- Returns:
A list of URIs that have not been accessed by the agent.
- Return type:
list[str]
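The access tracking described for update_cache, log_access, and get_unaccessed_content can be pictured with a small in-memory sketch. The entry layout (`obtained_on`, `last_accessed`, `accessed`) and all function names below are assumptions made for illustration; the actual structure of the JSON cache file is not documented here.

```python
import json
from datetime import datetime, timezone


def new_entry(obtained_on: datetime) -> dict:
    # Hypothetical entry layout; timestamps are stored as ISO 8601 strings
    # so that the whole cache can be written out as JSON.
    return {
        "obtained_on": obtained_on.isoformat(),
        "last_accessed": None,
        "accessed": False,
    }


def log_access_sketch(cache: dict, uri: str, when: datetime) -> None:
    # Record the access time and flag the entry as accessed.
    entry = cache[uri]
    entry["last_accessed"] = when.isoformat()
    entry["accessed"] = True


def unaccessed_sketch(cache: dict) -> list[str]:
    # URIs whose content has never been accessed.
    return [uri for uri, entry in cache.items() if not entry["accessed"]]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
cache = {
    "https://example.com/a": new_entry(now),
    "https://example.com/b": new_entry(now),
}
log_access_sketch(cache, "https://example.com/a", now)
remaining = unaccessed_sketch(cache)
serialized = json.dumps(cache)  # what a save step might write to disk
```

Keeping an explicit flag alongside the timestamp makes the "never accessed" query a simple filter over the entries.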
- get(uri: str) -> str
Returns the content for a given URI. If the content is not in the cache, it will be scraped and added to the cache.
- Parameters:
uri (str) – The URI to get the content for.
- Returns:
The content for the given URI.
- Return type:
str
- add_content(content: str, uri: str = None) -> None
Adds content to cache.
- Parameters:
content (str) – The content to add to the cache.
uri (str, optional) – The URI to use for the content. Defaults to None, in which case an MD5 sum of the content will be used.
- add_content_from_file(filepath: str, uri: str = None) -> None
Adds content from a text file to the cache.
- Parameters:
filepath (str) – The path to the file to add to the cache.
uri (str, optional) – The URI to use for the content. Defaults to None, in which case an MD5 sum of the content will be used.
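The default-URI behavior of add_content and add_content_from_file, where a missing URI falls back to an MD5 sum of the content, can be sketched as follows. This is an illustration under assumptions, not the library's implementation: `add_content_sketch`, the `.txt` suffix, and the flat folder layout are hypothetical, and unlike the real method the sketch returns the key so the result can be inspected.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def add_content_sketch(folder: Path, content: str, uri: Optional[str] = None) -> str:
    """Hypothetical stand-in: store content under a hashed file name."""
    if uri is None:
        # No URI supplied: derive a key from the content itself.
        uri = hashlib.md5(content.encode("utf-8")).hexdigest()
    # Assumed naming scheme: MD5 of the key plus a .txt suffix.
    file_name = hashlib.md5(uri.encode("utf-8")).hexdigest() + ".txt"
    (folder / file_name).write_text(content, encoding="utf-8")
    return uri


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    key = add_content_sketch(folder, "hello world")
    stored = [p.read_text(encoding="utf-8") for p in folder.iterdir()]
```

Deriving the key from the content means that adding identical text twice without a URI resolves to the same entry instead of creating a duplicate.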