emergingtrajectories.news
=========================

.. py:module:: emergingtrajectories.news


Classes
-------

.. autoapisummary::

   emergingtrajectories.news.RSSAgent
   emergingtrajectories.news.NewsBingAgent
   emergingtrajectories.news.NewsAPIAgent
   emergingtrajectories.news.FinancialTimesAgent


Functions
---------

.. autoapisummary::

   emergingtrajectories.news.force_empty_content


Module Contents
---------------

.. py:function:: force_empty_content(rss_url: str, content, cache_function) -> None

   Force the crawler to visit every URL in the RSS feed and save each one as a blank content file. This is useful because some RSS feeds contain many old URLs that do not need to be crawled, when only the delta over some period is of interest.

   :param rss_url: The URL of the RSS feed.
   :type rss_url: str
   :param content: The content string to save.
   :param cache_function: The function to call with the URL and the content to save.
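
The blank-content idea can be sketched without any network access. This is a minimal illustration under stated assumptions, not the library's implementation: ``seed_blank_entries``, ``urls_still_to_crawl``, the in-memory ``cache`` dict, and the example URLs are all hypothetical stand-ins.

```python
def seed_blank_entries(urls, content, cache_function):
    """Record `content` for every URL so a later crawl treats it as seen."""
    for url in urls:
        cache_function(url, content)


def urls_still_to_crawl(urls, cache):
    """Return only the URLs with no cache entry yet, preserving order."""
    return [url for url in urls if url not in cache]


# Hypothetical in-memory cache standing in for the real content store.
cache = {}

# First pass: everything currently in the feed is old, so blank it out.
old_feed = ["https://example.com/a", "https://example.com/b"]
seed_blank_entries(old_feed, "", cache.__setitem__)

# Later pass: the feed has gained one article; only that one needs crawling.
new_feed = ["https://example.com/c"] + old_feed
delta = urls_still_to_crawl(new_feed, cache)
print(delta)
```

Passing the cache writer in as a callable mirrors the ``cache_function`` parameter described above: the seeding step does not need to know how or where content is stored.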


.. py:class:: RSSAgent(rss_url, crawler=None)

   A simple wrapper for an RSS feed, so we can query it for URLs.

   :param rss_url: The URL of the RSS feed.
   :type rss_url: str
   :param crawler: The crawler to use. Defaults to None, in which case we will use crawlerPlaywright in headless mode.
   :type crawler: Crawler, optional


   .. py:attribute:: rss_url


   .. py:method:: get_news_as_list() -> list

      Query the RSS feed for news articles and return their URLs as a list.

      :returns: A list of URLs.
      :rtype: list
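
Pulling article URLs out of an RSS 2.0 document can be sketched with the standard library alone. The helper ``extract_item_links`` and the sample feed are hypothetical, and the sketch parses a string instead of fetching ``rss_url``; the real agent may use a different parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real feed would be downloaded from rss_url.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <item><title>First</title><link>https://example.com/one</link></item>
    <item><title>Second</title><link>https://example.com/two</link></item>
    <item><title>No link here</title></item>
  </channel>
</rss>"""


def extract_item_links(feed_xml):
    """Return the <link> of every <item>, skipping items without one."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


links = extract_item_links(SAMPLE_FEED)
print(links)
```

Only ``<item>`` elements are inspected, so the channel-level ``<link>`` (the site home page) is not mistaken for an article.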


.. py:class:: NewsBingAgent(api_key: str, endpoint: str)

   Creates a new Bing News API agent. To learn more, see: https://github.com/microsoft/bing-search-sdk-for-python/

   :param api_key: The Bing News API key.
   :type api_key: str
   :param endpoint: The Bing News API endpoint.
   :type endpoint: str


   .. py:attribute:: api_key


   .. py:attribute:: endpoint


   .. py:method:: get_news_as_list(query: str, market: str = 'en-us') -> list

      Gets a list of URLs from the Bing News API. For more information on markets, see: https://learn.microsoft.com/en-us/bing/search-apis/bing-web-search/reference/market-codes

      :param query: The query to search for.
      :type query: str
      :param market: The market to search in. Defaults to "en-us" (US English).
      :type market: str, optional

      :returns: A list of URLs.
      :rtype: list
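
How a query and a market code might map onto a news search request, and how URLs might be read back from the reply, can be sketched as follows. This assumes the REST shape of the Bing News Search v7 API (``q`` and ``mkt`` query parameters, an ``Ocp-Apim-Subscription-Key`` header, results under ``value``); the agent itself may go through the SDK linked above. The helper names, the placeholder endpoint, and the sample response are hypothetical, and no request is sent.

```python
def build_news_search_request(api_key, endpoint, query, market="en-us"):
    """Assemble the pieces of a news search call without sending it.

    Assumes `endpoint` is already the full news search URL.
    """
    return {
        "url": endpoint,
        "headers": {"Ocp-Apim-Subscription-Key": api_key},
        "params": {"q": query, "mkt": market},
    }


def extract_urls(response_json):
    """Collect the `url` field of each result listed under `value`."""
    return [item["url"] for item in response_json.get("value", []) if "url" in item]


# Hypothetical endpoint and key, for illustration only.
request = build_news_search_request(
    "YOUR_KEY", "https://example.invalid/news/search", "solar power"
)

# Hypothetical response fragment in the assumed shape.
sample_response = {
    "value": [
        {"name": "Story one", "url": "https://example.com/story-1"},
        {"name": "Story two", "url": "https://example.com/story-2"},
    ]
}
urls = extract_urls(sample_response)
print(request["params"], urls)
```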


.. py:class:: NewsAPIAgent(api_key, top_headlines=False, crawler=None)

   A simple wrapper for the News API, so we can query it for URLs.

   :param api_key: The News API key.
   :type api_key: str
   :param top_headlines: Whether to get top headlines. Defaults to False.
   :type top_headlines: bool, optional
   :param crawler: The crawler to use. Defaults to None, in which case we will use crawlerPlaywright in headless mode.
   :type crawler: Crawler, optional


   .. py:attribute:: api_key


   .. py:attribute:: top_headlines
      :value: False


   .. py:method:: get_news_as_list(query: str) -> list

      Query the News API for news articles, and return them as a list of dictionaries.

      :param query: The query to search for.
      :type query: str

      :returns: A list of dictionaries, where each dictionary represents a news article.
      :rtype: list
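
Since the method returns article dictionaries and not bare URLs, callers that only need links have to extract them. The sketch below assumes each dictionary carries a ``url`` key, as News API article objects do; whether the agent passes those objects through unchanged is an assumption, and ``unique_article_urls`` and the sample data are hypothetical.

```python
def unique_article_urls(articles):
    """Return each article's URL once, in first-seen order."""
    seen = set()
    urls = []
    for article in articles:
        url = article.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical result list in the assumed shape.
sample_articles = [
    {"title": "Rates held", "url": "https://example.com/rates"},
    {"title": "Rates held (update)", "url": "https://example.com/rates"},
    {"title": "Missing link"},
    {"title": "Chip exports", "url": "https://example.com/chips"},
]
article_urls = unique_article_urls(sample_articles)
print(article_urls)
```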


.. py:class:: FinancialTimesAgent(user_email, user_password)

   This is a proof-of-concept (POC) agent that uses Playwright to crawl the Financial Times articles you are interested in. Note that you *must* be an FT subscriber for this to work, so you need to provide your FT email and password.

   :param user_email: Your FT email.
   :type user_email: str
   :param user_password: Your FT password.
   :type user_password: str


   .. py:attribute:: ft_rss_feed_urls
      :value: ['https://www.ft.com/rss/home', 'https://www.ft.com/world?format=rss',...


   .. py:attribute:: ft_login_url
      :value: 'https://ft.com/login'


   .. py:attribute:: ft_main_url
      :value: 'https://ft.com/'


   .. py:attribute:: user_email


   .. py:attribute:: user_password


   .. py:method:: get_news(urls: list[str] = None) -> list

      Get the news from the Financial Times as a list of tuples, where each tuple contains the URL and the extracted text content.

      :param urls: A list of URLs to get content for.

      :returns: A list of lists: URLs, HTML, and text content.
      :rtype: list
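
Because the constructor takes a real subscriber email and password, it is worth keeping them out of source code. One possible approach is to read them from environment variables; the variable names ``FT_USER_EMAIL`` and ``FT_USER_PASSWORD`` and the helper ``load_ft_credentials`` are hypothetical and not part of the library.

```python
import os


def load_ft_credentials(environ=None):
    """Read the FT email and password from a mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    email = env.get("FT_USER_EMAIL")
    password = env.get("FT_USER_PASSWORD")
    if not email or not password:
        raise RuntimeError("Set FT_USER_EMAIL and FT_USER_PASSWORD first.")
    return email, password


# Demonstration with a stand-in mapping, not real credentials.
fake_env = {"FT_USER_EMAIL": "reader@example.com", "FT_USER_PASSWORD": "secret"}
email, password = load_ft_credentials(fake_env)

# With the package installed, the pair could then be passed on, e.g.:
# agent = FinancialTimesAgent(email, password)
print(email)
```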