Processing an Article

The main work happens in the Article class, which loads and parses an online news article. You can pass in either a URL to fetch or HTML you already have.

The tweetfinder.article.Article class

class tweetfinder.article.Article(url: Optional[str] = None, html: Optional[str] = None, mentions_list: Optional[list] = None, timeout: Optional[int] = None)

This is how you parse an article for embeds and mentions of tweets. Pass in a url or html to the constructor, then call the count_, list_, and get_ methods to see what the code found.
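A minimal usage sketch follows. The helper summarize_article is hypothetical (it is not part of tweetfinder); it only calls methods documented on this page, so it works with any object that exposes them. The commented lines show how it might be used with the real class, assuming tweetfinder is installed.

```python
def summarize_article(article):
    """Collect headline numbers from an Article-like object.

    `article` is assumed to expose the methods documented on this page.
    This helper is illustrative only and is not part of tweetfinder.
    """
    return {
        "embeds_tweets": article.embeds_tweets(),
        "embedded_count": article.count_embedded_tweets(),
        # mentions_tweets() is annotated as int in the docs, so coerce it
        "mentions_tweets": bool(article.mentions_tweets()),
        "mentioned_count": article.count_mentioned_tweets(),
    }


# Possible usage with the real class (requires tweetfinder to be installed):
#
#   from tweetfinder.article import Article
#   article = Article(html=my_html_string)   # or Article(url=my_url)
#   print(summarize_article(article))
```

Passing html instead of url avoids a network fetch, which can be convenient when the pages are already stored locally.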

count_embedded_tweets()

How many tweets are embedded on this webpage?

count_mentioned_tweets()

How many times are tweets mentioned on this webpage?

embeds_tweets() → bool

Does this webpage have any embedded tweets?

get_content() → str

Return the part of the webpage that was treated as article content, as extracted by the readability library.

get_html() → str

Return the HTML fetched if you passed in a url, or the same HTML you passed in if not.

list_embedded_tweets() → List[Dict]

Detailed information about the tweets embedded on the webpage. Returns: the exact info depends on how the tweets were embedded. If they were embedded the official way, we can return a link to the tweet, the tweet id, and the author’s username. Some tweets are instead embedded via JavaScript in ways that only let us parse out the tweet id easily. Check the html_source property of each returned item to identify how it was found, then look for other data based on that. You will get at least the tweet id no matter which method found it.
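To illustrate what "embedded the official way" can look like, here is a minimal standard-library sketch. It is not tweetfinder's implementation: it assumes the official embed markup is a blockquote with the class twitter-tweet containing a link to the tweet's status URL, and the names find_embedded_tweets, EmbeddedTweetParser, and the returned dictionary keys are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches a tweet permalink and captures the username and the tweet id.
STATUS_URL = re.compile(
    r"https?://(?:www\.)?twitter\.com/(\w+)/status(?:es)?/(\d+)"
)


class EmbeddedTweetParser(HTMLParser):
    """Collect tweet links found inside <blockquote class="twitter-tweet">."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # blockquote nesting depth inside an embed
        self.tweets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "blockquote":
            classes = (attrs.get("class") or "").split()
            if self._depth > 0 or "twitter-tweet" in classes:
                self._depth += 1
        elif tag == "a" and self._depth > 0:
            match = STATUS_URL.match(attrs.get("href") or "")
            if match:
                self.tweets.append({
                    "tweet_id": match.group(2),
                    "username": match.group(1),
                    "url": match.group(0),  # permalink without query string
                })

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._depth > 0:
            self._depth -= 1


def find_embedded_tweets(html):
    """Return one dict per tweet link found inside an official-style embed."""
    parser = EmbeddedTweetParser()
    parser.feed(html)
    parser.close()
    return parser.tweets
```

Links to tweets outside an embed blockquote are ignored here on purpose, since a plain link is not an embed. Embeds created through JavaScript would need separate handling, which this sketch does not attempt.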

list_mentioned_tweets() → List[Dict]

Detailed information about each mention of a tweet we found. Returns: None if this isn’t a supported language, otherwise a List. Each item includes the phrase found, some context via a window of text around it, and content_start_index to help you find it yourself in the get_content string.
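The shape of a mention record can be sketched with the standard library alone. This is an assumption-laden illustration, not tweetfinder's code: the function find_mentions, its window parameter, the example phrases, and the keys phrase and context are hypothetical. Only content_start_index is a name taken from the description above.

```python
import re


def find_mentions(content, phrases, window=30):
    """Find each phrase in content and record where it occurs.

    Returns a list of dicts sorted by position. Each dict holds the phrase,
    a slice of surrounding text, and the index where the phrase starts.
    """
    mentions = []
    for phrase in phrases:
        pattern = re.escape(phrase)
        for match in re.finditer(pattern, content, flags=re.IGNORECASE):
            start = match.start()
            # Take `window` characters on each side, clamped at the start
            context = content[max(0, start - window):match.end() + window]
            mentions.append({
                "phrase": phrase,
                "context": context,
                "content_start_index": start,
            })
    mentions.sort(key=lambda m: m["content_start_index"])
    return mentions
```

With a record like this, slicing the content string at content_start_index gives back the matched phrase, which is how such an index can help you locate a mention yourself.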

mentions_tweets() → int

Does this webpage mention any tweets?