social

preprocess

remove_mentions(text)[source]

Removes mentions, i.e. words preceded by an ‘@’, from the text.

Parameters:

text (str) –

Return type:

str
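As a rough illustration of the behaviour described above, the following minimal sketch strips mentions with a regular expression. The name remove_mentions_sketch and the pattern are assumptions made for this example, not the library's implementation, which may treat edge cases such as e-mail addresses differently.

```python
import re

# Assumed pattern: '@' followed by one or more word characters.
# The library's own pattern may differ.
MENTION_PATTERN = re.compile(r"@\w+")


def remove_mentions_sketch(text: str) -> str:
    """Drop every @mention, then collapse the leftover whitespace."""
    without_mentions = MENTION_PATTERN.sub("", text)
    return " ".join(without_mentions.split())


print(remove_mentions_sketch("I take care of my skin with @thisproduct"))
# prints: I take care of my skin with
```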

extract_mentions(text)[source]

Extracts mentions, i.e. words preceded by an ‘@’, e.g. “I take care of my skin with @thisproduct” –> [“@thisproduct”].

Parameters:

text (str) –

Return type:

list
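The documented example can be reproduced with a short regular-expression sketch. The function name extract_mentions_sketch and the pattern are hypothetical and only approximate what the library does.

```python
import re
from typing import List

# Assumed pattern: '@' followed by one or more word characters.
MENTION_PATTERN = re.compile(r"@\w+")


def extract_mentions_sketch(text: str) -> List[str]:
    """Return every @mention found in the text, keeping the '@' prefix."""
    return MENTION_PATTERN.findall(text)


print(extract_mentions_sketch("I take care of my skin with @thisproduct"))
# prints: ['@thisproduct']
```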

remove_html_tags(text)[source]

Removes HTML tags, i.e. any text enclosed between < and >.

Parameters:

text (str) –

Return type:

str
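A minimal sketch of tag stripping, assuming that "between < and >" means everything from an opening angle bracket up to the next closing one. The name remove_html_tags_sketch and the pattern are illustrative assumptions. A plain regular expression like this also removes non-tag text that happens to sit between angle brackets.

```python
import re

# Assumed pattern: '<', any run of characters other than '>', then '>'.
TAG_PATTERN = re.compile(r"<[^>]*>")


def remove_html_tags_sketch(text: str) -> str:
    """Delete every <...> span; the text between tags is kept untouched."""
    return TAG_PATTERN.sub("", text)


print(remove_html_tags_sketch("<p>Hello <b>world</b></p>"))
# prints: Hello world
```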

remove_emoji(text)[source]

Removes emoji from a string by stripping every character that falls in the emoji ranges listed in the Unicode emoji charts: http://www.unicode.org/emoji/charts/full-emoji-list.html.

Parameters:

text (str) –

Return type:

str
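Range-based stripping can be sketched as below. The ranges listed are a deliberately partial, hand-picked subset chosen for illustration. They are an assumption, not the library's table, and they do not cover the full Unicode emoji list.

```python
import re

# Partial, illustrative set of emoji code point ranges (assumption).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)


def remove_emoji_sketch(text: str) -> str:
    """Delete characters in the ranges above; surrounding spaces are kept."""
    return EMOJI_PATTERN.sub("", text)


print(repr(remove_emoji_sketch("I take care of my skin 😀")))
```

Because only the emoji characters are removed, whitespace that surrounded them stays in place, so callers may want to normalise spacing afterwards.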

convert_emoji_to_text(text, code_delimiters=(':', ':'))[source]

Converts emoji to their CLDR short names, as listed in the Unicode emoji charts (http://www.unicode.org/emoji/charts/full-emoji-list.html), e.g. 😀 –> :grinning_face:

Parameters:
  • text (str) –

  • code_delimiters (tuple) – symbols placed around the emoji code, e.g. (':', ':') –> :grinning_face:

Returns:

text with each emoji replaced by its delimited short name

Return type:

str
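To make the role of code_delimiters concrete, here is a minimal sketch built on a two-entry lookup table. The table SHORT_NAMES and the function convert_emoji_to_text_sketch are hypothetical. A real converter needs the complete CLDR short-name list and must also handle emoji made of several code points, which this character-by-character loop does not.

```python
from typing import Dict, Tuple

# Hypothetical two-entry table used only for this illustration.
SHORT_NAMES: Dict[str, str] = {
    "\U0001F600": "grinning_face",
    "\U0001F44D": "thumbs_up",
}


def convert_emoji_to_text_sketch(
    text: str, code_delimiters: Tuple[str, str] = (":", ":")
) -> str:
    """Replace each known emoji with its short name wrapped in the delimiters."""
    opening, closing = code_delimiters
    return "".join(
        f"{opening}{SHORT_NAMES[char]}{closing}" if char in SHORT_NAMES else char
        for char in text
    )


print(convert_emoji_to_text_sketch("😀"))
# prints: :grinning_face:
print(convert_emoji_to_text_sketch("ok 👍", code_delimiters=("<", ">")))
# prints: ok <thumbs_up>
```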

extract_emojis(text)[source]

Extracts the emojis from a text and translates them into words, e.g. “I take care of my skin 😀 :(” –> [“:grinning_face:”]. Text emoticons such as “:(” are not extracted.

Parameters:

text (str) –

Returns:

list of all emojis found in the text, each converted to its Unicode short name

Return type:

list
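The documented example, including the fact that the text emoticon ":(" is left out of the result, can be mimicked with the sketch below. It relies on a hypothetical one-entry table and a hypothetical function name, so it only recognises the emoji listed there.

```python
from typing import Dict, List

# Hypothetical one-entry table; a real implementation needs the full emoji list.
SHORT_NAMES: Dict[str, str] = {"\U0001F600": "grinning_face"}


def extract_emojis_sketch(text: str) -> List[str]:
    """Collect the ':short_name:' form of each known emoji, in order of appearance."""
    return [f":{SHORT_NAMES[char]}:" for char in text if char in SHORT_NAMES]


print(extract_emojis_sketch("I take care of my skin 😀 :("))
# prints: [':grinning_face:']
```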

extract_hashtags(text)[source]

Extracts hashtags, i.e. words preceded by a ‘#’, e.g. “I take care of my skin #selfcare#selfestim” –> [“selfcare”, “selfestim”].

Parameters:

text (str) –

Returns:

list of all hashtags

Return type:

list
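A minimal regular-expression sketch of hashtag extraction follows. It returns the tags without the leading ‘#’, as the example in the description shows. Whether the library itself keeps the symbol is not confirmed here, and the function name and pattern are assumptions.

```python
import re
from typing import List

# Assumed pattern: '#' followed by word characters; the group drops the '#'.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags_sketch(text: str) -> List[str]:
    """Return each hashtag body, splitting tags that are glued together."""
    return HASHTAG_PATTERN.findall(text)


print(extract_hashtags_sketch("I take care of my skin #selfcare#selfestim"))
# prints: ['selfcare', 'selfestim']
```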

remove_hashtag(text)[source]

Removes hashtags, i.e. words preceded by a ‘#’, e.g. “I take care of my skin #selfcare#selfestim” –> “I take care of my skin”.

Parameters:

text (str) –

Returns:

text of a post without hashtags

Return type:

str
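The removal shown in the example can be sketched as follows. The name remove_hashtag_sketch, the pattern, and the whitespace clean-up step are assumptions made for illustration, not the library's code.

```python
import re

# Assumed pattern: '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def remove_hashtag_sketch(text: str) -> str:
    """Delete every hashtag, then collapse the leftover whitespace."""
    without_hashtags = HASHTAG_PATTERN.sub("", text)
    return " ".join(without_hashtags.split())


print(remove_hashtag_sketch("I take care of my skin #selfcare#selfestim"))
# prints: I take care of my skin
```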