Social Media Information Extraction using NLP

Photo by Adem AY on Unsplash
  • Noisy Text Filtering: A huge amount of data is generated on social media every day. On a typical day, more than 140 million tweets are sent by over 200 million users around the world, and these numbers keep growing. To extract useful information, we need to filter out non-informative posts. Filtering can be based on the supported domain, the language, or other criteria, so that only relevant posts containing information about the target domain are kept for further processing.
  • Named Entity Extraction: Because social media posts often lack a formal writing style, we need new approaches to named entity extraction (NEE) that do not rely heavily on syntactic features such as capitalization and part-of-speech (POS) tags. Existing approaches to named entity recognition suffer from data sparsity when applied to short, informal texts, especially user-generated social media content. Semantic augmentation is one way to alleviate this problem: since pre-trained word embeddings implicitly preserve rich semantic information, they are potentially ideal resources for it.
  • Named Entity Disambiguation: This is one of the most interesting pieces of the information extraction puzzle. Named Entity Disambiguation (NED) is the task of mapping expressions of interest, such as names of people, locations, and organizations, from an input text document to the corresponding unique entities in a target knowledge base. The target knowledge base depends on the application, but Wikipedia is a natural candidate because of the vast amount of text it makes available. Usually, NED systems do not use Wikipedia directly; instead, they exploit databases that contain structured versions of it, such as DBpedia or Wikidata.
  • Feedback Extraction: A feedback loop runs between the fact extraction (FE) module and the Named Entity Disambiguation (NED) module. This feedback helps resolve errors made earlier in the disambiguation step.
Image reference: research paper by Mena B. Habib and Maurice van Keulen on Information Extraction for Social Media
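The noisy text filtering step can be illustrated with a minimal Python sketch. It assumes a hypothetical domain vocabulary (`DOMAIN_KEYWORDS`, here disaster-related terms) and uses a crude stopword-ratio check as a stand-in for real language identification; both are illustrative assumptions, not part of the original framework.

```python
import re

# Hypothetical domain vocabulary; a real system would curate or learn this list.
DOMAIN_KEYWORDS = {"flood", "earthquake", "evacuation", "rescue", "storm"}

# Tiny English stopword list used as a crude language signal (an assumption,
# not a real language identifier).
ENGLISH_STOPWORDS = {"the", "a", "is", "in", "of", "and", "to", "on", "at", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase the post, drop URLs and @mentions, and split into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def looks_english(tokens: list[str], min_ratio: float = 0.1) -> bool:
    """Treat a post as English if enough of its tokens are common English stopwords."""
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= min_ratio


def is_informative(text: str) -> bool:
    """Keep a post only if it looks English and mentions at least one domain keyword."""
    tokens = tokenize(text)
    return looks_english(tokens) and any(t in DOMAIN_KEYWORDS for t in tokens)


def filter_posts(posts: list[str]) -> list[str]:
    """Return only the posts that pass the domain and language checks."""
    return [p for p in posts if is_informative(p)]
```

A production filter would more likely use a trained classifier and a proper language identifier; the keyword rule only shows where such a component sits in the pipeline.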
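One simple way to picture semantic augmentation for named entity extraction is to enrich each token's embedding with the vectors of its nearest neighbours in the pre-trained embedding space, so that a rare or informal surface form borrows information from similar words. The sketch below uses a hypothetical three-dimensional `EMBEDDINGS` table instead of real pre-trained vectors, and the similarity-weighted averaging is an assumption chosen for clarity rather than a specific published model.

```python
import math

# Hypothetical toy "pre-trained" embeddings; real systems load vectors such as
# word2vec or GloVe with hundreds of dimensions.
EMBEDDINGS = {
    "nyc": [0.9, 0.1, 0.0],
    "newyork": [0.85, 0.15, 0.05],
    "manhattan": [0.8, 0.2, 0.1],
    "pizza": [0.1, 0.9, 0.2],
    "pasta": [0.15, 0.85, 0.25],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, k=2):
    """Return the k vocabulary words most similar to `word`, excluding itself."""
    vec = EMBEDDINGS[word]
    scored = [(other, cosine(vec, v)) for other, v in EMBEDDINGS.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def augment(word, k=2):
    """Return the word's vector plus a similarity-weighted average of its neighbours' vectors."""
    vec = EMBEDDINGS[word]
    neighbors = nearest_neighbors(word, k)
    total = sum(sim for _, sim in neighbors)
    if total <= 0:
        return list(vec)  # no useful neighbours: keep the original vector
    extra = [0.0] * len(vec)
    for other, sim in neighbors:
        for i, x in enumerate(EMBEDDINGS[other]):
            extra[i] += (sim / total) * x
    return [a + b for a, b in zip(vec, extra)]
```

The augmented vector for a slang form like "nyc" now carries information from "newyork" and "manhattan", which is the intuition behind using embeddings to counter data sparsity.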
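The mapping from a mention to a knowledge-base entity in Named Entity Disambiguation can be sketched as candidate lookup followed by ranking. In this minimal example the knowledge base `KB`, its entity identifiers, descriptions, and `prior` popularity scores are all hypothetical; a real system would retrieve candidates from DBpedia or Wikidata and use far richer features than word overlap between the post and the entity description.

```python
import re

# Hypothetical miniature knowledge base: surface form -> candidate entities.
# A real system would draw candidates from DBpedia or Wikidata instead.
KB = {
    "paris": [
        {"id": "Paris_(city)", "prior": 0.8,
         "description": "capital city of france on the seine with the eiffel tower and the louvre"},
        {"id": "Paris_Hilton", "prior": 0.2,
         "description": "american media personality socialite hotel heiress who released a pop single"},
    ],
}

STOPWORDS = {"the", "a", "an", "in", "of", "on", "and", "is", "was", "to"}


def words(text):
    """Lowercased content words of a text, with a few stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(mention, context):
    """Pick the candidate whose description best overlaps the post's context words.

    Ties (including zero overlap) fall back to the candidate's prior popularity.
    """
    candidates = KB.get(mention.lower(), [])
    if not candidates:
        return None  # NIL: no matching entity in the knowledge base
    ctx = words(context) - {mention.lower()}

    def score(entity):
        return (len(ctx & words(entity["description"])), entity["prior"])

    return max(candidates, key=score)["id"]
```

For example, `disambiguate("Paris", "Just saw the Eiffel Tower in Paris!")` links to `"Paris_(city)"`, while a post mentioning a new single and a hotel party links to `"Paris_Hilton"`.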
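The feedback loop between fact extraction and disambiguation can be illustrated with a toy example in which fact extraction checks whether a linked entity has the type a relation expects, and a rejection sends the mention back to NED, which then tries its next-ranked candidate. The ranked candidates (`RANKED`), the relation name `flood_in`, and its expected type are hypothetical, and this type-consistency check is only one possible form of feedback, not necessarily the mechanism used by Habib and van Keulen.

```python
# Hypothetical ranked output of an NED step: mention -> [(entity, type), ...],
# best candidate first. Entities and types are illustrative only.
RANKED = {
    "Paris": [("Paris_Hilton", "Person"), ("Paris_(city)", "Location")],
}

# Hypothetical fact-extraction pattern: "flood in X" expects X to be a Location.
EXPECTED_TYPE = {"flood_in": "Location"}


def fact_is_consistent(relation, entity_type):
    """FE check: accept the fact only if the entity has the type the relation expects."""
    return EXPECTED_TYPE.get(relation) == entity_type


def link_with_feedback(relation, mention):
    """Walk NED's ranked candidates, using the FE consistency check as feedback.

    If fact extraction rejects the current link, fall back to the next candidate,
    correcting an error made earlier in the disambiguation step.
    """
    for entity, entity_type in RANKED.get(mention, []):
        if fact_is_consistent(relation, entity_type):
            return entity
    return None  # no candidate survives the feedback check
```

Here NED's first choice for "Paris" is the person, but the `flood_in` fact only makes sense for a location, so the feedback loop settles on `"Paris_(city)"`.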
  1. Automatic Summarization: Automatic summarization is the process of shortening a text document with computer software in order to create a summary that retains the most important points of the original document. Technologies that can produce a coherent summary take into account variables such as length, writing style, and syntax. The central idea of summarization is to find a representative subset of the data that contains the information of the entire set. Generally, there are two approaches to automatic summarization: extraction and abstraction. Extraction selects a subset of existing words, phrases, or sentences from the original text to form the summary. In contrast, abstraction builds an internal semantic representation and then uses natural language generation techniques to create a summary that is closer to what a human might write. An automatic summarization system goes through three fundamental steps, namely analysis, transformation, and realization, which together produce a concise and fluent summary of the most important information in the input.
Fig 1. Process of Auto Summarization
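Extractive summarization can be sketched with a classic frequency-based heuristic: score each sentence by how common its content words are in the whole document, then keep the top-scoring sentences. Mapping analysis, transformation, and realization onto word counting, sentence scoring, and sentence selection is an interpretation for illustration; the stopword list and the `n_sentences` parameter are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on",
             "for", "it", "this", "that"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence):
    """Lowercased words of a sentence, excluding stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=2):
    """Extractive summary: keep the n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    # Analysis: count how often each content word appears in the whole document.
    freq = Counter(w for s in sentences for w in content_words(s))

    # Transformation: score each sentence by the average frequency of its content words.
    def score(sentence):
        ws = content_words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Realization: emit the selected sentences in document order.
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)
```

Because it only copies existing sentences, this is extraction rather than abstraction; an abstractive system would instead generate new wording from an internal representation.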
Fig 2. Process of Text Mining




Learner. Observer. Interested in software development.

Akanksha Pardeshi