Media Resilience
Leveraged large-scale Twitter data, machine learning, and NLP techniques to quantify and compare social–psychological resilience across five countries.
Social Media-Based Analysis of Community Resilience Using NLP and Machine Learning During COVID-19.
Project Overview
This project introduces a scalable, data-driven framework for measuring Social–Psychological Community Resilience (SPCR) using Twitter data collected during the COVID-19 pandemic (Valinejad, 2021). By applying advanced natural language processing (NLP) and machine learning (ML) techniques, we assessed public sentiment, misinformation impact, and resilience patterns across five countries: Australia, Singapore, South Korea, the United Kingdom, and the United States.
Data Science & NLP Techniques
Data Collection & Preprocessing
- Collected 50,000 tweets per country using the Twitter API, filtered by COVID-related keywords and by date range (March–November 2020).
- Preprocessed the text with tokenization, stopword removal, stemming, and lemmatization, using NLTK's PorterStemmer and WordNetLemmatizer.
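The filtering and cleaning steps above can be sketched roughly as follows. This is a minimal standard-library illustration, not the project's actual pipeline: the keyword list, the stopword set, the tweet dictionary layout, and the function names are all assumptions, and a regex tokenizer stands in for NLTK's tokenizer. Stemming and lemmatization with NLTK would follow the stopword step.

```python
import re
from datetime import date

# Hypothetical keyword list and a tiny stopword set, for illustration only.
COVID_KEYWORDS = {"covid", "covid19", "coronavirus", "pandemic", "lockdown"}
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "we"}

# Study window taken from the text (Mar-Nov 2020); exact end day is assumed.
START, END = date(2020, 3, 1), date(2020, 11, 30)

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase, strip URLs and @mentions, and split into alphanumeric tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return TOKEN_RE.findall(text)


def is_relevant(tweet):
    """Keep tweets inside the study window that mention a COVID keyword."""
    in_window = START <= tweet["created"] <= END
    return in_window and bool(COVID_KEYWORDS & set(tokenize(tweet["text"])))


def preprocess(text):
    """Tokenize and drop stopwords; stemming/lemmatization would come next."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


# Hand-written sample tweets (hypothetical record layout).
tweets = [
    {"created": date(2020, 4, 2), "text": "We are staying home in the lockdown https://t.co/x"},
    {"created": date(2021, 1, 5), "text": "Covid vaccine rollout begins"},
    {"created": date(2020, 6, 9), "text": "Great football match today"},
]

kept = [preprocess(t["text"]) for t in tweets if is_relevant(t)]
print(kept)
```

Only the first sample tweet survives: the second falls outside the date window and the third contains no keyword.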
Fake Tweet Classification
- Implemented and evaluated multiple ML classifiers using Scikit-learn, achieving high accuracy in detecting misinformation.
- Features included TF-IDF vectors and text-based sentiment cues.
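One way such a classifier could be assembled in Scikit-learn is shown below. It is a sketch under stated assumptions: the source does not say which classifiers were used, so logistic regression is an arbitrary choice here, the example tweets and labels are made up, and only the TF-IDF features are covered (the sentiment cues are omitted).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-written examples (1 = fake, 0 = real); not data from the study.
texts = [
    "drinking bleach cures covid overnight",
    "5g towers spread the coronavirus",
    "garlic water makes you immune to covid",
    "health ministry reports new covid cases today",
    "wash your hands and keep distance during the pandemic",
    "vaccine trial results published by the university",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features feeding a linear classifier (classifier choice is an assumption).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

new_tweets = [
    "bleach cures the coronavirus",
    "ministry publishes covid case numbers",
]
preds = model.predict(new_tweets)        # predicted class per tweet
proba = model.predict_proba(new_tweets)  # class probabilities per tweet
print(preds, proba.round(2))
```

In practice the model would be evaluated on a held-out split (for example with `train_test_split` or `cross_val_score`) rather than on a handful of examples.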
Psychological Feature Extraction
- Used LIWC (Linguistic Inquiry and Word Count) to derive psychological and social indicators (e.g., anxiety, social cohesion, pronoun usage).
- Quantified Community Wellbeing (CW) and Community Capital (CC) for each tweet.
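LIWC works by counting how many words in a text fall into predefined categories. The sketch below imitates that idea with a tiny made-up lexicon, since the real LIWC dictionary is proprietary. The category names, the word lists, and in particular the way categories are combined into CW and CC are hypothetical and are not the project's actual definitions.

```python
from collections import Counter

# Hypothetical mini-lexicon; the real LIWC dictionary is proprietary and far larger.
LEXICON = {
    "anxiety": {"afraid", "worried", "scared", "panic"},
    "positive": {"hope", "grateful", "safe", "thanks"},
    "social": {"community", "family", "friends", "together"},
    "we": {"we", "us", "our"},
}


def category_rates(tokens):
    """Return the percentage of tokens that fall in each lexicon category."""
    counts = Counter()
    for tok in tokens:
        for cat, words in LEXICON.items():
            if tok in words:
                counts[cat] += 1
    total = len(tokens) or 1  # avoid division by zero on empty tweets
    return {cat: 100.0 * counts[cat] / total for cat in LEXICON}


def wellbeing_and_capital(rates):
    """Illustrative aggregation only: CW from emotion cues, CC from social cues."""
    cw = rates["positive"] - rates["anxiety"]
    cc = rates["social"] + rates["we"]
    return cw, cc


tokens = "we are worried but our community stays together".split()
rates = category_rates(tokens)
cw, cc = wellbeing_and_capital(rates)
print(rates, cw, cc)
```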
SPCR Metric & Trend Analysis
- Computed SPCR as a composite score of CW and CC.
- Applied polynomial regression and Gaussian fitting to model SPCR dynamics over time.
- Conducted correlation analysis (Pearson, Spearman) between misinformation prevalence and SPCR.
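The three analysis steps above can be sketched with NumPy and SciPy on synthetic data. Everything specific here is an assumption made for illustration: the weekly series are generated, not taken from the study; SPCR is formed as an equal-weight mean of min-max normalized CW and CC because the source only says "composite score"; and the polynomial degree and Gaussian parameterization are arbitrary choices.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
weeks = np.arange(36, dtype=float)  # roughly Mar-Nov 2020 in weekly bins

# Synthetic weekly series, for illustration only.
bump = np.exp(-((weeks - 18.0) ** 2) / (2 * 6.0 ** 2))
cw = 0.5 + 0.3 * bump + rng.normal(0, 0.01, weeks.size)
cc = 0.4 + 0.2 * bump + rng.normal(0, 0.01, weeks.size)
fake_share = 0.4 - 0.2 * bump + rng.normal(0, 0.01, weeks.size)


def min_max(x):
    """Rescale a series to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())


# Assumption: SPCR as the equal-weight mean of normalized CW and CC.
spcr = 0.5 * (min_max(cw) + min_max(cc))

# Polynomial trend (degree 3 chosen arbitrarily).
coeffs = np.polyfit(weeks, spcr, deg=3)
poly_trend = np.polyval(coeffs, weeks)


def gaussian(t, amplitude, center, width, offset):
    """Gaussian bump on top of a constant offset."""
    return offset + amplitude * np.exp(-((t - center) ** 2) / (2 * width ** 2))


# Initial guess derived from the data helps the optimizer converge.
p0 = [spcr.max() - spcr.min(), weeks[np.argmax(spcr)], 5.0, spcr.min()]
params, _ = curve_fit(gaussian, weeks, spcr, p0=p0)

# Correlation between misinformation prevalence and SPCR.
pearson_r, _ = pearsonr(fake_share, spcr)
spearman_rho, _ = spearmanr(fake_share, spcr)

print("Gaussian center (week):", round(float(params[1]), 2))
print("Pearson r:", round(float(pearson_r), 3))
print("Spearman rho:", round(float(spearman_rho), 3))
```

Pearson captures linear association, while Spearman works on ranks and is less sensitive to outliers, so reporting both gives a simple robustness check.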
Results & Insights
- South Korea demonstrated the highest SPCR, highlighting the impact of effective governance and social cohesion (Valinejad et al., 2023).
- SPCR scores computed from real tweets were significantly higher than those computed from fake tweets (up to 80% higher).
- Country-specific patterns revealed cultural and policy-driven differences in resilience.
- Found a strong negative correlation between misinformation prevalence and resilience across all countries.
Tools & Technologies
- Languages: Python
- NLP Libraries: NLTK, LIWC
- ML & Data Libraries: Scikit-learn, NumPy, Pandas
- Analytics: Correlation analysis, Trend modeling (polynomial/Gaussian), Normalization
- Visualization: Matplotlib
- APIs: Twitter API
Impact & Applications
- Provided a real-time, scalable alternative to traditional survey-based resilience assessments.
- Supported evidence-based policymaking by uncovering how social media sentiment and information quality influence community resilience.
- Demonstrated practical use of NLP and ML in public health informatics and crisis response analysis.
Future Work
- Enhance the SPCR framework using transformer-based language models (e.g., BERT, RoBERTa) for deeper sentiment and intent analysis.
- Apply agent-based simulation and deep learning for forecasting resilience trends.
- Generalize the framework to other domains (e.g., disaster recovery, geopolitical crises).
References
2023
2021
- Measuring and analyzing community resilience during COVID-19 using social media (2021)