# Wings of Unrest: Flight Connectivity and Social Instability Research

A quantitative research project investigating the relationship between global air connectivity and social unrest across 2,620 cities worldwide.

## Overview

This study examines whether cities with greater flight connectivity experience systematically different levels of protest activity than less connected urban areas. A Negative Binomial regression model finds that greater air connectivity is significantly associated with more frequent unrest events (p < 0.001), and the model explains approximately 32% of the deviance.

## Key Findings

- **Air connectivity** is positively associated with protest frequency: each additional flight corresponds to a 0.41% increase in the expected number of events, holding the other covariates constant
- **Population served** by airports shows a strong positive association with unrest
- **Unemployment** and **freedom of expression** are positively associated with protest intensity
- **HDI** and **land area** are negatively associated with unrest
- Model pseudo R² = 0.323, suggesting moderate explanatory power

## Dataset

The analysis combines multiple authoritative data sources:

- **Air connectivity**: OpenFlights (2014 direct flight data)
- **Social unrest**: ACLED (Armed Conflict Location & Event Data Project), June 2020 to April 2025
- **Economic indicators**: World Bank (GDP per capita, unemployment, land area)
- **Development metrics**: UNDP (HDI, life expectancy, education)
- **Political measures**: V-Dem (freedom of expression, civil society indices)

**Final dataset**: 2,620 cities from 191 countries

## Methodology

### Data Processing Pipeline

1. **Spatial matching**: Cities are linked to airports within a 50 km radius using the Haversine distance
2. **Service score calculation**: `score = departures / (distance + 1)` weights each city's assignment to candidate airports
3. **Event geocoding**: Each protest event is mapped to the nearest city in the same country
4. **Country standardization**: Country names are harmonized across all datasets

### Statistical Model

Negative Binomial regression, which accommodates overdispersion in count data:

```
log(E[Number_of_Events]) = β₀ + β₁(Flights) + β₂(Population) + β₃(Land_area)
                         + β₄(Unemployment) + β₅(Freedom_Expression) + β₆(HDI)
```

## Project Structure

```
FlightUnrestResearch/
├── Data/                     # Raw datasets (airports, routes, ACLED, V-Dem, etc.)
├── clean_data.csv            # Final processed dataset for analysis
├── clean_data.ipynb          # Data cleaning and preparation pipeline
├── haversine.py              # Geospatial distance calculations (PyTorch-accelerated)
├── mappings.py               # Country name standardization mappings
├── FINAL_r_reg_final.ipynb   # Statistical analysis (Negative Binomial regression)
├── final_paper.pdf           # Complete research paper with literature review
└── final_paper.odt           # Source document
```

## Core Components

### `haversine.py`
PyTorch-accelerated geospatial functions for large-scale spatial assignment:
- `calculate_served_population()`: Assigns city populations to airports using weighted scoring
- `assign_events_to_cities()`: Maps protest events to the nearest cities via Haversine distance

### `clean_data.ipynb`
Complete data pipeline:
- Merges flight routes with airport locations
- Calculates served populations within a 50 km radius
- Maps 428K+ ACLED events to cities
- Integrates World Bank, V-Dem, and HDI indicators
- Handles missing data and harmonizes country names

### `FINAL_r_reg_final.ipynb`
Statistical analysis in R:
- Negative Binomial regression model
- Multicollinearity diagnostics (VIF analysis)
- Model fit evaluation (pseudo R², deviance)
- Coefficient interpretation and significance testing

## Requirements

### Python
```
pandas
numpy
torch
openpyxl
tqdm
```

### R
```
MASS (for glm.nb)
```

## Usage

### 1. Data Preparation
```bash
jupyter notebook clean_data.ipynb
```
Outputs `clean_data.csv` with all variables merged and geocoded.

### 2. Statistical Analysis
```bash
jupyter notebook FINAL_r_reg_final.ipynb
```
Runs the regression model and produces coefficient tables.

## Key Variables

| Variable | Description | Source |
|----------|-------------|--------|
| `Number_of_Flights` | Total direct flights at city airports | OpenFlights |
| `Number_of_Events` | Protest/riot incidents (Jun 2020 to Apr 2025) | ACLED |
| `Served_Population` | Population within 50 km of airports | Cities database + calculation |
| `GDP_per_capita` | National GDP per capita (2019) | World Bank |
| `Unemployment` | National unemployment rate (2019) | World Bank |
| `HDI` | Human Development Index | UNDP |
| `Freedom_of_Expression` | Civil liberties index (2021) | V-Dem |
| `Civil_Society_Index` | Civil society strength (2021) | V-Dem |

## Limitations

- **Cross-sectional design**: Cannot establish causality
- **Temporal mismatch**: Flight data (2014) vs. protest data (2020-2025)
- **Measurement error**: ACLED may under-report events in authoritarian contexts
- **National-level controls**: HDI and unemployment are national values applied uniformly to every city within a country
- **Spatial uncertainty**: A fixed 50 km radius may misrepresent complex metropolitan areas
- **Data access**: High-resolution aviation data remains proprietary

## Citation

Bitton, R. (2025). *Wings of Unrest: The Relationship Between Global Flight Connectivity and Social Instability*.

## Related Research

This work contributes to the literature on:
- Globalization and domestic political contention
- Infrastructure networks and protest diffusion
- Urban political dynamics and global integration
- Spatial determinants of collective action

## Contact

Raphael Bitton
rbitton@uchicago.edu

## License

Research data sourced from publicly available datasets. Analysis code available for academic use.
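## Sketch: Spatial Assignment

The spatial steps of the Data Processing Pipeline can be illustrated with a small pure-Python sketch. Those steps are the 50 km Haversine matching, the `departures / (distance + 1)` service score, and nearest-city event assignment within the same country. This is not the project's PyTorch code in `haversine.py`, and several details are assumptions:

- The `City`/`Airport` classes and the `served_population`/`assign_events` functions are hypothetical.
- Distances are assumed to be in kilometers.
- Each city is assumed to send its whole population to the single highest-scoring airport within range. The actual `calculate_served_population()` may weight or split population differently.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: distances in km)
MATCH_RADIUS_KM = 50.0    # 50 km matching radius from the pipeline description


@dataclass
class City:  # hypothetical record type, not taken from haversine.py
    name: str
    country: str
    lat: float
    lon: float
    population: int


@dataclass
class Airport:  # hypothetical record type, not taken from haversine.py
    code: str
    lat: float
    lon: float
    departures: int


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def served_population(cities: List[City], airports: List[Airport],
                      radius_km: float = MATCH_RADIUS_KM) -> Dict[str, int]:
    """Assumption: each city's whole population goes to the in-radius airport with
    the highest score = departures / (distance_km + 1); out-of-range cities are skipped."""
    served = {ap.code: 0 for ap in airports}
    for city in cities:
        best_code, best_score = None, -1.0
        for ap in airports:
            d = haversine_km(city.lat, city.lon, ap.lat, ap.lon)
            if d > radius_km:
                continue  # outside the matching radius
            score = ap.departures / (d + 1.0)
            if score > best_score:
                best_code, best_score = ap.code, score
        if best_code is not None:
            served[best_code] += city.population
    return served


def assign_events(events: List[Tuple[float, float, str]],
                  cities: List[City]) -> List[Optional[str]]:
    """Map each (lat, lon, country) event to the nearest city in the same country;
    None if that country has no city in the list."""
    assigned: List[Optional[str]] = []
    for lat, lon, country in events:
        candidates = [c for c in cities if c.country == country]  # same-country constraint
        if not candidates:
            assigned.append(None)
            continue
        nearest = min(candidates, key=lambda c: haversine_km(lat, lon, c.lat, c.lon))
        assigned.append(nearest.name)
    return assigned


if __name__ == "__main__":
    # Toy data on the equator, where 1 degree of longitude is about 111 km.
    cities = [City("A", "X", 0.0, 0.0, 1000), City("B", "X", 0.0, 1.0, 500),
              City("C", "Y", 0.0, 0.3, 200)]
    airports = [Airport("AP1", 0.0, 0.1, 100), Airport("AP2", 0.0, 0.9, 10)]
    print(served_population(cities, airports))
    # The first event is closer to C, but the same-country rule sends it to A.
    print(assign_events([(0.0, 0.2, "X"), (0.0, 0.35, "Y")], cities))
```

The brute-force loops are O(cities × airports) and O(events × cities). At the scale described here (2,620 cities, 428K+ events), a vectorized or tensor-based version like the one `haversine.py` is described as providing is the practical choice.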
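## Sketch: Interpreting Coefficients

The model uses a log link, so a coefficient β acts multiplicatively: a Δx-unit change in a covariate multiplies the expected event count by exp(β·Δx). The 0.41% per-flight figure in Key Findings therefore corresponds to β₁ = ln(1.0041).

This sketch assumes that figure is exactly 0.41%. Because the figure is rounded, the fitted β₁ in `FINAL_r_reg_final.ipynb` will differ slightly. `pct_change` is a hypothetical helper. The sketch also shows that effects compound: 100 additional flights imply about a 50.6% increase, not 100 × 0.41% = 41%.

```python
import math


def pct_change(beta: float, delta: float = 1.0) -> float:
    """Percent change in the expected count for a `delta`-unit increase in a
    covariate under a log link: 100 * (exp(beta * delta) - 1)."""
    return 100.0 * math.expm1(beta * delta)


# Assumption: back out beta_1 from the rounded 0.41% per-flight figure; the
# fitted coefficient in FINAL_r_reg_final.ipynb will differ slightly.
beta_flights = math.log1p(0.0041)

if __name__ == "__main__":
    print(f"beta_1 (approx.): {beta_flights:.6f}")
    print(f"+1 flight:    {pct_change(beta_flights):.2f}%")
    # Effects multiply on the count scale, so 100 flights is not 100 x 0.41%.
    print(f"+100 flights: {pct_change(beta_flights, 100):.1f}%")
```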
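## Sketch: Deviance-Based Pseudo R²

The Overview's "approximately 32% of the deviance" and the reported pseudo R² of 0.323 match the deviance-based definition `1 - D_model / D_null`. This sketch assumes that is the definition used.

The sketch computes the Negative Binomial (NB2) deviance with the same unit-deviance formula as R's `MASS::negative.binomial` family. It takes observed counts `y`, fitted means `mu`, and the dispersion parameter `theta` estimated by `glm.nb`. The function names are hypothetical, and the toy numbers are not project output. In R, the same quantity for a `glm.nb` fit named `fit` is `1 - fit$deviance / fit$null.deviance`.

```python
import math
from typing import Sequence


def nb_deviance(y: Sequence[float], mu: Sequence[float], theta: float) -> float:
    """NB2 deviance with size parameter theta, using the same unit deviance as
    R's MASS::negative.binomial family."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * log(y / mu), with the y == 0 case contributing 0
        term = yi * math.log(max(yi, 1.0) / mi)
        term -= (yi + theta) * math.log((yi + theta) / (mi + theta))
        total += 2.0 * term
    return total


def deviance_pseudo_r2(y: Sequence[float], mu: Sequence[float], theta: float) -> float:
    """1 - D_model / D_null, where the null (intercept-only) model predicts mean(y)
    for every observation and theta is held at its fitted value."""
    ybar = sum(y) / len(y)
    d_null = nb_deviance(y, [ybar] * len(y), theta)
    if d_null == 0.0:
        raise ValueError("null deviance is zero (all counts identical)")
    return 1.0 - nb_deviance(y, mu, theta) / d_null


if __name__ == "__main__":
    # Toy counts and fitted means (hypothetical, not project output)
    y = [0, 1, 3, 8]
    mu = [0.5, 1.5, 3.0, 7.0]
    print(f"pseudo R^2 = {deviance_pseudo_r2(y, mu, theta=1.5):.3f}")
```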