This readme.txt file was generated by Yuchen He, Junming Huang and Feng Yang in April 2024.

-------------------
GENERAL INFORMATION
-------------------

Title of Dataset: Data for "Declining Chinese Attitudes toward the United States amid COVID-19" 

Author Information

Yu Xie
Paul and Marcia Wythes Center on Contemporary China, Princeton University, Princeton, NJ 08544, United States
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Feng Yang
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Junming Huang
Paul and Marcia Wythes Center on Contemporary China, Princeton University, Princeton, NJ 08544, United States

Yuchen He
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Yi Zhou
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Yue Qian
Department of Sociology, University of British Columbia, 6303 NW Marine Drive, Vancouver, BC, V6T 1Z1 Canada

Weicheng Cai
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Jie Zhou
Institute of Psychology, Chinese Academy of Sciences, 16 Lincui Road, Chaoyang District, Beijing 100101, China

Date of data collection: 2016 - 2023


Description:

This dataset encompasses three distinct sets of data analyzed in the study, namely the survey data on favorability to the US, the survey data on trust in Americans, and the social media data.

The first part of the dataset includes data analyzed in Study 1 and Study 3, collected from three surveys: the Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) in 2019 and 2020, the COVID-19 Multi-Wave Study (CMWS) between 2020 and 2022, and the Survey on Living Conditions (SLC) in 2023.

The second part of the dataset provides information used in Study 4, involving the CFPS data, Baidu Index data, and data on COVID-19 cases and deaths.

The third part of the dataset depicts trends in attitudes toward the US in Study 2.

--------------------------
SHARING/ACCESS INFORMATION
-------------------------- 

Licenses/restrictions placed on the data, or limitations of reuse: CC BY-NC-SA 4.0.

Recommended citation for the data: Xie, Y., Yang, F., Huang, J., He, Y., Zhou, Y., Qian, Y., Cai, W., & Zhou, J. (2024). Data for "Declining Chinese Attitudes toward the United States amid COVID-19" [Data set]. Princeton University. https://doi.org/10.34770/ew2y-jy92

Links to other publicly accessible locations of the data:

This data is available at yuxie.com (https://yuxie.scholar.princeton.edu/share-files/data-files-declining-chinese-attitudes-toward-united-states-amidst-covid-19) and Princeton DataSpace (https://doi.org/10.34770/ew2y-jy92).

The COVID-19 Multi-Wave Study (CMWS) and the Survey on Living Conditions (SLC) are conducted by the Population Development Studies Center, Renmin University of China. The Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) is conducted by the Institute of Psychology, Chinese Academy of Sciences. The China Family Panel Studies (CFPS) is conducted by the Institute of Social Science Survey, Peking University. The Weibo data is owned by Sina.

--------------------
DATA & FILE OVERVIEW
--------------------

File list:

README.txt
media-data-average-opinion-us.csv
survey-data-trust-analytical-sample.csv
survey-data-trust-descriptive-sample-12to20.csv
survey-data-trust-descriptive-sample-18to20.csv
survey-data-favorability-study-1.csv
survey-data-favorability-study-3.csv
survey-questionnaires.pdf


Relationship between files, if important for context:  

[survey-data-favorability-study-1.csv] and [survey-data-favorability-study-3.csv] are sufficient to replicate the results presented in Study 1 and Study 3, respectively. [survey-data-trust-descriptive-sample-12to20.csv] and [survey-data-trust-descriptive-sample-18to20.csv] report the descriptive trust levels and trends for Study 4. [survey-data-trust-analytical-sample.csv] is sufficient to replicate the regression results in Study 4. [media-data-average-opinion-us.csv] provides the daily attitude score averaged across all Weibo users for Study 2. [survey-questionnaires.pdf] collects the relevant sections of the questionnaires of the four surveys.

If data were derived from another source, list source:

    The first part of the dataset includes data analyzed in Study 1 and Study 3, collected from three surveys: the Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) in 2019 and 2020, the COVID-19 Multi-Wave Study (CMWS) between 2020 and 2022, and the Survey on Living Conditions (SLC) in 2023.

    The second part of the dataset provides information used in Study 4, involving the CFPS data, Baidu Index data, and data on COVID-19 cases and deaths.

    The third part of the dataset depicts trends in attitudes toward the US in Study 2. The data was collected from 53,949,720 posts containing US-related keywords (美国, 灯塔国, 美利坚, 米国, 美帝) from January 1, 2016, to November 28, 2023, on the Chinese social media platform Weibo, which is similar to Twitter.

    The COVID-19 Multi-Wave Study (CMWS) and the Survey on Living Conditions (SLC) are conducted by the Population Development Studies Center, Renmin University of China. The Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) is conducted by the Institute of Psychology, Chinese Academy of Sciences. The China Family Panel Studies (CFPS) is conducted by the Institute of Social Science Survey, Peking University. The Weibo data is owned by Sina.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

----------------------------------------------------------------
DATA-SPECIFIC INFORMATION: Survey Data on Favorability to the US
survey-data-favorability-study-1.csv
survey-data-favorability-study-3.csv
----------------------------------------------------------------

The first part of the dataset includes data analyzed in Study 1 and Study 3, collected from three surveys: the Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) in 2019 and 2020, the COVID-19 Multi-Wave Study (CMWS) between 2020 and 2022, and the Survey on Living Conditions (SLC) in 2023. We append the data from the three surveys. The micro-level raw data can be found in [survey-data-favorability-study-1.csv] and [survey-data-favorability-study-3.csv], which are sufficient to replicate the results presented in Study 1 and Study 3. We disclose the relevant variables used in the research, including the favorability score, the source of the survey, the year and month of the interview, and background information such as education and age, along with the survey weights. The PIDs (personal identifiers) are part of the original compilation from SAQURR, CMWS, and SLC. Because our objective is to examine trends in attitudes toward the US, the sample is limited to respondents who reported their favorability toward the US: 3,266 observations in SAQURR, 28,897 observations in CMWS, and 2,592 observations in SLC.
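
As an illustration of how the weights might be used, the following is a minimal Python sketch of a weighted mean of favorability by survey source. The column names "favorability", "weight", and "survey" are hypothetical placeholders (check the CSV headers for the actual names), and the toy values are arbitrary and not taken from the data.

```python
from collections import defaultdict

def weighted_mean_by_group(rows, value_col, weight_col, group_col):
    """Return {group: weighted mean of value_col}, skipping incomplete rows."""
    totals = defaultdict(float)   # sum of weight * value per group
    weights = defaultdict(float)  # sum of weights per group
    for row in rows:
        value, weight = row.get(value_col, ""), row.get(weight_col, "")
        if value == "" or weight == "":
            continue  # skip records with a missing score or weight
        w = float(weight)
        totals[row[group_col]] += w * float(value)
        weights[row[group_col]] += w
    return {g: totals[g] / weights[g] for g in totals if weights[g] > 0}

# Toy records shaped like rows from csv.DictReader (all values are strings).
# Column names and values here are hypothetical, for illustration only.
toy_rows = [
    {"survey": "SAQURR", "favorability": "6", "weight": "1.0"},
    {"survey": "SAQURR", "favorability": "4", "weight": "3.0"},
    {"survey": "CMWS", "favorability": "5", "weight": "2.0"},
]
means = weighted_mean_by_group(toy_rows, "favorability", "weight", "survey")
```

With the real files, the rows could be read with csv.DictReader from [survey-data-favorability-study-1.csv] and passed to the same function once the actual column names are substituted.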

------------------------------------------------------------
DATA-SPECIFIC INFORMATION: Survey Data on Trust in Americans
survey-data-trust-descriptive-sample-18to20.csv
survey-data-trust-descriptive-sample-12to20.csv
survey-data-trust-analytical-sample.csv
------------------------------------------------------------

The second part of the dataset provides information used in Study 4, involving the CFPS data, Baidu Index data, and data on COVID-19 cases and deaths.

The China Family Panel Studies (CFPS), conducted by Peking University, is a nationally representative, longitudinal, comprehensive, and biennial social survey launched in 2010. The outcome of interest in Study 4 is trust in Americans measured in the 2020 CFPS, incorporating the baseline trust from the 2018 CFPS. We confined the sample to respondents who indicated their level of trust in Americans in both the 2018 and 2020 waves (N=17,497). [survey-data-trust-descriptive-sample-18to20.csv] reports the trust level in 2018 and 2020 and the changes in between. As a supplementary analysis, we also used all respondents aged 16 or above in each wave of the CFPS since 2012 to document the changes in Chinese trust in Americans from 2012 to 2020 ([survey-data-trust-descriptive-sample-12to20.csv]).

For the regression analysis, we provide the subsample of respondents who have the "potential" to decrease trust (baseline trust scored above 0) and have complete information on location and interview date (N=11,430). They were interviewed at some point over the 23 weeks spanning July 2020 to December 2020.
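
The restriction described above can be sketched in Python as follows. This is an illustration, not the code used in the study: it assumes that the variables trust_americans_18, provcd18, and week (as named in the variable list of this README) stand in for baseline trust, location, and interview date, and that missing values are represented as None.

```python
def in_analytical_sample(record):
    """True if baseline trust is above 0 and province and week are both known."""
    baseline = record.get("trust_americans_18")
    if baseline is None or baseline <= 0:
        return False  # no "potential" to decrease trust, or baseline missing
    return record.get("provcd18") is not None and record.get("week") is not None

# Toy records (hypothetical values) illustrating the three cases.
records = [
    {"trust_americans_18": 3, "provcd18": 11, "week": 5},    # kept
    {"trust_americans_18": 0, "provcd18": 11, "week": 5},    # baseline not above 0
    {"trust_americans_18": 2, "provcd18": None, "week": 7},  # location missing
]
kept = [r for r in records if in_analytical_sample(r)]
```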

We measure the Chinese public attention to the pandemic in the US using the Baidu Index [https://index.baidu.com/v2/index.html]. Baidu is the largest search engine in China. The Baidu Index provides query-based data that reflects the daily intensity of keywords entered into Baidu. We applied a logarithmic transformation to the Baidu Index scores for the keywords, such as "美国疫情" (pandemic in the US), "疫情" (pandemic) and "中美贸易战" (Sino-US trade war), to quantify public attention. 
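
A minimal sketch of the logarithmic transformation, in Python. The natural logarithm and the treatment of missing or non-positive scores (returned as None) are assumptions made for illustration; this README does not specify the log base or how zero scores were handled. The index values below are made up.

```python
import math

def log_index(score):
    """Natural log of a positive index score; None if missing or non-positive."""
    if score is None or score <= 0:
        return None
    return math.log(score)

# Hypothetical daily index scores keyed by date string.
daily_index = {"2020-07-01": 1000, "2020-07-02": 2000, "2020-07-03": 0}
logged = {day: log_index(value) for day, value in daily_index.items()}
```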

Our analysis in this part also involves the COVID-19 cases and deaths data obtained from the Oxford COVID-19 Government Response Tracker [https://www.bsg.ox.ac.uk/research/covid-19-government-response-tracker]. We used two measures with logarithmic transformation: the daily number of confirmed cases and the daily number of deaths occurring one day before the 2020 CFPS interview date. Due to the time difference between China and the US, these statistics are possibly the most up-to-date information available to the survey respondents who closely follow US news. 
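
The one-day offset between the interview date and the case or death count can be sketched as a simple date lookup. The function name and the toy counts below are hypothetical; only the rule (use the value from the calendar day before the interview) comes from the description above.

```python
from datetime import date, timedelta

def value_one_day_before(interview_date, daily_values):
    """Return the daily value for the calendar day before interview_date, or None."""
    return daily_values.get(interview_date - timedelta(days=1))

# Hypothetical daily counts of new US cases keyed by date.
us_new_cases = {date(2020, 7, 14): 60000, date(2020, 7, 15): 65000}
matched = value_one_day_before(date(2020, 7, 15), us_new_cases)
```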

[survey-data-trust-analytical-sample.csv] includes variables used in the regression analysis, including the trust in Americans in 2018 and 2020, demographic variables, and location details (province) from CFPS, along with the merged data of Baidu Index and the COVID-19 cases and deaths data, used to produce the main results (Table 1) and all SI tables for Study 4. The variable meanings are explained below.

Variable name           Meaning
trust_americans         Trust in Americans in 2020
trust_parents           Trust in parents in 2020
trust_neighbors         Trust in neighbors in 2020
trust_doctors           Trust in doctors in 2020
trust_officials         Trust in officials in 2020
trust_americans_18      Trust in Americans in 2018
trust_parents_18        Trust in parents in 2018
trust_neighbors_18      Trust in neighbors in 2018
trust_doctors_18        Trust in doctors in 2018
trust_officials_18      Trust in officials in 2018
increase                Trust in Americans increased from 2018 to 2020 (binary)
logUS_pandemic          Logged Baidu Search Index score of "pandemic in US"
logpandemic             Logged Baidu Search Index score of "pandemic"
logtrade_war            Logged Baidu Search Index score of "Sino-American trade war"
logMeng                 Logged Baidu Search Index score of "Meng Wanzhou"
logFloyd                Logged Baidu Search Index score of "Floyd"
logUS_SouthSea          Logged Baidu Search Index score of "US China South Sea"
logUS_case_new          Logged number of new COVID-19 cases in the US one day ago
logUS_death_new         Logged number of new COVID-19 related deaths in the US one day ago
age                     Age
age2                    Age squared
married                 Married
male                    Male
hs_above                Completed senior high school or a higher level of education
uhukou                  Urban hukou
internet                Internet user
student                 In full-time education, including undergraduate and postgraduate education
employed                In full- or part-time paid employment or was self-employed
weekend                 Interviewed at weekend
logUS_pandemic_lag1     Logged Baidu Search Index score of "pandemic in US" one day ago
logUS_pandemic_lag2     Logged Baidu Search Index score of "pandemic in US" two days ago
logUS_pandemic_lag3     Logged Baidu Search Index score of "pandemic in US" three days ago
logUS_pandemic_lead1    Logged Baidu Search Index score of "pandemic in US" one day later
logUS_pandemic_lead2    Logged Baidu Search Index score of "pandemic in US" two days later
logUS_pandemic_lead3    Logged Baidu Search Index score of "pandemic in US" three days later
week                    Week indicator
provcd18                Province indicator
date_N15                Indicating at least 15 respondents are interviewed on a given day
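
To illustrate how the lag and lead variables relate to the same-day score, the Python sketch below builds them from a daily series keyed by date. It assumes that "one day ago" and "one day later" are measured relative to the interview date; the helper names and the toy series are hypothetical.

```python
from datetime import date, timedelta

def shifted(series, day, offset_days):
    """Value of the series offset_days away from day (negative means earlier)."""
    return series.get(day + timedelta(days=offset_days))

def lag_lead_features(series, day, max_shift=3):
    """Same-day value plus lags and leads of 1..max_shift days, named as in the table."""
    feats = {"logUS_pandemic": series.get(day)}
    for k in range(1, max_shift + 1):
        feats[f"logUS_pandemic_lag{k}"] = shifted(series, day, -k)
        feats[f"logUS_pandemic_lead{k}"] = shifted(series, day, k)
    return feats

# Toy series: ten consecutive days with made-up logged scores 0.0, 1.0, ..., 9.0.
base = date(2020, 8, 1)
series = {base + timedelta(days=i): float(i) for i in range(10)}
feats = lag_lead_features(series, base + timedelta(days=4))
```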

--------------------------------------------
DATA-SPECIFIC INFORMATION: Social Media Data
media-data-average-opinion-us.csv
--------------------------------------------

The third part of the dataset depicts trends in social media attitudes toward the US in Study 2. The data was collected from 53,949,720 posts containing US-related keywords (美国, 灯塔国, 美利坚, 米国, 美帝) from January 1, 2016, to November 28, 2023, on the Chinese social media platform Weibo, which is similar to Twitter. The substantial size provides us with a high level of confidence that this dataset encompasses prevalent viewpoints on Chinese social media. Each post was labeled with an attitude score toward the US on a five-point scale: -2 (most unfavorable), -1 (somewhat unfavorable), 0 (neutral), 1 (somewhat favorable), and 2 (most favorable). Subsequently, we fine-tuned a large language model, BERT, on these annotations for two tasks. The first task was a binary classification to determine whether a Weibo post conveyed an attitude toward the US. The second task was a regression to predict the attitude score.
The daily attitude score, averaged across all users, is provided in [media-data-average-opinion-us.csv], smoothed using a 540-day sliding window to filter out minor fluctuations.
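
The aggregation and smoothing steps can be sketched in Python as below. This is an illustration under stated assumptions, not the code used in the study: scores are averaged per day over posts flagged as expressing an attitude (the study averages across users, which may differ), and the sliding window is taken to be a trailing window that shortens at the start of the series. The README does not say whether the actual window is trailing or centered. The toy posts are made up.

```python
from collections import defaultdict, deque

def daily_average(posts):
    """Mean predicted score per day over posts flagged as expressing an attitude."""
    sums, counts = defaultdict(float), defaultdict(int)
    for day, expresses_attitude, score in posts:
        if expresses_attitude:
            sums[day] += score
            counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}

def sliding_mean(values, window):
    """Trailing moving average over the current and up to window-1 prior values."""
    out, buf, total = [], deque(), 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        out.append(total / len(buf))
    return out

# Toy posts: (day, classifier says it expresses an attitude, predicted score).
posts = [
    ("2020-01-01", True, -1.0),
    ("2020-01-01", True, 1.0),
    ("2020-01-01", False, 2.0),  # ignored: no attitude toward the US expressed
    ("2020-01-02", True, -2.0),
    ("2020-01-03", True, 0.0),
]
per_day = daily_average(posts)
smoothed = sliding_mean(list(per_day.values()), window=2)
```

For the published series the window would be 540 days rather than 2; the small window here only keeps the toy example readable.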




---------------
EARLIER VERSION
---------------

An earlier version of this dataset was previously published in Princeton DataSpace (https://doi.org/10.34770/5pk2-8345). In this updated version, we have made several revisions, including: (a) expanding the time range of social media data from 2016-2022 to 2016-2023, (b) applying a wider window to smooth the sentiment trends on social media, and (c) reporting the trust level in 2018 and 2020 and the changes in between, while retaining statistics on all respondents since 2012 as supplementary analysis.