This dataset comprises the three sets of data analyzed in the study: survey data on favorability toward the US, survey data on trust in Americans, and social media data.
The data are available at yuxie.com and on Princeton DataSpace.
Survey Data on Favorability toward the US
The first part of the dataset contains the data analyzed in Study 1 and Study 3.
The analysis in Study 1 uses data from three surveys: the Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) in 2019 and 2020 (N=3,408), the COVID-19 Multi-Wave Study (CMWS) between 2020 and 2022 (N=38,613), and the Survey on Living Conditions (SLC) in 2023 (N=2,596). The Chinese and English versions of the survey questionnaires are provided in [survey-questionnaires.pdf].
Study 1 uses individual-level data pooled from the three surveys, provided in [survey-data-favorability-study-1.csv]. The data include respondents' favorability scores toward the US, the survey source, the year and month of the interview, respondents' demographic information, and the survey weights.
The analysis in Study 3 uses a subsample of SAQURR respondents from Northwest China (N=1,880), provided in [survey-data-favorability-study-3.csv]. The data include respondents' favorability scores toward the US and seven other countries or regions. To allow an assessment of the comparability of the control group (respondents interviewed in December 2019) and the treatment group (respondents interviewed in April 2020) in the quasi-experimental design, we also provide background information on sex, education, and age.
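Because the pooled Study 1 file carries survey weights, summaries by survey and interview month should be weighted. The sketch below computes a weighted mean favorability score per (survey source, year, month) cell. It is a minimal illustration, not the study's estimation code, and the column names ("source", "year", "month", "favorability", "weight") are hypothetical placeholders for whatever the CSV actually uses.

```python
import csv
from collections import defaultdict

def weighted_means(rows):
    """Survey-weighted mean favorability per (source, year, month) cell.

    `rows` is an iterable of dicts; the keys used here are hypothetical
    column names, not necessarily those in the released CSV.
    """
    num = defaultdict(float)  # sum of weight * score in each cell
    den = defaultdict(float)  # sum of weights in each cell
    for r in rows:
        key = (r["source"], int(r["year"]), int(r["month"]))
        w = float(r["weight"])
        num[key] += w * float(r["favorability"])
        den[key] += w
    # Skip cells whose weights sum to zero to avoid division by zero.
    return {k: num[k] / den[k] for k in num if den[k] > 0}

def load(path="survey-data-favorability-study-1.csv"):
    """Read the CSV into a list of dicts (all values as strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Usage: `weighted_means(load())` returns a dict such as `{("CMWS", 2020, 4): ...}`, one entry per survey-month cell.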
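One common way to use the background variables for a comparability check is the standardized mean difference of each covariate between the April 2020 and December 2019 groups. This is a generic balance diagnostic offered as an assumption, not necessarily the check reported in the study; values near 0 indicate similar groups on that covariate.

```python
from math import sqrt
from statistics import mean, variance

def standardized_difference(treated, control):
    """Standardized mean difference of one covariate (e.g., age or an
    indicator for female) between treatment and control respondents.
    Uses the pooled sample standard deviation of the two groups."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treated) - mean(control)) / pooled_sd
```

Each group needs at least two observations, since `statistics.variance` computes the sample variance.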
Survey Data on Trust in Americans
The second part of the dataset provides the data used in Study 4: the CFPS data, the Baidu Index data, and the COVID-19 case and death data.
The China Family Panel Studies (CFPS), conducted by Peking University, is a nationally representative, longitudinal, comprehensive, and biennial social survey launched in 2010. The outcome of interest in Study 4 is trust in Americans as measured in the 2020 CFPS, with baseline trust taken from the 2018 CFPS. We restricted the sample to respondents who reported their level of trust in Americans in both the 2018 and 2020 waves (N=17,497). [survey-data-trust-descriptive-sample-18to20.csv] reports trust levels in 2018 and 2020 and the change between the two waves. As a supplementary analysis, we also used all respondents aged 16 or above in each CFPS wave since 2012 to document changes in Chinese trust in Americans from 2012 to 2020 ([survey-data-trust-descriptive-sample-12to20.csv]).
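The 2018-to-2020 change is simply the 2020 score minus the 2018 score for respondents who answered in both waves. The sketch below computes these differences and tallies their direction; representing a missing answer as `None` is an assumption about how the data would be read in.

```python
from collections import Counter

def trust_changes(pairs):
    """pairs: iterable of (trust_2018, trust_2020); None marks a missing answer.

    Returns the per-respondent changes (2020 minus 2018) for respondents
    observed in both waves, and a tally of the direction of change.
    """
    changes = [t20 - t18 for t18, t20 in pairs
               if t18 is not None and t20 is not None]
    direction = Counter(
        "decrease" if d < 0 else "increase" if d > 0 else "no change"
        for d in changes
    )
    return changes, direction
```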
For the regression analysis, we provide the subsample of respondents who have the "potential" to report lower trust (baseline trust above 0) and who have complete information on location and interview date (N=11,430). These respondents were interviewed at some point during the 23 weeks spanning July to December 2020.
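The sample restriction can be expressed as a filter on three conditions, plus a helper that maps each interview date to a week index. This is a minimal sketch: the keys ("trust_2018", "province", "interview_date") are hypothetical, and counting weeks from the earliest interview date is an assumption about how the 23 weeks are indexed.

```python
from datetime import date

def regression_sample(rows):
    """Keep respondents whose 2018 trust is above 0 (so trust can still
    fall) and who have a non-missing province and 2020 interview date."""
    return [
        r for r in rows
        if r.get("trust_2018") is not None and r["trust_2018"] > 0
        and r.get("province") and r.get("interview_date") is not None
    ]

def week_index(d: date, start: date) -> int:
    """Zero-based week of fieldwork, counted from a hypothetical start date."""
    return (d - start).days // 7
```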
We measure Chinese public attention to the pandemic in the US using the Baidu Index. The Baidu Index provides query-based data that reflect the daily search intensity of keywords entered into Baidu, the largest search engine in China. We applied a logarithmic transformation to the Baidu Index scores for keywords such as "美国疫情" ("pandemic in the US"), "疫情" ("pandemic"), and "中美贸易战" ("Sino-US trade war") to quantify public attention to these issues.
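The log transformation of a daily keyword series can be sketched as follows. The source does not state how zero scores are handled, so using log(1 + x) via `math.log1p` is an assumption made here to keep zero-score days defined; a plain natural log would be equivalent for strictly positive scores up to that offset.

```python
import math

def log_baidu(scores):
    """Log-transform daily Baidu Index scores keyed by date string.

    Uses log(1 + x) so days with a score of 0 remain defined; the +1
    offset is an assumption, not necessarily the study's choice.
    """
    return {day: math.log1p(s) for day, s in scores.items()}
```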
Our analysis in Study 4 also uses COVID-19 case and death data obtained from the Oxford COVID-19 Government Response Tracker. We used two log-transformed measures: the daily number of confirmed cases and the daily number of deaths in the US on the day before each respondent's 2020 CFPS interview. Because of the time difference between China and the US, these figures are likely the most up-to-date information available to respondents who closely follow US news.
[survey-data-trust-analytical-sample.csv] contains the variables used in the regression analysis: trust in Americans in 2018 and 2020, demographic variables, and location (province) from the CFPS, merged with the Baidu Index data and the COVID-19 case and death data.
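Merging the one-day-lagged counts onto the survey amounts to looking up the count for the date immediately before each interview. The sketch below assumes the daily counts are held in a dict keyed by `datetime.date` and again uses log(1 + x) as an assumed way to handle zero counts; the function name is hypothetical.

```python
import math
from datetime import date, timedelta

def lagged_covid(interview_date: date, daily_counts: dict):
    """Return log(1 + count) for the day before the interview.

    `daily_counts` maps datetime.date -> daily confirmed cases (or deaths).
    Returns None when the previous day is missing from the series.
    """
    prev_day = interview_date - timedelta(days=1)
    count = daily_counts.get(prev_day)
    return None if count is None else math.log1p(count)
```

Applying the same function once to the case series and once to the death series yields the two measures per respondent.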
Social Media Data
The third dataset is provided to depict trends in attitudes toward the US in Study 2. It consists of 53,949,720 posts containing US-related keywords (美国, 灯塔国, 美利坚, 米国, 美帝) published from January 1, 2016, to November 28, 2023, on Weibo, a Chinese social media platform similar to Twitter. Its size gives us confidence that the dataset captures the prevalent viewpoints on Chinese social media. Posts were annotated with an attitude score toward the US on a scale of -2 (most unfavorable), -1 (somewhat unfavorable), 0 (neutral), 1 (somewhat favorable), and 2 (most favorable). We then fine-tuned the BERT language model on these annotations for two tasks. The first task was binary classification to determine whether a Weibo post conveyed an attitude toward the US. The second task was regression to predict the attitude score.
The daily average attitude across all users, smoothed with a 540-day sliding window to filter out minor fluctuations, is provided in [media-data-average-opinion-us.csv].
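The two tasks combine into a simple pipeline: only posts that the classifier flags as expressing an attitude receive a regression score. In the sketch below, `is_attitudinal` and `predict_score` are hypothetical stand-ins for the fine-tuned BERT classifier and regressor, and clipping predictions to the annotation range [-2, 2] is an assumption rather than a documented step.

```python
from typing import Callable, Iterable, List, Optional

def score_posts(
    posts: Iterable[str],
    is_attitudinal: Callable[[str], bool],   # hypothetical classifier stand-in
    predict_score: Callable[[str], float],   # hypothetical regressor stand-in
) -> List[Optional[float]]:
    """Return None for posts with no attitude toward the US; otherwise the
    regression score, clipped to the annotation scale [-2, 2]."""
    scores: List[Optional[float]] = []
    for text in posts:
        if not is_attitudinal(text):
            scores.append(None)
        else:
            scores.append(max(-2.0, min(2.0, predict_score(text))))
    return scores
```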
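The smoothing step can be sketched as a daily mean followed by a moving average over the daily series. The source does not say whether the 540-day window is centered or trailing, so the trailing window below (current day plus up to 539 preceding days, shorter at the start) is an assumption, as is averaging post-level scores within each day and having one value per calendar day.

```python
from collections import deque
from statistics import mean

def daily_average(scores_by_day):
    """Mean attitude score per day, in date order; `scores_by_day` maps a
    sortable date key to the post-level scores for that day. Days with no
    scored posts are skipped."""
    return {day: mean(s) for day, s in sorted(scores_by_day.items()) if s}

def sliding_mean(values, window=540):
    """Trailing moving average: each output averages the current value and
    up to window - 1 preceding values."""
    buf, total, out = deque(), 0.0, []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        out.append(total / len(buf))
    return out
```

Usage: `sliding_mean(list(daily_average(scores).values()))` produces the smoothed series.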
Data Publishers
The COVID-19 Multi-Wave Study (CMWS) and the Survey on Living Conditions (SLC) are conducted by the Population Development Studies Center, Renmin University of China. The Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) is conducted by the Institute of Psychology of the Chinese Academy of Sciences. The China Family Panel Studies (CFPS) is conducted by the Institute of Social Science Survey, Peking University. The Weibo data are owned by Sina.