This readme.txt file was generated by Yuchen He, Junming Huang and Feng Yang in April 2024.

-------------------
GENERAL INFORMATION
-------------------

Title of Dataset: Data for "Declining Chinese Attitudes toward the United States amid COVID-19"

Author Information

Yu Xie
Paul and Marcia Center on Contemporary China, Princeton University, Princeton, NJ 08544, United States
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Feng Yang
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Junming Huang
Paul and Marcia Center on Contemporary China, Princeton University, Princeton, NJ 08544, United States

Yuchen He
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Yi Zhou
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Yue Qian
Department of Sociology, University of British Columbia, 6303 NW Marine Drive, Vancouver, BC, V6T 1Z1 Canada

Weicheng Cai
Center for Social Research, Guanghua School of Management, Peking University, Beijing 100871, China

Jie Zhou
Institute of Psychology, Chinese Academy of Sciences, 16 Lincui Road, Chaoyang District, Beijing 100101, China

Date of data collection: 2016 - 2023

Description: This dataset encompasses three distinct sets of data analyzed in the study, namely the survey data on favorability to the US, the survey data on trust in Americans, and the social media data.
The first part of the dataset includes data analyzed in Study 1 and Study 3, collected from three surveys: the Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) in 2019 and 2020, the COVID-19 Multi-Wave Study (CMWS) between 2020 and 2022, and the Survey on Living Conditions (SLC) in 2023.
The second part of the dataset provides information used in Study 4, involving the CFPS data, Baidu Index data, and data on COVID-19 cases and deaths.
The third part of the dataset depicts trends in attitudes toward the US in Study 2.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: CC BY-NC-SA 4.0.

Recommended citation for the data:
Xie, Y., Yang, F., Huang, J., He, Y., Zhou, Y., Qian, Y., Cai, W., & Zhou, J. (2024). Data for "Declining Chinese Attitudes toward the United States amid COVID-19" [Data set]. Princeton University. https://doi.org/10.34770/ew2y-jy92

Links to other publicly accessible locations of the data:
The data are available at yuxie.com (https://yuxie.scholar.princeton.edu/share-files/data-files-declining-chinese-attitudes-toward-united-states-amidst-covid-19) and Princeton DataSpace (https://doi.org/10.34770/ew2y-jy92).

The COVID-19 Multi Wave Study (CMWS) and the Survey on Living Conditions (SLC) are conducted by the Population Development Studies Center, Renmin University of China.
The Social Attitude of Urban and Rural Residents Survey (SAURRS) is conducted by the Institute of Psychology of the Chinese Academy of Sciences.
The China Family Panel Studies (CFPS) is conducted by the Institute of Social Science Survey, Peking University.
The Weibo data is owned by Sina.

--------------------
DATA & FILE OVERVIEW
--------------------

File list:
README.txt
media-data-average-opinion-us.csv
survey-data-trust-analytical-sample.csv
survey-data-trust-descriptive-sample-12to20.csv
survey-data-trust-descriptive-sample-18to20.csv
survey-data-favorability-study-1.csv
survey-data-favorability-study-3.csv
survey-questionnaires.pdf

Relationship between files, if important for context:
[survey-data-favorability-study-1.csv] and [survey-data-favorability-study-3.csv] are sufficient to replicate the results presented in Study 1 and Study 3, respectively.
[survey-data-trust-descriptive-sample-12to20.csv] and [survey-data-trust-descriptive-sample-18to20.csv] report the descriptive trust levels and trends for Study 4.
[survey-data-trust-analytical-sample.csv] is sufficient to replicate the regression results in Study 4.
[media-data-average-opinion-us.csv] provides the daily attitude score, averaged across all Weibo users, for Study 2.
[survey-questionnaires.pdf] contains the relevant sections of the questionnaires of the four surveys.

If data were derived from another source, list source:
The first part of the dataset includes data analyzed in Study 1 and Study 3, collected from three surveys: the Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) in 2019 and 2020, the COVID-19 Multi-Wave Study (CMWS) between 2020 and 2022, and the Survey on Living Conditions (SLC) in 2023.
The second part of the dataset provides information used in Study 4, involving the CFPS data, Baidu Index data, and data on COVID-19 cases and deaths.
The third part of the dataset depicts trends in attitudes toward the US in Study 2. The data were collected from 53,949,720 posts containing US-related keywords (美国, 灯塔国, 美利坚, 米国, 美帝) from January 1, 2016, to November 28, 2023, on the Chinese social media platform Weibo, which is similar to Twitter.

The COVID-19 Multi Wave Study (CMWS) and the Survey on Living Conditions (SLC) are conducted by the Population Development Studies Center, Renmin University of China.
The Social Attitude of Urban and Rural Residents Survey (SAURRS) is conducted by the Institute of Psychology of the Chinese Academy of Sciences.
The China Family Panel Studies (CFPS) is conducted by the Institute of Social Science Survey, Peking University.
The Weibo data is owned by Sina.
--------------------------
METHODOLOGICAL INFORMATION
--------------------------

----------------------------------------------------------------
DATA-SPECIFIC INFORMATION: Survey Data on Favorability to the US
survey-data-favorability-study-1.csv
survey-data-favorability-study-3.csv
----------------------------------------------------------------

The first part of the dataset includes data analyzed in Study 1 and Study 3, collected from three surveys: the Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) in 2019 and 2020, the COVID-19 Multi-Wave Study (CMWS) between 2020 and 2022, and the Survey on Living Conditions (SLC) in 2023. We appended the data from the three surveys.

The raw micro-level data can be found in [survey-data-favorability-study-1.csv] and [survey-data-favorability-study-3.csv], which are sufficient to replicate the results presented in Study 1 and Study 3. We disclose the relevant variables used in the research, including the favorability score, source of survey, year and month of the interview, and background information such as education and age, along with the weights. The PIDs (personal identifiers) are part of the original compilation from SAQURR, CMWS, and SLC.

Because our objective is to examine the trends in attitudes toward America, the sample is limited to respondents who reported their favorability toward the US: 3,266 observations in SAQURR, 28,897 observations in CMWS, and 2,592 observations in SLC.

------------------------------------------------------------
DATA-SPECIFIC INFORMATION: Survey Data on Trust in Americans
survey-data-trust-descriptive-sample-18to20.csv
survey-data-trust-descriptive-sample-12to20.csv
survey-data-trust-analytical-sample.csv
------------------------------------------------------------

The second part of the dataset provides information used in Study 4, involving the CFPS data, Baidu Index data, and data on COVID-19 cases and deaths.
The China Family Panel Studies (CFPS), conducted by Peking University, is a nationally representative, longitudinal, comprehensive, biennial social survey that started in 2010. The outcome of interest in Study 4 is trust in Americans measured in the 2020 CFPS, incorporating the baseline trust from the 2018 CFPS. We confined the sample to respondents who indicated their level of trust in Americans in both the 2018 and 2020 waves (N=17,497). [survey-data-trust-descriptive-sample-18to20.csv] reports the trust level in 2018 and 2020 and the changes in between. As a supplementary analysis, we also used all respondents aged 16 or above in each wave of the CFPS since 2012 to document the changes in Chinese trust in Americans from 2012 to 2020 (survey-data-trust-descriptive-sample-12to20.csv).

For the regression analysis, we provide the subsample of respondents who have the "potential" to decrease trust (baseline trust scored above 0) and have complete information on location and interview date (N=11,430). They were interviewed at some point over the 23 weeks spanning from July 2020 to December 2020.

We measure Chinese public attention to the pandemic in the US using the Baidu Index [https://index.baidu.com/v2/index.html]. Baidu is the largest search engine in China. The Baidu Index provides query-based data that reflects the daily intensity of keywords entered into Baidu. To quantify public attention, we applied a logarithmic transformation to the Baidu Index scores for keywords such as "美国疫情" (pandemic in the US), "疫情" (pandemic) and "中美贸易战" (Sino-US trade war).

Our analysis in this part also involves the COVID-19 cases and deaths data obtained from the Oxford COVID-19 Government Response Tracker [https://www.bsg.ox.ac.uk/research/covid-19-government-response-tracker]. We used two measures with logarithmic transformation: the daily number of confirmed cases and the daily number of deaths occurring one day before the 2020 CFPS interview date.
Due to the time difference between China and the US, these statistics are possibly the most up-to-date information available to the survey respondents who closely follow US news.

[survey-data-trust-analytical-sample.csv] includes the variables used in the regression analysis: trust in Americans in 2018 and 2020, demographic variables, and location details (province) from CFPS, along with the merged Baidu Index data and COVID-19 cases and deaths data. These variables were used to produce the main results (Table 1) and all SI tables for Study 4. The variable meanings are explained below.

Variable name           Meaning
trust_americans         Trust in Americans in 2020
trust_parents           Trust in parents in 2020
trust_neighbors         Trust in neighbors in 2020
trust_doctors           Trust in doctors in 2020
trust_officials         Trust in officials in 2020
trust_americans_18      Trust in Americans in 2018
trust_parents_18        Trust in parents in 2018
trust_neighbors_18      Trust in neighbors in 2018
trust_doctors_18        Trust in doctors in 2018
trust_officials_18      Trust in officials in 2018
increase                Trust in Americans increased from 2018 to 2020 (binary)
logUS_pandemic          logged Baidu Search Index score of "pandemic in US"
logpandemic             logged Baidu Search Index score of "pandemic"
logtrade_war            logged Baidu Search Index score of "Sino-American trade war"
logMeng                 logged Baidu Search Index score of "Meng Wanzhou"
logFloyd                logged Baidu Search Index score of "Floyd"
logUS_SouthSea          logged Baidu Search Index score of "US China South Sea"
logUS_case_new          logged number of new COVID-19 cases in the US one day ago
logUS_death_new         logged number of new COVID-19 related deaths in the US one day ago
age                     Age
age2                    Age squared
married                 Married
male                    Male
hs_above                Completed senior high school or a higher level of education
uhukou                  Urban hukou
internet                Internet user
student                 In full-time education, including undergraduate and postgraduate education
employed                In full- or part-time paid employment or was self-employed
weekend                 Interviewed at weekend
logUS_pandemic_lag1     logged Baidu Search Index score of "pandemic in US" one day ago
logUS_pandemic_lag2     logged Baidu Search Index score of "pandemic in US" two days ago
logUS_pandemic_lag3     logged Baidu Search Index score of "pandemic in US" three days ago
logUS_pandemic_lead1    logged Baidu Search Index score of "pandemic in US" one day later
logUS_pandemic_lead2    logged Baidu Search Index score of "pandemic in US" two days later
logUS_pandemic_lead3    logged Baidu Search Index score of "pandemic in US" three days later
week                    Week indicator
provcd18                Province indicator
date_N15                Indicating at least 15 respondents are interviewed on a given day

--------------------------------------------
DATA-SPECIFIC INFORMATION: Social Media Data
media-data-average-opinion-us.csv
--------------------------------------------

The third part of the dataset depicts trends in social media attitudes toward the US in Study 2. The data were collected from 53,949,720 posts containing US-related keywords (美国, 灯塔国, 美利坚, 米国, 美帝) from January 1, 2016, to November 28, 2023, on the Chinese social media platform Weibo, which is similar to Twitter. The substantial size gives us a high level of confidence that this dataset encompasses prevalent viewpoints on Chinese social media.

Each post was labeled with an attitude score toward the US on a scale of -2 (most unfavorable), -1 (somewhat unfavorable), 0 (neutral), 1 (somewhat favorable), and 2 (most favorable). Subsequently, we fine-tuned a large language model, BERT, on these annotations for two tasks. The first task was a binary classification to determine whether a Weibo post conveyed attitudes toward the US. The second task was a regression to predict the attitude score. The daily attitude score, averaged across all users, is provided in [media-data-average-opinion-us.csv], smoothed using a 540-day sliding window to filter out minor fluctuations.
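
For readers unfamiliar with sliding-window smoothing, the following is a minimal sketch of the idea, not the authors' actual procedure. It assumes a trailing window (the README does not state whether the 540-day window is trailing or centered) and uses a short made-up series rather than the contents of the CSV file. The function name sliding_mean and the sample values are hypothetical.

```python
from collections import deque


def sliding_mean(values, window):
    """Trailing moving average over `window` observations.

    Assumption: the window is trailing, and the earliest entries are
    averaged over however many observations are available so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque()   # observations currently inside the window
    total = 0.0     # running sum of the values in `buf`
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the oldest observation once the window is full.
            total -= buf.popleft()
        out.append(total / len(buf))
    return out


# Hypothetical daily average attitude scores on the -2..2 scale.
daily = [0.0, 1.0, -1.0, 2.0, 0.0, -2.0]
smoothed = sliding_mean(daily, window=3)
print(smoothed)
```

With window=540 the same function would average each day with the 539 preceding days, which is why short-lived spikes are damped in the smoothed series.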

---------------
EARLIER VERSION
---------------

An earlier version of this dataset was previously published in Princeton DataSpace (https://doi.org/10.34770/5pk2-8345). In this updated version, we have made several revisions, including:
(a) expanding the time range of social media data from 2016-2022 to 2016-2023,
(b) applying a wider window to smooth the sentiment trends on social media, and
(c) reporting the trust level in 2018 and 2020 and the changes in between, while retaining statistics on all respondents since 2012 as supplementary analysis.