Stata is my main tool for data preprocessing, variable construction, and causal inference analysis. I can do the same work in R, but Stata works best for me! As an undergraduate, I used Stata to complete the data processing for three sub-studies in less than three weeks. Two years of postgraduate study have further improved my data analysis skills: I have received rigorous training in econometric techniques and am very interested in experimental methods and causal inference. In addition to linear mixed-effects models and traditional causal inference methods (IV, DID, RD), I have learned several more recent methods, including the synthetic control method (SCM), staggered DID, and Bartik instruments.
- This work was part of a National Natural Science Foundation of China programme titled "Research on the Formation Mechanism and Model of Breakthrough Innovation in High-Tech Industries." The code analyzes firm-level data, focusing on how heterogeneous knowledge influences radical innovation in AI firms. It first runs a descriptive analysis of the dataset, generating summary statistics and correlation matrices for the key variables, and then runs regression analyses to test the effects of technological diversity, technological distance, and other firm-level variables on radical innovation. These analyses include fixed- and random-effects Tobit models and negative binomial regression models. The code also constructs and tests interaction terms between technological diversity and business digitalization to assess moderating effects on radical innovation. Several robustness checks follow, including regression models with lagged dependent variables and panel data models with multidimensional fixed effects. Finally, the code produces graphs of the moderating effects of business and technology digitalization on breakthrough innovation, and of the U-shaped relationship between technological distance and innovation. The `margins` and `marginsplot` commands are used to calculate and display the marginal effects of these interactions.
- This project is the data work for my undergraduate thesis, which received the Outstanding Thesis Award (the only one in the entire college). The Stata code handles data processing and analysis for a study of how diversification and business strategies affect the performance of publicly listed companies. It imports and cleans the datasets, converts the raw data into panel format, handles duplicates, and merges multiple datasets containing company performance indicators (e.g., ROA, ROE, EPS) with strategic indicators such as tone and diversification scores. The code also calculates financial and strategic metrics, such as accrual-based earnings management from the Jones model, and incorporates geographic and industry-level spatial data for spatial econometric analysis, including Moran’s I test for spatial autocorrelation. The code then runs multiple regression models, including panel fixed effects, stepwise regressions, spatial lag, spatial error, and threshold regression models, to explore how strategic variables (e.g., diversification, tone, strategic differences) and control variables (e.g., size, leverage, governance) affect company performance. It also tests for mediation and moderation effects among the strategic variables using bootstrapping, applies the Heckman correction to address sample selection, and concludes with three-stage least squares (3SLS) estimations to account for potential endogeneity in the models.
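
The moderation and U-shape checks described for the first project can be sketched roughly as follows. This is a minimal illustration, not the project code: it assumes Stata's `nlswork` example dataset (loaded with `webuse`), substitutes a random-effects linear model (`xtreg`) for the Tobit and negative binomial models, and uses `tenure` and `ttl_exp` purely as stand-ins for the focal variable and the moderator.

```stata
* Illustrative only: nlswork is a Stata example dataset, not the project data.
webuse nlswork, clear
xtset idcode year

* Continuous-by-continuous interaction: c.tenure##c.ttl_exp expands to
* both main effects plus their product (tenure = stand-in focal variable,
* ttl_exp = stand-in moderator).
xtreg ln_wage c.tenure##c.ttl_exp age i.year, re vce(cluster idcode)

* Marginal effect of the focal variable at low, medium, and high
* values of the moderator, then plot it.
margins, dydx(tenure) at(ttl_exp = (2 6 10))
marginsplot, title("Marginal effect of tenure by total experience")

* Curvilinear check: c.tenure##c.tenure adds the squared term.
xtreg ln_wage c.tenure##c.tenure age i.year, re vce(cluster idcode)

* Predicted outcome over a grid of the focal variable, to see the curve.
margins, at(tenure = (0(2)20))
marginsplot
```

Using factor-variable notation (`c.` and `##`) rather than hand-built product terms lets `margins` account for the interaction and the squared term when it computes marginal effects and predictions.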
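
The panel-construction steps described for the thesis project (dropping duplicates, reshaping to panel format, merging indicator files) might look something like the sketch below. All data here are made-up toy values and the variable names (`firm_id`, `roa`, `divers`) are hypothetical; the block is meant to be run as a single do-file because it relies on a temporary file.

```stata
clear
* Hypothetical wide-format performance data: one row per firm, ROA by year.
input firm_id roa2019 roa2020 roa2021
1 0.05 0.06 0.04
2 0.02 0.03 0.03
2 0.02 0.03 0.03
3 0.08 0.07 0.09
end

* Drop rows that are exact duplicates on all variables (firm 2 appears twice).
duplicates drop

* Convert to long (panel) format: one row per firm-year.
reshape long roa, i(firm_id) j(year)

tempfile perf
save `perf'

* Hypothetical strategy indicator (diversification score), in firm-year form.
clear
input firm_id year divers
1 2019 0.30
1 2020 0.35
1 2021 0.33
2 2019 0.10
2 2020 0.12
2 2021 0.18
3 2019 0.50
3 2020 0.45
3 2021 0.52
end

* One-to-one merge on the panel keys; keep only matched firm-years.
merge 1:1 firm_id year using `perf', keep(match) nogenerate

* Declare the panel and fit a firm fixed-effects model with year dummies.
xtset firm_id year
xtreg roa divers i.year, fe
```

With real data, it is usually worth inspecting the `_merge` variable (by leaving out `nogenerate`) before discarding unmatched observations, since unmatched firm-years often point to identifier or year-coding problems.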