/*This is the dataset compilation program for "Self-Enforcing Trade Agreements: Evidence from Time-Varying Trade Policy" by Chad P. Bown and Meredith A. Crowley. This project was supported by research assistance of Aksel Erbahar, Adam Hogan, Xi Luo, and Chrissy Ostrowski. */ *This file creates the main dataset for the project. In order to run this file, *we need the following raw data sets: *IMP_US_AGG_NAICS5.txt *IMP_US_AGG_NAICS6.txt *IMP_value_1.txt *IMP_value_2.txt *IMP_value_3.txt *IMP_value_4.txt *IMP_value_5.txt *IMP_value_6.txt *IMP_value_NAICS5_1.txt *IMP_value_NAICS5_2.txt *IMP_value_NAICS5_3.txt *IMP_value_NAICS5_4.txt *IMP_value_NAICS5_5.txt *IMP_value_NAICS5_6.txt *NAICS-CONC-CNTRY.dta *hs2naics.txt *conversion02to07.csv *AD-USA-Products.txt *AD-USA-Master.txt *originalconcratHHvalues.dta *real_exchange_rate.txt *Sigmas73countries_9403_HS3digit.dta *OmegasUSA_94-03_HS4digit.dta *SG_NAICS_COMB.dta *The main directory of the project (i.e. 'mypath') should have three folders: *raw (contains all files listed above) *mod (empty to start) *final (empty to start) /*After running this do file, run SETA_variable_creation_05-01-2012.do (variable creation file). Then run the estimation files: SETA-Estimation-Table-2-05-01-2012.do, SETA-Estimation-Table-3-05-01-2012.do, and SETA-Estimation-Table-4-05-01-2012.do. */ clear matrix clear set mem 700m set more off cd "C:\mypath" *this is the only line that refers to an absolute path, put your main directory here! *STAGE 1 ADD IMPORTS ************************************************************************** *Creates a data set of imports by industry and year (aggregated over countries) *and a data set of imports by country, industry, and year. Merges this data with *United States industry data from the Annual Survey of Manufactures (ASM) *that is grouped by industry and year. /* Import data comes from USITC DataWeb (Choose "Imports for Consumption", "NAIC Codes", "Customs Value", "In 1,000", "Annual", "All Country Sub Codes", "All Extended Country Sub Codes", "All Rate Provision Codes". Use 5- and 6-digit NAICS codes.*/ * US Aggregate imports by industry(NAICS5) from ALL 49 countries in dataset insheet using raw\IMP_US_AGG_NAICS5.txt, clear /* NAICS are all 5 digit numeric codes, except for 1123X. 1123X is dropped later because it does not begin with a manufacturing prefix (31, 32, 33) */ rename v1 NAICS rename v2 IMP1997 rename v3 IMP1998 rename v4 IMP1999 rename v5 IMP2000 rename v6 IMP2001 rename v7 IMP2002 rename v8 IMP2003 rename v9 IMP2004 rename v10 IMP2005 rename v11 IMP2006 drop if NAICS == "NAIC Number" keep if substr(NAICS, 1, 2) == "31" | substr(NAICS, 1, 2) == "32" | substr(NAICS, 1, 2) == "33" /* NAICS codes beginning in 31, 32, and 33 make up manufacturing industries */ notes: contains data for US total import by NAICS5 save mod\IMP_US_AGG_NAICS5.dta, replace * US Aggregate imports by industry(NAICS6) from ALL 49 countries in dataset insheet using raw\IMP_US_AGG_NAICS6.txt, clear /* NAICS are all 6 digit numeric codes, except for 11211X, 1123XX, 31131X, 31181X, 31511X, 33631X, all of which have identical import values to their 5-digit equivalents (11211, 1123X, 31131, etc) */ rename v1 NAICS rename v2 IMP1997 rename v3 IMP1998 rename v4 IMP1999 rename v5 IMP2000 rename v6 IMP2001 rename v7 IMP2002 rename v8 IMP2003 rename v9 IMP2004 rename v10 IMP2005 rename v11 IMP2006 drop if NAICS == "NAIC Number" keep if substr(NAICS, 1, 2) == "31" | substr(NAICS, 1, 2) == "32" | substr(NAICS, 1, 2) == "33" /* NAICS codes beginning in 31, 32, and 33 make up manufacturing industries */ drop if substr(NAICS, 6, 1)=="X" /*because these codes are taken care of in the NAICS5 file */ notes: contains data for US total import by NAICS6 save mod\IMP_US_AGG_NAICS6.dta, replace * Combine two US aggregate import data set(NAICS6 + NAICS5) from ALL 49 countries in dataset use mod\IMP_US_AGG_NAICS5.dta, clear append using mod\IMP_US_AGG_NAICS6.dta reshape long IMP, i(NAICS) j(Year) *Most of the NAICS ending in zero are actually aggregate level 5 NAICS codes that are already in the data, *except 323110 and 334510, which are level 6 NAICS codes. replace NAICS = substr(NAICS, 1, 5) if substr(NAICS, 6, 1) == "0" & NAICS != "323110" /// & NAICS != "334510" duplicates drop /*This drops the level 5 dups that were just created */ cap rename IMP agg_IMP sort NAICS Year save mod\IMP_US_AGG_long.dta, replace * US import data by NAICS5 and Country forvalues i = 1/6{ insheet using raw\IMP_value_NAICS5_`i'.txt, clear /* NAICS codes are all numeric 5-digit codes except 1123X */ rename v1 Country rename v2 NAICS rename v3 IMP1997 rename v4 IMP1998 rename v5 IMP1999 rename v6 IMP2000 rename v7 IMP2001 rename v8 IMP2002 rename v9 IMP2003 rename v10 IMP2004 rename v11 IMP2005 rename v12 IMP2006 keep if substr(NAICS, 1, 2) == "31" | substr(NAICS, 1, 2) == "32" | substr(NAICS, 1, 2) == "33" drop if Country == "Country" destring IMP*, replace ignore(",") save mod\IMP_value_NAICS5_`i'.dta, replace } use mod\IMP_value_NAICS5_1.dta, clear forvalues i = 2/6{ append using mod\IMP_value_NAICS5_`i'.dta } keep if substr(NAICS, 1, 2) == "31" | substr(NAICS, 1, 2) == "32" | substr(NAICS, 1, 2) == "33" sort NAICS save mod\NAICS5_IMP_value_wide.dta, replace * US import data by NAICS6 and Country forvalues i = 1/6{ insheet using raw\IMP_value_`i'.txt, clear /* NAICS are all 6 digit numeric codes, except for 11211X, 1123XX, 31131X, 31181X, 31511X, 33631X, all of which have identical import values to their 5-digit equivalents (11211, 1123X, 31131, etc) */ rename v1 Country rename v2 NAICS rename v3 IMP1997 rename v4 IMP1998 rename v5 IMP1999 rename v6 IMP2000 rename v7 IMP2001 rename v8 IMP2002 rename v9 IMP2003 rename v10 IMP2004 rename v11 IMP2005 rename v12 IMP2006 keep if substr(NAICS, 1, 2) == "31" | substr(NAICS, 1, 2) == "32" | substr(NAICS, 1, 2) == "33" drop if substr(NAICS, 6, 1)=="X" /*because these codes are taken care of in the NAICS5 file */ drop if Country == "Country" destring IMP*, replace ignore(",") save mod\IMP_value_`i'.dta, replace } use mod\IMP_value_1.dta, clear forvalues i = 2/6{ append using mod\IMP_value_`i'.dta } keep if substr(NAICS, 1, 2) == "31" | substr(NAICS, 1, 2) == "32" | substr(NAICS, 1, 2) == "33" sort NAICS replace NAICS = substr(NAICS, 1, 5) if substr(NAICS, 6, 1) == "0" /// & NAICS != "323110" & NAICS != "334510" /* This does not create any duplicates because it adds 6-digit codes to a dataset of 5-digit codes */ save mod\NAICS6_IMP_value_wide.dta, replace use mod\NAICS5_IMP_value_wide.dta, clear append using mod\NAICS6_IMP_value_wide.dta duplicates report duplicates drop /* All of these duplicates were created from dropping Xs or zeroes off of 6-digit codes. Now all NAICS-country pairings are unique */ save mod\NAICS_IMP_value_wide.dta, replace reshape long IMP, i(NAICS Country) j(Year) rename Country country replace country = "South Korea" if country == "Korea" replace country = "Trinidad" if country == "Trin & Tobago" sort NAICS country Year save mod\NAICS_IMP_value_long.dta, replace *For some reason, the original aggregate data aggregated imports from *all countries but Trinidad. This was an error in pulling the data *from the USITC website, not an error on the USITC website itself. * I discovered this by comparing the sum of the disaggregated *import data to the aggregated data. It was clear that the Trinidad *import data was almost exactly the same as the discrepancy *between the aggregate data and the sum of the disaggregated data. *The following code segment adds the Trinidad data to the aggregate data. use mod\NAICS_IMP_value_long.dta, clear keep if country == "Trinidad" save mod\TrinidadIMP.dta, replace use mod\IMP_US_AGG_long.dta, clear merge 1:1 NAICS Year using mod\TrinidadIMP.dta /* There are some _merge==1 because Trinidad, like most of the countries, does not have data for all NAICS Year pairs. That is not a problem. */ destring agg_IMP, replace ignore(",") replace agg_IMP = agg_IMP + IMP if country=="Trinidad" sort NAICS Year keep NAICS Year agg_IMP save mod\IMP_US_AGG_long_Trin.dta, replace *In this next step, the disaggregated import data set is merged with the *aggregated import data set. In some cases, the aggregate data is not *equal to the sum of the disaggregated data (even after fixing the Trinidad *problem). Sometimes the discrepancy is clearly rounding error, in which *case a sum_flag variable is set equal to zero. In other cases, the *discrepancy is larger. The value of sum_flag grows as the ratio of *the discrepancy to the data grows. The sum_flag variable is used later *to determine which observations should be given a missing value for imports. use mod\NAICS_IMP_value_long.dta, clear egen sum_IMP = sum(IMP), by(NAICS Year) merge m:1 NAICS Year using mod\IMP_US_AGG_long_Trin.dta tab _merge gen sum_flag = . replace sum_flag = -10 if _merge == 1 gen sum_missing = agg_IMP - sum_IMP replace sum_flag = 0 if sum_missing == 0 replace sum_flag = 0 if sum_missing == 1 replace sum_flag = 0 if sum_missing == -1 replace sum_flag = 1 if abs(sum_missing) < 10 & sum_flag == . & abs(sum_missing) < agg_IMP/10000 replace sum_flag = 2 if abs(sum_missing) < agg_IMP/1000 & sum_flag == . replace sum_flag = 3 if abs(sum_missing) < agg_IMP/100 & sum_flag == . replace sum_flag = 4 if abs(sum_missing) < agg_IMP/20 & sum_flag == . replace sum_flag = 5 if sum_flag == . keep NAICS country Year IMP sum_flag sum_missing sort NAICS country Year save mod\US_IMP_AGG_CNTRY_long.dta, replace * NOTE: raw\NAICS-CONC-CNTRY.dta has already been pruned * of country-industry observations that have too many missing years * to be useful. Also, when all six-digit "children" codes are available, * the five-digit "parent" code is dropped to remove redundancy. The * code 33661 should have been dropped but was apparently overlooked * when this file was created. Since the industry is small, both 33661 and * its children are dropped entirely to reduce confusion. * Unfortunately, the HAVER data uses 2002 NAICS coding, while the * import data uses 2007 NAICS coding. It would be nice if this problem * could be fixed with a quick merge of an 02 to 07 concordance; however, * some industries that were under the same code in 2002 were split between * two or more codes in 2007, while other industries in different codes * were consolidated. Since this creates a many-to-many mapping, information * on the portion of an industry that is mapped into another industry is necessary. * Using the websites listed below, I created a conversion data set that maps the * correct portions of 2002 industries into their new codes. In all cases, less than 15% * of a 2007 code is made up of a code other than itself (where codes are weighted * by data on sales from the Economic census). In most cases, it is 5% or less. * Thus, it is reasonable to keep the Herfindahl-Hirschman Index * (HHindex) and the Four-Firm Concentration Ratio (concrat4) the same, since * there is no clear way to merge two values of these variables into one value. * Since variables ET, EE, ..., GE are related to the total sales of the industry, * it makes sense to weight these values by an industry's total sales and sum * the weighted values of all industries that are being combined into a 2007 code. * NOTE: These observations make up a very small portion of the data set, and * most of the industry composition does not change from 2002 to 2007. Thus, * this procedure has very little effect on the specifications. A reader trying to do * a similar process may find it easier to simply map each 2007 industry into * the 2002 industry that composes 85+% of it. * The variable 'wasconverted' tags the 2007 industries in this file with multiple * 2002 industries mapping into them to note that they contain artificially * created data. This way, if a decision is made to treat them differently, * they are easily identifiable. The industry 326199 was not tagged, because no other * industry maps into it other than itself. * This file is a little confusing, so here's an example: * naics02 putinthis07 portion wasconverted * 326122 326122 1 1 * 326199 326122 .0037536 1 * In this case, the category 326122 was redefined in 2007 to include some businesses * that belonged to 326199 in 2002. About 0.4% of the business volume from 326199 * was transferred to 326122 (where business volume is measured in total sales). *Websites used for sales weighting: *2007 to 2002: *http://factfinder.census.gov/servlet/IBQTable?_bm=y&-ds_name=EC0700CBDG2&-_lang=en *2002 to 2007: *http://factfinder.census.gov/servlet/IBQTable?_bm=y&-ds_name=EC0700CBDG1 insheet using raw\conversion02to07.csv, clear tostring naics02, replace tostring putinthis07, replace save mod\conversion02to07.dta, replace use raw\NAICS-CONC-CNTRY.dta, clear replace naics02 = substr(naics02, 1, 5) if substr(naics02, 6, 1) == "0" & naics02 != "323110" & naics02 != "334510" keep naics02 duplicates drop merge 1:m naics02 using mod\conversion02to07.dta drop if _merge==2 /* No need for NAICS that aren't in our data */ drop _merge replace wasconverted = 0 if wasconverted == . replace putinthis07 = naics02 if putinthis07 == "" replace portion = 1 if portion == . sort naics02 save mod\conversion02to07total.dta, replace * Now, we create ASM data for industries that changed in 2007 by weighting their * 2002 components by total sales and summing the weighted components. *NOTE: The variable "wasconverted" marks the observations that have * been created using these weights. use raw\NAICS-CONC-CNTRY.dta, clear joinby naics02 using mod\conversion02to07total.dta, unmatched(both) *joinby is used instead of merge because it's a many-to-many mapping tab _merge /*Should be all _merge==3 */ drop if _merge==2 drop _merge foreach var of varlist ET EE PT PH PE VA MT SH EC EB VN VF VE QE CP CPS CPE I IW IM IF /// GE{ replace `var' = `var'*portion if `var' != . egen tot = sum(`var'), by(putinthis07 country Year) missing *the missing option ensures that the sum of a missing and nonmissing *value is a missing value replace `var' = tot drop tot } drop naics02 portion rename putinthis07 NAICS *It doesn't make sense to sum the variables HHindex or concrat4. since *they describe the level of concentration of an industry. Taking the average *of the values is questionable. Instead, when multiple industries are combined *into one, we use the values for HHindex or concrat4 from one of the industries *as a representative value. Previously, the Stata sorting mechanism "randomly" *chose this industry. Now that the do file has been cleaned, files are sorted *differently than before. This section reproduces the former sorting. Since this *process affects only a few industries, picking different industries to act as the *representatives has a negligible effect on the specification results. Nevertheless, *in order to keep files identical over time, we reproduce the "random" choice of the *original file. rename concrat4 old_concrat4 rename HHindex old_HHindex merge m:1 NAICS country Year using raw\originalconcratHHvalues.dta keep if _merge !=2 replace old_concrat4 = concrat4 if concrat4 != . replace old_HHindex = HHindex if HHindex != . drop concrat4 HHindex rename old_concrat4 concrat4 rename old_HHindex HHindex drop _merge duplicates drop order NAICS drop if NAICS == "337122" *this is dropped because it does not exist in the import data (the only such *code in the HAVER data that is completely nonexistent in the import data) sort NAICS country Year ********************************************************************* *We now merge the HAVER data with the import data. merge m:1 NAICS country Year using mod\US_IMP_AGG_CNTRY_long.dta tab _merge /* The _merge==1 observations exist because the IMP data has missing values for certain NAICS country Year groups, and the USITC site simply leaves out any missing values from files rather than marking them as missing. We keep these because we can use the aggregate import data to make certain conclusions about whether IMP is actually 0. In particular, - If, for a certain industry year group, the aggregate import data is almost equal to the sum of all the non-missing countries' imports, then we set any countries with missing imports to have IMP=0. (Here, 'almost equal' means that sum_flag < 4.) - If, for a certain industry year group, the sum of imports over countries was greater than the aggregate imports, then we set any countries with missing imports to have IMP=0. - Any missing data that does not meet either of these conditions has IMP set to missing. As the 'count' command below displays, all missing data is dealt with by the first condition, so the second and third conditions do nothing. The _merge==2 observations exist because HAVER data is missing certain NAICS country groups entirely. We drop these because we can't do anything with observations that don't have the HAVER data. */ drop if _merge == 2 sort NAICS Year egen max_sum_flag = max(sum_flag), by(NAICS Year) replace sum_flag = max_sum_flag if sum_flag == . drop max_sum_flag gen dum_itc_IMP = _merge == 3 drop _merge replace IMP = 0 if sum_flag == 0 & dum_itc_IMP == 0 replace IMP = 0 if sum_flag == 1 & dum_itc_IMP == 0 replace IMP = 0 if sum_flag == 2 & dum_itc_IMP == 0 replace IMP = 0 if sum_flag == 3 & dum_itc_IMP == 0 /*The following seven lines do nothing, since there are no such cases */ /* We include them in case a new data set is used */ replace IMP = 0 if sum_flag == 4 & sum_missing < 0 & dum_itc_IMP == 0 replace IMP = 0 if sum_flag == 5 & sum_missing < 0 & dum_itc_IMP == 0 replace IMP = . if sum_flag == 4 & sum_missing > 0 & dum_itc_IMP == 0 count if IMP == . & sum_flag == 4 & sum_missing > 0 & dum_itc_IMP == 0 replace IMP = . if sum_flag == 5 & sum_missing > 0 & dum_itc_IMP == 0 count if IMP == . & sum_flag == 5 & sum_missing > 0 & dum_itc_IMP == 0 replace IMP = . if IMP == . & sum_flag == -10 sort NAICS gen byte agglvl = length(NAICS) gen NAICS5 = substr(NAICS, 1, 5) egen g5 = group(NAICS5) destring NAICS, replace egen count_g5 = count(NAICS), by(NAICS5 country Year) tostring NAICS, replace egen maxlvl_g5 = max(agglvl), by(g5) gen relalvl = 0 if agglvl == 6 replace relalvl = 1 if agglvl == 5 & relalvl == . & maxlvl_g5 == 6 replace relalvl = 0 if agglvl == 5 & relalvl == . & maxlvl_g5 == 5 egen maxrelvl_g5 = max(relalvl), by(g5) /* relalvl = 1 for 5-digit codes with "child" (6-digit) codes and equals 0 otherwise. */ /* maxrelvl_g5 = 1 if there are both 5-digit and 6-digit codes corresponding to the g5 group in question. It equals 0 otherwise. */ drop sum_missing NAICS5 maxlvl_g5 sum_flag dum_itc_IMP count_g5 save mod\NAICS_CNTRY_IMP_v3.dta, replace ************************************************************************************** * The following codes will creates naics codes with 5 digits followed by an "M" *These codes are equal to the 5 digit aggregate minus the matching 6 digit codes. *They represent the sum of all missing 6 digit codes in a 5 digit aggregate. * NOTE: As was said above, when creating the raw data file NAICS-CONC-CNTRY, * parent data was dropped if all children data were present, in order to remove * redundancy. Thus, all parent codes that have children also have missing children. * So we can simply subtract children from parents without worrying * about producing "m" codes with data equal to zero. use mod\NAICS_CNTRY_IMP_v3.dta, clear count if NAICS == "33661" | NAICS == "336611" | NAICS == "336612" * Since 33661 is a small industry, we drop it and its children to reduce confusion, since * it was accidentally kept in the data set even though it has no missing children. drop if NAICS == "33661" | NAICS == "336611" | NAICS == "336612" keep if maxrelvl_g5 == 1 * drop ones where changes in naics codes happens drop if substr(NAICS, 1, 5) == "33329" | substr(NAICS, 1, 5) == "33399" /// |substr(NAICS, 1, 5) == "33712" |substr(NAICS, 1, 5) == "33712" /// | substr(NAICS, 1, 5) == "33911" drop if relalvl == 1 /*Now our dataset consists of all the 6 digit codes that have "parents". We sum the data by their parents' codes (g5) with the collapse command and then multiply the data by -1 so that we can subtract it from the parents' data */ collapse (sum) ET EE PT PH PE VA MT SH EC EB VN VF VE QE CP CPS CPE I IW IM IF GE IMP, /// by(g5 Year country) foreach var of varlist ET EE PT PH PE VA MT SH EC EB VN VF VE QE CP CPS CPE I IW IM IF /// GE IMP { replace `var' = `var'*(-1) } save mod\NAICS_CNTRY_IMP_subtotal.dta, replace use mod\NAICS_CNTRY_IMP_v3.dta, clear drop if substr(NAICS, 1, 5) == "33329" | substr(NAICS, 1, 5) == "33399" /// | substr(NAICS, 1, 5) == "33712" |substr(NAICS, 1, 5) == "33712" /// | substr(NAICS, 1, 5) == "33911" keep if maxrelvl_g5 == 1 drop if NAICS == "33661" | NAICS == "336611" | NAICS == "336612" keep if relalvl == 1 /*Now our dataset consists of all the 5 digit codes that have children. */ keep g5 NAICS duplicates drop replace NAICS = NAICS + "m" save mod\NAICSm_concordance.dta, replace use mod\NAICS_CNTRY_IMP_v3.dta, clear keep if maxrelvl == 1 drop if NAICS == "33661" | NAICS == "336611" | NAICS == "336612" drop if substr(NAICS, 1, 5) == "33329" | substr(NAICS, 1, 5) == "33399" /// |substr(NAICS, 1, 5) == "33712" |substr(NAICS, 1, 5) == "33712" /// | substr(NAICS, 1, 5) == "33911" keep if relalvl == 1 /*Now our dataset consists of all the 5 digit codes that have children. */ append using mod\NAICS_CNTRY_IMP_subtotal.dta /*Negative children values */ collapse (sum) HHindex concrat4 ET EE PT PH PE VA MT SH EC EB VN VF VE QE CP /// CPS CPE I IW IM IF GE IMP, by(country Year g5) /*By summing the parents with the negative children, we subtract children from parents and get the aggregate of all the children that are missing */ merge m:1 g5 using mod\NAICSm_concordance.dta /* So we can label observations with NAICS+"m" */ drop _merge /*Since we subtracted, we should make sure we didn't create any negative values. */ foreach var of varlist ET EE PT PH PE VA MT SH EC EB VN VF VE QE CP CPS CPE I IW IM IF GE{ di "`var'" count if `var' < 0 & country == "Mexico" list NAICS `var' if `var' < 0 & country == "Mexico" } * after browsing the data, noticed that two modified NAICS code takes on negative values. * they are 33699m & 33461m * Should not care about GE, because GE can take on negative values drop if NAICS == "33699m" | NAICS == "33461m" replace IMP = 0 if IMP <0 & IMP >= -2 count if IMP < 0 foreach var of varlist ET EE PT PH PE VA MT SH EC EB VN VF VE QE CP CPS CPE I IW IM IF IMP{ di "`var'" count if `var' < 0 } * it should show zero all over the place gen newNAICS=1 save mod\NAICS_CNTRY_IMP_newNAICS.dta, replace * Done with creation of NAICS with "m" ************************************************************************************** use mod\NAICS_CNTRY_IMP_v3.dta, clear gen newNAICS=0 append using mod\NAICS_CNTRY_IMP_newNAICS.dta sort g5 country Year drop if agglvl == 6 & substr(NAICS, 1, 5) == "33699" drop if agglvl == 6 & substr(NAICS, 1, 5) == "33461" replace relalvl = 0 if NAICS == "33699" | NAICS == "33461" drop if relalvl == 1 & agglvl == 5 & newNAICS==0 /* Because we already replaced these parents with NAICS with "m" */ drop maxrelvl_g5 relalvl g5 newNAICS agglvl save final\NAICS_CNTRY_IMP_v3_final.dta, replace *STAGE 2 ADD AD ************************************************************************** *Merges data on US antidumping actions with Stage 1 dataset. The antidumping data *gives information on each antidumping case. We need to convert case_id to NAICS by *using the case_id to HS mapping in AD-USA-Products and the HS to NAICS mapping in *hs2naics.txt. This gives us a many-to-many mapping between case_id and NAICS. *Using AD-USA-Products, we match case_id to a year and country, giving us a mapping *between NAICS-country-Year groups and cases (or lack of cases). Most commonly, there *are zero or one cases to a NAICS-country-Year group. However, some NAICS-country-Year *groups correspond to multiple antidumping cases. In these cases, we keep the information *from only one case. We try to keep the case that preserves the most information. * Create HS to NAICS concordance * http://www.census.gov/foreign-trade/schedules/b/2008/imp-code.txt insheet using raw\hs2naics.txt, clear keep v1 v9 rename v1 HS_code rename v9 NAICS drop if HS_code == "MI" * Like previous files, this file disguises 5-digit codes as 6-digit codes by adding * a zero to the end. The codes 23110 and 334510 are the only codes ending with *a zero that should be 6-digit codes. replace NAICS = substr(NAICS, 1, 5) if substr(NAICS, 6, 1) == "0" & /// NAICS != "323110" & NAICS != "334510" destring NAICS, replace ignore("X") *HS codes are already in 10 digit form gen HS10 = HS_code save mod\hs2naics.dta, replace **************************************************************************** * import AD-USA-Master into Stata. **************************************************************************** * AD data is revised quarterly. The data set comes from the first two * tabs of GAD-USA.xls on the following site: * http://econ.worldbank.org/ttbd/gad/ * I removed the first column of the AD-USA-Master tab (it just stated the revision version) * and saved the first two tabs as tab-delimited text files AD-USA-Master and AD-USA-Products. * AD-USA-Master gives a description of all the cases filed. * AD-USA-Products provides the HS product codes for each case. insheet using raw\AD-USA-Master.txt, names clear keep case_id inv_cty_name init_date p_ad_date p_ad_measure f_ad_date /// f_ad_measure p_ad_duty f_ad_duty gen case_filed = 1 * The variable case_filed equals one for country-year-industry groups for which an * antidumping case was filed - that is, it equals one for all country-year-industry * groups that are present in AD-USA-Master.txt. Otherwise, case_filed=0. This will * become clear once this dataset is merged with the Stage 1 datset. drop if case_id == "" rename inv_cty_name country replace country = "Trinidad" if country == "Trinidad & Tobago" save mod\AD-USA-Master.dta, replace **************************************************************************** use mod\AD-USA-Master.dta, clear /* The 49 countries we're using */ keep if country == "Argentina" | country == "Australia" | country == "Austria" /// | country == "Bangladesh" | country == "Belgium" | country == "Brazil" | /// country == "Canada" | country == "Chile"| country == "China"| /// country == "Colombia" | country == "Costa Rica" | country == "Denmark" | /// country == "Ecuador"| country == "Egypt"| country == "El Salvador"| /// country == "Finland"| country == "France"| country == "Germany"| /// country == "Greece"| country == "Hong Kong"| country == "Hungary"| /// country == "India"| country == "Indonesia" | country == "Ireland"| /// country == "Israel"| country == "Italy"| country == "Japan"| /// country == "Kenya"| country == "South Korea"| country == "Malaysia" | /// country == "Mexico"| country == "Netherlands"| country == "New Zealand"| /// country == "Norway"| country =="Peru"| country =="Philippines" |country =="Poland"| /// country == "Portugal"| country == "Singapore"| country == "South Africa"| /// country == "Spain"| country == "Sweden"| country == "Switzerland"| /// country == "Taiwan"| country == "Thailand"| country == "Trinidad"| /// country =="Turkey"| country =="United Kingdom"| country =="Venezuela" *Don't worry that West Germany is dropped, it doesn't fall within our time span anyway gen Year = substr(init_date,-4, . ) destring Year, replace keep if Year >= 1997 & Year <= 2006 *Cases without information on dates and antidumping measures can't be incorporated into *our time-based data set. We count how many observations have this problem: count if f_ad_date=="." & f_ad_measure == "." & f_ad_duty =="." /// & p_ad_date == "." & p_ad_measure == "." & p_ad_duty == "." *Since there are only 85 such observations, it is not too detrimental to drop them. drop if f_ad_date=="." & f_ad_measure == "." & f_ad_duty =="." /// & p_ad_date == "." & p_ad_measure == "." & p_ad_duty == "." drop if f_ad_date=="MI" & f_ad_measure == "MI" & f_ad_duty =="MI" *We never use the numerical value of p_ad_duty; we only check to see *whether p_ad_duty is nonmissing to check for important observations. *We clean it up anyway to make the data easier on the eyes replace p_ad_duty = "6.12" if p_ad_duty == "8.15 (Durum) & 6.12 (Hard)" replace p_ad_duty = substr(p_ad_duty, 1, 6) destring p_ad_duty, replace ignore("(" "f" ) destring f_ad_duty, replace ignore("(Hard only)" "(non-canned)" "MI") *We are primarily concerned with the final AD duty - this will be used *in some of our regression specifications rename f_ad_duty AD_DUTY drop p_ad_date f_ad_date save mod\AD-USA-Master_sample.dta, replace keep case_id save mod\sel_id.dta, replace ********************************************************************************* insheet using raw\AD-USA-Products.txt, clear duplicates drop keep case_id hs_code hs_digits drop if hs_code == . drop if hs_digits == . rename hs_code HS_code rename hs_digits HS_digits *We need to convert HS_code to a string; however, in %10.0g format, Stata *cannot convert to string reversibly. So we first change the format and we *then usedisplayformat when converting to string so as to eliminate *conversion ambiguity. format HS_code %20.0g tostring HS_code, replace usedisplayformat list if length(HS_code) != HS_digits *Some of the HS codes in this data set have a different number of digits than *HS_digits suggests, which is a sign that the HS codes are not all accurate. *A further indication is that not all of the HS codes are real, according to the site *http://www.usitc.gov/publications/docs/tata/hts/bychapter/1100_HTS_delimited.txt *It seems that many of the HS codes have lost leading zeroes, perhaps in transfer. *By using the site above, it is possible to check all HS codes that have digit *discrepancies to make sure they exist. The following changes were made *using that site. *One additional problem: 3021200029 cannot be found in the site or in a general *internet search, even when preceded with a zero. This makes sense in light of *the fact that notes="was converted to 0302120003 and 0302120004". Since *the former represents farmed Atlantic salmon and the latter represents Atlantic *salmon that is not farmed, I replaced this with the former, since farmed Atlantic *salmon represents a larger part of US consumption. replace HS_digits = 10 if substr(HS_code, 9, 2) == "00" & length(HS_code) == 10 *codes that are formatted in this way are most likely correct replace HS_digits = 6 if HS_code == "481032" | HS_code == "481039" | HS_code == "481092" replace HS_digits = 10 if HS_code == "4810142010" | HS_code == "4810142090" | HS_code == "4810192010" | /// HS_code == "4810192090" | HS_code == "7411101030" | HS_code == "7411101090" *these codes are correct but HS_digits is wrong replace HS_code = "0302120003" if HS_code == "3021200029" replace HS_digits = 10 if HS_code == "0302120003" *Farmed Atlantic salmon, see above replace HS_code = "0" + HS_code if length(HS_code) != HS_digits & (substr(HS_code, 1, 3) == "102" | /// substr(HS_code, 1, 3) == "103" | /// substr(HS_code, 1, 3) == "207" | substr(HS_code, 1, 3) == "302" | substr(HS_code, 1, 3) == "304" | /// substr(HS_code, 1, 3) == "306" | substr(HS_code, 1, 3) == "307" | /// substr(HS_code, 1, 3) == "401" | substr(HS_code, 1, 3) == "409" | substr(HS_code, 1, 3) == "603" | /// substr(HS_code, 1, 3) == "702" | /// substr(HS_code, 1, 3) == "703" | substr(HS_code, 1, 3) == "710" | substr(HS_code, 1, 3) == "711" | /// substr(HS_code, 1, 3) == "802" | substr(HS_code, 1, 3) == "806" | substr(HS_code, 1, 3) == "810" | /// substr(HS_code, 1, 3) == "811") replace HS_code = "0" + HS_code if HS_code == "7020020" | HS_code == "7020040" | HS_code == "7020060" list if length(HS_code) != HS_digits //All should be fixed now save mod\AD-USA-Products.dta, replace use mod\AD-USA-Products.dta, clear merge m:1 case_id using mod\sel_id.dta keep if _merge == 3 /*We only keep the cases that are included in AD-USA-Master */ drop _merge *We convert all codes to 10 digit codes by adding zeroes to the end tab HS_digits gen HS10 = HS_code replace HS10 = HS_code + "00" if HS_digits == 8 replace HS10 = HS_code + "0000" if HS_digits == 6 replace HS10 = HS_code + "000000" if HS_digits == 4 sort HS_code save mod\AD-USA-Products_sample.dta, replace * strategy: * round 1: merge all, * round 2: if HS_digits < 10, do follow(many mini steps); merge on first 4, 6, 7, 8, 9 * digits. * round 3: if HS_digits == 10, if there exists HS10_g > HS and HS10_l < HS, then * if both HS10_g and HS10_l correpond to the same HS. then they took the same value. use mod\hs2naics.dta, clear gen HS4 = substr(HS_code, 1, 4) gen HS6 = substr(HS_code, 1, 6) gen HS8 = substr(HS_code, 1, 8) save mod\hs2naics_HS_multi.dta, replace /*Try to merge on 10 digits */ ****************************************** use mod\AD-USA-Products_sample.dta, clear duplicates report case_id merge m:1 HS10 using mod\hs2naics.dta rename _merge merge0 save mod\sel_id_hs2naics_r10_all.dta, replace use mod\sel_id_hs2naics_r10_all.dta, clear keep if merge0 == 3 *These are 'good' observations, they match an HS from the concordance exactly save mod\sel_id_hs2naics_r10_good.dta, replace use mod\sel_id_hs2naics_r10_all.dta, clear keep if merge0 == 1 *These are 'bad' observations, they did not match an HS from the concordance *NAICS are all missing because concordance did not work on these drop NAICS save mod\sel_id_hs2naics_r10_bad.dta, replace ****************************************** /*Try to merge on 8 digits */ ****************************************** use mod\sel_id_hs2naics_r10_bad.dta, clear gen HS8 = substr(HS_code, 1, 8) joinby HS8 using mod\hs2naics_HS_multi.dta, unmatched(both) rename _merge merge8 tab merge8 drop if merge8==2 drop HS4 HS6 HS8 save mod\sel_id_hs2naics_r8_all.dta, replace keep if merge8 == 3 save mod\sel_id_hs2naics_r8_good.dta, replace use mod\sel_id_hs2naics_r8_all.dta, clear keep if merge8 == 1 drop NAICS save mod\sel_id_hs2naics_r8_bad.dta, replace ****************************************** /*Try to merge on 6 digits */ ****************************************** use mod\sel_id_hs2naics_r8_bad.dta, clear gen HS6 = substr(HS_code, 1, 6) joinby HS6 using mod\hs2naics_HS_multi.dta, unmatched(both) rename _merge merge6 tab merge6 drop if merge6==2 drop HS4 HS6 HS8 save mod\sel_id_hs2naics_r6_all.dta, replace keep if merge6 == 3 save mod\sel_id_hs2naics_r6_good.dta, replace use mod\sel_id_hs2naics_r6_all.dta, clear keep if merge6== 1 drop NAICS save mod\sel_id_hs2naics_r6_bad.dta, replace ****************************************** /*Try to merge on 4 digits */ ****************************************** use mod\sel_id_hs2naics_r6_bad.dta, clear gen HS4= substr(HS_code, 1, 4) joinby HS4 using mod\hs2naics_HS_multi.dta, unmatched(both) rename _merge merge4 tab merge4 *All have been merged - we are done! drop if merge4==2 drop HS4 HS6 HS8 save mod\sel_id_hs2naics_r4_good.dta, replace ****************************************** use mod\sel_id_hs2naics_r10_good.dta, clear forvalues i = 4(2)8{ append using mod\sel_id_hs2naics_r`i'_good.dta } keep case_id NAICS save mod\sel_id_hs2naics.dta, replace *Create many-to-many concordance of case_id to NAICS use mod\AD-USA-Products_sample.dta, clear keep case_id duplicates drop merge 1:m case_id using mod\sel_id_hs2naics.dta drop _merge duplicates report duplicates drop save mod\id2naics.dta, replace use mod\AD-USA-Master_sample.dta, clear merge 1:m case_id using mod\id2naics.dta drop _merge init_date order country Year NAICS tostring NAICS, replace keep if substr(NAICS, 1, 2) == "31" | substr(NAICS, 1, 2) == "32" | substr(NAICS, 1, 2) == "33" duplicates report egen num_case = count(Year), by (country Year NAICS) egen g1 = group(country Year NAICS) * Here we want to drop duplicating observations in terms of * NAICS country Year, referred to as g1. * ind_keep = 1 for observations that should be kept in the dataset. gen ind_keep = . * if a group only has one AD imposed, keep it. 267 of the 332 * observations with AD cases only have one AD case. replace ind_keep = 1 if num_case == 1 * If a group has more than one AD case imposed, * but only one case has a nonmissing AD_DUTY, * keep the one with the nonmissing AD_DUTY egen num_F_AD = count(AD_DUTY), by(g1) egen num_P_AD = count(p_ad_duty), by(g1) replace ind_keep = 1 if num_case > 1 & num_F_AD == 1 & AD_DUTY != . replace ind_keep = 0 if num_case > 1 & num_F_AD == 1 & AD_DUTY == . * There are 51 observations left that need to be considered, and they * correspond to 22 NAICS-country-Year groups. We must keep one * observation for each group and drop the others. We choose these 22 * using a random seed. Since this code was cleaned in trivial ways * after this code was used to produce the results for the 2011 paper submission, * the dataset is sorted differently than it was before. The following lines of code * were added to maintain the random picks from the 2011 submission. When * using this code in the future to produce new results, this section can be deleted. replace ind_keep = 1 if case_id == "USA-AD-791" replace ind_keep = 0 if case_id == "USA-AD-801" replace ind_keep = 1 if case_id == "USA-AD-814" & substr(NAICS, 1, 5)=="32541" replace ind_keep = 0 if case_id == "USA-AD-828" & substr(NAICS, 1, 5)=="32541" replace ind_keep = 1 if case_id == "USA-AD-990" & substr(NAICS, 1, 5)=="33291" replace ind_keep = 0 if case_id == "USA-AD-1021" & substr(NAICS, 1, 5)=="33291" replace ind_keep = 1 if case_id == "USA-AD-1070a" & (substr(NAICS, 1, 5)=="32212" | NAICS=="322291" | substr(NAICS, 1, 5)=="33999") replace ind_keep = 0 if case_id == "USA-AD-1070b" & (substr(NAICS, 1, 5)=="32212" | NAICS=="322291" | substr(NAICS, 1, 5)=="33999") replace ind_keep = 1 if case_id == "USA-AD-1070b" & NAICS=="322299" replace ind_keep = 0 if case_id == "USA-AD-1070a" & NAICS=="322299" replace ind_keep = 1 if case_id == "USA-AD-1070b" & substr(NAICS, 1, 5)=="32799" replace ind_keep = 0 if case_id == "USA-AD-1070a" & substr(NAICS, 1, 5)=="32799" replace ind_keep = 1 if case_id == "USA-AD-891" & substr(NAICS, 1, 5)=="33111" & country=="China" & Year==2000 replace ind_keep = 0 if case_id == "USA-AD-899" & substr(NAICS, 1, 5)=="33111" & country=="China" & Year==2000 replace ind_keep = 0 if case_id == "USA-AD-874" & substr(NAICS, 1, 5)=="33111" & country=="China" & Year==2000 replace ind_keep = 1 if case_id == "USA-AD-854" & substr(NAICS, 1, 5)=="33111" & country=="South Korea" & Year==1999 replace ind_keep = 0 if case_id == "USA-AD-821" & substr(NAICS, 1, 5)=="33111" & country=="South Korea" & Year==1999 replace ind_keep = 1 if case_id == "USA-AD-793" & substr(NAICS, 1, 5)=="33111" & country=="Taiwan" & Year==1998 replace ind_keep = 0 if case_id == "USA-AD-803" & substr(NAICS, 1, 5)=="33111" & country=="Taiwan" & Year==1998 replace ind_keep = 1 if AD_DUTY==47.86 & substr(NAICS, 1, 5)=="33111" & country=="Indonesia" & Year==2000 replace ind_keep = 0 if AD_DUTY==60.46 & substr(NAICS, 1, 5)=="33111" & country=="Indonesia" & Year==2000 replace ind_keep = 1 if AD_DUTY==49.86 & substr(NAICS, 1, 5)=="33111" & country=="Mexico" & Year==2001 replace ind_keep = 0 if AD_DUTY==20.11 & substr(NAICS, 1, 5)=="33111" & country=="Mexico" & Year==2001 replace ind_keep = 1 if case_id == "USA-AD-889" & NAICS == "331111" & country=="South Korea" & Year == 2000 replace ind_keep = 0 if case_id == "USA-AD-877" & NAICS == "331111" & country=="South Korea" & Year==2000 replace ind_keep = 1 if case_id == "USA-AD-942" & NAICS == "331111" & country == "Taiwan" & Year == 2001 replace ind_keep = 0 if case_id == "USA-AD-980" & NAICS == "331111" & country == "Taiwan" & Year == 2001 replace ind_keep = 1 if case_id == "USA-AD-799" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-790" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-807" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-800" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-820" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-847a" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-847b" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-860" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-833" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-859" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-853" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-1070a" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-1070b" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-935" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-943" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-968" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-977" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-940" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-769" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-764" & ind_keep == . /*End of section that can be deleted */ list g1 case_id p_ad_duty AD_DUTY num_case num_F_AD num_P_AD if ind_keep == . /* This next section of code is for drawing the random observations among the duplicate NAICS-country-Year groups. This code must be included in the program if the hard-coded instructions to keep and remove observations (above) are deleted. */ /* set seed 999 by g1, sort: gen draw = runiform() + 1 if ind_keep == . gsort g1 -draw by g1, sort: gen num = _n if ind_keep == . replace ind_keep = 1 if num == 1 & ind_keep == . replace ind_keep = 0 if num > 1 & ind_keep == . */ keep if ind_keep == 1 keep country Year NAICS case_id case_filed AD_DUTY sort country Year NAICS save mod\imposed_AD.dta, replace use mod\AD-USA-Master_sample.dta, clear merge 1:m case_id using mod\id2naics.dta drop _merge init_date tostring NAICS, replace keep if substr(NAICS, 1, 2) == "31" | substr(NAICS, 1, 2) == "32" | substr(NAICS, 1, 2) == "33" gen NAICS5 = substr(NAICS, 1, 5) order country Year NAICS5 drop NAICS duplicates drop egen num_case = count(Year), by (country Year NAICS5) egen g5 = group(country Year NAICS5) * Here we want to drop duplicating observations in terms of * NAICS5 country Year, referred to as g5. For this section of code, the * duplicate observations occur within the more aggregated NAICS 5 digit * industries. * ind_keep = 1 for observations that should be kept in the dataset. gen ind_keep = . * if a group only has one AD imposed, keep it. 267 of the 332 * observations with AD cases only have one AD case. replace ind_keep = 1 if num_case == 1 * If a group has more than one AD case imposed, * but only one case has a nonmissing AD_DUTY, * keep the one with the nonmissing AD_DUTY egen num_F_AD = count(AD_DUTY), by(g5) egen num_P_AD = count(p_ad_duty), by(g5) replace ind_keep = 1 if num_case > 1 & num_F_AD == 1 & AD_DUTY != . replace ind_keep = 0 if num_case > 1 & num_F_AD == 1 & AD_DUTY == . * There are 46 observations left that need to be considered, and they * correspond to 20 NAICS5-country-Year groups. We must keep one * observation for each group and drop the others. We choose these 20 * using a random seed. Since this code was cleaned in trivial ways * after this code was used to produce the results for the 2011 paper submission, * the dataset is sorted differently than it was before. The following lines of code * were added to maintain the random picks from the 2011 submission. When * using this code in the future to produce new results, this section can be deleted. replace ind_keep = 1 if case_id == "USA-AD-791" replace ind_keep = 0 if case_id == "USA-AD-801" replace ind_keep = 1 if case_id == "USA-AD-814" & NAICS5=="32541" replace ind_keep = 0 if case_id == "USA-AD-828" & NAICS5=="32541" replace ind_keep = 1 if case_id == "USA-AD-990" & NAICS5=="33291" replace ind_keep = 0 if case_id == "USA-AD-1021" & NAICS5=="33291" replace ind_keep = 1 if case_id == "USA-AD-1070a" & (NAICS5=="32212" | NAICS5=="33999") replace ind_keep = 0 if case_id == "USA-AD-1070b" & (NAICS5=="32212" | NAICS5=="33999") replace ind_keep = 1 if case_id == "USA-AD-1070b" & NAICS5=="32799" replace ind_keep = 0 if case_id == "USA-AD-1070a" & NAICS5=="32799" replace ind_keep = 1 if case_id == "USA-AD-891" & NAICS5=="33111" & country=="China" & Year==2000 replace ind_keep = 0 if case_id == "USA-AD-899" & NAICS5=="33111" & country=="China" & Year==2000 replace ind_keep = 0 if case_id == "USA-AD-874" & NAICS5=="33111" & country=="China" & Year==2000 replace ind_keep = 1 if case_id == "USA-AD-854" & NAICS5=="33111" & country=="South Korea" & Year==1999 replace ind_keep = 0 if case_id == "USA-AD-821" & NAICS5=="33111" & country=="South Korea" & Year==1999 replace ind_keep = 1 if AD_DUTY==22.89 & NAICS5=="33111" & country=="South Korea" & Year==2000 replace ind_keep = 0 if AD_DUTY==40.21 & NAICS5=="33111" & country=="South Korea" & Year==2000 replace ind_keep = 1 if case_id == "USA-AD-793" & NAICS5=="33111" & country=="Taiwan" & Year==1998 replace ind_keep = 0 if case_id == "USA-AD-803" & NAICS5=="33111" & country=="Taiwan" & Year==1998 replace ind_keep = 1 if AD_DUTY==47.86 & NAICS5=="33111" & country=="Indonesia" & Year==2000 replace ind_keep = 0 if AD_DUTY==60.46 & NAICS5=="33111" & country=="Indonesia" & Year==2000 replace ind_keep = 1 if AD_DUTY==49.86 & NAICS5=="33111" & country=="Mexico" & Year==2001 replace ind_keep = 0 if AD_DUTY==20.11 & NAICS5=="33111" & country=="Mexico" & Year==2001 replace ind_keep = 1 if AD_DUTY==198.08 & NAICS5=="33712" & country=="China" & Year==2003 replace ind_keep = 0 if AD_DUTY==157.63 & NAICS5=="33712" & country=="China" & Year==2003 replace ind_keep = 1 if AD_DUTY==266.83 & NAICS5=="32223" & country=="China" & Year==2004 replace ind_keep = 0 if AD_DUTY==112.64 & NAICS5=="32223" & country=="China" & Year==2004 replace ind_keep = 1 if case_id=="USA-AD-942" & NAICS=="33111" & country=="Taiwan" & Year==2001 replace ind_keep = 0 if case_id=="USA-AD-980" & NAICS=="33111" & country=="Taiwan" & Year==2001 replace ind_keep = 1 if case_id=="USA-AD-769" & NAICS=="33111" & country=="Germany" & Year==1997 replace ind_keep = 0 if case_id=="USA-AD-764" & NAICS=="33111" & country=="Germany" & Year==1997 replace ind_keep = 1 if case_id == "USA-AD-799" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-790" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-807" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-800" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-820" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-847a" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-847b" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-860" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-833" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-859" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-853" & ind_keep == . replace ind_keep = 1 if case_id == "USA-AD-1070a" & ind_keep == . replace ind_keep = 0 if case_id == "USA-AD-1070b" & ind_keep == . /*End of section that can be deleted */ list g5 case_id p_ad_duty AD_DUTY num_case num_F_AD num_P_AD if ind_keep == . /* This next section of code is for drawing the random observations among the duplicate NAICS-country-Year groups. This code must be included in the program if the hard-coded instructions to keep and remove observations (above) are deleted. */ /* by g5, sort: gen draw = runiform() + 1 if ind_keep == . gsort g5 -draw by g5, sort: gen num = _n if ind_keep == . replace ind_keep = 1 if num == 1 & ind_keep == . replace ind_keep = 0 if num > 1 & ind_keep == . */ keep if ind_keep == 1 keep country Year NAICS5 case_id case_filed AD_DUTY sort country Year NAICS5 save mod\NAICS5_imposed_AD.dta, replace *Merge antidumping data with Stage 1 data. We first merge the 6-digit *antidumping dataset, then the 5-digit dataset. use final\NAICS_CNTRY_IMP_v3_final.dta, clear merge m:1 country Year NAICS using mod\imposed_AD.dta *Some of the nonmatches from imposed_AD occur because Stage 3 is sometimes *more aggregated (5-digit). However, there are some cases that refer to *NAICS-country-Year groups that are not in the Stage 1 dataset. *(For example, there is a case for 331315, but this code is not in the *original dataset.) * AD_imposed refers to final AD_duty gen AD_imposed = . replace AD_imposed = 1 if _merge == 3 & AD_DUTY != . replace AD_imposed = 0 if _merge == 3 & AD_DUTY == . save mod\NAICS_CNTRY_IMP_AD_r1.dta, replace use mod\NAICS_CNTRY_IMP_AD_r1.dta, clear keep if _merge == 3 drop _merge save mod\NAICS_CNTRY_IMP_AD_r1_matched.dta, replace use mod\NAICS_CNTRY_IMP_AD_r1.dta, clear keep if _merge == 2 drop _merge keep country Year NAICS case_filed gen NAICS5 = substr(NAICS, 1, 5) drop NAICS duplicates drop sort country Year NAICS5 merge 1:1 country Year NAICS5 using mod\NAICS5_imposed_AD.dta tab _merge drop if _merge == 2 drop _merge sort country Year NAICS5 save mod\NAICS_CNTRY_IMP_AD_r2_using.dta, replace use mod\NAICS_CNTRY_IMP_AD_r1.dta, clear keep if _merge == 1 drop _merge case_id AD_DUTY case_filed AD_imposed gen NAICS5 = substr(NAICS, 1, 5) merge m:1 country Year NAICS5 using mod\NAICS_CNTRY_IMP_AD_r2_using.dta gen AD_imposed = 1 if _merge == 3 & AD_DUTY != . replace AD_imposed = 0 if _merge == 3 & AD_DUTY == . drop _merge append using mod\NAICS_CNTRY_IMP_AD_r1_matched.dta replace AD_imposed = 0 if AD_imposed == . drop NAICS5 save final\NAICS_CNTRY_IMP_AD_final.dta, replace *STAGE 3 ADD ELASTICITIES *We merge elasticity data with the Stage 2 dataset. We merge the *Stage 2 dataset with the sigma file, matching on 3-digit HS codes. We *merge the result with the omega file on 4-digit HS codes. *Some NAICS are missing omega data. *Elasticity data comes from Broda, Limao, and Weinstein (2008) and *Broda, Greenfield, and Weinstein (2006) *http://www.columbia.edu/~dew35/TradeElasticities/TradeElasticities.html *Sigma elasticity data for 3-digit HS codes use raw\Sigmas73countries_9403_HS3digit.dta, clear rename cname country keep if country == "USA" drop country ccode rename hs_3digit HS3 tostring HS3, replace replace HS3 = "0" + HS3 if length(HS3) == 2 save mod\Sigmas_US_HS3digit.dta, replace *Match sigmas with NAICS codes using our concordance of *NAICS and HS10 use mod\hs2naics.dta, clear tostring NAICS, replace gen HS3 = substr(HS_code, 1, 3) merge m:1 HS3 using mod\Sigmas_US_HS3digit.dta *One NAICS code in hs2naics can map to multiple HS codes. Not all *of the HS codes have a match in Sigmas_US_HS3digit, but all NAICS codes *that we use have at least one HS that has a Sigma match. drop _merge HS10 HS3 HS_code save mod\Sigmas_NAICS_HS.dta, replace use mod\Sigmas_NAICS_HS.dta, clear collapse (min) sig_min = sigma (max) sig_max = sigma (median) sig_med = sigma, by(NAICS) keep if length(NAICS) == 5 | length(NAICS) == 6 save mod\Sigmas_NAICS_HS_r1.dta, replace *This file is the list of sigmas matched to 6-digit NAICS use mod\Sigmas_NAICS_HS_r1.dta, clear keep if length(NAICS) == 6 save mod\Sigmas_NAICS_HS_lvl6.dta, replace *This file is the list of sigmas matched to 5-digit NAICS. This includes *sigmas that matched directly to 5-digit NAICS and sigmas that matched *to 6-digit NAICS but were collapsed to the 5-digit NAICS level. use mod\Sigmas_NAICS_HS.dta, clear gen NAICS5 = substr(NAICS, 1, 5) drop if NAICS5 == NAICS collapse (min) sig_min = sigma (max) sig_max = sigma (median) sig_med = sigma, by(NAICS5) rename NAICS5 NAICS append using mod\Sigmas_NAICS_HS_r1.dta drop if length(NAICS) == 6 rename NAICS NAICS5 save mod\Sigmas_NAICS_HS_lvl5.dta, replace *We now merge Stage 2 data with the Sigmas data. We first merge *with 6-digit NAICS sigmas. use final\NAICS_CNTRY_IMP_AD_final.dta, clear merge m:1 NAICS using mod\Sigmas_NAICS_HS_lvl6.dta drop if _merge == 2 save mod\NAICS_CNTRY_IMP_AD_sigma_merge1.dta, replace *We now merge the unmatched Stage 2 data with 5-digit NAICS sigmas. use mod\NAICS_CNTRY_IMP_AD_sigma_merge1.dta, clear keep if _merge == 1 drop sig_min sig_max sig_med _merge gen NAICS5 = substr(NAICS, 1, 5) merge m:1 NAICS5 using mod\Sigmas_NAICS_HS_lvl5.dta keep if substr(NAICS, 1, 2)=="31" | substr(NAICS, 1, 2)=="32" | substr(NAICS, 1, 2)=="33" tab _merge drop _merge save mod\NAICS_CNTRY_IMP_AD_sigma_merge2.dta, replace use mod\NAICS_CNTRY_IMP_AD_sigma_merge1.dta, clear keep if _merge == 3 drop _merge append using mod\NAICS_CNTRY_IMP_AD_sigma_merge2.dta drop NAICS5 save mod\NAICS_CNTRY_IMP_AD_M_elast.dta, replace *We now do the same thing with omegas use raw\OmegasUSA_94-03_HS4digit.dta, clear keep hscode4 Export_Elasticity rename hscode4 HS4 rename Export_Elasticity Omega tostring HS4, replace replace HS4 = "0" + HS4 if length(HS4) == 3 save mod\Omegas_US_HS4digit.dta, replace use mod\hs2naics.dta, clear tostring NAICS, replace gen HS4 = substr(HS_code, 1, 4) merge m:1 HS4 using mod\Omegas_US_HS4digit.dta drop _merge HS10 HS4 HS_code save mod\Omegas_NAICS_HS.dta, replace use mod\Omegas_NAICS_HS.dta, clear collapse (min) ome_min = Omega (max) ome_max = Omega (median) ome_med = Omega, by(NAICS) *Unlike with the sigmas, some NAICS codes have no omegas. Specifically, *311512, 31152, 326191, 32732, 332115, 333311, 335912, 336413, 336419. *Of these, 31152 and 32732 are in our data set keep if length(NAICS) == 5 | length(NAICS) == 6 save mod\Omegas_NAICS_HS_r1.dta, replace use mod\Omegas_NAICS_HS_r1.dta, clear keep if length(NAICS) == 6 save mod\Omegas_NAICS_HS_lvl6.dta, replace use mod\Omegas_NAICS_HS.dta, clear gen NAICS5 = substr(NAICS, 1, 5) drop if NAICS5 == NAICS collapse (min) ome_min = Omega (max) ome_max = Omega (median) ome_med = Omega, by(NAICS5) *33211 has no omegas rename NAICS5 NAICS append using mod\Omegas_NAICS_HS_r1.dta drop if length(NAICS) == 6 rename NAICS NAICS5 save mod\Omegas_NAICS_HS_lvl5.dta, replace use mod\NAICS_CNTRY_IMP_AD_M_elast.dta, clear merge m:1 NAICS using mod\Omegas_NAICS_HS_lvl6.dta drop if _merge == 2 save mod\NAICS_CNTRY_IMP_AD_M_omega_merge1.dta, replace use mod\NAICS_CNTRY_IMP_AD_M_omega_merge1.dta, clear keep if _merge == 1 drop _merge drop ome_min ome_max ome_med gen NAICS5 = substr(NAICS, 1, 5) merge m:1 NAICS5 using mod\Omegas_NAICS_HS_lvl5.dta *All are matched, but 31152, 32732, and 33211 still don't have omegas, so they *will be left out of certain regression specifications drop if _merge==2 drop _merge save mod\NAICS_CNTRY_IMP_AD_M_omega_merge2.dta, replace use mod\NAICS_CNTRY_IMP_AD_M_omega_merge1.dta, clear keep if _merge == 3 drop _merge append using mod\NAICS_CNTRY_IMP_AD_M_omega_merge2.dta drop NAICS5 save final\NAICS_CNTRY_IMP_AD_elast.dta, replace *STAGE 4 ADD EXCHANGE RATES * We merge country-Year exchange rate data with our Stage 3 data. * All EU countries are given the same EU exchange rate. * The EU countries are as follows: Austria, Belgium, Denmark, * Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, * the Netherlands, Portugal, Spain, Sweden and the United Kingdom. * Our data set does not include Luxembourg. *http://www.ers.usda.gov/Data/ExchangeRates/ insheet using raw\real_exchange_rate.txt, clear rename v1 country keep country year* drop if country == "Austria" | country == "Belgium" | country == "Denmark" | /// country == "Finland" | country == "France" | country == "Germany" | country == "Greece" | /// country == "Ireland" | country == "Italy" | country == "Netherlands" | country == "Portugal" | /// country == "Spain" | country == "Sweden" | country == "United Kingdom" replace country = "Trinidad" if country == "Trin/Tobago" replace country = "China" if country == "China,PR" replace country = "EU" if country == "EU 15" replace country = "South Korea" if country == "Korea" foreach i of numlist 1996(1)2006{ rename year`i' EXR`i' } duplicates drop reshape long EXR, i(country) j(Year) drop if Year == 1996 save mod\EXR.dta, replace use mod\EXR.dta, clear keep if country=="EU" drop country gen dum_EU = 1 save mod\EXR_EUonly, replace use final\NAICS_CNTRY_IMP_AD_elast.dta, clear *Merge all but EU first merge m:1 country Year using mod\EXR.dta *EXR data has countries that are not in our dataset drop if _merge==2 drop _merge *Now merge EU countries gen dum_EU = (country == "Austria" | country == "Belgium" | country == "Denmark" | /// country == "Finland" | country == "France" | country == "Germany" | country == "Greece" | /// country == "Ireland" | country == "Italy" | country == "Netherlands" | country == "Portugal" | /// country == "Spain" | country == "Sweden" | country == "United Kingdom") *Put aside EXR variable to make room for EXR from the next merge rename EXR EXR_old merge m:1 dum_EU Year using mod\EXR_EUonly *Now bring EXR back replace EXR = EXR_old if EXR==. drop _merge dum_EU EXR_old save final\NAICS_CNTRY_IMP_AD_elast_EXR.dta, replace *STAGE 5 use raw\SG_NAICS_COMB.dta, clear drop case_id SG_tariff rename year Year duplicates drop save mod\SG_NAICS_COMB_mod.dta, replace use final\NAICS_CNTRY_IMP_AD_elast_EXR.dta, clear merge m:1 NAICS country Year using mod\SG_NAICS_COMB_mod.dta replace SG_indicator=0 if _merge==1 rename SG_indicator SG_imposed gen AD_or_SG_imposed=0 replace AD_or_SG_imposed=1 if AD_imposed==1 | SG_imposed==1 drop _merge sort country Year NAICS save final\NAICS_CNTRY_IMP_AD_elast_EXR_SG.dta, replace *This is the final data file. This file is the only file needed to run the *variable creation do file and the subsequent regression do file. *Now run SETA_variable_creation_05-01-2012.do