$C
*************************************************************

SANOASW: A PROGRAM FOR THE ESTIMATION OF THE ANALYSIS-OF-ASSOCIATION
         MODEL, BASED ON THE PRESENCE OF A LINEAR-BY-LINEAR
         INTERACTION FOR MULTIWAY TABLES.

WRITTEN BY  HERBERT L. SMITH
            POPULATION STUDIES CENTER
            UNIVERSITY OF PENNSYLVANIA
            3718 LOCUST WALK
            PHILADELPHIA, PENNSYLVANIA  USA  19104-6298

CURRENT EDITION: 1987.

IF YOU USE THIS PROGRAM, OR ANY MODIFICATION THEREOF, PLEASE CITE IT
IN ANY WRITTEN WORK.  COMMENTS ON THE PROGRAM ARE WELCOME, AS ARE
APPLICATIONS TO DATA.
$ECHO
$C
**OVERVIEW:

THIS PROGRAM ESTIMATES A SET OF ROW SCORES AND COLUMN SCORES FOR AN
I-BY-J-BY-K TABLE.  THE METHOD USED IS THE SUCCESSIVE ESTIMATION OF
FIRST COLUMN SCORES (GIVEN FIXED ROW SCORES), THEN ROW SCORES (GIVEN
FIXED COLUMN SCORES), UNTIL CONVERGENCE IS REACHED.  THIS METHOD IS
SUGGESTED BY GOODMAN (1979, JOURNAL OF THE AMERICAN STATISTICAL
ASSOCIATION) AND FURTHER DESCRIBED BY BREEN (1984, SOCIOLOGICAL
METHODS AND RESEARCH).  THE EXTENSION TO MULTIWAY TABLES FOLLOWS THE
LEAD OF CLOGG (1982, JOURNAL OF THE AMERICAN STATISTICAL
ASSOCIATION).  AN APPLICATION WITH INTERPRETATION OF PARAMETERS IS
PROVIDED BY SMITH AND GARNIER (1986, SOCIOLOGICAL METHODS AND
RESEARCH).

BOTH ROW SCORES AND COLUMN SCORES ARE CONSTRAINED TO HAVE MEAN ZERO
AND STANDARD DEVIATION ONE.  SEE SMITH AND GARNIER (CITED ABOVE),
CLOGG (1982, AMERICAN JOURNAL OF SOCIOLOGY), AND GOODMAN (1985) FOR
COMMENTS RELATED TO THE APPLICATION OF ROW AND COLUMN SCORE
RESTRICTIONS.

THIS PROGRAM MUST BE 'CALLED' FROM ANOTHER 'COMMAND' GLIM PROGRAM.
THE 'COMMAND' PROGRAM SHOULD CONTAIN THE FOLLOWING STATEMENTS:

        CALC %R = 4
        CALC %C = 4
        CALC %G = 2
        INPUT 10
        STOP

WHERE '%R' IS ASSIGNED THE NUMBER OF ROWS (E.G., 4): '%C' IS THE
NUMBER OF COLUMNS (AGAIN, E.G., 4): '%G' IS THE NUMBER OF TABLES
ACROSS WHICH HOMOGENEOUS SETS OF ROW AND COLUMN SCORES ARE ASSUMED
TO APPLY: AND THIS PROGRAM IS ON DEVICE 10.
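
AS AN ILLUSTRATION (THE DIMENSIONS HERE ARE HYPOTHETICAL), THE FACTOR
DEFINITIONS IN THE BODY OF THE PROGRAM IMPLY THAT THE COUNTS READ
FROM DEVICE 12 SHOULD BE ORDERED WITH COLUMNS VARYING FASTEST, THEN
ROWS, THEN TABLES.  FOR 3 ROWS, 3 COLUMNS AND 2 TABLES (18 COUNTS)
THE CELLS WOULD BE READ AS:

   COUNT   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
   ROW     1  1  1  2  2  2  3  3  3  1  1  1  2  2  2  3  3  3
   COL     1  2  3  1  2  3  1  2  3  1  2  3  1  2  3  1  2  3
   LAY     1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2

THIS READING ASSUMES THAT THE GLIM FUNCTION '%GL(K,N)' GENERATES
LEVELS 1 TO K IN BLOCKS OF N: CHECK IT AGAINST THE LOCAL GLIM
DOCUMENTATION BEFORE PREPARING THE DATA FILE.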

NOTE THAT THIS PROGRAM IS SET UP TO FIT HOMOGENEOUS ROW AND COLUMN
SCORES TO MULTIPLE GROUPS OF TABLES.  IF THE USER HAS ONLY A SINGLE
TABLE, THEN '%G' SHOULD BE SET EQUAL TO 1 IN THE 'COMMAND' GLIM
PROGRAM.  IF THE USER WISHES TO SPECIFY HETEROGENEOUS ROW AND COLUMN
EFFECTS ACROSS TABLES, THEN EACH TABLE SHOULD BE INPUT SEPARATELY,
AND THE SANOASW PROGRAM INVOKED ON EACH OCCASION.  THE SUM OF THE
LIKELIHOOD-RATIO CHI-SQUARED STATISTICS ACROSS TABLES WILL BE
EQUIVALENT TO THAT OBTAINED IF THE SEVERAL TABLES WERE INPUT
SIMULTANEOUSLY AND PHI COEFFICIENTS AND ROW AND COLUMN SCORES WERE
ALLOWED TO INTERACT ACROSS GROUPS: AND SIMILARLY FOR CALCULATION OF
DEGREES OF FREEDOM.  SEE CLOGG (1982, JASA) FOR RELATED POINTS.

******************** MACRO MAIN ********************

THIS MACRO SUCCESSIVELY ITERATES ROW AND COLUMN SCORES, AS DESCRIBED
ABOVE.  THE LOGIC OF THIS ROUTINE IS SUGGESTED BY GOODMAN (1979,
JASA) AND DESCRIBED BY BREEN (CITED ABOVE).  THIS GLIM MACRO MAKES
USE OF THE SYSTEM VECTOR '%PE' (PARAMETER ESTIMATES FROM THE
PRECEDING 'FIT') AND SYSTEM SCALAR '%PL' (THE LENGTH OF THE '%PE'
VECTOR) TO TRANSFORM SETS OF ESTIMATED ROW OR COLUMN COEFFICIENTS
INTO FIXED VARIATES FOR THE NEXT STEP OF THE ESTIMATION PROCEDURE.
$MAC MAIN!
$CALC %O = %DV $
$FIT ROW+COL+LAY+(ROW*LAY)+(COL*LAY)+(RS*COL) $
$EXTRACT %PE!
$CALC CS = %IF(%GT(COL,1),%PE(%PL-(%C-COL)),0) $
$FIT ROW+COL+LAY+(ROW*LAY)+(COL*LAY)+(CS*ROW) $
$EXTRACT %PE!
$CALC RS = %IF(%GT(ROW,1),%PE(%PL-(%R-ROW)),0) $
$C THE FOLLOWING STATEMENT CHECKS TO SEE IF CONVERGENCE HAS BEEN
REACHED, BY CHECKING THE CURRENT LIKELIHOOD-RATIO CHI-SQUARED
STATISTIC AGAINST THE LIKELIHOOD-RATIO CHI-SQUARED STATISTIC FROM
THE PREVIOUS ITERATION OF THE 'MAIN' MACRO.  NOTE THAT THE CRITICAL
VALUE CHOSEN HERE--.0001--IS EXCEEDINGLY SMALL.  THIS MEANS THAT THE
PROGRAM WILL KEEP ITERATING FOR A 'TIGHTER' FIT EVEN AFTER MOST (OR
ALL) OF THE PARAMETERS HAVE STABILIZED.
FOR MOST PURPOSES I RECOMMEND THAT THE USER ALTER THIS CRITICAL
VALUE, TO .01 OR EVEN .1--ESPECIALLY IF THE MODEL IS PRELIMINARY AND
THE USER IS IN A COMPUTING ENVIRONMENT IN WHICH CPU TIME COSTS
'REAL' MONEY.
$CALC %A = %GE((%O-%DV),0.0001) $
$ENDMAC $
$C
******************************************************

THIS IS THE BODY OF THE PROGRAM, WHICH BEGINS WITH THE INPUT OF
DATA, DEFINITION OF THE FORM OF THE MODEL, AND THE ESTIMATION OF THE
LIKELIHOOD-RATIO CHI-SQUARED FOR BOTH THE ENTIRE SET OF TABLES AND
THE HYPOTHESIS OF CONDITIONAL (TABLE-BY-TABLE) INDEPENDENCE.  THE
'MAIN' MACRO IS THEN INVOKED.  WHEN CONTROL RETURNS TO THE CENTRAL
PROGRAM, SUMMARY STATISTICS ARE CALCULATED AND PRINTED.
$CALC %Q = %R*%C*%G $
$UNITS %Q $
$FAC ROW %R COL %C LAY %G $
$CALC ROW = %GL(%R,%C) $
$CALC COL = %GL(%C,1) $
$CALC LAY = %GL(%G,(%R*%C)) $
$C THE FOLLOWING TWO STATEMENTS INITIALIZE ROW AND COLUMN SCORES AS
LINEAR FUNCTIONS OF ROW AND COLUMN INDEXES (I.E., THE FIRST ROW
SCORE IS INITIALIZED AS 0, THE SECOND AS 1, THE THIRD AS 2, AND SO
ON).  GOODMAN (1985, BIOMETRIKA, P. 66) NOTES THAT TO ENSURE THAT A
GLOBAL (AS OPPOSED TO LOCAL) MAXIMUM HAS BEEN OBTAINED FOR THE
LIKELIHOOD FUNCTION, ALTERNATIVE START VALUES SHOULD BE TRIED.
USERS WISHING TO DO SO SHOULD THUS MODIFY THESE STATEMENTS AS
APPROPRIATE.  SIMILARLY, IN CERTAIN CIRCUMSTANCES USERS MAY HAVE
APPROXIMATE SOLUTIONS FOR ROW AND/OR COLUMN SCORES, AND WISH TO USE
THESE AS START VALUES, TO SAVE ITERATIONS OF THE FITTING ALGORITHM.
BE AWARE, HOWEVER, THAT (A) ALL CELLS MUST BE INITIALIZED--I.E., ALL
CELLS IN ROW 1 MUST BE GIVEN THE INITIAL ROW SCORE, AND SO ON: AND
(B) THAT CONVERGENCE TO APPROXIMATE SOLUTIONS IS QUITE RAPID, SO
THAT THE SUBSTITUTION OF ESTIMATED ROW AND COLUMN SCORES FOR THE
'NAIVE' (LINEAR) INITIAL SCORES IS UNLIKELY TO SAVE MUCH TIME OR
COST.
$CALC RS = ROW - 1. $
$CALC CS = COL - 1. $
$DATA COUNT $
$DINPUT 12 $
$C '%Z' HOLDS THE NUMBER OF CELLS RETAINED IN THE FIT.  IT IS SET TO
THE FULL NUMBER OF CELLS, '%Q', SO THAT THE DEGREES-OF-FREEDOM
CORRECTION (-%Q+%Z) USED LATER IS ZERO UNLESS CELLS WITH ZERO
FREQUENCIES ARE DELETED AND '%Z' IS RECALCULATED.  WITH AN INITIAL
VALUE OF 0 AND NO CELLS DELETED, THE PRINTED DEGREES OF FREEDOM
WOULD BE TOO SMALL BY '%Q'.
$CALC %Z = %Q
$
$C THE FOLLOWING THREE COMMANDS, WHEN ACTIVE, CAUSE ALL CELLS WITH
OBSERVED FREQUENCIES OF ZERO TO BE DELETED.  TO SUPPRESS THIS
FEATURE--AND HENCE USE ALL CELLS, INCLUDING THOSE WITH SAMPLING
ZEROS--DELETE THE FOLLOWING THREE COMMANDS (STATEMENTS).  AS THE
PROGRAM STANDS, EACH OF THE THREE IS WRITTEN AS A COMMENT, SO THE
FEATURE IS ALREADY SUPPRESSED: TO DELETE ZERO CELLS, RESTORE THEM AS
COMMANDS.
$C CALC WT = %GT(COUNT,0) $
$C CALC %Z = %CU(WT) $
$C WEIGHT WT $
$YVAR COUNT $
$ERR P $
$FIT $C FIT 1/N
$FIT ROW+COL+(ROW*LAY)+(COL*LAY) $C FIT CONDITIONAL INDEPENDENCE
$CALC %N = %CU(COUNT) $
$CALC W = COUNT/%N $
$CALC %A = 1. $
$WHILE %A MAIN $
$CALC %P = ((%G*(%R-1)*(%C-1))-%R-%C+3-%Q+%Z)
           *(%GT(%R,2))*(%GT(%C,2)) $
$PRINT : 'CORRECTED DEGREES OF FREEDOM:' %P : $
$C STANDARDIZE THE SCORES, USING THE CELL PROPORTIONS W AS WEIGHTS:
'%S' AND '%U' ARE THE WEIGHTED MEANS OF THE ROW AND COLUMN SCORES,
'%T' AND '%V' THE CORRESPONDING STANDARD DEVIATIONS.
$CALC %S = %CU(RS*W) $
$CALC %U = %CU(CS*W) $
$CALC %T = %SQRT((%CU((RS**2)*W))-(%S**2)) $
$CALC %V = %SQRT((%CU((CS**2)*W))-(%U**2)) $
$CALC B = (RS-%S)/%T $
$CALC D = (CS-%U)/%V $
$CALC %H = %CU(B*D*W) $
$PRINT : 'CORRELATION BETWEEN ROW AND COLUMN SCORES:' %H : $
$C PHI IS THE PRODUCT OF THE TWO STANDARD DEVIATIONS, AND DELTA IS
EXP(PHI) MINUS 1.
$CALC %B = %T*%V $
$PRINT : ' PHI:' %B : $
$CALC %E = (%EXP(%B))-1. $
$PRINT : ' DELTA:' %E : $
$C COLLECT ONE STANDARDIZED SCORE PER ROW (FROM THE CELLS IN COLUMN 1
OF TABLE 1) AND ONE PER COLUMN (FROM THE CELLS IN ROW 1 OF TABLE 1).
$CALC I = %EQ((COL*LAY),1) $
$VAR %R RSZ $
$CALC RSZ(I*(%CU(I))) = B $
$CALC J = %EQ((ROW*LAY),1) $
$VAR %C CSZ $
$CALC CSZ(J*(%CU(J))) = D $
$PRINT : 'ROW SCORES:' : $
$LOOK RSZ $
$PRINT : 'COLUMN SCORES:' : $
$LOOK CSZ $
$PRINT : 'OBSERVED AND FITTED CELL FREQUENCIES:' : $
$DIS R $
$RETURN $