TwinLife Documentation - ID Variables, Wave and Data Collection Identifiers

ID Variables, Wave and Data Collection Identifiers

In TwinLife, several ID variables are available. Each person belongs to a family with a unique family ID (fid) and to a household with a unique, wave-specific household ID (hid; a composite of the family ID and an indicator for the household).¹ In addition, each person has a unique person ID (pid), which is a composite of the family ID and the person type. Although person types other than those of the twins can change (e.g., ‘700 - other person’ might change to ‘110 - partner of twin’, or ‘200 - surveyed sibling’ might change to ‘201 - non-surveyed sibling’), the person ID does not change over time.

The family ID consists of six digits: the first digit indicates the twin birth cohort (e.g., 1 for the first cohort; note that the birth cohort is also coded in the variable cgr); the other five digits are assigned randomly.

ID variables are particularly important when different data files are combined. To match data from different survey waves in the family-wide format, use the variable fid; to match the master data set with the person format, use the variable pid. Please note that time-varying information in the master data set needs to be reshaped into long format before it can be matched with the person format of the survey data. Before matching the master data set with the family format, the master data set has to be restructured to family format.
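The following is a minimal sketch of the two points above, written in plain Python rather than in a statistics package: reading the cohort digit from fid, and reshaping wide master data to long format before matching it with person-format survey data. All records are invented toy data. The column names ptyp_1, ptyp_2, and score, the helper function names, and the pid values are hypothetical and do not reproduce the real TwinLife variable names or the real composite format of pid. Matching on pid together with wid is also an assumption of this sketch about how long-format rows are identified.

```python
def cohort_digit(fid):
    """Return the first digit of a six-digit family ID (the twin birth cohort)."""
    text = str(fid)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"expected a six-digit fid, got {fid!r}")
    return int(text[0])


def master_to_long(rows, stem):
    """Turn wide columns '<stem>_<wid>' into one row per person and data collection."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            prefix, _, suffix = key.rpartition("_")
            if prefix == stem and suffix.isdigit():
                long_rows.append(
                    {"pid": row["pid"], "fid": row["fid"], "wid": int(suffix), stem: value}
                )
    return long_rows


def merge_on_pid_wid(survey_rows, master_rows):
    """Attach master information to each survey row with the same pid and wid."""
    lookup = {(m["pid"], m["wid"]): m for m in master_rows}
    merged = []
    for s in survey_rows:
        m = lookup.get((s["pid"], s["wid"]))
        if m is not None:
            merged.append({**m, **s})
    return merged


# Hypothetical master data in wide format: one row per person,
# one (invented) person-type column per data collection.
master_wide = [
    {"pid": 100001101, "fid": 100001, "ptyp_1": 101, "ptyp_2": 101},
    {"pid": 100001700, "fid": 100001, "ptyp_1": 700, "ptyp_2": 110},
]

# Hypothetical survey data in person (long) format.
survey_long = [
    {"pid": 100001101, "wid": 1, "score": 12},
    {"pid": 100001101, "wid": 2, "score": 15},
    {"pid": 100001700, "wid": 2, "score": 9},
]

master_long = master_to_long(master_wide, stem="ptyp")
merged = merge_on_pid_wid(survey_long, master_long)

for row in merged:
    print(row["pid"], row["wid"], row["ptyp"], row["score"], cohort_digit(row["fid"]))
```

In the toy data, the second person's person type changes between the two data collections while pid stays the same, which is why pid rather than the person type serves as the matching key.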

Furthermore, the variables with the stem wav indicate in which survey wave (wav0200, wav0300) and subsample (wav0100) the data were collected. The variable wid is the data collection identifier: wid == 1 stands for the first face-to-face household survey (F2F1), wid == 2 for the first telephone survey (CATI1), and so on.
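As a small illustration of how wid can be used, the sketch below selects the rows of one data collection from long-format data. The records, the score column, and the function names are hypothetical. Only the two wid labels stated above are included; further codes should be taken from the codebook rather than guessed.

```python
# Labels taken from the text above; other wid codes are deliberately left out.
WID_LABELS = {1: "F2F1", 2: "CATI1"}


def select_collection(rows, wid):
    """Keep only the rows that belong to the given data collection."""
    return [row for row in rows if row["wid"] == wid]


def label_wid(wid):
    """Return the known label for a wid code, or a placeholder if it is not listed."""
    return WID_LABELS.get(wid, "not listed here (see codebook)")


# Hypothetical long-format records with invented pid values.
records = [
    {"pid": 100001101, "wid": 1, "score": 12},
    {"pid": 100001101, "wid": 2, "score": 15},
    {"pid": 100001700, "wid": 1, "score": 9},
]

f2f1_rows = select_collection(records, wid=1)
print(label_wid(1), len(f2f1_rows))
```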

¹ The household indicator, and therefore the household ID itself, depends on whether the twins live in the household. As a result, two households with different compositions can have the same household ID in two consecutive years. Therefore, hid should not be used for longitudinal analyses.