5.1 Generated Variables

In the course of data processing, some variables were generated based on the raw variables that are surveyed in the field.

Personal income manipulation (inc0110)
Twin families are a special type of family that can be de-anonymized more easily than other families when several characteristics known from the data are combined, especially when these characteristics deviate strongly from the average, such as a very high or very low income. To guarantee the twin families' anonymity, very high and very low values of the personal income variable inc0100 are manipulated so that the incomes cannot be traced back to a single household or individual, while the income distribution and analyses based on it remain essentially unaffected. The manipulation follows the recommendations of Wirth (1992) [2]. Personal income was collected in variable inc0100 in all data collections except CATI 2 and 3. First, for CATI 1 only, a categorical variable (corresponding to inc0103) is generated from the variables inc0102, inc0105, and inc0106. Next, for cases in which no monthly personal income was reported, the annual income from inc0101 (if available) is divided by 12 and integrated into inc0100. Other anomalies are recorded and flagged in the variables inc0120 and inc0121. Finally, following Wirth (1992), the five highest and the five lowest income values are replaced by their respective average (manipulation of the strongest outliers), and the highest 10% and lowest 10% of income values are assigned a random error of +/- 1% of the initial value. As a result, 80% of the incomes are not affected by the manipulation. The resulting manipulated variable inc0110 is supplied in the SUF.
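
The final manipulation step can be illustrated with a minimal Python sketch. It is not the TwinLife implementation: the function name, the parameters, the fixed seed, and the reading of "+/- 1%" as a uniform draw between -1% and +1% of the initial value are assumptions made for illustration only.

```python
import random


def manipulate_incomes(incomes, n_extreme=5, tail_share=0.10, max_error=0.01, seed=0):
    """Return a perturbed copy of `incomes`; values outside the tails stay unchanged."""
    n = len(incomes)
    k = int(n * tail_share)  # number of values in each tail
    if k < n_extreme or 2 * k > n:
        raise ValueError("too few incomes for this sketch")
    rng = random.Random(seed)  # fixed seed only to make the example reproducible
    order = sorted(range(n), key=lambda i: incomes[i])  # indices, lowest to highest
    result = list(incomes)

    # Step 1: replace the n_extreme lowest and the n_extreme highest values
    # by the mean of their own group.
    for group in (order[:n_extreme], order[n - n_extreme:]):
        mean = sum(incomes[i] for i in group) / len(group)
        for i in group:
            result[i] = mean

    # Step 2 (assumed reading): add a uniform random error of at most
    # +/- max_error of the initial value to every value in both tails.
    for i in order[:k] + order[n - k:]:
        result[i] += incomes[i] * rng.uniform(-max_error, max_error)
    return result
```

With 100 input values, only the 10 lowest and the 10 highest values change in this sketch, so the middle 80% are returned unaltered.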

Household income manipulation (inc0401)
To ensure the anonymity of the twin families, very high and very low values of the household income variable inc0400 are manipulated so that the incomes cannot be traced back to a single household or individual, while the income distribution and analyses based on it remain essentially unaffected. The manipulation follows the recommendations of Wirth (1992). To make use of as much income information as possible, the first step is to integrate the income categories reported in inc0403 into the variable inc0400, using the midpoint of the reported category. In F2F 2, CATI 2 and F2F 3, the yearly household income (inc0404) was also recorded as an alternative; this information is first converted (divided by 12) and then integrated into inc0400. In CATI 1, household income was recorded at the personal level, so several different household incomes may have been reported within one household. In these cases, the median of all values provided in the household was used. Finally, following Wirth (1992), the five highest and the five lowest income values are replaced by their respective average (manipulation of the strongest outliers), and the highest 10% and lowest 10% of income values are assigned a random error of +/- 1% of the initial value. As a result, 80% of the incomes are not affected by the manipulation. The resulting manipulated variable inc0401 is supplied in the SUF. Anomalies are recorded and flagged in the variables inc0420, inc0421, inc0422 and inc0423.
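
The preparatory steps before the manipulation can be sketched in Python as follows. This is an illustration only: the category bounds are hypothetical (the real categories of inc0403 are not listed here), and the order of precedence between a category and a yearly value is an assumption.

```python
from statistics import median

# Hypothetical category bounds (lower, upper) in euros per month;
# the actual categories of inc0403 are not given in this section.
CATEGORY_BOUNDS = {1: (0, 500), 2: (500, 1000), 3: (1000, 2000)}


def monthly_household_income(monthly=None, category=None, yearly=None):
    """Pick one monthly value from the available pieces of information."""
    if monthly is not None:
        return monthly
    if category is not None:  # use the midpoint of the reported category
        lower, upper = CATEGORY_BOUNDS[category]
        return (lower + upper) / 2
    if yearly is not None:  # convert a yearly value to a monthly one
        return yearly / 12
    return None


def household_value(person_reports):
    """Median of all household incomes reported by persons of one household."""
    values = [v for v in person_reports if v is not None]
    return median(values) if values else None
```

For example, `household_value([2000, 2600, None, 2400])` returns the middle of the three reported values, and a household with the two reports 2000 and 3000 is assigned the mean of both.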

Net equivalent household income (inc0411)
Besides the (manipulated) household income, we generate and provide the net equivalent household income using the ➔ modified OECD scale. This generated variable takes the household size and composition into account. The concept is based on the assumption that people (e.g., two adults) save money and are notionally 'richer' when they live together in one household and share a common budget and the fixed costs instead of living in separate households. For more information about the concept, see OECD (2011) [3]. To compute the variable, the household composition recorded in the household questionnaire is used to determine how many 'adult' persons (aged 14 and older) and how many children (under 14) live in the household. In CATI 2 and 3, the household composition was not recorded systematically by the interviewer, but was surveyed in the household questionnaire. This self-report of the respondent was used as the basis for calculating the equivalent income. According to OECD (2011), the 'household head' gets a weight of 1, each further adult gets a weight of 0.5, and each child a weight of 0.3. The household income is then divided by the sum of all weights in the household. The resulting variable inc0411, the net equivalent household income, is published in the SUF.
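
As a worked illustration of the weighting, the following Python sketch computes the equivalence weight and the equivalent income from a list of ages. The function names and the input format are assumptions for this example and not part of the TwinLife data processing.

```python
def equivalence_weight(ages):
    """Sum of modified OECD weights for the household members' ages."""
    adults = sum(1 for age in ages if age >= 14)  # 'adult' persons: 14 and older
    children = len(ages) - adults                 # children: under 14
    if adults == 0:
        raise ValueError("sketch assumes at least one person aged 14 or older")
    # Household head: 1.0, each further adult: 0.5, each child: 0.3
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def net_equivalent_income(household_income, ages):
    """Household income divided by the sum of all weights in the household."""
    return household_income / equivalence_weight(ages)
```

A household with two adults and two children under 14 has a weight sum of 1 + 0.5 + 0.3 + 0.3 = 2.1, so a monthly household income of 4200 corresponds to an equivalent income of about 2000.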

ISCED-1997 (eca0106)
For the generation of educational qualifications according to ISCED-1997, the highest educational qualifications reported by the participants are taken into account. The International Standard Classification of Education (ISCED) classifies educational qualifications and the level of education of individuals. It ranges from 0 (pre-primary education) to 6 (second stage of tertiary education, leading directly to an advanced research qualification). The ISCED classification takes into account the national educational system: the type of school from which a person graduated, the individual duration of education, and the type of qualification obtained. For more information about the ISCED classification, see OECD (1999) [4].

Classifications of occupation and occupational activity: ISCO-08 (eca0205, emp0503, emp0513, emp0553), KldB-2010 (eca0205, emp0501), SIOPS (eca0208, emp0506), ISEI (eca0207, emp0505), EGP (emp0508)
The International Standard Classification of Occupations (ISCO) from 2008 classifies occupations in an internationally comparable way, considering the skill level an occupation requires (degree of complexity, based on the educational qualification) as well as the skill specialization (the type of skills needed specifically for this occupation). TwinLife delivers the ISCO classification at the sub-major group level (two digits). For more information about the ISCO classification, visit the ➔ ILO website. Please note: Because some of the detailed employment information needed to code ISCO-08 comparably to F2F 1 and F2F 2 was missing in CATI 1 (wid2), the resulting variable received a different name (emp0513) and should not be directly compared with the ISCO-08 information in F2F 1 and F2F 2.
SIOPS (Standard International Occupational Prestige Scale) is a prestige ranking of occupations (ranging from 0 to 100). It is based on the ISCO-88 classification. For more information, see Ganzeboom & Treiman (1996) [5].
ISEI (International Socio-Economic Index of Occupational Status) is a measure of a person's socio-economic status that considers the individual occupation, income, and education. It ranges from 12 to 90 and uses the ISCO-88 classification. For more information, see Ganzeboom & Treiman (1996).
The EGP classes (Erikson-Goldthorpe-Portocarero classes) are a classification of the socioeconomic status of the parents, considering the type of occupational activity, the occupational status, managerial responsibility, and the kind of qualification needed for the occupational activity. For more information, see Ganzeboom & Treiman (1996).

Housing conditions and household type (liv0210, liv0410)
The housing conditions were surveyed at the household level, and the corresponding questions were answered by the person who filled in the household questionnaire. The variable liv0210 was generated by converting the available information to the twin perspective and to the personal level. The household type was also surveyed from a personal perspective (depending on the person type that filled in the household questionnaire). This information was processed into the generated variable liv0410, which provides the information from a general perspective that is comparable between person types.

Regional variables (ewi, gkpol_r)
To meet data protection requirements, regional information about a person, household, or family can only be delivered in aggregated form. Therefore, an east-west indicator (ewi) was generated by dividing the German states (Bundesländer) into two groups depending on their former affiliation with West Germany (Federal Republic of Germany) or East Germany (German Democratic Republic). Further, the (political) size of the community (politische Gemeindegrößenklasse, gkpol) is classified into four groups and delivered in the SUF data. The regional variables are available at the household level.
Please note that the regional variables are generated from context information that is based on the contact address of a household and not on the household ID. The two might not be identical, which can result in a slight inaccuracy in the regional information in some cases, especially in the older cohorts.

Country of birth (mig2000/mig3000, mig2100/mig3100, mig2200/mig3200)
The country of birth was surveyed in the first face-to-face data collection (F2F 1) as a self-report (respondents aged 16 or above; mig0100, mig0101), as a proxy-report of parents about their children aged below 16 (twins and sibling; mig0100t/u/s, mig0101t/u/s), and as a proxy-report of persons aged 16 or above about their own parents (mig0300, mig0301, mig0400, mig0401).

In F2F 2, only first-time respondents were asked for their country of birth. In F2F 3, the country of birth was not surveyed. The information was collected by providing a list as well as the opportunity to give an open answer. For data privacy reasons, the answers were recoded into country groups and stored in generated variables: the self-reports and the proxy-reports of parents about their children were integrated into mig2000, and the proxy-reports of the twins and the sibling about their parents were integrated into mig2100 and mig2200.
Please note that proxy information by the twins or the sibling about the country of birth of their parents is not used for the generation of mig2000 (e.g. mig0300 for the generation of mig2000 of the twin's mother, etc.). Likewise, the self-reports of the parents about their country of birth are not used for the generation of mig2100 and mig2200 (e.g. mig0100 of the twins' mother for the generation of mig2100, etc.). The treatment of missing values and the extension of these variables are addressed in mig3000, mig3100 and mig3200. The table ➔ mig2000_countrygroups.pdf in the ➔ Downloads section of the TwinLife documentation website shows which countries were included in which group.
Please also note that differing statements of twins about their country of birth and differing statements of twins and the sibling about the country of birth of their parents are not harmonized. In about 20 (and 30, respectively) cases/twin pairs, the given information differs between the twins. The treatment of contradictory information is addressed in mig3000, mig3100 and mig3200.
mig2000, mig2100 and mig2200 are stored in the master data set.

The variable mig3000 is an extended and harmonized version of mig2000 and combines all available information on respondents' country of birth.
For those cases where the parents' self-report is not available, the information provided by the twins or the sibling about their biological parents' countries of birth is used (e.g. mig2100 for the generation of mig3000 of the twin's mother, etc.). If the twins' and the sibling's statements about their parents' countries of birth differ, the statement that is shared by two out of the three is designated as the parent's country of birth. Missing information of one of the twins is filled in with the information of the other twin. If the information about the twins' country of birth is contradictory, Germany is designated as the country of birth if at least one twin reported being born in Germany. The value -81 (contradictory information) is assigned for cases with no unambiguous information about the twins' country of birth.
mig3000 is stored in the master data set.
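
The harmonization rule for the twins' own country of birth can be sketched as follows. The numeric code for Germany and the use of `None` for missing values are hypothetical; only the value -81 is taken from the description above.

```python
GERMANY = 1          # hypothetical country-group code for Germany
CONTRADICTORY = -81  # value for contradictory information


def harmonize_twin_birth_country(twin1, twin2):
    """Return one country-of-birth code for a twin pair (None = missing)."""
    if twin1 is None:  # fill a missing value with the co-twin's value
        return twin2
    if twin2 is None:
        return twin1
    if twin1 == twin2:  # consistent statements
        return twin1
    if GERMANY in (twin1, twin2):  # contradictory, but one twin reports Germany
        return GERMANY
    return CONTRADICTORY
```

If both values are missing, this sketch returns `None`; which missing code the data set uses in that situation is not specified here.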

The variables mig3100 and mig3200 are an extended and harmonized version of mig2100 and mig2200 and combine all available information on the country of birth of the respondent's mother (mig3100) and father (mig3200).
For the twins and the sibling, the self-reported country of birth of their biological parent (i.e. mig2000) is used for the generation of mig3100 and mig3200. For those cases where no self-reported information is available, the proxy information provided by the twins or the sibling is used instead (i.e. mig2100 and mig2200). If the twins' and the sibling's statements about their parent's country of birth differ, the statement that is shared by two out of the three is designated as the parent's country of birth. If there is no unambiguous information about the parent's country of birth, the value -81 (contradictory information) is assigned.
mig3100 and mig3200 are stored in the master data set.
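
A minimal Python sketch of this decision rule, assuming that missing reports are represented as `None` and that country groups are plain integer codes (both are assumptions for illustration):

```python
from collections import Counter

CONTRADICTORY = -81  # value for contradictory information


def parent_birth_country(self_report, proxy_reports):
    """Combine a parent's self-report with up to three proxy reports."""
    if self_report is not None:  # the parent's own statement takes precedence
        return self_report
    reports = [r for r in proxy_reports if r is not None]
    if not reports:
        return None  # nothing available; the real missing code is not modelled
    value, count = Counter(reports).most_common(1)[0]
    if len(set(reports)) == 1:  # all available proxy reports agree
        return value
    if count >= 2:  # two of the three children share one statement
        return value
    return CONTRADICTORY
```

For instance, proxy reports `[2, 2, 7]` yield 2, whereas `[2, 7]` or `[2, 7, 9]` yield -81 because no statement is shared by two children.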

Born in the GDR (mig2001, mig2101, mig2201)
Whether a person was born in the GDR is generated on the basis of the country of birth and considers whether the person was born during the existence of the GDR, i.e. between 7 October 1949 and 2 October 1990.
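
The date condition can be sketched in Python as shown below. The boolean input `reported_gdr_territory` is a hypothetical stand-in for the country-of-birth information, and treating both boundary dates as inclusive is an assumption.

```python
from datetime import date

GDR_START = date(1949, 10, 7)  # founding of the GDR
GDR_END = date(1990, 10, 2)    # last day before German reunification


def born_in_gdr(birth_date, reported_gdr_territory):
    """True if the birth falls on GDR territory and within the GDR's existence."""
    return bool(reported_gdr_territory) and GDR_START <= birth_date <= GDR_END
```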

Migration background (mig4000, mig4100)
The variable mig4000 is based on mig3000, mig3100, and mig3200 and contains information on the twins' and the sibling's migration background up to the third generation. The variable indicates whether the twins or the sibling were born in Germany (0); were born abroad (1); have no personal migration experience but at least one parent who migrated (2); or have neither personal nor parental migration experience but at least one grandparent who migrated (3). The migration background was constructed for twins for whom information on at least one (grand-)parent is available. In cases where the information on the country of birth is contradictory but it is known that the respondent was born abroad, the information about the foreign birth is used without considering the particular country. Twins and siblings with missing information on both parents or all grandparents were assigned the value -86 (not available/empty/not codable). Please note that mig4000 only considers information about biological parents.
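
The coding of mig4000 can be illustrated with a short Python sketch. The inputs are simplified to flags (`True` = born abroad, `False` = born in Germany, `None` = missing), and the order in which the conditions are checked, in particular testing for missing (grand-)parental information first, is an assumption rather than the documented procedure.

```python
NOT_AVAILABLE = -86  # not available/empty/not codable


def migration_background(own_abroad, parents_abroad, grandparents_abroad):
    """Return 0, 1, 2, 3 or -86 following the value scheme of mig4000."""
    known_parents = [p for p in parents_abroad if p is not None]
    known_grandparents = [g for g in grandparents_abroad if g is not None]
    # Missing on both parents or on all grandparents (assumed to be checked first)
    if own_abroad is None or not known_parents or not known_grandparents:
        return NOT_AVAILABLE
    if own_abroad:
        return 1  # born abroad
    if any(known_parents):
        return 2  # at least one parent born abroad
    if any(known_grandparents):
        return 3  # at least one grandparent born abroad
    return 0      # born in Germany, no migration found up to the third generation
```

For example, a twin born in Germany whose parents were both born in Germany and who has one grandparent born abroad receives the value 3 in this sketch.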

The variable mig4100 is based on mig3000, mig3100, and mig3200 and contains information on respondents' migration background up to the second generation. The variable indicates whether respondents were born in Germany (0); were born in a country other than Germany (1); or have no personal migration experience but at least one parent who migrated (2). The migration background was constructed for respondents for whom information on at least one parent is available. In cases where the information on the country of birth is contradictory but it is known that the respondent was born abroad, the information about the foreign birth is used without considering the particular country. Respondents with missing information on both parents were assigned the value -86 (not available/empty/not codable). Please note that mig4100 only considers information about biological parents.

The variables mig4000 and mig4100 are stored in the master data set.

German citizenship (mig0520)
mig0520, which is part of the master file of the Scientific Use File, includes all available information at the personal level and indicates whether the individual has German citizenship (1: yes, 2: no). In the first face-to-face data collection (F2F 1), citizenship was surveyed in a multiple-response format as a self-report and as a proxy-report of parents about their children, allowing respondents to name one or several citizenships and to give open answers (mig0500(t/u/s) - mig0519(t/u/s)). In the second face-to-face data collection (F2F 2), respondents were explicitly asked for their first citizenship and, if applicable, their second citizenship (again with the option of an open answer). The corresponding variables are mig0550(t/u/s) - mig0553(t/u/s). Citizenship was surveyed for all respondents in both F2F 1 and F2F 2. Like the other migration variables, it is stored as a constant variable in the master data set. In cases where the information given in F2F 2 differed from that given in F2F 1, the latest information was used for the generation of mig0520.

Report cards / certificates (cer variables)
We also recorded data on school performance based on photographs of the children’s report cards. For more information on the coding scheme as well as general descriptions of the German school and grading system, please see the ➔ TwinLife Technical Report No. 04.

[2] Wirth, H. (1992). Die faktische Anonymität von Mikrodaten: Ergebnisse und Konsequenzen eines Forschungsprojektes [The factual anonymity of microdata: results and consequences of a research project]. ZUMA Nachrichten, 16(30), 7-65.
[3] OECD (2011). What Are Equivalence Scales? OECD Project on Income Distribution and Poverty.
[4] OECD (1999). Classifying educational programmes: Manual for ISCED-97 implementation in OECD countries. Organisation for Economic Co-operation and Development.
[5] Ganzeboom, H. B. G., & Treiman, D. J. (1996). Internationally comparable measures of occupational status for the 1988 International Standard Classification of Occupations. Social Science Research, 25(3), 201-239.