Converting Check Boxes Into Single Variables


Using checkboxes for categorical variables is a common way for censuses and surveys to quickly get data onto a paper form and scanned into a data file. Enumerators do not have to translate responses into numbers, and scanners can more easily recognize checked boxes than handwritten characters. These checkboxes need to be converted into single variables with single responses. But what do you do when there is more than one checked box, either due to enumerators not understanding how to complete the question or due to scanners reading stray marks as checked boxes?

When converting checkbox variables into a single variable, there are two basic methods for resolving multiple checks: use a hotdeck to choose between the two marked boxes, or apply a predetermined order of likelihood. If three or more boxes are marked, the edits become more complicated. Because such cases are rare, one can either write those more complicated edits or set the converted variable to blank and correct it later with more detailed content edits.

Before converting multiple checkbox variables into a single variable, one must first determine how many boxes are marked. This can be done by summing the individual checkbox variables. The conversion is then coded according to the number of checked boxes. If only one box is checked, the new variable is simply set to the value for that box, with no special method needed.

Assuming that the checkbox variables have a value of 1 if checked and 0 if not checked, one can count the number of marked boxes like so:

numeric numChecks = DWELLING_TYPE_SCAN1 + DWELLING_TYPE_SCAN2 + DWELLING_TYPE_SCAN3;
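
The sum above is a count only if every checkbox variable holds exactly 0 or 1. If the scanned file might contain blanks or other codes (an assumption about the data, not something stated here), each box could be normalized first. The following is a minimal sketch; the function name isMarked is hypothetical, and the function would be defined in PROC GLOBAL.

```cspro
// hypothetical helper: returns 1 only when the box holds the value 1, otherwise 0
function isMarked(numeric box)

    if box = 1 then
        isMarked = 1;
    else
        isMarked = 0;
    endif;

end;

// used in place of the plain sum, with numChecks declared as above
numChecks = isMarked(DWELLING_TYPE_SCAN1) + isMarked(DWELLING_TYPE_SCAN2) +
            isMarked(DWELLING_TYPE_SCAN3);
```

With this helper, a blank or out-of-range value contributes nothing to the count instead of distorting it.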

Using a Hotdeck

Hotdecking is a simple way to estimate the value of a variable from the checked boxes and from related variables. In the example below, when two boxes are checked, the roofing material determines which of the two values is used for the new variable. The hotdeck array has two dimensions: the first for each combination of two checked boxes, and the second for the roofing material. When no boxes are checked, the new variable is left blank, to be handled by later edits. When one box is checked, the new variable is coded according to that box, and the hotdeck is updated for every combination of two boxes that includes it, using the current roofing material. When two boxes are checked, the new variable is taken from the hotdeck for that combination of boxes and roofing material.

Question: Type of Dwelling?
Answers: Separate, Apartment, Joint/Barrack House
Conversion Method: Hotdeck based on roofing material

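array dwellTypeHD(3,5); // rows: the three pairs of boxes; columns: the five roofing material codes
numeric dwellTypeHD12 = 1,dwellTypeHD13 = 2,dwellTypeHD23 = 3;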

numeric roofMaterial;

if ROOF_MATERIAL_SCAN1 then     roofMaterial = 1;
elseif ROOF_MATERIAL_SCAN2 then roofMaterial = 2;
elseif ROOF_MATERIAL_SCAN3 then roofMaterial = 3;
elseif ROOF_MATERIAL_SCAN4 then roofMaterial = 4;
else                            roofMaterial = 5;
endif;

if numChecks = 0 then
    // leave DWELLING_TYPE blank, to be edited later during the content edits

elseif numChecks = 1 then

    if DWELLING_TYPE_SCAN1 then
        DWELLING_TYPE = 1;
        dwellTypeHD(dwellTypeHD12,roofMaterial) = 1;
        dwellTypeHD(dwellTypeHD13,roofMaterial) = 1;

    elseif DWELLING_TYPE_SCAN2 then
        DWELLING_TYPE = 2;
        dwellTypeHD(dwellTypeHD12,roofMaterial) = 2;
        dwellTypeHD(dwellTypeHD23,roofMaterial) = 2;

    elseif DWELLING_TYPE_SCAN3 then
        DWELLING_TYPE = 3;
        dwellTypeHD(dwellTypeHD13,roofMaterial) = 3;
        dwellTypeHD(dwellTypeHD23,roofMaterial) = 3;

    endif;

elseif numChecks = 2 then

    if DWELLING_TYPE_SCAN1 and DWELLING_TYPE_SCAN2 then
        DWELLING_TYPE = dwellTypeHD(dwellTypeHD12,roofMaterial);

    elseif DWELLING_TYPE_SCAN1 and DWELLING_TYPE_SCAN3 then
        DWELLING_TYPE = dwellTypeHD(dwellTypeHD13,roofMaterial);

    elseif DWELLING_TYPE_SCAN2 and DWELLING_TYPE_SCAN3 then
        DWELLING_TYPE = dwellTypeHD(dwellTypeHD23,roofMaterial);

    endif;

elseif numChecks > 2 then
    // leave DWELLING_TYPE blank, to be edited later during the content edits

endif;
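
The example does not show what the hotdeck contains before any single-check case has updated it. One way to seed it is sketched below, assuming the code runs once before the first case is processed. The starting values (the lower-numbered box of each pair) are an illustrative assumption, not part of the method described here; in practice they would come from prior knowledge of the data.

```cspro
// assumed to run once, before the first case is processed
numeric roof;

do roof = 1 while roof <= 5
    dwellTypeHD(dwellTypeHD12,roof) = 1;    // boxes 1 and 2: start with Separate
    dwellTypeHD(dwellTypeHD13,roof) = 1;    // boxes 1 and 3: start with Separate
    dwellTypeHD(dwellTypeHD23,roof) = 2;    // boxes 2 and 3: start with Apartment
enddo;
```

Each seed is one of the two values in its pair, so a case with two checked boxes always receives a value that matches one of the marked boxes, even if it is processed before any single-check case with the same roofing material.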

Order of Likelihood

In a series of responses, it may be decided that certain responses are more likely than others. When multiple boxes are checked, the new variable is set to the response that was predetermined to be the most likely. In the example below, enumerators may mistakenly select more than one source of drinking water if the household has more than one source. However, the enumerators were instructed to select only the most "modern" water source, so the responses were prioritized in the order that they appear, with the exception of tubewell, which takes precedence over all others. Thus, if tubewell and any other response were selected, the new variable is set to tubewell, and the recoding priority then continues with tap, well, and so on.

Question: Main Source of Drinking Water?
Answers: Tap, Tubewell/Deep Tubewell, Well, Pond, River/Ditch/Canal, Others
Conversion Method: Recode with priority (tubewell, tap, well, etc.)

numChecks = WATER_SOURCE_SCAN1 + WATER_SOURCE_SCAN2 + WATER_SOURCE_SCAN3 +
            WATER_SOURCE_SCAN4 + WATER_SOURCE_SCAN5 + WATER_SOURCE_SCAN6;

if numChecks = 0 then
    // leave WATER_SOURCE blank, to be edited later during the content edits

elseif numChecks = 1 then

    if WATER_SOURCE_SCAN1 then      WATER_SOURCE = 1;
    elseif WATER_SOURCE_SCAN2 then  WATER_SOURCE = 2;
    elseif WATER_SOURCE_SCAN3 then  WATER_SOURCE = 3;
    elseif WATER_SOURCE_SCAN4 then  WATER_SOURCE = 4;
    elseif WATER_SOURCE_SCAN5 then  WATER_SOURCE = 5;
    elseif WATER_SOURCE_SCAN6 then  WATER_SOURCE = 6;
    endif;

elseif numChecks >= 2 then

    // tubewell (box 2) has the highest priority, then the boxes in the order they appear
    if WATER_SOURCE_SCAN2 then      WATER_SOURCE = 2;
    elseif WATER_SOURCE_SCAN1 then  WATER_SOURCE = 1;
    elseif WATER_SOURCE_SCAN3 then  WATER_SOURCE = 3;
    elseif WATER_SOURCE_SCAN4 then  WATER_SOURCE = 4;
    elseif WATER_SOURCE_SCAN5 then  WATER_SOURCE = 5;
    // elseif WATER_SOURCE_SCAN6 ... not needed: with two or more boxes marked, one of boxes 1 to 5
    // is always marked, and box 6 on its own is handled in the numChecks = 1 case
    endif;

endif;

If conversions like this are made often, the logic in the numChecks = 1 and numChecks >= 2 sections can be combined into a reusable function that tests the boxes in priority order.

array waterPriorities(6) = 2 1 3 4 5 6;

function assignValueFromCheckbox(array priorities,alpha (20) checks)

    numeric cnt;

    // test the boxes in priority order and return the first one that is marked
    do cnt = 1 while cnt <= tblrow(priorities)

        if checks[priorities(cnt):1] = "1" then
            assignValueFromCheckbox = priorities(cnt);
            exit;
        endif;

    enddo;

    assignValueFromCheckbox = notappl; // no checkbox was marked

end;

WATER_SOURCE = assignValueFromCheckbox(waterPriorities,
                                       maketext("%d%d%d%d%d%d",WATER_SOURCE_SCAN1,
                                                WATER_SOURCE_SCAN2,WATER_SOURCE_SCAN3,
                                                WATER_SOURCE_SCAN4,WATER_SOURCE_SCAN5,
                                                WATER_SOURCE_SCAN6));
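
To see how the function resolves a particular pattern of checks, it can be called with a literal check string instead of the scanned variables. The calls below are a small illustrative sketch; the check strings are made-up test inputs, and errmsg is used only to display the result.

```cspro
// boxes 1 (Tap) and 3 (Well) marked: the check string is "101000"
// priorities are tried in the order 2, 1, 3, 4, 5, 6
//   position 2 is "0", so Tubewell is skipped
//   position 1 is "1", so the function returns 1 (Tap)
errmsg("Tap and well marked: %d",assignValueFromCheckbox(waterPriorities,"101000"));

// only box 6 (Others) marked: every higher-priority position is "0", so 6 is returned
errmsg("Only others marked: %d",assignValueFromCheckbox(waterPriorities,"000001"));
```

Because the priority order lives in the array rather than in the function, another question could reuse the function with its own priorities array and check string.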