Index File
-
- Posts: 608
- Joined: July 9th, 2012, 11:32 am
- Location: Islamabad, Pakistan
Index File
I am using an external CSDB file containing 40,000 cases, with each case consisting of a single line of information.
I am using a forcase statement to select cases based on a where clause and then writing the selected data to another CSDB file. The process works but is time-consuming. However, when I implement an index, the file is not written. It creates blank csdb file.
Could you advise on why the use of an index might prevent the file from being written?
setfile (External_Dictionary, "..\file.csdb");Contains 40000 cases
setfile (External_Dictionary, "..\file.csidx",create);
setfile (Another_Ext_Dictionary, "..\writefile.csdb");for selected sample
forcase External_Dictionary, where Supervisor_Code= Suplogin_code do
HH_ID=SW_SEL_HH_ID;
PRO_CODE=SW_PRO_CODE;
HH=SW_SELECTED_HH;
HHH_NAME=SW_HHH_NAME;
writecase(Another_Ext_Dictionary);
endfor;
close(ASSIGN_HOUSEHOLD);
I am using a forcase statement to select cases based on a where clause and then writing the selected data to another CSDB file. The process works but is time-consuming. However, when I implement an index, the file is not written. It creates blank csdb file.
Could you advise on why the use of an index might prevent the file from being written?
setfile (External_Dictionary, "..\file.csdb");Contains 40000 cases
setfile (External_Dictionary, "..\file.csidx",create);
setfile (Another_Ext_Dictionary, "..\writefile.csdb");for selected sample
forcase External_Dictionary, where Supervisor_Code= Suplogin_code do
HH_ID=SW_SEL_HH_ID;
PRO_CODE=SW_PRO_CODE;
HH=SW_SELECTED_HH;
HHH_NAME=SW_HHH_NAME;
writecase(Another_Ext_Dictionary);
endfor;
close(ASSIGN_HOUSEHOLD);
-
- Posts: 608
- Joined: July 9th, 2012, 11:32 am
- Location: Islamabad, Pakistan
Re: Index File
Dear Team;
Any update?
a.
Any update?
a.
-
- Posts: 219
- Joined: November 21st, 2022, 4:41 pm
Re: Index File
Hello,
The example you gave was of the working code with the forcase, so it's not clear why the version implemented with an index isn't working. However, looking at the use of "setfile" in the snippet you provided raises some questions, as you may be looping over the wrong External_Dictionary if you are attempting to set multiple data files to that dictionary. You can test by putting an output statement in the loops to confirm when you are properly looping through the data you want in both the forcase and index versions of the loop. If this does not help, please provide here or at cspro@lists.census.gov your zipped code including the forcase and index versions of the loop. Let us know if you have any additional questions.
Hope this helps,
Justin
The example you gave was of the working code with the forcase, so it's not clear why the version implemented with an index isn't working. However, looking at the use of "setfile" in the snippet you provided raises some questions, as you may be looping over the wrong External_Dictionary if you are attempting to set multiple data files to that dictionary. You can test by putting an output statement in the loops to confirm when you are properly looping through the data you want in both the forcase and index versions of the loop. If this does not help, please provide here or at cspro@lists.census.gov your zipped code including the forcase and index versions of the loop. Let us know if you have any additional questions.
Hope this helps,
Justin
-
- Posts: 608
- Joined: July 9th, 2012, 11:32 am
- Location: Islamabad, Pakistan
Re: Index File
Thank you. Please check the email.
a.
a.
-
- Posts: 219
- Joined: November 21st, 2022, 4:41 pm
Re: Index File
Hello,
Having tested it, I can now see why the attempt to use a .csidx index file is causing a problem. After I commented out the line it worked fine. In order to loop over the file.csdb with all of your cases, you need to use . Calling another setfile to External_DIctionary immediately afterwards is overriding the .csdb you need to loop over with the .csidx that has no cases and cannot be looped over.
See more on index files here for why this .csidx file has no cases and cannot be looped over. You can see that a .csidx file is only made for text/JSON data sources, not for csdb data sources. You can check this by going to your folder and seeing that there are .csidx files for your text files, but none are created for the .csdb files. Let us know if you have any additional questions.
Hope this helps,
Justin
Having tested it, I can now see why the attempt to use a .csidx index file is causing a problem. After I commented out the line
Code: Select all
setfile (External_Dictionary, "..\file.csidx",create);
Code: Select all
setfile (External_Dictionary, "..\file.csdb");
See more on index files here for why this .csidx file has no cases and cannot be looped over. You can see that a .csidx file is only made for text/JSON data sources, not for csdb data sources. You can check this by going to your folder and seeing that there are .csidx files for your text files, but none are created for the .csdb files. Let us know if you have any additional questions.
Hope this helps,
Justin
-
- Posts: 608
- Joined: July 9th, 2012, 11:32 am
- Location: Islamabad, Pakistan
Re: Index File
Dear Justinlakier
Can you explain how I can create a JSON file for external data?
Thank you.
a.
Can you explain how I can create a JSON file for external data?
Thank you.
a.
-
- Posts: 608
- Joined: July 9th, 2012, 11:32 am
- Location: Islamabad, Pakistan
Re: Index File
Hello,
Now, I am able to convert the CSDB file into JSON format BY following these steps:
Open the CSDB file in CSPro Data Viewer window->File->Export->Select JSON as the export format.
The converts the CSDB data into JSON format. However, the time taken to view the sample remains the same.
setfile(SAMPLED_HH_DICT, "..\106_Ext_Data\MyFile.json");//Household[s] Sample file
setfile(ASSIGN_HOUSEHOLD, concat("..\106_Ext_Data\PRN_", edit("9999", PSU_ID), ".csdb"));//Selected Household sample file
forcase SAMPLED_HH_DICT where SW_PRO_CODE= PSU_ID and SW_S_CODE= tonumber(loadsetting("SupervisorID")) do
endfor;
Now, I am able to convert the CSDB file into JSON format BY following these steps:
Open the CSDB file in CSPro Data Viewer window->File->Export->Select JSON as the export format.
The converts the CSDB data into JSON format. However, the time taken to view the sample remains the same.
setfile(SAMPLED_HH_DICT, "..\106_Ext_Data\MyFile.json");//Household[s] Sample file
setfile(ASSIGN_HOUSEHOLD, concat("..\106_Ext_Data\PRN_", edit("9999", PSU_ID), ".csdb"));//Selected Household sample file
forcase SAMPLED_HH_DICT where SW_PRO_CODE= PSU_ID and SW_S_CODE= tonumber(loadsetting("SupervisorID")) do
endfor;
-
- Posts: 1846
- Joined: December 5th, 2011, 11:27 pm
- Location: Washington, DC
Re: Index File
All CSPro data files use an index. Some indices, like for CSPro DB, are internal to the file itself, and others (like for text), are external, using that .csidex format. However, the index is only on the case key, and because you are querying data that is not part of the key, the index won't help much.
Because data is broken out into tables in CSPro DB files in its SQLite format, you can query data by creating an appropriate query. For example, your code uses this to get a list of SW_SEL_HH_ID values to work with:
Because data is broken out into tables in CSPro DB files in its SQLite format, you can query data by creating an appropriate query. For example, your code uses this to get a list of SW_SEL_HH_ID values to work with:
forcase SAMPLED_HH_DICT where SW_PRO_CODE = PSU_ID AND SW_S_CODE = tonumber(loadsetting("SupervisorID")) do
I can write this using a SQLite query:string query = maketext("SELECT `level-1`.`sw_sel_hh_id` FROM `cases` "
"JOIN `level-1` ON `level-1`.`case-id` = `cases`.`id` "
"JOIN `sampled_hh_rec` ON `sampled_hh_rec`.`level-1-id` = `level-1`.`level-1-id` "
"WHERE `cases`.`deleted` = 0 AND `sampled_hh_rec`.`sw_pro_code` = %d AND `sampled_hh_rec`.`sw_s_code` = %d;",
PSU_ID, tonumber(loadsetting("SupervisorID")));
List string selected_hh_ids;
sqlquery(SAMPLED_HH_DICT, selected_hh_ids, query);
Now you can process the appropriate values:"JOIN `level-1` ON `level-1`.`case-id` = `cases`.`id` "
"JOIN `sampled_hh_rec` ON `sampled_hh_rec`.`level-1-id` = `level-1`.`level-1-id` "
"WHERE `cases`.`deleted` = 0 AND `sampled_hh_rec`.`sw_pro_code` = %d AND `sampled_hh_rec`.`sw_s_code` = %d;",
PSU_ID, tonumber(loadsetting("SupervisorID")));
List string selected_hh_ids;
sqlquery(SAMPLED_HH_DICT, selected_hh_ids, query);
do numeric ctr = 1 while ctr <= selected_hh_ids.length()
SW_SEL_HH_ID = tonumber(selected_hh_ids(ctr));
errmsg("Matched: %v", SW_SEL_HH_ID);
enddo;
On my machine, the forcase loop takes 0.955 seconds, whereas sqlquery takes 0.022 seconds, so a massive time improvement.SW_SEL_HH_ID = tonumber(selected_hh_ids(ctr));
errmsg("Matched: %v", SW_SEL_HH_ID);
enddo;
-
- Posts: 608
- Joined: July 9th, 2012, 11:32 am
- Location: Islamabad, Pakistan
Re: Index File
Dear Gregory;
I hope you and team are doing well.
It is working perfectly. Thank you.
a.
I hope you and team are doing well.
It is working perfectly. Thank you.
a.