Tools to Direct Change Duplicate ID Number

yanina
Posts: 60
Joined: October 31st, 2016, 9:37 am

Post by yanina »

Dear Masters,

Is there any tool to change duplicate case ID numbers, so that when it is done I can open the concatenated file without an error message (duplicate ID/index)?

This is my situation:
I have merged many files with the Concatenate tool, using the option "concatenate regardless file structure." I did not use the dictionary to check for duplicates because that would take a long time, as I realized that the files I am merging have many duplicate IDs.

Files to merge: file1.dat, file2.dat, file3.dat, file4.dat, file5.dat, file6.dat, file7.dat, file8.dat, file9.dat, file10.dat, .... file20.dat
Output: All_Concatenated.dat

Right now I use the Index File tool to check for duplicate IDs in All_Concatenated.dat, but I cannot change the duplicate IDs directly there.
I have to choose which case to keep, write down the other ones or leave them, and then change the IDs in file1.dat or file2.dat ... that takes a long time :cry:
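
A rough way to see which IDs collide, and which input files they come from, is sketched below in Python (this is not CSPro logic or a CSPro tool). It assumes each file is plain text with a fixed-width ID at a known column and that consecutive records sharing an ID belong to one case. The names `find_duplicate_ids`, `id_start`, and `id_len` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import groupby


def find_duplicate_ids(files, id_start, id_len):
    """Map each ID that occurs in more than one case to the files holding it.

    files    -- dict of file name -> list of records (lines without newlines)
    id_start -- 0-based column where the ID begins (hypothetical parameter)
    id_len   -- width of the ID field in characters (hypothetical parameter)
    """
    id_end = id_start + id_len
    sources = defaultdict(list)
    for name, lines in files.items():
        # Consecutive records that share an ID are counted as one case.
        for case_id, _records in groupby(lines, key=lambda r: r[id_start:id_end]):
            sources[case_id].append(name)
    return {cid: names for cid, names in sources.items() if len(names) > 1}
```

Calling it with the contents of file1.dat, file2.dat, and so on would give a list of the colliding IDs per file, which could replace the manual note-taking step; it does not change any data.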

So usually I take another way that is easier for me: I convert All_Concatenated.dat to SPSS and change the duplicate IDs in SPSS.
But after this step I cannot go back to CSPro data entry mode, because the file is now in SPSS format.

So is there any tool that can change duplicate ID numbers directly? Then my All_Concatenated.dat would be clean and would open without the duplicate ID/index error message.

Please advise, masters Josh, George.

Many Thanks

Yanina
Gregory Martin
Posts: 1777
Joined: December 5th, 2011, 11:27 pm
Location: Washington, DC

Re: Tools to Direct Change Duplicate ID Number

Post by Gregory Martin »

You can run CSPro batch applications on files that have duplicate IDs, so you could change the IDs with logic, but you have to decide what rules you want to use to generate the new IDs.

See attached for an example of how to do this. I assume that an ID like 999999 is unused and then assign that to duplicate cases. To run this you'll have to use the 7.0 beta.
PROC GLOBAL

numeric newKey = 999999;
numeric caseCounter;


PROC MODIFYDUPLICATEKEY_FF

preproc

    // clear out anything in the temporary key storage file
    close(KEYSTORAGE_DICT);
    open(KEYSTORAGE_DICT,create);


PROC MODIFYDUPLICATEKEY_QUEST

    inc(caseCounter);

    CASE_KEY = ID;

    // has this key already been used?
    if loadcase(KEYSTORAGE_DICT,CASE_KEY) then

        errmsg("The key %d was already found at case %d; changing to %d",ID,CASE_NUMBER,newKey);
        ID = newKey;

        inc(newKey,-1);

    endif;

    // save the information about this key
    CASE_KEY = ID;
    CASE_NUMBER = caseCounter;
    writecase(KEYSTORAGE_DICT);
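
For readers who want to follow the rule outside CSPro, here is a minimal Python sketch of the same idea: remember every key seen so far and give each repeated key a replacement that counts down from 999999. It is an illustration, not the attached batch application. It assumes a plain text data file with a fixed-width, right-justified ID, and it treats consecutive records with the same ID as one case, so two duplicate cases that happen to sit next to each other would be read as a single case. The names `rekey_duplicates`, `id_start`, `id_len`, and `first_new_key` are hypothetical.

```python
from itertools import groupby


def rekey_duplicates(lines, id_start, id_len, first_new_key=999999):
    """Give every repeated case ID a fresh ID, counting down from first_new_key.

    lines    -- records of the data file, without line endings
    id_start -- 0-based column where the ID begins (hypothetical parameter)
    id_len   -- width of the ID field in characters (hypothetical parameter)
    Returns (new_lines, changes); changes holds (old_id, first_case, new_id).
    """
    id_end = id_start + id_len
    seen = {}  # ID text -> number of the case where it first appeared
    new_key = first_new_key
    new_lines, changes = [], []

    # Consecutive records that share an ID are treated as one case.
    cases = groupby(lines, key=lambda record: record[id_start:id_end])
    for case_number, (case_id, records) in enumerate(cases, start=1):
        records = list(records)
        if case_id in seen:
            # Skip replacement IDs that earlier cases already use.
            while str(new_key).rjust(id_len) in seen:
                new_key -= 1
            new_id = str(new_key).rjust(id_len)
            if len(new_id) != id_len:
                raise ValueError("replacement ID does not fit the ID field")
            changes.append((case_id, seen[case_id], new_id))
            records = [r[:id_start] + new_id + r[id_end:] for r in records]
            case_id = new_id
            new_key -= 1
        seen[case_id] = case_number
        new_lines.extend(records)
    return new_lines, changes
```

The sketch raises an error when the replacement key is wider than the ID field instead of truncating it, and like the logic above it relies on the high keys not being used by cases that appear later in the file.
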
Attachments
modifyDuplicateKey.zip

Post by yanina »

Dear Gregory Martin,

Thanks. I am very happy that you can give us a solution like this.

I have tried your logic, but I get an error with my concatenated file.

When I open the output .dat file (produced by your batch logic) with the pen file, I get the error: too many occurrences for record ...

The CSIndex tool finds duplicate cases in file1_file2.dat (the concatenated file) for case ID 1.

Would you please look at the attached myproj.zip?

Please advise. Thank you.

Yanina

Process Messages
*** Case [21 ] has 1 messages (0 E / 0 W / 1U)
U -25 The key DEFAULT was already found at case 1; changing to 9
*** Case [31 ] has 1 messages (0 E / 0 W / 1U)
U -25 The key DEFAULT was already found at case 1; changing to 8
*** Case [11 ] has 1 messages (0 E / 0 W / 1U)
U -25 The key DEFAULT was already found at case 1; changing to 7
User unnumbered messages:
Line Freq Pct. Messa
---- ---- ---- -----
25 3 - The k
CSPRO Executor Normal End
Attachments
myproj.zip

Post by Gregory Martin »

In addition to what you posted, can you post the batch application that you used to change the IDs? I would like to see the data file with the changed IDs.

The dictionary in what you posted, myproj.dcf, has an ID of length 1, but the listing messages that you posted have an ID of length 3, so something is not consistent.

Post by yanina »

Dear Gregory Martin,

Thank you so much for your kind help.

The batch runs well and very smoothly on most of my applications (including myproj.zip above). Yes, the ID length has to be consistent with the batch.
Duplicate case IDs can be separated and renumbered correctly.

But the attached data and application fail to run with your batch.
Would you please look at my attached application and the DATA*.DAT files inside it? I don't know why it fails to run.
I am using your batch from modifyDuplicateKey.zip, which is included in the attachment.

I get the error INVALID RECORD TYPE on almost all cases, on any multiple-occurrence field.
///
*** Case [ ..] has 2 messages (1 E / 1 W / 0U)
W 10007 Invalid Record Type: ' '

and so on ..

============
DATA1.DAT : 3296 Cases
DATA2.DAT : 3 Cases : But all duplicate with DATA1.DAT
DATA1_DATA2_CONCATENATE.DAT : Concatenated file content
=============

So I cannot get the correct output data.

Your help is very much appreciated.

Yanin
Attachments
MYPROJ2.zip

Post by Gregory Martin »

The problem is that your dictionary, PROJ_01.dcf, doesn't have a record type. The ID, RESP_ID, goes from position 1-6 in the file.

In what I sent you, the record had a record type and so the ID went from position 2-7 in the file. If you change that, you can run your code and you get this output:

*** Case [ 1] has 1 messages (0 E / 0 W / 1U)
U -25 The key 1 was already found at case 3083; changing to 999999

*** Case [ 2164] has 1 messages (0 E / 0 W / 1U)
U -25 The key 2164 was already found at case 1; changing to 999998

*** Case [ 2165] has 1 messages (0 E / 0 W / 1U)
U -25 The key 2165 was already found at case 2; changing to 999997
record type.png
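
The column shift described above can be illustrated with a small Python sketch (not CSPro logic). It shows only how reading the ID at columns 2-7 goes wrong when the file has no record type and the ID really sits in columns 1-6. The helper `read_id` is hypothetical, and the sample records, including the trailing data "0312", are made up.

```python
def read_id(record, start, length):
    """Return the ID text that begins at 1-based column `start`."""
    return record[start - 1:start - 1 + length]


# Made-up sample records: a 6-character ID followed by invented data "0312".
no_record_type = "2164".rjust(6) + "0312"          # ID in columns 1-6
with_record_type = "1" + "2164".rjust(6) + "0312"  # record type, then ID in 2-7

print(repr(read_id(no_record_type, 1, 6)))    # ID read at its true position
print(repr(read_id(no_record_type, 2, 6)))    # shifted read swallows a data character
print(repr(read_id(with_record_type, 2, 6)))  # correct for a file with a record type
```

The batch application's dictionary and the data file have to agree on where the ID starts; otherwise the keys being compared are not the real IDs.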

Post by yanina »

Dear Gregory,

Excellent. Yes, it worked. Thank you so much.

For members who have the same issue, here are the steps that solved my problem:
1. Change the order of the field positions in the dictionary PROJ_01.dcf.
2. Save it as a new dictionary.
3. Reformat the concatenated data to the new dictionary.
4. Then run Gregory Martin's code above.
5. Bingo.

See the screenshot.

Yanin
Attachments
dict_changes.PNG