alines() with multiple multi-character parse strings on large files?
Posted: Sun Jul 11, 2021 1:40 pm
Hi,
I volunteered to implement alines() in X#. It fits with the GetWord* functions I did earlier, and it is used often here. I also don't think I know enough X# yet to attempt the TABLEUPDATE(), CURSORSETPROP()/CURSORGETPROP(), BufferMode and CursorAdapter package, which is what I currently miss most in X#.
In my code, alines() has roughly 2.8 main uses:
Splitting text into an array of lines, as the name suggests, usually with the default parse characters
Splitting CSV lines (often Excel's semicolon-separated "SSV" variant) into array items
Splitting text into words on several separator characters when GetWordNum() does not fit (the 0.8 usage)
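The three uses above can be sketched in Python. This is for illustration only; it is neither X# nor the VFP implementation. The helper names are hypothetical, and the SSV split deliberately ignores quoted fields:

```python
import re

# Hypothetical helper mimicking the default line breaks (CR+LF, CR, LF).
# "\r\n" is listed first because Python's regex alternation takes the first
# alternative that matches, so CR+LF is consumed as a single break.
LINE_BREAKS = re.compile(r"\r\n|\r|\n")

def split_lines(text):
    """Use 1: split text into lines."""
    return LINE_BREAKS.split(text)

def split_ssv(line, sep=";"):
    """Use 2: semicolon-separated values; quoted fields containing ';' are NOT handled."""
    return line.split(sep)

def split_words(text, separators):
    """Use 3: split on several separators at once, dropping empty pieces."""
    pattern = "|".join(re.escape(s) for s in separators)
    return [w for w in re.split(pattern, text) if w]

print(split_lines("a\r\nb\rc\nd"))
print(split_ssv("1;2;;4"))
print(split_words("War, and peace.", [" ", ",", "."]))
```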
For these use cases I have ample test material from War and Peace. That is needed, because bringing .NET performance up to where Fox gets when running inside its own C runtime is not easy. This is especially true because .NET works on Unicode (UTF-16) chars, which is more demanding than the always-1-byte chars that Fox strings are made of.
The name cParseChar in the alines() signature is a bit misleading: a separator with Len() > 1 works too, and the default parse "chars" are CHR(13), CHR(13)+CHR(10) and CHR(10). The worst case would be a long file with many separators that all start with the same character. One plausible and fairly demanding scenario I came up with is "partial parsing" of XML/HTML, where most separators start with "<". This could perhaps be done as a multi-step sequence: first eliminate void tags in all their forms, then get to the interesting meat.
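A minimal Python sketch of why a shared first character hurts, and of one common mitigation. The separators are indexed by their first character so that, at each position, only candidates starting with the current character are tried, longest first. The function names are hypothetical. "Longest separator wins" is an assumption made for the sketch, not documented alines() behaviour. The sketch also keeps empty pieces and ignores the alines() flags.

```python
from collections import defaultdict

def build_index(separators):
    """Group separators by first character, longest first (hypothetical helper)."""
    index = defaultdict(list)
    for sep in set(separators):
        if sep:
            index[sep[0]].append(sep)
    for group in index.values():
        group.sort(key=len, reverse=True)
    return index

def alines_like(text, separators):
    """Split text on any of several (multi-character) separators.

    Assumption: where separators overlap ("<" vs "<br>"), the longest wins.
    Empty pieces, including a trailing one, are kept.
    """
    index = build_index(separators)
    pieces, start, i, n = [], 0, 0, len(text)
    while i < n:
        match = None
        # Only separators sharing text[i] as first char are compared; with many
        # separators all starting with "<", every "<" still tries the whole group.
        for sep in index.get(text[i], ()):
            if text.startswith(sep, i):
                match = sep
                break
        if match:
            pieces.append(text[start:i])
            i += len(match)
            start = i
        else:
            i += 1
    pieces.append(text[start:])
    return pieces

print(alines_like("a<br>b<p>c<d", ["<br>", "<p>", "<"]))
print(alines_like("x\r\ny\rz", ["\r", "\r\n", "\n"]))
```

With the default line breaks, the second call yields x, y and z, because CR+LF is tried before the lone CR. In the HTML case, each "<" in the text costs up to one comparison per separator in its group, which is the worst case described above.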
I can build a table with heavy HTML content by unpacking .chm files with 7-Zip and run somewhat concocted splits over various tags. I would prefer, though, to use anything that comes from real outside needs.
If anybody has a data file and alines() calls doing hefty but real-world work of the form
alines(taAlines, cHeftySourceStrOrTableofSuchStrings, nWhatFitsThePurpose ;
, cStringWithLenGT1_1, cStringWithLenGT1_2, cStringWithLenGT1_3 ;
[ , cStringWithLenGT1_4, cStringWithLenGT1_5, cStringWithLenGT1_6...] )
to check and optimize my implementation against, I would be glad to receive such data and alines() calls here, by private message, or directly by email.
tia
thomas