Best Practices: Seek() vs AScan()
In another post, the brilliant and always-helpful Wolfgang suggested that instead of using a two-dimensional array, we should use an array of objects.
Huh? (Wolfgang apparently forgot whom he was addressing)
To recap: I have a list of about 3000 US postal codes, which I Seek() or AScan() with every loop. Sometimes there are half a million loops, depending on the size of the database that we're importing.
Here's the basic structure:
// code, city, state, county
{ { "43756","McConnelsville","OH","58" } }
Example:
WHILE !oJoeDBF:Eof
    // do lots of stuff
    // then look up the city, county and state (plus some other stuff) on the basis of zip (postal) code
    cZip := oJoeDBF:FIELDGET(#FZIP)
    IF oZipDBF:Seek( cZip, FALSE, FALSE )
        cCity := oZipDBF:FIELDGET(#CITY)
        cState := oZipDBF:FIELDGET(#CSTATE)
        cCounty := oZipDBF:FIELDGET(#CCOUNTY)
    ELSE
        cCity := cState := cCounty := Space(0)
    ENDIF
    oJoeDBF:FIELDPUT(#FCITY, cCity )
    oJoeDBF:FIELDPUT(#FSTATE, cState )
    oJoeDBF:FIELDPUT(#FCOUNTY, cCounty )
    oJoeDBF:Skip(1)
ENDDO
Simple enough. Here's the array method, which I don't often use because I've always thought it creates a slowdown while the OS scrambles to find contiguous memory.
WHILE !oJoeDBF:Eof
    cZip := oJoeDBF:FIELDGET(#FZIP)
    nElement := AScan( aZips, { |x| x[1] == cZip } )
    // by the way, Terry: aZips is already sorted by the first element
    IF nElement > 0
        cCity := aZips[ nElement, 2 ]
        cState := aZips[ nElement, 3 ]
        cCounty := aZips[ nElement, 4 ]
    ELSE
        cCity := cState := cCounty := Space(0)
    ENDIF
    // do the field replacement
    oJoeDBF:Skip(1)
ENDDO
My first question was WHICH method I should choose, to which Wolfgang replied:
"an array lookup will be faster because memory is cheap these days, and a serial read of the database (SetOrder(0)) is really fast. It would be better to not build a two-dimensional array, but an array of objects."
So now two potential new techniques have been added. It would be fun to test and wonderful to know which works best. But first I'm hoping that someone can show me an example of
1. a "serial read" (you mean Locate() ? )
2. an array of objects, as it applies here.
Perhaps this has been explained in this forum before, so please just enlighten me with a link.
Remember: the aforementioned loop runs up to half a million times in one sequence.
Thank you, everyone.
Joe Curran
Ohio USA
Best Practices: Seek() vs AScan()
Hi Joe,
please let me explain:
a serial read is the following code:
oServer:SetOrder( 0 )
oServer:GoTop()
while ! oServer:EOF
    // do something
    oServer:Skip()
enddo
Reading the database in its "natural" order is the fastest read available, because it bypasses any index.
And using an object makes the code easier to read and more stable (for brevity I'm using EXPORT variables here instead of the better practice of PROTECT variables with ACCESS entities):
class clsZip
    export cCode as string
    export cCity as string
    export cState as string
    export cCounty as string

method Init( oDBF ) class clsZip
    cCode := oDBF:FieldGet( #Zip )
    cCity := oDBF:FieldGet( #City )
    cState := oDBF:FieldGet( #CState )
    cCounty := oDBF:FieldGet( #CCounty )
return self
and to fill the array:
aZip := {}
oServer:SetOrder( 0 )
oServer:GoTop()
while ! oServer:EOF
    AAdd( aZip, clsZip{ oServer } )
    oServer:Skip()
enddo
and finally to search:
nElement := AScan( aZip, { |o| o:cCode == cZip } )
HTH
Wolfgang
Wolfgang Riedmann
Meran, South Tyrol, Italy
wolfgang@riedmann.it
https://www.riedmann.it - https://docs.xsharp.it
Best Practices: Seek() vs AScan()
Thank you, Wolfgang.
At this moment I can't see how an array of objects could possibly be faster and less memory-intense than a multidimensional array of simple character strings, but I will certainly try it!
Joe Curran
Ohio USA
Best Practices: Seek() vs AScan()
Hi Joe,
A simple array of objects is not faster, but it provides a little better compile-time checking than multi-dim arrays, if you store array elements to typed local vars before using them. Especially if the object has, say, 10 or more properties, isn't it much easier and cleaner to manage an array of objects than an array of arrays?
But what is indeed faster, absolutely more robust, and provides full compile-time checking, is a strongly typed list (array) of objects, which is available in .Net. Something like:
USING System.Collections.Generic

LOCAL aZip AS List<clsZip>
aZip := List<clsZip>{}
aZip:Add( clsZip{ "city", "zip number" } )
aZip:Add( clsZip{ "other city", "zip no2" } )
...
and then, in order to search for zip code cZip, you can use:
LOCAL oResult AS clsZip
oResult := aZip:Find({oZip AS clsZip => oZip:cCode == cZip })
? oResult:cCode
Because everything is strongly typed in this code, the array uses the least possible memory, it's very fast, and it's a lot less error-prone, because the compiler will tell you if you have done something wrong, like searching for a numeric instead of a string.
And if absolute speed is very important, you can use a SortedList or a Dictionary instead of a simple List, which can make searching for an item 10x or even 100x faster!
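For illustration, here is a minimal sketch of the SortedList variant (my example, not from Chris's post; it reuses his hypothetical clsZip{ city, zip } constructor and assumes the zip code is stored in cCode):

USING System.Collections.Generic

LOCAL aSorted AS SortedList<STRING, clsZip>
LOCAL oResult AS clsZip
aSorted := SortedList<STRING, clsZip>{}
// the zip code is the key, so Add() would throw on a duplicate
aSorted:Add( "43756", clsZip{ "McConnelsville", "43756" } )
// ContainsKey() and the indexer both do a binary search over the sorted keys,
// so a lookup costs O(log n) instead of the O(n) scan that List:Find() does
IF aSorted:ContainsKey( cZip )
    oResult := aSorted[ cZip ]
ENDIF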
Chris Pyrgas
XSharp Development Team
chris(at)xsharp.eu
Best Practices: Seek() vs AScan()
Hi Joe,
For sure an array of objects (in VO) is less memory-efficient than a two-dimensional array, but it is much easier to understand and to read, especially after several years, when you don't remember whether the county is element 3 or 4.
As Chris wrote, in X# this is an entirely different matter, as you can use collections to be more efficient and to have more compile-time checks.
Whenever I touch VO code that uses two- or multidimensional arrays, I rewrite it to use objects, because it is much easier to read, and therefore the code is also more stable and easier to debug.
Wolfgang
Wolfgang Riedmann
Meran, South Tyrol, Italy
wolfgang@riedmann.it
https://www.riedmann.it - https://docs.xsharp.it
Best Practices: Seek() vs AScan()
Wolfgang,
wriedmann wrote:
and finally to search:
nElement := AScan( aZip, { |o| o:cCode == cZip } )

This will do a late-bound ivar access into o, because o is not typed and is therefore treated as a USUAL.
So there is also no compile-time check whether a property or field cCode exists.
If you want early-bound code in VO, you could do this:
LOCAL oZip AS clsZip
.
.
nElement := AScan( aZip, { |o| oZip := o, oZip:cCode == cZip } )
In X# you can do this:
nElement := AScan( aZip, { |o as clsZip| o:cCode == cZip } )
However, for a fast lookup in a 3000-element list I would not use an array or list of objects, but create a dictionary of strings and clsZip objects: Dictionary<STRING, clsZip>. If you want case-insensitive comparisons, then pass StringComparer.OrdinalIgnoreCase to the constructor of the dictionary.
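A minimal sketch of what that could look like, reusing Wolfgang's clsZip class and serial-read loop from above (the dZips variable name and the loop wiring are mine):

USING System
USING System.Collections.Generic

LOCAL dZips AS Dictionary<STRING, clsZip>
LOCAL oZip AS clsZip
// case-insensitive keys, as suggested above
dZips := Dictionary<STRING, clsZip>{ StringComparer.OrdinalIgnoreCase }
oServer:SetOrder( 0 )
oServer:GoTop()
WHILE ! oServer:EOF
    oZip := clsZip{ oServer }
    // the indexer overwrites on a duplicate key, where Add() would throw
    dZips[ oZip:cCode ] := oZip
    oServer:Skip()
ENDDO
// the lookup is then a single hash probe instead of a scan over all elements
IF dZips:ContainsKey( cZip )
    oZip := dZips[ cZip ]
ENDIF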
The AScan code will use an average of 1500 (1/2 n) comparisons. A Dictionary of this size will do a binary search, so that will be the square root of n: Sqrt(3000) ≈ 55 comparisons.
If you do this with a DBF, then you are also using a dictionary-like approach, because the index is also ordered like a binary tree.
So I think in VO the DBF is the better solution, especially when the DBF is on an SSD.
Things would be different if there were only 100 or fewer elements. In that case, having everything in memory certainly beats having to do disk access.
Robert
XSharp Development Team
The Netherlands
robert@xsharp.eu
Best Practices: Seek() vs AScan()
Hi Robert,
thank you very much!
The trick of using a typed local variable looks really good!
Of course in X# a dictionary is the best option (but only if the key is unique and does not contain any strange characters).
Regarding memory vs disk access: most of the time my software is used on networks, and therefore I use global caching objects that are read on first access and then reused for the entire program run. That way I save all the frequent open/search/close operations.
For this purpose I have a clsCachedServer class in VO that checks at every use whether the data on disk has changed (timestamp).
I have to rewrite that class for X# to use an internal dictionary instead of an AScan call.
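A rough sketch of how such a cache could look in X# (the idea only; the class and member names are my invention, not Wolfgang's actual clsCachedServer, and it assumes the clsZip class from above):

USING System
USING System.Collections.Generic
USING System.IO

CLASS ZipCache
    PROTECT _cFileName AS STRING
    PROTECT _dLastWrite AS DateTime
    PROTECT _dZips AS Dictionary<STRING, clsZip>

CONSTRUCTOR( cFileName AS STRING )
    _cFileName := cFileName
    SELF:Reload()

METHOD Reload() AS VOID
    // remember the file timestamp, then (re)build the dictionary
    _dLastWrite := File.GetLastWriteTime( _cFileName )
    _dZips := Dictionary<STRING, clsZip>{}
    // ... open the DBF and fill _dZips with clsZip objects,
    //     keyed on the zip code, as in the loops above ...

METHOD Lookup( cZip AS STRING ) AS clsZip
    // if the file on disk changed since the last read, refresh the cache
    IF File.GetLastWriteTime( _cFileName ) != _dLastWrite
        SELF:Reload()
    ENDIF
    IF _dZips:ContainsKey( cZip )
        RETURN _dZips[ cZip ]
    ENDIF
    RETURN NULL
END CLASS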
Wolfgang
Wolfgang Riedmann
Meran, South Tyrol, Italy
wolfgang@riedmann.it
https://www.riedmann.it - https://docs.xsharp.it
Best Practices: Seek() vs AScan()
Hi Robert,
robert wrote:
The AScan code will use an average of 1500 (1/2 n) comparisons. A Dictionary of this size will do a binary search, so that will be the square root of n: Sqrt(3000) ≈ 55 comparisons.
IIRC the .NET Dictionary as a generalized construct was first based on Hashtable. Dim memory bells are ringing here about the .NET Dictionary implementation switching its storage/search paradigm depending on the number of items in the table and the structure of the key: a hash function call on 50 records with integer keys makes little or no sense, as hash collisions must also be guarded against.
Where did you find that the .NET Dictionary with 3000 items still uses a binary search pattern? Or did you trace/debug?
curious
thomas