StringBuilder performance

wriedmann
Posts: 3798
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Post by wriedmann »

Hi all interested people,
please see this code:

Code: Select all

cBuffer := DateTime.Now:ToString()
foreach oTag as PlanTag in _oPlanTage
    cBuffer := cBuffer + oTag:DebugString( 1 )
    foreach oPosition as PlanPosition in _oPlanPositionen
        cBuffer := cBuffer + oPosition:DebugString( 1 )
    next
next
cBuffer := cBuffer + DateTime.Now:ToString()
In my application this code creates a text file with over 95,000 lines.
It takes a long time (5 minutes 36 seconds) and keeps an entire processor core busy.
A simple optimization makes it behave much better:

Code: Select all

cBuffer := DateTime.Now:ToString()
foreach oTag as PlanTag in _oPlanTage
    cBuffer := cBuffer + oTag:DebugString( 1 )
    cPosition := ""
    foreach oPosition as PlanPosition in _oPlanPositionen
        cPosition := cPosition + oPosition:DebugString( 1 )
    next
    cBuffer := cBuffer + cPosition
next
cBuffer := cBuffer + DateTime.Now:ToString()
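The only change is an intermediate buffer: the substrings of each inner loop are collected in cPosition and added to the main buffer once per PlanTag, instead of adding every substring to the main buffer directly. Every concatenation creates a new string and copies the existing content, so concatenating onto a short intermediate buffer copies far less data than concatenating onto the ever-growing main buffer.
This reduces the time needed to about 4 seconds!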
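To make the effect visible without timing anything, here is a minimal sketch that only counts how many characters the two variants above copy, assuming every concatenation writes the complete result into a new string. The function names and the sizes (100 outer items, 950 inner items, 80 characters per piece) are made up for illustration and are not the figures from the application above; the append of oTag:DebugString() is left out to keep the model small.

```xsharp
USING System

// Hypothetical model: it counts copied characters, it does not measure time.

// Variant 1: every piece is concatenated directly onto the main buffer
FUNCTION CopiedDirect( nOuter AS INT64, nInner AS INT64, nLen AS INT64 ) AS INT64
    LOCAL nBuffer := 0 AS INT64   // current length of the main buffer
    LOCAL nCopied := 0 AS INT64   // characters written so far
    LOCAL i, j AS INT64
    FOR i := 1 UPTO nOuter
        FOR j := 1 UPTO nInner
            nBuffer += nLen       // length of cBuffer + piece
            nCopied += nBuffer    // the whole result goes into a new string
        NEXT
    NEXT
    RETURN nCopied

// Variant 2: pieces are collected in an intermediate buffer first
FUNCTION CopiedBuffered( nOuter AS INT64, nInner AS INT64, nLen AS INT64 ) AS INT64
    LOCAL nBuffer := 0 AS INT64   // current length of the main buffer
    LOCAL nPosition AS INT64      // current length of the intermediate buffer
    LOCAL nCopied := 0 AS INT64
    LOCAL i, j AS INT64
    FOR i := 1 UPTO nOuter
        nPosition := 0
        FOR j := 1 UPTO nInner
            nPosition += nLen
            nCopied += nPosition  // only the short intermediate string is copied
        NEXT
        nBuffer += nPosition
        nCopied += nBuffer        // one large copy per outer item
    NEXT
    RETURN nCopied

FUNCTION Start() AS VOID
    Console.WriteLine( "direct  : " + CopiedDirect( 100, 950, 80 ):ToString() )
    Console.WriteLine( "buffered: " + CopiedBuffered( 100, 950, 80 ):ToString() )
    RETURN
```

The model ignores allocation and garbage collection costs, so it will not reproduce the measured times, but it shows why the direct variant grows roughly with the square of the total number of pieces.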
Using the StringBuilder class makes the code faster still:

Code: Select all

oSB := StringBuilder{}
oSB:AppendLine( DateTime.Now:ToString() )
foreach oTag as PlanTag in _oPlanTage
    oSB:Append( oTag:DebugString( 1 ) )
    foreach oPosition as PlanPosition in _oPlanPositionen
        oSB:Append( oPosition:DebugString( 1 ) )
    next
next
oSB:AppendLine( DateTime.Now:ToString() )
cBuffer := oSB:ToString() 
The code now takes only 2 seconds!
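For anyone who wants to try this without the PlanTag and PlanPosition classes, a self-contained sketch of the comparison could look like the following. The line text and the count of 20,000 are assumptions chosen for illustration, and the timings will differ from the ones above depending on machine and string sizes.

```xsharp
USING System
USING System.Diagnostics
USING System.Text

FUNCTION Start() AS VOID
    LOCAL nCount := 20000 AS INT    // assumed number of pieces
    LOCAL cLine AS STRING
    LOCAL cBuffer := "" AS STRING
    LOCAL oSB AS StringBuilder
    LOCAL oWatch AS Stopwatch
    LOCAL i AS INT

    cLine := "sample line of debug output" + Environment.NewLine

    // Plain concatenation: every step creates a new, longer string
    oWatch := Stopwatch.StartNew()
    FOR i := 1 UPTO nCount
        cBuffer := cBuffer + cLine
    NEXT
    oWatch:Stop()
    Console.WriteLine( "concatenation: " + oWatch:ElapsedMilliseconds:ToString() + " ms" )

    // StringBuilder: appends go into an internal buffer
    oSB := StringBuilder{}
    oWatch:Restart()
    FOR i := 1 UPTO nCount
        oSB:Append( cLine )
    NEXT
    cBuffer := oSB:ToString()
    oWatch:Stop()
    Console.WriteLine( "StringBuilder: " + oWatch:ElapsedMilliseconds:ToString() + " ms" )
    RETURN
```

Raising nCount makes the gap wider, because the cost of the plain concatenation grows much faster than the cost of the StringBuilder appends.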
Wolfgang
P.S. in VO you can see similar differences, but there is no StringBuilder class available.
Wolfgang Riedmann
Meran, South Tyrol, Italy
wolfgang@riedmann.it
https://www.riedmann.it - https://docs.xsharp.it
Chris
Posts: 5057
Joined: Thu Oct 08, 2015 7:48 am
Location: Greece

Post by Chris »

Hi Wolfgang,

Very good sample!

Furthermore, if you know the approximate size of the final string in advance, specify it in the constructor of the StringBuilder object. This makes sure that its internal buffer is allocated only once (instead of being grown many times when no starting size is given), which further improves performance.

Also, if you do this very often in your app, it is a good idea to reuse a single StringBuilder object instead of creating a new one every time. Just reset it to zero length when you are done with it (with oSB:Length := 0). This keeps the internal buffer intact, so no further memory is allocated when you generate new text in the string builder. The only remaining allocation happens when you convert it to a normal string.
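A minimal sketch of both suggestions, assuming a starting capacity of 65,536 characters (an arbitrary estimate, not a value from this thread) and two made-up reports built one after the other:

```xsharp
USING System
USING System.Text

FUNCTION Start() AS VOID
    LOCAL oSB AS StringBuilder
    LOCAL cFirst, cSecond AS STRING
    LOCAL i AS INT

    // Assumed size estimate: reserve room for 65,536 characters up front
    oSB := StringBuilder{ 64 * 1024 }

    FOR i := 1 UPTO 1000
        oSB:Append( "first report, line " ):Append( i ):AppendLine()
    NEXT
    cFirst := oSB:ToString()    // converting to a string allocates the result

    // Reset the content but keep the internal buffer for the next run
    oSB:Length := 0

    FOR i := 1 UPTO 1000
        oSB:Append( "second report, line " ):Append( i ):AppendLine()
    NEXT
    cSecond := oSB:ToString()

    Console.WriteLine( "first   : " + cFirst:Length:ToString() + " characters" )
    Console.WriteLine( "second  : " + cSecond:Length:ToString() + " characters" )
    Console.WriteLine( "capacity: " + oSB:Capacity:ToString() )
    RETURN
```

If the estimate turns out too small, the StringBuilder simply grows as usual, so a rough guess is enough.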
Chris Pyrgas

XSharp Development Team
chris(at)xsharp.eu
wriedmann
Posts: 3798
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Post by wriedmann »

Hi Chris,
I had tried to build a StringBuilder class in VO, but unfortunately it was slower than a simple string concatenation as in the 2nd sample.
This is the relevant VO code:

Code: Select all

class StringBuilder
protect _aElements			as array
	
declare method Append
declare method GetString
	
method Init() class StringBuilder
_aElements := {}
return self
	
method Append( cString as string ) as void pascal class StringBuilder
AAdd( _aElements, cString )
return
	
method GetString() as string pascal class StringBuilder
local ptrResult as byte ptr
local ptrTemp as byte ptr
local nLen as dword
local nI as dword
local nBufLen as dword 
local nTotalLen as dword 
local nIndex as dword
local cBuffer as string
local cResult as string
	
nLen := ALen( _aElements )
nTotalLen := 0
for nI := 1 upto nLen         
  cBuffer := _aElements[nI]
  nTotalLen := nTotalLen + SLen( cBuffer )
next
if nTotalLen == 0
  cResult := ""
else
  ptrResult := MemAlloc( nTotalLen )
  if ptrResult == null_ptr
    _Break( "memory allocation error - failed to allocate " + NTrim( nTotalLen ) + " bytes" )
  endif
  ptrTemp := ptrResult
  nIndex := 0
  for nI := 1 upto nLen         
    cBuffer := _aElements[nI]
    nBufLen := SLen( cBuffer )
    MemCopyString( ptrTemp, cBuffer, nBufLen )
    ptrTemp := ptrTemp + nBufLen
  next
  cResult := Mem2String( ptrResult, nTotalLen )
  MemFree( ptrResult ) 
endif
	
return cResult
I'm pretty sure this code can be enhanced, but after the first checks I decided not to put more time into this class.
Wolfgang
Jamal
Posts: 324
Joined: Mon Jul 03, 2017 7:02 pm

Post by Jamal »

Hi Wolfgang,

While you are at it, I am just wondering: if you create an X# or C# COM object, initialize a StringBuilder object as Chris suggested, and then use it in a similar fashion, what would the performance be beyond the initial COM object call?

Jamal
wriedmann
Posts: 3798
Joined: Mon Nov 02, 2015 5:07 pm
Location: Italy

Post by wriedmann »

Hi Jamal,
I have not tested that, but in my experience (and I do a LOT of COM interaction between X# modules and VO applications) the COM interface is not very fast (and cannot be very fast because there is a lot of code and a lot of conversions involved).
Wolfgang
Serggio
Posts: 46
Joined: Sun May 14, 2017 5:03 pm
Location: Ukraine

Post by Serggio »

You're welcome (see the attachment)
Attachment: StringBuilder.zip (2.86 KiB)
Karl-Heinz
Posts: 774
Joined: Wed May 17, 2017 8:50 am
Location: Germany

Post by Karl-Heinz »

wriedmann wrote:
I had tried to build a StringBuilder class in VO, but unfortunately it was slower than a simple string concatenation as in the 2nd sample.

Hi Wolfgang,

I agree: even when I use only static memory, I see no speed advantage. Maybe I overlooked something, but when I compare the results of your StringBuilder with mine, the speed difference is smaller than I would expect.

Code: Select all

CLASS StringbuilderMem  INHERIT Vobject
PROTECT _ptrValue  AS BYTE PTR
PROTECT _dwCurrentPos AS DWORD
PROTECT _dwStep := 2000 AS DWORD 

DECLARE METHOD Append
DECLARE METHOD GetString 

METHOD Append ( cValue AS STRING ) AS VOID PASCAL CLASS StringbuilderMem
LOCAL dwLen AS DWORD
LOCAL dwNeeded AS DWORD

	dwLen := SLen ( cValue )

	IF dwLen > 0

		dwNeeded := _dwCurrentPos + dwLen

		IF dwNeeded > MemLen ( _ptrValue )
			// Grow to the required size plus one step. Adding only _dwStep
			// would overflow the buffer when cValue is longer than _dwStep.
			_ptrValue := MemRealloc ( _ptrValue , dwNeeded + _dwStep )
		ENDIF

		// Copy the new string behind the existing content
		MemCopyString ( PTR ( _CAST , _ptrValue + _dwCurrentPos ) , cValue , dwLen )

		_dwCurrentPos += dwLen

	ENDIF

	RETURN

METHOD Destroy() CLASS StringbuilderMem

	UnRegisterAxit(SELF)

	IF _ptrValue != NULL_PTR
		MemFree ( _ptrValue )
	ENDIF

	RETURN NIL

METHOD GetString() AS STRING PASCAL CLASS StringbuilderMem

	IF _ptrValue == NULL_PTR .OR. _dwCurrentPos == 0
		RETURN NULL_STRING
	ENDIF

	RETURN Mem2String ( _ptrValue , _dwCurrentPos )

METHOD Init( nCapacity ) CLASS StringbuilderMem

	Default (@nCapacity, _dwStep )

	_ptrValue := MemAlloc ( nCapacity )
	_dwStep := nCapacity

	RegisterAxit ( SELF )

	RETURN SELF


regards
Karl-Heinz
ArneOrtlinghaus
Posts: 419
Joined: Tue Nov 10, 2015 7:48 am
Location: Italy

Post by ArneOrtlinghaus »

I have also found that frequently repeated string operations on strings of 1,000 characters or more get very expensive. Many years ago in VO I wrote a class similar to StringBuilder that uses the MemAlloc functions to avoid triggering the garbage collector, and the difference in speed was huge. With X# it is very similar: dynamic memory can get expensive. Tests with a performance profiler show that much of the time goes into handling strings, even in fully strongly typed code.
mainhatten
Posts: 200
Joined: Wed Oct 09, 2019 6:51 pm

Post by mainhatten »

wriedmann wrote:
I have not tested that, but in my experience (and I do a LOT of COM interaction between X# modules and VO applications) the COM interface is not very fast (and cannot be very fast because there is a lot of code and a lot of conversions involved).

Hi Wolfgang,
my gut reaction is to follow my second programming mantra, "Chunky, not Chatty", when calling across layers, as such layers sometimes have real physical borders, in this case the marshalling code. I am pretty certain that your first example, done across COM into StringBuilder, would be slower, at least at first and for strings that are not really large. With the second example, which first concatenates lots of tiny strings into an intermediate string and then does one large append, the benefit of not burdening memory management with large discarded memory areas might be greater, as the target string is in the multi-megabyte range.

In VFP we have similar issues, typically when string sizes rise above 10K and the memory allotment is set for a small VM. The typical response is similar to your second approach (as we have no StringBuilder type), although often with the twist of collecting the small strings not in one string but in a small array of strings, which can then be concatenated in one line:

Code: Select all

laTmp = ""   && setting all elements to 1 start value is nice in this context
for lnRun = 1 to 7
    *-- build laTmps
next
lcLargeString = lcLargeString + laTmp[1] + laTmp[2] + laTmp[3] + laTmp[4] + laTmp[5] + laTmp[6] + laTmp[7] 
This works because the slow part is not the concatenation of one or more strings, but releasing the previous variable, claiming new memory and assigning the total of the right-hand side of the line. You can see this by measuring: adding strings of identical length gets slower as lcLargeString grows.
But the easiest way (even if it goes against the "RAM is always faster" reflex) is to open a buffered low-level file and just append the new strings until the result is finished. If needed, loading the result once with FileToStr() for further processing is often faster than constantly memcpying it around in process space, as all internal memory allocation and garbage collection is sidestepped until the final load.
Unix-like behaviour makes sense there and is even easier to code and read. (I noticed that xSharp does not differentiate between buffered and unbuffered LLF, but as buffered was/is the VFP default behaviour, the xSharp LLF implementation probably defaults to buffered as well. The question has already been raised on GIT.)

That was true on old hard disks, and SSDs have improved write throughput as well.
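On the X# side, the same "append to a buffered file, load once at the end" idea could be sketched as follows. Note that this swaps in the .NET StreamWriter class (which buffers its writes) instead of the low-level file functions discussed above, and that the file name and the line content are hypothetical.

```xsharp
USING System
USING System.IO
USING System.Text

FUNCTION Start() AS VOID
    LOCAL cFile AS STRING
    LOCAL cResult AS STRING
    LOCAL oWriter AS StreamWriter
    LOCAL i AS INT

    // Hypothetical file name in the temp folder
    cFile := Path.Combine( Path.GetTempPath(), "plan_debug.txt" )

    // Append the pieces to the buffered writer instead of to a string
    oWriter := StreamWriter{ cFile, FALSE, Encoding.UTF8 }
    TRY
        FOR i := 1 UPTO 1000
            oWriter:WriteLine( "line " + i:ToString() )
        NEXT
    FINALLY
        oWriter:Dispose()   // flushes the buffer and closes the file
    END TRY

    // Load the finished text once, only if it is needed in memory
    cResult := File.ReadAllText( cFile, Encoding.UTF8 )
    Console.WriteLine( cResult:Length:ToString() + " characters read back" )

    File.Delete( cFile )
    RETURN
```

If the text is only needed as a file anyway, the final ReadAllText call can be dropped and nothing large ever has to live in process memory.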

regards
thomas