<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://pickwiki.org/index.php?action=history&amp;feed=atom&amp;title=How-To_Improve_Large_String_Performance_Using_Buffers</id>
	<title>How-To Improve Large String Performance Using Buffers - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://pickwiki.org/index.php?action=history&amp;feed=atom&amp;title=How-To_Improve_Large_String_Performance_Using_Buffers"/>
	<link rel="alternate" type="text/html" href="https://pickwiki.org/index.php?title=How-To_Improve_Large_String_Performance_Using_Buffers&amp;action=history"/>
	<updated>2026-04-28T22:08:40Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.0</generator>
	<entry>
		<id>https://pickwiki.org/index.php?title=How-To_Improve_Large_String_Performance_Using_Buffers&amp;diff=2670&amp;oldid=prev</id>
		<title>Rkgozar: /* Discussion */</title>
		<link rel="alternate" type="text/html" href="https://pickwiki.org/index.php?title=How-To_Improve_Large_String_Performance_Using_Buffers&amp;diff=2670&amp;oldid=prev"/>
		<updated>2020-09-26T14:32:30Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Discussion&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;
= How-To Improve Large String Performance Using Buffers =&lt;br /&gt;
2007-11-16 by Rex Gozar&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Brief ==&lt;br /&gt;
Processes that create or manipulate large amounts of data (over 1MB) can be slow, hurting user response time and overall system performance.  This degradation can be traced to UniVerse repeatedly reallocating and copying memory as the amount of data grows.  If you allocate all the memory you&amp;#039;ll need up front, you can eliminate this bottleneck.  This document describes a coding strategy for doing that.&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
In processing a 12MB text file, execution time dropped from 83 seconds to 4 seconds simply by preallocating a buffer and replacing VAR&amp;lt;-1&amp;gt; &amp;quot;append field&amp;quot; syntax with VAR[PTR,LENGTH] &amp;quot;substring assignment&amp;quot; syntax.  Your mileage may vary.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
*** Using VAR&amp;lt;-1&amp;gt; &amp;quot;append field&amp;quot; syntax ***&lt;br /&gt;
ITEM = &amp;quot;&amp;quot;&lt;br /&gt;
LOOP&lt;br /&gt;
   VALUE = some large amount of data&lt;br /&gt;
   ITEM&amp;lt;-1&amp;gt; = VALUE&lt;br /&gt;
WHILE (SOME.CONDITION) REPEAT&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
*** Preallocating memory and using VAR[PTR,LENGTH] &amp;quot;substring assignment&amp;quot; syntax ***&lt;br /&gt;
MAXBYTES = some calculated value    ;* data bytes plus one @FM per value&lt;br /&gt;
BUFFER = SPACE(MAXBYTES)            ;* allocates memory in one shot&lt;br /&gt;
BUFPTR = 0                          ;* points to the last byte written so far&lt;br /&gt;
LOOP&lt;br /&gt;
   VALUE = some large amount of data&lt;br /&gt;
   BUFFER[(1 + BUFPTR), (1 + LEN(VALUE))] = @FM:VALUE&lt;br /&gt;
   BUFPTR += (1 + LEN(VALUE))&lt;br /&gt;
WHILE (SOME.CONDITION) REPEAT&lt;br /&gt;
BUFFER = BUFFER[2, (BUFPTR - 1)]    ;* strip first @FM&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
In my business application, I work with vendor-supplied text files that contain fixed-width fields.  These files can be several megabytes in size.  I wrote a subroutine that converts the fixed-width fields in each row to tab-separated values (TSV).&lt;br /&gt;
&lt;br /&gt;
Initially, I used &amp;quot;append field&amp;quot; syntax (i.e. VAR&amp;lt;-1&amp;gt; = VALUE) to build the resulting TSV file.  Running it took over 83 seconds.  Ouch!  I added display statements to see what was going on, and noticed that the more rows were processed, the slower it got.&lt;br /&gt;
&lt;br /&gt;
Now for the theoretical stuff: when we create a variable within a program, UniVerse has to allocate memory to hold its contents.  When we append data to the variable, UniVerse has to check whether the new data will exceed the memory it allocated.  When it does, UniVerse has to request a new, larger chunk of memory from the system, then copy the old contents plus the new data into it.  Because each copy moves everything accumulated so far, appends get more expensive as the string grows, and large character strings trigger this reallocation and copy over and over, slowing down the entire process.  To keep this from happening, we need to allocate all the memory we need when we first initialize the variable.&lt;br /&gt;
&lt;br /&gt;
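When the final size can&amp;#039;t be calculated up front, one common compromise is to grow the buffer geometrically, so that it is reallocated only a handful of times instead of on every append.  The sketch below is not part of the original program: the 64KB starting size, the doubling rule, and the synthetic &amp;quot;ROW n&amp;quot; data are illustrative assumptions, and it relies on the same substring assignment used in the summary example.  It should print the number of values built (100000).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
*** Sketch: growing the buffer by doubling (sizes are assumptions) ***&lt;br /&gt;
      BUFFER = SPACE(65536)            ;* assumed starting size&lt;br /&gt;
      BUFPTR = 0                       ;* last byte written so far&lt;br /&gt;
      FOR I = 1 TO 100000&lt;br /&gt;
         VALUE = &amp;quot;ROW &amp;quot;:I            ;* synthetic data for illustration&lt;br /&gt;
         NEED = BUFPTR + 1 + LEN(VALUE)&lt;br /&gt;
         IF NEED &amp;gt; LEN(BUFFER) THEN&lt;br /&gt;
            * Double the buffer, plus room for this value, so&lt;br /&gt;
            * reallocation happens only occasionally.&lt;br /&gt;
            BUFFER := SPACE(LEN(BUFFER) + 1 + LEN(VALUE))&lt;br /&gt;
         END&lt;br /&gt;
         BUFFER[1 + BUFPTR, 1 + LEN(VALUE)] = @FM:VALUE&lt;br /&gt;
         BUFPTR = NEED&lt;br /&gt;
      NEXT I&lt;br /&gt;
      BUFFER = BUFFER[2, BUFPTR - 1]   ;* strip first @FM&lt;br /&gt;
      PRINT DCOUNT(BUFFER, @FM)&lt;br /&gt;
   END&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;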
The revised program is shown below.  I create a NEWBUF that is twice the size of the original ITEM; in my case, I figured that would leave enough room for the added tabs to prevent any memory reallocation.  Elapsed run time went from 83 seconds to 4, over 20 times faster.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
      SUBROUTINE FIXED.TO.TSV(ITEM, TABMAP)&lt;br /&gt;
* Take a flat file of fixed-width values and insert&lt;br /&gt;
* tabs so we don&amp;#039;t have to code field widths in all&lt;br /&gt;
* the other programs.&lt;br /&gt;
******&lt;br /&gt;
***&lt;br /&gt;
* Set up a &amp;quot;macro&amp;quot; to get the system seconds&lt;br /&gt;
* and milliseconds for performance timing.&lt;br /&gt;
***&lt;br /&gt;
      EQU GET$TICKS LIT &amp;#039;(SYSTEM(99):((SYSTEM(12) * 1000) &amp;quot;R%3&amp;quot;))&amp;#039;&lt;br /&gt;
***&lt;br /&gt;
* Our TABMAP contains the field lengths (widths)&lt;br /&gt;
* for each tab-delimited field.  Set the maximum&lt;br /&gt;
* attribute mark count (MAXAMC).&lt;br /&gt;
***&lt;br /&gt;
      MAXAMC = DCOUNT(TABMAP, @FM)&lt;br /&gt;
***&lt;br /&gt;
* Get the starting ticks, since we also want to&lt;br /&gt;
* include memory allocation in our timings.&lt;br /&gt;
***&lt;br /&gt;
      START.TICKS = GET$TICKS&lt;br /&gt;
***&lt;br /&gt;
* Initialize a variable in memory that&amp;#039;s large enough&lt;br /&gt;
* to hold the existing data and the new tabs we&amp;#039;ll be adding.&lt;br /&gt;
***&lt;br /&gt;
      NEWBUF = ITEM:ITEM&lt;br /&gt;
      NEWPTR = 0&lt;br /&gt;
***&lt;br /&gt;
* Reset the REMOVE pointer on ITEM (ITEM = ITEM), then&lt;br /&gt;
* loop to remove each line for processing.&lt;br /&gt;
***&lt;br /&gt;
      ITEM = ITEM&lt;br /&gt;
      LOOP&lt;br /&gt;
         LINE = REMOVE(ITEM, MORE.LINES)&lt;br /&gt;
         GOSUB CONVERT.LINE&lt;br /&gt;
         ***&lt;br /&gt;
         * Note that the @FM is prepended to the&lt;br /&gt;
         * line, since we are not using &amp;lt;-1&amp;gt;&lt;br /&gt;
         * notation.&lt;br /&gt;
         ***&lt;br /&gt;
         NEWBUF[1+NEWPTR,1+LEN(LINE)] = @FM:LINE&lt;br /&gt;
         NEWPTR += (1 + LEN(LINE))&lt;br /&gt;
      WHILE MORE.LINES REPEAT&lt;br /&gt;
***&lt;br /&gt;
* Strip off the leading @FM.&lt;br /&gt;
***&lt;br /&gt;
      ITEM = NEWBUF[2,NEWPTR-1]&lt;br /&gt;
***&lt;br /&gt;
* Show the elapsed time in milliseconds.&lt;br /&gt;
***&lt;br /&gt;
      ELAPSED = GET$TICKS - START.TICKS&lt;br /&gt;
      DISPLAY ELAPSED&lt;br /&gt;
      RETURN&lt;br /&gt;
*&lt;br /&gt;
*&lt;br /&gt;
CONVERT.LINE:&lt;br /&gt;
      WORKBUF = &amp;quot;&amp;quot;&lt;br /&gt;
      WORKPTR = 1&lt;br /&gt;
      FOR AMC = 1 TO MAXAMC&lt;br /&gt;
         FIELDLEN = TABMAP&amp;lt;AMC&amp;gt;&lt;br /&gt;
         WORKBUF&amp;lt;AMC&amp;gt; = LINE[WORKPTR, FIELDLEN]&lt;br /&gt;
         WORKPTR += FIELDLEN&lt;br /&gt;
      NEXT AMC&lt;br /&gt;
      LINE = CONVERT(@FM, CHAR(9), WORKBUF)&lt;br /&gt;
      RETURN&lt;br /&gt;
   END&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
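&lt;br /&gt;
The ITEM:ITEM estimate is generous.  A tighter bound, assuming every line is split exactly as CONVERT.LINE does it, is LEN(ITEM) plus MAXAMC bytes per line: each output line is no longer than its input line plus MAXAMC - 1 tabs, and each line gets one @FM in the buffer.  The sketch below uses made-up sample data, and NLINES and MAXBYTES are hypothetical names; its last three assignments could stand in for the NEWBUF = ITEM:ITEM line.  It should print 25.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
*** Sketch: sizing NEWBUF from the data (sample values are made up) ***&lt;br /&gt;
      ITEM = &amp;quot;AAABBCCCC&amp;quot;:@FM:&amp;quot;DDDEEFFFF&amp;quot;  ;* two fixed-width lines&lt;br /&gt;
      TABMAP = 3:@FM:2:@FM:4                 ;* field widths&lt;br /&gt;
      MAXAMC = DCOUNT(TABMAP, @FM)&lt;br /&gt;
      NLINES = DCOUNT(ITEM, @FM)&lt;br /&gt;
* Each line gains at most MAXAMC - 1 tabs, plus one leading @FM.&lt;br /&gt;
      MAXBYTES = LEN(ITEM) + (NLINES * MAXAMC)&lt;br /&gt;
      NEWBUF = SPACE(MAXBYTES)&lt;br /&gt;
      PRINT MAXBYTES&lt;br /&gt;
   END&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;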
&lt;br /&gt;
&lt;br /&gt;
Could this subroutine be faster?  Certainly, but the point is that one change made it run in about 1/20th of the time.&lt;br /&gt;
&lt;br /&gt;
== Recommendations ==&lt;br /&gt;
I use this technique only when it noticeably improves performance.  For example, building a savelist in a program may be only a few milliseconds faster with this technique, and who&amp;#039;s going to notice?  In many cases, readability and maintainability should take precedence over performance.&lt;br /&gt;
&lt;br /&gt;
Always measure performance before and after optimizing your code, and optimize only the parts that take the longest.&lt;br /&gt;
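&lt;br /&gt;
A rough before-and-after measurement might look like the sketch below.  It is not the author&amp;#039;s benchmark: the row count, the 200-byte synthetic row, and the use of TIME() (seconds since midnight, with a resolution that depends on your release, so don&amp;#039;t run it across midnight) are assumptions.  It builds the same dynamic array both ways, checks that the results match, and prints each elapsed time.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
*** Sketch: timing &amp;quot;append field&amp;quot; vs. a preallocated buffer ***&lt;br /&gt;
* Assumed row count and synthetic 200-byte row&lt;br /&gt;
      EQU NUM.ROWS TO 50000&lt;br /&gt;
      ROW = STR(&amp;quot;X&amp;quot;, 200)&lt;br /&gt;
*&lt;br /&gt;
* Method 1: VAR&amp;lt;-1&amp;gt; append&lt;br /&gt;
      START.TIME = TIME()&lt;br /&gt;
      ITEM = &amp;quot;&amp;quot;&lt;br /&gt;
      FOR I = 1 TO NUM.ROWS&lt;br /&gt;
         ITEM&amp;lt;-1&amp;gt; = ROW&lt;br /&gt;
      NEXT I&lt;br /&gt;
      APPEND.SECS = TIME() - START.TIME&lt;br /&gt;
*&lt;br /&gt;
* Method 2: preallocate, then substring assignment&lt;br /&gt;
      START.TIME = TIME()&lt;br /&gt;
      BUFFER = SPACE(NUM.ROWS * (1 + LEN(ROW)))&lt;br /&gt;
      BUFPTR = 0&lt;br /&gt;
      FOR I = 1 TO NUM.ROWS&lt;br /&gt;
         BUFFER[1 + BUFPTR, 1 + LEN(ROW)] = @FM:ROW&lt;br /&gt;
         BUFPTR += (1 + LEN(ROW))&lt;br /&gt;
      NEXT I&lt;br /&gt;
      BUFFER = BUFFER[2, BUFPTR - 1]       ;* strip first @FM&lt;br /&gt;
      BUFFER.SECS = TIME() - START.TIME&lt;br /&gt;
*&lt;br /&gt;
      IF BUFFER # ITEM THEN PRINT &amp;quot;Results differ!&amp;quot;&lt;br /&gt;
      PRINT &amp;quot;Append field: &amp;quot;:APPEND.SECS:&amp;quot; seconds&amp;quot;&lt;br /&gt;
      PRINT &amp;quot;Buffer:       &amp;quot;:BUFFER.SECS:&amp;quot; seconds&amp;quot;&lt;br /&gt;
   END&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;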
&lt;br /&gt;
&lt;/div&gt;</summary>
		<author><name>Rkgozar</name></author>
	</entry>
</feed>