patternMinor
Stack-based mini-stack
Viewed 0 times
basedstackmini
Problem
I've created a
Obviously if the stack is an object it lives on the heap and its creation and destruction takes a significant amount of time. Therefore any heap based option is out for loops that create and destroy the
Is there anything that can be altered to:
```
unit Ministacks;
interface
const
DefaultSize = 31;
type
///
/// The ministack stores 31 elements on the system's stack.
/// It is coded for raw speed.
/// The stack is safe for holding managed types
/// It does not do range checking, other than through Assertions at debug time
///
TMiniStack = record
{$IFDEF DEBUG}
strict private
function capacity: Integer;
{$ENDIF}
private
SP: Integer;
{$IFDEF CPUX64}
Filler: integer; // Keep array aligned
{$ENDIF}
{$IFDEF DEBUG}
HeapFlag: TGUID;
HeapSize: Integer;
{$ENDIF}
Items: array[0..DefaultSize - 1] of T;
function GetItem(index: Integer): T; inline;
public
procedure Free;
///
/// Initializes the stack.
/// Must be called before the stack can be used.
///
procedure Init; inline;
function Pop: T; inline;
procedure Push(const Item: T); inline;
///
/// Returns the top item on the stack, does not alter the stack pointer.
///
///
function Peek: T; inline;
function IsEmpty: Boolean; inline;
///
/// Allows the stack to be accessed as a read-only array.
///
property Item[index: Integer]: T read GetItem;
property Count: integer read SP;
end;
MiniStack = class
public type
PStack = ^Stack;
Stack = TMiniStack;
public
///
/// Creates new ministack on the heap.
///
/// The maximum number of elements the stack can hold
/// Pointer to the newly created stack.
///
/// Do not create and destroy a Ministack in a loop, use the stack based Ministack i
MiniStack to assist in removing recursion from routines using a manual stack.Obviously if the stack is an object it lives on the heap and its creation and destruction takes a significant amount of time. Therefore any heap based option is out for loops that create and destroy the
TStack.Is there anything that can be altered to:
- make it faster?
- improve the checking during debug time?
```
unit Ministacks;
interface
const
DefaultSize = 31;
type
///
/// The ministack stores 31 elements on the system's stack.
/// It is coded for raw speed.
/// The stack is safe for holding managed types
/// It does not do range checking, other than through Assertions at debug time
///
TMiniStack = record
{$IFDEF DEBUG}
strict private
function capacity: Integer;
{$ENDIF}
private
SP: Integer;
{$IFDEF CPUX64}
Filler: integer; // Keep array aligned
{$ENDIF}
{$IFDEF DEBUG}
HeapFlag: TGUID;
HeapSize: Integer;
{$ENDIF}
Items: array[0..DefaultSize - 1] of T;
function GetItem(index: Integer): T; inline;
public
procedure Free;
///
/// Initializes the stack.
/// Must be called before the stack can be used.
///
procedure Init; inline;
function Pop: T; inline;
procedure Push(const Item: T); inline;
///
/// Returns the top item on the stack, does not alter the stack pointer.
///
///
function Peek: T; inline;
function IsEmpty: Boolean; inline;
///
/// Allows the stack to be accessed as a read-only array.
///
property Item[index: Integer]: T read GetItem;
property Count: integer read SP;
end;
MiniStack = class
public type
PStack = ^Stack;
Stack = TMiniStack;
public
///
/// Creates new ministack on the heap.
///
/// The maximum number of elements the stack can hold
/// Pointer to the newly created stack.
///
/// Do not create and destroy a Ministack in a loop, use the stack based Ministack i
Solution
-
Stack with fixed size has very troublesome usability. That is why modern Operating Systems don't limit stack size by default, only to provide some reasonable 'max' that is used to detect real stack overflow caused by an infinite recursion. And they also don't allocate whole possible stack at once.
For example, on Windows a thread created withing a process will get its stack allocated with 1 or a few pages and one guard page at the end.
The guard page is flagged as invalid from the perspective of hardware processor.
When any instruction processed by the processor touches the guard page, the processor generates exception state which is then handled by the Operating System which in turn finds out it is an stack overflow state and allocates another page (or a few more) and adds them to the thread's stack space (modifies definition of the virtual memory layout of the SS segment selector).
So the effect is that the stack can grow automatically as needed. The grows happens only when the guard page was hit. This guard page hit test is done at 0 cost in the hardware.
Similar approach would be useful also in this class. Either reallocate the
Without auto growing stack and without any runtime checking you'll soon face mysterious access violations at best and random application behavior at worst. Both being hard to guess and find bugs.
(Also profiling your code would probably show that even if you leave the run-time safety checks in there the cost payed would be very small compared to the other things the functions do)
-
Debug time checking would get little bit faster if instead of
-
Also analyzing the assembly language code generated by the compiler from your class is a good guide for identifying if your code is optimal. In the generated assembly code the most frequently used operations (Push, Pop) should be also very short and efficient.
Some related things you may want to read
Stack with fixed size has very troublesome usability. That is why modern Operating Systems don't limit stack size by default, only to provide some reasonable 'max' that is used to detect real stack overflow caused by an infinite recursion. And they also don't allocate whole possible stack at once.
For example, on Windows a thread created withing a process will get its stack allocated with 1 or a few pages and one guard page at the end.
The guard page is flagged as invalid from the perspective of hardware processor.
When any instruction processed by the processor touches the guard page, the processor generates exception state which is then handled by the Operating System which in turn finds out it is an stack overflow state and allocates another page (or a few more) and adds them to the thread's stack space (modifies definition of the virtual memory layout of the SS segment selector).
So the effect is that the stack can grow automatically as needed. The grows happens only when the guard page was hit. This guard page hit test is done at 0 cost in the hardware.
Similar approach would be useful also in this class. Either reallocate the
Items array to make it bigger (see source code of TList which does the same) or allocate several pointers to several Items (buckets) where only 1 of them is the current one containing the TOS pointer.Without auto growing stack and without any runtime checking you'll soon face mysterious access violations at best and random application behavior at worst. Both being hard to guess and find bugs.
(Also profiling your code would probably show that even if you leave the run-time safety checks in there the cost payed would be very small compared to the other things the functions do)
-
Debug time checking would get little bit faster if instead of
HeapFlag being 16bit long array of bytes with states identified by special magic constants, there would be an enum flagged to be sized as native processor data type (32bit cardinal on 32bit system, 64bit cardinal on 64bit system) as manipulating such types costs almost always only 1-clock-
Also analyzing the assembly language code generated by the compiler from your class is a good guide for identifying if your code is optimal. In the generated assembly code the most frequently used operations (Push, Pop) should be also very short and efficient.
Some related things you may want to read
- [1] Stack class as implemented by Microsoft for the .NET framework, see the
Pushimplementation for auto-growing strategy - http://referencesource.microsoft.com/#mscorlib/system/collections/stack.cs
- [2] TFPList class as implemented by FreePascal runtime.
function TFPList.Add(Item: Pointer): Integerimplements thePushoperation. Seefunction TFPList.Expand: TFPListfor auto-growing strategy - http://svn.freepascal.org/svn/fpc/trunk/rtl/objpas/classes/lists.inc
- [3] Wikipedia describes 2 basic Stack implementations, one backed by linked list the other backed by array (your case) - http://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Implementation
- [4] Stack implementation backed by list of fixed size arrays (best of the two worlds) - http://www.codeproject.com/Articles/12349/Fast-Indexable-Stacks
- [5] Microsoft Support, Description of the stack checking for Windows NT-based applications - http://support.microsoft.com/kb/100775
- [6] j00ru//vx tech blog, Fun facts: Windows kernel and Guard pages - http://j00ru.vexillium.org/?p=1594
- [7] Windows Internals, Mark Russinovich, David A. Solomon, Alex Ionescu, chapter "Stacks" - Google books...
- [8] OSDev.org, How kernel, compiler, and C library work together, see "stack probing" - http://wiki.osdev.org/How_kernel,_compiler,_and_C_library_work_together#alloca.2C_main
- [9] Stack Overflow, How to find CPU cycle for an assembly instruction - https://stackoverflow.com/a/692727/2626313
- [10] Stack Overflow, Disassembling a DLL coded in Delphi — how to start? (note that since Delphi 7 some disassembler is built-in right inside the IDE) - https://stackoverflow.com/a/2345632/2626313
- [11] Some ReactOS internals describing the auto-growing stack as implemented by the Operating System
- Stack probing code - http://svn.reactos.org/svn/reactos/trunk/reactos/lib/sdk/crt/except/i386/chkstk_asm.s?view=markup
- Exception handler, see "stack" - http://svn.reactos.org/svn/reactos/trunk/reactos/ntoskrnl/ke/i386/traphdlr.c?view=markup
- Stack switching code (one process can have many hardware stacks but only 1 at a time is active), see "KeSwitchKernelStack" - http://svn.reactos.org/svn/reactos/trunk/reactos/ntoskrnl/ke/i386/usercall_asm.S?view=markup
- Some stack resizing code as response to page fault when the guard page is hit, see "MiCheckForUserStackOverflow" - http://svn.reactos.org/svn/reactos/trunk/reactos/ntoskrnl/mm/ARM3/pagfault.c?view=markup
Context
StackExchange Code Review Q#61511, answer score: 6
Revisions (0)
No revisions yet.