Silviu-Marius Ardelean's blog

a software engineer's web log

Process Status Analysis – the first steps

I am pleased to announce my first cross-platform and open source project, the Process Status Analysis tool, available on GitHub.

The Process Status Analysis (psa) version 0.2 is available for Windows and Linux (Debian derived / Ubuntu tested) Operating Systems.
Download: psa for Ubuntu Linux x64 (64)
Download: psa for Ubuntu Linux x86 (63)
Download: psa.exe for Windows x64 (67)
Download: psa.exe for Win32 (66)

You may wonder why I did it or what it brings new. Well, I did it for fun, in my spare time and I will continue improving it when I’ll find a time to do it.

The project is written in modern C++ using idioms from the C++ 1x standards. Even if initially was done as a C++ for Windows only, during the past days I managed the port of it for Linux using Visual Studio 2017’s project templates and a connection via SSH.
In general, the source code base is similar, differing just by OS specific stuff.

In case you want to find out more about how to develop C++ Linux projects from the best development tool (imho), Visual Studio, you can find more information on Visual Studio development team blog.

Related to this psa project, the Linux version requires libprocps4-dev library in order to build.

The main reason for starting this project was that I wanted to know what’s the total memory amount used by my Chrome browser. I know it uses a lot of resources but not that much… 🙂

Even if my preferred processes analysis tool, the Process Hacker offers a lot of processes administration possibilities but it didn’t provide what I want, so I decided to enjoy a bit.

Chrome processes in Process Hacker

Sample – Google Chrome processes in Process Hacker tool

So, what I achieved by psa.exe was something like:

A bit too much in my humble opinion…
The features this tool offers includes:

Get all processes loaded in memory information

I case you want to have a snapshot of all the processes loaded in the OS’s memory you can have it with.

Get process only used memory

With psa.exe you can reach the used memory by a specific parameter -o after the process name or at least a part of it’s name.

Currently, there is no string replace ‘*’ but it’s ongoing.


Print processes tree

Storing the processes’ data within a generic tree done by me I took the decision to print the processes tree output, in a similar way there is in Windows with tree.exe tool or on Linux in pstree or even htop.

 

psa.exe partial tree sample

Process Status Analysis partial tree of Windows process

 

Top most “expensive” processes

In case you want to see what are the most expensive processes within your operating system you can achieve it with:

or simpler “psa -e” in case you’re sure you want top 10 expensive processes (the default value).

Redirect output to a file

From the standard output the information can be easely redirected to a file.

Kill process feature

This feature was not implemented yet but in case we need it we can be done it easily with the existing tools on the target OS (ex. Task Manager, Process Exporer/Hacker, pskill.exe for Windows or the combination ps + kill on LInux).

Feedbacks and improvements
Any constructive feedback, suggestions, contributions to improvements are appreciated.
Feel free to add any issue you find, wish or suggestion you have in the GitHub repository, the 
Issues section or here as a comment.

Share

The First Hackathon Experience #HackTM2016

It has passed approximately two months from my first hackathon experience, the #HackTM2016 from Timisoara. This delay I’m publishing this article is mostly because of the release period to the job and other personal stuff I had to do.

This experience was a reminder of my high school programming competitions where I have attended with different applications written in the already ancient Turbo Pascal 7.0 or Borland C++ 3.1. At that moment, probably because of my high school informatics great teachers, the competitive spirit between colleagues was so intense and we have competed for each other in creating applications within the local competitions and in other cities from the country.

This year, the hackathon competition from Timisoara was between 20 – 22 May 2016 to the UPT Restaurant, Timisoara and from my point of view it was a great event. The event had categories such eHEALTH, Robots, Smart City, GameDev, IoT, Education, Fintech and Open but no Automotive category as I expected according to previous pitching sessions.

Before the effective event, I have attended in two pitching sessions seeing different challenging proposals in few domains. The most appealing were some ideas of the ADAS team from Continental, a team I was apart between 2006-2008 (Siemens VDO department, video camera projects at that time).

Even if I had no team, because of curiosity, I bought the ticket and I went to the hackathon pitching session, trying to figure a team and to decide a project to deal with.

The Continental ADAS team came with hardware and software support how to hack their platform.

We were able to find a small team, first by two persons and later for a period five persons. Unfortunately, those last three persons left us while realizing that the project is not what they have imagined initially, that other projects look more challenging or that their knowledge was not matching with what it was required for our project.
So, I installed on my laptop some software used internally by the ADAS team and we took a ride to record real traffic data. Once we had this data, the effective programming for grabbing it can be done in office conditions.

Driving to grab ADAS data

The idea

Having the radar and camera information, GPS data and vehicle dynamics, we were thinking is that in the near future, even before self-driving cars on the common roads, this information might be sent in the cloud to be used by the traffic management solutions in order to be optimized the traffic. Even these days, in my city, Timisoara, such traffic management solution is implemented but is not based on cars internal information. Maybe, this idea will not be applied.

What we have done – the effective experience

So, faced with this challenge, me and my teammate Nikola Kolevski, a Serbian nice guy, have started the work on Friday evening. We have met on that pitching meeting and we had a great cooperation. I was the back-end guy and he the man from the cloud. Because we “spoke” different languages, me C++ and he Python, we have decided to speak the REST way.

During our job, we have improvised the “project management” with a Trello board. Of course, we used GitHub for source control.

What I effectively had to do it was to inject some code within a .DLL that was loaded into a Continental application and sent the ADAS information in the cloud. Nicola was the guy that received and collected the data. Unfortunately, even if we tried few times, we found no front-end available teammate, just some with slices of time in terms of availability.

ADAS AE-RO HackTM concept

I decided to use the benefits of asynchronous programming and I used the Casablanca REST API. But because of the Continental’s Visual Studio 2012 project constraints (!!!), during the Friday evening, I faced up with the challenge to find and adapt a Casablanca library older version to the project. The latest Casablanca’s versions are available for VS 2013 and VS 2015 only. Thanks to NuGet tool, I finally managed to get and use the 1.10 version.

On Saturday we managed the effective work, faced with some challenges related to the TCP/IP communication between our applications because of some Python server configuration, but finally, our applications were able to talk each other via REST services, in the night and I have tweeted.

After a sleep break, next morning we did some last code adjustments and being time constrained we tried to improvise a frontend. Also, we had a second trip with the Continental’s Mercedes car to test what we did, using a 4G network.

As usual for a hackathon, everything was on the run with adrenaline, so quite nice! At the end of the hackathon, we had to prepare for the hackathon jury’s visit and later for presentation because we have qualified in the first three teams in the Robots section. Yeah, we were included there because the Automotive category did not exist, but it was fine. 🙂

The truth is that the Continental was the only automotive represented company even if there are many such companies in Timisoara, but it seems they are not interested in such events.

Other interesting things from #HackTM2016

Attending to #HackTM2016 was a great experience, I have seen many interesting projects but from far the most exciting one was the Symme 3D Printer, a local start-up.

Conclusions

In an internet of things world, our based idea might connect the cars ADAS information with intelligent management future systems to improve the traffic flow in big cities.

It is obvious but I want to underline: if you want to have success in a hackathon, try having a core team before the event. Otherwise, you might just have fun coding but not ending the prototype.  Strategies of being efficient would be a great asset.

Meeting new people and trying to do something from the scratch in a limited time is a very cool thing even if you don’t have time to write optimized and tested code. Also, you might learn a lot of new things.

Definitely, I will repeat this experience in the future!

Share

Getting Table’s indexes experiences – workaround

Trying to get table indexes information in SQL Server 2012 I identified a strange situation within a specific method that I was using so long but it was not acting as expected in one situation.

The way of getting indexes information using the ODBC C API into that old and inherited method looks like:

Usually, I got the right information about indexes but in one situation I encounter a strange behavior. It’s about having a clustered index into a scenario.

I have a table that contains two indexes referenced to some fields: IndexField_1 and IndexField_3 mapped over int, NULL fields. When IndexField_1 is Non-Unique, Non-Clustered and IndexField_3 is Clustered index I get the right information.
But if the index IndexField_1 is Clustered and the IndexField_3 is Non-Unique, Non-Clustered I get no information about IndexField_1 index (eg. szIdxName and szIdxColName are “” and their length is -1 that means SQL_NULL_DATA). Within while loop, with the next iteration, I get correct information about the second index IndexField_3.

Because SQLExtendedFetch() is deprecated I tried using SQLFetchScroll() but the behavior is the same from my interest point of view.

I was not sure whether the problem is with SQLStatistics, the bindings or SQLFetchScroll (they all always return SQL_SUCCESS). It looks such a problem with the driver when the first index is clustered.
According to SQLStatistics documentation if my swType parameter is SQL_TABLE_STAT I have no information for index or field. But for this scenario, I had no indexes of combined fields.
For the good scenario I observed that my while loop had 3 iterations including one of having swType = SQL_TABLE_STAT without information in szIdxName. But for the bad scenario, the loop had only 2 iterations. So it looks like SQLExtendedFetch() is not getting the last one index.

After some googling and research without very significant solutions, I decided to apply a workaround by avoiding the old API and I rewrite my method.

So, in order to get table indexes information, I have chosen a direct SQL query into SYS tables: sys.tables, sys.indexes, sys.schema.

Because I preferred getting also information about the index’s composed fields, I applied a second additional SQL query:

and I have collected data into a container of defined structure according to my SQL Indexes interest information:

The last member vectColumns stores information about the columns that are used for a specific index.

Finally, the new method that collects table indexes information looks like:

In this way I have complete information about the indexes of my tables.

Conclusion: When the C/C++ API doesn’t give you any hopes don’t forget that SQL saves you.

Share

HTML files generation using XML and XSLT with Microsoft XML DOM API

This short tutorial shows how easy it’s to generate reports in HTML pages using Microsoft XML DOM API together XML and XSLT.

XML (Extensible Markup Language) became a universal standard of encoding data in a format that is both human-readable and machine-readable. It’s widely used in business applications and even Microsoft Office uses it into internal file formats.
XSLT is used for XML documents decoration. Once we have data into a XML files, using the XSLT (Extensible Stylesheet Language Transformations) we can easily generate HTML and xHTML files. XSLT is a W3C recommendation still from 16. November 1999 and in the meantime, it was extended with a new version XSLT v2.0.

XSLT uses XPATH to get the XML’s tags information, complete the predefined temples and transform results into a .html document.
Each decent browser has support for XML and XSLT. All we have to do it’s to link two such files (.xml and .xslt) and once we execute the XML file the browser will generate and render our XHTML content.

But in case we are writing non-browser applications the HTML generation becomes a bit complicated in case you are not satisfied with a hard-coded solution and want a flexible solution.

Using the Microsoft’s XML Core Services (MSXML) our job became a piece of cake. We focus once over the HTML generator and later in case we want to change something into our look and content we have to deal only with the .xml and .xslt files.

Because of using COM don’t forget proper the calls of CoInitialize() and CoUninitialize().
Here are two samples files generated with the test application using the upper method: sample_1, sample_2.

The combination of XML, XSLT and XPATH offers a very flexible way to generate HTML files. With such an approach even native application does not need to change in case we change the HTMLs look. Within the presented case the hard-coded solutions are avoided and most probably a new recompilation is not needed in case we want to change data content (XML) or the look (XSLT).
In case you want to add sophisticated HTML code (ex. colored, formatted, images, etc.) you need to convert that code into XHTML format before adding data into the .XSLT file.

demo application (1018)

Share

SubclassWindow() method issues in projects base on MFC Feature Pack

The Problem
Trying to paint a background image into client area of a MDI application build in VC++ 6.0 to VC++ 2005 IDE it’s not a difficult task.
In case you need, you can find easily good references. For instance, there are two references from Microsoft (KB129471 and KB103786) and one I prefer: a FAQ wrote by a friend of mine.

Unfortunately things are changing radically in case you’re following the same steps in a Visual C++ IDE that has MFC Feature Pack support. If you’re building from the scratch a VC++ 2008/VC++ 2010 a MDI project that has MFC Feature Pack support and you’re trying to apply sub-classing steps, you will have a big surprise in the moment you’re starting your application in debug mode. Effectively your application will crash in the moment you are trying to call SubclassWindow() in CMainFrame::OnCreate().

Problem details
Starting with MFC Feature Pack CMDIFrameWndEx is the new CMainFrame’s parent class instead of CMDIFrameWnd and the problem acts inside of Attach() method:

and the issue appears in the second ASSERT() macro

because CWnd::FromHandlePermanent(HWND hWnd) looks up into a permanent handle map and in returns existing CWnd pointer.

CHandleMap is the wrapper that implements the mapping mechanism between the pointers of MFC wrapped classes and the Windows object handles. Internally, this class has to dictionaries (m_permanentMap and m_temporaryMap) implemented as CMapPtrToPtr, m_nHandles – the number of handles, m_nOffset – the offset of handles in the object and it has a m_pClass pointer of CRuntimeClass (a run time class associated with all MFC classes).
In case you’re interest in more details, you can find more information here.

We have a pointer to a CHandleMap instance that is assigned with the returned pointer of a handle map returned by afxMapHWND(). The returned pointer pWnd it’s assigned with the result returned by pMap->LookupPermanent(hWnd). LookupPermanet() effectively search into a the permanent hash map for exiting HANDLEs and in our case it find it.

where

If the item having nHash key was found into m_pHashTable then the condition if (pAssoc->key == key) is TRUE because the attribute m_hWndMDIClient of CMDIFrameWnd is used yet.
So, effectively what LookupPermanent() has found in m_permanentMap map is m_hWndMDIClient. And because pMap->SetPermanent(m_hWnd = hWndNew, this) is one of the next call into Attach() method those ASSERTs are a must.
Even if those ASSERT() calls from Attach() are available only in debug mode (because of ASSERT() macro behavior) a release build would not save the situation. Soon or later you’ll get conflicts and the application will crash.

Trying to find where this has happened is not so complicated as long as we take in consider our CMainFrame class it’s derived from CMDIFrameWndEx a class that extends CMDIFrameWnd. If we are looking into CMDIFrameWndEx class implementation (AfxMDIClientAreaWnd.cpp) we will see that into this class SubclassWindow() method it’s called jet:

Subclassing a CWnd derived instance that has already a mapped HWND item is an error and these ASSERTs try to avoid this from development moment. Having two different CWnd-derived objects with the same HWND is not possible – the only exception is CDC instances that have 2 HWNDs (m_hDC and m_hAttribDC).
Related to my issue, according to Steve Horne from Microsoft, “anything that uses the MFC Feature Pack will be using CMDIFrameWndEx which is a very different beast. It has this feature built it as you’ve found out”.
The worst part is that “If you were able to subclass the Ex client area, you’d probably end up breaking a lot of the FluentUI features.”
The VS 2008 / VS 2010 wizard generates and use a lot of Feature Pack FluendUI items.

A bad solution
An approach might be trying to adapt sub-classing idea directly into CMainFrame class. So, the steps might be:

  • No CMDIClientWnd instance is needed (as in existing tutorials). So no more SubclassWindow() call in CMainFrame::OnCreate().
  • Handle WM_ERASEBKGND, WM_SIZE and WM_PAINT on CMainFrame.
  • CWnd::FromHandle() acquires a pointer to an MFC object pointer from CHandleMap via afxMapHWND().

    At the very first time everything looked nice. But unfortunately I have to admit Steve Horne’s observations. In different situations (most on resizing or moving messages) some of the FluentUI items were not correctly painted (some Ribbon items painting issues – different cases).

    So, a better solution is needed.

    A good but not perfect solution
    In my research, for projects base on MFC Feature Pack, there is no perfect solution for this issue. I mean something similarly with the good solutions that I mentioned in the beginning of this article but acts fine until the first IDE that use MFC Feature Pack.
    As we have seen on top trying to subclass a window with an already mapped is not a good idea.
    The solution is based on Joseph M. Newcomery’s idea, a well-known book writer and Microsoft Visual C++ MVP. Joe proposes “temporary” remapping only for the case we need – in my case painting actions. For the rest of the action the mapping process inside of framework continues in the classic way. It’s a “gross and ugly” solution but until having a better solution from Microsoft or others I consider it fine for my needs.

  • First step is to define a class CMDIClientWnd derived from CWnd and add WM_PAINT and WM_ERASEBKGND handle methods.
  • Catch the WM_PAINT message in CMainFrame via PreTranslateMessage() before the message is dispatched for execution and calling our redraw method.
  • Here is the RedrawClientArea() public method.

    So we create locally an instance of CMDIClientWnd and we attach it internally to ChandleMap::m_permanetMap via Attach(), not before detaching m_wndClientArea (an CMDIClientAreaWnd instance, attribute in CMDIFrameWndEx and as we have seen before it subclass the CMDIFrameWndEx in CMDIFrameWndEx::OnCreateClient()).

    The idea is that our CMDIClientWnd instance temporary replace m_wndClientArea instance of CMDIClientAreaWnd right before effective WM_PAINT message is dispatched via PreTranslateMessage().

  • Include your new class header (ex. MDIClientWnd.h) in MainFrm.cpp and call RedrawClientArea() in CMainFrame::OnSize().
  • If the child frames window is not tabbed style (when all client area is hidden) and the client area is still visible than we have to call RedrawClientArea() method from WM_MOVE and WM_SIZE handler of CChildFrame and we have to include MainFrm.h into ChildFrame.cpp.
  • Additionally, in order to make sure the painting message is received by main frame at application’s starting moment and your image is correctly painted from the beginning, please call pMainFrame->Invalidate() after pMainFrame->UpdateWindow() in InitInstance() method of your application class. Otherwise, if your application it’s starting with no opened document (for instance new document), your picture will appear only in the moment a WM_PAINT message is generated in CMainFrame (for instance when you resize your application, select the menu, etc).
  • A disadvantage of this approach is that the interest message (WM_PAINT) is not handled inside the class of m_wndClientArea, but the good point is that the rest of the messages are left at the correct class of the framework and will work correctly.
    Demo application (1691)

    Share

    Several C++ singleton implementations

    This article offers some insight into singleton design-pattern.
    The singleton pattern is a design pattern used to implement the mathematical concept of a singleton, by restricting the instantiation of a class to one object. The GoF book describes the singleton as: “Ensure a class only has one instance, and provide a global point of access to it.
    The Singleton design pattern is not as simple as it appears at a first look and this is proven by the abundance of Singleton discussions and implementations. That’s way I’m trying to figure a few implementations, some base on C++ 11 features (smart pointers and locking primitives as mutexs). I am starting from, maybe, the most basic singleton implementation trying to figure different weaknesses and tried to add gradually better implementations.
    The basic idea of a singleton class implies using a static private instance, a private constructor and an interface method that returns the static instance.

    Version 1
    Maybe, the most common and simpler approach looks like this:

    Unfortunately this approach has many issues. Even if the default constructor is private, because the copy constructor and the assignment operator are not defined as private the compiler generates them and the next calls are valid:

    So we have to define the copy constructor and the assignment operator having private visibility.

    Version 2 – Scott Meyers version
    Scott Meyers in his Effective C++ book adds a slightly improved version and in the getInstance() method returns a reference instead of a pointer. So the pointer final deleting problem disappears.
    One advantage of this solution is that the function-static object is initialized when the control flow is first passing its definition.

    The destructor is private in order to prevent clients that hold a pointer to the Singleton object from deleting it accidentally. So, this time a copy object creation is not allowed:


    error C2248: otherSingleton::otherSingleton ' : cannot access private member declared in class 'otherSingleton'
    error C2248: 'otherSingleton::~otherSingleton' : cannot access private member declared in class 'otherSingleton'

    but we can still use:

    This singleton implementation was not thread-safe until the C++ 11 standard. In C++11 the thread-safety initialization and destruction is enforced in the standard.

    If you’re sure that your compiler is 100% C++11 compliant then this approach is thread-safe. If you’re not such sure, please use the approach version 4.

    Multi-threaded environment
    Both implementations are fine in a single-threaded application but in the multi-threaded world things are not as simple as they look. Raymond Chen explains here why C++ statics are not thread safe by default and this behavior is required by the C++ 99 standard.
    The shared global resource and normally it is open for race conditions and threading issues. So, the singleton object is not immune to this issue.
    Let’s imagine the next situation in a multithreaded application:

    At the very first access a thread call getInstance() and pInstance is null. The thread reaches the second line (2) and is ready to invoke the new operator. It might just happen that the OS scheduler unwittingly interrupts the first thread at this point and passes control to the other thread.
    That thread follows the same steps: calls the new operator, assigns pInstance in place, and gets away with it.
    After that the first thread resumes, it continues the execution of line 2, so it reassigns pInstance and gets away with it, too.
    So now we have two singleton objects instead of one, and one of them will leak for sure. Each thread holds a distinct instance.

    An improvement to this situation might be a thread locking mechanism and we have it in the new C++ standard C++ 11. So we don’t need using POSIX or OS threading stuff and now locking getInstance() from Meyers’s implementation looks like:

    The constructor of class std::lock_guard (C++11) locks the mutex, and its destructor unlocks the mutex. While _mutex is locked, other threads that try to lock the same mutex are blocked.
    But in this implementation we’re paying for synchronization overhead for each getInstance() call and this is not what we need. Each access of the singleton requires the acquisition of a lock, but in reality we need a lock only when initializing pInstance. If pInstance is called n times during the course of a program run, we need the lock only for the first time.
    Writing a C++ singleton 100% thread safe implementation it’s not as simple as it appears as long as for many years C++ had no threading standard support. In order to implement a thread-safe singleton we have to apply the double-checked locking (DCLP) pattern.
    The pattern consists of checking before entering the synchronized code, and then check the condition again.
    So the first singleton implementation would be rewritten using a temporary object:

    This pattern involves testing pInstance for nullness before trying to acquire a lock and only if the test succeeds the lock is acquired and after that, the test is performed again. The second test is needed for avoiding race conditions in case other thread happens to initialize pInstance between the time pInstance was tested and the time the lock was acquired.
    Theoretically, this pattern is correct, but in practice is not always true, especially in multiprocessor environments.
    Due to this rearranging of writes, the memory as seen by one processor at a time might look as if the operations are not performed in the correct order by another processor. In our case, the assignment to pInstance performed by a processor might occur before the Singleton object has been fully initialized.
    After the first call of getInstance() the implementation with pointers (non-smart) needs pointer to that instance in order to avoid memory leaks.

    Version 3 – Singleton with smart pointers
    Until C++ 11, the C++ standard didn’t have a threading model and developers needed to use external threading APIs (POSIX or OS dependent primitives). But finally C++ 11 standard has threading support.
    Unfortunately, the first C++ new standard implementation in Visual C++ 2010 is incomplete and threading support is available only starting with beta version of VS 2011 or the VS 2012 release preview version.

    As we know, in C++ by default the class members are private. So, our default constructor is private too. I added here in order to avoid misunderstanding and explicitly adding to public / protected.
    Finally, feel free to use your special instance (singleton):

    And no memory leaks emotion… 🙂
    Multiple threads can simultaneously read and write different std::shared_ptr objects, even when the objects are copies that share ownership.
    But even this implementation using double checking pattern but is not optimal to double check each time.


    Version 4 – Thread safe singleton C++ 11
    To have a thread safe implementation we need to make sure that the class single instance is locked and created only once in a multi-threaded environment.
    Fortunately, C++ 11 comes in our help with two new entities: std::call_once and std::once_flag. Using them with a standard compiler we have the guaranty that our singleton is thread safely and no memory leak.
    Invocations of std::call_once on the same std::once_flag object are serialized.
    Instances of std::once_flag are used with std::call_once to ensure that a particular function is called exactly once, even if multiple threads invoke the call concurrently.
    Instances of std::once_flag are neither CopyConstructible, CopyAssignable, MoveConstructible nor MoveAssignable.

    Here it is my proposal for a singleton thread safe implementation in C++ 11:

    The parameter to getInstance() was added for demo reasons only and should be passed to a new proper constructor. As you can see, I am using a lambda instead normal method.
    This is how I tested my safeSingleton and smartSingleton classes.

    So I create 20 threads and I launch them in parallel (std::thread::join) and each thread accesses getInstance() (with a demo id parameter). Only one of the threads that is trying to create the instance succeeds.
    Additionally, if you’re using a C++11 100% compiler you could also delete the copy constructor and assignment operator. This will allow you to obtain an error while trying to use such deleted members.

    Other comments
    I tested this implementation on a machine with Intel i5 processor (4 cores). If you see some concurrent issues in this implementation please fell free to share here. I am open to other good implementations, too.
    An alternative to this approach is creating the singleton instance of a class in the main thread and pass it to the objects which require it. In case we have many singleton objects this approach is not so nice because the objects discrepancies can be bundled into a single ‘Context’ object which is then passed around where necessary.

    Update: According to Boris’s observation I removed std::mutex instance from safeSingleton class. This is not necessary anymore because std::call_once is enough to have thread safe behavior for this class.

    Update2: According to Ervin and Remus’s observation, in order to make things clear I simplified the implementation version 3 and this is not using std::weak_ptr anymore.

    References:
    just::thread – Anthony Williams – Just Software Solutions Ltd
    C++ and the Perils of Double-Checked Locking by Scott Meyers and Andrei Alexandrescu
    Modern C++ Design: Generic Programming and Design Patterns Applied by Andrei Alexandrescu ( Romanian like me 🙂 )

    Share

    Adventures with _chkstk

    Called by the compiler when you have more than one page of local variables in your function.
    _chkstk Routine is a helper routine for the C compiler. For x86 compilers, _chkstk Routine is called when the local variables exceed 4K bytes; for x64 compilers it is 8K.

    That’s all that you get from _chkstk()’s msdn web page. Nothing more…

    Overview
    A process starts with a fixed stack space. The top of a stack is pointed to by the ESP register (Extended Stack Pointer) and this is a decrementing pointer. Every function calls results in a stack created for the function inside this Process Stack. Every thread function has its own stack. The stack is a downward growing array. When a function starts, the default stack reservation size is 1 MB.
    This is contrasting with the heap’s size whether theoretically increases to a limit of 4 GB on 32bits OS. See more information here.

    Every thread under Windows gets its own block of contiguous memory, and while function calls are made, the stack pointer is increasing and decreasing. In contrast, a different thread within the same process might get a different block of contiguous memory – its own stack. When a context switch occurs, the current thread’s ESP (along with the IP and other registers) are saved in the thread’s context structure, and restored when the thread is activated the next time.
    To specify a different default stack reservation size for all threads and fibers, use the STACKSIZE statement in the module definition (.def) file. To change the initially committed stack space, use the dwStackSize parameter of the CreateThread, CreateRemoteThread, or CreateFiber function.
    Most stack problems occur in overflows of existing stacks, as their sizes are fixed and they cannot be expanded.

    _chkstk() increases the stack when needed by committing some of the pages previously reserved for the stack. If there is no more physical memory available for committed pages, _chkstk fails. When you enter a function (VC++ with the stack checking enabled), it will call the _chkstk located in CHKSTK.ASM. This function does a stack page probing and causes the necessary pages of memory to be allocated using the guard page scheme, if possible. In this function, it is stated that when it encounters _XCPT_GUARD_PAGE_VIOLATION, the OS will attempt to allocate another guarded page and if it encounters _XCPT_UNABLE_TO_GROW_STACK then it’s a stack overflow error. When _XCPT_UNABLE_TO_GROW_STACK is encountered, the stack is not yet set up properly, that is why, that it will not call the catch because calling it will use invalid stack variables which will again cause another exception.

    Case – Too many or too big variables on stack
    As I said on top, the function stack size is 1 MB. If you miss that and you’re trying to define and use internally an array like this:

    When you’ll compile with VC++ compiler in debug mode you will have a big surprise: the application is crashing on _chkstk() in the moment the _chkstk() tries to create new memory page on stack and fails.
    The output window shows next message:
    First-chance exception at 0x004116e7 in testApp.exe: 0xC00000FD: Stack overflow.
    Unhandled exception at 0x004116e7 in testApp.exe: 0xC00000FD: Stack overflow.

    This happens because the 1MB limit is overloaded even on a win32 OS: 4000*200*4 = 3.2MB (approx.).
    Same story if you define many local variables and their stack usage overloads the 1MB limit. Off-course the thread stack size can be changed but think once again if it is really needed to do that.
    If you really need this big array then the best solution to avoid this crash is using the heap.

    Case – Recursive functions
    If you have an infinite recursion then you will gate same stack overflow error and the application crashes in _chkstk.asm. Recursive function is not the subject of this article so I don’t go in deep… Here it is a good example of what happens with recursive functions.
    The solution is to avoid using recursive functions as much as possible and try to implement an iterative function.

    Case – A stack corruption
    I have started looking over _chkstk() function in the moment when I got few bugs with crashes with some similarly details. I had to analyze some .dump files and solve few bugs that contained a call stack with _chkstk() on top.
    Most of the .dump files call stack contained a second similarly thing: the call of a thread function (so called ThreadFoo()) that was running in a threads pool.
    In that moment I started to research why _chkstk() fails and my first track was to debug the stack overflows. I followed a MSDN debugging tutorial and unfortunately I didn’t find something strange. I checked if the local stack variables are not so big in order to fill the ThreadFoo() function’s stack and it did not.
    Then a new study of ThreadFoo() function has followed in order to detect the internal functions calls that can fail in some circumstances. I stopped to some trace function calls and I studied deeply. Those trace functions where defined in an external debug class and each time when a new trace file was added it used an internal buffer (TCHAR szBuff[2048] = _T(“”);).
    The writing of this buffer was done using: swprintf(). As we know this function is unsafe and is not recommended to use. As long as the content of these trace lines was dynamically build (in some cases those line may contain even dynamically build SQL queries that failed) then the length of these trace lines could be higher than 2048 bytes and then guess what: a stack corruption appears! UPS! The stack pointer will be corrupted (the classic stack overflow case).

    So I have implemented and used the next macros:

    Now, if we’re using the safe macro we will have no issues.

    A safety alternative way to that buffer was the heap using but the heap access is not fast as the stack access so I preferred this approach (in a business application every milliseconds matters for the log system).
    After that fixed I met no other stack corruptions in ThreadFoo() and other code areas.

    Even if the top of the call stack was _chkstk() this was not the function that failed. The error appeared because of that stack corruption and _chkstk() has just detected.

    Conclusion
    If your code produces a stack overflow, then you have to rethink your design in right away:

    • If you see _chkstk() on the top of call stack, check if you have no stack corruptions – stack overflow.
    • Don’t try to manipulate the stack by yourself. The default 1MB stack size is basically enough
    • Allocate dynamically if you’re using big arrays
    • If you have recursive functions producing a stack overflow, re-write them using loops (a tip: it is a proven fact that any recursive functions can be programmed non-recursive)

    References
    Set stack size
    Thread Stack Size
    _chkstk Routine
    Stack (data structure)
    Debugging a Stack Overflow – tutorial
    Visual C++ apps crashing in _chkstk() under load
    Optimization Note (C++) 1: push, pop, call _chkstk
    What is Recursion?

    Share

    pre vs. post increment operator – benchmark

    Compiler: Visual C++ 2010
    Operating System: Windows 7 32bits
    Tested machine CPU: Intel core i3
    Download: preVSpost (demo project) (917)

    A recent Visual C++ team’s comment on twitter.com reminded me a hot topic that exists in C++ programming world: there is a long discussion of using pre versus post increment operators, specially, for iterators. Even me I was witness to a discussion like this. The discussion started from a FAQ written by me on www.codexpert.ro.

    The reason of preferring pre increment operators is simple. For each post-increment operator a temporary object is needed.
    Visual C++ STL implementation looks similarly with next code:

    But for pre-increment operator implementation this temporary object is not needed anymore.

    In the discussion that I mentioned above, somebody came with a dummy application and tried to prove that things have changed because of new compilers optimizations (the code exists in the attached file, too). This sample is too simple and far away to the real code. Normally the real code has more code line codes that eat CPU time even if you’re compiling with /O2 settings (is obviously).
    Base on that VC++ team’s tweet related to viva64.com’s research I decided to create my own benchmark base on single and multicore architectures. For those that don’t know Viva64 is a company specialized on Static Code Analysis.
    Starting from their project I extended the tested for other STL containers: std::vector, std::list, std::map, and std::unordered_map (VC++ 2010 hash table implementation).
    For parallel core tests I used Microsoft’s new technology called Parallel Pattern Library.

    1. How the tests were made
    1.1. Code stuff
    In order to get execution time I used same timer as Viva64 team (with few changes). Each container instance was populated with 100000 elements of same random data. An effective computing function was repeated 10 times. Into this function some template functions are called for 300 times. The single core computing function contains loops like this:

    For the parallel core computing the first simple for loop has changed in:

    Where cnt is an instance of combinable class and the sum of partial computed elements is obtained by calling combine() method:

    As you can see, the parallel_for function uses one of the new C++ standard features: a lambda function. This lambda function and the combinable class implements the so called parallel aggregation pattern and helps you to avoid the multithreaded common share resource issues. The code is executed on independent tasks. The reason that this approach is fast is that there is very little need for synchronization operations. Calculating the per-task local results uses no shared variables, and therefore requires no locks. The combine operation is a separate sequential step and also does not require locks.

    1.2. Effective results
    The tests were running on a Intel core i3 machine (4 cores) running Windows 7 on 32bits OS. I tested debug and release mode for single and multi cores computation. The test application was build in VC++ 2010 one of the first C++11 compliant.
    The OX axis represents the execution repeated times, and the OY axis means time in seconds.

    1.2.1. Single core computation
    Debug

    Release

    1.2.2. Multi cores computation
    As you know, multi core programming is the future. For C++ programmers Microsoft propose a very interesting library called Parallel Pattern Library.
    The overall goal is to decompose the problem into independent tasks that do not share data, while providing a sufficient number of tasks to occupy the number of cores available.

    This is how it looks my task manager when the demo application runs in parallel mode.

    Isn’t it nice comparing to a single core use? 🙂

    Debug

    Release

    1.2.3. Speedup
    Speedup is an efficiency performance metric for a parallel algorithm comparing to a serial algorithm.

    Debug

    Release

    Conclusions:
    The biggest differences appear in the debugging area where the pre-increment is “the champion”.
    With primitive types (like int and pointers), the opposite might be true, because of the pipe-lining that a CPU does. With post-increment, due to optimizations in release there is no copy to be returned for these simple types.
    According to these results I have to agree with Viva64 team. Even if the results are so close in release version I keep my opinion that using pre increment operator is preferred instead of post increment operators. We all know how long it takes the debug period and how important is every second that we win in long debugging days.
    If you still have doubts in using pre-increment operator or you need a flexible way of switching this operators in your code you can easily implement some macros like these:

    #define VECTOR_ITERATOR(type, var_iter) std::vector::iterator var_iter;
    #define VECTOR_FOR(vect, var_iter) for (var_iter = vect.begin(); var_iter != vect.end(); ++var_iter)

    Share

    Numeric type conversion to std::string and vice versa

    In our real applications we have to convert from strings to integer or to real variables and vice versa (double/float/int variable to std::string).
    We can realize these conversions using C style CRT function or we can try C++ approach via STL.
    Unfortunately, current C standard libraries do not offer a complete support for any type of conversion. For instance, if we try to put an integer into a C++ string object (std::(w)string) using a well known function itoa() then we get next error:

    // error C2664: ‘itoa’ : cannot convert parameter 2 from ‘std::string’ to ‘char *’

    A C style approach in order to avoid this error means using an intermediary buffer:

    Same story if we try to convert a std::string to an int:

    // error C2664: ‘atoi’ : cannot convert parameter 1 from ‘std::string’ to ‘const char *’

    In this case we can use c_str() in order to return a constant pointer to char.

    An elegant way to get rid of such problems is to build two conversion function that use templates and C++ streams.
    Base on this idea, I created a Sting2Numeric class that contains two static methods: Type2String() and String2Type().

    where BadConvertion is a std::runtime_error‘s derived class.

    Because of ANSI and UNICODE project’s compatibility I defined few macros:

    Because of this compatibility I strongly recommend using a xstring alias instead of std::wstring or std::string.
    When you want to convert an int, float, double, or other numerical type to a xstring in a C++ style you can use the Type2String() function. Vice versa, if you want to convert a xstring to these types you can use String2Type().

    In order to avoid possible thrown exception I recommend to you using a try catch block whenever you’re using these functions. I prefer using xstring for string/wstring variables definition, too.
    Here is a sample of using this class:

    The String2Numeric class can be extended. For instance, if the conversion throw an error then you can add detailed information in the exception message.

    Download String2Numeric (875) class.

    Share

    File size fast detection

    Many times in our job, we need to work with files and need to know file properties.
    One of the most important properties is file size. Of course, there are a lot of API that allows finding this property, but most of them needs additional file operations: open file, find file size and close file.

    A direct and fast way in order to detect the file size without these operations means the CRT run-time library’s function _tstat64() and stuff.

    In the header file or on .cpp file please add next macros definitions:

    Then, write next function:

    If you’re using WinAPI there is an even faster way in order to get file size.

    Finally, call these functions wherever you need.

    Share

    Dynamic popup and drop down menus for custom representations

    Many applications allow dynamic customization for visual objects or data views. For instance, well known Internet Explorer application provides toolbars customization using a popup menu that appears when the user execute right click mouse action in toolbar zone area.

    Internet Explorer sample menu

    Other sample where this kind of menu is very useful is when it’s used in order to customize database data representation in Windows controls like List control or grid control. These kind of applications allow data filtering and show/hide columns using this kind of menu. The user just right click on control header and gets what he need.

    Starting from this idea, I implemented a class CDynamicPopupMenu. This class allows an easy building of this kind of menus. I used if in a demo dialog base application over a list control.

    my demo application

    Internally, this class uses a STL container (std::map) with a data structure used in order to embed items menu properties. When the menu is built, menu’s behavior is implemented using these properties.

    Add new menu item
    The new item add menu method has next definition:

    where:

  • item_id – represents internal item ID; the ID is used for menu customization, too;
  • parent_id – parent item ID used when we define a new items sub-group (a drop-down menu); the attribute value is 0 if menu item is a part of initial menu;
  • is_visible – this flag is used a item is checked / unchecked. In my demo application this flag is set true for all list control’s columns that we want to display. For “Select All” and “Check All” items this flag is false because we want to create new subgroup that contains new columns, but we don’t have “Select All” or “Check All” columns.
  • check_flag – this flag allow check/uncheck menu property;
  • has_child – if is true allows a subgroup definition (a new drop-down menu);
  • item_name – unicode menu item name;
  • enable_flag – defines if the item is enable or disable.
  • Add separator item
    Add separator item method definition looks like this:

    where:

  • item_id – menu item ID;
  • parent_id – parent item ID from the subgroup has started; the attribute value is 0 if menu item is a part of initial menu.
  • Menu add items sample
    In my demo application, in CtestPopupMenuDlg::SetDefaultMapValues(void) method, among other things, you can find next calls:

    Get menu internal data
    In order to access the internal data container (std::map) that stores all dynamic menu items you just can use next method:

    followed by:

    Create and display menu
    Menu creation must be done just after we add all menu items. The menu is displayed only after TrackPopupCustomMenu() call. The definition of this method looks like this:

    where:

  • point – mouse coordinates where the menu start building;
  • hWnd – parrent window handle where the menu is created.
  • Function’s return value is the menu IDs that was selected. If no item was selected the function returns 0.
    In my demo application, menus creation is called on list control right-click method (NM_RCLICK).

    As you can see, I’m calling TrackPopupCostumMenu(), using mouse point property when the user right-click over list control.
    I am saving list control handler, selected item ID and WM_NOTIFY value into a pointer to message notification structure NMHDR. Then I’m passing this pointer to OnNotify() method.
    Using WM_NOTIFY message and OnNotify() method, I inform parent control window that a new event was generated.

    I am calling GetItemCheckedFlag() if order to detect selected item check status (check / uncheck). Then, if item state means check I apply negation over this bool flag and I’m calling SetCheckedItemFlag() method. Finnaly this method produce changes in my control list, depending on my menu command (FillData() method).

    Menu interaction with parent window (list control)
    In my demo application, the interaction between dynamic menu and list control to be treated by FillData() method.
    In order to use CDynamicPopupMenu’s internal container data is need to initialize a DynamicMenuData pointer with GetDynamicMenuData()’s returned value.

    Using that pointer to internal menu data, I iterate over internal container, and for those items that are visible and selected set on true I insert columns in my list control.
    Similarly, when using such menus, the application can apply filters on real data.
    CDynamicPopupMenu class contains other useful methods. This kind of menu can be used in different situations in order to change application’s behavior.

    Download demo application: testPopupMenu (Visual C++ 2005 project)

    Share