Tag Archives: C++ 11

psa.exe partial tree sample

Process Status Analysis – the first steps

I am pleased to announce my first cross-platform and open source project, the Process Status Analysis tool, available on GitHub.

The Process Status Analysis (psa) version 0.2 is available for Windows and Linux (Debian derived / Ubuntu tested) Operating Systems.
Download: psa for Ubuntu Linux x64 (4906 downloads)
Download: psa for Ubuntu Linux x86 (4152 downloads)
Download: psa.exe for Windows x64 (4115 downloads)
Download: psa.exe for Win32 (4083 downloads)

You may wonder why I did it or what it brings new. Well, I did it for fun, in my spare time and I will continue improving it when I’ll find a time to do it.

The project is written in modern C++ using idioms from the C++ 1x standards. Even if initially was done as a C++ for Windows only, during the past days I managed the port of it for Linux using Visual Studio 2017’s project templates and a connection via SSH.
In general, the source code base is similar, differing just by OS specific stuff.

In case you want to find out more about how to develop C++ Linux projects from the best development tool (imho), Visual Studio, you can find more information on Visual Studio development team blog.

Related to this psa project, the Linux version requires libprocps4-dev library in order to build.

The main reason for starting this project was that I wanted to know what’s the total memory amount used by my Chrome browser. I know it uses a lot of resources, but not that much… 🙂

Even if my preferred processes analysis tool, the Process Hacker offers a lot of processes administration possibilities, but it didn’t provide what I want, so I decided to enjoy a bit.

Chrome processes in Process Hacker

Sample – Google Chrome processes in Process Hacker tool

So, what I achieved by psa.exe was something like:

C:\Windows\system32> psa -o chrome
PID [896] chrome.exe 237.5234 MB
PID [1496] chrome.exe 87.6875 MB
PID [2388] chrome.exe 166.5000 MB
PID [5860] chrome.exe 3.1211 MB
PID [7336] chrome.exe 273.2188 MB
PID [8444] chrome.exe 68.0508 MB
PID [8624] chrome.exe 63.3945 MB
PID [9180] chrome.exe 296.9766 MB
PID [10600] chrome.exe 292.6289 MB
PID [12352] chrome.exe 182.5977 MB
PID [13688] chrome.exe 2.3555 MB
PID [14052] chrome.exe 73.1875 MB
PID [16200] chrome.exe 211.9805 MB
PID [17284] chrome.exe 55.7148 MB
PID [18036] chrome.exe 208.6680 MB
PID [19012] chrome.exe 457.2305 MB
PID [19312] chrome.exe 143.4766 MB
-----------------------------------
Total used memory: 2824.31 MB

A bit too much in my humble opinion…
The features this tool offers includes:

Get all processes loaded in memory information

I case you want to have a snapshot of all the processes loaded in the OS’s memory you can have it with.

psa -a

Get process only used memory

With psa.exe you can reach the used memory by a specific parameter -o after the process name or at least a part of its name.

psa -o chrome             // find how much memory uses your Chrome!   o_O

Currently, there is no string replace ‘*’ but it’s ongoing.


Print processes tree

Storing the processes’ data within a generic tree done by me, I took the decision to print the processes’ tree output, similarly there is in Windows with tree.exe tool or on Linux in pstree or even htop.

./psa -t
./psa -t 1034

psa.exe partial tree sample

Process Status Analysis partial tree of Windows process

Top most “expensive” processes

In case you want to see what are the most expensive processes within your operating system, you can achieve it with:

psa -e 20

or simpler psa -e in case you’re sure you want top 10 expensive processes (the default value).

silviu@ubuntu-dev-server:~/projects/psa-lin/bin/x64/Release$ ./psa -e
Top 10 consuming memory processes
-------------------------------------------
PID        Process Name         RAM Usage
-------------------------------------------
[924]    /usr/lib/policykit-1/polkitd   270.68 MB
[906]    /usr/lib/accountsservice/accounts-daemon       269.43 MB
[842]    /usr/sbin/rsyslogd     250.39 MB
[878]    /usr/lib/snapd/snapd   197.29 MB
[531]    /lib/systemd/systemd-timesyncd         97.97 MB
[1614]    sshd: silviu@pts/0    93.16 MB
[1576]    sshd: silviu [priv]   93.16 MB
[863]    /usr/bin/lxcfs         93.13 MB
[427]    /sbin/lvmetad          92.55 MB
[1034]    /usr/sbin/sshd        63.98 MB
-------------------------------------------


C:\Windows\system32> psa -e 10
Top 10 consuming memory processes
-------------------------------------------
PID        Process Name         RAM Usage
-------------------------------------------
[19012]    chrome.exe           435.67 MB
[17684]    chrome.exe           329.51 MB
[7336]    chrome.exe            259.61 MB
[15576]    devenv.exe           248.20 MB
[896]    chrome.exe             222.85 MB
[2388]    chrome.exe            188.21 MB
[18428]    chrome.exe           184.48 MB
[15760]    chrome.exe           173.82 MB
[488]    Dropbox.exe            162.15 MB
[5488]    googledrivesync.exe   161.11 MB
-------------------------------------------
Total used memory: 2365.61 MB

Redirect output to a file

From the standard output the information can be easily redirected to a file.

c:\> psa -t > windows_processes_tree.txt
# ./psa -t > linux_processes_tree.txt        

Kill process feature

This feature was not implemented yet but in case we need it we can be done it easily with the existing tools on the target OS (ex. Task Manager, Process Exporer/Hacker, pskill.exe for Windows or the combination ps + kill on LInux).

Feedbacks and improvements
Any constructive feedback, suggestions, contributions to improvements are appreciated.
Feel free to add any issue you find, wish or suggestion you have in the GitHub repository, the 
Issues section or here as a comment.

The Chameleon Pathnames

The title might be as well “When the pathname is not what it has to be”.

The experience of developing plugins for Adobe Acrobat/Reader reserved me different surprises, that made the task more challenging. One of the biggest surprises I had was the impact of the Adobe’s Cloud idea over the Acrobat’ API within Acrobat products. Their feature idea is to keep all the already opened documents within their Cloud in order to make them available to different devices you’re interacting with.

In my case, having interactions with external non-Adobe’s applications, the things complicated when trying to get the file pathname. This option is coming enabled by default.

This is how the Acrobat.com Cloud looks like within Acrobat products This is how the Acrobat.com Cloud looks like within Acrobat products

Usually, when we are thinking to files path we expect to have something similar to GetFullPathName(). But according to Acrobat SDK’s concept: not every file opened into Acrobat/Reader has to be a local disk file. It may be associated with a stream, a network file, etc.

The reason why I was looking to get the correct file path is that my plugin and others are connecting to a system that expects the local or network file path. So it was needed to find a way to get a usual file path.

The challenge I am talking about has reproduced with an Adobe.com environment activated, having such a file already synchronized in the Acrobat.com cloud by using:

acrobat.com_path_functions

But both API’s functions return proper values with non-Cloud files. With a local filename not already uploaded within Acrobat.com I got the correct file path with both functions.

So the workaround I was thinking invokes the next steps:

1. Get the file path using ASFileSysDIPathFromPath(). In case your project is a Unicode project don’t forget that the returned type is a char* and you’ll need to encode it to the proper Unicode (UTF-8 in my case).

AVDoc avDoc = AVAppGetActiveDoc();
if (NULL == avDoc)  // no doc is loaded
{
  char strErrorMsg[MAX_PATH] = {0};
  strcat_s(strErrorMsg, "There is no opened file.");
  AVAlertNote(strErrorMsg);
  return;
}
PDDoc pdDoc = AVDocGetPDDoc(doc);
ASFile fileinfo = PDDocGetFile(pdDoc);
ASFileSys fileSys = ASFileGetFileSys(fileinfo);
ASPathName pathname = ASFileAcquirePathName(fileinfo);
char* szFilePath = ASFileSysDIPathFromPath(fileSys, pathname, pathname);
// this declaration it's just for sample
// szFilePath = "/Acrobat.com/
// 89323c47-676e-44a3-9fc3-cd18fe57bc91/3ac312be-c13d-4a00-bd57-b454a52019e3/myFile.pdf"
char* szAnsiPath = ASFileSysDisplayStringFromPath(fileSys, pathname);
auto strTmpPath = std::vector<wchar_t>{0};  // get the length of the path
int result =
    MultiByteToWideChar(CP_UTF8, NULL, szAnsiPath, strlen(szAnsiPath), NULL, 0);
if (result > 0) {
  strTmpPath.resize(result + 1);  // encode into UTF-8 the array
  result = MultiByteToWideChar(CP_UTF8, NULL, szAnsiPath, strlen(szAnsiPath),
                               &strTmpPath[0], result);
  strTmpPath[result] = '\0';
  if (result == 0)
    TRACE_ERROR(
        _T("The transformation ANSI to UNICODE has failed. Error code: %d"),
        GetLastError());
}
tstring sFilePath(&strTmpPath[0]);  // continue by using the sFilePath

2. Check if it is a cloud path (starts with Acrobat.com:).

if (ptr4Cloud && ptr4Cloud->checkCloudPath(sFilePath)) {
  tstring newFileGeneratedPath;
  ptr4Cloud->SaveFileOnDisk(
      pdDoc, sFilePath,
      newFileGeneratedPath);  // use the newFileGeneratedPath as expected 
}

where

bool FooCloud::checkCloudPath(const ustring& strCloudPath) {
  if (m_cloudPrefix.length() > strCloudPath.length()) {
    return false;
  }
  return (std::mismatch(m_cloudPrefix.begin(), m_cloudPrefix.end(),
    strCloudPath.begin())
    .first == m_cloudPrefix.end());
}

3. Save the file content into a temporary file (ex. C:\Users\Silviu.Ardelean\AppData\Local\Temp\)

void FooCloud::SaveFileOnDisk(PDDoc pdDoc,
                              const tstring& strCloudPath,
                              tstring& newFilePath) {
  DWORD dwRetVal = 0;
  TCHAR lpTempPathBuffer[MAX_PATH + 1] = {0};
  tstring fileName = getFileName(strCloudPath);
  dwRetVal = GetTempPath(MAX_PATH, lpTempPathBuffer);
  if (dwRetVal != 0)
    newFilePath = lpTempPathBuffer;
  newFilePath += (!fileName.empty() && dwRetVal != 0)
                     ? fileName
                     : _T("YourFile.pdf");  // create the effective file
  ASFileSys fileSysX = ASGetDefaultFileSys();
  ASPathName new_path =
      ASFileSysCreatePathName(fileSysX, ASAtomFromString("Cstring"),
                              (const void*)CW2A(newFilePath.c_str()), NULL);
  PDDocCopyParams saveParams =
      (PDDocCopyParams)ASmalloc(sizeof(PDDocCopyParamsRec));
  saveParams->size = sizeof(PDDocCopyParamsRec);
  saveParams->newPath = new_path;
  saveParams->fileSys = fileSysX;
  saveParams->cancelProc = NULL;
  saveParams->cancelProcData = NULL;
  saveParams->progMon = NULL;
  saveParams->progMonData = NULL;
  saveParams->saveChanges = false;
  PDDocCopyToFile(pdDoc, saveParams);
  m_listFiles.push_back(newFilePath);
  ASfree(saveParams);
  ASFileSysReleasePath(fileSysX, new_path);
}

where

tstring FooCloud::getFileName(const tstring& strCloudPath) {
  return (strCloudPath.length() < m_cloudPrefix.length())
    ? _T("")
    : strCloudPath.substr(m_cloudPrefix.length());
}

4. Provided the temporary file path to the proxy module that expects it to interact with my system.

5. Clean/delete the temporary files on plugin uploading – PluginUnload() callback.

Additional comments
In case your plugins will interact with external non-Adobe’s application most probably you’ll have to do different tricks. Because of the way the Acrobat SDK is designed, without direct support for wchar_t and std::wstring you will need to make different conversions and encoding/decoding (ex. the ATL macros CW2A, CA2W(), functions such MultiByteToWideChar() on Windows, etc).

If you don’t have to interact with external non-Adobe’s applications be confident with Acrobat’s SDK types and data structures. In this way, you’ll avoid such conversions.

Getting Table’s indexes experiences – workaround

Trying to get table indexes information in SQL Server 2012 I identified a strange situation within a specific method that I was using so long but it was not acting as expected in one situation.

The way of getting indexes information using the ODBC C API into that old and inherited method looks like:

nRetCode = ::SQLStatistics(hstmtAux,
                                    NULL,
                                    0,
                                    NULL,
                                    0,
                                    (TCHAR*)(LPCTSTR)strTempTable,
                                    SQL_NTS,
                                    SQL_INDEX_ALL,
                                    SQL_ENSURE);
if (nRetCode == SQL_SUCCESS || nRetCode == SQL_SUCCESS_WITH_INFO) {
  nRetCode = ::SQLBindCol(hstmtAux, 4, SQL_C_SHORT, &swNonUnique, sizeof(SWORD),
                          &cbNonUnique);
  nRetCode = ::SQLBindCol(hstmtAux, 5, SQL_CHAR, szIdxQualif,
                          sizeof(CHAR) * 130, &cbIdxQualif);
  nRetCode = ::SQLBindCol(hstmtAux, 6, SQL_C_CHAR, szIdxName,
                          sizeof(CHAR) * 130, &cbIdxName);
  nRetCode =
      ::SQLBindCol(hstmtAux, 7, SQL_C_SHORT, &swType, sizeof(SWORD), &cbType);
  nRetCode = ::SQLBindCol(hstmtAux, 8, SQL_C_SHORT, &swSeqInIdx, sizeof(SWORD),
                          &cbSeqInIdx);
  nRetCode = ::SQLBindCol(hstmtAux, 9, SQL_C_CHAR, szIdxColName,
                          sizeof(CHAR) * 130, &cbIdxColName);

  while (bNoFetch || nRetCode == SQL_SUCCESS_WITH_INFO ||
         (nRetCode = ::SQLExtendedFetch(hstmtAux, SQL_FETCH_NEXT, 1, &crow,
                                        &rgfRowStatus)) == SQL_SUCCESS) {
    if (cbIdxName != SQL_NULL_DATA &&
        _tcslen((TCHAR)szIdxName) < 0) {  
        // rest of the code 
    } 
  } // rest of the code 
}

Usually, I got the right information about indexes but in one situation I encounter a strange behavior. It’s about having a clustered index into a scenario. I have a table that contains two indexes referenced to some fields: IndexField_1 and IndexField_3 mapped over int, NULL fields. When IndexField_1 is Non-Unique, Non-Clustered and IndexField_3 is Clustered index I get the right information.
But if the index IndexField_1 is Clustered and the IndexField_3 is Non-Unique, Non-Clustered I get no information about IndexField_1 index (eg. szIdxName and szIdxColName are “” and their length is -1 that means SQL_NULL_DATA). Within while loop, with the next iteration, I get correct information about the second index IndexField_3.

Because SQLExtendedFetch() is deprecated I tried using SQLFetchScroll() but the behavior is the same from my interest point of view.

I was not sure whether the problem is with SQLStatistics, the bindings or SQLFetchScroll (they all always return SQL_SUCCESS). It looks such a problem with the driver when the first index is clustered.
According to SQLStatistics documentation if my swType parameter is SQL_TABLE_STAT I have no information for index or field. But for this scenario, I had no indexes of combined fields.
For the good scenario I observed that my while loop had 3 iterations including one of having swType = SQL_TABLE_STAT without information in szIdxName. But for the bad scenario, the loop had only 2 iterations. So it looks like SQLExtendedFetch() is not getting the last one index.

After some googling and research without very significant solutions, I decided to apply a workaround by avoiding the old API and I rewrite my method.

So, in order to get table indexes information, I have chosen a direct SQL query into SYS tables: sys.tables, sys.indexes, sys.schema.

SELECT DISTINCT I.[name] AS IndexName, I.is_unique AS IsUnique,
    I.is_primary_key AS IsPrimaryKey, I.type AS IndexType,
    I.type_desc AS IndexDesc,
    T.[object_id] AS ObjectID FROM sys.tables AS T INNER JOIN
        sys.indexes AS I ON T.[object_id] = I.[object_id]and T.name =
        'myTABLE' and
        I.type_desc<> 'HEAP' INNER JOIN sys.schemas AS S ON T.schema_id =
            S.schema_id and s.name = 'myTABLE'

Because I preferred getting also information about the index’s composed fields, I applied a second additional SQL query:

SELECT AC.Name as ColumnName FROM sys.tables as T INNER JOIN
    sys.indexes as I on T.[object_id] =
    I.[object_id] INNER JOIN sys.index_columns as IC on IC.[object_id] =
        I.[object_id] and IC.[index_id] =
            I.[index_id] INNER JOIN sys.all_columns as AC on IC.[object_id] =
                AC.[object_id] and IC.[column_id] =
                    AC.[column_id] WHERE I.object_id =
                        'NumericObjectID' and T.name = 'TableName' and I.name =
                            'IndexName' order by T.name,
                        I.name

and I have collected data into a container of defined structure according to my SQL Indexes interest information:

struct SQL_INDEX {
  CString index_name;
  bool is_unique;
  long object_id;
  bool is_primary;
  short index_type;
  CString index_desc;
  std::vector vectColumns;
};

The last member vectColumns stores information about the columns that are used for a specific index.

Finally, the new method that collects table indexes information looks like:

// collect table indexes information
void CFoo::GetIndexesInformation(const CString& strSchema,
                                 const CString& strTable,
                                 std::vector& vectKeys,
                                 CDatabase* pDBdata) {
  HSTMT hstmt = SQL_NULL_HSTMT;
  SQLRETURN nRetCode = 0;
  CString sqlQuery;

  sqlQuery.Format(
      _T("SELECT DISTINCT I.[name] as IndexName, I.is_unique as IsUnique, I.is_primary_key as IsPrimaryKey, I.type as IndexType, I.type_desc as IndexDesc, T.[object_id] as ObjectID \
FROM sys.tables as T \
INNER JOIN sys.indexes as I on T.[object_id] = I.[object_id] and T.name = '%s' and I.type_desc <> 'HEAP' \
INNER JOIN sys.schemas as S on T.schema_id = S.schema_id and s.name = '%s'"),
      strTable, strSchema);

  if ((nRetCode = ::SQLAllocStmt(pDBdata->m_hdbc, &hstmt)) == SQL_SUCCESS) {
    if (SQL_NULL_HSTMT != hstmt) {
      pDBdata->OnSetOptions(hstmt);

      nRetCode = ::SQLExecDirect(hstmt, (CHAR*)(LPCTSTR)sqlQuery,
                                 sqlQuery.GetLength());
      if (SQL_SUCCESS == nRetCode) {
        CHAR buffer[128] = {0};
        SQLLEN iLen = 0;
        short int val = 0;
        long obj_id = 0;

        while (::SQLFetch(hstmt) == SQL_SUCCESS) {
          SQL_INDEX ob;

          if (::SQLGetData(hstmt, 1, SQL_C_CHAR, buffer, 128, &iLen) ==
              SQL_SUCCESS)
            ob.index_name = buffer;

          if (::SQLGetData(hstmt, 2, SQL_C_SHORT, &val, sizeof(short int),
                           &iLen) == SQL_SUCCESS)
            ob.is_unique = (1 == val);

          if (::SQLGetData(hstmt, 3, SQL_C_SHORT, &val, sizeof(short int),
                           &iLen) == SQL_SUCCESS)
            ob.is_primary = (1 == val);

          if (::SQLGetData(hstmt, 4, SQL_C_SHORT, &val, sizeof(short int),
                           &iLen) == SQL_SUCCESS)
            ob.index_type = val;

          if (::SQLGetData(hstmt, 5, SQL_C_CHAR, buffer, 128, &iLen) ==
              SQL_SUCCESS)
            ob.index_desc = buffer;

          if (::SQLGetData(hstmt, 6, SQL_C_LONG, &obj_id, sizeof(long),
                           &iLen) == SQL_SUCCESS)
            ob.object_id = obj_id;

          vectKeys.push_back(ob);
        }
      }
    }
  }

  // collect index’s columns/fields information
  for (auto it = vectKeys.begin(); it != vectKeys.end(); ++it) {
    HSTMT hstmt2 = SQL_NULL_HSTMT;
    if ((nRetCode = ::SQLAllocStmt(pDBdata->m_hdbc, &hstmt2)) == SQL_SUCCESS) {
      if (hstmt2 != SQL_NULL_HSTMT) {
        pDBdata->OnSetOptions(hstmt2);

        CString sSQL;
        SQLLEN iLen = 0;

        sSQL.Format(_T("SELECT AC.Name as ColumnName \
FROM sys.tables as T inner join sys.indexes as I on T.[object_id] = I.[object_id] \
INNER JOIN sys.index_columns as IC on IC.[object_id] = I.[object_id] and IC.[index_id] = I.[index_id] \
INNER JOIN sys.all_columns as AC on IC.[object_id] = AC.[object_id] and IC.[column_id] = AC.[column_id] \
WHERE I.object_id = %ld and T.name = '%s' and I.name = '%s' order by T.name, I.name"),
                    it->object_id, strTable, it->index_name);

        nRetCode =
            ::SQLExecDirect(hstmt2, (UCHAR*)(LPCTSTR)sSQL, sSQL.GetLength());
        if (SQL_SUCCESS == nRetCode) {
          CHAR buffer[128] = {0};
          while (::SQLFetch(hstmt2) == SQL_SUCCESS) {
            if (::SQLGetData(hstmt2, 1, SQL_C_CHAR, buffer, 128, &iLen) ==
                SQL_SUCCESS) {
              it->vectColumns.push_back(buffer);
            }
          }
        }
      }
    }
  }
}

In this way I have complete information about the indexes of my tables.

std::vector vectIndexesSQL;
pFoo->GetIndexesInformation(strSchema, pDBTable->strTblName, vectIndexesSQL, pDataDB);

Conclusion: When the C/C++ API doesn’t give you any hopes don’t forget that SQL saves you.

Several C++ singleton implementations

This article offers some insight into singleton design-pattern.
The singleton pattern is a design pattern used to implement the mathematical concept of a singleton, by restricting the instantiation of a class to one object. The GoF book describes the singleton as: “Ensure a class only has one instance, and provide a global point of access to it.
The Singleton design pattern is not as simple as it appears at a first look and this is proven by the abundance of Singleton discussions and implementations. That’s way I’m trying to figure a few implementations, some base on C++ 11 features (smart pointers and locking primitives as mutexs). I am starting from, maybe, the most basic singleton implementation trying to figure different weaknesses and tried to add gradually better implementations.
The basic idea of a singleton class implies using a static private instance, a private constructor and an interface method that returns the static instance.

Version 1
Maybe, the most common and simpler approach looks like this:

class simpleSingleton {
  simpleSingleton();
  static simpleSingleton* _pInstance;

public:
  ~simpleSingleton() {}
  static simpleSingleton* getInstance() {
    if (!_pInstance) {
      _pInstance = new simpleSingleton();
    }
    return _pInstance;
  }
  void demo() {
    std::cout << "simple singleton # next - your code ..." << std::endl;
  }
};

simpleSingleton* simpleSingleton::_pInstance = nullptr;

Unfortunately this approach has many issues. Even if the default constructor is private, because the copy constructor and the assignment operator are not defined as private the compiler generates them and the next calls are valid:

// Version 1
simpleSingleton * p = simpleSingleton::getInstance(); // cache instance pointer p->demo();

// Version 2
simpleSingleton::getInstance()->demo();

simpleSingleton ob2(*p); // copy constructor
ob2.demo();

simpleSingleton ob3 = ob2; // copy constructor
ob2.demo();

So we have to define the copy constructor and the assignment operator having private visibility.

Version 2 – Scott Meyers version
Scott Meyers in his Effective C++ book adds a slightly improved version and in the getInstance() method returns a reference instead of a pointer. So the pointer final deleting problem disappears.
One advantage of this solution is that the function-static object is initialized when the control flow is first passing its definition.

class otherSingleton {
  static otherSingleton* pInstance;

  otherSingleton();

  otherSingleton(const otherSingleton& rs) { pInstance = rs.pInstance; }

  otherSingleton& operator=(const otherSingleton& rs) {
    if (this != &rs) {
      pInstance = rs.pInstance;
    }

    return *this;
  }

  ~otherSingleton();

 public:
  static otherSingleton& getInstance() {
    static otherSingleton theInstance;
    pInstance = &theInstance;

    return *pInstance;
  }

  void demo() {
    std::cout << "other singleton # next - your code ..." << std::endl;
  }
};

otherSingleton * otherSingleton::pInstance = nullptr;

The destructor is private in order to prevent clients that hold a pointer to the Singleton object from deleting it accidentally. So, this time a copy object creation is not allowed:

otherSingleton ob = *p;
ob.demo();


error C2248: otherSingleton::otherSingleton ' : cannot access private member declared in class 'otherSingleton'
error C2248: 'otherSingleton::~otherSingleton' : cannot access private member declared in class 'otherSingleton'

but we can still use:

// Version 1
otherSingleton *p = &otherSingleton::getInstance(); // cache instance pointer p->demo();
// Version 2
otherSingleton::getInstance().demo();

This singleton implementation was not thread-safe until the C++ 11 standard. In C++11 the thread-safety initialization and destruction is enforced in the standard.

If you’re sure that your compiler is 100% C++11 compliant then this approach is thread-safe. If you’re not such sure, please use the approach version 4.

Multi-threaded environment
Both implementations are fine in a single-threaded application but in the multi-threaded world things are not as simple as they look. Raymond Chen explains here why C++ statics are not thread safe by default and this behavior is required by the C++ 99 standard.
The shared global resource and normally it is open for race conditions and threading issues. So, the singleton object is not immune to this issue.
Let’s imagine the next situation in a multithreaded application:

static simpleSingleton* getInstance() {
  if (!pInstance)  // 1
  {
    pInstance = new simpleSingleton();  // 2
  }

  return pInstance;  // 3
}

At the very first access a thread call getInstance() and pInstance is null. The thread reaches the second line (2) and is ready to invoke the new operator. It might just happen that the OS scheduler unwittingly interrupts the first thread at this point and passes control to the other thread.
That thread follows the same steps: calls the new operator, assigns pInstance in place, and gets away with it.
After that the first thread resumes, it continues the execution of line 2, so it reassigns pInstance and gets away with it, too.
So now we have two singleton objects instead of one, and one of them will leak for sure. Each thread holds a distinct instance.

An improvement to this situation might be a thread locking mechanism and we have it in the new C++ standard C++ 11. So we don’t need using POSIX or OS threading stuff and now locking getInstance() from Meyers’s implementation looks like:

static otherSingleton& getInstance() {
  std::lock_guard lock(_mutex);
  static otherSingleton theInstance;
  pInstance = &theInstance;
  return *pInstance;
}

The constructor of class std::lock_guard (C++11) locks the mutex, and its destructor unlocks the mutex. While _mutex is locked, other threads that try to lock the same mutex are blocked.
But in this implementation we’re paying for synchronization overhead for each getInstance() call and this is not what we need. Each access of the singleton requires the acquisition of a lock, but in reality we need a lock only when initializing pInstance. If pInstance is called n times during the course of a program run, we need the lock only for the first time.
Writing a C++ singleton 100% thread safe implementation it’s not as simple as it appears as long as for many years C++ had no threading standard support. In order to implement a thread-safe singleton we have to apply the double-checked locking (DCLP) pattern.
The pattern consists of checking before entering the synchronized code, and then check the condition again.
So the first singleton implementation would be rewritten using a temporary object:

static simpleSingleton* getInstance() {
  if (!pInstance) {
    std::lock_guard lock(_mutex);

    if (!pInstance) {
      simpleSingleton* temp = new simpleSingleton;
      pInstance = temp;
    }
  }

  return pInstance;
}

This pattern involves testing pInstance for nullness before trying to acquire a lock and only if the test succeeds the lock is acquired and after that, the test is performed again. The second test is needed for avoiding race conditions in case other thread happens to initialize pInstance between the time pInstance was tested and the time the lock was acquired.
Theoretically, this pattern is correct, but in practice is not always true, especially in multiprocessor environments.
Due to this rearranging of writes, the memory as seen by one processor at a time might look as if the operations are not performed in the correct order by another processor. In our case, the assignment to pInstance performed by a processor might occur before the Singleton object has been fully initialized.
After the first call of getInstance() the implementation with pointers (non-smart) needs pointer to that instance in order to avoid memory leaks.

Version 3 – Singleton with smart pointers
Until C++ 11, the C++ standard didn’t have a threading model and developers needed to use external threading APIs (POSIX or OS dependent primitives). But finally C++ 11 standard has threading support.
Unfortunately, the first C++ new standard implementation in Visual C++ 2010 is incomplete and threading support is available only starting with beta version of VS 2011 or the VS 2012 release preview version.

class smartSingleton {
 private:
  static std::mutex _mutex;

  smartSingleton();
  smartSingleton(const smartSingleton& rs);
  smartSingleton& operator=(const smartSingleton& rs);

 public:
  ~smartSingleton();

  static std::shared_ptr& getInstance() {
    static std::shared_ptr instance = nullptr;

    if (!instance) {
      std::lock_guard lock(_mutex);

      if (!instance) {
        instance.reset(new smartSingleton());
      }
    }

    return instance;
  }

  void demo() {
    std::cout << "smart pointers # next - your code ..." << std::endl;
  }
};

As we know, in C++ by default the class members are private. So, our default constructor is private too. I added here in order to avoid misunderstanding and explicitly adding to public / protected.
Finally, feel free to use your special instance (singleton):

// Version 1
std::shared_ptr p = smartSingleton::getInstance(); // cache instance pointer
p->demo();

// Version 2
std::weak_ptr pw = smartSingleton::getInstance(); // cache instance pointer
pw.lock()->demo();

// Version 3
smartSingleton::getInstance()->demo();

And no memory leaks emotion… 🙂
Multiple threads can simultaneously read and write different std::shared_ptr objects, even when the objects are copies that share ownership.
But even this implementation using double checking pattern but is not optimal to double check each time.


Version 4 – Thread safe singleton C++ 11
To have a thread safe implementation we need to make sure that the class single instance is locked and created only once in a multi-threaded environment.
Fortunately, C++ 11 comes in our help with two new entities: std::call_once and std::once_flag. Using them with a standard compiler we have the guaranty that our singleton is thread safely and no memory leak.
Invocations of std::call_once on the same std::once_flag object are serialized.
Instances of std::once_flag are used with std::call_once to ensure that a particular function is called exactly once, even if multiple threads invoke the call concurrently.
Instances of std::once_flag are neither CopyConstructible, CopyAssignable, MoveConstructible nor MoveAssignable.

Here it is my proposal for a singleton thread safe implementation in C++ 11:

class safeSingleton {
  static std::shared_ptr<safeSingleton> instance_;
  static std::once_flag only_one;

  safeSingleton(int id) {
    std::cout << "safeSingleton::Singleton()" << id << std::endl;
  }

  safeSingleton(const safeSingleton& rs) { instance_ = rs.instance_; }

  safeSingleton& operator=(const safeSingleton& rs) {
    if (this != &rs) {
      instance_ = rs.instance_;
    }

    return *this;
  }

 public:
  ~safeSingleton() { std::cout << "Singleton::~Singleton" << std::endl; }

  static safeSingleton& getInstance(int id) {
    std::call_once(
        safeSingleton::only_one,
        [](int idx) {
          safeSingleton::instance_.reset(new safeSingleton(idx));

          std::cout << "safeSingleton::create_singleton_() | thread id " + idx
                    << std::endl;
        },
        id);

    return *safeSingleton::instance_;
  }

  void demo(int id) {
    std::cout << "demo stuff from thread id " << id << std::endl;
  }
};

std::once_flag safeSingleton::only_one;
std::shared_ptr<safeSingleton> safeSingleton::instance_ = nullptr;

The parameter to getInstance() was added for demo reasons only and should be passed to a new proper constructor. As you can see, I am using a lambda instead normal method.
This is how I tested my safeSingleton and smartSingleton classes.

std::vector v;
int num = 20;

for (int n = 0; n < num; ++n) {
  v.push_back(
      std::thread([](int id) { safeSingleton::getInstance(id).demo(id); }, n));
}

std::for_each(v.begin(), v.end(), std::mem_fn(&std::thread::join));

// Version 1
std::shared_ptr<smartSingleton> p = smartSingleton::getInstance(
    1);  // cache instance pointer p->demo("demo 1");

// Version 2
std::weak_ptr<smartSingleton> pw = smartSingleton::getInstance(2);  // cache instance pointer
pw.lock()->demo(2);

// Version 3
smartSingleton::getInstance(3)->demo(3);

So I create 20 threads and I launch them in parallel (std::thread::join) and each thread accesses getInstance() (with a demo id parameter). Only one of the threads that is trying to create the instance succeeds.
Additionally, if you’re using a C++11 100% compiler you could also delete the copy constructor and assignment operator. This will allow you to obtain an error while trying to use such deleted members.

Other comments
I tested this implementation on a machine with Intel i5 processor (4 cores). If you see some concurrent issues in this implementation please fell free to share here. I am open to other good implementations, too.
An alternative to this approach is creating the singleton instance of a class in the main thread and pass it to the objects which require it. In case we have many singleton objects this approach is not so nice because the objects discrepancies can be bundled into a single ‘Context’ object which is then passed around where necessary.

Update: According to Boris’s observation I removed std::mutex instance from safeSingleton class. This is not necessary anymore because std::call_once is enough to have thread safe behavior for this class.

Update2: According to Ervin and Remus’s observation, in order to make things clear I simplified the implementation version 3 and this is not using std::weak_ptr anymore.

References:
just::thread – Anthony Williams – Just Software Solutions Ltd
C++ and the Perils of Double-Checked Locking by Scott Meyers and Andrei Alexandrescu
Modern C++ Design: Generic Programming and Design Patterns Applied by Andrei Alexandrescu