
Esdee

Sunday, September 25, 2011

linux threads and forking (and zeroc ice)

so we have this nice icegrid setup with python nodes tied with sip to underlying c++ code. on new rpc call, ice creates a new thread to handle it, but as the underlying c++ code is not threadsafe we fork from this thread and continue execution in a new process.

everything works perfectly, but one day we decide to remove the python layer and go with c++ all the way

and then something weird started happening - forked processes started hanging, crashing, a total mess... after some research we found several articles saying that it is a bad idea to mix linux threads and forks: a mutex that some other thread holds at the moment of the fork is copied into the child in its locked state, and no thread in the child will ever unlock it. this is exactly what we had observed.
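to make the failure mode concrete, here is a minimal sketch of it. the function name and the helper thread are made up for illustration (the helper stands in for an ice thread that happens to hold a lock), and strictly speaking a child forked from a multithreaded process should only call async-signal-safe functions, so take this as a demonstration on linux and not as a pattern to copy:

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>
#include <atomic>
#include <cerrno>
#include <thread>

// Returns true if the forked child found the mutex already locked,
// even though the thread that locked it does not exist in the child.
bool child_inherits_locked_mutex() {
    pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    std::atomic<bool> locked{false};
    std::atomic<bool> release{false};

    // Helper thread: takes the lock and keeps it while the parent forks.
    std::thread holder([&] {
        pthread_mutex_lock(&mtx);
        locked = true;
        while (!release) std::this_thread::yield();
        pthread_mutex_unlock(&mtx);
    });
    while (!locked) std::this_thread::yield();

    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the forking thread was copied. The mutex memory was
        // copied in its locked state; a plain lock() here would hang forever.
        int rc = pthread_mutex_trylock(&mtx);
        _exit(rc == EBUSY ? 0 : 1);
    }

    // Parent: let the helper thread finish, then collect the child's verdict.
    release = true;
    holder.join();
    if (pid < 0) return false;
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

the child uses trylock only so that the demo terminates; code that calls a blocking lock (directly or inside a library function) would simply hang, which matches the symptoms above.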

so how do we solve this? and why did the python+cpp solution work fine?

as a second solution we modified our code to use a thin python wrapper over the main c++ functionality again, in an attempt to copy the behavior of the original solution, but again the forked child processes had copied locked mutexes that caused them to hang

so, how come? the underlying code in the original solution (call it A) is pretty much the same as in the pure c++ version (B) and the thin-wrapper version (C). what appears to happen is that some ice threads are calling localtime_r at the moment we fork, and when the forked code tries to execute localtime_r it blocks on the copied lock.
but why does it not happen in the original python/cpp solution? why does it happen in the new python/cpp solution? are there some python flags to help avoid this? the original code would spend some time jumping from python to cpp, while the new code would only enter cpp code once and return the results.

in any case it seems a mess so we went for another completely different solution, but the headache was/is huge

Tuesday, June 21, 2011

c++: references vs. pointers

there is a campaign at my company right now. it's about using references instead of pointers in new code because this is ....modern; right.

i mean, i understand all the benefits of using shared pointers, smart pointers, boost utils, but please do not make it mandatory; don't insist it is applied in every case and in every line of code. references in the hands of an inexperienced programmer can cause even bigger problems than a bare naked 'C' pointer memory leak or such.

two very common bugs i've encountered after reviewing some code that followed the new 'guidelines':


A& func1()
{
    A* a = new A;   // allocated on the heap...
    return *a;      // ...but the caller only sees a reference, so who deletes it?
}


don't you think this is ugly, to say the least? not to mention that nobody knows where 'a' is deallocated, if ever
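for comparison, one way to make the ownership visible is to return a smart pointer. this is just a sketch: the struct A below is a made-up placeholder (the real class isn't shown here), and make_a is a hypothetical name:

```cpp
#include <memory>

// Placeholder type for illustration only.
struct A {
    int value = 42;
};

// The return type says it all: the caller owns the object,
// and it is deleted automatically when the unique_ptr goes away.
std::unique_ptr<A> make_a() {
    return std::make_unique<A>();
}
```

with this signature there is no question about who frees the object, which is the part that func1 leaves open.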

another example (this one is sweet as honey):

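B& func2(const B& bIn)
{
    B b = bIn;      // local copy, lives only until the function returns
    return b;       // dangling reference: b is destroyed right here
}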


it actually took me quite a few moments to realize that this method returns a reference to a local object. no wonder then that all of a sudden the returned object behaved strangely and had inconsistent data, as the developer who wrote it complained
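the boring fix is to return by value. a small sketch, with a made-up struct B and a hypothetical function name:

```cpp
// Placeholder type for illustration only.
struct B {
    int data = 0;
};

// Returning by value hands the caller its own object; the compiler
// moves or elides the copy, so nothing refers to the dead local.
B func2_by_value(const B& bIn) {
    B b = bIn;
    return b;
}
```

most compilers also warn about returning a reference to a local variable, so turning warnings on would likely have caught the original version.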

fun, fun, fun...

anyway...all is good, references are fine and useful, but please don't say that using pointers is old fashioned, i don't want to get started again...

:D

good night!

Saturday, April 30, 2011

using apache qpid with persistence

that's just a quick post about a two-week struggle with a problem which was eventually solved in half an hour.

as at my company we (i) decided to use apache QPID as a message queue framework, we certainly needed persistence for queues and messages. building qpid itself was really easy and straightforward, and for building the persistence module msgstore.so i followed the directions posted at Lahiru Gunathilake's Blog here. i built the whole setup on my local machine and everything worked just perfectly - messages were sent persistently, queues were durable etc.

then i started deploying on the dev servers

the qpid broker started crashing on startup.
the qpid broker started crashing on startup.
the qpid broker started crashing on startup.

and on and on and on

it was easy to detect that the problem appeared when the broker was started with --load-module=msgstore.so and that it seg faulted when attempting to create a berkeley db database. but why??
well...on my local machine i am running fedora, and the dev server runs CentOS.
my fedora has berkeley db 4.8 and the server has 4.5. this was the first 'ding'.

now, how to trick the qpid configure into using db4.8 instead of db4.5? we couldn't just install the new libs as there might have been incompatibilities with the product, so what to do? we tried modifying configure scripts, playing with sym links etc - i myself don't have much experience with linux so i was relying mostly on the admin guy, but he was helpless with this too - seg fault after seg fault, while on my machine the broker hummed silently, transferring persistent messages and maintaining durable queues.

then after a loooot of reading we came to an obvious solution - the qpid configure itself gave it to us, and had i not been so shortsighted, this would have been done days ago -

just build berkeley 4.8 with --prefix to place it in a separate directory (say $BDB48), away from the db4.5 libs the product needs, and then, before running qpid configure, run these

export CPP_PATH=$BDB48/include
export LIB_PATH=$BDB48/libs


(i'm writing these from memory, check qpid's configure script with --help to see the correct ones)

after that run configure and it will find and use the berkeley libs you wanted it to!

awesome!
now let's do the same in production where we have Red Hat 5. BOOM! here comes the so familiar seg fault that we managed to escape with the clever export trick. why? what is so different on redhat???
another week of research, testing, breaking, hair-tearing followed, this time with no result.
i was even contemplating running a virtual fedora machine on the redhat server, just to have the familiar setup and start the broker with persistence there.

luckily i didn't have to -
one day i sat with the it manager to explain the situation to him, and while we were scratching our heads we looked at the persistence module's readme.txt where it said that it was tested with berkeley db 4.3.
so what? we have db4.8 and it should be fine, right?
what if we gave it another try, not with 4.8 but with 4.3, set up the same way?
this took about 10 minutes to set up and when i hit 'enter' for qpidd --load-module=msgstore.so, before my eyes was the beautiful log dump, saying that the module was loaded and so on and so on ....

aaah....rtfm? the thing is we actually tested db4.3 when facing the initial problem of setting up the dev servers. this failed somehow, and so we didn't consider it again when fighting with the production setup

anyways - this post grew as long as the first two episodes of "game of thrones" that i watched today :D

have a good night and don't give up the fight!

giving back

i haven't been posting new articles, or even logging in to this blog, for probably years, but today i sat down with the idea to share some more recent experience and noticed there were comments and "thanks" waiting to be approved (some sitting there for more than a year)...needless to say it warmed my heart that something i posted with the idea to be a dev diary read only by myself actually turned out to be helpful for someone else out there, and that i have incidentally been giving my share to the greatest invention of our era - the Internet, the social network, the virtual community of knowledge and common interests, "The Void Which Binds"

alright now - on to the next problem :D

Friday, October 30, 2009

OSSP mm - Shared Memory Allocation

i played a few days with the mm shared memory lib. i must say i liked it!
it is easy to understand and easy to implement into existing code.
of course the lack of new()/delete() is somewhat limiting, but not that difficult to overcome in most cases.
alternatively, on our project we were looking into boost::interprocess, but i was kinda reluctant to use it, mostly because of the syntax (btw i found a boost/ace alternative in POCO; the users and designers say it provides boost/ace-like functionality, even if it cannot compare in performance, but the best gain is the C#-oriented syntax; probably worth examining).

there was one really weird problem with mm though, where i got the impression that mm_lock() wasn't working.
the workflow was as follows:
we have a hash located in the shared memory segment and before i get() from it i mm_lock() it and subsequently mm_unlock() it. still i noticed that many times the same value was being written to the hash, which was unexpected by design.
i fired up our test to create a few worker processes and put a breakpoint in each one before a get(). the locks seemed to work, but after a put() the lock suddenly seemed to vanish into the haze?! digging deeper i found out that in put() we of course try to allocate some new memory in the shared mem segment, and tracing down into mm_malloc i found out it was trying to lock too, and when done - calling mm_unlock()! piece of bad design or weird side-effect? either way, apart from that my impressions of OSSP mm were great. however it turned out that the boost guys have a few more tricks up their sleeves; especially the named persisted storage seemed a very attractive idea for our purposes, and we eventually decided to use the boost approach.
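if my reading of the mm problem was right (an inner unlock releasing the lock the caller still thinks it holds), the effect and one possible workaround can be sketched with toy classes. none of this is the OSSP mm API: FlatLock, CountedLock and the helper function are invented for illustration, and the depth counter lives in one process only:

```cpp
// Toy stand-in for a lock that does not count nested acquisitions.
struct FlatLock {
    void lock()   { held_ = true; }
    void unlock() { held_ = false; }
    bool held() const { return held_; }
private:
    bool held_ = false;
};

// Wrapper that counts nesting depth, so only the outermost
// unlock() really releases the underlying lock.
class CountedLock {
public:
    void lock()   { if (depth_++ == 0) inner_.lock(); }
    void unlock() { if (depth_ > 0 && --depth_ == 0) inner_.unlock(); }
    bool held() const { return inner_.held(); }
private:
    FlatLock inner_;
    int depth_ = 0;
};

// Mimics put(): the caller locks, then the allocator inside locks and
// unlocks on its own. Reports whether the caller's lock survived that.
template <typename Lock>
bool held_after_nested_call(Lock& lock) {
    lock.lock();      // caller's lock around the hash operation
    lock.lock();      // allocator locks internally
    lock.unlock();    // allocator unlocks when done
    bool still_held = lock.held();
    lock.unlock();    // caller's own unlock
    return still_held;
}
```

with FlatLock the helper reports that the lock is gone after the nested call; with CountedLock it is still held until the caller's own unlock.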
so...moving on and up again :)

Monday, October 19, 2009

random performance thoughts

just wanted to lay things out so i can see them more clearly. if it doesn't make sense to you, dear readers, please excuse me :)

so, we have many hashes (maps, containing cached data), but due to our implementation with fork() all the data is lost once a particular task is done. i had the idea of storing all those in a berkeley db so that every new forked task gets it from there and doesn't have to fill the hash by itself, but then - what is a task gonna read from the db when it doesn't know beforehand what it is gonna need? we cannot read everything as the data is HUGE and we can't read bit by bit as this slows down performance a lot - we had a test case like this: we tried using memcached as a remote cache, but this slowed us down even if we read from it only once into a local hash and all subsequent calls were local (quick); if we hadn't read into a local hash first, the slowdown would have been even more devastating.

one possible solution is for the main process to reread everything from the stored data, so that all children get it via fork/copy-on-write, and then have the children put any changes/additions back into the db for the parent to read when new task requests start.

i have another one around - what if we realise what we need from the cache much earlier than when we need it? we could then start a thread to pre-load the expected data while the main thread goes on processing, and when it reaches the point where the hash is needed - voila! it's all up there! in case there are gaps they can be filled at runtime, which should be a small percentage of the overall effort
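the fork/copy-on-write part of that idea looks roughly like this. it's a sketch with made-up names, a std::map standing in for our hashes, and no db at all; writing changes back for the parent is left out:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <map>
#include <string>

using Cache = std::map<std::string, int>;

// The parent fills `cache` once; every forked child sees the same data
// through copy-on-write pages without loading anything itself.
// Returns true if the child found `key` with the value `expected`.
bool child_sees_cached_value(const Cache& cache,
                             const std::string& key, int expected) {
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        // Child: read-only access, so no pages are actually copied.
        auto it = cache.find(key);
        _exit(it != cache.end() && it->second == expected ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

anything the child adds to its copy stays in the child, which is why the changes would have to travel back through the db (or some other channel) for the parent to pick up.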
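a rough sketch of the pre-load idea, using std::async. slow_fetch is a hypothetical stand-in for the real db/memcached read, and the other names are invented too:

```cpp
#include <future>
#include <string>
#include <unordered_map>
#include <vector>

using Cache = std::unordered_map<std::string, std::string>;

// Hypothetical slow lookup standing in for the db / remote cache read.
std::string slow_fetch(const std::string& key) {
    return "value-of-" + key;
}

// Loads every expected key into a fresh local cache.
Cache preload(std::vector<std::string> keys) {
    Cache cache;
    for (const auto& k : keys) cache.emplace(k, slow_fetch(k));
    return cache;
}

std::string process_task(const std::vector<std::string>& expected,
                         const std::string& needed) {
    // Start loading as soon as we know (or guess) what we will need.
    std::future<Cache> pending =
        std::async(std::launch::async, preload, expected);

    // ... the main thread would do its other processing here ...

    // Blocks only if the preload has not finished yet.
    Cache cache = pending.get();

    auto it = cache.find(needed);
    if (it == cache.end()) {
        // Gap in the prediction: fill it at runtime.
        it = cache.emplace(needed, slow_fetch(needed)).first;
    }
    return it->second;
}
```

one thing to keep in mind: this puts an extra thread into a process that also forks, so the fork has to happen before the thread starts or after it has finished.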

i'm working on a shared memory solution, but all these got me into doubt about the performance hit, similar to hitting the memcached every time (by the way, i found out about the OSSP mm shared memory allocation project at http://www.ossp.org/pkg/lib/mm/, and it looks really good. i don't know how i missed it earlier, guess i was looking for a std-kind-of allocator and got put off by some messages that boost had given up on the idea. another btw - is the new std with C++0x gonna include something like this? from what i read it does include some fancy new stuff related to multithreaded and multicore programming which is awesome. would be great once we get our hands on it)

so where was i....aaah...back to the drawing board, time for another brainstorming session...tomorrow :D

Wednesday, February 11, 2009

More on Berkeley DB

alright! time passes on and one gets wiser (or feels sillier and has more questions to ask).

the project i'm working on is written in python with c++ interop through sip.
at one point we had to move from the solely single-threaded design (who would design a single-threaded SERVER may i ask, duh) to a scalable architecture.

the first obstacle we hit was the mighty GIL of Python (oh my, what didn't i try to overcome this...) - fighting this monster made me an expert in providing solutions for scalability and performance optimizations. or not.

anyway, one of the problems we had was that somehow the data we read from berkeley db would become corrupted seemingly without any reason. dude...what a mess. our system is complicated enough so it took some time till we came to investigate the possibility that bdb was somehow the reason for this.
i studied the solutions for multithreaded and multiprocess access to bdb, added DbEnv, also played a bit with DB_DBT_USERMEM because we thought that simply relying on DB_DBT_MALLOC might lead to mem leaks (i still have to thoroughly test and make sure that this is not an issue).
eventually we came to this - whenever you fork and the parent and any child processes share bdb handles - CLOSE AND REOPEN THE DB HANDLES IN THE CHILD PROCESS!
this can save you a couple of hours (or weeks) of hitting your head against the wall, so do this, get your bonus and go happily home to your family for the day
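one way to make that rule hard to forget is to remember which process opened the handle and reopen when the pid changes. the sketch below uses a fake handle type, not the Berkeley DB API; with the real thing the reset would be a close of the old handle followed by a fresh open in the child:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <memory>

// Hypothetical stand-in for a database handle; NOT the Berkeley DB API.
struct FakeHandle {
    pid_t opened_in;
};

// Hands out a handle that always belongs to the calling process.
class PerProcessHandle {
public:
    FakeHandle& get() {
        if (!handle_ || handle_->opened_in != getpid()) {
            // First use, or we are a forked child holding the parent's
            // handle: drop it and open a fresh one for this process.
            handle_.reset(new FakeHandle{getpid()});
            ++opens_;
        }
        return *handle_;
    }
    int opens() const { return opens_; }
private:
    std::unique_ptr<FakeHandle> handle_;
    int opens_ = 0;
};

// Opens in the parent, forks, and checks that the child reopened.
bool child_reopens_handle(PerProcessHandle& h) {
    h.get();                       // opened in the parent
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        h.get();                   // pid differs, so this reopens
        _exit(h.opens() == 2 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

the pid check costs one getpid() per access, which seems a fair price compared to weeks of chasing corrupted reads.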

cheers

to be continued...

Tuesday, November 11, 2008

getting to know Berkeley DB

while getting to know Berkeley DB i came upon this link:

http://simonwillison.net/2003/Nov/26/discoveringBerkeleyDB/


and this comment:


My only complaint about BerkeleyDB is that it's a wee bit flaky when not used with transactions. Databases can become corrupted, processes can deadlock, etc. I've found problems even when using CDS mode. Only problem with wrapping everything in transactions is the performance hit. So, here's what we came up with as a compromise (we use the excellent BerkeleyDB.pm module from our Perl code): lock the entire database on write with a semaphore. The overhead is negligible in terms of speed, but it's done a remarkable job of keeping our indexes very clean!


lock the entire database on write with a semaphore
???
say again?

Update (a few hours later)
i'm beginning to understand what the comment author meant....my rdbms experience will freak out with these new ideas in berkeley db...
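as far as i understand it, the idea in that comment boils down to something like the sketch below: one semaphore with a count of 1 that every writer takes before touching the database. the names are invented, the comment author did this from Perl, and the actual db write is left out:

```cpp
#include <fcntl.h>
#include <semaphore.h>
#include <cerrno>

// Holds the semaphore for the duration of one write (RAII).
class WriteGuard {
public:
    explicit WriteGuard(sem_t* sem) : sem_(sem) {
        int rc;
        do {
            rc = sem_wait(sem_);            // retry if a signal interrupts us
        } while (rc == -1 && errno == EINTR);
        held_ = (rc == 0);
    }
    ~WriteGuard() {
        if (held_) sem_post(sem_);
    }
    WriteGuard(const WriteGuard&) = delete;
    WriteGuard& operator=(const WriteGuard&) = delete;
private:
    sem_t* sem_;
    bool held_ = false;
};

// Opens (or creates) a named semaphore with an initial count of 1.
// Every process that passes the same name shares the same lock.
// Returns nullptr on failure.
sem_t* open_db_write_lock(const char* name) {
    sem_t* s = sem_open(name, O_CREAT, 0600, 1);
    return s == SEM_FAILED ? nullptr : s;
}
```

a writer would create a WriteGuard on the shared semaphore, do its put, and let the guard go out of scope; readers are not blocked at all, which is probably where the "negligible overhead" in the comment comes from.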