Wednesday, February 11, 2009

More on Berkeley DB

alright! time passes on and one gets wiser (or feels sillier and has more questions to ask).

the project i'm working on is written in python with c++ interop through sip.
at one point we had to move from a purely single-threaded design (who would design a single-threaded SERVER may i ask, duh) to a scalable architecture.

the first obstacle we hit was the mighty GIL of Python (oh my, what didn't i try to overcome this...) - fighting this monster made me an expert in providing solutions for scalability and performance optimizations. or not.

anyway, one of the problems we had was that the data we read from berkeley db would somehow become corrupted, seemingly without any reason. dude...what a mess. our system is complicated enough that it took some time before we got around to investigating the possibility that bdb was somehow the reason for this.
i studied the solutions for multithreaded and multiprocess access to bdb, added a DbEnv, and also played a bit with DB_DBT_USERMEM because we thought that simply relying on DB_DBT_MALLOC might lead to memory leaks (i still have to thoroughly test this and make sure it is not an issue).
eventually we came to this - whenever you fork and the parent and any child processes share bdb handles - CLOSE AND REOPEN THE DB HANDLES IN THE CHILD PROCESS!
this can save you a couple of hours (or weeks) of banging your head against the wall, so do this, get your bonus and go happily home to your family for the day.
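
to make the "close and reopen in the child" rule a bit more concrete, here is a rough sketch in python. it is not our actual code and it does not touch the real bdb bindings: ForkSafeHandle, DemoHandle and the opener callable are all names i made up for illustration. the assumption is that opener is whatever function of yours opens the environment and the database and returns something with a close() method. the wrapper remembers which process opened the handle, and if it notices the pid has changed (meaning we are now in a forked child), it closes the inherited handle and opens a fresh one.

```python
import os


class DemoHandle:
    """Hypothetical stand-in for a real db handle; only tracks close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ForkSafeHandle:
    """Hands out a db handle that is reopened after a fork.

    `opener` is an assumed zero-argument callable that returns a new
    handle object with a close() method.
    """

    def __init__(self, opener):
        self._opener = opener
        self._handle = opener()
        self._owner_pid = os.getpid()  # process that opened the handle

    def get(self):
        # A different pid means we are in a forked child that inherited
        # the parent's handle: close it and open a fresh one here.
        if os.getpid() != self._owner_pid:
            self._handle.close()
            self._handle = self._opener()
            self._owner_pid = os.getpid()
        return self._handle
```

the idea is that every piece of code asks the wrapper for the handle via get() instead of holding on to the raw handle, so a child process can never keep using what it inherited from the parent. checking the pid on every access is a design choice that costs a cheap system call per lookup; doing the close and reopen explicitly right after the fork call in the child would work just as well.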

cheers

to be continued...