Monday, September 26, 2011

Coder at sea. Arg.

Over the last few months of exploring code and bugs,
I've run into many lessons worth learning.

The lone coder.
It seems to happen more often than not.
A single coder is put in charge of a project.
Maybe even a large, long, daunting one.
And because of other priorities, maybe the testing is lax.
So when it is finally released out into the world, CHAOS ensues.
Everyone is then brought on to save the ship.
Scramble, late hours, quick fixes, fixes for the quick fixes, daily redeploys, defensive restarts.
And stability is still off in the distance, another week, two, a month.
So the more eyes and code reviews, the better.

Losing commits.
Having developers work on different branches can cause this serious issue.
With separate streams of development, there ends up being a lot of code in flight, especially when this includes production maintenance, an upcoming release, and new development.
And with those separate spaces (and especially when coders leave), commits can get lost. The worst example: previously fixed issues break again because the fixes were lost, such as when a side branch (based on older code) is pushed to production.

Memory.
In summary: JAXBContext. A memory hog.
It came up both from explicit use and from JAX-WS. Recreating the JAXBContext is expensive and will block threads. I found this by wading through thread dumps and finding half the threads blocked, waiting to read a zip file during JAXBContext creation. This happened both with explicitly creating a new JAXBContext instance (instead of just a new Marshaller) and with recreating the JAX-WS service (instead of just the port).
This can grind the system down, cause constant garbage collections, and require hours of profiling (VisualVM) and monitoring.
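The usual fix is to build the expensive context once and reuse it. JAXBContext is documented as thread-safe, while Marshaller and Unmarshaller are not, so the common pattern is to share one context and create a fresh marshaller per call. Below is a minimal sketch of that pattern using only the standard library: FakeContext is a hypothetical stand-in for JAXBContext (so the sketch runs without JAXB on the classpath), and ContextCache is an illustrative helper, not part of any library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ContextCacheDemo {

    /** Builds one expensive, thread-safe context per key and reuses it afterwards. */
    static final class ContextCache<K, C> {
        private final Map<K, C> cache = new ConcurrentHashMap<>();
        private final Function<K, C> factory;

        ContextCache(Function<K, C> factory) {
            this.factory = factory;
        }

        C get(K key) {
            // computeIfAbsent applies the factory at most once per key, even under contention
            return cache.computeIfAbsent(key, factory);
        }
    }

    /** Hypothetical stand-in for JAXBContext: costly to build, safe to share. */
    static final class FakeContext {
        static final AtomicInteger CREATED = new AtomicInteger();
        private final Class<?> type;

        FakeContext(Class<?> type) {
            this.type = type;
            CREATED.incrementAndGet(); // counts how often the expensive step runs
        }

        /** Cheap per-call helper, playing the role of createMarshaller(); never shared. */
        StringBuilder createWriter() {
            return new StringBuilder(type.getSimpleName()).append(':');
        }
    }

    // Hypothetical cache: one context per bound class, created lazily on first use.
    static final ContextCache<Class<?>, FakeContext> CONTEXTS =
            new ContextCache<>(FakeContext::new);

    static String render(Object value) {
        FakeContext ctx = CONTEXTS.get(value.getClass()); // shared, built once per class
        StringBuilder writer = ctx.createWriter();        // fresh for this call only
        return writer.append(value).toString();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            final int n = i;
            threads[i] = new Thread(() -> render(n));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Eight Integer renders plus one String render need only two contexts.
        System.out.println(render("done") + " contexts=" + FakeContext.CREATED.get());
    }
}
```

With real JAXB, the factory would wrap JAXBContext.newInstance(type) (which throws the checked JAXBException), and createWriter() corresponds to context.createMarshaller(). The same idea applies on the JAX-WS side: create the Service once and get ports from it, rather than recreating the Service for every call.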

Error handling.
One example, from a small but important project written by a former coworker, was fun to investigate. Well, I thought it was fun, but it was time-consuming.
When a file failed to copy, no errors were logged.
The user saw no error message in the UI, the log file was silent, the database had no mention of it.
After some searching, I found that exceptions were being swallowed and that the stack trace message was too large for the database column it was being stuffed into (at least the database failure was returned in the REST call).
And later I found that database values were misconfigured (untested) and that resources could have been deployed to production instead of QA (bad bad).
So these are some of the cleanups I've worked on periodically.
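The copy failure above had two separate problems: the exception was swallowed, and the stored stack trace overflowed its column. A minimal sketch of handling both, assuming a hypothetical error column of 255 characters and using java.util.logging: log the full exception, then truncate what gets stored so the insert itself cannot fail. The class and method names are illustrative, not taken from the original project.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CopyErrorHandling {
    private static final Logger LOG = Logger.getLogger(CopyErrorHandling.class.getName());

    /** Hypothetical width (in characters) of the error column in the database. */
    static final int ERROR_COLUMN_LENGTH = 255;

    /** Renders a throwable's full stack trace as text. */
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    /** Truncates text so it fits a fixed-width column instead of failing the insert. */
    static String fitToColumn(String text, int max) {
        return text.length() <= max ? text : text.substring(0, max);
    }

    /**
     * Copies a file. On failure the full stack trace goes to the log and a
     * column-safe summary is returned for storage; nothing is swallowed.
     */
    static Optional<String> copyOrDescribeError(Path source, Path target) {
        try {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return Optional.empty();
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "Failed to copy " + source + " to " + target, e);
            return Optional.of(fitToColumn(stackTraceOf(e), ERROR_COLUMN_LENGTH));
        }
    }

    public static void main(String[] args) {
        Path missing = Paths.get("no-such-input.txt");
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "copy-target.txt");
        Optional<String> error = copyOrDescribeError(missing, target);
        // The caller can show this to the user and store it without overflowing the column.
        System.out.println(error.orElse("copied"));
    }
}
```

If the real column is sized in bytes rather than characters, the truncation would need to measure the encoded length instead; the character count here is a simplifying assumption.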
