Category Archives: Java

Analyze Tomcat logs with GoAccess

GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems.

It can also generate nice-looking HTML reports.

I use it to analyze Tomcat server logs and it works great.

If you’re running Fedora, or RHEL/CentOS with the EPEL repository enabled, you can easily install it with a simple:

 yum install -y goaccess

The next step is to configure Tomcat to generate a suitable log file; I typically set up an access log valve for this in my server.xml.
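
Something along these lines inside the <Host> element does the job. The directory, prefix and suffix below are assumptions chosen to produce file names like the one used in the goaccess commands further down, and pattern="combined" writes the standard combined log format those commands can parse:

<!-- Daily access log in the standard "combined" format -->
<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="/var/log/tomcat"
       prefix="localhost_access" suffix=".log"
       pattern="combined" />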


Then you can ask goAccess to read the log.

If you are interested in live monitoring, you just have to use this command (changing, of course, the date part of the file name and possibly the full path of your Tomcat log directory):

goaccess -f /var/log/tomcat/localhost_access.2016-05-12.log \
--log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' \
--date-format='%d/%b/%Y' \
--time-format='%H:%M:%S' 

If you want to generate a fancy HTML report, just add the -a option and redirect the output to a convenient file:

goaccess -f /var/log/tomcat/localhost_access.2016-05-12.log \
--log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' \
--date-format='%d/%b/%Y' \
--time-format='%H:%M:%S' \
-a > report.html

That’s it!

Here’s an example of a report generated this way.

I suggest you take a look at the man page and/or the documentation for more options and features.

Deduplicating Maven, Gradle and IDE’s JAR files on BTRFS

If you are a Java (or Groovy) developer, your home directory probably contains (at least) a few GBs of downloaded JAR files.

Every dependency management tool (and IDE) downloads its own copy of the JAR files it needs into a different directory. This leads to a situation where you might have several copies of the same file wasting your precious disk space!

In such a situation you can take advantage of one of the nicest features of the BTRFS filesystem: deduplication.

Deduplication is not yet supported by the official BTRFS tools, although the filesystem itself, being a COW (copy-on-write) filesystem, provides all the needed capabilities.

To leverage them, there’s a handy tool called duperemove.

Basically, what this amazing tool does is scan one or more directories (or subvolumes), find extents with the same content and deduplicate them. Deduplicating an extent means that every reference to it points to the same physical extent on disk (so the content is physically written to disk only once).

To give you some numbers, I’ll show how I used this on my commuting laptop (which has a small 300GB disk, so every GB is precious).

In my home directory I have the following directories (related to the topic):

.eclipse
.gradle
.grails
.groovy
.IdeaIC13
.IdeaIC14
.IntelliJIdea13
.IntelliJIdea14
.ivy2
.m2

for a total usage of 4.8GB.
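
If you want to check the numbers on your own machine before deduplicating, a quick du over the same directories prints the per-directory sizes and the grand total:

cd ~
du -csh .eclipse .gradle .grails .groovy .IdeaIC13 .IdeaIC14 \
    .IntelliJIdea13 .IntelliJIdea14 .ivy2 .m2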

I used duperemove to deduplicate them with this command:

./duperemove -rdh /home/federico/{\
.eclipse,\
.gradle,\
.grails,\
.groovy,\
.IdeaIC13,\
.IdeaIC14,\
.IntelliJIdea13,\
.IntelliJIdea14,\
.ivy2,\
.m2\
}

After the execution the result was:

Comparison of extent info shows a net change in shared extents of: 704.0M

which is roughly 14%, but, as they say, YMMV.

One might also suggest that I clean those directories from time to time, or just delete old releases, but when I’m commuting I work offline, and being able to patch and build a project which I haven’t been working on recently is quite handy. Also, having a small HD, even 700M might be useful!

UPDATE: including /usr/share/java (where my OS stores all packaged Java libraries) and /usr/local (where I usually install binary distributions of software) further improves the results.
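
Something along these lines does it (run as root because system directories are involved; the exact list of paths is just an illustration):

sudo duperemove -rdh /usr/share/java /usr/local \
    /home/federico/.m2 /home/federico/.gradle /home/federico/.ivy2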

Spring Integration – FTP files not reprocessed and broken flow

In the last few years I’ve done everything I could to avoid using FTP as a mechanism to pass data from one procedure to another inside our company.

Tools such as Spring Integration and RabbitMQ have helped me a lot in this process (making my life so much better).

However, we still have to use FTP as an exchange mechanism with several external partners.

Spring Integration has a wonderful FTP module that covers all of my needs; however, I found a case that required some googling to shed light on.

I had this scenario: a partner produces a file every n minutes and stores it on its own FTP server, with a never-changing file name. I regularly check the FTP server, fetch the file locally and put it in a RabbitMQ exchange.

So, basically, I have this kind of configuration (not the actual configuration, just an example):
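
A minimal sketch of such a flow could look like the following. Every host name, directory, channel and exchange name here is a placeholder, the RabbitMQ side is shown as an AMQP outbound channel adapter, and the cleanup of the local file after a successful delivery is left out:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:int="http://www.springframework.org/schema/integration"
       xmlns:int-ftp="http://www.springframework.org/schema/integration/ftp"
       xmlns:int-file="http://www.springframework.org/schema/integration/file"
       xmlns:int-amqp="http://www.springframework.org/schema/integration/amqp"
       xmlns:rabbit="http://www.springframework.org/schema/rabbit"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
         http://www.springframework.org/schema/integration http://www.springframework.org/schema/integration/spring-integration.xsd
         http://www.springframework.org/schema/integration/ftp http://www.springframework.org/schema/integration/ftp/spring-integration-ftp.xsd
         http://www.springframework.org/schema/integration/file http://www.springframework.org/schema/integration/file/spring-integration-file.xsd
         http://www.springframework.org/schema/integration/amqp http://www.springframework.org/schema/integration/amqp/spring-integration-amqp.xsd
         http://www.springframework.org/schema/rabbit http://www.springframework.org/schema/rabbit/spring-rabbit.xsd">

    <!-- Connection to the partner's FTP server (placeholder values) -->
    <bean id="ftpSessionFactory"
          class="org.springframework.integration.ftp.session.DefaultFtpSessionFactory">
        <property name="host" value="ftp.partner.example.com"/>
        <property name="username" value="user"/>
        <property name="password" value="secret"/>
    </bean>

    <int:channel id="ftpFiles"/>
    <int:channel id="toRabbit"/>

    <!-- Poll the remote directory and download foo.csv to a local directory -->
    <int-ftp:inbound-channel-adapter id="ftpInbound"
            channel="ftpFiles"
            session-factory="ftpSessionFactory"
            remote-directory="/outbox"
            filename-pattern="foo.csv"
            local-directory="/var/data/ftp-in"
            auto-create-local-directory="true">
        <int:poller fixed-rate="60000"/>
    </int-ftp:inbound-channel-adapter>

    <!-- Read the content of the downloaded file before publishing it -->
    <int-file:file-to-string-transformer
            input-channel="ftpFiles"
            output-channel="toRabbit"/>

    <!-- Publish the file content to a RabbitMQ exchange -->
    <rabbit:connection-factory id="connectionFactory" host="rabbit.example.com"/>
    <rabbit:template id="amqpTemplate" connection-factory="connectionFactory"/>

    <int-amqp:outbound-channel-adapter
            channel="toRabbit"
            amqp-template="amqpTemplate"
            exchange-name="partner.files"/>

</beans>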

This works nicely, with one important exception: if, for any reason, the outbound gateway cannot deliver the file to RabbitMQ (for example if the server is unreachable, restarting, or in maintenance), then this is what happens (suppose that the interesting file is called foo.csv):

  1. the file foo.csv is moved from the remote server to the local filesystem
  2. the file is passed from the FTP inbound to the RabbitMQ outbound
  3. the FTP inbound marks the file as “processed”
  4. the RabbitMQ outbound fails to deliver the message to the broker (so it doesn’t delete the local file)

This seems to be the wanted behaviour (and it is!); anyway, the problem arises at the next polling cycle:

  1. the FTP inbound sees that it already has a local file named foo.csv, so it won’t download the new one from the remote server
  2. however, it doesn’t process the local foo.csv either, because it has already marked it as “processed”

At this point, our chain is stuck and won’t start working again unless we restart our application.

The solution comes from the documentation (as always!).

From that page you can learn that the FTP inbound uses a FileListFilter to decide which local files to process. This filter can be changed by setting the local-filter attribute. The default filter is an AcceptOnceFileListFilter which, as stated by its name, accepts a file only once.

The list of accepted files is stored in memory, and that’s why restarting the application makes our foo.csv file get processed again (of course, in case you need it, there are ways to make it persistent).

Going back to our problem, the solution is to set the filter to an instance of the class AcceptAllFileListFilter.
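
Sticking with the placeholder values from the sketch above, the change boils down to adding the local-filter attribute to the inbound adapter:

<bean id="acceptAllFilter"
      class="org.springframework.integration.file.filters.AcceptAllFileListFilter"/>

<int-ftp:inbound-channel-adapter id="ftpInbound"
        channel="ftpFiles"
        session-factory="ftpSessionFactory"
        remote-directory="/outbox"
        filename-pattern="foo.csv"
        local-directory="/var/data/ftp-in"
        auto-create-local-directory="true"
        local-filter="acceptAllFilter">
    <int:poller fixed-rate="60000"/>
</int-ftp:inbound-channel-adapter>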

With this configuration in place, the remote foo.csv file won’t be downloaded unless the local copy has been deleted, and the local copy will be deleted after the first polling cycle that can successfully deliver its content to the RabbitMQ broker.