Troubleshooting Offline Site Servers

“HELP! I’m a PaperCut system administrator and our Site Server is unavailable! What could be wrong?”

Site Servers are great because, even if your main Application Server goes offline, your regional offices can carry on printing. But what do you do if a Site Server becomes unresponsive? What could be wrong, and how do we fix it?

 

What symptoms are you experiencing?

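An unexpectedly offline Site Server can show several symptoms. This article addresses the three main ones:

 
  • The Site Server web interface is unresponsive
     
  • The Site Server status page says “Offline Mode” constantly
     
  • The Site Server goes into Offline Mode intermittently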

If the Site Server web interface is unresponsive

This is usually the least troublesome issue to fix, and it can often be resolved with a few quick checks of the basics:

 
  • Make sure the server is powered on and connected to the network (the right network!)
     
  • Check that the Site Server Application is still installed
     
  • Ensure the ‘PaperCut Site Server’ service is running on the server
     
  • Check that ports 9191/9192 (or whatever ports you have manually specified in the configuration) are open on the server
     
  • If you are trying to access the Site Server interface from a different device across the network, see what happens when you try and access it locally on the Site Server
     
  • You can also find some useful troubleshooting tips on the following article - these relate mainly to the Application Server though
     
  • If you’re brave enough, have a look at the server.log file in the [app path]/server/logs directory. Can you see any errors? The most recent entries are at the bottom, so check those lines first for useful error messages
     
  • If all else fails, ensure that debug mode is enabled on the Site Server and Application Server, and then reach out to your PaperCut support provider for further assistance
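If scrolling through server.log by hand feels daunting, a small script can pull out the likely culprits. The following Python sketch keeps only the last few hundred lines of a log and returns those marked ERROR or WARN. The level names and the 200-line window are assumptions (the log excerpts in this article only show INFO and DEBUG), and the function names are our own, so adjust them to match what you see in your file.

```python
from collections import deque
from pathlib import Path


def recent_problem_lines(lines, levels=("ERROR", "WARN"), tail=200):
    """Return the lines among the last `tail` lines that carry one of `levels`.

    The level names are an assumption; change them to suit your log.
    """
    last = deque(lines, maxlen=tail)  # keeps only the final `tail` lines
    return [
        ln.rstrip("\n")
        for ln in last
        if any(f" {lvl} " in ln for lvl in levels)
    ]


def scan_log(path, **kwargs):
    """Open the log file at `path` (supplied by you) and filter its tail."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return recent_problem_lines(fh, **kwargs)
```

To use it, call scan_log with the full path to your server.log and print each returned line.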
 

If the Site Server status page says “Offline Mode” constantly



A constant “Offline Mode” status indicates that, although the Site Server is running successfully on its host server, the Application Server is unable to verify the Site Server’s current status, perhaps because of network connectivity or database issues.

A few quick things to check before we get started with more complex troubleshooting:

 
  1. Ensure that all of your PaperCut Servers are on the same PaperCut version. This is essential for a successful connection
     
  2. Ensure that your Site Server has adequate free resources, e.g. disk space, CPU and RAM. For a full discussion of identifying and resolving server performance issues, please see our article Troubleshooting Server Performance Issues.
     
  3. The most common cause of offline Site Servers is network problems - so checking the network (both connectivity and latency) is very important. We’ll discuss this further in the scenarios in this article

(Also, be aware that PaperCut is constantly being improved, so check out our Known Issues pages for details on fixes relating to Site Servers in recent versions.)

Testing connectivity between the Site Server and PaperCut

Network connectivity between the Site Server and the Application Server is the next thing to check.

We want to make sure that the Site Server is configured with the correct details of your Application Server, and that the Application Server is contactable from the Site Server. The following test will let you verify that the Site Server is configured correctly, and that the PaperCut server is reachable over the network.

You can do that by:

1. On the Site Server browse to the ‘<app path>\server’ folder

2. Open the site-server.properties file

3. Find the line beginning server.master.address= and confirm that it contains the address of your PaperCut Application Server (master)
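If you look after several Site Servers, reading the value with a script can save some clicking. This is a minimal Python sketch that pulls one key out of Java-style key=value lines. It assumes the simple one-line-per-key layout described in the steps above, and read_property is our own helper name, not part of PaperCut.

```python
def read_property(lines, key):
    """Return the value for `key` from 'key=value' lines, or None if absent.

    Blank lines and comment lines (starting with # or !) are skipped.
    Line continuations and escapes are not handled in this sketch.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None
```

Open site-server.properties, pass the file object as lines, and ask for "server.master.address" to see which Application Server the Site Server is pointing at.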

 

Then, using those server details, open a browser on the Site Server and navigate to the following URLs (replacing ‘PaperCut Server Hostname’ with the address of your PaperCut Application Server). This confirms that the Application Server is accessible, the ports are open, and the traffic isn’t being blocked or rerouted by a proxy.

 
  • http://<PaperCut Server Hostname>:9191
     
  • https://<PaperCut Server Hostname>:9192 (unless you have installed a signed certificate on your PaperCut server, you will see a self-signed certificate error in the browser - but that is normal)
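If the Site Server has no convenient browser, the same reachability check can be approximated from a script. The sketch below simply tries to open a TCP connection to each port. The default ports 9191 and 9192 come from this article, while the function names and the 3-second timeout are our own assumptions.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_ports(host, ports=(9191, 9192)):
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}
```

A True result only shows that something accepted the connection on that port. It does not prove that a proxy isn’t intercepting the traffic, so the browser test is still worthwhile where you can run it.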

Have a look at the network latency (response time) during your tests, because a very slow connection between the Site Server and the Application Server may make it impossible for the two servers to connect with each other.

If you believe that network speed might be causing your issue, there are some Config Keys in PaperCut that may help. Any changes to these would ideally be made by a certified PaperCut technician!

system.site.keepalive-interval-secs - default is 3

system.site.register-interval-secs - default is 30

Hint: the ‘register’ value must be greater than the ‘keepalive’ value. Change these keys with care, as they control how frequently the Application Server checks for issues on your Site Servers.

Have you made any changes in your environment?

Any changes in your environment can have knock-on effects on other things. If you have made any changes in your environment around the time that the Site Server started experiencing issues, try undoing those changes if possible to see if it resolves the problem.

Some specific activities that might cause issues for your Site Server are:

 
  • Installing a new SSL certificate on your PaperCut server - The server.master.address in the site-server.properties file must match the common name on any security certificates used, and a mismatch here is one thing that can go wrong. Here is a Guide.
     
  • Upgrades to server operating systems, firewalls or network switches - A Site Server needs certain ports to remain open on your network, and updates to operating systems, software and hardware can block those ports or cause the traffic to be incorrectly flagged as potentially malicious. The update process for operating systems can also potentially corrupt the Site Server database.

  • Issues with ports or blocked traffic may manifest themselves as an error in your web browser when you try to connect to the Site Server
 


 

How’s your database?

If the network all looks OK (and it’s very important to confirm that, as it’s by far the most common cause of Site Server issues), and you haven’t made any changes to your environment at all, then you may have a Site Server database issue. Fortunately, the Site Server maintains a cache of the database from the Application Server, and this can be recreated if the Site Server can connect with the Application Server.

If you think a corrupt Site Server database might be your problem, read through This Article, which will help you establish whether that is the cause. If it is, this is how to fix it:

Initialize and re-sync a Site Server database

This is done using the db-tools functions. Follow the steps outlined below (ideally in conjunction with your PaperCut support provider).

WARNING: This completely resets the internal Site Server database and re-syncs all of the data from the PaperCut Application Server. Depending on the size of the site this can take a while, so we only recommend doing it if the site is already down (or out of hours).

 
  1. On the Site Server - Stop the ‘PaperCut Site Server’ service
     
  2. Open a command prompt / terminal - ‘as administrator’
     
  3. Change the directory to the following - [AppPath]\server\bin\<platform>
     
  4. Run the following command - db-tools init-db -f
     
  5. On the Site Server - Start the ‘PaperCut Site Server’ service

The database will now re-sync all data from the Application Server

Out of ideas?

Don’t panic! If all else fails, ensure that debug mode is enabled on the Site Server and Application Server, and then reach out to your PaperCut support provider for further assistance.

 

If the Site Server goes into Offline Mode intermittently

By their nature, intermittent issues usually indicate that everything works fine when the environment is in its ‘ideal’ state, so we need to find out what is intermittently affecting your environment.

Where Site Servers are concerned, two dependencies are often the cause of intermittent Site Server connectivity issues:

 
  1. Network latency is the most common cause of intermittent Site Server connectivity issues:
     
    • Perform ‘ping’ tests from the Site Server to the Application Server (and vice versa), checking the ping response time. Run these tests over a reasonable period of time, so you can see how stable the network connection is. A satisfactory ping response time is between 1 ms and 150 ms
       
    • If you have other third-party tools for checking network latency, we encourage you to use them and to analyse the results
     
  2. Server resources are crucial for Site Server connectivity, so ensure that your Site Server has adequate free resources at all times, e.g. disk space, CPU and RAM:
     
    • If your Site Server is a Virtual Machine, make sure that its resources are configured as ‘static’ (or ‘dedicated’) on the VM Host - this prevents other VM servers on the Host from stealing valuable resources from the Site Server. For a full discussion of identifying and resolving server performance issues, please see our article Troubleshooting Server Performance Issues.
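For the latency check in point 1, a long-running sample is more telling than a handful of pings. The Python sketch below times repeated TCP connections to the Application Server and summarises them against the 150 ms guideline. Please note the assumptions: it measures TCP connect time on port 9191 as a rough stand-in for ping (the two are similar but not identical), and the function names, sample count and interval are ours.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=9191, timeout=5.0):
    """Time one TCP connect in milliseconds; return None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def summarize(samples_ms, threshold_ms=150.0):
    """Summarise timings; None entries are counted as failed attempts."""
    ok = [s for s in samples_ms if s is not None]
    return {
        "attempts": len(samples_ms),
        "failed": len(samples_ms) - len(ok),
        "over_threshold": sum(1 for s in ok if s > threshold_ms),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


def sample_latency(host, port=9191, count=60, interval_s=1.0):
    """Collect `count` timings, pausing `interval_s` seconds between them."""
    samples = []
    for _ in range(count):
        samples.append(tcp_connect_ms(host, port))
        time.sleep(interval_s)
    return summarize(samples)
```

Run sample_latency against your Application Server’s address for as long as it takes to catch the problem. A healthy link should show few or no failed or over-threshold samples.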

Speed issues between the Site Server and the Application server are known to sometimes cause the following error:

 

“Initial setup fails with a 503 Service Unavailable” error



Although this displays as an error, it is actually PaperCut doing its job: the network between the servers is too slow for the Application Server to successfully issue its ‘keep-alive’ prompts to the Site Server, so it treats the Site Server as unresponsive and takes it offline.

 

Logs!

If you have a Site Server going offline intermittently, you will see the following records in the logs. You can see that the Site Server keeps going into offline mode (called SLAVE_OFFLINE in the logs) and then switching back online almost immediately (called SLAVE_PROXY in the logs). This is likely because of a slow or ‘flapping’ network connection to the main PaperCut Application server:

2020-01-01 13:51:27,934 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_PROXY to SLAVE_OFFLINE [server-state-monitor]

2020-01-01 13:53:32,037 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_OFFLINE to SLAVE_PROXY [server-state-monitor]

2020-01-01 13:57:16,488 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_PROXY to SLAVE_OFFLINE [server-state-monitor]

2020-01-01 13:59:12,544 INFO ServerStateManagerImpl:529 - Server state changed from SLAVE_OFFLINE to SLAVE_PROXY [server-state-monitor]
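To see how often this happens over a whole day, you can let a script count the state changes for you. This Python sketch pairs each switch to SLAVE_OFFLINE with the next switch back to SLAVE_PROXY and reports how long each offline period lasted. It assumes the real log file writes timestamps with plain hyphens (YYYY-MM-DD HH:MM:SS,mmm) and uses the message wording shown in the records above. The function name is our own.

```python
import re
from datetime import datetime

# Matches the timestamp plus the old and new state of a state-change record.
STATE_CHANGE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Server state changed from (\w+) to (\w+)"
)


def offline_periods(lines):
    """Return a list of (start, end, seconds) for each completed offline period."""
    periods = []
    went_offline = None
    for line in lines:
        m = STATE_CHANGE.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        new_state = m.group(3)
        if new_state == "SLAVE_OFFLINE":
            went_offline = when
        elif new_state == "SLAVE_PROXY" and went_offline is not None:
            seconds = (when - went_offline).total_seconds()
            periods.append((went_offline, when, seconds))
            went_offline = None
    return periods
```

Many short offline periods spread across the day suggest a flapping connection, whereas one long period looks more like a genuine outage.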

 

Identifying an issue with network latency

In an ideal scenario, the Site Server will quickly detect when a PaperCut Application server is unavailable and will quickly shift into “offline” mode. This works well when the Site Server has a stable connection to the main PaperCut Application server, but what happens if this connection is unreliable or has a high latency? In this scenario, the Site Server might go into offline mode before it should. If your sites have an inconsistent network connection then the Site Server may be stuck in offline mode for most of the day.

If we can identify that this is occurring from the logs, then it’s possible to fine-tune the settings on the Site Server and PaperCut so that the offline mode is not triggered so quickly.

Network speed - log examples:

The Site Server log below shows that the Application Server took 4391 ms to respond to an HTTPS request made by the Site Server, which indicates possible network speed issues:

2020-01-01 10:04:21,508 DEBUG ProxyBase:199 - Response took 4391ms. POST papercut-server:9192/rpc/api/rest/master/setDeviceServer/404b395f-476a-43cd-86d7-bb8b3e19384f returned a response status of 200 OK (POST - /rpc/api/rest/internal/site/setDeviceServer) [https-102]

Compare that with the Application Server’s own record of the same request, which took just 1248 ms:

2020-01-01 10:04:21,486 DEBUG Jetty:936 - <<< 10.7.1.93 HTTP/1.1 POST /rpc/api/rest/master/setDeviceServer/404b395f-476a-43cd-86d7-bb8b3e19384f => Status: 200, Content-Type: application/json, Took 1248ms [https-101]

The above logs indicate that 3143 ms was lost during the network transfer.
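The arithmetic is simply the Site Server’s figure minus the Application Server’s figure: 4391 ms - 1248 ms = 3143 ms. If you have many request pairs to compare, the Python sketch below does the same subtraction from the two log lines. It relies on the “Response took …ms” and “Took …ms” wording shown in the examples above. The function name is our own, and matching up the two lines for the same request is left to you.

```python
import re

SITE_TOOK = re.compile(r"Response took (\d+)ms")  # Site Server (ProxyBase) line
APP_TOOK = re.compile(r"Took (\d+)ms")            # Application Server (Jetty) line


def network_overhead_ms(site_line, app_line):
    """Return the Site Server duration minus the Application Server duration.

    Returns None if either line lacks a recognisable duration.
    """
    site = SITE_TOOK.search(site_line)
    app = APP_TOOK.search(app_line)
    if not site or not app:
        return None
    return int(site.group(1)) - int(app.group(1))
```

A difference that is consistently large compared with the Application Server’s own processing time points at the network rather than at the server.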

The Site Server works best at providing resiliency and fail-over when the network connection is completely severed. It may not work quite as well when the connection times out intermittently because of slowness or latency.

So what can we do about this? Some customers with similar issues have reported some success with increasing the time-out period before the Site Server switches to Offline Mode.

To make this change, follow these steps to change this setting on BOTH the Application Server and Site Server.

Application Server Steps

 
  1. Log into the PaperCut Application Server admin interface
     
  2. Options > Actions > Config Editor
     
  3. Increase ‘system.site.keepalive-http-timeout-secs’ to ‘5’

Site Server Steps

 
  1. Stop the ‘PaperCut Site Server’ service
     
  2. Open a Command Prompt/Terminal ‘as administrator’
     
  3. Navigate to the - [app-path]/server/bin/<platform>/ - directory
     
  4. Execute the following db-tools command:
     
    db-tools run-sql "update tbl_config set property_value = '5' where property_name = 'system.site.keepalive-http-timeout-secs'"
     
  5. Start the PaperCut Site Server service

However, unless the underlying slow or ‘flapping’ network connection can be addressed, it’s likely that end-users will see other issues like slow client pop-ups, long times to log in to copiers, or a delay before receiving scanned files.

 

Site Server unable to resolve its own hostname:

Every 30 seconds the Site Server updates its registration with the Application Server.

Before making this registration call, the Site Server checks its hostname on the network. If this check fails, the Application Server believes that the Site Server is offline.

Site Server hostname check - log examples:

Every now and then some packets might be dropped, which shows up in the logs as failed keep-alive attempts:

2020-01-01 11:57:42,842 DEBUG ServerStateManagerImpl:360 - Master server keep-alive failed on attempt 1 of 2. Took: 2002ms: java.net.SocketTimeoutException: connect timed out [server-state-monitor]

2020-01-02 12:00:19,988 DEBUG ServerStateManagerImpl:360 - Master server keep-alive failed on attempt 1 of 2. Took: 2001ms: java.net.SocketTimeoutException: connect timed out [server-state-monitor]

If this is your situation, it potentially points to an underlying DNS resolution issue in your environment that needs to be addressed.

A quick fix could be to edit the Windows Hosts file on the server, so that it doesn’t have to query the DNS server across the network for this information. However, it may be better to investigate DNS resolution in your environment more generally. With the quick fix in place, if the IP address(es) of the PaperCut server(s) change in the future, you will need to remember to edit the Windows Hosts file again to reflect the new address.
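Before touching the Hosts file, it can help to measure how name resolution actually behaves on the Site Server. The Python sketch below times a few lookups of the machine’s own hostname through the operating system’s resolver. This is only an approximation: we don’t know exactly how PaperCut performs its own hostname check, the function names are ours, and the operating system may cache results between attempts.

```python
import socket
import time


def resolve_ms(hostname):
    """Time one name lookup in milliseconds; return None if resolution fails."""
    start = time.perf_counter()
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return None
    return (time.perf_counter() - start) * 1000.0


def check_own_hostname(attempts=5):
    """Resolve this machine's own hostname several times and collect the timings."""
    name = socket.gethostname()
    return name, [resolve_ms(name) for _ in range(attempts)]
```

Failed lookups (None) or timings of a second or more would support the DNS theory. Consistently fast results suggest the cause lies elsewhere.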

 
