NFS Read Performance
The graph below shows the response time ("latency"), in milliseconds per call,
measured from the client (10.0.0.17) to the server (10.0.0.18) over a 100BaseT network.
Note the knee at 1500 bytes. Beyond this point, the nfs read response no longer
fits in a single UDP packet. (The 1500 comes from the ethernet
Maximum Transmission Unit (MTU).)
Note the absence of bumps at 3000, 4500, etc.
The number of packets per response is shown below.
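As a rough sketch of where that knee comes from, the number of UDP packets per read
reply can be estimated from the MTU. The 20-byte IP header and the 106-byte fixed
reply overhead (borrowed from the byte-count formula further down this page) are
assumptions for illustration; the real packet counts are the measured ones in the graph.

    import math

    MTU = 1500            # Ethernet MTU: bytes of IP data per frame
    IP_HDR = 20           # IPv4 header without options (assumption)
    RPC_OVERHEAD = 106    # fixed reply overhead, borrowed from the formula below

    def read_response_packets(file_size):
        """Rough estimate of the UDP fragments needed for one NFS read reply."""
        # The reply is a single UDP datagram; IP fragments it so that each
        # Ethernet frame carries at most MTU bytes of IP data.
        datagram = file_size + RPC_OVERHEAD
        return max(1, math.ceil(datagram / (MTU - IP_HDR)))

    for size in (1000, 1500, 3000, 4500, 8000):
        print(size, "byte read ->", read_response_packets(size), "packet(s)")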
Below is a graph of the network traffic, measured on the server, in KBytes per second,
for various operations. (raw data)
The graph below is similar, except that it shows the number of bytes per operation.
It is derived by multiplying the latency by the bitrate above.
The response to the read request scales with the size of the file, and is given by
response bytes = (requested file size) + 106 + 36 * (number of response packets)
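As a sanity check, both relationships can be written down directly: bytes per operation
is the latency multiplied by the bitrate (with unit conversions), and the read-reply
size follows the formula above. The numbers in the example calls below are made up
for illustration, not taken from the measured data.

    def bytes_per_op(latency_ms, kbytes_per_sec):
        """Bytes per operation = latency * bitrate (just a unit conversion)."""
        return (latency_ms / 1000.0) * kbytes_per_sec * 1024.0

    def read_response_bytes(file_size, response_packets):
        """Size of a read reply, from the formula above."""
        return file_size + 106 + 36 * response_packets

    # Illustrative values only -- not the measured data.
    print(bytes_per_op(latency_ms=0.5, kbytes_per_sec=500.0))        # 256 bytes/op
    print(read_response_bytes(file_size=4096, response_packets=3))   # 4310 bytes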
The other traffic is constant and independent of the file size. Each request and
response fits in exactly one packet. Note that the getattr and read requests contain
the file path, so their size will vary with the length of the filename.
The packet sizes are:
- null call request = 82 bytes
- null call response = 66 bytes
- getattr request = 114 bytes
- getattr response = 138 bytes
- nfs read request = 126 bytes
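Using the sizes above, the total wire traffic for the constant-size operations is
simply request plus response:

    # Per-packet sizes from the list above, in bytes on the wire.
    REQUEST  = {"null": 82, "getattr": 114, "read": 126}
    RESPONSE = {"null": 66, "getattr": 138}     # read responses scale with file size

    for op in ("null", "getattr"):
        total = REQUEST[op] + RESPONSE[op]
        print(f"{op}: {REQUEST[op]} + {RESPONSE[op]} = {total} bytes per operation")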
Below is a graph of the CPU usage on the server, as a function of file size.
The graphs clearly show a lot of noise; why they are noisy is not clear.
The CPU usage data were collected with the 'vmstat' command, set to report
every 10 seconds. The 'nullcall' and 'getattr' data represent approximately
30 samples taken over 300 seconds. The 'read' data represent 1x to 8x more
points over a correspondingly longer time period. We conclude that either
there is something noisy about how the kernel keeps cpu usage data, or that
the kernel scheduling algorithms are inherently noisy.
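For reference, a minimal sketch of how such samples could be gathered and averaged is
shown below. It assumes the usual vmstat output layout with 'us', 'sy', and 'id'
column headers; the exact columns vary between systems, so treat it as illustrative
rather than the script actually used here.

    import subprocess

    def average_percent_busy(interval=10, count=30):
        """Sample CPU usage with vmstat and return the mean percent-busy (us + sy)."""
        out = subprocess.run(["vmstat", str(interval), str(count + 1)],
                             capture_output=True, text=True, check=True).stdout
        lines = out.strip().splitlines()
        header = lines[1].split()               # e.g. r b swpd ... us sy id wa
        us, sy = header.index("us"), header.index("sy")
        # Skip the two header lines and the first row (averages since boot).
        samples = [line.split() for line in lines[3:]]
        busy = [int(s[us]) + int(s[sy]) for s in samples]
        return sum(busy) / len(busy)

    if __name__ == "__main__":
        print("average percent busy:", average_percent_busy())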
The graph below shows the CPU usage, in microseconds per call. We get this
graph by multiplying the percent-busy data by the elapsed-time data. It shows
the actual CPU time burned to satisfy one request, inclusive of context switches,
interrupts, network processing, etc. Note that the null call and getattr call sit
nearly on top of one another: this is consistent with the earlier data, where the
getattr call takes only 2.75 microseconds more than the nullcall.
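The conversion itself is simple arithmetic: if the server is some percent busy while
each call takes a known elapsed time, the CPU time charged to a call is the product
of the two. A minimal sketch, with made-up numbers:

    def cpu_us_per_call(percent_busy, latency_ms):
        """CPU time burned per call, in microseconds.

        percent_busy : server CPU utilization while the test runs (0-100)
        latency_ms   : elapsed time per call, from the latency graph
        """
        return (percent_busy / 100.0) * latency_ms * 1000.0

    # Illustrative values only -- not the measured data.
    print(cpu_us_per_call(percent_busy=12.0, latency_ms=0.4))   # -> 48.0 us/call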
The next graph shows the number of interrupts handled per read operation.
It appears to stairstep, adding an extra sixth of an interrupt per UDP packet. Note
that getattr and nullcall each take two interrupts per operation.
In all cases (nullcall, getattr, and read), context switches remain constant
at three per operation.
Unless otherwise noted, the experimental setup is as follows:
- NFS Version 2 protocol
- client machine: 10.0.0.17 Intel Pentium-4 1700 MHz, 256 KB cache, 512MB RAM
- server machine: 10.0.0.18 Intel Pentium-4 1700 MHz, 256 KB cache, 896MB RAM
- network: Ethernet 100BaseT switched full-duplex (addtron ??)