SPECsfs2008_nfs.v3 Result

Avere Systems, Inc. : FXT 3500 (44 Node Cluster)
SPECsfs2008_nfs.v3 = 1564404 Ops/Sec (Overall Response Time = 0.99 msec)


Performance

Throughput (ops/sec)   Response (msec)
158669 0.6
317906 0.7
477508 0.6
637151 0.7
797347 1.0
964648 1.1
1150533 1.0
1304474 1.4
1423859 2.0
1564404 2.2
Performance Graph


Product and Test Information

Tested By Avere Systems, Inc.
Product Name FXT 3500 (44 Node Cluster)
Hardware Available November 2011
Software Available November 2011
Date Tested October 2011
SFS License Number 9020
Licensee Locations Pittsburgh, PA
USA

The Avere Systems FXT 3500 appliance provides tiered NAS storage that allows performance to scale independently of capacity. The FXT 3500 is built on a 64-bit architecture managed by Avere OS software. FXT clusters scale to as many as 50 nodes, supporting millions of IO operations per second and delivering tens of GB/s of bandwidth. The Avere OS software dynamically organizes data into tiers: active data is placed on the FXT appliances, while inactive data is placed on slower mass storage servers. The FXT 3500 accelerates read, write, and metadata operations and supports both the NFSv3 and CIFS protocols. The tested configuration consisted of (44) FXT 3500 cluster nodes. The system also included (4) OpenSolaris ZFS servers that acted as the mass storage systems. Avere's integrated global namespace functionality was used to present a single namespace to all clients.

Configuration Bill of Materials

Item No Qty Type Vendor Model/Name Description
1 44 Storage Appliance Avere Systems, Inc. FXT 3500 Avere Systems Tiered NAS Appliance running Avere OS V2.1 software. Includes (15) 600 GB SAS Disks.
2 4 Server Chassis Genstor CSRM-4USM24ASRA 4U 24-Drive Chassis for OpenSolaris mass storage NFS Server.
3 4 Server Motherboard Genstor MOBO-SX8DTEF Supermicro X8DTE Motherboard for OpenSolaris mass storage NFS Server.
4 8 CPU Genstor CPUI-X5620E Intel Westmere Quad Core 2.4 GHz 12MB L3 CPU for OpenSolaris mass storage NFS Server.
5 8 CPU Heat Sink Genstor CPUI-2USMP0038P SM 2U 3U P0038P Heat Sink for OpenSolaris mass storage NFS Server.
6 48 Memory Arrow MT36JSZF1G72PZ-1GD1 Micron DRAM Module DDR3 SDRAM 8 GByte 240-pin RDIMM for OpenSolaris mass storage NFS Server.
7 92 Disk Drive Hitachi HDAS-3000H6G64U 3TB SATA II 64MB 3.5 ULTRASTAR 7200 RPM Disks for OpenSolaris mass storage NFS Server.
8 4 SSD Intel SSDSA2BZ200G301 Intel SSD 710 Series (200GB, 2.5in SATA 3 Gb/s, 25nm, MLC) for OpenSolaris mass storage NFS Server.
9 12 RAID Controller Genstor RAAD-LSAS9211-8i LSI SAS 9211-8i 8-port 6 Gbps SAS HBA for OpenSolaris mass storage NFS Server.
10 24 Cabling Genstor CABL-SASICBL0281L SAS 4-Lane for Backplane to 4-Lane SATA for OpenSolaris mass storage NFS Server.
11 4 Network Card Intel E10G42BTDA Intel 10 Gigabit AF DA Dual Port Server Adapter for OpenSolaris mass storage NFS Server.

Server Software

OS Name and Version Avere OS V2.1
Other Software Mass storage server runs OpenSolaris 5.11 snv_134; this package is available for download at http://genunix.org/dist/indiana/osol-dev-134-x86.iso
Filesystem Software Avere OS V2.1

Server Tuning

Name Value Description
zfs set atime off Disable atime updates on the OpenSolaris mass storage system.
ncsize 2097152 Size the directory name lookup cache (DNLC) to 2 million entries on the OpenSolaris mass storage system.
zfs:zfs_arc_max 0x1400000000 Set the maximum size of the ZFS ARC cache to 80GB on the OpenSolaris mass storage system.
zfs:zfs_arc_meta_limit 0x13c0000000 Limit the metadata portion of the ZFS ARC cache to 79GB on the OpenSolaris mass storage system.
zfs:zfs_vdev_cache_bshift 13 Limit the device-level prefetch to 8KB on the OpenSolaris mass storage system.
zfs:zfs_mdcomp_disable 1 Disable metadata compression on the OpenSolaris mass storage system.
zfs:zfs_txg_timeout 2 Aggressively push filesystem transactions from the ZFS intent log to storage on the OpenSolaris mass storage system.
zfs:zfs_no_write_throttle 1 Disable write throttle on the OpenSolaris mass storage system.
rpcmod:cotsmaxdupreqs 6144 Increase the size of the duplicate request cache that detects RPC-level retransmissions on connection-oriented transports on the OpenSolaris mass storage system.
ddi_msix_alloc_limit 4 Limit the number of MSI-X vectors per driver instance on the OpenSolaris mass storage system.
pcplusmp:apic_intr_policy 1 Distribute interrupts over CPUs in a round-robin fashion on the OpenSolaris mass storage system.
ixgbe.conf:mr_enable 0 Disable multiple send and receive queues for Intel 10GbE adapter on the OpenSolaris mass storage system.
ixgbe.conf:tx_ring_size 4096 Increase the packet transmit ring to 4096 entries on the Intel 10GbE adapter on the OpenSolaris mass storage system.
ixgbe.conf:rx_ring_size 4096 Increase the packet receive ring to 4096 entries on the Intel 10GbE adapter on the OpenSolaris mass storage system.
ixgbe.conf:intr_throttling 0 Disable interrupt throttling on the Intel 10GbE adapter on the OpenSolaris mass storage system.
ixgbe.conf:tx_copy_threshold 1024 Increase the transmit copy threshold from the default of 512 bytes to 1024 on the OpenSolaris mass storage system.
NFSD_SERVERS 1024 Run 1024 NFS server threads on the OpenSolaris mass storage system.
Writeback Time 12 hours Files may be modified up to 12 hours before being written back to the mass storage system.
cfs.pagerFillRandomWindowSize 32 Increase the size of random IOs from disk.
cfs.quotaCacheMoveMax 10 Limit space balancing between cache policies.
cfs.resetOnIdle true Equally share cache capacity when system is not recycling any blocks.
vcm.readdir_readahead_mask 0x7ff0 Optimize readdir performance.
buf.autoTune 0 Statically size FXT memory caches.
buf.neededCleaners 16 Allow 16 buffer cleaner tasks to be active.
tokenmgrs.geoXYZ.fcrTokenSupported no Disable the use of full control read tokens.
tokenmgrs.geoXYZ.tkPageCleaningMaxCntThreshold 256 Optimize cluster data manager to write more pages in parallel.
tokenmgrs.geoXYZ.tkPageCleaningStrictOrder no Optimize cluster data manager to write pages in parallel.
cluster.VDiskCommonMaxRtime 600000000 Retry internal RPC calls for up to 600 seconds. Value is in microseconds.
cluster.VDiskCommonMaxRcnt 65535 Retry internal RPC calls for up to 65535 attempts.
cluster.VDiskInternodeMaxRtime 600000000 Retry inter-FXT node RPC calls for up to 600 seconds. Value is in microseconds.
cluster.VDiskInternodeMaxRcnt 65535 Retry inter-FXT node RPC calls for up to 65535 attempts.

Server Tuning Notes

A 21+1 RAID5 (RAIDZ) array was created on each OpenSolaris mass storage server using the following Solaris command: zpool create vol0 raidz ...list of 21 disk WWNs... log ...WWN of SSD device...
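
For illustration only, the tunings listed above would typically be applied on an OpenSolaris server roughly as sketched below. The file locations follow standard Solaris conventions, and the device names in the zpool command are hypothetical placeholders rather than the WWNs used in the tested configuration.

    # /etc/system entries for the kernel and ZFS tunables in the table above
    set ncsize=2097152
    set zfs:zfs_arc_max=0x1400000000
    set zfs:zfs_arc_meta_limit=0x13c0000000
    set zfs:zfs_vdev_cache_bshift=13
    set zfs:zfs_mdcomp_disable=1
    set zfs:zfs_txg_timeout=2
    set zfs:zfs_no_write_throttle=1
    set rpcmod:cotsmaxdupreqs=6144
    set ddi_msix_alloc_limit=4
    set pcplusmp:apic_intr_policy=1

    # /kernel/drv/ixgbe.conf properties for the Intel 10GbE adapter
    mr_enable=0;
    tx_ring_size=4096;
    rx_ring_size=4096;
    intr_throttling=0;
    tx_copy_threshold=1024;

    # Pool creation and per-pool settings; disk and SSD names are placeholders
    zpool create vol0 raidz <disk-1> <disk-2> ... <disk-21> log <SSD-device>
    zfs set atime=off vol0

    # /etc/default/nfs: run 1024 NFS server threads
    NFSD_SERVERS=1024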

Disks and Filesystems

Description Number of Disks Usable Size
Each FXT 3500 node contains (15) 600 GB 10K RPM SAS disks. All FXT data resides on these disks. 660 342.2 TB
Each FXT 3500 node contains (1) 250 GB SATA disk. System disk. 44 10.0 TB
The mass storage systems contain (22) 3 TB SATA disks. The ZFS file system is used to manage these disks and the FXT nodes access them via NFSv3. 88 223.8 TB
The mass storage systems contain (1) 3 TB system disk. 4 10.9 TB
The mass storage systems contain (1) 200 GB SSD disk. ZFS Intent Log (ZIL) Device. 4 186.0 GB
Total 800 587.0 TB
Number of Filesystems Single Namespace
Total Exported Capacity 229162 GB (OpenSolaris mass storage system capacity)
Filesystem Type TFS (Tiered File System)
Filesystem Creation Options Default on FXT nodes. OpenSolaris mass storage server ZFS filesystem created with 'zpool create vol0 raidz ...list of 21 disk WWNs... log ...WWN of SSD device...'
Filesystem Config 21+1 RAID5 configuration on OpenSolaris mass storage server.
Fileset Size 185647.1 GB

Network Configuration

Item No Network Type Number of Ports Used Notes
1 10 Gigabit Ethernet 44 One 10 Gigabit Ethernet port used for each FXT 3500 appliance.
2 10 Gigabit Ethernet 4 The mass storage systems are connected via 10 Gigabit Ethernet.

Network Configuration Notes

Each FXT 3500 was attached via a single 10 GbE port to one of two Arista Networks 7050-64 10 GbE switches (64 ports each). The FXT nodes were evenly split across the two switches, as were the load-generating clients. The mass storage servers were attached to the same two switches, two servers per switch. Finally, the two switches were interconnected using a QSFP+ port along with 4x10GbE ports, for a total of 80 Gbps of bidirectional inter-switch bandwidth. A 1500 byte MTU was used throughout the network.

Benchmark Network

An MTU size of 1500 was set for all connections to the switches. Each load generator was connected to the network via a single 10 GbE port. The SUT was configured with 88 separate IP addresses on one subnet. Each cluster node was connected via a 10 GbE NIC and hosted 2 IP addresses.

Processing Elements

Item No Qty Type Description Processing Function
1 88 CPU Intel Xeon E5620 2.40 GHz Quad-Core Processor FXT 3500 Avere OS, Network, NFS/CIFS, Filesystem, Device Drivers
2 8 CPU Intel Xeon E5620 2.40 GHz Quad-Core Processor OpenSolaris mass storage systems

Processing Element Notes

Each FXT 3500 node and each OpenSolaris mass storage server has two physical processors.

Memory

Description Size in GB Number of Instances Total GB Nonvolatile
FXT System Memory 144 44 6336 V
Mass storage system memory 96 4 384 V
FXT NVRAM 2 44 88 NV
Grand Total Memory Gigabytes 6808

Memory Notes

Each FXT node has main memory that is used for the operating system and for caching filesystem data. A separate, battery-backed NVRAM module is used to provide stable storage for writes that have not yet been written to disk.

Stable Storage

The Avere filesystem logs writes and metadata updates to the NVRAM module. NFS operations that modify the filesystem are not acknowledged until the data has been safely stored in NVRAM. The battery backing the NVRAM ensures that any uncommitted transactions persist for at least 72 hours. Each OpenSolaris mass storage system contains a ZFS intent log (ZIL) device, an Intel 710 Series SSD, which preserves all write-cached data in the event of a power loss. This SSD is designed with sufficient capacitance to flush any committed write data to its non-volatile flash media in the event of a power failure or unclean shutdown.
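
For reference, a dedicated intent-log device of this kind is attached either at pool creation time (as shown in the Server Tuning Notes above) or afterwards. A minimal sketch with a placeholder device name:

    # Add an SSD as a separate ZIL device to an existing pool (device name is a placeholder)
    zpool add vol0 log <SSD-device>
    # The log device is then listed under the "logs" section of the pool status
    zpool status vol0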

System Under Test Configuration Notes

The system under test consisted of (44) Avere FXT 3500 nodes. Each node was attached to the network via 10 Gigabit Ethernet. Each FXT 3500 node contains (15) 600 GB SAS disks. The OpenSolaris mass storage systems were each attached to the network via a single 10 Gigabit Ethernet link. The mass storage servers were 4U Supermicro servers each configured with a software RAIDZ 21+1 RAID5 array consisting of (22) 3TB SATA disks. Additionally, a 200GB SSD device was used for the ZFS intent log (ZIL) in each mass storage server.

Other System Notes

N/A

Test Environment Bill of Materials

Item No Qty Vendor Model/Name Description
1 20 Supermicro SYS-1026T-6RFT+ Supermicro Server with 48GB of RAM running CentOS 5.6 (Linux 2.6.18-238.19.1.el5)
2 2 Arista Networks 7050-64 Arista Networks 64 Port 10 GbE Switch. 48 SFP/SFP+ ports, 4 QSFP+ ports

Load Generators

LG Type Name LG1
BOM Item # 1
Processor Name Intel Xeon E5645 2.40 GHz Six-Core Processor
Processor Speed 2.40 GHz
Number of Processors (chips) 2
Number of Cores/Chip 6
Memory Size 48 GB
Operating System CentOS 5.6 (Linux 2.6.18-238.19.1.el5)
Network Type Intel Corporation 82599EB 10-Gigabit SFI/SFP+

Load Generator (LG) Configuration

Benchmark Parameters

Network Attached Storage Type NFS V3
Number of Load Generators 20
Number of Processes per LG 176
Biod Max Read Setting 2
Biod Max Write Setting 2
Block Size 0
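
These parameters correspond to entries in the benchmark's sfs_rc run-control file. A minimal sketch follows; the parameter names reflect common sfs_rc usage, and the bracketed hostnames, mount lists, and load values are placeholders rather than values from the tested run.

    # sfs_rc excerpt (illustrative); ten load points were measured
    LOAD=<initial requested ops/sec>
    INCR_LOAD=<ops/sec increment per point>
    NUM_RUNS=10
    PROCS=176                          # processes per load generator
    CLIENTS="<lg1> <lg2> ... <lg20>"   # 20 load-generating clients
    MNT_POINTS="<node>:/sfs0 <node>:/sfs1 <node>:/sfs2 <node>:/sfs3 ..."
    BIOD_MAX_WRITES=2
    BIOD_MAX_READS=2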

Testbed Configuration

LG No LG Type Network Target Filesystems Notes
1..20 LG1 1 /sfs0,/sfs1,/sfs2,/sfs3 LG1 nodes are evenly split across the two switches

Load Generator Configuration Notes

All clients mounted all filesystems from all FXT nodes.

Uniform Access Rule Compliance

Each load-generating client hosted 176 processes. The assignment of processes to network interfaces was done such that they were evenly divided across all network paths to the FXT appliances. The filesystem data was evenly distributed across all disks, FXT appliances, and mass storage servers.
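
As an illustration of how such an even assignment can be expressed on a CentOS client, a hedged sketch of the NFS mounts follows; the IP addresses and mount-point paths are hypothetical, and the actual mount tables were generated so that each client's 176 processes were spread evenly over the 88 cluster IP addresses and the four target filesystems.

    # Example client-side mounts (addresses and paths are placeholders)
    mount -t nfs -o vers=3,proto=tcp <cluster-ip-1>:/sfs0 /mnt/ip001_sfs0
    mount -t nfs -o vers=3,proto=tcp <cluster-ip-2>:/sfs1 /mnt/ip002_sfs1
    ...
    mount -t nfs -o vers=3,proto=tcp <cluster-ip-88>:/sfs3 /mnt/ip088_sfs3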

Other Notes

N/A

Config Diagrams



First published at SPEC.org on 15-Nov-2011