Let DataGravity shed some light on your data

By now you’ve probably heard of Paula Long’s new startup, DataGravity. You may know her from that other storage company she co-founded, EqualLogic.

One of EqualLogic’s goals was to put a storage administrator in every box: you shouldn’t have to pay extra for management tools. DataGravity is taking it further by saying you bought the storage, so you should know what’s in it, all for the price of primary storage.

DataGravity array

What’s it all about?

  • Hybrid array (HDD+SSD) (like Nimble, Tegile, Tintri)
  • VM-aware (VMware only)
  • Data-aware (like no other 😉 )
  • Multi protocol: CIFS/SMB, NFS, iSCSI
  • Active/proactive architecture
  • Unique snapshot/data protection features

The array comes in two fixed configurations: the DG2200 (48TB raw capacity, 2.4TB flash) and the DG2400 (96TB raw capacity, 4.8TB flash). The array consists of a 2U compute node and a 4U storage node. The compute node contains the SSDs as well as two physically separated servers (controllers), each with its own CPU, RAM and NVRAM; the servers are connected by an 8-lane PCIe bus. Disks are connected through SAS interfaces, and all hardware is x86 commodity hardware.

DataGravity uses a so-called active/proactive controller architecture, designating a primary node and an intelligence node. The underlying storage is carved into two data pools, one for production storage (primary node) and one for intelligence/protection data (intelligence node), plus a pool of free space. The data pools consist of separate fault domains (RAID groups). DataGravity developed their own RAID implementation, which allows them to dynamically resize pools without degrading the RAID set, which is pretty neat. Each pool can tolerate two disk failures (dual parity).
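
To illustrate the parity idea behind that fault tolerance (this is a generic sketch, not DataGravity's actual RAID, whose internals aren't public), here's how a single lost block is rebuilt from simple XOR parity; dual parity adds a second, independently computed syndrome so any two simultaneous disk failures can be survived:

```python
from functools import reduce

def xor_parity(blocks):
    """P parity: bytewise XOR across all blocks in the stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(survivors, parity):
    """Rebuild a single lost block by XOR-ing the survivors with parity."""
    return xor_parity(survivors + [parity])

stripe = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]   # toy 3-disk stripe
p = xor_parity(stripe)                              # parity block
rebuilt = reconstruct([stripe[0], stripe[2]], p)    # recover stripe[1]
```

The second parity block in a real dual-parity scheme is not another XOR; it needs Galois-field arithmetic (as in RAID-6) so the two syndromes stay independent.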

The primary node handles I/O requests and performs inline analytics, generating metadata on who, what, where and when data was accessed. The data and metadata are copied over to the intelligence node, which analyzes the metadata and can index over 400 different file and data types. The intelligence node performs inline compression and targeted deduplication (from what I’ve read there’s no compression or deduplication on the primary node; update: inline compression and deduplication are also performed on the primary node) and uses the data to create point-in-time Discovery Points (snapshots).
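
To picture the kind of metadata involved, here's a minimal sketch of who/what/where/when access records and a trivial aggregation over them (field names and values are my own invention, not DataGravity's schema):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class AccessEvent:
    user: str       # who
    action: str     # what: read, write, delete, ...
    path: str       # where
    timestamp: str  # when (ISO 8601)

events = [
    AccessEvent("alice", "write", "/finance/q3.docx", "2014-08-20T09:15:00Z"),
    AccessEvent("bob",   "read",  "/finance/q3.docx", "2014-08-20T10:02:00Z"),
    AccessEvent("alice", "read",  "/hr/policy.pdf",   "2014-08-20T11:30:00Z"),
]

# A "most active users" report falls straight out of this metadata.
activity = Counter(e.user for e in events)
```

Once events like these are streamed to an otherwise idle node, reports such as dormant data or file-type breakdowns are just different aggregations over the same records.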

The analytics and data protection features have zero production impact, since these tasks are performed by what would normally be the idle storage controller. Data needs to be mirrored to the secondary controller for HA purposes anyway, so why not put that controller to use?

Writes are cached in NVRAM to enable full-stripe writes to the storage pool for optimal performance.
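
A full-stripe write lets the controller compute parity from the new blocks alone, avoiding the read-modify-write penalty of partial-stripe updates. A toy sketch of the buffering idea (the stripe width here is made up):

```python
STRIPE_BLOCKS = 4  # hypothetical stripe width (data blocks per stripe)

class NVRAMWriteCache:
    """Buffer incoming writes so they hit disk as full stripes."""

    def __init__(self):
        self.buffer = []           # blocks sitting in NVRAM
        self.flushed_stripes = []  # stripes written out to the pool

    def write(self, block):
        # The write is acknowledged as soon as it lands in NVRAM.
        self.buffer.append(block)
        if len(self.buffer) == STRIPE_BLOCKS:
            # Full stripe: parity can be computed from the new blocks
            # alone, so no read-modify-write against the disks is needed.
            self.flushed_stripes.append(tuple(self.buffer))
            self.buffer.clear()

cache = NVRAMWriteCache()
for i in range(8):
    cache.write(f"block-{i}")
# 8 writes -> 2 full stripes flushed, NVRAM buffer empty again
```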

DataGravity flow

Discovery Points

Most snapshotting techniques rely on the availability of the production data: without the original data, your snapshot data is useless. I can name a couple of vendors who will let you do snapshots/mirrors/clones to a dedicated tray/RAID group inside the array to obtain the same level of physical separation as DataGravity’s Discovery Points.

The performance impact, however, remains: whether they use copy-on-write or redirect-on-write, either the controller or the production storage (or both) will take some sort of performance hit. Discovery Points do not impact production performance, since they are handled by their own controller and disks.

Discovery Points provide near-instant restores and let you restore anything from a LUN to a VM to a file inside a VM. Despite all these awesome features, this doesn’t mean you can go ahead and uninstall your old backup software: you’ll probably still need some sort of off-site backup. Currently there’s no replication functionality, but I assume it’s on their roadmap.

As you might have guessed, VMware snapshots are used to create a data-consistent backup. If you know something about VMware snapshots and VSS, then you know a data-consistent backup doesn’t always equal an application-aware backup. Some sort of integration with, let’s say, Veeam would be pretty neat.

Data insight
DataGravity will give you all sorts of insight into your data, for instance: most active users, dormant data, and file types stored. Just to be clear: yes, it will create a full-text searchable index of that .docx file you stored in your Windows VM. There are even rules that identify sensitive data like social security and credit card numbers. Currently only NTFS volumes are supported.
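
Rules like these typically boil down to pattern matching plus validation. A hedged sketch of the idea (these regexes are illustrative only; production classifiers add checksums such as Luhn and contextual cues to cut false positives):

```python
import re

# Illustrative patterns, not DataGravity's actual rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def find_sensitive(text):
    """Return every pattern that matched, with the matching substrings."""
    return {name: rx.findall(text)
            for name, rx in PATTERNS.items() if rx.search(text)}

hits = find_sensitive("SSN 123-45-6789, card 4111 1111 1111 1111")
```

Running patterns like these against a full-text index, rather than against the live file system, is what lets the scan stay off the production I/O path.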

In case of a controller failure, some analytics and indexing tasks will be deferred. Handling I/O and data protection always takes priority.

DataGravity will not render your current eDiscovery appliance obsolete; such appliances are probably way more advanced (and costly). Then again, that’s not the market they are looking to target anyway.

I for one am really excited about DataGravity’s unique architecture and features, which really set it apart from other hybrid arrays. I’m guessing I’m not the only one, since they already won TechTarget’s VMworld Best of Show and New Technology awards.

For more information, check out their technical whitepaper or watch this introduction and demo video, recorded at Tech Field Day. Also, head over to Willem ter Hansel’s website to read his interview with Paula Long.



Rutger Kosters

VCDX #209 | Virtualization Consultant | Blogger | Tech Junkie

