Big Data Workloads in the Cloud
Make your big data workloads fly
More than any other computing workload, big data and its associated workloads need the right infrastructure platform. Bottlenecks can cripple otherwise well-thought-out systems, and at scale, being able to provision correctly configured resources has a huge impact on overall system performance. That's why a fast-growing number of customers love CloudSigma for big data workloads.
High Performance Infrastructure
Big data and similar high performance computing workloads are only as fast as their slowest components. It's those bottlenecks that turn an efficient, well-functioning deployment into something far from optimal. A cloud platform that can sustain a very wide range of performance requirements, including very high performance across multiple aspects of its infrastructure, is therefore critical.
What to Ask
- What VM-to-VM internal networking speeds does your cloud support?
Things to Look For
- Networking speeds, especially around head nodes, can be a huge bottleneck. Internal networking that sustains multi-gigabit speeds raises both the scalability and the overall performance levels you can achieve.
Our Cloud Solution
- Our cloud is built with dual 10GigE networking to every compute node. You can easily stream at 4-5Gbps and higher from a single standard CloudSigma cloud server. We don't have dedicated HPC instances because servers of all sizes can reach HPC performance levels.
- What storage performance profiles are available?
- Big data workloads, as the name suggests, combine large-scale data sets with intense computing. As a result, big data needs cost-effective petabyte-scale storage options alongside very fast active-workload storage. Look for a cloud that lets you combine these differing storage needs fluidly and easily.
- We offer both all-SSD and scale-out magnetic options. Customers can create and size their drives, then mount and combine them across their cloud servers as needed. Our all-SSD option delivers up to 1,000 IOPS per write thread for only $0.13 per GB/30 days, with a maximum drive size of 5TB. Our scale-out magnetic option costs only $0.08 per GB/30 days, with a maximum drive size of 100TB. It's super simple to manage high performance requirements alongside large-scale data storage in our cloud.
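As a quick sanity check on those per-GB prices, the arithmetic for a mixed deployment is straightforward (the workload sizes below are illustrative, not a recommendation):

```python
# Storage cost sketch using the per-GB prices quoted above.
SSD_PER_GB = 0.13       # USD per GB per 30 days, all-SSD tier
MAGNETIC_PER_GB = 0.08  # USD per GB per 30 days, scale-out magnetic tier

def monthly_storage_cost(ssd_gb, magnetic_gb):
    """Combined 30-day cost of an SSD working set plus magnetic bulk storage."""
    return ssd_gb * SSD_PER_GB + magnetic_gb * MAGNETIC_PER_GB

# Example: 2 TB of hot SSD data alongside 50 TB of scale-out archive.
print(f"${monthly_storage_cost(2_000, 50_000):,.2f} per 30 days")
```

For that example the SSD working set contributes $260 and the magnetic archive $4,000, so the combined bill is $4,260 per 30-day period.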
- Do you expose CPU instruction sets to your cloud servers?
- Modern CPU performance relies heavily on the CPU's instruction sets to optimize many operations. Exposing these instruction sets to cloud servers isn't always possible. Look for a cloud able to expose instruction sets so you can benefit from 2015 CPUs instead of 1985 ones!
- We allow customers to choose between a standard emulated CPU (for compatibility) and native CPU model exposure. The latter passes the full CPU instruction set through to the cloud server for optimal performance.
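One practical way to verify instruction-set exposure from inside a Linux guest is to check for the relevant flags in /proc/cpuinfo. A minimal sketch (the sample flags line below is illustrative):

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if a CPU flag (e.g. 'avx2') appears in /proc/cpuinfo output."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The flags line is "flags : <space-separated flag names>".
            return flag in line.split(":", 1)[1].split()
    return False

# On a real guest you would read the live file:
#   cpuinfo = open("/proc/cpuinfo").read()
sample = "flags\t\t: fpu vme sse sse2 avx avx2 aes"
print(has_cpu_flag(sample, "avx2"))
```

An emulated CPU model typically hides newer flags such as avx2, while native CPU exposure shows the host's full set.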
“By having our choice of good performance, cost effective storage as well as high-performance SSD storage, all of which is elastic and on-demand, we’re able to quickly and cost effectively deploy our platform, whose goal is to use Earth Observation data to protect both life and property from earthquake and volcanic hazards.”
Julio Carreira, Capacity Manager, European Space Agency
Scaling Up & Automation
As you are likely to be doing batch computing, your resource requirements will vary widely over time. Likewise, operating at scale demands automation for easy management. It's essential to choose a cloud that responds quickly to infrastructure needs, allows complete automation, and backs all this up with a sensible billing system that tracks your usage over time.
What to Ask
- How long does it take to provision key resources such as servers, drives and networking resources?
Things to Look For
- Spinning up large scale infrastructure on demand is only feasible within an environment that can support quick provisioning. Look for short resource fulfillment lead times.
Our Cloud Solution
- All resources are delivered immediately after ordering. Creating networks and drives takes 2-5 seconds; a new server takes less than 30 seconds to reach a usable login prompt.
- Does your API support bulk operations & full feature coverage?
- It’s important to know what you can and can’t automate on your cloud platform, so API coverage matters. Likewise, when scaling out hundreds or thousands of servers, support for bulk operations is a major bonus.
- We’re proud of our battle-tested cloud API. It not only offers 100% feature coverage for complete automation, but also a number of bulk operations for quick scale-up. Need 200 new worker nodes? Just give us the golden server image and we’ll clone and start all two hundred with a single API call. We’ll even keep you posted as your job progresses. Check out our driver and library API integrations.
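In script form, that kind of bulk scale-up usually reduces to generating one clone-and-start request per worker from the golden image. The field names and naming scheme below are illustrative placeholders, not CloudSigma's actual API schema:

```python
def build_clone_requests(golden_image_uuid, count, name_prefix="worker"):
    """Build one clone-and-start request per worker node (illustrative schema)."""
    return [
        {
            "source_drive": golden_image_uuid,   # the golden server image to clone
            "name": f"{name_prefix}-{i:03d}",    # worker-000, worker-001, ...
            "start_after_clone": True,           # boot each node once cloned
        }
        for i in range(count)
    ]

payloads = build_clone_requests("golden-image-uuid", 200)
print(len(payloads), payloads[0]["name"], payloads[-1]["name"])
```

With a bulk endpoint, the whole list would go out in a single call rather than 200 separate requests.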
- How does your billing deal with variable infrastructure needs from batch processing over time?
- Big data and HPC customers tend to vary their resource consumption widely in response to batch processing activities. As a result, it’s best to look for a cloud that lets you purchase what you need, when you need it.
- We’ve built our cloud with a utility approach to billing: we simply look at your aggregate consumption of each resource at any one time. The best part is that you can buy resources on subscription or on burst. You can subscribe for your core resource usage and then scale up on burst whenever you like. No more reserved instances versus on-demand. It’s a lot simpler.
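The subscription-plus-burst model is easy to reason about: usage up to your subscribed level is billed at the subscription rate, and anything above it at the burst rate. A sketch with made-up rates (the numbers here are hypothetical, purely for illustration):

```python
def billed_cost(usage, subscribed, sub_rate, burst_rate):
    """Split usage into a subscribed portion and a burst portion, price each.

    Rates and units are hypothetical; the split logic is the point.
    """
    sub_units = min(usage, subscribed)        # covered by the subscription
    burst_units = max(usage - subscribed, 0)  # overflow billed at burst rate
    return sub_units * sub_rate + burst_units * burst_rate

# Subscribe for a 10-core baseline, burst to 25 cores during a batch run.
print(billed_cost(25, 10, sub_rate=1.0, burst_rate=1.5))  # → 32.5
```

When a batch job ends and usage drops back under the subscribed level, only the subscription portion is billed.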
“CloudSigma’s level of flexibility extends well past just resource purchasing efficiency. When we first made the decision to migrate from Rackspace to CloudSigma, we assumed there would be a learning curve, considering CloudSigma’s approach is different from most providers’. However, CloudSigma made it quite easy for us, and having the ability to deploy our preferred operating system and software within its cloud was a big help too.”
Amit Chaudhary, Co-Founder, Gresp
Platform Flexibility
Our big data and HPC users have a clear understanding of their computing requirements. Having a platform that allows customers to express their requirements accurately is essential to an optimized and successful deployment, particularly when working at scale.
What to Ask
- What choice of operating systems and applications do I have when using your cloud?
Things to Look For
- Some clouds restrict what can be run at the software layer within them. This cedes strategic control of technology decisions to the cloud vendor and may cause issues in the future if not today.
Our Cloud Solution
- We offer an open platform. While we provide a number of pre-installed systems for convenience, customers can install or upload any x86/x64 based operating system without the need for modifications. This includes BSD, Linux and Windows based systems. Whatever you want to run, our cloud can support it.
- Can I tweak CPU and hypervisor settings to optimize them for my workload?
- It’s possible to adjust a great many settings in a virtualized environment to better fit a specific workload. This can result in up to 50% performance gains over standard settings. The more customization the better.
- We expose everything from NUMA topology (great for big VMs) to virtual core size (create lots of threads for parallel processing) to hypervisor timer settings (this can make a huge difference for Windows environments). With CloudSigma you are in the driving seat and can achieve the best price/performance on the market as you scale out.
- How much flexibility of server sizing is available?
- Big data and HPC type workloads often require cloud server sizing outside the normal range: a RAM-heavy server, a server with a lot of storage, or simply a very large node.
- We offer completely unbundled resources with no fixed server sizes. You can size servers with exactly the CPU and RAM you need and combine that with whatever storage you need. We call it ‘perfect provisioning’. We also offer a wide sizing range for each resource, so you’re sure to have enough.
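With unbundled resources, a server definition is just the exact quantities you want rather than a fixed instance type. A sketch of building such a spec (the field names and units are illustrative, not the platform's actual schema):

```python
def make_server_spec(cpu_mhz, ram_mb, drive_uuids):
    """Build a server spec with exact CPU and RAM amounts (illustrative fields)."""
    if cpu_mhz <= 0 or ram_mb <= 0:
        raise ValueError("CPU and RAM must be positive")
    return {
        "cpu": cpu_mhz,               # total CPU in MHz, not a fixed instance size
        "mem": ram_mb,                # RAM in MB, sized independently of CPU
        "drives": list(drive_uuids),  # any mix of SSD and magnetic drives
    }

# A RAM-heavy node: modest CPU, 128 GB of memory, two attached drives.
spec = make_server_spec(8_000, 131_072, ["ssd-drive-uuid", "magnetic-drive-uuid"])
print(spec["cpu"], spec["mem"], len(spec["drives"]))
```

The point of the sketch is that CPU, RAM, and storage are three independent dials rather than a menu of bundled sizes.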
“We turned to CloudSigma because we needed a cost-efficient and flexible high performance cloud infrastructure to host our demanding semantic search solutions. The ability to quickly tailor the infrastructure to perfectly fit many different requirements is ideal for any big data venture.”
Mario Juric, CTO, Unsilo
High Availability & Load Balancing
Properly separating key components of a deployment to avoid single points of failure is important for service availability, as is spreading load across infrastructure to avoid hot spots that can become bottlenecks. A cloud that gives users the tools for both high availability and load balancing should be an important consideration for anyone choosing a cloud service vendor.
What to Ask
- What functionality do you offer to ensure separation of key customer infrastructure components?
Things to Look For
- Outsourcing infrastructure to the cloud has many benefits but there’s a danger that loss of visibility can introduce single points of failure. Features that enable separation of infrastructure can be very useful for building truly resilient cloud infrastructure deployments.
Our Cloud Solution
- We offer the ability to explicitly place infrastructure on separate physical systems. When creating or cloning a drive or server, you can specify which other infrastructure to avoid. We also explicitly show infrastructure residing on shared systems. This empowers our customers to build more resilient services.
- How redundant are the systems used to deliver your cloud service?
- The quality and redundancy of systems used by various cloud service providers can vary significantly. When assessing one provider versus another this should be part of your value for money calculation.
- We only choose data centers that are Tier III equivalent or higher to house our cloud locations. Additionally, we blend multiple Tier 1 connectivity providers to ensure network availability, and within our cloud we have redundant switching throughout. As a result, we offer a service level agreement with a 100% availability level plus a sub-1ms network latency guarantee.
- Do you offer load balancing and auto-scaling?
- Load balancing offered as a service with validated products can be a great time saver. Adding auto-scaling makes it easy to grow clusters in real time based on your requirements.
- We offer a multi-tiered load balancing service giving customers an appropriate choice at the right price. These options include advanced load balancing features and full auto-scaling abilities. The basic tier starts at just $50 per month.