Installing R on Ubuntu 21.04: A Tutorial

R is a programming language that specializes in working with data. R is free software that supports an extensive catalog of statistical and graphical methods. The list includes various machine learning algorithms, time series, linear regression, and more. It’s used by industry giants like Google, Facebook, Airbnb, Uber, etc. As the description suggests, R is the go-to option when big …

Setting up PostgreSQL on an Ubuntu 18.04 Server

As time and technology continue to progress, the internet holds a central position in the modern world. That is why most companies and businesses have websites and applications to represent the online aspect of their brands. Regardless of whether you are the owner of a small or large website, you need the help of certain tools to make your job …

How to Install Anaconda on Ubuntu 18.04 in Six Simple Steps

Anaconda is an open-source package manager and framework for handling machine learning and data science workflows. It also helps distribute programming languages such as Python and R. With over 7,500 scientific data packages, Anaconda supports large-scale data processing, scientific computing, and predictive analysis. It is available in free and paid versions. In this tutorial, we …

You have a web crawling service? See how CloudSigma can match your scalability and cost efficiency requirements

Grepsr is a leading cloud-powered web scraping service. It provides custom solutions to extract public data for non-technical individuals and teams that need data extracted, aggregated, and organized so they can focus on more important things. Its managed platform helps companies with everything they need to capture, normalize, and effortlessly bring data into their systems. The data …

Installing Hadoop on a Single Node in Five Simple Steps

Welcome to our guide on installing Hadoop in five simple steps. To start with, the Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather …

Realtime Twitter Data Ingestion using Flume

With more than 330 million active users, Twitter is one of the top platforms where people like to share their thoughts. More importantly, Twitter data can be used for a variety of purposes, such as research, consumer insights, demographic insights, and many more. In addition, Twitter data insights are especially useful for businesses as they allow for the analysis of …

How to measure the traffic on your VM with the CloudSigma API and RRDtool

Learn how to measure the traffic of your virtual machines in CloudSigma. The simple instructions below show how to obtain, store, and graph the traffic from our different network interfaces. To obtain the statistics for each interface, we can use the CloudSigma API and a simple script that stores the data for each network interface. The database we …

CloudSigma Sponsors “An Intro to Data Science with R” Event to Encourage R Language Adoption

CloudSigma was excited to sponsor the Intro to Data Science with R event, organized by R User Group Sylhet! The event took place on May 1, 2018, at the ACM office in Sylhet, Bangladesh. It brought together more than 20 R language enthusiasts, software developers, and data scientists. They gathered to learn, exchange ideas and promote the …

Setting up a Big Data Cluster on Cloudera Tutorial

CDH is Cloudera’s 100% open source platform distribution, including Apache Hadoop and built specifically to meet enterprise demands. CDH delivers everything you need for enterprise use right out of the box. By integrating Hadoop with more than a dozen other critical open source projects, Cloudera has created a functionally advanced system that helps you perform end-to-end Big Data workflows. (Source). …

Setting up a Big Data Cluster within Minutes in 3 Easy Steps

HDP is a truly secure, enterprise-ready open source Apache™ Hadoop® distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust big data analytics that accelerate decision making and innovation. (Source: https://hortonworks.com/products/data-platforms/hdp/). I am going to install HDP to create a big cluster of 5 nodes deployed on …

Improving Data-intensive Applications by Moving Data and Computation into Mixed Cloud/Fog Environments

The DITAS project is a Research and Innovation Action funded by the European Commission as part of the Horizon2020 Programme. The project started in January 2017 and will conclude in December 2019. The consortium includes 5 industry partners (including CloudSigma) and 3 research organisations from 6 European countries. The coordinator of the DITAS project is Dr. David García Pérez, from …