Oak.... Green.... Java.... BigData

A Developer's junkyard....

Monday, November 16, 2015

A few memory issues in Hadoop!

In a Hadoop setup (or rather, any Big Data setup), memory issues are not unexpected! An update on a couple of issues we have seen of late – ...
Wednesday, June 17, 2015

EMR cluster and selection of EC2 instance type - Cost Optimization!

AWS Elastic MapReduce (EMR) is Amazon’s service providing Hadoop in the cloud. EMR inherently uses EC2 nodes as the Hadoop nodes...
Tuesday, May 26, 2015

Cost optimization through performance improvement of S3DistCp

We reduced the cost of running our production cluster by about 60% by reducing the total size of the production cluster from 15 nodes to 6 nodes th...
Wednesday, April 29, 2015

Impact of NULL values on where-clause/group-by-clause in Hive queries

Following is the check to verify that NULL values do not impact the GROUP BY clause but DO impact the WHERE clause. Query: select count(*) fro...
Wednesday, March 25, 2015

Hive - S3NativeFileSystem - Insert Overwrite Bug

We store all our data in S3. We create external tables pointing to the data in S3 and run Hive queries on these tables. In one of ...
Thursday, March 12, 2015

Tuning Yarn container for Oozie

Oozie is a popular workflow management tool for BigData applications. To give a high-level idea, the following is the container allocat...
Thursday, February 26, 2015

Hive - dynamic partitions performance issue with S3/EMR

Problem: We use Hive in Amazon EMR to parse our logs. Any insert query that inserts data into a table where the number of partitions is high ...
Sunday, November 16, 2014

Hive Performance Improvement - statistics and file format checks

Following are a couple of important configuration changes that can improve the performance of Hive quer...
Thursday, October 2, 2014

Hive Query to get 95th Percentiled ranked item

I was working on a query where I had to convert a complex MySQL query that provided the 95th percentile value from a group_concat result...
Friday, September 26, 2014

Collections in CQL3 - How they are stored

If you don’t already know about collections in Cassandra CQL, the following page provides excellent details – http://www.d...

About Me

Sarang Anajwala