So my company is moving from a Postgres/Greenplum infrastructure to Hadoop over the next 6-8 months. I'm very excited about this, but it's a learning process for everyone in the company. I saw there are no threads on the topic. Does anyone have any exposure to HDFS, MapReduce, etc.? We've chosen Hortonworks as our vendor, and I've been going through some of the tutorials on Hive, Pig, etc.
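For anyone new to the MapReduce model, here is a minimal single-process Python sketch of the classic word-count job. It only imitates the map, shuffle/sort, and reduce phases locally. On a real cluster the mapper and reducer would run as separate tasks (for example via Hadoop Streaming) over files in HDFS. The function names here are illustrative and are not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle/sort phase: Hadoop sorts map output by key before reducing."""
    return sorted(pairs, key=itemgetter(0))


def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Chain the three phases together (hypothetical local driver)."""
    return dict(reducer(shuffle(mapper(lines))))


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
    for word, count in sorted(word_count(docs).items()):
        print(f"{word}\t{count}")
```

The sort between map and reduce is what lets the reducer use a simple `groupby`: all pairs for a given key arrive next to each other, which is the same guarantee Hadoop gives each reduce task.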
1/17/2014 3:45:41 PM
Quite a few MSA graduates here have worked with it; it was part of their program.
1/17/2014 3:49:13 PM
Neato! Yeah, my company provides an analytics software platform with a lot of big hosted data, so it's a big leap for us in terms of processing power. I've been interested in going back to school, but I moved to Denver, so it would have to be something local.
1/17/2014 3:52:12 PM
You should look into EMR if you want to do this cost-effectively. Running Hadoop on your own hardware is hard to justify these days unless you're running steady-state jobs 24x7, and even then it's hard to justify without significant scale. http://aws.amazon.com/elasticmapreduce/ http://www.bigdatahpc.com
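The "unless you're running jobs 24x7" argument comes down to a break-even utilization: an on-demand cluster costs money only while it runs, while owned hardware costs roughly the same every month. Here is a rough Python sketch of that arithmetic. The dollar figures are purely hypothetical placeholders, and the model ignores data transfer, hosting contracts, and staffing.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def breakeven_utilization(owned_monthly_cost, cloud_hourly_cost):
    """Fraction of the month an on-demand cluster can run before it costs
    as much as owning comparable hardware. Inputs are hypothetical."""
    return owned_monthly_cost / (cloud_hourly_cost * HOURS_PER_MONTH)


def cheaper_option(owned_monthly_cost, cloud_hourly_cost, hours_used):
    """Compare one month of on-demand usage against a fixed owned cost."""
    cloud_monthly = cloud_hourly_cost * hours_used
    return "cloud" if cloud_monthly < owned_monthly_cost else "owned"


if __name__ == "__main__":
    owned = 20_000.0  # hypothetical amortized hardware + operations per month
    hourly = 40.0     # hypothetical on-demand price for a comparable cluster
    print(f"break-even utilization: {breakeven_utilization(owned, hourly):.0%}")
    for hours in (100, 400, 730):
        print(hours, "hours/month ->", cheaper_option(owned, hourly, hours))
```

With these made-up numbers, owned hardware only wins once the cluster is busy for most of the month, which is the steady-state case described above.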
1/17/2014 4:13:34 PM
1/17/2014 4:20:57 PM
I was suggesting you host it in Amazon.
1/17/2014 4:50:49 PM
That's my point. We already have the infrastructure, and it's harder to get contracts when the data is hosted elsewhere. From what I understand, it makes more sense to host it ourselves.
1/17/2014 6:02:38 PM
^Definitely, and if he WERE going to go with a public cloud infrastructure for Hadoop, he would be using http://www.windowsazure.com/en-us/solutions/big-data/. Anyway, back to the OP: yes, the product I design (http://blogs.msdn.com/b/visualstudioalm/archive/2013/11/13/announcing-application-insights-preview.aspx) has been building on Hadoop. The biggest continual problem is finding the happy medium between data latency and compute costs. We want to deliver data as close to real time as possible, but that starts costing insane $$$ once you hit certain thresholds. [Edited on January 17, 2014 at 8:47 PM. Reason : .]
1/17/2014 8:43:50 PM
I'm interested in learning about Hadoop too. Are there any good tutorials or online courses out there? I looked into the MSA program a few years ago but can't afford to go back to school full time.
1/21/2014 9:21:45 AM
I know not many of you are in the DC area, but I'm posting this in case anyone is interested: IBM Big Data Developer Day https://www-950.ibm.com/events/wwe/grp/grp004.nsf/v17_agenda?openform&seminar=FDDQVFES&locale=en_US
1/27/2014 8:48:09 AM
^^I've been going through these: http://hortonworks.com/tutorials/ They're pretty good for a basic understanding of the different components of Hadoop.
1/27/2014 11:01:25 AM
I read this as hard poop.
1/28/2014 10:59:12 AM
Update here: we're moving forward with both Cloudera and Hortonworks. Once we complete our Hadoop lake and fully convert our software over to Hadoop, we're told it will be the largest Hadoop lake that either vendor is helping deploy and support. Pretty neato!
3/20/2015 4:44:57 PM
In Cloudera training this week! Woohoo.
4/27/2015 12:51:33 PM
We had a sales pitch/training for Amazon Kinesis, Redshift, and EMR. They have ways to run Hive and/or Pig directly on S3 or DynamoDB. I'm rewriting some of our analytics ranking algorithms, and the last step will be to use one of those platforms.
4/30/2015 10:29:09 AM
Kinesis + Spark is the new real-time hotness, IMO: https://spark.apache.org/docs/latest/streaming-kinesis-integration.html Then pass the data to S3 for later Redshift ingest or EMR processing.
4/30/2015 12:51:17 PM