Sunday, October 20, 2013

Big Data on the Brain: Big Data Explained for the Average Joe

For the past few months, the latest buzzword in the technology world has been 'Big Data'. I've been asking myself: what exactly is 'Big Data', and how does it apply to me?

It's difficult for the average person to grasp the sheer amount of data that is now available. Rather than working with gigabytes of data, we are now required to analyze terabytes and petabytes of it.

To put this in perspective, let's take a look at some stats for Twitter:

  • 58 million new Tweets per day
  • 9,100 Tweets per second
  • 2.1 billion queries by Search Engines per day

http://www.statisticbrain.com/twitter-statistics/

That's a lot of data....

Traditionally, data for analysis is stored in an RDBMS, or Relational Database Management System: nice, neat tables with columns and rows. Even the average Joe outside the IT industry has seen this format in an Excel spreadsheet.
That's great if you're only querying gigabytes of data AND, more importantly, if that data is organized.


I think of it like a stack of Euro bills. Each denomination is a different size, which lets a person quickly tell the values apart.










But when you put all the bills into one pile, it's difficult to count what you have without first organizing each denomination into its own pile. If you're like me, you'll spend that extra time organizing and then count. With 10 bills it's quick and easy, but what if you had 1,000 bills? Then 10,000? You can see how the prep work needed to get to your result keeps growing.
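To make the "organize, then count" step concrete, here is a minimal Python sketch. The pile of bills and the function name `count_bills` are made up for illustration; the point is only that every bill has to be touched once to sort it into a pile before you can total anything, so the prep work grows with the size of the pile.

```python
from collections import Counter

def count_bills(pile):
    """Sort a mixed pile of bills into one pile per denomination, then total it."""
    piles = Counter(pile)  # the "organizing" step: one counter per denomination
    total = sum(value * count for value, count in piles.items())
    return piles, total

# A hypothetical mixed pile of 10 Euro bills.
pile = [5, 20, 10, 5, 50, 20, 5, 10, 100, 5]
piles, total = count_bills(pile)

print(piles)   # how many bills of each denomination
print(total)   # total value of the pile
```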




The pile of bills represents the future of data. No longer can we count on data arriving in familiar groupings stored in one container, like a database. We need to think bigger. We need to throw out the concept of a 'database' and see every form of data as a potential container.

Let's revisit our Twitter stats from earlier. How long would it take to write 58 million rows of data? Minutes, hours, days? Say you commit most of your resources to writing all that data to a table. What about the users trying to query that same information? What if your little sister is looking for tweets on the latest news about her favorite singer, Justin Bieber? With a traditional RDBMS, performance is constrained by the resources available. Since most of your resources are committed to adding the data, you're going to have to tell your sister she will need to wait.

I'm sure she'll be patient and willing to wait... When she's at school the next day and her friends are chatting about the latest Bieber news, she won't feel out of the loop and lost. She knows that when she gets home she can finally read about what her friends read yesterday. Yeah, I don't think so... Timing of information is everything these days. A delay of a day, or even an hour, can be the difference between winning and losing.

What to do about this predicament?
How can I store huge amounts of data AND access it quickly?

This is where Apache Hadoop comes into play.
Hadoop provides a 'software framework that supports Data-Intensive Distributed Applications'. The first time I read the formal definition, I was like, 'Huh? What does that mean?' I understood the meaning of the individual words, but couldn't comprehend how they worked together.

Thinking differently about what is 'data' is the key to understanding the capabilities of Hadoop.
Hadoop enables a filtering, sorting, and summarizing method known as 'MapReduce', which is actually two separate functions.

All data, in the form of output, has some sort of structure. Processing, by its nature, has to follow some sort of pattern. This is where the 'Map' function comes into play, by filtering, sorting, and querying the data. This means you can now 'Map' any type of flat file.
Mapping can be done on multiple files at the same time, and the 'Reduce' function then summarizes the results into one consolidated output.
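Here is a rough sketch of the two steps in plain Python. This is not Hadoop's actual API, just the idea behind it, and the function names `map_phase` and `reduce_phase` and the two sample "files" are made up for illustration. The Map step turns each line of a flat file into simple (word, 1) pairs, and the Reduce step adds those pairs up into one consolidated result.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in one flat file."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each word into one consolidated output."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Two hypothetical "flat files", each represented as a list of lines.
file_a = ["big data is big", "data everywhere"]
file_b = ["big news today"]

# Map every file, collecting all the emitted pairs.
pairs = []
for lines in (file_a, file_b):
    pairs.extend(map_phase(lines))

# Reduce the pairs into a single summary.
word_counts = reduce_phase(pairs)
print(word_counts)
```

Notice that the Map step never needs to know about the other file, which is what makes it possible to map many files at the same time.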

MapReduce addresses the 'Data-Intensive' and 'Applications' parts of the Hadoop description. So I'm guessing you're asking where the 'Distributed' part comes into play. Here's where Hadoop takes data analytics to a whole new level. Rather than constraining processing to one server, as with a traditional database, MapReduce can be run across multiple servers in parallel. This means you can harness the resources of as many machines as you can add, without having to consolidate all your data into one container. So you end up with multiple servers and multiple files, breaking the work down into smaller chunks and running them in parallel.
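The "smaller chunks, running in parallel" idea can be sketched on a single machine. In this minimal example, threads from Python's standard `concurrent.futures` module stand in for separate servers, which is an assumption made purely for illustration; a real Hadoop cluster spreads the chunks across actual machines. The names `count_words` and `run_distributed` and the sample chunks are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Work done by one 'server': count the words in its own chunk of lines."""
    return Counter(word for line in chunk for word in line.lower().split())

def run_distributed(chunks, workers=3):
    """Hand each chunk to a separate worker, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return merged

# Three hypothetical chunks of data, one per worker.
chunks = [
    ["big data is big"],
    ["data everywhere"],
    ["big news today"],
]
totals = run_distributed(chunks)
print(totals)
```

Each worker only ever sees its own chunk, so adding more chunks and more workers does not require pulling everything into one place first.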




Think about the possibilities this presents...
As an example, using Amazon EMR Web Services, The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4 TB of raw image TIFF data (stored in S3) into 11 million finished PDFs in the space of 24 hours at a computation cost of about $240 (not including bandwidth). (Source: wikipedia.org)
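A quick back-of-the-envelope calculation shows how cheap that is. This sketch uses only the figures quoted above and assumes, for simplicity, that all 100 instances ran for the full 24 hours; the variable names are my own.

```python
# Figures quoted in the New York Times example.
instances = 100
hours = 24
total_cost = 240.0        # US dollars, computation only
pdfs = 11_000_000

# Assumption: every instance ran for the whole 24 hours.
instance_hours = instances * hours
cost_per_instance_hour = total_cost / instance_hours
cost_per_1000_pdfs = total_cost / pdfs * 1000

print(instance_hours)                    # total machine-hours used
print(round(cost_per_instance_hour, 2))  # dollars per machine-hour
print(round(cost_per_1000_pdfs, 4))      # dollars per 1,000 finished PDFs
```

Under that assumption, the work comes to 2,400 machine-hours at about ten cents each, or roughly two cents per thousand PDFs.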

I could go into greater detail on the different use cases and possibilities of Hadoop, but I think that is too much for one blog entry. My goal was to give a brief overview of why Big Data is critical to our world today and a starting point for addressing the data predicament.

Big Data is here, and we need to keep up or be left behind...

Friday, July 26, 2013

Company Culture


Netflix posted an amazing deck on their Company Culture.
Netflix Culture: Freedom and Responsibility

The following points really hit home for me, so I wanted to share them with you.

Judgement

  • You make wise decisions (people, technical, business, and creative) despite ambiguity
  • You identify root causes, and get beyond treating symptoms
  • You think strategically, and can articulate what you are, and are not, trying to do
  • You smartly separate what must be done well now, and what can be improved later

Communication

  • You listen well, instead of reacting fast, so you can better understand
  • You are concise and articulate in speech and writing
  • You treat people with respect independent of their status or disagreement with you
  • You maintain calm poise in stressful situations

Impact

  • You accomplish amazing amounts of important work
  • You demonstrate consistently strong performances so colleagues can rely upon you
  • You focus on great results rather than on process
  • You exhibit bias-to-action, and avoid analysis-paralysis

Curiosity

  • You learn rapidly and eagerly
  • You seek to understand our strategy, market, customers, and suppliers
  • You are broadly knowledgeable about business, technology and entertainment
  • You contribute effectively outside of your specialty

Innovation

  • You re-conceptualize issues to discover practical solutions to hard problems
  • You challenge prevailing assumptions when warranted, and suggest better approaches
  • You create new ideas that prove useful
  • You keep us nimble by minimizing complexity and finding time to simplify

Courage

  • You say what you think even if it is controversial
  • You make tough decisions without agonizing
  • You take smart risks
  • You question actions inconsistent with our values

Passion

  • You inspire others with your thirst for excellence
  • You care intensely about success
  • You celebrate wins
  • You are tenacious

Honesty

  • You are known for candor and directness
  • You are non-political when you disagree with others
  • You only say things about fellow employees you will say to their face
  • You are quick to admit mistakes

Selflessness

  • You seek what is best for Netflix, rather than best for yourself or your group
  • You are ego-less when searching for the best ideas
  • You make time to help colleagues
  • You share information openly and proactively

If all companies followed these values, just think of the potential. 

Sunday, July 14, 2013

Building Confidence

I just started reading Lean In: Women, Work, and the Will to Lead by Sheryl Sandberg. I figured reading about the journey of a successful female in the IT industry would help provide insight on how to further my development.

I'm only about 25% of the way through (according to my Kindle), but she brings up some interesting points, one of which I wanted to mention.

She quotes a 2011 McKinsey report which found "men are promoted based on potential, while women are promoted based on past accomplishments."
She goes on to say:
"In addition to the external barriers erected by society, women are hindered by barriers that exist within ourselves. We hold ourselves back in ways both big and small, by lacking self-confidence, by not raising our hands, and by pulling back when we should be leaning in. We internalize the negative messages we get throughout our lives - the messages that say it's wrong to be outspoken, aggressive, more powerful than men."

Subconsciously I already knew this, but seeing it written on paper really drives home the internal barriers preventing me from seeing what my co-workers already see in me.
My boss saw something in me that caused him to jump at the chance to bring me on to his team. For the first few months, it was a bit surreal for me. I know I probably drove him crazy constantly thanking him for the opportunity. I kept thinking how lucky I was to move into a role that I didn't even know existed a year before.
Six months into it, I've grown wiser and understand that it wasn't luck that gave me this opportunity, but my own talent and accomplishments. Building confidence in my abilities is the first step in my journey to becoming a 'Rockstar' Product Manager.

-Jenn

Wednesday, July 03, 2013

Formats for a Technical User Community - Feeds vs Forums

The question of the moment is, for a Technical User Community, what type of format do users prefer?

A Feed - example: SalesForce Chatter
Simple posts, and a short attention span from users, because they're already on to the next post.



A Forum - example: StackOverflow.com.
Questions and discussions in a traditional forum format, with categorized posts and additional formatting options.




The feed format is definitely the new hip look, and great for collaboration, but does it play to the geek in a technical user? Sometimes we just want the simple, easy format we're used to when searching for answers.

Can you include all the information needed to troubleshoot an issue within a feed thread?

We'll have to see what develops.

Friday, March 15, 2013

Social CRM

What is Social CRM to you?

I'm finally starting to understand how Social CRM is the future. The internet has erased the traditional geographical barriers.
Companies are no longer searching for the best talent available in a certain city; they are now able to search for the best talent in the world.
Technology can now create office environments that bring together people from all parts of the world and make it feel like they are physically in the office.


Friday, January 11, 2013

Whoa, 2013 already...

Ok, I'm going to really try to keep this Blog updated with my adventures, learnings, etc.

Someone once said my growth is like a steep curve: it starts slow and then takes off. So that's what it's doing now.

Stay posted to see where my journey takes me...

-Jenn

Sunday, July 10, 2011

Mod Pizza

If you're looking for an inexpensive, low-carb, and yummy pizza, check out Mod Pizza. All the toppings you want for $6.28. The crust is more like a flatbread than a traditional pizza crust: thin and crispy.
The garlic knots are a must-have too. Little breadsticks tied into knots and brushed with olive oil and garlic chunks.

Mod stands for 'Made On Demand', and they really do make it on demand.

As a bonus, check in with FourSquare and you get a free small pop.

Don't forget dessert. Pick up an old skool Ding Dong, located by the registers.