Eye 2 Eye

Start thinking big

Posted date: 4 August 2009
Mr Yoshida: In Singapore, if you are going to go after some of these content opportunities, you will have some of these sites with exabytes of data, so start thinking big.

Mr Hubert Yoshida, Chief Technology Officer of Hitachi Data Systems, spoke at the Infocomm Development Authority of Singapore’s Distinguished Infocomm Speaker programme on 16 July 2009. “Hu”, whose blog was ranked among the “top 10 most influential” within the storage industry by Network World, talked about “Anticipating and Surviving the Data Deluge in the New World”. The following are excerpts from his talk.

Why we can expect a data deluge
An example of the growth in data is the presidential archives. By law in the United States, when a president leaves office, his records need to be transferred into the National Archives. President Clinton was in office for eight years. When he left office, he transferred about 3 terabytes of data. Recently, President Bush left office on January 20, and his archived data was about 146 terabytes. That’s about 50 times what President Clinton had. President Bush did not use email, and he didn’t use a laptop for security reasons, but even so, he had about 50 times more than President Clinton.

Now we come to President Obama. He has a secure Blackberry. Some of his staff have two or three Blackberries, serving different purposes. He has five or six web pages - White House, technology.com... He also has his technology plan on a website. He’s also on MySpace and Facebook, and he uses Twitter. So in his first six months in office, we estimate that he has generated half a petabyte of data. If he stays in office for eight years, he will probably have tens of petabytes, maybe hundreds of petabytes.

Typically, presidents keep their data until the end of their term, then they transfer it. Can you imagine having 10 petabytes dumped on you on January 20, if you have the responsibility of archiving it? That’s going to be impossible, so somehow changes have to be made. That’s one of the challenges we will be facing.

The data monster
There is also what I call a data monster lurking amongst us. She is a teenager, or maybe a pre-teenager. She does social networking; she’s on MySpace, Twitter, Facebook, all of that. She IMs and SMSes while listening to iTunes, and she’s emailing and on the cell phone, generating tons of data, usually to the same person at the same time!

She needs little instruction – you probably have not taught her how to use the Internet. She is fearless – she’s not hesitant about jumping in, trying it or adopting it. She discards quickly. She is probably on her third generation of iPod. She’ll need more space because she’s doing videos and other things besides music. That’s the data monster that’s going to create this even greater wave of data.


More data, more users, more interactions and relationships
The biggest problem today is not the retention of data but the access. How do you connect all these consumers with the right piece of data that they need? The industry is starting to address this, with unified access and unified management. What we need to do is unify all this access, and virtualisation plays a key part in doing this.

Metadata helps you find and link to data. A lot of what we do with information is not to move or own things, but to link to things. Send a link in an email, not an attachment. Information retrieval is a difficult problem. When you Google, you get all these false hits. There’s a lot of ambiguity, and not everything is in English. There are different sites that provide more intelligent search capabilities, but trying to find information is a very difficult problem. That I think is going to be the biggest challenge going forward.

Cost
The other side of this is cost. People think storage problems would be solved if storage was at zero cost, but even if new hardware spending went to zero, the cost of storage and IT is going to increase because of the operational cost - the cost to protect the data, transfer the data, share the data, to search the data. That’s where the expense is. 

In 2001 and 2002 we had the dotcom bust and 9/11, so we had a business downturn and the revenue for storage hardware decreased from US$18 billion to US$12 billion. But when hardware revenue decreased, the operational cost continued to go up. How do you decrease that operational cost?

Four key technologies to address these issues
Storage virtualisation allows us to consolidate, move data around, access stranded data and share it. It is a key enabling technology.

RADM(NS) Ronnie Tay (left) and Mr Hu Yoshida in conversation.

The next technology is thin provisioning. Your user wants you to give them a terabyte, but we can provision just what they actually use. With thin provisioning, also called dynamic provisioning, you have virtual capacity.
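As an illustration only, the idea of virtual capacity can be sketched in a few lines of Python. This is a toy model, not how any Hitachi product is implemented; the class name `ThinVolume` and its methods are hypothetical. The volume promises a large number of blocks up front but only consumes space for blocks that have actually been written.

```python
class ThinVolume:
    """Toy thin-provisioned volume (hypothetical, for illustration).

    The full capacity is promised up front; physical space is only
    consumed when a block is written for the first time.
    """

    def __init__(self, virtual_blocks: int):
        self.virtual_blocks = virtual_blocks  # capacity the user sees
        self._blocks = {}                     # block index -> data actually stored

    def _check(self, index: int) -> None:
        if not 0 <= index < self.virtual_blocks:
            raise IndexError("block is outside the volume's virtual capacity")

    def write(self, index: int, data: bytes) -> None:
        self._check(index)
        self._blocks[index] = data            # space is allocated on first write

    def read(self, index: int) -> bytes:
        self._check(index)
        return self._blocks.get(index, b"")   # unwritten blocks read as empty

    @property
    def allocated_blocks(self) -> int:
        return len(self._blocks)


volume = ThinVolume(virtual_blocks=1_000_000)
volume.write(0, b"first")
volume.write(999_999, b"last")
print(volume.virtual_blocks, volume.allocated_blocks)
```

The user sees a million blocks, while only the two written blocks take up space.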

Then there is data de-duplication. If we can look at streams of data, compare them, and discard the duplicates when we see them, that greatly reduces the amount of wastage that goes into copies and the amount of time it takes to make those copies, and it reduces your operational cost. With things that are very repetitive, like back-up data, you can sometimes see 25-to-1 de-duplication.
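A minimal sketch of the compare-and-discard idea, assuming the data stream has already been cut into chunks and that a SHA-256 digest is an acceptable fingerprint for spotting duplicates. The function names `deduplicate` and `dedup_ratio` are hypothetical, and real products use more elaborate chunking and indexing.

```python
import hashlib


def deduplicate(chunks):
    """Store each distinct chunk once and keep a recipe to rebuild the stream."""
    store = {}    # digest -> the single stored copy of that chunk
    recipe = []   # ordered digests, enough to reconstruct the original stream
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a duplicate chunk is not stored again
        recipe.append(digest)
    return store, recipe


def dedup_ratio(chunks):
    """Logical bytes divided by the bytes actually stored after de-duplication."""
    store, _ = deduplicate(chunks)
    logical = sum(len(c) for c in chunks)
    physical = sum(len(c) for c in store.values())
    return logical / physical if physical else 1.0


# Twenty-five identical back-up chunks collapse to one stored copy.
backups = [b"x" * 100] * 25
print(dedup_ratio(backups))
```

The recipe is what keeps the discarded copies recoverable: joining `store[d]` for each digest `d` in `recipe` reproduces the original stream.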

The last one is really becoming more important as we move into this explosion, and that’s archiving. An active archive is not something you put away and store in bulk but something you can still access on an active basis. You take the working set, take out the data that is 60 days old, and so reduce the working set. Put that data into a gold copy and into an archive.
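The 60-day rule can be sketched as a simple age filter. This is an assumption-laden illustration: it treats "60 days old" as "not accessed in the last 60 days", and the function name `split_working_set` and the sample file names are hypothetical.

```python
from datetime import datetime, timedelta


def split_working_set(last_access, now, max_age_days=60):
    """Split items into those kept in the working set and those to archive.

    last_access maps an item name to the datetime it was last accessed.
    """
    cutoff = now - timedelta(days=max_age_days)
    active = {name for name, ts in last_access.items() if ts >= cutoff}
    to_archive = set(last_access) - active  # older than the cutoff
    return active, to_archive


now = datetime(2009, 7, 16)
last_access = {
    "q2-report.doc": datetime(2009, 7, 1),    # recent, stays active
    "2008-budget.xls": datetime(2009, 1, 5),  # older than 60 days
}
active, to_archive = split_working_set(last_access, now)
print(sorted(active), sorted(to_archive))
```

Items in `to_archive` would then be copied to the archive and removed from the working set, which is what shrinks the data that has to be protected and backed up day to day.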

These types of solutions are not going to be provided by one vendor. It takes a family of vendors. 

The Cloud
This is what Hitachi Data Systems is focused on. We’re not going to be a Cloud provider; we’re going to be a provider of the infrastructure for the Cloud. The type of storage you need in the Cloud will depend on the business model. If the business model is search, you don’t need storage, you need cache. For social networking, you need low-cost, commodity storage. For services like Software-as-a-Service or eBay, you need enterprise storage. Infrastructure services need scalable, commodity storage. Content depots need archive storage services and scalability.

The next wave of data growth
We need to start thinking in terms of petabytes and exabytes. In the next two to three years, we expect to see a lot of our customers with an exabyte of data. That’s a million terabytes. That sounds like a lot, but when you think about it, it could be just one customer who is looking at storing home data on his site. You may have a terabyte in the home, and you’ll just need a million subscribers to have an exabyte. Particularly in Singapore, if you are going to go after some of these content opportunities, you will have some of these sites with exabytes of data, so start thinking big.
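The arithmetic behind that estimate is easy to check. The sketch below assumes decimal units (1 exabyte = 1,000,000 terabytes), as in the talk; the function names are hypothetical.

```python
import math

TB_PER_EB = 10**6  # decimal units: 1 exabyte = 1,000,000 terabytes


def total_exabytes(subscribers, tb_per_subscriber):
    """Total data held by a site, in exabytes."""
    return subscribers * tb_per_subscriber / TB_PER_EB


def subscribers_needed(target_eb, tb_per_subscriber):
    """Smallest number of subscribers whose data adds up to target_eb exabytes."""
    return math.ceil(target_eb * TB_PER_EB / tb_per_subscriber)


print(total_exabytes(1_000_000, 1))  # a million homes with a terabyte each
print(subscribers_needed(1, 1))
```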