
A novel approach to data deduplication over the engineering-oriented cloud systems

Journal Article



Abstract


  • This paper presents a duplication-less storage system over engineering-oriented cloud computing platforms. Our deduplication storage system, which manages data and duplication over the cloud system, consists of two major components: a front-end deduplication application and a mass storage system as the back end. The Hadoop Distributed File System (HDFS) is a common distributed file system on the cloud and is used together with the Hadoop database (HBase). We use HDFS to build a mass storage system and employ HBase to build a fast indexing system. With a deduplication application, a scalable and parallel deduplicated cloud storage system can be effectively built. We further use VMware to generate a simulated cloud environment. The simulation results demonstrate that our deduplication storage system is sufficiently accurate and efficient for distributed and cooperative data-intensive engineering applications.
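
The two-component design in the abstract (a front-end deduplication application, an index, and a mass store) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the class name `DedupStore`, the choice of SHA-256 as the fingerprint, and whole-file granularity are all assumptions made for illustration, and plain Python dictionaries stand in for the HBase index and the HDFS store.

```python
import hashlib


class DedupStore:
    """Toy whole-file deduplicating store (hypothetical, for illustration only)."""

    def __init__(self):
        self.index = {}  # fingerprint -> True; stand-in for the HBase index
        self.blobs = {}  # fingerprint -> bytes; stand-in for HDFS storage
        self.files = {}  # file name -> fingerprint of its content

    def put(self, name: str, data: bytes) -> bool:
        """Store data under name; return True only if new bytes were written."""
        # Assumption: SHA-256 of the whole file serves as the fingerprint.
        fingerprint = hashlib.sha256(data).hexdigest()
        is_new = fingerprint not in self.index
        if is_new:
            # First time this content is seen: write it to the back end
            # and record it in the index.
            self.blobs[fingerprint] = data
            self.index[fingerprint] = True
        # Duplicate or not, the file name now points at the fingerprint.
        self.files[name] = fingerprint
        return is_new

    def get(self, name: str) -> bytes:
        """Return the content previously stored under name."""
        return self.blobs[self.files[name]]


store = DedupStore()
wrote_first = store.put("design_v1.dwg", b"x" * 1000)
wrote_second = store.put("design_copy.dwg", b"x" * 1000)
```

In this sketch the second `put` finds the fingerprint already in the index, so only a name-to-fingerprint mapping is added and the content is stored once. A real deployment would replace the dictionaries with HBase lookups and HDFS writes, and would have to deal with concurrency and deletion, which the sketch ignores.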

Authors


  •   Sun, Zhe (external author)
  •   Shen, Jun
  •   Yong, Jianming (external author)

Publication Date


  • 2013

Citation


  • Sun, Z., Shen, J. & Yong, J. (2013). A novel approach to data deduplication over the engineering-oriented cloud systems. Integrated Computer Aided Engineering, 20 (1), 45-57.

Scopus Eid


  • 2-s2.0-84872297259

Ro Full-text Url


  • http://ro.uow.edu.au/cgi/viewcontent.cgi?article=9879&context=infopapers

Ro Metadata Url


  • http://ro.uow.edu.au/infopapers/2543

Number Of Pages


  • 12

Start Page


  • 45

End Page


  • 57

Volume


  • 20

Issue


  • 1

Place Of Publication


  • Netherlands
