Tuesday, February 8, 2022

Optimization - Minimization of Resources Used in Big Data Storage and Analysis

 2016


Big Data Optimization: Recent Developments and Challenges

Editor: Ali Emrouznejad

Presents recent developments and challenges in big data optimization

Collects various recent algorithms in large-scale optimization all in one book

Presents useful big data optimization applications in a variety of industries, both for academics and practitioners

Includes guidelines for using cloud computing and Hadoop in large-scale and big data optimization






Front Matter

Pages i-xv


Big Data: Who, What and Where? Social, Cognitive and Journals Map of Big Data Publications with Focus on Optimization

Ali Emrouznejad, Marianna Marra

Pages 1-16

Setting Up a Big Data Project: Challenges, Opportunities, Technologies and Optimization

Roberto V. Zicari, Marten Rosselli, Todor Ivanov, Nikolaos Korfiatis, Karsten Tolle, Raik Niemann et al.

Pages 17-47

Optimizing Intelligent Reduction Techniques for Big Data

Florin Pop, Catalin Negru, Sorin N. Ciolofan, Mariana Mocanu, Valentin Cristea

Pages 49-70

Performance Tools for Big Data Optimization

Yan Li, Qi Guo, Guancheng Chen

Pages 71-96

Optimising Big Images

Tuomo Valkonen

Pages 97-131

Interlinking Big Data to Web of Data

Enayat Rajabi, Seyed-Mehdi-Reza Beheshti

Pages 133-145

Topology, Big Data and Optimization

Mikael Vejdemo-Johansson, Primoz Skraba

Pages 147-176

Applications of Big Data Analytics Tools for Data Management

Mo Jamshidi, Barney Tannahill, Maryam Ezell, Yunus Yetis, Halid Kaplan

Pages 177-199

Optimizing Access Policies for Big Data Repositories: Latency Variables and the Genome Commons

Jorge L. Contreras

Pages 201-215

Big Data Optimization via Next Generation Data Center Architecture

Jian Li

Pages 217-229

Big Data Optimization Within Real World Monitoring Constraints

Kristian Helmholt, Bram van der Waaij

Pages 231-250

Smart Sampling and Optimal Dimensionality Reduction of Big Data Using Compressed Sensing

Anastasios Maronidis, Elisavet Chatzilari, Spiros Nikolopoulos, Ioannis Kompatsiaris

Pages 251-280

Optimized Management of BIG Data Produced in Brain Disorder Rehabilitation

Peter Brezany, Olga Štěpánková, Markéta Janatová, Miroslav Uller, Marek Lenart

Pages 281-317

Big Data Optimization in Maritime Logistics

Berit Dangaard Brouer, Christian Vad Karsten, David Pisinger

Pages 319-344

Big Network Analytics Based on Nonconvex Optimization

Maoguo Gong, Qing Cai, Lijia Ma, Licheng Jiao

Pages 345-373

Large-Scale and Big Optimization Based on Hadoop

Yi Cao, Dengfeng Sun

Pages 375-389

Computational Approaches in Large-Scale Unconstrained Optimization

Saman Babaie-Kafaki

Pages 391-417

Numerical Methods for Large-Scale Nonsmooth Optimization

Napsu Karmitsa

Pages 419-436

Metaheuristics for Continuous Optimization of High-Dimensional Problems: State of the Art and Perspectives

Giuseppe A. Trunfio

Pages 437-460


Convergent Parallel Algorithms for Big Data Optimization Problems

Simone Sagratella

Pages 461-474

Back Matter

Pages 475-487

https://link.springer.com/book/10.1007/978-3-319-30265-2


Bibliography

6.
Chukwa. https://chukwa.apache.org/.
7.
Dai, J., Huang, J., Huang, S., Huang, B., Liu, Y.: HiTune: dataflow-based performance analysis for big data cloud. In: Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11) (2011)

8.
Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: a self-tuning system for big data analytics. In: Proceedings Biennial Conference on Innovative Data Systems Research (CIDR’11), pp. 261–272 (2011)

9.
Guo, Q., Li, Y., Liu, T., Wang, K., Chen, G., Bao, X., Tang, W.: Correlation-based performance analysis for full-system MapReduce optimization. In: Proceedings of the IEEE International Conference on Big Data (BigData’13), pp. 753–761 (2013)

10.
Li, Y., Wang, K., Guo, Q., Zhang, X., Chen, G., Liu, T., Li, J.: Breaking the boundary for whole-system performance optimization of big data. In: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED’13), pp. 126–131 (2013)

11.
Ganglia. http://ganglia.sourceforge.net/. 
12.
Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Lu, Y., Saha, B., Harris, E.: Reining in the outliers in map-reduce clusters using Mantri. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI’10), pp. 1–16 (2010)

13.
Garduno, E., Kavulya, S.P., Tan, J., Gandhi, R., Narasimhan, P.: Theia: visual signatures for problem diagnosis in large hadoop clusters. In: Proceedings of the 26th International Conference on Large Installation System Administration: Strategies, Tools, and Techniques (LISA’12), pp. 33–42 (2012)

14.
Hadoop Vaidya. http://hadoop.apache.org/docs/r1.2.1/vaidya.html.
15.
Khoussainova, N., Balazinska, M., Suciu, D.: PerfXplain: debugging MapReduce job performance. Proc. VLDB Endow. 5(7), 598–609 (2012)

16.
Jiménez, V., Cazorla, F.J., Gioiosa, R., Buyuktosunoglu, A., Bose, P., O’Connell, F.P., Mealey, B.G.: Adaptive prefetching on POWER7: improving performance and power consumption. ACM Trans. Parallel Comput. (TOPC) 1(1), Article 4 (2014)

17.
Funston, J.R., El Maghraoui, K., Jann, J., Pattnaik, P., Fedorova, A.: An SMT-selection metric to improve multithreaded applications’ performance. In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS’12), pp. 1388–1399 (2012)

18.
Basu, A., Gandhi, J., Chang, J., Hill, M.D., Swift, M.M.: Efficient virtual memory for big memory servers. In: Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA’13), pp. 237–248 (2013)

19.
Chen, Y., Fang, S., Huang, Y., Eeckhout, L., Fursin, G., Temam, O., Wu, C.: Deconstructing iterative optimization. ACM Trans. Archit. Code Optim. (TACO) 9(3), Article 21 (2012)

20.
Li, M., Zeng, L., Meng, S., Tan, J., Zhang, L., Butt, A.R., Fuller, N.: MRONLINE: MapReduce online performance tuning. In: Proceedings of International Symposium on High-performance Parallel and Distributed Computing (HPDC’14), pp. 165–176 (2014)

21.
Herodotou, H., Babu, S.: Profiling, what-if analysis, and cost-based optimization of MapReduce programs. Proc. VLDB Endow. 4(11), 1111–1122 (2011)

22.
LZ4. https://code.google.com/p/lz4/. 
23.
Babu, S.: Towards automatic optimization of MapReduce programs. In: Proceedings of the ACM Symposium on Cloud Computing (SoCC’10), pp. 137–142 (2010)

24.
Yigitbasi, N., Willke, T.L., Liao, G., Epema, D.: Towards machine learning-based Auto-tuning of MapReduce. In: Proceedings of the 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS’13), pp. 11–20 (2013)

25.
Zhang, Z., Cherkasova, L., Loo, B.T.: AutoTune: optimizing execution concurrency and resource usage in MapReduce workflows. In: Proceedings of the International Conference on Autonomic Computing (ICAC’13), pp. 175–181 (2013)


1.
IBM Power Server. http://www-03.ibm.com/systems/power/hardware/. 
2.
IBM Power Linux. http://www-03.ibm.com/systems/power/software/linux/. 
3.
IBM Java Development Kit. http://www.ibm.com/developerworks/java/jdk/. 
4.
IBM InfoSphere BigInsights. http://www-01.ibm.com/software/data/infosphere/biginsights/.
5.
IBM Platform Symphony. http://www-03.ibm.com/systems/platformcomputing/products/symphony/. 





 
2.
How to run a Big Data project. Interview with James Kobielus. ODBMS Industry Watch, 15 May 2014. http://www.odbms.org/blog/2014/05/james-kobielus/ (2015). 
3.
Laney, D.: 3D data management: controlling data volume, velocity and variety. Appl. Deliv. Strateg. File, 949 (2001)

4.
Zikopoulos, P., Eaton, C.: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, 1st ed. McGraw-Hill Osborne Media (IBM) (2011)

5.
Foster, I.: Big Process for Big Data, Presented at the HPC 2012 Conference. Cetraro, Italy (2012)

6.
Gattiker, A., Gebara, F.H., Hofstee, H.P., Hayes, J.D., Hylick, A.: Big Data text-oriented benchmark creation for Hadoop. IBM J. Res. Dev. 57(3/4), 10:1–10:6 (2013)

7.
Zicari, R.: Big Data: Challenges and Opportunities. In: Akerkar, R. (ed.) Big Data Computing, p. 564. Chapman and Hall/CRC (2013)

8.
On Big Data: Interview with Dr. Werner Vogels, CTO and VP of Amazon.com. ODBMS Industry Watch, 02 Nov 2011. http://www.odbms.org/blog/2011/11/on-big-data-interview-with-dr-werner-vogels-cto-and-vp-of-amazon-com/ (2015). 
9.
Big Data Analytics at Thomson Reuters. Interview with Jochen L. Leidner. ODBMS Industry Watch, 15 Nov 2013. http://www.odbms.org/blog/2013/11/big-data-analytics-at-thomson-reuters-interview-with-jochen-l-leidner/ (2015). 
10.
Setting up a Big Data project. Interview with Cynthia M. Saracco. ODBMS Industry Watch, 27 Jan 2014. http://www.odbms.org/blog/2014/01/setting-up-a-big-data-project-interview-with-cynthia-m-saracco/ (2015). Accessed 15 July 2015
11.
Jacobs, A.: The pathologies of big data. Commun. ACM 52(8), 36–44 (2009)

12.
On Big Data and Hadoop. Interview with Paul C. Zikopoulos. ODBMS Industry Watch, 10 June 2013. http://www.odbms.org/blog/2013/06/on-big-data-and-hadoop-interview-with-paul-c-zikopoulos/ (2015). 
13.
Next generation Hadoop. Interview with John Schroeder. ODBMS Industry Watch, 07 Sep 2012. http://www.odbms.org/blog/2012/09/next-generation-hadoop-interview-with-john-schroeder/ (2015). 
14.
On Big Data, Analytics and Hadoop. Interview with Daniel Abadi. ODBMS Industry Watch, 05 Dec 2012. http://www.odbms.org/blog/2012/12/on-big-data-analytics-and-hadoop-interview-with-daniel-abadi/ (2015). 
15.
Data Analytics at NBCUniversal. Interview with Matthew Eric Bassett. ODBMS Industry Watch, 23 Sep 2013. http://www.odbms.org/blog/2013/09/data-analytics-at-nbcuniversal-interview-with-matthew-eric-bassett/ (2015). 
16.
Analytics: The real-world use of big data. How innovative enterprises extract value from uncertain data (IBM Institute for Business Value and Saïd Business School at the University of Oxford), Oct 2012

17.
Hopkins, B.: The Patterns of Big Data. Forrester Research, 11 June 2013

18.
Lim, H., Han, Y., Babu, S.: How to Fit when No One Size Fits. In: CIDR (2013)

19.
Cattell, R.: Scalable SQL and NoSql Data Stores. SIGMOD Rec., 39(4), 27 Dec 2010

20.
Gilbert, S., Lynch, N.: Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2), 51–59 (2002)

21.
Haerder, T., Reuter, A.: Principles of transaction-oriented database recovery. ACM Comput. Surv. 15(4), 287–317 (1983)
22.
Bailis, P., Ghodsi, A.: Eventual Consistency Today: Limitations, Extensions, and Beyond. Queue 11(3), pp. 20:20–20:32, Mar 2013

23.
Pritchett, D.: BASE: an acid alternative. Queue 6(3), 48–55 (2008)
24.
Vogels, W.: Eventually consistent. Commun. ACM 52(1), 40–44 (2009)
25.
Moniruzzaman, A.B.M., Hossain, S.A.: NoSQL Database: New Era of Databases for Big data Analytics—Classification, Characteristics and Comparison. CoRR (2013). arXiv:1307.0191
26.
Datastax, Datastax Apache Cassandra 2.0 Documentation. http://www.datastax.com/documentation/cassandra/2.0/index.html (2015). Accessed 15 Apr 2015
27.
Apache Cassandra White Paper. http://www.datastax.com/wp-content/uploads/2011/02/DataStax-cBackgrounder.pdf
28.
MongoDB Inc., MongoDB Documentation. http://docs.mongodb.org/manual/MongoDB-manual.pdf (2015). Accessed 15 Apr 2015
29.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, vol 7, pp. 15–15. Berkeley, CA, USA (2006)

30.
George, L.: HBase: The Definitive Guide, 1st ed. O’Reilly Media (2011)

31.
Apache Software Foundation, The Apache HBase Reference Guide. https://hbase.apache.org/book.html
32.
Buerli, M.: The Current State of Graph Databases, Dec-2012, http://www.cs.utexas.edu/~cannata/dbms/Class%20Notes/08%20Graph_Databases_Survey.pdf (2015). Accessed 15 Apr 2015
33.
Angles, R.: A comparison of current graph database models. In: ICDE Workshops, pp. 171–177 (2012)

34.
McColl, R.C., Ediger, D., Poovey, J., Campbell, D., Bader, D.A.: A performance evaluation of open source graph databases. In: Proceedings of the First Workshop on Parallel Programming for Analytics Applications, pp. 11–18. New York, NY, USA (2014)

35.
Harris, S., Seaborne, A.: SPARQL 1.1 Query Language, 21 Mar 2013. http://www.w3.org/TR/sparql11-query/ (2013)
36.
Holzschuher, F., Peinl, R.: Performance of graph query languages: comparison of Cypher, Gremlin and native access in Neo4j. In: Proceedings of the Joint EDBT/ICDT 2013 Workshops, pp. 195–204. New York, NY, USA (2013)

37.
VoltDB Inc., Using VoltDB. http://voltdb.com/download/documentation/
38.
Pezzini, M., Edjlali, R.: Gartner top technology trends, 2013: In-Memory Computing Aims at Mainstream Adoption, 31 Jan 2013

39.
Herschel, G., Linden, A., Kart, L.: Gartner Magic Quadrant for Advanced Analytics Platforms, 19 Feb 2014

40.
Borthakur, D.: The hadoop distributed file system: Architecture and design. Hadoop Proj. Website 11, 21 (2007)

41.
Ghemawat, S., Gobioff, H., Leung, S.-T.: The google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pp. 29–43. New York, NY, USA (2003)

42.
Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51(1), 107–113 (2008)

43.
Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: A Warehousing Solution over a Map-reduce Framework. Proc. VLDB Endow. 2(2), 1626–1629 (2009)

44.
Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 1099–1110 (2008)

45.
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. Washington, DC, USA (2010)

46.
White, T.: Hadoop: The Definitive Guide, 1st ed. O’Reilly Media, Inc., (2009)

47.
Apache Spark Project. http://spark.apache.org/
48.
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, pp. 10–10. Berkeley, CA, USA (2010)

49.
Cranor, C., Johnson, T., Spataschek, O., Shkapenyuk, V.: Gigascope: a stream database for network applications. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 647–651. New York, NY, USA (2003)

50.
Arasu, A., Babu, S., Widom, J.: The CQL Continuous Query Language: Semantic Foundations and Query Execution. VLDB J. 15(2), 121–142 (2006)

51.
Chen, J., DeWitt, D.J., Tian, F., Wang, Y.: NiagaraCQ: a scalable continuous query system for internet databases. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 379–390. New York, NY, USA (2000)

52.
Agrawal, J., Diao, Y., Gyllstrom, D., Immerman, N.: Efficient pattern matching over event streams. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 147–160. New York, NY, USA (2008)

53.
Jain, N., Mishra, S., Srinivasan, A., Gehrke, J., Widom, J., Balakrishnan, H., Çetintemel, U., Cherniack, M., Tibbetts, R., Zdonik, S.: Towards a Streaming SQL Standard. Proc. VLDB Endow. 1(2), 1379–1390 (2008)

54.
Balkesen, C., Tatbul, N.: Scalable data partitioning techniques for parallel sliding window processing over data streams. In: VLDB International Workshop on Data Management for Sensor Networks (DMSN’11). Seattle, WA, USA (2011)

55.
Ahmad, Y., Berg, B., Çetintemel, U., Humphrey, M., Hwang, J.-H., Jhingran, A., Maskey, A., Papaemmanouil, O., Rasin, A., Tatbul, N., Xing, W., Xing, Y., Zdonik, S.B.: Distributed operation in the Borealis stream processing engine. In: SIGMOD Conference, pp. 882–884 (2006)

56.
Apache Spark. http://spark.apache.org/
57.
Apache Storm. http://storm.incubator.apache.org/
58.
Amazon Kinesis. http://aws.amazon.com/kinesis/
59.
Gualtieri, M., Curran, R.: The Forrester Wave: Big Data Streaming Analytics Platforms, Q3 2014, 17 July 2014

60.
Tibco Streambase. http://www.streambase.com
61.
Ivanov, T., Niemann, R., Izberovic, S., Rosselli, M., Tolle, K., Zicari, R.V.: Performance evaluation of enterprise big data platforms with HiBench. In: 9th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2015), Helsinki, Finland, 20–22 Aug 2015

62.
Ivanov, T., Beer, M.: Performance evaluation of Spark SQL using BigBench. In: 6th Workshop on Big Data Benchmarking (6th WBDB), Toronto, Canada, 16–17 June 2015

63.
Rosselli, M., Niemann, R., Ivanov, T., Tolle, K., Zicari, R.V.: Benchmarking the availability and fault tolerance of Cassandra. In: 6th Workshop on Big Data Benchmarking (6th WBDB), Toronto, Canada, 16–17 June 2015

64.
TPC, TPC-H - Homepage. http://www.tpc.org/tpch/ (2015). 
65.
TPC Big Data Working Group, TPC-BD - Homepage TPC Big Data Working Group. http://www.tpc.org/tpcbd/default.asp (2015). 
66.
BigData Top100, 2013. http://bigdatatop100.org/ (2015). 
67.
Big Data Benchmarking Community, Big Data Benchmarking | Center for Large-scale Data Systems Research, Big Data Benchmarking Community. http://clds.ucsd.edu/bdbc/ (2015). Accessed 15 July 2015
68.
Chen, Y.: We don’t know enough to make a big data benchmark suite-an academia-industry view. Proc. WBDB (2012)

69.
Xiong, W., Yu, Z., Bei, Z., Zhao, J., Zhang, F., Zou, Y., Bai, X., Li, Y., Xu, C.: A characterization of big data benchmarks. In: 2013 IEEE International Conference on Big Data, pp. 118–125 (2013)

70.
Luo, C., Zhan, J., Jia, Z., Wang, L., Lu, G., Zhang, L., Xu, C.-Z., Sun, N.: CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications. Front. Comput. Sci. 6(4), 347–362 (2012)
71.
Chen, Y., Ganapathi, A., Griffith, R., Katz, R.: The case for evaluating MapReduce performance using workload suites. In: 2011 IEEE 19th International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 390–399 (2011)

72.
Chen, Y., Alspaugh, S., Katz, R.: Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads. Proc. VLDB Endow. 5(12), 1802–1813 (2012)
73.
Qin, X., Zhou, X.: A survey on benchmarks for big data and some more considerations. In: Intelligent Data Engineering and Automated Learning – IDEAL 2013, pp. 619–627. Springer (2013)

74.
Wang, L., Zhan, J., Luo, C., Zhu, Y., Yang, Q., He, Y., Gao, W., Jia, Z., Shi, Y., Zhang, S.: BigDataBench: a big data benchmark suite from internet services. arXiv:1401.1406 (2014)
75.
AMP Lab Big Data Benchmark. https://amplab.cs.berkeley.edu/benchmark/ (2015). 
76.
Patil, S., Polte, M., Ren, K., Tantisiriroj, W., Xiao, L., López, J., Gibson, G., Fuchs, A., Rinaldi, B.: YCSB++: benchmarking and performance debugging advanced features in scalable table stores. In: Proceedings of the 2nd ACM Symposium on Cloud Computing, p. 9 (2011)

77.
Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database benchmark based on the facebook social graph. In: Proceedings of the 2013 international conference on Management of data, pp. 1185–1196 (2013)

