Presents useful big data optimization applications in a variety of industries, for both academics and practitioners
Includes guidelines for using cloud computing and Hadoop in large-scale and big data optimization
Big Data: Who, What and Where? Social, Cognitive and Journals Map of Big Data Publications with Focus on Optimization
Setting Up a Big Data Project: Challenges, Opportunities, Technologies and Optimization
Roberto V. Zicari, Marten Rosselli, Todor Ivanov, Nikolaos Korfiatis, Karsten Tolle, Raik Niemann et al.
Florin Pop, Catalin Negru, Sorin N. Ciolofan, Mariana Mocanu, Valentin Cristea
Optimizing Access Policies for Big Data Repositories: Latency Variables and the Genome Commons
Jorge L. Contreras
Smart Sampling and Optimal Dimensionality Reduction of Big Data Using Compressed Sensing
Anastasios Maronidis, Elisavet Chatzilari, Spiros Nikolopoulos, Ioannis Kompatsiaris
Metaheuristics for Continuous Optimization of High-Dimensional Problems: State of the Art and Perspectives
Giuseppe A. Trunfio