"One size fits all": an idea whose time has come and gone, 21st International Conference on Data Engineering (ICDE'05), pp.2-11, 2005.
The digital universe of opportunities: Rich data and the increasing value of the internet of things, 2014.
Facts and Stats About The Big Data Industry, Webpage, http://cloudtweaks.com/2015/03/surprising-facts-and-stats-about-the-big-data-industry
The case for shared nothing, Database Engineering, vol.9, pp.4-9, 1986.
A comparison of approaches to large-scale data analysis, Proceedings of the 35th SIGMOD international conference on Management of data, SIGMOD '09.
DOI: 10.1145/1559845.1559865
Trends in Cleaning Relational Data: Consistency and Deduplication, Foundations and Trends® in Databases, vol.5, issue.4, pp.281-393, 2015.
DOI: 10.1561/1900000045
Gamma - a high performance dataflow database machine, Proceedings of the 12th International Conference on Very Large Data Bases, ser. VLDB '86, pp.228-237, 1986.
Streaming Data, Webpage, https://aws.amazon.com/streaming-data
Spark Streaming, Webpage, http://spark.apache.org/streaming/
Sybase IQ Multiplex - Designed For Analytics, Proceedings of the Thirtieth International Conference on Very Large Data Bases, ser. VLDB '04, VLDB Endowment, pp.1227-1230, 2004.
DOI: 10.1016/B978-012088469-8.50111-X
The Vertica analytic database, Proc. VLDB Endow., pp.1790-1801, 2012.
DOI: 10.14778/2367502.2367518
An overview of DB2 parallel edition, ACM SIGMOD Record, vol.24, issue.2, pp.460-462, 1995.
DOI: 10.1145/568271.223876
A Survey of Data Stream Processing Tools, Information Sciences and Systems, p.295, 2014.
DOI: 10.1007/978-3-319-09465-6_31
The data civilizer system, CIDR, 2017.
Improving Data Preparation for Business Analytics, Webpage, https://tdwi.org/research/2016/07/best-practices-report-improving-data-preparation-for-business-analytics
Gartner warns firms of 'dirty data', Information Management Journal, 2007.
Data Quality: Concepts, Methodologies and Techniques (Data-Centric Systems and Applications), 2006.
Hadoop: The Definitive Guide, 2009.
SQL-on-Hadoop, Proceedings of the VLDB Endowment, vol.7, issue.12, pp.1295-1306, 2014.
DOI: 10.14778/2732977.2733002
Impala: A modern, open-source SQL engine for Hadoop, CIDR, 2015.
The tail at scale, Communications of the ACM, vol.56, issue.2, 2013.
DOI: 10.1145/2408776.2408794
DiNoDB, Proceedings of the First International Workshop on Bringing the Value of "Big Data" to Users (Data4U 2014), Data4U '14, 2014.
DOI: 10.1145/2658840.2658841
HadoopDB, VLDB, 2009.
DOI: 10.14778/1687627.1687731
NoDB: efficient query execution on raw data files, SIGMOD, 2012.
Megastore: Providing scalable, highly available storage for interactive services, CIDR 2011, Fifth Biennial Conference on Innovative Data Systems Research Online Proceedings, www.cidrdb.org, pp.223-234, 2011.
The Hadoop Distributed File System, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
DOI: 10.1109/MSST.2010.5496972
Spanner, ACM Transactions on Computer Systems, vol.31, issue.3, pp.8:1-8:22, 2013.
DOI: 10.1145/2518037.2491245
MapReduce: Simplified Data Processing on Large Clusters, USENIX OSDI, 2004.
Hadoop++, VLDB, 2010.
DOI: 10.14778/1920841.1920908
Only aggressive elephants are fast elephants, Proc. of VLDB, pp.1591-1602, 2012.
DOI: 10.14778/2350229.2350272
CoHadoop, VLDB, 2011.
DOI: 10.14778/2002938.2002943
Column-oriented storage techniques for MapReduce, Proceedings of the VLDB Endowment, vol.4, issue.7, 2011.
DOI: 10.14778/1988776.1988778
RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems, Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pp.1199-1208, 2011.
Here are my Data Files. Here are my Queries. Where are my Results?, CIDR'11, pp.57-68, 2011.
MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!, Big Data, vol.1, issue.1, 2013.
DOI: 10.1089/big.2012.1501
F1, Proceedings of the VLDB Endowment, vol.6, issue.11, 2013.
DOI: 10.14778/2536222.2536232
The Hadoop distributed file system, Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), ser. MSST '10, 2010.
The VoltDB main memory DBMS, IEEE Data Eng. Bull., vol.36, issue.2, pp.21-27, 2013.
Shark, Proceedings of the 2013 international conference on Management of data, SIGMOD '13, 2013.
DOI: 10.1145/2463676.2465288
Spark: Cluster Computing with Working Sets, USENIX HotCloud, 2010.
Pig Latin, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD '08, 2008.
DOI: 10.1145/1376616.1376726
Webpage, www.vertica.com/
Hadoop
PostgreSQL
Impala: A modern, open-source SQL engine for Hadoop, Proc. CIDR '15, 2015.
COD: Database/operating system co-design, CIDR, 2013.
Spark: Cluster computing with working sets, Proc. of USENIX HotCloud, 2010.
Spark SQL, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.1383-1394, 2015.
DOI: 10.1007/3-540-59451-5_2
Discretized streams, Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, 2013.
DOI: 10.1145/2517349.2522737
The Lambda Architecture, Webpage
Some methods for classification and analysis of multivariate observations, Proc. of 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.
A density-based algorithm for discovering clusters in large spatial databases with noise, Proc. of the 2nd International Conference on Knowledge Discovery and Data Mining, 1996.
Supporting Memory Storage in HDFS
Invisible loading, Proceedings of the 16th International Conference on Extending Database Technology, EDBT '13, 2013.
DOI: 10.1145/2452376.2452377
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation, Advances in neural information processing systems, 2006.
DOI: 10.21236/ADA629956
Latent Dirichlet Allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.
Interactive query processing in big data systems: A cross-industry study of MapReduce workloads, Proc. of VLDB, 2012.
DOI: 10.21236/ADA561769
Hadoop's adolescence, Proc. of VLDB, 2013.
DOI: 10.14778/2536206.2536213
Tachyon, Proceedings of the ACM Symposium on Cloud Computing, SOCC '14, 2014.
DOI: 10.1145/2517349.2522737
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm, DMTCS Proceedings, 2008.
URL: https://hal.archives-ouvertes.fr/hal-00406166
hatS: A Heterogeneity-Aware Tiered Storage for Hadoop, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.502-511, 2014.
DOI: 10.1109/CCGrid.2014.51
URL: http://people.cs.vt.edu/~butta/docs/ccgrid2014-hats.pdf
MISO, Proceedings of the 2014 ACM SIGMOD international conference on Management of data, SIGMOD '14, 2014.
DOI: 10.1145/2588555.2588568
Resilient Distributed Datasets, NSDI, 2012.
DOI: 10.1145/2886107.2886110
Parallel in-situ data processing with speculative loading, Proceedings of the 2014 ACM SIGMOD international conference on Management of data, SIGMOD '14, p.14.
DOI: 10.1145/2588555.2593673
SCANRAW, ACM Transactions On Database Systems, 2015.
DOI: 10.1109/PACT.2011.9
GLADE, Proceedings of the 2012 international conference on Management of Data, SIGMOD '12.
DOI: 10.1145/2213836.2213936
Dremel: Interactive analysis of web-scale datasets, Proc. of the 36th Int'l Conf on Very Large Data Bases, 2010.
Webpage, https://en.wikipedia.org/wiki/FITS
Reservoir sampling
Incremental Detection of Inconsistencies in Distributed Data, 2012 IEEE 28th International Conference on Data Engineering, pp.1367-1383, 2014.
DOI: 10.1109/ICDE.2012.82
Incremental Detection of Inconsistencies in Distributed Data, 2012 IEEE 28th International Conference on Data Engineering, pp.318-329, 2012.
DOI: 10.1109/ICDE.2012.82
Dissecting UbuntuOne, Proceedings of the 2015 ACM Conference on Internet Measurement Conference, IMC '15, pp.155-168, 2015.
DOI: 10.1109/ICC.2014.6883506
BigDansing, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.1215-1230, 2015.
DOI: 10.1145/2463676.2463706
Messing up with BART, Proc. of VLDB, pp.36-47, 2015.
DOI: 10.14778/2850578.2850579
NADEEF, Proceedings of the 2013 international conference on Management of data, SIGMOD '13, pp.541-552, 2013.
DOI: 10.1145/2463676.2465327
Holistic data cleaning: Putting violations into context, 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp.458-469, 2013.
DOI: 10.1109/ICDE.2013.6544847
A pipelined framework for online cleaning of sensor data streams, Tech. Rep, 2005.
A model-based approach for RFID data stream cleansing, Proc. of CIKM, pp.862-871, 2012.
SCREEN, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.827-841, 2015.
DOI: 10.1109/ICDE.2007.367867
Scalable Distributed Stream Join Processing, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.811-825, 2015.
DOI: 10.1109/TKDE.2015.2427795
Scalable and adaptive online joins, Proc. of VLDB, pp.441-452, 2014.
DOI: 10.14778/2732279.2732281
URL: http://infoscience.epfl.ch/record/190035/files/paper.pdf
Crowd-Based Deduplication, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.1263-1277, 2015.
DOI: 10.14778/2536336.2536337
Katara: A data cleaning system powered by knowledge bases and crowdsourcing, Proc. of SIGMOD, pp.1247-1261, 2015.
Sampling the repairs of functional dependency violations under hard constraints, Proc. of VLDB, pp.197-207, 2010.
DOI: 10.14778/1920841.1920870
Conditional Functional Dependencies for Data Cleaning, 2007 IEEE 23rd International Conference on Data Engineering, pp.746-755, 2007.
DOI: 10.1109/ICDE.2007.367920
Proof positive and negative in data cleaning, 2015 IEEE 31st International Conference on Data Engineering.
DOI: 10.1109/ICDE.2015.7113269
Repairing Functional Dependency Violations in Distributed Data, DASFAA, pp.441-457, 2015.
DOI: 10.1007/978-3-319-18120-2_26
On approximating optimum repairs for functional dependency violations, Proceedings of the 12th International Conference on Database Theory, ICDT '09, pp.53-62, 2009.
DOI: 10.1145/1514894.1514901
Continuous data cleaning, 2014 IEEE 30th International Conference on Data Engineering, pp.244-255, 2014.
DOI: 10.1109/ICDE.2014.6816655
The dataflow model: A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing, Proc. of VLDB, pp.1792-1803, 2015.
Liquid: Unifying nearline and offline big data integration, CIDR, 2015.
Temporal rules discovery for web data cleaning, Proc. of VLDB, pp.336-347, 2015.
DOI: 10.14778/2856318.2856328
Webpage, https://spark-summit.org/east-2015/streaming-machine-learning-in-spark
GraphX, First International Workshop on Graph Data Management Experiences and Systems, GRADES '13, 2013.
DOI: 10.1145/2484425.2484427
Spark stream cleaning, Webpage
A cost-based model and effective heuristic for repairing constraints by value modification, Proc. of SIGMOD, pp.143-154, 2005.
DOI: 10.1145/1066157.1066175
URL: http://homepages.inf.ed.ac.uk/wenfei/papers/sigmod05.pdf
The 8 requirements of real-time stream processing, ACM SIGMOD Record, vol.34, issue.4, pp.42-47, 2005.
DOI: 10.1145/1107499.1107504
URL: http://www.sigmod.org/publications/sigmod-record/0512/p42-article-stonebraker.pdf
The LLUNATIC data-cleaning framework, Proc. of VLDB, pp.625-636, 2013.
Adaptive cleaning for RFID data streams, Proceedings of the 32nd International Conference on Very Large Data Bases, ser. VLDB '06, VLDB Endowment, pp.163-174, 2006.
Sequential Data Cleaning, Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, pp.909-924, 2016.
DOI: 10.1145/2463676.2463706