Monday, September 30, 2013

[repost ]MySQL Forums :: Fabric, Sharding, HA, Utilities :: MySQL Fabric: Blogs, Presentations

original:http://forums.mysql.com/read.php?144,545664,545664#msg-545664

MySQL Fabric: Presentations
http://www.slideshare.net/nixnutz/mysql-57-fabric-high-availability-and-sharding

MySQL Fabric: Blogs
http://mysqlmusings.blogspot.com/2013/09/brief-introduction-to-mysql-fabric.html
http://vnwrites.blogspot.com/2013/09/mysqlfabric-sharding-introduction.html
http://vnwrites.blogspot.com/2013/09/mysqlfabric-sharding-example.html
http://vnwrites.blogspot.com/2013/09/mysqlfabric-sharding-migration.html
http://vnwrites.blogspot.com/2013/09/mysqlfabric-sharding-maintenance.html
http://alfranio-distributed.blogspot.com/2013/09/tips-to-build-fault-tolerant-database.html
http://alfranio-distributed.blogspot.com/2013/09/writing-fault-tolerant-database.html
https://blogs.oracle.com/jbalint/entry/mysql_connector_j_with_fabric
http://geert.vanderkelen.org/mysql-fabric-support-in-connectorpython/
http://blog.ulf-wendel.de/2013/mysql-5-7-fabric-any-good/

=====================================

Presentations:
http://www.slideshare.net/ryanthiessen/mysql-conference-2011-the-secret-sauce-of-sharding-ryan-thiessen
https://github.com/tumblr/jetpants/blob/master/doc/VelocityEurope2011Presentation.pdf?raw=true
http://www.slideshare.net/oemebamo/database-sharding-at-netlog
http://www.slideshare.net/RockeTier/database2011-my-sql-sharding
http://www.slideshare.net/RockeTier/how-sharding-turned-mysql-into-the-internet-defacto-database-standard-moshe-kaplan-rocketier
http://www.slideshare.net/guest0e6d5e/sharding-architectures

Blogs:
http://blog.ulf-wendel.de/2012/shard-key-support-and-cache-locality-optimization-support-for-php-mysql-driver/
http://blog.ulf-wendel.de/2011/php-load-balancing-and-sharding-combined-with-mysqlnd/
http://blog.evernote.com/tech/2012/04/26/shard-boiled/
http://mysqldba.blogspot.in/2012/03/mysql-shards-gearman-rabbitmq-nodejs.html
http://blogs.citrix.com/2011/07/15/db-sharding-with-netscaler-datastream/

Projects:
https://github.com/tumblr/jetpants
https://github.com/twitter/gizzard
http://code.google.com/p/shard-query/
http://www.slideshare.net/datacharmer/sharding-for-the-masses
http://www.slideshare.net/Kentoku/spider-muc2010-20100414
https://github.com/tchandy/octopus/wiki/Sharding
http://www.slideshare.net/tchandy/projeto-octopus-database-sharding-para-activerecord
http://kovyrin.github.com/db-charmer/#sharding



via WordPress http://blog.newitfarmer.com/imd/mysql/12947/repost-mysql-forums-fabric-sharding-ha-utilities-mysql-fabric-blogs-presentations#utm_source=rss&utm_medium=rss&utm_campaign=repost-mysql-forums-fabric-sharding-ha-utilities-mysql-fabric-blogs-presentations

Labels:

[project ]nearinfinity/hbase-benchmark

original:https://github.com/nearinfinity/hbase-benchmark HBase Benchmark HBase benchmark is a tool to test an HBase cluster. It contains a read test for Honeycomb and read/write tests for HBase directly. The required flags for the Honeycomb test are sqlTable, columnFamily, indexName and toolTable. Usage The arguments to the benchmark tool are: -autoFlush Enable auto flush of write buffer to [...]



via WordPress http://blog.newitfarmer.com/big_data/hbase/12940/project-nearinfinityhbase-benchmark#utm_source=rss&utm_medium=rss&utm_campaign=project-nearinfinityhbase-benchmark

Labels:

Thursday, September 26, 2013

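[repost ]Apache Tez: A Generalization of MapReduce Data Processing

original:http://www.infoq.com/cn/news/2013/09/TEZ

As discussed in a recent InfoQ article, Hortonworks' new Stinger Initiative relies heavily on Tez, a new Hadoop data processing framework.

The blog post "Apache Tez: A New Chapter in Hadoop Data Processing" states:

"Higher-level data processing applications such as Hive and Pig need an execution framework that can express their complex query logic efficiently and then execute it with high performance. Apache Tez ... offers an alternative to traditional MapReduce that lets jobs meet demands for fast response times and extreme throughput at petabyte scale."

To achieve this, Tez does not model data processing as a single job but as a dataflow graph:

"... the vertices in the graph represent application logic and the edges represent movement of data. A rich dataflow definition API lets users express complex query logic in an intuitive way, and it is a natural fit for the query plans produced by higher-level declarative applications such as Hive and Pig ... The dataflow pipeline can be expressed as a single Tez job that runs the entire computation. Tez takes care of expanding this logical graph into a physical graph of tasks and executing it."

Within a Tez vertex, user logic is modeled as input, processor, and output modules. The input and output modules define the input and output data (format, access method, and location), while the processor module defines the data transformation logic, which can be expressed, for example, as a MapReduce Mapper or Reducer. Tez does not impose any particular data format, but it does require the input, output, and processor of a vertex to be compatible with one another. Similarly, an input/output pair connected by an edge must be compatible in format and location.

The blog post "Data Processing API in Apache Tez" describes a simple Java API for expressing a data processing DAG (directed acyclic graph). The API has three parts:

* DAG: defines the overall job. The user creates one DAG object per data processing job.
* Vertex: defines the user logic, together with the resources and environment needed to run it. The user creates a Vertex object for each step of the job and adds it to the DAG.
* Edge: defines the connection between a producer vertex and a consumer vertex. The user creates an Edge object to connect the two.

The edge properties defined by Tez allow it to instantiate user tasks, configure their inputs and outputs, schedule them appropriately, and decide how data is routed between tasks. Tez also lets the user define the parallelism of each vertex by specifying user guidance, data size, and resources. The edge properties are:

* Data movement: defines how data is routed between tasks.
  * One-to-one: data from the i-th producer task is routed to the i-th consumer task.
  * Broadcast: data from a producer task is routed to all consumer tasks.
  * Scatter-gather: producer tasks scatter their data into shards and consumer tasks gather the shards. The i-th shard from every producer task is routed to the i-th consumer task.
* Scheduling: defines when a consumer task is scheduled.
  * Sequential: the consumer task is scheduled after a producer task completes.
  * Concurrent: the consumer task must run at the same time as a producer task.
* Data source: defines the lifetime and reliability of a task's output.
  * Persisted: the output remains available after the task exits, although it may be lost later.
  * Persisted-reliable: the output is stored reliably and will always be available.
  * Ephemeral: the output is available only while the producer task is running.

For more details on the Tez architecture, see the Tez design document.

Expressing data processing as a dataflow is not a new idea: it is the foundation of Cascading, and many applications built on Oozie achieve the same thing. The advantage of Tez is that it puts all of this into a single framework and optimizes that framework for resource management (based on Apache Hadoop YARN), data transfer, and execution. In addition, the Tez design supports pluggable vertex management modules that collect information from tasks and change the dataflow graph at runtime in order to optimize performance and resource usage.

English original: Apache Tez: a Generalization of the MapReduce Data Processing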
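The three data movement types described above (one-to-one, broadcast, and scatter-gather) can be illustrated with a small routing function. This is only a sketch in plain JavaScript, not the Tez API (which is Java): the function name `routeEdge`, the movement strings, and the shape of the returned records are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the Tez API: list which consumer task receives
// which piece of output from each producer task, for one edge of the DAG.
function routeEdge(movement, producerCount, consumerCount) {
  if (movement === "one-to-one" && producerCount !== consumerCount) {
    throw new Error("one-to-one needs equal producer and consumer counts");
  }
  const transfers = [];
  for (let p = 0; p < producerCount; p++) {
    if (movement === "one-to-one") {
      // The i-th producer task feeds the i-th consumer task.
      transfers.push({ producer: p, shard: 0, consumer: p });
    } else if (movement === "broadcast") {
      // Every consumer task receives the producer's whole output.
      for (let c = 0; c < consumerCount; c++) {
        transfers.push({ producer: p, shard: 0, consumer: c });
      }
    } else if (movement === "scatter-gather") {
      // The producer splits its output into one shard per consumer;
      // shard i from every producer goes to consumer i.
      for (let s = 0; s < consumerCount; s++) {
        transfers.push({ producer: p, shard: s, consumer: s });
      }
    } else {
      throw new Error("unknown data movement: " + movement);
    }
  }
  return transfers;
}

// Two producers scattering to three consumers yield six transfers.
console.log(routeEdge("scatter-gather", 2, 3));
```

Broadcast and scatter-gather produce the same number of transfers here; the difference is that a broadcast consumer sees the full output of each producer, while a scatter-gather consumer sees only its own shard.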



via WordPress http://blog.newitfarmer.com/anls/mapreduce/12920/repost-apache-tez-%e5%af%b9mapreduce%e6%95%b0%e6%8d%ae%e5%a4%84%e7%90%86%e7%9a%84%e5%bd%92%e7%ba%b3#utm_source=rss&utm_medium=rss&utm_campaign=repost-apache-tez-%25e5%25af%25b9mapreduce%25e6%2595%25b0%25e6%258d%25ae%25e5%25a4%2584%25e7%2590%2586%25e7%259a%2584%25e5%25bd%2592%25e7%25ba%25b3

Labels:

[repost ]Neural Networks for Machine Learning Lecture 6a Overview of mini-batch gradient descent

original:https://d396qusza40orc.cloudfront.net/neuralnets/lecture_slides%2Flec6.pdf



via WordPress http://blog.newitfarmer.com/ai/machine-learning/12914/repost-neuralnetworks-for-machinelearning-lecture6a-overviewofmini-batchgradientdescent#utm_source=rss&utm_medium=rss&utm_campaign=repost-neuralnetworks-for-machinelearning-lecture6a-overviewofmini-batchgradientdescent

Labels:

[repost ]HBase Performance Testing

http://hstack.org/hbase-performance-testing/



via WordPress http://blog.newitfarmer.com/big_data/hbase/12905/repost-hbase-performance-testing#utm_source=rss&utm_medium=rss&utm_campaign=repost-hbase-performance-testing

Labels:

[repost ]How Google Dremel Works: Analyzing 1 PB in 3 Seconds





via WordPress http://blog.newitfarmer.com/anls/analytics-bi/12907/repost-google-dremel-%e5%8e%9f%e7%90%86-%e5%a6%82%e4%bd%95%e8%83%bd3%e7%a7%92%e5%88%86%e6%9e%901pb-2#utm_source=rss&utm_medium=rss&utm_campaign=repost-google-dremel-%25e5%258e%259f%25e7%2590%2586-%25e5%25a6%2582%25e4%25bd%2595%25e8%2583%25bd3%25e7%25a7%2592%25e5%2588%2586%25e6%259e%25901pb-2

Labels:

Wednesday, September 25, 2013

[repost ]HBase performances/load tests.

original:http://www.spaggiari.org/index.php/hbase/hbase-performances-load-tests

There are multiple ways to measure HBase performance. There are tools included in HBase, external tools, and even home-made scripts. Let's try to list them first.

Tools included in HBase:
* org.apache.hadoop.hbase.PerformanceEvaluation
* org.apache.hadoop.hbase.util.LoadTestTool

External tools:
* YCSB

Home-made scripts:
* DIY

If you know other performance/load test tools for HBase, feel free to let me [...]



via WordPress http://blog.newitfarmer.com/big_data/hbase/12897/repost-hbase-performancesload-tests#utm_source=rss&utm_medium=rss&utm_campaign=repost-hbase-performancesload-tests

Labels:

[repost ]YCSB:Yahoo! Cloud Serving Benchmark

original:https://github.com/brianfrankcooper/YCSB/

Yahoo! Cloud Serving Benchmark (YCSB)
====================================

Links
-----
http://wiki.github.com/brianfrankcooper/YCSB/
http://research.yahoo.com/Web_Information_Management/YCSB
ycsb-users@yahoogroups.com

Getting Started
---------------

1. Download the latest release of YCSB:

    wget https://github.com/downloads/brianfrankcooper/YCSB/ycsb-0.1.4.tar.gz
    tar xfvz ycsb-0.1.4.tar.gz
    cd ycsb-0.1.4

2. Set up a database to benchmark. There is a README file under each binding directory.

3. Run YCSB command.

    bin/ycsb load basic -P workloads/workloada
    bin/ycsb [...]



via WordPress http://blog.newitfarmer.com/big_data/hbase/12895/repost-ycsbyahoo-cloud-serving-benchmark#utm_source=rss&utm_medium=rss&utm_campaign=repost-ycsbyahoo-cloud-serving-benchmark

Labels:

[repost ]The Log-Structured Merge-Tree (LSM-Tree) (1996)

original:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.2782 Abstract. High-performance transaction system applications typically insert rows in a History table to provide an activity trace; at the same time the transaction system generates log records for purposes of system recovery. Both types of generated information can benefit from efficient indexing. An example in a well-known setting is the TPC-A benchmark application, [...]



via WordPress http://blog.newitfarmer.com/software-develop/arithmetic/12876/repost-the-log-structured-merge-tree-lsm-tree-1996#utm_source=rss&utm_medium=rss&utm_campaign=repost-the-log-structured-merge-tree-lsm-tree-1996

Labels:

Sunday, September 22, 2013

[repost ]Statistical Data Mining Tutorials Tutorial Slides by Andrew Moore

original:http://www.autonlab.org/tutorials/ Advertisement: In 2006 I joined Google. We are growing a Google Pittsburgh office on CMU’s campus. We are hiring creative computer scientists who love programming, and Machine Learning is one of the focus areas of the office. We’re also currently accepting resumes for Fall 2008 internships. If you might be interested, feel welcome to send [...]



via WordPress http://blog.newitfarmer.com/ai/data-mining/12861/repost-statistical-data-mining-tutorials-tutorial-slides-by-andrew-moore#utm_source=rss&utm_medium=rss&utm_campaign=repost-statistical-data-mining-tutorials-tutorial-slides-by-andrew-moore

Labels:

[repost ]Tutorial 8: Speech Translation: Theory and Practice

original:http://www.icassp2013.com/Tutorial_08.asp Tutorial 8: Speech Translation: Theory and Practice Monday, May 27, 9 am-12 noon Presented by Bowen Zhou, Xiaodong He Abstract In this tutorial, we first survey the latest statistical machine translation (SMT) technologies, presenting both theoretic and practical perspectives that are most relevant to speech translation (ST). Next, we review key learning problems, and [...]



via WordPress http://blog.newitfarmer.com/ai/machine-translation/12864/repost-tutorial-8-speech-translation-theory-and-practice#utm_source=rss&utm_medium=rss&utm_campaign=repost-tutorial-8-speech-translation-theory-and-practice

Labels:

[repost ]SPEECH TRANSLATION: THEORY AND PRACTICES

original:http://researcher.watson.ibm.com/researcher/files/us-zhou/Speech%20Translation%20Tutorial.pdf SPEECH TRANSLATION: THEORY AND PRACTICES Bowen Zhou and Xiaodong He (zhou@us.ibm.com, xiaohe@microsoft.com), May 27th, 2013 @ ICASSP 2013. Universal Translator: dream (will) come true or yet another over-promise? • Spoken communication without a language barrier: mankind’s long-standing dream – Translate human speech [...]



via WordPress http://blog.newitfarmer.com/ai/machine-translation/12862/repost-speech-translation-theory-and-practices#utm_source=rss&utm_medium=rss&utm_campaign=repost-speech-translation-theory-and-practices

Labels:

[repost ]A Massive List of Python Learning Resources

original:https://github.com/kirang89/pycrumbs/blob/master/pycrumbs.md Contents Beginner’s Delight Style Guide and Idioms Dictionary Decorators Generators Iterators Yield Context Managers Unicode Networking Metaclasses Documentation Sphinx Debugging Testing Profiling Packaging Deployment Fabric Warts and Gotchas Web Frameworks Flask Web2Py Django Bottle API and Web Services Scraping Mobile Development Kivy Google Glass Resources Libraries GUI Programming WSGI Databases SQLAlchemy Data Mining Data [...]



via WordPress http://blog.newitfarmer.com/python/library/12859/repost-%e6%b5%b7%e9%87%8f-python-%e5%ad%a6%e4%b9%a0%e8%b5%84%e6%ba%90%e5%88%97%e8%a1%a8#utm_source=rss&utm_medium=rss&utm_campaign=repost-%25e6%25b5%25b7%25e9%2587%258f-python-%25e5%25ad%25a6%25e4%25b9%25a0%25e8%25b5%2584%25e6%25ba%2590%25e5%2588%2597%25e8%25a1%25a8

Labels:

[repost ]Samza 0.7 document

original:http://samza.incubator.apache.org/learn/documentation/0.7.0/ Documentation Introduction Background Concepts Architecture Comparisons Introduction MUPD8 Storm API Overview Javadocs Container TaskRunner Streams Checkpointing State Management Metrics Windowing Event Loop JMX Jobs JobRunner Configuration Packaging YARN Jobs Logging YARN Application Master Isolation Operations Security Kafka



via WordPress http://blog.newitfarmer.com/big_data/streams/samza/12854/repost-samza-0-7-document#utm_source=rss&utm_medium=rss&utm_campaign=repost-samza-0-7-document

Labels:

[repost ]Peregrine :Map Reduce and Bigtable Done Right.

original:http://peregrine.io/ Map Reduce and Bigtable Done Right. Peregrine is a map reduce framework designed for running iterative jobs across partitions of data. Peregrine is designed to be FAST for executing map reduce jobs by supporting a number of optimizations and features not present in other map reduce frameworks. Get started today [...]



via WordPress http://blog.newitfarmer.com/big_data/hadoop/12845/repost-peregrine-map-reduce-and-bigtable-done-right#utm_source=rss&utm_medium=rss&utm_campaign=repost-peregrine-map-reduce-and-bigtable-done-right

Labels:

Tuesday, September 17, 2013

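[repost ]"Social Analysis" is an open-source community dedicated to advancing the science, engineering, and applications of social networks.

original:http://socialysis.org/?lang=zh "Social Analysis" is an open-source community dedicated to advancing the science, engineering, and applications of social networks. The community offers both custom and public services, and encourages engineers and researchers to communicate and share with one another. "Social Analysis" stands for openness, innovation, and honesty, and aims to build a healthy, orderly open-source ecosystem that promotes the exchange of ideas and the sharing of resources. The scope of Social Analysis covers: (1) sharing of data, algorithms, and tools; (2) algorithm evaluation; (3) publication of related news.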



via WordPress http://blog.newitfarmer.com/anls/social-analytics-anls/12843/repost-social-analysis%e6%98%af%e4%b8%80%e4%b8%aa%e8%87%b4%e5%8a%9b%e4%ba%8e%e6%8e%a8%e5%8a%a8%e7%a4%be%e4%ba%a4%e7%bd%91%e7%bb%9c%e7%9b%b8%e5%85%b3%e7%9a%84%e7%a7%91%e5%ad%a6#utm_source=rss&utm_medium=rss&utm_campaign=repost-social-analysis%25e6%2598%25af%25e4%25b8%2580%25e4%25b8%25aa%25e8%2587%25b4%25e5%258a%259b%25e4%25ba%258e%25e6%258e%25a8%25e5%258a%25a8%25e7%25a4%25be%25e4%25ba%25a4%25e7%25bd%2591%25e7%25bb%259c%25e7%259b%25b8%25e5%2585%25b3%25e7%259a%2584%25e7%25a7%2591%25e5%25ad%25a6

Labels:

Monday, September 16, 2013

[repost ]Hadoop : Beyond Map-Reduce

original:http://qconsf.com/track/hadoop-beyond-map-reduce Hadoop, the open-source combination of Map-Reduce libraries and the Hadoop Distributed File System (HDFS) has long been an essential tool in any enterprise or startup. Data scientists use Hadoop to execute statistical and analytical functions on large volumes of data. Data Infrastructure and Search engineers use Hadoop to generate ready-to-load indexes for custom search [...]



via WordPress http://blog.newitfarmer.com/big_data/hadoop/12835/repost-hadoop-beyond-map-reduce#utm_source=rss&utm_medium=rss&utm_campaign=repost-hadoop-beyond-map-reduce

Labels:

[repost ]Teach kids programming

original:https://medium.com/p/a2dc04ea9529 Teach kids programming A collection of resources I’ve been gathering the best resources to teach children & teens programming — books, environments, apps, courseware and games. These resources are meant for teachers and parents who want to have their children fall in love with computers and see the magic of programming. I’m staying [...]



via WordPress http://blog.newitfarmer.com/programming/common-programming/12825/repost-teach-kids-programming#utm_source=rss&utm_medium=rss&utm_campaign=repost-teach-kids-programming

Labels:

[repost ]A Course in Machine Learning

original:http://ciml.info/ Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need to make sense of data is a potential consumer of machine learning. CIML is a set of [...]



via WordPress http://blog.newitfarmer.com/ai/machine-learning/12818/repost-a-course-in-machine-learning#utm_source=rss&utm_medium=rss&utm_campaign=repost-a-course-in-machine-learning

Labels:

Thursday, September 12, 2013

[repost ]Functional Requirements and Their Poor Cousins: The Truth About Non-Functional Requirements (NFRs)

original:http://www.outsystems.com/blog/2013/03/the-truth-about-non-functional-requirements-nfrs.html Whenever anybody says Functional Requirement, I think of princesses. I think of Ariel and Cinderella. I think of how each is central to her story and embodies a specific identity, and then I think of the princess who stands out as a true metaphor for functional requirements – the one who reflects the role [...]



via WordPress http://blog.newitfarmer.com/software-develop/requirement/12811/repost-functional-requirements-and-their-poor-cousins-the-truth-about-non-functional-requirements-nfrs#utm_source=rss&utm_medium=rss&utm_campaign=repost-functional-requirements-and-their-poor-cousins-the-truth-about-non-functional-requirements-nfrs

Labels:

[repost ]Cloudera:Installing and Configuring an External PostgreSQL Database

original:http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.5.4/Cloudera-Manager-Enterprise-Edition-Installation-Guide/cmeeig_topic_5_6.html Use the following instructions to install PostgreSQL and set up a database on the appropriate hosts. It’s useful to set a password for the root user of PostgreSQL. Note the host name and port number where you install PostgreSQL because you will need to specify them when you install the JDBC connector to PostgreSQL [...]



via WordPress http://blog.newitfarmer.com/big_data/cloudera-big_data/12793/repost-clouderainstalling-and-configuring-an-external-postgresql-database#utm_source=rss&utm_medium=rss&utm_campaign=repost-clouderainstalling-and-configuring-an-external-postgresql-database

Labels:

[repost ]Cloudera:Changing Embedded PostgreSQL Database Passwords

original:http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.5.3/Cloudera-Manager-Enterprise-Edition-Installation-Guide/cmeeig_topic_19_3.html When Cloudera Manager installs and configures embedded PostgreSQL databases, it creates user accounts and passwords. You may wish to change passwords associated with the embedded PostgreSQL database accounts. To change these passwords, you must know what the original password was, but since the accounts were automatically created, this information is often unknown. To achieve [...]



via WordPress http://blog.newitfarmer.com/big_data/cloudera-big_data/12791/repost-clouderachanging-embedded-postgresql-database-passwords#utm_source=rss&utm_medium=rss&utm_campaign=repost-clouderachanging-embedded-postgresql-database-passwords

Labels:

Wednesday, September 11, 2013

[repost ]Machine Learning Hot Topics in the Big Data Era: Impressions from ICML 2013, the International Conference on Machine Learning

original:http://www.csdn.net/article/2013-09-05/2816831



via WordPress http://blog.newitfarmer.com/ai/machine-learning/12782/repost-%e5%a4%a7%e6%95%b0%e6%8d%ae%e6%97%b6%e4%bb%a3%e7%9a%84%e6%9c%ba%e5%99%a8%e5%ad%a6%e4%b9%a0%e7%83%ad%e7%82%b9-%e5%9b%bd%e9%99%85%e6%9c%ba%e5%99%a8%e5%ad%a6%e4%b9%a0%e5%a4%a7#utm_source=rss&utm_medium=rss&utm_campaign=repost-%25e5%25a4%25a7%25e6%2595%25b0%25e6%258d%25ae%25e6%2597%25b6%25e4%25bb%25a3%25e7%259a%2584%25e6%259c%25ba%25e5%2599%25a8%25e5%25ad%25a6%25e4%25b9%25a0%25e7%2583%25ad%25e7%2582%25b9-%25e5%259b%25bd%25e9%2599%2585%25e6%259c%25ba%25e5%2599%25a8%25e5%25ad%25a6%25e4%25b9%25a0%25e5%25a4%25a7

Labels:

[repost ]giraph:How to write a Page Rank application

original:http://giraph.apache.org/pagerank.html In this example, we will detail a very simple implementation of the page rank algorithm and how input/output works in Giraph. At the end of this short tutorial, you should have a simple working piece of code that will run on a real cluster. Choose your graph generic types Giraph implements bulk synchronous parallel [...]



via WordPress http://blog.newitfarmer.com/anls/giraph/12774/repost-giraphhow-to-write-a-page-rank-application#utm_source=rss&utm_medium=rss&utm_campaign=repost-giraphhow-to-write-a-page-rank-application

Labels:

Tuesday, September 10, 2013

[repost ]Hama Graph Tutorial

original:http://hama.apache.org/hama_graph_tutorial.html This document describes the Graph computing framework and serves as a tutorial. Overview Hama includes the Graph package for vertex-centric graph computations. Hama’s Graph package allows you to program Google’s Pregel-style applications with a simple programming interface. Vertex API Writing a Hama graph application involves subclassing the predefined Vertex class. Its template arguments define [...]



via WordPress http://blog.newitfarmer.com/anls/huma/12770/repost-hama-graph-tutorial#utm_source=rss&utm_medium=rss&utm_campaign=repost-hama-graph-tutorial

Labels:

[repost ]Hama BSP Tutorial

original:http://hama.apache.org/hama_bsp_tutorial.html This document describes the Hama BSP framework and serves as a tutorial. Overview Hama provides a pure BSP (Bulk Synchronous Parallel) model for message passing and collective communication. A BSP program consists of a sequence of supersteps. Each superstep consists of the following three phases: (1) local computation, (2) process communication, (3) barrier synchronization. BSP programming enables [...]
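The three phases of a superstep can be sketched with a tiny single-process simulation. This is not the Hama API (which is Java); `runBsp`, its callback signature, and the sample values are hypothetical names chosen for illustration. Messages sent during one superstep only become visible after the barrier, in the next superstep.

```javascript
// Hypothetical single-process simulation of the BSP model, not the Hama API.
function runBsp(peerCount, supersteps, compute) {
  let inboxes = Array.from({ length: peerCount }, () => []);
  const states = Array.from({ length: peerCount }, () => ({}));
  for (let step = 0; step < supersteps; step++) {
    const outboxes = Array.from({ length: peerCount }, () => []);
    for (let peer = 0; peer < peerCount; peer++) {
      // Phases 1 and 2: local computation, which may send messages to peers.
      const send = (to, message) => outboxes[to].push(message);
      compute({ peer, step, inbox: inboxes[peer], state: states[peer], send });
    }
    // Phase 3: barrier. Messages are delivered only once every peer is done.
    inboxes = outboxes;
  }
  return states;
}

// Example: sum one value per peer on peer 0, using two supersteps.
const values = [3, 5, 7, 11];
const result = runBsp(values.length, 2, ({ peer, step, inbox, state, send }) => {
  if (step === 0) {
    send(0, values[peer]); // superstep 0: everyone reports to peer 0
  } else if (peer === 0) {
    state.total = inbox.reduce((a, b) => a + b, 0); // superstep 1: aggregate
  }
});
console.log(result[0].total); // 26
```

Swapping the inboxes only at the end of the loop body is what stands in for the barrier: no peer can read a message sent in the same superstep.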



via WordPress http://blog.newitfarmer.com/anls/huma/12767/repost-hama-bsp-tutorial#utm_source=rss&utm_medium=rss&utm_campaign=repost-hama-bsp-tutorial

Labels:

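[repost ]A Comparison of 12 JavaScript MVC Frameworks

original:http://www.admin10000.com/document/1390.html

Gordon L. Hempton, a hacker and designer in Seattle, spent several months studying and comparing 12 popular JavaScript MVC frameworks and summarized the strengths and weaknesses of each on his blog. In the end, Ember.js came out on top.

The comparison used four criteria:

* UI Bindings
* Composed Views
* Web Presentation Layer
* Plays Nicely with Others

Gordon summarized the pros and cons of each framework:

* Backbone.js. Pros: strong community, strong momentum. Cons: weak abstractions, many features still need to be added.
* SproutCore. Pros: binding support, solid community, a large number of features. Cons: overly prescriptive, hard to decouple from features you do not need.
* Sammy.js. Pros: easy to learn, easier to integrate with existing server-side applications. Cons: too simple to be used for large applications.
* Spine.js. Pros: lightweight, well documented. Cons: its core concept, "spine", is an asynchronous user interface, which means that ideally the UI never blocks; this foundation is flawed.
* Cappuccino. Pros: a large, well-thought-out framework, good community, excellent inheritance model. Cons: created by iOS developers, it uses JavaScript to emulate Objective-C.
* Knockout.js. Pros: binding support, thorough documentation and tutorials. Cons: clumsy binding syntax, no unified hierarchy of view components.
* Javascript MVC. Pros: solid community. Cons: a poor string-based inheritance model; controllers are too tightly coupled to views and there are no bindings.
* GWT (Google Web Toolkit). Pros: comprehensive framework, good community, solid Java-based component inheritance model. Cons: may not stand the test of time, and Java is an awkward abstraction on the client side.
* Google Closure. Pros: a very good component-based UI composition system. Cons: no UI binding support.
* Ember.js. Pros: a very rich template system with composed views and UI bindings. Cons: relatively new, documentation is incomplete.
* Angular.js. Pros: template scoping and controller design are well thought out, it has a dependency injection system, and it supports a rich UI binding syntax. Cons: code is not very modular, and neither are views.
* Batman.js. Pros: clean code, simple approach to binding and persistence. Cons: uses singleton controllers.

After comparing the features of all these JavaScript MVC frameworks, Gordon concluded that only Ember.js fully met his requirements, and it became the framework he chose.

Have you used any JavaScript MVC frameworks yourself? Feel free to join the discussion.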



via WordPress http://blog.newitfarmer.com/ajax-framework/ajax-ajax-framework/12764/repost-12%e7%a7%8djavascript-mvc%e6%a1%86%e6%9e%b6%e4%b9%8b%e6%af%94%e8%be%83#utm_source=rss&utm_medium=rss&utm_campaign=repost-12%25e7%25a7%258djavascript-mvc%25e6%25a1%2586%25e6%259e%25b6%25e4%25b9%258b%25e6%25af%2594%25e8%25be%2583

Labels:

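[repost ]Angular.js VS. Ember.js: Which Will Become the New Favorite of Web Development?

original:http://www.admin10000.com/document/2858.html

This article comes from a question on Quora. The asker had been looking for a JavaScript framework for a new Rails project and, after narrowing the field, was torn between Angular.js and Ember.js. The question drew a lot of attention and attracted answers from developers of both frameworks. If you are also struggling to choose a JavaScript framework, this article is a useful reference.

Introducing Angular.js and Ember.js

Angular.js is an open-source JavaScript framework maintained by Google. Its goal is to augment web applications with MVC capability, making development and testing easier. Angular.js reads HTML that contains additional custom tag attributes, follows the directives in those attributes, and binds the inputs and outputs of the page to a model represented by JavaScript variables. The values of these variables can be set manually or retrieved from static or dynamic JSON resources.

Project site: http://angularjs.org/

Ember.js is likewise a JavaScript MVC framework for building web applications. It uses string-based Handlebars templates and supports two-way binding, observers, computed properties (which change dynamically with the properties they depend on), auto-updating templates, routing, state machines, and more. Ember.js uses its own extended classes to create Ember.js objects, arrays, strings, and functions, and provides many methods and properties for working with them. Each Ember.js application uses its own namespace to avoid conflicts.

Project site: http://emberjs.com/

Angular.js developer: Angular.js best embodies the spirit of HTML

Misko Hevery, one of the Angular.js developers, answered the question as follows.

I am a developer on the Angular team and I do not know Ember.js well, so my view may be biased.

Some people say that both Angular.js and Ember.js put too much logic into HTML. Putting logic into HTML is indeed bad practice, and we do not recommend it. In fact, Angular.js puts only bindings into HTML, not logic; we recommend putting logic into controllers. But bindings are information too, and that information has to live somewhere. You have three choices:

* Code. This makes modularity a problem, because the HTML is tightly coupled to the code and recomposing an application becomes very difficult.
* HTML. This is what Angular.js does. We believe you should do nothing in HTML other than place the wiring information. No logic should appear there, because it leads to all kinds of problems. I think Angular.js does bindings rather well.
* Metadata file. I do not know of anyone doing this, but it is essentially a double problem, because you would have to wire up both the HTML locations and the model locations in code.

Of course, you can build an application without a framework, but there is no denying that a framework makes development easier.

Personally, I think what makes Angular.js unique is that it embraces HTML/CSS and follows the spirit of "what HTML is". Some other frameworks provide their own APIs and move away from HTML. Angular.js is the most declarative of all the frameworks. I believe declarative code is well suited to building user interfaces, while JS is well suited to writing logic.

Angular.js lets you extend HTML, so any problem you run into while using Angular.js can be overcome easily. You can find examples that show off its features on the Angular.js site, http://angularjs.org.

Ember.js developer: Ember.js is the framework of choice for building "ambitious" applications

Tom Dale, one of the Ember.js developers, compared Angular.js and Ember.js in detail, as follows.

As one of the authors of Ember.js, I am often asked: should I use Angular.js or Ember.js? I think that before choosing you need to consider what kind of application you are building. So is Ember.js better than Angular.js?

On the surface, the two do have some similarities: both use bindings, and both are better suited to writing web applications than other frameworks such as Backbone.js.

Let me first explain where the Ember.js project came from. Starting in 2009 I worked at Apple on SproutCore, an open-source, Cocoa-like JavaScript framework that later evolved into what you now see as iCloud. At the time I was surrounded by some of the best Cocoa developers in the world.

The thing is, for many years there does not seem to have been a truly new breakthrough in client-side applications. The basic model has been the same since the 1980s: code runs on the local computer, fetches data from the network, processes it locally, and displays it on the screen. The only thing that has changed is that the code now runs in the browser's sandbox and loads the "binary" it needs on demand, rather than from files the user installed on the hard drive.

When thinking about these problems, my first question is: what have people done before us? I think it is hard to argue with the success of frameworks such as Cocoa. On both Mac and iOS, Cocoa lets developers easily write applications that users love.

We want developers to be able to create ambitious web applications that can compete with native applications. To do that, developers first need advanced tools and the right concepts to help them communicate and collaborate.

While developing Ember.js, we spent a lot of time bringing in concepts from native application frameworks such as Cocoa. Where we later felt those concepts caused more trouble than they were worth, or were not suited to building web applications, we turned to other popular open-source projects, such as Ruby on Rails and Backbone.js, for inspiration.

As a result, Ember.js ended up as a comprehensive, powerful, lightweight tool that fits the modern web.

In my view, compared with Ember.js, Angular.js is more like a research project. Look at the learning documentation, for example: Ember.js mainly discusses models, views, and controllers, while the Angular.js guide asks you to learn about things like scopes, directives, and transclusion.

I fully support research projects and hope they turn out great. But remember to judge a framework by applications running in production.

Several large companies have already invested time and effort in Ember.js. The new ZenDesk has been rewritten in Ember.js (after being disappointed with Backbone.js, they decided to drop it for Ember.js), Square's entire web layer is built on Ember.js (because they wanted a beautiful, responsive UI), and Groupon's mobile web application was also developed with Ember.js. In addition, many startups have succeeded with Ember.js and have started contributing back to the Ember.js community.

Most of the applications I have seen built with Angular.js so far are demo projects or internal Google projects.

Yehuda (one of the Ember.js developers) and I have also been actively inviting real users to take part in the design and maintenance of the framework, which ensures that the features we add to Ember.js are useful in real development.

In fact, over the past few months most of the development work on Ember.js has been done by a core group of community contributors from different companies. If something happened to Yehuda and me, or if our company folded, Ember.js would keep going. It is a true community project, not a "Google" project.

Back to the technical details. The Angular.js site says that "AngularJS is what HTML would have been, had it been designed for building web applications." I think this philosophy is obvious when you look at their applications: the user interface is defined by HTML markup decorated with semantically meaningful attributes (such as data-ng-repeat).

Ember.js uses Handlebars to describe the HTML that renders your application's interface. On aesthetics, we can debate whether you prefer the Handlebars syntax (with helpers like {{#each}}) or annotating HTML with extra attributes as Angular.js does. Personally, I find the HTML attribute approach a bit cluttered and hard to read. Of course, you can use either. If Ember.js did not exist and I had to use a framework based on data attributes, I would consider Angular.js.

Aesthetics aside, I believe that the string-based templates in Ember.js also give us some advantages:

* String-based templates can be precompiled on the server. This reduces startup time and means that rendering a template can be as simple as calling a function.
* Angular.js requires you to traverse the entire DOM when the application starts. The bigger your application, the slower the startup.
* If you want to render your application on the server (for indexing by Google's crawler, or to make the first load appear faster), Angular.js requires you to boot a whole browser environment such as PhantomJS, which is resource-intensive. Handlebars is 100% JavaScript strings, so all you need is something like node.js or Rhino.
* As your application grows, string templates can easily be split up and lazy-loaded.

In addition, Handlebars only lets you bind to properties, while Angular.js lets you embed arbitrary expressions that update in real time. Many people initially see this as a limitation of Ember.js, but in practice:

* Ember.js makes it very easy to create computed properties in JavaScript, and these can contain arbitrary expressions. We only ask that you declare your dependencies, so that we can be smart about when to update.
* Angular.js has to recompute these expressions every time something changes, which means that the more elements you bind in your application, the slower it gets.
* Because Ember.js only lets you bind to properties, we will easily be able to take advantage of performance improvements in ECMAScript 6, such as Object.observe. Because Angular.js invented its own subset of JavaScript with a custom parser, it is harder for browsers to optimize the code.

[...]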
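The point about computed properties with declared dependencies can be sketched in plain JavaScript. This is neither the Ember.js nor the Angular.js API; the class `ObservableModel`, its methods, and the sample data are hypothetical, and the `evaluations` counter exists only to show when a recomputation actually happens.

```javascript
// Hypothetical sketch, not the Ember.js API: a computed property is
// re-evaluated only after one of its declared dependencies has changed.
class ObservableModel {
  constructor(values) {
    this.values = { ...values };
    this.computed = new Map();
  }

  defineComputed(name, deps, fn) {
    this.computed.set(name, { deps, fn, cache: undefined, dirty: true, evaluations: 0 });
  }

  set(key, value) {
    this.values[key] = value;
    // Invalidate only the computed properties that depend on this key.
    for (const entry of this.computed.values()) {
      if (entry.deps.includes(key)) entry.dirty = true;
    }
  }

  get(name) {
    const entry = this.computed.get(name);
    if (!entry) return this.values[name]; // plain property
    if (entry.dirty) {
      entry.cache = entry.fn(this.values);
      entry.dirty = false;
      entry.evaluations++;
    }
    return entry.cache;
  }
}

const person = new ObservableModel({ firstName: "Ada", lastName: "Lovelace", visits: 0 });
person.defineComputed("fullName", ["firstName", "lastName"],
  (v) => `${v.firstName} ${v.lastName}`);

console.log(person.get("fullName")); // Ada Lovelace
person.set("visits", 1);             // not a dependency: the cache stays valid
console.log(person.get("fullName")); // Ada Lovelace (served from cache)
person.set("firstName", "Augusta");  // a dependency: the cache is invalidated
console.log(person.get("fullName")); // Augusta Lovelace
```

A dirty-checking design would instead re-run every bound expression after any change, which is the cost the answer above attributes to binding arbitrary expressions.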



via WordPress http://blog.newitfarmer.com/ajax-framework/emberjs/12763/repost-angular-js-vs-ember-js%ef%bc%9a%e8%b0%81%e5%b0%86%e6%88%90%e4%b8%baweb%e5%bc%80%e5%8f%91%e7%9a%84%e6%96%b0%e5%ae%a0%ef%bc%9f#utm_source=rss&utm_medium=rss&utm_campaign=repost-angular-js-vs-ember-js%25ef%25bc%259a%25e8%25b0%2581%25e5%25b0%2586%25e6%2588%2590%25e4%25b8%25baweb%25e5%25bc%2580%25e5%258f%2591%25e7%259a%2584%25e6%2596%25b0%25e5%25ae%25a0%25ef%25bc%259f

Labels:

[repost ]A guide to analyzing Python performance

original:http://www.huyng.com/posts/python-performance-analysis/ While it’s not always the case that every Python program you write will require a rigorous performance analysis, it is reassuring to know that there are a wide variety of tools in Python’s ecosystem that one can turn to when the time arises. Analyzing a program’s performance boils down to answering 4 basic questions: [...]



via WordPress http://blog.newitfarmer.com/architecture/performance/12759/repost-a-guide-to-analyzing-python-performance#utm_source=rss&utm_medium=rss&utm_campaign=repost-a-guide-to-analyzing-python-performance

Labels:

Saturday, September 07, 2013

[repost ] CS595D – Graph Mining, Weekly Seminar

original:http://www.cs.ucsb.edu/~xyan/classes/CS595D.htm Abstract: Graph mining and network analytics is critical to a variety of application domains, ranging from community detection in social networks, malicious program analysis in computer security, to searches for functional modules in biological pathways and structural analysis in chemical compounds. There is an emerging need to systematically investigate the modeling, managing, and mining [...]



via WordPress http://blog.newitfarmer.com/ai/graph-computing/12751/repost-cs595d-graph-mining-weekly-seminar#utm_source=rss&utm_medium=rss&utm_campaign=repost-cs595d-graph-mining-weekly-seminar

Labels:

[repost ]Big Data Debate: Will HBase Dominate NoSQL?

original:http://www.informationweek.com/software/enterprise-applications/big-data-debate-will-hbase-dominate-nosq/240159475 HBase is modeled after Google BigTable and is part of the world’s most popular big data processing platform, Apache Hadoop. But will this pedigree guarantee HBase a dominant role in the competitive and fast-growing NoSQL database market? Michael Hausenblas of MapR argues that Hadoop’s popularity and HBase’s scalability and consistency ensure success. The growing [...]



via WordPress http://blog.newitfarmer.com/big_data/hbase/12741/repost-big-data-debate-will-hbase-dominate-nosql#utm_source=rss&utm_medium=rss&utm_campaign=repost-big-data-debate-will-hbase-dominate-nosql

Labels:

Friday, September 06, 2013

[repost ]Performance-Related Books

original:http://alexanderpodelko.com/blog/2013/08/27/performance-related-books/ Here is my updated list of performance-related books. Books are grouped into a few categories just for convenience – some books fit several categories and books definitely may be grouped differently. Inside each category books are arranged chronologically. Some older product-based books may probably be dropped – but keep them so far in case [...]



via WordPress http://blog.newitfarmer.com/architecture/performance/12736/repost-performance-related-books#utm_source=rss&utm_medium=rss&utm_campaign=repost-performance-related-books

Labels:

Thursday, September 05, 2013

[paper ]MillWheel: Fault-Tolerant Stream Processing at Internet Scale

original:http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p734-akidau.pdf



via WordPress http://blog.newitfarmer.com/big_data/streams/12732/paper-millwheel-fault-tolerant-stream-processing-at-internet-scale#utm_source=rss&utm_medium=rss&utm_campaign=paper-millwheel-fault-tolerant-stream-processing-at-internet-scale

Labels:

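[repost ] Analyzing the Analyzers: How to Build a Data Science Department

original:http://blog.csdn.net/iascchen/article/details/11100147

Many high-profile companies say they are building data science departments, but everyone is still feeling their way toward how such a department should be set up. In June this year, O'Reilly Strata published the report "Analyzing the Analyzers", which lays out fairly clearly the different roles a data science department needs and the skills each requires. The key points are translated below.

How the data scientists were classified

Self-identification: respondents were asked to answer questions of the form "I think of myself as a XX" on the usual five-point scale (from strongly agree to strongly disagree), which yields a picture of how data scientists see themselves. The results divide data scientists into four categories: Data Businesspeople, Data Creatives, Data Developer, and Data Researchers.

Skills: respondents were asked to rank the following 22 skills that data scientists need, in order to analyze the skill requirements of each type of data scientist. ML stands for machine learning and OR for Operations Research.

Combining the two: based on the respondents' self-identification and skill rankings, the skills required by each type of data scientist can be identified.

Categories of data scientists

Data Businesspeople

Data Businesspeople tend to focus on organizational management and on how data projects generate profit. They usually see themselves as leaders or entrepreneurs, and about 80% of them have responsibility for managing staff. Data Businesspeople may also be providers of consulting or contract services. They are relatively highly educated: about 60% hold a master's degree or higher, and nearly 25% hold an MBA; many also have an engineering degree. Data Businesspeople do work with real data: more than 90% at least occasionally work with data at gigabyte scale. Compared with other data scientists, Data Businesspeople are slightly older, nearly a quarter are women (a slightly higher share), and only a quarter call themselves data scientists (a slightly lower share).

Data [...]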



via WordPress http://blog.newitfarmer.com/anls/common/12730/repost-analyzing-the-analyzers-%e5%88%86%e6%9e%90%e5%88%86%e6%9e%90%e5%b8%88-%e6%95%b0%e6%8d%ae%e7%a7%91%e5%ad%a6%e9%83%a8%e9%97%a8%e5%a6%82%e4%bd%95%e5%bb%ba#utm_source=rss&utm_medium=rss&utm_campaign=repost-analyzing-the-analyzers-%25e5%2588%2586%25e6%259e%2590%25e5%2588%2586%25e6%259e%2590%25e5%25b8%2588-%25e6%2595%25b0%25e6%258d%25ae%25e7%25a7%2591%25e5%25ad%25a6%25e9%2583%25a8%25e9%2597%25a8%25e5%25a6%2582%25e4%25bd%2595%25e5%25bb%25ba


[repost ]Efficient discovery of overlapping communities in massive networks

original:http://www.pnas.org/content/110/36/14534.full Edited by Chris H. Wiggins, Columbia University, New York, NY, and accepted by the Editorial Board July 1, 2013 (received for review December 18, 2012) Abstract Detecting overlapping communities is essential to analyzing and exploring natural networks such as social networks, biological networks, and citation networks. However, most existing approaches do not [...]



via WordPress http://blog.newitfarmer.com/anls/social-analytics-anls/12727/repost-efficient-discovery-of-overlapping-communities-in-massive-networks#utm_source=rss&utm_medium=rss&utm_campaign=repost-efficient-discovery-of-overlapping-communities-in-massive-networks


[repost ][A Beginner's SVM Implementation] Python version

original:http://blog.sina.com.cn/s/blog_88e2dbbf0101ohm5.html @余凯_西二旗民工 [A Beginner's SVM Implementation] Python version



via WordPress http://blog.newitfarmer.com/ai/machine-learning/12720/repost-%e3%80%90svm%e4%b9%8b%e8%8f%9c%e9%b8%9f%e5%ae%9e%e7%8e%b0%e3%80%91-python%e7%89%88#utm_source=rss&utm_medium=rss&utm_campaign=repost-%25e3%2580%2590svm%25e4%25b9%258b%25e8%258f%259c%25e9%25b8%259f%25e5%25ae%259e%25e7%258e%25b0%25e3%2580%2591-python%25e7%2589%2588


Tuesday, September 03, 2013

[repost ]Convex Optimization Overview

original:http://cs229.stanford.edu/section/cs229-cvxopt.pdf



via WordPress http://blog.newitfarmer.com/ai/math/12701/repost-convex-optimization-overview#utm_source=rss&utm_medium=rss&utm_campaign=repost-convex-optimization-overview


[repost ]JavaScript and Finite-State Machines

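original:http://www.ruanyifeng.com/blog/2013/09/finite-state_machine_for_javascript.html A finite-state machine is a very useful model that can simulate most things in the world. Put simply, it has three characteristics:

  * The total number of states is finite.
  * At any given moment, it is in exactly one state.
  * Under certain conditions, it makes a transition from one state to another.

Its significance for JavaScript is that many objects can be written as finite-state machines. For example, suppose a web page has a menu element. When the mouse hovers over it, the menu is shown; when the mouse moves away, the menu is hidden. Described as a finite-state machine, this menu has only two states (shown and hidden), and the mouse triggers the state transitions. The code can be written like this:

  var menu = {
    // current state
    currentState: 'hide',
    // bind events
    initialize: function() {
      var self = this;
      self.on("hover", self.transition);
    },
    // state transition
    transition: function(event){
      switch(this.currentState) {
        case "hide":
          this.currentState = 'show';
          doSomething();
          break;
        case "show":
          this.currentState = 'hide';
          doSomething();
          break;
        default:
          console.log('Invalid [...]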
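The excerpt's code is cut off, so here is a minimal, self-contained sketch of the same two-state menu. It is not the original author's code: `createMenu`, `onChange`, and the transition table are hypothetical names chosen for illustration, and throwing on an unknown state is an assumption about what the truncated `default` branch was meant to handle.

```javascript
// Minimal sketch of a two-state (hide/show) finite-state machine.
// createMenu and onChange are hypothetical names, not from the original post.
function createMenu(onChange) {
  // Transition table: current state -> state after a hover toggle.
  const transitions = new Map([
    ['hide', 'show'],
    ['show', 'hide'],
  ]);

  return {
    currentState: 'hide',
    // Move to the next state, or fail loudly if the state is unknown.
    transition() {
      const next = transitions.get(this.currentState);
      if (next === undefined) {
        throw new Error('Invalid state: ' + this.currentState);
      }
      this.currentState = next;
      if (onChange) onChange(next);
      return next;
    },
  };
}

const menu = createMenu();
const seen = [menu.transition(), menu.transition(), menu.transition()];
console.log(seen); // [ 'show', 'hide', 'show' ]
```

Keeping the transitions in a table rather than a `switch` means a new state is added by adding a row, which is one common way to keep such a machine easy to extend.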



via WordPress http://blog.newitfarmer.com/programming/javascript/12699/repost-javascript%e4%b8%8e%e6%9c%89%e9%99%90%e7%8a%b6%e6%80%81%e6%9c%ba#utm_source=rss&utm_medium=rss&utm_campaign=repost-javascript%25e4%25b8%258e%25e6%259c%2589%25e9%2599%2590%25e7%258a%25b6%25e6%2580%2581%25e6%259c%25ba


Monday, September 02, 2013

[repost ]OWASP's Summary List of Java Security Issues

original:http://automationqa.com/forum.php?mod=viewthread&tid=2843&fromuid=21 OWASP's Summary List of Java Security Issues https://www.owasp.org/index.php/Category:Java
C: Capture-replay; Comparing classes by name; Cross-site Scripting (XSS)
D: Deserialization of untrusted data
F: Failure to follow guideline/specification
H: Hibernate; Hibernate-Guidelines; How to add validation logic to HttpServletRequest; How to encrypt a properties file
I: Improper Data Validation; Improper temp file opening; Information Leakage; Insecure Randomness; Insecure Transport; Insufficient Session-ID Length [...]



via WordPress http://blog.newitfarmer.com/java/others-java/12691/repost-owasp%e5%85%b3%e4%ba%8ejava%e7%9a%84%e5%ae%89%e5%85%a8%e9%97%ae%e9%a2%98%e6%b1%87%e6%80%bb%e5%88%97%e8%a1%a8#utm_source=rss&utm_medium=rss&utm_campaign=repost-owasp%25e5%2585%25b3%25e4%25ba%258ejava%25e7%259a%2584%25e5%25ae%2589%25e5%2585%25a8%25e9%2597%25ae%25e9%25a2%2598%25e6%25b1%2587%25e6%2580%25bb%25e5%2588%2597%25e8%25a1%25a8


[repost ]Consistent Hashing and Its Application in a Solr Distributed Search Engine with Tens of Millions of Records

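original:http://blog.jobbole.com/47023/ Most Internet startups are grassroots ventures: at that stage there are no powerful servers and no money for expensive high-capacity databases. Under these harsh conditions, wave after wave of founders have nonetheless succeeded, and that has a great deal to do with today's open-source technology and large-scale data architectures. For example, using open-source software such as mysql and nginx, together with good architecture and low-cost servers, we can build a system that serves tens of millions of users. Large Internet companies such as Sina Weibo, Taobao, and Tencent have all built their platforms on many free open-source systems. So it does not matter what you use, as long as you adopt a reasonable solution for the circumstances.

So how do you build a good system architecture? That topic is too big; here we mainly discuss how to distribute data. Suppose our database server can store only 200 records, and a campaign is suddenly expected to bring 600. There are two options: horizontal scaling or vertical scaling. Vertical scaling means upgrading the server's hardware. But the higher the machine's specification, the higher the price, and an ordinary small company cannot afford that cost. Horizontal scaling means serving from several cheap machines. If one machine can handle 200 records, then 3 machines can handle 600, and more machines can be added quickly if the workload grows later. In most cases horizontal scaling is the choice (the original post illustrates this with a figure).

Now there is a problem: how are these 600 records routed to the right machine? The distribution has to be balanced. Suppose all 600 records have uniform auto-increment ids from 1 to 600; to split them into 3 groups we can use id mod 3. In a real environment the id may not look like this and may be a string, in which case the string has to be converted to a hash code before taking the modulus.

So far this seems to solve our problem: all the data is distributed well and no machine exceeds its capacity. But once the data has to be stored and read back, it is not so easy. What happens when the business grows? From the horizontal scaling described above, we know we need to add a server. Yet adding that one server is exactly what causes trouble. Consider this example: 9 numbers in total have to be placed on 2 machines (1 and 2). Machine 1 stores 1, 3, 5, 7, 9 and machine 2 stores 2, 4, 6, 8. If we add a machine 3, the data has to undergo a major migration: machine 1 now stores 1, 4, 7, machine 2 stores 2, 5, 8, and machine 3 stores 3, 6, 9. As the figure in the original post shows, 3, 5, 9 have moved off machine 1 and 4, 6 have moved off machine 2; everything is redistributed according to the new order. With a small amount of data, redistributing is cheap, but with hundreds of millions of records or terabytes of data the operation is very expensive, taking anywhere from a few hours to several days. The original database machines are also under heavy load during the migration. So people will wonder: is this horizontally scaled architecture unreasonable?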
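The migration cost in the 9-number example can be checked with a short sketch. It assumes the placement rule implied by the example, machine = ((id - 1) mod n) + 1, which reproduces the listed assignments for both 2 and 3 machines; `machineFor` and `movedIds` are hypothetical helper names, not from the original post.

```javascript
// Sketch: which ids change machine when modulo sharding grows from 2 to 3 machines.
// machineFor and movedIds are hypothetical names used only for this illustration.

// Placement rule implied by the example: ids 1..n go to machines 1..n, then wrap.
function machineFor(id, machineCount) {
  return ((id - 1) % machineCount) + 1;
}

// Ids whose machine differs between the old and the new cluster size.
function movedIds(ids, oldCount, newCount) {
  return ids.filter(
    (id) => machineFor(id, oldCount) !== machineFor(id, newCount)
  );
}

const ids = [1, 2, 3, 4, 5, 6, 7, 8, 9];
const movedByModulo = movedIds(ids, 2, 3);
console.log(movedByModulo); // [ 3, 4, 5, 6, 9 ]
```

Five of the nine ids move (3, 5, 9 off machine 1 and 4, 6 off machine 2), matching the excerpt: under plain modulo placement, adding one machine relocates more than half of the data.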
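---------- divider ----------

Consistent hashing was proposed against exactly this background, and it is now widely used in distributed caches such as memcached. Below is a brief introduction to its basic principle. The earliest version: http://dl.acm.org/citation.cfm?id=258660. There are also many good articles on Chinese sites, for example: http://blog.csdn.net/x15594/article/details/6270242 Here is a simple example to illustrate consistent hashing.

Setup: three machines (1, 2, 3) and 9 numbers to be distributed: 1, 2, 3, 4, 5, 6, 7, 8, 9.

Consistent hashing architecture: steps

1. Construct 2^32 virtual nodes. Computers live in a world of 0s and 1s, so partitioning by a power of 2 makes an even distribution easy. In addition, 2^32 is about 4.29 billion; even with a huge number of servers we could not possibly exceed that many machines, so both scalability and balance are ensured.

2. Compute a hash code from each of the three machines' IP addresses (the hostname also works, as long as it uniquely identifies each machine), then map it onto the 2^32 positions. For example, machine 1's hash code mod 2^32 comes out as 123 (a made-up value), machine 2's as 2300420, and machine 3's as 90203920. The three machines are thus mapped onto nodes of this virtual ring of about 4.29 billion positions.

3. Compute the hash code of each data item (1-9) in the same way, take it modulo 2^32, and place it on the ring. Suppose the values come out as 1:10, 2:23564, 3:57, 4:6984, 5:5689632, 6:86546845, 7:122, 8:3300689, 9:135468. We can see that 1, 3, 7 are less than 123; 2, 4, 9 are greater than 123 and less than 2300420; and 5, 6, 8 are greater than 2300420 and less than 90203920. Starting from the position a data item maps to, search clockwise and store the item on the first Cache node found. If no Cache node is found before passing 2^32, the item is stored on the first Cache node. So 1, 3, 7 are assigned to machine 1; 2, 4, 9 to machine 2; and 5, 6, 8 to machine 3.

At this point you may ask: so far I have not seen any benefit from consistent hashing, and it is more complicated than the traditional modulus. Now for the key part: suppose we add a machine. With the original approach we would have to redistribute all the data across four machines. What does consistent hashing do? Machine 4 joins, and its hash value after taking the modulus is 12302012. 5 and 8 are greater than 2300420 and less than 12302012, while 6 is greater than [...]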
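The clockwise lookup in the example above can be sketched as follows. This is an illustration under stated assumptions, not the original author's implementation: the ring positions are the made-up values from the excerpt rather than real hash outputs, and `ownerOf` and `assignAll` are hypothetical helper names. The effect of adding machine 4 is computed only by applying the clockwise rule as the excerpt states it.

```javascript
// Sketch of consistent-hashing lookup using the excerpt's made-up ring positions.
// ownerOf and assignAll are hypothetical names used only for this illustration.
const RING_SIZE = 2 ** 32;

// Find the first machine at or clockwise after the given ring position;
// wrap around to the first machine on the ring if none is found.
function ownerOf(position, machines) {
  const sorted = [...machines].sort((a, b) => a.position - b.position);
  const p = position % RING_SIZE;
  const hit = sorted.find((m) => m.position >= p);
  return (hit || sorted[0]).name;
}

// Map every data key to the machine that owns it.
function assignAll(keys, machines) {
  const result = {};
  for (const [key, position] of keys) {
    result[key] = ownerOf(position, machines);
  }
  return result;
}

// Data key -> ring position, as given in the excerpt (not real hash values).
const keys = [
  [1, 10], [2, 23564], [3, 57], [4, 6984], [5, 5689632],
  [6, 86546845], [7, 122], [8, 3300689], [9, 135468],
];
const threeMachines = [
  { name: 1, position: 123 },
  { name: 2, position: 2300420 },
  { name: 3, position: 90203920 },
];

const before = assignAll(keys, threeMachines);
const after = assignAll(keys, [...threeMachines, { name: 4, position: 12302012 }]);
const movedOnRing = keys.map(([key]) => key).filter((key) => before[key] !== after[key]);
console.log(movedOnRing); // [ 5, 8 ]
```

Under these positions only keys 5 and 8 change owner (from machine 3 to machine 4), compared with five of nine keys under modulo placement. A production ring would typically keep the machine list sorted once and use binary search instead of sorting on every lookup.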



via WordPress http://blog.newitfarmer.com/ai/ai-ir/solr/12687/repost-%e4%b8%80%e8%87%b4%e6%80%a7hash%e5%92%8csolr%e5%8d%83%e4%b8%87%e7%ba%a7%e6%95%b0%e6%8d%ae%e5%88%86%e5%b8%83%e5%bc%8f%e6%90%9c%e7%b4%a2%e5%bc%95%e6%93%8e%e4%b8%ad%e7%9a%84%e5%ba%94%e7%94%a8#utm_source=rss&utm_medium=rss&utm_campaign=repost-%25e4%25b8%2580%25e8%2587%25b4%25e6%2580%25a7hash%25e5%2592%258csolr%25e5%258d%2583%25e4%25b8%2587%25e7%25ba%25a7%25e6%2595%25b0%25e6%258d%25ae%25e5%2588%2586%25e5%25b8%2583%25e5%25bc%258f%25e6%2590%259c%25e7%25b4%25a2%25e5%25bc%2595%25e6%2593%258e%25e4%25b8%25ad%25e7%259a%2584%25e5%25ba%2594%25e7%2594%25a8


Sunday, September 01, 2013

[repost ]Convergence Failures in Logistic Regression

original:http://www2.sas.com/proceedings/forum2008/360-2008.pdf Paper 360-2008 Convergence Failures in Logistic Regression Paul D. Allison, University of Pennsylvania, Philadelphia, PA ABSTRACT A frequent problem in estimating logistic regression models is a failure of the likelihood maximization algorithm to converge. In most cases, this failure is a consequence of data patterns known as complete or quasi-complete separation. For these [...]



via WordPress http://blog.newitfarmer.com/ai/machine-learning/12679/repost-convergence-failures-in-logistic-regression#utm_source=rss&utm_medium=rss&utm_campaign=repost-convergence-failures-in-logistic-regression


[repost ]Estimation fails when weights are applied in Logistic Regression

original:http://www-01.ibm.com/support/docview.wss?uid=swg21479681 Technote (troubleshooting) Problem(Abstract) Estimation fails when weights are applied in Logistic Regression: “Estimation failed due to numerical problem. Possible reasons are: (1) at least one of the convergence criteria LCON, BCON is zero or too small, or (2) the value of EPS is too small (if not specified, the default value that is used [...]



via WordPress http://blog.newitfarmer.com/anls/spss-bi/12674/repost-estimation-fails-when-weights-are-applied-in-logistic-regression#utm_source=rss&utm_medium=rss&utm_campaign=repost-estimation-fails-when-weights-are-applied-in-logistic-regression
