hupso.pl score for aboutdm.com: 2.38



HTML Content


Title: about data mining
Length: 17 characters, 3 words
Description: a blog about data mining and machine learning, recommender systems, business applications and related news and events.
Length: 118 characters, 17 words
Keywords: empty
Robots
Charset: UTF-8
Og Meta - Title: present
Og Meta - Description: present
Og Meta - Site name: empty
The title should be between 10 and 70 characters (including spaces) and fewer than 12 words long.
The meta description should be between 50 and 160 characters (including spaces) and fewer than 24 words long.
The character encoding should be declared; UTF-8 is usually the best choice, as it is the most international encoding.
Open Graph tags should be present on the page (more about the Open Graph protocol: http://ogp.me/)
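The title and meta-description rules above can be sketched as a small checker. This is an illustrative helper, not hupso.pl's actual code; the function names are made up:

```python
def title_ok(title: str) -> bool:
    # Report rule: 10-70 characters (with spaces), fewer than 12 words.
    return 10 <= len(title) <= 70 and len(title.split()) < 12

def description_ok(desc: str) -> bool:
    # Report rule: 50-160 characters (with spaces), fewer than 24 words.
    return 50 <= len(desc) <= 160 and len(desc.split()) < 24

print(title_ok("about data mining"))  # True (17 chars, 3 words)
desc = ("a blog about data mining and machine learning, recommender "
        "systems, business applications and related news and events.")
print(description_ok(desc))           # True (118 chars, 17 words)
```

Both of this page's values pass; only the missing keywords and Open Graph site name are flagged above.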

SEO Content

Words/Characters: 8137
Text/HTML ratio: 23.77 %
Headings: H1: 1, H2: 25, H3: 19, H4: 0, H5: 0, H6: 0
H1
about data mining
H2
pages
sep 23, 2015
apr 22, 2014
jan 20, 2014
jan 3, 2014
jan 2, 2014
nov 11, 2013
sep 12, 2013
sep 4, 2013
jul 26, 2013
jul 9, 2013
jul 8, 2013
apr 30, 2013
apr 29, 2013
apr 26, 2013
apr 25, 2013
apr 24, 2013
mar 12, 2013
feb 21, 2013
feb 11, 2013
about the author
search this blog
blog archive
follow by email
followers
H3
iot and big data
product attribute extraction
statistics and data mining
data mining conferences in 2014
hiring data scientists
overview of data mining
how casinos are betting on big data
machine learning as a service
learning to rank and recommender systems
text mining: name entity detection (2)
text mining: named entity detection
the future of video mining
stroke prediction
history of machine learning
yelp and big data
machine learning for anti-virus software
basic steps of applying machine learning methods
data mining and neuroscience
data mining vs. machine learning
H4
H5
H6
strong
b
i
em
Emphasis tags: strong: 0, b: 0, i: 0, em: 0
Page content should contain more than 250 words, with a text-to-HTML ratio higher than 20%.
Headings: use heading tags (h1, h2, h3, ...) to mark the topic of sections or paragraphs on the page, but as a rule use each heading tag fewer than 6 times to keep the page concise.
Style: use strong and italic tags to emphasize your page's keywords, but do not overuse them (fewer than 16 strong tags and 16 italic tags).
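The heading guideline above can be checked with a short script. This is an illustrative sketch using Python's standard html.parser, not the report's own tooling:

```python
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    """Counts h1-h6 start tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.counts = {f"h{i}": 0 for i in range(1, 7)}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1

def audit_headings(html: str, limit: int = 6) -> dict:
    """Return heading levels used `limit` or more times
    (the report recommends fewer than 6 of each)."""
    parser = HeadingCounter()
    parser.feed(html)
    return {tag: n for tag, n in parser.counts.items() if n >= limit}

# This page: H1 x1, H2 x25, H3 x19 -> H2 and H3 exceed the guideline.
page = "<h1>t</h1>" + "<h2>x</h2>" * 25 + "<h3>y</h3>" * 19
print(audit_headings(page))  # {'h2': 25, 'h3': 19}
```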

Page statistics

twitter:title: empty
twitter:description: empty
google+ itemprop=name: present
External files: 16
CSS files: 3
JavaScript files: 13
Files: reduce the total number of referenced files (CSS + JavaScript) to at most 7-8.

Internal and external links

Links: 345
Internal links: 44
External links: 301
Links without a TITLE attribute: 183
Links with a NOFOLLOW attribute: 0
Links: use the title attribute for every link. A nofollow link is a link that tells search-engine bots not to follow it; pay attention to how nofollow is used.
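For example, a link with a descriptive title attribute, and a nofollow link, might look like the following (illustrative markup only; the title texts and the example.com URL are invented):

```html
<!-- A descriptive title attribute on an ordinary link -->
<a href="http://www.aboutdm.com/" title="About Data Mining - home page">home</a>

<!-- rel="nofollow" tells search-engine bots not to follow this link -->
<a href="https://example.com/sponsor" rel="nofollow" title="Sponsored link">sponsor</a>
```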

Internal links

- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=pagelist&widgetid=pagelist1&action=editwidget&sectionid=crosscol
- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=text&widgetid=text2&action=editwidget&sectionid=sidebar-right-1
- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=blogsearch&widgetid=blogsearch1&action=editwidget&sectionid=sidebar-right-1
- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=label&widgetid=label1&action=editwidget&sectionid=sidebar-right-1
▼  javascript:void(0)
▼  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
►  javascript:void(0)
- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=blogarchive&widgetid=blogarchive1&action=editwidget&sectionid=sidebar-right-1
- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=gadget&widgetid=gadget1&action=editwidget&sectionid=sidebar-right-1
- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=followbyemail&widgetid=followbyemail1&action=editwidget&sectionid=sidebar-right-1
- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=followers&widgetid=followers1&action=editwidget&sectionid=sidebar-right-1
- //www.blogger.com/rearrange?blogid=3860385786864415309&widgettype=attribution&widgetid=attribution1&action=editwidget&sectionid=footer-3

External links

home http://www.aboutdm.com/
about http://www.aboutdm.com/p/about.html
iot and big data http://www.aboutdm.com/2015/09/iot-and-big-data.html
junling hu https://www.blogger.com/profile/14080175423926243363
9:26 pm http://www.aboutdm.com/2015/09/iot-and-big-data.html
92 comments: http://www.aboutdm.com/2015/09/iot-and-big-data.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=1991456555696767060&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1991456555696767060&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1991456555696767060&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1991456555696767060&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1991456555696767060&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1991456555696767060&target=pinterest
product attribute extraction http://www.aboutdm.com/2014/04/product-attribute-extraction.html
- http://4.bp.blogspot.com/-av3oexlkc4e/u1cgjotx3bi/aaaaaaaaauu/ivjwwq7od_8/s1600/ebay_product.jpg
junling hu https://www.blogger.com/profile/14080175423926243363
7:06 pm http://www.aboutdm.com/2014/04/product-attribute-extraction.html
149 comments: http://www.aboutdm.com/2014/04/product-attribute-extraction.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=7825907255978520446&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7825907255978520446&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7825907255978520446&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7825907255978520446&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7825907255978520446&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7825907255978520446&target=pinterest
text mining http://www.aboutdm.com/search/label/text%20mining
statistics and data mining http://www.aboutdm.com/2014/01/statistics-and-data-mining.html
- http://1.bp.blogspot.com/-yxzvmckbhii/ut3y0aevw9i/aaaaaaaaas0/slcx_ghoeco/s1600/dices.jpg
- http://3.bp.blogspot.com/-bwzy6ik23ji/ut3wx7vrkxi/aaaaaaaaaso/6nzsfv3efzs/s1600/conditional.png
junling hu https://www.blogger.com/profile/14080175423926243363
8:09 pm http://www.aboutdm.com/2014/01/statistics-and-data-mining.html
30 comments: http://www.aboutdm.com/2014/01/statistics-and-data-mining.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=8709621989617359194&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8709621989617359194&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8709621989617359194&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8709621989617359194&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8709621989617359194&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8709621989617359194&target=pinterest
frequent pattern mining http://www.aboutdm.com/search/label/frequent%20pattern%20mining
overview of data mining http://www.aboutdm.com/search/label/overview%20of%20data%20mining
data mining conferences in 2014 http://www.aboutdm.com/2014/01/data-mining-conferences-in-2014.html
- http://3.bp.blogspot.com/-evh15akluza/useamupsgvi/aaaaaaaaase/f6hg31n8mfs/s1600/kdd2013.png
kdd 2013 http://www.kdd.org/kdd2013/
sdm  http://www.siam.org/meetings/sdm14/
kdd http://www.kdd.org/kdd2014/
strata http://strataconf.com/strata2014
predictive analytics world http://www.predictiveanalyticsworld.com/sanfrancisco/2014/
icml http://icml.cc/2014/
aaai http://www.aaai.org/conferences/aaai/aaai14.php
icdm http://www.cs.uvm.edu/~icdm/
www http://www2014.kr/
acl http://www.cs.jhu.edu/acl2014/
sigir http://sigir.org/sigir2014/
interspeech http://www.interspeech2014.org/public.php?page=home.html
recsys http://recsys.acm.org/recsys14/
junling hu https://www.blogger.com/profile/14080175423926243363
7:32 pm http://www.aboutdm.com/2014/01/data-mining-conferences-in-2014.html
22 comments: http://www.aboutdm.com/2014/01/data-mining-conferences-in-2014.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=3944241764086152784&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3944241764086152784&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3944241764086152784&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3944241764086152784&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3944241764086152784&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3944241764086152784&target=pinterest
news and events http://www.aboutdm.com/search/label/news%20and%20events
hiring data scientists http://www.aboutdm.com/2014/01/hiring-data-scientists.html
- http://3.bp.blogspot.com/-o-1cebadw7c/usy8r00kpwi/aaaaaaaaarw/2spmtuskouu/s1600/funny-cartoon-scientist.jpg
report by mckinsey http://www.mckinsey.com/~/media/mckinsey/dotcom/insights%20and%20pubs/mgi/research/technology%20and%20innovation/big%20data/mgi_big_data_full_report.ashx
junling hu https://www.blogger.com/profile/14080175423926243363
8:28 pm http://www.aboutdm.com/2014/01/hiring-data-scientists.html
44 comments: http://www.aboutdm.com/2014/01/hiring-data-scientists.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=7926628880959287137&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7926628880959287137&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7926628880959287137&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7926628880959287137&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7926628880959287137&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7926628880959287137&target=pinterest
data science team http://www.aboutdm.com/search/label/data%20science%20team
overview of data mining http://www.aboutdm.com/2013/11/overview-of-data-mining.html
- http://4.bp.blogspot.com/-wumhtjk8lvu/uod3aektp9i/aaaaaaaaaqm/tcg4mzi5q0e/s1600/overview_dm.png
junling hu https://www.blogger.com/profile/14080175423926243363
8:00 am http://www.aboutdm.com/2013/11/overview-of-data-mining.html
25 comments: http://www.aboutdm.com/2013/11/overview-of-data-mining.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=3199830936039759211&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3199830936039759211&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3199830936039759211&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3199830936039759211&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3199830936039759211&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3199830936039759211&target=pinterest
overview of data mining http://www.aboutdm.com/search/label/overview%20of%20data%20mining
how casinos are betting on big data http://www.aboutdm.com/2013/09/how-casinos-are-betting-on-big-data.html
- http://1.bp.blogspot.com/-dwafeuqaxis/ujjd-rnj1vi/aaaaaaaaan0/f6hakpgjdj4/s1600/casino-games.jpg
kdd 2011 industry practice expo http://www.kdd2011.com/indexpo.shtml
here http://finance.yahoo.com/blogs/big-data-download/casinos-bet-big-data-160015466.html
junling hu https://www.blogger.com/profile/14080175423926243363
5:40 pm http://www.aboutdm.com/2013/09/how-casinos-are-betting-on-big-data.html
24 comments: http://www.aboutdm.com/2013/09/how-casinos-are-betting-on-big-data.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=4360143544130485884&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4360143544130485884&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4360143544130485884&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4360143544130485884&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4360143544130485884&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4360143544130485884&target=pinterest
companies http://www.aboutdm.com/search/label/companies
machine learning as a service http://www.aboutdm.com/2013/09/machine-learning-as-service.html
- http://3.bp.blogspot.com/--kzmm7ku7c4/uic_ffhq-zi/aaaaaaaaanu/yip5gnb80yw/s1600/ml_box.jpg
mu sigma http://www.mu-sigma.com/
opera solutions http://www.operasolutions.com/
actian http://www.actian.com/
fractal analytics http://www.fractalanalytics.com/
grok https://www.groksolutions.com/
alpine data labs http://www.alpinedatalabs.com/
skytree http://www.skytree.net/
junling hu https://www.blogger.com/profile/14080175423926243363
7:08 am http://www.aboutdm.com/2013/09/machine-learning-as-service.html
35 comments: http://www.aboutdm.com/2013/09/machine-learning-as-service.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=5416340628070332736&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416340628070332736&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416340628070332736&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416340628070332736&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416340628070332736&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416340628070332736&target=pinterest
companies http://www.aboutdm.com/search/label/companies
learning to rank and recommender systems http://www.aboutdm.com/2013/07/learning-to-rank-and-recommender-systems.html
- http://1.bp.blogspot.com/-g5ndn2k0iya/ufnrna_l3li/aaaaaaaaame/_somo38wgp8/s1600/yelp.png
- http://3.bp.blogspot.com/-rknpqzi1uos/ufnrrnxd-ci/aaaaaaaaamm/pdukkv-w_hg/s1600/learning_rank.png
junling hu https://www.blogger.com/profile/14080175423926243363
9:54 pm http://www.aboutdm.com/2013/07/learning-to-rank-and-recommender-systems.html
30 comments: http://www.aboutdm.com/2013/07/learning-to-rank-and-recommender-systems.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=7807643284857338313&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7807643284857338313&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7807643284857338313&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7807643284857338313&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7807643284857338313&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7807643284857338313&target=pinterest
recommender system http://www.aboutdm.com/search/label/recommender%20system
supervised learning http://www.aboutdm.com/search/label/supervised%20learning
text mining: name entity detection (2) http://www.aboutdm.com/2013/07/text-mining-name-entity-detection-2.html
- http://1.bp.blogspot.com/-meezfmdskvg/udzfwea-uvi/aaaaaaaaalg/pzydxjxwdts/s1600/robot-brain.jpg
junling hu https://www.blogger.com/profile/14080175423926243363
9:09 pm http://www.aboutdm.com/2013/07/text-mining-name-entity-detection-2.html
18 comments: http://www.aboutdm.com/2013/07/text-mining-name-entity-detection-2.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=2336724296033415481&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=2336724296033415481&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=2336724296033415481&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=2336724296033415481&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=2336724296033415481&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=2336724296033415481&target=pinterest
text mining http://www.aboutdm.com/search/label/text%20mining
text mining: named entity detection http://www.aboutdm.com/2013/07/text-mining-named-entity-detection.html
- http://3.bp.blogspot.com/-pzxsneuqbem/udukgupxpxi/aaaaaaaaalm/6wb7pbpwdng/s1600/text_mining_humor.jpg
junling hu https://www.blogger.com/profile/14080175423926243363
8:49 pm http://www.aboutdm.com/2013/07/text-mining-named-entity-detection.html
66 comments: http://www.aboutdm.com/2013/07/text-mining-named-entity-detection.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=1370465733782559081&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1370465733782559081&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1370465733782559081&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1370465733782559081&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1370465733782559081&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=1370465733782559081&target=pinterest
text mining http://www.aboutdm.com/search/label/text%20mining
the future of video mining http://www.aboutdm.com/2013/04/the-future-of-video-mining.html
- http://2.bp.blogspot.com/-rjs8nur3whc/ux_siihimci/aaaaaaaaajm/h5w1yu7jbwi/s1600/dropcam.jpg
dropcam http://dropcam.com/
junling hu https://www.blogger.com/profile/14080175423926243363
9:09 am http://www.aboutdm.com/2013/04/the-future-of-video-mining.html
34 comments: http://www.aboutdm.com/2013/04/the-future-of-video-mining.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=5416431278376835956&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416431278376835956&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416431278376835956&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416431278376835956&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416431278376835956&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5416431278376835956&target=pinterest
companies http://www.aboutdm.com/search/label/companies
video mining http://www.aboutdm.com/search/label/video%20mining
stroke prediction http://www.aboutdm.com/2013/04/stroke-prediction.html
- http://2.bp.blogspot.com/-laqjya-5jxs/ux6bdv8zvqi/aaaaaaaaai8/n-h44r5rquu/s1600/stroke_patient.jpg
junling hu https://www.blogger.com/profile/14080175423926243363
9:13 am http://www.aboutdm.com/2013/04/stroke-prediction.html
22 comments: http://www.aboutdm.com/2013/04/stroke-prediction.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=7551777614107867875&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7551777614107867875&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7551777614107867875&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7551777614107867875&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7551777614107867875&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=7551777614107867875&target=pinterest
supervised learning http://www.aboutdm.com/search/label/supervised%20learning
history of machine learning http://www.aboutdm.com/2013/04/history-of-machine-learning.html
- http://2.bp.blogspot.com/-lxr78abkif4/uxqxvdgk5gi/aaaaaaaaais/phj1k6sjthg/s1600/history_clock.jpg
- http://4.bp.blogspot.com/-yg_wu1q_8b4/uxqxhfdquzi/aaaaaaaaaik/qrapuswo630/s1600/timeline2.png
icml (international conference on machine learning) http://icml.cc/2013/
junling hu https://www.blogger.com/profile/14080175423926243363
9:58 am http://www.aboutdm.com/2013/04/history-of-machine-learning.html
14 comments: http://www.aboutdm.com/2013/04/history-of-machine-learning.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=6530048147492741742&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=6530048147492741742&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=6530048147492741742&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=6530048147492741742&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=6530048147492741742&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=6530048147492741742&target=pinterest
overview of data mining http://www.aboutdm.com/search/label/overview%20of%20data%20mining
supervised learning http://www.aboutdm.com/search/label/supervised%20learning
yelp and big data http://www.aboutdm.com/2013/04/yelp-and-big-data.html
- http://2.bp.blogspot.com/-zxqqhpunyxc/uxlwuvumz-i/aaaaaaaaaiu/q7ma7ryknj0/s1600/yelp.jpg
big data gurus meetup http://www.meetup.com/bigdatagurus/events/114645332/
junling hu https://www.blogger.com/profile/14080175423926243363
9:15 am http://www.aboutdm.com/2013/04/yelp-and-big-data.html
12 comments: http://www.aboutdm.com/2013/04/yelp-and-big-data.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=8785318456998868888&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8785318456998868888&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8785318456998868888&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8785318456998868888&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8785318456998868888&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=8785318456998868888&target=pinterest
big data http://www.aboutdm.com/search/label/big%20data
companies http://www.aboutdm.com/search/label/companies
news and events http://www.aboutdm.com/search/label/news%20and%20events
machine learning for anti-virus software http://www.aboutdm.com/2013/04/machine-learning-for-anti-virus-software.html
- http://2.bp.blogspot.com/-d5evu8cda5o/uxfpc8v5q-i/aaaaaaaaaie/nimhwh76mye/s1600/virus.jpg
- http://1.bp.blogspot.com/-fafu81kgcli/uxfpqckkmxi/aaaaaaaaah8/mrtjed5pana/s1600/rocurve.png
junling hu https://www.blogger.com/profile/14080175423926243363
7:18 am http://www.aboutdm.com/2013/04/machine-learning-for-anti-virus-software.html
7 comments: http://www.aboutdm.com/2013/04/machine-learning-for-anti-virus-software.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=5653634522208053013&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5653634522208053013&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5653634522208053013&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5653634522208053013&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5653634522208053013&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=5653634522208053013&target=pinterest
companies http://www.aboutdm.com/search/label/companies
supervised learning http://www.aboutdm.com/search/label/supervised%20learning
basic steps of applying machine learning methods http://www.aboutdm.com/2013/03/basic-steps-of-applying-machine.html
- http://2.bp.blogspot.com/-ybq6jo0w748/ut963xftyqi/aaaaaaaaahs/dcwkxncrahi/s1600/steps.jpg
junling hu https://www.blogger.com/profile/14080175423926243363
12:01 pm http://www.aboutdm.com/2013/03/basic-steps-of-applying-machine.html
25 comments: http://www.aboutdm.com/2013/03/basic-steps-of-applying-machine.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=4227809367520706040&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4227809367520706040&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4227809367520706040&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4227809367520706040&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4227809367520706040&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4227809367520706040&target=pinterest
supervised learning http://www.aboutdm.com/search/label/supervised%20learning
data mining and neuroscience http://www.aboutdm.com/2013/02/data-mining-and-neuroscience.html
- http://2.bp.blogspot.com/-emzhwkajpss/uszbfieyk6i/aaaaaaaaaha/ryhsmot4mck/s1600/brain.jpg
junling hu https://www.blogger.com/profile/14080175423926243363
7:50 am http://www.aboutdm.com/2013/02/data-mining-and-neuroscience.html
351 comments: http://www.aboutdm.com/2013/02/data-mining-and-neuroscience.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=3911812242710400978&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3911812242710400978&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3911812242710400978&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3911812242710400978&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3911812242710400978&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=3911812242710400978&target=pinterest
news and events http://www.aboutdm.com/search/label/news%20and%20events
data mining vs. machine learning http://www.aboutdm.com/2013/02/data-mining-vs-machine-learning.html
- http://1.bp.blogspot.com/-mniayevt6ym/urktqrzn9ui/aaaaaaaaago/tun5z6chk5q/s1600/difference_sheep.jpg
junling hu https://www.blogger.com/profile/14080175423926243363
7:52 am http://www.aboutdm.com/2013/02/data-mining-vs-machine-learning.html
696 comments: http://www.aboutdm.com/2013/02/data-mining-vs-machine-learning.html#comment-form
- https://www.blogger.com/post-edit.g?blogid=3860385786864415309&postid=4603365158116643851&from=pencil
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4603365158116643851&target=email
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4603365158116643851&target=blog
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4603365158116643851&target=twitter
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4603365158116643851&target=facebook
https://www.blogger.com/share-post.g?blogid=3860385786864415309&postid=4603365158116643851&target=pinterest
overview of data mining http://www.aboutdm.com/search/label/overview%20of%20data%20mining
older posts http://www.aboutdm.com/search?updated-max=2013-02-11t07:52:00-08:00&max-results=100
home http://www.aboutdm.com/
posts (atom) http://www.aboutdm.com/feeds/posts/default
companies http://www.aboutdm.com/search/label/companies
supervised learning http://www.aboutdm.com/search/label/supervised%20learning
news and events http://www.aboutdm.com/search/label/news%20and%20events
recommender system http://www.aboutdm.com/search/label/recommender%20system
overview of data mining http://www.aboutdm.com/search/label/overview%20of%20data%20mining
text mining http://www.aboutdm.com/search/label/text%20mining
frequent pattern mining http://www.aboutdm.com/search/label/frequent%20pattern%20mining
video mining http://www.aboutdm.com/search/label/video%20mining
audio mining http://www.aboutdm.com/search/label/audio%20mining
big data http://www.aboutdm.com/search/label/big%20data
clustering http://www.aboutdm.com/search/label/clustering
data science team http://www.aboutdm.com/search/label/data%20science%20team
graph mining http://www.aboutdm.com/search/label/graph%20mining
image mining http://www.aboutdm.com/search/label/image%20mining
stream mining http://www.aboutdm.com/search/label/stream%20mining
2015 http://www.aboutdm.com/search?updated-min=2015-01-01t00:00:00-08:00&updated-max=2016-01-01t00:00:00-08:00&max-results=1
september http://www.aboutdm.com/2015_09_01_archive.html
iot and big data http://www.aboutdm.com/2015/09/iot-and-big-data.html
2014 http://www.aboutdm.com/search?updated-min=2014-01-01t00:00:00-08:00&updated-max=2015-01-01t00:00:00-08:00&max-results=4
april http://www.aboutdm.com/2014_04_01_archive.html
january http://www.aboutdm.com/2014_01_01_archive.html
2013 http://www.aboutdm.com/search?updated-min=2013-01-01t00:00:00-08:00&updated-max=2014-01-01t00:00:00-08:00&max-results=19
november http://www.aboutdm.com/2013_11_01_archive.html
september http://www.aboutdm.com/2013_09_01_archive.html
july http://www.aboutdm.com/2013_07_01_archive.html
april http://www.aboutdm.com/2013_04_01_archive.html
march http://www.aboutdm.com/2013_03_01_archive.html
february http://www.aboutdm.com/2013_02_01_archive.html
january http://www.aboutdm.com/2013_01_01_archive.html
2012 http://www.aboutdm.com/search?updated-min=2012-01-01t00:00:00-08:00&updated-max=2013-01-01t00:00:00-08:00&max-results=18
december http://www.aboutdm.com/2012_12_01_archive.html
november http://www.aboutdm.com/2012_11_01_archive.html
blogger https://www.blogger.com

Images

Images: 50
Images without an ALT attribute: 50
Images without a TITLE attribute: 50
Use the ALT and TITLE attributes for every image.
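An image tag following this recommendation might look like the following (the src is one of the blog images found by the scan; the alt and title texts are assumed descriptions, not taken from the page):

```html
<!-- Illustrative only: alt/title texts are invented descriptions -->
<img src="http://4.bp.blogspot.com/-av3oexlkc4e/u1cgjotx3bi/aaaaaaaaauu/ivjwwq7od_8/s1600/ebay_product.jpg"
     alt="eBay product listing used to illustrate attribute extraction"
     title="eBay product listing" />
```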

Images without a TITLE attribute

https://resources.blogblog.com/img/icon18_wrench_allbkg.png
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://4.bp.blogspot.com/-av3oexlkc4e/u1cgjotx3bi/aaaaaaaaauu/ivjwwq7od_8/s1600/ebay_product.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://1.bp.blogspot.com/-yxzvmckbhii/ut3y0aevw9i/aaaaaaaaas0/slcx_ghoeco/s1600/dices.jpg
http://3.bp.blogspot.com/-bwzy6ik23ji/ut3wx7vrkxi/aaaaaaaaaso/6nzsfv3efzs/s1600/conditional.png
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://3.bp.blogspot.com/-evh15akluza/useamupsgvi/aaaaaaaaase/f6hg31n8mfs/s200/kdd2013.png
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://3.bp.blogspot.com/-o-1cebadw7c/usy8r00kpwi/aaaaaaaaarw/2spmtuskouu/s200/funny-cartoon-scientist.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://4.bp.blogspot.com/-wumhtjk8lvu/uod3aektp9i/aaaaaaaaaqm/tcg4mzi5q0e/s640/overview_dm.png
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://1.bp.blogspot.com/-dwafeuqaxis/ujjd-rnj1vi/aaaaaaaaan0/f6hakpgjdj4/s200/casino-games.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://3.bp.blogspot.com/--kzmm7ku7c4/uic_ffhq-zi/aaaaaaaaanu/yip5gnb80yw/s200/ml_box.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://1.bp.blogspot.com/-g5ndn2k0iya/ufnrna_l3li/aaaaaaaaame/_somo38wgp8/s200/yelp.png
http://3.bp.blogspot.com/-rknpqzi1uos/ufnrrnxd-ci/aaaaaaaaamm/pdukkv-w_hg/s400/learning_rank.png
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://1.bp.blogspot.com/-meezfmdskvg/udzfwea-uvi/aaaaaaaaalg/pzydxjxwdts/s200/robot-brain.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://3.bp.blogspot.com/-pzxsneuqbem/udukgupxpxi/aaaaaaaaalm/6wb7pbpwdng/s200/text_mining_humor.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://2.bp.blogspot.com/-rjs8nur3whc/ux_siihimci/aaaaaaaaajm/h5w1yu7jbwi/s200/dropcam.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://2.bp.blogspot.com/-laqjya-5jxs/ux6bdv8zvqi/aaaaaaaaai8/n-h44r5rquu/s200/stroke_patient.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://2.bp.blogspot.com/-lxr78abkif4/uxqxvdgk5gi/aaaaaaaaais/phj1k6sjthg/s200/history_clock.jpg
http://4.bp.blogspot.com/-yg_wu1q_8b4/uxqxhfdquzi/aaaaaaaaaik/qrapuswo630/s400/timeline2.png
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://2.bp.blogspot.com/-zxqqhpunyxc/uxlwuvumz-i/aaaaaaaaaiu/q7ma7ryknj0/s1600/yelp.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://2.bp.blogspot.com/-d5evu8cda5o/uxfpc8v5q-i/aaaaaaaaaie/nimhwh76mye/s1600/virus.jpg
http://1.bp.blogspot.com/-fafu81kgcli/uxfpqckkmxi/aaaaaaaaah8/mrtjed5pana/s320/rocurve.png
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://2.bp.blogspot.com/-ybq6jo0w748/ut963xftyqi/aaaaaaaaahs/dcwkxncrahi/s200/steps.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://2.bp.blogspot.com/-emzhwkajpss/uszbfieyk6i/aaaaaaaaaha/ryhsmot4mck/s200/brain.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
http://1.bp.blogspot.com/-mniayevt6ym/urktqrzn9ui/aaaaaaaaago/tun5z6chk5q/s200/difference_sheep.jpg
https://resources.blogblog.com/img/icon18_edit_allbkg.gif
https://resources.blogblog.com/img/icon18_wrench_allbkg.png
https://resources.blogblog.com/img/icon18_wrench_allbkg.png
https://resources.blogblog.com/img/icon18_wrench_allbkg.png
https://resources.blogblog.com/img/icon18_wrench_allbkg.png
https://resources.blogblog.com/img/icon18_wrench_allbkg.png
https://resources.blogblog.com/img/icon18_wrench_allbkg.png
https://resources.blogblog.com/img/icon18_wrench_allbkg.png
https://resources.blogblog.com/img/icon18_wrench_allbkg.png

Images without ALT attribute

(the same 50 image URLs as in the TITLE list above)

Ranking:

Alexa Traffic (Daily Global Rank Trend, Daily Reach %): no data captured.

Majestic SEO: no data captured.
Text on page:

About Data Mining

Sep 23, 2015
IoT and Big Data

IoT (the Internet of Things) is catching fire recently. There is excitement in Silicon Valley about this topic. Talks and meetups on IoT are springing up everywhere. When you visit startup scenes (incubators and entrepreneur events), you will meet entrepreneurs making their next bet on this rising trend.

What is IoT? First, look around the appliances in a household: TVs, stoves, refrigerators, and thermostats can all be connected. When they are internet-connected, they gather information and form a complete view of your needs at home. They can then help you experience home more pleasantly (think of Nest).

Outside homes, IoT can be installed everywhere. In a store, sensors on the shelves can monitor customer traffic and inventory in real time. Surveillance cameras on the streets and in public places capture crime in action. Uber's taxis send out data on their locations and their passengers. Tesla's cars report their working status. With the coming of connected cars, we are just at the dawn of huge data connection and processing.

IoT will be very useful for hospitals, where patient monitoring is required. In agricultural fields, sensors help to monitor crops, light, and humidity.

One of the most practical uses of IoT is building sensors. Sensors on lighting fixtures turn the lights on and off by monitoring people. From such sensors, they can also discover building occupancy and provide a report of building utilization. Furthermore, they can provide meeting-room occupancy reports and dynamically re-allocate meeting rooms when no one shows up. This could be very useful for large companies where meeting rooms are in high demand.

The most exciting application of IoT would be location tracking of each person's smart phone (or watch). Imagine you walk into Starbucks, and your favorite drink is ready right there waiting for you.
This depends on data analysis of your daily consumption, your location (as you walk in), and predictive modeling.

The data generated from IoT devices are much larger than internet traffic data. In addition, the data need to be processed in real time in order to get a response immediately. We are at the dawn of a big data revolution. From cloud-based data processing to intelligent algorithms that drive the response from devices, we will see a continuing need for data scientists and data engineers. We are living in an exciting age, where data mining plays a central role in the next industrial revolution. How would IoT impact the field of data mining? We will see significant

Posted by Junling Hu at 9:26 PM. 92 comments.

Apr 22, 2014
Product Attribute Extraction

One of the most popular applications of named entity detection is product attribute extraction. Automatically extracting product attributes from text helps an e-commerce company correctly process a user query and match it to the right products. Marketers want to know what people are saying about their products. Corporate strategists want to find out which products are trendy based on user discussions.

Product attributes are specific properties of a product. For example, an Apple iPhone 5 in black has 4 major attributes: company name "Apple", brand name "iPhone", generation "5", and color "black". A complete set of attributes defines a product.

What challenges do we face in extracting product attributes? We can create a dictionary of products and their corresponding attributes, but simply relying on a dictionary has its limitations: a dictionary is never complete, as new products are created all the time; there is ambiguity in matching the attributes; and misspellings, abbreviations, and acronyms are used often, particularly in social media such as Twitter.
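To make the dictionary limitation concrete, here is a minimal sketch (not any production system) of dictionary-based attribute lookup. The dictionary entries and example strings are hypothetical illustrations.

```python
# A fixed attribute dictionary (hypothetical entries for illustration).
ATTRIBUTE_DICT = {
    "apple": "company", "iphone": "brand", "5": "generation", "black": "color",
}

def extract_attributes(text):
    """Look up each token in the fixed attribute dictionary."""
    found = {}
    for token in text.lower().split():
        if token in ATTRIBUTE_DICT:
            found[ATTRIBUTE_DICT[token]] = token
    return found

print(extract_attributes("Apple iPhone 5 black"))
# Misspelled tokens such as "iphon" or "blck" simply fall through the lookup,
# illustrating why a pure dictionary approach breaks down on noisy text.
print(extract_attributes("Appl iphon 5 blck"))  # only the generation survives
```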
Let's look at a case study of eBay (the results were published), and see how they solved this problem. eBay is an online marketplace for sellers and buyers. Small sellers create listings to be sold on eBay. With more than 100 million listings on the site, sellers are selling everything from digital cameras, clothes, and cars to collectibles. In order to help users quickly find an item they are interested in, eBay's product search engine has to quickly match a user query to existing listings. It is crucial that eBay group existing listings into products so that the search can be done quickly. This requires automatically extracting product attributes from each listing. However, they face the following challenges: each listing is less than 55 characters long, which provides little context; the text is ungrammatical, with many nouns piled together; and misspellings, abbreviations, and acronyms occur frequently.

As we can see, a dictionary-based approach does not work here, due to the large volume of new listings and products on the site and the high variation in stating the same product attribute. For example, the brand "River Island" appears in listings as: river islands, river islanfd, river islan, ?river island, riverislandtop.

How do we apply machine learning to this problem? In a supervised learning approach, a small set of listings is labeled by humans. In each listing, each word is tagged either with an attribute name or "other". Different product categories have different attributes. For example, in the clothing category, there are 4 major product attributes: brand, garment type, color, and size. The following is an example of a labeled listing:

ann taylor   asymmetrical knit dress   nwt   red    petite   size s
brand        garment type                    color  size     size

But supervised learning is very expensive, as it requires a lot of labeled data. Our goal is to use a small amount of labeled data and derive the rest by bootstrapping. This is called semi-supervised learning.
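The labeled listing above can be sketched as per-word training pairs for a supervised tagger. This is only an illustration of the data representation, not eBay's actual pipeline; the span labels below follow the example in the post.

```python
# One human-labeled listing, turned into (word, tag) training pairs.
listing = "ann taylor asymmetrical knit dress nwt red petite size s"
# Hypothetical per-word labels matching the tagged example above
# ("nwt" gets the catch-all tag "other"):
labels = ["brand", "brand", "garment type", "garment type", "garment type",
          "other", "color", "size", "size", "size"]

tokens = listing.split()
assert len(tokens) == len(labels)

# Each training example pairs a word with its attribute tag.
training_pairs = list(zip(tokens, labels))
print(training_pairs[:2])  # [('ann', 'brand'), ('taylor', 'brand')]
```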
For details of our semi-supervised learning approach, please see the paper in the reference.

Reference: Duangmanee Putthividhya and Junling Hu, "Bootstrapped Named Entity Recognition for Product Attribute Extraction". EMNLP 2011: 1557-1567.

Posted by Junling Hu at 7:06 PM. 149 comments. Labels: text mining

Jan 20, 2014
Statistics and Data Mining

Probability theory is the core of data mining. In the early days of data mining, people counted the frequent pattern that those who buy diapers also buy beer. This is essentially deriving a conditional probability: the probability of buying beer given buying diapers. In mathematical terms, it is P(buy beer | buy diaper).

Here is an example: suppose 5% of shoppers buy diapers, and only 2% of shoppers buy both diapers and beer. What is the likelihood that someone buys beer after buying diapers? The answer is a conditional probability:

P("buy beer" | "buy diaper") = P("buy diaper", "buy beer") / P("buy diaper") = 2% / 5% = 40%

The so-called "association rules" of data mining can be understood much more clearly through these statistical terms.

In machine learning, the core concept is predicting whether a data sample belongs to a certain population (class). For example, we want to predict whether a person is a potential buyer, or whether a credit card transaction is fraudulent. Such a prediction is always associated with a confidence score between 0 and 1. When the confidence score is 1, we are 100% sure about the prediction. Generating such a confidence score requires statistical inference.

Statistics is essential when evaluating a data mining model. We use a control group or conduct an A/B test. For example, if we test with 1,000 people in the treatment group (group A) vs. 1,000 people in the control group (group B) and find that group A has better performance, is this result conclusive? In other words, is it statistically significant? That has to be answered with statistical knowledge.
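The diaper/beer arithmetic above can be checked numerically:

```python
# Conditional probability from the diaper/beer example.
p_diaper = 0.05           # P(buy diaper): 5% of shoppers
p_diaper_and_beer = 0.02  # P(buy diaper, buy beer): 2% of shoppers

# P(beer | diaper) = P(diaper, beer) / P(diaper)
p_beer_given_diaper = p_diaper_and_beer / p_diaper
print(round(p_beer_given_diaper, 2))  # 0.4, i.e. 40%
```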
In the data collection phase, statistics helps us decide the size of the training data (the sample), and whether it is representative. Not to mention, some currently popular machine learning methods, such as logistic regression, were originally developed in the statistics community.

As mathematics provides the foundation for physics, statistics has become a foundation for machine learning. Its importance in data mining will only become more prominent over time.

Posted by Junling Hu at 8:09 PM. 30 comments. Labels: frequent pattern mining, overview of data mining

Jan 3, 2014
Data Mining Conferences in 2014

A new year has come, and exciting data mining conferences are lining up on the horizon. They are good places to socialize with other data science researchers and practitioners, and to connect with potential candidates or future employers. Last year, I missed KDD 2013. This year, that conference is at the top of my list. Here is a list of all the major ones.
Academic + industry
· SDM (SIAM Conference on Data Mining), April 24-26, Philadelphia
· KDD (Knowledge Discovery and Data Mining), Aug 24-27, New York City

Industry
· Strata (O'Reilly conference), Feb 11-13, Santa Clara, California
· Predictive Analytics World, March 16-21, San Francisco, California

Academic
· ICML (machine learning), June 21-26, Beijing, China
· AAAI (artificial intelligence), July 27-31, Quebec City, Canada
· ICDM (International Conference on Data Mining), Dec 14-17, Shenzhen, China

Specialized areas
· WWW (web data, text mining), May 7-11, Seoul, Korea
· ACL (natural language processing), June 23-25, Baltimore
· SIGIR (text mining), July 6-11, Gold Coast, Australia
· Interspeech (speech mining), Sept 14-18, Singapore
· RecSys (recommender systems), Oct 6-10, Foster City

Posted by Junling Hu at 7:32 PM. 22 comments. Labels: news and events

Jan 2, 2014
Hiring Data Scientists

Many times I am asked by friends and colleagues who data scientists are. Many are interested in the answer to a very practical question: "Whom should I hire as a data scientist?" From my practical experience in building data science teams, I have come to appreciate the following qualities:

A fundamental understanding of machine learning. Ultimately, data mining cannot exist without machine learning, which provides its core techniques. Thus a researcher in machine learning or a related field (such as natural language processing, computer vision, artificial intelligence, or bioinformatics) is an ideal candidate. They have studied different machine learning methods, and know the newest and best techniques to apply to a problem.

A sophisticated understanding of statistics and advanced mathematics. Such understanding requires years of training. Thus a Ph.D. degree is typically required for data scientists.

Training in computer science. Ultimately, mining data is a way of computing.
It requires designing computer algorithms that are efficient in memory (space) and time. People who are trained in computer science understand the tradeoff between space and time in a computer, and they understand the basic concepts of computational complexity. Someone who has majored in computer science has this training ingrained in their DNA.

Good coding skills. We live in a big data era. In order to work with data, we write code to process it, clean it, and transform it. Then we need to create programs on a big data platform, and test and improve those programs constantly. All of this requires good coding skills. Data mining is about implementation and testing. Programming skill is thus a core requirement.

In hiring a data scientist, a few other qualifications are desirable but not required:

Experience with big data. This enables someone to work in environments such as Hadoop and pick up the tools quickly. But such knowledge can be easily learned.

Knowledge of a specific programming language. A good programmer can easily learn any new language quickly. In addition, there are many options for running big data programs, from Python to Java to Scala. If a person masters any one of these languages, he can be very productive.

A good data scientist who satisfies the 4 basic skill requirements is hard to find today. Even though our universities train tens of thousands of them each year, market demand is far higher than that. Many people have read the report by McKinsey which states that there will be a 140,000-person job gap (demand exceeding talent supply) for data scientists by 2018. Even today, in early 2014, companies are struggling to bring in data scientists. Those who are on the job market are immediately snatched up by large and well-known companies. Today, every company is trying to implement a "data strategy" (or "big data strategy", in its fancier term). This is a golden age for data scientists, but a challenging time for employers.
Posted by Junling Hu at 8:28 PM. 44 comments. Labels: data science team

Nov 11, 2013
Overview of Data Mining

When people talk about data mining, sometimes they refer to the methods used in this field, such as machine learning. Sometimes they refer to specific data of interest, such as text mining or video mining. Many other times, people use the term "big data" simply to refer to the infrastructure of data mining, such as Hadoop or Cassandra. Given all of these various references, newcomers to the data mining field can feel lost in a wonderland. Here we give an overview picture that connects all these components together.

Essentially, we can think of the data mining field as consisting of 4 major layers. At the top layer is the basic methodology, such as machine learning or frequent pattern mining. The second layer is the application to various data types: for social networks, this is graph mining; for sensor and mobile data, this is stream data mining. The third layer is the infrastructure, where Hadoop, NoSQL, and other environments are invented to support large-scale data movement and retrieval. Finally, at the fourth layer, we are concerned with creating a data mining team, and understanding the people profiles that support all of the above operations.

Given this four-layer separation, we can easily see where various discussions on data mining fall in the picture. For example, the hot topic of "deep learning" belongs to the machine learning layer; more specifically, it is part of unsupervised learning. Another topic, "natural language processing" (NLP), is part of mining text data. Not surprisingly, this field uses machine learning extensively as its core methodology.

Posted by Junling Hu at 8:00 AM. 25 comments. Labels: overview of data mining

Sep 12, 2013
How Casinos Are Betting on Big Data

As reported by Yahoo!
Finance today, casinos like Caesars crunch big data to find ways to attract gamblers. They can find out if a gambler is losing or winning too much: "They could win a lot or they lose a lot or they could have something in the middle. … So we do try to make sure that people don't have really unfortunate visits," said Caesars CEO Gary Loveman.

Caesars has been leading data mining in the gambling industry. They have more than 200 data experts (including some real data scientists?) in house to crunch data on their loyalty program, VIP member patterns, and so on. That work was reported in the KDD 2011 industry practice expo.

Next time you visit a casino, expect a suddenly friendlier slot machine after you are on a losing streak ... The complete Yahoo! Finance report is here.

Posted by Junling Hu at 5:40 PM. 24 comments. Labels: companies

Sep 4, 2013
Machine Learning as a Service

With data widely available and the maturity of machine learning methods, commercial application of machine learning has become a reality. A new type of business has sprung up: providing machine learning services. It turns out this service fills a big market void. About a dozen companies have rapidly grown in this space in the last few years. Even so, many more startups are being formed this year. Let me take a look at a few successful companies in this space:

1. Mu Sigma. The company derives its name from "mu" and "sigma" in statistics. Based in Chicago but with significant operations in India, Mu Sigma provides analytics for marketing, supply chain, and risk management. The company was founded in 2004, but has grown to 2,500 employees. Today, the company is valued at $1 billion.

2. Opera Solutions. Opera Solutions was founded in 2004, headquartered in New York but with significant operations in San Diego.
The company gathered a group of strong data scientists to work on projects ranging from financial services, health care, and insurance to manufacturing. The company's scientists have participated in important data mining contests such as the KDD Cup and the Netflix contest, demonstrating strong technical and research skills. Opera Solutions works with large to mid-sized companies, and has been growing rapidly. It raised $84 million in funding in late 2011, and then another $30 million in May 2013. Today the company is valued at $500 million.

3. Actian. Based in Redwood City, CA, Actian provides a data analytics platform. They recently acquired another data company, ParAccel.

4. Fractal Analytics. The company was founded in 2000 and is headquartered in San Mateo, with a significant operational presence in India. It provides business intelligence and analytics services for the financial services, insurance, and telecommunications industries.

5. Alpine Data Labs. They offer in-house consulting and 24-hour initial delivery. It raised a $7.5 million round A funding in May 2011. The company is based in San Mateo.

6. Grok. They provide asset (equipment) management for manufacturers, and predictive modeling. The company was founded in 2005, and is based in Redwood City.

7. Skytree. The company focuses specifically on machine learning. It just raised an $18 million round A funding in April 2013, after starting in early 2012 (with a $1.5 million seed fund). The company is based in San Jose.

Posted by Junling Hu at 7:08 AM. 35 comments. Labels: companies

Jul 26, 2013
Learning to Rank and Recommender Systems

A recommendation problem is essentially a ranking problem: among a list of movies, which should rank higher in order to be recommended? Among the job candidates, whom should LinkedIn display to the recruiters? The task of recommendation can be viewed as creating a ranked list. The classical approach to recommender systems is based on collaborative filtering.
This is an approach that uses similar users or similar items to make recommendations. Collaborative filtering was popularized by the Netflix contest from 2006 to 2009, when many teams around the world participated to create movie recommendations based on movie ratings provided by Netflix.

While collaborative filtering has achieved certain success, it has its limitations. The fundamental problem is the limited information captured in the user-item table. Each cell of this table is either a rating or some aggregated activity score (such as a purchase) on an item (from a specific user). Complex information such as user browsing time, clicks, or external events is hard to capture in such a table format.

A ranking approach to recommendation is much more flexible. It can incorporate all the information as different variables (features). Thus it is more explicit. In addition, we can combine ranking with machine learning, allowing the ranking function to evolve over time based on data.

In the traditional approach to ranking, a ranking score is generated by some fixed rules. For example, a page's score depends on the links pointing to that page, its text content, and its relevance to search keywords. Other information, such as the visitor's location or the time of day, could all be part of the ranking formula. In this formula, variables and their weights are pre-defined.

The idea of learning to rank is to use actual user data to create a ranking function. The machine learning procedure for ranking has the following steps:

1. Gather training data based on click information.
2. Gather all attributes about each data point, such as item information, user information, time of day, etc.
3. Create a training dataset that has 2 classes: positive (click) and negative (no click).
4. Apply a supervised machine learning algorithm (such as logistic regression) to the training data.

The learned model is our ranking model.
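The steps above can be sketched end to end on toy data. This is a minimal illustration, not Netflix's or LinkedIn's system: the features, click labels, and item names are hypothetical, and the logistic regression is hand-rolled batch gradient descent rather than a library call.

```python
import math

# Toy click log: each row is ([hypothetical features: relevance, popularity], clicked?)
data = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.6], 1),
    ([0.2, 0.3], 0), ([0.1, 0.2], 0), ([0.3, 0.1], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit logistic regression by batch gradient descent.
w, b, step = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):
    gw, gb = [0.0, 0.0], 0.0
    for x, y in data:
        err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    w = [w[i] - step * gw[i] / len(data) for i in range(2)]
    b -= step * gb / len(data)

def ranking_score(x):
    """Predicted click probability in (0, 1): the 'ranking score'."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

# Rank new candidate items by predicted click probability.
candidates = {"item_a": [0.85, 0.7], "item_b": [0.15, 0.25], "item_c": [0.5, 0.5]}
ranked = sorted(candidates, key=lambda k: ranking_score(candidates[k]), reverse=True)
print(ranked)  # item_a, with the strongest features, ranks first
```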
For any new data point, the ranking model assigns a probability score between 0 and 1 for whether the item will be clicked (selected). We call this probability score our "ranking score". The training data are constructed from user visit logs containing user clicks, or they may be prepared manually by human raters.

The learning-to-rank approach to recommendation has been adopted by Netflix and LinkedIn today. It is fast and can be re-trained repeatedly. It is behind those wonderful movie recommendations and connection recommendations we enjoy on these sites.

Posted by Junling Hu at 9:54 PM. 30 comments. Labels: recommender system, supervised learning

Jul 9, 2013
Text Mining: Named Entity Detection (2)

In the machine learning approach to detecting named entities, we first create labeled data, which are sentences with each word tagged. The tags correspond to the entities of interest. For example:

May Smith was born in May.
PERSON PERSON           DATE

Since an entity may span several words, we can use finer tags in IOB (in-out-begin) format, which indicate whether a word is the beginning (B) or inside (I) of a phrase. We will have tags for B-PERSON, I-PERSON, B-DATE, I-DATE, and O (other). With such tagging, the above example becomes:

May Smith was born in May.
B-PERSON I-PERSON O O O B-DATE

Our training data have every word tagged. Our goal is to learn this mapping and apply it to a new sentence. In other words, we want to find a mapping from a word (given its context) to an entity label. Each word can be represented as a set of features. A machine learning model maps input features to an output (the entity class). Such features could include:

• word identity (the word itself)
• word position from the beginning
• word position from the end
• the word before
• the word after
• is the word capitalized?

Our training data would then look like the following, where each row is a data point.
word identity | position from beginning | position to end | word before | word after | capitalized? | class
'may'         | 0                       | 5               | /           | 'smith'    | yes          | B-PERSON
'smith'       | 1                       | 4               | 'may'       | 'was'      | yes          | I-PERSON
  ⋮
'may'         | 5                       | 0               | 'in'        | /          | yes          | B-DATE

Once we create training data like the table above, we can train a standard machine learning method such as an SVM, a decision tree, or logistic regression to generate a model (which contains machine-generated rules). We can then apply this model to any new sentence and detect entity types related to "person" and "date". Note that we can only detect entity types contained in our training data. While this may seem a limitation, it still allows us to discover new instances or values associated with existing entity types.

The approach we have discussed so far is a supervised learning approach, which depends heavily on human-labeled data. When manual labels are hard to come by, we can find ways to create training data by machine. Such an approach is called semi-supervised learning, which holds a lot of promise as data get large.

Posted by Junling Hu at 9:09 PM. 18 comments. Labels: text mining

Jul 8, 2013
Text Mining: Named Entity Detection

An interesting task in text mining is detecting entities in text. Such an entity could be a person, a company, a product, or a location. Since an entity is associated with a special name, it is also called a named entity. For example, the following text contains 3 named entities:

Apple has hired Paul Deneve as vice president, reporting to CEO Tim Cook.

The first term, "Apple", indicates a company, and the second and third are persons.

Named entity recognition (NER) is an important component of social media analysis. It helps us understand user sentiment on specific products. NER is also important for product search at e-commerce companies. It helps us understand user search queries related to certain products.
To map each name to an entity, one solution is to use a dictionary of special names. Unfortunately, this approach has two serious problems. The first problem is that our dictionary is never complete. New companies are created and new products are launched every day. It is hard to keep track of all the new names. The second problem is the ambiguity of associating a name with an entity. The following example illustrates this:

As Washington politicians argue about the budget reform, it is a good time to look back at George Washington's time.

In this text, the first mention of "Washington" refers to a city, while the second refers to a person. The distinction between these two entities comes from their context.

To resolve ambiguity in entity mapping, we can create rules that utilize the context. For example, we can create the following rules:

• When 'Washington' is followed by 'politician', it refers to a city.
• When 'Washington' is preceded by 'in', it refers to a city.
• When 'Washington' is preceded by 'George', it refers to a person.

But there could be too many such rules. For example, each of the following phrases would generate a different rule: "Washington mentality", "Washington atmosphere", "Washington debate", as well as "Washington biography" and "Washington example". The richness of natural language makes the number of rules explode, and they remain susceptible to exceptions.

Instead of manually creating rules, we can apply machine learning. The advantage of machine learning is that it creates patterns automatically from examples. No rule needs to be manually written by humans. The machine learning algorithm takes a set of training examples and churns out its own model (comparable to rules). If we get new training data, we can re-train the machine learning algorithm and generate a new model quickly. How does the machine learning approach work? I will discuss it in the next post.
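The three context rules above can be written as a tiny function. This is only a sketch of the rule-based approach the post critiques, not a real NER system; note how any phrase the rules do not cover falls straight through to "unknown".

```python
def classify_washington(tokens, i):
    """Classify the mention of 'washington' at position i using context rules."""
    before = tokens[i - 1] if i > 0 else None
    after = tokens[i + 1] if i + 1 < len(tokens) else None
    if after in ("politician", "politicians"):
        return "city"
    if before == "in":
        return "city"
    if before == "george":
        return "person"
    return "unknown"  # every uncovered phrase would need yet another rule

sent = "as washington politicians argue , look back at george washington 's time".split()
print(classify_washington(sent, 1))  # city   (followed by 'politicians')
print(classify_washington(sent, 9))  # person (preceded by 'george')
```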
Posted by Junling Hu at 8:49 PM
Labels: text mining

Apr 30, 2013

The Future of Video Mining

IP cameras have become inexpensive and ubiquitous. For $149, you can get an IP camera to use at home. The camera saves its recordings in the cloud, so you never have to worry about buying memory cards. Dropcam offers this amazing service: the camera connects to your Wi-Fi network and streams real-time video 24 hours a day to the server, and you can watch your home remotely from your smartphone or laptop.

Dropcam is a startup based in San Francisco. Founded in 2009, the company has raised 2 rounds of funding, with $12 million of Series B in June 2012. As a young startup, it has seen rapid user growth. Dropcam CEO Greg Duffy said that Dropcam cameras now upload more video per day than YouTube.

What makes Dropcam unique among IP camera companies is its video mining capability. If you want to review your video from the last 7 days, you don't have to watch it from beginning to end: Dropcam automatically marks the video segments containing motion, so you can jump to those segments right away. In addition, Dropcam plans to implement face and figure detection so that it can offer more intelligent viewing options. Furthermore, the viewing software can detect events such as a cat running, a dinner gathering, and so on.

The potential of video mining is limitless. Given our limited time to review videos spanning many hours (or days and weeks), and the continuing accumulation of daily recordings, the need for video mining will keep growing.

Posted by Junling Hu at 9:09 AM
Labels: companies, video mining

Apr 29, 2013

Stroke Prediction

Stroke is the third leading cause of death in the United States. It is also a principal cause of serious long-term disability.
Stroke risk prediction can contribute significantly to prevention and early treatment. Numerous medical studies and data analyses have been conducted to identify effective predictors of stroke. Traditional studies adopted features (risk factors) that were verified by clinical trials or selected manually by medical experts. For example, one famous study by Lumley and others [1] built a 5-year stroke prediction model using a set of 16 manually selected features. However, manually selected features can miss important indicators. For example, past studies have shown that there are additional risk factors for stroke, such as creatinine level and time to walk 15 feet. The Framingham study [2] surveyed a wide range of stroke risk factors including blood pressure, the use of anti-hypertensive therapy, diabetes mellitus, cigarette smoking, prior cardiovascular disease, and atrial fibrillation.

With the large number of features in current medical datasets, identifying and verifying each risk factor manually is a cumbersome task. Machine learning algorithms can efficiently identify features highly related to stroke occurrence from a huge feature set. By doing so, they can improve the prediction accuracy of stroke risk, in addition to discovering new risk factors. In a study by Khosla and others [3], a machine-learning-based predictive model was built on stroke data, and several feature selection methods were investigated. Their model was based on automatically selected features, and it outperformed the existing stroke model. In addition, they were able to identify risk factors that had not been discovered by traditional approaches. The newly identified factors include:

• Total medications
• Any ECG abnormality
• Min. ankle arm ratio
• Maximal inflation level
• Calculated 100 point score
• General health
• Mini-mental score 35 point

It's exciting to see machine learning play a more important role in medicine and health management.
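The kind of automatic feature screening described above can be sketched in a few lines. This toy univariate filter is my own illustration (not the method of the Khosla study): it ranks features by how differently they behave in the stroke vs. non-stroke groups, with made-up records.

```python
def feature_scores(rows, labels):
    """Score each feature by the absolute difference of its mean value in the
    positive (stroke) group vs. the negative group -- a crude univariate filter."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = {}
    for feature in rows[0]:
        mean_pos = sum(r[feature] for r in pos) / len(pos)
        mean_neg = sum(r[feature] for r in neg) / len(neg)
        scores[feature] = abs(mean_pos - mean_neg)
    return scores

def select_top_k(scores, k):
    """Keep the k features that separate the two groups the most."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy records: systolic blood pressure separates the groups, 'noise' does not.
rows = [
    {"age": 70, "systolic_bp": 160, "noise": 5},
    {"age": 68, "systolic_bp": 155, "noise": 4},
    {"age": 40, "systolic_bp": 120, "noise": 5},
    {"age": 45, "systolic_bp": 118, "noise": 6},
]
labels = [1, 1, 0, 0]  # 1 = stroke, 0 = no stroke
```

Real studies use stronger criteria (and guard against confounding), but the shape is the same: score every candidate feature against the outcome, then keep the top-ranked ones for the predictive model.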
References:

[1] T. Lumley, R. A. Kronmal, M. Cushman, T. A. Manolio, and S. Goldstein. A stroke prediction score in the elderly: validation and web-based application. Journal of Clinical Epidemiology, 55(2):129–136, February 2002.
[2] P. A. Wolf, R. B. D'Agostino, A. J. Belanger, and W. B. Kannel. Probability of stroke: a risk profile from the Framingham study. Stroke, 22:312–318, March 1991.
[3] Aditya Khosla, Yu Cao, Cliff Chiung-Yu Lin, Hsu-Kuang Chiu, Junling Hu, and Honglak Lee. An integrated machine learning approach to stroke prediction. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 183–192. ACM, 2010.

Posted by Junling Hu at 9:13 AM
Labels: supervised learning

Apr 26, 2013

History of Machine Learning

The development of machine learning is an integral part of the development of artificial intelligence. In the early days of AI, people were interested in building machines that mimic human brains. The perceptron model was invented in 1957, and it generated an overly optimistic view of AI during the 1960s. After Marvin Minsky pointed out the limitations of this model in expressing complex functions, researchers stopped pursuing it for the next decade.

In the 1970s, the machine learning field was dormant, as expert systems became the mainstream approach in AI. The revival of machine learning came in the mid-1980s, when the decision tree model was invented and distributed as software. The model can be inspected by a human and is easy to explain; it is also very versatile and can adapt to widely different problems. It was also in the mid-1980s that multi-layer neural networks were invented: with enough hidden layers, a neural network can express any function, thus overcoming the limitation of the perceptron. This led to a revival of neural network research.
Both decision trees and neural networks have seen wide use in financial applications such as loan approval, fraud detection, and portfolio management. They have also been applied to a wide range of industrial processes and to post office automation (address recognition).

Machine learning saw rapid growth in the 1990s, due to the invention of the World Wide Web and the large amounts of data gathered on the internet. The fast pace of interaction on the internet called for more automation and more adaptivity in computer systems. Around 1995, SVM was proposed and was quickly adopted; packages like LIBSVM and SVMlight made it a popular method to use. After 2000, logistic regression was rediscovered and redesigned for large-scale machine learning problems. In the ten years following 2003, logistic regression attracted a lot of research work and became a practical algorithm in many large-scale commercial systems, particularly at large internet companies.

We have discussed the development of 4 major machine learning methods. Other methods were developed in parallel but see declining use today: naive Bayes, Bayesian networks, and the maximum entropy classifier (mostly used in natural language processing). In addition to individual methods, we have seen the invention of ensemble learning, where several classifiers are used together, and its wide adoption today.

New machine learning methods are still being invented every day. For the newest developments, check out the annual ICML (International Conference on Machine Learning) conference.

Posted by Junling Hu at 9:58 AM
Labels: overview of data mining, supervised learning

Apr 25, 2013

Yelp and Big Data

At the Big Data Gurus meetup yesterday, hosted at the Samsung R&D center in San Jose, Jimmy Retzlaff from Yelp gave a talk on big data at Yelp. By the end of March 2013, Yelp had 36 million user reviews.
These reviews range from restaurants to hair salons and other local businesses. The number of reviews on the Yelp website has grown exponentially in the last few years. Yelp also sees high traffic now: in January 2013, there were 100 million unique visitors. The website records 2 terabytes of log data and another 2 terabytes of derived logs every day. While this data size is still small compared to eBay or LinkedIn, it calls for big data infrastructure and data mining methods. Yelp uses MapReduce extensively and builds its infrastructure on the Amazon cloud.

Yelp's log data contains ad displays, user clicks, and so on. Data mining helps Yelp design its search system, show ads, and filter fake reviews. In addition, data mining enables products such as "Review Highlights" and "People who viewed this also viewed...". Yelp is one example of a company starting to tackle big data and taking advantage of data mining to create better services.

Posted by Junling Hu at 9:15 AM
Labels: big data, companies, news and events

Apr 24, 2013

Machine Learning for Anti-Virus Software

Symantec is the largest anti-virus software vendor. It has 120 million subscribers, who visit 2 billion websites a day and generate 700 billion submissions. Given this amount of data, it is paramount that anti-virus software detect viruses quickly and accurately.

Anti-virus software was originally built manually: security experts reviewed each malware sample and constructed its "signature", and each computer file was checked against these signatures. Given the rapid change of malware and its many variations, there are not enough human experts to generate all the exact signatures. This gave rise to heuristic or generic signatures, which can handle more variations of the same file. However, new types of malware are created every day.
Thus we need a more adaptive approach that identifies malware automatically, without the manual effort of creating signatures. This is where machine learning can help.

Computer viruses have come a long way. The first virus, "Creeper", appeared in 1971. Then came Rabbit (or Wabbit), and after that computer worms like "Love Letter" and Nimda. Today's computer viruses are much more sophisticated: they evolve much faster and are constantly changing. Virus creation is now funded by organizations and some governments. There are big incentives to steal users' financial information or companies' trade secrets, and malware also enables certain governments to conduct spying or potential cyber war on their targets.

Symantec uses about 500 features in its machine learning model. Feature values can be continuous or discrete. Such features include:

• How did the file arrive on this machine (through a browser, email, ...)?
• When/where?
• How many other files are on this machine?
• How many clean files are on this machine?
• Is the file packed or obfuscated (mutated)?
• Does it write or communicate?
• How often does it run?
• Who runs it?

Researchers at Symantec experiment with SVM, decision tree, and linear regression models. In building a classifier, they are not simply optimizing accuracy or the true positive rate; they are also concerned with false positives, where benign software is classified as malware. Such false positive predictions can be very costly for users. Balancing true positives against false positives leads to the ROC (receiver operating characteristic) curve, which plots the trade-off between the true positive rate and the false positive rate; each point on the curve corresponds to a cutoff we choose. Symantec uses the ROC curve to select a target operating point. (The original post included a chart illustrating this trade-off: aiming for a 90% true positive rate gives a 20% false positive rate, while aiming for only an 80% true positive rate reduces the false positive rate further.)
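Each point on an ROC curve comes from one cutoff on the classifier's score. A minimal sketch, with toy scores rather than any real anti-virus model:

```python
def roc_points(scores, labels, cutoffs):
    """For each cutoff, flag score >= cutoff as malware and report the
    resulting (false positive rate, true positive rate) point."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Toy classifier scores (1 = malware, 0 = benign).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    0]
```

Lowering the cutoff catches more malware (higher TPR) but also flags more benign files (higher FPR); picking the operating point is exactly the trade-off the post describes.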
(A better classifier would shift the ROC curve up, so that we achieve a higher true positive rate at any given false positive rate.)

According to its researchers, Symantec has achieved a high accuracy rate (the average of the true positive and true negative rates) of 95%. Its true positive rate is above 98% and its false positive rate is below 1%.

I am a user of Norton software (by Symantec) and enjoy it. I hope to see more success from Symantec as we win the war against malware!

Posted by Junling Hu at 7:18 AM
Labels: companies, supervised learning

Mar 12, 2013

Basic Steps of Applying Machine Learning Methods

Deploying a machine learning model typically takes the following five steps:

1. Data collection.
2. Data preprocessing: 1) data cleaning; 2) data transformation; 3) dividing the data into training and test sets.
3. Building a model on the training data.
4. Evaluating the model on the test data.
5. If the performance is satisfactory, deploying the model to the real system.

This process can be iterative: after a model is deployed, we can collect new data and restart from step 1.

Let's look at each step in detail:

1. Data collection: at this stage, we want to collect all relevant data. For an online business, user clicks, search queries, and browsing information should all be captured and saved into the database. In manufacturing, log data captures machine status and activities; such data are used to produce maintenance schedules and to predict which parts will need replacement.

2. Data preprocessing: the data used in machine learning describes factors, attributes, or features of an observation. 1) Data cleaning: a simple first step in looking at the data is finding missing values. What is the significance of a missing value? Would replacing it with the median value of the feature be acceptable?
For example, perhaps a person filling out a questionnaire doesn't want to reveal his salary, because it is very low or very high. In this case, using other features to predict the missing salary might be appropriate; one might infer it from the person's zip code. The fact that the value is missing may itself be important. There are also machine learning methods that ignore missing values, and one of these could be used for such a data set.

2) Data transformation: in general we work with both numerical and categorical data. Numerical data consists of actual numbers, while categorical data takes a few discrete values. Examples of categorical data include eye color, species type, marital status, and gender. A zip code is actually categorical: it is a number, but there is no meaning to adding two zip codes. Categorical data may or may not have an order; for instance, good/better/best is descriptive categorical data that has an order.

3) After the data has been cleaned and transformed, it needs to be split into a training set and a test set.

3. Model building: the training set is used to create the model, which is then used to predict the answers for new cases where the answer (target) is unknown. For example, section 1.3 describes how a decision tree is built using the training data set. Several different modeling techniques have been introduced and will be discussed in detail in future sections. Various models can be built from the same training set.

4. Model evaluation: once the model is built with the training data, it is used to predict the targets for the test data. First the target values are removed from the test set; the model is then applied to the test set, and the predicted target values are compared with the actual ones. The accuracy of the model is the percentage of correct predictions made.
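The split/train/evaluate steps above can be walked through end to end with a toy learner. The one-feature decision stump and the deterministic split are my own stand-ins for illustration, not any particular production model:

```python
def split_train_test(rows, every=4):
    """Step 2.3: hold out every 4th row as the test set (a deterministic
    split for illustration; in practice the split is randomized)."""
    test = rows[::every]
    train = [r for i, r in enumerate(rows) if i % every != 0]
    return train, test

def train_threshold(train):
    """Step 3: learn the cutoff on x that best separates the training
    targets (a one-feature decision stump standing in for a real model)."""
    best_t, best_acc = None, -1.0
    for t in sorted(x for x, _ in train):
        acc = sum(1 for x, y in train if (x >= t) == y) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def evaluate(threshold, test):
    """Step 4: accuracy = percentage of correct predictions on held-out data."""
    return sum(1 for x, y in test if (x >= threshold) == y) / len(test)

# Step 1: "collected" data -- the target is True exactly when x >= 5.
rows = [(x, x >= 5) for x in range(12)]
train, test = split_train_test(rows)
model = train_threshold(train)
accuracy = evaluate(model, test)
```

Crucially, the cutoff is chosen using only the training rows, and accuracy is reported only on the held-out rows, which is the whole point of the train/test split.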
These accuracies can be used to compare the different models. Several other ways to compare model accuracy are discussed in the next section on performance evaluation.

5. Model deployment: this is the most important step. If the speed and accuracy of the model are acceptable, the model should be deployed in the real system. The model used in production should be built with all the available data, since models improve with the amount of data used to create them. The results of the model then need to be incorporated into the business strategy: data mining models provide valuable information that gives companies great advantages. Obama won the election in part by incorporating data mining results into his campaign strategy. The last chapter of this book describes how a company can incorporate data mining results into its daily business.

Posted by Junling Hu at 12:01 PM
Labels: supervised learning

Feb 21, 2013

Data Mining and Neuroscience

Bradley Voytek from the University of California, San Francisco gave a talk on data mining and neuroscience yesterday in Mountain View, as part of the Big Data Think Tank meetup.

Voytek said data mining could play an important role in brain study and treatment. Imagine applying data mining to reduce open-skull surgery from 45 minutes to 2 minutes; imagine a better way to analyze and understand fMRI data. Imagine applying data mining to understand aging and brain response, helping us identify ways to improve cognitive function in the elderly. In addition (though not mentioned in this talk), recent studies have applied data mining (specifically association rule mining) to understand Alzheimer's disease, identifying associations in brain region changes. Voytek also mentioned the need to understand the network of neural signals.
This is an exciting domain, as it will ultimately help us improve human cognition, such as hearing and vision. As reported recently in the news, the implementation of "bionic eyes" depends on mapping the neurons responsible for visual processing. A deeper understanding of these neurons could help us eradicate blindness completely. Imagine a future with no more blindness or deafness: how much human suffering could be eliminated!

Neuroscience is the next frontier of science. Understanding the human brain, and ultimately human consciousness, would resolve age-old questions about self and soul. With that understanding, imagine being able to preserve consciousness or even human memory. It is not far-fetched to think we could watch another person's memory like a movie, if we could truly understand the neural signals associated with memory retrieval.

It is exciting to see data mining play an important role in the advancement of science, particularly in neuroscience.

Posted by Junling Hu at 7:50 AM
Labels: news and events

Feb 11, 2013

Data Mining vs. Machine Learning

Data mining and machine learning used to be two cousins: they have different parents. Now they grow increasingly alike, almost like twins, and many times people even refer to data mining by the name machine learning.

The field of machine learning grew out of the effort to build artificial intelligence. Its major concern is making a machine learn and adapt to new information. The origin of machine learning can be traced back to 1957, when the perceptron model, modeled after neurons in the human brain, was invented. That prompted the development of the neural network model, which flourished in the late 1980s. From the 1980s to the 1990s, the decision tree method became very popular, owing to the efficient C4.5 package. SVM was invented in the mid-1990s and has since been widely used in industry.
Logistic regression, an old method in statistics, has seen growing adoption in machine learning since 2001, when The Elements of Statistical Learning was published.

The field of data mining grew out of knowledge discovery in databases. In 1993, a seminal paper by Rakesh Agrawal and two others proposed an efficient algorithm for mining association rules in large databases. This paper prompted many research papers on discovering frequent patterns and on more efficient mining algorithms. Early data mining work in the 1990s was linked to creating better SQL statements and working with databases directly. Data mining has a strong focus on industrial problems and practical solutions, so it concerns itself not only with data size (large data) but also with data processing speed (stream data). In addition, personalized recommender systems and network mining were developed to meet business needs, outside the machine learning field.

The two major conferences for data mining are KDD (Knowledge Discovery and Data Mining) and ICDM (International Conference on Data Mining). The two major conferences for machine learning are ICML (International Conference on Machine Learning) and NIPS (Neural Information Processing Systems). Machine learning researchers attend both types of conferences; however, the data mining conferences have much stronger industry links. Data miners typically have a strong foundation in machine learning, but also a keen interest in applying it to large-scale problems.

Over time, we will see deeper connections between data mining and machine learning. Could they become twins one day? Only time will tell.

Posted by Junling Hu at 7:52 AM
Labels: overview of data mining

About the Author

Junling Hu is a leader in data mining and the author of an upcoming book on data mining.
Follow her on Twitter: @junling_tech


match - 0.04% (3)
b-person - 0.04% (3)
i-person - 0.04% (3)
b-date - 0.04% (3)
files - 0.04% (3)
sure - 0.04% (3)
meetup - 0.04% (3)
include: - 0.04% (3)
june - 0.04% (3)
seen - 0.04% (3)
extracting - 0.04% (3)
ceo - 0.04% (3)
reported - 0.04% (3)
origin - 0.04% (3)
time. - 0.04% (3)
caesars - 0.04% (3)
cameras - 0.04% (3)
simply - 0.04% (3)
question - 0.04% (3)
users - 0.04% (3)
neurons - 0.04% (3)
said - 0.04% (3)
recently - 0.04% (3)
traffic - 0.04% (3)
captured - 0.04% (3)
interested - 0.04% (3)
experts - 0.04% (3)
house - 0.04% (3)
design - 0.04% (3)
in, - 0.04% (3)
medical - 0.04% (3)
typically - 0.04% (3)
widely - 0.04% (3)
available - 0.04% (3)
ambiguity - 0.04% (3)
perceptron - 0.04% (3)
cars - 0.04% (3)
easily - 0.04% (3)
(3) - 0.04% (3)
particularly - 0.04% (3)
arm - 0.04% (3)
job - 0.04% (3)
nov - 0.04% (3)
higher - 0.04% (3)
demand - 0.04% (3)
today. - 0.04% (3)
hadoop - 0.04% (3)
case - 0.04% (3)
around - 0.04% (3)
sellers - 0.04% (3)
solve - 0.04% (3)
enables - 0.04% (3)
graph - 0.04% (3)
implementation - 0.04% (3)
third - 0.04% (3)
discovery - 0.04% (3)
databases - 0.04% (3)
attributes. - 0.04% (3)
experience - 0.04% (3)
topic - 0.04% (3)
reduce - 0.04% (3)
science. - 0.04% (3)
growing - 0.04% (3)
compare - 0.04% (3)
connected - 0.04% (3)
quickly. - 0.04% (3)
city. - 0.04% (3)
scientists. - 0.04% (3)
methods, - 0.04% (3)
(such - 0.04% (3)
health - 0.04% (3)
context. - 0.04% (3)
city, - 0.04% (3)
instance - 0.04% (3)
cup - 0.04% (3)
contains - 0.04% (3)
modeling - 0.04% (3)
section - 0.04% (3)
gold - 0.04% (3)
days - 0.04% (3)
management. - 0.04% (3)
won - 0.04% (3)
take - 0.04% (3)
examples - 0.04% (3)
voytek - 0.04% (3)
grown - 0.04% (3)
working - 0.04% (3)
cause - 0.04% (3)
formed - 0.04% (3)
sigma - 0.04% (3)
eye - 0.04% (3)
2013, - 0.04% (3)
coming - 0.04% (3)
specifically - 0.04% (3)
sept - 0.04% (3)
july - 0.04% (3)
connection - 0.04% (3)
study. - 0.03% (2)
[3] - 0.03% (2)
challenge - 0.03% (2)
terabytes - 0.03% (2)
profile - 0.03% (2)
applied - 0.03% (2)
automation - 0.03% (2)
black - 0.03% (2)
e-commerce - 0.03% (2)
february - 0.03% (2)
[1] - 0.03% (2)
invention - 0.03% (2)
1990s, - 0.03% (2)
fraud - 0.03% (2)
intelligence. - 0.03% (2)
acm - 0.03% (2)
attributes, - 0.03% (2)
check - 0.03% (2)
growth - 0.03% (2)
25, - 0.03% (2)
methods. - 0.03% (2)
define - 0.03% (2)
large-scale - 0.03% (2)
product. - 0.03% (2)
corresponding - 0.03% (2)
revival - 0.03% (2)
misspelling, - 0.03% (2)
attributes: - 0.03% (2)
enough - 0.03% (2)
express - 0.03% (2)
adoption - 0.03% (2)
proposed - 0.03% (2)
gave - 0.03% (2)
yelp. - 0.03% (2)
iphone - 0.03% (2)
reviews. - 0.03% (2)
away - 0.03% (2)
smart - 0.03% (2)
continuing - 0.03% (2)
strategy. - 0.03% (2)
deeper - 0.03% (2)
places - 0.03% (2)
action. - 0.03% (2)
mentioned - 0.03% (2)
minutes - 0.03% (2)
yesterday - 0.03% (2)
21, - 0.03% (2)
election - 0.03% (2)
made - 0.03% (2)
consciousness - 0.03% (2)
deployed - 0.03% (2)
speed - 0.03% (2)
correct - 0.03% (2)
targets - 0.03% (2)
evaluation - 0.03% (2)
just - 0.03% (2)
discrete - 0.03% (2)
dawn - 0.03% (2)
blindness - 0.03% (2)
self - 0.03% (2)
huge - 0.03% (2)
subscribe - 0.03% (2)
(4) - 0.03% (2)
2015 - 0.03% (2)
september - 0.03% (2)
recently. - 0.03% (2)
▼  - 0.03% (2)
everywhere - 0.03% (2)
entrepreneur - 0.03% (2)
author - 0.03% (2)
posts - 0.03% (2)
signals - 0.03% (2)
twins - 0.03% (2)
home. - 0.03% (2)
focus - 0.03% (2)
sql - 0.03% (2)
outside - 0.03% (2)
databases. - 0.03% (2)
package - 0.03% (2)
1957 - 0.03% (2)
numerical - 0.03% (2)
processing. - 0.03% (2)
construct - 0.03% (2)
constantly - 0.03% (2)
below - 0.03% (2)
analysis - 0.03% (2)
(as - 0.03% (2)
models. - 0.03% (2)
machine? - 0.03% (2)
modeling. - 0.03% (2)
devices - 0.03% (2)
governments - 0.03% (2)
way. - 0.03% (2)
rate, - 0.03% (2)
revolution. - 0.03% (2)
effort - 0.03% (2)
variations - 0.03% (2)
processing, - 0.03% (2)
change - 0.03% (2)
signatures. - 0.03% (2)
against - 0.03% (2)
intelligent - 0.03% (2)
aim - 0.03% (2)
20% - 0.03% (2)
infer - 0.03% (2)
simple - 0.03% (2)
might - 0.03% (2)
useful - 0.03% (2)
monitoring - 0.03% (2)
salary. - 0.03% (2)
perhaps - 0.03% (2)
values. - 0.03% (2)
occupancy - 0.03% (2)
furthermore, - 0.03% (2)
describes - 0.03% (2)
up, - 0.03% (2)
repeat - 0.03% (2)
meaning - 0.03% (2)
system. - 0.03% (2)
rooms - 0.03% (2)
up. - 0.03% (2)
testing - 0.03% (2)
preprocessing: - 0.03% (2)
general - 0.03% (2)
acronyms - 0.03% (2)
george - 0.03% (2)
often - 0.03% (2)
without - 0.03% (2)
icdm - 0.03% (2)
mateo. - 0.03% (2)
2000, - 0.03% (2)
redwood - 0.03% (2)
actian - 0.03% (2)
hiring - 0.03% (2)
answers - 0.03% (2)
hire - 0.03% (2)
fundamental - 0.03% (2)
fields - 0.03% (2)
starting - 0.03% (2)
participated - 0.03% (2)
insurance - 0.03% (2)
service, - 0.03% (2)
gathered - 0.03% (2)
newest - 0.03% (2)
valued - 0.03% (2)
2004, - 0.03% (2)
best - 0.03% (2)
techniques - 0.03% (2)
offer - 0.03% (2)
(with - 0.03% (2)
sophisticated - 0.03% (2)
2013. - 0.03% (2)
field. - 0.03% (2)
steps: - 0.03% (2)
idea - 0.03% (2)
candidates - 0.03% (2)
evolve - 0.03% (2)
variables - 0.03% (2)
employers. - 0.03% (2)
time, - 0.03% (2)
browsing - 0.03% (2)
limited - 0.03% (2)
china - 0.03% (2)
achieved - 0.03% (2)
list. - 0.03% (2)
teams - 0.03% (2)
2009, - 0.03% (2)
similar - 0.03% (2)
(knowledge - 0.03% (2)
display - 0.03% (2)
among - 0.03% (2)
york - 0.03% (2)
academic - 0.03% (2)
supply - 0.03% (2)
so, - 0.03% (2)
point, - 0.03% (2)
hadoop, - 0.03% (2)
write - 0.03% (2)
discussions - 0.03% (2)
them, - 0.03% (2)
concerned - 0.03% (2)
support - 0.03% (2)
networks, - 0.03% (2)
types. - 0.03% (2)
environments - 0.03% (2)
consists - 0.03% (2)
connects - 0.03% (2)
skill. - 0.03% (2)
picture - 0.03% (2)
options - 0.03% (2)
program, - 0.03% (2)
sometimes - 0.03% (2)
read - 0.03% (2)
states - 0.03% (2)
gap - 0.03% (2)
strategy” - 0.03% (2)
immediately - 0.03% (2)
live - 0.03% (2)
extensively - 0.03% (2)
years. - 0.03% (2)
tradeoff - 0.03% (2)
rapidly - 0.03% (2)
services. - 0.03% (2)
businesses - 0.03% (2)
commercial - 0.03% (2)
here. - 0.03% (2)
... - 0.03% (2)
trained - 0.03% (2)
industry. - 0.03% (2)
leading - 0.03% (2)
unfortunate - 0.03% (2)
12, - 0.03% (2)
someone - 0.03% (2)
winning - 0.03% (2)
losing - 0.03% (2)
gambler - 0.03% (2)
attract - 0.03% (2)
coding - 0.03% (2)
crunch - 0.03% (2)
finance - 0.03% (2)
yahoo! - 0.03% (2)
casinos - 0.03% (2)
information. - 0.03% (2)
information, - 0.03% (2)
level - 0.03% (2)
either - 0.03% (2)
abbreviations - 0.03% (2)
hours - 0.03% (2)
occur - 0.03% (2)
recording - 0.03% (2)
takes - 0.03% (2)
needs - 0.03% (2)
humans. - 0.03% (2)
makes - 0.03% (2)
well - 0.03% (2)
preceded - 0.03% (2)
greg - 0.03% (2)
garment - 0.03% (2)
resolve - 0.03% (2)
comes - 0.03% (2)
person. - 0.03% (2)
type, - 0.03% (2)
back - 0.03% (2)
dress - 0.03% (2)
track - 0.03% (2)
keep - 0.03% (2)
together. - 0.03% (2)
unique - 0.03% (2)
serious - 0.03% (2)
range - 0.03% (2)
discovered - 0.03% (2)
let’s - 0.03% (2)
khosla - 0.03% (2)
problem. - 0.03% (2)
identifying - 0.03% (2)
manually. - 0.03% (2)
online - 0.03% (2)
disease, - 0.03% (2)
including - 0.03% (2)
[2] - 0.03% (2)
watching - 0.03% (2)
framingham - 0.03% (2)
sold - 0.03% (2)
site, - 0.03% (2)
lumley - 0.03% (2)
clinical - 0.03% (2)
treatment. - 0.03% (2)
engine - 0.03% (2)
segment - 0.03% (2)
less - 0.03% (2)
character - 0.03% (2)
expensive - 0.03% (2)
goal - 0.03% (2)
dataset - 0.03% (2)
tagged. - 0.03% (2)
before - 0.03% (2)
identity - 0.03% (2)
class). - 0.03% (2)
prediction. - 0.03% (2)
indicates - 0.03% (2)
1000 - 0.03% (2)
may. - 0.03% (2)
born - 0.03% (2)
(group - 0.03% (2)
detecting - 0.03% (2)
point. - 0.03% (2)
mining: - 0.03% (2)
system, - 0.03% (2)
enjoy - 0.03% (2)
recommendations - 0.03% (2)
current - 0.03% (2)
originally - 0.03% (2)
mathematics - 0.03% (2)
learned - 0.03% (2)
negative - 0.03% (2)
lining - 0.03% (2)
capitalized? - 0.03% (2)
‘smith’ - 0.03% (2)
names. - 0.03% (2)
far - 0.03% (2)
component - 0.03% (2)
“apple” - 0.03% (2)
please - 0.03% (2)
entity. - 0.03% (2)
company, - 0.03% (2)
interesting - 0.03% (2)
9:09 - 0.03% (2)
reference: - 0.03% (2)
hu, - 0.03% (2)
recognition - 0.03% (2)
belong - 0.03% (2)
149 - 0.03% (2)
instances - 0.03% (2)
conditional - 0.03% (2)
rules). - 0.03% (2)
shoppers - 0.03% (2)
through - 0.03% (2)
table, - 0.03% (2)
concept - 0.03% (2)
‘in’ - 0.03% (2)
sample - 0.03% (2)
november - 0.03% (2)
machine learning - 0.77% (60)
data mining - 0.76% (59)
of the - 0.31% (24)
in the - 0.31% (24)
junling hu - 0.28% (22)
we can - 0.26% (20)
such as - 0.26% (20)
to pinterest - 0.24% (19)
to facebookshare - 0.24% (19)
to twittershare - 0.24% (19)
by junling - 0.24% (19)
big data - 0.24% (19)
posted by - 0.24% (19)
comments: email - 0.24% (19)
twittershare to - 0.24% (19)
facebookshare to - 0.24% (19)
email thisblogthis!share - 0.24% (19)
thisblogthis!share to - 0.24% (19)
pinterest labels: - 0.23% (18)
to the - 0.22% (17)
for example, - 0.21% (16)
training data - 0.21% (16)
can be - 0.19% (15)
is the - 0.19% (15)
on the - 0.19% (15)
of data - 0.18% (14)
supervised learning - 0.17% (13)
the company - 0.17% (13)
this is - 0.14% (11)
the following - 0.14% (11)
positive rate - 0.14% (11)
in addition - 0.14% (11)
text mining - 0.13% (10)
the model - 0.13% (10)
at the - 0.13% (10)
data scientists - 0.12% (9)
from the - 0.12% (9)
in this - 0.12% (9)
in addition, - 0.12% (9)
of machine - 0.12% (9)
the data - 0.12% (9)
the machine - 0.12% (9)
on and - 0.1% (8)
false positive - 0.1% (8)
true positive - 0.1% (8)
used to - 0.1% (8)
on data - 0.1% (8)
for the - 0.1% (8)
and data - 0.09% (7)
there are - 0.09% (7)
video mining - 0.09% (7)
here is - 0.09% (7)
machine learning. - 0.09% (7)
based on - 0.09% (7)
learning methods - 0.09% (7)
model is - 0.08% (6)
set of - 0.08% (6)
learning approach - 0.08% (6)
a data - 0.08% (6)
data set - 0.08% (6)
to create - 0.08% (6)
conference on - 0.08% (6)
approach to - 0.08% (6)
in san - 0.08% (6)
logistic regression - 0.08% (6)
overview of - 0.08% (6)
based in - 0.08% (6)
when the - 0.08% (6)
company is - 0.08% (6)
want to - 0.08% (6)
is also - 0.08% (6)
product attributes - 0.08% (6)
risk factor - 0.08% (6)
will be - 0.08% (6)
they are - 0.08% (6)
refers to - 0.06% (5)
to predict - 0.06% (5)
data mining. - 0.06% (5)
related to - 0.06% (5)
frequent pattern - 0.06% (5)
the word - 0.06% (5)
in data - 0.06% (5)
to identify - 0.06% (5)
in machine - 0.06% (5)
out the - 0.06% (5)
create the - 0.06% (5)
in computer - 0.06% (5)
for data - 0.06% (5)
the test - 0.06% (5)
of these - 0.06% (5)
founded in - 0.06% (5)
all the - 0.06% (5)
with the - 0.06% (5)
decision tree - 0.06% (5)
model was - 0.06% (5)
the next - 0.06% (5)
part of - 0.06% (5)
order to - 0.06% (5)
we will - 0.06% (5)
data in - 0.06% (5)
test data - 0.06% (5)
to understand - 0.06% (5)
the training - 0.06% (5)
named entity - 0.06% (5)
risk factors - 0.06% (5)
with a - 0.06% (5)
data mining, - 0.06% (5)
an entity - 0.06% (5)
could be - 0.06% (5)
and so - 0.06% (5)
a dictionary - 0.06% (5)
come a - 0.06% (5)
international conference - 0.06% (5)
categorical data - 0.06% (5)
neural network - 0.06% (5)
and other - 0.06% (5)
a person - 0.06% (5)
and it - 0.06% (5)
used in - 0.06% (5)
need to - 0.05% (4)
in may - 0.05% (4)
(international conference - 0.05% (4)
the second - 0.05% (4)
news and - 0.05% (4)
for an - 0.05% (4)
the beginning - 0.05% (4)
machine learning, - 0.05% (4)
the first - 0.05% (4)
as its - 0.05% (4)
data set. - 0.05% (4)
and we - 0.05% (4)
predict the - 0.05% (4)
zip code - 0.05% (4)
of stroke - 0.05% (4)
on. the - 0.05% (4)
the development - 0.05% (4)
is used - 0.05% (4)
it has - 0.05% (4)
hard to - 0.05% (4)
using a - 0.05% (4)
one of - 0.05% (4)
a machine - 0.05% (4)
large data - 0.05% (4)
stroke prediction - 0.05% (4)
was founded - 0.05% (4)
data and - 0.05% (4)
the most - 0.05% (4)
labels: companies - 0.05% (4)
the last - 0.05% (4)
use of - 0.05% (4)
generate a - 0.05% (4)
learning to - 0.05% (4)
4 major - 0.05% (4)
a ranking - 0.05% (4)
labeled data - 0.05% (4)
learning algorithm - 0.05% (4)
development of - 0.05% (4)
has been - 0.05% (4)
number of - 0.05% (4)
entity detection - 0.05% (4)
anti-virus software - 0.05% (4)
and events - 0.05% (4)
so that - 0.05% (4)
has become - 0.05% (4)
was invented - 0.05% (4)
in order - 0.05% (4)
there is - 0.05% (4)
mining and - 0.05% (4)
• how - 0.05% (4)
data science - 0.05% (4)
problem is - 0.05% (4)
positive rate. - 0.05% (4)
ways to - 0.05% (4)
the target - 0.05% (4)
on machine - 0.04% (3)
(3) ►  - 0.04% (3)
and big - 0.04% (3)
we have - 0.04% (3)
(1) ►  - 0.04% (3)
recommender systems - 0.04% (3)
learning approach, - 0.04% (3)
by human - 0.04% (3)
roc curve - 0.04% (3)
can then - 0.04% (3)
our training - 0.04% (3)
log data - 0.04% (3)
a supervised - 0.04% (3)
each word - 0.04% (3)
is based - 0.04% (3)
funding in - 0.04% (3)
company was - 0.04% (3)
this field - 0.04% (3)
this model - 0.04% (3)
include: • - 0.04% (3)
and is - 0.04% (3)
opera solutions - 0.04% (3)
out data - 0.04% (3)
all be - 0.04% (3)
collaborative filtering - 0.04% (3)
by the - 0.04% (3)
⋮ ⋮ - 0.04% (3)
has its - 0.04% (3)
of this - 0.04% (3)
(such as - 0.04% (3)
learning model - 0.04% (3)
selected features - 0.04% (3)
it can - 0.04% (3)
list of - 0.04% (3)
to see - 0.04% (3)
significant operation - 0.04% (3)
become a - 0.04% (3)
a model - 0.04% (3)
entity types - 0.04% (3)
example, a - 0.04% (3)
and its - 0.04% (3)
with data - 0.04% (3)
and their - 0.04% (3)
on this - 0.04% (3)
is part - 0.04% (3)
you can - 0.04% (3)
pattern mining - 0.04% (3)
should be - 0.04% (3)
mining can - 0.04% (3)
‘washington’ is - 0.04% (3)
we want - 0.04% (3)
associated with - 0.04% (3)
can create - 0.04% (3)
confidence score - 0.04% (3)
is essential - 0.04% (3)
accuracy of - 0.04% (3)
it refers - 0.04% (3)
data collection - 0.04% (3)
helps us - 0.04% (3)
are used - 0.04% (3)
attribute extraction - 0.04% (3)
will see - 0.04% (3)
the field - 0.04% (3)
role in - 0.04% (3)
the answer - 0.04% (3)
important role - 0.04% (3)
a list - 0.04% (3)
learning is - 0.04% (3)
knowledge discovery - 0.04% (3)
is not - 0.04% (3)
due to - 0.04% (3)
the same - 0.04% (3)
field of - 0.04% (3)
ip camera - 0.04% (3)
create a - 0.04% (3)
lot of - 0.04% (3)
of big - 0.04% (3)
human brain - 0.04% (3)
is using - 0.04% (3)
interested in - 0.04% (3)
semi-supervised learning - 0.04% (3)
imagine a - 0.04% (3)
an important - 0.04% (3)
labels: text - 0.04% (3)
mining conferences - 0.04% (3)
of training - 0.04% (3)
the above - 0.04% (3)
machine learning) - 0.04% (3)
be very - 0.04% (3)
all of - 0.04% (3)
mining is - 0.04% (3)
and use - 0.04% (3)
data mining), - 0.04% (3)
any new - 0.04% (3)
, particularly - 0.04% (3)
to work - 0.04% (3)
data scientists. - 0.04% (3)
science team - 0.04% (3)
a very - 0.04% (3)
layer is - 0.04% (3)
the intern - 0.04% (3)
and predict - 0.04% (3)
major conferences - 0.04% (3)
a good - 0.04% (3)
every day. - 0.04% (3)
understanding of - 0.04% (3)
understand the - 0.04% (3)
computer science - 0.04% (3)
time to - 0.04% (3)
test data. - 0.04% (3)
who are - 0.04% (3)
they have - 0.04% (3)
of features - 0.04% (3)
thus a - 0.04% (3)
time in - 0.04% (3)
about the - 0.04% (3)
depends on - 0.04% (3)
terabytes of - 0.03% (2)
the framingham - 0.03% (2)
svm was - 0.03% (2)
2 terabytes - 0.03% (2)
and more - 0.03% (2)
to stroke - 0.03% (2)
last few - 0.03% (2)
this data - 0.03% (2)
100 million - 0.03% (2)
addition to - 0.03% (2)
discovery and - 0.03% (2)
invention of - 0.03% (2)
talk on - 0.03% (2)
for large - 0.03% (2)
large number - 0.03% (2)
artificial intelligence. - 0.03% (2)
the perceptron - 0.03% (2)
invented in - 0.03% (2)
in large - 0.03% (2)
gave a - 0.03% (2)
exciting to - 0.03% (2)
a more - 0.03% (2)
the limitation - 0.03% (2)
able to - 0.03% (2)
about data - 0.03% (2)
implementation of - 0.03% (2)
imagine we - 0.03% (2)
learning. the - 0.03% (2)
have different - 0.03% (2)
11, 2013 - 0.03% (2)
neural signals - 0.03% (2)
as reported - 0.03% (2)
an exciting - 0.03% (2)
in brain - 0.03% (2)
a talk - 0.03% (2)
the decision - 0.03% (2)
2013 data - 0.03% (2)
mining results - 0.03% (2)
results into - 0.03% (2)
available data - 0.03% (2)
the real - 0.03% (2)
target is - 0.03% (2)
applied to - 0.03% (2)
target values - 0.03% (2)
adapt to - 0.03% (2)
out of - 0.03% (2)
have been - 0.03% (2)
the two - 0.03% (2)
►  november - 0.03% (2)
►  january - 0.03% (2)
(4) ►  - 0.03% (2)
iot and - 0.03% (2)
graph mining - 0.03% (2)
mining text - 0.03% (2)
have a - 0.03% (2)
icml (international - 0.03% (2)
conferences for - 0.03% (2)
mining in - 0.03% (2)
two major - 0.03% (2)
field. the - 0.03% (2)
mining are - 0.03% (2)
data processing - 0.03% (2)
but also - 0.03% (2)
data size - 0.03% (2)
working with - 0.03% (2)
creating better - 0.03% (2)
discussed in - 0.03% (2)
built using - 0.03% (2)
example of - 0.03% (2)
learning can - 0.03% (2)
are also - 0.03% (2)
building a - 0.03% (2)
svm, decision - 0.03% (2)
does it - 0.03% (2)
machine? • - 0.03% (2)
how many - 0.03% (2)
model. the - 0.03% (2)
computer virus - 0.03% (2)
effort of - 0.03% (2)
aim for - 0.03% (2)
types of - 0.03% (2)
are not - 0.03% (2)
of malware - 0.03% (2)
software was - 0.03% (2)
can detect - 0.03% (2)
data, it - 0.03% (2)
a large - 0.03% (2)
advantage of - 0.03% (2)
vs. false - 0.03% (2)
positive rate, - 0.03% (2)
code is - 0.03% (2)
of each - 0.03% (2)
consists of - 0.03% (2)
categorical data. - 0.03% (2)
missing values - 0.03% (2)
the feature - 0.03% (2)
data include - 0.03% (2)
data used - 0.03% (2)
2. data - 0.03% (2)
at this - 0.03% (2)
look at - 0.03% (2)
of true - 0.03% (2)
this process - 0.03% (2)
real system. - 0.03% (2)
if the - 0.03% (2)
model on - 0.03% (2)
data preprocessing: - 0.03% (2)
1. data - 0.03% (2)
12, 2013 - 0.03% (2)
rate is - 0.03% (2)
manually selected - 0.03% (2)
new data - 0.03% (2)
study by - 0.03% (2)
has to - 0.03% (2)
2, 2014 - 0.03% (2)
text mining), - 0.03% (2)
icdm (international - 0.03% (2)
new york - 0.03% (2)
kdd (knowledge - 0.03% (2)
conferences in - 0.03% (2)
30 comments: - 0.03% (2)
for machine - 0.03% (2)
developed in - 0.03% (2)
in other - 0.03% (2)
this training - 0.03% (2)
group (group - 0.03% (2)
1000 people - 0.03% (2)
control group - 0.03% (2)
score is - 0.03% (2)
score between - 0.03% (2)
whether a - 0.03% (2)
shoppers buy - 0.03% (2)
a conditional - 0.03% (2)
early day - 0.03% (2)
algorithms that - 0.03% (2)
good coding - 0.03% (2)
mining jan - 0.03% (2)
as machine - 0.03% (2)
find ways - 0.03% (2)
data to - 0.03% (2)
yahoo! finance - 0.03% (2)
of mining - 0.03% (2)
creating a - 0.03% (2)
mining. the - 0.03% (2)
stream data - 0.03% (2)
or video - 0.03% (2)
refer to - 0.03% (2)
they refer - 0.03% (2)
and transform - 0.03% (2)
the job - 0.03% (2)
are on - 0.03% (2)
those who - 0.03% (2)
companies are - 0.03% (2)
that there - 0.03% (2)
easily learn - 0.03% (2)
a specific - 0.03% (2)
improve the - 0.03% (2)
we need - 0.03% (2)
statistics and - 0.03% (2)
for product - 0.03% (2)
lot or - 0.03% (2)
provide a - 0.03% (2)
addition, data - 0.03% (2)
data. in - 0.03% (2)
modeling. the - 0.03% (2)
and predictive - 0.03% (2)
smart phone - 0.03% (2)
application of - 0.03% (2)
useful for - 0.03% (2)
this could - 0.03% (2)
meeting room - 0.03% (2)
of iot - 0.03% (2)
automatically extracting - 0.03% (2)
very useful - 0.03% (2)
dawn of - 0.03% (2)
on their - 0.03% (2)
and in - 0.03% (2)
in real - 0.03% (2)
sensors on - 0.03% (2)
of your - 0.03% (2)
what is - 0.03% (2)
when you - 0.03% (2)
need for - 0.03% (2)
find out - 0.03% (2)
junling hu, - 0.03% (2)
by humans. - 0.03% (2)
of our - 0.03% (2)
is called - 0.03% (2)
amount of - 0.03% (2)
a small - 0.03% (2)
our goal - 0.03% (2)
labeled data. - 0.03% (2)
requires a - 0.03% (2)
size s - 0.03% (2)
word is - 0.03% (2)
apply machine - 0.03% (2)
name as - 0.03% (2)
has the - 0.03% (2)
the large - 0.03% (2)
existing listing - 0.03% (2)
are interested - 0.03% (2)
let’s look - 0.03% (2)
in social - 0.03% (2)
ambiguity in - 0.03% (2)
new products - 0.03% (2)
dictionary of - 0.03% (2)
can find - 0.03% (2)
or they - 0.03% (2)
manually by - 0.03% (2)
detect entity - 0.03% (2)
is that - 0.03% (2)
understand user - 0.03% (2)
it helps - 0.03% (2)
a company, - 0.03% (2)
2013 text - 0.03% (2)
at 9:09 - 0.03% (2)
discover new - 0.03% (2)
while this - 0.03% (2)
training data. - 0.03% (2)
that we - 0.03% (2)
example, we - 0.03% (2)
new sentence - 0.03% (2)
like the - 0.03% (2)
word position - 0.03% (2)
word identity - 0.03% (2)
features could - 0.03% (2)
find a - 0.03% (2)
goal is - 0.03% (2)
data have - 0.03% (2)
in may. - 0.03% (2)
a person. - 0.03% (2)
city. when - 0.03% (2)
may smith - 0.03% (2)
company has - 0.03% (2)
that are - 0.03% (2)
cause of - 0.03% (2)
labels: companies, - 0.03% (2)
our daily - 0.03% (2)
software can - 0.03% (2)
furthermore, the - 0.03% (2)
the video - 0.03% (2)
to review - 0.03% (2)
has seen - 0.03% (2)
have to - 0.03% (2)
preceded by - 0.03% (2)
at home. - 0.03% (2)
to use - 0.03% (2)
of video - 0.03% (2)
mining apr - 0.03% (2)
and generate - 0.03% (2)
data, we - 0.03% (2)
that it - 0.03% (2)
can apply - 0.03% (2)
the number - 0.03% (2)
was born - 0.03% (2)
will have - 0.03% (2)
could have - 0.03% (2)
in late - 0.03% (2)
who should - 0.03% (2)
26, 2013 - 0.03% (2)
a funding - 0.03% (2)
million round - 0.03% (2)
round a - 0.03% (2)
it raised - 0.03% (2)
san mateo. - 0.03% (2)
in redwood - 0.03% (2)
is valued - 0.03% (2)
and has - 0.03% (2)
be viewed - 0.03% (2)
valued at - 0.03% (2)
in 2004, - 0.03% (2)
operation in - 0.03% (2)
with significant - 0.03% (2)
few years. - 0.03% (2)
new type - 0.03% (2)
learning methods, - 0.03% (2)
2013 machine - 0.03% (2)
you visit - 0.03% (2)
task of - 0.03% (2)
netflix contest - 0.03% (2)
whether the - 0.03% (2)
data are - 0.03% (2)
words, we - 0.03% (2)
since an - 0.03% (2)
born in - 0.03% (2)
smith was - 0.03% (2)
text mining: - 0.03% (2)
9, 2013 - 0.03% (2)
fast and - 0.03% (2)
may be - 0.03% (2)
user clicks - 0.03% (2)
between 0 - 0.03% (2)
an item - 0.03% (2)
probability score - 0.03% (2)
ranking model - 0.03% (2)
for any - 0.03% (2)
as logistic - 0.03% (2)
a training - 0.03% (2)
of day - 0.03% (2)
traditional approach - 0.03% (2)
over time - 0.03% (2)
to recommendation - 0.03% (2)
november (1) - 0.03% (2)
to twittershare to - 0.24% (19)
to facebookshare to - 0.24% (19)
thisblogthis!share to twittershare - 0.24% (19)
by junling hu - 0.24% (19)
facebookshare to pinterest - 0.24% (19)
posted by junling - 0.24% (19)
email thisblogthis!share to - 0.24% (19)
twittershare to facebookshare - 0.24% (19)
comments: email thisblogthis!share - 0.24% (19)
junling hu at - 0.24% (19)
to pinterest labels: - 0.23% (18)
of data mining - 0.17% (13)
of machine learning - 0.12% (9)
the machine learning - 0.1% (8)
machine learning method - 0.1% (8)
machine learning methods - 0.09% (7)
overview of data - 0.08% (6)
true positive rate - 0.08% (6)
refers to a - 0.06% (5)
the training data - 0.06% (5)
and data mining - 0.06% (5)
to predict the - 0.05% (4)
(international conference on - 0.05% (4)
one of the - 0.05% (4)
was founded in - 0.05% (4)
in order to - 0.05% (4)
the development of - 0.05% (4)
the data mining - 0.05% (4)
the company is - 0.05% (4)
machine learning algorithm - 0.05% (4)
news and events - 0.05% (4)
it is also - 0.05% (4)
learning to rank - 0.04% (3)
pinterest labels: overview - 0.04% (3)
for example, the - 0.04% (3)
labels: overview of - 0.04% (3)
on big data - 0.04% (3)
based in san - 0.04% (3)
is used to - 0.04% (3)
of big data - 0.04% (3)
training data set - 0.04% (3)
when ‘washington’ is - 0.04% (3)
and so on. - 0.04% (3)
false positive rate. - 0.04% (3)
company was founded - 0.04% (3)
the company was - 0.04% (3)
then it refers - 0.04% (3)
in the next - 0.04% (3)
data mining can - 0.04% (3)
of data mining, - 0.04% (3)
labels: text mining - 0.04% (3)
we can create - 0.04% (3)
is hard to - 0.04% (3)
a list of - 0.04% (3)
is using a - 0.04% (3)
supervised learning approach, - 0.04% (3)
of the model - 0.04% (3)
conference on data - 0.04% (3)
named entity detection - 0.04% (3)
the field of - 0.04% (3)
helps us to - 0.04% (3)
position from the - 0.04% (3)
the test data. - 0.04% (3)
in computer science - 0.04% (3)
the company has - 0.03% (2)
the limitation of - 0.03% (2)
was invented in - 0.03% (2)
discovery and data - 0.03% (2)
a large number - 0.03% (2)
software can detect - 0.03% (2)
the perceptron model - 0.03% (2)
of video mining - 0.03% (2)
in addition to - 0.03% (2)
pinterest labels: supervised - 0.03% (2)
we can apply - 0.03% (2)
22 comments: email - 0.03% (2)
important role in - 0.03% (2)
machine learning. the - 0.03% (2)
about data mining - 0.03% (2)
a talk on - 0.03% (2)
to create the - 0.03% (2)
mining and machine - 0.03% (2)
icdm (international conference - 0.03% (2)
the two major - 0.03% (2)
field of data - 0.03% (2)
and machine learning - 0.03% (2)
pinterest labels: news - 0.03% (2)
imagine we can - 0.03% (2)
play an important - 0.03% (2)
mining and neuroscience - 0.03% (2)
2013 data mining - 0.03% (2)
25 comments: email - 0.03% (2)
need to be - 0.03% (2)
predict the target - 0.03% (2)
the last few - 0.03% (2)
used to predict - 0.03% (2)
data, it is - 0.03% (2)
built using the - 0.03% (2)
the number of - 0.03% (2)
the real system. - 0.03% (2)
pinterest labels: companies, - 0.03% (2)
positive rate is - 0.03% (2)
vs. false positive - 0.03% (2)
of true positive - 0.03% (2)
this machine? • - 0.03% (2)
• how many - 0.03% (2)
on this machine? - 0.03% (2)
2 terabytes of - 0.03% (2)
create the model - 0.03% (2)
approach to recommendation - 0.03% (2)
to a city. - 0.03% (2)
is a list - 0.03% (2)
in this space - 0.03% (2)
has become a - 0.03% (2)
2013 machine learning - 0.03% (2)
data mining in - 0.03% (2)
lot or they - 0.03% (2)
find ways to - 0.03% (2)
data mining field - 0.03% (2)
such as machine - 0.03% (2)
they refer to - 0.03% (2)
machine learning methods, - 0.03% (2)
machine learning or - 0.03% (2)
labels: news and - 0.03% (2)
on data mining), - 0.03% (2)
pm 30 comments: - 0.03% (2)
with significant operation - 0.03% (2)
in data mining. - 0.03% (2)
1000 people in - 0.03% (2)
a data mining - 0.03% (2)
score between 0 - 0.03% (2)
the early day - 0.03% (2)
attributes. for example, - 0.03% (2)
due to the - 0.03% (2)
dictionary is not - 0.03% (2)
a dictionary of - 0.03% (2)
product attributes from - 0.03% (2)
of the most - 0.03% (2)
predictive modeling. the - 0.03% (2)
be very useful - 0.03% (2)
in the last - 0.03% (2)
company is valued - 0.03% (2)
to a person. - 0.03% (2)
iot and big - 0.03% (2)
day. it is - 0.03% (2)
to understand user - 0.03% (2)
it helps us - 0.03% (2)
we can then - 0.03% (2)
⋮ ⋮ ⋮ - 0.03% (2)
to the end - 0.03% (2)
a machine learning - 0.03% (2)
set of features. - 0.03% (2)
our goal is - 0.03% (2)
smith was born - 0.03% (2)
since an entity - 0.03% (2)
born in may. - 0.03% (2)
may smith was - 0.03% (2)
between 0 and - 0.03% (2)
founded in 2004, - 0.03% (2)
has the following - 0.03% (2)
part of the - 0.03% (2)
time of day - 0.03% (2)
information such as - 0.03% (2)
this is an - 0.03% (2)
can be viewed - 0.03% (2)
recommender systems a - 0.03% (2)
in san mateo. - 0.03% (2)
company is based - 0.03% (2)
a funding in - 0.03% (2)
based in redwood - 0.03% (2)
is valued at - 0.03% (2)
significant operation in - 0.03% (2)
january (3) ►  - 0.03% (2)

Here you can find a chart of all your most popular one-, two- and three-word phrases. Google and other search engines assume your page is about the words you use most frequently.
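The report format above (`phrase - percent (count)`) can be reproduced with a small n-gram counter. Here is a minimal sketch in Python; the tokenizer regex and the `phrase_frequencies` helper are an illustration of the general technique, not hupso.pl's actual implementation:

```python
from collections import Counter
import re

def phrase_frequencies(text, n=1, top=5):
    """Count the most common n-word phrases in text, returning
    (phrase, count, percent-of-all-n-grams) tuples, mimicking a
    keyword-density report."""
    # Lowercase and keep simple word tokens (letters, digits, hyphens).
    words = re.findall(r"[a-z0-9'\u2019\-]+", text.lower())
    # Build the sliding-window n-grams.
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    total = len(ngrams)
    return [(p, c, 100.0 * c / total)
            for p, c in Counter(ngrams).most_common(top)]

sample = ("data mining and machine learning, "
          "data mining news and machine learning events")
for phrase, count, pct in phrase_frequencies(sample, n=2, top=3):
    print(f"{phrase} - {pct:.2f}% ({count})")
```

With `n=2` this prints the two-word phrases in the same `phrase - percent (count)` shape the report uses; `n=1` and `n=3` give the one- and three-word tables.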

Copyright © 2015-2016 hupso.pl. All rights reserved.

Hupso.pl is a web service where, with a single click, you can quickly and easily check a website for SEO. We offer free website search-engine positioning as well as valuation of domains and websites. We maintain a ranking of Polish websites and an Alexa site ranking.