ElasticCC: Crawl the Web on a Large Scale with Stormcrawler and Elasticsearch

1,702 views

Official Elastic Community



By Julien Nioche
StormCrawler is a popular and mature open-source web crawler. It is written in Java and is both lightweight and scalable, thanks to its distribution layer, which is based on Apache Storm. Among the crawler's main attractions are its extensibility, modularity, and versatility. In this presentation, we take a closer look at the Elasticsearch module of StormCrawler and see how various organizations use it in production, sometimes on a very large scale.

Comments: 1
@NoName-ep2xp · 2 years ago
I'm continuously surprised by the inability of the Elasticsearch community to encourage people from outside the community to get involved. I'm a developer with 15 years of experience, and most of the Elasticsearch tutorials are close to useless because they demand so much effort. There seems to be no attempt to make the tutorials useful to anyone but people who already know the complete tech stack (in which case it's a reference, not a tutorial). I've looked at all your tutorials on StormCrawler and Elasticsearch, and you assume quite a lot: that I know about (or can be bothered to research) Storm, Flux(!), bolts, topologies, Elasticsearch, and Kibana, and that I'm a Java developer(!). Do you really expect all this as a prerequisite for the tutorial to be useful? (Or maybe I'm just too intellectually challenged.) While I could wade through everything needed to work it out, for the simple problem I have (a simple search of relatively few sites), I probably won't bother. I can see why AWS/OpenSearch believe they can kill Elasticsearch, given the poor value delivered by the Elasticsearch community videos and documents. So that my contribution isn't totally negative: if you want some suggestions on how to improve or create your StormCrawler videos for beginners, I'm open to helping. Just reply and we'll work out a way to talk. Sorry if this is uncomfortable to read, but this whole technical area is missing a usable on-ramp for beginners. Hours of research is not a usable path if you want this thing to grow beyond existing insiders. Cheers 🙂