Norconex Web Crawler: Difference between revisions

Latest revision as of 21:15, 21 May 2024

Redirect to: Web crawler#Open-source crawlers

@@ Line 1: / Line 1: @@
+#REDIRECT [[Web crawler#Open-source crawlers]]
-{{Short description|Free and open-source Java web crawler}}
-{{notability|Product|date=October 2023}}
+{{Rcat shell|
-<!-- Note: The following pages were redirects to [[Norconex_Web_Crawler]] before draftification:
+{{R to related topic}}
-*[[Draft:Norconex Web Crawler]]
--->
-{{Infobox software
-| title =
-| other_names = Norconex HTTP Collector
-| developer = {{URL | https://norconex.com/ | Norconex Inc.}}
-| released = 2016
-| latest release version = 3.0.2
-| latest release date = 2022-01-05
-| repo = {{URL | https://github.com/Norconex/collector-http | GitHub Repository}}
-| programming language = [[Java (programming language)|Java]]
-| operating system = [[Cross-platform software|Cross-platform]]
-| license = {{URL | https://en.wikipedia.org/wiki/Apache_License | Apache License}}
-| website = {{URL | https://opensource.norconex.com/crawlers/web/ | Norconex Web Crawler}}
 }}
-'''Norconex Web Crawler''' is a [[Free and open-source software|free and open-source]] [[web crawling]] and [[web scraping]] Software written in [[Java (programming language)|Java]] and released under an [[Apache License]]. It can export data to many repositories such as [[Apache Solr]], [[Elasticsearch]], [[Azure Cognitive Search|Microsoft Azure Cognitive Search]], [[Amazon CloudSearch]] and more.<ref>{{cite web |title=Committers |url=https://opensource.norconex.com/committers/ |website=opensource.norconex.com}}</ref><ref>{{cite web |last1=Hoppa |first1=Jocelyn |title=Importing Data from the Web with Norconex & Neo4j |url=https://neo4j.com/blog/importing-data-from-the-web-norconex-neo4j/ |website=Graph Database & Analytics |language=en |date=10 February 2020}}</ref><ref>{{cite web |title=Deploy a Norconex HTTP Collector Indexer Plugin {{!}} Cloud Search |url=https://developers.google.com/cloud-search/docs/guides/norconex-http-connector |website=Google for Developers |language=en}}</ref>
-The Crawler can be run on its own or embedded in your own [[Java (programming language)|Java]] application.<ref>{{cite web |last1=Valcheva |first1=Silvia |title=10 Best Open Source Web Crawlers: Web Data Extraction Software |url=https://www.intellspot.com/open-source-web-crawlers/ |website=Blog For Data-Driven Business |date=11 February 2018}}</ref><ref>{{cite web |title=Norconex HTTP Collector |url=https://www.softpedia.com/get/Internet/Other-Internet-Related/Norconex-HTTP-Collector.shtml |website=Softpedia |access-date=25 September 2023}}</ref>
-Some key features are:
-* Multi-threaded
-* Extract text from a variety of file formats (HTML, PDF, Word, etc.)
-* Extract metadata associated with documents
-* Supports pages rendered with JavaScript
-* Incremental crawls
-* Supports external commands to parse or manipulate documents
-* Send extracted data to a variety of repositories
-Some well-known companies and products using Norconex Web Crawler are: Apache Solr Ecosystem, Department of National Defence, Universities Canada, U.S. Department of Education, Department of National Defence.<ref>{{cite web |title=SolrEcosystem - Solr - Apache Software Foundation |url=https://cwiki.apache.org/confluence/display/solr/SolrEcosystem |website=cwiki.apache.org}}</ref>
-<ref>{{cite web |title=Norconex Crawler Users |url=https://opensource.norconex.com/crawlers/usedby |website=opensource.norconex.com}}</ref>
-== History ==
-Norconex Web Crawler was released as [[free and open-source software]] in 2013.<ref>{{Cite web |title=Norconex Gives Back to Open-Source – Norconex Inc |url=https://norconex.com/norconex-gives-back-to-open-source/ |access-date=2023-09-25 |language=en-US}}</ref>
-== References ==
-<references />
-== Mentions in Academic Research ==
-* {{cite journal |last1=Kancherla |first1=Vinay |title=A Smart Web Crawler for a Concept Based Semantic Search Engine (pg. 18) |url=https://scholarworks.sjsu.edu/etd_projects/380/ |journal=Master's Projects |access-date=28 September 2023 |doi=10.31979/etd.ubfy-s3es |date=1 December 2014|doi-access=free }}
-* {{cite journal |last1=Horváth |first1=Balázs |title=Recommendation Techniques for smart cities (pg. 12) |url=https://aaltodoc.aalto.fi/handle/123456789/27974 |website=Aalto University |access-date=28 September 2023 |language=en |date=28 August 2017}}
-* {{cite arXiv |last1=Wani |first1=Mudasir Ahmad |last2=Agarwal |first2=Nancy |last3=Jabin |first3=Suraiya |last4=Hussain |first4=Syed Zesahn |title=Design of iMacros-based Data Crawler and the Behavioral Analysis of Facebook Users |date=2018 |class=cs.SI |eprint=1802.09566 }}
-* {{cite web |last1=Abbasi |first1=Vahid |title=Phonetic Analysis and Searching with Google Glass API |url=https://uub.primo.exlibrisgroup.com/discovery/fulldisplay?docid=alma991018494504807596&context=L&vid=46LIBRIS_UUB:UUB&lang=en&search_scope=MyInst_and_CI&adaptor=Local%20Search%20Engine&tab=Everything&query=creator,contains,vahid%20abbasi&offset=0 |website=uub.primo.exlibrisgroup.com |access-date=28 September 2023 |language=en}}
-== See also ==
-* {{cite web |last1=Mitchell |first1=Pete |title=25 Best Free Web Crawler Tools |url=https://techcult.com/best-free-web-crawler-tools/ |access-date=2023-09-05 |website=TechCult |date=8 April 2022}}
-[[Category:Web crawlers]]