Norconex Web Crawler: Difference between revisions
→top: Expanded Template:Notability and General fixes, replaced: {{notability|date=October 2023}} → {{notability|Product|date=October 2023}}
Wikipedia:Articles for deletion/Norconex Web Crawler closed as redirect (XFDcloser) Tag: New redirect
(6 intermediate revisions by 4 users not shown)
#REDIRECT [[Web crawler#Open-source crawlers]]

{{Short description|Free and open-source Java web crawler}}

{{notability|Product|date=October 2023}}

{{Rcat shell|

<!-- Note: The following pages were redirects to [[Norconex_Web_Crawler]] before draftification:

{{R to related topic}}

*[[Draft:Norconex Web Crawler]]

-->

{{Infobox software
| title =
| other_names = Norconex HTTP Collector
| developer = {{URL | https://norconex.com/ | Norconex Inc.}}
| released = 2016
| latest release version = 3.0.2
| latest release date = 2022-01-05
| repo = {{URL | https://github.com/Norconex/collector-http | GitHub Repository}}
| programming language = [[Java (programming language)|Java]]
| operating system = [[Cross-platform software|Cross-platform]]
| license = [[Apache License]]
| website = {{URL | https://opensource.norconex.com/crawlers/web/ | Norconex Web Crawler}}
}}
}}
'''Norconex Web Crawler''' is [[Free and open-source software|free and open-source]] [[web crawling]] and [[web scraping]] software written in [[Java (programming language)|Java]] and released under the [[Apache License]]. It can export crawled data to repositories such as [[Apache Solr]], [[Elasticsearch]], [[Azure Cognitive Search|Microsoft Azure Cognitive Search]] and [[Amazon CloudSearch]].<ref>{{cite web |title=Committers |url=https://opensource.norconex.com/committers/ |website=opensource.norconex.com}}</ref><ref>{{cite web |last1=Hoppa |first1=Jocelyn |title=Importing Data from the Web with Norconex & Neo4j |url=https://neo4j.com/blog/importing-data-from-the-web-norconex-neo4j/ |website=Graph Database & Analytics |language=en |date=10 February 2020}}</ref><ref>{{cite web |title=Deploy a Norconex HTTP Collector Indexer Plugin {{!}} Cloud Search |url=https://developers.google.com/cloud-search/docs/guides/norconex-http-connector |website=Google for Developers |language=en}}</ref>

The crawler can run as a standalone application or be embedded in other Java applications.<ref>{{cite web |last1=Valcheva |first1=Silvia |title=10 Best Open Source Web Crawlers: Web Data Extraction Software |url=https://www.intellspot.com/open-source-web-crawlers/ |website=Blog For Data-Driven Business |date=11 February 2018}}</ref><ref>{{cite web |title=Norconex HTTP Collector |url=https://www.softpedia.com/get/Internet/Other-Internet-Related/Norconex-HTTP-Collector.shtml |website=Softpedia |access-date=25 September 2023}}</ref>

Key features include:
* Multi-threaded crawling
* Text extraction from a variety of file formats (HTML, PDF, Word, etc.)
* Extraction of metadata associated with documents
* Support for pages rendered with JavaScript
* Incremental crawls
* Support for external commands to parse or manipulate documents
* Export of extracted data to a variety of repositories

Organizations listed as users of Norconex Web Crawler include the Department of National Defence, Universities Canada and the U.S. Department of Education.<ref>{{cite web |title=Norconex Crawler Users |url=https://opensource.norconex.com/crawlers/usedby |website=opensource.norconex.com}}</ref> The crawler is also listed on the Apache Solr wiki's ecosystem page.<ref>{{cite web |title=SolrEcosystem - Solr - Apache Software Foundation |url=https://cwiki.apache.org/confluence/display/solr/SolrEcosystem |website=cwiki.apache.org}}</ref>

== History ==

Norconex Web Crawler was released as [[free and open-source software]] in 2013.<ref>{{Cite web |title=Norconex Gives Back to Open-Source – Norconex Inc |url=https://norconex.com/norconex-gives-back-to-open-source/ |access-date=2023-09-25 |language=en-US}}</ref>

== References ==

<references />

== Mentions in academic research ==

* {{cite journal |last1=Kancherla |first1=Vinay |title=A Smart Web Crawler for a Concept Based Semantic Search Engine (pg. 18) |url=https://scholarworks.sjsu.edu/etd_projects/380/ |journal=Master's Projects |access-date=28 September 2023 |doi=10.31979/etd.ubfy-s3es |date=1 December 2014|doi-access=free }}
* {{cite web |last1=Horváth |first1=Balázs |title=Recommendation Techniques for smart cities (pg. 12) |url=https://aaltodoc.aalto.fi/handle/123456789/27974 |website=Aalto University |access-date=28 September 2023 |language=en |date=28 August 2017}}
* {{cite arXiv |last1=Wani |first1=Mudasir Ahmad |last2=Agarwal |first2=Nancy |last3=Jabin |first3=Suraiya |last4=Hussain |first4=Syed Zesahn |title=Design of iMacros-based Data Crawler and the Behavioral Analysis of Facebook Users |date=2018 |class=cs.SI |eprint=1802.09566 }}
* {{cite web |last1=Abbasi |first1=Vahid |title=Phonetic Analysis and Searching with Google Glass API |url=https://uub.primo.exlibrisgroup.com/discovery/fulldisplay?docid=alma991018494504807596&context=L&vid=46LIBRIS_UUB:UUB&lang=en&search_scope=MyInst_and_CI&adaptor=Local%20Search%20Engine&tab=Everything&query=creator,contains,vahid%20abbasi&offset=0 |website=uub.primo.exlibrisgroup.com |access-date=28 September 2023 |language=en}}

== See also ==

* {{cite web |last1=Mitchell |first1=Pete |title=25 Best Free Web Crawler Tools |url=https://techcult.com/best-free-web-crawler-tools/ |access-date=2023-09-05 |website=TechCult |date=8 April 2022}}

[[Category:Web crawlers]]
Latest revision as of 21:15, 21 May 2024