Laravel Scavenger

A highly flexible Laravel 5.x scraper package.


Top Features

Scavenger provides the following features and more out of the box.

  • Ease of use
    • Scavenger is super-easy to configure. Simply publish the config file and set your targets.
  • Scrape data from multiple sources at once.
  • Convert scraped data into usable Laravel model objects.
    • e.g. You may scrape an article, have it converted into an object of your choice and saved in your database, and make it immediately available to your viewers.
  • You can easily perform one or more operations on each property of any scraped entity.
    • e.g. You may call a paraphrasing service from a model or package of your choice on data attributes before saving them to your database.
  • Data integrity constraints
    • Scavenger uses a hashing algorithm of your choice to maintain data integrity. This hash ensures that one scrap (source article) is not converted into multiple output objects (model duplicates).
  • Console Command
    • Once Scavenger is configured, a simple artisan command launches the seeker. Because it runs from the console rather than inside a web request, it is more efficient and timeouts are less likely to occur.
    • Artisan command: php artisan scavenger:seek
  • Schedule ready
    • Scavenger can easily be set to scrape on a schedule, which makes creating a somewhat autonomous website super easy!
  • SERP
    • Scavenger can be used to flexibly scrape Search Engine Result Pages.
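
The duplicate check described under "Data integrity constraints" can be illustrated with a minimal standalone sketch. This is not Scavenger's internal implementation: the scrapHash() helper, the sha256 choice, and the in-memory $seen array (standing in for a unique hash column in the database) are all assumptions made for illustration.

```php
<?php

/**
 * Hypothetical helper: derive a stable fingerprint for a scrap.
 * 'sha256' is only an example; any algorithm supported by hash() would do.
 */
function scrapHash(array $scrap, string $algorithm = 'sha256'): string
{
    // Sort keys so the same data yields the same hash regardless of key order.
    ksort($scrap);

    return hash($algorithm, json_encode($scrap));
}

$seen = [];  // stands in for a unique "hash" column in the database
$saved = []; // stands in for the generated model objects

$scraps = [
    ['title' => 'Hello', 'body' => '<p>World</p>'],
    ['body' => '<p>World</p>', 'title' => 'Hello'], // same article, scraped again
    ['title' => 'Other', 'body' => '<p>Text</p>'],
];

foreach ($scraps as $scrap) {
    $hash = scrapHash($scrap);

    if (isset($seen[$hash])) {
        continue; // duplicate scrap: do not create a second object
    }

    $seen[$hash] = true;
    $saved[] = $scrap;
}

echo count($saved), PHP_EOL; // prints 2
```

Only two of the three scraps are kept, because the second one hashes to the same value as the first.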

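For the "Schedule ready" feature, one plausible setup is to register the scavenger:seek command with Laravel's task scheduler. The sketch below assumes a standard Laravel 5.x app/Console/Kernel.php; the hourly frequency and the withoutOverlapping() guard are illustrative choices, not package requirements.

```php
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    /**
     * Define the application's command schedule.
     */
    protected function schedule(Schedule $schedule)
    {
        // Launch the seeker every hour (example frequency only) and
        // skip a run if the previous one is still in progress.
        $schedule->command('scavenger:seek')
            ->hourly()
            ->withoutOverlapping();
    }
}
```

The scheduler itself still needs Laravel's usual cron entry that runs php artisan schedule:run every minute.
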
NB: As of version 2.2.0 raw HTML is scraped by default for each target key (attribute), excluding the reserved title attribute. The \ReliqArts\Scavenger\Services\Scanner::cleanText() method is provided, and can be used as a preprocess method for attributes needed as pure text.

Example usage:

//...
// modify attributes by calling functions
'preprocess' => [
    // strip all HTML within title and provide clean, simple text
    'title' => ['\ReliqArts\Scavenger\Services\Scanner', 'cleanText'],
],
//...
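
To make "pure text" concrete, here is a rough standalone sketch of the kind of transformation such a preprocess method might perform. It is not the source of Scanner::cleanText(); the TextCleaner class and its plain() method are hypothetical names, and the exact cleaning rules are assumptions.

```php
<?php

// Hypothetical helper class, shown only to illustrate HTML-to-text cleaning.
class TextCleaner
{
    public static function plain(string $html): string
    {
        // Drop all tags, keeping only their text content.
        $text = strip_tags($html);

        // Turn entities such as &amp; back into literal characters.
        $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

        // Collapse runs of whitespace (including newlines) into single spaces.
        return trim(preg_replace('/\s+/u', ' ', $text));
    }
}

echo TextCleaner::plain("<h1>Fish &amp; chips</h1>\n  <p>Served   daily</p>"), PHP_EOL;
// prints: Fish & chips Served daily
```

If custom callables are accepted in the same array form as the example above (an assumption), a method like this could be referenced as ['TextCleaner', 'plain'] for any attribute that should be stored as plain text.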

Glossary

The following words may appear in this documentation.

  • Daemon: User instance to be used by the scavenger service.
  • Scrap: Scraped data before being converted to the target object.
  • Target: Configured source-model mapping for a single entity.
  • Target Object: Eloquent model object to be generated from scrap.

Acknowledgements

Author

Patrick Reid (Reliq) - [email protected] - http://twitter.com/iamreliq

Major Third-Party Libraries

  • Guzzle

    This package is heavily inspired by and dependent on the Guzzle library, although several concepts may have been adjusted.