Package 

Class Manual

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private final String url
      private final PulsarSession session
    • Constructor Summary

      Constructors 
      Constructor Description
      Manual(PulsarSession session)
    • Method Summary

      Modifier and Type Method Description
      final String getUrl()
      final PulsarSession getSession()
      final WebPage load() Loads url if it is not in the database or has expired.
      final Collection<WebPage> loadOutPages() Loads url if it is not in the database or has expired, then loads the out pages selected by -outLink that are missing or expired.
      final List<Map<String, String>> scrape() Loads url if it is not in the database or has expired, then scrapes the fields in the page.
      final List<Map<String, String>> scrape2()
      final List<Map<String, String>> scrapeOutPages()
      final List<Map<String, String>> scrapeOutPages2()
      final Unit runAll()
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Manual

        Manual(PulsarSession session)
    • Method Detail

      • load

         final WebPage load()

        Loads url if it is not in the database or has expired. The expiry time is 1 day.
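
The "load if missing or expired" rule can be sketched without the real library. This is only an illustration under stated assumptions: `ExpiringPageStore`, its `fetcher` callback and the `fetchCount` counter are hypothetical names and are not part of the documented API. The actual storage and fetch logic live behind `PulsarSession`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a page database with an expiry rule.
class ExpiringPageStore {
    // A stored page body together with the time it was fetched.
    private static final class Entry {
        final String content;
        final Instant fetchedAt;

        Entry(String content, Instant fetchedAt) {
            this.content = content;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private final Duration expiry;
    private final Function<String, String> fetcher;
    int fetchCount = 0; // exposed only to make the behavior observable

    ExpiringPageStore(Duration expiry, Function<String, String> fetcher) {
        this.expiry = expiry;
        this.fetcher = fetcher;
    }

    // Returns the stored page, fetching it first if it is missing
    // or older than the expiry time.
    String load(String url, Instant now) {
        Entry entry = store.get(url);
        boolean missing = entry == null;
        boolean expired = !missing
                && Duration.between(entry.fetchedAt, now).compareTo(expiry) > 0;
        if (missing || expired) {
            fetchCount++;
            entry = new Entry(fetcher.apply(url), now);
            store.put(url, entry);
        }
        return entry.content;
    }
}
```

With an expiry of `Duration.ofDays(1)`, a second call 23 hours after the first is served from the store, while a call 25 hours later triggers a new fetch.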

      • loadOutPages

         final Collection<WebPage> loadOutPages()

        Loads url if it is not in the database or has expired, then loads the out pages selected by -outLink that are missing or expired.

        portal page expire time: 1 day
        out page expire time: 7 days
        css query for out links in portal page: a[href~=item]
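
To illustrate what the out-link query selects, here is a minimal sketch that filters plain href strings. It assumes jsoup-style semantics for the `[href~=regex]` attribute selector, where the regular expression may match anywhere in the attribute value. `OutLinkFilter` and the example URLs are hypothetical and are not taken from the library.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper that mimics an a[href~=regex] query on a list of hrefs.
class OutLinkFilter {
    // Keeps every href in which the regex is found at any position
    // (assumed behavior of the ~= attribute operator).
    static List<String> select(List<String> hrefs, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return hrefs.stream()
                .filter(href -> pattern.matcher(href).find())
                .collect(Collectors.toList());
    }
}
```

Calling `OutLinkFilter.select(hrefs, "item")` keeps a link such as `https://example.com/item/1.html` and drops `https://example.com/help`.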

      • scrape

         final List<Map<String, String>> scrape()

        Loads url if it is not in the database or has expired, then scrapes the fields in the page. All fields are restricted to the page sections matched by restrictCss, and each field is specified by a css selector.

        expire time: 1 day
        restrict css selector: li[data-sku]
        css selectors for fields: .p-name em, .p-price
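
The shape of the `List<Map<String, String>>` result can be sketched as one map per section matched by the restrict selector. This sketch assumes each map is keyed by the field selector. That keying is an assumption and is not confirmed by this page. `ScrapeSketch` and its `Section` interface are hypothetical stand-ins for parsed page sections, so no real CSS matching happens here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the scrape result; not the library's implementation.
class ScrapeSketch {
    // One section matched by the restrict selector. It resolves a field
    // selector to its text, or null when the field is absent.
    interface Section {
        String textOf(String fieldCss);
    }

    // Builds one row per section, with one entry per requested field selector.
    static List<Map<String, String>> scrape(List<Section> sections, List<String> fieldCss) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (Section section : sections) {
            Map<String, String> row = new LinkedHashMap<>();
            for (String css : fieldCss) {
                row.put(css, section.textOf(css));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Under these assumptions, a page with two `li[data-sku]` sections yields two rows, each holding entries for `.p-name em` and `.p-price`.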