| 1 | +--- |
| 2 | +layout: default |
| 3 | +title: html2rss-configs |
| 4 | +has_children: false |
| 5 | +nav_order: 5 |
| 6 | +--- |
| 7 | + |
| 8 | +# Creating Feed Configurations |
| 9 | + |
| 10 | +Welcome to the guide for `html2rss-configs`. This document explains how to create your own configuration files to convert any website into an RSS feed. |
| 11 | + |
| 12 | +You can find a list of all community-contributed configurations in the [Feed Directory]({{ '/feed-directory/' | relative_url }}). |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +## Core Concepts |
| 17 | + |
| 18 | +An `html2rss` config is a YAML file that defines how to extract data from a web page. It consists of two main building blocks: `channel` and `selectors`. |
| 19 | + |
| 20 | +### The `channel` Block |
| 21 | + |
| 22 | +The `channel` block contains metadata about the RSS feed itself, such as its title and the source URL. |
| 23 | + |
| 24 | +**Example:** |
| 25 | + |
| 26 | +```yaml |
| 27 | +channel: |
| 28 | +  url: https://example.com/blog |
| 29 | +  title: My Awesome Blog |
| 30 | +``` |
| 31 | + |
| 32 | +For a complete list of all available channel options, please see the [Channel Reference]({{ '/ruby-gem/reference/channel/' | relative_url }}). |
| 33 | + |
| 34 | +### The `selectors` Block |
| 35 | + |
| 36 | +The `selectors` block is the core of the configuration, defining the rules for extracting content. It always contains an `items` selector to identify the list of articles and individual selectors for the data points within each item (e.g., `title`, `link`). |
| 37 | + |
| 38 | +**Example:** |
| 39 | + |
| 40 | +```yaml |
| 41 | +selectors: |
| 42 | +  items: |
| 43 | +    selector: "article.post" |
| 44 | +  title: |
| 45 | +    selector: "h2 a" |
| 46 | +  link: |
| 47 | +    selector: "h2 a" |
| 48 | +``` |
| 49 | + |
| 50 | +For a comprehensive guide on all available selectors, extractors, and post-processors, please see the [Selectors Reference]({{ '/ruby-gem/reference/selectors/' | relative_url }}). |
| 51 | + |
| 52 | +--- |
| 53 | + |
| 54 | +## Tutorial: Your First Config |
| 55 | + |
| 56 | +This tutorial walks you through creating a basic configuration file from scratch. |
| 57 | + |
| 58 | +### Step 1: Identify the Target Content |
| 59 | + |
| 60 | +First, inspect the HTML structure of the page you want to turn into a feed. For this example, we'll use a simple blog structure: |
| 61 | + |
| 62 | +```html |
| 63 | +<div class="posts"> |
| 64 | +  <article class="post"> |
| 65 | +    <h2><a href="/post/1">First Post</a></h2> |
| 66 | +    <p>This is the summary of the first post.</p> |
| 67 | +  </article> |
| 68 | +  <article class="post"> |
| 69 | +    <h2><a href="/post/2">Second Post</a></h2> |
| 70 | +    <p>This is the summary of the second post.</p> |
| 71 | +  </article> |
| 72 | +</div> |
| 73 | +``` |
| 74 | + |
| 75 | +### Step 2: Create the Config File and Define the Channel |
| 76 | + |
| 77 | +Create a new YAML file (e.g., `my-blog.yml`) and define the `channel`: |
| 78 | + |
| 79 | +```yaml |
| 80 | +# my-blog.yml |
| 81 | +channel: |
| 82 | +  url: https://example.com/blog |
| 83 | +  title: My Awesome Blog |
| 84 | +  description: The latest news from my awesome blog. |
| 85 | +``` |
| 86 | + |
| 87 | +### Step 3: Define the Selectors |
| 88 | + |
| 89 | +Next, add the `selectors` block to extract the content for each post. |
| 90 | + |
| 91 | +```yaml |
| 92 | +# my-blog.yml |
| 93 | +selectors: |
| 94 | +  items: |
| 95 | +    selector: "article.post" |
| 96 | +  title: |
| 97 | +    selector: "h2 a" |
| 98 | +  link: |
| 99 | +    selector: "h2 a" |
| 100 | +  description: |
| 101 | +    selector: "p" |
| 102 | +``` |
| 103 | + |
| 104 | +- `items`: This CSS selector matches the container element of each article; every match becomes one item in the feed. |
| 105 | +- `title`, `link`, `description`: These selectors target the specific data points within each item. For a `link` selector, `html2rss` defaults to extracting the `href` attribute from the matched `<a>` tag. |
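Before running a new config, it can help to check that both building blocks are present. The following is a minimal sketch using only Ruby's standard `yaml` library. The helper `config_problems` is hypothetical and not part of `html2rss`; it only checks for `channel.url` and `selectors.items.selector`, which is an assumption about what a usable config needs at minimum.

```ruby
require 'yaml'

# The tutorial config, inlined so the sketch is self-contained.
CONFIG_SOURCE = <<~CONFIG
  channel:
    url: https://example.com/blog
    title: My Awesome Blog
    description: The latest news from my awesome blog.
  selectors:
    items:
      selector: "article.post"
    title:
      selector: "h2 a"
    link:
      selector: "h2 a"
    description:
      selector: "p"
CONFIG

# Hypothetical helper (not part of html2rss): lists structural problems.
def config_problems(config)
  problems = []
  problems << 'missing channel.url' unless config.dig('channel', 'url')
  problems << 'missing selectors.items.selector' unless config.dig('selectors', 'items', 'selector')
  problems
end

# An empty YAML document parses to nil, so fall back to an empty hash.
config = YAML.safe_load(CONFIG_SOURCE) || {}
problems = config_problems(config)
puts problems.empty? ? 'config looks structurally complete' : problems.join("\n")
```

A check like this catches indentation mistakes early, for example an `items` key that ended up outside `selectors`.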
| 106 | + |
| 107 | +--- |
| 108 | + |
| 109 | +## Advanced Techniques |
| 110 | + |
| 111 | +### Handling Pagination |
| 112 | + |
| 113 | +To aggregate content from multiple pages, use the `pagination` option within the `items` selector. |
| 114 | + |
| 115 | +```yaml |
| 116 | +selectors: |
| 117 | +  items: |
| 118 | +    selector: ".post-listing .post" |
| 119 | +    pagination: |
| 120 | +      selector: ".pagination .next-page" |
| 121 | +      limit: 5 # Optional: sets the maximum number of pages to follow |
| 122 | +``` |
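To illustrate the idea behind a page limit, here is a small Ruby sketch that follows "next page" links over a simulated site held in a hash. It does not use `html2rss` and makes no claim about how the gem implements pagination. The `PAGES` data and the `collect_items` helper are hypothetical, and the sketch assumes the limit counts every fetched page, including the first.

```ruby
# Simulated site: each page lists its items and the URL of the next page (nil on the last page).
PAGES = {
  '/blog'        => { items: ['Post 1', 'Post 2'], next_page: '/blog?page=2' },
  '/blog?page=2' => { items: ['Post 3'], next_page: '/blog?page=3' },
  '/blog?page=3' => { items: ['Post 4'], next_page: nil }
}.freeze

# Hypothetical helper: collects items from at most `limit` pages,
# following each page's next-page link until none is left.
def collect_items(pages, start_url, limit:)
  items = []
  url = start_url
  visited = 0
  while url && visited < limit
    page = pages.fetch(url)
    items.concat(page[:items])
    url = page[:next_page]
    visited += 1
  end
  items
end

puts collect_items(PAGES, '/blog', limit: 2).inspect
```

With a limit of 2, only the first two simulated pages contribute items; a larger limit stops as soon as a page has no next-page link.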
| 123 | + |
| 124 | +### Dynamic Feeds with Parameters |
| 125 | + |
| 126 | +Use the `parameters` block to create flexible configs. This is useful for feeds based on search terms, categories, or regions. |
| 127 | + |
| 128 | +```yaml |
| 129 | +# news-search.yml |
| 130 | +parameters: |
| 131 | +  query: |
| 132 | +    type: string |
| 133 | +    default: "technology" |
| 134 | + |
| 135 | +channel: |
| 136 | +  url: "https://news.example.com/search?q={query}" |
| 137 | +  title: "News results for '{query}'" |
| 138 | +``` |
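The effect of a parameter can be pictured as plain placeholder substitution. The Ruby sketch below is illustrative only: the `expand` helper is hypothetical, and the `{name}` placeholder handling is an assumption taken from the example above, not a description of how `html2rss` resolves parameters internally.

```ruby
# Hypothetical helper: replaces each `{name}` placeholder with the matching
# parameter value and raises KeyError if a parameter is missing.
def expand(template, params)
  template.gsub(/\{(\w+)\}/) { params.fetch(Regexp.last_match(1)) }
end

# Defaults as declared in the config; a caller may override them.
defaults = { 'query' => 'technology' }
params = defaults.merge('query' => 'ruby')

url = expand('https://news.example.com/search?q={query}', params)
title = expand("News results for '{query}'", params)

puts url
puts title
```

Raising on a missing parameter, rather than leaving the placeholder in the URL, makes a misconfigured feed fail loudly instead of fetching the wrong page.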
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +## Contributing Your Config |
| 143 | + |
| 144 | +Have you created a config that others might find useful? We strongly encourage you to contribute it to the project! By sharing your config, you make it available to all users of the public `html2rss-web` service and the Feed Directory. |
| 145 | + |
| 146 | +To contribute, please [create a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) to the `html2rss-configs` repository. |
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +## Usage and Integration |
| 151 | + |
| 152 | +### With `html2rss-web` |
| 153 | + |
| 154 | +Once your pull request is reviewed and merged, your config will become available on the public [`html2rss-web`]({{ '/web-application/' | relative_url }}) instance. You can then access it at the path `/<domainname.tld/path>.rss`. |
| 155 | + |
| 156 | +### Programmatic Usage in Ruby |
| 157 | + |
| 158 | +You can also use `html2rss-configs` programmatically in your Ruby applications. |
| 159 | + |
| 160 | +Add this to your Gemfile: |
| 161 | + |
| 162 | +```ruby |
| 163 | +gem 'html2rss-configs', git: 'https://github.com/html2rss/html2rss-configs.git' |
| 164 | +``` |
| 165 | + |
| 166 | +And use it in your code: |
| 167 | + |
| 168 | +```ruby |
| 169 | +require 'html2rss/configs' |
| 170 | + |
| 171 | +config = Html2rss::Configs.find_by_name('domainname.tld/whatever') |
| 172 | +rss = Html2rss.feed(config) |
| 173 | +``` |