Commit 503d2de

docs: add html2rss-config (#922)
* docs: everything of configs readme
* .
Signed-off-by: Gil Desmarais <[email protected]>
* Update get-involved/contributing.md
Co-authored-by: Copilot <[email protected]>
* Update get-involved/contributing.md
Co-authored-by: Copilot <[email protected]>
---------
Signed-off-by: Gil Desmarais <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 56db2ed commit 503d2de

3 files changed

+194
-3
lines changed

feed-directory/index.html

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
33
title: Feed Directory
44
nav_order: 2
55
noindex: true
6+
has_children: true
67
---
78

89
<div class="text-center">

get-involved/contributing.md

Lines changed: 20 additions & 3 deletions
@@ -25,7 +25,24 @@ Here are some of the ways you can contribute to the `html2rss` project:
2525

2626
Are you missing an RSS feed for a website? You can create your own feed config and share it with the community. It's a great way to get started with `html2rss` and help other users.
2727

28-
[**Learn how to create a feed config**](https://github.com/html2rss/html2rss-configs)
28+
The html2rss "ecosystem" is a community project. We welcome contributions of all kinds: new feed configs, feature suggestions and implementations, bug fixes, documentation improvements, and any other kind of help.
29+
30+
How you add a new feed config is up to you; the manual steps below are the preferred way. When your config is ready, please [submit a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork)!
31+
32+
After you're done, you can test your feed config by running `bundle exec html2rss feed lib/html2rss/configs/<domainname.tld>/<path>.yml`.
33+
34+
#### Preferred way: manually
35+
36+
1. Fork the `html2rss-configs` git repository and run `bundle install` (you need to have Ruby >= 3.3 installed).
37+
2. Create a new folder and file following this convention: `lib/html2rss/configs/<domainname.tld>/<path>.yml`
38+
3. Create the feed config in the `<path>.yml` file.
39+
4. Add a spec file at `spec/html2rss/configs/<domainname.tld>/<path>_spec.rb` with the following content:
40+
41+
```ruby
42+
RSpec.describe '<domainname.tld>/<path>' do
43+
  include_examples 'config.yml', described_class
44+
end
45+
```
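The naming convention from the steps above can be sketched as a small Ruby helper. `config_paths` is a hypothetical name used only for illustration; it is not part of `html2rss-configs`, and `example.com` / `blog` are placeholder values.

```ruby
# Hypothetical helper (not part of html2rss-configs): derives the config and
# spec file locations from the naming convention described in the steps.
def config_paths(domain, path)
  {
    config: File.join('lib/html2rss/configs', domain, "#{path}.yml"),
    spec: File.join('spec/html2rss/configs', domain, "#{path}_spec.rb")
  }
end

paths = config_paths('example.com', 'blog')
puts paths[:config]
puts paths[:spec]
```

The config path printed here is the one you would pass to `bundle exec html2rss feed` when testing the config.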
2946

3047
### 2. Improve this Website
3148

@@ -37,13 +54,13 @@ This website is built with Jekyll and is hosted on GitHub Pages. If you have any
3754

3855
The [`html2rss-web`](https://github.com/html2rss/html2rss-web) project is a web application that allows you to create and manage your RSS feeds through a user-friendly interface. You can host your own public instance to help other users create feeds.
3956

40-
[**Learn how to host a public instance**](https://github.com/html2rss/html2rss-web/wiki/Instances)
57+
[**Learn how to host a public instance**]({{ '/web-application/how-to/deployment' | relative_url }})
4158

4259
### 4. Improve the `html2rss` Gem
4360

4461
Are you a Ruby developer? You can help us improve the core `html2rss` gem. Whether you're fixing a bug, adding a new feature, or improving the documentation, your contributions are welcome.
4562

46-
[**Check out the repository on GitHub**](https://github.com/html2rss/html2rss)
63+
[**Check out the documentation for the `html2rss` Gem**]({{ '/ruby-gem/' | relative_url }})
4764

4865
### 5. Report Bugs & Discuss Features
4966

html2rss-configs/index.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
1+
---
2+
layout: default
3+
title: html2rss-configs
4+
has_children: false
5+
nav_order: 5
6+
---
7+
8+
# Creating Feed Configurations
9+
10+
Welcome to the guide for `html2rss-configs`. This document explains how to create your own configuration files to convert any website into an RSS feed.
11+
12+
You can find a list of all community-contributed configurations in the [Feed Directory]({{ '/feed-directory/' | relative_url }}).
13+
14+
---
15+
16+
## Core Concepts
17+
18+
An `html2rss` config is a YAML file that defines how to extract data from a web page. It consists of two main building blocks: `channel` and `selectors`.
19+
20+
### The `channel` Block
21+
22+
The `channel` block contains metadata about the RSS feed itself, such as its title and the source URL.
23+
24+
**Example:**
25+
26+
```yaml
27+
channel:
28+
  url: https://example.com/blog
29+
  title: My Awesome Blog
30+
```
31+
32+
For a complete list of all available channel options, please see the [Channel Reference]({{ '/ruby-gem/reference/channel/' | relative_url }}).
33+
34+
### The `selectors` Block
35+
36+
The `selectors` block is the core of the configuration, defining the rules for extracting content. It always contains an `items` selector to identify the list of articles and individual selectors for the data points within each item (e.g., `title`, `link`).
37+
38+
**Example:**
39+
40+
```yaml
41+
selectors:
42+
  items:
43+
    selector: "article.post"
44+
  title:
45+
    selector: "h2 a"
46+
  link:
47+
    selector: "h2 a"
48+
```
49+
50+
For a comprehensive guide on all available selectors, extractors, and post-processors, please see the [Selectors Reference]({{ '/ruby-gem/reference/selectors/' | relative_url }}).
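As a rough illustration of how the two building blocks fit together, the following sketch parses a config with Ruby's standard `yaml` library and checks for the entries this guide relies on. `missing_keys` is a hypothetical helper, not an `html2rss` API, and the gem's own validation may check more or differently.

```ruby
require 'yaml'

CONFIG_YAML = <<~YAML
  channel:
    url: https://example.com/blog
    title: My Awesome Blog
  selectors:
    items:
      selector: "article.post"
    title:
      selector: "h2 a"
    link:
      selector: "h2 a"
YAML

# Hypothetical sanity check (not an html2rss API): reports which of the
# entries used throughout this guide are absent from a parsed config.
def missing_keys(config)
  missing = []
  missing << 'channel.url' unless config.dig('channel', 'url')
  missing << 'selectors.items.selector' unless config.dig('selectors', 'items', 'selector')
  missing
end

config = YAML.safe_load(CONFIG_YAML)
missing = missing_keys(config)
puts missing.empty? ? 'config looks complete' : "missing: #{missing.join(', ')}"
```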
51+
52+
---
53+
54+
## Tutorial: Your First Config
55+
56+
This tutorial walks you through creating a basic configuration file from scratch.
57+
58+
### Step 1: Identify the Target Content
59+
60+
First, identify the HTML structure of the website you want to create a feed for. For this example, we'll use a simple blog structure:
61+
62+
```html
63+
<div class="posts">
64+
  <article class="post">
65+
    <h2><a href="/post/1">First Post</a></h2>
66+
    <p>This is the summary of the first post.</p>
67+
  </article>
68+
  <article class="post">
69+
    <h2><a href="/post/2">Second Post</a></h2>
70+
    <p>This is the summary of the second post.</p>
71+
  </article>
72+
</div>
73+
```
74+
75+
### Step 2: Create the Config File and Define the Channel
76+
77+
Create a new YAML file (e.g., `my-blog.yml`) and define the `channel`:
78+
79+
```yaml
80+
# my-blog.yml
81+
channel:
82+
  url: https://example.com/blog
83+
  title: My Awesome Blog
84+
  description: The latest news from my awesome blog.
85+
```
86+
87+
### Step 3: Define the Selectors
88+
89+
Next, add the `selectors` block to extract the content for each post.
90+
91+
```yaml
92+
# my-blog.yml
93+
selectors:
94+
  items:
95+
    selector: "article.post"
96+
  title:
97+
    selector: "h2 a"
98+
  link:
99+
    selector: "h2 a"
100+
  description:
101+
    selector: "p"
102+
```
103+
104+
- `items`: This CSS selector identifies the container for each article.
105+
- `title`, `link`, `description`: These selectors target the specific data points within each item. For a `link` selector, `html2rss` defaults to extracting the `href` attribute from the matched `<a>` tag.
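To make the mapping concrete, here is a minimal sketch of what these selectors pick out of the example HTML from Step 1. It uses REXML and XPath purely as a stand-in, since the sample markup happens to be well-formed XML; this is not how `html2rss` works internally, and the XPath expressions only roughly approximate the CSS selectors in the config.

```ruby
require 'rexml/document'

SAMPLE_HTML = <<~HTML
  <div class="posts">
    <article class="post">
      <h2><a href="/post/1">First Post</a></h2>
      <p>This is the summary of the first post.</p>
    </article>
    <article class="post">
      <h2><a href="/post/2">Second Post</a></h2>
      <p>This is the summary of the second post.</p>
    </article>
  </div>
HTML

doc = REXML::Document.new(SAMPLE_HTML)

# `items` ("article.post"): one entry per matching <article> element.
items = REXML::XPath.match(doc, "//article[@class='post']").map do |article|
  anchor = REXML::XPath.first(article, 'h2/a') # stands in for "h2 a"
  {
    title: anchor.text,                                       # text of the link
    link: anchor.attributes['href'],                          # href attribute
    description: REXML::XPath.first(article, 'p').text        # stands in for "p"
  }
end

items.each { |item| puts "#{item[:title]} -> #{item[:link]}" }
```

Each hash corresponds to one feed item: the same `<a>` element supplies both the title (its text) and the link (its `href`).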
106+
107+
---
108+
109+
## Advanced Techniques
110+
111+
### Handling Pagination
112+
113+
To aggregate content from multiple pages, use the `pagination` option within the `items` selector.
114+
115+
```yaml
116+
selectors:
117+
  items:
118+
    selector: ".post-listing .post"
119+
    pagination:
120+
      selector: ".pagination .next-page"
121+
      limit: 5 # Optional: sets the maximum number of pages to follow
122+
```
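The idea behind following "next page" links can be sketched with an in-memory stand-in for a website. Everything here is hypothetical: `PAGES`, `collect_items`, and the interpretation that `limit` counts followed links (rather than total pages) are assumptions for illustration, not a description of how `html2rss` implements pagination.

```ruby
# Hypothetical in-memory "site": each page lists its items and the URL its
# "next page" link points to (nil on the last page).
PAGES = {
  '/blog' => { items: %w[a b], next_page: '/blog?page=2' },
  '/blog?page=2' => { items: %w[c d], next_page: '/blog?page=3' },
  '/blog?page=3' => { items: %w[e], next_page: nil }
}.freeze

# Collects items starting at start_url and follows at most `limit`
# next-page links (assumed meaning of `limit`).
def collect_items(pages, start_url, limit:)
  items = []
  url = start_url
  followed = 0
  while url
    page = pages.fetch(url)
    items.concat(page[:items])
    break if followed >= limit

    url = page[:next_page]
    followed += 1
  end
  items
end

puts collect_items(PAGES, '/blog', limit: 1).inspect
puts collect_items(PAGES, '/blog', limit: 5).inspect
```

A limit keeps the crawl bounded even when a site has a long or cyclic chain of "next" links.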
123+
124+
### Dynamic Feeds with Parameters
125+
126+
Use the `parameters` block to create flexible configs. This is useful for feeds based on search terms, categories, or regions.
127+
128+
```yaml
129+
# news-search.yml
130+
parameters:
131+
  query:
132+
    type: string
133+
    default: "technology"
134+

135+
channel:
136+
  url: "https://news.example.com/search?q={query}"
137+
  title: "News results for '{query}'"
138+
```
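Conceptually, each `{query}` placeholder is replaced by a caller-supplied value, falling back to the parameter's `default`. The sketch below illustrates that idea only; `fill_placeholders` is a hypothetical helper, it does no URL encoding, and the gem's actual substitution mechanism may differ.

```ruby
# Hypothetical parameter definitions mirroring the `parameters` block above.
PARAMETERS = { 'query' => { 'type' => 'string', 'default' => 'technology' } }.freeze

# Replaces every {name} placeholder with the override for that name, or with
# the parameter's default when no override is given.
def fill_placeholders(template, parameters, overrides = {})
  template.gsub(/\{(\w+)\}/) do
    name = Regexp.last_match(1)
    overrides.fetch(name) { parameters.fetch(name).fetch('default') }.to_s
  end
end

puts fill_placeholders('https://news.example.com/search?q={query}', PARAMETERS)
puts fill_placeholders("News results for '{query}'", PARAMETERS, { 'query' => 'ruby' })
```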
139+
140+
---
141+
142+
## Contributing Your Config
143+
144+
Have you created a config that others might find useful? We strongly encourage you to contribute it to the project! By sharing your config, you make it available to all users of the public `html2rss-web` service and the Feed Directory.
145+
146+
To contribute, please [create a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) to the `html2rss-configs` repository.
147+
148+
---
149+
150+
## Usage and Integration
151+
152+
### With `html2rss-web`
153+
154+
Once your pull request is reviewed and merged, your config will become available on the public [`html2rss-web`]({{ '/web-application/' | relative_url }}) instance. You can then access it at the path `/<domainname.tld>/<path>.rss`.
155+
156+
### Programmatic Usage in Ruby
157+
158+
You can also use `html2rss-configs` programmatically in your Ruby applications.
159+
160+
Add this to your Gemfile:
161+
162+
```ruby
163+
gem 'html2rss-configs', git: 'https://github.com/html2rss/html2rss-configs.git'
164+
```
165+
166+
And use it in your code:
167+
168+
```ruby
169+
require 'html2rss/configs'
170+
171+
config = Html2rss::Configs.find_by_name('domainname.tld/whatever')
172+
rss = Html2rss.feed(config)
173+
```
