Source Configuration

Learn how the horizon scanner works and how sources are configured.

1. Purpose

Graceview is a regulatory intelligence solution that aggregates regulatory, legislative, and related institutional developments from global sources. This guide explains how our horizon scanner works and how sources are configured.

We always prioritise and optimise for authoritative public primary sources. However, we can monitor secondary and non-public sources when needed, subject to technical and legal constraints.

2. What We Monitor

2.1 Qualifying Sources

A source is a location where new content is regularly published. Examples include:

Newsroom feeds on regulatory websites
RSS feeds from government agencies
Email newsletters and mailing lists
API endpoints from official sources

When in doubt, identify the specific page, feed, or mechanism where new content appears over time.

2.2 What We Cannot Monitor

Standalone links to PDF documents
Entire websites without specific feeds

3. Monitoring Methods

3.1 Available Now

The following monitoring methods are available now:

| Method | What We Monitor | Limitations |
| --- | --- | --- |
| Web (Feed) | Web pages where new content is published | If you need to version-track changes to a single static web page, refer to Web (Static). Some complex sites need multiple sub-sources (see the EDPS example). |
| RSS | All RSS feeds | None |
| Email (Body + Attachment) | Mailing lists (each email is extracted as a single entry) | If you need links from emails to be extracted as separate entries, refer to Email (Index). |
| X / Twitter | Posts from public accounts (@username) | None |
| API | Available upon request | None |

3.2 Coming Soon

The following monitoring methods are coming soon:

| Method | What We Monitor | Limitations |
| --- | --- | --- |
| LinkedIn | Posts from public accounts | None |
| Email (Index) | Mailing lists (links extracted as separate entries) | If you need each email to be extracted as a single entry, refer to Email (Body + Attachment). |
| Web (Static) | Version-track changes to a single static web page | If you need to track new content published to a feed, refer to Web (Feed). |

4. Data We Capture From Sources

4.1 Available Now

The following data can be captured from sources:

| Field | How Obtained | Description |
| --- | --- | --- |
| Title | Extracted | Automatically translated into English. Titles longer than 200 characters are automatically shortened to a preview. |
| Description / Body | Extracted | Original markup preserved (headings, links, images). |
| Attachments | Extracted | Directly linked PDFs are downloaded, OCR-processed, and stored in a vector database. Inline URLs in the Description / Body markup are not ingested automatically but can be extracted via the interface. |
| Dated | Extracted | Publication date, where available. |
| Preview | | Short English summary of the Description / Body and Attachments. |
| Found | System | Timestamp of when the horizon scanner found the content. |
| Domain | System | Base domain of the source. |
| Method | Config | Refer to Section 3. |
| Source | Config | Name of the source. |
| Authority | Config | Refer to Section 6. |
| Authorisation | Config | Refer to Section 7. |
| Location | Config | ISO codes for region, subregion, country, and state. Custom associations (e.g., EU, UN, NATO) and cities (e.g., San Francisco). |
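The Title rule above can be illustrated with a small Python sketch. It assumes that "shortening" means cutting at the last word boundary before the 200-character limit and appending an ellipsis; the function name `title_preview` and this exact truncation strategy are hypothetical, not Graceview's documented implementation.

```python
MAX_TITLE_CHARS = 200  # threshold taken from the Title row above


def title_preview(title: str, limit: int = MAX_TITLE_CHARS) -> str:
    """Return the title unchanged if it fits, else a shortened preview.

    Hypothetical sketch: cutting at the last space and appending an
    ellipsis is an assumed strategy, not documented behaviour.
    """
    title = " ".join(title.split())  # collapse runs of whitespace
    if len(title) <= limit:
        return title
    cut = title[: limit - 1]  # leave room for the one-character ellipsis
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]  # avoid splitting a word in half
    # Drop dangling separators before adding the ellipsis
    return cut.rstrip(" ,;:-") + "…"
```

Titles of 200 characters or fewer pass through untouched, so only genuinely long titles change.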

4.2 Coming Soon

Custom field extraction based on your workspace configuration. Rules can be configured to extract structured data into the following field types: multi-value, single-value, free text area, toggle, date, and number.
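To make the field types above concrete, here is a hypothetical Python sketch of how an extracted value might be checked against each type. The `FieldType` names and the `is_valid` rules are illustrative assumptions, not Graceview's configuration format.

```python
from datetime import date
from enum import Enum


class FieldType(Enum):
    # Hypothetical names mirroring the field types listed above
    MULTI_VALUE = "multi-value"
    SINGLE_VALUE = "single-value"
    FREE_TEXT = "free text area"
    TOGGLE = "toggle"
    DATE = "date"
    NUMBER = "number"


def is_valid(field_type: FieldType, value, options=()) -> bool:
    """Check an extracted value against a field type (illustrative rules)."""
    if field_type is FieldType.MULTI_VALUE:
        # Several choices, each drawn from the configured options
        return isinstance(value, (list, tuple, set)) and all(v in options for v in value)
    if field_type is FieldType.SINGLE_VALUE:
        return value in options
    if field_type is FieldType.FREE_TEXT:
        return isinstance(value, str)
    if field_type is FieldType.TOGGLE:
        return isinstance(value, bool)
    if field_type is FieldType.DATE:
        return isinstance(value, date)
    if field_type is FieldType.NUMBER:
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return False
```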

5. How Each Method Works

5.1 Web (Feed)

5.1.1 Central Feed Site

If a central feed exists (for example, the Monetary Authority of Singapore), we can:

Track all content: https://www.mas.gov.sg/search
Track content using filters (for example, Consultations): apply query parameters such as ?content_type=Consultations
Track content using search terms (for example, "AML"): apply query parameters such as ?q=AML
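The three tracking options above differ only in the query string appended to the central feed URL. This Python sketch builds those URLs with the standard library. The helper name `feed_url` is hypothetical, and combining several parameters in one URL assumes the site accepts them together.

```python
from urllib.parse import urlencode

MAS_SEARCH = "https://www.mas.gov.sg/search"


def feed_url(base: str, **params: str) -> str:
    """Append query parameters to a central feed URL.

    With no parameters the base URL is returned, which tracks all content.
    """
    return f"{base}?{urlencode(params)}" if params else base


# Track all content, a filtered view, and a search-term view
all_content = feed_url(MAS_SEARCH)
consultations = feed_url(MAS_SEARCH, content_type="Consultations")
aml = feed_url(MAS_SEARCH, q="AML")
```

`urlencode` also escapes multi-word search terms (spaces become `+`), so a term such as "anti money laundering" can be passed as-is.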

5.1.2 Multi‑Section Site

If no central feed exists, as with the European Data Protection Supervisor (EDPS), we can configure each available feed as a sub-source:

Annual Reports: https://www.edps.europa.eu/annual-reports_en
Factsheets: https://www.edps.europa.eu/press-publications/publications/factsheets_en
Press Releases: https://www.edps.europa.eu/press-publications/press-news/press-releases_en
Strategy: https://www.edps.europa.eu/press-publications/publications/strategy_en
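A multi-section source can be thought of as one named source that owns several feed URLs. The sketch below models the EDPS sub-sources listed above. The `Source` and `SubSource` classes are hypothetical, and treating the hostname without `www.` as the base domain is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class SubSource:
    name: str
    url: str


@dataclass
class Source:
    name: str
    method: str
    sub_sources: list[SubSource] = field(default_factory=list)

    def domains(self) -> set[str]:
        # Hostname with a leading "www." removed (assumed "base domain" rule)
        return {urlparse(s.url).hostname.removeprefix("www.") for s in self.sub_sources}


edps = Source(
    name="European Data Protection Supervisor",
    method="Web (Feed)",
    sub_sources=[
        SubSource("Annual Reports", "https://www.edps.europa.eu/annual-reports_en"),
        SubSource("Factsheets", "https://www.edps.europa.eu/press-publications/publications/factsheets_en"),
        SubSource("Press Releases", "https://www.edps.europa.eu/press-publications/press-news/press-releases_en"),
        SubSource("Strategy", "https://www.edps.europa.eu/press-publications/publications/strategy_en"),
    ],
)
```

Each sub-source can then be checked on its own while results still roll up to a single source name and domain.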

5.2 RSS

Direct monitoring of RSS feeds. Example:

SEC Commission Orders
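The RSS method reads items directly from the feed XML. This Python sketch parses a made-up RSS 2.0 document (placeholder content, not real SEC data) with `xml.etree.ElementTree` and maps each `<item>` to a title, a link, and an optional publication date. The output field names are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Placeholder feed in RSS 2.0 shape; titles and links are invented
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title>Example order A</title>
    <link>https://example.com/orders/a</link>
    <pubDate>Mon, 02 Jun 2025 14:30:00 GMT</pubDate>
  </item>
  <item>
    <title>Example order B</title>
    <link>https://example.com/orders/b</link>
  </item>
</channel></rss>"""


def parse_rss(xml_text: str) -> list[dict]:
    """Turn each RSS <item> into a simple entry dictionary."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        entries.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # RSS dates use the RFC 822 format; absent dates stay empty
            "dated": parsedate_to_datetime(raw_date) if raw_date else None,
        })
    return entries
```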

5.3 Email (Body + Attachment)

5.3.1 Public Subscriptions

We subscribe a generated email address to public mailing lists
You can specify filter options, if available (for example, All News): https://www.mas.gov.sg/subscription-services

5.3.2 Private / Internal Subscriptions

We can share a generated email address for you to subscribe to private or internal mailing lists
To filter by mailing list, we need the sender's email address
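Filtering a shared inbox by mailing list comes down to matching the sender address. This sketch uses a hypothetical allowlist with the placeholder address `newsletter@example.org`. It parses the `From` header with Python's `email` package and compares addresses case-insensitively.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical allowlist: the sender address(es) you share with us
ALLOWED_SENDERS = frozenset({"newsletter@example.org"})


def is_from_allowed_sender(raw_email: str, allowed=ALLOWED_SENDERS) -> bool:
    """Return True if the message's From address is on the allowlist."""
    msg = message_from_string(raw_email)
    # parseaddr splits "Display Name <addr>" into (name, addr)
    _, address = parseaddr(msg.get("From", ""))
    return address.lower() in {a.lower() for a in allowed}
```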

5.4 X / Twitter

Direct monitoring of posts from public accounts via the official X API. Example:

U.S. Securities and Exchange Commission (@secgov): https://x.com/secgov

5.5 API

Available upon request for sources offering API services. Example:

US Federal Register API
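A request to the Federal Register API is an ordinary HTTPS query. The sketch below only builds the URL and does not call the service. The `documents.json` endpoint and the `conditions[term]`, `order`, and `per_page` parameters reflect our understanding of the public v1 API and should be checked against the official API documentation before use. The helper name is hypothetical.

```python
from urllib.parse import urlencode

FR_DOCUMENTS = "https://www.federalregister.gov/api/v1/documents.json"


def federal_register_query(term: str, per_page: int = 20) -> str:
    """Build a search URL for recent Federal Register documents."""
    params = {
        "conditions[term]": term,  # full-text search term
        "order": "newest",         # newest documents first
        "per_page": per_page,      # page size
    }
    # urlencode percent-escapes the square brackets in the parameter name
    return f"{FR_DOCUMENTS}?{urlencode(params)}"
```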

5.6 Credentials and Paywalls

We can use credentials you supply (service accounts or user logins) to access gated content. We review the site’s terms of use before enabling automated access. Where licensing is restrictive, content may need to be monitored in Protected Mode (see Section 7), or the source may be declined if its terms are incompatible with automated monitoring.

5.7 Web (Static) (Coming Soon)

We will soon add support for version-tracking static web pages, with the ability to compare snapshots of a page over time. Example:

EU Deforestation Regulation
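Comparing snapshots of a static page amounts to diffing two versions of its text. As a sketch of the idea (not the upcoming implementation), Python's `difflib.unified_diff` shows which lines changed between a previous and a current snapshot. The snapshot text is a placeholder.

```python
import difflib


def snapshot_diff(old: str, new: str, label: str = "page") -> list[str]:
    """Return unified-diff lines between two text snapshots of one page."""
    return list(difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{label} (previous)",
        tofile=f"{label} (current)",
        lineterm="",  # lines from splitlines() carry no newline
    ))


# Placeholder snapshots of the same page taken at two points in time
previous = "Section 1\nStatus: draft"
current = "Section 1\nStatus: adopted"
```

An empty result means the page has not changed since the last snapshot.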

6. Source Classification (Primary vs Secondary)

Graceview classifies sources by authority level. Examples:

| Primary Sources | Secondary Sources |
| --- | --- |
| Government websites | General media |
| Regulatory agencies | Legal publishers |
| Official gazettes | Industry associations |
| Courts and tribunals | Professional journals |
| Legislative portals | Think tanks |
| Central banks | NGOs |

Use the “Authority” filter to refine results by Primary or Secondary.

7. Operating Modes (Public vs Protected)

Graceview respects intellectual property rights through two operating modes. Use the “Authorisation” filter to refine results by Public or Protected content.

7.1 Public Mode (licence compatible)

Usage: Content is in the public domain or under an open or attribution licence
Storage: Full text, attachments, metadata, dates, original link
Display: Complete content with attribution (copyright notices preserved)

7.2 Protected Mode (restricted/uncertain rights)

Usage: Content without compatible licences or with usage restrictions
Storage: Non-copyrightable metadata only (title, date, original link)
Display: Redirect to the original site (similar to search engines)
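The difference between the two modes is mainly what is kept in storage. This hypothetical Python sketch reduces an entry to the metadata listed for Protected Mode. The field names `title`, `dated`, and `link` are assumptions.

```python
# Non-copyrightable metadata kept in Protected Mode (assumed field names)
PROTECTED_FIELDS = ("title", "dated", "link")


def stored_record(entry: dict, mode: str) -> dict:
    """Return the subset of an entry that would be stored in a given mode."""
    if mode == "public":
        # Full text, attachments, metadata, dates, and original link
        return dict(entry)
    if mode == "protected":
        return {k: entry[k] for k in PROTECTED_FIELDS if k in entry}
    raise ValueError(f"unknown mode: {mode!r}")
```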

8. Key Features

8.1 Available Now

AI translation
AI summarisation
AI prefiltering
Lexical keyword filtering
Content deduplication (e.g., query string filtering)
Agents preserve original markup upon extraction (e.g., headings, images, links)
Agents are indistinguishable from organic traffic
Agents can use credentials to access gated content
Agents can interact with webpages and perform actions (e.g., navigate to other pages, execute searches, populate forms)
Agents can download files and take screenshots
Horizontally scalable (e.g., 100,000+ articles extracted in June 2025 alone)
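Two of the features above, query string deduplication and lexical keyword filtering, can be sketched in a few lines of Python. The list of tracking parameters, the whole-word matching rule, and all function names are hypothetical choices for illustration, not Graceview's configuration.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters ignored when comparing URLs
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def canonical_url(url: str) -> str:
    """Normalise a URL so links differing only in tracking parameters match."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    )
    # Lower-case the host, sort remaining parameters, and drop the fragment
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each canonical form."""
    seen, unique = set(), []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


def matches_keywords(text: str, keywords: list[str]) -> bool:
    """True if any keyword appears as a whole word (case-insensitive)."""
    return any(re.search(rf"\b{re.escape(k)}\b", text, re.IGNORECASE) for k in keywords)
```

Whole-word matching avoids false hits such as "rate" inside "corporate", at the cost of missing deliberate partial matches.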

8.2 Coming Soon

LinkedIn: Monitor posts from public accounts
Web (Static): Version-track changes to a single static web page
Email (Index): Mailing lists (links extracted as separate entries)
AI Extraction: Custom field extraction based on your workspace configuration

9. How It Works

Configuration: Our content team configures structured data extraction for each source
Synchronisation: Sources are checked every 30 minutes (over 17,000 times a year)
Processing: Content is extracted, summarised, translated, and made available based on your workspace configuration
Monitoring: Real-time analytics and audit logs track horizon scanner performance
Maintenance: Daily health checks ensure data integrity
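The yearly figure in the synchronisation step follows from the 30-minute interval, assuming a 365-day year and no missed checks:

```latex
\frac{60\ \text{min/h}}{30\ \text{min/check}} \times 24\ \text{h/day} \times 365\ \text{days/year}
  = 2 \times 24 \times 365 = 17{,}520\ \text{checks per source per year}
```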

10. Technology Stack

The horizon scanner uses enterprise-grade technologies including:

AWS: Amazon EKS
Microsoft: Translation via Azure AI Translator
OpenAI: Summarisation via frontier LLMs
Google: Headless browser via Google Chrome
Selenium: Browser automation
Laravel Horizon: Queue worker framework
LangChain: LLM framework

Last updated 1 month ago
