Source Configuration

Learn how the horizon scanner works and how sources are configured.

1. Purpose

Graceview is a regulatory intelligence solution that aggregates regulatory, legislative, and related institutional developments from global sources. This guide explains how our horizon scanner works and how sources are configured.

We prioritise authoritative, public, primary sources. Where needed, we can also monitor secondary and non-public sources, subject to technical and legal constraints.

2. What We Monitor

2.1 Qualifying Sources

A source is a location where new content is regularly published. Examples include:

  • Newsroom feeds on regulatory websites

  • RSS feeds from government agencies

  • Email newsletters and mailing lists

  • API endpoints from official sources

2.2 What We Cannot Monitor

  • Standalone links to PDF documents

  • Entire websites without specific feeds

3. Monitoring Methods

3.1 Available Now

The following monitoring methods are available now:

| Method | What We Monitor | Limitations |
| --- | --- | --- |
| Web (Feed) | Web pages where new content is published | If you need to version-track changes to a single static web page, refer to Web (Static). Some complex sites need multiple sub-sources (see the European Data Protection Supervisor example in Section 5.1.2). |
| RSS | All RSS feeds | None |
| Email (Body + Attachment) | Mailing lists (each email is extracted as a single entry) | If you need links from emails to be extracted as separate entries, refer to Email (Index). |
| X / Twitter | Posts from public accounts (@username) | None |
| API | Available upon request | None |

3.2 Coming Soon

The following monitoring methods are coming soon:

| Method | What We Monitor | Limitations |
| --- | --- | --- |
| LinkedIn | Posts from public accounts | None |
| Email (Index) | Mailing lists (links extracted as separate entries) | If you need emails to be extracted as one entry, refer to Email (Body + Attachment). |
| Web (Static) | Version-track changes to a single static web page | If you need to track new content published to a feed, refer to Web (Feed). |

4. Data We Capture From Sources

4.1 Available Now

The following data can be captured from sources:

| Field | How Obtained | Description |
| --- | --- | --- |
| Title | Extracted | Automatically translated to English. Titles longer than 200 characters are automatically shortened to a preview. |
| Description / Body | Extracted | Original markup preserved (headings, links, images). |
| Attachments | Extracted | Directly linked PDFs are downloaded, OCR-processed, and stored in a vector DB. Inline URLs in Description / Body markup are not auto-ingested but can be extracted via the interface. |
| Dated | Extracted | Publication date where available. |
| Preview | AI | Short English summary of Description / Body and Attachments. |
| Found | System | Timestamp of when the horizon scanner found the content. |
| Domain | System | The base domain of the source. |
| Method | Config | Refer to Section 3. |
| Source | Config | The name of the source. |
| Authority | Config | Refer to Section 6. |
| Authorisation | Config | Refer to Section 7. |
| Location | Config | ISO codes for region, subregion, country, state. Custom associations (e.g., EU, UN, NATO) and cities (e.g., San Francisco). |
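
As a rough illustration of the fields above, an entry could be modelled as a simple record. This is a minimal sketch, not Graceview's actual schema: the class name `Entry`, the attribute names, and the "..." truncation suffix are assumptions made for illustration, and the 200-character limit reflects one reading of the Title row.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

TITLE_LIMIT = 200  # assumed from the Title row: longer titles are shortened


def shorten_title(title: str, limit: int = TITLE_LIMIT) -> str:
    """Return the title unchanged if it fits, else cut it to `limit` characters."""
    if len(title) <= limit:
        return title
    # Reserve three characters for the "..." suffix (an illustrative choice).
    return title[: limit - 3] + "..."


@dataclass
class Entry:
    # Extracted or system-generated fields
    title: str
    body: str
    found: datetime
    domain: str
    # Fields set by source configuration
    method: str
    source: str
    authority: str       # "Primary" or "Secondary" (Section 6)
    authorisation: str   # "Public" or "Protected" (Section 7)
    # Optional fields
    dated: Optional[datetime] = None
    preview: str = ""
    attachments: list = field(default_factory=list)
    locations: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.title = shorten_title(self.title)


example = Entry(
    title="A" * 250,
    body="<h1>Notice</h1>",
    found=datetime(2025, 6, 2, 9, 30, tzinfo=timezone.utc),
    domain="agency.example",
    method="RSS",
    source="Example Agency News",
    authority="Primary",
    authorisation="Public",
)
print(len(example.title))
```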

4.2 Coming Soon

Custom field extraction based on your workspace configuration. Rules can be configured to extract structured data (i.e., multi-value, single-value, free text area, toggle, date, and number).

5. How Each Method Works

5.1 Web (Feed)

5.1.1 Central Feed Site

If a central feed exists (for example Monetary Authority of Singapore), we can:

  • Track content using filters (for example Consultations): apply query parameters such as ?content_type=Consultations

  • Track content using search terms (for example "AML"): apply query parameters such as ?q=AML
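
The two query-parameter patterns above can be sketched as follows. This is only an illustration: the feed URL `https://regulator.example/news` and the helper name `with_query` are hypothetical, and real sites may use different parameter names than `content_type` and `q`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query(feed_url: str, **params: str) -> str:
    """Return feed_url with the given query parameters added or replaced."""
    parts = urlsplit(feed_url)
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical central feed URL, used only for illustration.
FEED = "https://regulator.example/news"

consultations_url = with_query(FEED, content_type="Consultations")  # filter
aml_url = with_query(FEED, q="AML")                                  # search term

print(consultations_url)
print(aml_url)
```

Each resulting URL can then be treated as its own feed, so one central site can back several narrowly scoped sources.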

5.1.2 Multi-Section Site

If no central feed exists (for example European Data Protection Supervisor), we can configure each available feed as a sub-source.

5.2 RSS

Direct monitoring of RSS feeds.
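
To show what reading an RSS feed involves, here is a minimal sketch using only the Python standard library. The feed content is a made-up sample inlined as a string, and the output keys (`title`, `link`, `dated`) are illustrative names rather than the scanner's real field mapping.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed content, inlined so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Agency News</title>
    <item>
      <title>Consultation on reporting rules</title>
      <link>https://agency.example/news/1</link>
      <pubDate>Mon, 02 Jun 2025 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Updated AML guidance</title>
      <link>https://agency.example/news/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text: str) -> list:
    """Turn each <item> of an RSS 2.0 document into a small dictionary."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # The publication date is optional, matching "where available".
            "dated": parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return entries


entries = parse_items(SAMPLE_RSS)
for entry in entries:
    print(entry["title"], entry["link"], entry["dated"])
```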

5.3 Email (Body + Attachment)

5.3.1 Public Subscriptions

5.3.2 Private / Internal Subscriptions

  • We can share a generated email address for private or internal mailing lists

  • To filter by mailing list, we need the sender's email address for each list
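
Filtering by sender address could look roughly like the sketch below. The allowed address, the sample messages, and the function name `sender_allowed` are all hypothetical; this shows the idea of matching on the From header, not how the scanner is actually implemented.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sender address supplied for a private mailing list.
ALLOWED_SENDERS = {"updates@regulator.example"}


def sender_allowed(raw_email: str, allowed=ALLOWED_SENDERS) -> bool:
    """Return True if the message's From address is in the allowed set."""
    message = message_from_string(raw_email)
    _display_name, address = parseaddr(message.get("From", ""))
    return address.lower() in allowed


wanted = (
    "From: Regulator Updates <Updates@regulator.example>\n"
    "Subject: New circular\n"
    "\n"
    "A new circular has been published.\n"
)
unwanted = (
    "From: Someone Else <noreply@other.example>\n"
    "Subject: Unrelated\n"
    "\n"
    "Hello.\n"
)

print(sender_allowed(wanted))
print(sender_allowed(unwanted))
```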

5.4 X / Twitter

Direct monitoring of posts from public accounts via the official X API.

5.5 API

Available upon request for sources offering API services.

5.6 Credentials and Paywalls

We can use credentials you supply (service accounts or user logins) to access gated content. We review the site’s terms of use before enabling automated access. Where licensing is restrictive, content may need to be monitored in Protected Mode (see Section 7) or be refused if the site is incompatible.

5.7 Web (Static) (Coming Soon)

We will soon add support for version-tracking static web pages, with the ability to compare snapshots of the web page.
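
Since this feature is not yet released, the following is only a sketch of what comparing two snapshots can mean in practice, using a line-based unified diff from the Python standard library. The snapshot text and the function name `snapshot_diff` are invented for illustration.

```python
import difflib


def snapshot_diff(previous: str, current: str) -> list:
    """Return a unified diff between two text snapshots of the same page."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical page text captured at two different times.
old_snapshot = "Licensing fees\nFee: 100\nContact: info@agency.example"
new_snapshot = "Licensing fees\nFee: 120\nContact: info@agency.example"

changes = snapshot_diff(old_snapshot, new_snapshot)
for line in changes:
    print(line)
```

Lines starting with "-" were removed since the previous snapshot and lines starting with "+" were added; identical snapshots produce an empty diff.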

6. Source Classification (Primary vs Secondary)

Graceview classifies sources by authority level. Example:

| Primary Sources | Secondary Sources |
| --- | --- |
| Government websites | General media |
| Regulatory agencies | Legal publishers |
| Official gazettes | Industry associations |
| Courts and tribunals | Professional journals |
| Legislative portals | Think tanks |
| Central banks | NGOs |

Use the “Authority” filter to refine results by Primary or Secondary.

7. Operating Modes (Public vs Protected)

Graceview respects intellectual property rights through two operating modes. Use the “Authorisation” filter to refine results by Public or Protected content.

7.1 Public Mode (licence compatible)

  • Usage: Content is public domain or under an open/attribution licence

  • Storage: Full text, attachments, metadata, dates, original link

  • Display: Complete content with attribution (copyright notices preserved)

7.2 Protected Mode (restricted/uncertain rights)

  • Usage: Content without compatible licences or with usage restrictions

  • Storage: Non-copyrightable metadata only (title, date, original link)

  • Display: Redirect to original site (similar to search engines)
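
The storage difference between the two modes can be sketched as a simple field filter. The dictionary keys and the function name `stored_record` are hypothetical; the field lists follow the Storage bullets above.

```python
# Fields kept in Protected Mode: non-copyrightable metadata only.
PROTECTED_FIELDS = ("title", "dated", "link")


def stored_record(entry: dict, mode: str) -> dict:
    """Return the subset of an entry that would be stored under the given mode."""
    if mode == "public":
        # Public Mode keeps everything: full text, attachments, metadata.
        return dict(entry)
    if mode == "protected":
        return {key: entry[key] for key in PROTECTED_FIELDS if key in entry}
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical entry used for illustration.
entry = {
    "title": "New prudential standard",
    "dated": "2025-06-02",
    "link": "https://publisher.example/articles/42",
    "body": "<p>Full article text</p>",
    "attachments": ["standard.pdf"],
}

print(stored_record(entry, "public"))
print(stored_record(entry, "protected"))
```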

8. Key Features

8.1 Available Now

  • AI translation

  • AI summarisation

  • AI prefiltering

  • Lexical keyword filtering

  • Content deduplication (e.g., query string filtering)

  • Agents preserve original markup upon extraction (e.g., headings, images, links)

  • Agents are indistinguishable from organic traffic

  • Agents can use credentials to access gated content

  • Agents can interact with webpages and perform actions (e.g., navigate to other pages, execute searches, populate forms, etc.)

  • Agents can download files and take screenshots

  • Horizontally scalable (e.g., 100,000+ articles extracted in June 2025 alone)
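
As one possible reading of "query string filtering" for deduplication, the sketch below drops tracking parameters from URLs before comparing them, so the same article reached through different links is counted once. The parameter list and function names are assumptions, not the scanner's actual rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of parameters that do not change the content of a page.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def canonical_url(url: str) -> str:
    """Normalise a URL: lower-case host, no tracking params, no fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in TRACKING_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def deduplicate(urls: list) -> list:
    """Keep the first URL seen for each canonical form, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


links = [
    "https://agency.example/news/1?id=5",
    "https://Agency.example/news/1?utm_source=newsletter&id=5",
    "https://agency.example/news/2",
]
unique_links = deduplicate(links)
print(unique_links)
```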

8.2 Coming Soon

  • LinkedIn: Monitor posts from public accounts

  • Web (Static): Version-track changes to a single static web page

  • Email (Index): Mailing lists (links extracted as separate entries)

  • AI Extraction: Custom field extraction based on your workspace configuration

9. How It Works

  1. Configuration: Our content team configures structured data extraction for each source

  2. Synchronisation: Sources are checked every 30 minutes (17,000+ times yearly)

  3. Processing: Content is extracted, summarised, translated, and made available based on your workspace configuration

  4. Monitoring: Real-time analytics and audit logs track horizon scanner performance

  5. Maintenance: Daily health checks ensure data integrity
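
The "17,000+ times yearly" figure in step 2 follows directly from the 30-minute interval, as this short calculation shows. The variable names are illustrative.

```python
from datetime import timedelta

SYNC_INTERVAL = timedelta(minutes=30)

# Dividing one timedelta by another with // yields a whole number of checks.
checks_per_day = timedelta(days=1) // SYNC_INTERVAL
checks_per_year = checks_per_day * 365

print(checks_per_day)
print(checks_per_year)
```

That is 48 checks per day and 17,520 per source per year, consistent with the rounded figure above.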

10. Technology Stack

The horizon scanner uses enterprise-grade technologies including:

  • AWS: Amazon EKS

  • Microsoft: Translation via Azure AI Translator

  • OpenAI: Summarisation via frontier LLMs

  • Google: Headless browser via Google Chrome

  • Selenium: Browser automation

  • Laravel Horizon: Queue worker framework

  • LangChain: LLM framework
