Source Configuration
Learn how the horizon scanner works and how sources are configured.
1. Purpose
Graceview is a regulatory intelligence solution that aggregates regulatory, legislative, and related institutional developments from global sources. This guide explains how our horizon scanner works and how sources are configured.
We always prioritise and optimise for authoritative public primary sources. However, we can monitor secondary and non-public sources when needed, subject to technical and legal constraints.
2. What We Monitor
2.1 Qualifying Sources
A source is a location where new content is regularly published. Examples include:
Newsroom feeds on regulatory websites
RSS feeds from government agencies
Email newsletters and mailing lists
API endpoints from official sources
When in doubt, identify the specific page, feed, or mechanism that shows new content over time.
2.2 What We Cannot Monitor
Standalone links to PDF documents
Entire websites without specific feeds
3. Monitoring Methods
3.1 Available Now
The following monitoring methods are available now:
Web (Feed): Web pages where new content is published. If you need to version-track changes to a single static web page, refer to Web (Static). Some complex sites need multiple sub-sources (see the EDPS example in Section 5.1.2).
RSS: All RSS feeds
Email (Body + Attachment): Mailing lists (each email is extracted as a single entry). If you need links from emails to be extracted as separate entries, refer to Email (Index).
X / Twitter: Posts from public accounts (@username)
API: Available upon request
3.2 Coming Soon
The following monitoring methods are coming soon:
LinkedIn: Posts from public accounts
Email (Index): Mailing lists (links extracted as separate entries). If you need emails to be extracted as one entry, refer to Email (Body + Attachment).
Web (Static): Version-track changes to a single static web page. If you need to track new content published to a feed, refer to Web (Feed).
4. Data We Capture From Sources
4.1 Available Now
The following data can be captured from sources. Each field is listed with its origin (Extracted, AI, System, or Config):
Title (Extracted): Automatically translated to English. Titles longer than 200 characters are automatically shortened to previews.
Description / Body (Extracted): Original markup preserved (headings, links, images).
Attachments (Extracted): Directly linked PDFs are downloaded, OCR-processed, and stored in a vector DB. Inline URLs in Description / Body markup are not auto-ingested but can be extracted via the interface.
Dated (Extracted): Publication date where available.
Preview (AI): Short English summary of Description / Body and Attachments.
Found (System): Timestamp of when the horizon scanner found the content.
Domain (System): The base domain of the source.
Source (Config): The name of the source.
Location (Config): ISO codes for region, subregion, country, state. Custom associations (e.g., EU, UN, NATO) and cities (e.g., San Francisco).
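As an illustration only, the captured fields could be modelled as a simple record. This is a minimal sketch: the class name `Entry`, the attribute names, and the way a long title is cut (trailing "..." within the 200-character limit) are assumptions, not a description of Graceview's internal schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

TITLE_PREVIEW_LIMIT = 200  # limit taken from the text; the cut rule is assumed


@dataclass
class Entry:
    """Hypothetical record holding a subset of the captured fields."""
    title: str                  # Extracted
    body: str                   # Extracted, original markup preserved
    dated: Optional[datetime]   # Extracted, publication date where available
    found: datetime             # System, when the scanner found the content
    domain: str                 # System, base domain of the source
    source: str                 # Config, name of the source

    @property
    def title_preview(self) -> str:
        """Return the title, shortened if it exceeds the preview limit."""
        if len(self.title) <= TITLE_PREVIEW_LIMIT:
            return self.title
        return self.title[: TITLE_PREVIEW_LIMIT - 3].rstrip() + "..."


entry = Entry(
    title="Consultation paper " + "x" * 300,
    body="<h1>Consultation paper</h1>",
    dated=None,  # no publication date on the page
    found=datetime(2025, 6, 2, 9, 0, tzinfo=timezone.utc),
    domain="example.org",
    source="Example Regulator Newsroom",
)
print(len(entry.title_preview))
```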
4.2 Coming Soon
Custom field extraction based on your workspace configuration. Rules can be configured to extract structured data (i.e., multi-value, single-value, free text area, toggle, date, and number).
5. How Each Method Works
5.1 Web (Feed)
5.1.1 Central Feed Site
If a central feed exists (for example Monetary Authority of Singapore), we can:
Track all content: https://www.mas.gov.sg/search
Track content using filters (for example Consultations): apply query parameters such as ?content_type=Consultations
Track content using search terms (for example "AML"): apply query parameters such as ?q=AML
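The three variants above differ only in their query string. A minimal sketch of how such feed URLs might be assembled, using the parameter names quoted above; the helper name `build_feed_url` is hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_feed_url(base_url: str, params: dict[str, str]) -> str:
    """Return base_url with params merged into its query string."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


BASE = "https://www.mas.gov.sg/search"

all_content = build_feed_url(BASE, {})
consultations = build_feed_url(BASE, {"content_type": "Consultations"})
aml = build_feed_url(BASE, {"q": "AML"})

print(all_content)
print(consultations)
print(aml)
```

Each resulting URL would be configured as its own source, so a filtered view and the full feed can be monitored side by side.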
5.1.2 Multi‑Section Site
If no central feed exists (for example European Data Protection Supervisor), we can configure each feed available as a sub-source:
Annual Reports: https://www.edps.europa.eu/annual-reports_en
5.2 RSS
Direct monitoring of RSS feeds. Example:
SEC Commission Orders
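To show what reading an RSS feed involves, here is a minimal sketch that parses RSS 2.0 items with the Python standard library. The sample feed, its URLs, and the function name `parse_rss_items` are made up for illustration; this is not the scanner's actual implementation.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Agency Orders</title>
    <item>
      <title>Order A</title>
      <link>https://example.org/orders/a</link>
      <pubDate>Mon, 02 Jun 2025 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Order B</title>
      <link>https://example.org/orders/b</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str) -> list[dict]:
    """Extract title, link, and publication date from each <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # pubDate is optional in RSS 2.0, so keep None when it is absent.
            "dated": parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return items


items = parse_rss_items(SAMPLE_RSS)
for entry in items:
    print(entry["title"], entry["link"], entry["dated"])
```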
5.3 Email (Body + Attachment)
5.3.1 Public Subscriptions
We subscribe a generated email address to public mailing lists
You can specify filter options if available (for example All News): https://www.mas.gov.sg/subscription-services
5.3.2 Private / Internal Subscriptions
We can share a generated email address for private or internal mailing lists
To filter by mailing list, we need the sender email address of each list
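Filtering by sender can be pictured as a comparison against the From header of each incoming message. A minimal sketch using the Python standard library; the addresses and function names are hypothetical:

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message received on a generated address.
RAW_EMAIL = """From: Example Regulator <News@regulator.example>
To: inbox-123@scanner.example
Subject: Weekly update

New consultation published.
"""


def sender_address(raw: str) -> str:
    """Return the lower-cased address from the From header."""
    message = message_from_string(raw)
    return parseaddr(message.get("From", ""))[1].lower()


def matches_mailing_list(raw: str, allowed_senders: set[str]) -> bool:
    """True when the message was sent by one of the configured senders."""
    allowed = {address.lower() for address in allowed_senders}
    return sender_address(raw) in allowed


print(matches_mailing_list(RAW_EMAIL, {"news@regulator.example"}))
```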
5.4 X / Twitter
Direct monitoring of posts from public accounts via the official X API. Example:
U.S. Securities and Exchange Commission (@secgov): https://x.com/secgov
5.5 API
Available upon request for sources offering API services. Example:
US Federal Register API
5.6 Credentials and Paywalls
We can use credentials you supply (service accounts or user logins) to access gated content. We review the site’s terms of use before enabling automated access. Where licensing is restrictive, content may need to be monitored in Protected Mode (see Section 7), or the source may be refused if the site's terms are incompatible with automated access.
5.7 Web (Static) (Coming Soon)
We will soon add support for version-tracking static web pages with the ability to compare between snapshots of the web page. Example:
EU Deforestation Regulation
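Comparing two snapshots of a page amounts to a text diff. Because this feature is not yet released, the following is only a sketch of the general idea using Python's `difflib`; the snapshot text and the function name `snapshot_diff` are placeholders, and the eventual product behaviour may differ.

```python
import difflib


def snapshot_diff(previous: str, current: str) -> list[str]:
    """Return unified-diff lines between two text snapshots of a page."""
    return list(difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


# Placeholder snapshots of the same page captured at two points in time.
old_snapshot = "Article 1\nApplies from date A\nArticle 2\n"
new_snapshot = "Article 1\nApplies from date B\nArticle 2\n"

changes = snapshot_diff(old_snapshot, new_snapshot)
print("\n".join(changes))
```

An empty list means the page is unchanged between the two snapshots.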
6. Source Classification (Primary vs Secondary)
Graceview classifies sources by authority level. Examples:
Primary: Government websites, regulatory agencies, official gazettes, courts and tribunals, legislative portals, central banks
Secondary: General media, legal publishers, industry associations, professional journals, think tanks, NGOs
Use the “Authority” filter to refine results by Primary or Secondary.
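The classification can be thought of as a lookup from source type to authority level. A minimal sketch, assuming the example source types above split into Primary (government websites, regulatory agencies, official gazettes, courts and tribunals, legislative portals, central banks) and Secondary (the rest); the function names are hypothetical:

```python
PRIMARY = {
    "government websites", "regulatory agencies", "official gazettes",
    "courts and tribunals", "legislative portals", "central banks",
}
SECONDARY = {
    "general media", "legal publishers", "industry associations",
    "professional journals", "think tanks", "ngos",
}


def authority(source_type: str) -> str:
    """Map a source type to 'Primary' or 'Secondary'."""
    key = source_type.strip().lower()
    if key in PRIMARY:
        return "Primary"
    if key in SECONDARY:
        return "Secondary"
    raise ValueError(f"Unclassified source type: {source_type!r}")


def filter_by_authority(source_types: list[str], level: str) -> list[str]:
    """Keep only the source types with the requested authority level."""
    return [s for s in source_types if authority(s) == level]


print(filter_by_authority(["Central banks", "Think tanks", "NGOs"], "Secondary"))
```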
7. Operating Modes (Public vs Protected)
Graceview respects intellectual property rights through two operating modes. Use the “Authorization” filter to refine results by Public or Protected content.
7.1 Public Mode (licence compatible)
Usage: Content is public domain or under an open/attribution licence
Storage: Full text, attachments, metadata, dates, original link
Display: Complete content with attribution (copyright notices preserved)
7.2 Protected Mode (restricted/uncertain rights)
Usage: Content without compatible licences or with usage restrictions
Storage: Non-copyrightable metadata only (title, date, original link)
Display: Redirect to original site (similar to search engines)
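The storage rules for the two modes can be sketched as a field filter. This is illustrative only: the field names and the function `storable_fields` are assumptions chosen to mirror the Storage lines above.

```python
# Fields kept in Protected Mode, per the Storage line above.
PROTECTED_FIELDS = ("title", "dated", "link")


def storable_fields(entry: dict, mode: str) -> dict:
    """Return the subset of an entry that may be stored in the given mode."""
    if mode == "public":
        return dict(entry)  # full text, attachments, metadata, dates, link
    if mode == "protected":
        return {k: entry[k] for k in PROTECTED_FIELDS if k in entry}
    raise ValueError(f"Unknown mode: {mode!r}")


entry = {
    "title": "Example notice",
    "dated": "2025-06-02",
    "link": "https://example.org/notices/1",
    "body": "<p>Full text of the notice</p>",
    "attachments": ["notice.pdf"],
}

print(storable_fields(entry, "protected"))
```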
8. Key Features
8.1 Available Now
AI translation
AI summarisation
AI prefiltering
Lexical keyword filtering
Content deduplication (e.g., query string filtering)
Agents preserve original markup upon extraction (e.g., headings, images, links)
Agents are indistinguishable from organic traffic
Agents can use credentials to access gated content
Agents can interact with webpages and perform actions (e.g., navigate to other pages, execute searches, populate forms, etc.)
Agents can download files and take screenshots
Horizontally scalable (e.g., 100,000+ articles extracted in June 2025 alone)
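Deduplication by query string filtering can be illustrated as URL canonicalisation: parameters that do not change the content are dropped before URLs are compared. A minimal sketch; the list of ignored parameters and the function names are assumptions, not the scanner's actual rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not affect which article is shown.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def canonical_url(url: str) -> str:
    """Drop ignored query parameters and the fragment, and sort the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def deduplicate(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each canonical form."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


urls = [
    "https://example.org/news/1?id=5",
    "https://example.org/news/1?utm_source=newsletter&id=5#top",
    "https://example.org/news/2?id=6",
]
print(deduplicate(urls))
```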
8.2 Coming Soon
LinkedIn: Monitor posts from public accounts
Web (Static): Version-track changes to a single static web page
Email (Index): Mailing lists (links extracted as separate entries)
AI Extraction: Custom field extraction based on your workspace configuration
9. How It Works
Configuration: Our content team configures structured data extraction for each source
Synchronization: Sources are checked every 30 minutes (17,000+ times yearly)
Processing: Content is extracted, summarised, translated, and made available based on your workspace configuration
Monitoring: Real-time analytics and audit logs track horizon scanner performance
Maintenance: Daily health checks ensure data integrity
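The "17,000+ times yearly" figure follows from the 30-minute interval, as this short calculation shows (assuming a 365-day year and no missed checks):

```python
from datetime import timedelta

INTERVAL = timedelta(minutes=30)

checks_per_day = timedelta(days=1) // INTERVAL  # 24 hours / 30 minutes
checks_per_year = checks_per_day * 365

print(checks_per_day, checks_per_year)
```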
10. Technology Stack
The horizon scanner uses enterprise-grade technologies including:
AWS: Amazon EKS
Microsoft: Translation via Azure AI Translator
OpenAI: Summarisation via frontier LLMs
Google: Headless browser via Google Chrome
Selenium: Browser automation
Laravel Horizon: Queue worker framework
LangChain: LLM framework