Accept IRIs #1

@duerst

The link checker currently doesn't accept IRIs (i.e. URI-like identifiers that may contain non-ASCII characters). This should be fixed.

I have looked at this in the past, but didn't get around to doing any actual work.
There are basically two steps involved.
The first step is to make sure that the encoding of the document being checked is detected correctly.
The second step is then to convert the link to UTF-8 and percent-encode it before testing for resolution.
There are some additional details, such as treating query parts for HTTP/HTTPS as being in the document encoding.
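
To make the second step more concrete, here is a rough sketch in Python. It is not code from the link checker. The function name `iri_to_uri`, the `doc_encoding` parameter, and the `SAFE` set are all made up for illustration, and the sketch assumes the document encoding has already been detected in the first step. It also leaves the host part untouched (internationalized domain names would need separate handling) and raises an error if a query character cannot be represented in the document encoding.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters that are passed through unchanged: the RFC 3986 reserved
# characters plus "%", so existing percent-escapes are not encoded twice.
# (Letters, digits and "-._~" are never escaped by quote().)
SAFE = "!#$&'()*+,/:;=?@[]%"


def iri_to_uri(iri: str, doc_encoding: str = "utf-8") -> str:
    """Percent-encode the non-ASCII parts of an IRI (illustrative sketch).

    Path and fragment are always converted via UTF-8. For http/https the
    query is converted via the document encoding instead; for other schemes
    this sketch assumes UTF-8 for the query as well.
    """
    parts = urlsplit(iri)
    query_encoding = doc_encoding if parts.scheme.lower() in ("http", "https") else "utf-8"

    path = quote(parts.path, safe=SAFE, encoding="utf-8")
    query = quote(parts.query, safe=SAFE, encoding=query_encoding)
    fragment = quote(parts.fragment, safe=SAFE, encoding="utf-8")

    # The network location (host, port, userinfo) is deliberately left as-is.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    # A link found in a document that was detected as ISO-8859-1.
    print(iri_to_uri("http://example.org/caf\u00e9?q=\u00e9", "iso-8859-1"))
```

With this split, the same character ends up as different byte sequences depending on where it appears: two UTF-8 bytes in the path, but a single ISO-8859-1 byte in the query when that is the document encoding.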
