User:Glrx
Notes about images
WMF does some interesting processing when it displays images. One might think that a JPEG image of the Mona Lisa is just transmitted to a browser, but that is not the case. The File:Mona Lisa, by Leonardo da Vinci, from C2RMF retouched.jpg is 90 MB, and that much data takes time to transmit (about 1 second at 1 Gbit/s or 10 seconds at 100 Mbit/s) and could put a big dent in a modest cellular telephone data plan.
Instead, WMF does something different. The 90 MB, 7,479 × 11,146 pixel JPEG is downsampled to the display size. As a result, only a small image is transferred over the network. The transfer is faster, and the impact on data bandwidth and data plans is much smaller. For example, the downsampled JPEG is just 21,143 bytes, a tiny fraction of 90 MB.
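The transfer times quoted above are simple arithmetic. A minimal sketch, assuming 1 MB = 10^6 bytes and ignoring protocol overhead and latency (the function name is only illustrative):

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealized transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / link_bits_per_second

FULL_SIZE = 90e6      # the 90 MB original, taking 1 MB = 10**6 bytes
THUMB_SIZE = 21143    # the downsampled JPEG quoted above

for rate in (1e9, 100e6):
    full = transfer_seconds(FULL_SIZE, rate)
    thumb_ms = transfer_seconds(THUMB_SIZE, rate) * 1000
    print(f"{rate / 1e6:.0f} Mbit/s: full {full:.2f} s, thumbnail {thumb_ms:.3f} ms")

print(f"thumbnail is {THUMB_SIZE / FULL_SIZE:.4%} of the original")
```

The idealized figures are 0.72 s and 7.2 s, which the text rounds up to about 1 and 10 seconds.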
Information for downsampling
The wiki markup will specify a particular image to display. The markup will specify a desired size (such as a width of 160 pixels).
Below is speculation. JPEG and PNG files may be reparsed at each inclusion. I need to check the sources.
I expect a database to hold critical properties. With quick access to some properties, it will not be necessary to parse the image file or the image description page. Such properties would be a base URL, file type, width, and height; they should be quick to access. From that information, one could quickly determine the HTML to include the image. The img element's width, height, and URL can be determined without reading the image file.
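A minimal sketch of that idea: build the img element from stored properties alone. The URL is a placeholder, and the rounding rule is an assumption; I have not checked how MediaWiki actually rounds the height.

```python
from html import escape

def img_tag(thumb_url: str, orig_width: int, orig_height: int, display_width: int) -> str:
    """Build an img element from stored properties; the image file is never opened."""
    # Assumption: the height scales proportionally and is rounded to whole pixels.
    display_height = max(1, round(orig_height * display_width / orig_width))
    return (f'<img src="{escape(thumb_url, quote=True)}" '
            f'width="{display_width}" height="{display_height}">')

# Stored width and height of the Mona Lisa file; hypothetical thumbnail URL.
print(img_tag("https://example.org/thumb/160px-Example.jpg", 7479, 11146, 160))
```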
If a new image is uploaded, that upload should clear the server caches of that image, but the upload need not trigger rebuilds of all the pages that use the image. Those pages need to be rebuilt only when the aspect ratio changes. Such a change could impact the page layout. *** Check that the img element has both width and height attributes.
SVG files may be different. SVG URLs may need to specify the desired language. If a language is added, then that may trigger the rebuild of all pages that use the image. Any of those pages may specify the recently added language. The MW database holds (at least some of) the IETF language tags that the file supports. Well, the list of pages to rebuild can be much shorter. If |lang= is set, then the page does not need to be rebuilt. That means if the SVG file adds the fr langtag, then only pages on wikis that default to the fr langtag need to be rebuilt (the fr.Wiki). Any other wiki will not be affected. If such a wiki had requested a French version of the SVG, its pages would already have URLs that specify the French version. (Before the addition, the thumbnailer would not find a French version, so it would render the SVG default language.)
- Note: The MW database has an edited list of languages in its metadata. A MediaWiki API imageinfo query provides metadata and languages (MW type not IETF?):
  - Languages are filtered: zh_CN does not appear. Non-existent langtags also appear: zh.
  - https://www.mediawiki.org/wiki/Manual:File_metadata_handling
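The rebuild rule for a newly added SVG language can be sketched as a filter over the image's uses. This is speculation in code form: the ImageUse record and its fields are hypothetical, and langtags are matched exactly (no fallback chains).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImageUse:
    page: str
    wiki_default_lang: str        # e.g. "fr" for the French Wikipedia
    lang_param: Optional[str]     # value of |lang= in the wikitext, or None

def pages_to_rebuild(uses: list[ImageUse], added_langtag: str) -> list[str]:
    """Pages whose rendering could change when the SVG gains added_langtag."""
    # A page with an explicit |lang= already has a URL naming its language,
    # so only pages relying on a matching wiki default are affected.
    return [u.page for u in uses
            if u.lang_param is None and u.wiki_default_lang == added_langtag]

uses = [
    ImageUse("fr:Gibraltar", "fr", None),   # relies on the wiki default
    ImageUse("en:Gibraltar", "en", "fr"),   # URL already names French
    ImageUse("de:Gibraltar", "de", None),   # unaffected
]
print(pages_to_rebuild(uses, "fr"))
```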
Caching
The WMF servers confront a computational burden because they must downsample the requested image, but local server computation may be less expensive than data bandwidth.
WMF servers also cache the images that they downsample. If I ask for Mona Lisa at a particular width, then a WMF server will generate that size. That work is stored in a cache, so if I or somebody else asks for the same size image at another time, the cached version is supplied rather than re-downsampling the image. The cache saves computation and time.
The moment when the caching is done is also significant. Although I can ask for images at particular sizes, the usual scenario is that the image (such as Mona Lisa) is used on a wiki article page. When a wiki page is updated, MediaWiki rebuilds the wiki page (creating a cached version of the wiki page) and also caches any new images that were added to the page. That can require a lot of computation, but the result is that the wiki page and all of its images are now in the server caches. Preloading the cache reduces the latency that a user would experience when viewing the page. The user need only wait for the data to be transferred, not for the downsampling, because it has already been done.
Furthermore, WMF tells my browser to cache local copies on my computer. If I view a wiki page with the Mona Lisa image on it, the wiki page and the Mona Lisa image are copied to my computer. I can leave that wiki page, but the local copies remain on my computer. If I reload the page later, my browser can display the page without re-downloading the page and image from the WMF server. (Some small network traffic may confirm that my cached copies are still valid.)
That local caching interaction can be involved. The mechanism is part of the Hypertext Transfer Protocol (HTTP). When a server transfers web pages or images using HTTP, it will specify some caching information. That information tells my browser whether it may cache the data and how long the cached data may be treated as current.[1]
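A minimal sketch of the freshness decision a browser makes from the Cache-Control header. It is a simplification of the real HTTP rules (it ignores Expires, heuristics, and revalidation), and the header values are made-up examples rather than what WMF sends.

```python
import re

def max_age_seconds(cache_control: str):
    """Return max-age from a Cache-Control header value, or None if absent."""
    match = re.search(r"(?:^|[,\s])max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None

def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """A stored response is fresh while its age is below max-age."""
    # Simplification: no-store and no-cache are both treated as
    # "do not reuse without asking the server".
    if re.search(r"\bno-store\b|\bno-cache\b", cache_control, re.IGNORECASE):
        return False
    limit = max_age_seconds(cache_control)
    return limit is not None and age_seconds < limit

print(is_fresh("public, max-age=3600", 120))    # within the hour
print(is_fresh("public, max-age=3600", 7200))   # too old
```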
Consider srcset and displayed sizes. MW chooses fixed-size images. Changing the width of a browser window changes the layout on my screen, but it does not require downloading new thumbnails.
Caches can cause trouble
Say wiki page ABC uses an image XYZ.
If page ABC is rebuilt every time it is accessed, then the page will always be up to date. If the page is cached, then the cache may have a stale version.
If somebody edits page ABC, then it is clear that page ABC should be purged from the cache.
If somebody edits image XYZ, then the cache should be cleared of XYZ. But the appearance of page ABC may also change even though none of the wikitext for page ABC has changed. How does page ABC get updated?
If the aspect ratio of XYZ does not change, then nothing much needs to happen. When page ABC is accessed, it comes out of the cache. The cached page has a reference to XYZ, but that image has been invalidated, so the new version of XYZ is fetched.
If the aspect ratio of XYZ changed, then the layout of ABC may have been altered. ABC needs to be rebuilt. MW maintains a database of where each image is used, so MW can invalidate all of the pages that use XYZ. There is a cascade: the invalidated pages may be transcluded, so more pages may need to be invalidated and rebuilt.
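The two cases above can be sketched as follows. The dictionaries stand in for MW's usage tables and are hypothetical; the exact aspect-ratio comparison is also an assumption (a real check might tolerate rounding).

```python
from collections import deque

def aspect_changed(old_size, new_size) -> bool:
    """Compare aspect ratios exactly by cross-multiplying (no float rounding)."""
    (ow, oh), (nw, nh) = old_size, new_size
    return ow * nh != oh * nw

def pages_to_invalidate(image, old_size, new_size, image_users, transcluded_by):
    """Return every page to rebuild after `image` is re-uploaded."""
    if not aspect_changed(old_size, new_size):
        return set()          # cached HTML still has a valid width and height
    stale = set()
    queue = deque(image_users.get(image, ()))
    while queue:              # breadth-first walk of the transclusion cascade
        page = queue.popleft()
        if page in stale:
            continue
        stale.add(page)
        queue.extend(transcluded_by.get(page, ()))
    return stale

image_users = {"XYZ": ["ABC"]}
transcluded_by = {"ABC": ["Portal:Art"], "Portal:Art": ["Main Page"]}
print(pages_to_invalidate("XYZ", (400, 300), (800, 600), image_users, transcluded_by))
print(sorted(pages_to_invalidate("XYZ", (400, 300), (400, 400), image_users, transcluded_by)))
```

The first call keeps the 4:3 ratio, so nothing is invalidated; the second changes it, so ABC and everything that transcludes ABC is returned.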
More on server caching
The server cache can be a separate set of servers positioned between the users and the actual servers. WMF uses Varnish.
- https://wp-rocket.me/blog/varnish-http-cache-server/ Varnish Cache: How It Works and How to Use It on Your WordPress Site, July 29, 2021, Alice Orru
- https://www.oreilly.com/library/view/getting-started-with/9781491972212/ch01.html Getting Started with Varnish Cache, 15 September 2016, Thijs Feryn
- w:Varnish (software)
- https://queue.acm.org/detail.cfm?id=1814327 You're Doing It Wrong, Queue vol. 8, no. 6, 11 June 2010, Poul-Henning Kamp
Domain names
Domain names such as commons.wikimedia.org or upload.wikimedia.org must be resolved to an IP address. That resolution need not be to a single IP address. Check name resolution and redirect messages as ways to shuffle the load.
A domain name resolves to one IP address. Many domain names may resolve to the same IP address.[2] But I think a name may have many A records. I'm looking for information about random selection.[3]
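One way to check whether a name has several address records is to ask the system resolver. A minimal sketch using the standard socket module; the resolver parameter exists only so the function can be exercised without a network, and the results depend on the local resolver and its caching.

```python
import socket

def resolve_all(hostname: str, resolver=socket.getaddrinfo) -> list[str]:
    """Return the distinct addresses a name resolves to, in resolver order."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in resolver(
            hostname, 443, proto=socket.IPPROTO_TCP):
        address = sockaddr[0]          # first field is the address for IPv4 and IPv6
        if address not in seen:
            seen.append(address)
    return seen
```

Calling resolve_all("upload.wikimedia.org") on a networked machine lists whatever addresses the resolver returns at that moment; it says nothing about how a client picks among them.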
Alt text
Proposal about alt= text being added to HTML.
Page regeneration
Consider a typical Wikipedia page. It will use templates and images.
If one of the templates is edited, then the Wikipedia page probably needs to be rebuilt. The template may affect the page content or layout. MediaWiki keeps track of which pages use a template, so when a template is edited, MediaWiki knows which pages need to be regenerated. There can also be a cascade because some templates use other templates.
That means that editing a template that is used on thousands of Wikipedia pages would trigger the regeneration of thousands of Wikipedia pages. Editing commonly used templates should not be done lightly. Commonly used templates may be protected. For example, {{Cite book}} on the English Wikipedia affects almost 1.5 million pages.
Editing an image does not require rebuilding the pages that use the image. The page still references the same image name, but now the image scalers will supply the new image rather than the old one. The cached HTML of the Wikipedia page is still good.
Well, not quite. When MediaWiki builds a page, it specifies the width and height attributes of the img element. That allows the browser to lay out the page before it has downloaded all the images, which avoids continual layout adjustments as image sizes are learned. So rebuilding pages may be required. MediaWiki could just do it all the time, or it could update pages only if a significant change occurred. If the image aspect ratio changed, then img elements would need to be updated. If a multilingual SVG file added another language, then pages may need updating.
Media for cleanup
SVG images
WMF processes SVG images in a manner similar to JPEG images. Instead of serving the actual SVG file on a wiki page, WMF builds a PNG file of the requested width and serves the PNG. There are a couple of advantages to serving a PNG.
First, the PNG file can be much smaller than the SVG file. For example, the SVG map of Gibraltar is 290 kB. A request for its PNG rendition produced a 48 kB PNG file; the response headers were:
accept-ranges: bytes
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
age: 74796
content-disposition: inline;filename*=UTF-8''Gibraltar_map-en.svg.png
content-length: 47987
content-type: image/png
date: Tue, 21 Dec 2021 02:25:44 GMT
etag: 8391f68640a7f0cedd3971fef7b8b3d3
last-modified: Mon, 01 Feb 2021 12:23:52 GMT
nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
permissions-policy: interest-cohort=()
report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
server: ATS/8.0.8
server-timing: cache;desc="hit-front", host;desc="cp1078"
strict-transport-security: max-age=106384710; includeSubDomains; preload
timing-allow-origin: *
x-cache: cp1078 hit, cp1078 hit/2
x-cache-status: hit-front
x-timestamp: 1612182231.00644
However, the SVG file is transferred with GZip compression; the transfer size is only 89 kB. The compression factor is 290/89 = 3.26. The transfer size is larger than the PNG, but it is less than twice the size of the PNG (89/48 = 1.85).
Second, when WMF started supporting the SVG file format, the browser support for SVG was nonexistent or uneven. Serving PNG files had strong support. Serving PNG renditions of SVG files also gives a uniform presentation. SVG images can vary depending upon the availability of particular fonts and the depth of SVG support.
Directly serving SVG
SVG client side rendering (Phabricator T5593)
The img element allows animations but should block scripts. The object element allows scripts.
Scripts can be malicious. WMF blocks uploading SVG files that contain scripts.
There is also a concern with animated files triggering seizures. That has been cited as a reason to not serve SVG directly. Detecting animated SVG is also made difficult because there are both SMIL and CSS animations. Automatically detecting update rate may be difficult. Even with a fast update rate, an animation may not trigger a seizure. See Commons:Deletion requests/File:Color Flash.gif.
What happens to mouse clicks? Wrap an a element around a bitmap file. Wrap it around an SVG file.
SVG files can be malicious. An SVG file could be a computational nightmare that taxes the computer. PNG files will render in finite time. The SVG renderer on WMF servers puts a time limit on the rendering. If it does not complete within a few seconds, then the process is terminated. There are some SVG files on Commons that can hit that time limit.
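The time limit can be illustrated with a generic subprocess timeout. This is only a sketch of the technique, not how the WMF thumbnailer is implemented; the "renderer" here is a stand-in child process that sleeps.

```python
import subprocess
import sys

def render_with_limit(command: list[str], limit_seconds: float):
    """Run a renderer command; return its stdout, or None if it ran too long."""
    try:
        done = subprocess.run(command, capture_output=True,
                              timeout=limit_seconds, check=True)
    except subprocess.TimeoutExpired:
        return None           # subprocess.run kills the child before raising
    return done.stdout

# Stand-in for a pathological SVG: a child Python process that just sleeps.
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(render_with_limit(slow, 0.5))
```

A renderer that exits with an error raises CalledProcessError instead; a caller would decide separately how to report that.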
There are some language translation differences when an SVG file is directly served; see below.
SVG is XML or not
Dislike XML.
SVG has namespaces, but HTML does not. HTML lossage creeps in.
If XML is so good, why is CSS not XML?
XML details
Some notes for later.
The XML Spec 1.0 (Fifth edition). https://www.w3.org/TR/xml/
The XML prolog is optional.
- XML version
- encoding (ASCII / ISO issue) EBCDIC and UTF 16.
- standalone
  - From the XML specification § 2.9:
    - The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:
      - attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or
      - entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or
      - attributes with tokenized types, where the attribute appears in the document with a value such that normalization will produce a different value from that which would be produced in the absence of the declaration, or
      - element types with element content, if white space occurs directly within any instance of those types.
The default attribute values raise issues with #REQUIRED, #IMPLIED, #FIXED and default. https://www.w3.org/TR/xml/#dt-default
The SVG 1.1 style element has the type attribute, and that attribute is #REQUIRED. See https://www.w3.org/Graphics/SVG/1.1/styling.html#StyleElement
That means Phab:T68672 ("SVG style element ignored if no type attribute is specified") may have been invalid, and that Commons:Commons SVG Checker should require type="text/css".
The SVG 1.1 DTD has
<!ATTLIST %SVG.style.qname;
    xml:space ( preserve ) #FIXED 'preserve'
    %SVG.id.attrib;
    %SVG.base.attrib;
    %SVG.lang.attrib;
    %SVG.Core.extra.attrib;
    type %ContentType.datatype; #REQUIRED
    media %MediaDesc.datatype; #IMPLIED
    title %Text.datatype; #IMPLIED
>
For SVG 2.0, the type attribute has an initial value of text/css. https://svgwg.org/svg2-draft/styling.html#StyleElement
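A checker could flag style elements that omit type. A minimal sketch using the standard library XML parser; it is not the code of any existing checker.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def style_elements_missing_type(svg_text: str) -> int:
    """Count SVG style elements that have no type attribute."""
    root = ET.fromstring(svg_text)
    return sum(1 for style in root.iter(f"{{{SVG_NS}}}style")
               if style.get("type") is None)

sample = """<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
  <style>rect { fill: red }</style>
  <style type="text/css">circle { fill: blue }</style>
</svg>"""
print(style_elements_missing_type(sample))
```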
There is also the question of the style element content being a CDATA section. The conservative SVG 1.1 practice was
<defs>
  <style type="text/css"><![CDATA[
    rect {
      fill: red;
      stroke: blue;
      stroke-width: 3
    }
  ]]></style>
</defs>
The CDATA section was needed to avoid the interpretation of entities and of <. I remember having trouble at some point, but I think that was resolved by using CSS character literals rather than XML character literals. It may also be that the content of the modern style element is CDATA rather than PCDATA. Find the references.
That SVG snippet also shows the style element within a defs element. That used to be common practice, but it may never have been needed. The advantage of the defs element was that its content would never be rendered. There is more to say about defs; many elements (such as linearGradient) do not need to be within a defs element.
SVG treading where it should not
SVG is about representing vector images, but it often steps into areas that do not affect the appearance of the image or where it has no authority.
Reinventing namespaces.
Effectively merging xml:lang and lang attributes. If both exist, they must be equal. Why not just keep one? It also complicates the CSS lang() pseudo-class. The only bad thing about xml:lang is its impact on RDF metadata, but that can be fixed with xml:lang="". (Check XML specification.)
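The "must be equal" rule is easy to test mechanically. A minimal sketch; comparing the tags case-insensitively is my assumption about what "equal" should mean for language tags.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def lang_conflicts(svg_text: str):
    """List (tag, lang, xml:lang) for elements where the two attributes differ."""
    conflicts = []
    for element in ET.fromstring(svg_text).iter():
        plain, qualified = element.get("lang"), element.get(XML_LANG)
        if (plain is not None and qualified is not None
                and plain.lower() != qualified.lower()):
            conflicts.append((element.tag, plain, qualified))
    return conflicts

sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <text lang="fr" xml:lang="fr">Bonjour</text>
  <text lang="de" xml:lang="en">Hallo</text>
</svg>"""
print(lang_conflicts(sample))
```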
Deprecating xml:space for CSS. XSLT knows how to handle xml:space but it does not know how to handle CSS. Compare the options. Also raise the issue about text-anchor and directions.
Why merge xlink:href and href? The XLink specification exists and was incorporated. Look at other parts of XLink. Yes, there are better ways to handle titles. List the XLink attributes. A more pointed complaint is that, having adopted xlink:href in SVG 1.0, switching to href in SVG 2.0 is not backward compatible. Why make a breaking change?
The translate attribute is inherited from ITS and its its:translate attribute. The attribute tells language translators to not translate this text. It has no impact on the display of SVG. SVG is also missing features such as its:term, which tells language translators that the phrase is a technical term that should have a consistent translation. SVG should have just pointed to ITS and suggested its use. Yes, ITS should not be using multiple namespaces.
The data-* attributes reinvent what should be the data namespace. Inherited from HTML. Also inherited is the crazy case-insensitive mapping that is not needed in case-sensitive XML. Show some examples. These attributes have no impact on the display of SVG. They are there to simplify some handling in the DOM, but that should be a separable extension.
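Some examples of the mapping, as a sketch. The first function follows the HTML rule for turning a data-* attribute name into a dataset property name; the second imitates an HTML parser lowercasing attribute names. The attribute names are invented.

```python
import re

def dataset_key(attribute_name: str) -> str:
    """Map a data-* attribute name to its DOM dataset property name."""
    if not attribute_name.startswith("data-"):
        raise ValueError("not a data-* attribute")
    # A hyphen followed by an ASCII lowercase letter becomes that letter in upper case.
    return re.sub(r"-([a-z])", lambda m: m.group(1).upper(), attribute_name[5:])

def html_attribute_name(name_as_written: str) -> str:
    """An HTML parser lowercases attribute names; an XML parser would not."""
    return name_as_written.lower()

for written in ("data-map-scale", "data-mapScale", "data-MAP-scale"):
    seen_by_html = html_attribute_name(written)
    print(written, "->", seen_by_html, "->", dataset_key(seen_by_html))
```

Three different spellings collapse to two dataset keys (mapScale and mapscale), which is the kind of lossage that case-sensitive XML does not need.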
The aria-* attributes reinvent what should be the aria namespace. The SVG Working Group does not control the meaning of these screen reading attributes, so it should have just pointed to the ARIA specification. That specification should have used namespace syntax.
SVG DOM
Significant advantage.
Looking for type hierarchy, but not seeing what I want.
Descriptive elements desc and title.
Metadata element metadata.
Container elements such as g.
SVG 2.0 says, "An element which can have graphics elements and other container elements as child elements. Specifically: ‘a’, ‘clipPath’, ‘defs’, ‘g’, ‘marker’, ‘mask’, ‘pattern’, ‘svg’, ‘switch’ and ‘symbol’."
Graphics elements such as line and text.
Inherits from SVGGraphicsElement, so it has some methods, but not a type?
Interface SVGGraphicsElement.
Style information in the DOM
I do not believe the SVG DOM makes all the style information available. A couple of years ago I went looking for aural stylesheet information, and it was not there. Consequently, I do not believe that style properties such as -inkscape-font-specification are broken out. Does that mean that they disappear completely when the DOM is written out?
It may be that none of such style properties make any difference, so removing them could be seen as beneficial.
Attributes that could be removed
Is there a list of Inkscape attributes that are always safe to remove? For example, Inkscape could always regenerate the node type list. If Inkscape can regenerate the information, then why keep it?
- sodipodi:nodetypes attribute can be regenerated
- sodipodi:role="line"
- -inkscape-font-specification CSS can be removed
- line-height CSS (has multiple defaults)
- font-style if default
- font-weight if default
- font-stretch if default
- font-variant if default (short cut?)
- font-variant-ligatures if default
- font-variant-caps if default
- font-variant-numeric if default
- font-variant-east-asian if default
- letter-spacing property if default
- word-spacing property if default
Some Inkscape and sodipodi attributes should be preserved. Some g elements are identified as layers. Information about drawing grids does not take up much space, so removing that information does not have much benefit.
- g element (perhaps toplevel) with inkscape:groupmode="layer" and inkscape:label="name" and id="identifier"
  - the sodipodi:namedview element will have inkscape:current-layer="identifier"
The significant benefit is removing verbose style information.
Additional information to remove would be needless graphics state. For example, if stroke="none", then we probably do not care about stroke-width, stroke-dasharray, stroke-dashoffset, line joins, and end caps. Some font information may be a little different. If text has been converted to curves, keeping that information around would help in reconstructing the text.
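A minimal sketch of removing needless stroke state. It is deliberately cautious and rests on two assumptions: only presentation attributes are examined (not style="..." or stylesheets), and only childless elements are touched, because a child that sets its own stroke could inherit the other stroke properties.

```python
import xml.etree.ElementTree as ET

STROKE_DETAILS = ("stroke-width", "stroke-dasharray", "stroke-dashoffset",
                  "stroke-linejoin", "stroke-linecap", "stroke-miterlimit")

def drop_unused_stroke_attributes(root: ET.Element) -> int:
    """Remove stroke detail attributes from leaf elements that set stroke="none"."""
    removed = 0
    for element in root.iter():
        if element.get("stroke") == "none" and len(element) == 0:
            for name in STROKE_DETAILS:
                if element.attrib.pop(name, None) is not None:
                    removed += 1
    return removed

root = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect stroke="none" stroke-width="3" stroke-dasharray="4,2" width="5" height="5"/>'
    '<rect stroke="blue" stroke-width="3" width="5" height="5"/>'
    '</svg>')
print(drop_unused_stroke_attributes(root))
```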
Validation
- W3C validator
- Commons:Commons SVG Checker
- Jarry1250's SVG Check
- RDF validator
  - https://www.w3.org/RDF/Validator/
    - validate
      - Constructed with {{urlencode:{{filepath:First Ionization Energy.svg}}}}
        - {{filepath:}} → (no argument would work in file namespace)
        - {{filepath:First Ionization Energy.svg}} → https://upload.wikimedia.org/wikipedia/commons/1/1d/First_Ionization_Energy.svg
        - {{filepath:File:First Ionization Energy.svg}} → https://upload.wikimedia.org/wikipedia/commons/1/1d/First_Ionization_Energy.svg
        - {{urlencode:...}} → https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F1%2F1d%2FFirst_Ionization_Energy.svg
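The last encoding step can be reproduced outside the wiki. A minimal sketch with the standard library; it matches the urlencode result shown above for this URL, but I have not checked that it agrees with the parser function for every character (spaces, for instance, may be handled differently).

```python
from urllib.parse import quote

def percent_encode(text: str) -> str:
    """Percent-encode everything except unreserved characters."""
    return quote(text, safe="")

file_url = "https://upload.wikimedia.org/wikipedia/commons/1/1d/First_Ionization_Energy.svg"
print(percent_encode(file_url))
```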
SVG recommendations
Small size is a significant goal.
SVG is not arbitrarily scalable. Scalable is more about eliminating jaggies.
Fixed width lines. (CSS can adjust.)
SVG is not a good file format for bitmap images such as bar codes and QR codes. Those objects are not arbitrarily scalable; they must fit on a pixel grid. One could use barcode fonts within an SVG file; fonts will align to an underlying pixel grid.
Not for photographs (but can be used to label photographs).
Limited colors (can use color gradients). Color blocking suggestions rather than enormous detail.
Filters can produce complex objects such as chalk textures and clouds.
File size
SVG files can be small, but they can also be surprisingly large.
Files that are unnecessarily large
Some files are inordinately large. Typical causes are extraneous clipping paths, gradients, and symbols that are copied rather than instanced.
- simple figure is 348 kB
See also SVG House of Estridsen.
Polar patterns
Consider some images from Category:NAVAID pictograms:[4]
- TACAN navigation symbol is a reasonable 4 kB.
- NDB navigation symbol is a huge 224 kB.
- NDB-DME navigation symbol is nearly as complex but uses only 11 kB.
- Test the display of zero-length dashes.
The first image is a central dot and 12 line segments, and it has a simple representation. The second image is problematic. It is a central dot, a central solid ring, and 10 dotted radial rings. It has a lot of dots, but why does it need so many bytes? Each dot is not a circle element, but rather a path that looks like a circle. The third image is nearly as complex (only 7 dotted rings), but it has a more efficient representation. Instead of round dots, it uses stroke-dasharray for the dots. Notice that the dash array has some issues along the north axis.
We can get dots easily. Use a circle element (stroked but no fill), set stroke-dasharray="0 xxx", and set stroke-linecap="round". The value xxx is chosen to be an integer fraction of the circumference. A close look at the NDB-DME symbol shows dashes instead of dots.
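The dash pitch is just the circumference divided by the number of dots. A minimal sketch that emits such a circle element; the function name and the sample numbers are illustrative, and the dash values are written with a comma separator, which SVG permits.

```python
import math

def dotted_ring(cx: float, cy: float, radius: float, dots: int, dot_diameter: float) -> str:
    """Return a circle element whose stroke is drawn as `dots` round dots."""
    pitch = 2 * math.pi * radius / dots     # arc length from one dot to the next
    return (f'<circle cx="{cx:g}" cy="{cy:g}" r="{radius:g}" fill="none" stroke="black" '
            f'stroke-width="{dot_diameter:g}" stroke-linecap="round" '
            f'stroke-dasharray="0,{pitch:.4f}"/>')

print(dotted_ring(100, 100, 50, 36, 3))
```

The zero-length dash with round caps paints a dot whose diameter is the stroke width. Rounding the pitch to four decimals leaves a tiny mismatch where the pattern closes; more digits shrink it.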
There is a problem with librsvg: the stroke-dasharray attribute must use commas rather than spaces.
Compression using pattern elements
Here is a simple file that takes 307 kB:
- 307 kB on 2022-09-20.
It has linear and radial gradients, and many groups are scaled. It should use a pattern or feTile.
Done: reduced file size to 9 kB. The interesting challenge here is the pattern fill. The shield outline is simple (and its path can be used for clipping). The shield can have a solid green fill that is overlaid with left and right red stripes. The white cross can be a path with a white fill (the path needs to be stroked within the shield, so overlaying two rectangles does not work). The pattern is difficult because it does not have a simple rectangular pitch. After filling the white cross, it is drawn two more times: first with a rectangular pattern of half the figures, and second with a similar pattern offset by half the pitch. The whole shield is then covered by a radial gradient. Finally, the outline of the shield is stroked at 3 pixels.
There are many similar designs at w:List of Breton flags.
This file should have a trivial size:
- 164 kB on 2022-09-22.
A file that has already been compressed using group elements.
- 369 kB → 1 kB using pattern
Filters
File:Award-star-gold-3d.svg could use several SVG filters. A simple filter would be the shadow. Lighting would be more complicated.
Difficult tasks
Illustrating metal badges such as File:Police Badge,P.R.China.svg at 274 kB.
Extraction of groups
The magazine files.
Chemistry diagrams such as File:Calvin-cycle4.svg.
File:Fault types.svg (154 kB).
SVG optimizers
I am in favor of optimized SVG files. Some editors include a lot of pointless information. For example, an unstroked path may have attributes specifying the stroke color and the stroke dash properties. A rect may have a font-family attribute.
I'm also OK with bona fide structural groups sharing style information. If a group has several common elements, then it makes sense that they be styled similarly.
There are command-line SVG optimizers, such as
- svgo (=SVGOmg)
- scour
- svgcleaner.
In general, I'm skeptical of using such optimizers. The notion of optimization is often based on deleting as many bytes as possible. Consequently, path data attributes become difficult to read, and metadata may get tossed completely. While specifying coordinates to the micro is often pointless, truncating coordinates to a fixed number of digits is a pretty big hammer. I do not know for sure, but I believe some tools may group neighboring elements to share formatting information; that is a poor way of imposing styles.
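To see why fixed-digit truncation is a big hammer, here is a toy rounding pass over path data. It is not how any of the listed tools work; it assumes numbers are separated by whitespace or commas and does not handle exponents or packed arc flags.

```python
import re

NUMBER = re.compile(r"-?\d*\.\d+|-?\d+")

def round_path_data(d: str, digits: int) -> str:
    """Round every number in path data to `digits` decimal places."""
    def shorten(match):
        text = f"{float(match.group()):.{digits}f}"
        if "." in text:
            text = text.rstrip("0").rstrip(".")   # drop trailing zeros and a bare point
        return "0" if text in ("-0", "") else text
    return NUMBER.sub(shorten, d)

print(round_path_data("M 10.123456,20.000001 L 0.0004,-3.14159", 2))
```

The noise in 10.123456 and 20.000001 is removed, which is fine, but the small coordinate 0.0004 collapses to 0. In a drawing whose whole extent is a fraction of a unit, that would destroy the shape.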
See also
Optimizers, if they do their job, do not change the appearance of the image.
Title and description
It is reasonable to include title and desc elements.
SVG 2.0 will allow language versions (using the lang or xml:lang attribute, not the systemLanguage attribute).[5] The acceptance of the language versions is not clear, and the feature has an at-risk warning in the SVG 2.0 specification.
The title and desc elements are not display elements. For that reason, they cannot be selected within a switch element. In that context, the elements would be giving a title and a description to the parent switch element.
There should be support for the Dublin Core dc:title and dc:description elements with xml:lang attributes.
Metadata
This section needs reorganization and clarity about types. It is about machine readable metadata.
Machine readable often turns to RDF. RDF is sophisticated, so the use is often a limited subset.
There are vocabularies such as Dublin Core. There are also schemas that describe how the vocabulary is used.
Then there is how the metadata is actually used.
Metadata and copyright are intertwined. Metadata should include information about the origin of an image, and several copyright licenses require that some information be provided.
The Creative Commons licenses require some specific information. For example, there should be a link to the CC license. Derivative works need to say what was changed. In many cases, these requirements may not be met.
I believe all SVG files should include metadata. It is not hard to add, and it can be useful. Including license data in the image metadata may fulfill licensing requirements or at least provide a colorable defense. Failure to follow all licensing requirements may lead to trouble.[6][7][8]
Moral rights. Even if I do not need to credit an author, there may be a moral obligation to give them credit. Sometimes that moral right can become a legal right. Some contributors allow free use of some or all of their work. LadyofHats is a notable example. That means I can use the work for any purpose, and I do not need to give anyone credit. That does not seem reasonable or even right. I could take Herman Melville's Moby Dick and publish it under my own name. It seems far better to say it is Melville's work.
Providing metadata also makes it easier for someone else to check the licensing rights. Commons encourages everybody (not just its Wikipedia projects) to use the available art. Say Alice uses some CC0 SVG images from Commons on her website. The images are CC0, so Alice does not mention any licensing details. Bob sees Alice's website and likes the images, but how can Bob determine the licensing of the images?
The license check would be simple if the SVG file included the licensing information. Just given an image, it may be hard to find out who made it. If the image has metadata, then that information may be easy to find. The information may not be accurate (somebody may be license washing), but it is a starting point and could serve as a good defense.
https://www.dublincore.org/specifications/dublin-core/dcq-rdf-xml/
Other metadata
Metadata is not about just copyright information. Metadata can include other relevant information.
For photographs, there is latitude, longitude, altitude. More detail would be the aiming roll, pitch, and yaw. Even more would be the lens distortion coefficients.
For maps, the metadata may include information about the map projection. With that information, one could take the (x, y) location of a point on the map and convert it to the corresponding latitude and longitude.
For chemicals, the metadata may include structured descriptions of the chemical.
For illustrations, there might be metadata that says this file uses web colors or is suitable for colorblind viewers. There might be a simple check for SVG files: use a finite number of colors, no gradients, no color-based filters, and the colors pass a color-difference test. Assessing a bitmap is tougher because colors mix at the boundaries and with anti-aliasing. A color histogram would be enough as long as there are large areas of color. Thin colored lines are a problem because their border can have as much area as their interior.
I'm leery of too much metadata. SVG should be more of an output format rather than a container for detailed information. Providing a small amount of information is reasonable, but including lots of information may be inappropriate. The intended use of SVG is to display an image.
The mess that is xml:lang
The issues with lang and xml:lang. Watch out for accidental captures.
Creative Commons license requirements
Creative Commons licenses are used extensively on Commons and WMF servers.
State the common legal requirements of CC licenses.
- CC-* requires a link to the CC license. That means it is easy to find the license terms. Check if a CC0 license also has this requirement.
- CC-BY must provide reasonable attribution. May distribute and alter. May impose more restrictive license.
- CC-SA (implies a derivative work) must not use a more restrictive license and must describe the changes.
- CC-ND allows use but not modification.
- CC-NC does not allow commercial use. (What is the constraint on commercial? May a nonprofit use the work in its fundraising? May the Girl Scouts use it to sell their cookies? In the US, agency settles some of these questions.)
State the failings.
The file description pages are often inadequate. Sometimes there are gross errors such as an improper license. Derivative works often omit the attribution information in the license. The description of a derivative work often fails to describe the changes made to the original work.
Most file uses on WMF servers satisfy the requirements because MW links the file to its description page:
[[File:Yellow banana.svg|A picture of a yellow banana.]]
Presumably, the file description page has a link to the CC license and meets the attribution and modification requirements.
However, the file use may alter that link (MW:Help:Images).
[[File:Yellow banana.svg|link=https://www.nowhere.com/bitbucket|A picture of a yellow banana.]]
or
[[File:Yellow banana.svg|link=|A picture of a yellow banana.]]
If the override link does not provide the needed licensing information, then the license is violated. There can be disastrous ramifications. MW should not allow such links for CC-licensed material.
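Finding such overrides in wikitext is mechanical. A rough sketch with a regular expression; it only copes with simple file uses like the examples above (captions containing their own [[links]] would defeat it) and is not how MW parses image syntax.

```python
import re

FILE_USE = re.compile(r"\[\[(?:File|Image):([^|\]]+)((?:\|[^\]]*)?)\]\]", re.IGNORECASE)

def overridden_links(wikitext: str):
    """Yield (file name, link target) for file uses that set link=."""
    for match in FILE_USE.finditer(wikitext):
        for option in match.group(2).split("|"):
            option = option.strip()
            if option.startswith("link="):
                # An empty target means the image links nowhere at all.
                yield match.group(1).strip(), option[len("link="):]

wikitext = """
[[File:Yellow banana.svg|A picture of a yellow banana.]]
[[File:Yellow banana.svg|link=https://www.nowhere.com/bitbucket|A picture of a yellow banana.]]
[[File:Yellow banana.svg|link=|A picture of a yellow banana.]]
"""
for name, target in overridden_links(wikitext):
    print(name, "->", repr(target))
```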
Dublin Core and Creative Commons
Reasonable SVG metadata should use both Dublin Core and Creative Commons vocabularies. The metadata can be expressed using RDF.
Dublin Core
A general reference:
It suggests some vocabularies. Looking for "Terman, Frederick" gives the MARC value
- https://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?AuthRecID=6104582&v1=1&HC=1&SEQ=20220810164445&PID=REbDGJ6P_nYjOw83gOGJlAHS490
- permalink: https://lccn.loc.gov/n2003015013
Dublin Core provides a vocabulary for references. There are two Dublin Core namespaces:
dc: http://purl.org/dc/elements/1.1/
Original 15 element namespace defined in 2000.
dcterms: http://purl.org/dc/terms/
Extended namespace defined in 2001. Everything in elements was mirrored in 2008.
Sometimes, the dcterms namespace uses the dc prefix. The goal is to use the dcterms vocabulary rather than the 15-element dc namespace. It is possible to translate dc to dcterms (e.g., using XSLT), but that translation may confuse existing software.
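The translation itself is mechanical because the 15 local names are the same in both namespaces. Here is a minimal sketch in Python rather than XSLT, using the standard library's ElementTree. The function name `elements_to_terms` is hypothetical. It only renames elements; it does not check whether a value (for example, a plain-text creator) suits the dcterms property.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

def elements_to_terms(root):
    """Rename every element in the 15-element namespace to dcterms (in place).

    Returns the number of elements renamed.
    """
    prefix = "{" + DC + "}"
    count = 0
    for node in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(node.tag, str) and node.tag.startswith(prefix):
            node.tag = "{" + DCTERMS + "}" + node.tag[len(prefix):]
            count += 1
    return count
```

On output, ElementTree chooses its own prefixes unless `ET.register_namespace` is called first, which is one more way a translation can confuse downstream software.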
Dublin Core elements/1.1/ is a short (15 term), general, vocabulary for works:
- dc:title (there is also an SVG title element)
- dc:date
- dc:creator
- dc:contributor (I would use for translators)
- dc:source
- dc:format (less important) for SVG, use image/svg+xml
- dc:type (less important) often rdf:resource="http://purl.org/dc/dcmitype/StillImage"
- dc:publisher (If empty, I would have this point to Wikimedia Commons)
- dc:subject DC states, "Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary." I do not see a widely adopted practice here. Most people would probably use a text string of comma-separated keyword phrases. That would match the HTML meta tag: e.g., <meta name="keywords" content="HTML, CSS, Javascript" >. However, the obvious RDF approach would use an rdf:Bag that holds each keyword phrase: <dc:subject><rdf:Bag><rdf:li>HTML</rdf:li><rdf:li>CSS</rdf:li><rdf:li>Javascript</rdf:li></rdf:Bag></dc:subject>. The dcterms: mirror is not a list of keywords.
- dc:coverage Time or location. Not widely used? E.g., Port Royal earthquake.
- dc:description
- dc:identifier
- dc:language
- dc:relation
- dc:rights The clearer practice here would be to use cc:license
The Dublin Core vocabulary uses general rather than specific terms. For example, the dc:creator
predicate covers several possibilities such as author, composer, lyricist, illustrator, and photographer. There are vocabularies that make finer distinctions,[9] but those distinctions may not be necessary for many works, and most applications probably do not support the terms.
Usage examples:
- https://www.dublincore.org/specifications/dublin-core/usageguide/2001-04-12/generic/
- https://www.dublincore.org/specifications/dublin-core/usageguide/2003-08-26/elements/
- list creators separately...
Interesting metadata in
Specifies data types.
Here is the metadata section:
<metadata>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="">
<dc:title xml:lang="ja">DCMIのロゴ</dc:title>
<dc:creator rdf:resource="https://meta.wikimedia.org/?curid=10484470"/>
<dc:subject rdf:datatype="http://ndl.go.jp/dcndl/terms/NIISubject">情報学</dc:subject>
<dc:description xml:lang="ja" rdf:parseType="Resource">
<dc:format rdf:datatype="http://purl.org/dc/terms/IMT">text/x-wiki</dc:format>
<rdf:value><![CDATA[
[https://www.dublincore.org/ '''ダブリンコアメタデータイニシアチブ''']({{en|1=Dublin Core Metadata Initiative; DCMI|inline=inline}})のロゴ画像。
{{quote|lang=ja|text=
中央の円はイニシアチブの中核を,それを取り囲む内側の円達は[[w:ja:Dublin_Core#基本記述要素一覧|DCMIメタデータ要素集合]]([https://webdesk.jsa.or.jp/books/W11M0090/index/?bunsyo_id=JIS+X+0836:2005 JIS X 0836:2005])で利用できる15の基本記述要素を,外側の円達は要素集合の解釈及び拡張を,それぞれ表す。
|cite=ダブリンコアメタデータイニシアチブ
|source=[https://www.dublincore.org/about/#web-site-policies-software-logo-banner About DCMI/Web site, policies, software, logo, banner]の[[User:cmplstofB]]による試訳
}}
]]><!-- --></rdf:value>
</dc:description>
<dc:contributor rdf:resource="https://www.dublincore.org/"/>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchemadate">2019-09-11</dc:date>
<dc:type rdf:datatype="http://purl.org/dc/terms/DCMIType">StillImage</dc:type>
<dc:format rdf:datatype="http://purl.org/dc/terms/IMT">image/svg+xml</dc:format>
<dc:source rdf:resource="https://www.dublincore.org/images/DCMI_logo_cropped.jpg"/>
<dc:language rdf:datatype="http://purl.org/dc/terms/ISO639-2">jpn</dc:language>
<dc:rights rdf:resource="http://www.wtfpl.net/about/"/>
</rdf:Description>
</rdf:RDF>
</metadata>
Many of the Dublin Core fields are text. They will use rdf:datatype attributes. Some of those use the dcterms vocabulary, but sometimes they use some other set of types. Consider dc:date: it uses http://www.w3.org/2001/XMLSchemadate (the file omits the # separator; the XML Schema datatype URI is http://www.w3.org/2001/XMLSchema#date). Some other fields use rdf:resource rather than text. Is there a rewrite rule for rdf:resource? A URI string with a URI datatype?
Dublin Core schemas:
Looking at a schema for elements
Looks like an arbitrary sequence of the 15 elements. Looks like the element content is text only (xml:lang
attributes are allowed).
<xs:complexType name="elementType">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute ref="xml:lang" use="optional"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
Significantly, this declaration does not show using rdf:resource
attribute.
I expected the dcterms
schema to be more restrictive.
However, the schema states
Encoding schemes are defined as complexTypes which are restrictions of the dc:SimpleLiteral complexType. These complexTypes restrict values to an appropriates syntax or format using data typing, regular expressions, or enumerated lists. In order to specify one of these encodings an xsi:type attribute must be used in the instance document. Also, note that one shortcoming of this approach is that any type can be applied to any of the elements or refinements. There is no convenient way to restrict types to specific elements using this approach.
Here's a dcterms to dc substitution group and what looks like a W3C Date-Time Format.
<xs:element name="date" substitutionGroup="dc:date"/>
<xs:complexType name="W3CDTF">
<xs:simpleContent>
<xs:restriction base="dc:SimpleLiteral">
<xs:simpleType>
<xs:union memberTypes="xs:gYear xs:gYearMonth xs:date xs:dateTime"/>
</xs:simpleType>
<xs:attribute ref="xml:lang" use="prohibited"/>
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
The details are both troubling and confusing. Dublin Core looks like simple text (simpleType
). What impact does that have? For multiple authors, one either uses several creator
elements or puts the list in simple text. The dcterms
set does not provide access to rdf:Seq
.
Does common usage of Dublin Core violate the schema?
The schemas, without more, do not do a sensible validation of, for example, date syntax.
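A checker could validate dates itself. The sketch below assumes the W3C Date-Time Format profile (year, year-month, full date, or full date with a time and a mandatory time zone designator). The function name `is_w3cdtf` is mine; the regular expression checks the shape, and the standard library's `date` class rejects impossible days such as 30 February. The time zone offset range is not checked.

```python
import re
from datetime import date

# YYYY, YYYY-MM, YYYY-MM-DD, or YYYY-MM-DDThh:mm[:ss[.s]] followed by Z or +hh:mm / -hh:mm
W3CDTF = re.compile(
    r"(\d{4})(?:-(\d{2})(?:-(\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

def is_w3cdtf(text):
    """Return True if text looks like a valid W3CDTF date or date-time."""
    match = W3CDTF.fullmatch(text)
    if not match:
        return False
    year, month, day, hour, minute, second, _zone = match.groups()
    try:
        # Missing month or day default to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    if hour is not None:
        if int(hour) > 23 or int(minute) > 59 or int(second or 0) > 59:
            return False
    return True
```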
A reification from dcterms
to elements
is clear.
Creative Commons[edit]
Creative Commons adds some terms for specifying the license and attribution:
- cc:license (said to be the same as xhtml:license; Commons does not allow uploading SVG that uses the xhtml namespace)
- cc:attributionURL (may be needed for CC-BY; I would have this point to the File: page on Commons)
- cc:attributionName (may be needed for CC-BY)
For Commons files, making the cc:attributionURL
point to the file page on Commons may satisfy the attribution requirements of CC-BY licenses.
Resource Description Framework (RDF) statements have a subject, a predicate, and an object.
Although vocabularies are specified, how those vocabularies should be used is not nailed down. If there are two creators, how should that be specified? Should there be an RDF dc:creator
statement for each creator? Should there be one dc:creator
statement whose object is a set of the creators? The situation for licenses is more obvious. If the user gets to choose which of several licenses, then there should be one cc:license
, and the object should be an rdf:Alt
that identifies the alternative licenses. However, most software probably expects exactly one license rather than a list of alternatives. The simple approach is to offer only one license.
The lack of consistency implies problems. If a graphics program does not understand the input RDF, then it may get corrupted on output. The appropriate goal is to have metadata that most graphics editors understand. That way, the metadata is preserved during import and export.
Consistency and accuracy are also missing in many Commons licenses.
Say Alice creates a CC-BY-SA image and uploads it to Commons. Bob then reuses Alice's image. Bob is required to use a CC-BY-SA license, and Bob's image must carry attribution to Alice. Many mistakes happen on Commons. Bob's image may not mention Alice's licensed image. Bob may claim his work is CC0 (license washing). Bob may use CC-BY-SA, but he may not point out that Alice must be acknowledged, too. Given that license information on Commons may be missing or incomplete, it is no surprise that license metadata may be haphazard, too.
A CC-BY-SA license permits modification (i.e., derivative works). The licenses require the modifier to describe the changes, but Creative Commons does not have a vocabulary term for describing the modifications.
Creative Commons and closure[edit]
Creative Commons does a good job for the original work. The license is declared, and there are constructs for attribution. If a work is used without modification, then the metadata has the information for proper attribution.
The metadata is insufficient when the original work is modified. The license requires that the changes be identified, but there are no XML elements for describing the changes.
List the licenses and the issues.
- 0
- -BY
- -SA
- -ND
- -NC
Another issue is how graphics editors can merge metadata.
Adobe Systems XMP[edit]
Adobe XMP uses the elements
namespace:
- "The [Dublin Core] namespace URI shall be
http://purl.org/dc/elements/1.1/
."
Sigh.
Adobe Systems includes metadata, and it has settled on a specific syntax with its eXtensible Metadata Platform (XMP). Adobe solves the multiple creator problem by always using a set of creators (even if there is only one creator). Adobe also restricts the use of complex RDF syntax.
In XMP, the dc:creator
should be an ordered list of ProperName
.[10] A ProperName
is a simple text value.
<rdf:Description rdf:about="">
  <dc:creator>
    <rdf:Seq>
      <rdf:li>John Smith</rdf:li>
      <rdf:li>Richard Roe</rdf:li>
    </rdf:Seq>
  </dc:creator>
</rdf:Description>
Should discuss the equivalent form.
<cc:Work rdf:about="">
  <dc:creator>Alice</dc:creator>
</cc:Work>
<rdf:Description rdf:about="">
  <rdf:type rdf:resource="http://creativecommons.org/ns#Work" />
  <dc:creator>Alice</dc:creator>
</rdf:Description>
Inkscape metadata[edit]
Which namespace does Inkscape use? elements
or dcterms
? If it uses elements
, then it should upgrade. Or at least accept one or the other. I'm looking at a file I believe to be Inkscape, and it has xmlns:dc="http://purl.org/dc/elements/1.1/"
.
Inkscape has a metadata form to fill in, but Inkscape uses an agent description. (Pull a copy of Inkscape metadata).
<dc:creator>
  <cc:Agent>
    <dc:title>Andy Fitzsimon</dc:title>
  </cc:Agent>
</dc:creator>
Please note that cc:Agent
is not part of the http://creativecommons.org/ns
namespace.
Dublin Core has a dc:Agent
, so it is possible that Inkscape meant dc:Agent
rather than cc:Agent
.
I'm not a happy camper...
The page does not include the attributionName
or attributionURL
elements. It has a set of licenses. It also points to some SIL licenses.
There is a significant but unresolved issue here.[11] An original goal is to identify the license and the creator. Not a lot of information is needed to acknowledge those rights; a simple text reference to a name might be good enough. However, more details can be given about the rights holder, so should the representation give more details? At what point would there be too much information? More information could be added, but very few systems will be able to process that information. The simple approach is to keep the information simple enough to satisfy license requirements and avoid adding extraneous details.
A URL is a better method of identifying a person than some text. Many people have the name John Smith, but the URL https://www.imdb.com/name/nm0808774/ identifies a particular John Smith. Unfortunately, many applications probably expect a text string and cannot handle a URL. If an application expects this input
<dc:creator>John Smith</dc:creator>
then how will it handle this input (i.e., a URL that identifies a particular John Smith)?
<dc:creator rdf:resource="https://www.imdb.com/name/nm0808774/" />
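A tolerant reader could accept all of the creator forms seen so far: plain text, an rdf:resource URL, an RDF container of names (the XMP style), and a nested agent with a dc:title (the Inkscape style). The sketch below is only an illustration of that idea; the function name `creators` is hypothetical, and it says nothing about what Inkscape or Illustrator actually do with these inputs.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

def creators(root):
    """Collect creator names or URLs from several dc:creator encodings."""
    found = []
    for creator in root.iter("{%s}creator" % DC):
        resource = creator.get("{%s}resource" % RDF)
        if resource:
            found.append(resource)  # URL form
            continue
        # rdf:Seq / rdf:Bag / rdf:Alt members (XMP style)
        items = list(creator.iter("{%s}li" % RDF))
        # nested agent carrying a dc:title (Inkscape style)
        titles = list(creator.iter("{%s}title" % DC))
        nested = items + titles
        if nested:
            found.extend((node.text or "").strip() for node in nested)
        elif creator.text and creator.text.strip():
            found.append(creator.text.strip())  # plain text form
    return found
```

The result mixes names and URLs in one list; a stricter tool would keep them apart so that a URL is never displayed as if it were a name.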
Try this out in Inkscape.... Try this out in Adobe Illustrator....
Well-known licenses[edit]
Creative Commons wants a cc:License
element that summarizes the license, but I do not like that practice for well-known licenses. What happens if the summary is inaccurate? Say the license URL is CC-BY-SA 4.0 but the license summary prohibits commercial use? Does the summary take precedence over the URL?
In theory, it should be easy to obtain the RDF description of a well-known license. For example, the license HTML at
has a link in the HTML
<link rel="alternate" type="application/rdf+xml" href="rdf" />
which refers to the license RDF at
Consequently, an RDF description of a well-known license is available.
It is possible to check whether the license summary is consistent with the published URL.
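A rough version of that check does not even need to fetch the published RDF: the license codes in a Creative Commons URL (by, sa, nc, nd) already imply which cc:requires, cc:prohibits, and cc:permits statements a summary should carry. The sketch below assumes the summary is a cc:License element whose rdf:about holds the license URL. The function names are hypothetical, and a summary that merely omits a statement is reported the same way as one that contradicts the URL.

```python
import re
import xml.etree.ElementTree as ET

CC = "http://creativecommons.org/ns#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def license_summaries(root):
    """Return every cc:License summary element at or below root."""
    return list(root.iter("{%s}License" % CC))

def summary_conflicts(license_element):
    """List the summary statements that disagree with the license URL."""
    about = license_element.get("{%s}about" % RDF, "")
    match = re.search(r"creativecommons\.org/licenses/([a-z-]+)/", about)
    if match is None:
        return ["not a recognized Creative Commons license URL"]
    codes = set(match.group(1).split("-"))

    def has(prop, term):
        # True if the summary contains <cc:prop rdf:resource="...ns#term"/>
        return any(
            child.get("{%s}resource" % RDF) == CC + term
            for child in license_element.findall("{%s}%s" % (CC, prop))
        )

    checks = [
        ("by", has("requires", "Attribution"), "cc:requires Attribution"),
        ("sa", has("requires", "ShareAlike"), "cc:requires ShareAlike"),
        ("nc", has("prohibits", "CommercialUse"), "cc:prohibits CommercialUse"),
    ]
    problems = [label for code, stated, label in checks if stated != (code in codes)]
    # ND licenses must not permit derivative works; all others should.
    if ("nd" in codes) == has("permits", "DerivativeWorks"):
        problems.append("cc:permits DerivativeWorks")
    return problems
```

For the case raised above (a CC-BY-SA 4.0 URL whose summary prohibits commercial use), the function reports the cc:prohibits statement as the conflict.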
Creators also misuse CC-BY licenses on Commons by stating additional license terms. For example, the creator may state that the attribution must appear next to where the image is used. Creative Commons CC-BY licenses require attribution, but the license lets the licensee use any reasonable method of attribution. Here's the text about attribution from CC-BY-SA 4.0:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
If a creator states the license is CC-BY-SA 4.0, then the creator should not be able to state additional requirements. Additional requirements contradict the terms of CC-BY-SA 4.0.
Complicated licenses[edit]
If there is only one author, then license information is simple. When a work builds on others, then the license is complicated.
Consider File:Angriffe Antibiotika.svg. It is an improved copy of FEERERO's CC-SA 3.0 File:Angriffe Antibiotika.png. In addition, it borrows from
- File:Prokaryote cell.svg (CC-BY-SA 4.0 by Ali Zifan).
- File:Difference DNA RNA-DE.svg (CC-BY-SA 3.0 by Sponk).
- File:Main protein structure levels en.svg (PD-USER by LadyofHats).
- File:Green-Up-Arrow.svg (PD-ineligible by Palffy).
Consequently, the metadata should contain a lot of information.
A trivial solution has the metadata element just point to the Commons File: page.
Metadata checker[edit]
A few years ago, I did some tests on RDF XML validation.
A sophisticated metadata checker could...
- look for appropriate namespaces
- check value consistency (ISO dates, finite set ranges)
- calculate a list of ranges
- validate XML schemas (valid RDF, valid CC, ...)
- learn frequency of metadata (lots of image/svg+xml but little cc:creator)
- possibility of rewriting metadata
- possibility of adding metadata
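The first and fifth items are easy to prototype. Here is a minimal sketch that counts the Dublin Core and Creative Commons terms inside an SVG file's metadata element. The function name `metadata_terms` and the prefix table are my own choices; summing the counters over many files would give the frequency data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespaces the checker recognizes, with the prefix used for reporting.
KNOWN = {
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://purl.org/dc/terms/": "dcterms",
    "http://creativecommons.org/ns#": "cc",
}
SVG_METADATA = "{http://www.w3.org/2000/svg}metadata"

def metadata_terms(svg_text):
    """Count known-vocabulary elements found inside svg:metadata."""
    counts = Counter()
    root = ET.fromstring(svg_text)
    for metadata in root.iter(SVG_METADATA):
        for node in metadata.iter():
            if not isinstance(node.tag, str) or not node.tag.startswith("{"):
                continue
            namespace, local = node.tag[1:].split("}", 1)
            if namespace in KNOWN:
                counts[KNOWN[namespace] + ":" + local] += 1
    return counts
```

Reporting by namespace URI rather than by the file's prefix matters here, because a file may bind the dc prefix to either Dublin Core namespace.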
My notes on RDF. GRDDL.
Related topics[edit]
A more general topic, SVG validation, is Help:SVG. There were more involved discussions about SVG validation. Validation is often too strict (complaining about extensions such as Inkscape or new SVG features).
Commons:Overwriting existing files.
Removing metadata[edit]
Removing metadata from an SVG file (or any other file) may be inappropriate. Removing metadata may trigger legal issues but leaving it in has minimal cost.
Consider a signed painting. Should someone come along and paint over the signature?
Removing metadata is similar to removing a watermark. See Legal issues with the removal of watermarks and Removal of watermarks from Commons images. WMF legal staff opines that removing a watermark could violate the DMCA and even violate the terms of some Creative Commons licenses.
Compare to removing a watermark that was not part of the original image. Sometimes a person who hosts an image may add a watermark.
Sometimes metadata is inadvertently removed. When librsvg
produces a PNG, I doubt that it copies metadata from the SVG to the PNG.
Perhaps a scan of files should look for comments that imply an optimizer was used. Optimizers often strip metadata. See Massive edits. For example, scanning history of File:BSicon hKRZWa red.svg would reveal the upload comment "Slimmed down with svgomg".
Other uses of metadata[edit]
Images on Commons should have free licenses, but many uploads violate the creator's license. Generally, Commons relies on its users to upload only free material.
Some of that checking can be done automatically. Consider an image that is published on some website, and the website states a non-free license for that image. Alice likes the image, so she uploads it to Commons claiming it is her own work. Commons does not know.
Now consider that the image has metadata that says the creator is Bob and the license is CC-BY-NC. Commons could read the metadata, realize it does not know that Alice is Bob, and recognize the CC-BY-NC license is not compatible with Commons. Thus Commons could refuse the upload automatically.
At upload, Commons could also notice that a work is CC-BY-SA with required attribution. Commons could fill in the attribution details.
Graphics applications might also warn users about editing files that carry CC-BY-ND licenses.
Transparent backgrounds[edit]
Many SVG files have transparent backgrounds. Such files can be overlaid on colored backgrounds without adding squares of white.
The transparency can be overdone. For example, the file
has a transparent background, but that background includes the interior of the mixing vessel. That interior is not part of the background.
Removing watermarks[edit]
An unexpected issue. Instead of watermarks or timestamps, the information should be put in the file's metadata.
Detection of this information? Small fonts, strange fill colors, and outside of viewport.
Detect symbol candidates[edit]
Artists often copy-paste an image component rather than creating and using a symbol.
Also works for text-to-curve images.
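A first pass at detection could simply count identical path data. The sketch below assumes the copies have byte-identical d attributes apart from whitespace, which happens when the copies are positioned with transform attributes or use relative coordinates. Copies written with different absolute coordinates would need path normalization first. The function name `repeated_paths` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVG = "http://www.w3.org/2000/svg"

def repeated_paths(svg_text, minimum=3):
    """Return {path data: count} for path data used at least `minimum` times."""
    root = ET.fromstring(svg_text)
    counts = Counter(
        " ".join(path.get("d", "").split())  # collapse whitespace differences only
        for path in root.iter("{%s}path" % SVG)
    )
    return {d: n for d, n in counts.items() if d and n >= minimum}
```

In a text-to-curve image, the repeated entries would be the glyph outlines of frequently used letters.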
Use styles[edit]
There is a difference between content and style. The content is the information, and the style is how it is displayed. Content that is in a particular class may be displayed the same way by using CSS to select and style SVG elements in that class.
In general, it is better to use CSS to achieve a consistent display rather than individually formatting graphics elements.
In particular, elements should not be grouped merely to impose a consistent style.
Consider a map. We may want the rivers and the names of rivers displayed in blue. A river
class can set the color for both the rivers and the font fill. Cities with a population under 100,000 may use a small dot and a small font, and cities over that size may use a larger dot and a larger font. CSS can set the font size and even the radius of a circle. Capital cities may use a star instead of a dot.
Using CSS and the class
attribute can make the display both consistent and easy to change. Fill colors and font families are set in just the CSS rather than on each SVG element. Changing the CSS will apply the change to all elements in the class.
Graphics editors should have a way to manage styles, but they may not round-trip them.
CSS selectors[edit]
Need better understanding of merging styles (as Adobe Illustrator uses).
See https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors
Also precedence.
line { /* works */ }
.cls { /* works */ }
line.cls { /* works */ }
.cls1.cls2 { /* WMF fail. Would be good for class="city uk" or class="city ru" */ }
/* in general, WMF does not support attribute selectors */
[class] { /* any element with a class attribute */ }
[class~="cls"] { /* should be equivalent to class selector. */ }
/* these may not be legal */
.cls[fill="#000"] { /* want class filtered by attribute. fails? */ }
.cls[fill="#fff" i] { /* case insensitive match. fails? */ }
Support for CSS units[edit]
The simple approach is to always use pixels, but CSS has many other units that may allow more flexible layouts.
- https://www.w3schools.com/cssref/css_units.php
- https://developer.mozilla.org/en-US/docs/Learn/CSS/Building_blocks/Values_and_units
- https://www.w3.org/TR/css-values-4/#dimensions
absolute and relative lengths
Possible trick: font-size
takes smaller
and larger
. Might use 0.7em
and 1.3em
.
The relative lengths em and ex are usually relative to the element's own font, but they are relative to the parent's font for the font-size property. (px is relative only in the sense that it depends on the display device.)
Unit | Version | Description |
---|---|---|
px | 1.0 | pixel (1/96 inch) |
cm | 2.0 | centimeter |
mm | 2.0 | millimeter |
in | 2.0 | inch |
pt | 2.0 | Adobe point (1/72 inch) |
pc | 2.0 | pica (1/6 inch) |
Q | 3.0 | quarter millimeters (Why?) |
lh | 4.0 | line height of the element |
rlh | 4.0 | line height of the root element |
em | 1.0 | font size |
rem | 3.0 | root font size |
ex | 1.0 | x-height |
rex | 4.0 | root x-height |
ch | 3.0 | character advance of narrow glyph ("0") |
rch | 4.0 | root element character advance of narrow glyph ("0") |
ic | 4.0 | character advance of full-width glyph ("水") |
ric | 4.0 | root element character advance of full-width glyph ("水") |
cap | 4.0 | cap height (nominal height of capital letters) |
rcap | 4.0 | root element cap height |
vw | 3.0 | one percent of viewport width |
vh | 3.0 | one percent of viewport height |
vmin | 3.0 | one percent of viewport's smallest dim |
vmax | 3.0 | one percent of viewport's max dim |
vb | 4.0 | one percent of viewport size along the block axis |
vi | 4.0 | one percent of viewport size along the inline axis |
svw, svh | 4.0 | one percent of the small viewport |
lvw, lvh | 4.0 | one percent of the large viewport |
dvw, dvh | 4.0 | one percent of the dynamic viewport |
% | 1.0 | percent of parent element |
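The absolute units in the table are all fixed multiples of the pixel, so converting them is plain arithmetic: 1 in = 96 px, 1 in = 2.54 cm, 1 pt = 1/72 in, 1 pc = 1/6 in, and 1 Q = 1/4 mm. A small sketch (the names `PX_PER_UNIT` and `to_px` are mine) follows. Font-relative and viewport-relative units cannot be converted without knowing the font and the viewport, so they are left out.

```python
# Pixels per absolute CSS unit, derived from 1in = 96px = 2.54cm.
PX_PER_UNIT = {
    "px": 1.0,
    "in": 96.0,
    "cm": 96.0 / 2.54,
    "mm": 96.0 / 25.4,
    "Q": 96.0 / 25.4 / 4,   # quarter millimeter
    "pt": 96.0 / 72,        # point
    "pc": 96.0 / 6,         # pica = 12 points
}

def to_px(value, unit):
    """Convert an absolute CSS length to pixels."""
    return value * PX_PER_UNIT[unit]
```

For example, a 12pt font size is 16 pixels, which is also one pica.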
Media queries[edit]
SVG allows images to adapt. Printing a color image in black and white may not produce a satisfactory result. A red fill may look similar to a dark gray fill. SVG can use CSS media queries and adjust the presentation.
Consider the image on the right. If the media supports color, then the picture can have a blue background with white text and lines. If the media is black and white, then the background can be white, and the text and lines can be black. Mechanical drawings can be more complex. For color media, solid color fills may distinguish different components; for black and white media, crosshatches may distinguish the components.
An SVG style
element has a media
attribute. CSS syntax allows @media
queries.
SVG 1.1 / CSS 2 media query support is very limited. SVG 2.0 is much richer. There is some support, but it may not work well. In tests tried around 2019, one browser could distinguish color and monochrome requests, but it would not follow changes in the printer properties.
Consider these images:
-
remove colored backgrounds
-
remove blue background and change to black traces.
-
Color picture that should be drawn differently in black and white.
The color is nice, but when the image is printed on a black and white printer, the colors will be levels of gray.
Media queries can change the presentation on a black and white printer.
Width, height, and viewBox[edit]
These svg
element attributes can cause problems.
Specifying width
and height
gives a concrete size for the image and also implies clipping. If such an image is put in a small container, then that container can be scrolled. For example, File:2022 Russian invasion of Ukraine.svg is a detailed file where zooming and panning make sense. To see the fine detail in the image, it must be scrolled.
Specifying viewBox
sets a view port on the SVG and does not imply clipping. Parts of the image that are outside of the view port are still visible. Putting such an image in a small container just sizes the image to fit; it does not imply scrolling.
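The sizing to fit can be worked out by hand. The sketch below assumes the default preserveAspectRatio value (xMidYMid meet): the image is scaled uniformly by the smaller of the two width and height ratios and then centered. The function name `fit_viewbox` is hypothetical.

```python
def fit_viewbox(viewbox, container_width, container_height):
    """Return (scale, translate_x, translate_y) mapping viewBox units to container pixels.

    Assumes preserveAspectRatio="xMidYMid meet", the SVG default.
    """
    min_x, min_y, width, height = viewbox
    # "meet": use the smaller ratio so the whole viewBox stays visible.
    scale = min(container_width / width, container_height / height)
    # "xMidYMid": center the scaled viewBox, then account for its origin.
    translate_x = (container_width - width * scale) / 2 - min_x * scale
    translate_y = (container_height - height * scale) / 2 - min_y * scale
    return scale, translate_x, translate_y
```

A 200 × 100 viewBox placed in a 100 × 100 container is drawn at half scale with 25 pixels of empty space above and below.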
Errors and validation[edit]
Validation is often overly strict: issues cited as errors are often reasonable extensions. For example, HTML 5 allows data-*
attributes, and those attributes have found their way into the SVG 2.0 drafts. An SVG 1.1 validator will list them as errors, but they are harmless.
I'm on the fence with some other errors. Some SVG tools emit invalid XML identifiers (e.g., an identifier beginning with a digit is invalid; for example, File:Diagram_of_IGNORE.svg). Most XML implementations will handle such identifiers, so they are not a big deal. However, it may also be reasonable to fix these legitimate errors. What if some future XML spec required implementation to throw an exception when encountering such identifiers? Similarly, duplicate identifiers (such as those emitted by SVG Translate) are errors that may be reasonable to fix. (Do the duplicate identifiers confuse SVG Translate?)
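Both kinds of identifier problems are easy to find mechanically. In the sketch below, the validity test is deliberately conservative: it flags only identifiers that are empty, contain whitespace, or begin with a digit, hyphen, or period, which are certain violations of the XML name rules. It does not implement the full Name production. The function name `id_problems` is my own.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def id_problems(svg_text):
    """Return (invalid ids, duplicate ids) found in an SVG document."""
    root = ET.fromstring(svg_text)
    ids = [node.get("id") for node in root.iter() if node.get("id") is not None]

    def clearly_invalid(identifier):
        # A subset of the XML Name rules: no false positives, some misses.
        return (
            identifier == ""
            or identifier[0] in "0123456789-."
            or any(ch.isspace() for ch in identifier)
        )

    invalid = sorted({i for i in ids if clearly_invalid(i)})
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    return invalid, duplicates
```

A non-validating parser such as the one used here accepts both problems silently, which is consistent with the observation that most XML implementations cope with them.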
SVG text[edit]
In general, the text within an SVG file should be in SVG text
elements. Avoid converting text to paths/curves. Such path text expands the size of the file, and it is often unnecessary. Artistic text (such as used in logos) may need to be converted to curves.
In general, if an SVG file contains text, then users should be able to copy and paste that text from the SVG file. A simple test is to load the SVG file into a browser and then try to select all the text (control-A in Windows). If no text is selected, then the diagram's text has been converted to curves.
The text that is selected should be readable, grouped appropriately, and spaced correctly. Independent phrases should be in their own text
element; they should not be combined with other phrases. Independent phrases that need two lines should not use two text
elements but rather code the lines in tspan
elements. That keeps the phrase together.
In addition, the selected text should not be missing spaces or have extra spaces. If the text is displayed on two lines, then it should have a space between those two lines. For example, the better result is "Holy Roman Empire" rather than "Holy RomanEmpire". Unfortunately, SVG does not handle spaces well. Spaces at the beginning or end of a line may not align as expected (the SVG hanging space problem of text-anchor
).
<text><tspan>Holy Roman</tspan><tspan x="0" dy="20">Empire</tspan></text>
<text><tspan>Holy Roman </tspan><tspan x="0" dy="20">Empire</tspan></text>
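The copy-and-paste test can be approximated without a browser by concatenating the character data of each text element. This is a sketch under the assumption that a browser's selection follows document order; actual browsers may differ. The function name `copied_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"

def copied_text(svg_text):
    """Approximate the string a copy and paste yields for each text element."""
    root = ET.fromstring(svg_text)
    return ["".join(text.itertext()) for text in root.iter("{%s}text" % SVG)]
```

Applied to the two text elements above, the first yields "Holy RomanEmpire" and the second yields "Holy Roman Empire". An empty result for a diagram that visibly contains words suggests the text has been converted to curves.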
Sometimes, text is spaced for emphasis. For example, a map of the United States may have text that looks like United States. That text should copy and paste without the additional spaces. Instead of inserting actual spaces to achieve the effect, the graphic artist should set the letter-spacing
of the string. Furthermore, do not space text by individually placing the characters. That makes the text difficult to translate, and it may render poorly when fonts are substituted. Use the mechanisms that SVG provides.
Similarly, a string that displays as all capital letters should use text-transform: uppercase
. For example, United States uses a text transform and will copy-paste as "United States". There are other text transforms, but they are less useful.
Sadly, text-transform
is not very smart. The result of text-transform: capitalize
applied to "united states of america" is united states of america. It also does not follow some language conventions such as capitalizing a Dutch "IJ": ijland; works with ij ligature character (U+133): ijland.
The perils of hidden text. It can confuse editors. Any hidden elements can cause confusion.
Fonts[edit]
Point to section about fonts and what scaling them means.
A good example of the benefits of nonlinear scaling of fonts is a bar code font. The font symbols are scaled to integer pixel widths. The symbols use Manhattan geometry, so the edges are sharp; no anti-aliasing is needed. The strict symbol geometries are maintained.
Recommend the CSS generic fonts serif
and sans-serif
. If possible, do not use exotic fonts.
WMF also has problems because of librsvg. There are times that we want a text string to be an exact length. SVG supports that with the textLength attribute, but librsvg does not support it.
In practice, there is not much support for particular font properties.
- font-family: depends on system. Use CSS fallback.
- font-size: specific size has excellent support; relative sizes may not be supported
- font-style: normal, italic. Good support. Oblique....
- font-weight: normal, bold. Other options have little support.
- font-stretch: normal. Other options have little support.
- font-variant: all purpose OTF support depends on font.
- font-size-adjust: mumble.
- baseline....
It may be appropriate to fix some SVG files. For example, File:Planets2013.svg uses font-family="Arial-BoldMT"
and does not have font-weight="bold"
. Most font matchers will fail. There was a Phabricator request to have MediaWiki do that automatically, but it would be better as a robot task.
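A robot could carry a table that maps PostScript font names to a CSS family plus weight and style. The sketch below is only an illustration: the table is a hypothetical, three-entry sample, and the function name `css_font` is my own. A real table would have to be compiled from font files.

```python
# Hypothetical, partial table: PostScript name -> (family, weight, style).
POSTSCRIPT_FONTS = {
    "ArialMT": ("Arial", "normal", "normal"),
    "Arial-BoldMT": ("Arial", "bold", "normal"),
    "Arial-ItalicMT": ("Arial", "normal", "italic"),
}

def css_font(font_family):
    """Split a known PostScript font name into CSS font properties.

    Unknown names are returned unchanged as the font-family.
    """
    if font_family in POSTSCRIPT_FONTS:
        family, weight, style = POSTSCRIPT_FONTS[font_family]
        return {"font-family": family, "font-weight": weight, "font-style": style}
    return {"font-family": font_family}
```

For File:Planets2013.svg, the lookup would replace font-family="Arial-BoldMT" with the Arial family and add the missing bold weight.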
The WMF font list:
- https://noc.wikimedia.org/conf/fc-list
- Phab:T329576: SVG Checker doesn't have all fonts installed
- Phab:T280722: Commons SVG Checker has different fonts than Wikimedia rendering
Fonts and scripts[edit]
Given a script (4-letter IETF script), which fonts support it? Which languages does a font support?
Character encoding[edit]
Unicode is common now, so most SVG files will use Unicode or a Unicode-compatible subset. In practice, that means UTF-8, but UTF-16 is also a possibility. UTF-16 wants a byte-order mark (BOM), and some UTF-8 files will also include a BOM. Software should handle those cases.
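Handling the BOM is a small job. Here is a minimal sketch that recognizes the UTF-8 and UTF-16 marks using constants from Python's codecs module. The function name `split_bom` is hypothetical, and UTF-32 is deliberately ignored (its little-endian BOM begins with the same two bytes as the UTF-16 one, so a complete detector must test for it first).

```python
import codecs

# Byte-order marks to recognize, paired with the encoding each one implies.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def split_bom(data):
    """Return (encoding, payload) if data starts with a known BOM, else (None, data)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data
```

When no BOM is present, the encoding declared in the XML declaration (or the UTF-8 default) applies.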
Even though a file may claim to be Unicode, that does not mean the file uses Unicode. There are many special fonts that put exotic glyphs in non-Unicode character positions. The Adobe Symbol font, for example, uses its own character encoding.[12] Zapf Dingbats[13] and Adobe Sonata[14] also use their own encodings.
- Symbol
- ABCDEabcde → ABCDEabcde
- Zapf Dingbats, Wingdings
- ABCDEabcde → ABCDEabcde
- Sonata
- ABCDEabcde → ABCDEabcde
Even common fonts may have non-Unicode character assignments.[15] For example, many Adobe fonts use the Adobe Standard Encoding[16] which puts a dagger at 0xD1 (Ñ instead of U+2020: †) and the "fi" ligature at 0xAE (® instead of U+FB01: fi).
Files that claim to be Unicode but use non-Unicode fonts should be recoded with Unicode fonts. Font substitution may not work when fonts use non-Unicode character encodings.
See Phab:T272133 Make all Postscript core 35 fonts available to SVG by installing some packages.
Adobe Font list. https://adobe-type-tools.github.io/font-tech-notes/pdfs/5090.FontNameList.pdf
Courier, Helvetica, Times, Symbol, Avant Garde Gothic, Bookman, New Century Schoolbook, Palatino, Zapf Chancery, Zapf Dingbats.
Files with non-Unicode characters:
Files that use less common character encodings (such as Shift-JIS) do not need to be recoded if they use Unicode fonts. XML files that use such encodings can convert the text to Unicode.
Detecting non-Unicode files would be involved. The first step is converting to Unicode. The encoding named in the XML declaration (the charset) should be authoritative, and it offers a clear route to convert an XML file to Unicode. The XML DOM should automatically convert a known charset to a DOMString, which is essentially Unicode. (XML DOM now hides the character encoding.)
The second step is searching for fonts within an SVG file. If a font is Unicode, then the text content is OK. If the font uses a non-Unicode charset, then the text content should be searched for non-Unicode characters. If no non-Unicode assignments are used, then the text content is OK. If non-Unicode assignments are used, then select a Unicode font replacement, and edit the text content to change the non-Unicode characters to equivalent Unicode characters.
Those steps require a significant database.
- font family
- character encoding (points to a (possibly standard) table)
- replacement font
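For a single font, the character encoding table is just a mapping. The sketch below covers only the ten letters of the ABCDEabcde sample for the Adobe Symbol font, where the Latin letters select Greek glyphs (C and c select chi). The names `SYMBOL_TO_UNICODE` and `recode_symbol` are my own, and a usable table would need every code point of the encoding.

```python
# Partial Adobe Symbol encoding table: only the letters A-E and a-e.
SYMBOL_TO_UNICODE = {
    "A": "\u0391", "B": "\u0392", "C": "\u03A7", "D": "\u0394", "E": "\u0395",
    "a": "\u03B1", "b": "\u03B2", "c": "\u03C7", "d": "\u03B4", "e": "\u03B5",
}

def recode_symbol(text):
    """Recode Symbol-font text to Unicode.

    Returns (recoded text, list of characters missing from the table).
    Characters that are not in the table are copied through unchanged.
    """
    recoded = []
    unmapped = []
    for ch in text:
        if ch in SYMBOL_TO_UNICODE:
            recoded.append(SYMBOL_TO_UNICODE[ch])
        else:
            recoded.append(ch)
            unmapped.append(ch)
    return "".join(recoded), unmapped
```

After recoding, the element's font-family would also have to change to the replacement Unicode font; leaving the Symbol font in place would display the wrong glyphs again.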
Character sets
- see w:Mojibake
Character metrics[edit]
Differences in font metrics are expected. The character widths will vary. The cap-height and x-height will vary.
But there are technical placement issues that spell trouble.
- Find the Phabricator item about SVG music and fonts.
The first music font was Adobe Sonata. It uses its own 8-bit character encodings. The common staff height is 1 em. The notehead centers were set on the baseline. So we can get the simple
- Sonata
- =qq=
Unicode defined a music block, but it did not specify the glyph sizes or positions. Google's Noto Music font has different note positioning but similar stem heights:
- Noto Music
- 𝄚𝅘𝅥𝅘𝅥𝄚
The SMuFL Bravura font also uses the Unicode block character assignments, but it uses the Sonata note positions.
- Bravura
- 𝄚𝅘𝅥𝅘𝅥𝄚
SMuFL fonts should have a JSON file that specifies the metrics. Reading that file should make all SMuFL fonts compatible, but many tools do not read the JSON files.
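A rough sketch of what reading the metrics could look like. The JSON excerpt follows the shape of a SMuFL metadata file as I understand it (glyphBBoxes with bBoxNE and bBoxSW corners measured in staff spaces), but the font name and the numbers are sample values for this sketch, not taken from a real font.

```python
import json

# Illustrative excerpt in the shape of a SMuFL metadata file.
# The numbers are sample values for this sketch, not taken from a font.
metadata_text = """
{
  "fontName": "ExampleMusicFont",
  "engravingDefaults": {"staffLineThickness": 0.13},
  "glyphBBoxes": {
    "noteheadBlack": {"bBoxNE": [1.18, 0.5], "bBoxSW": [0.0, -0.5]}
  }
}
"""

STAFF_SPACES_PER_EM = 4  # SMuFL: one staff space is a quarter of an em

def glyph_height_em(metadata, glyph):
    """Height of a glyph's bounding box, converted from staff spaces to em."""
    box = metadata["glyphBBoxes"][glyph]
    height_spaces = box["bBoxNE"][1] - box["bBoxSW"][1]
    return height_spaces / STAFF_SPACES_PER_EM

metadata = json.loads(metadata_text)
print(glyph_height_em(metadata, "noteheadBlack"))
```

With metrics like these, a tool could check whether a notehead fits between staff lines before placing it.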
Other Unicode music fonts are less structured. A notehead may not fit between staff lines. The staff height may not be 1 em.
- default music font
- 𝄚𝅘𝅥𝅘𝅥𝄚
Path text[edit]
Talk about what path text looks like in some contexts.
There are files that should have path text removed.
- File:Chromosome.svg This file primarily has path text because flowRoot does not display. Done
- File:Animal cell structure en.svg (227 kB) clean this file up and then use SVG Translate.... Many separate translations exist.
- File:Tulejki zaciskowe.svg collet closing schemes
- File:Volcanic Arc System SVG en.svg
SVG files that have converted text to path are often marked with {{Path text SVG}}. Another (earlier?) convention was to explicitly categorize files that should use text to Category:Convert to TXT. The category has JPEG, PNG, and SVG files in it. Ah! It's from {{ShouldBeText}}. That template wants the figures to be converted to wikitext rather than using an illustration. It may be better to mark some files with {{Path text SVG}} or {{Convert to SVG}}.
Here is a file in the category:
- File:Ipv6 header.svg 30 kB Done
The file has not only converted the text to paths; each letter is a symbol, and the text is typeset by placing those symbols.
This file is also interesting in that it describes a technical standard, so the text strings in the file are candidates for translate="no".
Another file:
Files with even more confusion (text as symbols and curves drawn as line segments):
- File:6rd.svg 312 kB
- File:Tunnel-ipv6.svg 270 kB
ARIA label[edit]
Some Inkscape files have aria-label attributes.
- File:Regions of Finland labelled FI.svg (there are versions a few days later)
Here is some curve text from the SVG:
<g
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:12px;line-height:0%;font-family:Arial;-inkscape-font-specification:Arial;text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;display:inline;fill:#646464;fill-opacity:1;stroke:none;stroke-width:0.999998"
id="text3787"
aria-label="Uusimaa">
<path
id="path6751"
style="font-size:19.715px;line-height:1.25;stroke-width:0.999998"
d="m 381.84099,1384.7132 ... z" />
<path
id="path6753"
style="font-size:19.715px;line-height:1.25;stroke-width:0.999998"
d="m 393.30611,1398.8256 ... z" />
<path
id="path6755"
style="font-size:19.715px;line-height:1.25;stroke-width:0.999998"
d="m 396.88715,1395.774 ... z" />
<path
id="path6757"
style="font-size:19.715px;line-height:1.25;stroke-width:0.999998"
d="m 407.44738,1386.7058 ... z" />
<path
id="path6759"
style="font-size:19.715px;line-height:1.25;stroke-width:0.999998"
d="m 411.82742,1398.8256 ... z" />
<path
id="path6761"
style="font-size:19.715px;line-height:1.25;stroke-width:0.999998"
d="m 434.94057,1397.5645 ... z" />
<path
id="path6763"
style="font-size:19.715px;line-height:1.25;stroke-width:0.999998"
d="m 445.91473,1397.5645 ... z" />
</g>
Observations:
- The DOM can reconstruct some details.
- The aria-label gives most of the text (it may be multiline).
- Various font properties can be used to determine font, size, style, fill, and anchor.
- The SVGDOM can determine the bounding box of the g element.
- The SVGDOM can determine the bounding box of the synthesized text element.
- The SVG has some contradictions such as font size being both 12 and 19.715 pixels and line height of 0 or 1.25.
- The SVG has irrelevant stroke information.
- The CORS issue....
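The first observations can be sketched with Python's standard xml.etree.ElementTree. The function name path_text_labels is made up, and the sample is a cut-down stand-in for the Finland file with dummy path data. Bounding boxes are left out because they need a renderer or an SVG DOM.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def path_text_labels(svg_source):
    """Yield (label, style dict) for g elements that carry aria-label
    and whose children are all path elements."""
    root = ET.fromstring(svg_source)
    for g in root.iter(SVG_NS + "g"):
        label = g.get("aria-label")
        children = list(g)
        if not label or not children:
            continue
        if any(child.tag != SVG_NS + "path" for child in children):
            continue
        style = {}
        for decl in g.get("style", "").split(";"):
            name, sep, value = decl.partition(":")
            if sep:
                style[name.strip()] = value.strip()
        yield label, style

# Cut-down sample with dummy path data.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <g style="font-size:12px;font-family:Arial;text-anchor:middle;fill:#646464"
     id="text3787" aria-label="Uusimaa">
    <path id="path6751" d="m 0,0 1,1 z"/>
    <path id="path6753" d="m 2,2 1,1 z"/>
  </g>
</svg>"""

for label, style in path_text_labels(sample):
    print(label, style["font-family"], style["font-size"], style["text-anchor"])
```

The label and style would be enough to synthesize a text element; choosing between the contradictory font sizes would still need a rule.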
Hidden text[edit]
Some graphics artists have hidden text and visible path text.
Generally, hidden text may indicate path text. Consider the cases
- hidden text (display="none" or visibility="hidden")
- text outside of the viewport
- often used for notes or example graphics
- may be part of a clipped or zoomed-in image
The difference between display and visibility:
- https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/display (element and its children are not displayed)
- https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/visibility (the layout of hidden text is still calculated)
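A minimal sketch for finding hidden text, assuming the hiding is done with presentation attributes (it ignores style attributes, CSS, and text outside the viewport). The function name hidden_text and the sample are made up. It reflects the difference noted above: display="none" on an ancestor cannot be undone by a descendant, while visibility is inherited and can be overridden.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def hidden_text(svg_source):
    """Return the strings of text elements that would not be rendered."""
    found = []

    def walk(elem, display_none, visibility):
        # display="none" on an ancestor cannot be undone by a descendant.
        display_none = display_none or elem.get("display") == "none"
        # visibility is inherited, but a descendant may override it.
        own = elem.get("visibility")
        if own not in (None, "inherit"):
            visibility = own
        if elem.tag == SVG_NS + "text" and (display_none or visibility == "hidden"):
            found.append("".join(elem.itertext()).strip())
        for child in elem:
            walk(child, display_none, visibility)

    walk(ET.fromstring(svg_source), False, "visible")
    return found

sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <g id="TT" display="none">
    <g id="TT_Countries" display="inline"><text>GHANA</text></g>
  </g>
  <text visibility="hidden">note</text>
  <text>visible label</text>
</svg>"""

print(hidden_text(sample))
```

The nested display="inline" does not rescue the text, so it is still reported as hidden.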
Hidden text may be a standard practice in Category:SVG labeled maps of administrative divisions (location map scheme), which has more than 2,500 files. Styles should be used instead of concrete formatting, but Inkscape makes that difficult.
An example file is File:Togo,_administrative_divisions_-_de_-_colored.svg.
<g id="TT" display="none">
<g id="TT_Countries" display="inline">
<text transform="matrix(1 0 0 1 8.2725 236.5)" fill="#646464" font-family="'DejaVuSans-Bold'" font-size="18">GHANA</text>
<text transform="matrix(1 0 0 1 41.6494 16.6387)" fill="#646464" font-family="'DejaVuSans-Bold'" font-size="18">BURKINA FASO</text>
<text transform="matrix(1 0 0 1 292.7539 119.7319)" fill="#646464" font-family="'DejaVuSans-Bold'" font-size="18">BENIN</text>
</g>
<text id="TT_Sea" transform="matrix(0.75 0 0 1 239.7402 608.8057)" display="inline" fill="#0978AB" font-family="'DejaVuSerif-BoldItalic'" font-size="16">ATLANTISCHER OZEAN</text>
<g id="TT_Regions" display="inline">
<text transform="matrix(0.8 0 0 1 97.8931 73.6895)"><tspan x="0" y="0" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="22">Savanes</tspan><tspan x="93.168" y="0" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="22" letter-spacing="33.75"> </tspan></text>
<text transform="matrix(0.8 0 0 1 173.4487 197.23)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="22">Kara</text>
<text transform="matrix(0.8 0 0 1 172.1001 299.9858)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="22">Centrale</text>
<text transform="matrix(0.8 0 0 1 181.4487 442.9453)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="22">Plateaux</text>
<text transform="matrix(0.8 0 0 1 188.5601 537.9512)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="22">Maritime</text>
</g>
<g id="Nmbrs_Regions" display="inline">
<text transform="matrix(0.8 0 0 1 135.1401 110.792)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="50">1</text>
<text transform="matrix(0.8 0 0 1 173.4487 197.23)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="50">2</text>
<text transform="matrix(0.8 0 0 1 188.5601 311.9995)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="50">3</text>
<text transform="matrix(0.8 0 0 1 203.7798 450.9453)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="50">4</text>
<text transform="matrix(0.8 0 0 1 223.1953 549.5059)" fill="#646464" font-family="'DejaVuSansCondensed-Bold'" font-size="50">5</text>
</g>
</g>
<g id="PIX">
<g id="TT_Countries_1_"> ... </g>
...
</g>
It has particular id attributes. TT for TrueType?
Most of the text should be anchored to the middle rather than the start.
The map files are using the Adobe method of burying font characteristics in the font name.
The font specification DejaVuSansCondensed-Bold should be font-family="DejaVu Sans, sans-serif" font-weight="bold" font-stretch="condensed". In addition, the transform x-scale should be 1 rather than 0.8; automatic detection may be difficult.
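If such font names were to be edited automatically, the split might look like the sketch below. It is table driven because camel-case splitting would turn DejaVu into "Deja Vu"; the table KNOWN_FAMILIES, the function name, and the handled suffixes are assumptions for illustration and cover only the names seen in the Togo file.

```python
# Hypothetical lookup from the base of a PostScript-style name to a CSS family.
KNOWN_FAMILIES = {
    "DejaVuSans": "DejaVu Sans",
    "DejaVuSerif": "DejaVu Serif",
}

def split_font_name(ps_name):
    """Split a name such as DejaVuSansCondensed-Bold into CSS font properties."""
    base, _, style = ps_name.strip("'").partition("-")
    stretch = "normal"
    if base.endswith("Condensed"):
        base = base[:-len("Condensed")]
        stretch = "condensed"
    return {
        "font-family": KNOWN_FAMILIES.get(base, base),
        "font-weight": "bold" if "Bold" in style else "normal",
        "font-style": "italic" if ("Italic" in style or "Oblique" in style) else "normal",
        "font-stretch": stretch,
    }

print(split_font_name("'DejaVuSansCondensed-Bold'"))
print(split_font_name("DejaVuSerif-BoldItalic"))
```

Unknown names pass through with the base name as the family, so a human can review them.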
Should SVG with such font names be edited?
Notice the SVG uses groups to impose consistent style information. Using structural hierarchy to impose style is an odd practice. The text is not grouped with the other graphics but rather with siblings. Separating one region would involve exploding several groups to remove irrelevant regions.
Classifier to style? These files may be simple. Would like to do some normalization (such as pulling out a default fill for the text).
Another hidden-text example is File:Cochlea-crosssection.svg. It also has the unsupported flowRoot element, so the hidden text may not be such a bad thing.
tref element[edit]
SVG user agents did not implement the tref element, and the SVG 2.0 specification drops the element. As specified, the element does not appear useful. Duplicating rather than referencing text is simple enough.
Consider a map. Most place names (such as towns and cities) will be used once, so a reference would not be useful. A generic feature such as an airport may appear several times, but it can use a symbol to repeat the text.
Rivers, on the other hand, are long, so they may be labeled on a map a few times. For example, the Amazon may appear on a map several times. One could use symbol to do the repeat, but rivers are often labeled using a path that follows the river. A tref might be convenient to label the river at several places along its twisting length.
<defs>
<text id="amazontext">Amazon River</text>
<path id="amazonpath1" d="..."/>
<path id="amazonpath2" d="..."/>
<path id="amazonpath3" d="..."/>
</defs>
<text><textPath xlink:href="#amazonpath1"><tref xlink:href="#amazontext"/></textPath></text>
<text><textPath xlink:href="#amazonpath2"><tref xlink:href="#amazontext"/></textPath></text>
<text><textPath xlink:href="#amazonpath3"><tref xlink:href="#amazontext"/></textPath></text>
However, the approach is not suitable for switch translations.
A textPath element may have tref children.
The target of tref is not well specified. It sounds like it could point to any element with text content, but that does not mean that pointing to a switch element would provide just the rendered text.
A construct using conventional translation tools and its:term would work better.
Glyphs[edit]
SVG 1.0 and 1.1 have SVG elements that allow a user to embed a font.
For those cases where converting text to curves makes sense, using glyphs offers potential benefits.
Discussion at Commons:Graphic_Lab/Illustration_workshop/Archive/2021#Vietnamese-style_seal_of_the_Government-General_of_French_Indo-China (and several sections immediately following)
General information about w:en:Seal script.
The seal is 115 kB for 15 characters. That is about 7700 bytes per character, which is rather large. Using the path element, one should be able to describe a line segment in less than 100 bytes. Examining the image with magnification shows that the character strokes have a lot of noise.
Modern script (not seal script) using writing-mode: vertical-rl:
大法國欽命
總統東法全
權大臣管理
Some SVG files embedded commercial fonts as glyphs. For example, an Adobe Illustrator file might embed a portion of the Arial font in an SVG file. That practice should be discouraged.
For Unicode seal script, there is a list of Unicode documents at https://www.unicode.org/L2/topical/seal/
- sample files from that list
Finding the corresponding characters in that document or a similar document might be helpful.
- For example, the second document above maps 林 to Serial No.字序 04418
SVG 2.0 drops glyph[edit]
SVG was developed when web fonts were not available, so SVG included a rudimentary embedded font mechanism.[17] With web fonts, such a facility is not as important, so the mechanism has been deprecated. As of 2021, support may still be found in the Safari and Android browsers.
Glyphs would not work with some scripts[edit]
The Unicode specification will not add any new composed characters. That simplifies the number of characters needed. For example, Siddham script has thousands of glyphs, but most of those glyphs are composed characters. In Unicode, Siddham has a small number of fundamental characters. Composed characters are still drawn, but they no longer have exposed codepoints.
WMF prohibits web fonts[edit]
SVG 2.0 may have dropped glyph support because web fonts are now available. In the past, web pages depended upon the fonts that a user already had on the local machine. If the local machine did not have the font, then it would substitute some other font. Those substitutions could lead to bizarre results.
It gets even more troublesome when the desired font is for uncommon Unicode scripts. Unicode supports many scripts, but most users will not have fonts for those scripts. Unicode has assignments for Egyptian hieroglyphics and ancient Sanskrit Siddham.
CSS now has a mechanism to load web fonts.
Google offers a lot of fonts, and it also has CSS files to use those fonts as web fonts. (Reference)
The downside is that web fonts allow some tracking. The web font files have a long caching time (was it a year?). A browser would download the font and use it without continually querying the Google servers. The CSS files have relatively short cache times, so the browser would be contacting Google servers often. (Reference)
Alberta road signs[edit]
Road signs can be thorny. They may contain artistic text, and they may contain ordinary text.
Even with artistic text, the file sizes are often not large because the signs are simple (they do not contain much text).
Old Alberta road signs could sensibly use a stylized font.
The modern road signs are too stylized.
- File:Alberta Highway 2.svg
- File:Alberta Highway 584.svg "This SVG sign uses the path text method."
See Category:Alberta Highway shields
Fonts are not that important to signs. See File:AB69ewSigns-TwoFontsYMM (28172571140).jpg which shows two road signs using the old Alberta logo but the highway numbers are in different fonts.
Font height may remain fixed, but the font weight (e.g., bold) or font stretch (e.g., condensed) may vary.
2, 2A, 93, 93A
File:Alberta wordmark 2009.svg
File:AB-provincial highway.svg
Polish road signs[edit]
About wide variation in Category:Diagrams of Voivodeship road signs of Poland. Height is 270, but widths are all over the map.
The font is Drogowskaz. See en:Polish road signs typeface.
Arial is a reasonable facsimile: 0123456789.
<?xml version="1.0" encoding="UTF-8" ?>
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://www.w3.org/2000/svg"
version="1.1"
viewBox="0 0 540 270"
font-family="Drogowskaz, Arial, sans-serif"
font-size="230"
font-weight="bold">
<metadata id="metadata15">
<rdf:RDF>
<cc:Work rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
</cc:Work>
</rdf:RDF>
</metadata>
<rect x="6" y="6" width="528" height="258" stroke="black" stroke-width="12" rx="27" fill="#fafd4a" />
<text x="50%" y="220" text-anchor="middle">100</text>
</svg>
Text anchors[edit]
SVG files should use reasonable anchors. The usual choices are left aligned, center aligned, or right aligned. If I want text aligned on the right edge, I should not insert some left-aligned text and then move the position of the whole string so the right edge ends where I want it to end.
Alignment is important because font metrics vary. Text that seems to align correctly with one font may look ragged with another font.
Choosing the correct text anchor is a simple defense against varying font metrics.
The leader lines should be placed with precision. The lines should be at the start, the middle, or end of the text. Consider that a lengthy English label may translate to a single character. If the leader starts at the third character, it may be misaligned for Chinese. It's not a great example, but compare pancreas (Q9618) and 胰脏(Q9618). Also, a leader line should be careful about character clearance. Starting just below the baseline will not work well if the translation has a descending character there. Similarly, a leader starting just above some English text may intersect with the Chinese translation because Chinese uses taller characters.
Perhaps show translation boxes?
Text anchor alignment[edit]
SVG is unusual. Normally, when text is justified, that justification ignores leading and trailing whitespace. That means the visible characters (the ones that make marks on the canvas) are centered. SVG treats spaces as visible characters.
Text anchors, direction, and BIDI[edit]
There are subtle problems with SVG text anchors. The SVG semantics do not play well with text direction. If the anchor sets a starting point, then left-to-right text builds to the right, but right-to-left text builds to the left. That can give screwy results.
The issue is a bit more complex. There is an interaction between the specified text direction and the Unicode BIDI algorithm. They will give reasonable results in simple cases.
Consider start-aligned text: text-anchor="start".
For English, we expect direction="ltr", so we would expect
|English
For Arabic, we expect direction="rtl", so we would expect
cibarA|
But if the Arabic is laid out with direction="ltr", the layout sort of works due to BIDI:
|cibarA
Do nothing, and the layout sort of works. It fails when strings contain both LTR and RTL characters. Consider "English 17 kg", where we want to keep the "kg" units. The result with direction="ltr" is
|English 17 kg
|17 cibarA kg
BIDI starts out in LTR. It sees the Arabic, so it starts an RTL block for "Arabic". The space is neutral, so it is added RTL. The numbers are weak LTR, so they are added to the RTL block as a subblock. The "kg" is strong LTR, so it terminates the blocks and goes back to toplevel.
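The strong, weak, and neutral categories can be inspected with Python's standard unicodedata module. This sketch only reports the bidirectional class of each character (L is strong left-to-right, AL and R are strong right-to-left, EN is a weak European number, WS is neutral whitespace); it does not run the BIDI algorithm itself. The helper name bidi_classes is made up.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# An Arabic letter (ain) followed by " 17 kg".
for ch, cls in bidi_classes("\u0639 17 kg"):
    print(repr(ch), cls)
```

The Arabic letter reports AL, the digits EN, the spaces WS, and the letters of "kg" L, which is the mix that makes the unmarked string reorder.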
Here are some span elements in HTML:
- English 17 kg
- عربي 17 kg
To do it correctly, we need to set the direction for the entire phrase and swap the text-anchor.
Here is the text setting the direction for the entire span:
- English 17 kg
- عربي 17 kg
Ideally, the text element should set the text direction that is appropriate for the script. English text should set the text direction as left-to-right. Arabic text should set the text direction as right-to-left. Unicode BIDI will then lay out the strings correctly, but now the layout will head in opposite directions. For expected results, one must change both the text anchor and the text direction. That's a headache.
In theory, CSS can fix the problem, but SVG agents may have weak CSS implementations.
Phab:T271663 Offer to invert text-anchor for RTL languages
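The anchor swap itself is mechanical. A minimal sketch, assuming a small and incomplete set of right-to-left language subtags and a made-up function name:

```python
# Assumed, incomplete set of right-to-left language subtags for this sketch.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}

SWAPPED_ANCHOR = {"start": "end", "end": "start", "middle": "middle"}

def localized_text_attributes(anchor, lang):
    """Return (text-anchor, direction) for a label translated into lang.

    anchor is the text-anchor chosen for the left-to-right original.
    """
    primary = lang.split("-")[0].lower()
    if primary in RTL_LANGUAGES:
        return SWAPPED_ANCHOR[anchor], "rtl"
    return anchor, "ltr"

print(localized_text_attributes("start", "ar"))   # ('end', 'rtl')
print(localized_text_attributes("start", "en"))   # ('start', 'ltr')
```

With the swap, a label that grew to the right of its anchor point in English still grows to the right in Arabic.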
SVG warts[edit]
Graphics state scope[edit]
SVG does not nest the graphics scope. Consider the e^(x^2) problem (a superscript on a superscript). In HTML, this cascaded form works correctly: the exponent 2 is scoped to the x graphics state. In SVG, the exponent 2 is scoped to the e graphics state.
The superscript and subscript problem. When there is both a subscript and a superscript, we want them on top of each other.
- HTML and SVG: 126C+42.
- .see https://stackoverflow.com/questions/3742975/subscript-and-superscript-for-the-same-element
- 126C+42
- .see https://bytes.com/topic/html-css/answers/672245-both-sub-superscript-together
- try a variation
- 126C+42
- .see https://stackoverflow.com/questions/3742975/subscript-and-superscript-for-the-same-element
- Using MediaWiki <math>:
- Using MediaWiki <chem>:
SVG is not HTML[edit]
SVG 1.1 uses the xlink: namespace. HTML does not have namespaces, so HTML uses just href rather than xlink:href. For some reason (perhaps embedding SVG within HTML), SVG 2.0 has decided to use href.
The problem with xml:lang and lang.
SVG should be about making marks on a screen or a piece of paper. It should not be about myriad other topics. If the semantics are not about marks on paper, then the semantics do not belong in the SVG specification.
For example, there is a notion that some text might be translated to another language, while other text should not be. People who were interested in XML markup developed the Internationalization Tag Set for making such notations. Consequently, one could add rules and attributes to an XML file that translation utilities could use. The attribute its:translate="no" means do not translate the content, and its:translate="yes" means translate the content. The specification also included rules using XPath patterns to identify what should or should not be translated. Everything in the its namespace is distinct from other namespaces (and the default namespace).
HTML does not have namespaces, so the use of ITS is a bit awkward. So HTML added the translate attribute. It is not as powerful as the ITS specification, but it is simple. SVG comes along and copies the HTML translate attribute. There is no reason. SVG can support the its: namespace; it is not crippled like HTML.
HTML does not have namespaces. Instead of following XML and adopting namespaces (as XHTML did), HTML invented a poor man's namespace. If attributes start with data- or aria-, then they are in a quasi-namespace. SVG is XML, so it need not stoop to such measures. SVG should have used data: and aria: namespaces.
HTML ignores capitalization. Consequently, <Head> is the same as <HEAD>. It is the same for attributes. However, the data- attributes wanted to have database keys that were case sensitive. So HTML uses a hyphen algorithm. Everything after the data- prefix is in lowercase unless it is immediately preceded by a hyphen. The attribute DATA-NAME="Smith" sets database["name"] = "Smith". If we wanted the database key to be all capitals, we must say data--N-A-M-E="Smith". That's due to the case-insensitive nature of HTML. XML and SVG can be much simpler. They could say either data:name="Smith" or data:NAME="Smith". No need for a pseudo namespace, and no need for a hyphen capitalization rule.
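The hyphen algorithm can be sketched in a few lines. This is my reading of the rule described above (lowercase the attribute name, drop the data- prefix, then turn each hyphen followed by a lowercase ASCII letter into that letter in uppercase); the function name dataset_key is made up.

```python
import re

def dataset_key(attribute_name):
    """Convert an HTML data-* attribute name to its dataset key."""
    name = attribute_name.lower()          # HTML attribute names are case-insensitive
    if not name.startswith("data-"):
        raise ValueError("not a data-* attribute")
    rest = name[len("data-"):]
    # A hyphen followed by a lowercase ASCII letter becomes that letter in uppercase.
    return re.sub(r"-([a-z])", lambda m: m.group(1).upper(), rest)

print(dataset_key("DATA-NAME"))        # name
print(dataset_key("data--N-A-M-E"))    # NAME
print(dataset_key("data-first-name"))  # firstName
```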
Zoom and pan[edit]
Can SVG images be inserted such that the image can be zoomed and panned without affecting the including page?
Does the use of viewBox frustrate that goal?
MediaWiki problems[edit]
librsvg color issue[edit]
Where was this?
The librsvg renderer produces PNG bitmaps, but those bitmaps may not set the colorspace to sRGB.
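One way to check a rendered thumbnail is to list the PNG chunks and look for an sRGB or iCCP chunk. The sketch below walks the chunk headers with Python's standard struct module; it does not verify CRCs, and the function names are made up.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunk_types(data):
    """Return the chunk type names of a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(ctype.decode("ascii"))
        pos += 8 + length + 4              # header, chunk data, CRC
    return types

def declares_color_space(data):
    """True if the PNG has an sRGB or iCCP chunk."""
    return any(t in ("sRGB", "iCCP") for t in png_chunk_types(data))
```

Usage would be declares_color_space(open(path, "rb").read()) on a downloaded thumbnail; a False result would be consistent with the issue described above.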
MediaWiki whitelisted namespaces[edit]
MediaWiki has a select set of namespaces for SVG.
The intention of the whitelist is to avoid script injection. For example, the SVG might include an XHTML namespace subtree, and that subtree might allow scripting that is not detected by the ordinary SVG filter.
It looks like the test requires elements to be in whitelisted namespaces but does not require attribute values to be in whitelisted namespaces. I should check that distinction. Might try
<svg version="1.1"
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://namespace... http://schemalocation/schema.xsd">
See also Help:SVG.
- T153285: Unexpected error "This SVG file contains an illegal namespace" for xmlns:rdfs attribute value (closed).
- T278044: This SVG file contains an illegal namespace "http://www.w3.org/2000/02/svg/testsuite/description/".
- T283316: More whitelisted namespaces for SVG files
static $validNamespaces = [
'',
'adobe:ns:meta/',
'http://creativecommons.org/ns#',
'http://inkscape.sourceforge.net/dtd/sodipodi-0.dtd',
'http://ns.adobe.com/adobeillustrator/10.0/',
'http://ns.adobe.com/adobesvgviewerextensions/3.0/',
'http://ns.adobe.com/extensibility/1.0/',
'http://ns.adobe.com/flows/1.0/',
'http://ns.adobe.com/illustrator/1.0/',
'http://ns.adobe.com/imagereplacement/1.0/',
'http://ns.adobe.com/pdf/1.3/',
'http://ns.adobe.com/photoshop/1.0/',
'http://ns.adobe.com/saveforweb/1.0/',
'http://ns.adobe.com/variables/1.0/',
'http://ns.adobe.com/xap/1.0/',
'http://ns.adobe.com/xap/1.0/g/',
'http://ns.adobe.com/xap/1.0/g/img/',
'http://ns.adobe.com/xap/1.0/mm/',
'http://ns.adobe.com/xap/1.0/rights/',
'http://ns.adobe.com/xap/1.0/stype/dimensions#',
'http://ns.adobe.com/xap/1.0/stype/font#',
'http://ns.adobe.com/xap/1.0/stype/manifestitem#',
'http://ns.adobe.com/xap/1.0/stype/resourceevent#',
'http://ns.adobe.com/xap/1.0/stype/resourceref#',
'http://ns.adobe.com/xap/1.0/t/pg/',
'http://purl.org/dc/elements/1.1/',
'http://purl.org/dc/elements/1.1',
'http://schemas.microsoft.com/visio/2003/svgextensions/',
'http://sodipodi.sourceforge.net/dtd/sodipodi-0.dtd',
'http://taptrix.com/inkpad/svg_extensions',
'http://web.resource.org/cc/',
'http://www.freesoftware.fsf.org/bkchem/cdml',
'http://www.inkscape.org/namespaces/inkscape',
'http://www.opengis.net/gml',
'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'http://www.w3.org/2000/svg',
'http://www.w3.org/tr/rec-rdf-syntax/',
'http://www.w3.org/2000/01/rdf-schema#',
];
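To test files before uploading, a check against the list can be approximated locally. This is a Python sketch, not the MediaWiki code: it inspects only the namespace declarations in the file, it copies just a few entries from the list above, and the lowercase comparison is an assumption based on the list being all lowercase. Whether MediaWiki looks at declarations, elements, or attributes is the open question noted earlier.

```python
import io
import xml.etree.ElementTree as ET

# A few entries copied from the list above; the real list is longer.
VALID_NAMESPACES = {
    "",
    "http://www.w3.org/2000/svg",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://purl.org/dc/elements/1.1/",
    "http://creativecommons.org/ns#",
}

def rejected_namespaces(svg_bytes):
    """Return declared namespace URIs that are not in VALID_NAMESPACES."""
    rejected = []
    events = ET.iterparse(io.BytesIO(svg_bytes), events=("start-ns",))
    for _event, (_prefix, uri) in events:
        # Assumption: the comparison is case-insensitive.
        if uri.lower() not in VALID_NAMESPACES and uri not in rejected:
            rejected.append(uri)
    return rejected

sample = (b'<svg xmlns="http://www.w3.org/2000/svg"><foreignObject>'
          b'<math xmlns="http://www.w3.org/1998/Math/MathML"/>'
          b'</foreignObject></svg>')
print(rejected_namespaces(sample))
```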
Namespaces to add:
- 'http://www.w3.org/2000/02/svg/testsuite/description/' for W3C test suites
- 'http://purl.org/dc/terms/' more recent Dublin Core
- 'http://www.w3.org/1998/Math/MathML' MathML
- 'http://www.opengis.net/gml/3.2' new version of GML
- 'http://www.w3.org/2005/11/its' w:Internationalization Tag Set
- Library of Congress vocabulary
- LOC MADS
- LOC MODS
- 'http://www.ogc.org/crs' maybe...
One of the whitelisted namespaces is suspect:
Might check if there are any SVG files that use this namespace.
Some absent namespaces are significant. When Dublin Core came out in 2000, it provided a succinct set of terms in the dc/elements/1.1/ namespace. The next year, it came out with an expanded dc/terms/ namespace and vocabulary. In 2008, it encouraged dropping the first namespace in favor of the dc/terms/ namespace. WMF accepts the former but not the latter namespace.
MathML namespace[edit]
The MathML namespace is also not whitelisted. MathML allows sophisticated mathematical typesetting, but WMF blocks its upload. Users cannot upload this file:
<?xml version="1.0" encoding="utf-8"?>
<svg viewBox="0 0 300 200" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>SVG MathML test</title>
<desc>Test if MathML is available in SVG. Will not upload to Commons due to MathML namespace.</desc>
<metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/terms/" xmlns:cc="http://creativecommons.org/ns#" >
<cc:Work rdf:about="">
<dc:publisher>Wikimedia Commons</dc:publisher>
<cc:license rdf:resource="https://creativecommons.org/publicdomain/zero/1.0/"/>
<cc:attributionName rdf:resource="http://commons.wikimedia.org/wiki/User:Glrx" />
<cc:attributionURL rdf:resource="http://commons.wikimedia.org/wiki/File:SVG_MathML_test.svg" />
</cc:Work>
</rdf:RDF>
</metadata>
<text x="150" y="40" text-anchor="middle">SVG MathML test</text>
<switch transform="translate(50,100)">
<foreignObject width="200" height="50" requiredExtensions="http://www.w3.org/1998/Math/MathML">
<math xmlns="http://www.w3.org/1998/Math/MathML">
<msqrt>
<msup><mi>x</mi><mn>2</mn></msup>
<mo>+</mo>
<msup><mi>y</mi><mn>2</mn></msup>
</msqrt>
</math>
</foreignObject>
<text>\sqrt{x^2 + y^2}</text>
</switch>
<text x="10" y="175" font-size="8">should display a formula in either MathML or TeX</text>
</svg>
General issues[edit]
- https://commons.wikimedia.org/w/index.php?title=User_talk%3ASarang&type=revision&diff=611702254&oldid=611488584 at § Calvin-cycle4.svg
- about File:Calvin-cycle4.svg and other topics.
- WMF librsvg does not support style="overflow:visible". Symbols always have a clipping region.
Converting bitmaps to SVG[edit]
Many files on Commons are bitmaps, but some would be more useful as SVG files. Bitmap files are great with large, orthogonal features, but they can struggle with thin features and curves. Zooming in on a feature will show more anti-aliasing fuzz or jagged edges. More details require more bits. Bitmap files can be difficult to edit. Changing lines or text involves not only adding the new content, but also erasing the old. Erasures can be difficult because the background must be reconstructed. It is difficult to copy text that is in a bitmap: the text is just a picture that must be converted to characters. It takes a lot of work to translate a bitmap to another language. Bitmap files that are good candidates for vectorization can be marked with {{Convert to SVG}}.
Value[edit]
An issue in any undertaking is its cost-benefit. It will take time and effort to improve or convert a file. Is that cost worth it? If an image is widely used, then the cost can be amortized over many views. If the image is little used, then the benefit may be just an intellectual challenge.
Quick examples[edit]
- JPEG should be SVG
- PNG should be SVG
- PNG should be SVG (29 kB)
- PNG should be SVG (56 kB)
Notes[edit]
Expensive parser tests:
- {{PAGESIZE:{{FULLPAGENAME}}}} → 283,082
- {{PAGESIZE:File:Silversmith.jpg}} → 2,951 (size of description page)
- https://upload.wikimedia.org/wikipedia/commons/4/41/Silversmith.jpg
- 0
Conversions[edit]
Unfortunately, converting a bitmap file to a vector file may not be an easy task. It also may not be desired.
Technical bitmaps such as a QR code should remain bitmaps. (Do not convert QR code PNG files to JPEG bitmaps.)
Converting a photograph or other continuous-tone image to SVG is usually inappropriate. See w:Image tracing. Good candidates for conversion need to have significant structure. Some continuous-tone images have structured color gradients, so they can be vectorized.
Images with a lot of random details may be inappropriate. It does not take much information to describe a long straight line, but it does take a lot of information to describe 10,000 individual objects. There are times that randomness can be described by a pseudorandom process. (For example, MPEG replaces fricative sounds with a noise generator.)
Here is a progression of changes to a subject image. The details and appearance of an image can be improved and still be an efficient representation of the object. The last image has the detail of the gun powder grains without individually drawing each grain.
- (SVG) March 2006, 179 × 437 (9 KB). Powder suggested with precisely placed dots.
- (PNG) July 2010, 332 × 462 (13 KB). Primer pocket added.
- (JPEG) July 2012, 792 × 648 (64 KB). Bitmap file showing the case not completely filled, cross section errors fixed, and powder represented with image.
- (SVG) May 2021, 512 × 512 (9 KB). Coloring with gradients, case not completely filled, and powder represented with a random process.
Many technical images can be good candidates for conversion to SVG. See, for example, Category:Cross sections of valves.
Issues[edit]
This section is confused. It should start with straightforward conversions such as diagrams that are easy to redraw.
Next, it can address stepped conversions. A stepped conversion is where a bitmap is still present in the SVG, but parts of the bitmap are replaced with SVG elements. Eventually, the SVG elements may eliminate the bitmap. A "stepped conversion" may include SVG files that will always contain a bitmap image. For example, the bitmap may be a photograph, but SVG may use text elements to label the photograph.
From there, it can address the random process methods. The section should not lead with the most difficult conversions. It can also serve as a counterpoint to not converting the Mona Lisa to SVG.
Alternatives to bitmap conversions[edit]
Instead of converting a bitmap, there may be a better way to achieve the end result.
Instead of converting, redraw the image from scratch using a tool. Chemical diagrams. See Category:Crystal structures of copper(II) sulfate pentahydrate. Also, there may be a better format for some drawings.
- PNG (754 kB)
- JPEG (24 kB) (poor angle on SO4)
Straightforward conversions[edit]
- 50 kB
- 12 kB
- 23 kB
- 518 kB
Here is an example of a PNG file that has been converted to SVG.
The files are not widely used, but the SVG format makes it easier to fix some minor issues with the original file. For example, a variable such as VTorpedo can be edited to use the more common italic-variable convention of VTorpedo. The arrows for the torpedo and target velocities look like velocity vectors, but they do not make sense as velocity vectors. The diagram suggests that by the time the torpedo reaches the target's track, the target has already gone by that point.
The SVG also brings up an issue with SVG marker
elements. In the past, I have created a new marker for each fill. There is a technical issue about inheritance of attributes such as stroke
and fill
. A use
instance will not inherit from its environment because it is not part of the DOM tree. In some cases, an instance will use attributes that are set on the use
element because they are part of the (inaccessible) tree.
-
57 kB PNG 2006-07-20 en:User:Ziggle
-
29 kB SVG 2010-09-04 Snubcube
Less simple conversions[edit]
-
Littoral Zones.jpg 35 kB JPEG. US Navy.
-
Map of battle St. Mihiel.JPG 37 kB JPEG. en:User:Redmarkviolinist
File:Littoral Zones.jpg needs a lot of cleaning up.
File:Map of battle St. Mihiel.JPG needs cleaning up. Dashed lines are not good candidates for automatic vectorization. Adding color.
Map locations with given scale: St. Mihiel Saint-Mihiel (Q194932), Frenes Frênes (Q538768), Hattonchatel Hattonchâtel (Q30127896), Vigneulles , Thiacourt Thiaucourt-Regniéville (Q497719), Pont-a-Mousson Pont-à-Mousson (Q461413), Plain of the Woevre Woëvre (Q1476825), River Meuse Meuse (Q41986).
Conversions with gradients[edit]
An image may look complex, but it may just need the appropriate construct.
-
82 kB PNG. CC-SA 2.5 by Ben Stefanowitsch
-
3 kB SVG conversion by TilmannR
Conversions should be good[edit]
Suggested replacements should only be used if they are superior. Replacements may not be superior.
-
JPEG image (Southern blot)
-
a replacement SVG vector (northern blot)
The original JPEG is simple and clean. The vector replacement has problems. The JPEG uses a single font. The SVG uses several font sizes and uses colors. A yellow font can be lost in a white background, so the yellow font is stroked. The paper towels are flat in the JPEG, but they are wavy in SVG. One purpose of the paper towels is to evenly distribute the weight; wavy towels (especially when the waves line up) do not convey that purpose. The solution is divided in the JPEG but connected in the SVG. What is the distinction between Southern blot and northern blot?
Despite the image having simple vector shapes, the majority of the image is a bitmap.
The SVG vector was derived from File:Capillary blot setup.svg.
The file descriptions are slightly different: the first is about a Southern blot while the second is about a northern blot. The first is for DNA and the second is for RNA, but both procedures use agarose gel electrophoresis.[18]
Electroblotting makes more sense as a blot, but the electro-transfer is vertical. That has issues with applying voltage in the given images.
w:File:Electroblot.gif is public domain, but not yet transferred to Commons. It shows the vertical electrodes. w:Northern blot states, "Strictly speaking, the term 'northern blot' refers specifically to the capillary transfer of RNA from the electrophoresis gel to the blotting membrane."
Multiline text causes trouble[edit]
For translations, try to keep the text on one line. Text that is broken into many lines is troublesome.
-
PNG of glow discharge.
-
SVG of glow discharge.
-
SVG (but Path text SVG problem now)
- Cathode
- Aston Dark Space
- Cathode Glow
- Cathode Dark Space
- Negative Glow
- Faraday Space
- Positive Column
- Anode Glow
- Anode Dark Space
- Anode
The diagram has unconventional leader lines. The diagram has negative shading: the dark spaces are white; some glows are dark.
Stepped conversions[edit]
Here are files that can be converted in steps. The first step would use an underlying bitmap file with overlaid SVG text elements. Later, the bitmap image could be converted to SVG.
-
57 kB
-
170 kB
A stepped conversion with difficult image[edit]
Here's a file that has conversion problems and can be converted in steps.
-
PNG with text
For the first step, the PNG can be edited to remove the text and leader lines. That PNG can be inserted into an SVG file, and the text and leader lines can be redrawn using SVG primitives. Removing the text is usually simple, but removing the leader lines can be tricky. In some cases, the leader lines can be retained. In either case, the leader lines pose a problem with text alignment. The current layout requires the text to fit the space between the margin and the start of the leader line. That strategy works for PNG files, but it has problems with SVG because font metrics may change slightly. A substituted font with slightly different metrics may not fit between the margins and the leader lines. One fix would be to add a background filter to the text; it would overwrite the leader line with white (see filter
below). Alternatively (and probably better) would be to right align the lefthand text and left align the righthand text. Another text fitting problem is the title: it runs from the left margin to the right margin. A slightly wider font would go outside both margins.
<filter id="textflood" filterUnits="objectBoundingBox" primitiveUnits="objectBoundingBox">
  <feFlood flood-color="white" flood-opacity="1.0" x="0" y="0" width="1.0" height="1.0" result="back"/>
  <feMerge>
    <feMergeNode in="back" />
    <feMergeNode in="SourceGraphic" />
  </feMerge>
</filter>
For the second step, the body image could be redone as SVG. Completely converting the image to SVG is hard because the image has gradient fills; a raster-to-vector conversion application will probably not have a good result. Rendering the intestines looks difficult, too. There are many twists and turns, so shading is difficult. Perhaps a good place for filter
primitives.
A simpler target is the following image.
-
simpler target
Another stepped conversion[edit]
Here is a large JPEG. The text labels are better done in SVG; translation would be possible. The white text on black background is difficult to read. The multiline text descriptions are a tossup. There are technical problems with the cross sectioning: the cutting tangents. The planet details could remain as bitmap images, but they could be done with symbols and gradients. Edges do not darken, but that also seems to be true with images of the Earth. More challenging would be using a random process for the planet's surface. Furthermore, CSS could be used for a print version.
-
(576 kB)
-
compare Blue Marble (6.21 MB)
Significant detail[edit]
-
2.04 MB JPEG
Trees have significant detail[edit]
Here, the JPEG image is much higher quality than the SVG. Both files are about the same size.
-
JPEG 1,279 × 1,605; 489 KB
-
SVG 512 × 685; 552 KB
Even synthetic images can have significant detail[edit]
The JPEG image has more character than the SVG. The files have similar size.
-
JPEG 123 kB
-
SVG 95 kB
Stamps[edit]
Originally, I thought some files were bitmaps, but now it looks like something much stranger happened. The original artist made an SVG with Inkscape and used an appropriate filter, but somehow the SVG file bloated out of control. Why?
See Category:Powered by Wikidata.
Rectangular stamp[edit]
Font family is "Sans", but the SVG text was converted to curves. There are many instances of filters, and those instances include "Rubber Stamp" and "Chalk and Sponge". The defs
section is huge, and it has several huge clipping paths. However, only one clipping path is used. The wiki barcode does not use a clipping path, so it is drawn without special effects.
The SVG files have neutered flowRoot
elements.
-
SVG 2 May 2016 (5.1 MB)
-
PNG 2 May 2016 (25 kB)
This SVG file uses the "Gill Sans" font.
-
SVG 2 May 2016 (5.4 MB)
-
PNG 2 May 2016 (46 kB)
Round stamp[edit]
-
Wikidata stamp SVG (2.2 MB)
-
Wikidata stamp PNG (78 kB)
-
Using random process, but has
librsvg
bugs. (3 kB)
The stamp runs into librsvg
inability to do textPath
.
It is a lot of bytes for a simple image; individual debris has a lot of information.
There is debris even in the unstamped areas.
Some of the debris is black.
Most of the debris is polygonal.
The WIKIDATA rectangle is filled, so not all of the apparent background is transparent; clipping would be appropriate.
Redid as an SVG with a random process filter
.
Vectorization[edit]
How good can automatic vectorization be?
Recovering text[edit]
Files on Commons can be OCR'd (produces JSON with a text
key with lines of OCR'd text):
- English
- Arm Bones.png
- Polish
- Tulejki zaciskowe.svg
- Chinese
- (zh)Illu epithelium.jpg
SVG extensions do not work with the above pattern; the OCR tool needs a PNG rather than an SVG. That can be achieved by supplying a thumbnail_size argument:
- File:Tulejki zaciskowe.svg
{{filepath:Tulejki zaciskowe.svg}}
→{{filepath:Tulejki zaciskowe.svg|887}}
→
- Polish
- Tulejki zaciskowe.svg →
{"engine":"google","langs":["pl"],"psm":3,"crop":[],"image_hosts":["upload.wikimedia.org","upload.wikimedia.beta.wmflabs.org"],"text":"Typ \u015bci\u0105gaj\u0105cy\nTyp naciskaj\u0105cy\nTyp obustronny"}
So the Polish text is
- Typ ściągający → pull-back type; pull-to-close
- Typ naciskający → push type; push-to-close
- Typ obustronny → dead-length type
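The JSON shown above can be unpacked with a few lines of Python. This is a minimal sketch that only parses a response body; it does not call the OCR service, and the function name `ocr_lines` is my own. The sample string is the response copied from the notes above.

```python
import json

# Response body copied from the notes above (raw string keeps the JSON escapes).
SAMPLE = r'''{"engine":"google","langs":["pl"],"psm":3,"crop":[],"image_hosts":["upload.wikimedia.org","upload.wikimedia.beta.wmflabs.org"],"text":"Typ \u015bci\u0105gaj\u0105cy\nTyp naciskaj\u0105cy\nTyp obustronny"}'''


def ocr_lines(response_text):
    """Return the non-empty OCR'd lines from a JSON response body."""
    data = json.loads(response_text)
    # The "text" key holds all recognized lines joined by newlines.
    lines = data.get("text", "").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    for line in ocr_lines(SAMPLE):
        print(line)
```

Each returned string is one candidate label that could become an SVG text element.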
Can I just go to https://ocr.wmcloud.org/api.php directly?
Would some JavaScript run into CORS? Will origin=*
work? The simple return has
access-control-allow-origin: *
so it should work without problems.
See https://ocr.wmcloud.org/ for direct interface and api documentation.
There are tools for identifying fonts.
- Online Font Recognition Tools. Allison Reed. 20 March 2021.
What?[edit]
Tracing...
-
JPEG (20 kB)
-
SVG (181 kB) traced
Translations: internationalization and localization[edit]
Commons supports wikis in many different languages. Ideally, an image would be available in any language, but the reality is many images on Commons are just available in English. Images in a bitmap format such as PNG have painted in the text, so the text is not easy to change.
SVG can support translations.
Sadly, SVG has made some unusual choices. The class
attribute is a space-separated list of tokens, but the systemLanguage
attribute is a comma-separated list of tokens. The commas added confusion (some implementations used space-separated IETF language tags) and complicate pattern matching. Compare CSS [systemLanguage~="en"]
(which wants a space-separated list) and [systemLanguage|="en"]
(which does not want a list).
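To make the difference concrete, here is a small Python sketch of how the two attribute syntaxes are split and how a systemLanguage test might be evaluated. The matching rule is my reading of SVG 1.1 (a user language matches a listed tag exactly, or matches a prefix that is followed by a hyphen); other SVG versions and real renderers may behave differently. The function names are hypothetical.

```python
def parse_class(value):
    """class is a space-separated token list."""
    return value.split()


def parse_system_language(value):
    """systemLanguage is a comma-separated list; whitespace around tags is dropped."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def system_language_matches(value, user_langs):
    """Assumed SVG 1.1-style test for a systemLanguage attribute value.

    True if a user language equals a listed tag, or equals a prefix of a
    listed tag where the next character after the prefix is '-'.
    """
    tags = [tag.lower() for tag in parse_system_language(value)]
    for user in user_langs:
        wanted = user.lower()
        for tag in tags:
            if tag == wanted or tag.startswith(wanted + "-"):
                return True
    return False


if __name__ == "__main__":
    print(parse_class("label axis"))
    print(parse_system_language("en, fr ,de"))
    print(system_language_matches("en-US", ["en"]))
```

Note the asymmetry in this rule: a user asking for "en" matches a clause marked "en-US", but a user asking for "en-US" does not match a clause marked "en".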
Translations are welcome, but they have costs[edit]
There are important diagrams that have many translations.
- File:Bicycle diagram-en.svg This SVG file was a significant example of a file having multiple translated copies. In January 2022, Mrmw made this file multilingual. Two additional translations were added the next day.
- File:Diagram of the human heart (cropped).svg
- File:Standard Model of Elementary Particles.svg
- File:Active Margin.svg One SVG file with several separate-file language versions. en.Wiki 2006
- File:Oceanic-continental convergence Fig21oceancont.gif GIF from 2005. Source is USGS.
The translations do not fit[edit]
(Should follow multiline text.)
Some languages do not need much space, but other languages do. As a guess, I would say Spanish needs 30% more space than English. Chinese is very dense, but the characters are more detailed and should be larger. Diagrams meant for translation need to leave a lot of space.
I've seen some users add translations to similar languages. Instead of adding Catalan to an English SVG, add the language to the Spanish SVG.
- File:Oceanic divisions es.svg Spanish + Catalan
Dimension lines cause problems. They set a particular width for the text. Putting the text above the dimension lines is a simple approach. Another way around the problem is to paint a text background that overlays the dimension line. In many cases the background merges with the background of the diagram, but sometimes it will cover nearby details in the diagram. Painting a background is difficult when the background has a color gradient; it requires a good match. Use a mask. Use a Gaussian blur filter on the background?
Methods ran into Phab:T316962. Sadly, FillPaint
does not work for gradient fills on Commons or Chromium.
- File:Oceanic divisions.svg
- fix dimension line issue.
- better align some labels. Half done
- also fix fonts and marker arrows. Done
- reposition some text for better translation. Done
Keeping diagrams in sync[edit]
We want to make images available in several languages. That has often been done by loading an original image into an editor and changing the labels. During the translation, the artist may improve the image. The improved graphics do not make it back into the original. If there were several other translations of the image, those retain the old graphics.
Here are two diagrams of the same chemical process that have significant changes. In the translation, many vessels are larger and easier to read. The piping is also different. Which one is more accurate?
-
Original file (was 889 kB, now 12 kB)
-
Not only translated but also changed graphics (117 kB)
The German version uses Unicode subscript characters for text such as N₂ H₂. The Spanish version uses separately placed subscripts for text such as N2.
Here's an image where the English version was improved with new numbers (and changed graphics) but the other languages were not.
-
English
-
French
Here are images where the measurement values have diverged.
-
English
-
French
Here is an image where updating the numbers updates them for several languages (SVG Translate can only update one language at a time).
-
English
-
German
MediaWiki language processing does not support tooltips[edit]
Using switch
elements is not a complete solution.
SVG 2.0 added but then removed language processing for title
elements.
MediaWiki language default is confused[edit]
MediaWiki believes English is the true default language. Relies on librsvg
defaulting to en
.
MediaWiki problems[edit]
I'm seeing some multilingual files that MediaWiki does not offer to show in various languages ("Render this image in ..."). I've come across a couple in the last month, and they are not the 256 kB case. Possibly newer page builder? Also may be one explicit langtag and a default.
- File:Celltypes.svg (99 kB) this has many languages... (later note: no systemLanguage="en")
- File:1263 Mediterranean Sea-es.svg should have es and eu translations. (later note: no systemLanguage="en")
- File:Minsk Protocol.svg (later note: only has systemLanguage="eu")
- File:Eukaryotic DNA replication.svg (later note: only has systemLanguage="fr")
Maybe this is a clue. Go to
- File:2022 Russian invasion of Ukraine.svg or
- https://commons.wikimedia.org/w/index.php?lang=en&title=File%3A2022_Russian_invasion_of_Ukraine.svg
and it will display the language drop down box. Now click on the "(default language)" option and GO. The language dropdown box disappears.
That is the same as going to
- https://commons.wikimedia.org/w/index.php?lang=und&title=File%3A2022_Russian_invasion_of_Ukraine.svg
Alternatively, go to the Klingon version, which does not have a render this image in (language) selector:
- https://commons.wikimedia.org/w/index.php?lang=tlh&title=File%3A2022_Russian_invasion_of_Ukraine.svg
So somebody creates a diagram in English on Commons. Somebody then applies SVG Translate to add a language such as French. SVG Translate does not do the triple clause thing; instead it just adds "fr" clauses and keeps the default. The Commons file page will not display the potential languages. SVG Translate should ask for the default language (or use the lang=
or xml:lang=
attributes if available).
MediaWiki language fallbacks[edit]
MediaWiki sometimes tries to fall back to similar languages if the desired language is not available. See mw:Manual:Language#Fallback languages. Bulgarian does not have Russian as a fallback. Ukrainian has Russian as a fallback.
Consider File:Galvanic cell with no cation flow.svg. Currently, it has a Russian (ru) translation but not a Bulgarian (bg) translation; both languages use Cyrillic script. The image is used in the Bulgarian wiki for the bg:Анод article. The page displays the SVG default English rather than Russian. Bulgarian does not have Russian as a fallback.
File:Galvanic cell with no cation flow.svg also does not have a Ukrainian translation. It is used in the Ukrainian wiki: uk:Напівреакція, but that image also displays as English.
Is this an argument that WMF should serve multilingual SVGs? It would provide understandable SVGs to users in ways that are not possible with a static map. A Japanese user may not understand English but may understand German.
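For comparison, here is a sketch of what a fallback-aware language choice could look like. It is not what MediaWiki does today (the Ukrainian example above still renders in English). The fallback table is hypothetical and encodes only the two facts stated in these notes; the function name is my own.

```python
# Hypothetical fallback table; only the two facts from the notes are encoded.
FALLBACKS = {
    "uk": ["ru"],  # Ukrainian has Russian as a fallback
    "bg": [],      # Bulgarian does not have Russian as a fallback
}


def choose_render_language(wiki_lang, file_langs, fallbacks=FALLBACKS):
    """Return the langtag to request, or None to use the file's default clause."""
    available = {tag.lower() for tag in file_langs}
    # Try the wiki's own language first, then each fallback in order.
    for candidate in [wiki_lang] + fallbacks.get(wiki_lang, []):
        if candidate.lower() in available:
            return candidate
    return None


if __name__ == "__main__":
    print(choose_render_language("uk", ["ru"]))  # file has only a Russian clause
    print(choose_render_language("bg", ["ru"]))
```

With this rule the Ukrainian wiki would get the Russian clause of the galvanic cell file, while the Bulgarian wiki would still get the default clause.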
MediaWiki language identifiers may not be the same as IETF langtags[edit]
MediaWiki language identifiers (which are all lowercase) are usually the same as IETF langtags (which are mixed case), but there are some differences. There is also an effort to conform some language tags.
sr-ec vs sr-Cyrl, sr-el vs sr-Latn, als vs gsw.
In the als Wikipedia, SVG inclusions will now use the gsw
IETF langtag if it is available.
See Meta:Special language codes.
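A sketch of the mapping, assuming a lookup table for the special cases plus conventional IETF casing for everything else. The table holds only the three pairs named above; the real list is longer, and the function name `to_ietf` is hypothetical. Codes that are not well-formed langtags (there are some in MediaWiki) are passed through with only case changes.

```python
# Only the three pairs named in the notes; the real list of special codes is longer.
SPECIAL_CODES = {"sr-ec": "sr-Cyrl", "sr-el": "sr-Latn", "als": "gsw"}


def to_ietf(mw_code):
    """Map an all-lowercase MediaWiki language code to a conventionally cased langtag."""
    code = mw_code.lower()
    if code in SPECIAL_CODES:
        return SPECIAL_CODES[code]
    parts = code.split("-")
    out = [parts[0]]  # the primary language subtag stays lowercase
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            out.append(part.title())  # script subtag, e.g. Hant
        elif len(part) == 2 and part.isalpha():
            out.append(part.upper())  # region subtag, e.g. BR
        else:
            out.append(part)          # anything else is left alone
    return "-".join(out)


if __name__ == "__main__":
    for code in ["sr-ec", "als", "zh-hant", "pt-br"]:
        print(code, "->", to_ietf(code))
```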
MediaWiki language matching[edit]
Phab:T311965 MediaWiki mishandles hyphenated language tags in SVG files.
MediaWiki language default[edit]
MediaWiki defaults the language to en
. It should remove that English bias. If a German editor creates an image that is used on several wikis, then the default display should be the German text.
So default the display to und
.
That is a breaking change. We should scan existing SVG to discover those that have switch
elements without default clauses.
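Such a scan could start from something like the following sketch. It flags every switch element whose children all carry a systemLanguage attribute, which is a simplification: it ignores other conditional attributes and does not check that the default clause comes last. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def switches_without_default(svg_text):
    """Return the switch elements in which every child has systemLanguage."""
    root = ET.fromstring(svg_text)
    missing = []
    for switch in root.iter(SVG_NS + "switch"):
        children = list(switch)
        # A child without systemLanguage is treated as the default clause.
        if children and all("systemLanguage" in child.attrib for child in children):
            missing.append(switch)
    return missing


if __name__ == "__main__":
    sample = (
        '<svg xmlns="http://www.w3.org/2000/svg"><switch>'
        '<text systemLanguage="fr">Bonjour</text>'
        '<text systemLanguage="de">Hallo</text>'
        "</switch></svg>"
    )
    print(len(switches_without_default(sample)), "switch element(s) lack a default")
```

Files flagged this way are the ones that would render with empty labels if the default display language changed from en to und.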
MediaWiki language advertisements[edit]
Say a file is used on langtag.Wiki but does not have a langtag translation. In that case, the SVG file might have a link to SVG Translate. Alternatively, MW might notice that a file is switch
translated but the file does not have the desired language. Then it could insert not only the img
tag but also a link to translate the file. It could use the wiki symbol for translate: File:Translate (CoreUI Icons v1.0.0).svg — except it has a CC-BY 4.0 requirement so putting it in the SVG file would be cumbersome.
Also see File:Epicenter Diagram.svg. Many translations, but also used on many wikis.
The information section of a file page often lists other versions of a file. A file may, for example, have PNG and SVG versions. There may also be different language versions.
A template is often used to keep the other-versions information up-to-date on the different file pages. For example, {{Other versions/War in Ukraine (2022)}} is transcluded on many file pages. The template usually consists of an image gallery that lists each file with a comment.
For translations, the comment often identifies the language. That raises the question of how to identify the language.
- Using English words such as French, Russian, or Japanese. This approach only works for English readers and does little for other languages.
- Using langtag such as fr, ru, or ja. This approach is too cryptic for most users; they would not know what the strings mean.
- Using MW {{#language: fr}} to obtain the language in its own representation. Anyone visiting the file page would see français, русский, 日本語. That makes it easy for native users to recognize their language, but most users would have trouble recognizing all the languages. A user seeing lietuvių may not know that means Lithuanian.
- Using {{language|fr}} to obtain the language in the MW page's language. The English version of the file page would show French, Russian, Japanese. The German version of the file page would show Französisch, Russisch, Japanisch. The translations depend upon the file page's uselang URL parameter.
Methods 3 and 4 are the better approaches. I previously used method 3, but now I think method 4 is better.
The advent of multilingual SVG files raised an issue of how they should be represented in a gallery. Should there be one file that lists all the languages that it supports, or should the file be repeated for each language?
I prefer the former. The gallery is usually so small that the translations do not show up. Painting essentially the same file 15 times on the file page seems wasteful. It is very repetitive for files such as File:Map of USA with state names.svg, a multilingual map with 150 languages.
Clicking on the image or a link. Should it select the render this image in language?
The language gallery templates. More to say. {{Svg lang}}, {{Lang gallery}}, and {{Lgallery}}.
{{Lgallery}} supports Category:Switch-controlled SVG which exploits systemLanguage
that browsers will not understand.
Switch translated files already have a general request to add translations. How about attracting translators for a specific language? Say XYZ.svg is an image used on the abc.Wiki but does not have an abc translation. How can we seek translators for the image?
- Embed a translation request in the SVG image. That would pollute the image.
- Add the image to a (possibly hidden) category of SVG images needing translation to language abc.
For the latter, make a {{Translation request}} template. Add the template to the File: page with the langtag abc
. The template can link to https://svgtranslate.toolforge.org/BASEPAGE
. The template can also add the File: page to the Category:Translation requests abc
.
- Does SVG Translate have URL parameters to specify the source and target languages?
- github repo
A wiki could encourage its users to go to the appropriate category page. Or there could be a "translate a random page" link.
Incnis Mrsi wrote {{Translate SVG}} to mark an SVG file for translation to a particular language. It is not specific to switch
translations. It adds files to Category:SVG images to be translated to XXX.
- includeonly Category: SVG images to be translated to language(langcode) /includeonly
{{Language}} has a second argument of |2=en
to canonize the language in English.
Category:X does not display immediate text, so simply add the category to the template.
How MediaWiki handles images[edit]
When MediaWiki builds a page, it makes HTML img
elements that will display the image.
For a JPEG image:
For an SVG image
The image URL pattern is [URL prefix]/H/HH/[filename]/[size]px-[filename][suffix]
When my browser displays the page, it processes the img
elements. The browser will use the src
attribute to make an HTTP request for the image. First, the browser will look in the cache to see if it has a local copy. If that local copy exists and is current, then it will use the local copy. If the local copy exists but is stale, then it will make a network request asking the remote server whether the local copy is still current. If the local copy is still current, then the browser will use the local copy. If there is no local copy or that copy is no longer current, then the image will be transferred over the network.
When the server gets an HTTP image request, it will look in its cache to see if it has that image ready to go. If it does, then it can answer the request from its cache. Otherwise, the server will pick apart the URL, process the request, store the result in its cache, and transfer the result over the network. (Processing the request might say the image is still current, or processing may involve scaling the image.)
Real life is a bit more complicated. WMF's wikis are high-traffic websites, so just one server cannot do all the work. A connection to a wiki will go to one of many servers. That server may ask other WMF computers to do the work. The /H/HH/
pattern in the image URL is from an MD5 hash code. It offers an easy method of load leveling work among up to 256 computers.
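The URL pattern can be sketched in Python. I am assuming the hash is the MD5 of the normalized file name (spaces replaced by underscores, first letter capitalized) and that SVG thumbnails get a .png suffix while other formats get none. The URL prefix is left out, percent-encoding is skipped, and the function name is hypothetical.

```python
import hashlib


def thumb_path(filename, width):
    """Build the /H/HH/[filename]/[size]px-[filename][suffix] part of a thumbnail URL."""
    # Assumed normalization: underscores for spaces, capitalized first letter.
    name = filename.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # SVG files are rasterized, so the thumbnail is a PNG (assumption for other types: no suffix).
    suffix = ".png" if name.lower().endswith(".svg") else ""
    # Percent-encoding of special characters is deliberately omitted here.
    return f"/{digest[0]}/{digest[:2]}/{name}/{width}px-{name}{suffix}"


if __name__ == "__main__":
    print(thumb_path("Tulejki zaciskowe.svg", 887))
```

The first path segment is one hex digit and the second is two hex digits, so the second level yields 256 buckets.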
MediaWiki source code[edit]
SvgReader has many uglies including the exponential language list. SvgHandler trims the list.
SvgHandler::normaliseParamsInternal() is where the lang param must be in the lang list.
MediaWiki file page.
MediaWiki compiling a page.
MediaWiki serving an image. MediaHandler/ImageHandler/SvgHandler
- SvgHandler::rasterize()
Thumbor serving a page.
- Phab:T40010 RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis
- Phab:T261192 Rendering multilingual (systemLanguage) SVG files fails locally after upgrading librsvg from 2.40.21 to 2.44.10
Thumbor 7 changed to Python 3, a breaking change:[19]
Release 7.0.0 introduces a major breaking change due to the migration to python 3 and the modernization of our codebase. Please read the release notes for details on how to upgrade.
Gilles retires from WMF... https://mobile.twitter.com/monsieurperf/status/1409444342352187400
Language variants[edit]
It's been a long time, and I need to check these claims. I want to point to the code.
Building a wiki page that contains an SVG file is a little more involved. There are circumstances where MediaWiki will include a language specifier in the image URL:
- ...[filename]/lang[language]-[size]px-[filename].png
The language specifier is included if the wiki text has an explicit |lang=
parameter. That is the user making an explicit request, and that request is honored even if the SVG file does not have that language.
On the English wikipedia, if there is no |lang=
parameter, then no language specifier is emitted. This practice is due to WMF servers defaulting the SVG language to generic English. That practice makes it difficult to ask for the SVG file's default language. To get the default clause, one must ask for a langtag that does not exist in the file (e.g., tlh
Klingon). (Make a table showing the issue.)
On other wikipedias, if there is no |lang=
parameter, there is an attempt to use that wikipedia's default language.
MediaWiki checks the SVG file to see if it has any language dependencies. The check is simple, and the check can be fooled. Currently, it reads the first 256 kB of the file looking for systemLanguage
attributes. As it finds those attributes, it builds a list of languages the SVG file supports.
If the wikipedia's default language is in that list, then MediaWiki emits a URL that requests that language.
There is logic behind these choices. Most SVG files are not multilingual, and even if they are multilingual, they often do not support many languages. The goal is to avoid building language-specific URLs that do not affect the image. If an SVG file does not support Russian, then it does not make sense to scale and cache a Russian version of the SVG that looks exactly the same as the English version.
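The rules in this section can be written down as a short Python sketch. It follows my unverified description above, not the MediaWiki source: scan only the first 256 kB for systemLanguage attributes, always honor an explicit |lang=, emit nothing on an English wiki, and otherwise emit the wiki language only if the file lists it. All names are hypothetical.

```python
import re

SCAN_LIMIT = 256 * 1024  # the notes say only the first 256 kB is read

# Matches systemLanguage="..." or systemLanguage='...'.
ATTR_RE = re.compile(rb"""systemLanguage\s*=\s*(["'])(.*?)\1""")


def languages_in_svg(svg_bytes):
    """List the langtags found in systemLanguage attributes near the file start."""
    langs = []
    for match in ATTR_RE.finditer(svg_bytes[:SCAN_LIMIT]):
        for tag in match.group(2).decode("utf-8", "replace").split(","):
            tag = tag.strip().lower()
            if tag and tag not in langs:
                langs.append(tag)
    return langs


def lang_prefix(explicit_lang, wiki_lang, file_langs):
    """Return the 'lang[language]-' URL piece, or '' when none is emitted."""
    if explicit_lang:
        # An explicit |lang= is honored even if the file lacks that language.
        return f"lang{explicit_lang.lower()}-"
    if wiki_lang != "en" and wiki_lang in file_langs:
        return f"lang{wiki_lang}-"
    return ""


if __name__ == "__main__":
    svg = b'<svg><switch><text systemLanguage="fr">a</text><text>b</text></switch></svg>'
    print(lang_prefix(None, "fr", languages_in_svg(svg)))
```

The scan limit is also how the check can be fooled: a systemLanguage attribute that first appears after 256 kB is never seen.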
Languages and fonts[edit]
Much like HTML, it is recommended that an SVG file declare its language. It can do that with xml:lang
or lang
attributes. Setting xml:lang
on the toplevel svg
element will tell generic XML tools the language, but it can also add undesired language restrictions to RDF metadata. That can be worked around by adding xml:lang=""
to the metadata
element. Consequently, using lang
is simpler.
Few SVG files declare their language. In many cases, such a declaration is usually extraneous because basic SVG does not depend upon the language. SVG does not have any constructs that format text according to the language. For example, language does not determine whether a number should display as "123.45" or "123,45" or whether a date should display as "3 May 1999" or "May 3, 1999". It would be good if SVG had that ability, but it is not present yet. At best, the language attribute is a hint to programs that read an SVG file, but it does not affect the basic display of SVG text.
- Compare the HTML datetime attribute, which uniformly describes time information but does not format such a string.
- Compare JavaScript formatting functions and the Intl package.
CSS can be sensitive to the language attribute.
Unicode does not specify which character variants should be used. (Well, it does sometimes. Normal "r" and rounded "ꝛ". Normal "d" and insular "ꝺ".[20]) For example, the Latin small letter a has two major variants: double-story a or single-story a. Chinese ideographs have similar variations, and some languages use specific variants. Chinese, Japanese, and Korean may draw the same ideograph differently. Unicode does not distinguish the character, so the font selection must make the change.
CSS can select an appropriate font for a language.
:lang(zh) {font-family: ...; }
:lang(ja) {font-family: ...; }
:lang(ko) {font-family: ...; }
:lang(bn) {font-family: Noto Sans Bengali, ...; }
On WMF servers, the problem is the :lang
selector is not supported by librsvg
. Also, we would want to distinguish between zh-Hans
and zh-Hant
. Unfortunately, old versions of librsvg
only distinguish up to the first hyphen.
There is not a good solution for the systemLanguage
attribute. CSS can do case-insensitive, partial, matches to an IETF langtag:
[systemLanguage|="zh" i] {font-family: ...; }
[systemLanguage|="ja" i] {font-family: ...; }
[systemLanguage|="ko" i] {font-family: ...; }
[systemLanguage|="bn" i] {font-family: Noto Sans Bengali, ...; }
But CSS is not designed to parse comma-separated lists (SVG should have made the systemLanguage
attribute a space separated list just like the class
attribute). Even then, CSS does not have prefix matching (|=
) on space-separated token lists (~=
matching). One can use several selectors to cover the cases, but it is cumbersome.
Languages and layout[edit]
Consider an x-y plot. The x-axis label will be horizontal and handled normally. The y-axis label is often written rotated by 90° with a text anchor of start or middle. That works for Western European languages, but Chinese should not rotate the characters but rather write them top to bottom.
The normal method of producing the y-axis label for a Western European language would be to rotate the text by -90°. The rotation point would be logically on the font baseline. For Chinese, the normal method would not be a rotation but to set the writing-mode
to top-to-bottom. The logical baseline is no longer the bottom of the text but rather the center of the text. If the Western text used a start anchor point, then the Chinese text would use an end anchor point.
CSS can do the transform or set the writing mode, but there are subtle issues. Using the CSS transform
property will trump any transform
attribute on the element (CSS priority). However, CSS would not trump a transform
in a style
attribute. Also, such transforms are applied before a text
element's x
and y
attributes. Coordinates on tspan
elements may be problematic.
The green text in the "Vertical Layout tests" to the right uses CSS to adjust a possible y-axis label. It could use a better Western European language default. The CSS is
:lang(zh-Hans) { font-family: NSimSun, sans-serif; }
:lang(zh-Hant) { font-family: PMingLiU-ExtB, MingLiU_HKSCS-ExtB, Microsoft JhengHei, sans-serif; }
.vert { fill: green; }
.vert:lang(en) { transform: rotate(-90deg); transform-origin: 0px 0px; }
.vert:lang(zh) { writing-mode: tb; text-anchor: end; transform: translate(-0.5em, 0em); }
For English, the text is rotated. For Chinese, the writing-mode
is changed; the text is offset to the left to compensate for the different baselines.
Currently, the WMF rasterizer does not handle the example.
Internationalization (i18n) and localization (l10n)[edit]
Many SVG files are in just one language. Such files should set the xml:lang
or lang
attribute in the svg
element to identify that language. (SVG Translate should look for this attribute to set the default language; otherwise it should ask for the existing language.)
SVG files that use the switch
element and the systemLanguage
attribute are internationalized. One SVG file supports many languages. Such SVG files are also known as multilingual. There are not separate SVG files for each language. It is not clear whether such SVG files should have an xml:lang
or lang
attribute in their svg
element.
There are systems that support many languages but produce output files that are monolingual. The output of these systems is localized (specialized to a specific locale).
MediaWiki uses internationalized/multilingual SVG files to produce localized PNG files. The PNG files that librsvg
produces are not multilingual.
That leads to semantic differences. When MediaWiki displays a multilingual SVG file, it displays the language desired by the wiki, but when I display an SVG file in my browser, my browser displays it in my preferred language.
Graphics editor roundtripping[edit]
A significant problem with i18n files is subsequent graphic editing. Importing an SVG file may not handle systemLanguage
or may cause other damage.
- Inkscape can do it, but human editors may get confused. See File talk:2022 Russian invasion of Ukraine.svg.
- Adobe Illustrator will wreck text alignment: it outputs all text in SVG files as left-justified. See Help:Illustrator and Commons:Graphics village pump.
- CorelDraw is unknown.
- ... are there other significant SVG editors?
A significant limitation of switch
translated SVG is that many graphics editing applications do not handle multilingual files. Most graphics applications have their own file format. They may import and export the SVG file format, but information may be lost during those conversions. We might expect grouping to survive, but we do not expect all SVG id
attributes to survive. If the native file format does not handle translations, then we expect translations to be thrown away. If the import code does not handle switch
, then all translations may be discarded.
The issue may be described as the ability to round trip information. Consider an SVG file with some particular information. If that SVG file is imported into an editing application and then exported, is the information still there?
What do we expect to round trip?
- actual text
- text anchor point
- text anchoring method (start, middle, end)
What do we not expect to round trip?
- id
- class
- data-*
- style
Unfortunately, expectations are not always met. For example, Adobe Illustrator loses the text anchor.
Generally, it is difficult to mark specific text.
Many tools use $nnnn
text.
How about alternatives?
If some text is unique, then use it for the key. The problem is if the same text is added while editing, then the uniqueness is lost.
Use the text as a key, but prefix with a special character (e.g., "$") to mark it as a key.
Industrial option[edit]
Goals
- Want to keep translations
- Want to use any graphics editor
Sanitize SVG[edit]
Currently Commons just checks for safe SVG. For example, Commons rejects an upload if there are any on-*
attributes. An alternative is to just strip unsafe SVG.
There is a Phabricator task for that. Phab:T334953 Introduce an SVG Sanitizer.
Generate monolingual SVG[edit]
XSLT localizer. Transform multilingual SVG to monolingual. It can also strip unneeded namespaces such as inkscape:
(that will not remove properties in style
elements or attributes).
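The heart of a localizer is choosing one clause from each switch. Below is a minimal JavaScript sketch of that choice, not the XSLT itself. The clauses are modeled as plain objects (an assumption for illustration), and the names matchesLanguage and localizeSwitch are hypothetical. The matching follows the SVG 1.1 systemLanguage rule; librsvg and MediaWiki may match differently.

```javascript
// Sketch: choose the clause of a <switch> for one language.
// Each clause is modeled as a plain object { systemLanguage, text };
// systemLanguage is undefined for the default clause.
function matchesLanguage(systemLanguage, userLang) {
  // SVG 1.1 rule: the user's language equals a listed tag, or is a
  // prefix of a listed tag that is followed by "-".
  const user = userLang.toLowerCase();
  return systemLanguage
    .split(",")
    .map((tag) => tag.trim().toLowerCase())
    .some((tag) => tag === user || tag.startsWith(user + "-"));
}

function localizeSwitch(clauses, userLang) {
  // A switch renders the first child whose test passes; a child
  // without systemLanguage always passes, so it acts as the default.
  for (const clause of clauses) {
    if (clause.systemLanguage === undefined ||
        matchesLanguage(clause.systemLanguage, userLang)) {
      return clause.text;
    }
  }
  return null; // no clause matched and there is no default
}
```

Because the first passing child wins, the default clause must come last; a default placed first would hide every translation.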
Special option: SVG to skeleton.
XSLT information
- https://www.informit.com/articles/article.aspx?p=24032&seqNum=8
- http://xmlsoft.org/XSLT/xsltproc.html
- https://gitlab.gnome.org/GNOME/libxslt
- https://packages.debian.org/stretch/xsltproc
- --stringparam PARAMNAME PARAMVALUE[21]
- --novalid: skip loading DTD
- --xinclude: do XInclude processing on input document (do not enable)
- --nonet: do not fetch DTDs or entities over the network
- --output file
Extract translations[edit]
Multilingual SVG to XLIFF.
Reintegrate translation[edit]
Two possibilities:
Skeleton + translations → monolingual SVG (preferred)
Skeleton + translations → multilingual SVG.
MediaWiki language handling[edit]
Explain the lang=
URL parameter on Commons. Does that demand the /lang
in the URL? There are multiple levels here, too. If I'm on a wiki and click an SVG image, it takes me to a File: page on that wiki that displays the wiki's language version. From there, I can click on the Commons link. That takes me to Commons and will display the default language version.
- Phab:T134408 Thumbnail-like rendering of localized SVGs for client-side rendering, 4 May 2016. Early recognition of localizing SVG.
- Phab:T134455 Add experimental option for direct SVG output via srcset, 4 May 2016. Needs a localizer.
- Phab:T134407 Provide a way to reference fonts for client-side SVG rendering, 4 May 2016. CSS would win here.
- Phab:T134482 Beta feature for opt-in client side SVG rendering, 5 May 2016. This seems problematic. Each wiki page would need either some JavaScript to select the SVG or PNG, or there would be an HTTP vary on the user's option that would double the cache requirements.
List of languages[edit]
See Phab:T259018.
MediaWiki API will report the available languages:
- https://commons.wikimedia.org/w/api.php?action=query&titles=File:First%20Ionization%20Energy.svg&prop=imageinfo&iiprop=metadata&formatversion=2&iimetadataversion=latest
- https://commons.wikimedia.org/w/api.php?action=query&titles=File:2022%20Russian%20invasion%20of%20Ukraine.svg&prop=imageinfo&iiprop=metadata&formatversion=2&iimetadataversion=latest
See the metadata: [ {"name": "translations", "value": [] } ]
. It is clearly from the switch
information. It will have entries such as { "name": "en", "value": 2 }
. IIRC, the 1 and 2 values are whether it is a substring match or an exact match. Find the code to be sure.
I'm presuming this metadata is stored in the database rather than triggering a reparsing of the file. Check that out.
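A minimal JavaScript sketch for pulling the language tags out of that metadata. It assumes the metadata array has the shape quoted above (objects with name and value); the function name listTranslations and the sample "fr" entry are made up for illustration, and where the array sits inside the full API response is not shown here.

```javascript
// Sketch: list the language tags reported in the "translations" metadata.
// `metadata` is assumed to be the array of { name, value } objects.
function listTranslations(metadata) {
  const entry = metadata.find((item) => item.name === "translations");
  if (!entry || !Array.isArray(entry.value)) {
    return []; // a file without switch translations
  }
  return entry.value.map((lang) => lang.name);
}

// Hypothetical sample in the shape described above.
const sampleMetadata = [
  { name: "translations", value: [
    { name: "en", value: 2 },
    { name: "fr", value: 2 },
  ] },
];
const sampleLanguages = listTranslations(sampleMetadata);
```

The numeric value of each entry is ignored here because its meaning still needs to be confirmed in the code.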
Templates[edit]
Translation units[edit]
Does not work for numbered items; see File:Steam locomotive scheme new.png and File:Steam locomotive.svg; steam locomotive (Q171043) (which has-parts)
- Firebox firebox (Q549635)
- Ashpan ash-pan (Q722734)
- Water (inside the boiler) water (Q283)
- Smokebox Smokebox (Q954573)
- Cab (several options, none great. e.g., truck cabin (Q224773))
- Tender tender (Q749311)
- Steam Dome steam dome (Q1158778)
- Safety Valve safety valve (Q730458)
- Regulator Valve pressure regulator (Q1260990)
- Superheater Header in smokebox
- Piston piston (Q45227)
- Blastpipe
- Valve Gear
- Regulator Rod
- Drive Frame
- Rear Pony Truck
- Front Pony Truck
- Bearing bearing (Q190100) and Axlebox
- Leaf Spring leaf spring (Q773544)
- Brake shoe brake shoe (Q124366)
- Air brake pump railway air brake (Q1196198) and pump (Q134574)
- (Front) Centre Coupler railway coupling (Q1501648)
- Whistle steam whistle (Q1765082) subclass of whistle (Q204917)
- Sand box
RDF hack[edit]
- Check RDF
- No metadata
- try another
- Check i18n (the W3C Internationalization Checker does not work for SVG.)
- Check i18n (this page!)
string (en) | Wikidata item (en) | Wikidata item (en) |
---|---|---|
Atomic number (Z) | 'atomic number' (Q23809) wdprop: atomic number atomic number units: |
'atomic number' (Q23809) |
First ionization energy [eV] | 'ionization energy' (Q483769) wdprop: ionization energy ionization energy units: (Latn and Cyrl) |
'ionization energy' (Q483769) |
- problems with
- PAGELANGUAGE: en,
- PAGENAME: Glrx,
- FULLPAGENAME: User:Glrx,
- CURRENTCONTENTLANGUAGE: en,
- {{Uselang}}: (differs from Wikimedia), int:lang: en
- filepath:PAGENAME: https://upload.wikimedia.org/wikipedia/commons/1/1d/First_Ionization_Energy.svg
Wikidata (inconsistent templates, broken templates, and no Wikibase module)
- Quick and dirty translation via Wikidata:
- String string, Zeichenkette
- Q-item Wikidata item, Wikidata-Objekt
Wikidata item, en, ?uselang=
- Q19557: 'alkali metal' (Q19557): 'alkali metal' (Q19557), Template:Wikidata description
- Q19563: alkaline earth metal: [[:d:Q19563|alkaline earth metal (Q19563)]], Template:Wikidata description
- Q19609: noble gases: noble gases (Q19609), Template:Wikidata description
Problem file[edit]
File:2022 Russian invasion of Ukraine.svg is an important map on Commons, but it is a mess. The map is needed in many languages, and how those translations are handled is a difficult issue. There are many localized versions of the map, but they may not get continuing updates to the original file. The conflict is active, so updates are desired.
The file can be improved in several ways, but some improvements may make the file difficult to edit. There are tradeoffs, and this file shows some of the problems.
Planar translations[edit]
The original map now has some multilingual additions, but they are essentially planar translations that SVG Translate and other translation tools cannot handle.
Done Unwinding planar translations is tough. One needs to match the text elements by their position, but the positions may have moved slightly.
Now that the file uses the translation units that SVG Translate wants, several languages have been added. Some additional translations have been so close in time that I feared they would overwrite each other, but it does not appear that happened. SVG Translate may have a significant update model that allows concurrent translations.
Inkscape[edit]
The author of this SVG file uses Inkscape, so SVG Translate and hand edits to the file should not prevent the author from making changes. If the author has trouble, then it is important to list those troubles. It is possible the author will have trouble with class
attributes.
A significant problem is that users do not know how to add new date labels and text. Copying some text and then editing often produces confused translations or untranslatable text. If an entire switch
is copied, then the English text is changed but the default text and all the other languages stay the same. The translations are confused. If just the text
element is copied, then it also carries the systemLanguage
attribute. That attribute prevents SVG Translate from translating the text. The best approach is to just insert new text; do not copy it from elsewhere in the image.
There are also strange edits that appear.
A switch
element may contain several unrelated (and ultimately undisplayable) text
elements. This may come about from copying the text elements. The copy somehow ends up within the switch
. It should not display on the screen, so it would confuse the user.
Geometry elements are being inserted into switch
elements that should contain only text
elements. What determines where Inkscape will insert a new element? It should be treating a switch
atomically. I have deleted several spurious geometry elements already, and now there are more:
<switch fill="#ffffff" transform="translate(1827.3,587.38)" id="switch4938">
<rect style="display:inline;opacity:0.948718;fill:#dc0000;fill-opacity:1;stroke:#000000;stroke-width:0.245063;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:1.92453;stroke-opacity:1"
id="rect352325-8-9-2-1-4-3-9-29-8-3"
width="25.413" height="5.2650332"
x="-12.691059" y="-3.4384575" ry="1.2425818"
transform="rotate(0.33424498)"/>
<text id="trsvg995" systemLanguage="fr"><tspan id="trsvg798">1er avril</tspan></text>
<text id="trsvg996-tr" systemLanguage="tr"><tspan id="trsvg799-tr">1 Nisan</tspan></text>
<text id="trsvg996-it" systemLanguage="it"><tspan id="trsvg799-it">1º aprile</tspan></text>
<text id="trsvg996-ru" systemLanguage="ru"><tspan id="trsvg799-ru">1 апреля</tspan></text>
<text id="trsvg996-pt" systemLanguage="pt"><tspan id="trsvg799-pt">1 de Abril</tspan></text>
<text id="trsvg996-el" systemLanguage="el"><tspan id="trsvg799-el">1 Απριλίου</tspan></text>
<text id="trsvg996-ca" systemLanguage="ca"><tspan id="trsvg799-ca">1 d'abril</tspan></text>
<text id="trsvg996-vi" systemLanguage="vi"><tspan id="trsvg799-vi">1 tháng 4</tspan></text>
<text id="trsvg996"><tspan id="trsvg799">1 April</tspan></text>
</switch>
Done The rect
element will prevent any display of text. Also notice that the systemLanguage="en"
clause was removed; it was probably replaced with the rect
element. There is also the sneaky rotate by less than 1 degree transform. Inkscape is also inserting copious style information.
Done Also, instead of editing a symbol definition, the use
was exploded and the result edited in place.
Colors[edit]
Many people want the map colors changed. One concern was using web safe/colorblind-friendly colors. Consistent (and easily changed) colors can be done with styles.
Place names[edit]
The map is already large. There are hundreds of community names on the map. That presents the same translation bloat problem that a 100-language version of a US map presents. The map should use a skeleton file that is localized with a database of translations. WMF does not have that capability for SVG files. SVG also does not have an easy line-breaking method.
Need to work with what we have today. To keep the file size down, the switch
elements are given or inherit styling from class="place"
. That allows the fill color, font family, and text-anchor to be specified in one place rather than repeated on each element. The font size is also given or inherited. The font size is a function of the city's population. The text position is also specified on the switch
element so it need not be repeated for each translation.
Finding place names[edit]
Using WikiData to translate place names is complicated by difficult-to-resolve Ukrainian place names. For example, "Pershotravneve" maps to more than 30 WikiData items.[22] To automate the search, the name should be attached to a map point; that practice is not common on SVG maps. The projection parameters can be found by following sources back to File:Ukraine adm location map.svg; the base map claims to be an equirectangular projection that includes administrative regions. The SVG size is 1,546 × 1,038. Then invert that point with the map projection to get a latitude and longitude of the community. Then do the WikiData query that coincides with that position.
Equirectangular projection, vertical stretching 150 % | ||
---|---|---|
| 52.7 (N) | |
21.5 (W) | ←↕→ | 40.7 (E) |
| 44.1 (S) | |
Info This map is part of a series of location maps with unified standards: SVG as file format, standardised colours and name scheme. The boundaries on these maps always show the de facto situation and do not imply any endorsement or acceptance. In case of changes of the shown area the file is updated. The old version will be uploaded as a new file and thus is still available.
The file is 2,199 × 1,478 px. Radekhiv is at (350.01 px, 413.3 px) → 50.3° N, 24.56° E. Google Maps says 50.28° N, 24.60° E.[23] The WikiData item is Radekhiv (Q904046); its coordinate location (P625) is 50°16′58″N 24°38′15″E.
The vertical stretching comment of 150% is the same as shrinking the horizontal by 2/3. That gives the standard parallels as $\varphi_1 = \pm\arccos(2/3) \approx \pm 48.1897^\circ$.
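The inversion can be sketched in a few lines of JavaScript. This assumes the edges of the 2,199 × 1,478 px file coincide exactly with the stated bounds (52.7 N, 44.1 S, 21.5 E, 40.7 E) and that both axes are linear, which is what an equirectangular projection implies. The names MAP_BOUNDS and pixelToLatLon are hypothetical.

```javascript
// Sketch: invert an equirectangular map from pixels to degrees.
// Assumption: the image edges coincide with the stated map bounds.
const MAP_BOUNDS = {
  width: 2199, height: 1478,   // file size in px
  north: 52.7, south: 44.1,    // latitude of top and bottom edges
  west: 21.5, east: 40.7,      // longitude of left and right edges
};

function pixelToLatLon(x, y, map = MAP_BOUNDS) {
  // Longitude grows to the right; latitude shrinks as y grows downward.
  const lon = map.west + (x / map.width) * (map.east - map.west);
  const lat = map.north - (y / map.height) * (map.north - map.south);
  return { lat, lon };
}

// Radekhiv's circle from the map.
const radekhiv = pixelToLatLon(350.01, 413.3);
```

For Radekhiv this reproduces the 50.3° N, 24.56° E quoted above to two decimal places, which is close enough to drive a position-based Wikidata query.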
Locations use circles; it might be better to use symbols.
<g fill="#ff4" stroke="#777" stroke-width=".71">
<circle cx="950.74" cy="379.56" r="2.49"/>
<circle cx="246.42" cy="424.61" r="2.49"/>
<circle cx="350.01" cy="413.3" r="2.49"/>
<circle cx="340.11" cy="175.71" r="2.49"/>
<circle cx="1252.6" cy="439.46" r="2.49"/>
<circle cx="1283.4" cy="500.98" r="2.49"/>
<circle cx="288.49" cy="210.71" r="2.49"/>
<circle cx="297.69" cy="259.51" r="2.49"/>
<circle cx="307.23" cy="319.26" r="2.49"/>
<circle cx="372.29" cy="378.3" r="2.49"/>
<circle cx="463.15" cy="243.6" r="2.49"/>
<circle cx="527.85" cy="150.26" r="2.49"/>
<circle cx="1596.7" cy="465.5" r="2.49"/>
<circle cx="1671.8" cy="477.12" r="2.49"/>
</g>
<g fill="#ff4" stroke-width=".71">
<g stroke="#777">
<circle cx="1598.5" cy="574.58" r="2.49"/>
<circle cx="1648.1" cy="611.47" r="2.49"/>
<circle cx="1687.9" cy="570.18" r="2.49"/>
<circle cx="1782.4" cy="651.77" r="2.49"/>
<circle cx="1722.5" cy="661.25" r="2.49"/>
<circle cx="1722.1" cy="533.15" r="2.49"/>
<circle cx="1700.9" cy="655.44" r="3.2"/>
<circle cx="1540.9" cy="377.32" r="2.49"/>
</g>
<circle cx="1577.5" cy="330.57" r="2.49" stroke="#787877"/>
<g stroke="#777">
<circle cx="1374.1" cy="333.64" r="2.49"/>
<circle cx="1330.1" cy="312.07" r="2.49"/>
<circle cx="1464.5" cy="261.7" r="2.49"/>
</g>
</g>
Sensible grouping may be done by finding a location circle and then finding nearby text. Alternatively, locate all circles near some text. The grouping also allows translation issues to be detected. For example, the anchor point of some text may need to be moved if a translation is significantly longer or shorter than the original.
<g>
<circle class="city" r="3" />
<text class="city" x="10" y="0">City Name</text>
</g>
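The circle-to-text pairing could be sketched as a nearest-neighbor search. This JavaScript sketch assumes the circles and text anchors have already been reduced to plain coordinates in the same coordinate system (all transforms resolved); the names nearestLabel and groupCities and the maxDistance cutoff are hypothetical.

```javascript
// Sketch: pair each location circle with the nearest text anchor.
// circles: [{ cx, cy }], labels: [{ text, x, y }], same coordinate system.
function nearestLabel(circle, labels) {
  let best = null;
  let bestDistance = Infinity;
  for (const label of labels) {
    const distance = Math.hypot(label.x - circle.cx, label.y - circle.cy);
    if (distance < bestDistance) {
      best = label;
      bestDistance = distance;
    }
  }
  return best === null ? null : { label: best, distance: bestDistance };
}

function groupCities(circles, labels, maxDistance) {
  // A circle with no label within maxDistance stays unpaired (label: null).
  return circles.map((circle) => {
    const hit = nearestLabel(circle, labels);
    const label = hit && hit.distance <= maxDistance ? hit.label : null;
    return { circle, label };
  });
}
```

Nothing here stops one label from being claimed by two circles, so crowded areas would still need a manual check.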
The map may use a more sensible grouping of communities within districts.
Would like to detect content that is a date.
The text should use class
attributes and CSS for the formatting. Map (font, font size, color) → class.
Several g
elements are used to default the font size (or other formatting characteristics) of their contained text
elements. Unwinding those groups is a difficult problem. Perhaps detect a group that has presentation attributes and only text
children.
<g font-family="Calibri" font-size="3.27" font-weight="bold" stroke-width=".61">
<text x="923.1" y="241.1">1 April</text>
<text x="1180.34" y="253.12">1 April</text>
<text x="1133.34" y="158.79">2 April</text>
<text x="1372.74" y="238.32">2 April</text>
<text x="1446.72" y="159.12">4 April</text>
<text x="936.76" y="345.97">30 March</text>
<text x="983.05" y="390.31">31 March</text>
<text x="1047.81" y="215.22">31 March</text>
<text x="1180.34" y="204.35">1 April</text>
</g>
Done There are several screwy transform
attributes. For example, transform="scale(1.000,1)"
. Some other transforms have rotations of a fraction of a degree. The effective rotations are small enough that they can be ignored (except for the additional translation they introduce). Some matrices have a similar small rotation. There is even the bizarre:
<circle transform="scale(1 -1)" cx="1291.2" cy="-261.91" r="2.49"/>
<circle cx="1128.4" cy="255.56" r="2.49" fill-rule="evenodd" stroke-linecap="round" stroke-linejoin="round"/>
Done There are text
elements that are not stroked but have stroking attributes.
Done There are switch
elements that have a single default clause. Other editors have used SVG Translate on the file, so the problem has disappeared.
Done Bombing locations should be symbols, but they may not even be grouped. These use relative coordinates, so the path matching may be easier than expected. The DOM dropped path primitives.
<path d="m791.33 600.27 5.164 3.958 1.051-3.741 1.484 3.803 4.113-3.556-2.226 4.607 3.525-.124-2.288 2.876 3.061 1.979-3.649.062 3.216 3.865-4.607-2.134-.495 4.143-2.257-3.34-3.154 4.298.309-5.38-4.298.866 2.69-3.525-2.196-3.587 3.061.588z" fill="red"/>
<path d="m792.98 602.35 3.877 2.972.789-2.809 1.114 2.856 3.088-2.67-1.672 3.459 2.647-.093-1.718 2.159 2.299 1.486-2.74.047 2.415 2.902-3.459-1.602-.372 3.111-1.695-2.507-2.368 3.227.232-4.04-3.227.65 2.02-2.647-1.648-2.693 2.299.441z" fill="#ff8000"/>
<path d="m794.64 604.5 2.49 1.908.507-1.804.716 1.834 1.983-1.715-1.074 2.221 1.7-.06-1.103 1.387 1.476.954-1.759.03 1.551 1.864-2.222-1.029-.239 1.998-1.088-1.61-1.521 2.072.149-2.594-2.073.418 1.297-1.7-1.059-1.729 1.476.283z" fill="#ff0"/>
<path d="M798.43 608.4a.777.777 0 1 1-1.553 0 .777.777 0 1 1 1.553 0z"/>
The last path uses two arcs to make a circle of radius 0.777 and diameter 1.553. It would give the center of the bombing location and the presumptive symbol origin.
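A minimal JavaScript sketch for recovering that center from the path data. It only handles the form seen here (an absolute M followed by a relative a arc) and assumes the two arcs are half circles, so the center is the midpoint of the first arc's chord. The names PATH_NUMBER and circleFromArcPath are hypothetical.

```javascript
// Sketch: recover the center of a circle drawn as two relative arcs.
// Matches compact SVG numbers such as ".777.777" and "1-1.553".
const PATH_NUMBER = /[+-]?(?:\d*\.\d+|\d+\.?)(?:[eE][+-]?\d+)?/g;

function circleFromArcPath(d) {
  // Only handle "M x y a rx ry rotation large-arc sweep dx dy ...".
  const match = /^\s*M([^a-zA-Z]*)a([^a-zA-Z]*)/.exec(d);
  if (!match) return null;
  const start = (match[1].match(PATH_NUMBER) || []).map(Number);
  const arc = (match[2].match(PATH_NUMBER) || []).map(Number);
  if (start.length < 2 || arc.length < 7) return null;
  const [x0, y0] = start;
  const dx = arc[5]; // relative end point of the first arc
  const dy = arc[6];
  // Assumption: the first arc is a half circle, so its chord is a diameter.
  return { cx: x0 + dx / 2, cy: y0 + dy / 2, r: Math.hypot(dx, dy) / 2 };
}

const bombCenter = circleFromArcPath(
  "M798.43 608.4a.777.777 0 1 1-1.553 0 .777.777 0 1 1 1.553 0z");
```

For the path above the center comes out at about (797.65, 608.4); the three star paths return null because they have no arc.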
It may be better to localize the file with XSLT, fix some issues, and then restart the translations.
Source data[edit]
See w:Template:Russo-Ukrainian War detailed map and w:Module:Russo-Ukrainian War detailed map. These maps are made with technology on the English Wikipedia. The air bases and nuclear installations have names. Cities are listed with latitude and longitude by oblast. Labels are wikilinks such as Zelenodolsk, and following that link produces the WikiData item Zelenodolsk (Q640713). The diagram has lost a lot of information.
The module has some data in an apparent Lua object. I do not know if the data is available as JSON.
- located in the administrative territorial entity (P131) (allows sorting by Oblast?)
- coordinate location (P625) (allows checking location)
- population (P1082) (allows checking dot size)
How do I find which items have links to a Wikipedia article?
What is the best query approach?
Wikidata API queries[edit]
From a wiki article, find the Q-item? See https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bwbentityusage
returns
{
  "batchcomplete": "",
  "query": {
    "pages": {
      "33276544": {
        "pageid": 33276544,
        "ns": 0,
        "title": "Zelenodolsk, Ukraine",
        "wbentityusage": {
          "Q10172305": { "aspects": [ "S" ] },
          "Q640713": { "aspects": [ "C", "D.en", "O", "S", "T" ] }
        }
      }
    }
  }
}
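The wbentityusage result lists every entity the page uses, so it does not single out the article's own item. A more direct route is the pageprops query, whose wikibase_item property names the connected item. Below is a JavaScript sketch that builds such a request and reads a response; it assumes the default JSON layout with pages keyed by page ID (as in the output above), the sample response is hand-written for illustration, and the function names are hypothetical.

```javascript
// Sketch: ask for the Wikidata item connected to an article.
function pagePropsUrl(title, api = "https://en.wikipedia.org/w/api.php") {
  const params = new URLSearchParams({
    action: "query",
    prop: "pageprops",
    ppprop: "wikibase_item",
    titles: title,
    format: "json",
  });
  return `${api}?${params}`;
}

// Map each returned page title to its Q-item (or null if unconnected).
function wikibaseItems(response) {
  const pages = Object.values((response.query && response.query.pages) || {});
  const result = {};
  for (const page of pages) {
    result[page.title] = (page.pageprops && page.pageprops.wikibase_item) || null;
  }
  return result;
}

// Hand-written sample in the assumed response shape.
const sampleResponse = {
  query: { pages: { "33276544": {
    pageid: 33276544, ns: 0, title: "Zelenodolsk, Ukraine",
    pageprops: { wikibase_item: "Q640713" },
  } } },
};
const sampleItems = wikibaseItems(sampleResponse);
```

Several titles can be batched in one request by joining them with "|" in the titles parameter.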
Can a SPARQL query find which item has a link to the article?
Position-based SPARQL query[edit]
Find a settlement using its latitude and longitude.
For example, Zelenodolsk is at Point(33.652359815 47.563096347)
. Find the settlements near that point:
#title: places in Ukraine near a coordinate
# SELECT ?place ?placeLabel ?location WHERE {
#   wd:Q640713 wdt:P625 ?coord.                      # coordinates of the location
#   ?place wdt:P17 wd:Q212;                          # country: Ukraine
#          wdt:P625 ?location.
#   FILTER(geof:distance(?location, ?coord) < 10).   # less than 10 km away
SELECT DISTINCT ?place ?placeLabel ?oblastLabel ?location ?distance WHERE {
  BIND("Point(33.652359815 47.563096347)"^^geo:wktLiteral AS ?coord) .
  ?place wdt:P31/wdt:P279* wd:Q12051488 .   # populated place in Ukraine
  # ?place wdt:P131* ?oblast .
  # ?oblast wdt:P31 wd:Q3348196 .           # located in Ukrainian oblast
  # Search by nearest
  SERVICE wikibase:around {
    ?place wdt:P625 ?location .
    bd:serviceParam wikibase:center ?coord .
    bd:serviceParam wikibase:radius "10" .
    bd:serviceParam wikibase:distance ?distance .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ?distance
Inverted SPARQL query[edit]
Alternatively, invert the problem. Get a list of human settlements in Ukraine and use that list to match the names. This query takes less than 4 seconds and returns 30,000 results. It does not find Kyiv because Kyiv is not located in an Oblast — like Washington D.C. is not located within a state. So the Oblast could be optional. Furthermore, not all settlements have a population. If I do not acquire population, then the query takes 30 seconds.
#title: populated places in Ukraine
# -> 30,000 results w/o population, 6000 w population, 1700 w pop >= 1000
SELECT DISTINCT ?place ?placeLabel ?oblastLabel ?location ?population ?native WHERE {
  # populated place in Ukraine
  ?place wdt:P31/wdt:P279* wd:Q12051488 .
  # coordinates for that place
  ?place wdt:P625 ?location .
  # try to get the population
  OPTIONAL {
    ?place wdt:P1082 ?population .
    # FILTER (?population >= 200000) .
  }
  # try to get the oblast
  OPTIONAL {
    ?place wdt:P131* ?oblast .        # located in an administrative region
    ?oblast wdt:P31 wd:Q3348196 .     # that is a Ukrainian oblast
  }
  # get the native name of the settlement
  OPTIONAL { ?place wdt:P1705 ?native . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
ORDER BY ?placeLabel
There are more issues for oblasts. Several settlements are repeated because their oblast changed over time. Consequently, start times and end times for administrative regions are important. Is there an easy way to screen for outdated oblasts?
The name matching does not work well. The map has about 600 place names, but only 249 matches are found. Many settlements do not have a native label (they may have a Ukrainian label). In addition, the English spelling used on the map does not always match the WikiData label. Approximate string matching may help.
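One form of approximate matching is edit distance. The JavaScript sketch below uses plain Levenshtein distance on lowercased names; the threshold of 2 is an arbitrary assumption, and the function names are hypothetical. It is a starting point rather than a tuned matcher for Ukrainian transliterations.

```javascript
// Sketch: approximate name matching with Levenshtein edit distance.
function editDistance(a, b) {
  // prev[j] is the distance between the first i-1 chars of a and first j of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the candidate labels within maxDistance, closest first.
function bestMatches(name, candidates, maxDistance = 2) {
  const key = name.toLowerCase();
  return candidates
    .map((label) => ({ label, distance: editDistance(key, label.toLowerCase()) }))
    .filter((m) => m.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance);
}
```

With this, "Kiev" on the map would match a "Kyiv" label at distance 2. Combining the name score with the position check from the coordinate query would cut down false matches among the many repeated village names.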
Fixes[edit]
I made some fixes to the file, and there were surprises. Several graphic elements had been merged or shuffled, and it takes a lot of work to find even simple cases. It is tedious work by hand. Another problem with working on a frequently updated file: new revisions. A recent revision caught me half-way through doing some housekeeping edits. Now I need to figure out how to merge them. That is further complicated by Inkscape's verbose output: one attribute per line (with the addition of an id
attribute to every element). It is tougher to edit the file by hand. Time to run it through an XML pretty printer. I cannot really complain. Inkscape maintained the file structure and even the XML comments of my most recent upload. More importantly, the recent edit added content.
Another realization is another SVG Translate issue. Most of the file is a planar translation. It has a high-level switch
with the separate planes as g
elements. SVG Translate leaves the complicated groups with systemLanguage
attributes alone, but it apparently processes the default clause. That processing includes adding switch
translations to every text
element.
I got caught again. This time, the file was changed with SVG Translate (the legend is now a good target for SVG Translate, but not the rest of the file) while I was working on some changes.
Planning other fixes....
Locations[edit]
Done The circles used to display cities are inconsistent. There are several radii to represent the size of the city, but the Ukrainian cities have a gray stroke while the Russian-held cities do not have a stroked border:
<circle cx="1597.2" cy="395.17" r="2.49" fill="#ff4" stroke="#777" stroke-width=".71"/>
<circle cx="1680.7" cy="408.34" r="2.49" fill="red"/>
Done The stroke width is almost exclusively 0.71, but there are some cases with 1.09, 0.5, and 0.41. Some CSS would be neater and allow quickly adding a border:
circle.uk {fill: #ff4; stroke: #777; stroke-width: .71px; }
circle.ru {fill: red; stroke: none; stroke-width: .71px; }
<circle cx="1597.2" cy="395.17" r="2.49" class="uk"/>
<circle cx="1680.7" cy="408.34" r="2.49" class="ru"/>
Done The Russian fill is usually red, but sometimes it is #fa2c29.
- #fa2c29
- #ff0000
Just using red seems reasonable.
Done Date label fills use yellow and a darker red:
- #ff0
- #dc0000
Much of the placename text is a blue, #04a.
The placename text usually is the same as the placenames used in the English version. Just use the English placenames and then add back the few changes (e.g., French uses Kiev).
The biggest problem with placenames is the dot size and the font size. Those sizes reflect the population, but consistent handling of those items is tough. In addition, some placename text may need different text anchors. Putting a size value in the class
would work to set the font size, but it may not work for SVG 1.1 circle
elements. The r
radius is a geometry property that can be set with CSS in SVG 2.0, but it is just an attribute in SVG 1.1.[24]
The issue of dot size.
Population | Dot Size | Possible r | Label Size | Possible font-size | Contested city size |
---|---|---|---|---|---|
Capital | Size: 35 | 8.71 | label size: 140 | 17.79 | |
Population 1M + | Size: 28 | label size: 130 | |||
Population 500K + | Size: 24 | label size: 120 | |||
Population 200K + | Size: 20 | label size: 110 | |||
Population 100K + | Size: 16 | label size: 100 | |||
Population 50K + | Size: 14 | label size: 90 | |||
Population 20K + | Size: 12 | label size: 80 | |||
Population 10K + | Size: 10 | label size: 70 | |||
Population 5K + | Size: 8 | label size: 60 | |||
Population < 5K | Size: 6 | label size: 0 or 50 |
--Towns & Villages -- Dotsize vs. Population --Arranged by Oblasts, then cities, alphabetical order
Locations
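The population thresholds in the table translate directly into a lookup. This JavaScript sketch returns the module's dot size and label size, not the final SVG r or font-size (only the capital row has candidate values for those). It assumes "5K +" means a population of at least 5,000 and picks 50 for the "0 or 50" label size; the names SIZE_TABLE, CAPITAL_SIZE, and sizeFor are hypothetical.

```javascript
// Sketch: look up dot size and label size from population.
// Rows are ordered from largest threshold to smallest.
const SIZE_TABLE = [
  { minPopulation: 1000000, dot: 28, label: 130 },
  { minPopulation: 500000, dot: 24, label: 120 },
  { minPopulation: 200000, dot: 20, label: 110 },
  { minPopulation: 100000, dot: 16, label: 100 },
  { minPopulation: 50000, dot: 14, label: 90 },
  { minPopulation: 20000, dot: 12, label: 80 },
  { minPopulation: 10000, dot: 10, label: 70 },
  { minPopulation: 5000, dot: 8, label: 60 },
  { minPopulation: 0, dot: 6, label: 50 }, // table says label size 0 or 50
];
const CAPITAL_SIZE = { dot: 35, label: 140 };

function sizeFor(population, isCapital = false) {
  if (isCapital) return CAPITAL_SIZE;
  // First row whose threshold is met; fall back to the smallest row.
  const row = SIZE_TABLE.find((r) => population >= r.minPopulation)
    || SIZE_TABLE[SIZE_TABLE.length - 1];
  return { dot: row.dot, label: row.label };
}
```

The result could feed a class name such as a size bucket, which works for font-size in CSS even where r must remain an attribute.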
Styling with class[edit]
I would like to use class
attributes and CSS to set styling. I did that in the map legend, but some web searches suggest that it is difficult to use class/CSS formatting in Inkscape. I need to find out more to avoid making the file difficult for others to edit.
Some comments suggested that class
must be set in the XML editor (which might be daunting for many editors and have substantial peril). In addition, changing the class
may not cause Inkscape's visual display to be updated. How does Inkscape handle styling? There were also comments about using Inkscape styling extensions, but extensions are not a good route.
Date text[edit]
Done The date text does vary among versions, but the translations are direct. A wholesale use of the systemLanguage="en"
group followed by editing the dates should work.
Dates are done in Calibri bold. The date text depends on the background. Russian dates are white, Ukrainian dates are black. Unfortunately, librsvg
does not handle class conjunctions:[25]
text.date { font-family: Calibri; font-weight: bold; font-size: 3.27px; text-anchor: middle; }
text.date.ru { fill: #FFF; }
text.date.uk { fill: #000; }
Dates and background rects[edit]
Done A date label was made with a rect
element for the background and a text
element for the date. I changed the rect
elements to use the #labelru
and #labeluk
symbols. I also paired the symbols with their corresponding text, so the SVG now looks like:
<use xlink:href="#labelru" x="961.3" y="378.84" />
<switch fill="#fff" transform="translate(965.88, 382.3)">
<text systemLanguage="en"><tspan>25 February</tspan></text>
<text systemLanguage="fr"><tspan>25 février</tspan></text>
<text systemLanguage="tr"><tspan>25 Şubat</tspan></text>
<text><tspan>25 February</tspan></text>
</switch>
Done The text does not use text-anchor="middle"
, so the "25 Şubat" will skew to the left.
Done The text x-coordinate should be shifted to the midpoint of the use
element. That would be the x of the use element plus half the width of the label symbol.
Ideally, the origin of the symbol and the midpoint of the text would coincide. The #labelru
and #labeluk
symbols can be shifted to use the same origin as the text.
Filters would be a better way to handle the rect. Rather than having a separate use
, the switch
or the text
could use a filter instead. The filter could even adjust to the length of the date. The support for filter
may be troublesome. The best method may use feImage
that points to SVG for a rounded rectangle image, but I doubt there is reasonable support for that construct. Using rectangles would have good support, but it would have sharp corners.
Figure: rect and filter methods
Ugh. It is clear that bounding boxes for text elements are not computed correctly.
Dates and groups[edit]
Placing both elements in a group would allow positioning both. Such a grouping may be confusing to others.
Several labels have the same date and consequently redundant translations. A simplification would be to put each date into a symbol where it would be translated once. It could even be used in both symbols:
<use xlink:href="#labelru" ...>
<use xlink:href="#april_15" fill="#fff"/>
<use xlink:href="#labeluk" ...>
<use xlink:href="#april_15" fill="#000"/>
That change may also be confusing to others.
Dates and automatic translations[edit]
The Intl package can format international dates.[26]
const date = new Date(2022, 2, 15); // months are zero-based, so 2 is March
new Intl.DateTimeFormat("de", {day: "numeric", month: "long"}).format(date); // "15. März"
There are some issues: "en"
→ "March 15", "en-GB"
→ "15 March". For German, a period is added after the day. Hand translation for French gives "1er avril".[27] For Italian, "1º aprile".[28] The flourishes are only for the first day of the month.
It would be better if the dates were generated automatically rather than manually translated.
I use the Date object to parse the default date (Date.parse(el.textContent + " 2022")
). Then I use the Intl package to compare the dates in the switch
element clauses.
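That comparison can be sketched as follows. The clauses are modeled as plain { lang, text } objects rather than DOM elements (an assumption so the sketch runs outside a browser), and formatDayMonth and checkClauses are hypothetical names. Parsing a string like "15 March 2022" with Date.parse is implementation-dependent, so this relies on the engine accepting that form.

```javascript
// Sketch: compare hand translations in a switch with Intl output.
function formatDayMonth(lang, date) {
  return new Intl.DateTimeFormat(lang, { day: "numeric", month: "long" })
    .format(date);
}

// defaultText is the default clause, e.g. "15 March"; the year is supplied
// separately because the labels omit it.
function checkClauses(defaultText, clauses, year = 2022) {
  const time = Date.parse(`${defaultText} ${year}`);
  if (Number.isNaN(time)) return null; // default text is not a date
  const date = new Date(time);
  return clauses.map(({ lang, text }) => {
    const expected = formatDayMonth(lang, date);
    return { lang, text, expected, matches: text === expected };
  });
}
```

A mismatch is only a flag for review, not an error: hand translations such as "1er avril" are deliberate flourishes that the formatter does not produce.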
From a data standpoint, a more sensible default clause would use an ISO date format.
<switch>
<text systemLanguage="en">15 March</text>
<text>2022-03-15</text>
</switch>
Unfortunately, that would confuse SVG Translate. Going from English to another language would present "15 March", but going from default would present "2022-03-15".
Copies and strange transformations[edit]
I'm seeing strange changes to the SVG. Notice the y="31.370001"
. That suggests the number 31.37 was bumped by a single-float epsilon. Furthermore, transform="rotate(9.267) translate(1485.7, Y)"
was rewritten as transform="rotate(9.267,-957.72641,9148.3009)"
. It is rotating the origin to a desired location!
Gross check (needs work):
1460.7007169041, 273.62571979975
1460.7006854713, 273.62571624018
Is the rewrite done by Inkscape or by SVG Translate?
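The gross check can be repeated with a few matrix helpers. The JavaScript sketch below stores a transform as the six numbers of an SVG matrix(a b c d e f); the helper names are hypothetical. It maps the switch's origin through the rewritten rotate(9.267,-957.72641,9148.3009) and then undoes the rotation to recover the translation of the original rotate-then-translate form.

```javascript
// Sketch: check that a rotate-about-a-point matches rotate + translate.
// A transform is [a, b, c, d, e, f] as in SVG matrix(a b c d e f):
// (x, y) maps to (a*x + c*y + e, b*x + d*y + f).
function rotateAbout(degrees, cx = 0, cy = 0) {
  const angle = (degrees * Math.PI) / 180;
  const cos = Math.cos(angle);
  const sin = Math.sin(angle);
  // Same as translate(cx, cy) rotate(degrees) translate(-cx, -cy).
  return [cos, sin, -sin, cos,
    cx - cx * cos + cy * sin,
    cy - cx * sin - cy * cos];
}

function translate(tx, ty) {
  return [1, 0, 0, 1, tx, ty];
}

// Product m * n: n is applied first, as in the transform list "m n".
function multiply(m, n) {
  return [
    m[0] * n[0] + m[2] * n[1], m[1] * n[0] + m[3] * n[1],
    m[0] * n[2] + m[2] * n[3], m[1] * n[2] + m[3] * n[3],
    m[0] * n[4] + m[2] * n[5] + m[4], m[1] * n[4] + m[3] * n[5] + m[5],
  ];
}

function apply(m, [x, y]) {
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}

// Where the rewritten transform puts the switch's origin.
const originImage = apply(rotateAbout(9.267, -957.72641, 9148.3009), [0, 0]);
// Undo the plain rotation to recover the translation of the original form.
const recovered = apply(rotateAbout(-9.267), originImage);
```

The origin lands near (1460.70, 273.63), in line with the gross check figures, and the recovered x translation is 1485.7 to within rounding. That supports reading the rewrite as an equivalent transform rather than a change in position.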
<use xlink:href="#labelru" transform="rotate(9.267)" x="1473" y="31.370001" id="use4827" width="100%" height="100%"/>
<switch fill="#ffffff" transform="rotate(9.267,-957.72641,9148.3009)" id="switch4839">
<text systemLanguage="en" id="trsvg973"><tspan id="trsvg776">25 February</tspan></text>
<text systemLanguage="fr" id="trsvg974"><tspan id="trsvg777">25 février</tspan></text>
<text id="trsvg975-tr" systemLanguage="tr"><tspan id="trsvg778-tr">25 Şubat</tspan></text>
<text id="trsvg975-it" systemLanguage="it"><tspan id="trsvg778-it">25 febbraio</tspan></text>
<text id="trsvg975-ru" systemLanguage="ru"><tspan id="trsvg778-ru">25 февраля</tspan></text>
<text id="trsvg975-pt" systemLanguage="pt"><tspan id="trsvg778-pt">25 de Fevereiro</tspan></text>
<text id="trsvg975-el" systemLanguage="el"><tspan id="trsvg778-el">25 Φεβρουαρίου</tspan></text>
<text id="trsvg975"><tspan id="trsvg778">25 February</tspan></text>
</switch>
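The rewritten transform can be checked numerically. A rotation about a center, rotate(a, cx, cy), is the same as rotate(a) translate(tx, ty) for a suitable (tx, ty). The sketch below computes that offset and applies both forms to a sample point. The function names are mine, not part of any API. With the center from the rewritten switch element, tx comes out close to the original 1485.7.

```javascript
// Sketch: find (tx, ty) such that rotate(a, cx, cy) equals rotate(a) translate(tx, ty).
function rotateAboutToRotateTranslate(angleDeg, cx, cy) {
  const a = (angleDeg * Math.PI) / 180;
  const cos = Math.cos(a);
  const sin = Math.sin(a);
  // rotate(a, cx, cy) maps p to R p + (c - R c).
  // rotate(a) translate(t) maps p to R (p + t) = R p + R t.
  // Equating the offsets gives t = R^-1 c - c.
  return {
    tx: cos * cx + sin * cy - cx,
    ty: -sin * cx + cos * cy - cy,
  };
}

// Apply rotate(a, cx, cy) to a point.
function applyRotateAbout(angleDeg, cx, cy, [x, y]) {
  const a = (angleDeg * Math.PI) / 180;
  const cos = Math.cos(a);
  const sin = Math.sin(a);
  return [
    cos * (x - cx) - sin * (y - cy) + cx,
    sin * (x - cx) + cos * (y - cy) + cy,
  ];
}

// Apply rotate(a) translate(tx, ty) to a point (translate first, then rotate).
function applyRotateTranslate(angleDeg, tx, ty, [x, y]) {
  const a = (angleDeg * Math.PI) / 180;
  const cos = Math.cos(a);
  const sin = Math.sin(a);
  return [
    cos * (x + tx) - sin * (y + ty),
    sin * (x + tx) + cos * (y + ty),
  ];
}

const offset = rotateAboutToRotateTranslate(9.267, -957.72641, 9148.3009);
console.log(offset.tx.toFixed(2), offset.ty.toFixed(2));
```

So the odd-looking rotation center is just another encoding of the same rotate-then-translate pair. It does not say which tool did the rewrite.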
Somebody is going nuts duplicating rotated use
elements.
<use xlink:href="#labelru" x="1937.40" y="544.58002"/>
<use xlink:href="#labelru" x="1937.40" y="544.58002" transform="rotate(0.334,-4875.8035,-16138.871)"/>
<use xlink:href="#labelru" x="1937.40" y="544.58002" transform="rotate(0.668,-5215.2903,-325.84574)"/>
<switch fill="#ffffff" transform="translate(1924.7,548.29)">
<text systemLanguage="en"><tspan>6 March</tspan></text>
<text systemLanguage="fr"><tspan>6 mars</tspan></text>
<text systemLanguage="tr"><tspan>6 Mart</tspan></text>
<text systemLanguage="it"><tspan>6 marzo</tspan></text>
<text systemLanguage="ru"><tspan>6 марта</tspan></text>
<text systemLanguage="pt"><tspan>6 de Março</tspan></text>
<text systemLanguage="el"><tspan>6 Μαρτίου</tspan></text>
<text systemLanguage="ca"><tspan>6 de març</tspan></text>
<text><tspan>6 March</tspan></text>
</switch>
See SVGAnimatedTransformList. The API has a wonderful .consolidate()
method. The API is incomplete. There is not a method to copy a transform or a list of transforms. Instead of using the API to concatenate two transform lists, it was easier to concatenate text strings:
el2.setAttribute("transform", el1.getAttribute("transform") + " " + el2.getAttribute("transform"));
Symbols[edit]
There are more symbols to extract: air bases, harbors, and power plants.
Done Contested city
Done The air base icon
Done The harbor icon
Done The power plant icons have changed from their original form. The Ukrainian version is a solid fill rather than a gradient. The Russian version still has a gradient, but it is not prominent. If I use a solid fill, then they can be a single symbol and the fill can be determined with class="uk"
or class="ru"
.
Hydroelectric plant (not used?)
- File:BSicon STRl blue.svg check out Kaniv hydroelectric
SVG Translate bogus langtags[edit]
Done
Change bogus systemLanguage="zh_HANT"
to systemLanguage="zh-HANT"
. Quick and dirty would select all the systemLanguage
attributes and change underscores to hyphens. Killing bad langtags is good practice, but it will give horrible user interactions in SVG Translate. Users may continually try to translate a phrase that already has a translation. That would mean keeping both the bad langtag (to satisfy SVG Translate) and the good langtag (to satisfy SVG). Then updates to the bad langtag would have to be copied to the good langtag. What a mess.
Done SVG Translate seems to be duplicating clauses on subsequent invocations. Link to Phab issue.
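The quick and dirty langtag fix could look something like the sketch below. It works on the SVG source as a string (a regex rather than a DOM pass, which is an assumption about how the cleanup is run). `fixLangtags` is a hypothetical name. It goes one step past the underscore replacement by asking Intl.getCanonicalLocales() for the conventional case, so zh_HANT becomes zh-Hant rather than zh-HANT. Canonicalization can also rewrite deprecated tags, so that step may be more than is wanted.

```javascript
// Sketch: repair systemLanguage values in SVG source text.
function fixLangtags(svgText) {
  return svgText.replace(/systemLanguage="([^"]*)"/g, (match, value) => {
    const tags = value
      .split(",")
      .map((tag) => tag.trim().replace(/_/g, "-")) // underscores to hyphens
      .filter((tag) => tag.length > 0)
      .map((tag) => {
        try {
          // Normalizes case, e.g. "zh-HANT" to "zh-Hant".
          return Intl.getCanonicalLocales(tag)[0];
        } catch (err) {
          return tag; // leave tags that are not well formed
        }
      });
    return `systemLanguage="${tags.join(",")}"`;
  });
}

console.log(fixLangtags('<text systemLanguage="zh_HANT">x</text>'));
```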
Text element within switches with coordinates[edit]
I fixed a few switch
element bodies that have translated text. Ideally, the switch
element's transform
property
sets the starting text position. The text
element and its first tspan
element should not have x
, y
, or
transform
attributes.
<switch id="switch4565-3-6-9" transform="translate(1354.865,893.27667)" class="place" font-size="5.34px"
style="font-family:'Liberation Sans', Arial, sans-serif;text-anchor:middle;fill:#0044aa">
<text systemLanguage="en" id="trsvg2351-1-1-88" x="11.933496" y="1.381068"><tspan id="tspan17037-0-9-2">Vysokopillia</tspan></text>
<text id="text4206-2-uk-1-5" inkscape:label="text4206-2" systemLanguage="uk" x="12" y="2"><tspan id="tspan16021-7-uk-5-0">Високопілля</tspan></text>
<text id="text4206-2-7-94" inkscape:label="text4206-2" x="12" y="0"><tspan id="tspan16021-7-6-0">Vysokopillia</tspan></text>
</switch>
The problem may be common enough so I should try to detect it.
switch/text[x] | switch/text[y] | switch/text[transform]
switch/text[x or y or transform]
Elements with redundant style
information[edit]
Replace style
attribute with equivalent class
value.
Perhaps hoist information to the parent switch
element.
Overridden attributes[edit]
The fill
attribute below is overridden by the style
attribute. It should be removed (or replaced with the style
value).
<path
d="m 1843.9558,570.10404 c 2.6735,-9.49081 14.9422,-26.82254 30.9812,-32.57107 l -1.8944,-1.36528 8.0776,-0.85102 -3.664,6.36843 -1.0336,-2.24336 c -8.1598,4.31111 -14.7336,27.48597 -14.5545,35.09065 -1.1877,0.10078 -15.9766,-2.62975 -17.9123,-4.42835 z"
fill="url(#bb)"
id="path707-1"
style="fill:url(#linearGradient17463);fill-opacity:1;stroke-width:1.3109"
sodipodi:nodetypes="cccccccc" />
Transform removal[edit]
Some code that simplifies use
elements with a transform
attribute should be generalized. The width="100%"
and height="100%"
attributes may be removed.
<use
xlink:href="#bomb"
x="1569.3"
y="857.94"
id="use819-9"
width="100%"
height="100%"
transform="translate(150.788,415.75052)" />
Simple translations are consolidated, but handling scales is problematic.
Null translations[edit]
If the textContent
of all the switch
clauses is the same...
<switch id="switch329" transform="translate(147.16, 1453.72)">
<text id="text274-zh-hant" systemLanguage="zh-hant"><tspan id="trsvg43-zh-hant">40</tspan></text>
<text id="text274-zh-tw" systemLanguage="zh-tw"><tspan id="trsvg43-zh-tw">40</tspan></text>
<text id="text274-zh-cn" systemLanguage="zh-cn"><tspan id="trsvg43-zh-cn">40</tspan></text>
<text systemLanguage="en" id="trsvg1744"><tspan id="trsvg1120">40</tspan></text>
<text id="text274-fr" systemLanguage="fr"><tspan id="trsvg43-fr">40</tspan></text>
<text id="text274-es" systemLanguage="es"><tspan id="trsvg43-es">40</tspan></text>
<text id="text274-el" systemLanguage="el"><tspan id="trsvg43-el">40</tspan></text>
<text id="text274-uk" systemLanguage="uk"><tspan id="trsvg43-uk">40</tspan></text>
<text id="text274-ka" systemLanguage="ka"><tspan id="trsvg43-ka">40</tspan></text>
<text id="text274-lt" systemLanguage="lt"><tspan id="trsvg43-lt">40</tspan></text>
<text id="text274-ca" systemLanguage="ca"><tspan id="trsvg43-ca">40</tspan></text>
<text id="text274-ko" systemLanguage="ko"><tspan id="trsvg43-ko">40</tspan></text>
<text id="text274-mn" systemLanguage="mn"><tspan id="trsvg43-mn">40</tspan></text>
<text id="text274-nl" systemLanguage="nl"><tspan id="trsvg43-nl">40</tspan></text>
<text id="text274-zh" systemLanguage="zh"><tspan id="trsvg43-zh">40</tspan></text>
<text id="text274"><tspan id="trsvg43">40</tspan></text>
</switch>
The matching algorithm should not just delete clauses that match the default. Language preferences can produce unexpected results.
Mongolian dates[edit]
Apparently, the Intl package gives an unexpected result. When checking the date 5 May, I get the mismatch message.
Bad date (mn): 5 сарын 28 != тавдугаар сарын 28
The transliterated result is "fifth month 28" rather than "5 month 28".
new Intl.DateTimeFormat('mn', { month: 'long', day: 'numeric' }).format(date) → "тавдугаар сарын 28"
new Intl.DateTimeFormat('mn', { month: 'short', day: 'numeric' }).format(date) → "5-р сарын 28"
new Intl.DateTimeFormat('mn', { month: 'narrow', day: 'numeric' }).format(date) → "Vын 28"
new Intl.DateTimeFormat('mn', { month: 'numeric', day: 'numeric' }).format(date) → "V/28"
A better understanding of casual group formatting[edit]
The file uses casual groups to impose a common style on items that do not make their own sensible group. For example, a small set of Ukrainian villages may be grouped to impose a Ukrainian fill
color on the group. The group of villages does not have a good reason to exist as a group. If the Russians gained control of one of the villages, then the group would need to be pierced to change its rendering.
Similarly, several text elements may be grouped to impose a common font selection, size, or color. That is presentation style rather than semantics, and it should be done with CSS.
Many of the formatting groups have been removed from cities, places, and dates. That puts the city circles at toplevel. The places are one-level down inside a group of places. The dates are one-level down inside a group of dates.
There are elements that should be grouped, and they should be grouped to the point of making a symbol. For example, contested cities are represented as a checkerboard. The checkerboard is four grouped path elements followed by an outside-the-group rectangle. All of those elements are semantically related, and they should be a symbol in the defs
section. These groups have been converted to use a g
or symbol
inside the defs
element. That step uncovered some issues. The origin for an SVG 1.1 symbol
is always the upper-left corner. Furthermore, Inkscape has trouble cloning something that is inside the defs
element.
Some groupings would make sense. The circles used for cities should be grouped with the names of the city. The symbols used for nuclear power plants should be grouped with the name of the power plant. The arrows showing the troop movements should be grouped with their dates. The date label boxes should be grouped with the date text.
SVG does not have the notion of associated labels. Consider a flag note that contains a date. The file does that by drawing a rectangle and then overlaying that rectangle with text. That takes two elements. Logically, the elements should be grouped so they move together. To change the text, one must penetrate the grouping.
Given the restrictions on the symbol origin, a possibility for dates would be
<g transform="translate(...)">
<use xlink:href="#labelru" x="0" y="0" />
<switch class="date" fill="#fff">
<text systemLanguage="en">March 15</text>
<text>March 15</text>
</switch>
</g>
So all the group's children are at the origin. A symbol would need a negative offset. Limitations of the WMF renderer prevent using CSS to set the text
color.
An alternative is to use a filter
on the text. The filter would automatically size for the text, so it could be better than a fixed label size. Inkscape may start making copies....
Does Inkscape's notion of layers allow easy editing? That is, does it avoid the cumbersome ungrouping and regrouping of an ordinary g
element?
What does it take to make an Inkscape layer? Presumably toplevel g
elements with two inkscape:
attributes and an id
of the form layern
(just like Inkscape identifies all its elements).[29]
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1" />
To use layers, must all graphics elements be in toplevel layers? If not, what happens to graphics elements that are outside of the layers?
Furthermore, removing the two attributes or the id
may not be wise.
Layers and objects can be locked with sodipodi:insensitive="true"
; it is the presence of the attribute and not its value that matters. See https://wiki.inkscape.org/wiki/Inkscape-specific_XML_attributes and example at https://gist.github.com/hedefalk/5b428772f7deefc906a194f297371e9e . The latter file suggests the group id
does not need a layer name.
See also https://wiki.inkscape.org/wiki/index.php/Inkscape_SVG_vs._plain_SVG .
Inkscape has the notion of symbols and clones, but I'm not sure that it expects to clone objects in the defs
section. What sort of access does Inkscape give to the defs
section?
Layers would do a better job of enforcing painting order. While the order was consistent, it is now confused. Some arrows are drawn before place names are rendered, and some arrows are drawn after.
Languages: LTR and RTL[edit]
The Hebrew and Arabic versions of this file swap the graphics on the map legend. There is also a cosmology illustration that does something similar. What is a good way to handle that problem? Two separate map legends?
Metadata for the file[edit]
Consider adding some metadata to the file. There was a recent comment that Inkscape prevented a user from editing someone else's SVG. Could that be a result of CC-BY-ND license? Or describing the license with a similar requirement? How could SVG Translate figure that out? How would it be handled in a production environment?
A CC license should have a URL link to the license.
A CC-BY license should have the attribution names or an attribution URL.
In a derivative work, it is not enough to just give attribution to the creator of the derivative work. The license attribution requirements do not disappear for a derivative work. Many Commons files state incomplete attributions.
There is a question about crediting the source maps. It may be that a link to their pages on Commons is enough. Should the metadata include the license information for the images that the derivative work uses?
<metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:cc="http://creativecommons.org/ns#" >
<cc:Work rdf:about="">
<dc:creator rdf:resource="https://commons.wikimedia.org/wiki/User_talk:Viewsridge"/>
<dc:source>
<rdf:Bag>
<rdf:li rdf:resource="https://commons.wikimedia.org/wiki/File:Russo-Ukraine_Conflict_(2014-2021).svg"/>
<rdf:li rdf:resource="https://commons.wikimedia.org/wiki/File:Ukraine_adm_location_map.svg"/>
</rdf:Bag>
</dc:source>
<dc:publisher rdf:resource="http://commons.wikimedia.org"/>
<cc:license rdf:resource="https://creativecommons.org/licenses/by-sa/4.0/deed.en"/>
<cc:attributionName>
<rdf:Seq>
<rdf:li rdf:resource="https://commons.wikimedia.org/wiki/User_talk:Viewsridge"/>
<rdf:li rdf:resource="https://commons.wikimedia.org/wiki/User:Rr016"/>
<rdf:li rdf:resource="https://commons.wikimedia.org/wiki/User:NordNordWest"/>
</rdf:Seq>
</cc:attributionName>
<cc:attributionUrl rdf:resource="https://commons.wikimedia.org/wiki/File:2022_Russian_invasion_of_Ukraine.svg" />
</cc:Work>
</rdf:RDF>
</metadata>
SVG Translate[edit]
SVG Translate is an application that helps users translate SVG files with text
elements into other languages.
History[edit]
Discuss history, Summer of Code, ... XXX, Jarry1250,[30] 2017 Wishlist,[31] WMF Community Tech.[32]
What it does[edit]
Diagrams need to be simple for it to work well. Single line text is best. Leave plenty of room because some languages use more text than others.
See File:BirdBeaksA.svg and {{Other versions/BirdBeaksA}}. Several language versions that just vary the text labels.
Syntactically, the text to translate must be a text
element with 0 or more tspan
elements. The tspan
elements may not have any children. The tool expects only text; it does not expect switch
elements containing group (g
) elements or other graphics elements.
What it does not do[edit]
It does not handle complex text. It expects text to be lines of simple, unadorned text. Text that tries to emphasize some words with bold or italic styles is not handled. Similarly, it does not handle changing text colors or fonts. It does not handle subscripts or superscripts.
It does not handle adjusting the position of the text or the anchors. That is more of a graphics task than a translation task. SVG Translate is a text rather than a graphics application. In commercial translation settings, translators are not responsible for text positioning.
Vertical text. There are different conventions. In the USA, a book title on the book's spine is rotated +90°. In Europe, the title on the spine is rotated -90°. In China, text is written vertically without rotating the characters; generally, English readers find such text difficult. Bizarre Chinese ambulance.
Numbers, quantities, currency, and dates. SVG should have better support. JavaScript has the Intl
object, but Commons prohibits scripts. Different cultures use different punctuation and formatting for numbers: "1,000.00" (US) versus "1.000,00" (German). Quantities are numbers with units; the units may be spelled out (meters) or abbreviated with a symbol (m). Currency is even more problematic: should $10 stay as $10 or should it be converted to some other national currency (Marks, Pounds, or Euros)? Dates can be written in several ways: English "May 12, 1944" becomes German "12. Mai 1944". HTML has <time datetime="1944-05-12">May 12, 1944</time>
, but it does not imply processing; it is just a machine-readable time for the text content.
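For reference, here is roughly what the Intl object produces for the number and date examples above. This is a sketch of the JavaScript API only. It could be run offline when generating a file, since Commons will not run scripts inside the SVG. The date is built and formatted in UTC to keep the example independent of the local time zone.

```javascript
// Number formatting: same value, different separators.
const amount = 1000;
const usNumber = new Intl.NumberFormat("en-US", { minimumFractionDigits: 2 }).format(amount);
const deNumber = new Intl.NumberFormat("de-DE", { minimumFractionDigits: 2 }).format(amount);

// Date formatting from a machine-readable date (1944-05-12).
const day = new Date(Date.UTC(1944, 4, 12)); // months are zero-based
const dateOptions = { year: "numeric", month: "long", day: "numeric", timeZone: "UTC" };
const usDate = new Intl.DateTimeFormat("en-US", dateOptions).format(day);
const deDate = new Intl.DateTimeFormat("de-DE", dateOptions).format(day);

console.log(usNumber, deNumber); // 1,000.00 1.000,00
console.log(usDate);             // May 12, 1944
console.log(deDate);             // 12. Mai 1944
```

Note that Intl puts a period after the day for German.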
Hyphenation and word breaking. The semantics of the soft hyphen (&shy;)
is confused: hyphenation.
Exotic CSS can fix some issues, but there must be support.
SVG Translate usage[edit]
JoKalliauer question about SVG Translate usage: talk page
Translate me![edit]
Consider mechanism to solicit translations of popular or significant images.
Say an image were served as SVG. The image could have
<a href="http://svgtranslate.wmftools.org/File:image.svg"> <text>{Translate icon}</text> </a>
The translate icon could disappear if the file has the desired translations.
<a href="http://svgtranslate.wmftools.org/File:image.svg"> <switch> <text systemLanguage="en"></text> <text>{Translate icon}</text> </switch> </a>
Maybe there is a game to play with the systemLanguage
attribute.
<a href="http://svgtranslate.wmftools.org/File:image.svg"> <text><tspan systemLanguage="es">{translation requested}</tspan><tspan systemLanguage="ru">{translation requested}</tspan></text> </a>
Good translation targets[edit]
There are English SVG files that are already included on many Wikipedias. Running SVG Translate on these files would make the files more accessible.
- File:Eukaryote DNA-en.svg
- File:Endomembrane system diagram en.svg many separate translations; one invocation of SVG Translate.
- File:Larynx external en.svg (has Wikidata template)
- File:Active Margin.svg Well used file.
SVG Translate simple text requirement[edit]
SVG Translate expects lines of simple text. Font shifts, subscripts, superscripts (i.e., nested tspan
elements) are not allowed. Many images have such text.
-
Optical glass diagram
- Refractive index nd (λ = 587.6 nm)
- Abbe number V
Wikidata for
- refractive index (Q174102) (Brechungsindex (Q174102), indice de réfraction (Q174102), показатель преломления (Q174102), 折射率(Q174102))
- quantity symbol (string) (P416)
- subscript for the wavelength (e.g., d is sodium line)
- so there could be qualified symbols
- nanometre (Q178674) (much more complicated than expected)
- and unit symbol (P5061) by language (rather than script)
- Abbe number (Q306259) (Abbe-Zahl (Q306259), nombre d'Abbe (Q306259), Число Аббе (Q306259), 阿贝数(Q306259))
- quantity symbol (LaTeX) (P7973)
- Why is LaTeX different? Unicode has some subscripts but does not have a general subscript; LaTeX does: (e.g.,
N_d
)
- optical glass (Q13326) (Wikidata is weak on coverage; just two subclasses)
I cannot create this diagram with a Wikidata query that finds instances of optical glass (such as those in the Schott catalog).
SVG Translate Issues[edit]
- https://github.com/wikimedia/svgtranslate
- https://github.com/wikimedia/svgtranslate/blob/master/src/Model/Svg/SvgFile.php
- https://github.com/wikimedia/svgtranslate/pull/643
- https://phabricator.wikimedia.org/tag/svg_translate_tool/
clean code.
Too many unneeded identifiers[edit]
The resulting file has lots of identifiers.
Suppressing translation[edit]
SVG Translate does not obey translate="no"
.
The notion of blocking translation is deeper than just yes or no. For example, most European languages use the same representation for numbers. Labels that are just numbers probably do not need to be translated: "100", "200", and "300". Even the strings "100 km", "200 km", and "300 km" do not need translations for Latin languages. However, the units should be translated when the script changes from Latin to Cyrillic or something else: "km" versus "км" versus "粁" (unit symbol (P5061) for kilometre (Q828224)). Moreover, some languages use different conventions for numbers. Decimal points, thousands separators, and even the groupings may vary. Some languages also do not use the same characters for numbers: English versus Arabic. Currency is also an issue. Should dollars be translated into Euros or yen?
Dates are also problematic. Dates should be translated, but why not automatically? Just give a format specification. See also time
element and datetime
attribute. "The US celebrates (does not display a tooltip without an explicit title
attribute; see usage at en:Amelia Earhart).
Even if some text is marked translate="no"
, an editing tool may want to change the text. Consider a physical quantity such as the mass of neutrino; a new measurement may be available.
The default clause problem[edit]
The tool does not work well with language preferences. Adding a German translation to
<text>Hello, world</text>
produces something similar to
<switch> <text systemLanguage="de">Hallo Welt.</text> <text>Hello, world.</text> </switch>
That works OK with WMF tools, but can be bizarre when displayed in a browser. For example, one can set a browser to prefer English but accept German. In that circumstance, the English language text should be displayed, but the SVG above will display German. The SVG agent does not know the default is English. The translation should be
<switch> <text systemLanguage="en">Hello, world.</text> <text systemLanguage="de">Hallo Welt.</text> <text>Hello, world.</text> </switch>
The fix is to copy the default clause, add systemLanguage="en"
, and then insert that element into the switch
element using the DOM method insertBefore(newNode, referenceNode)
. The DOM method .cloneNode(deep)
copies the id
attribute, so the identifiers must be changed or removed.
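The default clause problem can be simulated without a browser. The sketch below models a switch as an array of clauses and picks the first clause whose systemLanguage test passes, following my reading of the SVG 1.1 rule (a user language matches a listed tag exactly, or is a prefix of it followed by a hyphen). The clause objects and function names are made up for the illustration, and comparing in lower case is an assumption.

```javascript
// SVG 1.1 style systemLanguage test against the user's language list.
function systemLanguageMatches(attrValue, userLangs) {
  const tags = attrValue
    .split(",")
    .map((tag) => tag.trim().toLowerCase())
    .filter((tag) => tag.length > 0);
  return userLangs.some((userLang) => {
    const user = userLang.toLowerCase();
    return tags.some((tag) => tag === user || tag.startsWith(user + "-"));
  });
}

// A switch renders its first child whose test passes.
// A clause without a lang property stands for the default clause.
function pickClause(clauses, userLangs) {
  return clauses.find(
    (clause) => clause.lang === undefined || systemLanguageMatches(clause.lang, userLangs)
  );
}

const userLangs = ["en", "de"]; // prefers English, accepts German

const asProduced = [
  { lang: "de", text: "Hallo Welt." },
  { text: "Hello, world." },
];
const withEnglishClause = [{ lang: "en", text: "Hello, world." }, ...asProduced];

console.log(pickClause(asProduced, userLangs).text);        // Hallo Welt.
console.log(pickClause(withEnglishClause, userLangs).text); // Hello, world.
```

The order of the user's preferences plays no part in the selection. Only the order of the clauses matters, which is why the explicit English clause has to come before the German one.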
Autocomplete[edit]
SVG Translate input boxes get filled in with previous inputs even if the file changed. Turn off autocomplete?
SVG Translate should handle textPath
[edit]
The Gibraltar map has a textPath
element.
Treat textPath
as equivalent to tspan
.
IIRC, mentioned in some Phabricator issue.
Possibly add another test to reject textPath
similar to this test:
if (0 !== $this->document->getElementsByTagName('tref')->length) {
// Tref tags not (yet) supported
$this->logFileProblem('File {file} has <tref> tags');
return false;
}
SVG Translate descends too deeply[edit]
SVG Translate does not recognize planar translations. If it sees a switch
element with systemLanguage
clauses, then it should not process that subtree.
<switch>
<g systemLanguage="tlh">
<text>Klingon text cannot be translated to any other language</text>
</g>
<g>
<text>Default text can be translated, but it will be ugly</text>
</g>
</switch>
SVG Translate does not walk the tree, so it does not notice the problem. Instead, it fetches elements no matter their location in the document with operations such as
$texts = $this->document->getElementsByTagName('text');
If the elements have an ancestor with a systemLanguage
attribute, then it should not process the element; the element's ability to be translated has been foreclosed already.
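A tree walk that respects the ancestor rule might look like the sketch below. It uses a made-up minimal node shape ({ name, attrs, children }) instead of a real DOM so that it runs anywhere. It implements only the ancestor rule: a text element is skipped when some ancestor carries systemLanguage. It does not implement the stricter idea of skipping the whole switch subtree.

```javascript
// Sketch: collect text elements that have no ancestor with systemLanguage.
// Node shape { name, attrs, children } is hypothetical, not a DOM API.
function collectTranslatableTexts(node, underLanguage = false, found = []) {
  if (node.name === "text") {
    if (!underLanguage) found.push(node);
    return found;
  }
  const attrs = node.attrs || {};
  // Children of an element with systemLanguage are already language-specific.
  const childUnderLanguage = underLanguage || "systemLanguage" in attrs;
  for (const child of node.children || []) {
    collectTranslatableTexts(child, childUnderLanguage, found);
  }
  return found;
}

const planar = {
  name: "switch",
  children: [
    {
      name: "g",
      attrs: { systemLanguage: "tlh" },
      children: [{ name: "text", content: "Klingon text" }],
    },
    {
      name: "g",
      children: [{ name: "text", content: "Default text" }],
    },
  ],
};

console.log(collectTranslatableTexts(planar).map((t) => t.content));
```

Only the default text is collected. A text element that carries systemLanguage itself (the form SVG Translate writes) is still collected, because the rule looks at ancestors only.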
Where did I write about this before?
A file with planar translations:
Does SVG Translate multiply text? No.[edit]
To simplify matters with language matching, SVG Translate explodes a systemLanguage
attribute with a language list.
A single element with three langtags is morphed into three elements with a single langtag.
SVG allows elements with systemLanguage
attributes anywhere; they need not be inside a switch.
Check that the combination of the two operations is safe or at least harmless. Maybe it is always harmless? If text is wrapped in a switch
before being copied, then it should be harmless?
The fear is isolated text with systemLanguage="en, en-US, en-GB"
gets duplicated three times.
Not a problem. The explosion is at line 370, and it is done while processing the children of a switch
.
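For illustration, the explosion amounts to something like the sketch below. It uses plain objects rather than DOM nodes, and `explodeClause` is a hypothetical name, not the SVG Translate function. Splitting on commas follows the SVG definition of systemLanguage as a comma-separated list.

```javascript
// Sketch: turn one clause with a langtag list into one clause per langtag.
function explodeClause(clause) {
  return clause.lang
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0)
    .map((lang) => ({ ...clause, lang })); // copy every other property as-is
}

const exploded = explodeClause({ lang: "en, en-US, en-GB", text: "Hello" });
console.log(exploded.map((clause) => clause.lang)); // [ 'en', 'en-US', 'en-GB' ]
```

A real implementation also has to give each copy its own id, since a plain copy duplicates identifiers.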
SVG Translate switch
processing[edit]
SVG permits more than just text
elements within a switch
.
In particular, title
, desc
, metadata
, and comments should not confuse processing.
Less likely but still permitted elements are metadata
and animation elements.
Does the sorting of hyphenated langtags come into play? The title
element should be first for many SVG agents.
Run SVG Translate:
Looks like SVG Translate works on line 2.
What is going on? The method makeTranslationReady()
(line 136)
- does a bunch of stuff not relevant here
- loops through all the switch tags at line 329.
- looks at the childNodes of the switch (it could just look at elements)
- returns false if the childnode is a non-empty #text node.
- returns false if the childnode is not an element. So a PI or comment should return false.
- returns false if the childnode is not a text element.
- examines systemLanguage
- explodes if systemLanguage is multivalued.
I understood that makeTranslationReady()
returning false nixes the translation of the whole file.
Maybe not.
Look at __construct()
. The last line is the call to makeTranslationReady()
, but it does not check the result.
Looks like somebody does not check isTranslationReady()
.
I should split the test file into three files. KISS.
Check switch
processing[edit]
IETF lowercase at 293, but only to text
elements.
At line 324, all the switch
elements are grabbed. (No namespace check.)
Then all child nodes are examined at line 329.
(DOMElement https://www.php.net/manual/en/class.domelement.php does not have $children
, so code must use $childNodes
.)
Non-empty text nodes cause a problem at line 337; return false.
Non-element nodes cause a problem at line 344; return false. Does this mean a comment node returns false?
If the child is not a text
element, it fails at line 349. That should kill title
, desc
, metadata
, and animation elements, but I thought they were getting through....
Language split is at 356. It only happens inside a switch
. That should answer the isolated systemLanguage
element question.
An explosion will increase the number of text
elements in the live list, so some will be missed.
Identifiers take up a lot of text[edit]
Here is a clause from a switch
translation. Compare it with and without identifiers:
<text id="trsvg1077-9-1-8-zh-tw" systemLanguage="zh-tw"><tspan id="trsvg880-0-6-4-zh-tw">5/10</tspan></text>
<text systemLanguage="zh-tw"><tspan>5/10</tspan></text>
DOM: tags and namespace[edit]
The DOM distinguishes
nodeName
localName
namespaceURI
A nodeName
may or may not have a prefix. For example, text
or svg:text
.
The tag name has some interesting capitalization issues for HTML, but that is not relevant for SVG. In the case of just HTML, there is only one namespace, so the name ambiguity is not as important.
The specification says that getElementsByTagName()
searches for a qualified name. https://developer.mozilla.org/en-US/docs/Web/API/Element/tagName says that the tag name for an element is the nodeName
. So what does a DOM implementation return? In an SVG file, if I ask for "text"
, do I get all the nodes in the default namespace? That is, those names do not need prefixes. Do I get the same nodes if I ask for "svg:text"
? Do some implementations simply provide all the elements with the localName
and ignore the namespace?
In particular, I think qualified names can vary depending on the context. It is possible to set the default namespace for a subtree.
The method getElementsByTagNameNS()
is far clearer. It looks for elements with the localName
and in the namespaceURI
. An implementation is not going to mess that up. It also does not depend on which prefixes are active.
I expected this to fail. It may work even if it fails.
I added a rdf:text
element. SVG Translate wrapped the .textContent
with a tspan
element in the SVG namespace, but it did not offer the text to be translated (the elements never got id
attributes).
Looking at the code, the rdf:text
element was found at line 201:
// Strip empty tspans, texts, fill $idsInUse
$idsInUse = [ 0 ];
$translatableNodes = [];
$tspans = $this->document->getElementsByTagName('tspan');
$texts = $this->document->getElementsByTagName('text');
So the PHP DOM implementation looks for the localName
.
The source for PHP
cites to the spec
which only claims the search is by "tag name" but does not define that term. The namespace version is explicit about using localName
. So using the non-namespace version for an XML document is not a good idea.
It also becomes clear that the identifier assignment believes that trsvg[0-9]+
identifiers will only appear on text
and tspan
elements. Also, the code does not need to keep the entire set of attributes around.
Is the svg
prefix required?[edit]
Phab:T316741 (Allow svg namespace prefixes other than 'svg') was opened.
I see a lot of tests that are something like ('svg:element' === $el->nodeName)
.
Does that mean the code depends on files using the svg:
prefix?
Does using xmlns:foo="http://www.w3.org/2000/svg"
and <foo:text>
confuse SVG Translate?
Is the problem with the PHP DOM or with the code?
Should tests such as
if ('tspan' === $node->nodeName || 'svg:tspan' === $node->nodeName) {
continue;
}
be written
if ('tspan' === $node->localName && 'http://www.w3.org/2000/svg' === $node->namespaceURI) {
continue;
}
If a file does not use the SVG namespace, then SVG Translate may refuse to translate it.
Programming idioms from when the PHP DOM did not do namespaces?
Reorder text
elements[edit]
Sorting the langtags so the hyphenated langtags come first.
Is it needed?
Does it compromise SVG 1.1 semantics? If I ask for en-GB
, then it should not match the en
clause.
If I promote en-RAREDIALECT
to first position in the clauses, then asking for generic en
gives me the rare dialect.
The only good way around this is allowReorder
.
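Both worries can be checked with a small simulation of SVG 1.1 matching, where a user language matches a clause tag exactly or as a prefix followed by a hyphen. This is a sketch with made-up clause objects. The lower-case comparison is an assumption.

```javascript
// SVG 1.1 rule: the user's language equals the tag, or is a prefix of it before "-".
function tagMatches(tag, userLang) {
  const t = tag.toLowerCase();
  const u = userLang.toLowerCase();
  return t === u || t.startsWith(u + "-");
}

// First clause that matches any user language wins; a clause without lang is the default.
function pick(clauses, userLangs) {
  return clauses.find(
    (clause) => clause.lang === undefined || userLangs.some((u) => tagMatches(clause.lang, u))
  );
}

const reordered = [
  { lang: "en-RAREDIALECT", text: "rare dialect" },
  { lang: "en", text: "generic English" },
  { text: "default" },
];
const genericOnly = [
  { lang: "en", text: "generic English" },
  { text: "default" },
];

console.log(pick(reordered, ["en"]).text);      // rare dialect
console.log(pick(genericOnly, ["en-GB"]).text); // default
```

So promoting the hyphenated clause hands the rare dialect to a generic en reader, and a reader asking for en-GB falls through to the default rather than the en clause.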
With the underscore, it would work around the old librsvg
langtag matching bug.
/**
* Reorder text elements within the document so that all sublocales (i.e. systemLanguage values
* containing a hyphen or underscore, e.g. de_CH) are moved to the beginning of the switch
* element, and all fallback elements are moved to the end.
*/
protected function reorderTexts(): void
The underscore issue[edit]
SVG Translate uses parochial systemLanguage
identifiers instead of IETF langtags. This bug arises as a workaround to langtag processing bugs in librsvg
and langtag passing methods in MediaWiki. First, the C-language version of librsvg
that WMF uses only matches langtags to the first hyphen. It treats zh-Hans
and zh-Hant
as equivalent. Second, MediaWiki passes langtags in the LANG
environment variable; that variable expects a locale string rather than an IETF langtag. Third, librsvg
uses the LANG
environment variable as a langtag. SVG Translate exploits those problems by using zh_HANT
rather than the correct zh-Hant
IETF langtag. It makes the display work on (broken) WMF servers, but the SVG files do not work on other SVG user agents.
The underscore issue is even more twisted. Multiple SVG Translate invocations are adding multiple identical clauses with non-unique identifiers:
<switch id="switch2174" transform="translate(1853.7,532.54)" class="place" font-size="5.34px">
<text id="text3001-zh-tw" systemLanguage="zh_TW"><tspan id="trsvg142-zh-tw">基夫沙里夫卡</tspan></text>
<text id="text3001-zh-hant" systemLanguage="zh_HANT"><tspan id="trsvg142-zh-hant">基夫沙里夫卡</tspan></text>
<text id="text3001-zh-tw" systemLanguage="zh_TW"><tspan id="trsvg142-zh-tw">基夫沙里夫卡</tspan></text>
<text id="text3001-zh-hant" systemLanguage="zh_HANT"><tspan id="trsvg142-zh-hant">基夫沙里夫卡</tspan></text>
<text id="text6217" systemLanguage="zh_TW"><tspan id="tspan6215">基夫沙里夫卡</tspan></text>
<text id="text6221" systemLanguage="zh_HANT"><tspan id="tspan6219">基夫沙里夫卡</tspan></text>
<text id="text6225" systemLanguage="zh_TW"><tspan id="tspan6223">基夫沙里夫卡</tspan></text>
<text id="text6229" systemLanguage="zh_HANT"><tspan id="tspan6227">基夫沙里夫卡</tspan></text>
<text id="text6233" systemLanguage="zh_TW"><tspan id="tspan6231">基夫沙里夫卡</tspan></text>
<text id="text6237" systemLanguage="zh_HANT"><tspan id="tspan6235">基夫沙里夫卡</tspan></text>
<text id="text6241" systemLanguage="zh_TW"><tspan id="tspan6239">基夫沙里夫卡</tspan></text>
<text id="text6245" systemLanguage="zh_HANT"><tspan id="tspan6243">基夫沙里夫卡</tspan></text>
<text id="text6249" systemLanguage="zh_HANT"><tspan id="tspan6247">基夫沙里夫卡</tspan></text>
<text systemLanguage="en" id="trsvg1842"><tspan id="trsvg1218">Kivsharivka</tspan></text>
<text id="text3001-it" systemLanguage="it"><tspan id="trsvg142-it">Kovšarovka</tspan></text>
<text id="text3001-fr" systemLanguage="fr"><tspan id="trsvg142-fr">Kivcharivka</tspan></text>
<text id="text3001-el" systemLanguage="el"><tspan id="trsvg142-el">Κιβσαρίφσκα</tspan></text>
<text id="text3001-ru" systemLanguage="ru"><tspan id="trsvg142-ru">Ковшаровка</tspan></text>
<text id="text3001-uk" systemLanguage="uk"><tspan id="trsvg142-uk">Ківшарівка</tspan></text>
<text id="text3001-ka" systemLanguage="ka"><tspan id="trsvg142-ka">კივშარივკა</tspan></text>
<text id="text3001-lt" systemLanguage="lt"><tspan id="trsvg142-lt">Kivšarivka</tspan></text>
<text id="text3001-ca" systemLanguage="ca"><tspan id="trsvg142-ca">Kivxàrivka</tspan></text>
<text id="text3001"><tspan id="trsvg142">Kivsharivka</tspan></text>
</switch>
The relevant SVG Translate code:
OK, it looks like a problem with inadvertently distinguishing equivalent langtags. The code mistakenly distinguishes zh_hant
, zh_Hant
, and zh_HANT
.
I believe $language
will be lowercase because
$langCode = str_replace('_', '-', strtolower($lang));
So the code reasonably canonicalizes all langtags to lowercase and converts non-standard underscore langtags to hyphenated langtags.
Consequently, the code below will work for all-lowercase langtags (such as the usual en
or de
) that are present in the SVG file, but it will never match langtags with an uppercase character (such as zh-Hant
) because they have a capital letter. Furthermore, it will never match the converted, non-standard, langtags (such as zh_HANT
) even with a case-insensitive match because the underscore was changed to a hyphen. Also notice that if two or more text
elements match, then nothing will be changed, and no error will be logged.
// Put text tag into document
$path = 'fallback' === $language ?
"svg:text[not(@systemLanguage)]|text[not(@systemLanguage)]" :
"svg:text[@systemLanguage='$language']|text[@systemLanguage='$language']";
$existing = $this->xpath->query($path, $switch);
if (1 == $existing->length) {
// Only one matching text node, replace if different
if ($this->nodeToArray($newTextTag) === $this->nodeToArray($existing->item(0))) {
continue;
}
$switch->replaceChild($newTextTag, $existing->item(0));
} elseif (0 == $existing->length) {
// No matching text node for this language, so we'll create one
$switch->appendChild($newTextTag);
}
OK, tried a file with systemLanguage="FR"
, added a German translation, and the French clause was duplicated. SVG Translate produced:
<switch transform="translate(20, 60)">
<title>This title should display.</title>
<desc>Test that title, desc, and metadata process correctly.</desc>
<metadata/>
<text systemLanguage="tlh" id="trsvg13"><tspan id="trsvg2">Klingon</tspan></text>
<text systemLanguage="FR" id="trsvg14"><tspan id="trsvg3">French</tspan></text>
<text systemLanguage="en" id="trsvg15"><tspan id="trsvg4">English</tspan></text>
<text id="trsvg14" systemLanguage="fr"><tspan id="trsvg3">French</tspan></text>
<text id="trsvg16-de" systemLanguage="de"><tspan id="trsvg5-de">German 2</tspan></text>
<text id="trsvg16"><tspan id="trsvg5">Default</tspan></text>
</switch>
The basic issue above is the case-sensitive comparison in text[@systemLanguage='$language']
. XPath 1.0 does not have a case-insensitive comparison. A possible fix is to modify the XPath filter to address both the case and the underscore issues:
text[translate(@systemLanguage, "ABCDEFGHIJKLMNOPQRSTUVWXYZ_", "abcdefghijklmnopqrstuvwxyz-")='$language']
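The same normalization can be sketched outside XPath. The following is a minimal illustration, not SVG Translate's code: the helper names `normalize_langtag` and `find_clauses` are hypothetical, and the sketch ignores comma-separated systemLanguage lists. Python's ElementTree does not implement the XPath translate() function, so the comparison is done in Python instead.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def normalize_langtag(tag: str) -> str:
    """Lowercase a langtag and turn underscores into hyphens."""
    return tag.strip().lower().replace("_", "-")


def find_clauses(switch: ET.Element, language: str) -> list:
    """Return the text children of switch whose systemLanguage matches language."""
    wanted = normalize_langtag(language)
    matches = []
    for child in switch:
        # Accept text elements with or without the SVG namespace.
        if child.tag not in ("text", f"{{{SVG_NS}}}text"):
            continue
        attr = child.get("systemLanguage")
        if attr is not None and normalize_langtag(attr) == wanted:
            matches.append(child)
    return matches


switch = ET.fromstring(
    "<switch>"
    '<text systemLanguage="FR">French</text>'
    '<text systemLanguage="zh_HANT">Chinese</text>'
    "<text>Default</text>"
    "</switch>"
)
print(len(find_clauses(switch, "fr")))       # finds systemLanguage="FR"
print(len(find_clauses(switch, "zh-hant")))  # finds systemLanguage="zh_HANT"
```

With both sides normalized, an existing FR or zh_HANT clause is found and can be replaced rather than duplicated.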
There are still some questions about the code. How does systemLanguage="zh_HANT"
enter? Are translation selections edited in the UI? And how does the file's systemLanguage="zh-Hant"
turn into systemLanguage="zh_HANT"
?
Looks like this routine will change zh-hant
into zh_HANT
:
/**
* @param string $langCode
* @return string
*/
private static function langCodeToOs(string $langCode): string
{
if (false === strpos($langCode, '-')) {
// No territory specified, so no change to make (fr => fr)
return $langCode;
}
[ $prefix, $suffix ] = explode('-', $langCode, 2);
return $prefix.'_'.strtoupper($suffix);
}
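To see what the routine produces, here is a line-for-line Python transliteration (the name `lang_code_to_os` is mine; the logic follows the PHP above). It is only a sketch for experimenting with inputs.

```python
def lang_code_to_os(lang_code: str) -> str:
    """Python transliteration of langCodeToOs() for experimenting with inputs."""
    if "-" not in lang_code:
        # No hyphen, so no change to make (fr => fr)
        return lang_code
    # Split at the first hyphen only, like explode('-', $langCode, 2)
    prefix, suffix = lang_code.split("-", 1)
    return prefix + "_" + suffix.upper()


for tag in ("fr", "zh-hant", "zh-tw", "sr-latn-rs"):
    print(tag, "=>", lang_code_to_os(tag))
```

Everything after the first hyphen is uppercased, whether it is a script subtag or a region subtag, and later hyphens are left alone. That would explain how a lowercase zh-hant becomes zh_HANT.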
The number of tspan
elements must match[edit]
Phab:T216283#5106699 discusses a mismatched number of tspan
elements.
Somewhere there was a more specific problem where SVG Translate drops the tspan
. Searching for it, but have not found it.
Look in the code to find this requirement.
IIRC, some code deletes empty text
and tspan
.
In addition, I think there was a Phabricator item where an empty tspan
was removed, and that caused subsequent translation problems.
The trouble may be at line 254. All the translatable nodes (text
and tspan
elements) are scanned. If the element has no child nodes (which would be true for an empty tspan
), then that element would be removed. Consider a two-line translation where some lines may be empty. This code will remove the empty line. When a translation on that switch
is attempted, the number of tspan
elements will not match.
if (!$translatableNode->hasChildNodes()) {
// Empty tag, will just confuse translators if we leave it in
$translatableNode->parentNode->removeChild($translatableNode);
}
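A small sketch of the suspected failure mode, using Python's ElementTree rather than the PHP DOM. The helper `remove_empty_tspans` is hypothetical; it only mimics the pruning rule (no text and no child elements means the element is dropped).

```python
import xml.etree.ElementTree as ET


def remove_empty_tspans(text: ET.Element) -> int:
    """Drop tspan children that have neither text nor child elements."""
    removed = 0
    for tspan in list(text):
        if tspan.tag == "tspan" and len(tspan) == 0 and not (tspan.text or ""):
            text.remove(tspan)
            removed += 1
    return removed


# Default clause: a two-line layout whose second line happens to be empty.
default = ET.fromstring(
    '<text><tspan x="0" dy="0">Line 1</tspan><tspan x="0" dy="20"></tspan></text>'
)
# German clause: both lines are used.
german = ET.fromstring(
    '<text systemLanguage="de"><tspan x="0" dy="0">Zeile 1</tspan>'
    '<tspan x="0" dy="20">Zeile 2</tspan></text>'
)

remove_empty_tspans(default)
print(len(default.findall("tspan")), len(german.findall("tspan")))
```

After pruning, the default clause has one tspan and the German clause has two, which is the mismatch described above.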
The $ issue[edit]
SVG Translate is supposed to refuse to translate text that has a dollar sign followed by a number (e.g., "$17").
I suspect the code has a bug because it does not escape the $
in the preg_match
on line 288.
// Text strings like $1, $2 will cause problems later because
// self::replaceIndicesRecursive() will try to replace them
// with (non-existent) child nodes.
if (preg_match('/$[0-9]/', $text->textContent)) {
$this->logFileProblem('File {file} has text with $-numbers');
return false;
}
The issue was mentioned at Phab:T271000#8134211. Bad language code: zh_Hans should be zh-Hans.
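The anchor behavior is easy to check. The sketch below uses Python's re module rather than PHP's PCRE, on the assumption that the two treat an unescaped $ the same way in this pattern (an end-of-subject anchor, so a digit can never follow it).

```python
import re

UNESCAPED = re.compile(r"$[0-9]")  # '$' is an end-of-subject anchor here
ESCAPED = re.compile(r"\$[0-9]")   # a literal dollar sign followed by a digit

samples = ["costs $17", "$1", "no dollars", "price: 5$"]
for s in samples:
    print(repr(s), bool(UNESCAPED.search(s)), bool(ESCAPED.search(s)))
```

The unescaped pattern matches none of the samples, so the guard never fires; the escaped pattern flags the first two.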
The style
element must have trailing characters[edit]
Phab:T271595 SVG translate tool replaces all fields with "$1" (style element needs at least one trailing character)
What's left over after a failed media rule parse?
@media printer { text { fill: black; } }
The regex match does not balance braces, so the second left brace is collected as part of the ruleset. The trailing character is a right brace.
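The leftover can be reproduced with a deliberately naive rule matcher. The regex below is my own stand-in, not the one SVG Translate uses; it just has the same weakness of stopping at the first right brace.

```python
import re

# Hypothetical naive matcher: selector text, '{', anything up to the first '}', '}'
NAIVE_RULE = re.compile(r"([^{}]+)\{([^}]*)\}")

css = "@media printer { text { fill: black; } }"
m = NAIVE_RULE.match(css)

selector = m.group(1)
ruleset = m.group(2)
leftover = css[m.end():]

print(repr(selector))  # the at-rule prelude
print(repr(ruleset))   # includes the inner 'text {'
print(repr(leftover))  # the unmatched closing brace
```

The inner left brace ends up inside the captured ruleset, and one right brace remains after the match.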
Hoisting style
attributes[edit]
At https://github.com/wikimedia/svgtranslate/blob/master/src/Model/Svg/SvgFile.php I hope this does not do what I think it does:
// Non-translatable style elements on texts get lost, so bump up to switch
if ($text->hasAttribute('style')) {
$style = $text->getAttribute('style');
$text->parentNode->setAttribute('style', $style);
}
Splitting langtags duplicates identifiers[edit]
Splitting systemLanguage
langtags duplicates id
attributes:
foreach ($realLangs as $realLang) {
// Although the SVG spec supports multi-language text tags (e.g. "en,fr,de")
// these are a really poor idea since (a) they are confusing to read and (b) the
// desired translations could diverge at any point. So get rid.
$singleLanguageNode = $sibling->cloneNode(true);
$singleLanguageNode->setAttribute('systemLanguage', $realLang);
// @todo: Should also go into tspans and change their ids, too.
// $prefix = implode( '-', explode( '-', $singleLanguageNode->getAttribute( 'id' ), -1 ) );
// $singleLanguageNode->setAttribute( 'id', "$prefix-$realLang" );
// Add in new element
$switch->appendChild($singleLanguageNode);
}
$switch->removeChild($sibling);
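A sketch of what the @todo might look like, written in Python with ElementTree. Everything here is an assumption: the function name `split_languages`, the id scheme (append the langtag to every id in the clone), and the choice to insert the clones at the original position instead of appending them. The scheme does not guard against collisions with ids that already exist elsewhere in the file.

```python
import copy
import xml.etree.ElementTree as ET


def split_languages(switch: ET.Element) -> None:
    """Replace multi-language clauses with one clone per langtag, with distinct ids."""
    for node in list(switch):
        raw = node.get("systemLanguage") or ""
        langs = [lang.strip() for lang in raw.split(",") if lang.strip()]
        if len(langs) < 2:
            continue
        position = list(switch).index(node)
        for offset, lang in enumerate(langs):
            clone = copy.deepcopy(node)
            clone.set("systemLanguage", lang)
            # Suffix the id of the clone and of every descendant (tspans included).
            for element in clone.iter():
                if element.get("id"):
                    element.set("id", f"{element.get('id')}-{lang}")
            switch.insert(position + offset, clone)
        switch.remove(node)


switch = ET.fromstring(
    "<switch>"
    '<text id="t1" systemLanguage="en,fr"><tspan id="s1">Hi</tspan></text>'
    '<text id="t2">Hi</text>'
    "</switch>"
)
split_languages(switch)
ids = [el.get("id") for el in switch.iter() if el.get("id")]
print(ids)
```

Inserting in place keeps the clones ahead of the default clause, which matters because a switch renders the first child that matches.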
Is the lang
attribute copied[edit]
Something to test. If the text
element has a lang
attribute, is that attribute copied to the translations?
PHP static analyzer[edit]
SVG Translate already has some phpdoc (JavaDoc-style) comments, but the declarations have some issues. Phab:T316310 seeks to add the PHP static analyzer / linter Phan. A proposed patch adds Phan to the build process.
See the https://github.com/phan/phan/wiki/Tutorial-for-Analyzing-a-Large-Sloppy-Code-Base tutorial. A baseline run on SVG Translate produced
// PhanUndeclaredMethod : 10+ occurrences
// PhanTypeMismatchArgumentNullableInternal : 7 occurrences
// PhanCommentDuplicateParam : 3 occurrences
// PhanDeprecatedFunction : 3 occurrences
// PhanTypeMismatchArgumentInternalProbablyReal : 2 occurrences
// PhanTypeMismatchDeclaredParamNullable : 2 occurrences
// PhanTypeMismatchPropertyProbablyReal : 2 occurrences
// PhanAccessMethodInternal : 1 occurrence
// PhanTypeMismatchArgumentNullable : 1 occurrence
// PhanTypeMismatchArgumentProbablyReal : 1 occurrence
// PhanTypePossiblyInvalidDimOffset : 1 occurrence
// PhanTypeSuspiciousNonTraversableForeach : 1 occurrence
That does not look outrageous. Some look like expected trivial fixes. Maybe a higher analysis level will complain about code not looking at the returned value. I have not seen the plan of starting out with weak checking and increasing the level later. Software engineering advice has been to set the warning level at maximum and just deal with the flood. The ratchet-up-later plan sounds interesting, but somebody needs to remember to do the ratcheting. (Ah, Eric just wanting to ignore 200+ warnings that had to be wrong.)
Phan has not been around a long time (in development in 2017). What other PHP linters are there?
Mere mortals cannot view the log[edit]
See https://phabricator.wikimedia.org/T271000#8201384
So raise a creature feature request for a logging window.
I think I was bitten by silent errors. The code detected a problem, logged an error, stopped further processing in that method, but forged ahead with subsequent tasks that GIGO'd.
Phabricator tickets[edit]
I need to update some SVG Phabricator tickets and create some others.
- T271000, Mentioned In:
  - T319259: Check that document element is <svg> and in the right namespace
  - rGSVTd5fc9388e033: Allow tspan to be in svg namespace (#622)
  - T248252: SVG Translate: Skip unsupported text pattern and continue with the supported ones
  - rGSVTb8c23d973c36: Create (and catch) exceptions for all existing error states
  - T316741: Allow svg namespace prefixes other than 'svg'
  - rGSVT91689a31bb6b: Fix regex to find $1, $2, etc.
  - rGSVT05d2724e6765: Normalize lang codes when reading SVGs and writing translations
  - T261192: Rendering multilingual (systemLanguage) SVG files fails locally after upgrading librsvg from 2.40.21 to 2.44.10
  - T40010: RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis
  - T275263: Translation dropdown not available on File: page after translating a specific SVG file on Commons via svgtranslate tool
  - T271595: SVG translate tool replaces all fields with "$1" (style element needs at least one trailing character)
- T271000, Mentioned Here:
  - T271595: SVG translate tool replaces all fields with "$1" (style element needs at least one trailing character)
  - T221382: [BUG] Some CSS selectors break translation input
  - T319259: Check that document element is <svg> and in the right namespace
  - T316310: Add Phan to SVG Translate CI
  - T316741: Allow svg namespace prefixes other than 'svg'
  - T221453: Add "newer" open fonts
  - T280718: Re-evaluate whether keeping around https://noc.wikimedia.org/conf/fc-list is a good practive
  - T261192: Rendering multilingual (systemLanguage) SVG files fails locally after upgrading librsvg from 2.40.21 to 2.44.10
  - T40010: RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis
  - T154237: SVG image wikisyntax can't use "lang=zh-hant"
  - T265549: Update librsvg to > 2.44.10
- Phab:T241500 Add link to translate the SVG Translate tool itself
- Phab:T327573 SVG Translate does not output translated clause.
- Phab:T252347 SVG Translate should support textPath
- Phab:T335654 SVG Translate github repo has paused Dependabot
- Phab:T335663 Unable to compile assets (digital envelope routines unsupported)
Hoisting attributes[edit]
SVG Translate produces verbose output. One reason is that it copies all the attributes on the original text
element to the switch
element clauses. Consequently, we see something like
<switch>
<text x="100" y="200" font-family="Arial" font-size="10" systemLanguage="en"><tspan>Hello, world.</tspan></text>
<text x="100" y="200" font-family="Arial" font-size="10" systemLanguage="de"><tspan>Hallo Welt.</tspan></text>
<text x="100" y="200" font-family="Arial" font-size="10"><tspan>Hello, world.</tspan></text>
</switch>
A more concise version would be
<switch transform="translate(100, 200)" font-family="Arial" font-size="10">
<text systemLanguage="en"><tspan>Hello, world.</tspan></text>
<text systemLanguage="de"><tspan>Hallo Welt.</tspan></text>
<text><tspan>Hello, world.</tspan></text>
</switch>
There can be a list of attributes to promote. For example, attributes font-family
, font-size
, font-weight
, and font-style
. If a switch
element has only text
elements and each of those elements has the same attribute z, then delete z from each child and move it to the switch
element. The replacement may override a value on the switch
element.
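The promotion rule can be sketched in a few lines of Python with ElementTree. This is only an illustration under simplifying assumptions: the function name `hoist_common_attributes` is hypothetical, the elements are assumed to be un-namespaced, positions (x, y) are left alone, and CSS, class, and style are ignored entirely.

```python
import xml.etree.ElementTree as ET

PROMOTABLE = ("font-family", "font-size", "font-weight", "font-style")


def hoist_common_attributes(switch: ET.Element, names=PROMOTABLE) -> list:
    """Move attributes shared by every text child up to the switch element."""
    children = list(switch)
    if not children or any(child.tag != "text" for child in children):
        return []  # only handle a switch made purely of text clauses
    hoisted = []
    for name in names:
        values = {child.get(name) for child in children}
        if len(values) == 1 and None not in values:
            # Same value everywhere: this may override a value on the switch.
            switch.set(name, values.pop())
            for child in children:
                del child.attrib[name]
            hoisted.append(name)
    return hoisted


switch = ET.fromstring(
    "<switch>"
    '<text x="100" y="200" font-family="Arial" font-size="10" systemLanguage="en">Hello</text>'
    '<text x="100" y="200" font-family="Arial" font-size="10" systemLanguage="de">Hallo</text>'
    '<text x="100" y="200" font-family="Arial" font-size="10">Hello</text>'
    "</switch>"
)
hoisted = hoist_common_attributes(switch)
print(hoisted)
print(ET.tostring(switch, encoding="unicode"))
```

An attribute that is missing from any one clause, or differs between clauses, stays where it is.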
Class and Style confuse the issue[edit]
The class
and style
attributes complicate the replacement. The simple method would insist those attributes are not present on any element. There may still be a CSS selector that matches the switch
or text
elements. The priority of CSS rules is higher than attributes, so the only real trouble is the class
attribute. Leave the class
attribute on any element.
- Note: SVG 1.1 does not require CSS or
style
attribute support.[33]
There are many ways to specify presentation properties, and those specifications may conflict. The conflicts are resolved by assigning priorities to the property specifications:[34]
- Attributes (lowest priority)
- CSS selectors have calculated priorities based on specificity and order
- inline styles (may be overridden with CSS
!important
declarations) (highest priority)
The style priority makes hoisting difficult. It also means that inline styles are not (usually) overridden with CSS patterns.
Mozilla https://developer.mozilla.org/en-US/docs/Web/CSS/Specificity says
Your global CSS file that sets visual aspects of your site globally may be overwritten by inline styles defined directly on individual elements. Both inline styles and !important are considered bad practice, but sometimes you need the latter to override the former.
Inkscape heavily uses inline styles. In a way, that makes interpretation easier: inline styles have the highest precedence, so it is the least confusing way to apply style information. It is unlikely to be overridden by other information. At the same time, it also means that converting inline style information to attributes may have unexpected results if a CSS selector applies contradicting information.
Prefer class[edit]
It makes sense to prefer class (or other selectors) over explicit style
attributes. For example, say a class
selector sets the font characteristics to certain values and the style
attribute sets the same values. A class
selector may apply to more elements, so it should be favored.
A quick and dirty way to play this game is to remove the style
attribute and then call Window.getComputedStyle()
. Any properties in the style
attribute that are already present may be removed. Coding can be a bit tricky.
Being more direct is also difficult. One can access the stylesheets and the style
attribute, but the CSSRule (media rules make a multiverse), CSSStyleRule (does not parse selectors), and CSSStyleDeclaration interfaces do not provide much more mechanism. CSSStyleRule provides the selectorText
, but the interface does not provide a priority list of which rules apply to an element. Parsing and interpreting selectors is a difficult task.
There is querySelector
, so the inverted test may be done. Should check how well that method works. The method does not return the priority.
Pseudo selectors may be difficult to get right. For example, :active
suggests the need to examine all possibilities.
Animation may also cause trouble.
Transformation rewrites[edit]
The transform
rewrite is more complicated. A potential method chooses a suitable translation, appends it to the switch
element's transform
, and then adjusts the coordinates of all the children. For the text
element, if the coordinate adjusts to zero, then remove the attribute. Do not remove the zero attributes of tspan
elements because they start new text chunks. A transform
attribute on the text
or tspan
elements would cause a lot of trouble. So hoist the transform
attributes first and give up if they do not hoist.
Is this step a good idea? If the file is localized, all this information would be moved back into the text
element. In addition, the styling may be accomplished with a class
attribute and CSS, so it is not a heavy penalty for each clause. Moving styling information into a text
element may be the better goal. The issue of hoisting position information may still be reasonable.
Multiline trick[edit]
Merge with tspan
count issue.
It may be possible to do a multiline with vertical alignment trick. Instead of a one-line translation or a two-line translation, make it a three-line translation. For one-line translations, only one tspan
is used; the other two are left empty. For a two-line translation, the one-line tspan
is left empty.
<switch>
<text>
<tspan x="0" y="0">One-line translation</tspan>
<tspan x="0" y="-20">Two-line translation, line 1</tspan>
<tspan x="0" y="20">Two-line translation, line 2</tspan>
</text>
</switch>
When will librsvg
support the CSS lh
unit?
WMF warts[edit]
- https://www.debian.org/releases/ Debian releases...
More information:[3]
en:Debian version history (also has end of support)
Debian code name | Debian version | Debian date | Debian status | librsvg version | date | WMF deploy |
---|---|---|---|---|---|---|
Jessie | 8 | 2015-04 | archived | | | |
Stretch | 9 | 2017-06 | archived | | | |
Buster | 10 | 2019-07 | old old stable | 2.44.10 | April 2023 | |
Bullseye | 11 | 2021-08 | old stable | 2.50.3 | | |
Bookworm | 12 | 2023-06 | stable | 2.54.5 | | |
Trixie | 13 | | testing | 2.54.5 | | |
Sid | 14 | | unstable | 2.54.5 | | |
SVG and graphics editors[edit]
Also round tripping
Adobe Illustrator[edit]
CorelDraw[edit]
Inkscape[edit]
Inkscape is an editor that will preserve a lot of SVG because Inkscape uses SVG as its internal representation. Unlike other graphics editors, Inkscape does not have a different native format.
Inkscape produces bizarre numbers. Single precision formatted as a double. Metric conversions.
Inkscape uses concrete bounding boxes. Test the following scenario. An SVG image with 4 circles and each circle points to a linearGradient
element. The linear gradient uses the default gradientUnits="objectBoundingBox"
. I believe Inkscape will clone 4 linearGradient elements, and those elements will be gradientUnits="userSpaceOnUse"
. Test if moving an object changes the coordinates or creates a new linear gradient.
Inkscape units[edit]
I worked on this topic somewhere else....
Notice the strange units in this header.
<svg
width="1281.4634mm"
height="246.60315mm"
viewBox="0 0 5125.8537 986.41263"
version="1.1"
id="svg370"
xml:space="preserve"
inkscape:version="1.2 (dc2aedaf03, 2022-05-15)"
sodipodi:docname="Chronologie_constitutions_françaises.svg"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
The viewBox
x-zoom: 0.24999999512276 → 1/4 to a single-float epsilon
The viewBox
y-zoom: 0.24999999239669 → 1/4 to a single-float epsilon
Convert the 5125.8537 pixels to inches: 53.394309375
Convert the 5125.8537 pixels to mm: 1356.215458125
Convert the width of 1281.4634 mm to inches: 50.45131496063
Convert the width of 1281.4634 mm to CSS pixels: 4843.3262362205
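The arithmetic above can be reproduced with a few lines of Python. The only assumptions are the usual CSS definitions of 96 px per inch and 25.4 mm per inch; the variable names are mine.

```python
MM_PER_INCH = 25.4
CSS_PX_PER_INCH = 96.0

# Values from the svg header above
width_mm, height_mm = 1281.4634, 246.60315
view_w, view_h = 5125.8537, 986.41263

# Zoom: physical size (mm) per viewBox user unit
x_zoom = width_mm / view_w
y_zoom = height_mm / view_h

# Treat the viewBox width as CSS pixels and convert to physical units
view_w_inches = view_w / CSS_PX_PER_INCH
view_w_mm = view_w_inches * MM_PER_INCH

# Treat the declared width as millimeters and convert to CSS pixels
width_inches = width_mm / MM_PER_INCH
width_px = width_inches * CSS_PX_PER_INCH

print(f"x-zoom {x_zoom:.14f}, y-zoom {y_zoom:.14f}")
print(f"viewBox width: {view_w_inches:.9f} in, {view_w_mm:.9f} mm")
print(f"declared width: {width_inches:.11f} in, {width_px:.10f} px")
```

The viewBox unit works out to a quarter of a millimeter, not a CSS pixel (which would be 25.4/96, about 0.2646 mm), which is why the two widths disagree.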
Accuracy[edit]
Some random issues about technical accuracy.
Refractometer[edit]
No clear schematic of how it works. Handheld and Abbe.
See w:Refractometer.
Motors[edit]
Compare the diagram of a 3-phase induction motor File:Vierpolig-3stränge.svg with File:Asynchronous Motor.svg. Look at the flux lines.
-
Vierpolig-3stränge.svg 4-pole 3-phase motor
Motor winding
Terminal | Phase | Slot | Slot | connect to |
---|---|---|---|---|
1 | 1 | 1 | 6 | 12 |
6 | 2 | 3 | 8 | 2 |
8 | 3 | 5 | 10 | 4 |
4 | 1 | 7 | 12 | 6 |
9 | 2 | 9 | 14 | 20 |
5 | 3 | 11 | 16 | 22 |
7 | 1 | 13 | 18 | 24 |
12 | 2 | 15 | 20 | 14 |
2 | 3 | 17 | 22 | 16 |
10 | 1 | 19 | 24 | 18 |
3 | 2 | 21 | 2 | 8 |
11 | 3 | 23 | 4 | 10 |
- also concentric windings...
Wiring diagram on Commons?
NEMA and IEC
- https://electricmotorwarehouse.com/motor-connection-diagrams/
- https://download.sew-eurodrive.com/download/pdf/9PD0048.pdf
- https://www.gulfelectroquip.com/wp-geq/wp-content/uploads/GEQ-Terminal-Markings-and-Connections-Guide-020520.pdf
General
- https://www.nema.org/docs/default-source/motor-and-generator-guides-and-resources-library/6-electric-motor-terminology-and-performance-characteristics-v2.pdf
- https://motor-hmc.com/wp-content/uploads/2014/03/Motor-Glossary-from-Rockwell.pdf
I have trouble with this diagram of a shaded pole motor. The vertical flux paths are too thick. Compare to an actual design. Also, the winding flux path should have a similar width.
-
shaded pole motor
-
shaded pole motor
-
shaded pole motor. Note thin sides.
Microphones and flux path[edit]
A ribbon microphone with no flux return path.
-
ribbon microphone
-
ribbon microphone with return path
Vacuum pumps and details[edit]
Stator shape, reflood, exhaust valve, oil seals, oil pump, foam, vanes through the axle.
Clearances. Bearings and something like an Oldham coupling.
Where was the liquid version? Liquid ring vacuum pump?
Electrochemical cell[edit]
I do not think the salt bridge works that way. KNO3.
-
SVG (15 kB) 5 June 2011
-
JPEG (111 kB) 15 May 2010
-
SVG (215 kB) 23 September 2017
-
PNG (178 kB) 24 November 2017
See https://kids.britannica.com/students/assembly/view/106626
Mechanical seals[edit]
The sealing only happens around the compression nut.
Also KF seals prevent over compression.
Biology[edit]
The comment ("This picture is obsolete. the pluripotent stemcell of the blood is giving origin to a lymphoid and a myeloid cell line.") at
and
Many have worked on similar diagrams, so sort out the effort.
- File:Illu blood cell lineage.jpg 2006-05-17 on Commons. 77 kB. NIH (when?).
- File:Hematopoiesis (human) diagram.png 2006-08-11 1.18 MB. A. Rad. Has dense text block. There is also an extensive description about the cell images. It has an incompatible Commons license: "GFDL-self. This image is released under the GFDL-self license and is considered freely distributable. This image or any reproductions/customizations thereof (or any reproductions/customizations of its reproductions/customizations, and so forth) may NOT be sold without my explicit consent." The separate licensing section has just {{self|GFDL}}, so the licensing terms are inconsistent.
- File:Hematopoiesis (human) diagram.svg 2010-02-06 3.1 MB. Spacebirdy, no text
- File:Hematopoiesis (human) diagram en.svg 2010-02-07 1.68 MB. RexxS + others, first versions do not thumbnail in history section, has display=none text
- File:Hematopoiesis (human) diagram switch.svg 2020-07-18 1.33 MB. JoKalliauer, render in X issue. Easy to fix the render in X issue, but the
switch
translations are planar. No default language. Transforms with rotate and then x-y positioning. Only used on ja.Wiki. There are about 75 strings per plane and a dozen planes. Not a quick task. It would be nice to upload it to the _en ver, but then it would be difficult to do graphics edits.
The license issue is troubling. It affects all derivatives. It also has further issues because many files have been extracted from File:Hematopoiesis (human) diagram en.svg. See, for example, File:Monoblast.svg.
Comments on individual versions.
-
JPEG NIH
-
PNG human
-
PNG more modern (lost backarrow)
-
SVG more modern (lost backarrow)
Look at Wikidata items. Examine instance and subclass relations. (develops from (P3094), follows (P155), followed by (P156))
- Hemocytoblast hematopoietic stem cell (Q514525)
- Proerythroblast proerythroblast (Q2284254)
- Myeloblast myeloblast (Q1956556)
- lymphoblast lymphoblast (Q1873857)
- monoblast monoblast (Q2617333)
- megakaryoblast megakaryoblast (Q2984950)
- polychromatic erythroblast ??? erythroblast (Q3296909)
- progranulocyte ??? promyelocyte (Q2382063) with dup promyelocyte (Q66590191)
- lymphocyte lymphocyte (Q715347)
- monocyte monocyte (Q107244)
- megakaryocyte megakaryocyte (Q821701)
- erythrocytes red blood cell (Q37187)
- basophil basophil (Q107988)
- eosinophil eosinophil (Q107238)
- neutrophil neutrophil (Q188417)
- granulocytes granulocyte (Q223143)
- agranulocytes agranulocytes (Q1775422)
- leukocytes white blood cell (Q42395)
- thrombocytes platelets (Q101026)
Looking for files extracted from File:Hematopoiesis (human) diagram en.svg. Keeping track of derivatives is nice....
Clean up[edit]
Image maps[edit]
A post at Commons:Graphics village pump:
The template is
It uses <imagemap>.
The loom file is File:Simple_treadle_floorloom,_line_drawing.png.
The imagemap
lines look like:
poly 1360 808 1904 553 1921 1152 1399 1437 [[Heddle|Heddles and heddle frames or harness]]
The post has various questions.
Multilingual from Wikidata items:
{{Label|Q173056}}
→ loom{{Label|Q2748498}}
→ batten{{Label|Q39515}}
→ heddle- many loom terms are absent...
Several people have proposed Wikipedia links from a Wikidata item. From heddle (Q39515), hyperlink to the appropriate language Wikipedia entry if it exists.
Multilingual static (not tool tip) labels (as done at File:2022 Russian invasion of Ukraine.svg) are possible.
SVG has tool tips, and tool tips would be better than an image map because users do not supply the hit geometry. The method fails for MediaWiki because SVG is not served directly. Browsers have good support for tool tips, but they do not support SVG multilingual tool tips such as:
<g>
<title lang="en">English</title>
<title lang="de">Deutsch</title>
<title lang="fr">Francais</title>
...
</g>
The SVG 2.0 specification has apparently dropped multilingual titles. See https://svgwg.org/svg2-draft/struct.html#TitleElement
Animated SVG can be done. Make some conventional SVG visible upon a mouse over. The table of chemical elements technology.
Copyright[edit]
- https://commons.wikimedia.org/wiki/Commons:Graphic_Lab/Illustration_workshop/Archive/2022#SVG_for_a_common_anti-capitalist_graffiti
- https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump/Copyright&diff=672607382&oldid=672478602#IWM_Non_Commercial_Licence
MediaWiki upgrades[edit]
See Phab:T265549#8145732 Phab:T216815
The new version of librsvg
now takes the langtag through the --accept-language
command line argument rather than the $LANG
environment variable. Well, it still uses the $LANG
and other environment locale variables, but those environment variables must now be Unix locale strings.
In general, an SVG langtag should never be passed through the $LANG
environment variable. The latter takes an opaque Unix locale string rather than an IETF language tag. It happened to work for early versions of librsvg
because those versions did a getenv("LANG")
and used that result as an IETF langtag. (As a separate issue, the early versions also did not process hyphenated langtags correctly.) The Rust versions of librsvg
now use a library routine to digest the locale string.
Command line arguments[edit]
Outdated man page:
- -f --format: png
- -w --width: integer
- -h --height: integer
- -o --output filename
- -d --dpi-x (default 90!)
- -p --dpi-y (default 90!)
- -a --keep-aspect-ratio
- -u undocumented; see unlimited below
Source
- https://gitlab.gnome.org/GNOME/librsvg
- https://gitlab.gnome.org/GNOME/librsvg/-/blob/main/src/accept_language.rs
- https://gitlab.gnome.org/GNOME/librsvg/-/blob/main/rsvg-convert.rst
- This wants to use different argument names?
- -w --width
- -h --height
- -o --output
- -f --format
- -a --keep-aspect-ratio
- -d --dpi-x
- -p --dpi-y
- -l --accept-language (New)
- -u --unlimited; turn off limited XML parsing
Resvg command line arguments[edit]
- https://github.com/RazrFalcon/resvg
- https://github.com/RazrFalcon/resvg/blob/master/src/main.rs
- resvg in.svg out.png
- -w --width
- -h --height
- --dpi (default 96)
- --languages
MediaWiki external converter[edit]
For conventional MediaWiki, there is a configuration variable:
$wgSVGConverters = [
'ImageMagick' => '$path/convert -background "#ffffff00" -thumbnail $widthx$height\\! $input PNG:$output',
'sodipodi' => '$path/sodipodi -z -w $width -f $input -e $output',
'inkscape' => '$path/inkscape -z -w $width -f $input -e $output',
'batik' => 'java -Djava.awt.headless=true -jar $path/batik-rasterizer.jar -w $width -d $output $input',
'rsvg' => '$path/rsvg-convert -w $width -h $height -o $output $input',
'imgserv' => '$path/imgserv-wrapper -i svg -o png -w$width $input $output',
'ImagickExt' => [ 'SvgHandler::rasterizeImagickExt', ],
];
The configuration should set the $wgSVGConverter
variable. Presumably, that is set to rsvg
.
I thought there were type declarations in MediaWiki PHP code. See, for example,
- https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/upload/UploadBase.php
- https://github.com/wikimedia/svgtranslate
See PHP info:
It looks like the SVG Translate source has JavaDoc style annotations (but there was a strict assignment). See
See https://www.php.net/manual/en/function.str-replace.php and its "Replacement order gotcha" comment. Replacement is done left to right. The filename should always be rightmost.
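A small Python sketch of the gotcha. The helper `sequential_replace` is hypothetical; it imitates how str_replace() with arrays applies each search string in turn to the result of the previous replacement. Shell escaping is left out because it does not change the outcome.

```python
def sequential_replace(template: str, pairs) -> str:
    """Apply each (search, replacement) pair in order, one pass per pair."""
    for search, replacement in pairs:
        template = template.replace(search, replacement)
    return template


template = "rsvg-convert -w $width -o $output $input"
awkward_input = "/tmp/$output.svg"  # a file name that contains a placeholder

# $input is replaced before $output: the later pass rewrites the inserted name.
bad = sequential_replace(
    template,
    [("$width", "100"), ("$input", awkward_input), ("$output", "/tmp/out.png")],
)

# $input is replaced last: the inserted name is never scanned again.
good = sequential_replace(
    template,
    [("$width", "100"), ("$output", "/tmp/out.png"), ("$input", awkward_input)],
)

print(bad)
print(good)
```

In the first ordering the source file name is corrupted; in the second it survives intact, which is the reason for keeping the file name rightmost in the search array.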
public function rasterize( $srcPath, $dstPath, $width, $height, $lang = false ) {
$mainConfig = MediaWikiServices::getInstance()->getMainConfig();
$svgConverters = $mainConfig->get( MainConfigNames::SVGConverters );
$svgConverter = $mainConfig->get( MainConfigNames::SVGConverter );
$svgConverterPath = $mainConfig->get( MainConfigNames::SVGConverterPath );
$err = false;
$retval = '';
if ( isset( $svgConverters[$svgConverter] ) ) {
if ( is_array( $svgConverters[$svgConverter] ) ) {
// This is a PHP callable
$func = $svgConverters[$svgConverter][0];
if ( !is_callable( $func ) ) {
throw new MWException( "$func is not callable" );
}
$err = $func( $srcPath,
$dstPath,
$width,
$height,
$lang,
...array_slice( $svgConverters[$svgConverter], 1 )
);
$retval = (bool)$err;
} else {
// External command
$cmd = str_replace(
[ '$path/', '$width', '$height', '$input', '$output' ],
[ $svgConverterPath ? Shell::escape( "{$svgConverterPath}/" ) : "",
intval( $width ),
intval( $height ),
Shell::escape( $srcPath ),
Shell::escape( $dstPath ) ],
$svgConverters[$svgConverter]
);
$env = [];
if ( $lang !== false ) {
$env['LANG'] = $lang;
}
wfDebug( __METHOD__ . ": $cmd" );
$err = wfShellExecWithStderr( $cmd, $retval, $env );
}
}
// @phan-suppress-next-line PhanTypeMismatchArgumentNullable False positive
$removed = $this->removeBadFile( $dstPath, $retval );
if ( $retval != 0 || $removed ) {
// @phan-suppress-next-next-line PhanPossiblyUndeclaredVariable cmd is set when used
// @phan-suppress-next-line PhanTypeMismatchArgumentNullable cmd is set when used
$this->logErrorForExternalProcess( $retval, $err, $cmd );
return new MediaTransformError( 'thumbnail_error', $width, $height, $err );
}
return true;
}
The change is straightforward.
Change the rsvg configuration (or make a rsvglang).
'rsvg' => '$path/rsvg-convert -w $width -h $height -l $lang -o $output $input',
Should this line also have a -u to match Thumbor?
Also add a resvg line:
'resvg' => '$path/resvg -w $width -h $height --languages $lang $input $output',
Before line 350, set $lang to an actual langtag:
if ( $lang === false ) $lang = "en";
It might be cleaner to default $lang = 'en' in the argument list and declare the argument to be a string type.
Add a "$lang" substitution in lines 352–357. The substitution should be before the $input position in the array.
Delete lines 362–364 (do not pass a langtag in the environment). This will break old versions of librsvg.
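As a rough illustration of the substitution order proposed above, the following Python sketch mimics how PHP's str_replace works through its search array in order. It is not MediaWiki code: build_command and its parameters are hypothetical names, and shlex.quote merely stands in for Shell::escape. One possible reason for placing $lang before $input (an assumption; the note does not say why) is that a file name containing the literal text "$lang" would otherwise be rewritten by the later substitution.

```python
import shlex


def build_command(template, converter_path, width, height, lang, src, dst):
    """Fill a converter template by sequential replacement (hypothetical helper)."""
    # shlex.quote is a stand-in for MediaWiki's Shell::escape.
    prefix = shlex.quote(converter_path + "/") if converter_path else ""
    substitutions = [
        ("$path/", prefix),
        ("$width", str(int(width))),
        ("$height", str(int(height))),
        ("$lang", shlex.quote(lang)),  # placed before $input, as the note suggests
        ("$input", shlex.quote(src)),
        ("$output", shlex.quote(dst)),
    ]
    cmd = template
    # Each replacement sees the result of the previous one, like str_replace
    # with array arguments, so the order of the pairs matters.
    for placeholder, value in substitutions:
        cmd = cmd.replace(placeholder, value)
    return cmd


if __name__ == "__main__":
    template = "$path/rsvg-convert -w $width -h $height -l $lang -o $output $input"
    print(build_command(template, "/usr/bin", 506, 300, "az-latn", "in.svg", "out.png"))
```

With this ordering, a source file named my$lang.svg keeps its name, because its escaped form is inserted only after the $lang placeholder has already been handled.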
Thumbor
For librsvg used by Thumbor:
rsvg-convert source.svg -u -f png -w width -h height --accept-language lang
def create_image(self, buffer):
    self.prepare_source(buffer)
    command = [
        self.context.config.RSVG_CONVERT_PATH,
        self.source,
        '-u',
        '-f',
        'png'
    ]
    if self.context.request.width > 0:
        command += ['-w', '%d' % self.context.request.width]
    if self.context.request.height > 0:  # pragma: no cover
        command += ['-h', '%d' % self.context.request.height]
    env = None
    if hasattr(self.context.request, 'lang'):
        env = {'LANG': self.context.request.lang.upper()}
    png = self.command(command, env)
    return super(Engine, self).create_image(png)
Change the above code:
Add before line 57:
    if hasattr(self.context.request, 'lang'):
        command += ['-l', self.context.request.lang]
The .upper() is not needed; langtags are not case sensitive. Alternatively, an unspecified lang may be set to "en".
Delete lines 58–59; do not pass the langtag in the environment.
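Taken together, the two changes might look like the following standalone sketch. It only builds the argument list; build_rsvg_command and its parameters are hypothetical names rather than Thumbor's API, and the -l option assumes an rsvg-convert recent enough to accept a language on the command line.

```python
def build_rsvg_command(rsvg_path, source, width=0, height=0, lang=None):
    """Return an rsvg-convert argument list with the langtag passed as an option."""
    command = [rsvg_path, source, '-u', '-f', 'png']
    if width > 0:
        command += ['-w', '%d' % width]
    if height > 0:
        command += ['-h', '%d' % height]
    if lang:
        # Assumption: the installed rsvg-convert understands -l / --accept-language.
        # The langtag is passed unchanged; no .upper() and no LANG environment entry.
        command += ['-l', lang]
    return command


if __name__ == "__main__":
    print(build_rsvg_command('rsvg-convert', 'source.svg', width=506, lang='az-latn'))
```

Because nothing is placed in the environment, the same argument list can be checked in a unit test without running the converter.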
Hyphenated langtags
Hyphenated langtags show NO TEXT — not even the default text.
Apparent thumbnailer issue with librsvg 2.44.10 and Thumbor/7.3.2.
- Thumbnail for az-latn has no text
- https://upload.wikimedia.org/wikipedia/commons/thumb/4/4c/IPv6_header-en.svg/langaz-latn-506px-IPv6_header-en.svg.png?20230520164902
- Thumbnail for az has text
- https://upload.wikimedia.org/wikipedia/commons/thumb/4/4c/IPv6_header-en.svg/langaz-506px-IPv6_header-en.svg.png?20230520164902
- Gallery captions (images not shown): default, az-latn, az
Non-English default langtags
A file whose default language is not English displays that default language rather than English.
Unit tests
Investigate https://gerrit.wikimedia.org/r/c/operations/software/thumbor-plugins/+/853402/
In particular,
def run_and_check_ssim_and_size(
    self,
    url,
    mediawiki_reference_thumbnail,
    perfect_reference_thumbnail,
    expected_width,
    expected_height,
    expected_ssim,
    size_tolerance,
):
    """Request URL and check ssim and size.
    Arguments:
    url -- thumbnail URL
    mediawiki_reference_thumbnail -- reference thumbnail file
    expected_width -- expected thumbnail width
    expected_height -- expected thumbnail height
    expected_ssim -- minimum SSIM score
    size_tolerance -- maximum file size ratio between reference and result
    perfect_reference_thumbnail -- perfect lossless version of the target thumbnail, for visual comparison
    """
    try:
SSIM is the Structural Similarity Index Measure.
So make 200px × 200px images named 200px-Test_Patch_000 through 200px-Test_Patch_FFF.
Then check SSIM between those images. That will allow determining a reasonable accept/reject range.
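Because the test patches are uniform colours, a first estimate of the accept/reject range can be made without rendering anything. The sketch below assumes the common SSIM definition with the constant C1 = (0.01 × 255)² applied per colour channel. For two uniform patches the variances and the covariance are zero, so only the luminance term remains. Real SSIM libraries use local windows and may convert to grayscale first, so these scores are indicative only. The helper names patch_name, expand, and uniform_ssim are hypothetical.

```python
# Stabilising constant commonly used for 8-bit data (an assumption of this sketch).
C1 = (0.01 * 255) ** 2

# All 3-digit hex colours, 000 through FFF.
ALL_PATCHES = ["%03X" % n for n in range(0x1000)]


def expand(short_hex):
    """Expand a 3-digit hex colour such as '0F0' to an (r, g, b) tuple."""
    return tuple(int(ch * 2, 16) for ch in short_hex)


def patch_name(short_hex, width=200):
    """File name following the naming scheme in the note above."""
    return "%dpx-Test_Patch_%s" % (width, short_hex.upper())


def uniform_ssim(hex_a, hex_b):
    """Mean per-channel SSIM of two uniform patches (luminance term only)."""
    scores = []
    for a, b in zip(expand(hex_a), expand(hex_b)):
        scores.append((2 * a * b + C1) / (a * a + b * b + C1))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for other in ("0F0", "0E0", "00F", "F00"):
        print(patch_name("0F0"), "vs", patch_name(other), uniform_ssim("0F0", other))
```

Comparing the score of a near-miss colour with the score of a clearly wrong colour gives a feel for where a threshold such as expected_ssim could sit.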
To test that English is generated by default, test this SVG against the appropriate patches:
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="200" height="200">
  <switch>
    <rect width="100%" height="100%" fill="#0F0" systemLanguage="en">
      <title>en</title>
    </rect>
    <rect width="100%" height="100%" fill="#00F" systemLanguage="fr">
      <title>fr</title>
    </rect>
    <rect width="100%" height="100%" fill="#0FF" systemLanguage="zh">
      <title>zh</title>
    </rect>
    <rect width="100%" height="100%" fill="#F0F" systemLanguage="zh-Hans">
      <title>zh-Hans</title>
    </rect>
    <rect width="100%" height="100%" fill="#FF0" systemLanguage="zh-Hant">
      <title>zh-Hant</title>
    </rect>
    <rect width="100%" height="100%" fill="#F00">
      <title>default</title>
    </rect>
  </switch>
</svg>
OK, File:SVG Test System Language.svg produces color patches.
- default should produce 0F0 Green FAIL
- en should produce 0F0 Green FAIL
- fr should produce 00F Blue
- und should produce F00 Red
- en-gb should produce F00 Red FAIL
- fr-it should produce F00 Red FAIL
- zh-tw should produce F00 Red FAIL
- zh should produce any of 3: 0FF, F0F, FF0
- zh-hans should produce F0F FAIL
- zh-hant should produce FF0 FAIL
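The expected colours in the list above follow from a simple reading of the systemLanguage rule: a child of switch is chosen when the requested langtag equals one of its tags, or is a prefix of one followed by "-", compared case-insensitively, and the first matching child wins. The sketch below encodes that reading as a reference model for a unit test. It is a simplification under those assumptions, not librsvg's implementation, and the names SWITCH, matches, and expected_colour are hypothetical. The model always takes the first match, so for zh it reports 0FF, which is one of the three colours the list accepts.

```python
# Children of the <switch> in the test SVG: (systemLanguage, fill colour).
# None marks the final child, which has no systemLanguage attribute.
SWITCH = [
    ("en", "0F0"),
    ("fr", "00F"),
    ("zh", "0FF"),
    ("zh-Hans", "F0F"),
    ("zh-Hant", "FF0"),
    (None, "F00"),
]


def matches(user_lang, system_language):
    """True if user_lang equals a listed tag or is a prefix of it before '-'."""
    user = user_lang.lower()
    for tag in system_language.split(","):
        tag = tag.strip().lower()
        if tag == user or tag.startswith(user + "-"):
            return True
    return False


def expected_colour(user_lang="en", switch=SWITCH):
    """Return the fill of the first switch child that applies to user_lang."""
    for system_language, colour in switch:
        if system_language is None or matches(user_lang, system_language):
            return colour
    return None


if __name__ == "__main__":
    for lang in ("en", "fr", "und", "en-gb", "fr-it", "zh-tw", "zh", "zh-hans", "zh-hant"):
        print(lang, expected_colour(lang))
```

A thumbnail test could then compare each rendered patch against the reference patch named by this model, treating the no-lang case as "en".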
During an upgrade, Thumbor started producing incorrect thumbnails. See Phab:T335361. That problem ($LANG versus $LC_ALL) was fixed. No unit tests were added, and it may be that WMF does not subject Thumbor to unit tests.
Generally, there should be unit tests for librsvg to make sure it does its job. In addition, there should be unit tests for the SVG thumbnailing code (the standard method and Thumbor). The latter tests are needed to make sure that the language is properly communicated to librsvg. That mechanism will fail again soon.
The reason is that Thumbor 7.3.2 is using librsvg v2.44. IIRC, that version still uses Unix environment variables to communicate the system language. (There may also be problems with hyphenated language tags: Unix may not understand the locale string sr-Latn or zh-Hans.) In later versions of librsvg, the system language should be passed through the --accept-language command line argument. Without valid unit tests, Thumbor may again quietly fail.
Where was the Phabricator issue that addressed environment variables?
Cannot set lang for filepath
- {{filepath:Multilingual SVG example.svg|nowiki}} → https://upload.wikimedia.org/wikipedia/commons/1/1e/Multilingual_SVG_example.svg
- {{filepath:Multilingual SVG example.svg|800}} →
- {{filepath:Multilingual SVG example.svg|langzh-800}} → (fails)
Workaround: generate the filepath for 800px and then replace "/800px" with "/langXX-800px".
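The workaround can be sketched as a small string rewrite. The function name add_lang_to_thumb_url is hypothetical, and the input is assumed to have the same shape as the thumbnail URLs quoted elsewhere in these notes (a "/800px-" segment before the file name). Lowercasing the langtag is also an assumption, based on the lowercase tags seen in those URLs.

```python
def add_lang_to_thumb_url(url, lang, width):
    """Insert 'langXX-' before the 'NNNpx-' part of a thumbnail URL."""
    marker = "/%dpx-" % width
    # Split at the last occurrence so a file name containing the marker is safe.
    head, sep, tail = url.rpartition(marker)
    if not sep:
        raise ValueError("URL does not contain %r" % marker)
    return "%s/lang%s-%dpx-%s" % (head, lang.lower(), width, tail)


if __name__ == "__main__":
    plain = ("https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/"
             "Multilingual_SVG_example.svg/800px-Multilingual_SVG_example.svg.png")
    print(add_lang_to_thumb_url(plain, "de", 800))
```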
Try OCR
- test German
  - https://ocr.wmcloud.org/api.php?engine=google&image=https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/langde-800px-Multilingual_SVG_example.svg.png&lang=de →
    {"image":"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/1\/1e\/Multilingual_SVG_example.svg\/langde-800px-Multilingual_SVG_example.svg.png","engine":"google","langs":["de"],"psm":3,"crop":[],"image_hosts":["upload.wikimedia.org","upload.wikimedia.beta.wmflabs.org"],"text":"Liebe geht\ndurch den Magen."}
  - → "Liebe geht\ndurch den Magen."
- test French
  - https://ocr.wmcloud.org/api.php?engine=google&image=https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Multilingual_SVG_example.svg/langfr-800px-Multilingual_SVG_example.svg.png&lang=fr →
    {"image":"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/1\/1e\/Multilingual_SVG_example.svg\/langfr-800px-Multilingual_SVG_example.svg.png","engine":"google","langs":["fr"],"psm":3,"crop":[],"image_hosts":["upload.wikimedia.org","upload.wikimedia.beta.wmflabs.org"],"text":"L'amour passe\npar l'estomac."}
  - → "L'amour passe\npar l'estomac."
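The OCR check could be scripted along these lines. This is a sketch only: ocr_request_url and extract_text are hypothetical helpers, the endpoint and parameter names are copied from the URLs above, and the sample body is a shortened copy of the German response. Unlike the hand-typed URLs, urlencode percent-encodes the image URL. No network request is made here.

```python
import json
from urllib.parse import urlencode

OCR_ENDPOINT = "https://ocr.wmcloud.org/api.php"


def ocr_request_url(image_url, lang, engine="google"):
    """Build the OCR request URL for one thumbnail and one language."""
    query = urlencode({"engine": engine, "image": image_url, "lang": lang})
    return OCR_ENDPOINT + "?" + query


def extract_text(response_body):
    """Pull the recognised text out of a JSON response body."""
    return json.loads(response_body)["text"]


if __name__ == "__main__":
    thumb = ("https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/"
             "Multilingual_SVG_example.svg/langde-800px-Multilingual_SVG_example.svg.png")
    print(ocr_request_url(thumb, "de"))
    # Shortened copy of the German response quoted above.
    sample = r'{"engine":"google","langs":["de"],"text":"Liebe geht\ndurch den Magen."}'
    print(extract_text(sample))
```

Comparing the extracted text with the expected phrase for each language would give a crude end-to-end check that the lang prefix in the thumbnail URL selects the right text.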
Interest
- Signs of the Austrian KennV regulation
- Category:Kennzeichnungsverordnung
- Allied Shipping Losses. See https://www.youtube.com/watch?v=gp5diNUq-rU&t=1m37s at 1:37. Should be bar chart. Events should be lines.
HTML 5
Wikitext supports some HTML markup.
- abbr
- <acronym title="Central Intelligence Agency">acronym CIA</acronym> (deprecated)
- <address>address</address>
- <aside>aside</aside> (aside)
- cite (usage advice varies a lot for this element)
- code
- data
- <details><summary>details</summary> The expanded description.</details>
- kbd (user input)
- mark
- q
- (rp should disappear; part of ruby)
- ruby 漢 字
- samp (sample output)
- var
- <style></style> (appropriately neutered)
MediaWiki hacks
- {{Igen}}
- {{Key press}} → A, Ctrl+F, Ctrl+alt+delete
- {{Language}} → English, German, Russian, Traditional Chinese
- {{#language: ... }} → English, Deutsch, русский, 中文(繁體)
- {{#expr: ... }} → 4, 3.14
References
- ↑ Caching, Mozilla.org
- ↑ https://www.alibabacloud.com/blog/what-is-domain-resolution-and-how-it-works_597610
- ↑ https://serverfault.com/questions/347689/how-to-share-domain-name-with-multiple-servers
- ↑ See https://aeronav.faa.gov/user_guide/20211202/cug-complete.pdf at page 43. In those images, the NDB symbol is a dot, ring, and only 5 dotted rings.
- ↑ SVG 2.0 Chapter 5 Document Structure § 5.8
- ↑ Cory Doctorow, A Bug in early Creative Commons licenses has enabled a new breed of superpredator
- ↑ Village Pump: Cory Doctorow post on "copyleft trolls" mentions Commons
- ↑ Village pump:cc-by < 4.0 not ok any more
- ↑ e.g., https://id.loc.gov/vocabulary/relators.html
- ↑ Adobe, XMP Specification Part 1 at Table 4.
- ↑ Nevile, Liddy; Lissonnet, Sophie (January 2004) The Case for a Person/Agent Dublin Core Metadata Element Set[1]
- ↑ https://www.compart.com/en/unicode/charsets/Adobe-Symbol-Encoding
- ↑ https://www.compart.com/en/unicode/charsets/x-Adobe-Zapf-Dingbats-Encoding
- ↑ https://fonts2u.com/sonata.font https://adobe-type-tools.github.io/font-tech-notes/pdfs/5045.Sonata.pdf
- ↑ https://stackoverflow.com/questions/36486716/the-14-standard-pdf-fonts-and-character-encoding
- ↑ https://www.compart.com/en/unicode/charsets/Adobe-Standard-Encoding
- ↑ Mozilla (2021) SVG Fonts[2]
- ↑ https://www.enzolifesciences.com/science-center/technotes/2019/december/what-are-the-differences-between-northern-southern-and-western-blotting?/
- ↑ https://thumbor.readthedocs.io/en/latest/
- ↑ “ꝺ” U+A77A Latin Small Letter Insular D Unicode Character
- ↑ https://linux.die.net/man/1/xsltproc
- ↑ https://www.wikidata.org/w/index.php?title=Special:Search&limit=100&offset=0&profile=default&search=Pershotravneve&ns0=1&ns120=1
- ↑ https://www.google.com/maps/place/Radekhiv,+Lviv+Oblast,+Ukraine/@50.2811748,24.6012475,13z
- ↑ https://developer.mozilla.org/en-US/docs/Web/SVG/Element/circle
- ↑ See File:SVG CSS Test.svg for a test of .cls2.cls3 selection.
- ↑ https://tc39.es/ecma402/#sec-intl-datetimeformat-constructor
- ↑ Write the Date in French, wikihow.com. The first is pronounced "premier".
- ↑ Italian Ordinal Numbers and Numerical Rank, thoughtco.com. "il primo".
- ↑ Inkscape Tutorial. Chapter 6. SVG File Format. https://inkscapetutorial.org/svg-file-format.html
- ↑ https://www.mediawiki.org/wiki/User:Jarry1250/GSoC_2012_roadmap
- ↑ Meta:Community Wishlist Survey 2017/Multimedia and Commons/SVG-Translate
- ↑ Meta:Community Tech/SVG translation
- ↑ https://www.w3.org/TR/SVG/styling.html
- ↑ https://drafts.csswg.org/selectors/#specificity-rules