Discussion:
[OT] forwards from right-stuf-press at lists.rightstuf.com
Ivan Shmakov
2013-03-27 15:47:45 UTC
Ivan, Your first problem is that you're using Gnus. This is not a
problem in and of itself but it means that you're using GNU Emacs.
This *is* a problem. GNU Emacs' multi-byte character handling (MULE)
is atrocious. It's broken, it's stupid, it will corrupt messages,
and it can't be turned off. The only person in the world who can fix
it is RMS and he adamantly refuses to acknowledge that it is a
problem.
First of all, I've never experienced an encoding-related issue
with Emacs ever since I've switched to Emacs 23. (Or was it
Emacs 22? Other than a few spurious crashes, that is.)
However, cross-posting to news:comp.emacs, should I indeed be
mistaken regarding the state of Unicode support in GNU Emacs.

Second, the issue manifests itself with Thunderbird (as I've
already mentioned), /and/ with Google Groups (as I do now.)
Consider, for instance:

http://groups.google.com/group/rec.arts.anime.info/msg/278594a283bf56ff

(Look for the apostrophes; e. g., in "Right Stuf's".)

Third, I /can/ decode UTF-8 from a hexdump. Please be sure that
I /did/ the relevant checks before "accusing" the other party of
making "wrongly-encoded" postings.
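As a rough illustration of the kind of check described above, the following Python sketch decodes a short hexdump two ways. The byte sequence is a made-up sample (the tail of "Right Stuf's" with a typographic apostrophe, U+2019), not bytes copied from the linked article, and mislabelling as Windows-1252 is only one common failure mode, assumed here for the sake of the example.

```python
# Hypothetical sample: "Stuf" + U+2019 (right single quotation mark) + "s",
# written the way a hexdump would show the UTF-8 bytes.
hexdump = "53 74 75 66 e2 80 99 73"
raw = bytes.fromhex(hexdump)

# Reading the bytes as UTF-8 gives the intended text.
as_utf8 = raw.decode("utf-8")

# Reading the same bytes as Windows-1252 turns the three-byte apostrophe
# into three separate characters, which is the typical "mojibake" pattern.
as_cp1252 = raw.decode("cp1252")

print(as_utf8)
print(as_cp1252)
```

If the three bytes e2 80 99 appear where an apostrophe should be, the body is UTF-8, whatever the article's headers declare.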
Your second problem is actually your own problem. Not the groups,
not the moderators, and not RightStuf's. Okay, maybe RightStuf's
since the postings are copied verbatim from their mailings. If you
don't like that the client you're using doesn't have a wide enough
Subject column in the summary list then make the column wider.
Whatever is the width of the single-line field, there's a way to
exceed it. (Unless it's Twitter, that is.) That's why the rule
of thumb is to order the information from the most important
to the least: whether it's a netnews or email Subject:,
HTML's <title />, or something else. (Yes, there're still Web
fora which format titles as: "Our new very kewl forum + The Foo
topic + The Bar thread." Instead of doing it in the reverse.)

I surely don't want to argue on this (minor) point any further.
It's an issue of good taste, and as such, it has no RFCs for one
to violate. (While for the encodings, there are.)
Or use a client that lets you do that. Simple.
Heck, you're using Gnus. Write a hook that collapses Subject fields
the way you like.
Indeed, that may be a sensible solution.
--
FSF associate member #7257
Stainless Steel Rat
2013-03-27 20:52:51 UTC
First of all, I've never experienced an encoding-related issue with
Emacs ever since I've switched to Emacs 23. (Or was it Emacs 22?
Other than a few spurious crashes, that is.) However, cross-
Then you've been fortunate. An Emacs buffer can have only one file coding
system at a time. If the wrong coding system is selected, or a file
contains multi-byte characters from two or more different coding systems,
then the buffer contents are corrupted.

RightStuf sends out announcements as HTML-encoded UTF-8. Usenet does not
officially acknowledge UTF-8 encodings. The multi-byte munging could
happen at any NNTP server along the path between poster and Google or
whatever NNTP servers you use.
Whatever is the width of the single-line field, there's a way to exceed
it. (Unless it's Twitter, that is.) That's why the rule of the thumb
RFC 1036 (Usenet message format) inherits from RFC 822 (Internet mail
message format). RFC 822 explicitly does not impose limits on header field
lengths. RFC 822 discourages field folding in section 3.4.8. There is no
limit to exceed beyond the cosmetic capabilities of your Usenet client.
--
\m/ (--) \m/
Ivan Shmakov
2013-03-28 09:46:10 UTC
[Setting Followup-To: news:comp.emacs, for the encoding support
in Emacs is irrelevant to either anime or netnews.]
First of all, I've never experienced an encoding-related issue with
Emacs ever since I've switched to Emacs 23. (Or was it Emacs 22?
Other than a few spurious crashes, that is.) However, cross-posting
to news:comp.emacs, should I indeed be mistaken regarding the state
of Unicode support in GNU Emacs.
Then you've been fortunate. An Emacs buffer can have only one file
coding system at a time.
Frankly, I can't readily recall /any/ text editor that would
allow for multiple encodings for a single file. Could you
please name one?
If the wrong coding system is selected, or a file contains multi-byte
characters from two or more different coding systems, then the buffer
contents are corrupted.
As with virtually all the other editors I ever encountered.

Still, it's possible to read a file using the "raw" encoding,
and decode it explicitly later, either as a whole, or
piece-wise, with M-x decode-coding-region.
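The "read raw, decode later" approach is not specific to Emacs. A minimal Python sketch of the same idea follows; the two sample strings and the choice of ISO-8859-7 and KOI8-R are assumptions made for the illustration, and this is an analogy to M-x decode-coding-region rather than a description of how Emacs implements it.

```python
# Two fragments in different single-byte encodings, joined into one byte
# string. This stands in for a file that mixes coding systems.
greek = "Γιώργος".encode("iso-8859-7")
russian = "Иван".encode("koi8-r")
raw = greek + b"\n" + russian

# Step 1: keep the data as bytes, so nothing is guessed or converted.
first, second = raw.split(b"\n")

# Step 2: decode each region explicitly with the encoding known to apply.
decoded = [first.decode("iso-8859-7"), second.decode("koi8-r")]

# For contrast: forcing a single encoding onto the whole file succeeds
# without an error, but the second line comes out as the wrong letters.
wrong = raw.decode("iso-8859-7")

print(decoded)
print(wrong)
```

The point of the contrast is that a wrong single-encoding guess often raises no error at all, so the damage is only noticed when the text is read or saved back.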

[...]

PS. Though I am tired of this discussion.
--
FSF associate member #7257
Stainless Steel Rat
2013-03-29 02:01:45 UTC
Frankly, I can't readily recall of /any/ text editor that would allow
for multiple encodings for a single file. Could you please name one?
Not that it's terribly relevant since there are no text editors out there
other than Emacs that can be used to read mail and news (that I know of).
But the answer is: Emacs but only if the coding system for the buffer is
set to 'binary.
As with virtually all the other editors I ever encountered.
Perhaps, but I can't name one besides Emacs that has ever been responsible
for corrupting an entire mail spool because of it. Because MULE's
developers think that automatic detection is always correct.

It isn't. There are three or four places in pop3.el where I had to
explicitly set the coding system to 'binary because MULE's automatic
detection would eat Gnus users' mail. And then some thick-headed MULE
maintainer removed those settings after pop3.el was submitted for
packaging because auto-detection is always correct, so I was told.

But I'm out of it at this point. I had this argument with the Emacs
maintainers some 10-15 years ago. The results of that argument, along with
issues regarding code forks and responsibility thereof, so angered and
disgusted me with the FSF's development processes that I dropped Emacs
development entirely.

And I don't read comp.emacs any more, either.
--
\m/ (--) \m/
Ivan Shmakov
2013-04-03 10:16:16 UTC
[Well, cross-posting to news:news.software.readers and
news:comp.mail.misc. Perhaps news:gnu.emacs.gnus might have
been a better choice, but I'm somewhat wary of cross-posting
to a Mailman-backed newsgroup.]
Post by Stainless Steel Rat
Post by Ivan Shmakov
Frankly, I can't readily recall of /any/ text editor that would
allow for multiple encodings for a single file. Could you please
name one?
Not that it's terribly relevant since there are no text editors out
there other than Emacs that can be used to read mail and news (that I
know of).
But then, in what I'd consider the "proper" setup, Emacs
accesses IMAP and NNTP servers to get mail and news, and not
some files directly. So, yes, the lack of multiple encodings
per file support is irrelevant for the task.
Post by Stainless Steel Rat
But the answer is: Emacs but only if the coding system for the buffer
is set to 'binary.
Yes.
Post by Stainless Steel Rat
Post by Ivan Shmakov
As with virtually all the other editors I ever encountered.
Perhaps, but I can't name one besides Emacs that has ever been
responsible for corrupting an entire mail spool because of it.
Because MULE's developers think that automatic detection is always
correct.
It isn't. There are three or four places in pop3.el where I had to
explicitly set the coding system to 'binary because MULE's automatic
detection would eat Gnus users' mail. And then some thick-headed
MULE maintainer removed those settings after pop3.el was submitted
for packaging because auto-detection is always correct, so I was
told.
But I'm out of it at this point. I had this argument with the Emacs
maintainers some 10-15 years ago. The results of that argument,
along with issues regarding code forks and responsibility thereof, so
angered and disgusted me with the FSF's development processes that I
dropped Emacs development entirely.
Incidentally, some 10 years ago was the time when I dropped both
POP3 and the Unix mbox format entirely.

Nowadays, it's the Dovecot IMAP server that manages my Maildir
mail boxes (including the local ones.) Never has Gnus corrupted
someone's "entire mail spool" (in my sphere of responsibility)
ever since.

(But I have to agree that multiple encodings support in Emacs
has improved considerably over the last decade.)
Post by Stainless Steel Rat
And I don't read comp.emacs any more, either.
--
FSF associate member #7257 np. Birdeca bird' -- Marchela Fasani
Julian Bradfield
2013-04-03 11:12:12 UTC
Post by Ivan Shmakov
But then, in what I'd consider the "proper" setup, Emacs
accesses IMAP and NNTP servers to get mail and news, and not
some files directly. So, yes, the lack of multiple encodings
per file support is irrelevant for the task.
But a single message may contain multiple encodings...which are
correctly dealt with by VM, for example, just as it has to deal with
multiple encodings in a mailbox file.
Γιώργος Κεραμίδας
2013-04-05 09:26:16 UTC
Post by Julian Bradfield
Post by Ivan Shmakov
But then, in what I'd consider the "proper" setup, Emacs
accesses IMAP and NNTP servers to get mail and news, and not
some files directly. So, yes, the lack of multiple encodings
per file support is irrelevant for the task.
But a single message may contain multiple encodings...which are
correctly dealt with by VM, for example, just as it has to deal with
multiple encodings in a mailbox file.
Are you talking about multiple encodings in a single 'MIME part' too, or
about something else?

Each MIME part can have its own charset attribute, so Gnus, VM, or any
other email client, can recognize that this is happening and try to
recode / display the MIME part in the appropriate way.

But multiple character sets / encodings within a _single_ MIME part
sounds a bit silly.
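To make the per-part charset point concrete, here is a small Python sketch using the standard library's email package. It builds a two-part message whose parts declare different charsets, serializes it, and then decodes each part according to its own charset parameter. The sample texts and the charset pairing (ISO-8859-7 and UTF-8) are assumptions chosen for the example; this shows the general MIME mechanism, not how Gnus or VM implement it.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a multipart message whose two text parts use different charsets.
msg = MIMEMultipart("mixed")
msg.attach(MIMEText("Γειά σου", "plain", "iso-8859-7"))
msg.attach(MIMEText("Привет", "plain", "utf-8"))

# Serialize to bytes (as it would travel) and parse it back.
wire = msg.as_bytes()
parsed = message_from_bytes(wire)

# Decode every text part using the charset declared on that part.
texts = []
for part in parsed.walk():
    if part.get_content_maintype() == "text":
        charset = part.get_content_charset()
        body = part.get_payload(decode=True)  # undo the transfer encoding
        texts.append((charset, body.decode(charset)))

print(texts)
```

Because each part carries its own Content-Type charset, a client never has to guess one encoding for the whole message.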
Shmuel (Seymour J.) Metz
2013-04-05 13:01:45 UTC
Post by Γιώργος Κεραμίδας
Are you talking about multiple encoding in a single 'MIME part' too,
or something else though?
From RFC 2045:

An initial set of seven top-level media types is defined in RFC
2046. Five of these are discrete types whose content is
essentially opaque as far as MIME processing is concerned. The
remaining two are composite types whose contents require
additional handling by MIME processors.

In a composite entity, each body part can have a different charset.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to ***@library.lspace.org
Γιώργος Κεραμίδας
2013-04-08 20:29:13 UTC
Post by Shmuel (Seymour J.) Metz
Post by Γιώργος Κεραμίδας
Are you talking about multiple encoding in a single 'MIME part' too,
or something else though?
An initial set of seven top-level media types is defined in RFC
2046. Five of these are discrete types whose content is
essentially opaque as far as MIME processing is concerned. The
remaining two are composite types whose contents require
additional handling by MIME processors.
In a composite entity, each body part can have a different charset.
Exactly.

I admit I have only worked with semi-compatible encodings, like two
different parts of the same MIME message which are encoded in utf-8 and
iso-8859-7 respectively. So I am missing something here, because I have
seen Gnus running as utf-8 display both parts of such a message with the
appropriate conversion to the 'wider' encoding.

Is there any combination of MIME parts that makes Gnus barf? This is
probably a bug if it happens, and we should let the Gnus maintainers
know about it.
--
Giorgos Keramidas; ***@gmail.com
Ivan Shmakov
2013-04-09 20:49:29 UTC
[…]
Post by Γιώργος Κεραμίδας
I admit I have only worked with semi-compatible encodings, like two
different parts of the same MIME message which are encoded in utf-8
and iso-8859-7 respectively. So I am missing something here, because
I have seen Gnus running as utf-8 display both parts of such a
message with the appropriate conversion to the 'wider' encoding.
I guess the only point to miss is that it was (AIUI) a
pop3.el-specific bug that the OP (= other poster) stumbled upon
some 10 years ago.
Post by Γιώργος Κεραμίδας
Is there any combination of MIME parts that makes Gnus barf? This is
probably a bug if it happens, and we should let the Gnus maintainers
know about it.
FWIW, there isn't any such bug that I know of in Gnus 5.13.
--
FSF associate member #7257 http://hfday.org/
Γιώργος Κεραμίδας
2013-04-11 08:23:51 UTC
Post by Ivan Shmakov
Post by Γιώργος Κεραμίδας
I admit I have only worked with semi-compatible encodings, like two
different parts of the same MIME message which are encoded in utf-8
and iso-8859-7 respectively. So I am missing something here, because
I have seen Gnus running as utf-8 display both parts of such a
message with the appropriate conversion to the 'wider' encoding.
I guess the only point to miss is that it was (AIUI) a
pop3.el-specific bug that the OP (= other poster) stumbled upon some
10 years ago.
Ah, I see now. Thanks :-)

I found the original message and read it more carefully. Now, having
done a bit of reading, I understand better what the OP wrote. I haven't
used MULE with POP3 as extensively as the OP says he did, and it's a
10-year-old problem that even he doesn't really want to talk about.

I was curious because I use Gnus with IMAP a /lot/ now to access, read
and compose multipart MIME messages, and it seemed like there's a
problem with the current codebase. Now, having read more, I don't think
there's a lot to worry about -- at least not until we have an indication
that there /is/ a bug and we have something to work with.

My apologies for the extra noise, until I grasped what was going on.
Ivan Shmakov
2013-03-28 09:57:22 UTC
[Dropping news:comp.emacs from Followup-To:.]

[...]
RightStuf sends out announcements as HTML-encoded UTF-8. Usenet does
not officially acknowledge UTF-8 encodings.
That would seem to suggest that such forwards "violate" the
established Usenet conventions, wouldn't it?
The multi-byte munging could happen at any NNTP server along the path
between poster and Google or whatever NNTP servers you use.
These days, I tend to assume that Usenet transports are
"8-bit clean." Moreover, section 3.6 of RFC 5537 explicitly
prohibits such munging:

--cut: urn:ietf:rfc:5537 --
Relaying agents MUST NOT alter, delete, or rearrange any part of an
article except for the Path and Xref header fields. They MUST NOT
modify the body of articles in any way. If an article is not
acceptable as is, the article MUST be rejected rather than modified.
--cut: urn:ietf:rfc:5537 --

(But feel free to provide any examples to prove me mistaken on
that.)

Alternatively, I'd suggest using the quoted-printable encoding
for the forwards. (Now that I know that the messages are
rendered into plain text from HTML, such re-encoding doesn't
seem all that infeasible, and perhaps only a checkbox away for a
Thunderbird user.)
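For readers unfamiliar with quoted-printable, the following Python sketch shows what the suggestion amounts to: the UTF-8 bytes of a non-ASCII character are rewritten as =XX escapes, so the article body contains only 7-bit characters while remaining fully recoverable. The sample string is made up for the illustration; how a given client such as Thunderbird exposes this option is not shown here.

```python
import quopri

# Hypothetical sample text with a typographic apostrophe (U+2019).
text = "Right Stuf\u2019s"
utf8_bytes = text.encode("utf-8")

# Quoted-printable keeps ASCII as-is and escapes every other byte as =XX.
qp = quopri.encodestring(utf8_bytes)

# The encoded form is pure 7-bit ASCII, so it survives transports that
# are not 8-bit clean.
is_seven_bit = all(b < 0x80 for b in qp)

# Decoding reverses the escapes and yields the original UTF-8 text.
roundtrip = quopri.decodestring(qp).decode("utf-8")

print(qp)
print(roundtrip)
```

The article would still need a Content-Type header naming UTF-8 and a Content-Transfer-Encoding header naming quoted-printable for a reader to decode it correctly.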

Or perhaps it may make sense to ASCII-fy the forwards
altogether.
Post by Ivan Shmakov
Whatever is the width of the single-line field, there's a way to
exceed it. (Unless it's Twitter, that is.)
[...]
There is no limit to exceed beyond the cosmetic capabilities of your
Usenet client.
Yes. And you aren't going to read netnews without one, are you?
--
FSF associate member #7257