Unicode

benyamin_pc · Dec 3, 2006

The .NET topic never got off the ground; in other words, nobody raised a hand :neutral:

If we want to save the text inside a TextBox as Unicode using C#, how can we do it?

MnavidM · Dec 5, 2006

Hello.

If you want to do group work, the least you can do is start the work yourself; then others will gradually pitch in on that topic.

For Unicode, see here.
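As a rough illustration of what "saving as Unicode" means, here is a minimal sketch in Python rather than C#. The helper name `save_text` and the file names are made up for this example. The point is only that the encoding is chosen explicitly when the text is written to disk. In C#, an overload such as `File.WriteAllText(path, text, Encoding.UTF8)` plays a similar role, with `Encoding.Unicode` selecting UTF-16.

```python
from pathlib import Path
import tempfile


def save_text(path, text, encoding="utf-8"):
    """Write text to a file using an explicitly chosen encoding (hypothetical helper)."""
    Path(path).write_text(text, encoding=encoding)


# Stand-in for the contents of a text box.
text = "\u0633\u0644\u0627\u0645 \u062f\u0646\u06cc\u0627"  # "سلام دنیا"

folder = Path(tempfile.mkdtemp())
utf8_path = folder / "note-utf8.txt"
utf16_path = folder / "note-utf16.txt"

save_text(utf8_path, text, "utf-8")    # variable-length, ASCII-compatible
save_text(utf16_path, text, "utf-16")  # 16-bit units, written with a byte order mark
```

Reading the file back with the same encoding returns the original string. Reading it with a different encoding is what produces garbled text.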

saalek110 · Dec 6, 2006

I would appreciate it if one of you could explain Unicode to me.

Questions:
What does Unicode actually mean?
How many kinds of encoding are there, and how many kinds of Unicode?
In general, when we convert, what exactly gets converted?

I have read a lot about it, but it is still vague to me.
Is it something like the ASCII codes?
That is, do we have several ASCII tables?
How is each one stored on the hard disk?
For example, does "Unicode 32" mean it takes up two bytes on disk?
If you can point me to a source that explains these things in simple language, that would be great too.
But all I really want is a few lines of explanation so that I understand what these conversions mean.
For example, what does it mean when people say a site is "Arabic" or "Unicode"?

MnavidM · Dec 9, 2006

Hello.

Unicode is a character encoding standard that can cover all local languages as well.

In general, during conversion, the characters' codes in the source encoding are converted to the corresponding Unicode character codes.
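To make the conversion step concrete, here is a small Python sketch. The function name `convert_bytes` and the choice of Windows-1256 as the legacy code page are assumptions for illustration. The legacy bytes are first decoded to Unicode code points and then encoded again in a Unicode encoding such as UTF-8. The characters stay the same; only the numbers that represent them on disk change.

```python
def convert_bytes(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Re-encode bytes from a legacy code page into a Unicode encoding."""
    text = data.decode(source_encoding)   # legacy bytes -> Unicode code points
    return text.encode(target_encoding)   # code points -> bytes in the target encoding


word = "\u0633\u0644\u0627\u0645"         # the word "سلام"
legacy = word.encode("cp1256")            # one byte per letter in Windows-1256
converted = convert_bytes(legacy, "cp1256")

code_points = [hex(ord(ch)) for ch in converted.decode("utf-8")]
print(len(legacy), len(converted), code_points)
```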

What is Unicode?

You have surely heard the word Unicode many times these days, or seen it on websites and in new applications, and you want to know what it is. As you know, a computer deals only with numbers, and all textual, audio, and visual information is ultimately stored and processed as numbers. So, to make our written information understandable to the computer, we have to assign a numeric code to every letter of the alphabet. As a result, hundreds of encoding systems came into existence, with different systems introduced for different languages. This was also true of Persian, the official language of Iran: every Iranian software company had, and still has, its own encoding system, and no standard encoding for Persian ever emerged that all programmers would use. The same problem existed for other languages, until Unicode was finally introduced.

Unicode is neither a particular font nor a particular program; it is an encoding standard for characters, like ANSI. Unicode assigns every character a unique number, independent of the operating system, program, and language. It can accommodate all the characters of the world's living languages and give each of them a unique code. Using Unicode in websites and client-server programs can be very useful: we no longer need to worry about which operating system or web browser the site's users have, as long as it supports the Unicode standard.

Today many of the leading companies in the computer world have adopted this standard, and almost all new applications support it. For example, the Windows versions that followed Windows Me, namely Windows 2000, Windows XP, and Windows .NET Server, are built entirely on Unicode, and other operating systems such as Mac OS, Solaris, and several others support Unicode as well. Applications such as Office 2000 and Office XP also fully support the standard, and you can create Persian web pages with FrontPage XP, FrontPage 2000, or Visual Studio .NET.
A consortium has been formed to develop and promote the Unicode standard. At the time this article was written, the organization had published Unicode 3.2.0 as the newest version.
The use of Unicode is growing, and future programs and websites will all use this standard. This has also created a good opportunity for us Persian speakers: we can publish our material in Persian on the Internet without resorting to workarounds such as turning text into image files and placing them on a web page, or using the encodings of other languages such as Arabic. This has made creating Persian websites and programs much easier and cheaper.
This website (RastiSoft) also uses Unicode to implement Persian. For example, on the Contact Us page you can type your message in Persian without downloading any special program such as a Java applet or an ActiveX control. So if you intend to create a Persian website or a personal Persian weblog, I recommend that you use the Unicode standard.

Quoted from the persianweb.com forum.

Good luck.
Navid.

saalek110 · Dec 9, 2006

That was a very useful article. Thank you very much; I understood it completely.

My aim was to understand the nature of Unicode, so that I would know what the conversion functions do and what changes they make when I work with them.

arezoo.j · Nov 1, 2007

Does anyone have a comprehensive article about Unicode and local standards?
Thanks

saalek110 · Nov 1, 2007

The text below is from the book:
Prentice.Hall.PTR.Core.Web.Application.Development.with.PHP.and.MySQL.Sep.2005

It may be suitable.

===========================

Character Sets and Unicode
What follows is a brief introduction to the character sets you might encounter as you move around the Internet and the computing world. It is not designed to be comprehensive, but is designed to merely present you with the basics of what you will see. For those wishing to learn more about this topic, there are a plethora of web sites on the Internet with extremely detailed descriptions of this information.

ASCII
When computers were first developed, one of the primary things required was the ability to map digital codes into printable characters. Older systems existed, but none were quite suited to the binary nature of the computer. With this in mind, the American Standards Association announced the American Standard Code for Information Interchange, more commonly known as ASCII, in 1963. This was a 7-bit character set containing, in addition to all lower- and uppercase Latin letters used in the (American) English alphabet, numbers, punctuation markers, quotation markers, and currency symbols.

Unfortunately, this system proved particularly ill-suited to the needs of Western European countries, which ranged from the British Pound Symbol (£); to accents and other markings for French and Spanish; to new letters and ligatures (two letters combined together, such as æ); and completely different letters, as in modern Greek. In other parts of Europe, users of the Cyrillic, Armenian, or Hebrew/Yiddish alphabets also found themselves left in the dark.
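The 7-bit limit is easy to see in practice. The following Python sketch is not from the book, and the helper name `fits_in_ascii` is made up for illustration. It checks whether a string can be represented in ASCII at all: plain English text can, while the pound sign, accented letters, and ligatures cannot.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code below 128, i.e. is in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


samples = ["Hello, world!", "\u00a35", "caf\u00e9", "\u00e6"]  # "£5", "café", "æ"
results = {sample: fits_in_ascii(sample) for sample in samples}

# Every ASCII byte has its high bit clear.
highest_byte = max("Hello, world!".encode("ascii"))
```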

The ISO 8859 Character Sets
Fortunately, most modern computers are able to store data in 8-bit bytes, and the 7-bit ASCII characters were only using the high bit as a parity bit (used for verifying data integrity), and not for useful information. The obvious next step was to start using the upper 128 slots available in an 8-bit byte to add different characters.

What resulted over time was the ISO 8859 series of character sets (also known as code pages). ISO 8859-1 defined the "Latin Alphabet No. 1," or Latin-1 set of characters that covered a vast majority of Western European languages. Over the years, the 8859 series of code pages has grown to 14 (8859-11 is proposed for Thai, and 8859-12 remains unfilled), with 8859-15 created in 1999; it is the Latin-1 (8859-1) code page with the Euro Symbol (€) added. Among other code pages are those for eastern European Slavic languages, for Cyrillic-script languages such as Russian, and for Hebrew and Turkish.

The benefit to these character pages is that they remain fully compatible with the old ASCII character set and share the same lower 128 characters and control sequences. There were some slightly modified implementations of these character sets, most notably the default code pages in some versions of the Microsoft Windows operating system. The Western European one came to be known as the windows-1252 code page, or simply cp-1252 (with windows-1251 for Russian, windows-1254 for Turkish, and so on). The Apple Macintosh also has a slightly modified Latin-1 code page.
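A short Python sketch (not from the book; the helper name `decode_with` is hypothetical) shows what sharing the lower half but differing in the upper half means. The same byte value is decoded under several single-byte code pages using Python's built-in codecs.

```python
def decode_with(byte_value: int, encodings):
    """Decode a single byte under each of the given single-byte code pages."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in encodings}


pages = ["latin_1", "iso8859_5", "iso8859_7", "cp1252"]

low = decode_with(0x41, pages)    # ASCII range: the same letter everywhere
high = decode_with(0xE9, pages)   # upper range: depends on the code page

# ISO 8859-15 reuses the slot of the generic currency sign for the euro sign.
slot_a4 = decode_with(0xA4, ["latin_1", "iso8859_15"])

for name in pages:
    print(name, repr(low[name]), repr(high[name]))
```

This is why a file written under one code page looks like nonsense when opened under another: the bytes are unchanged, but the table used to interpret them is different.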

Far Eastern Character Sets
It turns out that 256 character codes is not enough to handle the needs of East Asian languages with large alphabets, such as Chinese, Japanese, or Korean. (Korean actually uses a syllabary, where larger units are made up of individual letters, but computers have to store large numbers of these possible syllabic units.) To handle these, many different solutions were developed over the years, with many originating from Japan, where the need arose first.

Initially an 8-bit character set was created that encoded the Latin letters, some symbols, and the characters in the Japanese katakana alphabet (one of the four alphabets used in Japanese). This is called "8-bit JIS" (Japanese Industrial Standards). Afterward came a character code system called "Old JIS," which was superseded by "New JIS." Both are multi-byte character sets that use a special byte called an escape sequence to switch between 8-bit and 16-bit encodings. In addition to the Japanese phonetic alphabets, these codes also included the Latin and Cyrillic alphabets and the more commonly used Chinese characters (Kanji) used in modern Japan.

A slight variation on this was invented by Microsoft Corporation: "Shift-JIS" or S-JIS, also known as DBCS (Double-Byte Character Set). This merely specified that if a sequence of bits in the first byte was set, there was a second byte that would be used to specify which character to use, thus avoiding the separate escape sequence mechanism. This meant a reduced number of possible characters that could fit into the 16 bits available in a multi-byte code because certain bits were reserved to mark a character as two bytes instead of one. However, it was felt that the tradeoff was worthwhile on older 8- and 16-bit computer systems, where space and performance were at a premium.
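The lead-byte idea can be sketched in a few lines of Python (not from the book). The lead-byte ranges in `is_lead_byte` are the commonly cited ones for Shift-JIS and are an assumption here, as are the function names. The splitter looks at each byte, and if it falls in a lead-byte range it consumes the following byte as part of the same character.

```python
def is_lead_byte(value: int) -> bool:
    """Assumed Shift-JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC."""
    return 0x81 <= value <= 0x9F or 0xE0 <= value <= 0xFC


def split_shift_jis(data: bytes):
    """Split Shift-JIS bytes into per-character byte sequences."""
    pieces = []
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1  # the first byte decides the width
        pieces.append(data[i:i + width])
        i += width
    return pieces


text = "A\u3042\u65e5"                 # Latin "A", hiragana "あ", kanji "日"
encoded = text.encode("shift_jis")
pieces = split_shift_jis(encoded)
widths = [len(piece) for piece in pieces]
```

No escape sequence is needed: the reader decides character by character, based only on the value of the first byte.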

Over the years, similar systems have been created to encode the various forms of Chinese and Korean. In addition to cryptically named standards to cover Simplified Chinese (written in mainland China), such as GB 2312-80, other standards, such as Big-5 (for Traditional Chinese, written in Taiwan) and UHC (Unified Hangul Code) exist for Korean. There are strengths and weaknesses to all these systems, although many do not fully and properly encode the full set of characters available in these languages (particularly those in Chinese).

Unicode
As people started to understand the limitations in the various character sets and the need for fully globalized computer applications grew, various initiatives were taken to develop a character set that could encode every language. Two initiatives were started in the late 1980s to create this standard. Unicode (from "Universal Code") eventually came to dominate, becoming ISO 10646.

The initial Unicode standard suggested that all characters in the world should be encoded into a 16-bit two-byte sequence that would be fully compatible with the old ASCII characters in the first 128 slots. In addition to the Latin alphabets and their variants, support for other alphabets, such as Armenian, Greek, Thai, Bengali, Arabic, Chinese, Japanese, and Korean would be included.

Unfortunately, 16 bits is not enough to encode the characters found in Chinese, Japanese, and Korean, which are in excess of 70,000. The initial approaches of the Unicode Consortium were to try and consolidate the characters in the three languages and eliminate "redundant" characters, but this would clearly prohibit computer encoding of ancient texts and names of places and people in these countries.
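As a small illustration (Python, not from the book; the helper name is made up), the sketch below looks at the number Unicode assigns to a few characters. A common Chinese character still fits in 16 bits, but a rarer ideograph from a later extension block does not.

```python
import sys


def needs_more_than_16_bits(ch: str) -> bool:
    """True if the character's code point is above U+FFFF."""
    return ord(ch) > 0xFFFF


common = "\u4e2d"        # "中", a frequently used character
rare = "\U00020000"      # an ideograph from CJK Extension B

print(hex(ord("A")), hex(ord(common)), hex(ord(rare)))
print(needs_more_than_16_bits(common), needs_more_than_16_bits(rare))

# The highest code point Python can represent; in Python 3.3+ this is 0x10FFFF.
highest = sys.maxunicode
```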

Therefore, a 32-bit version of Unicode has recently been introduced. For cases when 16 bits are not sufficient, a 32-bit encoding system can be used. This encoding reserves space not only for modern and living languages, but also dead ones. Newer versions of the standard have provided maximal flexibility in how the language is stored, permitting not only 16-bit and 32-bit character streams, but also single-byte streams.

Unicode Encodings
Next, we must look at how Unicode is transmitted over the Internet, stored on your computer, and sent in HTML or XML (all of which are still typically done in single-byte formats). There are several commonly used encodings for Unicode. The most common ones that you will see or hear of are:

UTF-7 This encodes all Unicode characters in 7-bit characters by preserving most of the regular ASCII characters and then using one or two slots to indicate a sequence of extended bytes for others.

UTF-8 This encodes the full ASCII character set in the first 128 slots and then uses a non-trivial scheme to encode the remaining Unicode characters in as many as 4 bytes (the original design allowed up to 6). This encoding is heavily favored over the 7-bit Unicode encoding for single-byte Unicode transmission.

UTF-16 This encodes Unicode characters into a 16-bit word. Originally envisioned to be a fixed-sized character set, it now supports chaining to correctly handle the full set of characters that Unicode encompasses. Fortunately, a majority of characters still fall into the first 16 bits.

UTF-32 This encodes Unicode characters into a 32-bit double word (often referred to as a "DWORD"). Every Unicode code point fits in a single DWORD, so no multi-DWORD sequences are needed.
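To see how these encodings differ on disk, here is a Python sketch (not from the book; all function names are made up for illustration). It hand-rolls the UTF-8 bit layout and the UTF-16 "chaining" step (surrogate pairs) so they can be compared with Python's built-in codecs, and it reports how many bytes one character takes in each encoding. It assumes the current 4-byte limit for UTF-8 and does not validate its input.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (illustration only, no validation)."""
    if cp < 0x80:                      # 1 byte: plain ASCII
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 4 bytes: 11110xxx followed by three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_units(cp: int):
    """Return the 16-bit units for one code point (two units above U+FFFF)."""
    if cp < 0x10000:
        return [cp]
    offset = cp - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]


def encoded_sizes(ch: str):
    """Bytes needed for one character in each encoding (no byte order mark)."""
    return {name: len(ch.encode(name)) for name in ("utf-8", "utf-16-le", "utf-32-le")}


samples = ["A", "\u0633", "\u20ac", "\U0001F600"]  # "A", "س", "€", an emoji
for ch in samples:
    print(hex(ord(ch)), encoded_sizes(ch), utf8_encode(ord(ch)).hex())
```

This also answers the storage question raised earlier in the thread: UTF-32 always takes four bytes per character, UTF-16 takes two or four, and UTF-8 takes between one and four.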