How to parse a small subset of Markdown into React components?

9

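I have a very small subset of Markdown, along with some custom HTML, that I would like to parse into React components. For example, I would like to turn the following string:

hello *asdf* *how* _are_ you !doing! today

into the following array: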

[ "hello ", <strong>asdf</strong>, " ", <strong>how</strong>, " ", <em>are</em>, " you ", <MyComponent onClick={this.action}>doing</MyComponent>, " today" ]

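and then return it from a React render function (React will render the array properly as formatted HTML).

Basically, I want to give users the option to use a very limited set of Markdown to turn their text into styled components (and in some cases my own components!).

Using dangerouslySetInnerHTML seems unwise, and I do not want to bring in an external dependency: they are all very heavy, and I only need very basic functionality.

I'm currently doing something like this, but it is very brittle and doesn't work for all cases. I was wondering if there is a better way: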

function matchStrong(result, i) {
  let match = result[i].match(/(^|[^\\])\*(.*)\*/);
  if (match) { result[i] = <strong key={"ms" + i}>{match[2]}</strong>; }
  return match;
}

function matchItalics(result, i) {
  let match = result[i].match(/(^|[^\\])_(.*)_/); // Ignores \_asdf_ but not _asdf_
  if (match) { result[i] = <em key={"mi" + i}>{match[2]}</em>; }
  return match;
}

function matchCode(result, i) {
  let match = result[i].match(/(^|[^\\])```\n?([\s\S]+)\n?```/);
  if (match) { result[i] = <code key={"mc" + i}>{match[2]}</code>; }
  return match;
}

// Very brittle and inefficient
export function convertMarkdownToComponents(message) {
  let result = message.match(/(\\?([!*_`+-]{1,3})([\s\S]+?)\2)|\s|([^\\!*_`+-]+)/g);

  if (result == null) { return message; }

  for (let i = 0; i < result.length; i++) {
    if (matchCode(result, i)) { continue; }
    if (matchStrong(result, i)) { continue; }
    if (matchItalics(result, i)) { continue; }
  }

  return result;
}

Here is my previous question that led to this one.

— Ryan Peschel

1

What if the input has nested items, like font _italic *and bold* then only italic_ and normal? What would be the expected result? Or will items never be nested?

— trincot

1

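No need to worry about nesting. It's just very basic markdown for users to use. Whatever is easiest to implement is fine with me. In your example, it would be totally fine if the inner bolding doesn't work. But if implementing nesting is easier than not having it, that's okay too.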

— Ryan Peschel

1

npmjs.com/package/react-markdown-it

— mb21

1

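I'm not using markdown. It's just a very similar, small subset of it (which supports a couple of custom components, along with non-nested bold, italic, code, and underline). The snippet I posted somewhat works, but it doesn't seem very ideal and fails in some trivial cases (for example, you can't type a single asterisk like this: asdf* without it disappearing).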

— Ryan Peschel

1

Well...

— markdown or something like markdown

1

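How does it work?

It works by reading the string chunk by chunk, which might not be the best solution for really long strings.

Whenever the parser detects that a critical chunk is being read, i.e. '*' or any other markdown tag, it starts parsing chunks of this element until it finds the closing tag.

It works on multi-line strings; see the code for an example.

Caveats

You haven't specified (or I may have misunderstood your needs) whether tags that are both bold and italic need to be parsed. My current solution might not work in that case.

If you do need to handle the conditions above, just comment here and I'll tweak the code.

First update: tweaks how markdown tags are treated

Tags are no longer hardcoded. Instead, they live in a map that you can easily extend to fit your needs.

Fixed the bugs you mentioned in the comments. Thanks for pointing out these issues =p

Second update: multi-length markdown tags

Easiest way of achieving this: replace multi-length tags with a rarely used Unicode character

The method parseMarkdown does not yet support multi-length tags, but we can easily replace those multi-length tags with a single character using string.replace when sending our rawMarkdown prop.

To see an example of this in practice, look at the ReactDOM.render call at the end of the code.

Even if your application supports multiple languages, there are code points that will never appear in real text but that JavaScript can still handle. For example, "\uFFFF" is, if I recall correctly, not a valid Unicode character (it is a reserved noncharacter), yet JS can still compare it ("\uFFFF" === "\uFFFF" evaluates to true).

It might seem hacky at first, but depending on your use case, I don't see any major issues with this route.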
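As a minimal sketch of the placeholder idea, the pre-processing step could be pulled into a small helper. The names `PLACEHOLDERS` and `toSingleCharTags` are hypothetical and not part of the answer's code, and the sketch assumes that U+FFFF never needs to survive in user input, so any stray occurrence is stripped first to keep users from forging a tag.

```javascript
// Hypothetical helper (not from the original answer): swaps each
// multi-character tag for a single placeholder character so that a
// character-by-character parser can treat it like "*" or "_".
const PLACEHOLDERS = new Map([
  ["```", "\uFFFF"], // assumption: U+FFFF never appears in legitimate input
]);

function toSingleCharTags(source) {
  let result = source;
  for (const [tag, placeholder] of PLACEHOLDERS) {
    // Remove any placeholder already present so input cannot forge a tag.
    result = result.split(placeholder).join("");
    // split/join replaces every occurrence and needs no regex escaping.
    result = result.split(tag).join(placeholder);
  }
  return result;
}

const prepared = toSingleCharTags("run ```npm test``` now");
console.log(prepared === "run \uFFFFnpm test\uFFFF now"); // true
```

The result can then be handed to a single-character parser whose tag table maps the placeholder to the desired element.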

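Another way of achieving this

Well, we could easily keep track of the last N chunks (where N is the length of the longest multi-length tag).

The loop inside parseMarkdown would need some tweaks: check whether the current chunk is part of a multi-length tag and, if it is, treat it as a tag. Otherwise, in cases like ``k, we would need to mark it as notMultiLength or something similar and push that chunk as content.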
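The alternative described above can be sketched roughly as follows. This is not the answer's parseMarkdown: instead of remembering the last N chunks it looks ahead with `startsWith`, which amounts to the same check. The names `TAGS`, `symbolAt`, `findClose`, `stripEscapes` and `tokenize` are hypothetical, and the output is a flat list of strings and `{ tag, text }` objects that a render step would still have to turn into elements.

```javascript
// Hypothetical tag table; keys may have any length.
const TAGS = { "```": "pre", "*": "strong", "_": "em", "!": "button" };
// Try longer symbols first so "```" is recognised as one tag.
const SYMBOLS = Object.keys(TAGS).sort((a, b) => b.length - a.length);

// Returns the tag symbol that starts at index i, or null if there is none.
function symbolAt(source, i) {
  return SYMBOLS.find((s) => source.startsWith(s, i)) || null;
}

// Index of the next unescaped occurrence of `symbol` at or after `from`, or -1.
function findClose(source, symbol, from) {
  for (let i = from; i < source.length; i++) {
    if (source[i] === "\\") { i++; continue; } // skip the escaped character
    if (source.startsWith(symbol, i)) return i;
  }
  return -1;
}

// Drops the backslash from every escape sequence.
const stripEscapes = (s) => s.replace(/\\([\s\S])/g, "$1");

// Returns a flat list of strings and { tag, text } nodes (no nesting).
function tokenize(source) {
  const out = [];
  let text = "";
  let i = 0;
  while (i < source.length) {
    // A backslash makes the next character literal.
    if (source[i] === "\\" && i + 1 < source.length) {
      text += source[i + 1];
      i += 2;
      continue;
    }
    const symbol = symbolAt(source, i);
    if (symbol !== null) {
      const start = i + symbol.length;
      const end = findClose(source, symbol, start);
      if (end > start) { // a closing tag exists and the content is non-empty
        if (text !== "") out.push(text);
        text = "";
        out.push({ tag: TAGS[symbol], text: stripEscapes(source.slice(start, end)) });
        i = end + symbol.length;
        continue;
      }
    }
    // Not a tag, or an unclosed one (e.g. "``k"): keep the character as content.
    text += source[i];
    i += 1;
  }
  if (text !== "") out.push(text);
  return out;
}

console.log(tokenize("hello *asdf* ```a_b``` \\*x"));
```

Because unclosed symbols fall through to plain text, a lone asterisk such as the one in `asdf*` is kept rather than swallowed.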

Code

// Instead of creating hardcoded variables, we can make the code more extendable
// by storing all the possible tags we'll work with in a Map. Thus, creating
// more tags will not require additional logic in our code.
const tags = new Map(Object.entries({
  "*": "strong", // bold
  "!": "button", // action
  "_": "em", // emphasis
  "\uFFFF": "pre", // Just use a very unlikely to happen unicode character,
                   // We'll replace our multi-length symbols with that one.
}));
// Might be useful if we need to discover the symbol of a tag
const tagSymbols = new Map();
tags.forEach((v, k) => { tagSymbols.set(v, k ); })

const rawMarkdown = `
  This must be *bold*,

  This also must be *bo_ld*,

  this _entire block must be
  emphasized even if it's comprised of multiple lines_,

  This is an !action! it should be a button,

  \`\`\`
beep, boop, this is code
  \`\`\`

  This is an asterisk\\*
`;

class App extends React.Component {
  parseMarkdown(source) {
    let currentTag = "";
    let currentContent = "";

    const parsedMarkdown = [];

    // We create this variable to track possible escape characters, eg. "\"
    let before = "";

    const pushContent = (
      content,
      tagValue,
      props,
    ) => {
      let children = undefined;

      // There's the need to parse for empty lines
      if (content.indexOf("\n\n") >= 0) {
        let before = "";
        const contentJSX = [];

        let chunk = "";
        for (let i = 0; i < content.length; i++) {
          if (i !== 0) before = content[i - 1];

          chunk += content[i];

          if (before === "\n" && content[i] === "\n") {
            contentJSX.push(chunk);
            contentJSX.push(<br />);
            chunk = "";
          }

          if (chunk !== "" && i === content.length - 1) {
            contentJSX.push(chunk);
          }
        }

        children = contentJSX;
      } else {
        children = [content];
      }
      parsedMarkdown.push(React.createElement(tagValue, props, children))
    };

    for (let i = 0; i < source.length; i++) {
      const chunk = source[i];
      if (i !== 0) {
        before = source[i - 1];
      }

      // Does our current chunk needs to be treated as a escaped char?
      const escaped = before === "\\";

      // Detect if we need to start/finish parsing our tags

      // We are not parsing anything, however, that could change at current
      // chunk
      if (currentTag === "" && escaped === false) {
        // If our tags array has the chunk, this means a markdown tag has
        // just been found. We'll change our current state to reflect this.
        if (tags.has(chunk)) {
          currentTag = tags.get(chunk);

          // We have simple content to push
          if (currentContent !== "") {
            pushContent(currentContent, "span");
          }

          currentContent = "";
        }
      } else if (currentTag !== "" && escaped === false) {
        // We'll look if we can finish parsing our tag
        if (tags.has(chunk)) {
          const symbolValue = tags.get(chunk);

          // Just because the current chunk is a symbol it doesn't mean we
          // can already finish our currentTag.
          //
          // We'll need to see if the symbol's value corresponds to the
          // value of our currentTag. In case it does, we'll finish parsing it.
          if (symbolValue === currentTag) {
            pushContent(
              currentContent,
              currentTag,
              undefined, // you could pass props here
            );

            currentTag = "";
            currentContent = "";
          }
        }
      }

      // Increment our currentContent
      //
      // Ideally, we don't want our rendered markdown to contain any '\'
      // or undesired '*' or '_' or '!'.
      //
      // Users can still escape '*', '_', '!' by prefixing them with '\'
      if (tags.has(chunk) === false || escaped) {
        if (chunk !== "\\" || escaped) {
          currentContent += chunk;
        }
      }

      // In case an erroneous, i.e. unfinished tag, is present and the we've
      // reached the end of our source (rawMarkdown), we want to make sure
      // all our currentContent is pushed as a simple string
      if (currentContent !== "" && i === source.length - 1) {
        pushContent(
          currentContent,
          "span",
          undefined,
        );
      }
    }

    return parsedMarkdown;
  }

  render() {
    return (
      <div className="App">
        <div>{this.parseMarkdown(this.props.rawMarkdown)}</div>
      </div>
    );
  }
}

ReactDOM.render(<App rawMarkdown={rawMarkdown.replace(/```/g, "\uFFFF")} />, document.getElementById('app'));

Link to the code (TypeScript): https://codepen.io/ludanin/pen/GRgNWPv

Link to the code (vanilla / babel): https://codepen.io/ludanin/pen/eYmBvXw

— Lukas Danin

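This solution feels like it is on the right track, but it seems to have trouble when other markdown characters are placed inside a tag. For example, try replacing This must be *bold* with This must be *bo_ld*. The resulting HTML is malformed.

— Ryan Peschel

Lack of proper testing caused this =p, my bad. I'm already fixing it and will post the result here. It looks like a simple problem to fix.

— Lukas Danin

Yeah, thanks. I really like this solution. It looks very robust and clean. I think it could be refactored a bit for even more elegance. I might try tinkering with it a bit.

— Ryan Peschel

By the way, I've tweaked the code to support a much more flexible way of defining markdown tags and their respective JSX values.

— Lukas Danin

Hey, thanks, this looks great. Just one last thing and I think it will be perfect. In my original post I also have a function for code snippets (which involve triple backticks). Would it be possible to support that as well, so that tags could optionally be multiple characters? Another reply added support by replacing instances of ``` with a rarely used character. That would be an easy way to do it, but I'm not sure whether it's ideal.

— Ryan Peschel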

4

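It looks like you are looking for a small, very basic solution, not a "super-monster" like react-markdown-it :)

I would recommend https://github.com/developit/snarkdown, which looks pretty lightweight and nice! It is just 1kb and extremely simple, and you can use it and extend it if you need any other syntax features.

Supported tags list: https://github.com/developit/snarkdown/blob/master/src/index.js#L1

Update

I only just noticed the part about React components; I missed it at first. That works in your favor: I believe you can take the library as an example and implement your own required components, so that it works without setting HTML dangerously. The library is pretty small and clear. Have fun with it! :)

— Alexandr Shurigin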

3

var table = {
  "*":{
    "begin":"<strong>",
    "end":"</strong>"
    },
  "_":{
    "begin":"<em>",
    "end":"</em>"
    },
  "!":{
    "begin":"<MyComponent onClick={this.action}>",
    "end":"</MyComponent>"
    },

  };

var myMarkdown = "hello *asdf* *how* _are_ you !doing! today";
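// Note: inside a character class "|" is a literal character, not alternation,
// so [*!_] is used here; [*|!|_] would also treat "|" as a tag.
var tagFinder = /(?<item>(?<tag_begin>[*!_])(?<content>\w+)(?<tag_end>\k<tag_begin>))/gm;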

//Use case 1: direct string replacement
var replaced = myMarkdown.replace(tagFinder, replacer);
function replacer(match, whole, tag_begin, content, tag_end, offset, string) {
  return table[tag_begin]["begin"] + content + table[tag_begin]["end"];
}
alert(replaced);

//Use case 2: React components
var pieces = [];
var lastMatchedPosition = 0;
myMarkdown.replace(tagFinder, breaker);
function breaker(match, whole, tag_begin, content, tag_end, offset, string) {
  var piece;
  if (lastMatchedPosition < offset)
  {
    piece = string.substring(lastMatchedPosition, offset);
    pieces.push("\"" + piece + "\"");
  }
  piece = table[tag_begin]["begin"] + content + table[tag_begin]["end"];
  pieces.push(piece);
  lastMatchedPosition = offset + match.length;

}
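// Keep any text that follows the last match (e.g. " today").
if (lastMatchedPosition < myMarkdown.length) {
  pieces.push("\"" + myMarkdown.substring(lastMatchedPosition) + "\"");
}
alert(pieces);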

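Result:

(Image: regex test result)

Explanation:

/(?<item>(?<tag_begin>[*|!|_])(?<content>\w+)(?<tag_end>\k<tag_begin>))/

You can define your tags in this section: [*|!|_]. Once one of them is matched, it is captured as a group named "tag_begin". (Inside a character class the | is a literal character rather than alternation, so [*!_] expresses the same intent without also matching a pipe.)
Then (?<content>\w+) captures the content wrapped by the tag.
The ending tag must be the same as the one matched earlier, so \k<tag_begin> is used here. If it passes that test, it is captured as a group named "tag_end"; that is what (?<tag_end>\k<tag_begin>)) expresses.

In the JS, I set up a table like this: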

var table = {
  "*":{
    "begin":"<strong>",
    "end":"</strong>"
    },
  "_":{
    "begin":"<em>",
    "end":"</em>"
    },
  "!":{
    "begin":"<MyComponent onClick={this.action}>",
    "end":"</MyComponent>"
    },

  };

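This table is used to replace the matched tags.

String.replace has an overload, String.replace(regexp, function), whose callback receives the captured groups as parameters. We use these captured items to look up the table and generate the replacement string.

[Update]
I have updated my code. I kept the first version in case someone else doesn't need React components.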
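One possible way to take the same regex idea and collect element objects instead of strings is sketched here. It is an illustration under assumptions, not the answer's code: `toElements` and `fakeCreateElement` are hypothetical names, the table maps symbols to element types rather than to tag strings, and the element factory is injected so the sketch runs without React. In a React app, `React.createElement(type, props, ...children)` has a compatible shape, and a component reference such as `MyComponent` could stand in for a tag name.

```javascript
// Hypothetical table: symbol -> element type (a tag name or a component).
const table = { "*": "strong", "_": "em", "!": "button" };
const tagFinder = /([*!_])(\w+)\1/g;

// Builds an array of plain strings and elements; `createElement` is any
// factory with the shape (type, props, ...children).
function toElements(markdown, createElement) {
  const pieces = [];
  let last = 0;
  for (const match of markdown.matchAll(tagFinder)) {
    const [whole, symbol, content] = match;
    // Plain text between the previous match and this one.
    if (match.index > last) pieces.push(markdown.slice(last, match.index));
    // The match offset doubles as a unique key for array children.
    pieces.push(createElement(table[symbol], { key: match.index }, content));
    last = match.index + whole.length;
  }
  // Plain text after the final match.
  if (last < markdown.length) pieces.push(markdown.slice(last));
  return pieces;
}

// Stand-in factory so the sketch runs without React installed.
const fakeCreateElement = (type, props, ...children) => ({ type, props, children });
console.log(toElements("hello *asdf* _are_ you !doing! today", fakeCreateElement));
```

Since no HTML string is ever built, nothing has to pass through dangerouslySetInnerHTML; user text only ever ends up as element children.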

— Simon

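Unfortunately, I'm not sure this works. I need the actual React components and elements themselves, not strings of them. If you look at my original post, you'll see that I'm adding the actual elements themselves to the array, not strings. Also, using dangerouslySetInnerHTML is dangerous because the user could input malicious strings.

— Ryan Peschel

Fortunately, it is very simple to convert the string replacement into React components. I've updated the code.

— Simon

Huh? I must be missing something, because they are still strings on my end. I even made a fiddle with your code. If you read the console.log output, you'll see the array is full of strings, not actual React components: jsfiddle.net

— Ryan Peschel

To be honest, I don't know React, so I can't follow everything through and tailor it perfectly to your needs, but I believe the information on how to solve your question is sufficient. You just need to plug it into your React machinery.

— Simon

The reason this thread exists is that parsing them into React components seems to be significantly harder (hence the thread title specifying that exact need). Parsing them into strings is fairly trivial: you can just use the string replace function. Strings are not an ideal solution because they are slow and susceptible to XSS, since you have to call dangerouslySetInnerHTML.

— Ryan Peschel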

— Ryan Peschel

0

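You can do it like this:

// inside your component

mapData(myMarkdown) {
  // Splitting on spaces means a tag can only wrap a single word.
  const words = myMarkdown.split(' ');
  const isWrapped = (w, symbol) =>
    w.length >= 3 && w.startsWith(symbol) && w.endsWith(symbol);

  return words.map((w, i) => {
    // split(' ') drops the spaces, so put one back after every word but the last.
    const space = i < words.length - 1 ? ' ' : '';
    const inner = w.slice(1, -1);
    let node = w;

    if (isWrapped(w, '*')) {
      node = <strong>{inner}</strong>;
    } else if (isWrapped(w, '_')) {
      node = <em>{inner}</em>;
    } else if (isWrapped(w, '!')) {
      node = <YourComponent onClick={this.action}>{inner}</YourComponent>;
    }
    // The key avoids React's "unique key" warning for array children.
    return <React.Fragment key={i}>{node}{space}</React.Fragment>;
  });
}

render() {
  const content = this.mapData('hello *asdf* *how* _are_ you !doing! today');
  // render() must return an element (or an array), not the object literal {content}.
  return <div>{content}</div>;
}

— Jatin Parmar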

0

A working solution purely using Javascript and ReactJs without dangerouslySetInnerHTML.

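Approach

Search character by character for markdown elements. As soon as one is encountered, search for its ending tag and then convert it into HTML.

Tags supported in the snippet

bold
italics
em
pre

Input and output from the snippet:

JsFiddle: https://jsfiddle.net/sunil12738/wg7emcz1/58/

Code: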

const preTag = "đ"
const map = {
      "*": "b",
      "!": "i",
      "_": "em",
      [preTag]: "pre"
    }

class App extends React.Component {
    constructor(){
      super()
      this.getData = this.getData.bind(this)
    }

    state = {
      data: []
    }
    getData() {
      let str = document.getElementById("ta1").value
      //If any tag contains more than one char, replace it with some char which is less frequently used and use it
      str = str.replace(/```/gi, preTag)
      const tempArr = []
      const tagsArr = Object.keys(map)
      let strIndexOf = 0;
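      for (let i = 0; i < str.length; ++i) {
        strIndexOf = tagsArr.indexOf(str[i])
        // An unescaped tag character opens a tag
        if (strIndexOf >= 0 && str[i-1] !== "\\") {
          // Everything before the opening tag is plain text
          tempArr.push(str.substring(0, i).split("\\").join("").split(preTag).join(""))
          str = str.substr(i + 1);
          // Restart at index 0 of the remaining text; the loop's ++i runs next,
          // so -1 is needed here (0 would skip the first character)
          i = -1;
          for (let j = 0; j < str.length; ++j) {
            strIndexOf = tagsArr.indexOf(str[j])
            if (strIndexOf >= 0 && str[j-1] !== "\\") {
              const Tag = map[str[j]];
              // The key avoids React's warning about array children without keys
              tempArr.push(<Tag key={tempArr.length}>{str.substring(0, j).split("\\").join("")}</Tag>)
              str = str.substr(j + 1);
              i = -1;
              break
             }
          }
        }
      }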
      tempArr.push(str.split("\\").join(""))
      this.setState({
        data: tempArr,
      })
    }
    render() {
      return (
        <div>
          <textarea rows = "10"
            cols = "40"
           id = "ta1"
          /><br/>
          <button onClick={this.getData}>Render it</button><br/> 
          {this.state.data.map(x => x)} 
        </div>
      )
    }
  }

ReactDOM.render(
  <App/>,
  document.getElementById('root')
);

<body>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/react/16.2.0/umd/react.production.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/react-dom/16.2.0/umd/react-dom.production.min.js"></script>
  <div id="root"></div>
</body>

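Detailed explanation (with example):

Suppose the string is How are *you* doing? and we keep a mapping from symbols to tags:

map = {
 "*": "b"
}

Loop until the first * is found; the text before it is a normal string.
Push that into the array. The array becomes ["How are "]. Then start an inner loop until the next * is found.
The text between * and * needs to be bold, so we convert it into an HTML element and push it directly into the array, with Tag = b taken from the map. If you write <Tag>text</Tag>, React internally converts it into an element, which is then pushed into the array. Now the array is ["How are ", <b>you</b>]. Break out of the inner loop.
Now restart the outer loop from there. No more tags are found, so push the remaining text into the array. The array becomes ["How are ", <b>you</b>, " doing?"].
Render it on the UI: How are you doing?
Note: you is an HTML element here, not text.

Note: Nesting is also possible. The above logic would need to be called recursively.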
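The recursive variant mentioned in the note could look roughly like this. It is a sketch under assumptions rather than the snippet's actual logic: `tagMap` and `parse` are hypothetical names, only single-character symbols are handled, escaping is left out, and a symbol cannot be nested inside itself. The function returns plain `{ tag, children }` objects, which a render step would map to elements (for example with React.createElement).

```javascript
// Hypothetical single-character tag table, mirroring the idea of `map` above.
const tagMap = new Map([["*", "b"], ["_", "em"]]);

// Parses `str` into strings and { tag, children } nodes. The text inside a
// tag is parsed by the same function, which is what allows nesting.
function parse(str) {
  const out = [];
  let text = "";
  let i = 0;
  while (i < str.length) {
    const ch = str[i];
    // Look for the closing partner of the SAME symbol.
    const close = tagMap.has(ch) ? str.indexOf(ch, i + 1) : -1;
    if (close > i + 1) {
      if (text !== "") out.push(text);
      text = "";
      out.push({ tag: tagMap.get(ch), children: parse(str.slice(i + 1, close)) });
      i = close + 1;
    } else {
      // Plain character, or a tag symbol without a usable closing partner.
      text += ch;
      i += 1;
    }
  }
  if (text !== "") out.push(text);
  return out;
}

console.log(JSON.stringify(parse("_italic *and bold* then only italic_ and normal")));
```

Matching each opening symbol only against the same symbol also means that input such as *bo_ld* yields a single bold node with a literal underscore inside it.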

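To add support for new tags

If they are a single character like * or !, add them to the map object with the character as the key and the corresponding tag as the value.
If they are more than one character, such as ```, create a one-to-one mapping with some rarely used character and then insert that (reason: the approach is currently based on a character-by-character search, so anything longer than one character will break. That could also be handled by improving the logic, though).

Does it support nesting? No.
Does it support all the use cases mentioned by the OP? Yes.

Hope it helps.

— Sunil Chaudhary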

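Hi, I'm looking this over now. Can this be used with triple-backtick support as well? So that ```asdf``` would work for code blocks too?

— Ryan Peschel

It can, but some changes may be needed. Currently there is only single-character matching, for * or !. That needs to change a bit. A code block basically means asdf is rendered as <pre>asdf</pre> with a dark background, right? Let me know and I'll give it a try. You can even try it now. A simple approach: in the solution above, replace the ``` in the text with a special character like ^ or ~ and map it to the pre tag. Then it will work fine. Other approaches need more work.

— Sunil Chaudhary

That's right, ```asdf``` gets replaced with <pre>asdf</pre>. Thanks!

— Ryan Peschel

@RyanPeschel Hi! I have added pre tag support as well. Let me know if it works.

— Sunil Chaudhary

Interesting solution (using the rare character). One issue I still see, though, is the lack of support for escaping (such that \*asdf* is not bolded), which I included support for in the code in my original post (and also mentioned in the linked elaboration at the end of the post). Would that be very hard to add?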

— Ryan Peschel