Using Regex to parse a chat transcript

Submitted by: @import:stackexchange-codereview

Problem

I need to classify each line as an announce, a whisper, or chat. Once that is sorted out, I need to extract certain values for processing.

Right now my regex is as follows:

var regex = new Regex(@"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(\[System Message\])?\s*<([^>]*)>|((.+) Whisper You :))\s*(.*)$");


  • Group 0 is the entire message.
  • Group 1 is the time the message was sent.
  • Group 2 is whether it was an announce or chat.
  • Group 3 is who sent the announce or chat message.
  • Group 4 is whether it was a whisper.
  • Group 5 is who sent the whisper.
  • Group 6 is the message sent by the user or system.



Classify each line:

if group 4 matched
    it is a whisper
else if group 2 matched
    it is an announce
else
    it is normal chat
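The classification rule above can be sketched in C# with numbered groups. This is a minimal sketch, not a definitive implementation: the class name `TranscriptClassifier`, the method `Classify`, and the returned labels are hypothetical, and the pattern assumes chat and announce lines carry the sender in angle brackets (as in `<John>`).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TranscriptClassifier
{
    // Numbered-group form of the pattern.
    // Assumption: the sender of chat/announce lines is wrapped in <...>.
    private static readonly Regex LineRegex = new Regex(
        @"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(\[System Message\])?\s*<([^>]*)>|((.+) Whisper You :))\s*(.*)$");

    // Returns a hypothetical label for the line, following the rule:
    // group 4 matched -> whisper, else group 2 matched -> announce, else chat.
    public static string Classify(string line)
    {
        Match match = LineRegex.Match(line);
        if (!match.Success) return "unmatched";
        if (match.Groups[4].Success) return "whisper";   // "<name> Whisper You :"
        if (match.Groups[2].Success) return "announce";  // "[System Message]"
        return "chat";
    }

    public static void Main()
    {
        Console.WriteLine(Classify("[02:33:03] John Whisper You :  Heya"));
        Console.WriteLine(Classify("[02:33:57] [System Message] <John> has left the room"));
        Console.WriteLine(Classify("[02:33:39] <John> heya"));
    }
}
```

An optional group that did not take part in the match reports `Success == false`, which is what makes the group 2 check work for ordinary chat lines.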


Should I change anything in my regex to make its matches more precise?

Sample data:

[02:33:03] John Whisper You :  Heya
[02:33:03] John Whisper You :  How is it going
[02:33:12] <John> [02:33:16] [System Message] bla bla
[02:33:39] <John> heya
[02:33:40] <John> hello :S
[02:33:57] <John> hi
[02:33:57] [System Message] <John> has left the room
[02:33:57] [System Message] <John> has entered the room

Solution

You can always break the pattern across multiple lines to make it more readable. You can also use named groups, which take the "magic" out of the group numbers (4 == whisper, 3 == normal, etc.).

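using System;
using System.Text.RegularExpressions;

// Same pattern as before, split across lines and using named groups.
var regex = new Regex(@"^\[(?<TimeStamp>\d{2}:\d{2}:\d{2})\]\s*" +
    @"(?:" +
        @"(?<SysMessage>\[System Message\])?\s*" +
        @"<(?<NormalWho>[^>]*)>|" +
        @"(?<Whisper>(?<WhisperWho>.+) Whisper You :))\s*" +
    @"(?<Message>.*)$");

string data = @"[02:33:03] John Whisper You :  Heya
[02:33:03] John Whisper You :  How is it going
[02:33:12] <John> [02:33:16] [System Message] bla bla
[02:33:39] <John> heya
[02:33:40] <John> hello :S
[02:33:57] <John> hi
[02:33:57] [System Message] <John> has left the room
[02:33:57] [System Message] <John> has entered the room";

// Split on either newline character so both LF and CRLF input work.
foreach (var msg in data.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
    Match match = regex.Match(msg);
    if (match.Success)
    {
        if (match.Groups["Whisper"].Success)
        {
            // Whisper: the sender precedes "Whisper You :".
            Console.WriteLine("[whis from {0}]: {1}", match.Groups["WhisperWho"].Value, msg);
        }
        else if (match.Groups["SysMessage"].Success)
        {
            // Announce: the line carries the "[System Message]" marker.
            Console.WriteLine("[sys msg]: {0}", msg);
        }
        else
        {
            // Normal chat: the sender is the name inside <...>.
            Console.WriteLine("[normal from {0}]: {1}", match.Groups["NormalWho"].Value, msg);
        }
    }
}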
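As an alternative to concatenating string fragments, .NET's `RegexOptions.IgnorePatternWhitespace` lets a single pattern span several lines and carry `#` comments. The sketch below is one hedged way to apply it to the same named groups. The names `LineKind`, `TranscriptParser`, and `TryParse` are hypothetical and not part of the answer. Because unescaped whitespace is ignored under this option, literal spaces in the pattern are written as `\ `.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical result type for the three kinds of line.
public enum LineKind { Whisper, Announce, Chat }

public static class TranscriptParser
{
    // Bare whitespace in the pattern is ignored, so literal spaces are escaped as "\ ".
    private static readonly Regex LineRegex = new Regex(@"
        ^\[(?<TimeStamp>\d{2}:\d{2}:\d{2})\]\s*              # timestamp in square brackets
        (?:
            (?<SysMessage>\[System\ Message\])?\s*           # optional announce marker
            <(?<NormalWho>[^>]*)>                            # sender in angle brackets
          |
            (?<Whisper>(?<WhisperWho>.+)\ Whisper\ You\ :)   # whisper prefix with sender
        )\s*
        (?<Message>.*)$                                      # remaining text
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns false when the line does not match the transcript format.
    public static bool TryParse(string line, out LineKind kind, out string who, out string message)
    {
        kind = LineKind.Chat;
        who = "";
        message = "";

        Match m = LineRegex.Match(line);
        if (!m.Success) return false;

        message = m.Groups["Message"].Value;
        if (m.Groups["Whisper"].Success)
        {
            kind = LineKind.Whisper;
            who = m.Groups["WhisperWho"].Value;
        }
        else
        {
            kind = m.Groups["SysMessage"].Success ? LineKind.Announce : LineKind.Chat;
            who = m.Groups["NormalWho"].Value;
        }
        return true;
    }

    public static void Main()
    {
        if (TryParse("[02:33:03] John Whisper You :  Heya", out var kind, out var who, out var message))
            Console.WriteLine($"{kind} from {who}: {message}"); // prints "Whisper from John: Heya"
    }
}
```

Returning the kind, sender, and message together keeps the group names in one place, so callers do not need to know how the pattern is laid out.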

Code Snippets

var regex = new Regex(@"^\[(?<TimeStamp>\d{2}:\d{2}:\d{2})\]\s*" +
    @"(?:" +
        @"(?<SysMessage>\[System Message\])?\s*" +
        @"<(?<NormalWho>[^>]*)>|" +
        @"(?<Whisper>(?<WhisperWho>.+) Whisper You :))\s*" +
    @"(?<Message>.*)$");

string data = @"[02:33:03] John Whisper You :  Heya
[02:33:03] John Whisper You :  How is it going
[02:33:12] <John> [02:33:16] [System Message] bla bla
[02:33:39] <John> heya
[02:33:40] <John> hello :S
[02:33:57] <John> hi
[02:33:57] [System Message] <John> has left the room
[02:33:57] [System Message] <John> has entered the room";

foreach (var msg in data.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
    Match match = regex.Match(msg);
    if (match.Success)
    {
        if (match.Groups["Whisper"].Success)
        {
            Console.WriteLine("[whis from {0}]: {1}", match.Groups["WhisperWho"].Value, msg);
        }
        else if (match.Groups["SysMessage"].Success)
        {
            Console.WriteLine("[sys msg]: {0}", msg);
        }
        else
        {
            Console.WriteLine("[normal from {0}]: {1}", match.Groups["NormalWho"].Value, msg);
        }
    }
}

Context

StackExchange Code Review Q#2749, answer score: 3
