Using Regex to parse a chat transcript
Problem
I need to classify each line as an announce, a whisper, or a chat message. Once that is sorted out, I need to extract certain values to be processed.
Right now my regex is as follows:
var regex = new Regex(@"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(\[System Message\])?\s*<([^>]*)>|((.+) Whisper You :))\s*(.*)$");
- Group 0 is the entire message.
- Group 1 is the time at which the message was sent.
- Group 2 is whether it was an announce rather than a chat message.
- Group 3 is who sent the announce or chat message.
- Group 4 is whether it was a whisper.
- Group 5 is who sent the whisper.
- Group 6 is the message sent by the user or system.
Classify each line:
if group 4 matches
    it is a whisper
else if group 2 matches
    it is an announce
else
    it is a normal chat message
Should I change anything in my regex to make the matches more precise?
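For reference, the classification rules above could be sketched in C# roughly as follows. This is only an illustration built on the numbered groups: the names `LineKind` and `NumberedGroupClassifier` are made up for the example, and the `Unknown` case for lines that do not match at all is an added assumption, not part of the rules as stated.

```csharp
using System.Text.RegularExpressions;

// Hypothetical result type; Unknown is an assumption for non-matching lines.
public enum LineKind { Whisper, Announce, Chat, Unknown }

public static class NumberedGroupClassifier
{
    // Groups are addressed by position: 1 = time, 2 = [System Message],
    // 3 = sender in <...>, 4 = whisper marker, 5 = whisper sender, 6 = message.
    private static readonly Regex LineRegex = new Regex(
        @"^\[(\d{2}:\d{2}:\d{2})\]\s*(?:(\[System Message\])?\s*<([^>]*)>|((.+) Whisper You :))\s*(.*)$");

    public static LineKind Classify(string line)
    {
        Match m = LineRegex.Match(line);
        if (!m.Success) return LineKind.Unknown;            // fits neither shape
        if (m.Groups[4].Success) return LineKind.Whisper;   // group 4 took part in the match
        if (m.Groups[2].Success) return LineKind.Announce;  // group 2 took part in the match
        return LineKind.Chat;                               // plain <sender> message
    }
}
```

With this sketch, `NumberedGroupClassifier.Classify("[02:33:39] <John> heya")` returns `LineKind.Chat`, because neither group 4 nor group 2 takes part in the match.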
Sample data:
[02:33:03] John Whisper You : Heya
[02:33:03] John Whisper You : How is it going
[02:33:12] <John> [02:33:16] [System Message] bla bla
[02:33:39] <John> heya
[02:33:40] <John> hello :S
[02:33:57] <John> hi
[02:33:57] [System Message] <John> has left the room
[02:33:57] [System Message] <John> has entered the room
Solution
You can always break it down across multiple lines to make it more readable. You can also use named groups, which take the "magic" out of the group numbers (4 == whisper, 3 == normal, etc.).
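A related way to get a multi-line layout, which is not the approach taken in this answer, is `RegexOptions.IgnorePatternWhitespace`. The sketch below assumes the same named groups and keeps the whole pattern in one verbatim string with inline comments. The class name `TranscriptPattern` is made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class TranscriptPattern
{
    // With IgnorePatternWhitespace, unescaped whitespace in the pattern is
    // ignored and '#' starts a comment that runs to the end of the line,
    // so every literal space has to be written as "\ ".
    public static readonly Regex Line = new Regex(@"
        ^\[(?<TimeStamp>\d{2}:\d{2}:\d{2})\]\s*             # [hh:mm:ss]
        (?:
            (?<SysMessage>\[System\ Message\])?\s*          # optional system marker
            <(?<NormalWho>[^>]*)>                           # sender in angle brackets
          |
            (?<Whisper>(?<WhisperWho>.+)\ Whisper\ You\ :)  # whisper prefix
        )
        \s*(?<Message>.*)$                                  # rest of the line
        ", RegexOptions.IgnorePatternWhitespace);
}
```

The groups are read by name exactly as with a concatenated pattern, for example `TranscriptPattern.Line.Match(msg).Groups["WhisperWho"].Value`. The trade-off is the escaped spaces: a plain concatenated pattern avoids them at the cost of extra quotes and plus signs.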
var regex = new Regex(@"^\[(?<TimeStamp>\d{2}:\d{2}:\d{2})\]\s*" +
                      @"(?:" +
                        @"(?<SysMessage>\[System Message\])?\s*" +
                        @"<(?<NormalWho>[^>]*)>|" +
                        @"(?<Whisper>(?<WhisperWho>.+) Whisper You :))\s*" +
                      @"(?<Message>.*)$");
string data = @"[02:33:03] John Whisper You : Heya
[02:33:03] John Whisper You : How is it going
[02:33:12] <John> [02:33:16] [System Message] bla bla
[02:33:39] <John> heya
[02:33:40] <John> hello :S
[02:33:57] <John> hi
[02:33:57] [System Message] <John> has left the room
[02:33:57] [System Message] <John> has entered the room";
foreach (var msg in data.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
    Match match = regex.Match(msg);
    if (match.Success)
    {
        if (match.Groups["Whisper"].Success)
        {
            Console.WriteLine("[whis from {0}]: {1}", match.Groups["WhisperWho"].Value, msg);
        }
        else if (match.Groups["SysMessage"].Success)
        {
            Console.WriteLine("[sys msg]: {0}", msg);
        }
        else
        {
            Console.WriteLine("[normal from {0}]: {1}", match.Groups["NormalWho"].Value, msg);
        }
    }
}
Code Snippets
var regex = new Regex(@"^\[(?<TimeStamp>\d{2}:\d{2}:\d{2})\]\s*" +
                      @"(?:" +
                        @"(?<SysMessage>\[System Message\])?\s*" +
                        @"<(?<NormalWho>[^>]*)>|" +
                        @"(?<Whisper>(?<WhisperWho>.+) Whisper You :))\s*" +
                      @"(?<Message>.*)$");
string data = @"[02:33:03] John Whisper You : Heya
[02:33:03] John Whisper You : How is it going
[02:33:12] <John> [02:33:16] [System Message] bla bla
[02:33:39] <John> heya
[02:33:40] <John> hello :S
[02:33:57] <John> hi
[02:33:57] [System Message] <John> has left the room
[02:33:57] [System Message] <John> has entered the room";
foreach (var msg in data.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
    Match match = regex.Match(msg);
    if (match.Success)
    {
        if (match.Groups["Whisper"].Success)
        {
            Console.WriteLine("[whis from {0}]: {1}", match.Groups["WhisperWho"].Value, msg);
        }
        else if (match.Groups["SysMessage"].Success)
        {
            Console.WriteLine("[sys msg]: {0}", msg);
        }
        else
        {
            Console.WriteLine("[normal from {0}]: {1}", match.Groups["NormalWho"].Value, msg);
        }
    }
}
Context
StackExchange Code Review Q#2749, answer score: 3