Pi4 remaining issues #253

Closed
IanSB opened this issue Nov 19, 2021 · 21 comments

Comments

@IanSB
Collaborator

IanSB commented Nov 19, 2021

I thought I'd document the remaining issues with Pi 4 support to help in tracking down the required info:

  1. Genlocking doesn't work.
    This is due to the HDMI PLL registers moving or changing
    Also the interrupt controller has changed so the Vsync flag can't be read (used for genlock and displaying the red sync bar)

  2. Screencaps don't work
    lodepng generates error 55

  3. 16bpp display modes don't work properly because the mailbox will only select 5:6:5 RGB and the display list gets modified to switch this to 4:4:4:4 ARGB. The display list and associated registers have moved or changed so that trick no longer works.
    This could be worked around by reverting to the original 5:6:5 capture code. The reason for the change was to speed up the capture loops as almost no logical masking and shifting was required for 4:4:4:4 ARGB compared to 5:6:5 RGB.
    However due to the new GPU capture code the loss of speed by reverting may not be an issue especially if it is only done for the Pi 4

  4. The PLLs need further experimentation.
    I've managed to get it working using PLLD, but all the PLLs seem to be in use, so this needs to be looked at further: it currently runs the PLL higher than its default rather than lower (running it lower causes the Pi to lose the display or lock up).

  5. Mode7 is glitchy
    Mode 7 is glitchy even with GPU capture. However this can be fixed by reducing the capture width by a few pixels (it's already wider than the active screen so nothing is lost).
    This is very likely due to higher memory latency than on the other Pi versions.

@IanSB
Collaborator Author

IanSB commented Nov 24, 2021

Update:

Issue 1: The interrupt register for the SMI interrupt (fake vsync) is now in IRQ0_PENDING1 on the Pi4 at offset 0x00B204. After changing that, the vsync red bar now works (genlocking is still not working).

Issue 5: Mode7 is now OK. It was using the cached screen option, but the memory area wasn't actually set as cached, which caused the glitches.

@IanSB
Collaborator Author

IanSB commented Nov 24, 2021

Issue 2: Screencaps are now working after changing the compiler options to target Arm v6 (like the universal Pi0-3 build) instead of Arm v8. So the compiler output for Arm v8 has some issues; perhaps something isn't set up correctly, such as unaligned access.

@IanSB
Collaborator Author

IanSB commented Nov 25, 2021

Update:

Issue 3: 16bpp display modes now work properly:

The display list has moved from offset 0x402000 to 0x404000
Also the structure of the list has changed so that the start of video entry is at offset 6 instead of offset 5
Finally the values representing the bit order in the first word of the list have changed:
#define PIXEL_ORDER 2 // ABGR in BCM2711
#define PIXEL_ORDER 3 // ABGR for all others
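
The three differences listed above could be kept together in one place. What follows is only a minimal sketch: the `scaler_layout` struct and the `scaler_layout_for()` helper are hypothetical names that do not come from the rgb-to-hdmi source, and the numbers are simply the ones quoted in this comment (offsets relative to the peripheral base).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the display-list details that differ per SoC. */
struct scaler_layout {
    uint32_t dlist_offset;       /* display list offset from the peripheral base */
    unsigned video_start_offset; /* offset of the start-of-video entry in the list */
    unsigned pixel_order;        /* bit-order value used in the first word of the list */
};

/* Returns the values reported in this thread for the BCM2711 (Pi 4)
 * or for the earlier SoCs (Pi 0-3). */
static inline struct scaler_layout scaler_layout_for(bool is_bcm2711)
{
    struct scaler_layout layout;
    if (is_bcm2711) {
        layout.dlist_offset = 0x404000u;
        layout.video_start_offset = 6;
        layout.pixel_order = 2;
    } else {
        layout.dlist_offset = 0x402000u;
        layout.video_start_offset = 5;
        layout.pixel_order = 3;
    }
    return layout;
}
```

A runtime lookup like this is one option for a universal binary; the two `#define` lines above suggest the project selects the value at compile time instead.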

@hoglet67
Owner

Nice work Ian....

Have you found some secret source of information, or are you just figuring this out by trying lots of different things?

@IanSB
Collaborator Author

IanSB commented Nov 26, 2021

Have you found some secret source of information

Just the Linux source. I looked through for references to bcm2711 or vc5 or just an additional '5' in all the vc4 related files.
I found a header with the display list referenced as SCALER and SCALER5
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_regs.h#L491
#define SCALER_DLIST_START 0x00002000
#define SCALER5_DLIST_START 0x00004000

So I tried that offset. It didn't work, but I dumped the display list to the serial port and it was immediately obvious that the screen start address had moved by one word compared to a Pi Zero. I changed the code for that and got a working image, but with the colours reversed, so I then changed the PIXEL_ORDER value in our defs.h and got the correct image.

The only remaining issue is the HDMI clock (or clocks as there are two outputs)
The existing HDMI PLL has been removed but I can't find any reference to any other new PLLs
In this file:
https://elixir.free-electrons.com/linux/v5.7.19/source/drivers/clk/bcm/clk-bcm2835.c#L1835
PLLH is listed as SOC_BCM2835, but the other four PLLs above it are listed as SOC_ALL, which means both BCM2835 and BCM2711. There are no other new PLLs listed, so I think the HDMI clocks may be derived from the existing PLLs in some way.

BTW I think it would be useful to compile a wiki page with all the bare metal resources like the above source files, datasheets, the SMI tutorial, the display list tutorial, raspberry pi forum posts etc which we could add to as we find them.

@IanSB
Collaborator Author

IanSB commented Nov 27, 2021

This is the output of the following command on a Pi 4 running Raspberry Pi OS:
cat /sys/kernel/debug/clk/clk_summary



                   enable  prepare  protect                  duty
   clock            count    count    count        rate   accuracy phase  cycle
---------------------------------------------------------------------------------------------
 fw-clk-pixel-bvb       1        1        0   150000000          0     0  50000
 fw-clk-m2mc            1        1        0   163620000          0     0  50000
 fw-clk-hevc            0        0        0   250000000          0     0  50000
 fw-clk-v3d             0        0        0   500000000          0     0  50000
 fw-clk-core            1        1        0   200000000          0     0  50000
 fw-clk-arm             0        0        0   600000000          0     0  50000
 108MHz-clock           0        0        0   108000000          0     0  50000
    hdmi1-108MHz        0        0        0   108000000          0     0  50000
    hdmi0-108MHz        0        0        0   108000000          0     0  50000
 27MHz-clock            0        0        0    27000000          0     0  50000
 otg                    0        0        0   480000000          0     0  50000
 osc                    4        4        0    54000000          0     0  50000
    tsens               0        0        0     3375000          0     0  50000
    otp                 0        0        0    13500000          0     0  50000
    timer               0        0        0     1000000          0     0  50000
    plld                5        5        0  3000000091          0     0  50000
       plld_dsi1        1        1        0    11718751          0     0  50000
       plld_dsi0        1        1        0    11718751          0     0  50000
       plld_per         3        3        0   750000023          0     0  50000
          emmc2         1        1        0   100000003          0     0  50000
          emmc          0        0        0   250000007          0     0  50000
          uart          1        1        0    48000001          0     0  50000
       plld_core        1        1        0   600000019          0     0  50000
    pllc                2        2        0  2592000000          0     0  50000
       pllc_per         1        1        0   648000000          0     0  50000
       pllc_core2       0        0        0    10125000          0     0  50000
       pllc_core1       0        0        0    10125000          0     0  50000
       pllc_core0       0        0        0    10125000          0     0  50000
    pllb                2        2        0  1200000005          0     0  50000
       pllb_arm         1        1        0   600000003          0     0  50000
    plla                2        2        0  2999999988          0     0  50000
       plla_ccp2        0        0        0    11718750          0     0  50000
       plla_dsi0        0        0        0    11718750          0     0  50000
       plla_core        2        2        0   499999998          0     0  50000
          h264          0        0        0   249999999          0     0  50000
          isp           0        0        0   249999999          0     0  50000
          vpu           2        2        0   500000000          0     0  50000
             aux_spi2   0        0        0   500000000          0     0  50000
             aux_spi1   0        0        0   500000000          0     0  50000
             aux_uart   0        0        0   500000000          0     0  50000
             peri_image 0        0        0   500000000          0     0  50000
 dsi1p                  0        0        0           0          0     0  50000
 dsi0p                  0        0        0           0          0     0  50000
 dsi1e                  0        0        0           0          0     0  50000
 dsi0e                  0        0        0           0          0     0  50000
 cam1                   0        0        0           0          0     0  50000
 cam0                   0        0        0           0          0     0  50000
 dpi                    0        0        0           0          0     0  50000
 tec                    0        0        0           0          0     0  50000
 smi                    0        0        0           0          0     0  50000
 slim                   0        0        0           0          0     0  50000
 gp2                    0        0        0           0          0     0  50000
 gp1                    0        0        0           0          0     0  50000
 gp0                    0        0        0           0          0     0  50000
 dft                    0        0        0           0          0     0  50000
 aveo                   0        0        0           0          0     0  50000
 pcm                    0        0        0           0          0     0  50000
 pwm                    0        0        0           0          0     0  50000
 sdram                  0        0        0           0          0     0  50000
 hsm                    0        0        0           0          0     0  50000
 vec                    0        0        0           0          0     0  50000

This confirms what I had already worked out by observation: unlike the previous models, PLLA controls the core and the aux UART, so that one is not suitable for use as the CPLD sampling clock.
PLLD controls the emmc, and PLLC is set to an unusual value of 2.592 GHz, which is a multiple of the 108 MHz required by the video outputs for composite video output.
I have found that varying either PLLD or PLLC by more than a few tens of MHz causes lockups or blank screens.
In the case of PLLC, I think that varying the 108 MHz by too much causes the HDMI output to fail (but it doesn't affect the HDMI clock frequency).
I have found that increasing the PLLD per divider allows PLLD to be varied without overclocking the SD card.
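
The "multiple of 108 MHz" observation can be checked against the rates in the table. This is a small sketch of that arithmetic only; the helper names are hypothetical, and an exact multiple does not by itself prove which PLL actually feeds the 108 MHz clock.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIDEO_REF_HZ 108000000ull /* rate of the "108MHz-clock" row in clk_summary */

/* True when rate_hz is an exact integer multiple of the 108 MHz reference. */
static inline bool is_multiple_of_108mhz(uint64_t rate_hz)
{
    return rate_hz % VIDEO_REF_HZ == 0;
}

/* Integer ratio of rate_hz to the 108 MHz reference (truncated). */
static inline uint64_t ratio_to_108mhz(uint64_t rate_hz)
{
    return rate_hz / VIDEO_REF_HZ;
}
```

With the rates listed above, only pllc (2592000000 Hz) divides exactly, giving a ratio of 24; plld, pllb and plla do not.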

@IanSB
Collaborator Author

IanSB commented Nov 28, 2021

Screencaps are still randomly hanging or throwing error 55 on the Pi4 when calling lodepng and it seems to be build or memory content sensitive as adding logging or making any unrelated change can alter the behaviour.

I've disabled unaligned access in the compiler but it hasn't helped with this issue although I will leave it disabled because there can be different behaviour between Arm v6, v7 or v8 which might affect a universal binary.
Armv6 has the old model of unaligned access
Armv7 has the new model of unaligned access but can be switched to Arm v6 compatibility (which is done in _enable_unaligned_access)
Armv8 only has the new model

I'm building the Pi4 version with the compiler targeted at Armv8 with Cortex-A72 optimisation, so it should be building with the correct instructions. I also tried updating lodepng to the latest version.

@IanSB
Collaborator Author

IanSB commented Nov 29, 2021

Possible location of registers affecting pixel clock at base_address + 0xf00f00 (base of hdmi0 phy according to the above links)

Dump of registers for 1920x1080 @ 60Hz with a pixel clock of 148.5 MHz:
00000000 = 00000030
00000004 = 00000010
00000008 = 0909090C
0000000C = 00094A58
00000010 = 00050003
00000014 = 000865C8
00000018 = 80000000
0000001C = 0001E060
00000020 = 008A7800
00000024 = 00000000
00000028 = 00000300
0000002C = 00000000
00000030 = 00000000
00000034 = 00000001
00000038 = 3E0FFC1F
0000003C = 01F07C00
00000040 = 00007C1F
00000044 = 00000000
00000048 = 001A5A0F
0000004C = 00003210
00000050 = 30000000
00000054 = 00000000
00000058 = 00000003
0000005C = 0E14E147
00000060 = 00A50000
00000064 = 0000D0FF
00000068 = 0800C800
0000006C = 00000003
00000070 = 00000000
00000074 = 00000000
00000078 = 00000000
0000007C = 00000000

Dump of registers for 1600x1200 @ 50Hz with a pixel clock of 135 MHz:
00000000 = 00000030
00000004 = 00000010
00000008 = 0909090C
0000000C = 00094A58
00000010 = 00050003
00000014 = 000865C8
00000018 = 80000000
0000001C = 0001E060
00000020 = 008A7800
00000024 = 00000000
00000028 = 00000300
0000002C = 00000000
00000030 = 00000000
00000034 = 00000001
00000038 = 3E0FFC1F
0000003C = 01F07C00
00000040 = 00007C1F
00000044 = 00000000
00000048 = 001A5A0F
0000004C = 00003210
00000050 = 30000000
00000054 = 00000000
00000058 = 00000003
0000005C = 0E14E147
00000060 = 00960000
00000064 = 0000D0FF
00000068 = 0800C800
0000006C = 00000003
00000070 = 00000000
00000074 = 00000000
00000078 = 00000000
0000007C = 00000000

The only difference is register 0x60, which is 0x00960000 for 135 MHz and 0x00A50000 for 148.5 MHz.

148.5 * 0x00960000 / 0x00A50000 = 135, so this appears to be directly correlated with the pixel clock.
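
The ratio above can be reproduced with a few lines of C. This is only a sketch of the arithmetic for the two dumps shown; the macro and function names are hypothetical, and it does not claim to describe what register 0x60 means in hardware.

```c
#include <stdint.h>

#define REG60_AT_148_5_MHZ 0x00A50000u /* register 0x60 in the 1920x1080 dump */
#define REG60_AT_135_MHZ   0x00960000u /* register 0x60 in the 1600x1200 dump */

/* Scale a known pixel clock by the ratio of two register 0x60 values. */
static inline double scale_pixel_clock_mhz(double known_mhz,
                                           uint32_t known_reg,
                                           uint32_t other_reg)
{
    return known_mhz * (double)other_reg / (double)known_reg;
}

/* Upper 16 bits of the register value; the low 16 bits are zero in both dumps. */
static inline uint32_t reg60_upper16(uint32_t reg)
{
    return reg >> 16;
}
```

For these two samples the upper 16 bits are 165 and 150, which works out to 0.9 MHz per count in both cases. Two data points are not enough to confirm that the relationship holds for other modes.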



@IanSB
Collaborator Author

IanSB commented Dec 2, 2021

I now have Pi4 genlocking working on 1080p50 by altering the value in 0xfef00f98 (HDMI_RM_OFFSET), but I still need to work out how the value in that register relates back to the pixel clock value (at the moment the code assumes 148.5 MHz).

@IanSB
Collaborator Author

IanSB commented Dec 2, 2021

Pi 4 genlocking is now fully working

Remaining issues:

lodepng crashing

I also noticed a problem during startup where the SD card doesn't work properly after a power cycle. It does work after the second power cycle, but the error delays bootup by a few seconds.

...
PLLD: 3000.000092 ANA1 = 00118000
PLLD: PDIV=1 NDIV=55 CTRL=00021037 FRAC=582544 DSI0=256 CORE=4 PER=5 DSI1=256
CPU speed detected as: 1000 Mhz
EMMC: BCM2708 controller power-cycled
SD: error sending SEND_SCR
WARN: Failed to initialize file system
SD: error sending CMD17, error = 01048576.  Retrying...
SD: error sending CMD17, error = 01048576.  Retrying...
SD: error sending CMD17, error = 01048576.  Giving up.
Re-creating /cpld_firmware/Delete_This_File_To_Erase_CPLD.txt
EMMC: BCM2708 controller power-cycled
Keycount = 0
CPLD  Design: 3-12_BIT_BBC
CPLD Version: 7.9

I tried adding extra delays around the power cycle but that didn't seem to help.
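
As an aside, the PLLD line in the log above is consistent with the divider arithmetic used by the Linux clk-bcm2835 driver, where the fractional part has 20 bits. A minimal sketch, assuming the 54 MHz `osc` rate shown in the earlier clk_summary; the function name is hypothetical.

```c
#include <stdint.h>

#define PLL_FRAC_BITS 20 /* width of the fractional divider field */

/* rate = osc * (NDIV + FRAC / 2^20) / PDIV, computed in integer arithmetic. */
static inline uint64_t pll_rate_hz(uint64_t osc_hz, uint32_t pdiv,
                                   uint32_t ndiv, uint32_t frac)
{
    uint64_t rate = osc_hz * (((uint64_t)ndiv << PLL_FRAC_BITS) + frac);
    rate /= pdiv;
    return rate >> PLL_FRAC_BITS;
}
```

Calling `pll_rate_hz(54000000, 1, 55, 582544)` with the PDIV, NDIV and FRAC values from the log gives 3000000091 Hz, matching both the logged 3000.000092 MHz (after rounding) and the plld rate in the clk_summary.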

@IanSB
Collaborator Author

IanSB commented Dec 3, 2021

The reason why lodepng is crashing is that malloc() is returning garbage addresses:

Pi0-3:
malloc 0282E358
malloc 0282E350
malloc 0282E340
malloc 0282E338
malloc 0282E2E8
malloc 0282E2A0
Which are sensible addresses in the heap at the end of the code

Pi4:
malloc 6584BAD0
malloc 6584BA80
malloc 6584BA38
malloc 6584B9E8
malloc 6584B9A0
malloc 6584B950

Might be a bug in the compiler or a buffer overrun which is corrupting the memory allocation tables that only happens on the Pi 4?

@hoglet67
Owner

hoglet67 commented Dec 3, 2021

That's very interesting...

Honestly I have no idea about how malloc is meant to work in a bare-metal environment, what region of memory it is using, and how that is controlled.

This issue likely also affects PiTubeDirect

I'll do some digging tomorrow (if I'm not feeling too grotty after being boosted today).

If you find the answer in the mean time, do update this thread.

@hoglet67
Owner

hoglet67 commented Dec 3, 2021

Just confirming here what I think you already knew (but I didn't).

It seems the default implementation of malloc in GCC looks for a symbol called _end that usually follows the bss region and starts the heap there.

If I build on the Pi 4, then run nm, I see:

arm-none-eabi-nm  rgb-to-hdmi | sort | tail
027c4008 B xsvf_data
027c4010 B display_list
027c4014 B capinfo
027c4018 B clkinfo
027c4030 B RPI_GpioBase
027c4034 B errno
027c4038 B __bss_end__
027c4038 B _bss_end__
027c4038 B _end
027c4038 B __end__

So it looks like the heap should start at that point.

On the Pi 0-3 build the address is similar: 027d0038

So the Pi 0-3 addresses you see look like they are within the heap, whereas the Pi 4 addresses look wrong.
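
That "looks wrong" judgement could be turned into a cheap runtime check while debugging. The sketch below is an assumption-laden illustration, not code from the project: the 64 MiB bound on heap growth is an arbitrary guess, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the heap is never expected to grow more than this far past _end. */
#define MAX_PLAUSIBLE_HEAP_BYTES (64u * 1024u * 1024u)

/* True when addr lies at or above heap_start and within the assumed bound. */
static inline bool plausible_heap_address(uintptr_t addr, uintptr_t heap_start)
{
    return addr >= heap_start && addr - heap_start < MAX_PLAUSIBLE_HEAP_BYTES;
}
```

On the target, `heap_start` would be the address of the linker symbol `_end` (for example `extern char _end;` and `(uintptr_t)&_end`). With the values in this thread, 0x0282E358 passes against 0x027d0038, while 0x6584BAD0 fails against 0x027c4038.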

I'm not sure how best to track this down.

The GNU C library (glibc) includes heap consistency checking tools, but I don't expect these are available in an ARM embedded environment:
https://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html

The most likely cause of this is something unrelated to lodepng overflowing an array (or similar) and corrupting the memory allocator's view of the heap.

Dominic's been using a static analysis tool called cppcheck to improve the code in PiTubeDirect. It might well spot possible causes. I've never used this myself, so I can't really advise you on how to run it.

@hoglet67
Owner

hoglet67 commented Dec 3, 2021

@IanSB
Collaborator Author

IanSB commented Dec 4, 2021

@hoglet67

I found out where it is happening but it's not fixed yet:
I created test_png() to compress a png without saving it and inserted that at various points in the code to see what happened.
After some experimentation I found the critical point where the corruption occurs: it appears to be the first time the SD card is accessed, and it looks like it only happens after the SD error I mentioned a few posts above. This error never happens on the other Pi versions, which explains why they never crash.
Sometimes the SD error doesn't happen on a Pi4 and in that case screencaps work normally.

The first time the SD card is accessed is in cpld_init() when the "/Delete_This_File_To_Erase_CPLD.txt" is checked

I modified the code from:

   int check_delete_file = check_file(FORCE_BLANK_FILE, FORCE_BLANK_FILE_MESSAGE); 
   int force_update_file = test_file(FORCE_UPDATE_FILE); 

To:

   log_info("1: Trying first png encode");   
   test_png();   
   log_info("2: Trying check_file()"); 
   int check_delete_file = check_file(FORCE_BLANK_FILE, FORCE_BLANK_FILE_MESSAGE); 
   log_info("3: Trying test_file()");
   int force_update_file = test_file(FORCE_UPDATE_FILE); 
   log_info("4: Trying second png encode"); 
   test_png();   
   log_info("5: Finished test");

When it works you get:

1: Trying first png encode
Scaling is 1 x 1 x=512 y=256 sx=512 sy=256 px=512 py=256
Encoding png 01EFFFA0, 01EDFD80
2: Trying check_file()
EMMC: BCM2708 controller power-cycled
3: Trying test_file()
4: Trying second png encode
Scaling is 1 x 1 x=512 y=256 sx=512 sy=256 px=512 py=256
Encoding png 01EFFFA0, 01EDFD80
5: Finished test

When it doesn't you get:

1: Trying first png encode
Scaling is 1 x 1 x=512 y=256 sx=512 sy=256 px=512 py=256
Encoding png 01EFFF98, 01EDFD80
2: Trying check_file()
EMMC: BCM2708 controller power-cycled
SD: error sending SEND_SCR
WARN: Failed to initialize file system
SD: error sending CMD17, error = 01048576.  Retrying...
SD: error sending CMD17, error = 01048576.  Retrying...
SD: error sending CMD17, error = 01048576.  Giving up.
Re-creating /cpld_firmware/Delete_This_File_To_Erase_CPLD.txt
EMMC: BCM2708 controller power-cycled
3: Trying test_file()
4: Trying second png encode
Data Abort at 0215C390 on core 0
Registers:
  r[00]=FE000000
  r[01]=026D2668
(the error varies and sometimes just hangs)

If you move the second png encode to between the check_file() and test_file() calls it works so it looks like the error on the check_file() call leaves things in an indeterminate state and the subsequent call to test_file() actually trashes the allocator info.

BTW I've sent a pull request with the latest version that will build on v10.3-2021.10 of the compiler.
(above logging changes not included)

@IanSB
Collaborator Author

IanSB commented Dec 4, 2021

@hoglet67

I'll do some digging tomorrow (if I'm not feeling too grotty after being boosted today).

If you want to look at this, note that EMMC_DEBUG is disabled by default even in the debug build as it made startup times very long (see line 52 of src/fatfs/sd_card.c)

@hoglet67
Owner

hoglet67 commented Dec 4, 2021

I've enabled EMMC_DEBUG and there does seem to be an intermittent failure of a specific command during the SD Card initialization sequence.

A good startup looks like:

SD: issuing command CMD0
SD: issuing command CMD8
SD: issuing command CMD5 (timeout expected)
SD: issuing command ACMD41
SD: issuing command ACMD41
SD: issuing command ACMD41
SD: issuing command CMD2
SD: issuing command CMD3
SD: issuing command CMD7
SD: issuing command ACMD51
SD: issuing command ACMD6
SD: issuing command CMD13
SD: issuing command CMD17
...

A bad startup looks like:

SD: issuing command CMD0
SD: issuing command CMD8
SD: issuing command CMD5 (timeout expected)
SD: issuing command ACMD41
SD: issuing command ACMD41
SD: issuing command ACMD41
SD: issuing command CMD2
SD: issuing command CMD3
SD: issuing command CMD7
SD: issuing command ACMD51 (timeout)

At this point the EMMC driver just gives up and bails.

The logging around the failed command is:

SD: issuing command ACMD51
SD: error occured whilst waiting for data ready interrupt
SD: error issuing command: interrupts 00000000: TIMEOUT
SD: error 00000000 sending SEND_SCR

This is the first command in the initialization sequence which uses data transfer, to transfer an 8-byte block (which includes the SD Card version and data bus width).

I've tried a number of things:

  • increasing the timeout
  • adding a delay before sending the command
  • multiple retry attempts
  • additional _data_memory_barrier() calls

None of those things worked...

I suspect this is a bug in the EMMC emulation in the Pi 4 which ends up with some data transfer state machine being stuck somewhere.

I did spot the malloc bug, which is in sd_card_init():

The emmc_block_dev structure can either be malloced, or passed in

    // Prepare the device structure
   struct emmc_block_dev *ret;
   if(*dev == NULL)
      ret = (struct emmc_block_dev *)malloc(sizeof(struct emmc_block_dev));
   else
      ret = (struct emmc_block_dev *)*dev;

If sd_card_init bails, that memory is always freed:

   sd_issue_command(ret, SEND_SCR, 0, 500000);
   ret->block_size = 512;
   if(FAIL(ret))
      {
         printf("SD: error %08lx sending SEND_SCR\r\n", ret->last_error);
         myfree(ret->scr);
         myfree(ret);
         return -1;
      }

So you end up freeing memory which was statically allocated, which is a big no-no and is likely to be what's causing the heap corruption.
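
One way to avoid this class of bug is to remember whether the structure was allocated inside the function and only free it in that case. The sketch below uses a hypothetical `demo_dev` struct and `demo_init()` function to show the pattern; it is not the code from the commit mentioned below and does not use the real `emmc_block_dev` layout.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct emmc_block_dev; only what the sketch needs. */
struct demo_dev {
    int last_error;
};

/* Mirrors the shape of sd_card_init(): *dev may be NULL (allocate here)
 * or may point at caller-owned storage (which must never be freed here). */
static int demo_init(struct demo_dev **dev, bool simulate_failure)
{
    bool allocated_here = (*dev == NULL);
    struct demo_dev *ret = allocated_here ? malloc(sizeof *ret) : *dev;
    if (ret == NULL)
        return -1;

    ret->last_error = simulate_failure ? 1 : 0;
    if (ret->last_error != 0) {
        if (allocated_here)
            free(ret); /* only release what this function allocated */
        return -1;
    }

    *dev = ret;
    return 0;
}
```

On failure with a caller-supplied structure, the pointer is left untouched and nothing is freed, so the allocator's bookkeeping is never handed an address it did not produce.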

A fix for this is in commit cab728b

The only workaround I could think of for the SD Card initialization issue was to retry the whole initialization sequence.

That's in a separate commit: d879a81

What's interesting is the bug only seems to happen once!

EMMC: BCM2708 controller power-cycled
SD: error sending SEND_SCR
******************************************
* Reinitializing SD Card Driver          *
******************************************
EMMC: BCM2708 controller power-cycled
Keycount = 0
...

There are a couple of improvements that could be made from here:

  1. Fix some memory leaks - not everything that is malloced in sdcard.c is freed
  2. Be a bit more systematic about the initialization retry fix, i.e. apply it to all points where the initialization bails, and try it a maximum of say ten times.
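
The second improvement could take the shape of a bounded retry wrapper around the whole initialization. This is a minimal sketch under stated assumptions: `init_with_retry()`, the `init_fn` type and the `fail_n_times()` stub are all hypothetical names for illustration, and the real driver's init function has a different signature.

```c
#include <stddef.h>

#define SD_INIT_MAX_ATTEMPTS 10 /* "a maximum of say ten times" */

/* Hypothetical init callback: returns 0 on success, non-zero on failure. */
typedef int (*init_fn)(void *ctx);

/* Calls init() until it succeeds or max_attempts is reached.
 * Returns the last result; optionally reports how many attempts were made. */
static int init_with_retry(init_fn init, void *ctx, int max_attempts,
                           int *attempts_used)
{
    int rc = -1;
    int attempts = 0;
    while (attempts < max_attempts) {
        attempts++;
        rc = init(ctx);
        if (rc == 0)
            break;
    }
    if (attempts_used != NULL)
        *attempts_used = attempts;
    return rc;
}

/* Demo stub: fails while the counter pointed to by ctx is positive. */
static int fail_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

Routing every bail-out point through a single wrapper like this keeps the retry count in one place, which fits the observation above that the failure only seems to happen once per power cycle.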

hoglet67 added a commit that referenced this issue Dec 4, 2021
Change-Id: I0f643ea2a7dbdb1cfde3d20a2cf0892bfca1f4ad
hoglet67 added a commit that referenced this issue Dec 4, 2021
Change-Id: I801fe7706795a71b08a9ec80534ca21745124d13
@IanSB
Collaborator Author

IanSB commented Dec 4, 2021

Thanks for that, it looks like Pi4 support is now complete!

@IanSB
Collaborator Author

IanSB commented Dec 12, 2021

@hoglet67

One strange thing I noticed with the Pi4:
Although the GPIO benchmark indicates reading the GPIO registers is much faster on the Pi 4 (~15ns) it doesn't result in a usable performance increase compared to the zero2W / Pi3 and still requires reading the GPIOs from the GPU. (i.e. the ARM build still doesn't work with 12bpp samples)
This seems to indicate that ARM is reading a latched copy of the GPIO state which is only updated at a similar rate to the zero2W.

@IanSB
Collaborator Author

IanSB commented Dec 20, 2022

Pi 4 now supports the enhanced CGA artifact emulation

@IanSB IanSB closed this as completed Dec 20, 2022